Search Engine

zesearch

zesearch is ZeroEntropy’s end-to-end search engine. It abstracts away the full data pipeline: OCR and chunking, embedding and storage, and querying and reranking.

Index

Add documents to a collection

When you add a document to a collection in zesearch, it goes through a fully managed ingestion pipeline:

Parse: Binary files (PDF, DOCX, PPT, images, etc.) are OCR’d and converted to text. Plain text and CSV inputs skip this step.
Chunk: The parsed text is split into chunks at multiple granularities: coarse (~2000 chars) and fine (~200 chars), optimized for retrieval.
Embed: Each chunk is embedded using zembed-1, ZeroEntropy’s state-of-the-art multilingual embedding model, and stored in our vector index.
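
To make the Chunk step concrete, here is a minimal sketch of splitting text at two granularities. It is not ZeroEntropy's actual chunker, whose internals are not described here; the function names chunk_text and chunk_multi_granularity and the greedy whitespace-based packing are assumptions chosen for illustration.

```python
def chunk_text(text: str, target_chars: int) -> list[str]:
    """Greedily pack whitespace-separated words into chunks of at most target_chars.

    A single word longer than target_chars still becomes its own (oversized) chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for word in text.split():
        # +1 accounts for the joining space when the chunk already has words.
        added = len(word) + (1 if current else 0)
        if current and current_len + added > target_chars:
            chunks.append(" ".join(current))
            current, current_len = [], 0
            added = len(word)
        current.append(word)
        current_len += added
    if current:
        chunks.append(" ".join(current))
    return chunks


def chunk_multi_granularity(text: str) -> dict[str, list[str]]:
    # Sizes mirror the approximate figures quoted above (~2000 and ~200 chars).
    return {
        "coarse": chunk_text(text, 2000),
        "fine": chunk_text(text, 200),
    }
```

Keeping both granularities for the same text is what lets a query later trade context (coarse) against precision (fine).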

Each call to add-document adds a document to a collection under a unique path (similar to a file path). ZeroEntropy supports three content types:

text: Plain text content.
text-pages / text-pages-unordered: Pre-paginated text (array of strings). Use unordered for data like CSVs where pages are independent.
auto: Binary files (PDF, DOCX, PPT, etc.) as base64. ZeroEntropy handles OCR and parsing automatically.

Set overwrite: true to upsert (atomically replace if the path already exists).
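
The content types and the overwrite flag can be sketched as small payload builders. This is a sketch under assumptions, not the confirmed SDK surface: the method name documents.add and the payload keys text, pages, and base64_data are guesses modeled on the other SDK calls on this page and should be checked against the API reference. The helper names are hypothetical.

```python
import base64
from typing import Any


def text_content(text: str) -> dict[str, Any]:
    # "text" content type; the "text" key is an assumed field name.
    return {"type": "text", "text": text}


def pages_content(pages: list[str], ordered: bool = True) -> dict[str, Any]:
    # Pre-paginated text; the "pages" key is an assumed field name.
    kind = "text-pages" if ordered else "text-pages-unordered"
    return {"type": kind, "pages": list(pages)}


def binary_content(raw: bytes) -> dict[str, Any]:
    # "auto" content type: binary file bytes sent as base64.
    # The "base64_data" key is an assumed field name.
    return {"type": "auto", "base64_data": base64.b64encode(raw).decode("ascii")}


def upsert_document(client: Any, collection_name: str, path: str, content: dict[str, Any]) -> Any:
    # Assumed call shape; overwrite=True replaces an existing document at the same path.
    return client.documents.add(
        collection_name=collection_name,
        path=path,
        content=content,
        overwrite=True,
    )
```

With a real client, usage would look like upsert_document(zclient, "contracts", "contracts/acme-nda.txt", text_content("...")).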

Custom Chunking

If you want control over how your data is chunked, use the text-pages content type. Each string in the pages array becomes its own page in the index, letting you define chunk boundaries yourself. Use text-pages-unordered when pages are independent (e.g. CSV rows, FAQ entries). See examples for detailed walkthroughs of different ingestion strategies.
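
As an illustration of defining chunk boundaries yourself, the sketch below turns each CSV row into one independent page. The helper csv_rows_to_pages is hypothetical, and the "type" and "pages" keys of the final payload are assumptions about the request shape rather than confirmed field names.

```python
import csv
import io


def csv_rows_to_pages(csv_text: str) -> list[str]:
    """Turn each CSV data row into one self-describing page string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    pages = []
    for row in reader:
        # Repeat the column names in every page so each row stands alone.
        pages.append("\n".join(f"{column}: {value}" for column, value in row.items()))
    return pages


faq_csv = "question,answer\nWhat is zesearch?,A search engine.\nIs OCR included?,Yes.\n"
pages = csv_rows_to_pages(faq_csv)

# Rows are independent of each other, so the unordered variant fits.
content = {"type": "text-pages-unordered", "pages": pages}
```

Repeating the header in every page is a design choice: a retrieved row remains interpretable without its neighbors.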

Using zembed-1 as a standalone

You can also call zembed-1 directly via the embed endpoint and plug it into the vector database of your choice. See Models for more details.
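
When embeddings are stored outside zesearch, retrieval becomes your responsibility. The sketch below shows the general idea with a tiny in-memory cosine-similarity index. The embed argument is a hypothetical callable standing in for whatever wraps the embed endpoint, whose exact signature is not given here; a real deployment would use a proper vector database instead of a Python list.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


class InMemoryIndex:
    """Toy vector store: embeds on insert, brute-force cosine search on query."""

    def __init__(self, embed: Callable[[str], Vector]) -> None:
        self._embed = embed  # hypothetical wrapper around an embedding call
        self._items: list[tuple[str, Vector]] = []

    def add(self, text: str) -> None:
        self._items.append((text, self._embed(text)))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        query_vec = self._embed(query)
        scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```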

Query

There are three granularity levels for querying your indexed data: documents, pages, and snippets. All query endpoints accept a natural language query, a collection_name, and a k parameter controlling how many results to return. All query endpoints support metadata filtering via the optional filter parameter.

Top Pages

Returns the top K most relevant pages. Ideal for page-level retrieval over PDFs, DOCX, or documents ingested with text-pages content type.
Set include_content to true to return the full text of each page. A URL to an image of the page will also be provided in the results.
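
A page-level query might look like the sketch below. The method name queries.top_pages is an assumption that mirrors the queries.top_snippets call used elsewhere on this page, and the result fields path, score, and content are likewise assumed; only the parameters collection_name, query, k, and include_content come from the text above.

```python
from typing import Any


def fetch_top_pages(client: Any, collection_name: str, query: str, k: int = 5) -> list[Any]:
    # Assumed method name; include_content=True asks for the full page text.
    response = client.queries.top_pages(
        collection_name=collection_name,
        query=query,
        k=k,
        include_content=True,
    )
    return list(response.results)


def summarize_page(page: Any, preview_chars: int = 80) -> str:
    # getattr with defaults keeps the sketch tolerant of differing field names.
    path = getattr(page, "path", "<unknown path>")
    score = getattr(page, "score", None)
    content = getattr(page, "content", None) or ""
    return f"{path} (score: {score}): {content[:preview_chars]}"
```

With a real client, printing summarize_page(page) for each page returned by fetch_top_pages(zclient, "pdfs", "What is Retrieval Augmented Generation?") gives a quick overview of the hits.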

Top Snippets

Returns the top K most relevant text snippets. This is the most granular query type.
Each snippet includes the exact character range (start_index, end_index) and page_span within the source document.
You can choose between coarse snippets (averaging ~2000 characters, default) and precise snippets (averaging ~200 characters) using the precise_responses parameter.
Pass a reranker, such as zerank-2, for even better ranking.

from zeroentropy import ZeroEntropy

zclient = ZeroEntropy()

response = zclient.queries.top_snippets(
    collection_name="pdfs",
    query="What is Retrieval Augmented Generation?",
    k=10,
    reranker="zerank-2",
    precise_responses=True,
)

for snippet in response.results:
    print(f"{snippet.path} [pages {snippet.page_span}] (score: {snippet.score})")
    print(snippet.content)
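
Because each snippet carries start_index and end_index, it can be located inside the original document text, for example to show surrounding context. The sketch below assumes the indices are 0-based character offsets with an exclusive end, which is not stated above; SnippetRange and extract_with_context are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SnippetRange:
    start_index: int  # assumed 0-based, inclusive
    end_index: int    # assumed exclusive


def extract_with_context(source_text: str, snippet: SnippetRange, context_chars: int = 40) -> str:
    """Return the snippet wrapped in [[ ]] with up to context_chars of text on each side."""
    start = max(0, snippet.start_index - context_chars)
    end = min(len(source_text), snippet.end_index + context_chars)
    before = source_text[start:snippet.start_index]
    body = source_text[snippet.start_index:snippet.end_index]
    after = source_text[snippet.end_index:end]
    return f"{before}[[{body}]]{after}"
```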

Data Management

zesearch organizes data into collections, each containing documents. Think of collections as databases and documents as records.

Collections

Create, list, and delete collections. Collection names are strings up to 1024 UTF-8 bytes.

from zeroentropy import ZeroEntropy
zclient = ZeroEntropy()
# Create a collection
zclient.collections.add(collection_name="contracts")
# List all collections
response = zclient.collections.get_list()
print(response.collection_names)
# Delete a collection
zclient.collections.delete(collection_name="contracts")

Documents

After you add a document to a collection, parsing and indexing take some time. Use the Get Document Info endpoint to track progress.

Each document response includes file_url for downloading the raw file, index_status for tracking processing state, raw content, and num_pages (null if the document is still parsing or the file type is unsupported).

You can delete one or more documents by path. ZeroEntropy supports batch deletion of up to 64 paths at once.

from zeroentropy import ZeroEntropy

zclient = ZeroEntropy()

# Delete a single document
zclient.documents.delete(
    collection_name="contracts",
    path="contracts/acme-nda.txt",
)

# Batch delete
response = zclient.documents.delete(
    collection_name="contracts",
    path=["old/doc1.txt", "old/doc2.txt", "old/doc3.txt"],
)
print(response.deleted_paths)  # paths that were actually found and deleted
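
Two common follow-ups are deleting more than 64 paths and waiting for a document to finish indexing. The sketch below covers both under stated assumptions: the 64-path limit and deleted_paths come from this page, while documents.get_info, the response.document.index_status shape, and the status strings "indexed" and "...failed" are assumptions to verify against the API reference. The helper names are hypothetical.

```python
import time
from typing import Any, Iterator

MAX_DELETE_BATCH = 64  # batch limit stated above


def batched(paths: list[str], size: int = MAX_DELETE_BATCH) -> Iterator[list[str]]:
    # Yield consecutive slices of at most `size` paths.
    for start in range(0, len(paths), size):
        yield paths[start:start + size]


def delete_many(client: Any, collection_name: str, paths: list[str]) -> list[str]:
    """Delete any number of paths by issuing one request per batch of 64."""
    deleted: list[str] = []
    for batch in batched(paths):
        response = client.documents.delete(collection_name=collection_name, path=batch)
        deleted.extend(response.deleted_paths)
    return deleted


def wait_until_indexed(
    client: Any,
    collection_name: str,
    path: str,
    timeout_s: float = 300.0,
    poll_s: float = 2.0,
) -> str:
    """Poll index_status until it is "indexed" or contains "failed" (assumed values)."""
    deadline = time.monotonic() + timeout_s
    while True:
        # Assumed SDK method and response shape for the Get Document Info endpoint.
        info = client.documents.get_info(collection_name=collection_name, path=path)
        status = info.document.index_status
        if status == "indexed" or "failed" in status:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{path} still has status {status!r} after {timeout_s}s")
        time.sleep(poll_s)
```

Returning the final status instead of raising on failure leaves the decision about retries to the caller.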

More examples can be found here.
