Core Concepts
Understand the core concepts of the ZeroEntropy API.
The ZeroEntropy API is designed to give you full control over your search index and query granularity.
Data
- Collections: Collections act as separate, independent datastores for your documents. If you want to index multiple distinct datasets, or maintain a multi-tenant architecture, create one collection per dataset so that each dataset has its own search index.
- Documents: The foundational units for indexing. You can upload and delete documents, and query them using various filters. Metadata can be applied per-document, in order to allow for document-level filtering.
- Pages: Many documents, such as PDFs, .docx files, and PowerPoint files, are fundamentally segmented into pages. For full control, you can upload an array of strings as a document, and each string will be interpreted as a page.
  - Pages are unique in that each one is indexed within the context of the pages preceding it, on the interpretation that a sequence of pages is meant to be read in that order. In contrast, the documents in a collection have no particular order.
  - Pages often refer to actual pages of a PDF, but they do not have to. For example, if you want to index a conversation between two people, then you will need to preserve the order of messages. Therefore, you should upload the conversation as an array of strings, and our system will automatically index each message within the context of the overall conversation.
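The hierarchy above can be pictured as a small in-memory model. This is only a sketch for building intuition: `Collection`, `Document`, and their methods are hypothetical names, not the ZeroEntropy SDK, and nothing here talks to the API.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """One indexable unit; `pages` keeps its page strings in reading order."""
    path: str
    pages: list[str]
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class Collection:
    """An independent datastore; the documents inside it are unordered."""
    name: str
    documents: dict[str, Document] = field(default_factory=dict)

    def add(self, doc: Document) -> None:
        if doc.path in self.documents:
            raise ValueError(f"duplicate document path: {doc.path}")
        self.documents[doc.path] = doc

    def delete(self, path: str) -> None:
        del self.documents[path]

    def filter(self, **metadata: str) -> list[Document]:
        """Return documents whose metadata matches every given key/value."""
        return [
            d for d in self.documents.values()
            if all(d.metadata.get(k) == v for k, v in metadata.items())
        ]


# One collection per tenant keeps each tenant's data in its own index.
tenant_a = Collection("tenant-a")
tenant_b = Collection("tenant-b")

# A conversation is one document whose pages are the ordered messages.
tenant_a.add(Document(
    "chat/1",
    ["Hi, is the report ready?", "Yes, sending it now."],
    {"source": "chat"},
))
tenant_a.add(Document("notes.txt", ["Single-page note"], {"source": "file"}))

chat_docs = tenant_a.filter(source="chat")
```

Page order lives inside each document's list, while the collection itself is keyed by path and implies no ordering between documents.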
Queries
- Top K Documents Retrieval: Specify a value for k to retrieve the k documents most relevant to your query.
- Top K Pages Retrieval: Specify a value for k to retrieve the k pages most relevant to your query.
- Top K Snippets Retrieval: Specify a value for k to retrieve the k snippets most relevant to your query. You can choose between coarse snippets (~2000 characters on average) and precise snippets (~200 characters on average).
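To see how the same query can be answered at different granularities, here is a toy sketch. The word-overlap score and the fixed-size character windows are stand-ins chosen for illustration; they are not how ZeroEntropy ranks results or picks snippet boundaries, and every function name is hypothetical.

```python
import heapq


def overlap_score(query: str, text: str) -> int:
    """Toy relevance: count of distinct query words that appear in the text."""
    words = set(text.lower().split())
    return sum(1 for w in set(query.lower().split()) if w in words)


def top_k(query: str, units: dict, k: int) -> list:
    """Return the ids of the k highest-scoring units, best first."""
    return heapq.nlargest(
        k, units, key=lambda uid: overlap_score(query, units[uid])
    )


def snippets(text: str, size: int) -> list[str]:
    """Cut text into consecutive windows of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


documents = {
    "handbook.pdf": [
        "Vacation policy and holidays",
        "Expense reports are due monthly",
    ],
    "faq.txt": ["How to reset a password"],
}

# Document granularity: one unit per document (all pages joined).
doc_units = {path: " ".join(pages) for path, pages in documents.items()}

# Page granularity: one unit per (document, page index) pair.
page_units = {
    (path, i): page
    for path, pages in documents.items()
    for i, page in enumerate(pages)
}

best_docs = top_k("expense reports", doc_units, k=1)
best_pages = top_k("expense reports", page_units, k=1)

# Snippet granularity: smaller windows give more, shorter units.
coarse = snippets("x" * 4500, size=2000)
precise = snippets("x" * 4500, size=200)
```

The only thing that changes between the three query types is the unit being ranked: a whole document, a single page, or a span of text within a page.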
Examples
- Files: For PDFs, .docx, and .txt files, a simple and direct upload will suffice. A collection can correspond to an entire Google Drive, for example.
- Conversations: If you want to index a Slack channel, then each message must be interpreted within the context of the messages before it, and the order of the messages is crucial to their comprehension. Therefore, each Slack channel should be a document, and each Slack message should be a page. Then, Top K Documents will show you the most relevant Slack channels, and Top K Pages will show you the most relevant Slack messages.
- CSVs: If you want to index a CSV of SKUs with 100,000 rows, then the order of the rows is often irrelevant. The best way to index this would be to create a collection for the CSV, with each SKU as a single document. Then, Top K Documents will show you the most relevant SKUs for your query.
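The conversation and CSV mappings above can be sketched as two small helpers that shape your data before upload. The function names, the `path` naming scheme, and the dictionary layout are assumptions made for this example; consult the API Reference for the actual request format.

```python
import csv
import io


def channel_to_document(channel: str, messages: list[tuple[str, str]]) -> dict:
    """One channel becomes one document; each message becomes one page, in order."""
    return {
        "path": f"slack/{channel}",
        "pages": [f"{author}: {text}" for author, text in messages],
    }


def csv_to_documents(csv_text: str, id_column: str) -> list[dict]:
    """Each CSV row becomes its own single-page document; row order is not kept."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "path": f"sku/{row[id_column]}",
            "pages": [", ".join(f"{k}={v}" for k, v in row.items())],
        }
        for row in rows
    ]


channel_doc = channel_to_document(
    "support",
    [("ana", "Is the export job done?"), ("ben", "Yes, it finished at noon.")],
)

sku_docs = csv_to_documents(
    "sku,name\nA1,Blue mug\nB2,Red mug\n",
    id_column="sku",
)
```

With this shaping, a document-level query over the Slack data returns channels and a page-level query returns individual messages, while a document-level query over the CSV collection returns individual SKUs.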
By understanding these core concepts, you’ll be well-equipped to harness the full power of the ZeroEntropy API. Proceed to the API Reference for detailed information on how to implement these features.