ZeroEntropy SDK Helper
ZeroEntropy can be installed using:
• Python: pip install zeroentropy
• Node.js: npm install zeroentropy
Client Usage
```python
from zeroentropy import ZeroEntropy

client = ZeroEntropy(api_key="your_api_key")
```
The following SDK methods are available:
📌 API Methods & Expected Outputs
Methods that return data respond with structured models defined as pydantic.BaseModel subclasses; methods annotated -> None return nothing.
📂 Collections
• client.collections.add(collection_name: str) -> None
Always pass the collection name explicitly, e.g. client.collections.add(collection_name="my_collection").
If the collection already exists, add throws an error, so check whether it exists first (for example with client.collections.get_list()).
• client.collections.get_list() -> List[str]
• client.collections.delete(collection_name: str) -> None
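The check-before-create rule above can be wrapped in a small helper. This is a minimal sketch: ensure_collection is a hypothetical name, not part of the SDK, and it relies only on get_list() returning the collection names as List[str] and on add(collection_name=...), as listed above.

```python
def ensure_collection(client, collection_name: str) -> bool:
    """Create `collection_name` only if it is not already listed.

    Hypothetical helper. Returns True if the collection was created,
    False if it already existed.
    """
    if collection_name in client.collections.get_list():
        return False  # already exists; calling add() would raise
    client.collections.add(collection_name=collection_name)
    return True
```

The check and the create are two separate calls, so a concurrent process could still create the same collection in between; in that case add() raises as described above.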
📄 Documents
• client.documents.add(collection_name: str, path: str, content: dict, metadata: dict = None, overwrite: bool = False) -> None
The add method handles parsing of PDFs and other supported files itself. The content dict can take the following formats:
content={"type": "auto", "base64_data": "<base64-encoded file bytes>"} for a PDF or other file to be parsed,
content={"type": "text", "text": "<the document's text>"} for plain text, and
content={"type": "text-pages", "pages": ["page 1 content", "page 2 content"]} for text that is already split into pages.
If a document already exists at that path, add throws an error, so check whether it exists first (or pass overwrite=True to replace it).
• client.documents.get_info(collection_name: str, path: str, include_content: bool = False) -> DocumentResponse
• client.documents.get_info_list(collection_name: str, limit: int = 1024, id_gt: Optional[str] = None) -> List[DocumentMetadataResponse]
• client.documents.update(collection_name: str, path: str, metadata: Optional[dict]) -> UpdateDocumentResponse
• client.documents.delete(collection_name: str, path: str) -> None
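The content formats for documents.add can be built with a few small functions. A minimal sketch, assuming only the three dict shapes listed above: note that base64_data carries the file's bytes encoded as base64 text and text carries the document's actual text, never a file name. The helper names and the extension-based dispatch rule in content_for_file are hypothetical.

```python
import base64
from pathlib import Path
from typing import Any, Dict, List


def auto_content(file_bytes: bytes) -> Dict[str, Any]:
    # "auto": raw file bytes (e.g. a PDF), base64-encoded into an ASCII string
    return {"type": "auto", "base64_data": base64.b64encode(file_bytes).decode("ascii")}


def text_content(text: str) -> Dict[str, Any]:
    # "text": the document's text itself, not its file name
    return {"type": "text", "text": text}


def pages_content(pages: List[str]) -> Dict[str, Any]:
    # "text-pages": text already split into pages, in page order
    return {"type": "text-pages", "pages": list(pages)}


def content_for_file(file_path: str) -> Dict[str, Any]:
    # Hypothetical dispatch rule: .txt/.md are sent as text, everything else as "auto"
    path = Path(file_path)
    if path.suffix.lower() in {".txt", ".md"}:
        return text_content(path.read_text(encoding="utf-8"))
    return auto_content(path.read_bytes())
```

Usage would then look like client.documents.add(collection_name="my_collection", path="reports/q1.pdf", content=content_for_file("q1.pdf")).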
🔎 Queries
• client.queries.top_documents(collection_name: str, query: str, k: int, filter: Optional[dict] = None, include_metadata: bool = False, latency_mode: str = "low") -> DocumentRetrievalResponse
• client.queries.top_pages(collection_name: str, query: str, k: int, filter: Optional[dict] = None, include_content: bool = False, latency_mode: str = "low") -> PageRetrievalResponse
• client.queries.top_snippets(collection_name: str, query: str, k: int, filter: Optional[dict] = None, precise_responses: bool = False) -> SnippetResponse
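A minimal sketch of consuming a query result. search_paths and its min_score parameter are hypothetical; the function only assumes that top_documents returns an object whose results each carry path and score, as in the DocumentRetrievalResponse model below.

```python
from typing import List, Optional, Tuple


def search_paths(client, collection_name: str, query: str, k: int = 5,
                 min_score: Optional[float] = None) -> List[Tuple[str, float]]:
    """Return (path, score) pairs from top_documents, highest score first."""
    response = client.queries.top_documents(
        collection_name=collection_name, query=query, k=k
    )
    hits = [(r.path, r.score) for r in response.results]
    if min_score is not None:
        # Drop weak matches client-side (hypothetical threshold)
        hits = [hit for hit in hits if hit[1] >= min_score]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```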
📊 Status
• client.status.get(collection_name: Optional[str] = None) -> StatusResponse
📑 Parsers
• client.parsers.parse_document(base64_data: str) -> ParseDocumentResponse
📌 Expected Response Models
The response models are defined as follows:
1️⃣ DocumentResponse
Used in get_info()
```python
from typing import Dict, Optional
from pydantic import BaseModel

class DocumentResponse(BaseModel):
    id: str  # UUID of the document
    collection_name: str
    path: str
    metadata: Dict[str, str]  # Metadata key-value pairs
    index_status: str  # Enum: "parsing_failed", "not_parsed", "parsing", "not_indexed", "indexing", "indexed"
    num_pages: Optional[int] = None  # Can be null
    content: Optional[str] = None  # Null unless `include_content=True`
```
2️⃣ UpdateDocumentResponse
Used in update()
```python
from pydantic import BaseModel

class UpdateDocumentResponse(BaseModel):
    previous_id: str  # Old document UUID
    new_id: str  # UUID of the updated document
```
3️⃣ DocumentRetrievalResponse
Used in top_documents()
```python
from typing import Dict, List, Optional
from pydantic import BaseModel

class Response(BaseModel):
    path: str
    metadata: Optional[Dict[str, str]] = None  # Null if `include_metadata=False`
    score: float  # Relevancy score

class DocumentRetrievalResponse(BaseModel):
    results: List[Response]
```
4️⃣ PageRetrievalResponse
Used in top_pages()
```python
from typing import List, Optional
from pydantic import BaseModel

class Response(BaseModel):
    path: str  # Document path
    page_index: int  # 0-indexed page number
    score: float  # Relevancy score
    content: Optional[str] = None  # Null if `include_content=False`

class PageRetrievalResponse(BaseModel):
    results: List[Response]
```
5️⃣ SnippetResponse
Used in top_snippets()
```python
from typing import List, Optional
from pydantic import BaseModel

class Response(BaseModel):
    path: str
    start_index: int  # Start character index of snippet
    end_index: int  # End character index of snippet
    page_span: List[int]  # (start_page, end_page) index range
    content: Optional[str] = None  # Snippet text
    score: float  # Relevancy score

class SnippetResponse(BaseModel):
    results: List[Response]
```
6️⃣ StatusResponse
Used in status.get()
```python
from pydantic import BaseModel

class StatusResponse(BaseModel):
    num_documents: int  # Total document count
    num_parsing_documents: int  # Documents still being parsed
    num_indexing_documents: int  # Documents currently being indexed
    num_indexed_documents: int  # Successfully indexed documents
    num_failed_documents: int  # Documents that failed
```
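Since documents are parsed and indexed asynchronously, a caller may want to wait until a collection settles before querying. A minimal polling sketch: wait_until_indexed, poll_interval, and timeout are hypothetical, and the "done" criterion (indexed plus failed documents reaching num_documents) is an assumption built on the StatusResponse fields above.

```python
import time


def wait_until_indexed(client, collection_name: str,
                       poll_interval: float = 2.0, timeout: float = 300.0):
    """Poll status.get() until every document is indexed or has failed.

    Returns the final StatusResponse; raises TimeoutError if the
    collection has not settled within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = client.status.get(collection_name=collection_name)
        settled = status.num_indexed_documents + status.num_failed_documents
        if settled >= status.num_documents:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{collection_name!r} not fully indexed after {timeout}s")
        time.sleep(poll_interval)
```

Checking num_failed_documents on the returned status tells the caller whether anything needs re-uploading.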
7️⃣ ParseDocumentResponse
Used in parse_document()
```python
from typing import List
from pydantic import BaseModel

class ParseDocumentResponse(BaseModel):
    pages: List[str]  # List of extracted page contents
```
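parse_document takes base64 data rather than raw bytes, so the file must be encoded first. A minimal sketch: parse_file is a hypothetical helper, and it assumes only the parse_document(base64_data=...) signature and the pages field shown above.

```python
import base64
from typing import List


def parse_file(client, file_bytes: bytes) -> List[str]:
    """Send raw file bytes to parsers.parse_document and return the page texts."""
    encoded = base64.b64encode(file_bytes).decode("ascii")  # bytes -> base64 text
    response = client.parsers.parse_document(base64_data=encoded)
    return list(response.pages)
```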
📌 Additional Notes
1. Cursor should always use these BaseModels when generating SDK-based responses.
2. Metadata Filtering
• Document metadata is always dict[str, str | list[str]].
• Filters support operators: $eq, $ne, $gt, $gte, $lt, $lte (for equality and range queries).
3. Responses will always match these structures unless otherwise stated.
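To make the metadata filtering operators in note 2 concrete, here is a client-side illustration of their semantics. The filter shape is an assumption (a MongoDB-style {field: {operator: value}} dict, as one would pass via the filter argument of the query methods), and matches is a hypothetical helper, not the server's implementation: it handles string-valued fields only, compares with Python's lexicographic string ordering (which may differ from the server), and raises KeyError for unknown operators.

```python
from typing import Dict, List, Union

Metadata = Dict[str, Union[str, List[str]]]

# Python equivalents of the operators listed in note 2
_OPS = {
    "$eq": lambda a, b: a == b,
    "$ne": lambda a, b: a != b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
}


def matches(metadata: Metadata, metadata_filter: Dict[str, Dict[str, str]]) -> bool:
    """True if every {field: {op: value}} condition holds for `metadata`.

    Missing or list-valued fields never match in this sketch.
    """
    for field, conditions in metadata_filter.items():
        value = metadata.get(field)
        if not isinstance(value, str):
            return False
        for op, operand in conditions.items():
            if not _OPS[op](value, operand):
                return False
    return True
```

Under the same assumption, a range query such as {"year": {"$gte": "2020", "$lt": "2024"}} would be passed as filter=... to top_documents, top_pages, or top_snippets.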