Documentation Index

Fetch the complete documentation index at: https://docs.zeroentropy.dev/llms.txt

Use this file to discover all available pages before exploring further.
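As a sketch, the index can also be fetched and scanned programmatically. The parser below assumes the common llms.txt convention of markdown link lists (`- [Title](url): description`); the sample text is illustrative, not taken from the real file.

```python
import re
import urllib.request  # uncomment the fetch below to pull the live index

# To fetch the live file:
#   text = urllib.request.urlopen("https://docs.zeroentropy.dev/llms.txt").read().decode()

LINK_RE = re.compile(r"-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)")

def parse_llms_txt(text: str) -> list[dict]:
    """Extract title/url pairs from llms.txt-style markdown link lists."""
    return [m.groupdict() for m in LINK_RE.finditer(text)]

# Illustrative sample in the llms.txt link-list convention (not the real file):
sample = """# ZeroEntropy
- [Quickstart](https://docs.zeroentropy.dev/quickstart): Get started
- [API Reference](https://docs.zeroentropy.dev/api): Endpoints
"""
pages = parse_llms_txt(sample)
print([p["title"] for p in pages])  # -> ['Quickstart', 'API Reference']
```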

System Architecture

ZeroEntropy is built to bring advanced document intelligence to your knowledge base. We've designed our retrieval system to solve common failures found in naive hybrid search implementations.

Ingestion Architecture

Query Architecture

Core Components

  1. Document Processing Pipeline
    • Handles document ingestion and parsing
    • Supports a variety of document formats (PDF, DOCX, PPT, TXT, etc.)
    • Supports complex diagrams found in medicine, manufacturing, and deep tech.
    • Correctly parses the hierarchical structure found in legal, healthcare, and other industries.
    • Uses LLMs to tag the data, as if you had hired thousands of SEO engineers to manually annotate your corpus.
  2. Data Storage
    • Document raw data is stored in object storage, along with images for PDF/DOCX/PPT pages.
    • Document metadata is stored in PostgreSQL.
    • The document ingestion pipeline stores vector data in turbopuffer, keyword data in ParadeDB BM25 indices, and collection dictionaries in S3 using a BK-tree data structure.
  3. Query Processing Engine
    • Interprets natural language queries without any special syntax required.
    • Uses an LLM-in-the-loop to automatically generate candidate keywords and semantic searches, then reviews everything retrieved before making a final decision on what is most relevant to your query.
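The hybrid flow described above, retrieving from both a keyword index and a vector index and then applying a final review pass, can be sketched as follows. The function names, toy indices, and reciprocal-rank fusion are illustrative assumptions, not ZeroEntropy's actual implementation (in the real system, an LLM performs the review step).

```python
from dataclasses import dataclass

@dataclass
class Hit:
    doc_id: str
    score: float

# Toy stand-ins for the real indices (BM25 keyword index, vector index).
def keyword_search(query: str, k: int) -> list[Hit]:
    return [Hit("doc-legal-7", 12.1), Hit("doc-med-3", 9.4)][:k]

def vector_search(query: str, k: int) -> list[Hit]:
    return [Hit("doc-med-3", 0.91), Hit("doc-mfg-2", 0.83)][:k]

def hybrid_query(query: str, k: int = 10) -> list[str]:
    """Merge keyword and semantic hits, then apply a final rerank pass."""
    # Fuse per-document scores with reciprocal-rank fusion, so the two
    # incompatible score scales (BM25 vs. cosine) can be combined.
    fused: dict[str, float] = {}
    for hits in (keyword_search(query, k), vector_search(query, k)):
        for rank, hit in enumerate(hits):
            fused[hit.doc_id] = fused.get(hit.doc_id, 0.0) + 1.0 / (60 + rank)
    # "Review" step: the real system re-reads candidates with an LLM;
    # here we simply sort by fused score.
    return sorted(fused, key=fused.get, reverse=True)[:k]

print(hybrid_query("liability clauses in medical device contracts"))
# -> ['doc-med-3', 'doc-legal-7', 'doc-mfg-2']
```

Documents found by both indices (here `doc-med-3`) accumulate score from each ranking, which is why fusion tends to surface them first.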

Security & Performance

  • End-to-end encryption for data in transit and at rest.
  • On-prem deployment is available to enterprise users as easy-to-use Docker images.
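For on-prem deployments, startup might look like the following. The registry path, port, and environment variable are hypothetical placeholders for illustration; consult your enterprise onboarding materials for the real image and configuration.

```shell
# Hypothetical image name, port, and connection string -- not the real values.
docker run -d \
  -p 8000:8000 \
  -e POSTGRES_URL="postgres://user:pass@db:5432/zeroentropy" \
  registry.example.com/zeroentropy/on-prem:latest
```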