Rerankers

Rerankers are cross-encoder neural networks that can boost the accuracy of any search system. You can read more about what rerankers are and when they are most useful in this blog post.

zerank-1 and zerank-1-small

zerank-1 and zerank-1-small are reranker models developed by ZeroEntropy. zerank-1 is our flagship state-of-the-art reranker; you can read more about its performance and cost in this blog post. Both models can be used in three ways:
  • Calling the models/rerank API endpoint, which is available via the Python and Node SDKs.
  • Passing the reranker query parameter to top-snippets.
  • Downloading the models from our HuggingFace and self-hosting them.
We’ve open-sourced zerank-1-small under an Apache 2.0 license, and it is also available through HuggingFace and Baseten. Our flagship model zerank-1 can be downloaded from HuggingFace under a non-commercial license. To use it in a commercial setting, contact us at founders@zeroentropy.dev and we’ll get you a license ASAP!

Using the ZeroEntropy SDK

# Create an API Key at https://dashboard.zeroentropy.dev
# pip install zeroentropy
from zeroentropy import ZeroEntropy

# Initialize the ZeroEntropy client (reads ZEROENTROPY_API_KEY from env)
zclient = ZeroEntropy()

response = zclient.models.rerank(
    model="zerank-1",
    query="What is 2+2?",
    documents=[
        "4",
        "The answer is definitely 1 million.",
    ],
)
print(response.model_dump_json(indent=4))
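Once you have the response, a common next step is to map the scores back to the original documents. The sketch below shows one way to do that on a plain-Python stand-in for the response. The `index` and `relevance_score` field names and the score values are assumptions made for illustration, not taken from this page, so check them against the JSON printed above before relying on them.

```python
# Documents in the same order they were sent to the reranker.
documents = [
    "4",
    "The answer is definitely 1 million.",
]

# Hypothetical stand-in for the parsed response. The field names
# (`index`, `relevance_score`) and the scores are illustrative assumptions.
results = [
    {"index": 1, "relevance_score": 0.02},
    {"index": 0, "relevance_score": 0.91},
]

# Sort by score, highest first, then look up each document by its index.
ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
ranked_documents = [documents[r["index"]] for r in ranked]

print(ranked_documents)
```

Sorting explicitly keeps the code correct whether or not the API already returns results in score order.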

Using top-snippets

When querying /top-snippets on a ZeroEntropy collection, you can apply the reranker by passing the reranker parameter, which yields a significantly better ranking. Reranker scores are also deterministic and easier to interpret, which is another benefit over hybrid search alone.

from zeroentropy import ZeroEntropy

# Initialize the ZeroEntropy client (reads ZEROENTROPY_API_KEY from env)
zclient = ZeroEntropy()

# Assuming you have already added documents to the collection "pdfs"
response = zclient.queries.top_snippets(
    collection_name="pdfs",
    query="What is Retrieval Augmented Generation?",
    k=10,
    reranker="zerank-1", # All K results will be reranked using our reranker.
)

print(response.results)

Ratelimiting

Each API key is limited to 2,000,000 UTF-8 bytes per minute. A reranker request consumes bytes based on the number of documents and the total length of the input. The formula is:
Bytes per document = 150
+ len(query.encode("utf-8"))
+ len(document.encode("utf-8"))
This is calculated per document, so the query is counted once for each document you pass in. For example, if you send a request with 10 documents, the total usage is:
10 × 150
+ 10 × len(query.encode("utf-8"))
+ ∑ len(document_i.encode("utf-8")) for i in 1…10
If you exceed the 2,000,000 bytes/minute limit:
  • Your requests will still be served.
  • However, they will be throttled to a high-throughput but high-latency mode.
  • You may experience latency of several seconds per request.
  • In this degraded mode, throughput can go up to 20,000,000 bytes per minute, but with reduced responsiveness.
To avoid throttling, keep your per-minute usage below the 2,000,000-byte soft limit.
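To check a request against this limit before sending it, the byte formula can be computed client-side. The following is a minimal sketch, assuming the 150-byte overhead applies to each document as the per-document formula suggests. The function and constant names are hypothetical and not part of the SDK.

```python
# Soft limit and per-document overhead as stated in this section.
RATE_LIMIT_BYTES_PER_MINUTE = 2_000_000
PER_DOCUMENT_OVERHEAD = 150  # assumed to be charged once per document


def estimate_rerank_bytes(query: str, documents: list[str]) -> int:
    """Estimate the bytes one rerank request counts against the limit."""
    query_bytes = len(query.encode("utf-8"))
    # The query is counted once for every document in the request.
    return sum(
        PER_DOCUMENT_OVERHEAD + query_bytes + len(doc.encode("utf-8"))
        for doc in documents
    )


usage = estimate_rerank_bytes(
    "What is 2+2?",
    ["4", "The answer is definitely 1 million."],
)
print(usage, usage <= RATE_LIMIT_BYTES_PER_MINUTE)
```

Because the limit is measured in UTF-8 bytes rather than characters, non-ASCII text costs more than its character count suggests, which is why the sketch encodes before measuring.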

Pricing

Our pricing is simple and transparent. We charge $0.025/1M tokens. If you use the reranker as a standalone API, we will invoice you based on your usage.
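For budgeting, the rate above translates into a one-line calculation. This is a rough sketch only: the function name is hypothetical, and it takes a token count as input because this page does not say how tokens are counted for a given query and document set.

```python
PRICE_PER_MILLION_TOKENS_USD = 0.025  # rate stated above


def estimate_cost_usd(total_tokens: int) -> float:
    """Estimate the reranker cost in USD for a given number of tokens."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD


# Example: cost of processing 40 million tokens.
print(f"${estimate_cost_usd(40_000_000):.2f}")
```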