Rerankers
Rerankers are cross-encoder neural networks that can boost the accuracy of any search system by re-scoring the candidate results it retrieves. You can read more about what rerankers are and when they are most useful in this blog post.
zerank-1 and zerank-1-small
zerank-1 and zerank-1-small are reranker models developed by ZeroEntropy. zerank-1 is our flagship state-of-the-art reranker; you can read more about its performance and cost in this blog post.
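To make the reranking step concrete, here is a minimal Python sketch of where a reranker sits in a search pipeline: a first stage retrieves candidates, then each (query, document) pair is scored jointly and the list is reordered by that score. The scorer `toy_overlap_score` is a hypothetical stand-in based on word overlap; it is not a cross-encoder and not how zerank-1 works. In a real pipeline the `score` callable would be a call to a reranker model.

```python
from typing import Callable, List, Tuple

# A scorer looks at the query and one document together and returns a relevance score.
Scorer = Callable[[str, str], float]


def rerank(query: str, candidates: List[str], score: Scorer, top_k: int) -> List[Tuple[str, float]]:
    """Score every (query, document) pair and return the top_k documents, best first."""
    scored = [(doc, score(query, doc)) for doc in candidates]
    # Python's sort is stable even with reverse=True, so ties keep first-stage order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def toy_overlap_score(query: str, doc: str) -> float:
    """Hypothetical stand-in scorer: fraction of query words that appear in the document."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


# Candidates as they might come back from a first-stage (e.g. keyword or vector) search.
candidates = ["the cat sat", "dogs bark loudly", "a cat and a dog"]
top = rerank("cat dog", candidates, toy_overlap_score, top_k=2)
```

Keeping the scorer as a parameter separates the pipeline shape from the model: swapping the toy function for a reranker call changes the scores, not the reordering logic.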
Both zerank-1 and zerank-1-small can be accessed in three ways:
- Through the models/rerank API endpoint, which is callable via the Python and Node SDKs.
- By passing the reranker query parameter to /top-snippets.
- By downloading the models from our HuggingFace and self-hosting them.
We’ve open-sourced zerank-1-small under an Apache 2.0 license, and it is also available through HuggingFace and Baseten.
Our flagship model, zerank-1, can be downloaded from HuggingFace under a non-commercial license. To use it in a commercial setting, contact us at founders@zeroentropy.dev and we’ll get you a license ASAP!
Using the ZeroEntropy SDK
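A minimal Python sketch of calling the models/rerank endpoint through the SDK. It assumes the client exposes `client.models.rerank(model=..., query=..., documents=...)` and that each item in the response's `results` carries an `index` (position in `documents`) and a `relevance_score`; treat these names as assumptions to check against the SDK reference. The helper `rerank_with_sdk` is hypothetical and takes the client as a parameter, so it can also be exercised with a stand-in object.

```python
from typing import Any, List, Tuple


def rerank_with_sdk(client: Any, query: str, documents: List[str],
                    model: str = "zerank-1") -> List[Tuple[str, float]]:
    """Rerank documents via the SDK and return (document, score) pairs, best first.

    ASSUMPTION: client.models.rerank(...) returns an object whose `results`
    items have `index` and `relevance_score` attributes.
    """
    response = client.models.rerank(model=model, query=query, documents=documents)
    # Map each result back to its original document text.
    pairs = [(documents[r.index], r.relevance_score) for r in response.results]
    # Sort explicitly instead of relying on the order of the response.
    pairs.sort(key=lambda p: p[1], reverse=True)
    return pairs


# Example usage (assumes the `zeroentropy` package and an API key in the environment):
# from zeroentropy import ZeroEntropy
# client = ZeroEntropy()
# for doc, score in rerank_with_sdk(client, "What is 2+2?", ["4", "The answer is 1 million"]):
#     print(f"{score:.3f}  {doc}")
```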
Using top-snippets
When querying /top-snippets on a ZeroEntropy collection, you can apply the reranker to get a significantly better ranking. Reranker scores are also deterministic and more readily interpretable, another advantage over using hybrid search alone.
Rate limiting
Each API key is limited to 2,000,000 UTF-8 bytes per minute.
A reranker request consumes bytes based on the number of documents and the total length of the input. The formula is:
If you exceed the 2,000,000 bytes/minute limit:
- Your requests will still be served.
- However, they will be throttled to a high-throughput but high-latency mode.
- You may experience latency of several seconds per request.
- In this degraded mode, throughput can go up to 20,000,000 bytes per minute, but with reduced responsiveness.
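If you would rather stay under the 2,000,000 bytes/minute limit than fall into the high-latency mode, one option is to track usage on the client. This is a minimal sketch under two assumptions: the limit is treated as a trailing 60-second window (the server's exact windowing is not stated here), and you compute each request's byte cost yourself from the byte-consumption formula for reranker requests. `ByteBudget` is a hypothetical helper, not part of the SDK.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class ByteBudget:
    """Client-side sliding-window tracker for a per-minute byte budget (sketch)."""

    def __init__(self, limit_bytes: int = 2_000_000, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit_bytes = limit_bytes
        self.window_s = window_s
        self.clock = clock  # injectable so the window can be tested without sleeping
        self._events: Deque[Tuple[float, int]] = deque()  # (timestamp, bytes)
        self._used = 0

    def _expire(self, now: float) -> None:
        # Drop requests that have fallen out of the trailing window.
        while self._events and now - self._events[0][0] >= self.window_s:
            _, cost = self._events.popleft()
            self._used -= cost

    def used(self) -> int:
        """Bytes consumed within the current window."""
        self._expire(self.clock())
        return self._used

    def would_exceed(self, cost_bytes: int) -> bool:
        """True if a request of this cost would push usage over the limit."""
        return self.used() + cost_bytes > self.limit_bytes

    def record(self, cost_bytes: int) -> None:
        """Register a request that has just been sent."""
        now = self.clock()
        self._expire(now)
        self._events.append((now, cost_bytes))
        self._used += cost_bytes
```

A caller would check `would_exceed(cost)` before sending, then either wait or accept the slower degraded mode, and call `record(cost)` once the request goes out.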
Pricing
Our pricing is simple and transparent. We charge $0.025 per 1M tokens.
If you use the reranker as a standalone API, we will invoice you based on your usage.
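At $0.025 per 1M tokens, estimating a bill is simple arithmetic. A minimal sketch using `Decimal` to avoid floating-point rounding on currency; how tokens are counted for a given request is not specified here, so the token total is taken as an input. `estimate_cost_usd` is a hypothetical helper.

```python
from decimal import Decimal

# USD per 1,000,000 tokens, from the pricing above.
PRICE_PER_MILLION_TOKENS = Decimal("0.025")


def estimate_cost_usd(tokens: int) -> Decimal:
    """Estimated reranker charge in USD for a total token count."""
    if tokens < 0:
        raise ValueError("token count must be non-negative")
    return Decimal(tokens) * PRICE_PER_MILLION_TOKENS / Decimal(1_000_000)
```

For example, 150,000,000 tokens comes to about $3.75.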