Reranks the provided documents according to the provided query.
The results are sorted in descending order of relevance. For each document, the index and the score are returned. The index is the document's position in the documents array that was passed in. The score is the query-document relevance determined by the reranker model.
By default, each organization has a rate limit of 2,500,000 bytes per minute. Requests that exceed this limit are throttled into latency: "slow" mode, up to 10,000,000 bytes per minute. Requests that exceed that second limit receive a 429 error. To request higher rate limits, please contact [email protected] or message us on Discord or Slack!
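The two default thresholds above can be read as three usage tiers. The following is a minimal sketch of that reading, not the service's actual accounting: the helper name `expected_tier`, the tier labels, and the treatment of a usage figure exactly equal to a limit as still within it are all assumptions, and how the API counts bytes is not stated here.

```python
from typing import Literal

# Default limits quoted above, in bytes per minute.
NORMAL_LIMIT_BYTES_PER_MIN = 2_500_000
THROTTLED_LIMIT_BYTES_PER_MIN = 10_000_000

Tier = Literal["normal", "throttled", "rejected"]


def expected_tier(bytes_this_minute: int) -> Tier:
    """Hypothetical helper: classify a usage level against the default limits."""
    if bytes_this_minute < 0:
        raise ValueError("bytes_this_minute must be non-negative")
    if bytes_this_minute <= NORMAL_LIMIT_BYTES_PER_MIN:
        return "normal"  # within the default rate limit
    if bytes_this_minute <= THROTTLED_LIMIT_BYTES_PER_MIN:
        return "throttled"  # served in latency: "slow" mode
    return "rejected"  # the API would answer with a 429 error


if __name__ == "__main__":
    for usage in (1_000_000, 4_000_000, 12_000_000):
        print(usage, expected_tier(usage))
```

A client could use a check like this to decide whether to batch or delay requests before sending them, although the server's own measurement remains authoritative.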
The model ID to use for reranking. Options are: ["zerank-2", "zerank-1", "zerank-1-small"]
The query to rerank the documents by.
The list of documents to rerank. Each document is a string.
If provided, only the top n documents are returned in the results array. Otherwise, n defaults to the length of the provided documents array.
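The truncation rule can be illustrated locally with made-up scores. This is only a sketch of the described behavior: the function `take_top_n`, its parameter names, and the sample scores are hypothetical, and the real scores come from the reranker model.

```python
from typing import List, Optional, Tuple


def take_top_n(scores: List[float], top_n: Optional[int] = None) -> List[Tuple[int, float]]:
    """Return (index, score) pairs sorted by descending score, cut to top_n.

    When top_n is None, every document is kept, mirroring the default
    of n being the length of the documents array.
    """
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    n = len(scores) if top_n is None else top_n
    return ranked[:n]


if __name__ == "__main__":
    made_up_scores = [0.2, 0.9, 0.5]  # illustrative values, not model output
    print(take_top_n(made_up_scores))     # all three documents
    print(take_top_n(made_up_scores, 2))  # only the two highest-scoring ones
```

Each returned index still refers to the position in the original documents array, so truncating the results does not renumber the documents.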
Whether the call is served in "fast" or "slow" mode. Rate limits for slow API calls are orders of magnitude higher, but you can expect >10 second latency. Fast inferences are guaranteed subsecond, but rate limits are lower. If not specified, a "fast" call is attempted first; if you have exceeded your fast rate limit, a slow call is executed instead. If explicitly set to "fast", a 429 is returned when the call cannot be executed fast.
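The fallback rules above can be summarized as a small decision function. This is a sketch of the described behavior under stated assumptions, not the server's implementation: `resolve_latency`, `RateLimited`, and the `fast_limit_exceeded` flag are hypothetical names, and the case where the slow limit is also exceeded is not modeled.

```python
from typing import Optional


class RateLimited(Exception):
    """Stands in for an HTTP 429 response in this sketch."""


def resolve_latency(requested: Optional[str], fast_limit_exceeded: bool) -> str:
    """Return the mode a call would run in, given the requested latency value."""
    if requested not in (None, "fast", "slow"):
        raise ValueError('latency must be "fast", "slow", or unspecified')
    if requested == "slow":
        return "slow"  # an explicit slow request always runs slow
    if not fast_limit_exceeded:
        return "fast"  # unspecified or explicit fast, with fast capacity left
    if requested == "fast":
        # Explicit fast requests are never downgraded; they fail instead.
        raise RateLimited("429: fast rate limit exceeded")
    return "slow"  # unspecified: fall back to a slow call


if __name__ == "__main__":
    print(resolve_latency(None, fast_limit_exceeded=False))
    print(resolve_latency(None, fast_limit_exceeded=True))
```

Leaving the value unspecified is the only setting that lets a call degrade to slow mode; setting "fast" trades that fallback for a predictable latency bound.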
Allowed latency values: "fast", "slow"
Successful Response
The results, in descending order of relevance to the query.
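Because each result carries an index into the submitted documents array rather than the document text, a client usually maps results back to the original strings. The sketch below assumes each result is a dictionary with the keys "index" and "score"; those key names, the helper `attach_documents`, and the sample data are assumptions for illustration.

```python
from typing import Dict, List, Tuple, Union

Result = Dict[str, Union[int, float]]


def attach_documents(documents: List[str], results: List[Result]) -> List[Tuple[str, float]]:
    """Pair each result's score with the document text its index points to.

    The order of `results` is preserved, so the output stays in
    descending order of relevance if the input was.
    """
    return [(documents[int(r["index"])], float(r["score"])) for r in results]


if __name__ == "__main__":
    docs = ["apples are red", "the sky is blue", "bananas are yellow"]
    # Made-up results in the assumed shape, already sorted by descending score.
    sample_results = [{"index": 1, "score": 0.91}, {"index": 0, "score": 0.12}]
    for text, score in attach_documents(docs, sample_results):
        print(f"{score:.2f}  {text}")
```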