Embeds the provided input text with ZeroEntropy embedding models.
The results will be returned in the same order as the input text. Embeddings are constructed so that a query has high cosine similarity with documents that are relevant to that query.
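To make the similarity property above concrete, cosine similarity between two embedding vectors can be computed as below. The vectors here are made-up placeholders, not real model output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Placeholder vectors standing in for a query and two document embeddings.
query = [0.1, 0.9, 0.2]
relevant_doc = [0.12, 0.85, 0.25]
unrelated_doc = [0.9, 0.05, 0.4]
```

A relevant document should score closer to 1.0 against the query than an unrelated one, which is how ranked retrieval over these embeddings works.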
Organizations will, by default, have a rate limit of 2,500,000 bytes per minute. If this is exceeded, requests will be throttled into latency "slow" mode, which allows up to 20,000,000 bytes per minute. If even this is exceeded, you will receive a 429 error. To request higher rate limits, please contact founders@zeroentropy.dev or message us on Discord or Slack!
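One common way to handle the 429 throttling described above is exponential backoff. The sketch below is illustrative only: the endpoint URL and request field names are assumptions, not the official API shape, so check the actual reference before relying on them:

```python
import json
import time
import urllib.error
import urllib.request

# Hypothetical endpoint path; the real URL may differ.
API_URL = "https://api.zeroentropy.dev/v1/embeddings"

def build_payload(texts, model="zembed-1", input_type="document"):
    """Assemble a request body (field names are assumed, not confirmed)."""
    return {"model": model, "input": texts, "input_type": input_type}

def embed_with_retry(texts, api_key, max_retries=3):
    """POST the payload, backing off exponentially on 429 throttling."""
    body = json.dumps(build_payload(texts)).encode("utf-8")
    for attempt in range(max_retries):
        req = urllib.request.Request(
            API_URL,
            data=body,
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
        )
        try:
            with urllib.request.urlopen(req) as resp:
                return json.loads(resp.read())
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ...
    raise RuntimeError("rate limited after retries")
```

Backing off rather than retrying immediately keeps a bursty client under the bytes-per-minute budget instead of compounding the throttling.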
The model ID to use for embedding. Options are: ["zembed-1"]
The input type. For retrieval tasks, either query or document. Allowed values: query, document.
The string, or list of strings, to embed.
The output dimensionality of the embedding model.
The output format of the embedding. base64 is significantly more efficient than float. The default is float. Allowed values: float, base64.
Whether the call will be inferenced "fast" or "slow". Rate limits for slow API calls are orders of magnitude higher, but you can expect >10 second latency. Fast inferences are guaranteed subsecond, but rate limits are lower. If not specified, a "fast" call is attempted first; if you have exceeded your fast rate limit, a slow call is executed instead. If explicitly set to "fast", a 429 is returned when the call cannot be executed fast. Allowed values: fast, slow.
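For the base64 output format, a common convention is to pack each vector as little-endian float32 values and base64-encode the bytes; the sketch below assumes that convention, which should be verified against an actual response. The demo vector is made up, not real model output:

```python
import base64
import struct

def decode_embedding(b64: str) -> list[float]:
    """Decode a base64 string of packed little-endian float32s into floats."""
    raw = base64.b64decode(b64)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", raw))

# Round-trip demo: encode a made-up vector, then decode it back.
vector = [0.25, -1.5, 3.0]
encoded = base64.b64encode(struct.pack("<3f", *vector)).decode("ascii")
decoded = decode_embedding(encoded)
```

The efficiency gain mentioned above comes from size: 4 raw bytes per float32 (about 5.33 base64 characters) versus the much longer decimal text of a JSON float.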