Embed queries and text

zembed-1 is the default embedding model used in zesearch, ZeroEntropy’s search engine. You can also call the embedding model directly and plug it into the vector database of your choice using the /models/embed endpoint or directly through the SDKs.
from zeroentropy import ZeroEntropy
zclient = ZeroEntropy()

query = "What is Retrieval Augmented Generation?"
documents = [
    "RAG combines retrieval with generation by conditioning the LLM on external documents.",
    "Retrieval-Augmented Generation is a machine learning technique introduced by Meta AI in 2020.",
    "It uses reinforcement learning to generate music sequences.",
    "RAG can improve factual accuracy by grounding answers in retrieved evidence.",
    "Transformers are a type of deep learning architecture."
]

# Embed the query
query_response = zclient.models.embed(
    model="zembed-1",
    input=query,
    input_type="query",
)

# Embed the documents
docs_response = zclient.models.embed(
    model="zembed-1",
    input=documents,
    input_type="document",
)

Compute similarity

Use cosine similarity to rank documents by relevance to the query.
import numpy as np

query_embedding = np.array(query_response.results[0].embedding)
doc_embeddings = np.array([d.embedding for d in docs_response.results])

# Cosine similarity
similarities = doc_embeddings @ query_embedding / (
    np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
)

for i in np.argsort(similarities)[::-1]:
    print(f"{similarities[i]:.4f}  {documents[i][:80]}")
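If you L2-normalize the document embeddings once at index time, the cosine similarity above reduces to a plain dot product per query, which is how most vector databases score internally. A minimal sketch (the toy vectors below stand in for the API responses above):

```python
import numpy as np

# Toy vectors standing in for the embeddings returned by the API
doc_embeddings = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])
query_embedding = np.array([0.6, 0.8])

# Normalize the document vectors once, when storing them
doc_norms = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
query_norm = query_embedding / np.linalg.norm(query_embedding)

# Cosine similarity is now a single matrix-vector dot product
similarities = doc_norms @ query_norm
ranking = np.argsort(similarities)[::-1]
```

Paying the normalization cost once per document, instead of once per query, matters when the same corpus is searched many times.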

Configuring embedding parameters

You can customize the embedding output with additional parameters:
  • dimensions: Output dimensionality. For zembed-1, the available options are: 2560, 1280, 640, 320, 160, 80, 40. Lower dimensions reduce storage cost at the expense of accuracy.
  • encoding_format: "float" (default) or "base64". Base64 is significantly more efficient for transfer.
  • latency: "fast" for subsecond inference, "slow" for higher throughput. Omit to let the API choose automatically.
response = zclient.models.embed(
    model="zembed-1",
    input="What is RAG?",
    input_type="query",
    dimensions=320,
    encoding_format="float",
    latency="fast",
)
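When you request encoding_format="base64", the embedding arrives as a base64 string rather than a list of floats. The sketch below assumes the payload packs little-endian float32 values; confirm the exact binary format in the API reference before relying on this.

```python
import base64
import numpy as np

def decode_embedding(b64_payload: str) -> np.ndarray:
    # Assumption: payload is little-endian float32; verify against the API reference
    raw = base64.b64decode(b64_payload)
    return np.frombuffer(raw, dtype="<f4")

# Round-trip demonstration with a synthetic vector
vector = np.array([0.1, -0.2, 0.3], dtype="<f4")
payload = base64.b64encode(vector.tobytes()).decode("ascii")
decoded = decode_embedding(payload)
```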
Each embedding is returned as a list of floats (or a base64-encoded string) representing the embedded chunk of text. You can read more about the available embedding models in the Models section, and about choosing the right parameters, such as embedding size, in this blog post.
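As noted above, you can plug these embeddings into the vector database of your choice. As a minimal stand-in, a store only needs two operations: add and query. The class below is an illustrative sketch (not part of the ZeroEntropy SDK), using toy vectors in place of real API embeddings:

```python
import numpy as np

class InMemoryVectorStore:
    """Minimal in-memory stand-in for a vector database (illustrative only).

    Stores L2-normalized vectors and ranks by cosine similarity.
    """

    def __init__(self):
        self._vectors = []
        self._payloads = []

    def add(self, embedding, payload):
        v = np.asarray(embedding, dtype=np.float64)
        self._vectors.append(v / np.linalg.norm(v))
        self._payloads.append(payload)

    def query(self, embedding, top_k=3):
        q = np.asarray(embedding, dtype=np.float64)
        q = q / np.linalg.norm(q)
        scores = np.stack(self._vectors) @ q
        order = np.argsort(scores)[::-1][:top_k]
        return [(self._payloads[i], float(scores[i])) for i in order]

# Usage with toy vectors standing in for API embeddings
store = InMemoryVectorStore()
store.add([1.0, 0.0], "doc A")
store.add([0.0, 1.0], "doc B")
results = store.query([0.9, 0.1], top_k=1)
```

A real deployment would swap this for a dedicated vector database, but the interface, upsert vectors, then query by similarity, stays the same.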