Embed queries and text

zembed-1 is the default embedding model used in zesearch, ZeroEntropy’s search engine. You can also call the embedding model directly and plug it into the vector database of your choice using the /models/embed endpoint or directly through the SDKs.
from zeroentropy import ZeroEntropy
zclient = ZeroEntropy()

query = "What is Retrieval Augmented Generation?"
documents = [
    "RAG combines retrieval with generation by conditioning the LLM on external documents.",
    "Retrieval-Augmented Generation is a machine learning technique introduced by Meta AI in 2020.",
    "It uses reinforcement learning to generate music sequences.",
    "RAG can improve factual accuracy by grounding answers in retrieved evidence.",
    "Transformers are a type of deep learning architecture."
]

# Embed the query
query_response = zclient.models.embed(
    model="zembed-1",
    input=query,
    input_type="query",
)

# Embed the documents
docs_response = zclient.models.embed(
    model="zembed-1",
    input=documents,
    input_type="document",
)

Compute similarity

Use cosine similarity to rank documents by relevance to the query.
import numpy as np

query_embedding = np.array(query_response.results[0].embedding)
doc_embeddings = np.array([d.embedding for d in docs_response.results])

# Cosine similarity
similarities = doc_embeddings @ query_embedding / (
    np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
)

for i in np.argsort(similarities)[::-1]:
    print(f"{similarities[i]:.4f}  {documents[i][:80]}")
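If you L2-normalize the document embeddings once at index time, the cosine similarity above reduces to a plain dot product per query, which is how most vector databases score internally. A minimal sketch (the toy vectors below stand in for the API responses above):

```python
import numpy as np

# Toy vectors standing in for the embeddings returned by the API
doc_embeddings = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])
query_embedding = np.array([0.6, 0.8])

# Normalize the document vectors once, when storing them
doc_norms = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
query_norm = query_embedding / np.linalg.norm(query_embedding)

# Cosine similarity is now a single matrix-vector dot product
similarities = doc_norms @ query_norm
ranking = np.argsort(similarities)[::-1]
```

Paying the normalization cost once per document, instead of once per query, matters when the same corpus is searched many times.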

Configuring embedding parameters

You can customize the embedding output with additional parameters:
  • dimensions: Output dimensionality. For zembed-1, the available options are: 2560, 1280, 640, 320, 160, 80, 40. Lower dimensions reduce storage cost at the expense of accuracy.
  • encoding_format: "float" (default) or "base64". Base64 is significantly more efficient for transfer.
  • latency: "fast" for subsecond inference, "slow" for higher throughput. Omit to let the API choose automatically.
response = zclient.models.embed(
    model="zembed-1",
    input="What is RAG?",
    input_type="query",
    dimensions=320,
    encoding_format="float",
    latency="fast",
)
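When you request encoding_format="base64", the embedding arrives as a base64 string rather than a list of floats. The sketch below assumes the payload packs little-endian float32 values; confirm the exact binary format in the API reference before relying on this.

```python
import base64
import numpy as np

def decode_embedding(b64_payload: str) -> np.ndarray:
    # Assumption: payload is little-endian float32; verify against the API reference
    raw = base64.b64decode(b64_payload)
    return np.frombuffer(raw, dtype="<f4")

# Round-trip demonstration with a synthetic vector
vector = np.array([0.1, -0.2, 0.3], dtype="<f4")
payload = base64.b64encode(vector.tobytes()).decode("ascii")
decoded = decode_embedding(payload)
```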
Each embedding is returned as a list of floats (or a base64-encoded string) representing the embedded chunk of text. You can read more about the available embedding models in the Models section, and about choosing the right parameters, such as embedding size, in this blog post.
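As noted above, you can plug these embeddings into the vector database of your choice. As a minimal stand-in, a store only needs two operations: add and query. The class below is an illustrative sketch (not part of the ZeroEntropy SDK), using toy vectors in place of real API embeddings:

```python
import numpy as np

class InMemoryVectorStore:
    """Minimal in-memory stand-in for a vector database (illustrative only).

    Stores L2-normalized vectors and ranks by cosine similarity.
    """

    def __init__(self):
        self._vectors = []
        self._payloads = []

    def add(self, embedding, payload):
        v = np.asarray(embedding, dtype=np.float64)
        self._vectors.append(v / np.linalg.norm(v))
        self._payloads.append(payload)

    def query(self, embedding, top_k=3):
        q = np.asarray(embedding, dtype=np.float64)
        q = q / np.linalg.norm(q)
        scores = np.stack(self._vectors) @ q
        order = np.argsort(scores)[::-1][:top_k]
        return [(self._payloads[i], float(scores[i])) for i in order]

# Usage with toy vectors standing in for API embeddings
store = InMemoryVectorStore()
store.add([1.0, 0.0], "doc A")
store.add([0.0, 1.0], "doc B")
results = store.query([0.9, 0.1], top_k=1)
```

A real deployment would swap this for a dedicated vector database, but the interface, upsert vectors, then query by similarity, stays the same.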