Embeddings

Learn how to use our embedding models for semantic search and RAG applications.

Available Models

  • snowflake-arctic-embed-l-v2.0 ($0.015/1M tokens): High-performance embedding model optimized for RAG applications and semantic search
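
At this rate, embedding a 1,000,000-token corpus costs $0.015, and a 100,000-token corpus costs about $0.0015.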

Getting Embeddings

Python Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.brilliantai.co",
    api_key="your-api-key",
)

response = client.embeddings.create(
    model="snowflake-arctic-embed-l-v2.0",
    input="Your text string goes here",
)

vector = response.data[0].embedding
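
The response contains one embedding per input string; len(vector) gives the model's embedding dimension.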

Node.js Example

import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "https://api.brilliantai.co",
  apiKey: "your-api-key",
});

const embedding = await openai.embeddings.create({
  model: "snowflake-arctic-embed-l-v2.0",
  input: "Your text string goes here",
});

const vector = embedding.data[0].embedding;

Model Features

snowflake-arctic-embed-l-v2.0

  • Optimized for RAG applications
  • High-quality semantic search
  • Very cost-effective ($0.015/1M tokens)
  • Fast inference speed
  • Compatible with popular vector databases

Use Cases

Semantic Search

Embed each document once, embed the incoming query, and rank documents by cosine similarity:

import numpy as np
from openai import OpenAI

client = OpenAI(
    base_url="https://api.brilliantai.co",
    api_key="your-api-key",
)

def get_embedding(text):
    response = client.embeddings.create(
        model="snowflake-arctic-embed-l-v2.0",
        input=text,
    )
    return response.data[0].embedding

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Get embeddings for your documents
documents = ["doc1", "doc2", "doc3"]
doc_embeddings = [get_embedding(doc) for doc in documents]

# Embed the search query
query = "your search query"
query_embedding = get_embedding(query)

# Score each document against the query
similarities = [
    cosine_similarity(query_embedding, doc_emb)
    for doc_emb in doc_embeddings
]

# Rank: the highest-scoring document is the best match
best_match = documents[int(np.argmax(similarities))]

RAG Applications

  1. Document Processing

    • Split documents into chunks
    • Generate embeddings for each chunk
    • Store in vector database
  2. Query Processing

    • Generate embedding for user query
    • Retrieve relevant chunks
    • Use with LLM for enhanced responses (see the sketch below)
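
A minimal sketch of both phases, reusing the get_embedding and cosine_similarity helpers from the semantic search example above. The chunk contents and the chat model name are placeholders; substitute any chat model available through the API.

# 1. Document processing: embed each chunk once, up front
chunks = ["chunk one ...", "chunk two ...", "chunk three ..."]  # placeholders
chunk_embeddings = [get_embedding(c) for c in chunks]

def retrieve(query, k=2):
    # 2. Query processing: embed the query, rank chunks by similarity
    query_emb = get_embedding(query)
    ranked = sorted(
        zip(chunks, chunk_embeddings),
        key=lambda pair: cosine_similarity(query_emb, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:k]]

question = "your question"
context = "\n\n".join(retrieve(question))

completion = client.chat.completions.create(
    model="chat-model-name",  # placeholder: any available chat model
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": question},
    ],
)
print(completion.choices[0].message.content)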

Vector Databases

Our embeddings work well with popular vector databases:

  • Pinecone
  • Weaviate
  • Milvus
  • Qdrant
  • ChromaDB
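
As a concrete example, here is a sketch using ChromaDB's in-memory client (pip install chromadb) together with the get_embedding helper from above; the other databases follow a similar add-then-query pattern through their own clients.

import chromadb

chroma = chromadb.Client()  # in-memory; use chromadb.PersistentClient for disk
collection = chroma.create_collection(name="docs")

documents = ["doc1", "doc2", "doc3"]
collection.add(
    ids=[str(i) for i in range(len(documents))],
    documents=documents,
    embeddings=[get_embedding(doc) for doc in documents],
)

# Query with an embedded search string; returns the closest stored documents
results = collection.query(
    query_embeddings=[get_embedding("your search query")],
    n_results=2,
)
print(results["documents"])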

Best Practices

  1. Document Processing

    • Choose appropriate chunk sizes (recommended: 512 tokens)
    • Maintain context in chunks with overlap (see the sketch after this list)
    • Handle different document types appropriately
  2. Performance

    • Batch embedding requests for efficiency (input accepts a list; see the sketch below)
    • Cache common embeddings
    • Use appropriate vector similarity metrics
    • Consider dimensionality reduction for large datasets
  3. Cost Management

    • Optimize chunk sizes to reduce token usage
    • Implement caching for frequently accessed content
    • Batch process documents when possible
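
A sketch of the first two practices, overlapping chunks and batched requests: the input parameter accepts a list of strings, so many chunks can be embedded in a single call. Chunk sizes here are counted in words as a rough proxy for tokens (use a tokenizer for exact budgets), and the file name is a placeholder.

def chunk_text(text, chunk_size=512, overlap=64):
    # Split into overlapping chunks; sizes are in words, a rough
    # stand-in for tokens.
    words = text.split()
    step = chunk_size - overlap
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]

chunks = chunk_text(open("document.txt").read())  # placeholder file

# One API call embeds the whole batch, in input order
response = client.embeddings.create(
    model="snowflake-arctic-embed-l-v2.0",
    input=chunks,
)
embeddings = [item.embedding for item in response.data]

For caching, wrapping a get_embedding(text) helper with functools.lru_cache is a simple starting point for frequently repeated inputs.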

Next Steps