Embeddings
Learn how to use our embedding models for semantic search and RAG applications.
Available Models
- snowflake-arctic-embed-l-v2.0 ($0.015/1M tokens): High-performance embedding model optimized for retrieval-augmented generation (RAG) and semantic search
Getting Embeddings
Python Example
from openai import OpenAI

client = OpenAI(
    base_url="https://api.brilliantai.co",
    api_key="your-api-key"
)

response = client.embeddings.create(
    model="snowflake-arctic-embed-l-v2.0",
    input="Your text string goes here"
)

vector = response.data[0].embedding
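The input field also accepts a list of strings, so you can embed several texts in a single request. A small sketch, assuming the endpoint supports batched inputs like the standard OpenAI embeddings API, reusing the client above:

# Embed several texts at once; response.data follows the input order.
batch = client.embeddings.create(
    model="snowflake-arctic-embed-l-v2.0",
    input=["first text", "second text", "third text"]
)
vectors = [item.embedding for item in batch.data]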
Node.js Example
import OpenAI from "openai";
const openai = new OpenAI({
baseURL: "https://api.brilliantai.co",
apiKey: "your-api-key"
});
const embedding = await openai.embeddings.create({
model: "snowflake-arctic-embed-l-v2.0",
input: "Your text string goes here"
});
const vector = embedding.data[0].embedding;
Model Features
snowflake-arctic-embed-l-v2.0
- Optimized for RAG applications
- High-quality semantic search
- Very cost-effective ($0.015/1M tokens)
- Fast inference speed
- Compatible with popular vector databases
Use Cases
Semantic Search
import numpy as np
from openai import OpenAI

client = OpenAI(
    base_url="https://api.brilliantai.co",
    api_key="your-api-key"
)

def get_embedding(text):
    response = client.embeddings.create(
        model="snowflake-arctic-embed-l-v2.0",
        input=text
    )
    return response.data[0].embedding

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Get embeddings for your documents
documents = ["doc1", "doc2", "doc3"]
doc_embeddings = [get_embedding(doc) for doc in documents]

# Search query
query = "your search query"
query_embedding = get_embedding(query)

# Find the most similar documents
similarities = [
    cosine_similarity(query_embedding, doc_emb)
    for doc_emb in doc_embeddings
]
best_match = documents[int(np.argmax(similarities))]
RAG Applications
- Document Processing
  - Split documents into chunks
  - Generate embeddings for each chunk
  - Store them in a vector database
- Query Processing
  - Generate an embedding for the user query
  - Retrieve the most relevant chunks
  - Use them with an LLM for enhanced responses (see the sketch below)
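The two stages above can be sketched end to end with the same OpenAI-compatible client. This is a minimal sketch, not a production pipeline: a plain Python list stands in for a vector database, and the chat model name is a placeholder, so substitute one of our LLM inference models.

import numpy as np
from openai import OpenAI

client = OpenAI(base_url="https://api.brilliantai.co", api_key="your-api-key")

def embed(texts):
    # Embed a list of texts in one request; results follow the input order.
    response = client.embeddings.create(
        model="snowflake-arctic-embed-l-v2.0",
        input=texts
    )
    return [item.embedding for item in response.data]

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Stage 1: document processing. Chunks would normally go to a vector database;
# a plain list keeps the sketch self-contained.
chunks = ["chunk one ...", "chunk two ...", "chunk three ..."]
chunk_embeddings = embed(chunks)

# Stage 2: query processing. Embed the query, retrieve the closest chunks,
# and pass them to the LLM as context.
question = "your question"
query_embedding = embed([question])[0]

scores = [cosine_similarity(query_embedding, emb) for emb in chunk_embeddings]
top_chunks = [chunks[i] for i in np.argsort(scores)[::-1][:2]]
context = "\n".join(top_chunks)

completion = client.chat.completions.create(
    model="your-chat-model",  # placeholder: use one of our LLM inference models
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
    ]
)
print(completion.choices[0].message.content)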
Vector Databases
Our embeddings work well with popular vector databases (a ChromaDB sketch follows this list):
- Pinecone
- Weaviate
- Milvus
- Qdrant
- ChromaDB
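As a concrete example, here is a minimal sketch using ChromaDB's in-memory client. It assumes the chromadb package is installed and reuses get_embedding from the semantic search example above; check the ChromaDB docs for the current API.

import chromadb

# Ephemeral in-memory client; ChromaDB also offers a persistent client.
chroma = chromadb.Client()
collection = chroma.create_collection(name="docs")

documents = ["doc1", "doc2", "doc3"]
collection.add(
    ids=[f"doc-{i}" for i in range(len(documents))],
    documents=documents,
    embeddings=[get_embedding(doc) for doc in documents],
)

# Query with a precomputed query embedding; return the two closest documents.
results = collection.query(
    query_embeddings=[get_embedding("your search query")],
    n_results=2,
)
print(results["documents"])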
Best Practices
- Document Processing
  - Choose appropriate chunk sizes (recommended: 512 tokens)
  - Maintain context in chunks with overlap (see the sketch after this list)
  - Handle different document types appropriately
- Performance
  - Batch embedding requests for efficiency
  - Cache common embeddings
  - Use appropriate vector similarity metrics
  - Consider dimensionality reduction for large datasets
- Cost Management
  - Optimize chunk sizes to reduce token usage
  - Implement caching for frequently accessed content
  - Batch process documents when possible
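A minimal sketch of the chunking and caching practices above. Chunk sizes are measured in characters as a rough stand-in for tokens (use a tokenizer for exact 512-token chunks); the client from the earlier examples and a long_document placeholder string are assumed.

from functools import lru_cache

def chunk_text(text, chunk_size=2000, overlap=200):
    # Fixed-size character chunks with overlap so context is preserved
    # across chunk boundaries (roughly approximates 512-token chunks).
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

@lru_cache(maxsize=10_000)
def cached_embedding(text):
    # Cache embeddings so repeated chunks and queries are not re-embedded.
    response = client.embeddings.create(
        model="snowflake-arctic-embed-l-v2.0",
        input=text
    )
    return tuple(response.data[0].embedding)

chunks = chunk_text(long_document)  # long_document: your source text
chunk_embeddings = [cached_embedding(chunk) for chunk in chunks]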
Next Steps
- Try building a RAG application
- Learn about our API Reference
- Explore LLM Inference for combining retrieved context with model responses