# Chat API Reference

Complete reference for the BrilliantAI Chat Completions API.

## Create Chat Completion

### Endpoint

```
POST https://api.brilliantai.co/chat/completions
```
### Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | ID of the model to use |
| `messages` | array | Yes | Array of messages in the conversation |
| `temperature` | number | No | Sampling temperature (0 to 2) |
| `max_tokens` | integer | No | Maximum number of tokens to generate |
| `stream` | boolean | No | Whether to stream the response back |
| `top_p` | number | No | Nucleus sampling parameter |
| `frequency_penalty` | number | No | Frequency penalty (-2 to 2) |
| `presence_penalty` | number | No | Presence penalty (-2 to 2) |
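The same call can also be made over raw HTTP. Below is a minimal sketch using the `requests` library; it assumes the endpoint accepts standard Bearer-token authentication, as OpenAI-compatible APIs typically do:

```python
import requests

# Hypothetical raw-HTTP equivalent of the SDK example below.
# Assumes standard Bearer-token auth (not confirmed by this reference).
resp = requests.post(
    "https://api.brilliantai.co/chat/completions",
    headers={
        "Authorization": "Bearer your-api-key",
        "Content-Type": "application/json",
    },
    json={
        "model": "llama-3.3-70b",
        "messages": [{"role": "user", "content": "Hello"}],
        "temperature": 0.7,
        "max_tokens": 100,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```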
### Example Request

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.brilliantai.co",
    api_key="your-api-key",
)

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is quantum computing?"},
    ],
    temperature=0.7,
    max_tokens=500,
)
```
### Response Format

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "llama-3.3-70b",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Quantum computing is..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 32,
    "total_tokens": 57
  }
}
```
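With the Python SDK, these fields are exposed as attributes on the returned object. Continuing from the example request above:

```python
# Read the generated text and the token accounting
# from the response object created above.
print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens} "
      f"({response.usage.prompt_tokens} prompt + "
      f"{response.usage.completion_tokens} completion)")
```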
## Streaming Responses

Set `stream=True` to receive the response incrementally as it is generated, instead of waiting for the full completion.

### Example

```python
stream = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
```
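If you also need the complete text once streaming finishes, collect the deltas as they arrive. A small variant of the loop above:

```python
# Print each delta as it arrives and keep it for the final text.
parts = []
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="", flush=True)
        parts.append(delta)

full_text = "".join(parts)
```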
Error Handling
try:
response = client.chat.completions.create(
model="llama-3.3-70b",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum entanglement."}
]
)
except openai.APIError as e:
print(f"API Error: {e}")
except openai.RateLimitError as e:
print(f"Rate Limit Error: {e}")
except openai.APIConnectionError as e:
print(f"Connection Error: {e}")
Best Practices
-
Prompt Engineering
- Clear, specific prompts
- Proper system messages
- Context management
-
Performance
- Use streaming for long responses
- Implement retry logic
- Handle rate limits
-
Cost Management
- Monitor token usage
- Set appropriate max_tokens
- Use caching when possible
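A minimal retry sketch with exponential backoff for rate limits; the helper name, attempt count, and delays are illustrative, not part of the API:

```python
import time

import openai

def create_with_retry(client, max_retries=3, **kwargs):
    """Hypothetical helper: retry on rate limits with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except openai.RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...

response = create_with_retry(
    client,
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello"}],
)
```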
## Next Steps

- Learn about the Embeddings API
- Check out example applications
- Explore available models