Chat API Reference

Complete reference for the BrilliantAI Chat Completions API.

Create Chat Completion

Endpoint

POST https://api.brilliantai.co/chat/completions

Request Parameters

Parameter          Type     Required  Description
model              string   Yes       ID of the model to use
messages           array    Yes       Array of messages in the conversation
temperature        number   No        Sampling temperature (0-2)
max_tokens         integer  No        Maximum number of tokens to generate
stream             boolean  No        Stream responses back incrementally
top_p              number   No        Nucleus sampling parameter
frequency_penalty  number   No        Frequency penalty (-2 to 2)
presence_penalty   number   No        Presence penalty (-2 to 2)

Example Request

from openai import OpenAI

client = OpenAI(
    base_url="https://api.brilliantai.co",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is quantum computing?"}
    ],
    temperature=0.7,
    max_tokens=500
)
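
If you prefer not to use the OpenAI SDK, the same request can be made over plain HTTP. The sketch below uses the requests library and assumes the endpoint accepts a standard Authorization: Bearer header, as OpenAI-compatible APIs typically do; verify the exact auth scheme for your account.

import requests

resp = requests.post(
    "https://api.brilliantai.co/chat/completions",
    headers={"Authorization": "Bearer your-api-key"},  # Bearer auth is an assumption here
    json={
        "model": "llama-3.3-70b",
        "messages": [{"role": "user", "content": "What is quantum computing?"}],
        "temperature": 0.7,
        "max_tokens": 500,
    },
)
resp.raise_for_status()  # surface HTTP errors as exceptions
print(resp.json()["choices"][0]["message"]["content"])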

Response Format

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "llama-3.3-70b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing is..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 32,
    "total_tokens": 57
  }
}
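
With the Python client shown above, these fields are exposed as attributes on the response object rather than as raw JSON:

print(response.choices[0].message.content)  # the assistant's reply
print(response.usage.total_tokens)          # tokens consumed by this call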

Streaming Responses

Set stream=True to receive the completion incrementally as it is generated, instead of waiting for the full response.

Example

stream = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

Error Handling

import openai
from openai import OpenAI

client = OpenAI(
    base_url="https://api.brilliantai.co",
    api_key="your-api-key"
)

try:
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum entanglement."}
        ]
    )
except openai.RateLimitError as e:
    # Catch the more specific subclasses before the APIError base class,
    # otherwise the first handler would swallow every error.
    print(f"Rate Limit Error: {e}")
except openai.APIConnectionError as e:
    print(f"Connection Error: {e}")
except openai.APIError as e:
    print(f"API Error: {e}")

Best Practices

  1. Prompt Engineering

    • Write clear, specific prompts
    • Set behavior and tone with a well-crafted system message
    • Manage context: send only the conversation history the model needs
  2. Performance

    • Use streaming for long responses
    • Implement retry logic with backoff (see the sketch below)
    • Handle rate limits gracefully
  3. Cost Management

    • Monitor token usage via the usage field in responses
    • Set an appropriate max_tokens for each request
    • Cache responses to repeated prompts when possible
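
A minimal retry sketch for the transient failures covered above; the helper name create_with_retry and the backoff constants are illustrative, not part of the API:

import random
import time

import openai
from openai import OpenAI

client = OpenAI(
    base_url="https://api.brilliantai.co",
    api_key="your-api-key"
)

def create_with_retry(max_retries=5, **kwargs):
    """Call chat.completions.create, retrying transient errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except (openai.RateLimitError, openai.APIConnectionError):
            if attempt == max_retries - 1:
                raise
            # Back off 1s, 2s, 4s, ... plus up to 1s of jitter
            time.sleep(2 ** attempt + random.random())

response = create_with_retry(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello!"}]
)

Note that the OpenAI client also retries some failures on its own (via its max_retries setting), so an explicit loop like this is mainly useful when you want control over the backoff policy.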

Next Steps