Auto-Routing
HAR Commands
HAR commands let you control routing behavior inline with your API requests, using a command in the model field, model slug suffixes, the provider object, or a models fallback array.
HAR Command Syntax
Send a HAR command as the model field value to control routing mode per-request. No other changes to your request are needed.
| Command | Effect |
|---|---|
| HAR:Auto | Switch to balanced auto-routing (default) |
| HAR:Quality | Switch to quality-first routing |
| HAR:Cost | Switch to cost-first routing |
| HAR:Fast | Switch to latency-first routing |
| HAR:Off | Disable HAR — use model field as a literal model ID |
| HAR:? | Query current HAR mode (returns mode in response) |
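As a minimal sketch of the table above, a client could validate a mode name before placing it in the model field. The helper name `har_command_body` is hypothetical, and treating mode names as case-sensitive is an assumption; only the `HAR:` prefix and the six modes come from the table.

```python
import json

# Modes copied from the command table; "HAR:" is the command prefix.
HAR_MODES = {"Auto", "Quality", "Cost", "Fast", "Off", "?"}


def har_command_body(mode, messages):
    """Build a request body whose model field carries a HAR command.

    Hypothetical helper: it only assembles the JSON body and does not
    send anything over the network.
    """
    if mode not in HAR_MODES:
        raise ValueError(f"unknown HAR mode: {mode!r}")
    return {"model": f"HAR:{mode}", "messages": messages}


body = har_command_body("Cost", [{"role": "user", "content": "Hello"}])
print(json.dumps(body, indent=2))
```

Rejecting unknown modes locally avoids sending a string that the router might otherwise interpret as a literal model ID.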
Example
{
"model": "HAR:Cost",
"messages": [{"role": "user", "content": "Hello"}]
}
Model Slug Suffixes
Append a suffix to any model slug to apply a routing hint:
| Suffix | Syntax | Effect |
|---|---|---|
| :auto | model/slug:auto | Balanced routing — 40% health, 30% reputation, 20% cost, 10% latency (default) |
| :quality | model/slug:quality | Highest-reputation provider — 60% reputation, 30% health, 10% cost |
| :cost | model/slug:cost | Cheapest provider — 70% cost, 20% health, 10% reputation |
| :fast | model/slug:fast | Lowest-latency provider — 60% latency, 30% health, 10% cost |
| :nitro | model/slug:nitro | Alias for :fast — prefer lowest-latency endpoint |
| :floor | model/slug:floor | Cheapest model in same capability tier (e.g., gpt-4o:floor → deepseek-chat) |
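The percentages in the table can be read as weights in a weighted sum. The sketch below illustrates that reading only and is not the router's actual implementation. It assumes every metric is already normalized to the range 0 to 1 with higher meaning better (so a cheaper provider has a higher `cost` score), and the provider names and metric values are made up. `:floor` is left out because it swaps the model rather than reweighting providers.

```python
# Weights copied from the suffix table above.
WEIGHTS = {
    "auto":    {"health": 0.4, "reputation": 0.3, "cost": 0.2, "latency": 0.1},
    "quality": {"reputation": 0.6, "health": 0.3, "cost": 0.1},
    "cost":    {"cost": 0.7, "health": 0.2, "reputation": 0.1},
    "fast":    {"latency": 0.6, "health": 0.3, "cost": 0.1},
}
ALIASES = {"nitro": "fast"}  # the table lists :nitro as an alias for :fast


def split_slug(slug):
    """Split 'model/slug:suffix' into (slug, mode); no suffix means 'auto'."""
    base, sep, suffix = slug.rpartition(":")
    if not sep:
        return slug, "auto"
    return base, ALIASES.get(suffix, suffix)


def score(metrics, mode):
    """Weighted sum of normalized metrics (assumed 0..1, higher is better)."""
    return sum(weight * metrics[name] for name, weight in WEIGHTS[mode].items())


def pick_provider(providers, slug):
    """Return the name of the provider with the highest score for the slug's mode."""
    _, mode = split_slug(slug)
    if mode not in WEIGHTS:
        raise ValueError(f"suffix {mode!r} is not a weighting mode")
    return max(providers, key=lambda name: score(providers[name], mode))


# Made-up providers and metric values, for illustration only.
providers = {
    "provider-a": {"health": 0.9, "reputation": 0.9, "cost": 0.3, "latency": 0.5},
    "provider-b": {"health": 0.8, "reputation": 0.6, "cost": 0.9, "latency": 0.4},
}

print(pick_provider(providers, "model/slug"))       # no suffix, balanced weights
print(pick_provider(providers, "model/slug:cost"))  # cost-first weights
```

With these sample numbers the balanced weights favor `provider-a`, while `:cost` favors the cheaper `provider-b`, which shows how the same providers can rank differently under each suffix.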
Provider Object Commands
For finer control, use the provider field in the request body:
{
"model": "deepseek/deepseek-chat-v3.2",
"messages": [...],
"provider": {
"sort": "latency", // "price" or "latency"
"only": ["deepseek"], // restrict to specific providers
"ignore": ["glm"] // exclude specific providers
}
}
Fallback Chains
Use the models array to define an explicit fallback order. HAR tries each model in sequence until one succeeds:
{
"models": [
"deepseek/deepseek-chat-v3.2",
"qwen/qwen3-235b",
"glm/glm-4-plus"
],
"messages": [{"role": "user", "content": "Hello"}]
}
Suffixes and provider objects can be combined. When both are present, the provider object takes precedence for conflicting settings.
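One way to picture the precedence rule is as a dictionary merge in which the provider object is applied last. This is a sketch under stated assumptions: the mapping from suffixes to `sort` values (`SUFFIX_TO_SORT`) and the function name are hypothetical, since the text does not spell out which suffix settings conflict with which provider fields.

```python
# Hypothetical mapping from slug suffixes to provider.sort values.
SUFFIX_TO_SORT = {"cost": "price", "fast": "latency", "nitro": "latency"}


def effective_provider_settings(model, provider=None):
    """Merge the suffix hint with the provider object; the provider object wins."""
    _, sep, suffix = model.rpartition(":")
    settings = {}
    if sep and suffix in SUFFIX_TO_SORT:
        settings["sort"] = SUFFIX_TO_SORT[suffix]
    # Applied last, so any key it shares with the suffix hint overrides it.
    settings.update(provider or {})
    return settings


# The suffix asks for cost-first routing, but provider.sort conflicts and wins.
merged = effective_provider_settings(
    "deepseek/deepseek-chat-v3.2:cost",
    {"sort": "latency", "only": ["deepseek"]},
)
print(merged)
```

Settings that do not conflict, such as `only` or `ignore`, simply pass through alongside the suffix hint.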