Overview
Hober Auto-Routing (HAR) intelligently routes inference requests across providers based on cost, latency, and availability. Instead of hard-coding a single provider, HAR automatically selects the best endpoint for each request.
How It Works
Request arrives with a model slug (e.g. "deepseek/deepseek-chat-v3.2").
HAR checks provider health, latency scores, and current load.
The optimal provider endpoint is selected based on your routing preferences.
If the selected provider fails, HAR automatically retries with the next best option.
Routing Modes
| Mode | Description | Best For |
|---|---|---|
| auto | Balanced selection based on task classification (default) | General-purpose workloads |
| quality | Prioritize response quality — selects best available models | Complex reasoning, production outputs |
| cost | Prioritize lowest cost provider | High-volume, cost-sensitive workloads |
| fast | Prioritize lowest latency endpoint | Real-time and interactive applications |
| off | Disable auto-routing, use specified model directly | Pinning to a specific model |
Usage
Set routing strategy via the provider.sort field or model suffixes (:nitro for latency, :floor for price). See Hober Features for full provider preference options.
Zero config required. HAR is enabled by default for all requests. Your requests automatically benefit from failover and health-aware routing without any changes to your code.