Overview -- Hober Docs

Auto-Routing

Overview

Hober Auto-Routing (HAR) intelligently routes inference requests across providers based on cost, latency, and availability. Instead of hard-coding a single provider, HAR automatically selects the best endpoint for each request.

How It Works

Request arrives with a model slug (e.g. "deepseek/deepseek-chat-v3.2").

HAR checks provider health, latency scores, and current load.

The optimal provider endpoint is selected based on your routing preferences.

If the selected provider fails, HAR automatically retries with the next best option.

Routing Modes

Mode	Description	Best For
auto	Balanced selection based on task classification (default)	General-purpose workloads
quality	Prioritize response quality — selects best available models	Complex reasoning, production outputs
cost	Prioritize lowest cost provider	High-volume, cost-sensitive workloads
fast	Prioritize lowest latency endpoint	Real-time and interactive applications
off	Disable auto-routing, use specified model directly	Pinning to a specific model

Usage

Set routing strategy via the provider.sort field or model suffixes (:nitro for latency, :floor for price). See Hober Features for full provider preference options.

Zero config required. HAR is enabled by default for all requests. Your requests automatically benefit from failover and health-aware routing without any changes to your code.