Router One

Smart LLM routing across latency, cost, and quality

Hardcoding one model means you inherit its bad days: latency spikes, rate limits, regional incidents. Smart routing treats the model fleet as candidates and picks per request, using live signals instead of static config. Router One routes across 25+ models behind one OpenAI-compatible endpoint — and records every decision in the request trace, so routing never becomes a black box.

The signals behind every decision

EWMA latency

An exponentially weighted moving average over the last 50 requests per provider, computed independently for each model family. Recent requests weigh more, so routing reacts to real degradation within seconds without overreacting to one slow response.
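As a rough illustration of how such an average behaves, here is a minimal Python sketch. The class name `EwmaLatency` and the smoothing factor `2 / (50 + 1)` are assumptions for this example (a common convention for mapping a 50-request span to an EWMA weight), not Router One's documented implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EwmaLatency:
    """Exponentially weighted moving average of latency in milliseconds.

    Hypothetical helper: one instance would be kept per provider and
    model family.
    """

    # Assumed smoothing factor approximating a 50-request span.
    alpha: float = 2 / (50 + 1)
    value_ms: Optional[float] = None

    def update(self, sample_ms: float) -> float:
        """Fold one observed latency into the average and return it."""
        if self.value_ms is None:
            # The first sample seeds the average.
            self.value_ms = sample_ms
        else:
            # Recent samples get weight alpha; history decays by (1 - alpha).
            self.value_ms = (
                self.alpha * sample_ms + (1 - self.alpha) * self.value_ms
            )
        return self.value_ms
```

With this weighting, a single 2000 ms response after a steady run of 200 ms responses moves the average only part of the way toward the outlier, while a sustained slowdown keeps pulling it upward with each new sample.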

Error rate

5xx and gateway-timeout rate over the same rolling window. A provider that crosses the configured threshold is temporarily down-ranked until it recovers.
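A rolling error rate with a down-rank threshold could be sketched as follows. The class `ErrorRateWindow`, the 50-request window size, and the 0.2 threshold are illustrative assumptions; the source does not state the actual threshold value.

```python
from collections import deque
from typing import Optional


class ErrorRateWindow:
    """Rolling failure rate over the most recent requests (hypothetical)."""

    def __init__(self, size: int = 50, threshold: float = 0.2) -> None:
        # Oldest outcomes fall out automatically once the window is full.
        self.outcomes = deque(maxlen=size)
        # Assumed threshold; a real deployment would configure this.
        self.threshold = threshold

    def record(self, status: Optional[int]) -> None:
        """Record one outcome; status=None stands for a timeout."""
        failed = status is None or 500 <= status <= 599
        self.outcomes.append(failed)

    @property
    def rate(self) -> float:
        """Fraction of failed requests in the window, 0.0 when empty."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def is_down_ranked(self) -> bool:
        """True while the failure rate exceeds the threshold."""
        return self.rate > self.threshold
```

Because the window is bounded, a provider recovers on its own in this sketch: once enough successful responses push the old failures out, `is_down_ranked()` returns False again.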

Posted cost

The token-level cost of each candidate for the request's model family. Cost acts as a weight, not a hard rank — a pricier route can still win for latency-weighted projects.

Your weights

Each project sets its own latency/cost/quality balance. Favor latency in production and cost in development, or override per request when one call needs different behavior.

Pin or model="auto"

Pin an exact model and the gateway calls it directly — full determinism. Send model="auto" and the gateway scores candidates with your weights and picks the best one.
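One way to picture weighted candidate selection is the sketch below. The `Candidate` fields, the normalization against the worst candidate, and the `pick_route` function are all assumptions made for illustration; the actual scoring formula is not given in this page.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    """Hypothetical snapshot of one route's live signals."""

    name: str
    latency_ms: float   # e.g. current EWMA latency
    cost_per_1k: float  # posted cost per 1k tokens
    quality: float      # assumed 0..1 quality estimate
    healthy: bool = True


def pick_route(candidates: List[Candidate], weights: Dict[str, float]) -> Candidate:
    """Return the highest-scoring healthy candidate."""
    healthy = [c for c in candidates if c.healthy]
    if not healthy:
        raise ValueError("no healthy candidates")

    # Normalize against the worst value so each sub-score lies in 0..1.
    max_latency = max(c.latency_ms for c in healthy) or 1.0
    max_cost = max(c.cost_per_1k for c in healthy) or 1.0

    def score(c: Candidate) -> float:
        latency_score = 1 - c.latency_ms / max_latency  # lower latency is better
        cost_score = 1 - c.cost_per_1k / max_cost       # lower cost is better
        return (
            weights["latency"] * latency_score
            + weights["cost"] * cost_score
            + weights["quality"] * c.quality
        )

    return max(healthy, key=score)
```

Under this scheme a fast but expensive route wins when latency carries most of the weight, and a slow but cheap one wins when cost does, which matches the idea that cost is a weight rather than a hard rank.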

Same-family fallback

When the chosen route returns a 5xx, times out, or errors at the network level, the request fails over within the same model family — typically adding under 200ms — and the trace records both attempts.
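The failover behavior described here can be sketched as a loop over routes that all belong to the same model family. The names `RouteError`, `call_with_fallback`, and the trace dictionary shape are hypothetical, chosen only to show the control flow.

```python
from typing import Callable, Dict, List, Tuple


class RouteError(Exception):
    """Stands in for a 5xx, a timeout, or a network-level failure."""


def call_with_fallback(
    same_family_routes: List[str],
    send: Callable[[str], str],
) -> Tuple[str, List[Dict[str, str]]]:
    """Try each same-family route in order, recording every attempt.

    The caller is assumed to pass only routes from one model family,
    ordered best-first, so fallback never crosses families.
    """
    trace: List[Dict[str, str]] = []
    for route in same_family_routes:
        try:
            response = send(route)
        except RouteError as exc:
            # Keep the failed attempt in the trace, then try the next route.
            trace.append({"route": route, "status": "failed", "error": str(exc)})
            continue
        trace.append({"route": route, "status": "ok"})
        return response, trace
    raise RouteError(f"all {len(trace)} same-family routes failed")
```

Returning the trace alongside the response mirrors the point that both the failed and the successful attempt remain visible afterwards.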

Predictable by design

Routing only helps if you can trust it. Three constraints keep it predictable: fallback never crosses model families (a GPT request is answered by a GPT-family route, a Claude request by a Claude-family route); every trace shows which route served the request and why, including failed attempts; and enterprise contracts can disable fallback entirely for projects that need single-provider behavior for compliance or evaluation.

Route a request with model="auto"

Send your candidates and weights in the request body. Pin an exact model id any time you need a specific one — routing is opt-in per request.

request.json (sent as POST https://api.router.one/v1/chat/completions)
{
  "model": "auto",
  "messages": [{"role": "user", "content": "Hello"}],
  "router": {
    "candidates": ["claude-3-5-sonnet", "gpt-4o", "deepseek-chat"],
    "weights": {"latency": 0.4, "cost": 0.4, "quality": 0.2}
  }
}
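
For readers calling the endpoint from Python, the request above could be assembled with the standard library as sketched below. The `build_request` helper is hypothetical, and the `Authorization: Bearer` header is an assumption based on the endpoint being OpenAI-compatible; check the API reference for the actual authentication scheme.

```python
import json
import urllib.request

ENDPOINT = "https://api.router.one/v1/chat/completions"


def build_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a model="auto" routing request."""
    body = {
        "model": "auto",
        "messages": [{"role": "user", "content": "Hello"}],
        "router": {
            "candidates": ["claude-3-5-sonnet", "gpt-4o", "deepseek-chat"],
            "weights": {"latency": 0.4, "cost": 0.4, "quality": 0.2},
        },
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme, not confirmed by this page.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. To pin a model instead, the same body would carry an exact model id in `"model"`.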

FAQ

How does smart routing decide which route to use?

Each candidate is scored on EWMA latency over the last 50 requests, posted token cost, and the rolling 5xx/timeout rate, combined using your project's latency/cost/quality weights. The highest-scoring healthy route gets the request, and the trace records the decision.

Can I keep using one exact model?

Yes. Pin the model id and the gateway calls it directly with no candidate selection. Smart routing only engages when you send model="auto" or attach a routing config.

What happens when a provider degrades?

Crossing the error-rate threshold temporarily down-ranks the route, and in-flight failures trigger same-family fallback — typically adding under 200ms. The per-request trace shows the failed attempt, the error code, and the route that completed the request.

Will routing ever swap to a different model family?

No. Fallback is constrained to the same model family — GPT to GPT, Claude to Claude. The exact variant that served the request is recorded in the trace.

Can I disable fallback?

Enterprise contracts can disable fallback for projects that require single-provider behavior. Pay-as-you-go projects use the default fallback configuration.

Where do I see routing decisions?

Every request appears in the dashboard with its full trace: model, route, tokens, cost, latency, status, and the routing or fallback decision. The methodology behind the signals is documented on the routing methodology page.
