Automatic fallback when a model provider fails
When a single upstream provider returns a 5xx or times out, your app shouldn't go dark. Router One is an OpenAI-compatible LLM API gateway that fails over to the next healthy provider in the same model family and records the route and fallback decision in a per-request trace, so you can see exactly what happened. One endpoint for 25+ models, reachable globally and from Mainland China.
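Because the gateway is described as OpenAI-compatible, a request should look like an ordinary chat completions call aimed at a different base URL. The sketch below only builds the request. The base URL, API key, and model id are placeholders, not real Router One values, and the `/chat/completions` path is assumed from the OpenAI convention.

```javascript
// Placeholders: substitute the base URL and key from your own account.
const BASE_URL = "https://router-one.example/v1"; // hypothetical, not a real endpoint
const API_KEY = "YOUR_API_KEY";

// Build an OpenAI-style chat completions request without sending it.
function buildChatRequest(baseUrl, apiKey, model, userText) {
  return {
    // Strip trailing slashes so the path joins cleanly.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: userText }],
      }),
    },
  };
}

const { url, init } = buildChatRequest(BASE_URL, API_KEY, "your-model-id", "Hello");
// To send it: const res = await fetch(url, init);
```

Only the base URL differs from a direct provider call, which is why the failover stays invisible to application code.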
Three things fallback gives you
Rate-limit survival (429 / 529)
When a route is overloaded and returns 429 or 529, smart routing retries and fails over to a healthy same-family route when one is available. The decision is recorded in the trace — you see the retry, not a dead request, whenever a healthy route exists.
Provider-outage failover (5xx / timeout)
Provider 5xx errors and timeouts fail over to the next healthy provider for the same model. This is the core path that ships today: one request, multiple candidates, automatic retry on the next healthy route.
Visibility (the trace shows the path)
Every request gets a trace with model, provider, tokens, cost, latency, status, and the route/fallback path taken. When a fallback fires, you can see which route served the final response and why the first one was skipped.
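To make the trace concrete, here is a rough sketch of what a record with a fallback path might contain and how you could summarize it. The field names (`attempts`, `latencyMs`, and so on) and the sample values are hypothetical and chosen for illustration; they are not Router One's actual trace schema.

```javascript
// Hypothetical trace shape; field names and values are illustrative only.
const exampleTrace = {
  model: "example-model",
  status: 200,
  attempts: [
    { provider: "provider-a", status: 503, latencyMs: 620 },
    { provider: "provider-b", status: 200, latencyMs: 1220 },
  ],
};

// Summarize which route served the response and which attempts failed first.
function describePath(trace) {
  const isOk = (a) => a.status >= 200 && a.status < 300;
  const served = trace.attempts.find(isOk);
  const failed = trace.attempts.filter((a) => !isOk(a));
  return {
    servedBy: served ? served.provider : null,
    fallbackFired: trace.attempts.length > 1,
    skipped: failed.map((a) => `${a.provider} (${a.status})`),
  };
}

console.log(describePath(exampleTrace));
```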
What triggers a failover
Fallback is driven by the health of the route the request was sent to. Two conditions cause Router One to move on to the next healthy candidate in the same model family:
Provider 5xx or timeout
A 500-class error or a connection/read timeout marks the route unhealthy for that request and the call is retried on the next healthy provider. This ships today.
Overload / rate limit (429, 529)
When latency or error rates spike — including 429 and 529 overload responses — smart routing retries and fails over to a healthy same-family route as long as one is available. The trace shows the decision.
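The two conditions can be expressed as a small classifier. This is a minimal sketch, assuming each attempt's outcome is reported as an object with either a `status` code or a `timedOut` flag; that shape is a convention invented for this example, and statuses outside the listed triggers are simply treated as non-retryable here.

```javascript
// Decide whether an attempt outcome should trigger a failover.
// The { status, timedOut } shape is hypothetical.
function isRetryable(outcome) {
  if (outcome.timedOut) return true;        // connection or read timeout
  const s = outcome.status;
  if (s === 429 || s === 529) return true;  // rate limit / overload
  return s >= 500 && s <= 599;              // any other provider 5xx
}

console.log(isRetryable({ status: 503 }));    // true
console.log(isRetryable({ status: 429 }));    // true
console.log(isRetryable({ timedOut: true })); // true
console.log(isRetryable({ status: 400 }));    // false
```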
The retry concept
Conceptually, a single call walks an ordered list of healthy candidates for the model family and returns the first success. You send one request; Router One handles the retry loop.
    // One request -> the first healthy candidate that succeeds.
    for (const route of healthyRoutes(modelFamily)) {
      try {
        return await call(route, request);
      } catch (e) {
        if (isRetryable(e)) continue; // 5xx, timeout, 429, 529
        throw e;
      }
    }
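The loop above is conceptual and leaves `healthyRoutes`, `call`, and `isRetryable` undefined. The version below is one way to make it runnable with stub routes, and it also covers the case where every candidate fails. `RouteError`, `callWithFailover`, and the stub providers are hypothetical names for this sketch, not part of Router One.

```javascript
// Hypothetical error type carrying an HTTP status or a timeout flag.
class RouteError extends Error {
  constructor(message, { status, timedOut = false } = {}) {
    super(message);
    this.status = status;
    this.timedOut = timedOut;
  }
}

// 5xx (including 529), 429, and timeouts are retryable; everything else is not.
function isRetryable(e) {
  return e instanceof RouteError &&
    (e.timedOut || e.status === 429 || (e.status >= 500 && e.status <= 599));
}

// Walk the candidates in order; return the first success plus the path taken.
async function callWithFailover(routes, request) {
  const path = [];
  let lastError = new Error("no healthy candidates");
  for (const route of routes) {
    try {
      const response = await route.call(request);
      path.push({ route: route.name, ok: true });
      return { response, path };
    } catch (e) {
      path.push({ route: route.name, ok: false, status: e.status });
      if (!isRetryable(e)) throw e; // surface non-retryable errors immediately
      lastError = e;
    }
  }
  throw lastError; // every candidate failed with a retryable error
}

// Stub routes standing in for real providers.
const routes = [
  { name: "provider-a", call: async () => { throw new RouteError("unavailable", { status: 503 }); } },
  { name: "provider-b", call: async (req) => ({ echoed: req.prompt }) },
];

callWithFailover(routes, { prompt: "hi" }).then(({ response, path }) => {
  console.log(response, path.map((p) => p.route));
});
```

Recording `path` alongside the response mirrors the idea of a per-request trace: the caller gets one result, and the attempts that led to it stay inspectable.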
Reliability that single-provider SDKs leave to you
| Capability | Router One | Single-provider SDK |
|---|---|---|
| Failover on provider 5xx / timeout | Automatic retry on next healthy provider | Request fails; you build retry yourself |
| 429 / 529 overload handling | Retry/failover to a healthy same-family route when available | Returns the error to your app |
| Per-request route trace | Shows route + fallback path, status, cost, latency | None |
| Reachable from Mainland China | China-friendly routing with published latency benchmark | Often blocked or inconsistent |
| One endpoint, many models | 25+ supported models behind one OpenAI-compatible URL | One provider, one API surface |
Reliable globally and from Mainland China
Reliability isn't only about provider errors; reachability matters too. Router One's endpoints are reachable globally and from Mainland China networks. In the China latency benchmark last updated 2026-05-15, Router One measured a p50 latency of 110-130 ms across Beijing, Shanghai, and Shenzhen; individual networks may vary.
View the China latency benchmark ->
FAQ
What triggers fallback?
A provider 5xx error or a timeout on the route a request was sent to triggers an automatic retry on the next healthy provider for the same model. Overload responses (429, 529) and spikes in latency or error rate also cause smart routing to retry and fail over to a healthy same-family route when one is available.
Does fallback cross model families?
No. Fallback moves between healthy routes within the same model family so the response stays consistent with what you requested. It does not silently swap GPT for Claude or Gemini. If you want cross-family selection, that's the job of model="auto" smart routing, which is a separate, explicit choice.
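The same-family rule amounts to filtering candidates before any retry happens. A minimal sketch, assuming a route table where each entry carries a `family` and a `healthy` flag; the table, family names, and provider names are placeholders.

```javascript
// Hypothetical route table; families and providers are placeholders.
const routeTable = [
  { family: "family-x", provider: "provider-a", healthy: false },
  { family: "family-x", provider: "provider-b", healthy: true },
  { family: "family-y", provider: "provider-c", healthy: true },
];

// Candidates are limited to healthy routes in the requested family.
// A healthy route in another family is never returned as a substitute.
function healthyRoutes(family, table) {
  return table.filter((r) => r.family === family && r.healthy);
}

console.log(healthyRoutes("family-x", routeTable).map((r) => r.provider));
```

If the filtered list comes back empty, the request has nowhere to fail over to, even when routes in other families are healthy.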
How do I see which route served my request?
Every request produces a trace in the real-time dashboard with the model, provider, tokens, cost, latency, status, and the route/fallback path. When a fallback fires, the trace shows which route served the final response.
Does fallback add latency?
A retry adds the time spent on the failed attempt before the healthy route responds, so a fallback request is slower than a clean first-try success. The per-request trace reports the latency of the path that was actually taken so you can measure the real cost.
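As a rough illustration of that cost, end-to-end latency is approximately the sum of the attempts, and the fallback overhead is everything spent before the final one. The numbers and the `latencyMs` field below are made up for the example, and the sketch ignores any routing overhead between attempts.

```javascript
// Hypothetical attempt timings in milliseconds.
const attempts = [
  { provider: "provider-a", ok: false, latencyMs: 800 },  // failed attempt
  { provider: "provider-b", ok: true, latencyMs: 1200 },  // served the response
];

// Total time the caller waited across all attempts.
function totalLatencyMs(list) {
  return list.reduce((sum, a) => sum + a.latencyMs, 0);
}

// Time spent on attempts before the one that finally answered.
function fallbackOverheadMs(list) {
  return totalLatencyMs(list.slice(0, -1));
}

console.log(totalLatencyMs(attempts));     // 2000
console.log(fallbackOverheadMs(attempts)); // 800
```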
What about rate limits?
When a route returns 429 or 529, smart routing retries and fails over to a healthy same-family route as long as one is available; the trace records that decision. There is no promise of zero downtime — if every same-family route is simultaneously overloaded, the request can still surface an error, and the trace shows why.
Related
Add automatic fallback with one base URL change