Smart routing methodology

Smart routing is the load-bearing claim of Router One. This page documents the signals, weights, and fallback rules — including the constraints that keep routing predictable.

Last updated: 2026-05-15

Routing signals

Latency (EWMA): Exponentially weighted moving average over the last 50 requests per provider, computed independently for each model family. The window is short enough to track real-world degradation, long enough not to overreact to a single slow request.
Error rate: 5xx and gateway timeout rate computed over the same rolling window. Crossing a configured threshold causes the provider to be temporarily down-ranked.
Cost: Token-level cost of each candidate provider for the request's model family. Used as a weight, not a hard rank: a faster or higher-quality provider can still win at a higher price when the project's weights favor latency or quality.
Customer weights: Each project can set its own (latency, cost, quality) weights. Default profile favors latency in production projects and cost in development projects, but every project can override.
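The latency and error-rate signals can be sketched in Python as below. This is a minimal illustration, not Router One's implementation. The smoothing factor `ALPHA = 2 / (50 + 1)` (the common span-based reading of "EWMA over the last 50 requests"), the 5% `ERROR_THRESHOLD`, and the names `ProviderStats` and `record` are all assumptions. Latency uses an EWMA; error rate is a plain fraction over a rolling window of the last 50 outcomes. Both are keyed by (provider, model family).

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional, Tuple

WINDOW = 50                     # "last 50 requests" per the methodology
ALPHA = 2 / (WINDOW + 1)        # assumption: span-style smoothing factor
ERROR_THRESHOLD = 0.05          # hypothetical down-rank threshold (5%)


@dataclass
class ProviderStats:
    """Routing signals for one (provider, model family) pair."""
    latency_ewma_ms: Optional[float] = None
    outcomes: Deque[bool] = field(default_factory=lambda: deque(maxlen=WINDOW))

    def observe(self, latency_ms: float, is_error: bool) -> None:
        # Latency: exponentially weighted moving average.
        if self.latency_ewma_ms is None:
            self.latency_ewma_ms = latency_ms  # seed with the first sample
        else:
            self.latency_ewma_ms += ALPHA * (latency_ms - self.latency_ewma_ms)
        # Error rate: the deque keeps only the last WINDOW outcomes.
        self.outcomes.append(is_error)

    @property
    def error_rate(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    @property
    def down_ranked(self) -> bool:
        # Crossing the threshold temporarily down-ranks the provider.
        return self.error_rate > ERROR_THRESHOLD


# Signals are tracked independently per provider *and* per model family.
stats: Dict[Tuple[str, str], ProviderStats] = {}


def record(provider: str, family: str, latency_ms: float, is_error: bool) -> ProviderStats:
    s = stats.setdefault((provider, family), ProviderStats())
    s.observe(latency_ms, is_error)
    return s
```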
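Treating cost as one weighted term rather than a hard rank might look like the sketch below. The min-max normalization, the 0..1 `quality` score, and the default weight values are hypothetical. The text states only that production defaults favor latency and development defaults favor cost.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    name: str
    latency_ms: float      # e.g. the EWMA latency signal
    cost_per_mtok: float   # token-level cost for the request's model family
    quality: float         # hypothetical 0..1 quality score


@dataclass
class Weights:
    latency: float
    cost: float
    quality: float


# Hypothetical values: latency-first in production, cost-first in development.
DEFAULT_WEIGHTS = {
    "production": Weights(latency=0.6, cost=0.2, quality=0.2),
    "development": Weights(latency=0.2, cost=0.6, quality=0.2),
}


def _norm_lower_is_better(values: List[float]) -> List[float]:
    # Map the best (lowest) value to 1.0 and the worst to 0.0.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [(hi - v) / (hi - lo) for v in values]


def score(candidates: List[Candidate], w: Weights) -> Dict[str, float]:
    lat = _norm_lower_is_better([c.latency_ms for c in candidates])
    cost = _norm_lower_is_better([c.cost_per_mtok for c in candidates])
    # Cost only contributes its weighted share; it never vetoes a candidate.
    return {
        c.name: w.latency * l + w.cost * k + w.quality * c.quality
        for c, l, k in zip(candidates, lat, cost)
    }


def pick(candidates: List[Candidate], w: Weights) -> str:
    scores = score(candidates, w)
    return max(scores, key=scores.get)


fast = Candidate("fast", latency_ms=150, cost_per_mtok=3.0, quality=0.9)
cheap = Candidate("cheap", latency_ms=400, cost_per_mtok=1.0, quality=0.9)
print(pick([fast, cheap], DEFAULT_WEIGHTS["production"]))   # fast
print(pick([fast, cheap], DEFAULT_WEIGHTS["development"]))  # cheap
```

The same two candidates and prices produce different winners. Only the project's weight profile changes.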

Fallback behavior

Trigger: A request falls back on a 5xx response, a network error, or a response time that exceeds the per-model timeout budget.
Same family only: Fallback only swaps between providers serving the same model family (GPT → GPT, Claude → Claude, Gemini → Gemini). Router One never silently downgrades a request to a different model.
Fallback latency: Typical end-to-end fallback adds under 200 ms beyond the failed attempt. The fallback chain is recorded in the per-request trace, so customers see exactly what happened.
Bounded retries: Fallback is capped at one provider switch per request by default. Higher caps are available via per-project configuration.
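These fallback rules can be sketched as a small loop. The sketch makes several assumptions. Each hypothetical `Provider.call` raises `UpstreamError` itself on a 5xx, a network error, or when the per-model timeout budget is exceeded. `ranked` is already ordered by routing score. All names are illustrative rather than Router One's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


class UpstreamError(Exception):
    """Assumed to be raised by a provider call on a 5xx, network error, or timeout."""

    def __init__(self, code: str, latency_ms: float) -> None:
        super().__init__(code)
        self.code = code
        self.latency_ms = latency_ms


@dataclass
class Provider:
    name: str
    family: str                   # "gpt", "claude", "gemini", ...
    call: Callable[[str], str]    # hypothetical client: prompt -> completion


@dataclass
class Attempt:
    provider: str
    error_code: Optional[str] = None
    latency_ms: Optional[float] = None


class FallbackExhausted(Exception):
    def __init__(self, chain: List[Attempt]) -> None:
        super().__init__(f"all {len(chain)} attempts failed")
        self.chain = chain


def route_with_fallback(
    prompt: str, family: str, ranked: Sequence[Provider], max_switches: int = 1
) -> Tuple[str, List[Attempt]]:
    # Same family only: providers serving another family are never candidates.
    candidates = [p for p in ranked if p.family == family]
    chain: List[Attempt] = []
    # First attempt plus at most `max_switches` fallbacks (default: one switch).
    for provider in candidates[: max_switches + 1]:
        try:
            result = provider.call(prompt)
        except UpstreamError as err:
            # Record the failed hop for the per-request trace, then move on.
            chain.append(Attempt(provider.name, err.code, err.latency_ms))
            continue
        chain.append(Attempt(provider.name))
        return result, chain
    raise FallbackExhausted(chain)
```

Passing `max_switches=2` corresponds to a project configured with a higher cap.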

What appears in your trace

Provider used: The upstream provider name, model variant, and routing decision for each request.
Fallback chain: If fallback occurred, both the failed provider and the successful one — plus the error code and latency for the failed attempt.
Token counts and cost: Input, output, and cached-input token counts, plus the computed cost at the rate that applied at request time.
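A trace entry carrying these fields could be modeled as below. The field names, the per-million-token pricing unit, and the assumption that `input_tokens` excludes `cached_input_tokens` are hypothetical; providers differ on that last point. The rates are stored on the record itself, so the cost reflects the price that applied at request time.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Rates:
    """Hypothetical per-million-token prices in effect when the request ran."""
    input_per_mtok: float
    output_per_mtok: float
    cached_input_per_mtok: float


@dataclass
class FailedAttempt:
    provider: str
    error_code: str
    latency_ms: float


@dataclass
class TraceRecord:
    provider: str
    model_variant: str
    routing_decision: str
    input_tokens: int          # assumed to exclude cached input tokens
    output_tokens: int
    cached_input_tokens: int
    rates: Rates               # snapshot, not a lookup of today's price
    fallback_chain: List[FailedAttempt] = field(default_factory=list)

    @property
    def cost(self) -> float:
        r = self.rates
        return (
            self.input_tokens * r.input_per_mtok
            + self.output_tokens * r.output_per_mtok
            + self.cached_input_tokens * r.cached_input_per_mtok
        ) / 1_000_000
```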

Per-project configuration

Weights: Set the relative importance of latency, cost, and quality. Defaults favor latency in production projects and cost in development projects; explicit overrides are honored.
Disable fallback: Enterprise contracts can disable fallback for projects that need a single provider for compliance or evaluation reasons.
Allowed providers: Customers can restrict a project to a subset of upstream providers — useful for data residency or vendor governance.
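The three per-project settings could be represented and enforced as in this sketch. `ProjectConfig`, its `plan` field, and the default weight tuple are hypothetical. The check in `__post_init__` mirrors the rule that only enterprise contracts may disable fallback, and a `max_switches` of 0 means the request never leaves its first provider.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ProjectConfig:
    plan: str = "payg"                                      # "payg" or "enterprise" (hypothetical)
    weights: Tuple[float, float, float] = (0.6, 0.2, 0.2)   # (latency, cost, quality), hypothetical
    fallback_enabled: bool = True
    allowed_providers: Optional[FrozenSet[str]] = None      # None means no restriction

    def __post_init__(self) -> None:
        # Pay-as-you-go projects always use the default fallback configuration.
        if not self.fallback_enabled and self.plan != "enterprise":
            raise ValueError("disabling fallback requires an enterprise contract")


def eligible(config: ProjectConfig, providers: Iterable[str]) -> List[str]:
    # Restrict routing to the allowed subset (data residency, vendor governance).
    if config.allowed_providers is None:
        return list(providers)
    return [p for p in providers if p in config.allowed_providers]


def max_switches(config: ProjectConfig, default: int = 1) -> int:
    # With fallback disabled, a request stays on a single provider.
    return default if config.fallback_enabled else 0
```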

FAQ

Can I disable fallback for a specific project?: Yes. Enterprise contracts can configure fallback off for projects that require single-provider behavior. Pay-as-you-go projects use the default fallback configuration.
Is Router One swapping models silently?: No. Fallback is constrained to the same model family (GPT → GPT, Claude → Claude). The exact model variant used is recorded in the per-request trace.
How quickly do routing decisions adapt to new conditions?: EWMA over the last 50 requests per provider per model family means routing reacts within seconds to a degradation, but is not whip-lashed by a single anomalous slow request.
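A worked example of this answer follows. It assumes the span-based smoothing factor `2 / (50 + 1)`; the actual factor is not stated. A single 2,000 ms outlier after a steady 200 ms baseline lifts the average only to about 271 ms. A sustained shift to 1,000 ms pushes it past 600 ms after 18 requests. How many seconds that takes depends on the project's request rate.

```python
from typing import List

WINDOW = 50
ALPHA = 2 / (WINDOW + 1)   # assumption: span-style smoothing factor


def ewma(samples: List[float], alpha: float = ALPHA) -> float:
    avg = samples[0]
    for x in samples[1:]:
        avg += alpha * (x - avg)
    return avg


def requests_until(threshold: float, start: float = 200.0, new_latency: float = 1000.0) -> int:
    # Count requests at the degraded latency until the EWMA reaches the threshold.
    avg, n = start, 0
    while avg < threshold:
        avg += ALPHA * (new_latency - avg)
        n += 1
    return n


spike = ewma([200.0] * 50 + [2000.0])   # one anomalous slow request
print(round(spike, 1))                  # 270.6
print(requests_until(600.0))            # 18
```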
