Smart routing methodology
Smart routing is the load-bearing claim of Router One. This page documents the signals, weights, and fallback rules — including the constraints that keep routing predictable.
Last updated:
Routing signals
- Latency (EWMA)
- Exponentially weighted moving average over the last 50 requests per provider, computed independently for each model family. The window is short enough to track real-world degradation, long enough not to overreact to a single slow request.
- Error rate
- Rate of 5xx responses and gateway timeouts, computed over the same rolling window. A provider whose error rate crosses a configured threshold is temporarily down-ranked.
- Cost
- Token-level cost of each candidate provider for the request's model family. Cost is one weighted signal, not a hard rank: a faster or higher-quality provider can still win for a latency-weighted project even if it costs more.
- Customer weights
- Each project can set its own (latency, cost, quality) weights. The default profile favors latency in production projects and cost in development projects; any project can override it.
Fallback behavior
- Trigger
- A request hits fallback on: 5xx response, network error, or response time exceeding a per-model timeout budget.
- Same family only
- Fallback only swaps between providers serving the same model family (GPT → GPT, Claude → Claude, Gemini → Gemini). Router One never silently downgrades a request to a different model.
- Fallback latency
- A typical fallback adds under 200 ms of end-to-end latency beyond the failed attempt. The fallback chain is recorded in the per-request trace, so customers see exactly what happened.
- Bounded retries
- Fallback is capped at one provider switch per request by default. Higher caps are available via per-project configuration.
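The fallback rules above can be sketched as a small loop. This is an illustrative sketch, not Router One's implementation: `Outcome`, `Provider`, `route_with_fallback`, the tuple format of the chain, and the provider names are hypothetical, and upstream calls are simulated with plain functions. It encodes the three documented triggers, the same-family constraint, and the default cap of one provider switch. For simplicity the timeout budget is checked after the call returns; a real router would cancel the in-flight request once the budget expires.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Outcome:
    status: int        # HTTP status returned by the upstream
    latency_ms: float  # how long the attempt took
    text: str = ""


@dataclass
class Provider:
    name: str
    family: str                     # "gpt", "claude", "gemini", ...
    call: Callable[[str], Outcome]  # hypothetical upstream client; may raise ConnectionError


ChainEntry = Tuple[str, str, Optional[float]]  # (provider, result, latency_ms)


def route_with_fallback(prompt: str, family: str, ranked: List[Provider],
                        budget_ms: float, max_switches: int = 1) -> Tuple[str, List[ChainEntry]]:
    # Same family only: providers serving another model family are never tried.
    eligible = [p for p in ranked if p.family == family]
    # One initial attempt plus at most `max_switches` provider switches.
    attempts = eligible[: max_switches + 1]
    chain: List[ChainEntry] = []
    for provider in attempts:
        try:
            outcome = provider.call(prompt)
        except ConnectionError:
            chain.append((provider.name, "network_error", None))  # trigger: network error
            continue
        if outcome.status >= 500:
            chain.append((provider.name, str(outcome.status), outcome.latency_ms))  # trigger: 5xx
            continue
        if outcome.latency_ms > budget_ms:
            chain.append((provider.name, "timeout", outcome.latency_ms))  # trigger: over budget
            continue
        # Anything else (including 4xx) is returned to the caller without fallback.
        chain.append((provider.name, str(outcome.status), outcome.latency_ms))
        return outcome.text, chain
    raise RuntimeError(f"no provider succeeded within the switch cap: {chain}")


def failing(prompt: str) -> Outcome:
    return Outcome(status=503, latency_ms=120.0)


def healthy(prompt: str) -> Outcome:
    return Outcome(status=200, latency_ms=340.0, text="hello")


ranked = [
    Provider("gpt-host-1", "gpt", failing),
    Provider("claude-host-1", "claude", healthy),  # skipped: different family
    Provider("gpt-host-2", "gpt", healthy),
]
text, chain = route_with_fallback("hi", "gpt", ranked, budget_ms=5000)
print(text)   # hello
print(chain)  # [('gpt-host-1', '503', 120.0), ('gpt-host-2', '200', 340.0)]
```

Note that the healthy Claude-family host is never used for a GPT-family request, even though it sits earlier in the ranking than the second GPT host.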
What appears in your trace
- Provider used
- The upstream provider name, model variant, and routing decision for each request.
- Fallback chain
- If fallback occurred, the trace lists both the failed provider and the successful one, plus the error code and latency of the failed attempt.
- Token counts and cost
- Input, output, and cached-input token counts, plus the computed cost at the rate that applied at request time.
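A trace record carrying the fields listed above might look like the sketch below. The field names, the `Rates` type, the per-million-token pricing unit, and the example numbers are hypothetical. It also assumes the three token counts are disjoint (cached-input tokens are not also counted as input tokens), which may not match how every provider reports them. The point it illustrates is that cost is computed from rates captured with the request, so a later price change does not rewrite historical traces.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Rates:
    """Hypothetical prices per million tokens, captured when the request ran."""
    input_per_m: float
    output_per_m: float
    cached_input_per_m: float


@dataclass
class FailedAttempt:
    provider: str
    error_code: str
    latency_ms: float


@dataclass
class Trace:
    provider: str
    model_variant: str
    routing_decision: str
    input_tokens: int
    output_tokens: int
    cached_input_tokens: int
    rates: Rates  # snapshot of the rates in effect at request time
    fallback_chain: List[FailedAttempt] = field(default_factory=list)

    @property
    def cost(self) -> float:
        # Assumes the three token counts are disjoint.
        r = self.rates
        return (self.input_tokens * r.input_per_m
                + self.output_tokens * r.output_per_m
                + self.cached_input_tokens * r.cached_input_per_m) / 1_000_000


trace = Trace(
    provider="provider-b",
    model_variant="example-model-variant",
    routing_decision="fallback after provider-a 503",
    input_tokens=1200,
    output_tokens=300,
    cached_input_tokens=800,
    rates=Rates(input_per_m=2.50, output_per_m=10.00, cached_input_per_m=1.25),
    fallback_chain=[FailedAttempt("provider-a", "503", 120.0)],
)
print(f"{trace.cost:.6f}")  # 0.007000
```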
Per-project configuration
- Weights
- Set the relative importance of latency, cost, and quality. Projects without explicit weights use the default profile; explicit overrides always take precedence.
- Disable fallback
- Enterprise contracts can disable fallback for projects that need a single provider for compliance or evaluation reasons.
- Allowed providers
- Customers can restrict a project to a subset of upstream providers — useful for data residency or vendor governance.
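The per-project settings above could be modeled as a small config object that filters the ranked provider list before any request is sent. Everything here is hypothetical: the field names, the numeric default weights (the page only says production favors latency and development favors cost), and the enterprise check, which mirrors the rule that only enterprise contracts can disable fallback.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

WeightTuple = Tuple[float, float, float]  # (latency, cost, quality)

# Hypothetical numbers; only the direction of each default is documented.
DEFAULT_WEIGHTS = {
    "production": (0.6, 0.2, 0.2),   # favors latency
    "development": (0.2, 0.6, 0.2),  # favors cost
}


@dataclass(frozen=True)
class ProjectConfig:
    environment: str                                    # "production" or "development"
    enterprise: bool = False
    weights: Optional[WeightTuple] = None               # None -> default profile
    fallback_enabled: bool = True
    max_provider_switches: int = 1                      # default cap: one switch
    allowed_providers: Optional[FrozenSet[str]] = None  # None -> no restriction

    def __post_init__(self) -> None:
        if not self.fallback_enabled and not self.enterprise:
            raise ValueError("disabling fallback requires an enterprise contract")


def effective_weights(cfg: ProjectConfig) -> WeightTuple:
    # Explicit overrides win; otherwise use the environment's default profile.
    return cfg.weights if cfg.weights is not None else DEFAULT_WEIGHTS[cfg.environment]


def attempt_plan(cfg: ProjectConfig, ranked: List[str]) -> List[str]:
    """Providers that may be tried for one request, in order."""
    allowed = [p for p in ranked
               if cfg.allowed_providers is None or p in cfg.allowed_providers]
    max_attempts = 1 + (cfg.max_provider_switches if cfg.fallback_enabled else 0)
    return allowed[:max_attempts]


prod = ProjectConfig("production", allowed_providers=frozenset({"provider-a", "provider-c"}))
print(effective_weights(prod))  # (0.6, 0.2, 0.2)
print(attempt_plan(prod, ["provider-b", "provider-a", "provider-c"]))  # ['provider-a', 'provider-c']

pinned = ProjectConfig("production", enterprise=True, fallback_enabled=False)
print(attempt_plan(pinned, ["provider-b", "provider-a"]))  # ['provider-b']
```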
FAQ
- Can I disable fallback for a specific project?
- Yes. Enterprise contracts can turn fallback off for projects that require single-provider behavior. Pay-as-you-go projects use the default fallback configuration.
- Is Router One swapping models silently?
- No. Fallback is constrained to the same model family (GPT → GPT, Claude → Claude). The exact model variant used is recorded in the per-request trace.
- How quickly do routing decisions adapt to new conditions?
- Because the latency EWMA covers the last 50 requests per provider per model family, routing reacts to a sustained degradation within seconds but is not thrown off by a single anomalous slow request.
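The adaptation behavior described above can be checked numerically. The page does not state the smoothing factor, so this sketch assumes the common convention that maps an N-request window to alpha = 2 / (N + 1), about 0.039 for N = 50. Strictly, an EWMA never fully forgets older samples; the window only sets how quickly their weight decays. `LatencyEWMA` and the latency values are hypothetical.

```python
from typing import Optional


class LatencyEWMA:
    """Exponentially weighted moving average of request latency."""

    def __init__(self, window: int = 50) -> None:
        # Assumption: span convention, alpha = 2 / (window + 1).
        self.alpha = 2.0 / (window + 1)
        self.value: Optional[float] = None

    def update(self, latency_ms: float) -> float:
        if self.value is None:
            self.value = latency_ms  # seed with the first observation
        else:
            self.value = self.alpha * latency_ms + (1 - self.alpha) * self.value
        return self.value


# One anomalous slow request against a steady 200 ms baseline.
spike = LatencyEWMA()
spike.update(200.0)
print(round(spike.update(1000.0), 1))  # 231.4

# A sustained degradation: 50 consecutive 600 ms responses.
drift = LatencyEWMA()
drift.update(200.0)
for _ in range(50):
    drift.update(600.0)
print(round(drift.value, 1))  # 545.9
```

The single 1000 ms outlier moves the average by about 16%, while the sustained shift pulls it roughly 86% of the way to the new level within 50 requests. How many seconds that takes depends on each provider's request volume.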