One LLM API gateway for every model
An LLM API gateway is a single endpoint you route every model call through, instead of wiring each model provider into your app one by one. Router One is an OpenAI-compatible gateway for 25+ models — GPT, Claude, Gemini, DeepSeek, Mistral, and Llama — behind one base URL and one key. Direct model calls are a black box; routing through a gateway gives you a ledger (what every call cost), a trace (which model and route served it), and control (per-key budgets and rate limits). It is reachable globally and from Mainland China.
What the gateway does
One OpenAI-compatible endpoint
Point your existing OpenAI SDK at https://api.router.one/v1 and call 25+ models — GPT, Claude, Gemini, DeepSeek, Mistral, Llama — through the same Chat Completions interface. No per-model SDK, no rewrite.
Smart routing
Send model="auto" and the gateway picks a candidate using latency, cost, and quality weights. Tune the balance per request, or pin an exact model when you need determinism.
Automatic fallback
When a provider returns a 5xx or times out, the gateway fails over to the next healthy provider in the same model family. Requests keep flowing without you writing retry glue.
Per-request traces
Every request is traced in a real-time dashboard: model, provider, token counts, cost, latency, status, and the route or fallback decision. No more guessing where spend or latency went.
Keys with budgets + limits
Create multiple API keys per account, each with its own maxSpend cap, rate limit (rateLimit), and per-minute token ceiling (tokenLimitTpm). Give each app or environment its own scoped key.
Prepaid pay-as-you-go wallet
Top up a prepaid wallet and pay per token at the posted model-rate token line, with checkout-visible FX/channel fees kept separate. Fund it by card, WeChat Pay, Alipay, Stripe, or USDT/USDC. Spend draws down as you go.
Native Claude Code & Codex
Claude Code talks to the Anthropic-compatible endpoint at https://api.router.one; OpenAI Codex CLI uses the OpenAI-compatible base URL. Both work by swapping one environment variable.
Global + China reachable
The same gateway is reachable from global networks and from Mainland China, so one integration serves teams on both sides without a separate setup.
Ledger, trace, and control
The gateway sits between your app and the models so it can do three things a direct call can't. The ledger records what every request cost at posted rates, drawn from your prepaid wallet. The trace records which model, provider, and route served each request, with tokens and latency. The control layer enforces per-key budgets and rate limits before a request reaches a provider, so a runaway loop or a leaked key can't drain the wallet.
Switch with one base URL change
Router One speaks the OpenAI Chat Completions API, so most integrations move over by changing the base URL and key. Set the two environment variables and your existing code keeps working.
# Route every model call through the gateway export OPENAI_BASE_URL=https://api.router.one/v1 export OPENAI_API_KEY=sk-your-router-one-key
FAQ
What is an LLM API gateway?
An LLM API gateway is one endpoint you route every model call through instead of integrating each provider separately. It gives you a unified interface for 25+ models, plus routing, fallback, cost and latency traces, and per-key budget controls in one place — a ledger, a trace, and a control layer over your model usage.
Is Router One OpenAI-compatible?
Yes. Router One implements the OpenAI Chat Completions API at https://api.router.one/v1, so any library or tool that works with OpenAI works by changing the base URL. Claude Code uses the Anthropic-compatible endpoint at https://api.router.one.
Which models can I call?
25+ models across the major families — GPT, Claude, Gemini, DeepSeek, Mistral, and Llama — all through the same unified endpoint. Browse the current catalog and per-model rates on the models page.
How does routing decide which model to use?
Pin an exact model and the gateway calls it directly. Send model="auto" and it selects a candidate using configurable latency, cost, and quality weights. Every trace shows which model and route actually served the request.
What happens when a provider goes down?
Smart routing fails over to a healthy same-family route when a provider returns a 5xx, times out, or its latency and error rates spike — as long as one is available. The request trace shows the route and fallback decision so you can see exactly what happened.
How does billing work?
You top up a prepaid wallet and pay per token at the posted model-rate token line, with checkout-visible FX/channel fees kept separate. Top up by card, WeChat Pay, Alipay, Stripe, or USDT/USDC. Each API key can carry its own spend cap and rate limit so usage stays within budget.
Related
Route every model through one gateway.
Get started free