Router One

One LLM API gateway for every model

An LLM API gateway is a single endpoint you route every model call through, instead of wiring each model provider into your app one by one. Router One is an OpenAI-compatible gateway for 25+ models — GPT, Claude, Gemini, DeepSeek, Mistral, and Llama — behind one base URL and one key. Direct model calls are a black box; routing through a gateway gives you a ledger (what every call cost), a trace (which model and route served it), and control (per-key budgets and rate limits). It is reachable globally and from Mainland China.

What the gateway does

One OpenAI-compatible endpoint

Point your existing OpenAI SDK at https://api.router.one/v1 and call 25+ models — GPT, Claude, Gemini, DeepSeek, Mistral, Llama — through the same Chat Completions interface. No per-model SDK, no rewrite.

Smart routing

Send model="auto" and the gateway picks a candidate using latency, cost, and quality weights. Tune the balance per request, or pin an exact model when you need determinism.

Automatic fallback

When a provider returns a 5xx or times out, the gateway fails over to the next healthy provider in the same model family. Requests keep flowing without you writing retry glue.

Per-request traces

Every request is traced in a real-time dashboard: model, provider, token counts, cost, latency, status, and the route or fallback decision. No more guessing where spend or latency went.

Keys with budgets + limits

Create multiple API keys per account, each with its own maxSpend cap, rate limit (rateLimit), and per-minute token ceiling (tokenLimitTpm). Give each app or environment its own scoped key.

Prepaid pay-as-you-go wallet

Top up a prepaid wallet and pay per token at the posted model-rate token line, with checkout-visible FX/channel fees kept separate. Fund it by card, WeChat Pay, Alipay, Stripe, or USDT/USDC. Spend draws down as you go.

Native Claude Code & Codex

Claude Code talks to the Anthropic-compatible endpoint at https://api.router.one; OpenAI Codex CLI uses the OpenAI-compatible base URL. Both work by swapping one environment variable.

Global + China reachable

The same gateway is reachable from global networks and from Mainland China, so one integration serves teams on both sides without a separate setup.

Ledger, trace, and control

The gateway sits between your app and the models so it can do three things a direct call can't. The ledger records what every request cost at posted rates, drawn from your prepaid wallet. The trace records which model, provider, and route served each request, with tokens and latency. The control layer enforces per-key budgets and rate limits before a request reaches a provider, so a runaway loop or a leaked key can't drain the wallet.

Switch with one base URL change

Router One speaks the OpenAI Chat Completions API, so most integrations move over by changing the base URL and key. Set the two environment variables and your existing code keeps working.

terminal
# Route every model call through the gateway
export OPENAI_BASE_URL=https://api.router.one/v1
export OPENAI_API_KEY=sk-your-router-one-key

FAQ

What is an LLM API gateway?

An LLM API gateway is one endpoint you route every model call through instead of integrating each provider separately. It gives you a unified interface for 25+ models, plus routing, fallback, cost and latency traces, and per-key budget controls in one place — a ledger, a trace, and a control layer over your model usage.

Is Router One OpenAI-compatible?

Yes. Router One implements the OpenAI Chat Completions API at https://api.router.one/v1, so any library or tool that works with OpenAI works by changing the base URL. Claude Code uses the Anthropic-compatible endpoint at https://api.router.one.

Which models can I call?

25+ models across the major families — GPT, Claude, Gemini, DeepSeek, Mistral, and Llama — all through the same unified endpoint. Browse the current catalog and per-model rates on the models page.

How does routing decide which model to use?

Pin an exact model and the gateway calls it directly. Send model="auto" and it selects a candidate using configurable latency, cost, and quality weights. Every trace shows which model and route actually served the request.

What happens when a provider goes down?

Smart routing fails over to a healthy same-family route when a provider returns a 5xx, times out, or its latency and error rates spike — as long as one is available. The request trace shows the route and fallback decision so you can see exactly what happened.

How does billing work?

You top up a prepaid wallet and pay per token at the posted model-rate token line, with checkout-visible FX/channel fees kept separate. Top up by card, WeChat Pay, Alipay, Stripe, or USDT/USDC. Each API key can carry its own spend cap and rate limit so usage stays within budget.

Related

Route every model through one gateway.

Get started free