Track LLM costs per request, model, and key
Most teams discover their LLM spend when the invoice arrives — long after the runaway loop, the over-provisioned model, or the leaked key did the damage. Cost tracking at the gateway fixes the timing: every call through Router One is metered as it happens, priced at the posted model rate, and attributed to the API key that made it. One prepaid wallet, 25+ models, and a ledger you can actually reconcile.
What gets tracked
A cost trace per request
Every request records its model, input and output tokens, cost at the posted rate, latency, status, and the route that served it — visible in the dashboard the moment the request completes.
Per-model breakdowns
Usage rolls up by model over time, so you can see which families drive spend, compare cost per workload, and catch a quiet shift toward an expensive model before it compounds.
Per-key attribution
Give each app, environment, or agent its own API key and spend attributes cleanly. No more guessing which integration burned the budget — the ledger says exactly which key did.
Per-key spend caps
Each key can carry its own maxSpend cap, rate limit (rateLimit), and per-minute token ceiling (tokenLimitTpm). A runaway loop or leaked key hits its cap instead of draining the wallet.
A prepaid wallet ledger
Spend draws down from a prepaid wallet at the pay-as-you-go token line of each model's posted rate; FX and channel fees stay visible at checkout, separate from the token line.
Hard stop at zero
When usable credit runs out, calls return HTTP 402 instead of accruing surprise debt. Top up — by card, WeChat Pay, Alipay, Stripe, or USDT/USDC — and requests resume immediately.
A ledger, not a guess
Direct provider calls scatter your spend across dashboards that update on their own schedules. A gateway puts one ledger in the request path: cost is computed per call at posted rates, attributed to a key, and aggregated in real time. That changes cost work from forensic reconstruction at month-end to a live number you can query while the workload is still running — and act on with per-key caps before the bill grows.
Point your stack at the gateway
Cost tracking starts working the moment your traffic flows through the gateway. Change the base URL, keep your code, and every request after that lands in the ledger.
# Every request after this lands in the ledger export OPENAI_BASE_URL=https://api.router.one/v1 export OPENAI_API_KEY=sk-your-router-one-key
FAQ
How is the cost of each request computed?
Each request is priced at the pay-as-you-go token line of the model's posted rate — input and output tokens times the per-1M-token price listed on the models page. FX and payment-channel fees are shown at checkout and kept separate from the token line.
Can I cap spend per API key?
Yes. Every key can carry its own maxSpend cap, rate limit, and per-minute token ceiling (tokenLimitTpm). When a key hits its cap, its requests stop while the rest of the account keeps working.
Where do I see usage and costs?
Dashboard -> Logs shows the per-request traces; Dashboard -> Usage shows requests, tokens, and spend aggregated by model and API key over time. Traces appear in real time as requests complete.
What happens when my balance runs out?
API calls return HTTP 402 until the wallet is topped up or plan coverage applies. There is no silent overage — spend can never exceed what you have prepaid plus your plan.
Do you store my prompts to compute costs?
No. Cost metering uses token counts and metadata only. Prompt and completion bodies are not retained — the trace records model, tokens, cost, latency, and status.
Does this work for Claude Code and Codex CLI?
Yes. CLI agents route through the same gateway, so every coding-agent request gets the same per-request cost trace and counts against the same per-key budgets as your application traffic.
Related
Know what every model call costs — as it happens.
Get started free