Ask a team what their LLM API spend was last month and most can answer. Ask which app, agent, or teammate spent it — and on which models — and the answer is usually a shrug followed by a spreadsheet project. This post describes a setup that makes cost attribution a property of your infrastructure instead of a monthly forensic exercise.

The core idea is simple: route every model call through one gateway, give every workload its own API key, and let the gateway's ledger do the bookkeeping.

Why provider dashboards aren't enough

If you call two or three providers directly, your spend lives in two or three dashboards, each with its own update cadence, currency handling, and definition of a "request". Worse, attribution stops at the account level: the dashboard can tell you what the whole account spent, but not which of your services made the calls, unless you maintain separate accounts per service — which multiplies billing overhead instead of reducing it.

The usual workaround is application-side logging: wrap every SDK call, estimate token counts, multiply by a price table you maintain by hand, and hope nobody calls a model you forgot to add. It works until a price changes or someone adds a provider, and then it silently breaks.
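To make the failure mode concrete, here is a minimal sketch of that workaround. The model names and prices are invented for illustration, and the fallback to zero for unknown models is an assumption about how such wrappers are often written, not a description of any particular codebase.

```python
from decimal import Decimal

# Hypothetical hand-maintained table: USD per 1M tokens as (input, output).
# Model names and prices are made up for this example.
PRICE_TABLE = {
    "model-a": (Decimal("1.00"), Decimal("3.00")),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate a request's cost from the local table.

    A model missing from the table falls back to a zero rate, so its
    spend disappears from the report without raising any error.
    """
    in_rate, out_rate = PRICE_TABLE.get(model, (Decimal("0"), Decimal("0")))
    return (in_rate * input_tokens + out_rate * output_tokens) / Decimal(1_000_000)


known = estimate_cost("model-a", 1_000_000, 0)    # priced from the table
forgotten = estimate_cost("model-b", 500, 500)    # nobody added model-b
```

The second call is the problem: the request was real and was billed by the provider, but the local estimate records nothing for it.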

Step 1: One key per workload

A gateway inverts the problem. Router One sits between your apps and 40+ supported models, so every call already passes through one place that knows the model, the token counts, and the posted rate. Attribution then comes down to one practice: create one API key per app, agent, or environment.

sk-rk-...prod-chatbot for the production assistant
sk-rk-...batch-pipeline for the nightly enrichment job
sk-rk-...claude-code-alice for a teammate's coding agent
sk-rk-...staging for everything pre-production

Keys are free to create, so the granularity is yours to choose. Once each workload has its own key, the usage dashboard gives you a spend line per workload with no instrumentation in your code at all.
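One way to keep the one-key-per-workload rule honest is to resolve keys by workload name and fail loudly when one is missing, rather than falling back to a shared key that would blur attribution. The sketch below is an assumption-heavy illustration: the `LLM_KEY_<WORKLOAD>` environment variable scheme and the helper names are hypothetical, and the `Authorization: Bearer` header is a common convention that this post does not confirm for the gateway.

```python
import os
from typing import Mapping


def key_for(workload: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the API key configured for one workload.

    Hypothetical convention: workload "prod-chatbot" reads the
    environment variable LLM_KEY_PROD_CHATBOT.
    """
    var = "LLM_KEY_" + workload.upper().replace("-", "_")
    try:
        return env[var]
    except KeyError:
        # Refuse to fall back to a shared key: that would merge two
        # workloads into one spend line.
        raise RuntimeError(
            f"no API key configured for workload {workload!r} (expected {var})"
        ) from None


def auth_headers(workload: str, env: Mapping[str, str] = os.environ) -> dict:
    """Build request headers; Bearer auth is an assumption, not confirmed."""
    return {"Authorization": f"Bearer {key_for(workload, env)}"}
```

Each service then asks for its own workload name at startup, so a missing key surfaces as a deployment error instead of as unattributed spend.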

Step 2: Read the per-request traces

Aggregates tell you that spend moved; traces tell you why. Every request through the gateway records its model, input and output tokens, cost at the posted rate, latency, status, and the route that served it. When the batch pipeline's spend doubles, the traces show whether it sent more requests, longer prompts, or quietly switched to a pricier model.
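The three explanations above (more requests, longer prompts, a different model) can be separated mechanically once traces are exported. The following is a rough sketch that assumes each trace is a dict with hypothetical `model` and `input_tokens` fields; the real export format may differ, and the 20% threshold is arbitrary.

```python
from collections import Counter


def summarize(traces):
    """Collapse a list of trace dicts into the figures we compare."""
    n = len(traces)
    total_input = sum(t["input_tokens"] for t in traces)
    return {
        "requests": n,
        "avg_input_tokens": total_input / n if n else 0.0,
        "models": Counter(t["model"] for t in traces),
    }


def explain_change(before, after, threshold=1.2):
    """List plausible reasons why spend rose between two periods."""
    b, a = summarize(before), summarize(after)
    reasons = []
    if a["requests"] > b["requests"] * threshold:
        reasons.append("more requests")
    if a["avg_input_tokens"] > b["avg_input_tokens"] * threshold:
        reasons.append("longer prompts")
    new_models = sorted(set(a["models"]) - set(b["models"]))
    if new_models:
        reasons.append("new model(s): " + ", ".join(new_models))
    return reasons


last_week = [{"model": "model-a", "input_tokens": 100}] * 2
this_week = [{"model": "model-b", "input_tokens": 100}] * 2
reasons = explain_change(last_week, this_week)
```

In this toy data the request count and prompt length are unchanged, so the only reason reported is the switch to a model that was not in use the week before.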

Cost is computed from the pay-as-you-go token rate posted for each model (the per-1M-token prices published on the models page), with FX and payment-channel fees shown separately at checkout. There is no price table for you to maintain and no estimation: the trace records what the request actually cost. Prompt and completion bodies are not retained; metering uses token counts and metadata only.
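As a worked example of per-1M-token pricing, the arithmetic looks like this. The rates below are invented, and the sketch assumes separate input and output rates with no other adjustments; the posted rates and the pricing methodology page are the authority on what a real trace records.

```python
from decimal import Decimal


def request_cost(input_tokens, output_tokens, input_rate_per_1m, output_rate_per_1m):
    """Cost of one request given per-1M-token rates (illustrative only)."""
    total = input_tokens * input_rate_per_1m + output_tokens * output_rate_per_1m
    return total / Decimal(1_000_000)


# Hypothetical rates: 3 per 1M input tokens, 15 per 1M output tokens.
cost = request_cost(12_000, 800, Decimal("3"), Decimal("15"))
# 12,000 * 3 / 1,000,000 = 0.036 and 800 * 15 / 1,000,000 = 0.012
```

Here the request comes to 0.048 in whatever currency the rates are quoted in, before any FX or payment-channel fees.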

Step 3: Cap before it hurts

Attribution without enforcement still leaves you reading about the incident in the invoice. Each key can carry three limits:

maxSpend — a hard ceiling on what the key may spend
rateLimit — requests per unit time
tokenLimitTpm — tokens per minute

A retry loop in the staging environment hits the staging key's cap and stops; production keeps running. A leaked key is bounded by its own ceiling instead of the whole wallet. And when usable credit reaches zero, the gateway returns HTTP 402 rather than accruing surprise debt — the pricing methodology page documents the billing semantics in detail.
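The isolation that a per-key ceiling buys can be illustrated with a toy model. This is not the gateway's implementation: `KeyBudget` is a hypothetical local class, the costs are invented, and how the real gateway treats a request that would cross the ceiling is not specified here.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class KeyBudget:
    """Toy model of one key's maxSpend ceiling."""
    max_spend: Decimal
    spent: Decimal = Decimal("0")

    def try_charge(self, cost: Decimal) -> bool:
        """Record the cost if it fits under the ceiling, else refuse."""
        if self.spent + cost > self.max_spend:
            return False
        self.spent += cost
        return True


staging = KeyBudget(max_spend=Decimal("5"))
production = KeyBudget(max_spend=Decimal("500"))

# A runaway retry loop in staging: 100 attempts at 0.50 each.
served = sum(staging.try_charge(Decimal("0.50")) for _ in range(100))

# Production draws on its own budget and is unaffected.
prod_ok = production.try_charge(Decimal("0.50"))
```

In this model the loop is cut off after ten requests, and the damage is limited to the staging ceiling rather than the shared wallet.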

Step 4: Review weekly, not monthly

With per-key attribution in place, a useful cadence is a five-minute weekly review: sort keys by spend, scan per-model distribution for surprises, and check whether any key is approaching its cap. Teams that do this catch model drift and prompt bloat while they are still cheap; the patterns and the dashboard views are described on the LLM cost tracking and LLM observability pages.
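If the dashboard data is exported, the weekly review can be reduced to a few lines. The sketch below assumes a hypothetical list of per-key records with `name`, `spend`, and `max_spend` fields, and uses an arbitrary 80% warning threshold.

```python
def weekly_review(keys, warn_ratio=0.8):
    """Rank keys by spend and flag those close to their ceiling."""
    ranked = sorted(keys, key=lambda k: k["spend"], reverse=True)
    near_cap = [
        k["name"]
        for k in ranked
        if k.get("max_spend") and k["spend"] >= warn_ratio * k["max_spend"]
    ]
    return [k["name"] for k in ranked], near_cap


def model_share(spend_by_model):
    """Return each model's fraction of total spend."""
    total = sum(spend_by_model.values())
    if not total:
        return {}
    return {model: spend / total for model, spend in spend_by_model.items()}


keys = [
    {"name": "staging", "spend": 4.0, "max_spend": 50.0},
    {"name": "prod-chatbot", "spend": 170.0, "max_spend": 200.0},
    {"name": "batch-pipeline", "spend": 60.0, "max_spend": 100.0},
]
ranked, near_cap = weekly_review(keys)
shares = model_share({"model-a": 30.0, "model-b": 10.0})
```

With this sample data, prod-chatbot tops the ranking and is the only key past the warning threshold, which is the kind of signal the five-minute review is meant to surface.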

The takeaway

Cost attribution is not a reporting feature you bolt on later — it falls out of routing your calls through one ledger and naming your workloads with keys. Set that up once and "who spent what, on which models, and why" becomes a dashboard view instead of a quarterly mystery.

Start with one key per workload at router.one, and see the cost tracking overview for what the ledger records. Once attribution is in place, the next two pieces of the same setup are fallback that holds in production and, if you resell access, spend-capped keys per customer.

How to Track LLM API Costs per Key, Model, and Request

