
How to Track LLM API Costs per Key, Model, and Request

Router One Team

Ask a team what their LLM API spend was last month and most can answer. Ask which app, agent, or teammate spent it — and on which models — and the answer is usually a shrug followed by a spreadsheet project. This post describes a setup that makes cost attribution a property of your infrastructure instead of a monthly forensic exercise.

The core idea is simple: route every model call through one gateway, give every workload its own API key, and let the gateway's ledger do the bookkeeping.

Why provider dashboards aren't enough

If you call two or three providers directly, your spend lives in two or three dashboards, each with its own update cadence, currency handling, and definition of a "request". Worse, attribution stops at the account level: the dashboard can tell you what the whole account spent, but not which of your services made the calls, unless you maintain separate accounts per service — which multiplies billing overhead instead of reducing it.

The usual workaround is application-side logging: wrap every SDK call, estimate token counts, multiply by a price table you maintain by hand, and hope nobody calls a model you forgot to add. It works until a price changes or someone adds a provider; then it breaks silently.
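Here is a minimal Python sketch of that workaround, showing how it fails. The model names, prices, and the `estimate_cost` helper are hypothetical. The point is what happens when a call goes to a model nobody added to the table.

```python
from typing import Optional

# Hand-maintained price table: model -> (USD per 1M input tokens, USD per 1M output tokens).
# Model names and prices are hypothetical placeholders, not real rates.
PRICES_PER_1M = {
    "model-a": (3.00, 15.00),
    "model-b": (0.50, 1.50),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Estimate one call's cost from the local table; None if the model is not listed."""
    prices = PRICES_PER_1M.get(model)
    if prices is None:
        return None  # the silent failure: unlisted models contribute nothing
    input_price, output_price = prices
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


calls = [
    ("model-a", 1_000, 500),
    ("model-c", 2_000, 800),  # a model someone started calling but never added to the table
]
estimates = [estimate_cost(*call) for call in calls]
known = [cost for cost in estimates if cost is not None]
print(f"estimated spend: ${sum(known):.6f} from {len(known)} of {len(calls)} calls")
```

Nothing raises an error here. The second call simply drops out of the total. A stale price would also go unnoticed until someone edits the table.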

Step 1: One key per workload

A gateway inverts the problem. Router One sits between your apps and 25+ supported models, so every call already passes through one place that knows the model, the token counts, and the posted rate. Attribution then comes down to one practice: create one API key per app, agent, or environment.

  • sk-rk-...prod-chatbot for the production assistant
  • sk-rk-...batch-pipeline for the nightly enrichment job
  • sk-rk-...claude-code-alice for a teammate's coding agent
  • sk-rk-...staging for everything pre-production

Keys are free to create, so the granularity is yours to choose. Once each workload has its own key, the usage dashboard gives you a spend line per workload with no instrumentation in your code at all.
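Conceptually, that per-workload spend line is the gateway's ledger grouped by key. The rough sketch below assumes hypothetical ledger rows of (key label, day, cost) with made-up values. The key labels reuse the examples above.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical ledger rows: (key label, day, cost in USD). Values are illustrative only.
ledger: List[Tuple[str, date, float]] = [
    ("prod-chatbot", date(2024, 5, 6), 4.20),
    ("batch-pipeline", date(2024, 5, 6), 11.75),
    ("prod-chatbot", date(2024, 5, 7), 3.90),
    ("batch-pipeline", date(2024, 5, 7), 12.10),
    ("staging", date(2024, 5, 7), 0.85),
]


def spend_lines(rows: List[Tuple[str, date, float]]) -> Dict[str, Dict[date, float]]:
    """Sum cost per (key, day): one daily spend series per workload key."""
    totals: Dict[str, Dict[date, float]] = defaultdict(lambda: defaultdict(float))
    for key, day, cost in rows:
        totals[key][day] += cost
    # Plain dicts with days in chronological order, ready to plot or print.
    return {key: dict(sorted(days.items())) for key, days in totals.items()}


lines = spend_lines(ledger)
for key, series in lines.items():
    print(key, {day.isoformat(): round(cost, 2) for day, cost in series.items()})
```

The application code contains nothing cost-specific. The key label alone carries the attribution.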

Step 2: Read the per-request traces

Aggregates tell you that spend moved; traces tell you why. Every request through the gateway records its model, input and output tokens, cost at the posted rate, latency, status, and the route that served it. When the batch pipeline's spend doubles, the traces show whether it sent more requests, longer prompts, or quietly switched to a pricier model.
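Each of those three explanations maps onto a number you can compare between two periods: request count, average prompt size, and cost share by model. The sketch below assumes a hypothetical `Trace` record holding a subset of the fields listed above, with illustrative values. Volume and prompt size stay flat while the model mix shifts, which is the "quietly switched to a pricier model" case.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Trace:
    # Hypothetical record shape, limited to fields the post says a trace carries.
    model: str
    input_tokens: int
    output_tokens: int
    cost: float  # USD at the posted rate


def summarize(traces: List[Trace]) -> Dict[str, object]:
    """Volume, prompt size, and model mix: the three usual reasons spend moves."""
    total = sum(t.cost for t in traces)
    cost_by_model: Dict[str, float] = defaultdict(float)
    for t in traces:
        cost_by_model[t.model] += t.cost
    return {
        "requests": len(traces),
        "avg_input_tokens": sum(t.input_tokens for t in traces) / len(traces) if traces else 0.0,
        "total_cost": total,
        "cost_share": {m: c / total for m, c in cost_by_model.items()} if total else {},
    }


# Illustrative data for one key: same volume and prompt size, different model mix.
small = Trace("model-small", input_tokens=800, output_tokens=200, cost=0.001)
large = Trace("model-large", input_tokens=800, output_tokens=200, cost=0.003)
last_week = summarize([small] * 4)
this_week = summarize([small] * 2 + [large] * 2)

for field_name in ("requests", "avg_input_tokens", "total_cost", "cost_share"):
    print(f"{field_name}: {last_week[field_name]} -> {this_week[field_name]}")
```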

Cost is computed at the pay-as-you-go token line of each model's posted rate — the per-1M-token prices published on the models page — with FX and payment-channel fees shown separately at checkout. There is no price table for you to maintain, and no estimation: the trace records what the request actually cost. Prompt and completion bodies are not retained; metering uses token counts and metadata only.
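For a single request, the arithmetic behind per-1M-token pricing is short enough to write out. This sketch assumes input and output tokens are priced separately, which is the usual shape of per-1M-token rate cards. The prices and the `request_cost` name are placeholders, not Router One's actual rates.

```python
from decimal import Decimal

ONE_MILLION = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_1m: Decimal, output_price_per_1m: Decimal) -> Decimal:
    """Token cost of one request at per-1M-token prices (FX and payment fees excluded)."""
    return (input_tokens * input_price_per_1m + output_tokens * output_price_per_1m) / ONE_MILLION


# Hypothetical prices: $3.00 per 1M input tokens, $15.00 per 1M output tokens.
cost = request_cost(1_200, 400, Decimal("3.00"), Decimal("15.00"))
print(cost)
```

`Decimal` keeps sums of many sub-cent amounts exact, unlike floats. This matters if you reconcile your own totals against the ledger.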

Step 3: Cap before it hurts

Attribution without enforcement still leaves you reading about the incident in the invoice. Each key can carry three limits:

  • maxSpend — a hard ceiling on what the key may spend
  • rateLimit — requests per unit time
  • tokenLimitTpm — tokens per minute
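The gateway enforces these limits itself. The sketch below only models how the three checks might combine, to make their semantics concrete. The snake_case fields stand in for maxSpend, rateLimit, and tokenLimitTpm. Several details are assumptions: the 60-second window for rateLimit, treating maxSpend as a running total, and the order of the checks.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional, Tuple

WINDOW_SECONDS = 60.0  # assumption: rateLimit is counted per minute, like tokenLimitTpm


@dataclass
class KeyLimits:
    max_spend: float      # maxSpend: USD ceiling on the key's spend
    rate_limit: int       # rateLimit: requests per window
    token_limit_tpm: int  # tokenLimitTpm: tokens per minute


@dataclass
class KeyState:
    limits: KeyLimits
    spent: float = 0.0
    # (timestamp, tokens) for each request admitted within the current window
    recent: Deque[Tuple[float, int]] = field(default_factory=deque)

    def admit(self, tokens: int, now: Optional[float] = None) -> Optional[str]:
        """Return None if the request may proceed, else the name of the limit it would break."""
        now = time.monotonic() if now is None else now
        # Drop requests that have aged out of the sliding window.
        while self.recent and now - self.recent[0][0] >= WINDOW_SECONDS:
            self.recent.popleft()
        if self.spent >= self.limits.max_spend:
            return "maxSpend"
        if len(self.recent) >= self.limits.rate_limit:
            return "rateLimit"
        if sum(t for _, t in self.recent) + tokens > self.limits.token_limit_tpm:
            return "tokenLimitTpm"
        self.recent.append((now, tokens))
        return None

    def record_cost(self, cost: float) -> None:
        """Add the metered cost of a completed request to the key's running spend."""
        self.spent += cost


# A runaway retry loop on the staging key: the fourth request inside one minute is refused.
staging = KeyState(KeyLimits(max_spend=5.0, rate_limit=3, token_limit_tpm=10_000))
decisions = [staging.admit(1_000, now=float(second)) for second in range(5)]
print(decisions)
```

Each key carries its own state, so a key that trips a limit leaves every other key's budget untouched.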

A retry loop in the staging environment hits the staging key's cap and stops; production keeps running. A leaked key is bounded by its own ceiling instead of the whole wallet. And when usable credit reaches zero, the gateway returns HTTP 402 rather than accruing surprise debt — the pricing methodology page documents the billing semantics in detail.
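On the client side, it pays to treat a 402 as terminal rather than retryable, so your own retry loop does not keep hammering an empty wallet. A minimal sketch follows. Only the 402 behavior comes from the post; the transport function, the set of retryable statuses, and the backoff parameters are all assumptions.

```python
import time
from typing import Callable, Tuple

# Assumption: statuses treated as transient and worth retrying with backoff.
RETRYABLE = {429, 500, 502, 503, 504}


class OutOfCredit(Exception):
    """HTTP 402 from the gateway: usable credit is exhausted, so retries cannot succeed."""


def call_with_retries(send: Callable[[], Tuple[int, str]], max_attempts: int = 4,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Call `send` (a hypothetical transport returning (status, body)) with bounded retries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status == 402:
            # Fail fast: looping on 402 only burns time and never recovers.
            raise OutOfCredit(body)
        if status not in RETRYABLE or attempt == max_attempts - 1:
            raise RuntimeError(f"request failed with HTTP {status}: {body}")
        sleep(base_delay * 2 ** attempt)  # exponential backoff: 0.5 s, 1 s, 2 s, ...
    raise RuntimeError("unreachable")  # the loop always returns or raises


# Demo with a stubbed transport: one transient 503, then a 402.
replies = iter([(503, "busy"), (402, "insufficient credit")])
attempts = []


def fake_send() -> Tuple[int, str]:
    attempts.append(1)
    return next(replies)


try:
    call_with_retries(fake_send, sleep=lambda _: None)
except OutOfCredit as exc:
    print(f"stopped after {len(attempts)} attempts: {exc}")
```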

Step 4: Review weekly, not monthly

With per-key attribution in place, a useful cadence is a five-minute weekly review: sort keys by spend, scan per-model distribution for surprises, and check whether any key is approaching its cap. Teams that do this catch model drift and prompt bloat while they are still cheap; the patterns and the dashboard views are described on the LLM cost tracking and LLM observability pages.
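Each of the three checks in that review is a simple aggregation. This sketch assumes hypothetical inputs: a week of ledger rows as (key, model, cost), plus each key's spend so far against its maxSpend. The 80% warning threshold and all numbers are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def weekly_review(rows: List[Tuple[str, str, float]],
                  cap_status: Dict[str, Tuple[float, float]],
                  warn_at: float = 0.8):
    """rows: (key, model, cost) for the week; cap_status: key -> (spent so far, maxSpend)."""
    by_key: Dict[str, float] = defaultdict(float)
    by_key_model: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for key, model, cost in rows:
        by_key[key] += cost
        by_key_model[key][model] += cost
    # 1. Keys sorted by the week's spend, largest first.
    ranked = sorted(by_key.items(), key=lambda kv: kv[1], reverse=True)
    # 2. Each key's spend split by model, as fractions of that key's total.
    mix = {key: {m: c / by_key[key] for m, c in models.items()}
           for key, models in by_key_model.items() if by_key[key] > 0}
    # 3. Keys whose spend has reached the warning fraction of their cap.
    near_cap = sorted(k for k, (spent, cap) in cap_status.items() if spent >= warn_at * cap)
    return ranked, mix, near_cap


# Illustrative week (hypothetical numbers).
rows = [
    ("batch-pipeline", "model-large", 40.0),
    ("batch-pipeline", "model-small", 10.0),
    ("prod-chatbot", "model-small", 30.0),
    ("staging", "model-small", 2.0),
]
cap_status = {"batch-pipeline": (180.0, 200.0), "prod-chatbot": (90.0, 500.0), "staging": (9.0, 10.0)}
ranked, mix, near_cap = weekly_review(rows, cap_status)
print("by spend:", ranked)
print("batch-pipeline model mix:", mix["batch-pipeline"])
print("near cap:", near_cap)
```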

The takeaway

Cost attribution is not a reporting feature you bolt on later — it falls out of routing your calls through one ledger and naming your workloads with keys. Set that up once and "who spent what, on which models, and why" becomes a dashboard view instead of a quarterly mystery.

Start with one key per workload at router.one, and see the cost tracking overview for what the ledger records.
