LLM observability and cost tracking, per request
Router One is an OpenAI-compatible LLM API gateway, and every request through it produces a trace. For each call across 25+ models — GPT, Claude, Gemini, DeepSeek, Mistral, Llama — you see the model, provider, input/output tokens, cost at the posted rate, latency, status code, and the exact route or fallback path the request took. It is request-level visibility and spend control for developers and small teams, not an enterprise audit, compliance, or RBAC system.
What you get in every trace
| Signal | Router One trace | Calling the provider directly |
|---|---|---|
| Model & provider used | Recorded per request, including the resolved route | Whatever your one SDK call targeted |
| Input / output tokens | Counted and stored per request | In the response, if you log it yourself |
| Cost per request | Computed at the posted model rate, no markup added | You reconcile invoices later |
| Latency | Measured end to end and stored | Only if you instrument it |
| Route / fallback decision | Trace shows the route taken and any fallback | No fallback; a 5xx is just an error |
| Status code & errors | Logged per request, searchable in Logs | In your own logs, if any |
| Spend ceiling | Per-key maxSpend stops a runaway loop | No hard ceiling on the key |
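
To make the "Cost per request" row concrete: cost at a posted rate is the token counts multiplied by per-token prices, with nothing added on top. The sketch below is only an illustration. The rates are made-up placeholders rather than Router One's actual prices, and `POSTED_RATES` and `request_cost` are hypothetical names.

```python
from decimal import Decimal

# Hypothetical posted rates in USD per 1M tokens (placeholders, not real prices).
POSTED_RATES = {
    "example-model": {"input": Decimal("3.00"), "output": Decimal("15.00")},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one request at the posted rate, with no markup applied."""
    rate = POSTED_RATES[model]
    per_million = Decimal(1_000_000)
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return total / per_million

cost = request_cost("example-model", input_tokens=512, output_tokens=200)
print(cost)
```

`Decimal` is used instead of floats so that summing many small per-request costs does not accumulate rounding error.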
Where to look
Dashboard -> Logs
Every request is one row: model, provider, tokens, cost, latency, status, and the route/fallback path. Filter by API key or model to find the call that misbehaved, then open it for the full trace.
Dashboard -> Usage
Per-model and per-API-key breakdowns of requests, tokens, and spend over time. See which model and which key are driving cost. Every figure is priced at the posted rate and drawn from your prepaid wallet.
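The same breakdown can be reproduced offline from per-request rows. This is a minimal sketch, assuming you have the rows as a list of dicts. The `api_key` field, the `gpt-example` model name, and the `usage_by` helper are hypothetical, and the numbers are illustrative.

```python
from collections import defaultdict

# Hypothetical trace rows; "api_key" is an assumed field name.
rows = [
    {"api_key": "key-a", "model": "claude-sonnet", "input_tokens": 512, "output_tokens": 200, "cost_usd": 0.0042},
    {"api_key": "key-a", "model": "gpt-example", "input_tokens": 1000, "output_tokens": 300, "cost_usd": 0.0100},
    {"api_key": "key-b", "model": "claude-sonnet", "input_tokens": 2000, "output_tokens": 800, "cost_usd": 0.0170},
]

def usage_by(rows, field):
    """Total requests, tokens, and spend grouped by the given field."""
    totals = defaultdict(lambda: {"requests": 0, "tokens": 0, "cost_usd": 0.0})
    for row in rows:
        bucket = totals[row[field]]
        bucket["requests"] += 1
        bucket["tokens"] += row["input_tokens"] + row["output_tokens"]
        bucket["cost_usd"] += row["cost_usd"]
    return dict(totals)

by_model = usage_by(rows, "model")
by_key = usage_by(rows, "api_key")
```

Grouping by `model` answers "which model is expensive", while grouping by `api_key` answers "which caller is expensive"; the two views use the same rows.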
Budgets & rate limits
Each API key carries its own maxSpend plus rateLimit and tokenLimitTpm. A runaway loop hits its own ceiling and stops instead of draining the whole balance — spend control without a governance suite.
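How a per-key ceiling stops a loop can be sketched as a small admission check. This is not Router One's implementation: it assumes `maxSpend` is a cumulative USD ceiling, `rateLimit` is requests per minute, and `tokenLimitTpm` is tokens per minute, and the `KeyBudget` class is a hypothetical model of those three limits.

```python
from dataclasses import dataclass

@dataclass
class KeyBudget:
    max_spend: float        # assumed meaning of maxSpend: cumulative USD ceiling
    rate_limit: int         # assumed meaning of rateLimit: requests per minute
    token_limit_tpm: int    # assumed meaning of tokenLimitTpm: tokens per minute
    spent: float = 0.0
    window_start: float = 0.0
    window_requests: int = 0
    window_tokens: int = 0

    def admit(self, now: float, est_tokens: int) -> tuple[bool, str]:
        """Decide whether one more request may run on this key."""
        if now - self.window_start >= 60:  # start a fresh one-minute window
            self.window_start, self.window_requests, self.window_tokens = now, 0, 0
        if self.spent >= self.max_spend:
            return False, "max_spend"
        if self.window_requests + 1 > self.rate_limit:
            return False, "rate_limit"
        if self.window_tokens + est_tokens > self.token_limit_tpm:
            return False, "token_limit_tpm"
        self.window_requests += 1
        self.window_tokens += est_tokens
        return True, "ok"

    def record_cost(self, cost_usd: float) -> None:
        self.spent += cost_usd

# Simulate a runaway loop on one key with a small ceiling.
budget = KeyBudget(max_spend=0.01, rate_limit=100, token_limit_tpm=100_000)
served = 0
reason = "ok"
for i in range(50):
    ok, reason = budget.admit(now=float(i), est_tokens=700)
    if not ok:
        break
    budget.record_cost(0.0042)
    served += 1
```

Because the state lives on the key, a second key with its own `KeyBudget` is unaffected when this one is cut off.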
A trace, in shape
// one request -> one trace row
{
  "model": "claude-sonnet",
  "provider": "<routed>",
  "input_tokens": 512,
  "output_tokens": 200,
  "cost_usd": 0.0042,
  "latency_ms": 1180,
  "status": 200,
  "route": "primary",
  "fallback": null
}
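If you work with rows of this shape in your own tooling, a typed record keeps the fields honest. The sketch below assumes the field names shown in the sample; the `Trace` class is hypothetical, and typing `fallback` as an optional string is a guess, since the sample only shows `null`.

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Trace:
    model: str
    provider: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    latency_ms: int
    status: int
    route: str
    fallback: Optional[str]  # assumed type; the sample only shows null

raw = """
{
  "model": "claude-sonnet",
  "provider": "<routed>",
  "input_tokens": 512,
  "output_tokens": 200,
  "cost_usd": 0.0042,
  "latency_ms": 1180,
  "status": 200,
  "route": "primary",
  "fallback": null
}
"""

# Unknown or missing keys raise TypeError, which surfaces schema drift early.
trace = Trace(**json.loads(raw))
total_tokens = trace.input_tokens + trace.output_tokens
```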
What stays private
Router One does not retain prompt or completion bodies. Only metadata — model, provider, token counts, cost, latency, status, and the route/fallback path — is logged, and it is used for billing, routing, and observability. You get the numbers you need to debug and budget without your prompts being stored.
Read the data retention policy ->
FAQ
What exactly is in a trace?
Each request records the model, provider, input and output token counts, cost at the posted rate, end-to-end latency, the HTTP status code, and the route or fallback path the request took. You can see it per request in Dashboard -> Logs and in aggregate in Dashboard -> Usage.
Does Router One store my prompts and responses?
No. Router One does not retain prompt or completion bodies. Only request metadata is logged — for billing, routing, and observability. See the data retention page for the full boundary.
How do I stop a runaway loop from draining my balance?
Give the API key a maxSpend, plus rateLimit and tokenLimitTpm. When a loop hits the key's spend ceiling or rate limit, requests on that key stop instead of consuming the whole wallet. Budgets are per key, not per project.
Is this an enterprise audit or compliance platform?
No. This is request-level observability and spend control for developers and small teams, not an audit-log, compliance, or RBAC system. There is no organization or role structure; limits and budgets attach to API keys.
What does the trace show when a provider fails?
When latency spikes or error rates climb on a route, smart routing fails over to a healthy same-family route, provided one is available. The trace records both the original route and the fallback decision, so you can see exactly what happened.
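The failover behavior described here can be sketched as a small selection function. This is an illustration only, not Router One's routing logic: the health thresholds, the `Route` fields, the route names, and the `resolve` helper are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Route:
    name: str
    family: str            # model family the route serves
    error_rate: float      # recent fraction of failed requests
    p95_latency_ms: float  # recent 95th-percentile latency

# Hypothetical health thresholds, chosen only for this sketch.
MAX_ERROR_RATE = 0.05
MAX_P95_LATENCY_MS = 5000.0

def is_healthy(route: Route) -> bool:
    return route.error_rate <= MAX_ERROR_RATE and route.p95_latency_ms <= MAX_P95_LATENCY_MS

def resolve(primary: Route, candidates: list[Route]) -> dict:
    """Return the route decision to record in the trace."""
    if is_healthy(primary):
        return {"route": primary.name, "fallback": None}
    for candidate in candidates:
        same_family = candidate.family == primary.family
        if same_family and candidate.name != primary.name and is_healthy(candidate):
            return {"route": primary.name, "fallback": candidate.name}
    # No healthy same-family route is available, so there is nothing to fail over to.
    return {"route": primary.name, "fallback": None}

primary = Route("route-a", "claude", error_rate=0.40, p95_latency_ms=900.0)
pool = [
    Route("route-b", "gpt", error_rate=0.00, p95_latency_ms=700.0),
    Route("route-c", "claude", error_rate=0.01, p95_latency_ms=1100.0),
]
decision = resolve(primary, pool)
```

Recording the original route alongside the fallback, rather than only the route that finally served the request, is what lets the trace show that a failover happened at all.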