Production-grade LLM Gateway vs Unofficial API Relays: Stability, Compliance, and Traceability

Router One Team

If you have shopped for LLM API access from inside China in the past two years, you have run into them: small, fast-moving platforms that resell access to overseas models at attractive prices, accept WeChat Pay or Alipay or USDT, and let you start calling within minutes. They make demos and side projects feel effortless.

The trouble starts when you try to put one of them behind a production workload. Accounts get suspended without notice. Bills change shape mid-cycle. A request fails and there is no one to call. You cannot tell whether the model that just answered was the model you paid for. You cannot tell where your prompt went after the response came back.

This post is not about naming any single platform. It describes the structural pattern of unofficial LLM API relays, and where a production-grade gateway has to be different. The comparison is restrained on purpose: the choice should be obvious once the dimensions are laid out side by side.

If you have already read Router One vs OpenRouter China, this article complements it. That one compares two legitimate gateways. This one is about the gap between any legitimate gateway and the unofficial relay tier underneath.

What "unofficial relay platform" usually means

The pattern is consistent enough to describe in general terms. You typically see some combination of:

  • A pool of upstream provider accounts the platform rotates through, often topped up by intermediaries
  • Resale of someone else's API keys, with the markup hidden inside a per-token rate or a flat monthly plan
  • No clear legal entity behind the service — payment goes to a personal WeChat, an Alipay individual account, or a USDT wallet
  • No published terms of service, no refund policy, no incident page, no SLA
  • A control panel that shows you a balance and a usage counter, but no per-request trace and no breakdown of where each call actually went

None of these are inherently illegal in isolation, and many of these platforms genuinely help individual developers get unblocked. The problem is what happens when you depend on one. Five dimensions matter, and each maps to a specific kind of failure you will eventually see in production.

Dimension 1: Legal entity and compliance boundary

A production service needs to know who it is paying and who is on the hook when something goes wrong.

A production-grade LLM gateway has a registered company, public terms of service, a published refund policy (/refund), a security boundary statement (/security), and a public facts page (/trust/facts). Payments flow through documented rails — card processors and supported stablecoin top-up flows — with checkout terms and fee boundaries visible before payment. You have a counterparty.

An unofficial relay typically does not. Payment goes to an individual wallet, terms are absent or only displayed at signup and never linked, and there is no public entity to invoice, dispute, or escalate against. For an indie developer with a hobby project this is acceptable risk. For a team that will eventually be asked "who are we paying, and what are their obligations to us" by finance or legal, it is not.

Dimension 2: SLA and remediation

Production workloads do not just need a service to work most of the time. They need a service to take responsibility when it does not.

Router One publishes availability measurement rules on /sla, a public incident surface at /status, and trace-level visibility when fallback happens. Annual enterprise contracts can include service credits when contractual availability targets are missed. You can argue with the measurement rules. You can hold us to the contract you signed.
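To make the idea of a measurement rule concrete, here is a minimal sketch of how availability over a billing window might be computed from a list of incidents. The `Incident` type, the minute granularity, and the 99.9% target are illustrative assumptions, not the rules published on /sla.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record: minutes since the start of the window."""
    start_minute: int
    end_minute: int


def availability(window_minutes: int, incidents: list[Incident]) -> float:
    """Fraction of the window not covered by any incident.

    Overlapping incidents are merged so the same minute is never counted twice,
    and incidents are clipped to the end of the window.
    """
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    covered = 0
    last_end = 0
    for inc in sorted(incidents, key=lambda i: i.start_minute):
        start = max(inc.start_minute, last_end)
        end = min(inc.end_minute, window_minutes)
        if end > start:
            covered += end - start
            last_end = end
    return 1 - covered / window_minutes


WINDOW = 30 * 24 * 60   # a 30-day window, in minutes
TARGET = 0.999          # assumed contractual target, for illustration only

# The "dark for a day" scenario: one 24-hour outage in the window.
measured = availability(WINDOW, [Incident(0, 24 * 60)])
print(f"{measured:.4%}", measured >= TARGET)  # prints: 96.6667% False
```

A single day of downtime in a 30-day window already puts measured availability far below a three-nines target, which is why the remediation clause matters as much as the target itself.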

The unofficial relay tier does not generally do this. There is no published availability target, no incident page, and no remediation when the upstream account pool gets banned and the platform goes dark for a day. The economics do not support it: a platform whose margin comes from arbitraging shared accounts cannot afford to refund customers when an account gets suspended.

This is the single most expensive surprise in practice. The platform is cheap on paper, until your agent stops responding at 3am and the answer is a message in a group chat saying "The upstream got banned, we will deal with it tomorrow."

Dimension 3: Per-request observability

The reason you put a gateway in front of LLM calls in the first place is to know what happened on each call.

Router One emits a per-request trace covering the model that served the request, the route decision, token counts, latency, and cost. The methodology is documented at /routing-methodology. You can answer "why did this request cost what it cost", "which step of the fallback chain ran", and "is the p99 latency drift coming from one upstream or all of them". You can do cost attribution by project, by API key, by agent.
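As an illustration of what per-request traces make possible, the sketch below aggregates cost by project or API key and computes a nearest-rank p99 latency per upstream. The `Trace` fields are hypothetical names chosen for this example; they are not Router One's actual trace schema.

```python
from __future__ import annotations

import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    """Hypothetical per-request trace record (field names are assumptions)."""
    project: str
    api_key: str
    upstream: str
    served_model: str
    latency_ms: float
    cost_usd: float


def cost_by(traces: list[Trace], field: str) -> dict[str, float]:
    """Sum cost per distinct value of `field`, e.g. "project" or "api_key"."""
    totals: dict[str, float] = defaultdict(float)
    for t in traces:
        totals[getattr(t, field)] += t.cost_usd
    return dict(totals)


def p99_latency_by_upstream(traces: list[Trace]) -> dict[str, float]:
    """Nearest-rank 99th percentile latency for each upstream."""
    by_upstream: dict[str, list[float]] = defaultdict(list)
    for t in traces:
        by_upstream[t.upstream].append(t.latency_ms)
    result = {}
    for upstream, values in by_upstream.items():
        values.sort()
        rank = math.ceil(0.99 * len(values))  # 1-based nearest rank
        result[upstream] = values[rank - 1]
    return result
```

With records like these, "which calls drove the spike" becomes a group-by, and comparing p99 across upstreams shows whether latency drift is local to one provider or shared by all of them.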

An unofficial relay typically gives you a balance and an aggregate usage counter. The response is a black box. If the model output quality drops, you cannot tell whether the platform silently downgraded the route to a cheaper variant. If costs spike, you cannot tell which calls drove them. You are paying for a number on a dashboard, not for an auditable record.

For demos this does not matter. For anything you have to explain to a manager or a customer, it matters a lot.

Dimension 4: Pricing transparency

A legitimate gateway charges a posted per-token rate. The rate is on the model marketplace at /models. FX and channel fees, where they apply, are shown at checkout, not buried. The methodology behind those numbers is on /pricing-methodology. You can compare them, line by line, against the upstream provider's own published rates.

The unofficial relay tier frequently uses a different structure: bundled monthly plans, opaque multipliers on top of upstream prices, FX rates that show up only at the moment of payment, and auto-renewing subscriptions that are harder to cancel than to start. None of this is fraud; it is simply the structure that maximizes the margin a reseller can extract. But it makes cost planning impossible. You cannot model your unit economics on top of a number that changes shape every month.
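One way to compare a bundled plan against a posted per-token rate is to convert what you actually paid into an effective price per million tokens and divide by the upstream's published rate. The sketch below does that with a single blended rate; the plan price, token counts, and upstream rate are made-up numbers, and a real comparison would separate input and output tokens.

```python
def effective_rate_per_million(amount_usd: float, tokens_used: int) -> float:
    """USD actually paid per one million tokens consumed."""
    if tokens_used <= 0:
        raise ValueError("tokens_used must be positive")
    return amount_usd * 1_000_000 / tokens_used


def markup_multiplier(effective_rate: float, upstream_rate: float) -> float:
    """How many times the upstream's posted rate you are effectively paying."""
    if upstream_rate <= 0:
        raise ValueError("upstream_rate must be positive")
    return effective_rate / upstream_rate


# Hypothetical month: a 30 USD plan, 4M tokens consumed,
# upstream posts 5 USD per 1M tokens.
rate = effective_rate_per_million(30.0, 4_000_000)
print(rate, markup_multiplier(rate, 5.0))  # prints: 7.5 1.5
```

The same plan with only 2M tokens consumed works out to 15 USD per million, a 3x multiplier. Under a bundled plan the effective rate depends on utilization, which is what makes it hard to model.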

Dimension 5: Data retention and security boundary

The last dimension is the one teams notice last and regret most.

Router One's position is documented in /security and /data-retention: we do not retain prompt or completion bodies, we retain only the metadata needed to bill and to operate the service, and the retention windows are public. If you need to argue to your security team that your prompts are not sitting in someone else's database, you have a document to point at.
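A metadata-only retention policy can be pictured as an allowlist applied before anything is written to storage. The sketch below is an assumption-laden illustration of that idea, not Router One's implementation: the field names and the shape of the `exchange` dictionary are hypothetical.

```python
# Hypothetical allowlist: only billing and operational metadata is retained.
RETAINED_FIELDS = (
    "request_id",
    "api_key_id",
    "model",
    "prompt_tokens",
    "completion_tokens",
    "latency_ms",
    "cost_usd",
)


def to_retained_record(exchange: dict) -> dict:
    """Keep only allowlisted metadata from a request/response exchange.

    Prompt and completion bodies are dropped because they are not on the
    allowlist, so a newly added field is excluded by default.
    """
    return {name: exchange[name] for name in RETAINED_FIELDS if name in exchange}


exchange = {
    "request_id": "req-1",
    "api_key_id": "key-a",
    "model": "model-x",
    "prompt": "confidential customer text",
    "completion": "model answer",
    "prompt_tokens": 12,
    "completion_tokens": 40,
    "latency_ms": 850.0,
    "cost_usd": 0.002,
}
record = to_retained_record(exchange)
print(sorted(record))
```

An allowlist is chosen over a denylist here so that a body field added later never reaches storage by accident.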

An unofficial relay typically has no retention statement at all. The data flows through the platform's infrastructure on its way upstream, and you have no contract describing what happens to it. For personal experimentation this is fine. For anything touching customer data, code repositories, or internal documents, it is a problem you cannot delegate to "I trust the operator."

Side-by-side

Dimension | Production-grade gateway | Typical unofficial relay
Legal entity | Registered company, public ToS, refund policy | Individual wallet, no public entity
SLA | Published measurement rules, public incident timeline, enterprise credits by contract | None
Per-request observability | Trace with model, route, tokens, latency, cost | Aggregate balance and usage counter
Pricing structure | Posted per-token rates, FX shown at checkout, methodology page | Bundled plans, hidden multipliers, opaque FX
Data retention | Published retention windows, no prompt/completion bodies kept | Undisclosed
Recourse on failure | Status page, support channel, enterprise credits by contract | Group chat, no remediation
Upstream sourcing | Official upstream API channels | Pooled accounts, resold keys

A purchasing checklist

Before you put a relay platform behind a production workload, ask the operator to answer these questions in writing. A legitimate gateway answers all eight in under a minute. If your current provider cannot answer most of them, you have your answer about whether they should be running your production traffic.

  1. What is the legal entity? Where is it registered? Is there a business license number you can share?
  2. Where is the link to your terms of service, refund policy, and acceptable use policy?
  3. What is your published uptime target, and what is the remediation when it is missed?
  4. Are you calling the upstream model providers through their official API channels, or through a pool of consumer or third-party accounts?
  5. Can a single request show me the model that served it, the token counts, the latency, and the cost?
  6. Do you retain prompt or completion bodies? For how long? Where are they stored?
  7. Is your per-token rate published? Are FX and channel fees displayed at checkout?
  8. Do you publish a status page with historical availability?
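If you want to track the answers across several vendors, the checklist above can be kept as a small structure. This is a sketch only: the question keys are shorthand invented for this example, and the "more than half unanswered" threshold is one reading of "cannot answer most of them".

```python
from __future__ import annotations

# Shorthand keys for the eight questions above (names are illustrative).
CHECKLIST = (
    "legal_entity",
    "terms_and_policies",
    "uptime_target_and_remediation",
    "official_upstream_channels",
    "per_request_trace",
    "body_retention",
    "published_rates_and_fees",
    "status_page",
)


def unanswered(answers: dict[str, str]) -> list[str]:
    """Questions with no written answer, in checklist order."""
    return [q for q in CHECKLIST if not answers.get(q, "").strip()]


def fails_checklist(answers: dict[str, str]) -> bool:
    """True when the operator could not answer most of the questions."""
    return len(unanswered(answers)) > len(CHECKLIST) / 2


# Hypothetical vendor that answered only two questions in writing.
vendor = {"published_rates_and_fees": "see price list", "status_page": "none"}
print(len(unanswered(vendor)), fails_checklist(vendor))  # prints: 6 True
```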

When the relay tier is fine, and when it is not

This piece is not arguing that every cheap relay platform is unusable. For weekend projects, demos, learning, and one-off experiments, the unofficial relay tier is often the fastest path to a working integration, and the lack of structure around it is not a real cost.

The argument is narrower: the moment a workload starts mattering to your users, your team, your finance department, or your customers' data, the absence of an entity, an SLA, a trace, a published rate, and a retention statement stops being abstract. Each of those gaps maps to a concrete incident you will eventually have to explain.

If you are ready to put a gateway behind real traffic, see the Router One vs OpenRouter China comparison for the legitimate-gateway tier, the product comparison page for a quick side-by-side, or sign up at router.one to get started.
