Claude Code billing surprises: why your subscription or credits drain unexpectedly

Five common reasons Claude Code consumes your credits or quota in a way you didn't expect — each with how to check it and the fix. All checks are read-only and run on your own machine.

If your Claude Code usage is being billed somewhere you didn't expect — purchased/API credits draining while your subscription sits idle, quota vanishing faster than your work explains, or a sudden change after an update — it is almost always one of the five causes below. Work down the list; the first two cover the large majority of reports.

1. An API key in your environment overrides your subscription

most common

Claude Code's documented credential precedence puts ANTHROPIC_AUTH_TOKEN and ANTHROPIC_API_KEY above your Pro/Max subscription. If either is set anywhere in your environment, Claude Code authenticates as a pay-per-token API request and bills your purchased/Console credits — even when your subscription quota is at 0% used. This is the single most common cause of "subscription available, but it's draining my paid tokens."

How to check

echo $ANTHROPIC_API_KEY
echo $ANTHROPIC_AUTH_TOKEN

If either prints anything, that's the cause. Inside Claude Code, /status shows which auth method is active (subscription vs API key).

Fix

Unset both variables and relaunch. Check the places that re-export them on every new shell: your shell profile (.zshrc / .bashrc / .profile) and any .env or direnv .envrc. In /config, the "Use custom API key" toggle remembers a previous approval — switch it off too. Then confirm with /status.

2. From June 15, 2026: Agent SDK and claude -p draw from a separate credit pool

timely — June 15

Per the official docs: "Starting June 15, 2026, Agent SDK and claude -p usage on subscription plans will draw from a new monthly Agent SDK credit, separate from your interactive usage limits." If you run headless/scripted Claude Code (cron jobs, CI, claude -p pipelines, Agent SDK apps), that usage moves to a different pool on that date. A job that "just worked" against your interactive limit may start hitting a separate credit ceiling — or, if you've wired in an API key for automation, bill API credits instead.

How to check

Inventory where you call claude -p or the Agent SDK (cron, CI, scripts). For each, confirm which credential it uses — subscription OAuth token (CLAUDE_CODE_OAUTH_TOKEN from claude setup-token) vs an API key. Mixed setups are where the surprise lands.

Fix

Decide per workload before June 15 whether it should draw the new SDK credit (subscription auth) or API credits, and set the credential explicitly so the routing isn't accidental.

3. Cache creation inflated on a single large request

common

A single turn that reads a very large file, a huge diff, or a long tool output can balloon cache_creation tokens far beyond what the visible work suggests. Because cache-creation tokens are billed, one oversized request can spike a session's cost with no obvious cause in the transcript.

How to check

Watch /cost across turns and note which turn jumps. The cache health tool and token checkup help localize it.

Fix

Avoid pulling whole large files/outputs into context in one turn; read targeted ranges. cc-safe-setup ships a cache-creation-drift-detector hook that warns when a single request's cache creation jumps above a trailing average.

4. Parallel sub-agents or multi-tool workers re-create the cache per lane

common

When several sub-agents (or several vendors' workers) run in parallel against the same repo, each has its own context window and can re-create the prompt cache independently. The combined token cost is super-linear: a fan-out of N workers does not cost 1× — observed multipliers run 3–5×, and one reported run hit ~887K tokens/min.

How to check

Watch whether cost scales with the number of parallel lanes, not the amount of work. If a 5-way fan-out costs far more than 5× a single run, the cache is being re-created per lane.

Fix

Cap parallel fan-out, give workers disjoint scopes, and prefer one lead lane for edits. If you run multiple AI CLIs at once, see the multi-vendor fleet diagnostic.

5. Trial-and-error loops burn quota without converging

common

Sometimes the model keeps editing-running-reverting on a problem it could have reasoned out first, consuming quota each cycle without resolving the issue. This isn't a billing bug — it's a workflow one — but it shows up as "my quota vanished and the bug is still there."

How to check

If you see repeated edit/run cycles on the same symptom with no stated diagnosis, that's the pattern.

Fix

Force the reasoning before the edit: use plan mode (Shift+Tab) and ask for the root-cause diagnosis and the exact change before any code is written; approve, then let it edit. For open-ended exploration, a cheaper model is fine — switch to the stronger one once the approach is locked.

Going deeper. These are the fast checks. For the full treatment of where Claude Code tokens actually go — cache tiers, the cost math, and the staged optimization playbook — see the Token Savings Guide (¥2,500) (intro + chapter 1 free on Zenn). For the June 15 billing/quota cliff specifically, see the failure-cluster tracker. All the defensive hooks referenced above are free in cc-safe-setup (MIT).