Five common reasons Claude Code consumes your credits or quota in a way you didn't expect — each with how to check it and the fix. All checks are read-only and run on your own machine.
Claude Code's documented credential precedence puts ANTHROPIC_AUTH_TOKEN and ANTHROPIC_API_KEY above your Pro/Max subscription. If either is set anywhere in your environment, Claude Code authenticates as a pay-per-token API request and bills your purchased/Console credits — even when your subscription quota is at 0% used. This is the single most common cause of "subscription available, but it's draining my paid tokens."
echo $ANTHROPIC_API_KEY
echo $ANTHROPIC_AUTH_TOKEN
If either prints anything, that's the cause. Inside Claude Code, /status shows which auth method is active (subscription vs API key).
Unset both variables and relaunch. Check the places that re-export them on every new shell: your shell profile (.zshrc / .bashrc / .profile) and any .env or direnv .envrc. In /config, the "Use custom API key" toggle remembers a previous approval — switch it off too. Then confirm with /status.
claude -p draw from a separate credit poolPer the official docs: "Starting June 15, 2026, Agent SDK and claude -p usage on subscription plans will draw from a new monthly Agent SDK credit, separate from your interactive usage limits." If you run headless/scripted Claude Code (cron jobs, CI, claude -p pipelines, Agent SDK apps), that usage moves to a different pool on that date. A job that "just worked" against your interactive limit may start hitting a separate credit ceiling — or, if you've wired in an API key for automation, bill API credits instead.
Inventory where you call claude -p or the Agent SDK (cron, CI, scripts). For each, confirm which credential it uses — subscription OAuth token (CLAUDE_CODE_OAUTH_TOKEN from claude setup-token) vs an API key. Mixed setups are where the surprise lands.
Decide per workload before June 15 whether it should draw the new SDK credit (subscription auth) or API credits, and set the credential explicitly so the routing isn't accidental.
A single turn that reads a very large file, a huge diff, or a long tool output can balloon cache_creation tokens far beyond what the visible work suggests. Because cache-creation tokens are billed, one oversized request can spike a session's cost with no obvious cause in the transcript.
Watch /cost across turns and note which turn jumps. The cache health tool and token checkup help localize it.
Avoid pulling whole large files/outputs into context in one turn; read targeted ranges. cc-safe-setup ships a cache-creation-drift-detector hook that warns when a single request's cache creation jumps above a trailing average.
When several sub-agents (or several vendors' workers) run in parallel against the same repo, each has its own context window and can re-create the prompt cache independently. The combined token cost is super-linear: a fan-out of N workers does not cost 1× — observed multipliers run 3–5×, and one reported run hit ~887K tokens/min.
Watch whether cost scales with the number of parallel lanes, not the amount of work. If a 5-way fan-out costs far more than 5× a single run, the cache is being re-created per lane.
Cap parallel fan-out, give workers disjoint scopes, and prefer one lead lane for edits. If you run multiple AI CLIs at once, see the multi-vendor fleet diagnostic.
Sometimes the model keeps editing-running-reverting on a problem it could have reasoned out first, consuming quota each cycle without resolving the issue. This isn't a billing bug — it's a workflow one — but it shows up as "my quota vanished and the bug is still there."
If you see repeated edit/run cycles on the same symptom with no stated diagnosis, that's the pattern.
Force the reasoning before the edit: use plan mode (Shift+Tab) and ask for the root-cause diagnosis and the exact change before any code is written; approve, then let it edit. For open-ended exploration, a cheaper model is fine — switch to the stronger one once the approach is locked.