If your 5-hour Max Plan limit is exhausting in 1-2 hours with normal usage, hidden token sinks are the cause. This guide helps you find and fix them.
Quick install — two hooks that diagnose the problem:
npx cc-safe-setup --install-example prompt-usage-logger
npx cc-safe-setup --install-example compact-alert-notification
The prompt-usage-logger hook logs every prompt with timestamps to /tmp/claude-usage-log.txt. After a session, compare the log with your usage dashboard.
# Example log output:
12:34:56 prompt=Read the file src/main.ts and explain the error handling
12:35:23 prompt=Fix the bug in the validateInput function
12:35:45 prompt=(agent spawned — new context window)
The compact-alert-notification hook logs when auto-compaction fires. Each compaction cycle burns tokens to re-summarize the entire context.
# Check after a session:
cat /tmp/claude-compact-log.txt
# If you see 3+ entries, compact-rebuild cycles are a major token sink
| Cause | How to Check | Fix |
|---|---|---|
| Too many MCP servers | claude mcp list | Remove unused servers |
| Large CLAUDE.md / MEMORY.md | wc -c CLAUDE.md | Move reference content to separate files |
| Auto-compact cycles | compact-alert count > 3 | Use manual /compact before threshold |
| Large file reads | Check prompt-usage-log | Use offset and limit parameters |
| Subagent spawning | Count "agent spawned" in log | Each Agent call = new context window |
| Deferred Tool Loading (v2.1.89+) | ToolSearch calls in transcript | Set ENABLE_TOOL_SEARCH=false in settings.json env |
--resume cache breakage | "hi" costs 2-5% quota after resume | Start fresh sessions instead of resuming |
| Session file self-reads | Claude reads its own .jsonl files | Install read-budget-guard hook |
In v2.1.89+, Deferred Tool Loading can break the cache prefix, causing every prompt to rebuild context from scratch. This is one of the most impactful token consumption bugs reported (#41617, #40524).
Fix: Add to .claude/settings.json:
{
"env": {
"ENABLE_TOOL_SEARCH": "false"
}
}
This prevents ToolSearch deferred loading and preserves the cache prefix across turns. Symptoms: "hi" consuming 2% quota, simple questions using 20%+ quota, session hitting 100% in under 70 minutes.
Auto-compact fires at a fixed threshold. Manual /compact before that point is cheaper because you control the timing.
Planning first reduces total tool calls vs. trial-and-error implementation.
Don't load full files when you only need a section. A 10K-line file costs the same tokens whether you need line 1 or all 10K.
Every byte of CLAUDE.md is loaded into every turn's context. Move reference-only content to files that Claude reads on demand.
Each MCP server adds tool definitions to every turn's context. 5+ servers can silently double your token usage.
# List all active MCP servers
claude mcp list
# Check how many tools each server exposes
# More tools = more tokens per turn, even if you never call them
Fix: Remove MCP servers you don't use daily. Keep only what's needed for the current task. You can re-add them later.
This is a recurring problem. If you're experiencing it, you're not alone:
Full safety setup (8 core hooks + project-specific recommendations):
npx cc-safe-setup --shield
Quick diagnostic: Token Checkup — 5 questions to find where your tokens are going
Complete token optimization system:
📊 Token Optimization Book — 10 chapters, templates included (¥2,500)