Why Your Claude Code Max Plan Runs Out So Fast

If your 5-hour Max Plan limit is exhausting in 1-2 hours with normal usage, hidden token sinks are the cause. This guide helps you find and fix them.

Quick install — two hooks that diagnose the problem:

npx cc-safe-setup --install-example prompt-usage-logger
npx cc-safe-setup --install-example compact-alert-notification

Step 1: Track What's Consuming Tokens

The prompt-usage-logger hook logs every prompt with timestamps to /tmp/claude-usage-log.txt. After a session, compare the log with your usage dashboard.

# Example log output:
12:34:56 prompt=Read the file src/main.ts and explain the error handling
12:35:23 prompt=Fix the bug in the validateInput function
12:35:45 prompt=(agent spawned — new context window)

Step 2: Count Auto-Compactions

The compact-alert-notification hook logs when auto-compaction fires. Each compaction cycle burns tokens to re-summarize the entire context.

# Check after a session:
cat /tmp/claude-compact-log.txt
# If you see 3+ entries, compact-rebuild cycles are a major token sink

Step 3: Check Common Token Sinks

CauseHow to CheckFix
Too many MCP serversclaude mcp listRemove unused servers
Large CLAUDE.md / MEMORY.mdwc -c CLAUDE.mdMove reference content to separate files
Auto-compact cyclescompact-alert count > 3Use manual /compact before threshold
Large file readsCheck prompt-usage-logUse offset and limit parameters
Subagent spawningCount "agent spawned" in logEach Agent call = new context window
Deferred Tool Loading (v2.1.89+)ToolSearch calls in transcriptSet ENABLE_TOOL_SEARCH=false in settings.json env
--resume cache breakage"hi" costs 2-5% quota after resumeStart fresh sessions instead of resuming
Session file self-readsClaude reads its own .jsonl filesInstall read-budget-guard hook

Step 3b: Disable Deferred Tool Loading (v2.1.89+)

In v2.1.89+, Deferred Tool Loading can break the cache prefix, causing every prompt to rebuild context from scratch. This is one of the most impactful token consumption bugs reported (#41617, #40524).

Fix: Add to .claude/settings.json:

{
  "env": {
    "ENABLE_TOOL_SEARCH": "false"
  }
}

This prevents ToolSearch deferred loading and preserves the cache prefix across turns. Symptoms: "hi" consuming 2% quota, simple questions using 20%+ quota, session hitting 100% in under 70 minutes.

Step 4: Reduce Consumption

Manual /compact before threshold

Auto-compact fires at a fixed threshold. Manual /compact before that point is cheaper because you control the timing.

Use plan mode for complex tasks

Planning first reduces total tool calls vs. trial-and-error implementation.

Read files with offset/limit

Don't load full files when you only need a section. A 10K-line file costs the same tokens whether you need line 1 or all 10K.

Trim CLAUDE.md

Every byte of CLAUDE.md is loaded into every turn's context. Move reference-only content to files that Claude reads on demand.

Step 5: Audit MCP Servers

Each MCP server adds tool definitions to every turn's context. 5+ servers can silently double your token usage.

# List all active MCP servers
claude mcp list

# Check how many tools each server exposes
# More tools = more tokens per turn, even if you never call them

Fix: Remove MCP servers you don't use daily. Keep only what's needed for the current task. You can re-add them later.

Related Issues

This is a recurring problem. If you're experiencing it, you're not alone:

Full safety setup (8 core hooks + project-specific recommendations):

npx cc-safe-setup --shield

GitHub · Getting Started · All Tools

Quick diagnostic: Token Checkup — 5 questions to find where your tokens are going

Complete token optimization system:

📊 Token Optimization Book — 10 chapters, templates included (¥2,500)

📘 Hook Design Guide — Chapter 3 free (¥800 on Zenn)