Cut your token consumption in half. Based on 800 hours of real operation data.
After 800 hours of autonomous Claude Code operation, here's the measured breakdown of where tokens are consumed:
| Source | % of Total | Optimization Potential |
|---|---|---|
| CLAUDE.md / instructions | 15–30% | 200 lines → 35 lines = up to 50% reduction |
| File reads | 25–30% | read-budget-guard prevents redundant reads |
| Code generation (output) | 20–25% | Model selection + effort level |
| Tool schemas / MCP servers | 12–20% | Disable unused MCP servers |
| Conversation history / compaction | 10–25% | /clear and /compact management |
Your CLAUDE.md is the single biggest lever for token savings. It's included in every API call, so every line costs you tokens on every turn.
# BEFORE: 200+ lines (costs ~5,000 tokens/turn)
## Project Rules
- Do not modify files in /config/
- Do not modify files in /migrations/
- Do not modify files in /.github/
- Do not delete any files without asking
- Do not use rm -rf
- Do not force-push
- Do not commit directly to main
- Always run tests before committing
- Use TypeScript strict mode
... (190 more lines of rules, examples, and explanations)
# AFTER: 35 lines (costs ~800 tokens/turn)
# my-app
## Rules
- Only modify files in /src/ and /tests/ (hook enforced)
- Test before commit (hook enforced)
- TypeScript strict mode
## Architecture
| Layer | Tech | Path |
|-------|------|------|
| API | Express + Zod | /src/api/ |
| DB | Prisma + Postgres | /prisma/ |
| Auth | JWT + bcrypt | /src/auth/ |
## Conventions
- Files: kebab-case
- Functions: camelCase
- One export per file
Result: Same behavior enforcement, 84% fewer tokens per turn. Over a 30-turn session, this saves ~126,000 tokens.
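The figures above follow directly from the two per-turn estimates. A short Python check of the arithmetic, using only the numbers quoted in the before/after example (the variable names are illustrative):

```python
# Per-turn token costs quoted in the before/after example above.
before_tokens = 5_000   # ~200+ line CLAUDE.md
after_tokens = 800      # ~35 line CLAUDE.md
turns = 30              # session length used in the text

saved_per_turn = before_tokens - after_tokens
reduction = saved_per_turn / before_tokens
session_savings = saved_per_turn * turns

print(f"{saved_per_turn} tokens saved per turn ({reduction:.0%})")
print(f"{session_savings:,} tokens saved over {turns} turns")
```

This reproduces the 84% per-turn reduction and the ~126,000 tokens saved over 30 turns.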
Analyze your current CLAUDE.md: CLAUDE.md Analyzer (free tool)
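For a quick local estimate before reaching for a tool, a rough sketch like the following can help. It assumes roughly 4 characters per token, which is only a rule of thumb for English text; real tokenizers differ, so treat the result as an order-of-magnitude figure. The function names are hypothetical and unrelated to the Analyzer.

```python
from pathlib import Path

# Assumption: ~4 characters per token. This is a heuristic, not a tokenizer.
CHARS_PER_TOKEN = 4.0


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Return a rough token estimate for a block of text."""
    return round(len(text) / chars_per_token)


def summarize(path: str) -> dict:
    """Report line count and estimated tokens for an instructions file."""
    text = Path(path).read_text(encoding="utf-8")
    return {
        "lines": len(text.splitlines()),
        "estimated_tokens": estimate_tokens(text),
    }
```

Calling `summarize("CLAUDE.md")` in a project root would give a line count and a ballpark per-turn cost to compare against the before/after example.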
Hooks are shell scripts that run before or after Claude Code's tool calls. They execute locally (zero token cost) and can prevent token waste automatically.
# Install 691+ safety and optimization hooks in 10 seconds
npx cc-safe-setup
| Hook | What It Does | Token Savings |
|---|---|---|
| read-budget-guard | Limits file read count per session. Prevents Claude from re-reading the same file 5 times. | 10–25% reduction |
| token-budget-guard | Sets a session token budget. Warns at 70%, blocks at 90%. | Prevents runaway sessions |
| pre-compact-checkpoint | Auto-creates a git checkpoint before compaction. Prevents hallucination-induced rework. | Saves entire redo sessions |
| context-monitor | Warns at 75% context usage, alerts at 90%. Prompts you to /clear or /compact. | 5–15% by preventing overflow |
| subagent-spawn-limiter | Limits concurrent subagent spawns. Each subagent has its own context window. | 20–40% on agent-heavy workflows |
| large-read-guard | Blocks reads of files over a size threshold. Forces targeted reads with offset/limit. | 10–30% on large codebases |
read-budget-guard alone saved us 18% of tokens per session by catching Claude re-reading the same configuration file on every turn.
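To make the idea concrete, here is a minimal sketch of how a read-limiting hook could work. It is not the cc-safe-setup implementation. It assumes the hook receives the tool call as JSON on stdin with `tool_name`, `tool_input.file_path` and `session_id` fields, and that exiting with code 2 blocks the call; check both against the current Claude Code hooks documentation. The per-file limit and the state-file location are hypothetical choices.

```python
import json
import sys
import tempfile
from pathlib import Path

MAX_READS_PER_FILE = 2  # hypothetical limit, not a cc-safe-setup default


def check_read(event: dict, counts: dict, limit: int = MAX_READS_PER_FILE):
    """Return (allowed, message) for one tool call and update counts in place."""
    if event.get("tool_name") != "Read":
        return True, ""  # only Read calls are budgeted
    path = event.get("tool_input", {}).get("file_path", "")
    counts[path] = counts.get(path, 0) + 1
    if counts[path] > limit:
        return False, f"{path} was already read {limit} times this session; reuse the earlier output."
    return True, ""


def main() -> int:
    """Hook entry point: read the event from stdin, persist counts per session."""
    event = json.load(sys.stdin)
    session = event.get("session_id", "default")
    state_file = Path(tempfile.gettempdir()) / f"read-counts-{session}.json"
    counts = json.loads(state_file.read_text()) if state_file.exists() else {}
    allowed, message = check_read(event, counts)
    state_file.write_text(json.dumps(counts))
    if not allowed:
        print(message, file=sys.stderr)
        return 2  # assumed convention: exit code 2 blocks the tool call
    return 0
```

To run it as a hook script, the file would end with `sys.exit(main())`. The decision logic is kept in `check_read` so it can be tested without stdin or temp files.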
# Option 1: Full safety + token optimization suite
npx cc-safe-setup
# Option 2: Token guards only
npx cc-safe-setup
# Then use the Hook Selector to pick only token-related hooks
Choose exactly which hooks you need: Hook Selector (interactive)
Not every task needs Opus. Using the right model per task can cut costs by 60–80%.
| Task Type | Recommended Model | Why |
|---|---|---|
| Routine coding, bug fixes | Sonnet 4.6 | 1/5 the cost of Opus. Handles 80% of tasks equally well. |
| Complex architecture decisions | Opus 4.7 | Better reasoning, but 5x the cost. Use only when needed. |
| Subagent tasks | Haiku 4.5 | Simple search/read tasks don't need Opus-level reasoning. |
| Code review | Sonnet 4.6 | Pattern matching is Sonnet's strength. |
Switch models mid-session with /model. No need to restart.
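The 60–80% range can be sanity-checked from the table's 1/5 cost ratio. The sketch below assumes every task consumes the same number of tokens and ignores Haiku, so it is a simplification rather than a billing model; `blended_cost` is an illustrative name.

```python
# From the table above: Sonnet costs about 1/5 of Opus for the same work.
SONNET_RELATIVE_COST = 1 / 5


def blended_cost(sonnet_share: float) -> float:
    """Cost relative to running everything on Opus (1.0 means no savings)."""
    return (1 - sonnet_share) + sonnet_share * SONNET_RELATIVE_COST


for share in (0.5, 0.8, 1.0):
    savings = 1 - blended_cost(share)
    print(f"{share:.0%} of work on Sonnet -> {savings:.0%} cheaper than all-Opus")
```

Under these assumptions, moving 80% of the work to Sonnet gives about 64% savings, and moving everything gives 80%, which brackets the range quoted above.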
Context is the most expensive resource in Claude Code. Every message accumulates in the context window, and you pay for all of it on every turn.
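Because the whole history is resent each turn, total input grows roughly with the square of the session length. The sketch below illustrates this with assumed numbers (800 tokens of fixed overhead and 2,000 tokens of new history per turn) and ignores prompt caching and compaction, so the absolute values are illustrative only.

```python
def session_input_tokens(turns: int, base: int, growth: int) -> int:
    """Total input tokens when turn k resends base + growth * k tokens (k from 0)."""
    return sum(base + growth * k for k in range(turns))


BASE = 800      # assumed fixed per-turn overhead (instructions, schemas)
GROWTH = 2_000  # assumed history added by each turn

one_long_session = session_input_tokens(30, BASE, GROWTH)
cleared_halfway = 2 * session_input_tokens(15, BASE, GROWTH)

print(f"30 turns, no reset:     {one_long_session:,} input tokens")
print(f"30 turns, reset at 15:  {cleared_halfway:,} input tokens")
```

With these assumptions, splitting one 30-turn session into two 15-turn sessions roughly halves the total input tokens, which is the effect a context reset between tasks aims for.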
- /clear between tasks: When you switch from one feature to another, clear the context. Old context is dead weight that costs tokens on every subsequent turn.
- /compact for long sessions: Compresses conversation history. You can add custom instructions: /compact keep only code changes and test results
- Run /mcp and disable what you don't need.
- Read file.py lines 50-80 instead of reading entire files. The large-read-guard hook enforces this.

Opus 4.7 (released April 16, 2026) changed the token economics significantly:

- cache_read_input_tokens appearing with no prior cache_creation, inflating costs (#49302)
- /model sonnet for routine tasks: Opus 4.7 is expensive for simple work
- /cost: Check your actual spending per session

Full Opus 4.7 issue tracker: Opus 4.7 Survival Guide (17 sections, 28 tracked issues)
This page covers the basics. The full Token Book includes copy-paste templates, hook configurations, before/after data for every technique, and the complete Opus 4.7 chapter.
Token Book: 10 chapters, 44K words. Introduction + Chapter 1 available as free preview.
Usually caused by a bloated CLAUDE.md (200+ lines), multiple MCP servers running simultaneously, or subagents spawning in loops. Start with Step 2 (CLAUDE.md optimization) — it's the single highest-impact fix. Issue #42796 (1,700+ reactions) tracks this problem.
That matches community reports. With optimization, you can extend usable days to 20–25 by reducing per-session token consumption. The techniques in Steps 2–5 above apply to Pro plan users as well.
No. Hooks execute locally in 10–50ms. The token savings (thousands to tens of thousands per session) far outweigh the negligible execution time.
Yes. Opus 4.7's new tokenizer uses more tokens for the same input. Combined with increased thinking tokens and output length, costs can increase 2–4x. See Section 6 for mitigation steps.
The CLAUDE.md optimization and context management principles apply to any Claude usage. The hooks are specific to Claude Code CLI/desktop.