Claude Code Token Optimization Guide

Cut your token consumption in half. Based on 800 hours of real operation data.

$13 avg. daily cost per developer (official data)
$150–250 monthly cost per user
Faster quota burn on Opus 4.7

1. Where your tokens actually go
2. CLAUDE.md optimization (highest impact)
3. Hook-based token guards
4. Model selection strategy
5. Context management
6. Opus 4.7 survival tips
7. FAQ

1. Where Your Tokens Actually Go

After 800 hours of autonomous Claude Code operation, here's the measured breakdown of where tokens are consumed:

Source	% of Total	Optimization Potential
CLAUDE.md / instructions	15–30%	200 lines → 35 lines = up to 50% reduction
File reads	25–30%	read-budget-guard prevents redundant reads
Code generation (output)	20–25%	Model selection + effort level
Tool schemas / MCP servers	12–20%	Disable unused MCP servers
Conversation history / compaction	10–25%	/clear and /compact management

Key finding: CLAUDE.md is loaded into context on every single turn. A 100-line CLAUDE.md in a 30-turn session consumes ~75,000 tokens just from instructions. A 35-line version achieves the same results.
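
The arithmetic behind that figure can be checked with a short sketch. The rate of 25 tokens per line is not a measured constant; it is simply what the numbers above imply (75,000 tokens / 30 turns / 100 lines), and the function name is illustrative.

```python
def instruction_tokens(lines: int, turns: int, tokens_per_line: int = 25) -> int:
    """Estimate tokens spent re-sending CLAUDE.md over one session.

    tokens_per_line is an assumption back-derived from this guide's figures,
    not a property of any tokenizer.
    """
    return lines * tokens_per_line * turns


print(instruction_tokens(lines=100, turns=30))  # 75000
```

Plug in your own line count and typical session length to get a rough sense of what your instructions cost per session.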

2. CLAUDE.md Optimization (Highest Impact)

Your CLAUDE.md is the single biggest lever for token savings. It's included in every API call, so every line costs you tokens on every turn.

5 Optimization Patterns

Allowlist pattern: Instead of listing 10 things to avoid, state what is allowed. "Only modify files in /src/" is cheaper than listing 10 forbidden directories.
One example per rule: One precise example does the work of three. Cut the extras.
One-line reasons: State the "why" in one line. Move detailed explanations to Skills files.
Table format: Tables are more token-efficient than bullet lists for structured data such as architecture or conventions.
Delegate to hooks: Rules you want strictly enforced belong in hooks, not CLAUDE.md. Hooks run locally and cost no tokens; CLAUDE.md costs tokens on every turn.

Before & After

# BEFORE: 200+ lines (costs ~5,000 tokens/turn)
## Project Rules
- Do not modify files in /config/
- Do not modify files in /migrations/
- Do not modify files in /.github/
- Do not delete any files without asking
- Do not use rm -rf
- Do not force-push
- Do not commit directly to main
- Always run tests before committing
- Use TypeScript strict mode
... (190 more lines of rules, examples, and explanations)

# AFTER: 35 lines (costs ~800 tokens/turn)
# my-app

## Rules
- Only modify files in /src/ and /tests/ (hook enforced)
- Test before commit (hook enforced)
- TypeScript strict mode

## Architecture
| Layer | Tech | Path |
|-------|------|------|
| API | Express + Zod | /src/api/ |
| DB | Prisma + Postgres | /prisma/ |
| Auth | JWT + bcrypt | /src/auth/ |

## Conventions
- Files: kebab-case
- Functions: camelCase
- One export per file

Result: Same behavior enforcement, 84% fewer tokens per turn. Over a 30-turn session, this saves ~126,000 tokens.
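
To rerun this comparison with your own numbers, a minimal sketch follows. The per-turn costs are the approximate figures from the example above, and the function name is hypothetical.

```python
def session_savings(before_per_turn: int, after_per_turn: int, turns: int) -> tuple[float, int]:
    """Return (percent saved per turn, total tokens saved over the session)."""
    saved_per_turn = before_per_turn - after_per_turn
    percent = 100 * saved_per_turn / before_per_turn
    return percent, saved_per_turn * turns


percent, total = session_savings(before_per_turn=5000, after_per_turn=800, turns=30)
print(f"{percent:.0f}% fewer tokens per turn, {total:,} tokens saved")
# 84% fewer tokens per turn, 126,000 tokens saved
```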

Analyze your current CLAUDE.md: CLAUDE.md Analyzer (free tool)

3. Hook-Based Token Guards

Hooks are shell scripts that run before or after Claude Code's tool calls. They execute locally (zero token cost) and can prevent token waste automatically.

# Install 700+ safety and optimization hooks in 10 seconds
npx cc-safe-setup

Token-Saving Hooks

Hook	What It Does	Token Savings
`read-budget-guard`	Limits file read count per session. Prevents Claude from re-reading the same file 5 times.	10–25% reduction
`token-budget-guard`	Sets a session token budget. Warns at 70%, blocks at 90%.	Prevents runaway sessions
`pre-compact-checkpoint`	Auto-creates a git checkpoint before compaction. Prevents hallucination-induced rework.	Saves entire redo sessions
`context-monitor`	Warns at 75% context usage, alerts at 90%. Prompts you to /clear or /compact.	5–15% by preventing overflow
`subagent-spawn-limiter`	Limits concurrent subagent spawns. Each subagent has its own context window.	20–40% on agent-heavy workflows
`large-read-guard`	Blocks reads of files over a size threshold. Forces targeted reads with offset/limit.	10–30% on large codebases

Example: read-budget-guard alone saved us 18% of tokens per session by catching Claude re-reading the same configuration file on every turn.
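
To make the idea concrete, here is a rough sketch of what a read-budget hook could look like. It is not the cc-safe-setup implementation. It assumes the hook receives a JSON event on stdin with `session_id`, `tool_name`, and `tool_input.file_path` fields, and that exit code 2 blocks the tool call; check both against the official hooks documentation. The limit of 3 reads and the state-file naming are hypothetical.

```python
import json
import sys
import tempfile
from pathlib import Path

MAX_READS_PER_FILE = 3  # hypothetical budget, tune to taste


def check_read(event: dict, state_dir: Path, max_reads: int = MAX_READS_PER_FILE) -> tuple[int, str]:
    """Decide whether a Read call fits the budget.

    Returns (exit_code, message): 0 allows the call, 2 asks for it to be blocked.
    """
    if event.get("tool_name") != "Read":
        return 0, ""
    path = event.get("tool_input", {}).get("file_path", "")
    # Each hook call is a fresh process, so counts live in a small
    # per-session JSON file between invocations.
    state_file = state_dir / f"read-budget-{event.get('session_id', 'unknown')}.json"
    counts = json.loads(state_file.read_text()) if state_file.exists() else {}
    counts[path] = counts.get(path, 0) + 1
    state_file.write_text(json.dumps(counts))
    if counts[path] > max_reads:
        return 2, f"{path} already read {max_reads} times this session; reuse the earlier result."
    return 0, ""


def main() -> int:
    """Hook entry point: read the event from stdin, report via stderr and exit code."""
    code, message = check_read(json.load(sys.stdin), Path(tempfile.gettempdir()))
    if message:
        print(message, file=sys.stderr)
    return code
```

To try it, save the script with a final `sys.exit(main())` line and register it as a pre-tool-use hook for the Read tool in your Claude Code settings. Keeping the decision in `check_read` makes the budget logic easy to test without running Claude Code.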

Quick Setup

# Option 1: Full safety + token optimization suite
npx cc-safe-setup

# Option 2: Token guards only (same installer)
npx cc-safe-setup
# Then use the Hook Selector to pick only token-related hooks

Choose exactly which hooks you need: Hook Selector (interactive)

4. Model Selection Strategy

Not every task needs Opus. Using the right model per task can cut costs by 60–80%.

Task Type	Recommended Model	Why
Routine coding, bug fixes	Sonnet 4.6	1/5 the cost of Opus. Handles 80% of tasks equally well.
Complex architecture decisions	Opus 4.7	Better reasoning, but 5x the cost. Use only when needed.
Subagent tasks	Haiku 4.5	Simple search/read tasks don't need Opus-level reasoning.
Code review	Sonnet 4.6	Pattern matching is Sonnet's strength.
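
To see how the 5x price gap in the table translates into a 60–80% saving, here is a back-of-the-envelope sketch. It assumes every task uses the same number of tokens and takes the 80% share and the 1/5 price ratio from the table; real workloads will differ.

```python
def blended_cost(share_on_cheaper: float, cheaper_price_ratio: float) -> float:
    """Cost relative to running everything on the expensive model (1.0 = no savings)."""
    return (1 - share_on_cheaper) + share_on_cheaper * cheaper_price_ratio


relative = blended_cost(share_on_cheaper=0.8, cheaper_price_ratio=0.2)
print(f"relative cost {relative:.2f}, savings {1 - relative:.0%}")
# relative cost 0.36, savings 64%
```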

Switch models mid-session with /model. No need to restart.

Pro tip: On the Max plan, switching to Sonnet for simple tasks doesn't lower your bill (it's a flat fee), but it consumes less of your usage allowance, so your daily allowance lasts longer.

5. Context Management

Context is the most expensive resource in Claude Code. Every message accumulates in the context window, and you pay for all of it on every turn.

Key Practices

/clear between tasks — When you switch from one feature to another, clear the context. Old context is dead weight that costs tokens on every subsequent turn.
/compact for long sessions — Compresses conversation history. You can add custom instructions: /compact keep only code changes and test results
Split sessions every 2–3 hours — After 2–3 hours, compaction quality degrades. Start a fresh session.
Disable unused MCP servers — Each MCP server adds its tool schemas to every turn. Check with /mcp and disable what you don't need.
Use targeted file reads — Read file.py lines 50-80 instead of reading entire files. The large-read-guard hook enforces this.
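
The case for /clear and shorter sessions follows from how history accumulates: if every turn resends everything before it, total input grows roughly with the square of the turn count. The sketch below is a simplified model that ignores prompt caching and compaction, and the 2,000 tokens added per turn is a hypothetical figure.

```python
def cumulative_input_tokens(turns: int, added_per_turn: int, fixed_overhead: int = 0) -> int:
    """Total input tokens when each turn resends all earlier history.

    Turn k carries the fixed overhead plus k * added_per_turn tokens of history.
    """
    return sum(fixed_overhead + k * added_per_turn for k in range(1, turns + 1))


one_long = cumulative_input_tokens(turns=30, added_per_turn=2000)
two_short = 2 * cumulative_input_tokens(turns=15, added_per_turn=2000)
print(one_long, two_short)  # 930000 480000
```

Under these assumptions, splitting one 30-turn session into two 15-turn sessions roughly halves the input tokens for the same amount of work.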

6. Opus 4.7 Survival Tips

Opus 4.7 (released April 16, 2026) changed the token economics significantly:

Up to 4x faster quota burn: The new tokenizer uses more tokens for the same text, and thinking-token usage also increased.
Safety classifier issues: The April 16 launch introduced auto-mode safety classifier bugs, leading to 20+ data loss incidents in 3 days (#49302, #50027).
cache_read billing anomaly: cache_read_input_tokens can appear with no prior cache creation, inflating costs (#49302).

Immediate Actions for Opus 4.7

Lower your token-budget-guard threshold to 70% of your previous setting
Use /model sonnet for routine tasks — Opus 4.7 is expensive for simple work
Enable pre-compact-checkpoint — Critical with Opus 4.7's increased hallucination rate under pressure
Monitor with /cost — Check your actual spending per session
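
As a worked example of the first action, the sketch below scales a session budget to 70% of its previous value and derives the warn and block points from the 70% and 90% levels listed for token-budget-guard in Section 3. It assumes the "setting" is a per-session token budget; the 500,000-token starting value and the function name are hypothetical.

```python
def adjusted_budget(previous_budget: int, scale: float = 0.70,
                    warn_at: float = 0.70, block_at: float = 0.90) -> dict[str, int]:
    """Scale a session budget down and derive the warn/block points from it."""
    budget = round(previous_budget * scale)
    return {"budget": budget, "warn": round(budget * warn_at), "block": round(budget * block_at)}


print(adjusted_budget(500_000))
# {'budget': 350000, 'warn': 245000, 'block': 315000}
```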

Full Opus 4.7 issue tracker: Opus 4.7 Survival Guide (66 sections, 80+ tracked issues)

Get the Complete Guide

This page covers the basics. The full Token Book includes copy-paste templates, hook configurations, before/after data for every technique, and the complete Opus 4.7 chapter.

Token Book — 10 chapters, 44K words

Introduction + Chapter 1 available as free preview

Free: Token Checkup (5 questions)
Free: Cost Calculator
Free: CLAUDE.md Analyzer

7. FAQ

Q: My Max Plan ($200/month) runs out in 15 minutes. What's wrong?

This is usually caused by a bloated CLAUDE.md (200+ lines), multiple MCP servers running simultaneously, or subagents spawning in loops. Start with Section 2 (CLAUDE.md optimization), the single highest-impact fix. Issue #42796 (1,700+ reactions) tracks this problem.

Q: Pro Plan ($20/month) — can I only use Claude Code 12 out of 30 days?

That matches community reports. With optimization, you can extend usable days to 20–25 by reducing per-session token consumption. The techniques in Sections 2–5 above apply to Pro plan users as well.
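
For a sense of scale, the sketch below estimates how much daily token use has to drop to stretch 12 usable days to 20–25. It assumes a fixed monthly quota and that usable days scale inversely with daily consumption, which is a simplification; the function name is illustrative.

```python
def required_reduction(current_days: int, target_days: int) -> float:
    """Fraction by which daily token use must drop to stretch the same quota."""
    return 1 - current_days / target_days


for target in (20, 25):
    print(target, f"{required_reduction(12, target):.0%}")
# 20 40%
# 25 52%
```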

Q: Do hooks slow down my sessions?

Not noticeably. Hooks execute locally in 10–50 ms. The token savings (thousands to tens of thousands per session) far outweigh the negligible execution time.

Q: I upgraded to Opus 4.7 and my costs doubled. Is this expected?

Yes. Opus 4.7's new tokenizer uses more tokens for the same input. Combined with increased thinking tokens and output length, costs can increase 2–4x. See Section 6 for mitigation steps.

Q: Can I use these techniques with the API (not Claude Code)?

The CLAUDE.md optimization and context management principles apply to any Claude usage. The hooks are specific to Claude Code CLI/desktop.

Related Resources

📘 Token Book (¥2,500) — Full 10-chapter guide with copy-paste templates
🔍 Token Checkup — Free 30-second diagnosis
📊 Cache Health Checker — Paste /cost output for instant analysis
🇯🇵 Japanese version: 5 Ways to Reduce Token Consumption (Zenn article)
📝 Qiita: 5 Ways to Reduce Token Consumption (covers Opus 4.7)