Claude Code Token Optimization Guide

Cut your token consumption in half. Based on 800 hours of real operation data.

$13 avg. daily cost per developer (official data)
$150–250 monthly cost per user
4x faster quota burn on Opus 4.7

Table of Contents

  1. Where your tokens actually go
  2. CLAUDE.md optimization (highest impact)
  3. Hook-based token guards
  4. Model selection strategy
  5. Context management
  6. Opus 4.7 survival tips
  7. FAQ

1. Where Your Tokens Actually Go

After 800 hours of autonomous Claude Code operation, here's the measured breakdown of where tokens are consumed:

| Source | % of Total | Optimization Potential |
|--------|------------|------------------------|
| CLAUDE.md / instructions | 15–30% | 200 lines → 35 lines = up to 50% reduction |
| File reads | 25–30% | read-budget-guard prevents redundant reads |
| Code generation (output) | 20–25% | Model selection + effort level |
| Tool schemas / MCP servers | 12–20% | Disable unused MCP servers |
| Conversation history / compaction | 10–25% | /clear and /compact management |

Key finding: CLAUDE.md is loaded into context on every single turn. A 100-line CLAUDE.md in a 30-turn session consumes ~75,000 tokens just from instructions. A 35-line version achieves the same results.
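
The key finding's figure can be reproduced with simple arithmetic. The sketch below assumes roughly 25 tokens per line, which is the rate implied by the numbers above (75,000 tokens ÷ 30 turns ÷ 100 lines). Real files vary, and the function name is illustrative.

```python
def instruction_tokens(lines: int, turns: int, tokens_per_line: float = 25.0) -> int:
    """Estimate tokens spent on CLAUDE.md over one session.

    The file is re-sent on every turn, so the cost scales with lines * turns.
    tokens_per_line is an assumed average, not a measured constant.
    """
    return round(lines * tokens_per_line * turns)


print(instruction_tokens(100, 30))  # 75000
print(instruction_tokens(35, 30))   # 26250
```

Measure your own file's tokens-per-line rate before relying on the estimate; dense tables and long rules will push it higher.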

2. CLAUDE.md Optimization (Highest Impact)

Your CLAUDE.md is the single biggest lever for token savings. It's included in every API call, so every line costs you tokens on every turn.

5 Optimization Patterns

  1. Allowlist pattern: Instead of listing 10 things to avoid, state what is allowed. "Only modify files in /src/" is cheaper than listing 10 forbidden directories.
  2. One example per rule: Three examples don't help more than one precise example. Cut the extras.
  3. One-line reasons: Write the "why" in one line. Move detailed explanations to Skills files.
  4. Table format: Tables are more token-efficient than bullet lists for structured data like architecture or conventions.
  5. Delegate to hooks: Rules you want strictly enforced belong in hooks, not CLAUDE.md. Hooks run locally and cost no tokens, while CLAUDE.md costs tokens on every turn.

Before & After

# BEFORE: 200+ lines (costs ~5,000 tokens/turn)
## Project Rules
- Do not modify files in /config/
- Do not modify files in /migrations/
- Do not modify files in /.github/
- Do not delete any files without asking
- Do not use rm -rf
- Do not force-push
- Do not commit directly to main
- Always run tests before committing
- Use TypeScript strict mode
... (190 more lines of rules, examples, and explanations)

# AFTER: 35 lines (costs ~800 tokens/turn)
# my-app

## Rules
- Only modify files in /src/ and /tests/ (hook enforced)
- Test before commit (hook enforced)
- TypeScript strict mode

## Architecture
| Layer | Tech | Path |
|-------|------|------|
| API | Express + Zod | /src/api/ |
| DB | Prisma + Postgres | /prisma/ |
| Auth | JWT + bcrypt | /src/auth/ |

## Conventions
- Files: kebab-case
- Functions: camelCase
- One export per file

Result: Same behavior enforcement, 84% fewer tokens per turn. Over a 30-turn session, this saves ~126,000 tokens.
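
The result above follows directly from the two per-turn figures in the example. A small sketch of the arithmetic, with an illustrative function name:

```python
def claude_md_savings(before_per_turn: int, after_per_turn: int, turns: int) -> tuple[float, int]:
    """Return (percent reduction per turn, tokens saved over the session)."""
    saved_per_turn = before_per_turn - after_per_turn
    percent = 100 * saved_per_turn / before_per_turn
    return percent, saved_per_turn * turns


percent, saved = claude_md_savings(5_000, 800, turns=30)
print(f"{percent:.0f}% fewer tokens per turn, {saved:,} tokens saved")
# prints: 84% fewer tokens per turn, 126,000 tokens saved
```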

Analyze your current CLAUDE.md: CLAUDE.md Analyzer (free tool)

3. Hook-Based Token Guards

Hooks are shell scripts that run before or after Claude Code's tool calls. They execute locally (zero token cost) and can prevent token waste automatically.
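
As an illustration of moving a rule out of CLAUDE.md and into a hook, here is a minimal Python sketch of an allowlist guard for the "only modify files in /src/ and /tests/" rule from Section 2. It is not one of the cc-safe-setup hooks. It assumes the hook receives the pending tool call as JSON on stdin with `tool_name`, `tool_input.file_path`, and `cwd` fields, and that exit status 2 blocks the call; check both assumptions against the hooks documentation for your Claude Code version. The names `ALLOWED_DIRS`, `WRITE_TOOLS`, and `decide` are hypothetical, and only POSIX-style paths are handled.

```python
import json
import sys
from pathlib import PurePosixPath

# Hypothetical allowlist mirroring "Only modify files in /src/ and /tests/".
ALLOWED_DIRS = ("src", "tests")
# Tool names assumed to modify files; adjust to the tools your version exposes.
WRITE_TOOLS = {"Edit", "Write", "MultiEdit"}


def decide(event: dict, project_root: str) -> tuple[int, str]:
    """Return (exit_code, message). 0 lets the tool call proceed, 2 blocks it."""
    if event.get("tool_name") not in WRITE_TOOLS:
        return 0, ""
    file_path = event.get("tool_input", {}).get("file_path", "")
    path = PurePosixPath(file_path)
    try:
        relative = path.relative_to(project_root) if path.is_absolute() else path
    except ValueError:
        return 2, f"Blocked: {file_path} is outside the project"
    # Reject empty paths, parent-directory escapes, and non-allowlisted roots.
    if not relative.parts or ".." in relative.parts or relative.parts[0] not in ALLOWED_DIRS:
        return 2, f"Blocked: {file_path} is outside {ALLOWED_DIRS}"
    return 0, ""


def main() -> None:
    """Entry point when the file is registered as a hook command."""
    event = json.load(sys.stdin)
    code, message = decide(event, event.get("cwd", "."))
    if message:
        print(message, file=sys.stderr)  # explanation for the blocked call
    sys.exit(code)
```

To try it as a hook, call `main()` at the end of the file and register the script as a pre-tool-call hook in your settings. Because the check runs locally, the rule itself can shrink to a single "(hook enforced)" line in CLAUDE.md.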

# Install 691+ safety and optimization hooks in 10 seconds
npx cc-safe-setup

Token-Saving Hooks

| Hook | What It Does | Token Savings |
|------|--------------|---------------|
| read-budget-guard | Limits file read count per session. Prevents Claude from re-reading the same file 5 times. | 10–25% reduction |
| token-budget-guard | Sets a session token budget. Warns at 70%, blocks at 90%. | Prevents runaway sessions |
| pre-compact-checkpoint | Auto-creates a git checkpoint before compaction. Prevents hallucination-induced rework. | Saves entire redo sessions |
| context-monitor | Warns at 75% context usage, alerts at 90%. Prompts you to /clear or /compact. | 5–15% by preventing overflow |
| subagent-spawn-limiter | Limits concurrent subagent spawns. Each subagent has its own context window. | 20–40% on agent-heavy workflows |
| large-read-guard | Blocks reads of files over a size threshold. Forces targeted reads with offset/limit. | 10–30% on large codebases |

Example: read-budget-guard alone saved us 18% of tokens per session by catching Claude re-reading the same configuration file on every turn.
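
To make the idea behind a read budget concrete, here is a minimal sketch in Python. It is not the actual read-budget-guard from cc-safe-setup, whose implementation is not shown here; the per-file limit of 3, the state-file layout, and the function name `allow_read` are all assumptions. Because each hook invocation runs as a separate process, the counts are persisted to a small JSON file between calls.

```python
import json
import tempfile
from pathlib import Path

MAX_READS_PER_FILE = 3  # assumed limit; pick a value that fits your workflow


def allow_read(state_file: Path, file_path: str, limit: int = MAX_READS_PER_FILE) -> bool:
    """Record one read of file_path and report whether it is within budget.

    state_file holds a JSON object mapping file paths to read counts for the
    current session; it is created on first use.
    """
    counts = json.loads(state_file.read_text()) if state_file.exists() else {}
    counts[file_path] = counts.get(file_path, 0) + 1
    state_file.write_text(json.dumps(counts))
    return counts[file_path] <= limit


# Simulate four reads of the same configuration file in one session.
state = Path(tempfile.mkdtemp()) / "reads.json"
print([allow_read(state, "config/app.yml") for _ in range(4)])
# prints: [True, True, True, False]
```

In a real hook, a `False` result would translate into a blocking exit status, and the state file would be keyed by session so the counts reset between sessions.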

Quick Setup

# Option 1: Full safety + token optimization suite
npx cc-safe-setup

# Option 2: Token guards only
npx cc-safe-setup
# Then use the Hook Selector to pick only token-related hooks

Choose exactly which hooks you need: Hook Selector (interactive)

4. Model Selection Strategy

Not every task needs Opus. Using the right model per task can cut costs by 60–80%.

| Task Type | Recommended Model | Why |
|-----------|-------------------|-----|
| Routine coding, bug fixes | Sonnet 4.6 | 1/5 the cost of Opus. Handles 80% of tasks equally well. |
| Complex architecture decisions | Opus 4.7 | Better reasoning, but 5x the cost. Use only when needed. |
| Subagent tasks | Haiku 4.5 | Simple search/read tasks don't need Opus-level reasoning. |
| Code review | Sonnet 4.6 | Pattern matching is Sonnet's strength. |

Switch models mid-session with /model. No need to restart.

Pro tip: On the Max plan, switching to Sonnet for simple tasks doesn't save money (it's a flat fee), but it does reduce token consumption, which means your daily allowance lasts longer.
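
The 60–80% range quoted at the start of this section is consistent with the table's own numbers. As a rough check, the sketch below assumes every task uses the same number of tokens and that Sonnet costs one fifth of Opus, as stated in the table; the function name and the workload split are illustrative.

```python
def blended_cost(share_on_cheaper: float, cheaper_cost_ratio: float) -> float:
    """Cost relative to running everything on the expensive model (1.0 = no savings).

    share_on_cheaper: fraction of work routed to the cheaper model (0..1).
    cheaper_cost_ratio: cheaper model's cost as a fraction of the expensive one.
    """
    return (1 - share_on_cheaper) + share_on_cheaper * cheaper_cost_ratio


# 80% of tasks on Sonnet at 1/5 the cost, the remaining 20% on Opus.
relative = blended_cost(0.80, 1 / 5)
print(f"{(1 - relative) * 100:.0f}% cheaper than all-Opus")
# prints: 64% cheaper than all-Opus
```

Under the same assumptions, routing everything to Sonnet caps the saving at 80%, which matches the upper end of the range.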

5. Context Management

Context is the most expensive resource in Claude Code. Every message accumulates in the context window, and you pay for all of it on every turn.
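
To see why accumulated history gets expensive, consider a simplified model in which every turn adds the same number of tokens and the whole history is re-sent on each turn. The sketch below counts input tokens only and ignores prompt caching and compaction, so treat the numbers as an upper-bound illustration; the 2,000 tokens per turn and the function names are assumptions.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens sent over a session where history is never cleared.

    Turn k re-sends k * tokens_per_turn tokens, so the total is
    tokens_per_turn * (1 + 2 + ... + turns).
    """
    return tokens_per_turn * turns * (turns + 1) // 2


def with_clear(turns: int, tokens_per_turn: int, clear_every: int) -> int:
    """Same session, but the history is cleared every clear_every turns."""
    full_chunks, remainder = divmod(turns, clear_every)
    return (full_chunks * cumulative_input_tokens(clear_every, tokens_per_turn)
            + cumulative_input_tokens(remainder, tokens_per_turn))


print(cumulative_input_tokens(30, 2_000))     # 930000
print(with_clear(30, 2_000, clear_every=10))  # 330000
```

In this toy model, clearing every 10 turns cuts input tokens by about 65%, at the price of losing the earlier conversation.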

Key Practices

6. Opus 4.7 Survival Tips

Opus 4.7 (released April 16, 2026) changed the token economics significantly:

Immediate Actions for Opus 4.7

  1. Lower your token-budget-guard threshold to 70% of your previous setting
  2. Use /model sonnet for routine tasks — Opus 4.7 is expensive for simple work
  3. Enable pre-compact-checkpoint — Critical with Opus 4.7's increased hallucination rate under pressure
  4. Monitor with /cost — Check your actual spending per session
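
For the first action, the effect of scaling a budget can be sketched as follows. This illustrates the warn-at-70%, block-at-90% behavior described in Section 3 and is not the real token-budget-guard; the 500,000-token starting budget and all names are hypothetical.

```python
def budget_status(used: int, budget: int, warn_at: float = 0.70, block_at: float = 0.90) -> str:
    """Classify session usage against a token budget."""
    if used >= budget * block_at:
        return "block"
    if used >= budget * warn_at:
        return "warn"
    return "ok"


previous_budget = 500_000                     # hypothetical earlier setting
opus_47_budget = previous_budget * 70 // 100  # 70% of the previous setting

for used in (200_000, 260_000, 320_000):
    print(used, budget_status(used, previous_budget), budget_status(used, opus_47_budget))
# prints:
# 200000 ok ok
# 260000 ok warn
# 320000 ok block
```

With the lower budget, the same session is warned and then stopped well before it would have tripped the old limits.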

Full Opus 4.7 issue tracker: Opus 4.7 Survival Guide (17 sections, 28 tracked issues)

7. FAQ

Q: My Max Plan ($200/month) runs out in 15 minutes. What's wrong?

This is usually caused by a bloated CLAUDE.md (200+ lines), multiple MCP servers running simultaneously, or subagents spawning in loops. Start with Section 2 (CLAUDE.md optimization), the single highest-impact fix. Issue #42796 (1,700+ reactions) tracks this problem.

Q: On the Pro Plan ($20/month), can I only use Claude Code 12 out of 30 days?

That matches community reports. With optimization, you can extend usable days to 20–25 by reducing per-session token consumption. The techniques in Sections 2–5 above apply to Pro plan users as well.

Q: Do hooks slow down my sessions?

No. Hooks execute locally in 10–50ms. The token savings (thousands to tens of thousands per session) far outweigh the negligible execution time.

Q: I upgraded to Opus 4.7 and my costs doubled. Is this expected?

Yes. Opus 4.7's new tokenizer uses more tokens for the same input. Combined with increased thinking tokens and output length, costs can increase 2–4x. See Section 6 for mitigation steps.

Q: Can I use these techniques with the API (not Claude Code)?

The CLAUDE.md optimization and context management principles apply to any Claude usage. The hooks are specific to Claude Code CLI/desktop.