Claude Code Extended Thinking Consumed 16 Million Tokens in 25 Minutes

April 20, 2026 · 3 min read · by yurukusa

TL;DR: Extended thinking can silently consume millions of tokens with no output. Install thinking-stall-detector.sh to get warned when thinking exceeds 5 minutes. Takes 30 seconds.

What Happened

A user on Sonnet 4.6 asked Claude Code to work on a task. The model entered its extended thinking phase — and never came out.

25 minutes later, 16 million tokens were consumed. No useful output was produced. The user's entire quota was gone. They requested a refund. (#51092)

This is not an isolated incident. A related pattern (#49884) shows Opus 4.7 reading a single file, then freezing for 30+ minutes before the API terminates the request. No output, but tokens consumed.

Why This Happens

Extended thinking has no built-in token ceiling. When the model enters a reasoning loop without finding a resolution:

The model generates thinking tokens internally
No tools are called, so no hooks fire
From the outside, the session looks like it's "working"
Tokens are consumed at full rate
This continues until the context window limit is hit or the API times out

The critical gap: during thinking, there's no way for hooks to intervene. The thinking happens between tool calls, not during them.

The Fix: Thinking Stall Detector

30-second install:

npx cc-safe-setup --install-example thinking-stall-detector

This hook can't stop thinking mid-stream (nothing can). But it detects the aftermath: if more than 5 minutes pass between consecutive tool calls, it warns you that a thinking stall occurred. This gives you a signal to:

Check /cost immediately
Restart the session if tokens spiked
Avoid the same pattern that triggered the stall

Manual Installation

Save as ~/.claude/hooks/thinking-stall-detector.sh:

#!/bin/bash
INPUT=$(cat)
TOOL=$(echo "$INPUT" | jq -r '.tool_name // empty' 2>/dev/null)
STATE_FILE="/tmp/cc-thinking-stall-last-call"
WARN_SECS="${CC_STALL_WARN_SECS:-300}"
NOW=$(date +%s)
LAST=$(cat "$STATE_FILE" 2>/dev/null || echo "$NOW")
echo "$NOW" > "$STATE_FILE"
GAP=$((NOW - LAST))
if [ "$GAP" -ge "$WARN_SECS" ]; then
    MINUTES=$((GAP / 60))
    echo "⚠️ Thinking stall: ${MINUTES}m with no tool activity." >&2
    echo "Check /cost. If tokens spiked, restart the session." >&2
fi
exit 0

Add to ~/.claude/settings.json:

{
  "hooks": {
    "PreToolUse": [{
      "matcher": "",
      "hooks": [{"type": "command",
        "command": "bash ~/.claude/hooks/thinking-stall-detector.sh"}]
    }]
  }
}

For API Users

If you're using the Claude API directly, you have more control:

Set max_tokens to cap total output (including thinking)
Set thinking budget limits via the API's thinking parameter
Implement client-side timeouts that abort requests after N minutes

For Max/Pro plan users in Claude Code, these API-level controls aren't available — the hook-based detector is your best defense.

Prevention Checklist

☐ Install thinking-stall-detector.sh
☐ Check /cost every 10-15 minutes during active work
☐ Never leave extended thinking sessions unattended
☐ If output seems frozen for 5+ minutes, press Ctrl+C

40 token drain symptoms diagnosed

The extended thinking drain is Symptom 40 of 40 documented in the Token Book. Each symptom includes root cause, hook-based fix, and prevention steps.

Token Book — $17 Details + Free Chapters Free Token Checkup (30 sec)