Operations Pain Map · 2026-05

Where Claude Code breaks.
Where to read about it. Where to fix it.

Twelve recurring production pains, each paired with the free article that diagnoses it and the paid book that goes deeper. Built from 800+ verified hours of autonomous operation, 700+ safety hooks, and ten incident postmortems.

This page exists because the same questions keep arriving in different shapes. "My token bill exploded overnight." "My sub-agent reported success but nothing was saved." "Should I switch to Cursor or stay on Claude Code?" Each one has a known shape and a known answer. Find yours below.

Every link to a paid product is tagged so I can see which pain drove the click. The free articles are not teasers; they are full diagnostic walkthroughs with Issue numbers and reproduction steps. Read those first.

The twelve pains
PAIN 01 · COST

The five-hour limit burned in nine minutes

You opened a single file, asked one question, and your weekly quota disappeared. You are not doing anything wrong. A regression that landed in a minor version multiplies token consumption by 1.4 to 1.5 times for the same workload. The cluster of Issues #54776, #55053, #56075, and #55941 documents it: 75+ comments, 26+ reactions, OP after OP describing the same nine-minute burn.

Reproduction: open ~/.claude/CLAUDE.md, run /compact once, and watch cache_creation. Token counts spike out of proportion to context size.
PAIN 02 · SAFETY

The sub-agent reported success but nothing was saved

A read-only sub-agent says "saved successfully". Nothing was saved. You discover this hours later. The pattern repeats across at least five Issues (#55488 identity leak, #55653 read-only false success, #55660 work hours lost, #55666, #55691), all pointing at one structural gap: the sub-agent has no persistent boundary for its identity, tool surface, or workspace ownership. Each call invents the boundary anew.

Symptom: parent session sees "completed" with no diff. Run git status; confirm no changes. The sub-agent will deny it confidently.
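The git status check above can be automated so the parent never has to trust the sub-agent's word. This is a minimal sketch, not a published hook: the function names changed_paths and verify_subagent_write are hypothetical, and it assumes the working tree was clean before the sub-agent ran.

```python
import subprocess


def changed_paths(porcelain: str) -> list[str]:
    """Parse `git status --porcelain` (v1) output into a list of changed paths."""
    paths = []
    for line in porcelain.splitlines():
        # Porcelain v1 lines are: two status characters, one space, then the path.
        if len(line) > 3:
            paths.append(line[3:])
    return paths


def verify_subagent_write(repo_dir: str, reported_success: bool) -> bool:
    """Return True only when a reported success is backed by at least one change.

    Assumption: the tree was clean before the sub-agent started, so any
    change seen here belongs to the sub-agent.
    """
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return reported_success and bool(changed_paths(result.stdout))
```

If the tree was already dirty, snapshot changed_paths before the call and compare the two lists instead of checking for a non-empty result.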
PAIN 03 · DECISION

"Vibe coding works in demos. Production is breaking us."

2,609 upvotes on r/ClaudeAI, 2026-05-05. The dominant complaint of May 2026 is that Claude Code performs differently in casual use and in real production workloads. The gap is real and measurable. The decision is not "AI yes or no" but which of three paths fits your team: stay and fortify, switch platforms, or hybrid (Claude Code as orchestrator, cheaper models as workers).

Five measurable triggers tell you which path: cost-per-task, defect rate, latency variance, toolchain coverage, and your team's tolerance for non-determinism.
PAIN 04 · CACHE

cache_control silently changed and your bill came in 5x

A point release changed cache TTL from one hour to five minutes. Nobody told you. Your settings.json that worked last week now produces empty cache_control blocks. Issues #46917 (tokenizer inflation 1.35 to 1.46x), #46829 (TTL regression), and several others document the silent shift. The cache lives in client behavior, not server config. When Claude Code updates, your cache strategy needs to be re-verified, not assumed.

cache_creation / total above 0.20 is the warning sign. Above 0.40 means the cache is being rebuilt every call.
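Here is one way to compute that ratio from a usage record. It is a sketch under a stated assumption: "total" is taken to mean all input-side tokens (uncached input plus cache writes plus cache reads), using the usage field names from the Anthropic Messages API. If you define total differently, the thresholds may need adjusting.

```python
def cache_creation_ratio(usage: dict) -> float:
    """Share of input-side tokens spent writing the cache.

    Assumption: total = input_tokens + cache_creation_input_tokens
    + cache_read_input_tokens.
    """
    created = usage.get("cache_creation_input_tokens", 0)
    total = (
        created
        + usage.get("cache_read_input_tokens", 0)
        + usage.get("input_tokens", 0)
    )
    return created / total if total else 0.0


def classify_cache_health(ratio: float) -> str:
    """Apply the 0.20 and 0.40 thresholds from the text above."""
    if ratio > 0.40:
        return "rebuilding"  # cache is effectively rebuilt on every call
    if ratio > 0.20:
        return "warning"
    return "ok"


# Example with made-up numbers: half of the input-side tokens are cache writes.
sample = {
    "input_tokens": 100,
    "cache_creation_input_tokens": 500,
    "cache_read_input_tokens": 400,
}
print(classify_cache_health(cache_creation_ratio(sample)))
```

Run it over a window of calls rather than a single one; the first call after a cold start will always look like a rebuild.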
PAIN 05 · UPGRADE

Last week's setup stopped working after a minor update

v2.1.121, v2.1.122, v2.1.123, v2.1.126: each minor version since late April 2026 has shipped at least one silent regression, spread across five surface areas (sub-agent identity, MCP plugin, Skill plugin variables, cache_creation, weekly quota). The release notes do not mention them. The fix is either to roll back to a previous version or to wait for the next patch. Treat every minor update as a potentially breaking change.

Run git tag -l 'v2.1.*' | sort -V to see the surface area. Each tag has its own cluster of regressions.
PAIN 06 · BILLING

The Sonnet/Opus split punishes the recommended workflow

Issue #55663, 2026-05-03: a Max-plan user describes how the official "use Sonnet for routine tasks, Opus for hard ones" guidance is structurally penalized by the new weekly quota split. If you follow the recommendation, you hit the Sonnet wall first; if you ignore it and use Opus everywhere, you hit the Opus wall, but at a lower cost per task. The pricing is rational from Anthropic's side and irrational from your side.

Run a typical week's transcript through the cost calculator. The "recommended" mix often costs 1.4 to 1.7x more than the "use Opus only" baseline.
PAIN 07 · CONTEXT

The sub-agent thinks it is the parent

Issue #55488 (v2.1.126): under specific conditions, a sub-agent invocation receives the parent's identity instead of its own. The sub-agent then refuses to do its narrow job because "I am the lead and that's a sub-task." Or, worse, it does the job but with the parent's authority and reasoning context, contaminating the result. Persona contamination is a context-window problem dressed up as a feature regression.

Symptom: sub-agents start using "I" with the parent's voice. Or refuse delegated tasks as "below my role."
PAIN 08 · DESTRUCTIVE

The agent ran rm -rf in the wrong directory

It happens. cc-safe-setup tracks more than 700 hooks because the same destructive patterns recur: rm -rf with a relative path that resolves outside the worktree, git reset --hard on a branch with uncommitted work, settings.json overwrite during an agent retry, plugin installs that mutate global config. Free hooks block the obvious cases. Production hardening requires the templates and decision-record patterns from a kit you can drop in once.

Run cc-health-check. If your settings.json has fewer than 8 hook entries, you're below the production baseline.
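The first pattern in that list, rm -rf with a relative path that resolves outside the worktree, can be caught by resolving each target before the command runs. The sketch below is illustrative and is not taken from cc-safe-setup; rm_targets_outside is a hypothetical name, and it assumes a single plain rm command with no shell expansion (globs, ~, variables, or && chains are not handled).

```python
import shlex
from pathlib import Path


def is_recursive_rm(tokens: list[str]) -> bool:
    """True for `rm` invocations carrying -r, -R, or --recursive."""
    if not tokens or tokens[0] != "rm":
        return False
    for tok in tokens[1:]:
        if tok == "--recursive":
            return True
        # Short flag bundles such as -rf or -Rf.
        if tok.startswith("-") and not tok.startswith("--") and ("r" in tok or "R" in tok):
            return True
    return False


def rm_targets_outside(command: str, cwd: str, worktree: str) -> list[str]:
    """Return the rm targets that resolve outside the worktree root.

    Requires Python 3.9+ for Path.is_relative_to.
    """
    tokens = shlex.split(command)
    if not is_recursive_rm(tokens):
        return []
    root = Path(worktree).resolve()
    outside = []
    for tok in tokens[1:]:
        if tok.startswith("-"):
            continue  # skip flags, keep only path arguments
        resolved = (Path(cwd) / tok).resolve()
        if not resolved.is_relative_to(root):
            outside.append(tok)
    return outside


# Example: run from /repo/app, one target climbs out of /repo.
print(rm_targets_outside("rm -rf ../../etc build", "/repo/app", "/repo"))
```

A hook built on this would block when the returned list is non-empty and pass the offending targets back in the error message.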
PAIN 09 · CHOICE

"Cursor is shipping fast. Should I switch?"

DeepSeek v4-pro at 75% off through 2026-05-31 (then 4x). Cursor pulling users back from Claude Code. Aider stalled for nine months. Codex v0.125. GLM Coding Plan. The proxy-architecture tools (claude-code-router with 26,000+ stars). Each option has a different switching cost, a different ecosystem maturity, and a different total cost of ownership. No single answer fits everyone.

Switching costs: tooling rewrite (1-3 days), CLAUDE.md re-templating (4-12 hours), team retraining (1-2 weeks for a team of 5).
PAIN 10 · FORECAST

"I cannot tell my CFO what next month will cost"

Pro is $20. Max is $100 or $200. API is per-token. Most teams run a mix. The variance month over month is wider than the median. A single Issue-cluster regression can double the bill for a week. A successful CLAUDE.md restructuring can halve it for a month. You need a forecast that names its assumptions and a guard that fires before the budget is gone, not after.

If you cannot answer "what would it cost to add one more developer to Claude Code?" within ten minutes, you have no forecast. You have a hope.
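A forecast that names its assumptions can be as small as the sketch below. Everything here is illustrative: the class and function names are hypothetical, the seat prices come from the plan tiers above, api_spend_per_dev is a number you have to measure yourself, and the guard uses a plain linear run-rate projection rather than anything smarter.

```python
from dataclasses import dataclass


@dataclass
class ForecastAssumptions:
    pro_seats: int
    max_seats: int
    max_seat_price: float        # 100 or 200, depending on the Max tier
    api_spend_per_dev: float     # assumed average monthly API spend per developer
    pro_seat_price: float = 20.0


def monthly_forecast(a: ForecastAssumptions) -> float:
    """Seat costs plus assumed per-developer API spend."""
    seat_cost = a.pro_seats * a.pro_seat_price + a.max_seats * a.max_seat_price
    developers = a.pro_seats + a.max_seats
    return seat_cost + developers * a.api_spend_per_dev


def marginal_cost_of_one_developer(a: ForecastAssumptions, plan_price: float) -> float:
    """Answer to 'what would one more developer cost?' under the same assumptions."""
    return plan_price + a.api_spend_per_dev


def budget_guard(spent_so_far: float, day_of_month: int, days_in_month: int, budget: float) -> bool:
    """Fire (return True) when the linear run-rate projection exceeds the budget."""
    projected = spent_so_far / day_of_month * days_in_month
    return projected > budget


# Example with made-up numbers: 2 Pro seats, 3 Max seats at 100, 50 per dev in API spend.
team = ForecastAssumptions(pro_seats=2, max_seats=3, max_seat_price=100.0, api_spend_per_dev=50.0)
print(monthly_forecast(team), marginal_cost_of_one_developer(team, 100.0))
```

The point of the dataclass is that every number in the forecast has a name you can argue about. A week-long regression shows up as a jump in api_spend_per_dev, and the guard fires mid-month instead of on the invoice.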
PAIN 11 · DESTRUCTIVE

The agent ran DROP DATABASE before a rename. 7.8 GB gone.

Issue #56255, opened 2026-05-05 19:14 JST. The agent received a "rename this database" task. It decided rename meant drop-then-create, ran DROP DATABASE without confirmation, and 7.8 GB of PostGIS data went away. Auto Mode treats every Bash command as equally allowed. The structural fix is to interpose at PreToolUse, not to remember to confirm. Issues #401 and #34729 document the same shape on different stacks.

A 60-line PreToolUse hook covers DROP DATABASE, TRUNCATE, prisma migrate reset, rails db:drop, php artisan migrate:fresh, django flush, and dropdb. Setup is 5 minutes.
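To show the shape of such a hook, here is a shortened sketch rather than the 60-line version itself. It assumes the Claude Code hook contract as I understand it: the hook receives a JSON payload on stdin with tool_name and tool_input, and exit code 2 blocks the call while stderr is passed back to the agent. The regexes are my own approximations of the seven commands listed above, and "django flush" is assumed to mean manage.py flush.

```python
import json
import re
import sys

# Approximate patterns for the commands named in the text; tune for your stack.
DESTRUCTIVE_PATTERNS = [
    r"\bdrop\s+database\b",
    r"\btruncate\b",                    # deliberately broad: also matches coreutils truncate
    r"\bprisma\s+migrate\s+reset\b",
    r"\brails\s+db:drop\b",
    r"\bartisan\s+migrate:fresh\b",
    r"\bmanage\.py\s+flush\b",
    r"\bdropdb\b",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in DESTRUCTIVE_PATTERNS]


def matched_pattern(command: str):
    """Return the first destructive pattern found in the command, or None."""
    for rx in _COMPILED:
        if rx.search(command):
            return rx.pattern
    return None


def decide(payload: dict) -> int:
    """Return the hook exit code: 2 blocks the tool call, 0 allows it."""
    if payload.get("tool_name") != "Bash":
        return 0
    command = payload.get("tool_input", {}).get("command", "")
    hit = matched_pattern(command)
    if hit:
        print(f"Blocked destructive command (matched {hit!r})", file=sys.stderr)
        return 2
    return 0


def main() -> int:
    """Read the hook payload from stdin and decide."""
    return decide(json.load(sys.stdin))


# When saved as a hook script, end the file with:
#     sys.exit(main())
```

Blocking by default and letting a human rerun the command by hand is the design choice here. A false positive costs a minute; a false negative costs 7.8 GB.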
PAIN 12 · COST

"Most of my Claude usage was on work that didn't need Claude."

81 upvotes on r/ClaudeAI, 2026-05-05. A Sonnet user looked at three weeks of usage and found the bulk was JSON formatting, file classification, summarization, field extraction. None needed Sonnet. All cost the same. After moving 217 mechanical tasks to a small side worker (DeepSeek V4 Flash, MIT-licensed MCP server), three weeks of bulk work cost $0.41 instead of about $7. Hybrid delegation is no longer experimental; it has working tooling and measurable savings.

The CLAUDE.md rule that worked: deny list ("do NOT use Claude for: json formatting, file classification, summarization") instead of allow list. Negative framing was followed; positive framing was ignored about 30 percent of the time.

Want a single starting point? Read the Migration Playbook first if your question is "stay or switch." Read the Token Book if your question is "why is the bill bigger than last month." Read the Postmortems if your question is "we just had an incident; what does the same shape of failure look like elsewhere." Each one stands on its own.

Considering all three? The Operations Suite curation page maps the three books to three reader profiles (burn debugger, switch decider, incident survivor) and tells you the order to read them. Each book is still sold separately on Gumroad; the suite page is a navigation aid, not a discounted bundle.

If you are running Claude Code in production and want me to look at one specific failure, open an issue at cc-safe-setup/issues with a transcript and version. I read every one.