Claude Code Failure-Mode Cluster Tracker

Public registry of structural failure clusters in Claude Code, with first-detected dates, user reaction counts, operator-side defense hooks shipped, and upstream status. Updated as clusters evolve.

A cluster here means a group of independent user-filed issues that share an architectural root cause but surface as different symptoms. Tracking clusters (rather than individual issues) is what CC Safety Lab does monthly — this page is the public registry of what's currently being tracked, with evidence links so you can verify the cluster framing against the underlying issues yourself. Last updated: 2026-05-27.

The twelve clusters being tracked (May 2026)

Cluster | Issues | Reactions | First detected | Hooks shipped | Upstream
Sub-Agent Observability (SOH) | 8 | ~800 | 2026-05-20 | 4 (PRs #282, #283, #286, #298) | Unresolved
Multi-account session management | 3 | 1,178 | 2025-09 (#18435) | 2 (PR #328: routing-preflight, billing-log) | Unresolved (8+ months)
AGENTS.md interop | 2+ | 5,405 | 2025-04-21 (#6235) | 2 (PRs #377, #420: agents-md-sync-checker, agents-md-edit-drift-warner) | Unresolved (13+ months)
Pro Max quota anomaly | 10 | ~2,200 | 2026-01-03 (#16157) | 2 (PRs #340, #342: drift/version detectors) | Unresolved (5+ months)
TUI / Terminal UX | 6 | ~2,106 | 2025-04-12 (#769) | 0 (operator-side terminal alternatives only) | Unresolved (13+ months, area:tui label acknowledged)
Permission matching boundary | 25+ (30+ via meta-issue) | ~804 | 2025-08 (#5140) | 2 (subagent-permission-mode-guard.sh, partial; always-allow-pattern-suggester.sh, PR #359) | Unresolved (9+ months, meta-issue #30519 with no staff engagement)
Skills metadata and loading | 10+ (area:skills) + 20+ (area:agent-view related) | ~100+ (new, growth signal) | 2026-05 (label rollout 2026-05-17) | 1 (PR #357: skills-settings-validator) | Unresolved (no staff engagement since label rollout)
Server-side prompt injection (v2.1.150+) | 42+ (co-occurring regression bundle) + #62061 (core) | 46+ (#62061 anchor) | 2026-05-24 (#62061) | 1 (PR #383: server-side-prompt-injection-detector) | Acknowledged intentional (Anthropic same-day comment), 2 opt-out env vars provided, audit trail gap remains
Usage Policy classifier over-trigger (AUP) | 25+ (filed 2026-05-18 to 2026-05-27) | ~40 (wide-and-thin reaction shape) | 2026-05-18 (#60366) | 1 (PRs #388, #389: aup-false-positive-helper) | Unresolved (no Anthropic comment, github-actions[bot] auto-grouped 3 duplicate chains)
GrowthBook A/B flag client-side overrides | 58 (2-week surge, filed 2026-05-14 onward) | ~30 (root-cause analysis in #62205) | 2026-05-25 (#62205) | 3 shipped advisory hooks (PRs #402, #413, plus existing permission-mode-drift-guard); all observational, not preventive | Unresolved (5 documented override paths; no Anthropic acknowledgment of the broader pattern)
Cowork sandbox / Desktop remote-control failure surface | 195 (2-week window, filed 2026-05-14 onward, ~17/day pace) | 0-1 per issue (wide-and-thin; volume-driven cluster) | 2026-05-28 (cluster framing date) | 1 standalone script (cowork-claudemd-helper.sh, PR #403) + 3 shipped CLI-side hooks (PRs #409, #410, #411) | Unresolved (4 sub-clusters: filesystem/mount/path, platform/binary mismatch, subscription/access boundary, infrastructure incident)
Tool Call Parsing failures in Opus 4.7 | 5+ (filed 2026-04-17 to 2026-05-27; central case #62123 has 21 reactions) | 28+ (cumulative across 5 filings; 21 on #62123 alone) | 2026-05-25 (#62123) | 4 (PR #406: long-session-malformed-tool-call-detector for sub-pattern 12A; PR #419: extended-thinking-tool-use-mismatch-detector for 12B; PR #423: spurious-malformed-notice-detector for 12C; PR #424: xml-format-leak-detector for 12D) | Unresolved (4 root-cause hypotheses: in-context few-shot poisoning, extended-thinking serialization defect, spurious malformed notice, legacy XML format mix)

Combined: ~11,820 user reactions across 400+ issues, all twelve clusters currently open as of 2026-05-28. The combined volume exceeds the top-10 most-reacted issues in the entire repository.

Cluster 1: Sub-Agent Observability (SOH)

Subagent silent-failure cluster: 72-hour convergence window
Status: ACTIVE · Issues: 8 · Reactions: ~800 (combined) · First filed: 2026-05-20 · Cluster framing: 2026-05-23

Between 2026-05-20 21:48 UTC and 2026-05-22 09:05 UTC, six independent users filed issues describing the same architectural gap from different angles. A seventh case landed three hours after the Claim-Verify Handbook launch on 2026-05-22 evening. An eighth case (parallel-Bash 14-hour silent gap) was filed 2026-05-25. All cases converge on four distinct sub-patterns of subagent failure, all rooted in the same surface-level observability gap.

The 7+1 cases

Issue | Reporter | Sub-pattern | One-line summary
#60987 | MarkAWard | silent stall | pty-less spawn → subprocess dies → parent reports "spawned successfully"
#61102 | Awis13 | scope expansion | subagent recommends, parent treats recommendation as authorization (~120GB deletion)
#61107 | nvst18 | dispatch fabrication | structurally correct code generated, validated input silently discarded in dead branch
#61167 | nvst18 | dispatch fabrication | OpenClaw deploy: "39 agents dispatched" narrated, session log shows 5 dispatched and 0 aggregated
#61315 | mitselek | silent stall | MCP permission gate stops subagent indefinitely, no signal to parent UI
#61405 | meefs | missing observation/control | 12-hour subagent hang, no timeout / progress / abort primitive at Agent-tool surface
#61547 | alanrezendeee | silent stall | subagent idle at entry-tool-dispatch gate (since confirmed: bypassPermissions not propagated from parent)
#62161 | (independent) | missing observation/control | parallel-Bash 14-hour silent gap (8th case, 2026-05-25)
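
The silent-stall rows describe a parent that reports a successful spawn while the child has already died. The claim-verify idea behind that sub-pattern can be sketched in Python as below. This is a minimal illustration on a plain OS subprocess, not on Claude Code's Agent tool; spawn_and_verify and grace_seconds are hypothetical names and do not correspond to any shipped hook.

```python
import subprocess
import sys


def spawn_and_verify(argv, grace_seconds=0.5):
    """Spawn a child, then check that it is still alive instead of assuming so.

    Returns (process, state, exit_code) where state is "running" or "exited".
    """
    proc = subprocess.Popen(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        # If the child dies inside the grace period, wait() returns its code.
        exit_code = proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # Still alive after the grace period: only now report "running".
        return proc, "running", None
    return proc, "exited", exit_code
```

A caller would report "spawned" only when the state is "running", and surface the exit code otherwise, for example via spawn_and_verify([sys.executable, "-c", "import sys; sys.exit(3)"]).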

Operator-side defenses shipped

Four MIT-licensed defense hooks ship in cc-safe-setup, one per sub-pattern (PRs #282, #283, #286, #298).

Reading material

English Chapter 1 preview (Sub-Agent Observability Handbook) · meta-analysis Gist · nested-spawn cluster Gist · issue #61993 (4-architecture convergence discussion).

Upstream status

No upstream fix as of 2026-05-26. Issue #62153 tracks the IPC positive-path work. The four-architecture convergence (contract-vs-runtime, file-based handoff.md, hook-emitted receipts, separate-process dispatch) suggests the upstream fix surface is exposing ephemeral spawn primitives in nested contexts with a depth limit.

Cluster 2: Multi-account session management

Three-surface multi-account primitive absence
Status: ACTIVE (8+ months) · Issues: 3 · Reactions: 1,178 (combined) · First filed: 2025-09 (#18435) · Cluster framing: 2026-05-26

Three independent users on three different surfaces (desktop, web, mobile) filed separate issues describing the same primitive absence — Claude Code provides no way to manage multiple accounts simultaneously. Surface symptoms differ; the architectural gap is identical.

The cases

Issue | Reporter | Surface | Reactions
#18435 | Agentic-Marketer | Desktop app | 542
#27302 | nathanmargaglio | Web app (Connectors) | 327
#36151 | CorneAussems | Mobile app | 309

Operator-side defenses shipped

PR #328 ships two hooks targeting the multi-account boundary: routing-preflight and billing-log.

Reading material

English operator field guide (5 alternative routes) · Japanese operator field guide · 7-question persona-based self-audit (interactive HTML).

Competitive landscape

Independent OSS tools targeting this gap total ~348 stars across 8+ repos (e.g., tickernelz/opencode-kiro-auth 134★, andyvandaric/opencode-ag-auth 68★, quinnjr/claude-code-profiles 38★, KarpelesLab/teamclaude 27★). Market is saturated for direct switching tools; cc-safe-setup hooks target the adjacent reconciliation/safety surface where existing tools are weak.

Upstream status

No upstream fix. #18435 has been open 8+ months. Will be the core theme of CC Safety Lab's June 2026 issue (ships 6/1, edit 5/30, proof 5/31).

Cluster 3: AGENTS.md interop

Largest single feature request in Claude Code's tracker
Status: ACTIVE (13+ months) · Issues: 2 primary + 173 mentions · Reactions: 5,405 (combined) · First filed: 2025-04-21 (#6235) · Cluster framing: 2026-05-26

Claude Code uses its own CLAUDE.md instruction-file format; the rest of the agent ecosystem (Codex, Cursor, Amp, Aider) is converging on AGENTS.md as a shared standard. Users who run multiple agent tools must maintain duplicate instruction files. #6235 is the largest single feature request in the Claude Code repository — 5,185 reactions over 13+ months without an official position.

The cases

Issue | Reactions | Status | Notes
#6235 | 5,185 | open | Filed 2025-04-21, the largest feature request in the tracker
#31005 | 220 | open | Filed 2026-03, follow-up requesting AGENTS.md + .agents/skills/
173 additional issues | | | mention AGENTS.md in body or comments

Operator-side defenses

Two cc-safe-setup hooks have shipped, covering the two actionable moments. agents-md-sync-checker (SessionStart, PR #377) detects drift at the start of a session and surfaces the candidate-path enumeration. agents-md-edit-drift-warner (PostToolUse on Edit / Write / MultiEdit, PR #420) catches drift at the moment of the edit, when the operator is mid-flow and the diff is still fresh in mind, which is the cheapest point to correct it. The two hooks compose: the edit-time warner is the in-the-loop signal, and the SessionStart checker is the backstop for drift the operator deferred. Five operator-side routes remain documented for the unhooked angles: symlink (ln -s CLAUDE.md AGENTS.md), pre-commit hook sync, direnv environment variable, CI sync verification, and runtime-mirror via a third companion hook. Further reading: English field guide (3,500 words) and a 6-question interactive self-audit.
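
The kind of drift check described above can be sketched in a few lines of Python. This is an illustration only and is not the implementation of either shipped hook; instruction_drift and its four return labels are hypothetical names, and the sketch assumes both files live at the given paths.

```python
import hashlib
import os


def _digest(path):
    """SHA-256 of a file's bytes."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def instruction_drift(claude_path="CLAUDE.md", agents_path="AGENTS.md"):
    """Classify the relationship between the two instruction files."""
    if not (os.path.exists(claude_path) and os.path.exists(agents_path)):
        return "missing"
    if os.path.samefile(claude_path, agents_path):
        return "linked"      # symlink or hard link: the files cannot drift
    if _digest(claude_path) == _digest(agents_path):
        return "identical"   # separate copies that are currently in sync
    return "drifted"
```

Treating the symlink route as its own "linked" state keeps the check quiet for operators who already chose that route, while separate copies are compared by content hash.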

Competitive landscape

Direct sync tools total ~400 stars: agent-sh/agnix (258★, active development as of 2026-05-24, validation tool), iannuttall/source-agents (125★, sync tool), intellectronica/claude-agentsmd (17★, Claude Code-specific). Adjacent context: ciembor/agent-rules-books (1,593★), wshobson/agents (35,933★).

Upstream status

No official statement. Will be the core theme of CC Safety Lab's July 2026 issue (early draft already at ~20,000 characters as of 2026-05-26).

Cluster 4: Pro Max quota anomaly

Five-month quota-consumption divergence cluster
Status: ACTIVE (5+ months) · Issues: 10 · Reactions: ~2,200 (combined) · First filed: 2026-01-03 (#16157) · Cluster framing: 2026-05-26

Ten independent issues over five months describing the same family of symptoms: Pro Max plan users hitting quota limits abnormally fast, with multiple time-window boundary signals (2026-03-23, v2.1.89, v2.1.100, v2.1.1). The cluster includes server-side measurable evidence (#46917 with reproducible cache_creation inflation of ~20K tokens).

The core three cases (1,460 combined reactions)

Issue | Filed | Reactions | One-line summary
#16157 | 2026-01-03 | 717 | Instantly hitting Max-plan usage limits after 3-day non-use
#38335 | 2026-03-24 | 525 | 5-hour window quota exhausting abnormally fast since 2026-03-23 (specific boundary date)
#46917 | 2026-04-12 | 218 | v2.1.100+ inflates cache_creation by ~20K tokens vs v2.1.98 (server-side, reproducible)

Operator-side defenses shipped

Two cc-safe-setup hooks target the cluster: the drift and version detectors (PRs #340, #342).

Reading material

English operator field guide (Pro Max quota anomaly: 5 measurement routes, 2,639 words). Cites ccusage (14,647★, the dominant external measurement tool), raw JSONL inspection, the claude-code-logger proxy, the cc-safe-setup hook, and Anthropic Console comparison.
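
As a rough illustration of the raw JSONL inspection route, the sketch below averages cache-creation tokens per client version. The record layout is an assumption, not something this page confirms: it presumes each transcript line is a JSON object with a top-level version field and a message.usage.cache_creation_input_tokens field. mean_cache_creation_by_version is a hypothetical helper name.

```python
import json
from collections import defaultdict


def mean_cache_creation_by_version(lines):
    """Average cache-creation tokens per version across JSONL records."""
    totals = defaultdict(lambda: [0, 0])  # version -> [token_sum, record_count]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue
        tokens = usage.get("cache_creation_input_tokens")  # assumed field name
        if not isinstance(tokens, (int, float)):
            continue
        bucket = totals[record.get("version", "unknown")]
        bucket[0] += tokens
        bucket[1] += 1
    return {version: s / n for version, (s, n) in totals.items()}
```

Comparing the per-version means before and after an upgrade gives a local number to set against the ~20K-token inflation reported in #46917.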

Upstream status

No upstream fix as of 2026-05-26. The cluster spans 5 months without an official statement on the cache_creation inflation. Candidate theme for CC Safety Lab's August 2026 issue.

Cluster 5: TUI / Terminal UX

Thirteen-month rendering-layer divergence cluster
Status: ACTIVE (13+ months) · Issues: 6 · Reactions: ~2,106 (combined) · First filed: 2025-04-12 (#769) · Cluster framing: 2026-05-26

Claude Code's TUI text buffer and rendering layer (built on a custom Ink/React-for-CLI stack) does not integrate cleanly with terminal emulator native behaviors — scroll, redraw, copy buffer, IME composition. Six issues filed across 13 months describe distinct surface symptoms (scroll-to-top, flicker, IME composition, copy/paste indentation) that share one architectural root: the TUI re-renders screen state in a way that competes with the emulator's native scroll buffer and input handling. Five of six issues carry the official area:tui label (and three carry oncall), so Anthropic acknowledges the cluster — but no structural fix has shipped in 13 months.

The six cases

Issue | Filed | Reactions | Platform | One-line symptom
#826 | 2025-04-19 | 819 | macOS / iTerm2 | Console scrolls to top of history when text is added (long sessions)
#769 | 2025-04-12 | 329 | Windows / Ubuntu | In-progress call causes screen flickering
#1913 | 2025-06-10 | 316 | multi | Terminal flickering (video repro attached)
#1547 | 2025-06-04 | 259 | macOS | IME input causes performance issues + duplicate conversion candidates
#18170 | 2026-01-14 | 250 | multi | Copy/paste from terminal includes unwanted indentation + trailing spaces
#36582 | 2026-03-20 | 133 | macOS | Terminal scrolls to top when conversation gets long

Operator-side defenses (terminal-level workarounds)

No cc-safe-setup hook ships for this cluster — the failure surface is at the terminal emulator boundary, not at the Claude Code hook surface. Operator-side mitigations documented in the field:

Why this cluster matters operationally (even without direct $$ impact)

This is the only one of the first seven clusters with no direct revenue or quota impact: the failure mode is friction, not loss. But the cluster partly explains the 1,127 unique CLI users / 30 stars disparity on cc-safe-setup (a 2.7% star rate, well below the 5-10% baseline for actively-used dev tools): users who hit TUI friction tend to disengage from public surfaces (stars, issues, PRs) even when they continue using the underlying tool. The cluster is included here for completeness. It shapes user behavior, but it is not a Safety Lab monthly theme candidate.

Upstream status

Five of six issues acknowledged with area:tui label; three with oncall. No structural fix in 13 months. The TUI rendering layer is a Claude Code architectural choice (custom React-for-CLI stack) and fixing the cluster requires either rebuilding on a different TUI layer or extensive integration work with terminal emulators' scroll/redraw protocols.

Cluster 6: Permission matching boundary

Nine-month permission-rule enforcement gap, 30+ issues, meta-issue with no staff engagement
Status: ACTIVE (9+ months) · Issues: 25+ (30+ via meta-issue #30519) · Reactions: ~804+ (combined, top 25 area:permissions issues, plus the newly surfaced Axis 8 in #62437) · First filed: 2025-08 (#5140) · Cluster framing: 2026-05-26

Users configure allow, deny, and ask rules in settings.json expecting them to match the bash commands Claude generates. The matching engine has eight independent failure axes (seven from meta-issue #30519, plus an eighth surfaced 2026-05-26 in #62437) that combine to make wildcards, "Always Allow," scope hierarchy, and PreToolUse hook enforcement unreliable in practice. Meta-issue #30519 (filed 2026-03-03, 71 reactions) articulates the original seven axes with 13 referenced sub-issues and documents zero Anthropic staff engagement across 9 months. A second meta-issue, #39523 (16 reactions), tracks specifically the bypass-mode regression with "9-month trail, 12+ duplicates."

The eight failure axes

  1. Wildcards don't match compound commands. Bash(git:*) doesn't match git add file && git commit -m "msg". The * only spans a single simple command, but Claude generates compound bash constantly.
  2. "Always Allow" saves dead rules. Approving git commit -m "fix typo" saves the verbatim string; next time the message differs, it prompts again. settings.local.json accumulates hundreds of one-off rules.
  3. User-level scope doesn't apply at project level. Rules in ~/.claude/settings.json appear in /permissions output but match nothing; same rules in project-level settings.local.json work.
  4. Quote-tracking bypasses allow list. Commands with quote characters in # comments trigger a safety warning that ignores all allow rules.
  5. Deny rules have the same bugs. Multiline commands and flag-reordering bypass deny rules — so the system isn't just annoying, it's not enforcing the safety constraints users configured.
  6. Colon vs space syntax contradicts. Bash(git:*) and Bash(git *) behave differently; docs disagree on which is correct; "Always Allow" generates one syntax while users configure the other.
  7. Bypass-mode is partially broken. --dangerously-skip-permissions doesn't bypass Edit prompts (#36192), Cowork scheduled tasks ignore "Always allow" (#47180), and the flag itself stopped working entirely after v2.1.77 (#36168).
  8. Session approval suppresses PreToolUse hook deny (2026-05-26). Once a static ask rule like Bash(docker --host:*) is session-approved, subsequent matching commands bypass the PreToolUse hook entirely — the hook's permissionDecision: deny output is never reached. A session-cached approval of a broad pattern silently whitelists destructive subcommands the hook is explicitly denying. New axis surfaced by issue #62437.
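
Axis 1 is easier to see with a toy matcher. The Python sketch below is not Claude Code's matching engine. It shows one way an operator-side check could split a compound command into simple commands and require every one of them to start with an allowed program. split_compound and every_segment_allowed are hypothetical names, and the sketch is deliberately conservative: it rejects multiline input and unbalanced quotes, and it does not understand backticks or other shell expansions.

```python
import shlex

PUNCTUATION = set("();<>|&")


def split_compound(command):
    """Split a shell command line into simple commands (lists of words)."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    lexer.commenters = ""  # keep '#' text visible instead of silently dropping it
    segments, current = [], []
    for token in lexer:
        if token and all(ch in PUNCTUATION for ch in token):
            # Operators such as &&, ||, ; and | end the current simple command.
            if current:
                segments.append(current)
            current = []
        else:
            current.append(token)
    if current:
        segments.append(current)
    return segments


def every_segment_allowed(command, allowed_programs):
    """True only if each simple command starts with an allowed program."""
    if "\n" in command:
        return False  # refuse multiline input rather than guess at it
    try:
        segments = split_compound(command)
    except ValueError:
        return False  # unbalanced quotes
    return bool(segments) and all(s[0] in allowed_programs for s in segments)
```

With allowed_programs={"git"}, the compound command from Axis 1 (git add file && git commit -m "msg") passes because both halves start with git, while git status; rm -rf build is rejected. Redirection targets are treated as separate segments, so the sketch errs toward prompting rather than allowing.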

Representative cases (top 14 by reactions, plus #62437 for Axis 8)

Issue | Filed | Reactions | State | One-line symptom
#28240 | 2026-02-22 | 180 | open | Permission prompt triggers on cd in compound bash, not the actual command
#30519 (meta) | 2026-03-03 | 71 | open | Permissions matching fundamentally broken: 30+ open issues, no staff engagement
#29214 | 2026-02-25 | 71 | open | Remote Control: mobile app shows permission prompts despite --dangerously-skip-permissions
#11380 | 2025-11 | 64 | closed | Claude continually asks for permission even after "always allow"
#36168 | 2026-03 | 63 | open | Bypass/dangerously-skip-permissions broken in all CC versions newer than v2.1.77
#43713 | 2026-04 | 50 | open | autoAllowBashIfSandboxed bypassed for commands with shell expansions
#6850 | 2025-08 | 45 | open | settings.local.json allow not working: keeps asking, wants to re-add existing items
#18160 | 2026-01 | 41 | open | Claude ignoring allow permissions in global settings.json
#30435 | 2026-03 | 39 | open | Allow suppressing bash safety heuristic prompts via settings
#5140 | 2025-08 | 33 | open | Permissions from user settings.json not applied at project level
#31373 | 2026-03 | 31 | open | Should not encourage $(...) in system prompt: causes prompt spam
#35954 | 2026-03 | 26 | open | Add option to disable "Contains backslash-escaped whitespace" warning
#32985 | 2026-03 | 24 | open | Allow configuring auto-approval for cd+git compound commands
#47180 | 2026-05 | 23 | open | Cowork scheduled tasks ignore "Always allow" folder/tool permissions
#62437 | 2026-05-26 | 0 | open | PreToolUse hook not invoked after a static ask rule receives session-level approval (Axis 8)

Operator-side defenses (1 of 4 axis-specific hooks shipped, 3 in design)

Two hooks ship now: subagent-permission-mode-guard.sh (permission-adjacent, Issue #55691) covers the sub-agent permissionMode override boundary, and always-allow-pattern-suggester.sh (PR #359, main-merged 2026-05-27) addresses Axis 2 directly. Three hook designs remain in the pipeline (2026-05 to 2026-06):

Upstream status

Meta-issue #30519 documents zero Anthropic staff engagement across 30+ issues over 9 months. One workaround comment in September 2025 didn't fix anything. No milestones, no Anthropic-authored PRs, no roadmap, no tracking issue. The community is now writing custom Python PreToolUse hooks to reimplement permission enforcement (#18846), which is the option the meta-issue identifies as "what people are actually doing." This is a security-relevant subsystem where the documented contract diverges materially from the runtime behavior.

Cluster 7: Skills metadata and loading

10+ open issues in 14 days across area:skills, four failure modes, zero Anthropic-side fix
Status: ACTIVE (May 2026, expanding) · Issues: 10+ (area:skills) + 20+ (area:agent-view related) · Reactions: ~100+ combined (cluster is new; reaction-count growth signal) · First filed: 2026-05 (area:skills label rolled out 2026-05-17) · Cluster framing: 2026-05-26

Claude Code's Skills feature (added in v2.x) exhibits a coherent class of failures in metadata fabrication, frontmatter respect, discovery, and partial loading. The runtime accepts non-existent settings fields without validation; the documented paths: auto-load trigger does not fire; the argument-hint: frontmatter is not displayed; the agent view loads skills incompletely. There is also no first-class way to observe which skill is active during a given tool call.

The four failure modes

  1. Settings fabrication. Sub-agents write non-existent fields (e.g. disabledSkills) into ~/.claude/settings.json with no validation. The write succeeds, restart succeeds, the targeted skills remain active. Silent no-op. (#62421)
  2. Frontmatter not honored. The documented paths: auto-load trigger never fires; argument-hint: is replaced by ... ellipsis in the slash command hint area. (#62049, #62127)
  3. Discovery doesn't match docs. .claude/skills/ discovery via parent walk and additionalDirectories does not produce the documented effect in nested git repo layouts. (#62237)
  4. Partial loading and hook integration gaps. The "claude agents" interface starts the sub-agent before skills finish loading; PreToolUse hooks have no first-class way to know which skill is active. (#62386, #62108, #62078)
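
Failure mode 1 is a silent no-op because nothing rejects an unknown key. The Python sketch below shows the general shape of a settings validator. It is not the shipped skills-settings-validator hook; unknown_settings_keys is a hypothetical name, and KNOWN_KEYS is an illustrative, incomplete allowlist that would have to be filled in from the current settings documentation.

```python
import json

# Assumption: illustrative and incomplete. Extend from the official docs.
KNOWN_KEYS = {"permissions", "hooks", "env", "model"}


def unknown_settings_keys(settings_text, known_keys=KNOWN_KEYS):
    """Return top-level settings keys that are not in the allowlist."""
    data = json.loads(settings_text)
    if not isinstance(data, dict):
        raise ValueError("settings file must contain a JSON object")
    return sorted(key for key in data if key not in known_keys)
```

Run against a file containing the fabricated disabledSkills field from #62421, the function returns that key, which turns the silent no-op into a visible warning.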

Representative cases (top 9 in the 14-day window)

Issue | Filed | State | One-line symptom
#62421 | 2026-05-26 | open | LLM agents fabricate non-existent disabledSkills setting: silent no-op
#62049 | 2026-05-24 | open | paths: frontmatter never triggers skill auto-loading
#62127 | 2026-05-25 | open | argument-hint frontmatter not displayed in slash command hint area
#62237 | 2026-05-25 | open | Skill discovery via parent-walk and additionalDirectories doesn't match docs
#62386 | 2026-05-26 | open | Claude agents interface: incomplete skill loading
#62108 | 2026-05-25 | open | Add active_skill field to PreToolUse hook input
#62078 | 2026-05-24 | open | Expose current skill name as env var to hooks
#62259 | 2026-05-25 | open | Allow user override of sandbox auto-deny on .claude/skills/
#62409 | 2026-05-26 | open | Plugin skill shadows built-in /release-notes slash command

Operator-side defenses (1 hook shipped, 2 in design)

One hook has shipped: skills-settings-validator (PR #357). Two further hook designs are in the pipeline (June 2026).

Upstream status

All area:skills issues filed since the label rollout (2026-05-17) remain open. No Anthropic staff engagement signal across the cluster. Skills is a v2.x feature; the 14-day filing rate of 10+ issues since label rollout is the strongest signal that the feature shipped without the validation and observability primitives that the documented behavior implies.

Where to read more

Skills Metadata and Loading cluster field report (2,106 words, four failure modes, six issues with deep-dive, three detection paths, four defense paths). For the cross-cluster framework synthesis, see the 9-Cluster Framework (3,300 words).

Cluster 8: Server-side prompt injection (v2.1.150+)

Claude Code v2.1.150 introduces a function (named nAA in the minified source) that reads an arbitrary string from two network-backed channels and registers it as a peer-level section of the system prompt — sitting alongside the documented anti_verbosity, thinking_guidance, and action_caution sections. The two channels are the bootstrap API client_data field (validated only as z.record(z.unknown()), cached to disk) and the GrowthBook feature flag tengu_heron_brook (refreshing every 60 seconds in the background, also cached to disk). Whatever value Anthropic assigns to these channels gets injected into the agent's instructions, with shell access, with no client-side audit trail.

Anthropic's stance

Confirmed intentional on the issue thread: "We sometimes run experiments on changes to our system prompt." Two opt-out env vars exist: CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 (disables bootstrap client_data) and DISABLE_GROWTHBOOK=1 (disables tengu_heron_brook sync). The opt-outs close the injection channel but do not produce an audit trail of what was injected before the operator opted out, or what would have been injected after.
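
A session-start check can confirm that both opt-outs are actually set in the environment the agent inherits. The sketch below is a minimal illustration; missing_opt_outs is a hypothetical helper, and the two variable names are the ones quoted above.

```python
import os

OPT_OUT_VARS = (
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC",
    "DISABLE_GROWTHBOOK",
)


def missing_opt_outs(environ=None):
    """Return the opt-out variables that are not set to "1"."""
    environ = os.environ if environ is None else environ
    return [name for name in OPT_OUT_VARS if environ.get(name) != "1"]
```

An empty list means both channels are opted out for that process. It says nothing about what was injected before the opt-out, which is the audit gap described above.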

Representative issues

Defense path shipped (1 of 4)

Defense paths in design (3 of 4)

Upstream status

Acknowledged intentional in the same-day Anthropic comment. Opt-out env vars provided. Audit trail gap remains as the structural concern: operators in regulated industries cannot reconstruct what their agent was instructed at the time of a given logged action.

Where to read more

v2.1.150 server-side prompt injection audit paths (1,133 words, four audit paths). The 2026-11 Safety Lab issue covers this cluster in full.

Cluster 9: Usage Policy classifier over-trigger (AUP)

Starting 2026-05-18, the Anthropic server-side Usage Policy classifier began over-triggering on benign Claude Code prompts. 25+ open issues filed between 2026-05-18 and 2026-05-27, including single-word "hi" greetings, ordinary code reads, and non-English benign input (Russian, Polish, Spanish). The block fires server-side before the prompt reaches the model. The classifier is non-deterministic on identical input — the same prompt blocks one attempt and passes the next.

Cluster signature (three converging axes)

Representative issues

Four operator-side mitigation paths

  1. Swap to Sonnet for affected sessions (export ANTHROPIC_MODEL=claude-sonnet-4-7). Highest-leverage immediate workaround; the cluster signature is Opus-specific.
  2. Warm up the session with project context before sensitive prompts. Cold sessions hit the classifier at a noticeably higher rate than warmed sessions.
  3. Apply for the Cyber Verification Program (CVP). Long-term path; note that #61889 reports CVP approval is not currently sufficient to exempt approved users from this specific cluster.
  4. Retry on identical input. The classifier is non-deterministic; reports describe the same prompt passing on attempt 2 or 3 with no other change.
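
Path 4 can be wrapped in a small bounded-retry helper. The Python sketch below is illustrative only: PolicyBlocked is a hypothetical exception standing in for however the caller's client surfaces a Usage Policy block, and send is whatever function the operator already uses to submit a prompt.

```python
import time


class PolicyBlocked(Exception):
    """Hypothetical error raised by `send` when the classifier blocks a prompt."""


def retry_identical(send, prompt, attempts=3, delay_seconds=1.0):
    """Resend the same prompt unchanged, up to `attempts` times.

    Returns (result, attempt_number) on success; re-raises the last block.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return send(prompt), attempt
        except PolicyBlocked as error:
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise last_error
```

Keeping the attempt count small and returning the attempt number makes it possible to log how often a benign prompt needed a second or third try.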

Defense path shipped (1 of 3)

Defense paths in design (2 of 3)

Upstream status

No explicit Anthropic comment on the cluster as of 2026-05-28. github-actions[bot] has auto-grouped at least three duplicate chains, confirming intake-side recognition. No public fix timeline.

Where to read more

Claude Code's AUP False-Positive Cluster: 4 Operator-Side Paths Through It (1,462 words, the four paths with reproducible examples). Companion interactive 4-question diagnostic outputs the highest-leverage path tailored to your model / frequency / domain / CVP status. The 2026-12 Safety Lab issue covers this cluster in full.

Cluster 10: GrowthBook A/B flag client-side overrides

Server-pushed feature flags rewriting client-side state and dispatch behavior
Status: ACTIVE · Issues: 58 · Reactions: ~30 · First filing: 2026-05-14 · Root-cause analysis: 2026-05-25 (#62205) · Cluster framing: 2026-05-28

Starting around 2026-05-14, a 2-week surge of bug reports surfaced one common shape: client behavior changing without a release boundary or changelog entry. The reporter of #62205 (2026-05-25) traced the macOS Desktop variant to GrowthBook A/B feature flags being sync'd from the server every ~9 minutes and silently overriding the user's local settings.json, with permissions.defaultMode: bypassPermissions flipping back to acceptEdits. The same shape applies on different surfaces: #63015 (2026-05-28) suspects tengu_compact_cache_prefix of gating an auto-compact dispatch rewrite that silently fails to fire.

Cluster signature (three converging axes)

Representative issues

Operator-side diagnostic paths

  1. Inspect the cache directly. cat ~/Library/Application\ Support/Claude/cachedGrowthBookFeatures | jq '.features | keys' (macOS Desktop) or grep for tengu_ prefixed keys in ~/.claude.json (CLI). Lists the flags currently sync'd to your account.
  2. Watch for re-sync. stat -f %m ~/Library/Application\ Support/Claude/cachedGrowthBookFeatures every 30s for 15 minutes. If the mtime advances on the same ~9-minute cadence #62205 documented, you're being re-sync'd.
  3. Compare transcripts across versions. When a behavior regression is suspected, diffing a transcript from the prior version against the current version for the affected event (e.g., compact_boundary events) is the cleanest local confirmation that a dispatch path went silent.
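
Paths 1 and 2 can also be scripted. The Python sketch below assumes the cache file is a JSON object with a top-level features map, which is what the jq query in path 1 implies. synced_flag_names and resync_intervals are hypothetical helper names, and the mtime samples are expected to come from the operator's own polling.

```python
import json


def synced_flag_names(cache_text, prefix=""):
    """List feature-flag names in a cached GrowthBook payload."""
    features = json.loads(cache_text).get("features", {})
    return sorted(name for name in features if name.startswith(prefix))


def resync_intervals(mtime_samples):
    """Seconds between successive changes in a series of polled mtimes."""
    changes = []
    for value in mtime_samples:
        # Keep only the samples where the mtime actually moved.
        if not changes or value != changes[-1]:
            changes.append(value)
    return [later - earlier for earlier, later in zip(changes, changes[1:])]
```

Intervals clustering near 540 seconds would match the ~9-minute cadence documented in #62205.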

Defense paths shipped (3 of 3)

Operator-side surface for this cluster is narrow: neither hooks nor settings.json overrides reach the dispatch path or the server-side flag rollout. The shipped defenses are observational, not preventive — they make the silent server-side changes visible at session boundaries so the operator can react.

Upstream status

No public Anthropic acknowledgment of the broader pattern as of 2026-05-28. #62205's root-cause analysis has 4 comments but no engineering response. The 5 override paths are observed in production, not in any official documentation.

Where to read more

The 2026-12 Safety Lab issue treats this cluster as the lead chapter, including the full 5-override-path enumeration and the diagnostic jq queries against cachedGrowthBookFeatures.

Cluster 11: Cowork sandbox / Desktop remote-control failure surface

Architectural axis

Claude Code's new Cowork surface (the Claude desktop app's sandboxed remote-control session) is a different distribution path than the CLI or the existing Desktop surfaces. Hooks defined in ~/.claude/settings.json do not fire in the Cowork sandbox, which narrows the operator-side defense surface relative to clusters 1-10. The 2-week filing surge (195 open issues filed between 2026-05-14 and 2026-05-28, ~17/day pace) does not show in any single high-reaction issue — instead it shows in the volume of independent users hitting different failure modes at the boundary of the new surface.

Sub-clusters within Cluster 11

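Four sub-clusters articulated from the 195-issue window: filesystem/mount/path, platform/binary mismatch, subscription/access boundary, and infrastructure incident.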

Defense status (operator-side)

Why the defense surface is narrow

Cowork runs in the Claude desktop app's GUI sandbox, not the CLI. Hooks defined in ~/.claude/settings.json do not fire in that environment. This is the same constraint that made the original cluster framing acknowledge the narrow defense surface. Standalone scripts (the cowork-claudemd-helper shape) and operator workflow changes are the only operator-side surfaces that reach the user before they start their Cowork session.

Upstream status

Cowork is a new surface (rolled out in May 2026), so the 195-issue volume reflects launch-window discovery rather than a long-standing failure pattern. Anthropic has acknowledged individual infrastructure incidents (#62873 is a public incident report) but the broader cross-issue pattern is not formally articulated upstream.

Where to read more

The original cluster-framing research note: ~/ops/customer-pain-cowork-cluster11-candidate-2026-05-28.md. The 2027-02 or 2027-03 Safety Lab issue is the candidate slot for the chapter-length treatment, once the FUSE-staleness and capability-detector hooks are shipped.

Cluster 12: Tool Call Parsing failures in Opus 4.7

Architectural axis

Opus 4.7 reaches a turn where the model intends to call a tool. The model emits a tool-use block. The harness reports back: the model's tool call could not be parsed; retry also failed. The session halts. Five filings between 2026-04-17 and 2026-05-27 pin four independent root-cause hypotheses for the same surface symptom — each consistent with the data the filer observed, none colliding with the others, together articulating four distinct mechanisms that could produce the central #62123 symptom.

Sub-clusters within Cluster 12

Defense status (operator-side)

The cluster's recovery surface is structurally narrower than that of the previously catalogued clusters. The earlier clusters (sub-agent observability, permission matching, etc.) all had operator-side defenses that could be installed via hooks, and cc-safe-setup ships forty-plus production hooks against them. Here, recovery requires interventions at layers (model attention, harness parser, serialization layer) that hooks cannot reach. The closest hook-shaped defenses are advisory: long-session-malformed-tool-call-detector (sub-pattern 12A, PR #406), extended-thinking-tool-use-mismatch-detector (12B, PR #419), spurious-malformed-notice-detector (12C, PR #423), and xml-format-leak-detector (12D, PR #424).

Upstream status

No public Anthropic acknowledgment of the broader pattern as of 2026-05-28. The central case #62123 has 21 reactions and 10 comments but no engineering response. The four sub-clusters pin distinct mechanisms; the upstream fix surface depends on which mechanism Anthropic prioritizes.

Where to read more

Free preview Gist articulating all four sub-clusters and the recovery limits: Tool Call Parsing Failures in Opus 4.7 — A Five-Issue Cluster (1,982 words, MIT). The 2027-01 Safety Lab issue treats this cluster as the lead chapter, including the four advisory hook designs with installation surfaces, configurations, and known limits.

How clusters are detected and tracked

The clusters above are not curated from internal data — they are reconstructed from public GitHub Issues, public reaction counts, and public release notes. Anyone can verify the cluster framing against the underlying evidence. The monthly cadence is:

  1. Continuous Issue scanning. The top ~50 most-reacted open issues in anthropics/claude-code are reviewed weekly. Reaction-count growth + filing-date proximity surface candidate clusters.
  2. Sub-pattern articulation. Once 3+ issues share a structural shape, the cluster is framed (architectural axis, sub-pattern enumeration, lifecycle event mapping).
  3. Defense hook implementation. Where possible, an operator-side cc-safe-setup hook is built that detects or prevents the failure shape. Each hook ships as a PR with tests.
  4. Monthly synthesis. The CC Safety Lab monthly issue compiles the month's cluster findings into a single readable chapter (12,000-20,000 characters per cluster), with install walkthroughs, evidence links, and the operator-side cadence to monitor for ongoing changes.
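Steps 1 and 2 above can be sketched as a small ranking-and-grouping routine. This is an illustration under stated assumptions, not the tracker's actual tooling: the `Issue` fields, the reviewer-assigned `shape` label, and the 45-day proximity window are all hypothetical, and the search URL assumes the GitHub REST issue-search endpoint accepts `sort=reactions`, which should be checked against the current GitHub documentation.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from urllib.parse import urlencode


@dataclass(frozen=True)
class Issue:
    number: int
    reactions: int
    filed: date
    shape: str  # hypothetical: structural label assigned by a human reviewer


def search_url(repo: str = "anthropics/claude-code", limit: int = 50) -> str:
    """Build a search URL for the most-reacted open issues (assumed API shape)."""
    query = {
        "q": f"repo:{repo} is:issue is:open",
        "sort": "reactions",
        "order": "desc",
        "per_page": limit,
    }
    return "https://api.github.com/search/issues?" + urlencode(query)


def top_reacted(issues, limit=50):
    """Step 1: keep the `limit` issues with the highest reaction counts."""
    return sorted(issues, key=lambda i: i.reactions, reverse=True)[:limit]


def candidate_clusters(issues, min_size=3, window_days=45, limit=50):
    """Step 2: a shape becomes a candidate once min_size issues share it
    and their filing dates fall within window_days of each other."""
    by_shape = {}
    for issue in top_reacted(issues, limit):
        by_shape.setdefault(issue.shape, []).append(issue)

    candidates = {}
    for shape, group in by_shape.items():
        group.sort(key=lambda i: i.filed)
        span = group[-1].filed - group[0].filed
        if len(group) >= min_size and span <= timedelta(days=window_days):
            candidates[shape] = [i.number for i in group]
    return candidates
```

The proximity window is the most arguable parameter here. Long-running clusters in this registry span more than a year, so a fixed window would only fit newly emerging clusters, and assigning the shape label remains a manual judgment.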

The twelve clusters tracked here represent ~11,820 combined user reactions across 400+ issues. None has an official upstream fix as of 2026-05-28. Operator-side defenses (hooks or terminal-level workarounds) cover nine of twelve clusters at the symptom level. Hook status for Clusters 6 through 12:

The permission matching cluster (Cluster 6) has two shipped hooks, with three axis-specific hooks remaining in design.

The Skills metadata cluster (Cluster 7) has one shipped hook, with two more in design for June 2026.

The v2.1.150 server-side prompt injection cluster (Cluster 8) has one shipped hook, with three audit hooks in design.

The AUP false-positive cluster (Cluster 9) has one shipped hook, with two more in design for January-February 2027.

The GrowthBook A/B override cluster (Cluster 10) has three shipped observational hooks (growthbook-flag-monitor, compact-dispatch-watchdog, permission-mode-drift-guard) covering all three defense paths in the original design. The server-pushed dispatch path itself remains unreachable from any operator surface.

The Cowork sandbox cluster (Cluster 11) has one shipped standalone script (the helper script, since hooks don't fire in the Cowork sandbox itself) plus three shipped CLI-side hooks (cowork-claude-md-load-checker, cowork-fuse-staleness-watcher, cowork-model-picker-advisor) that warn CLI users about Cowork-shaped failures before they switch surfaces.

The Tool Call Parsing cluster (Cluster 12) has four shipped advisory hooks covering all four sub-patterns at the advisory level: long-session-malformed-tool-call-detector for sub-pattern 12A (PR #406), extended-thinking-tool-use-mismatch-detector for sub-pattern 12B (PR #419), spurious-malformed-notice-detector for sub-pattern 12C (PR #423), and xml-format-leak-detector for sub-pattern 12D (PR #424). The recovery surface is structurally narrower than for other clusters because hooks cannot reach the layers (model attention, harness parser, serialization layer) where the failures originate; the advisory hooks at least point operators at the correct workaround layer per sub-pattern.

Get the monthly synthesis of new clusters as they emerge

CC Safety Lab Founder — ¥500/month (~$3.50), Founder pricing grandfathered.
4-8 incidents per month, 1 deep-dive failure case, 1-2 copy-paste safety hooks, updated checklist, product update notes.

Coming up: June (multi-account), July (AGENTS.md interop), August (Pro Max quota anomaly), September (Permission matching), October (Skills metadata), November (v2.1.150 server-side prompt injection), December (AUP false-positive on Opus).

Join as Founder → Ko-fi · Read the full Safety Lab page →

Free previews: May 2026 issue, full first chapter (3,500 words, English) · June 2026 issue preview (multi-account cluster opening, ~1,500 chars)

Related artifacts

This page is updated as clusters evolve. Last update: 2026-05-28 (Clusters 11 and 12 added: Cowork sandbox / Desktop remote-control failure surface, and Tool Call Parsing failures in Opus 4.7). Author: yurukusa. Cluster framings are independent operator-side analysis, not affiliated with Anthropic. Reaction counts and issue states are read from the public GitHub API at update time.