If "This request triggered cyber-related safeguards" or "appears to violate our Usage Policy" stopped you mid-task on your own legitimate engineering — and then every message after it was blocked too — you're hitting a documented false-positive cluster. Here's why one block kills the whole session, how to keep working today, and how to argue back the tokens it burned.
The usage-policy / "cyber" classifier over-triggers on ordinary, vendor-documented engineering: embedded firmware flashing and eFuse provisioning (#64405), routine sysadmin audit commands (#61185), hardening your own software (#63751), and the session getting stuck in a blocked state afterward (#62071). It's well past anecdote: The Register tracked the escalation from ~2–3 reports/month in mid-2025 to 30+ in April 2026 alone. Reaction and comment counts move over time; the linked issues are the source of truth.
The short version. The classifier doesn't just judge your latest message — sensitivity rises with the accumulated security-adjacent context in the session. A firmware or security session naturally piles up that vocabulary, so eventually a completely benign sentence ("flash the remaining boards", "continue") trips it. Once it does, every later turn re-reads the now-"poisoned" context and re-blocks. /clear or a fresh session is the only escape — which kills any in-flight work and re-bills you to rebuild context. You can't fix the classifier, but you can stop feeding it and argue the billing back.
The failure is not "one bad message." It's context accumulation: as a session collects security-shaped tokens (eFuse, secure-boot, iptables, /etc/shadow, exploit/audit terminology — even when entirely legitimate), the running context looks more and more like the thing the classifier is trained to stop. A single large security-shaped tool output can do it in one shot (#61185: a 17,000-line blocklist cat). After the first trip, the poisoned context is re-sent every turn, so even "ok" gets blocked (#62071, #63751). That's why the session, not the message, is what dies.
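The accumulation effect can be pictured with a toy model. This is only an illustration of the reasoning above, not the real classifier, whose internals are not public: `SECURITY_TERMS`, the one-point-per-term scoring, and `THRESHOLD` are all made-up values chosen for the example.

```python
# Toy model only: the real classifier's internals are not public.
# SECURITY_TERMS, the scoring rule, and THRESHOLD are hypothetical.
SECURITY_TERMS = {"efuse", "secure-boot", "iptables", "/etc/shadow", "exploit"}
THRESHOLD = 3  # hypothetical trip point


def context_score(context: list[str]) -> int:
    """Count security-shaped terms across the whole context, not just the last message."""
    text = " ".join(context).lower()
    return sum(text.count(term) for term in SECURITY_TERMS)


def run_session(messages: list[str]) -> list[bool]:
    """Return, per turn, whether the toy classifier blocks that turn."""
    context: list[str] = []
    blocked: list[bool] = []
    for message in messages:
        context.append(message)  # every turn re-sends everything so far
        blocked.append(context_score(context) >= THRESHOLD)
    return blocked


session = [
    "burn the eFuse keys for secure-boot",  # two terms so far
    "check iptables on the build host",     # third term: the session trips
    "continue",                             # benign, but the context is unchanged
    "ok",                                   # benign, still blocked
]
print(run_session(session))  # [False, True, True, True]
print(run_session(["ok"]))   # [False]: the same message passes in a fresh session
```

In this sketch the score never goes down, because nothing is ever removed from `context`. That is the property that makes a fresh session the way out.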
The over-trigger signature is Opus-family-wide; operators report the same workflows pass far more often on a Sonnet variant. Switch with /model before the firmware/audit/security session. If Sonnet passes the exact message Opus blocked, that's both an immediate unblock and clean evidence for your report.
Keep heavy secure-boot / eFuse / exploit-analysis discussion in a separate session from the execution turns, and keep execution terse ("flash board 3"). Read large security files with head -200 / grep -c instead of dumping the whole thing into context. Less accumulated security shape = fewer trips.
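If you script your file reads, the same size-capping idea looks roughly like this. It is a minimal sketch of the `head -200` / `grep -c` habit in Python; the function names `head` and `count_matches` are illustrative, and `count_matches` does plain substring matching (closer to `grep -cF`) rather than regular expressions.

```python
from itertools import islice
from pathlib import Path


def head(path: str, max_lines: int = 200) -> list[str]:
    """Return at most max_lines lines without reading the rest of the file."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, max_lines)]


def count_matches(path: str, needle: str) -> int:
    """Count lines containing needle, streaming the file line by line."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if needle in line)
```

Either call returns a bounded summary (a capped list of lines, or a single integer), so a very large file contributes only a small amount of text to the session.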
If you run hooks, the free MIT cc-safe-setup repo ships four advisory-only hooks built for exactly this cluster — none of them block, they just warn or break the bleed:
npx cc-safe-setup
| Hook | What it does |
|---|---|
| aup-large-tool-output-warner | PreToolUse: warns before a cat/find on a security-shaped path dumps a large output that can flip the classifier, and suggests a size-capped variant |
| aup-retry-loop-guard | PostToolUse: detects 3+ blocks in a short window on the same tool and tells you to swap/restart before you re-ingest context on retries, which stops the double-billing loop |
| aup-block-pattern-logger | Logs each block (timestamp / model / pattern) so you can show the classifier-shift over time when you report |
| aup-false-positive-helper | SessionStart advisory naming the cluster and the swap/refund paths |
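To make the retry-loop idea concrete, here is a sketch of sliding-window detection. It is not the cc-safe-setup implementation, which is not reproduced here: the class name `RetryLoopGuard`, the 120-second window, and the advisory wording are assumptions for illustration. Only the "3+ blocks on the same tool" rule comes from the hook description.

```python
from collections import deque
from typing import Optional

WINDOW_SECONDS = 120.0  # assumed length of a "short window"
MAX_BLOCKS = 3          # the "3+ blocks" rule from the hook description


class RetryLoopGuard:
    """Track block timestamps per tool and advise when they cluster."""

    def __init__(self) -> None:
        self._blocks: dict[str, deque] = {}

    def record_block(self, tool: str, now: float) -> Optional[str]:
        """Record one block; return an advisory once the threshold is reached."""
        stamps = self._blocks.setdefault(tool, deque())
        stamps.append(now)
        # Drop blocks that have fallen out of the sliding window.
        while stamps and now - stamps[0] > WINDOW_SECONDS:
            stamps.popleft()
        if len(stamps) >= MAX_BLOCKS:
            return (
                f"{len(stamps)} blocks on {tool} within {WINDOW_SECONDS:.0f}s: "
                "stop retrying; switch model or start a fresh session."
            )
        return None  # advisory only: this sketch never blocks anything
```

A caller would pass `time.time()` as `now` each time a block is observed. Taking the timestamp as an argument keeps the logic easy to test.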
A blocked request that still consumes tokens, then forces a full-context-rebuild restart, charges you twice: the work doesn't happen and you pay to rebuild the context the kill-switch threw away. That is a defensible refund case — but the framing matters. Don't ask for a generic refund; frame it as:
"Tokens consumed by a classifier false-positive and the forced session rebuild it caused — not by requested output," with the request IDs from the blocked turns attached.
Capture the request ID shown with each block (the req_… string) at the time it happens — it's the single most actionable thing you can give support, and the vendor-defect-cost framing is the one that tends to land versus "the filter is too aggressive."
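If you keep a transcript or log of the session, pulling the IDs out afterwards can be scripted. This is a sketch under one stated assumption: that request IDs appear as `req_` followed by letters and digits. The helper names and the sample IDs are placeholders, not real values.

```python
import re

# Assumption: request IDs look like "req_" followed by letters and digits.
REQUEST_ID = re.compile(r"\breq_[A-Za-z0-9]+")


def collect_request_ids(transcript: str) -> list[str]:
    """Return unique request IDs in first-seen order."""
    return list(dict.fromkeys(REQUEST_ID.findall(transcript)))


def refund_note(request_ids: list[str]) -> str:
    """Draft the support note using the framing quoted above."""
    ids = ", ".join(request_ids) or "(none captured)"
    return (
        "Tokens consumed by a classifier false-positive and the forced "
        "session rebuild it caused, not by requested output. "
        f"Blocked request IDs: {ids}"
    )


# Placeholder IDs for illustration only.
log = "blocked (req_EXAMPLE1)\nretry blocked req_EXAMPLE2\nblocked again req_EXAMPLE1"
print(collect_request_ids(log))  # ['req_EXAMPLE1', 'req_EXAMPLE2']
```

Deduplicating while preserving order keeps the note short when the same blocked turn shows up several times in a log.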
This classifier keeps changing. The trigger surface, the version boundaries, and the workarounds shift release to release — what passes on one build blocks on the next. The free hooks and field guide above are the operator playbook; if you want the evolving picture (new trigger patterns, version boundaries, and defenses as they're found) tracked monthly instead of re-searching GitHub each time, that's what the Claude Code Safety Lab digest (¥500/mo) is for. Start free: the Cluster 9 field guide and the 4-question diagnostic that routes you to the highest-leverage path for your model and frequency.
The product-native paths are inconsistent (/feedback and /bug are unavailable in some surfaces) and the Cyber Verification Program appeal often auto-declines individual developers, so a clean public repro plus a private form submission is currently the best combination. Add your case to the matching open issue rather than filing a new duplicate — #64405 (embedded/firmware), #61185 (sysadmin), #63751 (own-software hardening) — and 👍 the request for a first-class private false-positive report path (#64287). A clean repro with the exact two messages (one that passes, one that blocks) and the request IDs is far more actionable than "the filter is too cautious." Check the failure-mode cluster tracker to see if your variant is already documented.
Even a harmless follow-up such as "ok" gets blocked after the first trip because the block isn't about that message: it's the accumulated context being re-sent each turn. Once the session holds enough security-shaped content to trip the classifier, every subsequent turn re-reads it and re-blocks. The fix is a fresh session (or /clear), which is also why it costs you a context rebuild.
Switching models often helps: the over-trigger signature is Opus-family-wide and the same workflows pass more often on Sonnet. It's a mitigation, not a guarantee; treat a Sonnet pass and an Opus block on the identical message as both an unblock and a data point for your report.
A refund is a defensible case when the tokens were consumed by the false-positive and the forced restart rather than by output you asked for. Frame it exactly that way, attach the req_… IDs from the blocked turns, and submit through support / the Cyber Verification Program false-positive form. The vendor-defect-cost framing lands better than a generic refund request.
A block does not mean your work is the problem. Flashing your own boards, auditing your own systems, hardening your own software, and reviewing security are first-class, documented engineering. The classifier cannot reliably tell legitimate work from abuse yet, and that is the defect (#64405). Keep a clean repro; you're not the outlier.
Independent reference by an operator running Claude Code 800+ hours, maintainer of cc-safe-setup (free MIT safety hooks). Issue numbers and reaction counts are as of 2026-06-03 and move over time; the linked issues are the source of truth. Not affiliated with Anthropic. This page describes user reports and operator-side mitigations — it is not legal or account advice; for Usage Policy questions, confirm against Anthropic's own documentation and support.