When Claude Code says "done" or "tests passed" — would your setup catch it if that were not true?
The pain. A recurring, well-documented Claude Code failure is the model reporting work it did not do: a fabricated tool result, a "task complete" that isn't, a "tests passed" with no test that ran. Public examples on the anthropics/claude-code tracker include #33781 (tokens burned producing fabricated results), #44955 (repeated false verification), and #27430 (fabricated technical claims auto-published to eight platforms). The model-specific Opus 4.8 variant is written up separately in why Opus 4.8 fabricates tool results.
What this does. Eight questions about your setup produce a score for how well your defenses catch a fabricated "done" before it costs you wasted tokens, a bad commit, or a published falsehood. Each gap you report comes with the operator-side hook pattern that closes it.
Three minutes. Local-only. No tracking — nothing leaves your browser.
1. Completion gate
Do you have a Stop hook that can block the model from ending a turn on an unverified "done" / "complete"?
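A minimal sketch of the decision logic behind such a completion gate, in Python. It assumes the Stop hook runs a command that receives the event as JSON on stdin, that printing a JSON object with "decision": "block" and a "reason" keeps the turn open, and that the payload carries a stop_hook_active flag. Check all three against the hooks documentation for your Claude Code version. The names decide_stop, CLAIM_PATTERN, and the verified flag are hypothetical.

```python
import re

# Phrases that signal a completion claim. Illustrative only, not exhaustive.
CLAIM_PATTERN = re.compile(
    r"\b(done|complete|completed|finished|tests? pass(ed)?)\b",
    re.IGNORECASE,
)


def decide_stop(last_message, verified, stop_hook_active=False):
    """Return a block decision dict, or None to let the turn end.

    last_message: the model's final message text (how you obtain it is up to you).
    verified: True only if your own checks produced evidence for the claim.
    stop_hook_active: assumed payload flag meaning a Stop hook already
        forced a continuation; honoring it avoids an endless loop.
    """
    if stop_hook_active:
        return None
    if CLAIM_PATTERN.search(last_message) and not verified:
        return {
            "decision": "block",
            "reason": "Completion claimed without verification; "
                      "run the checks and show their output.",
        }
    return None
```

A hook script built on this would read the event from stdin, set verified from its own evidence (for example a recent passing test run), and print the returned dict with json.dumps whenever it is not None.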
2. Test-result verification
When the model says "tests passed," is that claim backed by the exit code of a command that actually ran, checked by something other than the model?
3. Evidence requirement
Are claims like "verified" / "implemented" / "fixed" required to carry concrete evidence (file paths, a diff, command output)?
4. Pre-publish / pre-commit gate
Before the model commits, pushes, or sends anything outward, does a hook verify the claimed work actually exists?
5. File / change existence check
After the model claims it created or edited files, do you confirm they actually exist and changed?
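A file check of this kind can be sketched as a before/after hash comparison. The helper names below (file_digest, snapshot, unverified_claims) are hypothetical, and the sketch assumes you can capture a snapshot before the model starts work and obtain the list of paths it claims to have touched.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of the file's bytes, or None if the file does not exist."""
    p = Path(path)
    if not p.is_file():
        return None
    return hashlib.sha256(p.read_bytes()).hexdigest()


def snapshot(paths):
    """Record the current digest of every path (None for missing files)."""
    return {str(p): file_digest(p) for p in paths}


def unverified_claims(before, claimed_paths):
    """Return (path, problem) pairs for claimed files that are missing
    or byte-identical to the earlier snapshot."""
    problems = []
    for path in claimed_paths:
        now = file_digest(path)
        if now is None:
            problems.append((str(path), "missing"))
        elif before.get(str(path)) == now:
            problems.append((str(path), "unchanged"))
    return problems
```

An empty list means every claimed file exists and differs from its snapshot. It does not show that the change is correct, only that a change happened.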
6. Sub-agent result verification
When sub-agents report what they found or did, do you verify it rather than trust the summary?
7. Context / instruction fabrication guard
Do you guard against the model inventing context — fabricated "warnings," made-up requirements, or claims about files it never read? (see #35357)
8. Audit trail
Do you log the tool calls that actually ran, so you can reconcile "what ran" against "what was claimed" afterward?
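The reconciliation step can be sketched as an append-only JSON Lines log plus a comparison against the commands the model says it ran. This assumes a hook that fires after each tool call and receives an event with tool_name and tool_input fields, where shell calls carry a command string; treat those field names as assumptions to confirm against your hook payloads. The function names are hypothetical.

```python
import json
from pathlib import Path


def log_tool_call(log_path, event):
    """Append one tool call to a JSON Lines audit log."""
    record = {
        "tool": event.get("tool_name"),
        "input": event.get("tool_input") or {},
    }
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def commands_that_ran(log_path):
    """Collect every logged command string."""
    ran = set()
    path = Path(log_path)
    if not path.exists():
        return ran
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        command = json.loads(line).get("input", {}).get("command")
        if command:
            ran.add(command)
    return ran


def claimed_but_not_run(log_path, claimed_commands):
    """Return claimed commands that never appear in the audit log."""
    ran = commands_that_ran(log_path)
    return [c for c in claimed_commands if c not in ran]
```

Exact string matching is deliberately strict: a claimed "pytest" will not match a logged "pytest -q". Loosen the comparison only if you accept the extra false negatives.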
The lower the score, the more a fabricated "done" can pass through unchecked.
The free cc-safe-setup hooks cover several of these patterns at no cost. For the complete defense — 130 documented fabrication cases, a 3-stage claim-verify framework, and 14 ready-to-install defenses keyed to the exact failure each one stops — see the Claim-Verify Handbook ($19). One avoided fabricated-commit cleanup pays for it.
Japanese readers: the same ground is covered in the 月刊の安全運用便 and the 事故防止ガイド (¥800).