Fabrication / False-Completion Self-Audit

When Claude Code says "done" or "tests passed," would your setup catch it if that were not true?

The pain. A recurring, well-documented Claude Code failure is the model reporting work it did not do: a fabricated tool result, a "task complete" that isn't, a "tests passed" when no test ran. Public examples on the anthropics/claude-code tracker include #33781 (tokens burned producing fabricated results), #44955 (repeated false verification), and #27430 (fabricated technical claims auto-published to eight platforms). The variant specific to Opus 4.8 is written up separately in "Why Opus 4.8 fabricates tool results."

What this does. Eight questions about your setup produce a score for how well your defenses catch a fabricated "done" before it costs you wasted tokens, a bad commit, or a published falsehood. Each gap comes with the operator-side hook pattern that closes it.

Three minutes. Local-only. No tracking — nothing leaves your browser.

1. Completion gate

Do you have a Stop hook that can block the model from ending a turn on an unverified "done" / "complete"?

Yes — a Stop hook checks for proof before the turn is allowed to end

Partly — I read the final message myself but nothing blocks automatically

No — "done" is taken at face value
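A completion gate of this kind can be sketched as a small decision function. The sketch assumes the Stop hook payload carries a `stop_hook_active` flag and that printing a JSON object with `"decision": "block"` and a `"reason"` keeps the turn open, as the Claude Code hooks documentation describes; check that against your installed version. The names `DONE_WORDS` and `stop_decision` and the `proof_found` argument are hypothetical, and how proof is established is left to the caller.

```python
import re

# Hypothetical list of phrases that count as a completion claim.
DONE_WORDS = re.compile(r"\b(done|complete|completed|finished|tests? passed)\b", re.I)

def stop_decision(payload: dict, last_message: str, proof_found: bool) -> dict:
    """Return the JSON object a Stop hook would print (empty dict = allow)."""
    # Assumption: stop_hook_active is set when a Stop hook already forced the
    # model to continue; allowing the stop here avoids an endless loop.
    if payload.get("stop_hook_active"):
        return {}
    if DONE_WORDS.search(last_message) and not proof_found:
        return {
            "decision": "block",
            "reason": "Completion claimed without proof. Run the check and show its output.",
        }
    return {}
```

A wrapper script would read the payload with `json.load(sys.stdin)`, obtain the final message and the proof flag by whatever means your setup provides, and print the returned object with `json.dumps`.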

2. Test-result verification

When the model says "tests passed," is that claim backed by a real command exit code that something actually checks, rather than just the model's word?

Yes — a hook or CI checks the actual exit code / output

Partly — tests run sometimes, but the claim isn't tied to a verified result

No — I trust "tests passed" as written

3. Evidence requirement

Are claims like "verified" / "implemented" / "fixed" required to carry concrete evidence (file paths, a diff, command output)?

Yes — claims without evidence are rejected or re-checked

Partly — I ask for evidence ad hoc

No — assertions stand on their own
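One rough way to enforce an evidence requirement is to flag paragraphs that make a claim but contain nothing that looks like evidence. The patterns below are hypothetical heuristics, not a complete detector: they accept a file-like token, a unified-diff hunk header, or a line starting with a `$` prompt, and they will misfire on things like version numbers.

```python
import re

CLAIM = re.compile(r"\b(verified|implemented|fixed)\b", re.I)
# Hypothetical evidence markers: a token with a file extension, a diff hunk
# header, or a shell-prompt line showing a command that was run.
EVIDENCE = re.compile(r"(\b[\w./-]+\.\w{1,5}\b|^@@ .* @@|^\$ \S+)", re.M)

def unsupported_claims(message: str) -> list[str]:
    """Return paragraphs that make a claim but carry no evidence marker."""
    paragraphs = [p.strip() for p in message.split("\n\n") if p.strip()]
    return [p for p in paragraphs if CLAIM.search(p) and not EVIDENCE.search(p)]
```

Anything this returns is a candidate for rejection or a re-check. A paragraph that passes has only shown the shape of evidence, so it still needs verifying.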

4. Pre-publish / pre-commit gate

Before the model commits, pushes, or sends anything outward, does a hook verify the claimed work actually exists?

Yes — a PreToolUse gate verifies before any outward action

Partly — for commits, but not for publishing / sending

No — outward actions run on the model's say-so
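A PreToolUse gate for outward actions might look like the sketch below. It assumes the hook payload exposes `tool_name` and `tool_input.command` for Bash calls and that exit code 2 blocks the call, per the Claude Code hooks documentation. The `OUTWARD` pattern and the `work_verified` flag are hypothetical; in practice the flag would come from a check like a passing test run or a non-empty diff.

```python
import re

# Hypothetical pattern for commands that send work outward.
OUTWARD = re.compile(r"\b(git\s+(commit|push)|npm\s+publish|gh\s+release)\b")

def pretooluse_exit_code(payload: dict, work_verified: bool) -> int:
    """Return 0 to allow the tool call, 2 to block it."""
    if payload.get("tool_name") != "Bash":
        return 0
    command = (payload.get("tool_input") or {}).get("command", "")
    if OUTWARD.search(command) and not work_verified:
        # Assumption: exit code 2 blocks the call and feeds stderr back to the model.
        return 2
    return 0
```

Publishing and sending usually go through other tools or commands, so extend the pattern to cover every outward path you actually use.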

5. File / change existence check

After the model claims it created or edited files, do you confirm they actually exist and changed?

Yes — a hook or script confirms the files / diffs are real

Partly — I notice when something is obviously missing

No — I assume the edits landed
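Confirming that claimed edits landed can be done with a filesystem check alone. In this sketch, `missing_or_stale` and the `turn_started` timestamp (seconds since the epoch, recorded when the turn began) are hypothetical names; comparing against `git diff --name-only` would be a stricter alternative.

```python
from pathlib import Path

def missing_or_stale(claimed: list[str], turn_started: float) -> list[str]:
    """Return claimed paths that do not exist or were not modified since turn_started."""
    problems = []
    for name in claimed:
        path = Path(name)
        if not path.is_file() or path.stat().st_mtime < turn_started:
            problems.append(name)
    return problems
```

An empty result means every claimed file exists and was touched during the turn. It does not show that the content is correct.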

6. Sub-agent result verification

When sub-agents report what they found or did, do you verify it rather than trust the summary?

Yes — sub-agent claims are checked before they're acted on

Partly — only when a result looks off

No — I take the sub-agent's report as fact

I don't use sub-agents
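One way to check a sub-agent summary is to ask for each finding as a file path plus a quoted snippet, then confirm the snippet really appears in that file. That reporting format is an assumption of this sketch, not something Claude Code enforces, and `unconfirmed_findings` is a hypothetical name.

```python
from pathlib import Path
from typing import Callable

def read_text(path: str) -> str:
    return Path(path).read_text(encoding="utf-8", errors="replace")

def unconfirmed_findings(
    findings: list[tuple[str, str]],
    reader: Callable[[str], str] = read_text,
) -> list[tuple[str, str]]:
    """Return (path, quoted_snippet) pairs whose snippet is not in the file."""
    failed = []
    for path, snippet in findings:
        try:
            content = reader(path)
        except OSError:
            content = ""  # missing or unreadable file: the finding is unconfirmed
        if not snippet or snippet not in content:
            failed.append((path, snippet))
    return failed
```

The `reader` parameter exists so the check can be exercised without touching the disk. By default it reads the real files.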

7. Context / instruction fabrication guard

Do you guard against the model inventing context — fabricated "warnings," made-up requirements, or claims about files it never read? (see #35357)

Yes — I cross-check the model's stated context against the real files

Partly — only on important decisions

No — I take the model's framing of context as given
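A cross-check against real files can start by comparing the files the model mentions with the files it actually read. The sketch assumes you already have the set of paths from logged Read calls; the `PATH_TOKEN` extension list and the function names are hypothetical, and the tail comparison assumes POSIX-style paths.

```python
import re
from pathlib import PurePosixPath

# Hypothetical heuristic: file-like tokens the model mentions in its message.
PATH_TOKEN = re.compile(r"[\w./-]+\.(?:py|js|ts|json|md|toml|yaml|yml)\b")

def _same_tail(full: str, mentioned: str) -> bool:
    """True if the mentioned path matches the trailing components of full."""
    tail = PurePosixPath(mentioned).parts
    return PurePosixPath(full).parts[-len(tail):] == tail

def cited_but_never_read(message: str, files_read: set[str]) -> set[str]:
    """Return mentioned files that match no path in the read log."""
    mentioned = set(PATH_TOKEN.findall(message))
    return {m for m in mentioned if not any(_same_tail(r, m) for r in files_read)}
```

A non-empty result is a prompt to look closer: the model may be describing a file from memory or inventing its contents.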

8. Audit trail

Do you log the tool calls that actually ran, so you can reconcile "what ran" against "what was claimed" afterward?

Yes — an activity log records the real tool calls

Partly — I have terminal scrollback but nothing structured

No — no record of what actually executed
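A structured audit trail can be a JSONL file with one record per tool call, plus a reconcile step. The sketch assumes a PostToolUse-style payload with `tool_name` and `tool_input` fields; the record layout and the function names are hypothetical, and writing the lines to a file is left to the caller.

```python
import json

def log_line(payload: dict, timestamp: str) -> str:
    """Build one JSONL record from a PostToolUse-style payload."""
    record = {
        "ts": timestamp,
        "tool": payload.get("tool_name"),
        "input": payload.get("tool_input") or {},
    }
    return json.dumps(record, sort_keys=True)

def claimed_but_not_run(claimed_commands: list[str], log_lines: list[str]) -> list[str]:
    """Return claimed shell commands that never appear in the log."""
    ran = {
        json.loads(line)["input"].get("command", "")
        for line in log_lines
        if line.strip()
    }
    return [c for c in claimed_commands if c not in ran]
```

Exact string matching is strict on purpose. Loosen it, for example by normalizing whitespace, only if the strict version produces too many false alarms.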

—

The lower the score, the more easily a fabricated "done" can pass through unchecked. After scoring, your gaps are listed with the most serious first.

Close every gap above — the full framework

The free cc-safe-setup hooks cover several of these patterns at no cost. For the complete defense — 130 documented fabrication cases, a 3-stage claim-verify framework, and 14 ready-to-install defenses keyed to the exact failure each one stops — see the Claim-Verify Handbook ($19). One avoided fabricated-commit cleanup pays for it.

Japanese readers: the same ground is covered in the 月刊の安全運用便 and the 事故防止ガイド (¥800).