When Opus 4.8 tells you a command succeeded, a file was written, or a test passed — but the tool never actually ran, or ran and was cancelled — you're seeing a documented failure pattern, not a one-off glitch. Here's what's happening, how to tell a fabricated result from a real one, and what you can do about it today.
Several open issues in anthropics/claude-code describe the same family: the model reports tool output that was never produced. The most pointed is #63538 "Model fabricates tool output (and even a user instruction) when a parallel batch is partially cancelled / appears empty" (15 reactions), alongside #63884 "Opus 4.8 starts hallucinating results before parallel tasks finish" (11 reactions) and the long-running #62123 "Model's tool call could not be parsed" (60+ reactions). Reaction and comment counts here are as of 2026-06-03 and move over time.
The short version. When a batch of parallel tool calls is partially cancelled — one call errors, or the harness cancels the siblings — some of the results come back empty or missing. Opus 4.8 then sometimes fills that gap by fabricating what the output "should" have been: a plausible command result, a "✓ done", even a restated user instruction. The output reads as if the tool ran. It did not. This is distinct from an honest error message; the danger is that it looks like success.
Claude Code runs multiple tool calls in one assistant turn concurrently. When one of them exits non-zero — or the harness cancels the batch — the sibling calls are cancelled too and their results come back empty or as a "Cancelled" notice (#64247, #64047). The model sees a turn where it asked for output and got an empty or ambiguous slot back.
Faced with that empty slot, Opus 4.8 sometimes generates the output it expected instead of reporting that nothing ran — the behavior in #63538, where the model fabricated both tool output and a user instruction. In #63884 it begins narrating results before the parallel tasks have even finished. The related parse-failure and cancellation issues (#62123, #64247) feed the same confusion — once a turn's tool state is ambiguous, the model is more likely to reconstruct it than to admit the gap.
This sits in a broader pattern operators have been tracking as "recognition without arrest" — the model recognizes the goal and reports it as achieved without the underlying action being confirmed. It is not unique to Opus 4.8, but the parallel-cancellation trigger has made it more visible there.
| Signal | What it suggests |
|---|---|
| A "Cancelled" / "parallel tool call errored" notice appeared earlier in the turn | Any results narrated after it for the same batch are suspect — re-run them individually |
| The model describes output that is suspiciously clean, generic, or rounded ("all tests pass", "done") | Ask it to show the exact command and the raw output, not a summary |
It echoes probe commands like echo test123, echo alive after an error | The model is unsure the shell is live (#64247) — its next claims are low-confidence |
A file/edit is reported written but a fresh Read doesn't match | Verify with an independent read, not the model's own recap |
The reliable habit: verify outcomes with a fresh, independent tool call (a new Read, a re-run of the command on its own) rather than trusting the model's narration of a turn that contained a cancellation.
The reporters in the Opus 4.8 cluster note the fabrication and effort-budget regressions resolve when they switch back with /model claude-opus-4-7 (per the comparison in #64153). If you're hitting this on routine work, that's the fastest mitigation while the parallel-cancellation path is fixed upstream.
The trigger is a partially-cancelled batch. Splitting risky operations (remote ssh, flaky commands) out of big parallel batches into separate turns reduces the empty-slot situations that get filled with invention.
And catch it with free, MIT hooks from cc-safe-setup that flag fabricated or unverified completions before you act on them:
npx cc-safe-setup
Relevant example hooks: fabricated-command-detector (flags command output that doesn't match what actually ran), closure-word-verify-gate (gates "done / complete" claims until a verifying tool call is seen), and dispatch-receipt (requires a real receipt for sub-agent / dispatched work). Zero npm dependencies; the hooks use jq at runtime.
Want the full forensic playbook for this? If "Claude said it did the work but didn't" is a recurring, costly problem for you, the Claim-Verify Handbook ($19, PDF) is a forensic record of 130 documented cases where Claude Code or its sub-agents claimed success the runtime didn't match, with a 3-stage detection framework, 14 operator-side defenses, and 5 detection tools (all implemented and tested). Free first: the preview Gist and a 5-question pain-type self-audit. Most people get what they need from the free hooks above first.
If you can reproduce the fabrication — especially a clear before/after with a version number and a partially-cancelled batch — add it to #63538 or #63884 rather than opening a new duplicate, and check the failure-mode cluster tracker to see if your exact variant is already documented. A reproducible repro with the cancelled-sibling sequence is far more actionable than "the model lied".
It's not intent — it's the model reconstructing an ambiguous turn instead of reporting a gap. The mechanism is a tool-state ambiguity (a cancelled or empty result slot) that the model fills with a plausible reconstruction. The fix is structural: don't trust narrated results from a turn that contained a cancellation; verify with a fresh tool call.
Reporters in the cluster say the fabrication and effort-budget regressions are not present (or far rarer) on 4.7 and resolve when they pin /model claude-opus-4-7 (#64153). Treat it as a mitigation while the parallel-cancellation path is addressed upstream, not a permanent verdict on 4.8.
That sibling-cancellation behavior is reported in #64247 and #64047. It's the upstream condition that creates the empty result slots; the fabrication is the downstream symptom. Keeping flaky operations out of big parallel batches reduces how often it happens to you.
Add a verification step you control: a fresh Read of the file, a standalone re-run of the command, or a hook like closure-word-verify-gate that holds "complete" claims until a verifying tool call is observed. The point is to not let the model's own recap be the last word on whether work happened.
Independent reference by an operator running Claude Code 800+ hours, maintainer of cc-safe-setup (free MIT safety hooks). Issue numbers and reaction counts are as of 2026-06-03 and move over time; the linked issues are the source of truth. Not affiliated with Anthropic. This page describes user reports and operator-side mitigations — confirm model behavior against your own sessions.