If your Max subscription says you're at 16% — and then refuses the next prompt with a rate-limit error — you are not imagining it, and you are not alone. Here's the documented pattern, why you can't see your real quota from inside Claude Code, and what you can actually measure today.
This is one of the largest sustained complaint clusters in anthropics/claude-code. The anchor issue, #16157 "Instantly hitting usage limits with Max subscription", has 720+ reactions and over a thousand comments. It is one of about ten open issues — together well over 2,000 reactions — describing the same family of problems on Max plans, filed steadily from December 2025 through 2026. As of this writing they are all still open.
The short version. Three separate things stack up: (1) Claude Code gives you no built-in way to see your real remaining quota, (2) the percentage it does show sometimes disagrees with when it actually cuts you off, and (3) several release boundaries appear to have changed how fast quota is consumed. You can't fix the platform side — but you can measure your own consumption so you know whether a limit is normal usage or an anomaly worth reporting.
First, the thing most people miss: the quota is shared. Anthropic's own documentation states that "both Pro and Max plans offer usage limits that are shared across Claude and Claude Code, meaning all activity in both tools counts against the same usage limits" (official help article). So a heavy morning in the Claude.ai chat app (or other surfaces that draw on the same plan) eats directly into the window you have left in Claude Code — which is the most common reason a session dies "too early" when your Claude Code usage alone looked light. Check your total plan usage, not just Claude Code, before concluding something is broken.
There is no CLI command that tells you "you have X% of your 5-hour window left" with the same numbers the limiter uses. Operators have asked for exactly this — #13585 "Add Quota Information Access to Claude Code CLI". Without it, every other symptom below is something you experience but can't independently verify.
Multiple users report being blocked while the usage indicator still shows plenty of headroom (see #29579 and #19673).
When the number you're shown and the number that gates you don't match, you can't plan your session around it — which is why measuring independently matters.
The strongest signal that this isn't just one person's misconfiguration is that several independent reporters name a specific version or date after which their quota started draining faster:
| Reported boundary | Issue | What changed for them |
|---|---|---|
| v2.1.1 | #16856 | ~4× faster rate consumption than previous versions |
| Opus 4.6 (vs 4.5) | #23706 | Noticeably higher token consumption per task |
| v2.1.89 | #41788 | Max 20 plan hit 100% ~70 min after reset — "never happened before v2.1.89" |
| March 23, 2026 | #38335 | 5-hour session limits exhausted abnormally fast from that date |
| v2.1.100+ | #46917 | One operator's reproducible report: identical payload, cache_creation ~20K tokens higher than v2.1.98 (server-side) |
| Late April 2026 | #54714 | Max 20x daily limit hit on Apr 28–29 with reduced usage — "limits appear silently tightened" |
These are user reports, not a confirmed Anthropic changelog entry — but four-plus independent version boundaries pointing the same direction is a different shape from a single misconfiguration.
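One way to check whether any of these boundaries shows up in your own logs is to group cache-creation tokens by client version. The sketch below is a minimal illustration, not how any of the linked reports were produced. It assumes each local JSONL entry carries a top-level `version` field and a `message.usage` object using the Anthropic API field name `cache_creation_input_tokens`; both the layout and the helper name `cache_creation_by_version` are assumptions to verify against your own files.

```python
import json
from collections import defaultdict
from statistics import median


def cache_creation_by_version(lines):
    """Return {version: (sample_count, median_cache_creation_tokens)}.

    Assumes (unverified) that each JSONL entry has a top-level "version"
    and a "message.usage.cache_creation_input_tokens" integer.
    """
    samples = defaultdict(list)
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank, truncated, or partially written lines
        if not isinstance(entry, dict):
            continue
        message = entry.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue
        value = usage.get("cache_creation_input_tokens")
        # Only count requests that actually created cache entries.
        if isinstance(value, int) and value > 0:
            samples[entry.get("version", "unknown")].append(value)
    return {ver: (len(vals), median(vals)) for ver, vals in samples.items()}
```

A higher median on a newer version is only a hint: unlike the #46917 report, this comparison does not hold the payload constant, so a change in your own workload can produce the same picture.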
Honestly: some of each, and from inside Claude Code you can't always tell which. Long agentic sessions, large contexts, extended thinking, and cache creation all legitimately burn quota fast — and Opus burns it faster than Sonnet. The way to separate "I'm using a lot" from "something changed" is to measure your own token consumption over time and look for a step change, not just a high number. If you see a sudden jump that lines up with an update, that's the report worth filing.
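The "step change, not just a high number" idea can be sketched in a few lines. This is an illustrative heuristic rather than the method used by any tool named on this page: given one token total per session or per day in chronological order, it picks the split that best separates the series into two tight groups, then reports it only if the later median is at least `ratio` times the earlier one. The function name, `min_window`, and the `1.5` threshold are all hypothetical choices.

```python
from statistics import median


def _spread(values):
    """Sum of absolute deviations from the median (robust to one-off spikes)."""
    m = median(values)
    return sum(abs(v - m) for v in values)


def find_step_change(tokens, min_window=3, ratio=1.5):
    """Return (split_index, median_before, median_after), or None.

    tokens: chronological per-session (or per-day) token totals.
    split_index is the first element of the "after" segment.
    """
    if len(tokens) < 2 * min_window:
        return None  # not enough data on both sides of any split
    candidates = range(min_window, len(tokens) - min_window + 1)
    # Best split = the one leaving both segments most internally consistent.
    split = min(
        candidates,
        key=lambda i: _spread(tokens[:i]) + _spread(tokens[i:]),
    )
    before = median(tokens[:split])
    after = median(tokens[split:])
    if before > 0 and after / before >= ratio:
        return split, before, after
    return None
```

For example, `find_step_change([100, 110, 90, 105, 400, 420, 390, 410])` returns `(4, 102.5, 405.0)`, while a series that is uniformly high but flat returns `None`. If the split index lines up with the session where you updated Claude Code, that before/after pair is the kind of evidence worth attaching to a report.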
Read your own session logs instead of guessing. Token Checkup and Token Consumption Diagnosis are browser-only, no signup. ccusage (open-source, npm) parses your local JSONL for a per-session breakdown.
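If you would rather read the logs yourself, the following is a rough sketch of a per-session breakdown. It is not how ccusage is implemented. It assumes the session logs are JSONL files under `~/.claude/projects/`, that each entry has a `sessionId`, and that assistant entries carry a `message.usage` object with the Anthropic API usage field names; treat the path, the entry layout, and the helper names as assumptions to check against your own install.

```python
import json
from collections import defaultdict
from pathlib import Path

# Usage field names as returned by the Anthropic API; whether every local
# log entry carries them is an assumption.
USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def summarize_usage(lines):
    """Sum token usage per session from an iterable of JSONL lines."""
    totals = defaultdict(lambda: dict.fromkeys(USAGE_FIELDS, 0))
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank, truncated, or partially written lines
        if not isinstance(entry, dict):
            continue
        message = entry.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue  # user turns, tool results, etc. carry no usage block
        session = entry.get("sessionId", "unknown")
        for field in USAGE_FIELDS:
            totals[session][field] += usage.get(field) or 0
    return dict(totals)


def iter_log_lines(root=None):
    """Yield every line of every *.jsonl file under the assumed log directory."""
    root = Path(root) if root is not None else Path.home() / ".claude" / "projects"
    for path in sorted(root.glob("*/*.jsonl")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            yield from handle
```

Calling `summarize_usage(iter_log_lines())` returns a dictionary keyed by session, which you can sort by `cache_creation_input_tokens` or `output_tokens` to see where a window actually went. These are local counts only; they are not the numbers the limiter uses and they exclude anything spent in the chat app.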
Free MIT hooks in cc-safe-setup watch your own usage and warn at session open or after a tool call: quota-anomaly-detector, session-rate-monitor, session-quota-tracker, and cache-creation-drift-detector (built for the #46917 inflation pattern).
Install the hooks in about ten seconds:
npx cc-safe-setup
It installs the safety hooks and lists the optional quota/rate monitors you can enable. Zero npm dependencies; the hooks need jq at runtime.
A separate, sharper failure that's easy to confuse with running out of quota: a fresh session — sometimes right after /clear — immediately throws
API Error: Usage credits required for 1M context · run /usage-credits to turn them on, or /model to switch to standard context
even though you picked the standard (non-[1m]) model in the picker and never asked for 1M context. It's reported on the plain CLI and the VS Code extension (#62199), on Sonnet 4.6 (#61692), on Opus 4.6/4.8 (#64764), in dispatch/background agents (#64534), and during compaction (#63896). The common thread: requests are being force-routed to the 1M-context tier, which trips the usage-credits gate, even when you didn't select it.
- Pin the context window to 200K by setting CLAUDE_CODE_DISABLE_1M_CONTEXT=1 as a shell export (not in the settings.json env block; the env-var path is honored more reliably). This is not a downgrade of the model's reasoning, only of the context window you weren't deliberately using, so it returns you to ordinary Opus/Sonnet on 200K, which is the behavior most people hitting this actually want. Caveat: there's an open report that this env var is itself silently ignored in some builds (#63479); if you set it and the gate still fires, that's that bug, not your setup.
- Switch to a different model with /model as a test, e.g. claude-opus-4-7. If 4.7 is not force-routed to 1M while 4.8 is, that narrows it to the 4.8 routing path and is useful signal for the report.
- If you enable usage credits, restart fully afterwards: enabling $0 credits often won't clear the gate without a full restart. A $0 limit may also not satisfy a gate that's looking for a positive allowance, so a small non-zero cap is worth testing separately.

This is distinct from the quota-visibility problem above: here the session isn't out of quota; it's being gated for a context tier you didn't choose.
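If you start Claude Code from a script rather than an interactive shell, the same pin can be applied to the child process environment. The sketch below is a hypothetical wrapper, assuming the CLI is on your PATH as `claude`; the helper names are illustrative, and the #63479 caveat applies here exactly as it does to a shell export.

```python
import os
import subprocess

PIN_VAR = "CLAUDE_CODE_DISABLE_1M_CONTEXT"


def env_with_200k_pin(base=None):
    """Return a copy of the environment with the 200K pin set.

    The input mapping is copied, never mutated.
    """
    env = dict(os.environ if base is None else base)
    env[PIN_VAR] = "1"
    return env


def launch_claude(args=()):
    """Start the CLI with the pin in its process environment (hypothetical wrapper)."""
    completed = subprocess.run(["claude", *args], env=env_with_200k_pin())
    return completed.returncode
```

Setting the variable on the process itself, rather than in a config file, matches the shell-export route described above and makes it easy to confirm the value was present when the gate fired.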
If your own measurement shows a step change in consumption that lines up with an update — not just a generally high number — that's a useful, verifiable report. Add it to the matching issue above (a reproducible before/after with version numbers is far more actionable than "limits feel wrong"), and check the failure-mode cluster tracker to see if it's already documented.
Yes. Anthropic's help article states the usage limits are "shared across Claude and Claude Code, meaning all activity in both tools counts against the same usage limits" (source). If Claude Code dies early but its own usage looked light, check whether you spent the window in the chat app first.
Several operators report this starting around v2.1.89 (#41788). Heavy agentic loops, large contexts, and extended thinking can genuinely exhaust a 5-hour window fast — but if it started suddenly after an update for you too, measure before/after and add it to that issue.
That display-vs-cutoff mismatch is reported in #29579 and #19673. Because Claude Code exposes no authoritative quota number (#13585), you can't confirm it from inside — the practical move is to log your own usage and report the discrepancy.
There's no built-in command for it yet. The closest is reading your local session JSONL with ccusage or the browser tools above to track cumulative consumption, then comparing against your plan's window.
One operator posted a reproducible case (#46917): the same payload showed cache_creation ~20K tokens higher than on v2.1.98, server-side. Treat it as a well-documented single report, not a confirmed platform-wide regression — the cache-creation-drift-detector hook exists to flag it if it's happening to you.
Requests are being force-routed to the 1M-context tier even on a standard model selection, which trips the credit gate (#62199, #64764, #61692). Pin 200K with a shell export CLAUDE_CODE_DISABLE_1M_CONTEXT=1 — it doesn't reduce the model, only the context window — though that env var is itself reported ignored in some builds (#63479). Enabling $0 credits often won't clear it without a full restart, and a $0 limit may not satisfy the gate. See the section above for the full workaround ladder.
Yes, materially — Opus is the heavier model. #23706 tracks the 4.6-vs-4.5 jump specifically. If you're hitting limits fast, switching routine work to Sonnet is one of the most direct levers.
Independent reference by an operator running Claude Code 800+ hours, maintainer of cc-safe-setup (free MIT safety + quota-monitoring hooks). Issue numbers and reaction counts are as of 2026-06-02 and move over time; the linked issues are the source of truth. Not affiliated with Anthropic. This page describes user reports and operator-side measurement — for billing or quota questions, confirm against Anthropic's own documentation and support.