Opus 4.7 is 35% more expensive than you think.

Same $5 per million input tokens. Same $25 per million output. The pricing page says nothing moved from 4.6.

But if you're on Max, your weekly Opus quota drains faster than it did a month ago. Same prompts, same code, same workload. The number on the invoice didn't change. The number of tokens per request did.

Buried in Anthropic's own pricing doc, as a footnote under the model pricing table: "Opus 4.7 uses a new tokenizer compared to previous models, contributing to its improved performance on a wide range of tasks. This new tokenizer may use up to 35% more tokens for the same fixed text."

Read that again. Same fixed text. Up to 35% more tokens. That's not a price change you'd see on the sticker. It's a price change on every single request you send.

And honestly, that's only the first silent bump.

When Claude Code v2.1.117 shipped, the default effort level on Opus 4.7 moved to xhigh. Opus 4.6 and Sonnet 4.6 still default to high. If you never ran /effort to check, every call you've sent since the update has been reasoning harder, and burning more thinking tokens, than the same call a month ago. Haiku doesn't support effort levels at all, which honestly isn't a limitation. It's flat-rate reasoning you can actually budget against.

Two bumps stacked on the same workload. The Opus weekly cap on Max feels smaller because it effectively is.

The obvious move is to use Opus less. That's the wrong frame.

The right frame is that most of what you're running on Opus shouldn't be on Opus in the first place.

Open /agents in any recent Claude Code session and look at what Anthropic's own built-in subagents run on. Explore runs on Haiku. Claude Code Guide runs on Haiku. statusline-setup runs on Sonnet. Only Plan and general-purpose inherit whatever your main conversation is using. Anthropic shipped the cost routing pattern directly into the product.

Every custom subagent in your .claude/agents/ or ~/.claude/agents/ that doesn't set a "model:" field just inherits Opus and pays Opus rates for whatever it does.

Why did Anthropic put Explore on Haiku instead of Opus? Because Explore's job (glob, grep, read, summarize) is pattern matching that Haiku genuinely handles fine. They could've charged you the Opus premium for every file search and most people wouldn't have noticed. They didn't. The clue is literally right there in the product, and your custom subagents are ignoring it.

Here's why it matters as numbers. Haiku 4.5 scored 73.3% on SWE-bench Verified when it released in October 2025. Per Anthropic's own release notes it hits "90% of Sonnet 4.5's performance" in agentic coding and runs "up to 4-5 times faster than Sonnet 4.5". It even surpasses Sonnet 4 at some computer-use tasks. Pricing is $1 per million input and $5 per million output. Opus 4.7 is $5 and $25. Five to one on both sides.

On a typical subagent task (say, 50k tokens in and 5k out for a "go read these files and summarize what each one does"), Opus costs $0.375 per call. Haiku costs $0.075. Five to one, stamped onto every single call that subagent ever makes. On the API that's money. On Max it's quota, which honestly matters more because you can't buy your way out of a cap.

One more thing if you're doing this math. Subagents don't share the parent conversation's prompt cache. Every named subagent you spawn pays the full input load to bring itself up to speed (fork mode is the exception, forks inherit parent context wholesale and reuse the parent's cache, but forks are still experimental behind a flag as of v2.1.117).

So if your subagents keep reloading the same 100k-token context on Opus, each spawn is burning uncached input at $0.50 a pop. Move those same spawns to Haiku and the same load drops to $0.10. That gap compounds hard if you run a pipeline that spawns ten of them in a session.

So the fix. Three frontmatter levers and one environment variable.

For any individual subagent, add "model: haiku" to the frontmatter. One line, saved, done. Pair it with "disallowedTools: Write, Edit" if the subagent is read-only (most of them are), which means it physically can't edit files. Cheap, fast, and can't blow up your repo in the same config.

For skills and subagents that don't need deep reasoning, add "effort: low" to the frontmatter. It overrides the session effort only while that skill or subagent is active. Haiku ignores it (which is the point, the spend is already flat).

For the nuclear option, set CLAUDE_CODE_SUBAGENT_MODEL=haiku as an environment variable. That forces every subagent in the session onto Haiku regardless of what the individual frontmatter says. Useful for benchmarking the gap yourself before committing changes across your agents.

Personally, this is what I actually run. Main session on Opus 4.7 for the reasoning-heavy work it's genuinely good at. Every read-only or exploration subagent on Haiku 4.5 with "disallowedTools: Write, Edit". "effort: low" on any skill that just follows instructions without needing to think hard. That's it. Let Anthropic's own built-ins tell you which tasks belong on Haiku, they already did the homework. No need to optimize further than that.

15 comments