I stopped running everything on Opus. Here's the system I built instead.
Running Claude Code on Opus is powerful.
But most turns in a session are mechanical: reading files, writing boilerplate, making simple edits. You're paying top-tier rates for work Haiku could handle in its sleep.
Anthropic just released the Advisor Tool:
A server-side pattern where a cheap executor model consults Opus mid-generation for strategic guidance.
But it's API-only, so you can't use it inside Claude Code.
So I rebuilt the pattern inside Claude Code using what already exists.
The setup:
Opus stays in the orchestrator seat.
It plans. It makes architecture decisions. It reviews output. It never touches files directly unless the task genuinely needs Opus-level judgment.
Everything else gets dispatched to cheaper models:
  • Agent({ model: "haiku" }) — Claude subagents with full file access for simple edits
  • Agent({ model: "sonnet" }) — for multi-file changes that need moderate reasoning
  • A CLI tool (ask.py) that routes to Gemini, Kimi, MiniMax, or local Gemma via Ollama — for code generation, research, video analysis, anything where you just need text back
The routing logic:
Does the task need file access? → Agent tool (haiku/sonnet)
Is the code complex? → sonnet or gemini
Is the code simple? → haiku, kimi, or minimax (cheaper)
Need video analysis? → gemini --video (native, no frame extraction)
Need parallel research? → kimi-swarm (spawns up to 100 sub-agents)
Want zero API cost? → gemma running locally via Ollama
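The routing table above can be sketched as a single Python function. Everything here is illustrative: `route_task` and its return strings are stand-ins I made up, not the actual ask.py API.

```python
# Hypothetical routing sketch mirroring the decision rules above.
# None of these names come from the real ask.py tool.

def route_task(needs_files: bool, complex_code: bool = False,
               video: bool = False, parallel_research: bool = False,
               zero_cost: bool = False) -> str:
    if needs_files:                 # Agent tool keeps full file access
        return "agent:sonnet" if complex_code else "agent:haiku"
    if video:                       # native video analysis, no frames
        return "gemini --video"
    if parallel_research:           # fan-out research across sub-agents
        return "kimi-swarm"
    if zero_cost:                   # local model via Ollama, no API cost
        return "ollama:gemma"
    # Plain text-in/text-out code generation:
    return "gemini" if complex_code else "haiku"
```

The point of keeping it this dumb is that the orchestrator can apply it per task without any extra reasoning cost.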
When a subagent hits something it can't handle, it reports back NEEDS_GUIDANCE. Opus thinks it through and re-dispatches with better context. That's the advisor pattern: strategic guidance exactly when it's needed, cheap execution everywhere else.
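The escalation loop looks roughly like this. `dispatch` and `opus_advise` are pretend stand-ins for "run a cheap subagent" and "consult Opus"; neither is a real Claude Code API.

```python
# Hypothetical advisor-loop sketch. A real version would call the
# Agent tool for dispatch and the orchestrator model for guidance.

def dispatch(task: str, model: str, context: str = "") -> str:
    # Fake executor: fails on "tricky" tasks until it gets guidance.
    if "tricky" in task and not context:
        return "NEEDS_GUIDANCE"
    return f"done by {model}"

def opus_advise(task: str) -> str:
    # Fake Opus consultation returning strategic context.
    return f"strategic hint for: {task}"

def run(task: str, model: str = "haiku", max_rounds: int = 3) -> str:
    context = ""
    for _ in range(max_rounds):
        result = dispatch(task, model, context)
        if result != "NEEDS_GUIDANCE":
            return result
        context = opus_advise(task)   # escalate for guidance, then retry
    return "ESCALATE_TO_OPUS"         # cap retries; let Opus take over
```

Opus pays tokens only on the `opus_advise` calls; every successful turn runs at the cheap model's rate.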
Cost impact:
Take a 10-task session where 8 tasks go to Haiku and 2 to Sonnet. Your Opus tokens are only spent on planning and review, maybe 20% of the total token volume. The other 80% runs at Haiku rates.
With local models mixed in, it drops further.
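Back-of-the-envelope math on that 20/80 split, using made-up per-million-token prices purely to show the shape of the savings (real pricing varies by model and changes over time):

```python
# Illustrative only: these prices are placeholders, not Anthropic's rates.
OPUS_PRICE = 15.0    # $ per million tokens (assumed)
HAIKU_PRICE = 1.0    # $ per million tokens (assumed)

total_tokens = 1_000_000
opus_share, haiku_share = 0.20, 0.80   # planning/review vs execution

millions = total_tokens / 1e6
all_opus = OPUS_PRICE * millions
blended = (opus_share * OPUS_PRICE + haiku_share * HAIKU_PRICE) * millions

print(f"all-Opus: ${all_opus:.2f}  blended: ${blended:.2f}")
# 0.2 * 15 + 0.8 * 1 = 3.8 per million vs 15.0, roughly 75% cheaper
```

Swap the Haiku share toward a local Gemma at $0 and the blended rate converges on just the Opus planning slice.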
Open-sourced the whole thing:
Drop-in ask.py CLI (10 models, 5 providers + local Ollama), a /advisor slash command for Claude Code, and a ready-to-paste CLAUDE.md snippet that makes this the default behavior for every session.
No dependencies, stdlib Python only. API keys come from the macOS Keychain or environment variables.
The system around the AI is the intelligence. The model is just one variable, and it doesn't always need to be the most expensive one.
Clief Notes
skool.com/quantum-quill-lyceum-1116
Jake Van Clief, giving you the Cliff notes on the new AI age.