Stretching Claude Further: ICM to Orchestrate 2,350 Local Workers
We've been experimenting with treating ICM not as the whole system, but as one layer inside a larger orchestration architecture.
For us, ICM solved something much bigger than prompting.
It solved context.
How do you keep models focused?
How do you stop them from reading entire repositories?
How do you bound work?
How do you reduce drift?
How do you move toward convergence?
ICM gives us work packets, context contracts, routing, validation, and controlled handoffs.
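To make "work packets" and "context contracts" concrete, here is a minimal sketch of one way they could be modeled. The names `ContextContract`, `WorkPacket`, `add_context`, and `validate_output` are hypothetical and are not taken from ICM or from the starter pack files. The character budget is an assumed stand-in for a token budget.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContextContract:
    """Hypothetical contract: what a worker may read, and how much of it."""
    allowed_paths: tuple[str, ...]
    max_input_chars: int  # assumption: characters as a rough proxy for tokens


@dataclass
class WorkPacket:
    """Hypothetical work packet: one bounded task plus its permitted context."""
    task: str
    contract: ContextContract
    context: dict[str, str] = field(default_factory=dict)  # path -> text

    def add_context(self, path: str, text: str) -> None:
        # Refuse anything outside the contract instead of silently widening scope.
        if path not in self.contract.allowed_paths:
            raise PermissionError(f"{path} is outside the context contract")
        used = sum(len(t) for t in self.context.values())
        if used + len(text) > self.contract.max_input_chars:
            raise ValueError("context budget exceeded")
        self.context[path] = text


def validate_output(output: str, allowed: set[str]) -> bool:
    """Validation gate before a handoff: accept only the expected labels."""
    return output.strip().lower() in allowed


contract = ContextContract(allowed_paths=("src/auth.py",), max_input_chars=2000)
packet = WorkPacket(task="Does this file handle login?", contract=contract)
packet.add_context("src/auth.py", "def login(user): ...")
```

The point of the sketch is that the packet, not the worker, decides what gets read: a request for a file outside the contract fails loudly rather than pulling in the whole repository.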
Once we started implementing it, we found ourselves asking:
What happens if we build around that?
Internally we've been experimenting with a governance layer we call AQ-CMF (just our internal name for it), but I think the more interesting thing to share is the orchestration itself.
Right now it's basically a small "Swarm Orchestration Starter Pack."
The idea is simple:
Use the smallest model capable of doing the work.
Reserve larger models for judgment and reasoning.
Current setup:
RTX 3060 12GB
• 2,200 binary filtering workers
• Qwen 0.6B
• yes/no decisions
• triage
• filtering
• classification
RTX 5060 Ti 16GB
• 150 structured extraction workers
• Qwen 4B
• schema completion
• information extraction
• template generation
Kimi 70B
• up to 200 workers
• interpretation
• reasoning
• code generation
• higher-complexity analysis
Claude Code
• orchestration
• synthesis
• validation
• architecture decisions
• final judgment
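The "smallest model capable of doing the work" rule can be sketched as a lookup over tiers ordered from cheapest to most expensive. This is an illustration only, not the actual router file: the `Tier` class, the `route` function, and the task-kind strings are hypothetical, and the worker count for Claude Code is an assumption because the list above gives none.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    model: str
    max_workers: int
    capabilities: frozenset[str]


# Ordered cheapest first, mirroring the setup listed above.
TIERS = (
    Tier("filter", "Qwen 0.6B", 2200,
         frozenset({"yes_no", "triage", "filtering", "classification"})),
    Tier("extract", "Qwen 4B", 150,
         frozenset({"schema_completion", "information_extraction",
                    "template_generation"})),
    Tier("reason", "Kimi 70B", 200,
         frozenset({"interpretation", "reasoning", "code_generation",
                    "analysis"})),
    # Assumption: a single orchestrator; the source does not give a count.
    Tier("orchestrator", "Claude Code", 1,
         frozenset({"orchestration", "synthesis", "validation",
                    "architecture", "final_judgment"})),
)


def route(task_kind: str) -> Tier:
    """Return the first (cheapest) tier that lists this kind of task."""
    for tier in TIERS:
        if task_kind in tier.capabilities:
            return tier
    # Unknown kinds of work escalate to the top tier rather than being guessed at.
    return TIERS[-1]


print(route("triage").model)
print(route("code_generation").model)
```

Escalating unknown task kinds to the top tier is a design choice made for this sketch; a stricter router could reject them instead.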
The smaller models don't really "think."
They observe.
They classify.
They extract.
They filter.
Claude assembles.
Claude validates.
Claude decides.
One thing we've noticed is that this also changes the economics considerably.
Instead of paying frontier-model prices for every operation, we let local models perform the cheap labor.
Reading files.
Scanning repositories.
Filtering documents.
Classifying content.
Extracting structured information.
The paid models only see the subset of information that survives that process.
That means we spend fewer paid tokens on large models reading thousands of pages or hundreds of files just to decide whether something is relevant.
The local workers do the searching.
The local workers do the filtering.
The local workers do the extraction.
Claude and the larger reasoning models are used where judgment actually matters.
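The search, filter, extract, then judge flow described above can be sketched as a small funnel. Everything here is a stand-in: `run_funnel` is a hypothetical name, and the three lambdas are placeholders for a local yes/no worker, a local extraction worker, and a paid reasoning model. No real model is called.

```python
from typing import Callable, Iterable


def run_funnel(
    documents: Iterable[str],
    is_relevant: Callable[[str], bool],   # stand-in for a binary filter worker
    extract: Callable[[str], str],        # stand-in for an extraction worker
    judge: Callable[[list[str]], str],    # stand-in for the paid model
) -> dict:
    docs = list(documents)
    # Stage 1: cheap yes/no filtering over everything.
    survivors = [d for d in docs if is_relevant(d)]
    # Stage 2: structured extraction only on what survived.
    facts = [extract(d) for d in survivors]
    # Stage 3: the expensive model sees the extracted subset, nothing else.
    return {
        "scanned_chars": sum(len(d) for d in docs),
        "paid_chars": sum(len(f) for f in facts),
        "verdict": judge(facts),
    }


docs = [
    "Login timeout after 30s. Long stack trace follows here.",
    "Release notes for the UI theme.",
    "Database TIMEOUT on write. Retry logic is missing.",
]
result = run_funnel(
    docs,
    is_relevant=lambda d: "timeout" in d.lower(),
    extract=lambda d: d.split(".")[0],
    judge=lambda facts: f"{len(facts)} relevant finding(s)",
)
print(result)
```

Comparing `scanned_chars` with `paid_chars` gives a rough view of how much reading the local tiers absorbed before anything reached a paid model.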
For us, that has been less about replacing frontier models and more about extending them.
Subscriptions last longer.
API costs stretch further.
Expensive reasoning gets reserved for expensive problems.
The interesting realization has been that ICM scales surprisingly well into this kind of architecture.
ICM remains the execution discipline.
Context stays bounded.
Packets stay small.
Workers stay focused.
Claude stays reserved for the places where judgment actually matters.
At the moment this is mostly just two Python files.
One routes tasks to the appropriate worker tier.
The other dispatches Kimi workers in parallel.
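As a rough idea of what parallel dispatch can look like, here is a minimal sketch using the standard library's `concurrent.futures`. It is not the actual dispatcher file. `dispatch_parallel` and `fake_kimi_worker` are hypothetical names, and the fake worker stands in for whatever call reaches a Kimi worker. The cap of 200 comes from the "up to 200" figure in the setup list.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

MAX_KIMI_WORKERS = 200  # "up to 200" in the setup list


def dispatch_parallel(
    packets: list[str],
    call_worker: Callable[[str], str],
    max_workers: int = MAX_KIMI_WORKERS,
) -> list[str]:
    """Send each packet to a worker concurrently; results keep input order."""
    if not packets:
        return []
    # Never start more threads than there are packets to process.
    workers = min(max_workers, len(packets))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order the packets were submitted.
        return list(pool.map(call_worker, packets))


def fake_kimi_worker(packet: str) -> str:
    # Placeholder for a real model call (assumption: one packet in, one text out).
    return f"analysis of {packet}"


results = dispatch_parallel(["packet-a", "packet-b", "packet-c"], fake_kimi_worker)
print(results)
```

Threads suit this shape of work because each worker call mostly waits on a model response; an `asyncio` version would be an equally reasonable choice.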
Nothing magical.
No claims of AGI.
Just an experiment in seeing what happens when ICM becomes the execution layer for a larger swarm-style orchestration system.
Curious if anyone else here has been exploring something similar.
Here are the orchestration starter pack files:
Andre Cordero
Clief Notes
skool.com/cliefnotes