Stretching Claude Further: Using ICM to Orchestrate 2,350 Local Workers

We've been experimenting with treating ICM not as the whole system, but as one layer inside a larger orchestration architecture.

For us, ICM solved something much bigger than prompting.

It solved context.

How do you keep models focused?

How do you stop them from reading entire repositories?

How do you bound work?

How do you reduce drift?

How do you move toward convergence?

ICM gives us work packets, context contracts, routing, validation, and controlled handoffs.
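
To make "work packets" and "context contracts" a bit more concrete, here is a minimal Python sketch. The names ContextContract, WorkPacket, allowed_paths, max_chars, and validate_result are hypothetical and chosen only for illustration; this is not ICM's actual schema, just one way a bounded packet could look.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContextContract:
    """Hypothetical contract: which files a worker may see, and how much text."""
    allowed_paths: frozenset[str]
    max_chars: int


@dataclass
class WorkPacket:
    """Hypothetical work packet: one task plus its bounded context."""
    task: str
    contract: ContextContract
    context: dict[str, str] = field(default_factory=dict)  # path -> text

    def add_context(self, path: str, text: str) -> None:
        # Refuse anything the contract does not explicitly allow.
        if path not in self.contract.allowed_paths:
            raise PermissionError(f"{path} is outside the context contract")
        # Keep the packet small: enforce the character budget.
        used = sum(len(t) for t in self.context.values())
        if used + len(text) > self.contract.max_chars:
            raise ValueError("packet would exceed its context budget")
        self.context[path] = text


def validate_result(result: dict, required_keys: set[str]) -> bool:
    """Validation before handoff: the result must carry every required key."""
    return required_keys.issubset(result)
```

The point of the sketch is that the bound is enforced in code rather than requested in a prompt: a worker cannot be handed the whole repository by accident.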

Once we started implementing it, we found ourselves asking:

What happens if we build around that?

We've been experimenting with a governance layer we call AQ-CMF (just our internal name for it), but I think the more interesting thing to share is the orchestration itself.

Right now it's basically a small "Swarm Orchestration Starter Pack."

The idea is simple:

Use the smallest model capable of doing the work.

Reserve larger models for judgment and reasoning.

Current setup:

RTX 3060 12GB

• 2,200 binary filtering workers

• Qwen 0.6B

• yes/no decisions

• triage

• filtering

• classification

RTX 5060 Ti 16GB

• 150 structured extraction workers

• Qwen 4B

• schema completion

• information extraction

• template generation

Cloud reasoning layer (introduced to me by @Ari Evergreen's post: https://www.skool.com/cliefnotes/i-run-100-agent-workflows-on-a-budget-model-heres-the-catch)

• up to 200 Kimi 70B workers

• interpretation

• reasoning

• code generation

• higher-complexity analysis

Claude Code

• orchestration

• synthesis

• validation

• architecture decisions

• final judgment
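
The tiering above can be sketched as a small routing table in Python. This is an illustration, not the actual router: the Tier enum, the task labels, and the rule that unknown work goes straight to the top tier are all assumptions made for the example.

```python
from enum import IntEnum


class Tier(IntEnum):
    BINARY = 1   # Qwen 0.6B workers: yes/no, triage, filtering, classification
    EXTRACT = 2  # Qwen 4B workers: schema completion, extraction, templates
    REASON = 3   # cloud workers: interpretation, reasoning, code generation
    JUDGE = 4    # Claude Code: orchestration, synthesis, validation, judgment


# Hypothetical task labels mapped to the smallest tier that can handle them.
TASK_TIER = {
    "yes_no": Tier.BINARY,
    "triage": Tier.BINARY,
    "filtering": Tier.BINARY,
    "classification": Tier.BINARY,
    "schema_completion": Tier.EXTRACT,
    "information_extraction": Tier.EXTRACT,
    "template_generation": Tier.EXTRACT,
    "interpretation": Tier.REASON,
    "reasoning": Tier.REASON,
    "code_generation": Tier.REASON,
    "synthesis": Tier.JUDGE,
    "validation": Tier.JUDGE,
    "final_judgment": Tier.JUDGE,
}


def route(task_kind: str) -> Tier:
    """Pick the smallest capable tier; unknown work goes to the top (assumption)."""
    return TASK_TIER.get(task_kind, Tier.JUDGE)


def escalate(tier: Tier) -> Tier:
    """Move one tier up when a worker's output fails validation."""
    return Tier(min(tier + 1, Tier.JUDGE))
```

For example, route("triage") returns Tier.BINARY, and escalate(Tier.BINARY) returns Tier.EXTRACT, so a failed yes/no decision can be retried one level up instead of jumping straight to a paid model.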

The smaller models don't really "think."

They observe.

They classify.

They extract.

They filter.

Claude assembles.

Claude validates.

Claude decides.

One thing we've noticed is that this also changes the economics considerably.

Instead of paying frontier-model prices for every operation, we let local models perform the cheap labor.

Reading files.

Scanning repositories.

Filtering documents.

Classifying content.

Extracting structured information.

The paid models only see the subset of information that survives that process.

That means we spend fewer paid tokens on large models reading thousands of pages or hundreds of files just to decide whether something is relevant.
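
A back-of-the-envelope way to see the effect, in Python. Every number here is hypothetical (document count, tokens per document, survival rate); none of them are measurements from this setup, and the sketch ignores output tokens and the cost of running the local GPUs.

```python
def paid_tokens(num_docs: int, tokens_per_doc: int, survival_rate: float) -> int:
    """Input tokens the paid tier reads if only `survival_rate` of docs reach it."""
    survivors = round(num_docs * survival_rate)
    return survivors * tokens_per_doc


# Hypothetical workload: 1,000 documents of roughly 2,000 tokens each.
baseline = paid_tokens(1000, 2000, survival_rate=1.0)   # paid model reads everything
filtered = paid_tokens(1000, 2000, survival_rate=0.05)  # local workers pass 5% through

# Fraction of paid input tokens avoided under these assumed numbers.
savings = 1 - filtered / baseline
```

With these made-up inputs, the paid tier reads 100,000 tokens instead of 2,000,000. The real ratio depends entirely on how selective the local filters are and how often they wrongly drop something relevant.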

The local workers do the searching.

The local workers do the filtering.

The local workers do the extraction.

Claude and the larger reasoning models are used where judgment actually matters.
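
The search, filter, extract, then judge flow can be sketched as a simple cascade in Python. The function names and the toy stand-ins below are hypothetical; in a real setup each callable would wrap a model call (small local model for the first two stages, a paid model for the last), which is not shown here.

```python
from typing import Callable, Iterable, Optional


def cascade(
    docs: Iterable[str],
    is_relevant: Callable[[str], bool],        # cheap yes/no worker
    extract: Callable[[str], Optional[dict]],  # structured extraction worker
    judge: Callable[[list], str],              # expensive reasoning step
) -> str:
    """Run cheap stages first so the judge only sees what survives."""
    survivors = [d for d in docs if is_relevant(d)]
    records = [r for r in (extract(d) for d in survivors) if r is not None]
    return judge(records)


# Toy stand-ins for the three tiers (assumptions, not real model calls).
def keyword_filter(doc: str) -> bool:
    return "timeout" in doc.lower()


def extract_first_line(doc: str) -> Optional[dict]:
    stripped = doc.strip()
    return {"summary": stripped.splitlines()[0]} if stripped else None


def count_judge(records: list) -> str:
    return f"{len(records)} record(s) reached the paid tier"


docs = ["Timeout in worker pool\ndetails follow", "Release notes", "   "]
verdict = cascade(docs, keyword_filter, extract_first_line, count_judge)
```

Here only one of the three documents survives the filter, so the judge receives a single small record instead of three full documents.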

For us, that has been less about replacing frontier models and more about extending them.

Subscriptions last longer.

API costs stretch further.

Expensive reasoning gets reserved for expensive problems.

The interesting realization has been that ICM scales surprisingly well into this kind of architecture.

ICM remains the execution discipline.

Context stays bounded.

Packets stay small.