Observability layer on top of ICM workspaces. Does our reasoning hold up?

Hi all,

We're a small AI consultancy in Germany (mid-market clients, running first pilot projects, fixed-fee model), and we've been exploring ICM as the delivery format for the workflows we ship to clients. Our reasons for switching from custom-coded pipelines to ICM are mostly the obvious ones: auditability for the upcoming EU AI Act, lower handover friction, and faster iteration with the client in the loop. Jake's paper articulates the case better than we could.

But while planning the commercial rollout, we kept running into the same question, and we'd love a sanity check from people who've thought about this longer than we have.

The observation:

ICM commoditizes the workflow itself. Once we hand a client a workspace, they can in principle edit it, fork it, or hire someone else to maintain it. That's a feature, not a bug, and it's exactly what makes the model trustworthy. But it also means our differentiation as a service provider has to move up the stack. The workspace can't be the moat.

The idea we're testing:

An observability layer that sits above a fleet of ICM workspaces, not inside them. Each stage emits a small telemetry event when it finishes (stage name, model used, tokens in/out, duration, success/failure, whether a human intervened). Events flow to a central collector. From that data we build:

- Per-workspace ROI metrics (time saved vs. baseline)

- LLM cost aggregation across providers

- Auto-generated compliance and audit documentation for the EU AI Act

- Drift alerts when new model versions are released ("here are the 3 workspaces likely to benefit from Claude 5, with an estimated quality delta for each")

- Optimization recommendations based on patterns across stages

The premise is that this layer is genuinely hard for the client to replicate, because aggregation, history and cross-workspace insights require infrastructure they don't want to run themselves. That, rather than maintenance and bug fixes alone, is what would justify recurring revenue.
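To make the stage-end event concrete, here is a minimal sketch of what one could look like. Nothing in it comes from an ICM spec: the field names, the `schema_version` field, the `emit` helper and the idea of appending to a local JSON Lines "outbox" file (which a separate shipper would forward to the collector) are all assumptions for illustration. The second function shows the simplest possible aggregation over that data, total tokens per model.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class StageEvent:
    # Hypothetical field names mirroring the list in the post; not an ICM standard.
    workspace: str
    stage: str
    model: str
    tokens_in: int
    tokens_out: int
    duration_s: float
    success: bool
    human_intervened: bool
    schema_version: int = 1


def emit(event: StageEvent, outbox: Path) -> None:
    """Append one event as a JSON line; the outbox stays an inspectable file."""
    with outbox.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(event)) + "\n")


def tokens_by_model(outbox: Path) -> dict[str, int]:
    """Sum input and output tokens per model across all events in the outbox."""
    totals: dict[str, int] = defaultdict(int)
    with outbox.open(encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            totals[record["model"]] += record["tokens_in"] + record["tokens_out"]
    return dict(totals)
```

Writing to a file first, rather than calling the collector directly, is a design choice in this sketch: the workspace keeps working offline, and the client can read exactly what leaves it.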

What we're unsure about:

1. Does emitting telemetry from each stage feel like a violation of the ICM principle that "everything is files you can inspect"? Our intuition is no, because the workspace itself stays self-contained and the emission is just a side effect at stage end, but we'd like to be told if that's naïve.

2. Has anyone tried collecting cross-workspace data at scale? We expect the messy part to be schema drift: every workspace evolves its stages over time, and the telemetry needs to stay backwards-compatible without becoming too bureaucratic.

3. For the audit-trail use case specifically, is there appetite in the community for a shared open spec? It feels like the kind of thing that would benefit from standardisation rather than every consultancy reinventing it.

4. And the meta-question: are we missing something more fundamental? If the whole "observability above ICM" thesis is wrong, we'd much rather hear that now than after we've built half of it.
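To illustrate the schema-drift concern in question 2, here is a minimal sketch of one possible approach: every event carries a version number, and the collector upgrades old events step by step on read. The version history below (v1 lacking `human_intervened`, v2 reporting `duration_ms` instead of `duration_s`) is invented purely for the example, as are all function and field names.

```python
from typing import Callable

CURRENT_VERSION = 3


def _v1_to_v2(event: dict) -> dict:
    # Hypothetical change: v2 added "human_intervened"; None means "not recorded".
    upgraded = dict(event)
    upgraded.setdefault("human_intervened", None)
    upgraded["schema_version"] = 2
    return upgraded


def _v2_to_v3(event: dict) -> dict:
    # Hypothetical change: v3 reports duration in seconds instead of milliseconds.
    upgraded = dict(event)
    if "duration_ms" in upgraded:
        upgraded["duration_s"] = upgraded.pop("duration_ms") / 1000
    upgraded["schema_version"] = 3
    return upgraded


# One small upgrade function per version step, applied in sequence.
UPGRADES: dict[int, Callable[[dict], dict]] = {1: _v1_to_v2, 2: _v2_to_v3}


def normalize(event: dict) -> dict:
    """Bring an event of any known version up to CURRENT_VERSION.

    Events without a version are treated as v1; the input is not mutated.
    """
    version = event.get("schema_version", 1)
    while version < CURRENT_VERSION:
        event = UPGRADES[version](event)
        version = event["schema_version"]
    return event
```

With this shape, workspaces never have to be re-deployed when the schema changes; only the collector needs the latest upgrade chain, which may keep the bureaucracy on the provider's side.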

Happy to discuss any of this in more depth. The thinking is at the "we believe this is right but haven't shipped it yet" stage, which is precisely when external pushback is most useful.

Cheers,

Nuvoro

————————————————————————————————

@Jake Van Clief

@David Vogel

@Bas Rosario
