Most multi-agent systems do not fail on capability. They fail because three things are moving at three different speeds and nobody built a system robust enough to keep them separate.
The three layers:
- Active now - what is running right now, mid-task, changing by the second
- Recently completed - outputs landing, coordination surfaces updating, changing by the minute
- Next input - the context snapshot assembled for the next task, rebuilt once per task and directly downstream of both
The speed difference is not the problem. The problem starts when one surface serves two of these at once. An agent reads the coordination surface to build its context snapshot for the next task. At the same moment, the loop is writing to that surface because a job just finished. What the agent gets is a half-updated read: not wrong enough to raise an error, but wrong in the way that is hardest to catch. It is plausible, stale in parts, and inconsistently assembled. The system keeps running and produces subtly unreliable output, with no clean signal telling you why.
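To make the failure concrete, here is a minimal sketch of a read that lands in the middle of a write. The field names (`last_finished`, `summary`) and the two-step write are assumptions for illustration, not part of any real setup, and a generator stands in for real concurrency so the interleaving is repeatable.

```python
# Hypothetical coordination surface: two fields that should always agree.
surface = {"last_finished": "task-1", "summary": "task-1 output"}


def loop_write(surface, task_id, summary):
    """Writer updates the surface in two steps.

    The yield marks the point where a reader could interleave.
    """
    surface["last_finished"] = task_id
    yield  # a reader may run here, between the two halves of the write
    surface["summary"] = summary


def build_snapshot(surface):
    """Reader copies whatever the surface says right now."""
    return dict(surface)


writer = loop_write(surface, "task-2", "task-2 output")
next(writer)                     # first half of the write lands
torn = build_snapshot(surface)   # reader runs mid-write
next(writer, None)               # second half of the write lands
clean = build_snapshot(surface)  # reader runs at a stable moment

print(torn)   # names task-2 as finished but still carries the task-1 summary
print(clean)  # both fields describe task-2
```

Nothing in `torn` would trip an error check: every field holds a valid value, and only the combination is wrong.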
A robust system fixes this at the level of clock and writer, not at the level of retry logic or error handling. In my setup:
- LIVE_STATE is the active-now surface. The kickoff writes a row when a task starts, and the loop removes it when the journal lands. Two moments, one row, never concurrent.
- The coordination surfaces are the what-finished layer. The loop writes, the kickoff reads. That direction never reverses.
- The prime is the context snapshot for each new task. It is assembled fresh from whatever the surfaces say at that exact stable moment, used once, then discarded. Not committed, not carried forward, not shared between tasks.
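The arrangement described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: the class `Surfaces`, the method names `kickoff` and `journal_landed`, the task ids, and the in-memory dict and list are all hypothetical stand-ins. It also runs single-threaded, so the "stable moment" is guaranteed by construction rather than enforced.

```python
from dataclasses import dataclass, field


@dataclass
class Surfaces:
    # LIVE_STATE: active-now rows, keyed by a hypothetical task id.
    live_state: dict = field(default_factory=dict)
    # Coordination surface: what finished, as (task_id, journal) pairs.
    finished: list = field(default_factory=list)

    def kickoff(self, task_id: str, description: str) -> tuple:
        """Start a task: read the finished layer, then add one LIVE_STATE row."""
        # The prime is an immutable copy, so later writes cannot change it.
        prime = tuple(self.finished)
        self.live_state[task_id] = description
        return prime

    def journal_landed(self, task_id: str, journal: str) -> None:
        """Loop side: record the output, then remove the LIVE_STATE row."""
        self.finished.append((task_id, journal))
        del self.live_state[task_id]


surfaces = Surfaces()
first_prime = surfaces.kickoff("task-1", "summarize inbox")
surfaces.journal_landed("task-1", "summary written")
second_prime = surfaces.kickoff("task-2", "draft reply")
```

The kickoff only ever reads `finished` and the loop only ever appends to it, so the direction of flow stays fixed. Each prime is a fresh tuple handed to one task and never stored back on a surface.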
Three surfaces, three clocks, one writer at a time on each. The snapshot is only as reliable as the moment it was taken. Build the system so it only takes snapshots at moments you can trust.
Most of the unpredictability in multi-agent systems is a timing problem, not a capability problem. Get the clocks right and the agents can be considerably simpler than you think and still produce robust, reliable output.