Dual View Architecture - Full Orchestration Engine.
Here is a failure mode that ships more often than it should. The model that writes an output is also the model that checks it. You send the result back with "review this and flag anything weak," and the review skews toward approval, because a model reviewing its own work shares its own blind spots. This is a documented limitation, not a hypothesis. It is not universal. Plenty of teams already run multi-step pipelines, separate critic models, and output validators. But self-review as the only quality gate is still common, and for enterprise-grade output, where a confident wrong answer is a liability, it is worth solving properly. This post is how we approached it, where the design is genuinely sound, and where it is not. I want this community to review both. Why self-review is weak Two things work against a single model checking itself. The first is autoregressive momentum. A model picks each word partly from the words it already wrote, so the opening of an output conditions everything after it. A model that has generated thousands of reports has a strong format prior: summary, background, analysis, recommendation. Your spec might say to lead with the competitive threat and drop the background. That instruction competes with the prior, and a few sentences in, the prior often wins. The output looks like a report. It is not your report. The second is that evaluation has a prior too. In training data, reviews of polished work skew positive, so a model asked to "evaluate this" leans toward approval. A reviewer that is the same model, or the same model family, shares the writer's biases. Here is the honest version of the claim, and it is less dramatic than how this is usually sold. Prompting is not powerless, it is unreliable as your only quality control. Self-critique does catch real errors. It just catches fewer than an independent reviewer would, and you cannot tell from the output which case you got. So you do not throw prompting away. You add structure around it.