AI Coding Agents for QA: Part 4 — Why the Same Model Gives Different Test Results
In Part 3 I introduced Cursor and explained why IDE tools beat CLI tools for QA automation. Before we go deeper into Cursor features, there is a bigger question worth answering.

────────────────────────────────────────

𝐓𝐰𝐨 𝐄𝐧𝐠𝐢𝐧𝐞𝐞𝐫𝐬. 𝐒𝐚𝐦𝐞 𝐌𝐨𝐝𝐞𝐥. 𝐃𝐢𝐟𝐟𝐞𝐫𝐞𝐧𝐭 𝐑𝐞𝐬𝐮𝐥𝐭𝐬.

Engineer A asks GPT-5.4 to write a login test.
Gets back: a clean, structured test. It uses the team's existing fixtures. It follows their naming convention. It works on the first run.

Engineer B does the same thing. Same model. Same task.
Gets back: a generic, broken test. Hardcoded credentials. No page objects. Fails immediately.

────────────────────────────────────────

🚫 𝐌𝐨𝐬𝐭 𝐏𝐞𝐨𝐩𝐥𝐞 𝐁𝐥𝐚𝐦𝐞 𝐭𝐡𝐞 𝐌𝐨𝐝𝐞𝐥

"GPT is bad at tests."
"GPT doesn't understand Playwright."
"I need a better model."

That is the wrong diagnosis. The model is not the problem. Modern models are all capable of writing good test code.

Three other things determine quality.

────────────────────────────────────────

⚙️ 𝐋𝐚𝐲𝐞𝐫 𝟏: 𝐓𝐡𝐞 𝐓𝐨𝐨𝐥

As covered in Part 1, you never talk to the model directly.

► You ► Tool ► Model

The tool decides what to send to the model. What context. What files. What history.

Cursor sends your repo structure, open files, and recent edits.
A plain chat app sends only what you type into it.

Same model. Different tool. Completely different output.

────────────────────────────────────────

📁 𝐋𝐚𝐲𝐞𝐫 𝟐: 𝐑𝐞𝐩𝐨 𝐐𝐮𝐚𝐥𝐢𝐭𝐲

AI agents amplify whatever already exists in your project.

Good framework? The agent writes tests that slot right in.
No page objects, no fixtures, no structure? The agent writes whatever it can. Which is usually a mess.

This is the hard truth: AI cannot rescue a bad codebase. It makes it worse, faster.

The model is only as good as what it can see. If your repo has:
∙ Clear fixture files
∙ Consistent naming
∙ Reusable page objects
∙ Good test examples

The agent pattern-matches against all of that and writes code that fits.
If it sees nothing, it invents everything. Pure lottery.

────────────────────────────────────────

📝 𝐋𝐚𝐲𝐞𝐫 𝟑: 𝐓𝐡𝐞 𝐓𝐚𝐬𝐤 𝐒𝐩𝐞𝐜

"Write a login test" is not a task spec.
It is a hint.
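
To see why a hint alone is not enough, here is a minimal sketch of what the model might actually receive from each engineer. Everything in it is an assumption for illustration: `RepoSnapshot`, `build_request`, the file paths, and the `###` layout are hypothetical, and this is not how Cursor or any real tool formats its requests.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RepoSnapshot:
    """Hypothetical view of the project files a tool is able to read."""
    files: Dict[str, str] = field(default_factory=dict)  # path -> contents


def build_request(task: str, repo: Optional[RepoSnapshot]) -> str:
    """Assemble the text a tool might send to the model.

    Illustrative only: real tools use their own, more elaborate formats.
    """
    parts = []
    if repo is not None:
        # Every visible file becomes context the model can pattern-match on.
        for path, contents in sorted(repo.files.items()):
            parts.append(f"### {path}\n{contents}")
    parts.append(f"### Task\n{task}")
    return "\n\n".join(parts)


task = "Write a login test"

# Engineer B: plain chat window, no project context attached.
chat_request = build_request(task, repo=None)

# Engineer A: IDE agent reading a well-structured repo (made-up paths).
repo = RepoSnapshot(files={
    "tests/fixtures/auth.py": "def logged_out_page(page): ...",
    "tests/pages/login_page.py": "class LoginPage: ...",
    "tests/test_checkout.py": "def test_checkout_as_guest(checkout_page): ...",
})
agent_request = build_request(task, repo)

print(chat_request)
print("-" * 40)
print(agent_request)
```

The task line is identical in both requests. What differs is everything around it, and the model can only work with what it receives.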