Everyone has a take on which AI model is best. Almost nobody runs the test.
I put five models through the same coding task: GPT-5.5, Opus 4.7, Gemini 3.1, and two Qwen 3.6 models running locally for free via Ollama. Same prompt. One shot each.
Final rankings:
1. Qwen 3.6 27B dense - 8/10 - 794 lines - $0 (49 min on M1 Max)
2. Claude Opus 4.7 - 7/10 - 613 lines - $1.65 (4.5 min)
3. Gemini 3.1 Pro - 585 lines - $0.26 (~5 min)
4. GPT-5.5 - 6/10 - 750 lines - $0.37 (~7 min)
5. Qwen 3.6 35B A3B MoE - 855 lines - $0 (~6 min)
The 20%: more parameters did not mean better output. The MoE model was faster but inconsistent. The dense model was slower but delivered. That tradeoff is worth understanding before you pick a model for a real build.
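A back-of-the-envelope way to see the speed side of that tradeoff: per-token compute scales with *active* parameters, not total. The numbers below are read off the model names in this post ("35B A3B" taken as ~35B total with ~3B active per token; "27B dense" means all 27B fire every token), so treat them as assumptions, not measured figures.

```python
def active_params(total_b, active_b=None):
    """Parameters actually used per token, in billions.

    Dense models use all parameters; MoE models route each token
    through a small subset of experts (the 'active' count).
    """
    return total_b if active_b is None else active_b

dense = active_params(27)             # dense: every parameter fires per token
moe = active_params(35, active_b=3)   # MoE: only the routed experts fire

# Per-token decode FLOPs scale roughly linearly with active params,
# so the compute ratio is approximately:
ratio = dense / moe
print(f"MoE: ~{moe}B of 35B params per token, "
      f"~{ratio:.0f}x less compute than the 27B dense model")
```

That rough 9x compute gap is why the MoE model finished in ~6 minutes while the dense one took 49; what it doesn't capture is output quality, which is where the dense model won here.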
Full video breakdown is on YouTube.
Play every build: