Everyone has a take on which AI model is best. Almost nobody runs the test.
I put five models through the same coding task: GPT-5.5, Opus 4.7, Gemini 3.1, and two Qwen 3.6 models running locally for free via Ollama. Same prompt. One shot each.
Final rankings:
1. Qwen 3.6 27B dense - 8/10 - 794 lines - $0 (49 min on M1 Max)
2. Claude Opus 4.7 - 7/10 - 613 lines - $1.65 (4.5 min)
3. Gemini 3.1 Pro - 585 lines - $0.26 (~5 min)
4. GPT-5.5 - 6/10 - 750 lines - $0.37 (~7 min)
5. Qwen 3.6 35B A3B MoE - 855 lines - $0 (~6 min)
The 20%: more parameters did not mean better output. The MoE model was faster but inconsistent. The dense model was slower but delivered. That tradeoff is worth understanding before you pick a model for a real build.
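A back-of-the-envelope way to see the speed side of that tradeoff: per-token compute scales with *active* parameters, not total. The numbers below are read off the model names in this post ("35B A3B" taken as ~35B total with ~3B active per token; "27B dense" means all 27B fire every token), so treat them as assumptions, not measured figures.

```python
def active_params(total_b, active_b=None):
    """Parameters actually used per token, in billions.

    Dense models use all parameters; MoE models route each token
    through a small subset of experts (the 'active' count).
    """
    return total_b if active_b is None else active_b

dense = active_params(27)             # dense: every parameter fires per token
moe = active_params(35, active_b=3)   # MoE: only the routed experts fire

# Per-token decode FLOPs scale roughly linearly with active params,
# so the compute ratio is approximately:
ratio = dense / moe
print(f"MoE: ~{moe}B of 35B params per token, "
      f"~{ratio:.0f}x less compute than the 27B dense model")
```

That rough 9x compute gap is why the MoE model finished in ~6 minutes while the dense one took 49; what it doesn't capture is output quality, which is where the dense model won here.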
Full video breakdown is on YouTube.
Play every build: