I looked at an AI leaderboard so you don’t have to...
📌 Quick translation: it’s basically “Top Trumps for LLMs”, but instead of a dragon vs a robot, it’s:
- memory
- reasoning
- tool use
- how often it doesn’t faceplant on a multi-step task
💡 What the table roughly suggests:
- GPT-5.2 xhigh is the all-rounder, best overall scores in this screenshot
- Claude Opus 4.5 is the “close second that often writes like it has read a book”
- Gemini 3 Pro Preview is the “I can remember your entire business plan” option because the context window is enormous
- The “medium” tiers are where you start seeing the classic symptoms:
  - randomly shortening your copy
  - ignoring half the brief
  - confidently inventing details
  - acting like schema markup is a vibe, not a format
🧠 What this means for you (and your future sanity):
- If you’re doing multi-step tasks (funnels, automations, CRM logic, debugging, structured content), choose the model that scores well on agentic/tool use
- If you’re pasting big inputs (long pages, brand rules, FAQs, 18 tabs worth of chaos), prioritise context window
- If you’re writing persuasive content, remember:
  - leaderboards measure test performance
  - your audience measures whether it sounds like a human who understands humans
🎯 My practical rule of thumb:
- Use the “big brain” tier when the job has consequences
- Use the fast one when you’re brainstorming and you don’t mind binning 30% of it
💎 Tiny challenge for today:
- Pick one task you keep avoiding (a sales page section, a follow-up sequence, a messy automation)
- Run it through a higher-tier model once
- Compare the time saved, the clarity gained, and how many times you mutter “for God’s sake” at the screen
P.S. If you want, comment MODEL ME and tell me what you’re using AI for (content, automations, offers, SEO). I’ll tell you which model mode to use for that job and the prompt structure that stops it going off piste.