Okay, this one made me do a double take: 12 million tokens in one context window is wild.
Subquadratic just launched a new AI model architecture that it claims handles massive context far more efficiently than dense attention, which could change how we think about RAG, coding agents, research tools, and long-running automations.
- 12-million-token context window available through an API
- Subquadratic Selective Attention, built to avoid quadratic attention costs (see the toy sketch after this list)
- Claimed linear scaling in compute and memory
- 52x faster than dense attention at 1M tokens
- 92.1% needle-in-a-haystack retrieval at 12M tokens
- MRCR v2 score of 83, reportedly beating GPT-5.5
- Potentially less chunking, routing, and context stitching
- Bigger working memory for AI agents and automation flows
- Coding agent and deep research tool launching in beta
- Still early, with big claims that need real-world testing
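
Subquadratic hasn't said how Selective Attention actually works, so take this as a generic illustration of the arithmetic rather than their method: a toy NumPy sketch where each query keeps only its top-k keys. Everything here (the function names, the top_k parameter, the selection trick) is my own stand-in, not their architecture.

```python
# Toy comparison: dense attention vs. a generic top-k "selective" variant.
# NOT Subquadratic's published method; the company hasn't detailed how
# Selective Attention works, so this only illustrates the shape of the saving.
import numpy as np

def dense_attention(q, k, v):
    """Standard attention: every query scores every key -> O(n^2) work and memory."""
    scores = q @ k.T / np.sqrt(q.shape[-1])               # (n, n) score matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def topk_selective_attention(q, k, v, top_k=32):
    """Each query attends only to its top_k keys -> O(n * top_k) mixing work.
    (Caveat: this sketch still builds the full score matrix to pick the top-k;
    a real subquadratic method has to make the selection itself cheap too.)"""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]   # (n, top_k)
    selected = np.take_along_axis(scores, idx, axis=-1)
    weights = np.exp(selected - selected.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("nk,nkd->nd", weights, v[idx])              # gather + mix

if __name__ == "__main__":
    n, d = 512, 64
    rng = np.random.default_rng(0)
    q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
    dense = dense_attention(q, k, v)
    sparse = topk_selective_attention(q, k, v, top_k=32)
    print("mean output diff (dense vs top-k):", float(np.abs(dense - sparse).mean()))
    # Back-of-envelope: the dense score matrix grows with the square of context length,
    # while a fixed per-query budget keeps the mixing cost linear in n.
    for n_ctx in (1_000_000, 12_000_000):
        print(f"{n_ctx:>12,} tokens: dense pairs ~{n_ctx**2:.1e}, "
              f"selective (k=32) ~{n_ctx*32:.1e}")
```

Even if the real mechanism looks nothing like top-k selection, those printed pair counts are the problem any subquadratic scheme has to dodge: roughly 10^12 query-key pairs at 1M tokens and about 1.4 × 10^14 at 12M.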
What I’m watching here is whether huge context windows actually simplify automation, or whether we still need smart retrieval, memory, and workflow design around the model.
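
To make that question concrete, here's the kind of scaffolding I mean: a deliberately crude chunk-and-retrieve loop in plain Python, with keyword overlap standing in for the embedding search and routing a real pipeline would use. None of this is tied to Subquadratic's product; it's just the layer a giant context window promises to shrink.

```python
# A simple sketch of the chunking/retrieval/stitching work that a 12M-token
# window is supposed to make unnecessary. The keyword-overlap "retriever" is
# my own stand-in; real pipelines use embeddings, rerankers, and routing logic,
# which is exactly the complexity in question.
from collections import Counter

def chunk(text: str, size: int = 400) -> list[str]:
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query: str, passage: str) -> int:
    """Crude relevance: how many query words also appear in the passage."""
    q = Counter(query.lower().split())
    p = Counter(passage.lower().split())
    return sum(min(q[w], p[w]) for w in q)

def retrieve(query: str, chunks: list[str], top_n: int = 3) -> list[str]:
    """Pick the top_n chunks to stitch into a small context window."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:top_n]

if __name__ == "__main__":
    corpus = "Imagine a few million tokens of product docs, tickets, and policies here."
    context = "\n---\n".join(retrieve("refund policy for annual plans", chunk(corpus)))
    print(context)
    # With a context window larger than the corpus, the three functions above
    # (and the failure modes they introduce) could in principle be dropped and
    # the whole corpus passed directly. That's the simplification being claimed.
```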