So the new Kimi K2.7 Code model just dropped.
When an AI reasons through a problem, it uses tokens, which cost money and add latency. K2.7 cuts that reasoning token usage by 30%. Other highlights:
- +21.8% on coding tasks, +11% on general programming, and +31.5% on multi-language work across Python, Rust, and Go
- Better instruction following on long, complex coding sessions that span many steps
- Scores 81.1% on tool-use benchmarks, beating Claude Opus 4.8's 76.4%
It has a 256K context window.
Accepted inputs:
- Images: png, jpeg, webp, gif
- Videos: mp4, mpeg, mov, avi, x-flv, mpg, webm, wmv, 3gpp
API pricing:
- 0.19 USD per 1M input tokens on a cache hit
- 0.85 USD per 1M input tokens on a cache miss
- 4 USD per 1M output tokens
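To get a feel for what those prices mean for a real session, here's a rough cost estimator. It's a minimal sketch: the function name and the `cache_hit_ratio` parameter are made up for illustration, and it assumes the cache hit/miss prices apply to input tokens, since output is priced separately.

```python
# Prices in USD per 1M tokens, taken from the pricing list above.
PRICE_CACHE_HIT = 0.19   # assumed to apply to cached input tokens
PRICE_CACHE_MISS = 0.85  # assumed to apply to uncached input tokens
PRICE_OUTPUT = 4.00


def estimate_cost(input_tokens: int, output_tokens: int, cache_hit_ratio: float = 0.0) -> float:
    """Estimate the USD cost of a session (hypothetical helper, not an official calculator)."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    cached = input_tokens * cache_hit_ratio
    uncached = input_tokens - cached
    total = (
        cached * PRICE_CACHE_HIT
        + uncached * PRICE_CACHE_MISS
        + output_tokens * PRICE_OUTPUT
    )
    return total / 1_000_000


# Example: 2M input tokens with 80% served from cache, plus 300K output tokens.
print(f"{estimate_cost(2_000_000, 300_000, cache_hit_ratio=0.8):.2f} USD")  # 1.84 USD
```

Notice that output tokens dominate the bill in that example, which is why a cut in reasoning tokens matters for cost.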
It also has decent long-horizon coding capabilities.
So, where exactly should you use this model?
I'd say the best way to use it is with an orchestrator and smaller agents. Pick one frontier model, expensive and smart, as the orchestrator (Fable is out of the picture for now because of the recent government complications, so Opus 4.8 works). Orchestration, planning, and very critical tasks go to that main model, and everything else goes to Kimi K2.7 Code to build it all out. Again, don't just use Kimi K2.7 for everything. Use it when you're building a heavy project; if you're writing content, going through a workflow process, or brainstorming, there are better and cheaper models.
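Here's roughly what that routing could look like in code. This is only a sketch of the idea: the model identifiers are placeholder strings (not real API model names), the task categories are my own labels for the cases above, and no actual API calls are made.

```python
from dataclasses import dataclass

# Placeholder identifiers, not real API model names.
ORCHESTRATOR = "frontier-model"        # e.g. Opus 4.8 in the setup described above
BUILDER = "kimi-k2.7-code"             # does the bulk of the coding work
LIGHTWEIGHT = "cheaper-general-model"  # content, workflows, brainstorming

# Hypothetical task categories for illustration.
ORCHESTRATOR_KINDS = {"orchestration", "planning", "critical"}
LIGHTWEIGHT_KINDS = {"content", "workflow", "brainstorming"}


@dataclass
class Task:
    kind: str
    description: str


def pick_model(task: Task) -> str:
    """Route a task to a model tier based on its kind."""
    if task.kind in ORCHESTRATOR_KINDS:
        return ORCHESTRATOR
    if task.kind in LIGHTWEIGHT_KINDS:
        return LIGHTWEIGHT
    # Everything else is build work and goes to the coding model.
    return BUILDER


tasks = [
    Task("planning", "Break the project into modules"),
    Task("implementation", "Write the Rust parser"),
    Task("brainstorming", "Name ideas for the product"),
]
for task in tasks:
    print(f"{task.description} -> {pick_model(task)}")
```

In a real setup the orchestrator would decide the task kind itself. The point is just that only a small slice of the work hits the expensive model.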