I’ve been testing different models for agentic workflows lately, and I just came across a new release that tackles a huge bottleneck: the speed-vs-intelligence trade-off.
It’s called Step 3.5 Flash by StepFun.
Usually, if you want a "smart" model (for coding or complex reasoning), you have to put up with high latency. If you want speed, you lose intelligence.
This model uses a Sparse Mixture-of-Experts (MoE) architecture to fix that.
Here are the specs that matter for us builders:
- Huge Brain, Light Footprint: It has 196B total parameters but only activates 11B per token.
- Insane Speed: It hits 350 tokens per second for coding tasks.
- Agent-First: It scored 74.4% on SWE-bench Verified, a sign it’s optimized for tool use and executing code, not just chatting.
- Runs Locally: You can actually run the Int4 version on a Mac Studio or a solid local rig using llama.cpp.
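To see why the Int4 version is even feasible on a Mac Studio, here’s my own back-of-envelope arithmetic (the function and numbers are my assumptions, not official figures; it ignores KV cache, activations, and quantization overhead):

```python
def int4_weights_gb(total_params_b: float, bytes_per_param: float = 0.5) -> float:
    """Approximate weight storage in GB at Int4 (~0.5 bytes per parameter).

    Ignores KV cache, activation memory, and quantization metadata overhead.
    """
    return total_params_b * 1e9 * bytes_per_param / 1e9

total_gb = int4_weights_gb(196)   # the full MoE must still fit in memory
active_gb = int4_weights_gb(11)   # but only ~11B params are read per token

print(f"Int4 weights: ~{total_gb:.0f} GB total, ~{active_gb:.1f} GB touched per token")
```

So all ~98 GB of weights need to sit in (unified) memory, but each token only touches a few GB of experts, which is what makes the speed possible.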
If you are building agents that need to "think and act" in real-time without paying API costs or eating round-trip latency, this is definitely worth a look.
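For a sense of what that 350 tok/s claim means in practice for an agent loop, here’s a quick sketch (assumption on my part: sustained decode speed, ignoring prompt processing and network time):

```python
def seconds_for(tokens: int, tok_per_s: float = 350.0) -> float:
    """Time to decode `tokens` at a sustained `tok_per_s` generation rate."""
    return tokens / tok_per_s

# Typical agent-step outputs at the claimed 350 tok/s:
print(f"500-token tool call:  ~{seconds_for(500):.2f}s")
print(f"2000-token code patch: ~{seconds_for(2000):.2f}s")
```

Sub-2-second tool calls are roughly where a "think and act" loop starts to feel interactive.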
Has anyone else tried running this locally yet? I’d love to see what kind of throughput you're getting.