Microsoft just dropped a 1.58-bit LLM—and no, that's not a typo: the weights are ternary (-1, 0, or +1), and log2(3) ≈ 1.58 bits apiece. BitNet b1.58 2B4T (yes, that's the real name) is an open-source, ultra-lightweight large language model that's trained on 4 trillion tokens and can actually run on a CPU—like, say, your MacBook's M2 chip. It clocks in at 400MB of memory usage, spanks similarly sized models like Meta's Llama 3.2 and Google's Gemma 3 in several benchmarks, and even decodes faster on a CPU. While that's impressive, 400MB of memory is still way more than any everyday device is going to want to allocate to one program... at least as far as my laptop, phone, and watch are concerned 😉

The catch? You have to use Microsoft's custom bitnet.cpp framework to see those speed gains—don't expect miracles if you're just tossing it into Hugging Face Transformers.

So… what does this mean? Less power, less memory, local compute—and maybe, just maybe, a path to AI that doesn't require a server farm or a second mortgage. Could 1-bit AI unlock a future of hyper-efficient, decentralized intelligence? Someone should probably ask that. Preferably before the GPUs run out.

Link to Article
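For the curious, here's the ternary trick in a few lines. This is a minimal sketch assuming the absmean quantization scheme described in the BitNet b1.58 paper; the epsilon and the helper name are my own illustrative choices, not Microsoft's code.

```python
# Sketch of absmean ternary quantization (assumed from the BitNet b1.58
# paper, not Microsoft's actual implementation): scale each weight matrix
# by its mean absolute value, then round-and-clip into {-1, 0, +1}.
import numpy as np

def absmean_ternary(w: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Quantize a weight matrix to {-1, 0, +1} via absmean scaling."""
    scale = np.mean(np.abs(w)) + eps             # gamma: mean |weight|
    return np.clip(np.round(w / scale), -1, 1)   # RoundClip(w / gamma, -1, 1)

w = np.random.randn(4, 4).astype(np.float32)
print(f"bits per ternary weight: {np.log2(3):.2f}")  # ≈ 1.58
print(absmean_ternary(w))  # every entry is -1.0, 0.0, or 1.0
```

With only -1, 0, and +1 in play, matrix multiplies collapse into additions and subtractions, which is a big part of why a CPU can keep up at all.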
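And if you do want to poke at it from Python anyway, here's a minimal loading sketch. The repo id is my assumption based on the model's name (check the actual model card on the Hub), and per the caveat above, plain Transformers won't get you the bitnet.cpp speed gains.

```python
# Minimal sketch: loading BitNet b1.58 2B4T with plain Hugging Face
# Transformers. The repo id is an assumption; this path runs the model
# but does NOT use bitnet.cpp, so don't expect the advertised CPU speed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/bitnet-b1.58-2B-4T"  # assumed repo id; verify on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Explain 1.58-bit weights in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```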