Self-hosted DeepSeek on a lightweight, minimum install
Unsloth, an AI development team run by brothers Daniel and Michael Han, has successfully reduced the size of DeepSeek-R1 by approximately 80% using dynamic quantization techniques[1][2]. This significant reduction allows the model to run more efficiently on consumer hardware while maintaining much of its original performance.
## Key Achievements
- **Size Reduction**: The original DeepSeek-R1 model, which required 720GB of storage, has been compressed to just 131GB[1][2][6].
- **Performance Retention**: Despite the drastic size reduction, the compressed model maintains 80-90% of the original model's reasoning capabilities[4].
- **Efficiency Gain**: On two H100 GPUs, the compressed model reaches roughly 140 tokens per second of throughput, and about 14 tokens per second for single-user inference[1].
## Dynamic Quantization Technique
Unsloth's approach to compressing DeepSeek-R1 involves:
1. **Selective Quantization**: Different parts of the model are quantized at varying levels of precision[2].
2. **MoE Layer Focus**: The Mixture of Experts (MoE) layers, which account for about 88% of the total weights, are quantized to 1.58 bits[2][5].
3. **Precision Balance**: Critical layers like the attention mechanism and initial transformer blocks use higher precision (4-bit or 6-bit) to maintain model integrity[2][3].
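To make the idea concrete, here is a minimal sketch of what a layer-selective quantization policy could look like. This is an illustrative assumption, not Unsloth's actual code: the layer-name patterns, thresholds, and bit widths are hypothetical stand-ins for the scheme described above.

```python
# Illustrative sketch of a dynamic (layer-selective) quantization policy.
# The layer-name patterns and thresholds below are assumptions for
# illustration only, not Unsloth's implementation.

def choose_bits(layer_name: str, layer_index: int) -> float:
    """Assign a per-layer bit width, mirroring the idea that MoE expert
    weights tolerate aggressive quantization while attention and the
    first transformer blocks need higher precision."""
    # Keep the first few blocks at high precision to protect the
    # residual stream early in the network.
    if layer_index < 3:
        return 6.0
    # Attention projections: moderate precision.
    if any(p in layer_name for p in ("q_proj", "k_proj", "v_proj", "o_proj")):
        return 4.0
    # MoE expert FFN weights (~88% of all parameters): quantize hardest.
    if ".experts." in layer_name:
        return 1.58
    # Everything else (router, norms, embeddings): leave near full precision.
    return 8.0

# Example: an expert weight deep in the network gets the 1.58-bit treatment.
print(choose_bits("model.layers.40.mlp.experts.7.down_proj", 40))  # -> 1.58
```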
## Available Versions
Unsloth has created four dynamically quantized versions of DeepSeek-R1[2]:
1. 1.58-bit version (131GB)
2. 1.73-bit version (158GB)
3. 2.22-bit version (183GB)
4. 2.51-bit version (212GB)
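To try one of these locally, the files can be pulled from Hugging Face. A minimal sketch, assuming the quants are published under the `unsloth/DeepSeek-R1-GGUF` repo with the `UD-IQ1_S` naming Unsloth used for the 1.58-bit variant (check the live listing before running):

```python
# Fetch only the 1.58-bit (131GB) split-GGUF files for use with llama.cpp.
# Repo ID and filename pattern follow Unsloth's release naming; verify
# them against the current Hugging Face repo before running.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/DeepSeek-R1-GGUF",
    local_dir="DeepSeek-R1-GGUF",
    allow_patterns=["*UD-IQ1_S*"],  # swap the pattern to pick another variant
)
```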
## Practical Implications
- **Accessibility**: The compressed model can run on systems with as little as 80GB of combined VRAM and RAM[1][7].
- **Local Deployment**: Users can now run powerful AI models locally, reducing reliance on cloud services[1].
- **Cost-Efficiency**: The compression technique significantly reduces computational costs while maintaining strong performance[5].
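On the accessibility point, a rough way to reason about hardware is to estimate how many transformer layers fit in VRAM and offload the rest to system RAM. The heuristic below is a back-of-the-envelope assumption (it treats all layers as equal in size), not an official sizing tool:

```python
# Back-of-the-envelope estimate of GPU-resident layers when a quantized
# model spills into system RAM. Assumes (simplistically) that the file's
# weight bytes are spread evenly across the transformer layers.

def layers_to_offload(vram_gb: float, file_size_gb: float, n_layers: int = 61) -> int:
    """Estimate how many of DeepSeek-R1's 61 layers fit in VRAM."""
    return max(0, min(n_layers, int(vram_gb / file_size_gb * n_layers)))

# Example: a 24GB GPU with the 131GB 1.58-bit quant fits ~11 layers;
# the remaining layers run from RAM, which is slower but workable.
print(layers_to_offload(24, 131))  # -> 11
```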
This breakthrough in model compression makes advanced AI models markedly more accessible and efficient, paving the way for broader local deployment of powerful language models.