DeepSeek's updated R1 model captured the AI community's attention this week, but the company also released a smaller, distilled version, DeepSeek-R1-0528-Qwen3-8B, that stands out by outperforming comparably sized models on certain benchmarks.
The distilled model is built on Alibaba's Qwen3-8B, which launched in May. According to DeepSeek, it beats Google's Gemini 2.5 Flash on AIME 2025, a challenging math benchmark, and performs nearly on par with Microsoft's Phi-4 Reasoning Plus on another math-focused test, HMMT.
Distilled models typically offer lower performance than their full-scale counterparts, but their major advantage is that they require far less computing power. According to cloud platform NodeShift, the 8B model can run on a single GPU with 40GB–80GB of memory, such as an Nvidia H100. The full-sized R1, by contrast, demands around a dozen 80GB GPUs.
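Those figures line up with a rough weights-only estimate. A minimal sketch, assuming the 8B model is held in 16-bit precision and using full R1's published size of roughly 671 billion parameters at 8-bit precision (figures not stated in this article):

```python
# Rough weights-only GPU memory estimate. Assumptions (not from the article):
# Qwen3-8B in 16-bit precision (2 bytes/parameter); full DeepSeek-R1 at its
# published ~671B parameters in 8-bit precision (1 byte/parameter). KV cache,
# activations, and parallelism overhead come on top of these numbers, which
# is why the practical requirement lands around a dozen 80GB GPUs.

def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in gigabytes."""
    return params_billions * bytes_per_param  # 1e9 params * bytes / 1e9 bytes/GB

distilled_gb = weight_memory_gb(8, 2.0)    # ~16 GB -> fits one 40-80GB GPU
full_r1_gb = weight_memory_gb(671, 1.0)    # ~671 GB -> multi-GPU territory

print(f"Distilled 8B weights: ~{distilled_gb:.0f} GB")
print(f"Full R1 weights:      ~{full_r1_gb:.0f} GB "
      f"(~{full_r1_gb / 80:.1f} 80GB GPUs, weights alone)")
```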
DeepSeek-R1-0528-Qwen3-8B was created by collecting outputs from the full R1 model and using them to fine-tune Qwen3-8B.
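DeepSeek has not detailed its pipeline, but the general recipe that description implies is standard supervised fine-tuning on teacher-generated text. A minimal sketch using the Hugging Face transformers workflow (the example pair, hyperparameters, and training setup below are illustrative placeholders, not DeepSeek's actual data or configuration):

```python
# Illustrative sketch of output-based distillation: sample reasoning traces
# from the large "teacher" model offline, then fine-tune the small "student"
# on them with a standard causal-LM objective. Everything below except the
# model ID is a placeholder, not DeepSeek's actual setup.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

STUDENT_ID = "Qwen/Qwen3-8B"

# Step 1 (done offline with the full R1 model): collect prompt -> output
# pairs. A single stand-in example here; in practice these are large sets
# of teacher-generated reasoning traces.
teacher_data = [
    {"prompt": "Solve: 2x + 3 = 11. Show your reasoning.",
     "output": "<think>2x = 8, so x = 4.</think> x = 4"},
]

tokenizer = AutoTokenizer.from_pretrained(STUDENT_ID)

class DistillDataset(Dataset):
    """Tokenizes prompt+output pairs for supervised fine-tuning."""
    def __init__(self, pairs, max_len=1024):
        self.examples = []
        for p in pairs:
            text = p["prompt"] + "\n" + p["output"] + tokenizer.eos_token
            ids = tokenizer(text, truncation=True, max_length=max_len,
                            return_tensors="pt").input_ids[0]
            # Causal-LM objective: labels are the input ids themselves.
            self.examples.append({"input_ids": ids, "labels": ids.clone()})

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, i):
        return self.examples[i]

# Step 2: fine-tune the student on the teacher's outputs.
student = AutoModelForCausalLM.from_pretrained(STUDENT_ID,
                                               torch_dtype=torch.bfloat16)
trainer = Trainer(
    model=student,
    args=TrainingArguments(output_dir="r1-distilled-qwen3-8b",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=DistillDataset(teacher_data),
)
trainer.train()
```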
On the model's official page on the AI platform Hugging Face, DeepSeek notes that it is intended for both academic research and small-scale industrial applications.
Additionally, the model is released under the permissive MIT license, allowing for unrestricted commercial use. Several platforms, including LM Studio, already offer API access to the model.
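LM Studio, for instance, serves downloaded models through an OpenAI-compatible local endpoint, so querying the distilled model can look something like the sketch below (localhost:1234 is LM Studio's default server address; the model identifier is a placeholder that depends on how the model is named in your instance):

```python
# Querying the distilled model via LM Studio's OpenAI-compatible local
# server. The base URL is LM Studio's default; the api_key is ignored by
# local servers but required by the client. The model identifier is a
# placeholder -- use the name shown in your LM Studio instance.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="deepseek-r1-0528-qwen3-8b",  # placeholder identifier
    messages=[{"role": "user",
               "content": "How many prime numbers are there below 50?"}],
)
print(response.choices[0].message.content)
```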