Open Reproduction of DeepSeek-R1
Hugging Face releases an open-source reproduction of the DeepSeek-R1 reasoning model, completing step one of a three-stage plan.
The open-r1 project provides scripts and datasets for reproducing DeepSeek-R1's reasoning capabilities. Step 1, completed in May 2025, produced Mixture-of-Thoughts, a 350k-sample reasoning dataset distilled from R1 that covers mathematics, coding, and science tasks. The project includes training recipes for OpenR1-Distill-7B based on supervised fine-tuning and group relative policy optimization (GRPO), along with data generation pipelines. Reported results show OpenR1-Distill-7B matching or exceeding DeepSeek-R1-Distill-Qwen-7B on several benchmarks. Steps 2 and 3 involve reproducing the pure RL pipeline and demonstrating multi-stage training starting from a base model. The codebase supports distributed training via DeepSpeed, inference with vLLM, and code execution environments for competitive programming tasks.
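As a rough illustration of the "group relative" part of GRPO, the sketch below scores several sampled completions for one prompt and normalizes each reward against the group's mean and standard deviation. It is a minimal sketch, not the open-r1 implementation: the function name, the use of population standard deviation, and the `eps` stabilizer are assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper for illustration: each advantage is the reward's
    distance from the group mean, scaled by the group's standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; libraries may differ
    # eps keeps the division defined when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt, scored 1.0 if correct, else 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat the group average get a positive advantage and the rest get a negative one, so no separate value model is needed to provide a baseline. When every completion receives the same reward, all advantages are zero and the group contributes no learning signal.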
What commenters are saying
Commenters note that the project has completed only step 1 of 3, so it falls short of the full reproduction the headline suggests. One significant critique is that the code validation is missing implementation details and, according to the critique, uses only exact string matching rather than a proper comparison against ground truth. Users recommend fully open projects such as OLMo and Nemotron as more complete reproducibility resources. Training cost estimates range from $294k (DeepSeek's own claim, which commenters view skeptically) to more than $2.75M when compute is priced at market rates.
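To make the code-validation critique concrete, here is one possible reading of it, sketched under stated assumptions: an exact string check rejects program output that differs from the expected answer only in whitespace or number formatting, while a more tolerant checker compares token by token. Both functions are hypothetical and are not taken from the open-r1 codebase.

```python
import math


def exact_match(actual: str, expected: str) -> bool:
    """Strict check: any difference in whitespace or formatting fails."""
    return actual == expected


def tolerant_match(actual: str, expected: str, rel_tol: float = 1e-6) -> bool:
    """Hypothetical looser check: compare whitespace-separated tokens,
    treating tokens that both parse as numbers as equal within rel_tol."""
    a_tokens, e_tokens = actual.split(), expected.split()
    if len(a_tokens) != len(e_tokens):
        return False
    for a, e in zip(a_tokens, e_tokens):
        if a == e:
            continue
        try:
            if not math.isclose(float(a), float(e), rel_tol=rel_tol):
                return False
        except ValueError:
            # At least one token is not numeric and the strings differ
            return False
    return True


# A correct answer printed as "3.0" with a trailing newline.
strict_ok = exact_match("3.0\n", "3")
tolerant_ok = tolerant_match("3.0\n", "3")
```

Here `strict_ok` is `False` and `tolerant_ok` is `True`. A real judge for competitive programming tasks would typically go further, for example by running the submission against hidden test cases, which this sketch does not attempt.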