Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
DeepSpec is DeepSeek's full-stack codebase for training and evaluating draft models for speculative decoding. This technique accelerates LLM inference by letting a small draft model propose tokens that the large target model then verifies in parallel. Released under the MIT license in late June 2026, the repository gathered nearly 6,000 GitHub stars in its first week. Its appeal is a reproducible, end-to-end pipeline for one of the most impactful LLM acceleration methods.

## A Complete Three-Stage Pipeline

DeepSpec covers the whole workflow rather than just inference kernels. Stage one prepares the data: it downloads prompts, regenerates answers with the target model, and builds a cache of the target model's outputs. The README warns that this target cache can reach roughly 38 TB for the default Qwen3-4B setting. Stage two trains the draft model against those cached outputs, launching one worker per GPU from a single script. Stage three evaluates speculative-decoding acceptance rates on nine benchmarks: GSM8K, MATH-500, AIME25, HumanEval, MBPP, LiveCodeBench, MT-Bench, Alpaca, and Arena-Hard-v2.

## DSpark and Eagle3 Implementations

The codebase ships reference implementations of draft-model algorithms, including Eagle3 and DSpark, DeepSeek's own method described in the accompanying paper. Configuration files select the algorithm and the target model, so reproducing published results or swapping in a new target is straightforward.

## Released Checkpoints

DeepSeek published trained draft checkpoints on Hugging Face for four targets: Qwen3-4B, Qwen3-8B, Qwen3-14B, and Gemma-4-12B-it. Each was trained on the open-perfectblend dataset. The checkpoints serve both as ready-to-use accelerators and as baselines for researchers developing new speculative-decoding algorithms.

## Considerations

DeepSpec is research infrastructure, not a turnkey serving solution. The default configs assume a single node with 8 GPUs, and the multi-terabyte target cache puts full training runs out of reach for most hobbyists.
Teams that only want faster inference may prefer serving engines with built-in speculative decoding. For researchers and infrastructure engineers working on draft-model methods, however, an open, complete training-and-evaluation stack from a major lab is a significant contribution.
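The following sketch shows the draft-and-verify mechanism from the opening paragraph, using the standard speculative-sampling acceptance rule in Python with NumPy. It is a generic illustration, not code from DeepSpec. The function name `verify_draft` and its array layout are hypothetical. It also assumes the draft and target distributions for every drafted position are already available as normalized probability arrays; in practice, the target's rows would come from one parallel forward pass. Eagle3, DSpark, and DeepSpec's evaluation code may verify drafts differently, for example over token trees rather than a single chain.

```python
import numpy as np


def verify_draft(draft_tokens, draft_probs, target_probs, rng):
    """Hypothetical helper: verify one chain of draft tokens with speculative sampling.

    draft_tokens: list of k token ids sampled from the draft model.
    draft_probs:  array of shape (k, V); row i is the draft distribution q_i
                  that draft_tokens[i] was sampled from.
    target_probs: array of shape (k + 1, V); row i is the target distribution p_i
                  at the same position. The extra last row is the target's
                  distribution after the final draft token.
    Returns the tokens actually emitted (between 1 and k + 1 of them).
    """
    emitted = []
    for i, tok in enumerate(draft_tokens):
        p, q = target_probs[i], draft_probs[i]
        # Accept the draft token with probability min(1, p(tok) / q(tok)).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            emitted.append(int(tok))
            continue
        # Rejected: resample from the residual max(0, p - q), renormalized, and stop,
        # because every later draft token was conditioned on the rejected one.
        residual = np.maximum(p - q, 0.0)
        residual /= residual.sum()
        emitted.append(int(rng.choice(len(residual), p=residual)))
        return emitted
    # All k draft tokens accepted: sample one bonus token from the target's last row.
    emitted.append(int(rng.choice(target_probs.shape[1], p=target_probs[-1])))
    return emitted
```

Each call costs one target-model step but can emit up to k + 1 tokens. The residual resampling on rejection keeps the emitted tokens distributed exactly as the target model alone would sample them. This is why speculative decoding speeds up inference without changing what the target model would generate.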
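The article says stage three reports acceptance rates but does not define them. Speculative-decoding runs are commonly summarized in two ways: the mean number of tokens emitted per target verification step, and the fraction of drafted tokens the target accepts. The sketch below computes both. Its last function gives the closed-form expectation under the simplifying assumption that each draft token is accepted independently with probability `alpha`. The helper names and the step counts in the example are hypothetical, and DeepSpec's own metric definitions may differ.

```python
from statistics import mean


def mean_accepted_length(tokens_per_step):
    """Average tokens emitted per verify step (accepted drafts plus the one
    corrected or bonus token each step always produces)."""
    return mean(tokens_per_step)


def draft_acceptance_rate(accepted_per_step, draft_len):
    """Fraction of all proposed draft tokens that the target model accepted."""
    return sum(accepted_per_step) / (draft_len * len(accepted_per_step))


def expected_tokens_per_step(alpha, draft_len):
    """Expected tokens per verify step if each of the draft_len tokens is accepted
    independently with probability alpha: (1 - alpha**(k + 1)) / (1 - alpha)."""
    if alpha == 1.0:
        return draft_len + 1  # every draft accepted, plus the bonus token
    return (1 - alpha ** (draft_len + 1)) / (1 - alpha)


# Made-up example: four verify steps with draft length k = 3.
accepted = [3, 1, 2, 0]              # draft tokens accepted in each step
emitted = [a + 1 for a in accepted]  # each step also emits one target-sampled token
print(mean_accepted_length(emitted))       # 2.5
print(draft_acceptance_rate(accepted, 3))  # 0.5
print(expected_tokens_per_step(0.5, 3))    # 1.875
```

The closed form is the geometric sum 1 + alpha + ... + alpha^k. A step emits j + 1 tokens when the first j drafts are accepted and the next one is rejected, and k + 1 tokens when all k are accepted. This explains why acceptance rate dominates the speedup: at alpha = 0.5, a three-token draft yields fewer than two tokens per step, while higher alpha approaches the k + 1 ceiling.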