# Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
## MiniMind: The Complete LLM Training Curriculum on a Single GPU

MiniMind is an educational and practical open-source project that delivers something remarkable: a complete, reproducible pipeline for training a GPT-class language model from absolute zero in approximately 2 hours on a single NVIDIA RTX 3090, at a compute cost of roughly $3 USD. With 47,000+ GitHub stars, it has become one of the most-starred LLM education repositories available.

### The Core Promise: Full LLM Lifecycle in One Codebase

Most tutorials cover one phase of LLM development: pretraining, fine-tuning, or alignment. MiniMind covers them all. The pipeline progresses through:

- **Pretraining** on a minimal 1.2GB dataset
- **Supervised Fine-Tuning (SFT)** with 1.6GB of instruction data
- **Knowledge Distillation** from larger teacher models
- **LoRA adaptation** for parameter-efficient fine-tuning
- **Reinforcement Learning**: DPO, PPO, GRPO, CISPO, and Agentic RL variants

This soup-to-nuts coverage means a developer can follow a single repository from random initialization to a model capable of multi-turn dialogue and tool calling.

### Architecture Aligned with Modern Standards

Despite its small parameter count, MiniMind implements a proper modern Transformer decoder stack:

- **Pre-normalization** with RMSNorm for training stability
- **SwiGLU activations** matching the Llama/Qwen design philosophy
- **RoPE positional encoding** with YaRN extension for length generalization
- **MoE variant**: 198M parameters with 4 experts and top-1 routing
- **Custom BPE tokenizer** with a 6,400-token vocabulary

The dense base model weighs in at 64M parameters: small enough to train on consumer hardware, yet architecturally representative of production LLMs.

### Native PyTorch First-Principles Approach

A key design choice distinguishes MiniMind from wrapper-heavy educational projects: core algorithms are implemented from scratch in native PyTorch rather than delegating to high-level abstractions.
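The repository's actual modules aren't reproduced here, but a minimal sketch of two of the components named above, RMSNorm pre-normalization and a SwiGLU feed-forward, gives a feel for the from-scratch PyTorch style (the dimensions are illustrative, not MiniMind's):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Root-mean-square normalization, as used in Llama-style pre-norm stacks."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale each vector by the reciprocal of its RMS over the feature dim.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight


class SwiGLU(nn.Module):
    """Gated feed-forward: silu(W1 x) * (W3 x), projected back down by W2."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden, bias=False)  # value projection
        self.w2 = nn.Linear(hidden, dim, bias=False)  # down-projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))


# Toy forward pass: batch of 2, sequence length 16, model width 64.
x = torch.randn(2, 16, 64)
block = nn.Sequential(RMSNorm(64), SwiGLU(64, 176))
print(block(x).shape)  # torch.Size([2, 16, 64])
```

Because every operation is a plain tensor expression, a learner can step through this code in a debugger and watch exactly how each normalization and gate transforms the activations.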
This transparency lets learners trace gradients, understand attention mechanisms, and debug training dynamics without fighting through library indirection.

### Production-Compatible Output

Trained MiniMind models integrate with mainstream inference stacks: transformers, llama.cpp, vLLM, Ollama, and Llama-Factory. The models expose OpenAI-compatible API endpoints, support multi-turn dialogue with configurable chain-of-thought via `<think>` tags, and handle tool calling for function invocation, capabilities that mirror much larger production models.

### Distributed Training Support

For teams wanting to scale beyond a single GPU, the framework includes DDP (DistributedDataParallel) and DeepSpeed integration, making it straightforward to move from a single RTX 3090 to a multi-GPU cluster without refactoring the training code.

### Benchmark Expectations

On the C-Eval and CMMLU benchmarks, the 64M model achieves roughly 25% accuracy, the expected performance for an ultra-small model trained on limited data. The project explicitly frames this as a learning tool rather than a competitive model: the goal is understanding LLM internals, not state-of-the-art scores.

### Who It's For

MiniMind is uniquely valuable for ML engineers who want a hands-on understanding of LLM training from first principles, researchers exploring parameter scaling laws at small scale, and students who cannot afford cloud compute for larger training runs. The $3 training cost removes the financial barrier that has historically gatekept LLM development education.
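MiniMind's own launch scripts are not shown in this overview, but the DDP integration mentioned above amounts to wrapping the model in PyTorch's `DistributedDataParallel`. The sketch below is a hedged illustration using a single-process process group on CPU with the `gloo` backend and a toy stand-in model; a real multi-GPU run would use `torchrun` with a larger world size and the `nccl` backend:

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process "cluster" for illustration only; real launches come from
# torchrun, which sets these environment variables per worker.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

model = torch.nn.Linear(8, 8)  # toy stand-in for the actual LLM
ddp_model = DDP(model)         # gradients sync across ranks on backward()
out = ddp_model(torch.randn(2, 8))
print(out.shape)

dist.destroy_process_group()
```

The appeal of this pattern is that the training loop itself is unchanged: the same forward, loss, and backward code runs whether the model is wrapped or not, which is what makes the single-GPU-to-cluster transition refactor-free.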