Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
nanochat is an open-source LLM training framework created by Andrej Karpathy, designed to train a ChatGPT-class model from scratch for approximately $100. The project provides a complete end-to-end pipeline covering tokenization, pretraining, supervised fine-tuning, reinforcement learning, evaluation, and a chat interface, all within a minimal and hackable codebase. It has rapidly accumulated over 43,700 GitHub stars, reflecting strong community interest in accessible and transparent LLM training.

## Why nanochat Matters

Training large language models has traditionally required massive infrastructure, specialized engineering teams, and budgets measured in millions of dollars. nanochat challenges this assumption by demonstrating that a model with GPT-2-level capability can be trained end to end in approximately 3 hours on an 8xH100 GPU node for roughly $72. This dramatically lowers the barrier to understanding and experimenting with LLM training, making it accessible to individual researchers, students, and small teams.

The project continues the philosophy Karpathy established with nanoGPT: education through simplicity. Every component is designed to be readable and modifiable, serving both as a practical training framework and as an educational resource for understanding how modern language models are built.

## Single-Dial Complexity with the Depth Parameter

nanochat introduces an elegant abstraction: the `--depth` parameter. This single configuration dial controls the scale of the entire training pipeline. At lower depth values, training completes quickly with a small model suitable for experimentation and learning. At higher depth values, the framework scales up model size, dataset volume, and training duration. All hyperparameters are automatically adjusted for the chosen depth, eliminating the need for manual tuning across dozens of interconnected settings.

## Complete Training Pipeline

The framework covers every stage of modern LLM training.
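To make the single-dial idea concrete, here is a minimal sketch of how one `depth` value could drive both the model configuration and the stage sequence. The function names and scaling rules below are illustrative assumptions, not nanochat's actual code:

```python
# Hypothetical sketch: one "depth" dial derives the rest of the configuration.
# The scaling rules here are invented for illustration.

def config_from_depth(depth: int) -> dict:
    """Derive model and training hyperparameters from a single depth value."""
    return {
        "n_layers": depth,
        "d_model": depth * 64,                  # width grows with depth
        "n_heads": max(1, depth // 2),
        "train_tokens": depth * 1_000_000_000,  # larger models see more data
    }

def run_pipeline(depth: int) -> list[str]:
    """Run the end-to-end stages described in the text, in order."""
    cfg = config_from_depth(depth)  # every stage would share this config
    stages = ["tokenizer", "pretrain", "sft", "rl", "eval", "chat"]
    completed = []
    for stage in stages:
        # A real implementation would dispatch to per-stage scripts here.
        completed.append(stage)
    return completed
```

The point of the design is that nothing outside `depth` needs to be touched: every downstream quantity is a deterministic function of the one dial.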
Tokenizer training uses BPE (Byte Pair Encoding) on the training corpus. Base model pretraining follows standard transformer training. Supervised fine-tuning aligns the model with instruction-following behavior. A reinforcement learning stage further refines model outputs. A comprehensive evaluation suite measures model quality against established benchmarks. Finally, both a web UI and a CLI allow interactive conversation with the trained model.

## CORE Metric and Leaderboard

nanochat introduces the CORE metric, which tracks "time-to-GPT-2" performance. Its leaderboard encourages the community to optimize training efficiency, competing to achieve GPT-2-level capability in the shortest wall-clock time. This gamification of training optimization has driven contributions that improve data loading, gradient computation, and memory management across the codebase.

## Minimal and Hackable Design

Unlike production training frameworks that abstract away implementation details behind layers of configuration, nanochat exposes everything directly. There are no complex configuration frameworks, plugin systems, or dependency chains. Each component is a standalone Python script that can be read, understood, and modified independently. This design philosophy makes nanochat an ideal starting point for researchers who want to experiment with novel training techniques without fighting framework abstractions.

## Web and CLI Chat Interfaces

Once training is complete, nanochat provides a ChatGPT-style web interface for interacting with the trained model. Users can test their model's conversational abilities, evaluate response quality, and demonstrate results through a polished UI. A CLI chat interface is also available for programmatic testing and integration. These interfaces make it easy to evaluate training outcomes immediately.
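A CLI chat loop of the kind described above fits in a few lines. The sketch below uses a stand-in `generate` callable for the trained model and an invented prompt format; it is not nanochat's actual interface:

```python
# Minimal chat-loop sketch. `generate` stands in for the trained model;
# the "role: text" prompt format is an assumption for illustration.

def format_prompt(history: list[tuple[str, str]], user_msg: str) -> str:
    """Flatten prior (role, text) turns plus the new user message into one prompt."""
    lines = [f"{role}: {text}" for role, text in history]
    lines.append(f"user: {user_msg}")
    lines.append("assistant:")
    return "\n".join(lines)

def chat_turn(history, user_msg, generate):
    """Run one turn: build the prompt, sample a reply, extend the history."""
    prompt = format_prompt(history, user_msg)
    reply = generate(prompt)
    new_history = history + [("user", user_msg), ("assistant", reply)]
    return reply, new_history

# Interactive use would wrap chat_turn in a read-eval loop, e.g.:
#   while True:
#       msg = input("> ")
#       reply, history = chat_turn(history, msg, generate)
#       print(reply)
```

Keeping the turn logic separate from the I/O loop is what lets the same code back both a CLI and a web UI.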
## Hardware Requirements and Cost

The reference training configuration uses an 8xH100 GPU node, available through cloud providers for approximately $24 per hour. A full training run at the standard depth completes in about 3 hours, for a total cost of roughly $72. Smaller experiments at reduced depth can run on fewer GPUs at proportionally lower cost. The framework supports standard PyTorch distributed training, so any GPU cluster with sufficient memory can be used.

## Limitations

The resulting models reach GPT-2-level capability, which is substantially below current frontier models. The framework is designed for education and experimentation rather than production deployment. Training still requires access to GPU hardware, which remains expensive despite the relative cost reduction. The minimal design also means that features common in production frameworks, such as automatic checkpoint recovery, distributed fault tolerance, and advanced mixed-precision optimization, are intentionally omitted to preserve code clarity.
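The cost estimate above is simple arithmetic, reproduced here with the figures quoted in the text:

```python
# Reference-configuration cost, using the figures quoted above.
node_rate_usd_per_hour = 24.0   # 8xH100 node from a cloud provider
run_hours = 3.0                 # full run at the standard depth

total_cost = node_rate_usd_per_hour * run_hours
print(total_cost)  # 72.0, matching the ~$72 figure
```

A reduced-depth run on, say, a single GPU for a fraction of the time scales the same formula down proportionally.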

Shubhamsaboo
Collection of 100+ production-ready LLM apps with AI agents, RAG, voice agents, and MCP using OpenAI, Anthropic, Gemini, and open-source models
infiniflow
Leading open-source RAG engine with deep document understanding, grounded citations, and agent capabilities, with 73K+ GitHub stars.