Open Source
Explore the latest AI open-source projects from GitHub and Hugging Face.
Andrej Karpathy's AutoResearch is a minimalist 630-line Python tool that lets AI agents run autonomous machine learning experiments on a single GPU overnight. Released on March 6, 2026, it has already gained over 17,000 GitHub stars, reflecting the community's strong interest in automated AI research workflows.

## Why AutoResearch Matters

Traditionally, neural network architecture search and hyperparameter tuning require significant human involvement: researchers manually adjust parameters, run experiments, analyze results, and iterate. AutoResearch automates this entire loop by giving an AI agent control over a training file, letting it modify architectures, optimizers, and hyperparameters while running fixed-duration experiments and tracking improvements through git commits.

## Key Features

### Fixed 5-Minute Time Budget

Every experiment runs for exactly 5 minutes regardless of hardware, making results directly comparable across runs. This constraint enables approximately 12 experiments per hour and roughly 100 experiments overnight. The fixed budget ensures that architectural changes are evaluated on an equal footing rather than being confounded by variable training durations.

### Single-File Modification Scope

The agent modifies only one file: train.py. This file contains the complete GPT model architecture, optimizer configuration, and training loop. By constraining the modification surface to a single file, AutoResearch keeps experiment diffs reviewable and the scope manageable. Two other files (prepare.py for data preparation and program.md for agent instructions) remain untouched.

### Self-Contained Design

AutoResearch requires no distributed training infrastructure, no complex configuration files, and no external experiment-tracking services. The entire system runs on one NVIDIA GPU with PyTorch and a few small dependencies.
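To make the fixed-budget idea concrete, here is a toy, self-contained sketch of a wall-clock-bounded experiment. This is an illustration of the pattern, not AutoResearch's actual code: the "model" is a single scalar, the metric helper is a stand-in, and the budget is shortened from 5 minutes to a fraction of a second for demonstration.

```python
import math
import time

# Toy sketch of a fixed-wall-clock-budget experiment, in the spirit of
# AutoResearch's 5-minute runs. A real run would train a GPT defined in
# train.py; here the "training" is a one-parameter fit.

BUDGET_SECONDS = 0.2  # AutoResearch uses 300 (5 minutes); shortened for demo


def val_bpb(p: float) -> float:
    """Bits-per-byte of a toy byte model that assigns probability p to
    the correct next byte (lower is better)."""
    p = min(max(p, 1e-9), 1.0)
    return -math.log2(p)  # bits needed per byte under this model


def run_experiment(lr: float) -> float:
    """Train until the wall-clock budget expires, then return val_bpb."""
    p = 0.01  # initial probability assigned to the correct byte
    deadline = time.monotonic() + BUDGET_SECONDS
    while time.monotonic() < deadline:
        p += lr * (1.0 - p)  # toy update step nudging p toward 1
    return val_bpb(p)


if __name__ == "__main__":
    score = run_experiment(lr=1e-6)
    print(f"val_bpb after fixed budget: {score:.4f}")
```

Because every run stops at the same deadline rather than at a fixed step count, any improvement in the returned metric is attributable to the change under test, not to extra training time.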
Progress is tracked through git commits on a feature branch, with each successful experiment (lower validation bits-per-byte) committed automatically.

### Validation Metric: Bits-Per-Byte

The system uses validation bits-per-byte (val_bpb) as its single optimization metric. This provides a clear, comparable measure of model quality that the agent can reliably evaluate after each 5-minute training sprint. The agent keeps changes that lower val_bpb and discards those that increase it.

## How It Works

The workflow follows a straightforward loop:

| Step | Action |
|------|--------|
| 1 | Agent reads program.md for context and instructions |
| 2 | Agent modifies train.py (architecture, optimizer, hyperparameters) |
| 3 | Training runs for exactly 5 minutes |
| 4 | Agent evaluates the val_bpb metric |
| 5 | If improved: git commit on the feature branch; if not: discard changes |
| 6 | Repeat from step 2 |

## Real-World Adoption

Shopify CEO Tobi Lütke adapted the AutoResearch framework for an internal project, reporting a 19% improvement in validation scores. The agent-optimized smaller model eventually outperformed a larger model that had been configured through standard manual methods, demonstrating the potential of automated experimentation even for production use cases.

## Community Forks

The open-source community has already created platform-specific variants, including macOS support forks and Windows RTX adaptations, expanding AutoResearch beyond its original H100 GPU target.

## Limitations

- Requires an NVIDIA GPU (H100 tested; consumer GPUs may work with reduced throughput)
- The fixed 5-minute budget may be too short for some architectural explorations
- Currently focused on nanochat-scale models rather than large-scale training
- Agent quality depends on the underlying LLM provider (Claude, Codex, etc.)

## Conclusion

AutoResearch demonstrates that meaningful AI research iteration can happen autonomously with minimal infrastructure.
By constraining the problem to a single file, a single GPU, and a fixed time budget, Karpathy has created a framework where AI agents can genuinely contribute to neural network optimization. For researchers and engineers interested in automated ML experimentation, AutoResearch offers an accessible starting point that delivers tangible results overnight.
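As a closing illustration, the accept/reject step at the heart of the workflow table above can be sketched in a few lines. This is a hypothetical reconstruction, not AutoResearch's real interface: the function name, the `dry_run` flag, and the git commands shown are illustrative assumptions.

```python
import subprocess

# Hypothetical sketch of the keep-or-discard decision after each run:
# commit train.py only when val_bpb improved, otherwise revert it.
# Command strings are illustrative, not the project's actual interface.


def accept_or_reject(new_bpb: float, best_bpb: float, dry_run: bool = True) -> bool:
    """Return True if new_bpb beats best_bpb; with dry_run=False,
    also commit the improvement or revert the working-tree change."""
    improved = new_bpb < best_bpb
    if dry_run:  # skip real git calls when only exercising the logic
        return improved
    if improved:
        subprocess.run(["git", "commit", "-am", f"val_bpb {new_bpb:.4f}"], check=True)
    else:
        subprocess.run(["git", "checkout", "--", "train.py"], check=True)
    return improved


if __name__ == "__main__":
    best = 1.234
    for candidate in (1.300, 1.210, 1.250):
        if accept_or_reject(candidate, best):
            best = candidate
    print(f"best val_bpb: {best:.3f}")  # only improving runs are kept
```

The git history that results from repeating this decision overnight is, in effect, the experiment log: each commit on the feature branch records one accepted improvement.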