Andrej Karpathy, former Tesla AI director and a founding member of OpenAI, has released an open-source framework called autoresearch that allows AI agents to autonomously run machine learning experiments on a single GPU. In its initial deployment, the agent executed approximately 700 experiments over two days, achieving an 11 percent efficiency gain on the “Time to GPT-2” benchmark — reducing training time from 2.02 hours to 1.80 hours.
The framework operates on a simple loop: the agent modifies training code in a train.py file, runs a five-minute experiment, evaluates the result against a clear metric such as validation loss, and decides whether to keep or discard the change. This cycle repeats autonomously, with the agent accumulating improvements that are individually small but compound over hundreds of iterations. Karpathy reports that the system identified around 20 additive improvements that transferred cleanly to larger models — meaning the optimizations discovered in small-scale experiments also worked when applied to full-size training runs.
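The keep-or-discard loop described above can be sketched in a few lines. This is a hypothetical illustration, not autoresearch's actual code: the `evaluate` function stands in for the five-minute training run (here a toy objective so the sketch is runnable), and `propose_change` stands in for the agent editing train.py.

```python
import random

def evaluate(config):
    """Stand-in for a five-minute experiment returning validation loss.
    In the real framework this would run train.py and parse its metrics;
    here it is a toy objective minimized near lr=3e-4, wd=0.1."""
    return (config["lr"] - 3e-4) ** 2 * 1e4 + (config["wd"] - 0.1) ** 2

def propose_change(config, rng):
    """Stand-in for the agent's edit: perturb one hyperparameter."""
    candidate = dict(config)
    key = rng.choice(sorted(candidate))
    candidate[key] *= rng.uniform(0.5, 1.5)
    return candidate

def research_loop(config, iterations=200, seed=0):
    """Keep a change only if the metric improves; otherwise discard it.
    Small accepted improvements compound over many iterations."""
    rng = random.Random(seed)
    best_loss = evaluate(config)
    for _ in range(iterations):
        candidate = propose_change(config, rng)
        loss = evaluate(candidate)
        if loss < best_loss:  # the keep-or-discard decision
            config, best_loss = candidate, loss
    return config, best_loss
```

The key property of the loop is that every accepted change is empirically validated against the metric, so the accumulated configuration can only improve.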
The autoresearch repository crossed 8,000 GitHub stars within days of its release and has since approached 37,000 — making it one of the fastest-growing AI research tools in 2026. The rapid adoption reflects both Karpathy’s personal influence in the AI community and the practical appeal of the concept: researchers spend substantial time manually iterating on training configurations, and an agent that can run hundreds of experiments overnight on a single consumer GPU compresses weeks of human experimentation into hours.
The architectural simplicity is deliberate. Rather than building a complex multi-agent research system, autoresearch uses a single agent with a clear feedback loop and a constrained action space — it can only modify the training script and evaluate the result. This constraint prevents the agent from making destabilizing changes while ensuring that every modification is empirically validated. The five-minute experiment window limits compute cost and ensures rapid iteration, making the framework accessible to individual researchers with a single GPU rather than requiring cluster-scale infrastructure.
The broader implication is that AI-driven research acceleration is moving from theoretical to practical. If an agent can discover an 11 percent training efficiency improvement through automated experimentation, the approach can be applied to any optimization problem with a clear metric and a fast evaluation loop — hyperparameter tuning, architecture search, data augmentation strategies, and inference optimization. Karpathy has described autoresearch as a “glimpse of where AI is heading” — a future where AI agents handle the mechanical aspects of research while human researchers focus on defining problems, interpreting results, and making strategic decisions about what to optimize.
