RESEARCH

Karpathy’s AutoResearch Agent Runs 700 Experiments in Two Days, Cuts GPT-2 Training Time 11%

megaone_admin · Mar 23, 2026 · 2 min read
Engine Score 7/10 — Important

This story details a significant leap in autonomous AI agent capabilities by a prominent figure, indicating high industry impact and novelty. However, the primary source being Reddit (Tier 2) slightly weakens verification, despite the linked Fortune article.


Andrej Karpathy, former Tesla AI director and OpenAI co-founder, has released an open-source framework called autoresearch that allows AI agents to autonomously run machine learning experiments on a single GPU. In its initial deployment, the agent executed approximately 700 experiments over two days, achieving an 11 percent efficiency gain on the “Time to GPT-2” benchmark — reducing training time from 2.02 hours to 1.80 hours.

The framework operates on a simple loop: the agent modifies training code in a train.py file, runs a five-minute experiment, evaluates results against a clear metric like validation loss, and decides whether to keep or discard the change. This cycle repeats autonomously, with the agent accumulating improvements that individually are small but compound over hundreds of iterations. Karpathy reports that the system identified around 20 additive improvements that transferred cleanly to larger models — meaning the optimizations discovered on small-scale experiments also worked when applied to full-size training runs.
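The loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the actual autoresearch implementation: the real agent rewrites train.py and launches five-minute training runs, whereas here `run_experiment` and `propose_change` are hypothetical stand-ins that tune a single made-up hyperparameter against a synthetic loss.

```python
import random

def run_experiment(config):
    """Stand-in for a five-minute training run returning validation loss.
    (Hypothetical: the real framework edits train.py and trains a model.)"""
    # Toy objective: loss is minimized when lr is near 0.3.
    return abs(config["lr"] - 0.3) + random.uniform(0, 0.01)

def propose_change(config):
    """Stand-in for the agent's code-modification step: a small tweak."""
    candidate = dict(config)
    candidate["lr"] = max(1e-4, config["lr"] + random.uniform(-0.05, 0.05))
    return candidate

random.seed(0)
config = {"lr": 0.1}
best_loss = run_experiment(config)
initial_loss = best_loss

for _ in range(200):                   # hundreds of short iterations
    candidate = propose_change(config)
    loss = run_experiment(candidate)   # evaluate against a clear metric
    if loss < best_loss:               # keep only validated improvements
        config, best_loss = candidate, loss
```

The greedy keep-or-discard rule is the key design choice: each change must beat the current best on the metric, so small wins accumulate while regressions are thrown away.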

The autoresearch repository crossed 8,000 GitHub stars within days of its release and has since approached 37,000 — making it one of the fastest-growing AI research tools in 2026. The rapid adoption reflects both Karpathy’s personal influence in the AI community and the practical appeal of the concept: researchers spend substantial time manually iterating on training configurations, and an agent that can run hundreds of experiments overnight on a single consumer GPU compresses weeks of human experimentation into hours.

The architectural simplicity is deliberate. Rather than building a complex multi-agent research system, autoresearch uses a single agent with a clear feedback loop and a constrained action space — it can only modify the training script and evaluate the result. This constraint prevents the agent from making destabilizing changes while ensuring that every modification is empirically validated. The five-minute experiment window limits compute cost and ensures rapid iteration, making the framework accessible to individual researchers with a single GPU rather than requiring cluster-scale infrastructure.

The broader implication is that AI-driven research acceleration is moving from theoretical to practical. If an agent can discover an 11 percent training efficiency improvement through automated experimentation, the approach can be applied to any optimization problem with a clear metric and a fast evaluation loop — hyperparameter tuning, architecture search, data augmentation strategies, and inference optimization. Karpathy has described autoresearch as a “glimpse of where AI is heading” — a future where AI agents handle the mechanical aspects of research while human researchers focus on defining problems, interpreting results, and making strategic decisions about what to optimize.


MegaOne AI Editorial Team

MegaOne AI monitors 200+ sources daily to identify and score the most important AI developments. Every story is fact-checked, linked to primary sources, and rated using our six-factor Engine Score methodology.
