Andrej Karpathy’s three-file GitHub repository, autoresearch, moved further into the mainstream in March 2026, when Shopify chief executive Tobi Lütke opened pull request #2056 from a branch named autoresearch/liquid-perf-2026-03-11. The PR applied Karpathy’s pattern to speed up the ThemeRunner path and produced a result that was hard to ignore: parse-plus-render time fell from 7,469 microseconds to 3,534 microseconds.
The change also cut object allocations from 62,620 to 24,530, and all 974 unit tests passed. By early April 2026, the autoresearch repository had collected more than 80,000 GitHub stars, a sign that the method had moved well beyond a curiosity for engineers following Karpathy, the OpenAI co-founder and former director of AI at Tesla.
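To put the reported figures in proportion, the arithmetic can be sketched in a few lines of Python. The four numbers come from the pull request as quoted above; the helper names `speedup` and `relative_change` are illustrative and not part of any Shopify or autoresearch code.

```python
def speedup(before: float, after: float) -> float:
    """How many times faster the new measurement is than the old one."""
    return before / after


def relative_change(before: float, after: float) -> float:
    """Fraction of the original value that was removed (0.5 means halved)."""
    return (before - after) / before


# Figures reported for the pull request.
time_before_us, time_after_us = 7_469, 3_534
allocs_before, allocs_after = 62_620, 24_530

print(f"speedup: {speedup(time_before_us, time_after_us):.2f}x")
print(f"time saved: {relative_change(time_before_us, time_after_us):.1%}")
print(f"allocations saved: {relative_change(allocs_before, allocs_after):.1%}")
```

In round terms, the path became a little more than twice as fast, with about 53% less time spent and about 61% fewer object allocations.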
What made the Shopify example stand out was not just the speedup, but the scale of the experiment behind it. The pull request carried 93 commits drawn from roughly 120 automated experiments, an unusually dense record of trial, failure and refinement for one code change. Lütke himself wrote, “This is probably somewhat overfit.”
That caution matters because autoresearch is built around a tight loop: an agent edits one file, a frozen evaluator scores the result, and a single scalar metric decides whether the change stays or is reverted. Simon Willison documented that Lütke ran the loop using pi-autoresearch, a Pi extension Lütke developed with Shopify engineer David Cortés. The method has since spread into prompt optimization, GPU kernel tuning, build-time reduction and test-suite acceleration, but the Shopify pull request remains the most quoted real-world demonstration of it.
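The keep-or-revert loop is easy to sketch in miniature. The Python below is a toy illustration under stated assumptions, not Karpathy’s or Shopify’s code: the "editable file" is stood in for by a list of numbers, `toy_evaluate` plays the frozen evaluator that returns one scalar (lower is better), and `toy_propose` makes a random tweak where the real system would have an agent edit code. All function names here are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Candidate = List[float]


def keep_or_revert_loop(
    initial: Candidate,
    evaluate: Callable[[Candidate], float],
    propose: Callable[[Candidate, random.Random], Candidate],
    experiments: int,
    seed: int = 0,
) -> Tuple[Candidate, float, int]:
    """Run a fixed number of experiments, keeping only changes that improve the metric."""
    rng = random.Random(seed)
    best = list(initial)
    best_score = evaluate(best)  # baseline measured by the frozen evaluator
    kept = 0
    for _ in range(experiments):
        candidate = propose(best, rng)  # one proposed edit to the current best
        score = evaluate(candidate)     # same evaluator every time, never modified
        if score < best_score:          # lower is better: keep the change
            best, best_score = candidate, score
            kept += 1
        # Otherwise the candidate is discarded, which is the "revert".
    return best, best_score, kept


def toy_evaluate(params: Candidate) -> float:
    """Hypothetical frozen metric: squared distance from an arbitrary target of 3.0."""
    return sum((p - 3.0) ** 2 for p in params)


def toy_propose(params: Candidate, rng: random.Random) -> Candidate:
    """Hypothetical edit: nudge one randomly chosen parameter."""
    out = list(params)
    i = rng.randrange(len(out))
    out[i] += rng.uniform(-1.0, 1.0)
    return out


if __name__ == "__main__":
    best, score, kept = keep_or_revert_loop(
        [0.0, 0.0], toy_evaluate, toy_propose, experiments=120
    )
    print(f"kept {kept} of 120 experiments, final score {score:.4f}")
```

Because the score can only go down, every kept change is an improvement on the benchmark by construction. That is also why overfitting is a live concern: the loop optimizes exactly what the evaluator measures and nothing else.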
Karpathy’s own two-day run produced around 20 stacking improvements and an 11% training speedup, while the Vector Institute said it ran 910 experiments across 16 GPUs in eight hours and matched a result that would have taken 72 hours in a sequential single-GPU run. That is the appeal of autoresearch in practice: it turns repetition into leverage, then asks a simple question after each pass: keep the change, or throw it away?
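The Vector Institute figures imply a few derived numbers, worked out below. This is a back-of-the-envelope sketch that assumes the 72-hour sequential estimate and the eight-hour parallel run refer to reaching the same result; the "parallel efficiency" reading is an interpretation, not a figure the institute reported.

```python
GPUS = 16
PARALLEL_HOURS = 8
SEQUENTIAL_HOURS = 72
EXPERIMENTS = 910

# How much sooner the result arrived compared with the sequential estimate.
wall_clock_speedup = SEQUENTIAL_HOURS / PARALLEL_HOURS

# Share of the ideal 16x speedup that was actually realized (assumed interpretation).
parallel_efficiency = wall_clock_speedup / GPUS

# Average throughput of the parallel run.
experiments_per_gpu_hour = EXPERIMENTS / (GPUS * PARALLEL_HOURS)

print(f"wall-clock speedup: {wall_clock_speedup:.0f}x")
print(f"parallel efficiency: {parallel_efficiency:.0%}")
print(f"experiments per GPU-hour: {experiments_per_gpu_hour:.1f}")
```

On that reading, 16 GPUs delivered a ninefold wall-clock gain, and each GPU completed roughly seven experiments per hour.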
The timing also helped the idea travel. Google I/O 2026 opened this week with agentic coding as a centerpiece, putting a mainstream spotlight on the same style of machine-assisted iteration that Karpathy had packaged into a three-file repository. The open question now is not whether the method can find improvements, which Shopify showed it can, but how often those wins will survive outside the clean boundaries of a benchmark and a frozen test loop.
