Anthropic: AI Is Already Building Itself - The Numbers Are Staggering

Anthropic just published numbers that should make everyone pay attention. As of May 2026, more than 80% of the code merged into Anthropic’s own codebase was written by Claude – not humans. Eighteen months ago, that number was in the low single digits.

That one statistic is the headline of a new report from the Anthropic Institute titled “When AI Builds Itself.” It’s the most data-rich public accounting yet of how fast AI is accelerating its own development – a phenomenon researchers call recursive self-improvement. The full picture is more dramatic than most people realize.

The Numbers That Matter

Anthropic engineers now ship 8x more code per quarter than they did between 2021 and 2025. That’s not a gradual improvement – it’s a step change tied to two moments: first, when Claude started running code autonomously (rather than just suggesting snippets for humans to copy), and second, in 2026, when models began working over longer time horizons without interruption.

The task horizon metric is the clearest signal of where this is heading. METR, the independent AI evaluation organization, tracks the longest tasks AI can complete reliably on its own. That number has been doubling every four months – faster than the previous trend of doubling every seven months.

Model	Date	Max Reliable Task Length
Claude Opus 3	March 2024	~4 minutes
Claude Sonnet 3.7	March 2025	~90 minutes
Claude Opus 4.6	2026	~12 hours
Claude Mythos Preview	2026	16+ hours (METR’s limit)

Claude Mythos Preview hit the ceiling of what METR can currently measure. They had to stop the test – not because the model failed, but because their evaluation framework ran out of tasks long enough to challenge it.
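
To put the two doubling rates in perspective, here is a minimal sketch that converts a doubling time into a yearly growth multiple. It assumes clean exponential growth, which is a simplification; the function name `growth_factor` is made up for this illustration and does not come from the report or from METR.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Growth multiple over `months` for a quantity that doubles every `doubling_months`.

    Assumes smooth exponential growth (an idealization, not a claim from the report).
    """
    return 2 ** (months / doubling_months)


# Compare the older seven-month trend with the newer four-month trend over one year.
for doubling_months in (7, 4):
    factor = growth_factor(12, doubling_months)
    print(f"doubling every {doubling_months} months -> about {factor:.1f}x per year")
```

Under that assumption, a seven-month doubling time works out to roughly 3.3x per year, while a four-month doubling time is 8x per year.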

Benchmarks Falling Fast

Two research benchmarks illustrate how quickly AI capability is saturating difficult domains:

SWE-bench – Tests real-world software engineering. The test hands an AI an actual open-source codebase and a real bug report; the model has to write a fix that passes the project’s own tests. Models went from low single-digit scores to near-saturation in under two years.
CORE-Bench – Tests scientific reproducibility: AI systems went from successfully reproducing published research results roughly 20% of the time in 2024 to saturating the benchmark just 15 months later.

On Anthropic’s internal open-ended engineering tasks – the kind where a senior engineer is handed a goal, not a spec – Claude’s success rate reached 76% in May 2026, up 50 percentage points in just six months.

What “Recursive Self-Improvement” Actually Means

The term sounds abstract. Here’s what it looks like in practice at Anthropic right now:

Claude writes most of the code used to train Claude
Claude runs the experiments that determine what the next version will do better
Claude debugs production failures – the report describes a case where a routine upgrade crashed tens of thousands of training jobs; an engineer handed Claude the incident with “little more than some text content and cluster access,” and Claude resolved it

Anthropic is explicit that they are not yet at full recursive self-improvement – the point where an AI designs and trains its own successor without human direction. The key gap is goal selection: Claude can execute well-specified tasks and increasingly design the approach to achieve a given goal, but deciding which problems are worth working on at all remains a human function.

That distinction matters. But the report’s own projections suggest it may not matter for long. If the current trend holds, tasks that take a skilled human days could come into range for AI before the end of 2026. By 2027, multi-week tasks may be achievable.

Why This Is Different from Normal AI Progress Announcements

Companies announce AI breakthroughs constantly. Most are either cherry-picked benchmark wins or vague capability claims. This report is different for a few reasons:

Internal data, not marketing: The 80% code statistic and the 8x engineering productivity number come from Anthropic’s own production systems, not a controlled demo.

It comes with a warning: Anthropic is unusual in publishing this openly, and its stated reason is that it sees recursive self-improvement as a risk, not just a milestone. Its CEO has written publicly about the risks of AI systems that humans can no longer control. This report is partly a call for other institutions – governments, companies, researchers – to prepare for a development they may not be ready for.

The acceleration is itself accelerating: Task horizon doubling went from every seven months to every four months. If that pattern continues, the 2027 projections in this report may be conservative.

What This Means for You

The practical implication is concrete. If AI can currently handle 12-hour software engineering tasks reliably, and that number is doubling every four months, then the range of work that AI can fully own, rather than merely assist with, expands fast.

For businesses that rely on software development: the cost and time to build custom software tools are dropping. Not incrementally – by orders of magnitude.

For people who work in technical roles: the floor on what counts as human-level skill rises. If 80% of Anthropic’s code is AI-written, the humans remaining are there for judgment calls, not execution.

For investors and policymakers: Anthropic’s framing is explicit. They believe recursive self-improvement “could come sooner than most institutions are prepared for.” That’s not hype – it’s a company disclosing risk in its own domain.
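
The arithmetic behind that 12-hour, four-month-doubling scenario can be sketched in a few lines. This is a naive extrapolation, not a forecast from the report: it assumes the doubling rate stays constant, and the 40-hour and 160-hour targets (one and four working weeks) are illustrative assumptions, as are the helper names.

```python
import math

CURRENT_HOURS = 12.0    # reliable task length cited in the article
DOUBLING_MONTHS = 4.0   # doubling time cited in the article


def projected_hours(months_ahead: float) -> float:
    """Task horizon after `months_ahead`, assuming the doubling rate holds."""
    return CURRENT_HOURS * 2 ** (months_ahead / DOUBLING_MONTHS)


def months_to_reach(target_hours: float) -> float:
    """Months until the horizon reaches `target_hours` under the same assumption."""
    return DOUBLING_MONTHS * math.log2(target_hours / CURRENT_HOURS)


for months in (4, 8, 12):
    print(f"+{months} months: about {projected_hours(months):.0f} hours")

# Hypothetical targets: one 40-hour working week and four of them.
for target in (40, 160):
    print(f"{target}-hour task: about {months_to_reach(target):.1f} months away")
```

On these assumptions the horizon passes a full working week in about seven months and a working month in about fifteen. Any slowdown or speedup in the doubling rate shifts those dates substantially.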

The Honest Caveats

A few things this report doesn’t answer:

Quality vs. quantity: 80% of merged code being Claude-authored tells you about volume, not about whether that code is good. Bug rate, technical debt, and code maintainability data were not published.
Internal vs. external: Anthropic’s AI-assisted engineering works partly because they’re building the AI. Transferring these productivity gains to other organizations – with different codebases, different workflows, different institutional tolerance for AI risk – is not automatic.
Benchmark saturation ceiling: When AI saturates a benchmark, we build harder benchmarks. The rate of capability improvement is real, but the metrics are also moving targets.
Self-interest: Anthropic is not a neutral party. Publishing data showing their AI is accelerating AI development benefits their fundraising, hiring, and policy positioning. The data may be accurate and the framing still shaped by competitive interests.

BetOnAI Verdict

Signal, not noise. Internal data showing 80% AI-authored code at a frontier lab is the kind of number that doesn’t emerge from a press release – it comes from production systems. The task horizon progression (4 minutes, then 90 minutes, then 12 hours, then 16+ hours, over 27 months) is a real trend with real benchmark backing.

Anthropic is telling you that the pace of AI capability growth is itself speeding up, and that the transition from “AI assists engineers” to “AI builds AI” is underway, not hypothetical. Whether full recursive self-improvement arrives in 2027 or 2030 matters less than the fact that the directional trend is established.

The practical bet: the window where “AI can’t do that yet” is a valid answer to business process questions is closing faster than most organizations are planning for. If your 2026 roadmap is built on 2024-era assumptions about AI capability ceilings, it’s worth revisiting before the end of this year.

Who should read the full report: Anyone making hiring, tooling, or competitive strategy decisions in software, research, or any domain where AI is starting to do the work.

