Anthropic: What Happens When AI Learns to Design Itself?

Anthropic calls for caution and slowing down frontier AI if worries about recursive self-improvement materialise
If AI becomes capable of redesigning itself, misalignment could pose an existential threat to humanity, prompting Anthropic to call for a pause

When one of the largest pure-play AI companies in the world wants systems in place to pause global AI development, the request carries weight.

This is particularly so because the call runs against the AI giant's own economic interest, which could suggest the security risks are substantial.

The root concern is recursive self-improvement, where AI develops and designs its future self. This creates a control problem for cybersecurity professionals who must safeguard systems they may no longer fully understand.

In a blog post titled 'When AI Builds Itself', Anthropic says the industry has not yet fully reached the stage of autonomous self-improvement.

However, it could arrive much sooner than organisations can prepare their security frameworks.


"Taken far enough and given enough compute, that trend points to an AI system capable of fully autonomously designing and developing its own successor," the company says.

The coding loop nears closing

Compared with 2021, an engineer at Anthropic today ships about eight times more code in a quarter. 

The trajectory started with skilled developers coding manually until chatbots arrived. These natural-language systems translated problems into code snippets, which could be copied and pasted into integrated development environments (IDEs).

Code contributed per person per quarter at Anthropic | Credit: Anthropic

Then came coding agents, which removed the human middleman by writing and editing their own code. They modified entire files without direct human review at each step.

The autonomous agents that followed pushed human input entirely into a creative-direction role, as agents divided large engineering tasks between themselves.

A system this proficient at coding could also find new ways to improve itself. In theory, the loop could then close, with AI smart enough to design itself.

AI expands faster than ever

Every four months, AI models double the length of tasks they can reliably complete on their own. 

Closing the coding loop as AI evolves | Credit: Anthropic

On the previous trend, that doubling took about seven months.

Claude Opus 3 in 2024 could complete a task that takes a human about four minutes. Claude 3.7 a year later could manage tasks about an hour and a half long.

Another year after that, Claude Opus 4.6 could work on 12-hour tasks on its own. Extrapolating this, by 2027, models could execute a week's worth of human work without oversight.
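The arithmetic behind that extrapolation can be checked with a short sketch. It assumes steady exponential growth and uses only the figures quoted above (90-minute tasks, then 12-hour tasks a year later). The function names are illustrative, and reading "a week's worth of human work" as a 40-hour working week is an assumption, not a figure from Anthropic.

```python
import math


def implied_doubling_months(start_minutes, end_minutes, elapsed_months):
    """Months per doubling implied by two task-length measurements."""
    return elapsed_months / math.log2(end_minutes / start_minutes)


def projected_task_minutes(current_minutes, months_ahead, doubling_months):
    """Task length after months_ahead, assuming the doubling rate holds."""
    return current_minutes * 2 ** (months_ahead / doubling_months)


def months_until(current_minutes, target_minutes, doubling_months):
    """Months needed to grow from the current to the target task length."""
    return doubling_months * math.log2(target_minutes / current_minutes)


# Figures quoted in the article: 90 minutes, then 12 hours one year later.
doubling = implied_doubling_months(90, 12 * 60, 12)  # 4.0 months

# Assumption: a "week of human work" means a 40-hour working week.
week_minutes = 40 * 60
months_to_week = months_until(12 * 60, week_minutes, doubling)  # about 6.9

# Task length one year on if the trend continued unchanged: 96 hours.
hours_next_year = projected_task_minutes(12 * 60, 12, doubling) / 60
```

On these numbers, the jump from 90 minutes to 12 hours is three doublings in 12 months, which matches the four-month figure. A 40-hour task would then be roughly seven months away, though the projection only holds if the trend does not bend.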

SWE-bench – a standardised test for real-world software projects – works by handing models an open-source codebase and a bug report. The systems must fix the issues independently.

Claude session success rate | Credit: Anthropic

Models today saturate the benchmarks, while less than two years ago they each scored in the low single digits, which shows how rapidly AI systems are evolving.

CORE-Bench tests whether AI can do its own research. According to the benchmark, similar capability expansion is occurring in autonomous investigation tasks.

In long-duration coding tasks measured by METR, Claude Mythos Preview emerged "at the upper end of what [METR] can measure without new tasks". This may indicate that some existing evaluation frameworks could require more challenging tasks to distinguish the capabilities of frontier systems.

According to Anthropic's data from May 2026, 80% of code merged into the company's codebase was written by Claude. The company states this came with quality improvements, or, in its own words: "Claude writes code that works".

Control loss becomes a possibility 

"Recursive self-improvement is not here, nor is it inevitable," writes Marina Favaro, Lead at the Anthropic Institute, on LinkedIn.


"But if these trends continue, AI systems designing and building their own successors seems plausible.

"In that scenario, we expect that the pace of AI development will accelerate. This has the potential to bring enormous good to the world but it also creates loss of control risks."

Favaro co-authored the Anthropic blog post.

What next?

Anthropic outlines three possible outcomes for the future. The first is that AI capability stalls, which would give security frameworks time to catch up.

The second is that efficiency compounds but faces a different bottleneck. This could mean compute limits or data constraints slow the pace of autonomous development.

The third scenario is that AI designs itself with fully recursive self-improvement. In this case, the human role diminishes to oversight and validation of what systems produce in virtual environments.

The alignment problem is what Anthropic says it is "least certain about". This uncertainty has direct implications for system security and control mechanisms.


Models may be aligned enough and "sufficiently wise" to halt production if things are not in alignment. However, misalignment persists in today's systems, even if occurrences are rare.

This could compound as misaligned AI redesigns itself. The system could become less well understood, and humanity could lose control over its behaviour and output.

More concerning is that this could happen in physical AI systems and robotics. The security implications extend beyond software into critical infrastructure and automated physical systems.

"Without a global coordination mechanism, companies and governments will have to make difficult decisions about safety while under competitive and geopolitical pressures," Anthropic notes. 

The company concludes that "we believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology".

This pause could allow security frameworks and control mechanisms to develop before systems move beyond human reach.
