How Anthropic’s Claude Disrupted an Autonomous Cyber Attack

By Maya Derrick

November 14, 2025

Anthropic has helped identify and counter the first large-scale cyber espionage attack conducted largely by AI agents without substantial human intervention

Anthropic has disclosed that Chinese state-sponsored hackers exploited its AI to carry out what it identifies as the first recorded large-scale cyberespionage operation primarily executed by artificial intelligence.

This signals the onset of a new phase in cyber warfare, where AI-driven agents can autonomously collect intelligence and execute attacks with minimal human oversight.

The campaign was detected in mid-September 2025. The attackers used the agentic capabilities of Claude Code, Anthropic's AI coding tool, in an attempt to infiltrate about thirty high-value targets worldwide, spanning technology firms, financial institutions, chemical producers and government agencies.

The AI handled an estimated 80% to 90% of the campaign's work independently, with human operators intervening only at key strategic decision points. That marks a defining shift in how large-scale cyberattacks are executed.

AI’s autonomous cyber offensive

In a 13-page report outlining the details of the breach, Anthropic revealed that the campaign capitalised on recent breakthroughs in AI – spanning intelligence, agency and tool integration – to execute a multi-stage cyberattack with unprecedented autonomy.

This attack employed Claude Code not only as an advisory system but as an active agent performing complex hacking tasks.

Human operators initiated the campaign by defining targets and strategic objectives, while the AI autonomously carried out reconnaissance, vulnerability scanning, exploit creation, credential harvesting, lateral movement and data exfiltration.

So how did the attackers automate the process?

The group bypassed Claude Code’s safety systems by dividing malicious instructions into seemingly harmless tasks, deceiving the AI into assuming it was participating in a legitimate cybersecurity assessment.

As a result, Claude made thousands of requests, often several per second, a pace far beyond human capability.

Speaking to The Wall Street Journal, Anthropic’s Head of Threat Intelligence Jacob Klein says the hackers conducted their attacks “literally with the click of a button, and then with minimal human interaction”.

He adds: “The human was only involved in a few critical chokepoints, saying, ‘Yes, continue,’ ‘Don’t continue,’ ‘Thank you for this information,’ ‘Oh, that doesn’t look right, Claude, are you sure?’”

The six stages of the attack

1. Campaign initialisation and target selection: Human operators input the target entities, tricking Claude into compliance via role-playing scenarios.
2. Reconnaissance and attack surface mapping: Claude autonomously scanned networks, enumerated services and identified key infrastructure.
3. Vulnerability discovery and validation: The AI generated and tested exploit payloads silently, analysing system responses to confirm vulnerabilities.
4. Credential harvesting and lateral movement: Claude extracted and validated access credentials independently, mapping internal network privileges.
5. Data collection and intelligence extraction: The AI parsed vast amounts of stolen data to prioritise intelligence based on value.
6. Documentation and handoff: Claude produced detailed reports on attack progress, aggregated findings and prepared handoff materials for subsequent teams.

What does this mean for cybersecurity?

This campaign exemplifies how agentic AI systems can significantly reduce the barriers to executing advanced cyberattacks.

With the capability to autonomously sustain large-scale operations, AI could soon empower less experienced or smaller adversaries to carry out attacks once reserved for nation-state actors.

The autonomous agent model represents a marked escalation from earlier “vibe hacking” incidents, where human oversight remained central.

However, the operation was not without shortcomings.

Investigators observed that Claude occasionally produced false data, fabricated credentials or exaggerated exploit success rates, requiring human verification.

These inconsistencies remain among the final barriers to fully autonomous cyberattacks.
