UK AI Safety Institute review of Claude Mythos: able to autonomously complete a 32-step enterprise network attack simulation

動區BlockTempo

The latest assessment from the UK AI Safety Institute (AISI) shows that Anthropic’s Claude Mythos Preview can autonomously complete a full 32-step enterprise network attack simulation in a controlled environment. It also achieves a 73% success rate on expert-level CTF challenges, a sign that AI cyberattack capabilities have crossed a key threshold.

Table of contents

  • CTF Assessment: 73% expert-level achievement rate
  • Clearing the 32-step enterprise attack simulation
  • Capability boundaries
  • The double-edged sword and organizational response

On the 13th, the UK AI Safety Institute (AISI) released a report assessing the cybersecurity capabilities of Anthropic’s Claude Mythos Preview. The results show that, while the cyberattack capabilities of frontier models keep improving rapidly, Mythos Preview represents another notable leap.

AISI has tracked AI cyberattack capabilities since 2023, building an evaluation system whose difficulty increases year by year: from basic conversational probing, to Capture The Flag (CTF) challenges, and now to multi-step network attack simulations. The cyber range runs in this assessment used an inference budget of up to 100 million tokens, and Mythos Preview’s performance kept improving as the budget grew toward that limit.

CTF Assessment: 73% expert-level achievement rate

Capture The Flag (CTF) is one of the standard methods for cybersecurity assessment: an AI model must find vulnerabilities in a target system and exploit them to obtain hidden “flag” strings. Each challenge simulates a single technical step in a realistic attack chain and serves as a benchmark for a model’s penetration testing capability.
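
The flag-based scoring described above can be sketched as follows. This is a minimal illustration only, not AISI's evaluation harness: the challenge names, flag strings, and the helpers `flag_matches` and `success_rate` are all hypothetical.

```python
import hmac

def flag_matches(submitted: str, expected: str) -> bool:
    """Compare a submitted flag with the expected one in constant time."""
    return hmac.compare_digest(submitted.strip().encode(), expected.encode())

def success_rate(results: list[bool]) -> float:
    """Fraction of challenges solved; 0.0 when there are no results."""
    return sum(results) / len(results) if results else 0.0

# Hypothetical challenges and submissions, for illustration only.
expected_flags = {"web-01": "FLAG{example_one}", "pwn-02": "FLAG{example_two}"}
submissions = {"web-01": "FLAG{example_one}", "pwn-02": "FLAG{wrong_guess}"}

results = [
    flag_matches(submissions.get(name, ""), flag)
    for name, flag in expected_flags.items()
]
rate = success_rate(results)
print(f"solved {sum(results)} of {len(results)} challenges ({rate:.0%})")
```

A challenge counts as solved only when the exact flag string is recovered, which is why a CTF success rate measures complete exploits rather than partial progress.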

The results show that on expert-level CTF tasks, which no model could complete before April 2025, Claude Mythos Preview reaches a 73% success rate. AISI points out that this figure indicates frontier models have reached a high level of maturity in isolated, single-point attack techniques.

Clearing the 32-step enterprise attack simulation

However, expert-level CTFs only test a single technical capability. Real-world cyberattacks require dozens of steps stitched together across multiple hosts and multiple network segments; such sustained actions often take human experts hours, days, or even weeks to complete.

To get closer to real-world attack scenarios, AISI built an enterprise network attack simulation range called “The Last Ones” (TLO). TLO consists of 32 steps, covering the full process from initial reconnaissance to complete takeover of an enterprise network. AISI estimates that human professionals need about 20 hours to complete this process.

Claude Mythos Preview is the first model to complete TLO end to end. It completed all 32 steps in 3 of 10 attempts. Across all 10 attempts, including the failed ones, it completed an average of 22 of the 32 steps. By comparison, the next-best model, Claude Opus 4.6, completed an average of 16 steps.
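
As a rough check on how these figures fit together, the short calculation below derives the average progress of the seven incomplete attempts from the reported numbers. It assumes the reported mean of 22 steps is exact; it is probably rounded, so the derived value is only approximate and is not a figure stated by AISI.

```python
TOTAL_STEPS = 32   # steps in the TLO range
ATTEMPTS = 10      # attempts made by Mythos Preview
FULL_RUNS = 3      # attempts that completed all 32 steps
MEAN_STEPS = 22    # reported average steps per attempt (assumed exact here)

full_completion_rate = FULL_RUNS / ATTEMPTS

# Total steps over all attempts, minus the steps contributed by full runs.
total_steps_completed = MEAN_STEPS * ATTEMPTS
steps_in_full_runs = FULL_RUNS * TOTAL_STEPS
incomplete_runs = ATTEMPTS - FULL_RUNS

# Implied average progress of the attempts that did not finish.
mean_steps_incomplete = (total_steps_completed - steps_in_full_runs) / incomplete_runs

print(f"full completion rate: {full_completion_rate:.0%}")
print(f"implied mean steps in incomplete runs: {mean_steps_incomplete:.1f}")
```

Under that assumption, the incomplete attempts averaged a little under 18 steps, slightly above the 16-step average reported for Claude Opus 4.6.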

The assessment shows that in a controlled environment, given clear instructions and network access, Mythos Preview can execute multi-stage attacks and autonomously discover and exploit vulnerabilities. These are tasks that previously took human professionals days.

Capability boundaries

AISI also cautions that a gap remains between its current assessment framework and the real world. The current ranges lack several defensive elements common in real environments: no active defenders intervene, no defensive tools are deployed, and the model is not penalized for actions that might trigger security alerts.

AISI says plainly: “This means we cannot determine whether Mythos Preview is able to attack systems with robust defenses.” The capabilities that Mythos Preview currently demonstrates are more accurately described as: given that an attacker already has a network entry point, it can autonomously attack enterprise systems that are smaller in scale, have weaker defenses, and contain known vulnerabilities.

The double-edged sword and organizational response

AISI’s conclusion directly highlights the dual nature of AI cyber capabilities. On one hand, more models with similar capabilities will continue to emerge in the future, posing an increasingly prominent risk to organizations with weak defenses. On the other hand, AI cyber capabilities can also bring breakthrough improvements on the defense side.

On organizational response, AISI stresses the urgency of core cybersecurity fundamentals: applying security updates regularly, enforcing strong access controls, managing security configurations, and keeping comprehensive logs. AISI points out that future frontier models will be more capable still, which makes it crucial to invest in cyber defenses now.

Looking ahead, AISI states that it will build cyber ranges that simulate hardened, defended environments, incorporating elements such as active monitoring, endpoint detection, and real-time incident response. The aim is to measure the practical upper limits of AI cyberattack capabilities under conditions closer to real attack scenarios.

For the detailed report, see 【Original Text】
