The launch of Anthropic’s Claude Fable 5 and Mythos 5 models put frontier capabilities into more hands, but experts say the security story more or less remains the same: Don’t panic, do prepare.
Anthropic this week released the latest versions of its Claude models, Mythos 5 and Fable 5. Claude Mythos 5 is the next iteration of Claude Mythos Preview, a frontier model offered to a small number of organizational partners in April. Anthropic claims Mythos is so capable of finding and exploiting vulnerabilities that it could find critical exploits in popular, decades-old software.
Because of this supposed danger, the company launched a secondary campaign, Project Glasswing, to give cybersecurity partners a head start and limit the potential for Mythos to be misused by threat actors. Even under Glasswing, access was restricted and monitored.
Mythos has remained a hot topic since its unveiling two months ago. The Cloud Security Alliance (CSA) published a report the week after the model’s unveiling, authored by numerous cybersecurity luminaries. The report warned that organizations must prepare for AI models like Mythos to limit their risk of being exploited by them down the line. More recently, US President Donald Trump signed an executive order establishing a voluntary framework to give the federal government early access to frontier AI models.
Beyond cybersecurity, Anthropic argues that these latest models have cutting-edge capabilities across multiple fields such as biology, coding, multidisciplinary reasoning, computer use, and more.
Mythos 5 is something of a straight upgrade over Mythos Preview, with capabilities similar to but greater than its predecessor’s. It will remain available to a small (but expanding) pool of trusted partners, including the US government. Fable 5 is the same model as its Mythos counterpart but “made safe for general use,” Anthropic said in a statement.
Claude Fable 5’s Anti-Tampering Guardrails
Fable comes with a number of safeguards. When users ask about certain topics, such as cybersecurity, they may receive a response from Anthropic’s previous model, Claude Opus 4.8. While there are false positives, Anthropic said this triggers in under 5% of cases. Users are informed when Fable 5 switches models.
This is the result of the company’s new set of safety classifiers, separate AI systems that detect misuse and prevent the main model from generating a user-facing output. This is not the first time classifiers have been in place, but these, Anthropic said, “are an extension of this previous work with extra coverage.”
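For readers who want a concrete picture of that flow, the following is a minimal Python sketch of classifier-gated routing as the article describes it: a separate check flags a prompt, a fallback model answers instead, and the user is told about the switch. It is not Anthropic’s implementation. Every name in it (the model strings, `keyword_classifier`, `route`, `fake_generate`) is hypothetical, and the keyword match is a toy stand-in for what Anthropic describes as separate AI systems.

```python
from dataclasses import dataclass
from typing import Callable

# All names below are hypothetical and for illustration only; this is not
# Anthropic's implementation or API.
PRIMARY_MODEL = "fable-5"
FALLBACK_MODEL = "opus-4.8"


@dataclass
class Reply:
    model: str        # which model produced the text
    text: str
    switched: bool    # True when the fallback model answered
    notice: str = ""  # user-facing note shown when a switch happens


def keyword_classifier(prompt: str) -> bool:
    """Toy stand-in for a safety classifier: flags prompts by keyword.

    A crude match like this also flags benign, defensive requests, which is
    one way false positives arise.
    """
    flagged_terms = ("exploit", "malware", "forensics")
    lowered = prompt.lower()
    return any(term in lowered for term in flagged_terms)


def fake_generate(model: str, prompt: str) -> str:
    """Placeholder for a model call; returns a canned string."""
    return f"[{model}] answer to: {prompt}"


def route(prompt: str,
          classifier: Callable[[str], bool],
          generate: Callable[[str, str], str]) -> Reply:
    """Send flagged prompts to the fallback model and inform the user."""
    if classifier(prompt):
        text = generate(FALLBACK_MODEL, prompt)
        return Reply(FALLBACK_MODEL, text, True,
                     notice=f"This response came from {FALLBACK_MODEL}.")
    return Reply(PRIMARY_MODEL, generate(PRIMARY_MODEL, prompt), False)


if __name__ == "__main__":
    reply = route("Help me build a digital forensics skill",
                  keyword_classifier, fake_generate)
    print(reply.model, reply.switched, reply.notice)
```

In this toy version, a defensive request that merely mentions forensics is routed to the fallback model, which illustrates how a benign query can end up counted as a false positive.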
“Mythos-class models excel at discovering and exploiting software vulnerabilities. They can thus make cyberattacks substantially easier and cheaper to commit. Mythos-class models also show strong skills in agentic hacking,” Anthropic said. “To prevent these agentic hacking skills providing uplift in cyberattacks, we designed our cybersecurity classifiers to cover both exploitation and offensive cyber tasks in a broader sense.”
Daniel Shechter, CEO of application detection and response (ADR) vendor Miggo, tells Dark Reading that Anthropic’s rate-limiting approach is smart, but “it’s a speedbump, not a wall.”
“The underlying capability exists, and other models will replicate it. Open-source variants will follow. Betting your security program on the assumption that jailbreak protections will hold at scale is the wrong bet,” he says. “My read is that Anthropic is trying to give defenders a window of opportunity. Not just to find and fix more vulnerabilities, but to understand what defending against a model like this actually looks like.”
The company claimed the new models were exceptional at resisting jailbreaks. Across internal and external red teaming, the company’s blog said, testers were unable to produce “universal jailbreaks” after more than 1,000 hours of testing. “External red-teaming organizations we engaged also failed to find any universal jailbreaks on long-form agentic tasks so far — although the UK[’s AI Security Institute] has made progress towards one within a brief initial testing window,” the blog read.
Although it may be impossible to prevent 100% of jailbreak attempts, the company said its goal is to make jailbreaks slow enough and costly enough that they are stopped before attackers can use them at scale. As Adam Arellano, field CTO at AI software development vendor Harness, puts it, “Anthropic’s strategy is essentially to make things as difficult as possible.”
Mythos Is No More of a Threat Than It Was in April
Rob T. Lee, chief AI officer at SANS Institute, says he operates under the assumption that Mythos-caliber models have already gotten into the wrong hands.
“Frontier models of similar capability are already running in other labs, and those actors are using them,” he tells Dark Reading. “Even under Glasswing, access was restricted and monitored. But those organizations have thousands of employees. Any one of them could be incentivized to hand access to a criminal group, or there could already be a DPRK actor sitting inside the org. We have no data saying that’s happened. But every time we’ve believed something was restricted, we’ve found out our adversaries had it earlier than we thought.”
Lee emphasizes that the classifier label also prevents defensive research on Fable 5. “I tried to use it to build a digital forensics skill and it dropped me down to Opus 4.8. Clever way to stop malicious actors or not, it keeps new defensive capability away from the people who will build the next generation of tooling.”
In April’s CSA report, the authors said defenders should prepare for the Mythos exploit storm by adjusting risk calculations and re-orienting security program resources for an increasing number of attacks, a higher volume of patches, and less time to patch.
This means focusing on basics such as segmentation, egress filtering, multifactor authentication, and defense in depth. The authors also argued it means prioritizing robust dependency management, enforcing automated security assessments through LLMs, and introducing AI agents to the cyber workforce in order to keep up with attackers.
Rich Mogull, chief analyst at CSA, tells Dark Reading that when it comes to Mythos, the cybersecurity story for the average practitioner has not changed. “This was expected, and it is exactly what we used to develop our guidance,” he says. “Start now, get to work, but the release of Fable did not make you less secure than you were the day before.”
