San Diego News 24

Can Anthropic Keep Its Exploit-Writing AI Out of the Wrong Hands?

May 23, 2026  Twila Rosenbaum

Anthropic's Mythos model promises major innovations in vulnerability management and security red-teaming, but questions remain regarding how defenders can keep threat actors from taking full advantage.

Anthropic on April 7 unveiled Claude Mythos Preview, a general-purpose large language model (LLM) that the company said in a blog post performs strongly across the board and is strikingly capable at computer security tasks. The AI firm said Mythos could identify and exploit zero-day vulnerabilities in every major operating system and Web browser at user direction, including subtle and difficult-to-detect ones. One example involved a now-patched, 27-year-old flaw in OpenBSD.

Some of these vulnerabilities are complex, but the company says one does not need to be a security engineer to properly prompt the model. In one case, Mythos Preview wrote a Web browser exploit that chained together four vulnerabilities, writing a complex JIT heap spray that escaped both renderer and OS sandboxes. It autonomously obtained local privilege escalation exploits on Linux and other operating systems by exploiting subtle race conditions and KASLR bypasses. And it autonomously wrote a remote code execution exploit on FreeBSD's NFS server that granted full root access to unauthenticated users by splitting a 20-gadget ROP chain over multiple packets.

The vulnerability detection and exploitation enhancements came as a downstream consequence of improving Mythos' code and reasoning capabilities, rather than being an explicit goal of its developers. The same improvements that make the model substantially more effective at patching vulnerabilities also make it substantially more effective at exploiting them, Anthropic said.

The aim is to assist defenders and keep Mythos out of attacker hands, and Anthropic claims it has identified thousands of high-risk and critical security vulnerabilities that it is responsibly disclosing. Still, it is not much of a leap to see how a model like Mythos Preview could be misused, much as threat actors abuse legitimate penetration testing tools like Cobalt Strike. This dual-use dilemma is a recurring challenge in cybersecurity, where powerful tools intended for defense can be repurposed for offense. The history of penetration testing frameworks, from Metasploit to Cobalt Strike, shows that once a capability is released, it often spreads beyond its intended user base. The same pattern could apply to AI models with offensive capabilities.

It is likely in anticipation of this that Anthropic introduced Project Glasswing, a new initiative the company launched this week in partnership with companies like Apple, AWS, Microsoft, Palo Alto Networks, and CrowdStrike. As part of its product launch, Anthropic claimed Project Glasswing could fundamentally reshape cybersecurity, and that this would be an urgent attempt to put these capabilities to work for defensive purposes. In practical terms, the AI vendor has extended Mythos Preview access to a group of more than 40 organizations to scan and secure first-party and open source systems. Lee Klarich, chief product and technology officer of Palo Alto Networks, called early Mythos Preview results compelling in a LinkedIn blog post.

In addition to granting limited access to partners, Anthropic is committing $100 million in Mythos Preview usage credits to Project Glasswing, as well as $4 million in direct donations to open source security organizations. This financial commitment underscores Anthropic's intent to position the model as a force for good, but the scale of investment also highlights the potential impact if the model were to fall into the wrong hands. The cybersecurity community is watching closely, as similar large-scale initiatives in the past have often been accompanied by unintended consequences.

As for why Anthropic introduced something so good at exploiting vulnerabilities, Forrester senior analyst Erik Nost offered two reasons. First, it is good PR for Anthropic, as the company is basically saying its AI is so good that it can reshape cybersecurity and software development. Second, it calls attention to the vulnerability detection gaps that the industry has dealt with for 30 years. The persistent problem of unpatched vulnerabilities, especially in open source software, has long been a headache for defenders. By demonstrating that an AI can find and exploit these flaws automatically, Anthropic is essentially sounding an alarm that traditional vulnerability management practices are no longer sufficient.

Nost explained that there are controls in place ensuring Mythos stays in the right hands, though it has become a race for defenders to remediate and patch before other AIs, in the wrong hands, discover these zero-days and rapidly write exploits. This race is reminiscent of the zero-day market, where attackers and defenders compete to find and patch vulnerabilities first. The difference now is the speed and scale at which AI can operate. A single model can scan entire codebases in minutes, identify vulnerabilities that might take human researchers weeks to find, and then generate working exploits. The pressure on defenders is immense.

Julian Totzek-Hallhuber, senior principal solution architect at Veracode, said that because there is no clear answer for how these tools can stay out of attacker hands, defenders should assume the capability will proliferate and prepare accordingly. This means investing in detection rather than prevention alone, identifying the behavioral signatures of AI-assisted exploitation, and adopting zero-trust architecture, aggressive patching cycles, and anomaly-based detection. The traditional perimeter-based security model is inadequate against adversaries armed with AI that can adapt and bypass defenses in real time. Defenders must shift to a mindset of resilience, where the goal is not to prevent all attacks but to detect and respond quickly.

Melissa Ruzzi, director of AI at AppOmni, noted a deeper truth: no one can ever keep anything 100% out of attackers' hands. The best that can be done is to make it more difficult for them to get access to it. This realism echoes the sentiment in many cybersecurity circles: absolute security is a myth. The focus should be on raising the cost of attacking, making it less attractive for adversaries to invest the time and resources needed to acquire and use advanced tools like Mythos. However, with nation-state actors and well-funded cybercriminal groups, the barrier to entry is already low enough that any additional friction may not be enough.

Mythos' potential comes with a caveat. While the early Anthropic examples of discovered vulnerabilities are compelling, two data points do not make a pattern. Totzek-Hallhuber emphasized that Anthropic controls both the model and the narrative; independent replication is impossible when the model isn't publicly available. He added that until independent researchers with access can run their own evaluations, healthy skepticism is the appropriate posture. This is, frankly, another consequence of the restricted access model: the claims can't be tested, so they can't be fully trusted or refuted. The lack of independent verification is a recurring issue in the AI industry, where marketing claims often outpace solid evidence. Without third-party audits, the true capabilities and limitations of Mythos remain uncertain.

The broader implications of this release extend beyond a single product. It signals a new era in cybersecurity where AI not only assists humans but can autonomously carry out complex offensive operations. Defenders must reconsider their strategies, including how they prioritize vulnerability remediation, how they monitor for exploit attempts, and how they train their security teams. Automated exploit generation will likely force changes in software development practices, such as more rigorous static analysis, formal verification, and secure coding guidelines. The industry may need to develop new standards for AI-generated code and exploits, similar to how the legal system grapples with computer-generated content.

The timing of the announcement is noteworthy, coming just before major cybersecurity conferences where the topic is sure to dominate discussions. The AI industry is already under intense scrutiny from regulators and lawmakers, who are grappling with questions of responsibility, liability, and control. Anthropic's move could accelerate calls for stricter regulation of dual-use AI models, similar to export controls on cryptographic software. Already, there are discussions about requiring licensing for AI models that can generate exploits, akin to how munitions are controlled. The balance between innovation and security remains delicate.

The response from the security community has been mixed. Some applaud Anthropic for being transparent about the offensive capabilities and for investing in defensive initiatives like Project Glasswing. Others criticize the company for releasing such a powerful tool, even with restrictions, arguing that the risks outweigh the potential benefits. This tension mirrors the ongoing debate in the wider technology community about responsible disclosure of vulnerabilities and the ethics of offensive security research. The outcome of this debate will shape not only the future of Anthropic but also the broader landscape of AI-powered cybersecurity.

In the meantime, defenders have no choice but to prepare for a world where AI-generated exploits become commonplace. The recommendations from experts like Totzek-Hallhuber and Ruzzi provide a roadmap: emphasize detection, adopt zero-trust principles, accelerate patching cycles, and invest in AI-driven defense tools that can keep pace with the offense. The race is on, and the finish line is elusive. But with coordinated efforts among vendors, researchers, and organizations, there is hope that the good guys can stay ahead of the bad ones.


Source: Dark Reading News

