Anthropic Withholds Powerful AI Model After It Uncovers Thousands of Critical Software Flaws

A New AI Frontier in Cybersecurity

In a move that underscores the immense power and peril of advanced artificial intelligence, Anthropic has developed an AI model so adept at finding software vulnerabilities that the company has decided against releasing it to the public. Instead, this formidable tool, named Claude Mythos Preview, is being quietly distributed to a select coalition of technology giants and security organizations through an initiative called Project Glasswing. The decision reflects a growing consensus among frontier AI labs that some capabilities are simply too dangerous for general availability, at least for now.

An Unprecedented Coalition for a Novel Threat

The launch partners for Project Glasswing read like a who’s who of the digital world: Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, Nvidia, and Palo Alto Networks. Beyond this core group, Anthropic has extended access to over 40 additional organizations responsible for building or maintaining critical software infrastructure. This is not a typical product launch; it’s a strategic containment and defense operation.

Anthropic is backing the effort with substantial resources, committing up to $100 million in usage credits for the Mythos Preview model across the coalition. The company is also making $4 million in direct donations to open-source security organizations, recognizing that the software underpinning the modern internet often lacks the defensive resources of major corporations.

Capabilities That Outgrew the Test

What makes Claude Mythos Preview so special, and so concerning? Interestingly, it wasn’t specifically trained for cybersecurity work. According to Anthropic, its prowess in this domain “emerged as a downstream consequence of general improvements in code, reasoning, and autonomy.” In a classic double-edged sword scenario, the same architectural improvements that make the model exceptionally good at patching vulnerabilities also make it terrifyingly effective at exploiting them.

The model improved so rapidly that it essentially saturated existing security benchmarks. This forced Anthropic’s researchers to shift their testing focus to novel, real-world tasks, specifically hunting for zero-day vulnerabilities, flaws unknown even to the software’s own developers. The results were staggering.

From Theory to Real-World Exploits

The model’s performance moved from academic to alarming with concrete discoveries. It unearthed a 27-year-old bug in OpenBSD, an operating system renowned for its strong security posture. In a more dramatic demonstration, the AI autonomously identified and exploited a 17-year-old remote code execution vulnerability in FreeBSD (tracked as CVE-2026-4747), a flaw that allows an unauthenticated attacker anywhere on the internet to seize complete control of a server running the Network File System (NFS) protocol.

Critically, no human was involved in the discovery or exploitation after the initial prompt to find a bug. Nicholas Carlini from Anthropic’s research team highlighted the model’s sophisticated reasoning, noting its ability to chain together multiple vulnerabilities. “This model can create exploits out of three, four, or sometimes five vulnerabilities that in sequence give you some kind of very sophisticated end outcome,” Carlini said. He added a personal note that speaks volumes: “I’ve found more bugs in the last couple of weeks than I found in the rest of my life combined.”

The Rationale for Secrecy

So why lock it down? The answer from Anthropic is starkly pragmatic. “We do not plan to make Claude Mythos Preview generally available due to its cybersecurity capabilities,” stated Newton Cheng, Frontier Red Team Cyber Lead at Anthropic. The company’s reasoning hinges on the blistering pace of AI progress. They believe it will not be long before such capabilities proliferate, potentially falling into the hands of actors not committed to deploying them safely.

The potential fallout for economies, public safety, and national security could be severe. This isn’t a hypothetical fear. Anthropic has already documented what it calls the first major cyberattack largely executed by AI, attributed to a Chinese state-sponsored group. That operation used AI agents to autonomously infiltrate roughly 30 global targets, with artificial intelligence handling the majority of tactical operations independently.

Government Briefings and the Open-Source Dilemma

The implications have reached the highest levels of government. Anthropic has privately briefed senior U.S. officials on Mythos Preview’s full capabilities, and the intelligence community is now actively weighing how such a model could reshape both offensive and defensive hacking operations on a global scale.

Project Glasswing also tackles a chronic weak spot in global digital security: open-source software. Jim Zemlin, CEO of the Linux Foundation, framed the problem succinctly: “In the past, security expertise has been a luxury reserved for organisations with large security teams. Open-source maintainers, whose software underpins much of the world’s critical infrastructure, have historically been left to figure out security on their own.”

Through the Linux Foundation, Anthropic has donated $2.5 million to Alpha-Omega and the Open Source Security Foundation (OpenSSF), and another $1.5 million to the Apache Software Foundation. This funding is designed to give the often-under-resourced maintainers of critical open-source codebases access to AI-powered vulnerability scanning at a scale previously unimaginable.

The Road Ahead for Frontier AI

What comes next? Anthropic says its eventual goal is to deploy Mythos-class models at scale, but only when new, robust safeguards are firmly in place. The company plans to test these safety measures first with an upcoming Claude Opus model, allowing for refinement with a system that doesn’t pose the same extreme level of risk as Mythos Preview.

The competitive landscape is already shifting in response. When OpenAI released GPT-5.3-Codex earlier this year, it labeled the model as the first it had classified as “high-capability” for cybersecurity tasks under its internal Preparedness Framework. Anthropic’s move with Glasswing signals a new, emerging standard among frontier AI labs: controlled, restricted deployment for models at this capability tier, not open release.

This cautious approach raises a pivotal question for the industry. Can this standard of restraint hold as these powerful capabilities inevitably become easier to develop? The answer is unclear, and no single initiative, no matter how well-funded or well-intentioned, can provide it alone. The race is no longer just about who can build the most powerful AI; it’s increasingly about who can build the wisdom and the frameworks to manage it responsibly. The success of Project Glasswing may well set the precedent for how the world handles the next generation of tools that can both defend and dismantle our digital foundations.