
Anthropic’s Mythos Preview Can Now Build Working Exploits, Not Just Find Bugs

by Free RDP
May 19, 2026

For years, the security industry has quietly dreaded a particular milestone: the moment an AI could not only spot a vulnerability but also weaponize it. That moment, according to Cloudflare’s security team, has arrived. After weeks of stress-testing Anthropic’s Mythos Preview across more than 50 internal repositories, researchers saw something new. The model didn’t just identify flaws. It turned them into functional, demonstrable exploits.

This work happened under Anthropic’s invite-only Project Glasswing, a program designed to push the limits of AI safety research. The implications are sobering. The distance between discovering a bug and delivering a working proof of concept has effectively collapsed. For defenders, that means less time to patch before a flaw can be exploited. For threat actors, it suggests a future where exploit development is no longer a bottleneck but a commodity.

Earlier frontier AI models, the ones Cloudflare tested previously, could do impressive things. They could find individual vulnerabilities, describe them clearly, and even explain why they mattered. But they consistently failed at the final step: stitching those pieces into a coherent, working exploit chain. A use-after-free here, an arbitrary read/write there, a ROP gadget scattered somewhere in memory. These models couldn’t connect the dots. Mythos Preview closes that gap in two distinct ways.

Exploit Chain Construction: Connecting the Dots

The first major leap is something Cloudflare calls exploit chain construction. Instead of treating each low-severity bug as an isolated finding, Mythos Preview can reason across multiple primitives. It takes a use-after-free vulnerability, an arbitrary memory read/write, and a return-oriented programming (ROP) gadget, then works out how they combine into a single, higher-severity attack path. This is not hypothetical reasoning. The model generates actual code to trigger the chain, compiles it in a sandbox, runs it, reads the failure output, and revises its hypothesis until either the exploit works or the bug is ruled out as unexploitable.

That iterative process is what Cloudflare calls proof generation. It means confirmed findings arrive with a PoC attached. No more waiting for a human researcher to spend hours or days validating a lead. The triage queue gets compressed dramatically. Bugs that would have sat in a security backlog, deemed too low-severity or too ambiguous to act on, suddenly become actionable attack vectors. It’s a shift from speculation to demonstration.
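Cloudflare has not published its harness, so the following is only a minimal sketch of the generate, run, observe, revise loop described above. It assumes two hypothetical callables: `generate_poc` (a model call that writes a PoC from the hypothesis plus the last failure output) and `run_in_sandbox` (an isolated build-and-run step). It also assumes a fixed attempt budget as the stopping rule for "ruled out", which the source does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RunResult:
    succeeded: bool  # True if the PoC demonstrated the suspected bug
    output: str      # compiler/runtime output, fed back to the model on failure


@dataclass
class Verdict:
    status: str         # "confirmed" or "ruled_out"
    attempts: int       # how many PoC attempts were made
    poc: Optional[str]  # PoC source, attached only to confirmed findings


def prove_or_dismiss(
    hypothesis: str,
    generate_poc: Callable[[str, str], str],     # hypothetical model call
    run_in_sandbox: Callable[[str], RunResult],  # hypothetical isolated build + run
    max_attempts: int = 5,                       # assumed budget, not from the source
) -> Verdict:
    """Generate, run, observe, revise: stop on success or when the budget is spent."""
    feedback = ""  # no failure output exists before the first attempt
    for attempt in range(1, max_attempts + 1):
        poc = generate_poc(hypothesis, feedback)
        result = run_in_sandbox(poc)
        if result.succeeded:
            return Verdict("confirmed", attempt, poc)
        feedback = result.output  # the failure output informs the next revision
    return Verdict("ruled_out", max_attempts, None)
```

The key design point is that a confirmed verdict always carries the PoC that succeeded, which is what lets a human reviewer skip straight to the fix-or-dismiss decision.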

But what about false positives? They’ve been the Achilles’ heel of AI-driven vulnerability research for years. Two factors dominate: the programming language and the model’s reporting bias. C and C++ codebases, for instance, generate significantly more noise than codebases written in memory-safe languages such as Rust. Meanwhile, models tuned to report anything and everything flood triage queues with hedged language: possibly, potentially, could in theory. Mythos Preview noticeably dials this down. Its output includes fewer ambiguous conclusions, clearer reproduction steps, and PoC code that makes the fix-or-dismiss decision considerably easier. That’s a practical win for any security team drowning in alerts.

How Cloudflare Built a Smarter Research Pipeline

Cloudflare didn’t just throw Mythos Preview at its repositories and hope for the best. They learned that pointing any AI model directly at a codebase produces poor coverage. Effective vulnerability research demands a custom execution harness built around several principles. First, narrow scope. Scoping each agent’s task to a specific function, attack class, and trust boundary yields sharper findings than broad, repository-wide prompts. Think of it like a surgeon versus a general practitioner: both are doctors, but you want the one focused on the specific organ.
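To make "narrow scope" concrete, here is a minimal sketch of how a harness might represent one tightly scoped task. The `HuntTask` type, its prompt wording, and the `scoped_tasks` helper are hypothetical illustrations, not Cloudflare’s format. Each task pins exactly one function, one attack class, and one trust boundary, and the helper fans a repository out into many such tasks rather than one repository-wide prompt.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence


@dataclass(frozen=True)
class HuntTask:
    function: str        # exactly one function, e.g. "parse_header" (illustrative)
    attack_class: str    # exactly one bug class, e.g. "use-after-free"
    trust_boundary: str  # where untrusted data enters, e.g. "HTTP request body"

    def prompt(self) -> str:
        # Hypothetical prompt wording: one function, one bug class, one boundary.
        return (f"Examine only {self.function} for {self.attack_class} "
                f"issues at the {self.trust_boundary} trust boundary.")


def scoped_tasks(functions: Sequence[str], attack_classes: Sequence[str],
                 boundary: str) -> List[HuntTask]:
    # One narrow task per (function, attack class) pair.
    return [HuntTask(f, a, boundary) for f, a in product(functions, attack_classes)]
```

Two functions and two attack classes yield four independent tasks, each small enough for an agent to reason about fully.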

Second, adversarial review. A second independent agent, using a different model and a different prompt, reviews findings with the specific goal of disproving them. This catches a significant fraction of false positives that the first agent misses. It’s a classic red team versus blue team dynamic, but automated. Third, chain splitting. Treating the questions “Is this code buggy?” and “Can an attacker reach this from outside?” as separate tasks produces better reasoning on both. They’re fundamentally different problems, and mashing them together muddies the logic.
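The adversarial-review and chain-splitting principles can be combined in one triage step. The sketch below is an illustration under stated assumptions: `is_buggy`, `reviewer_disproves`, and `attacker_can_reach` are hypothetical stand-ins for separate agent calls, each answering a single yes/no question. The point it shows is structural: the bug question and the reachability question are asked separately, and an independent reviewer gets a veto before anything is reported.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    location: str  # e.g. "src/parser.c:parse_header" (illustrative)
    claim: str     # the first agent's description of the suspected bug


# Each question is a separate, independently prompted task (hypothetical agents).
Question = Callable[[Finding], bool]


def triage(
    finding: Finding,
    is_buggy: Question,            # task 1: is this code actually wrong?
    reviewer_disproves: Question,  # different model/prompt, trying to refute the claim
    attacker_can_reach: Question,  # task 2: can outside input reach it?
) -> str:
    if not is_buggy(finding):
        return "not a bug"
    if reviewer_disproves(finding):
        return "false positive"
    if not attacker_can_reach(finding):
        return "bug, not reachable"
    return "reachable bug"
```

Keeping "bug, not reachable" as its own outcome preserves real defects for later fixing without inflating their severity.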

Finally, parallel narrow tasks. Cloudflare runs roughly fifty concurrent agents on tightly scoped hypotheses, then deduplicates the results. A single exhaustive agent might miss something or get stuck on a dead end. Fifty narrow agents, running in parallel, explore more ground and converge faster. Their full pipeline covers recon, hunt, validate, gapfill, dedupe, trace, feedback, and report stages. The trace stage is particularly important: it determines whether attacker-controlled input can actually reach a confirmed bug from an external entry point. That’s the difference between a theoretical vulnerability and a real exploitable weakness.
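The fan-out/fan-in shape of the hunt and dedupe stages can be sketched with Python’s standard `concurrent.futures` module. This is only an illustration, not Cloudflare’s pipeline. `run_agent` is a hypothetical callable that runs one narrow agent on one hypothesis. The dedupe key (file path, line, bug class) is an assumption, and the pool size of 50 simply mirrors the "roughly fifty concurrent agents" figure above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, NamedTuple, Sequence, Tuple


class Finding(NamedTuple):
    path: str
    line: int
    bug_class: str
    agent: str  # which narrow agent reported it; ignored when deduplicating


def hunt_and_dedupe(
    hypotheses: Sequence[str],
    run_agent: Callable[[str], List[Finding]],  # hypothetical: one narrow agent per hypothesis
    workers: int = 50,
) -> List[Finding]:
    # Fan out: each hypothesis gets its own agent, run concurrently.
    # pool.map returns results in the same order as the input hypotheses.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = list(pool.map(run_agent, hypotheses))
    # Fan in: keep only the first report of each (path, line, bug_class).
    unique: Dict[Tuple[str, int, str], Finding] = {}
    for batch in batches:
        for f in batch:
            unique.setdefault((f.path, f.line, f.bug_class), f)
    return list(unique.values())
```

When several agents converge on the same bug from different hypotheses, only one copy survives into the later trace and report stages, which keeps the triage queue short.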

Despite operating under reduced safeguards within Project Glasswing, Mythos Preview exhibited inconsistent organic refusals. In some cases, it declined to write demonstration exploits. In others, it completed equivalent tasks when framed slightly differently. Cloudflare flagged this directly: emergent guardrails alone are not a reliable safety boundary. The same capabilities that accelerate internal bug discovery will accelerate attacks against internet-facing applications. That’s a sobering thought.

Architectural defenses, the kind that sit in front of applications, limit blast radius, and enable simultaneous global patch rollout, are becoming increasingly urgent. The window between vulnerability disclosure and active exploitation continues to shrink. We’ve seen it with zero-days, we’ve seen it with ransomware, and now we may see it with AI-driven exploit generation. The question is no longer if attackers will use these tools, but how quickly they can adapt them.