Project Glasswing: Anthropic's Mythos AI Finds Thousands of Zero-Days

ThreatNeuron
Attacks. Defenses. Everything in between.

On April 7, 2026, Anthropic pulled back the curtain on something it had been working on quietly for months: Project Glasswing, a cybersecurity initiative built around an unreleased AI model called Claude Mythos Preview. The model had already found thousands of high-severity zero-day vulnerabilities across every major operating system and web browser before anyone outside a small circle even knew it existed. Some of those bugs had been sitting in production code for nearly three decades.

What Project Glasswing Actually Is

The name comes from a species of butterfly with transparent wings — fitting, given the initiative’s goal of making opaque software codebases visible to automated analysis. But the substance matters more than the branding.

Project Glasswing is a controlled partnership between Anthropic and roughly 50 organizations, including AWS, Apple, Cisco, CrowdStrike, Google, JPMorganChase, Microsoft, NVIDIA, and Palo Alto Networks. Each partner gets access to Mythos Preview strictly for defensive security work. Anthropic has committed $100 million in model usage credits, donated $2.5 million to the Alpha-Omega initiative and OpenSSF through the Linux Foundation, and added another $1.5 million for the Apache Software Foundation.

There’s a 90-day public reporting window on all findings, which means vulnerabilities discovered through Glasswing get disclosed to affected maintainers with a fixed deadline. That’s a responsible disclosure model, not a zero-day hoarding operation — an important distinction given the sensitivity of what the model can do.

Anthropic briefed senior U.S. government officials on Mythos Preview before announcing it externally. Agencies including CISA, NSA, and NIST’s Center for AI Standards and Innovation were all looped in. The geopolitical backdrop isn’t subtle: a Chinese state-sponsored cyberattack reportedly hit around 30 global targets in the months leading up to the announcement, and the recent compromise of the Axios JavaScript library by North Korea-affiliated hackers underscored just how fragile the software supply chain has become. Attackers are already weaponizing AI across the full cyberattack lifecycle, and the threat picture is evolving fast.

How Mythos Preview Outperforms Everything Else

Mythos Preview isn’t a general-purpose model. It’s purpose-built for code analysis and vulnerability discovery, and the benchmark numbers show it. On CyberGym Vulnerability Reproduction, Mythos Preview scores 83.1% compared to 66.6% for Opus 4.6. On SWE-bench Verified, it hits 93.9% versus 80.8%. That’s not an incremental improvement — it’s a generational jump in capability for automated code analysis.

The model is available through the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry at $25 per million input tokens and $125 per million output tokens. That pricing puts it at the high end, but organizations burning through pentester hours at $300/hr aren’t going to blink at API costs when the model can scan entire codebases overnight.
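The back-of-the-envelope math on that comparison is straightforward. Here is a minimal sketch using the published rates; the token counts in the example are illustrative assumptions, not measured figures for any real scan:

```python
# Published Mythos Preview rates: $25 per million input tokens,
# $125 per million output tokens.
INPUT_RATE = 25 / 1_000_000    # dollars per input token
OUTPUT_RATE = 125 / 1_000_000  # dollars per output token

def scan_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated API cost in dollars for one analysis pass."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical example: feeding a 2M-token codebase through the model
# and getting back 200k tokens of findings.
cost = scan_cost(2_000_000, 200_000)
print(f"${cost:.2f}")  # $75.00
```

At those assumed volumes, a whole-codebase pass costs about a quarter of a single pentester hour at $300/hr, which is why the pricing is unlikely to deter enterprise buyers.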

What makes these numbers meaningful isn’t just the benchmarks. Nicholas Carlini, a research scientist at Anthropic, found Linux kernel vulnerabilities using an older Anthropic model with a simple prompt. He also discovered the first critical vulnerability in a 20-year-old open-source project. If the previous generation could do that, Mythos Preview operating at the numbers above is a different category of tool entirely.

Alex Stamos, CSO at Corridor, put it bluntly: “LLMs have now bypassed human capability for bug finding.” That’s a strong claim, but the evidence from the cURL project backs it up.

The cURL Case Study Tells the Real Story

Daniel Stenberg’s cURL project is one of the best real-world examples of what AI-driven vulnerability discovery looks like in practice. cURL is a 30-year-old open-source data transfer tool embedded in cars, medical devices, and essentially anything that connects to the internet. It’s exactly the kind of critical infrastructure that accumulates decades of technical debt.

In 2025, cURL received 185 bug reports. Fewer than 5% were actual security issues — the rest were noise, much of it AI-generated slop from researchers using early models to spray low-quality reports at bug bounty programs. The volume doubled from 2024 to 2025, and Stenberg stopped paying bug bounties entirely because the signal-to-noise ratio had collapsed.

Then something shifted. By 2026, most reports coming in were legitimate. About 1 in 10 turned out to be actual security vulnerabilities, and the rest were real bugs worth fixing. In Q1 2026 alone, the cURL team fixed more vulnerabilities than in each of the two prior full years. AI flagged over 100 bugs in cURL code that had passed human review and every traditional static analyzer.
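The shift in signal quality is easy to quantify from the rates above. A rough sketch — the 2026 report volume here is an assumption for illustration; only the rates (under 5% in 2025, roughly 1 in 10 in 2026) come from Stenberg's figures:

```python
def expected_security_issues(reports: int, valid_rate: float) -> float:
    """Expected number of genuine security issues in a batch of reports."""
    return reports * valid_rate

# 2025: 185 reports, with under 5% being real security issues.
y_2025 = expected_security_issues(185, 0.05)  # upper bound, ~9 issues

# 2026: assuming similar volume, with ~1 in 10 now genuine.
y_2026 = expected_security_issues(185, 0.10)  # ~18 issues

print(round(y_2025), round(y_2026))
```

Even holding report volume flat, the expected yield of real vulnerabilities roughly doubles — and that understates the change, since the remaining 2026 reports were real bugs rather than noise.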

Stamos ties the quality improvement to the release of Opus 4.5 in November 2025. The jump from models that generated garbage reports to models that find genuine, deep bugs happened remarkably fast. But Stenberg offers a useful counterpoint: AI is much better at finding bugs than fixing them. Judgment calls about how to patch something properly still take more time than actually writing the code, and that human bottleneck isn’t going away soon.

Stenberg also isn’t part of Glasswing, and he’s pointed out that many projects running critical internet infrastructure were left out of the initiative. That’s a fair criticism — 50 partners is a start, not a solution.

The Market Reaction and the Dual-Use Problem

When news about Mythos Preview broke, cybersecurity stocks dropped between 5% and 11%. That reaction tells you the market understood something immediately: if an AI model can find thousands of zero-days faster and cheaper than human pentesters, the economics of the entire vulnerability management industry just changed.

But the stock drop also reflects a deeper anxiety. The same capability that makes Mythos Preview valuable for defense makes it dangerous in the wrong hands. Anthropic says it has no plans to release the model publicly, and access stays restricted to vetted Glasswing partners. That’s the right call for now.

The problem is that containment has a shelf life. Stamos estimates open-weight models are less than one year behind closed-weight frontier models in capability. Whatever Mythos Preview can do today, open models will approximate within 12 to 18 months. HackerOne is already developing an agentic AI product to autonomously find and fix vulnerabilities — the commercial ecosystem is racing toward this capability regardless of what Anthropic does or doesn’t release.

This mirrors what’s happening with AI-driven supply chain attacks, where offensive capabilities diffuse faster than defensive ones. The question isn’t whether attackers will have models this good. They will. The question is whether defenders are ready to deploy them first.

What This Means for Open Source Maintainers

The Linux Foundation is a Glasswing participant, and kernel maintainers are already experimenting with Mythos Preview. Jim Zemlin, the Foundation’s CEO, framed the challenge honestly: “These maintainers are already overworked before AI.”

That’s the tension at the heart of this. AI can surface hundreds of real bugs in code that humans have been maintaining for decades. But someone still has to triage those bugs, write patches, review the patches, test them, and ship them. If the flood of legitimate findings overwhelms the same small teams that were already struggling, the net effect could be more known-but-unpatched vulnerabilities sitting in the open — arguably worse than unknown ones.

The $2.5 million to OpenSSF and $1.5 million to Apache help, but those are one-time donations against an ongoing problem. The open source sustainability crisis predates AI, and throwing a more powerful bug-finder at underfunded projects doesn’t fix the underlying economics. It just makes the backlog more visible.

Gary DePreta from Cisco’s U.S. Public Sector Organization described the shift as moving “from detect-and-respond to predict-and-prevent threats.” That sounds good in a press release, but prediction without the capacity to act on predictions is just a more sophisticated form of alert fatigue.

The Pentagon Wrinkle

There’s a political subplot worth watching. According to NPR, the Pentagon has labeled Anthropic a “supply chain risk” because the company refuses to allow its models to be used for autonomous weapons and mass surveillance. That label, if enforced, would bar government agencies and contractors from working with Anthropic.

The irony is thick. Anthropic briefed U.S. government officials on Mythos Preview and built an initiative that directly strengthens national cybersecurity. The same government that benefited from that briefing is considering blacklisting the company for having ethical boundaries around military applications. How that tension resolves will shape whether Glasswing’s government partnerships survive.

For security teams evaluating whether to build workflows around Anthropic’s tools, this is worth monitoring. Policy instability creates vendor risk, and the last thing a CISO wants is a critical security tool that gets caught in a procurement ban.

Key Takeaways

  1. Mythos Preview has found thousands of high-severity zero-days across every major OS and browser, including bugs that persisted for 16 to 27 years — a scale of discovery no human team could match.
  2. The capability gap between AI and human bug-finders has closed. Models released in late 2025 already triggered a measurable improvement in vulnerability report quality across projects like cURL.
  3. Containment won’t last. Open-weight models are estimated to be less than a year behind frontier closed models, meaning these capabilities will proliferate regardless of access controls.
  4. Finding bugs is easier than fixing them. The bottleneck is shifting from discovery to remediation, and underfunded open source projects aren’t staffed to handle the incoming volume.
  5. Market and policy dynamics add uncertainty. Cybersecurity stocks already reacted, and the Pentagon’s “supply chain risk” label on Anthropic could complicate government adoption of Glasswing outputs.

Frequently Asked Questions

What is Project Glasswing and how does it work?

Project Glasswing is Anthropic’s cybersecurity initiative that uses the Claude Mythos Preview model to scan critical software for high-severity vulnerabilities. Around 50 partner organizations, including AWS, Microsoft, Google, and Apple, have access to the model for defensive security purposes only.

How many vulnerabilities has Claude Mythos AI discovered?

Mythos Preview has identified thousands of high-severity zero-day vulnerabilities across every major operating system and web browser. Some of these bugs had been hiding in codebases for decades, including a 27-year-old flaw in OpenBSD and a 16-year-old bug in FFmpeg.

Will Anthropic release Mythos Preview to the public?

Anthropic has stated it has no plans to release Mythos Preview publicly due to the risk of misuse. The model is restricted to roughly 50 vetted organizations through Project Glasswing, though Anthropic intends to release other related cybersecurity models in the future.

How does Mythos Preview compare to other AI models for security?

Mythos Preview scores 83.1% on CyberGym Vulnerability Reproduction versus 66.6% for Opus 4.6, and 93.9% on SWE-bench Verified versus 80.8%. It’s specifically tuned for code analysis and vulnerability discovery rather than general-purpose tasks, which accounts for the significant performance gap.
