Strangelove-AI April 9, 2026

Claude Mythos: The model too dangerous to release

The invisible infrastructure

Software is everywhere and invisible — until it breaks. For most people, the code running banks, hospitals, and communication networks only becomes real during a catastrophic failure. The security math has always tilted against defenders: attackers need to find one hole; defenders need to find all of them. That arithmetic has made software security a grinding war of attrition for decades.

That may be changing. Anthropic’s Claude Mythos Preview model and Project Glasswing, the controlled deployment program built around it, represent a genuine shift in how defenders can work. The model’s capabilities are, by its creators’ own admission, alarming, but they also offer something defenders have rarely had: a head start.

Coding and hacking are the same skill

Anthropic didn’t build Mythos to be a hacking tool. They built a coder. The problem is that understanding how software is constructed and understanding how it breaks are not separate abilities — they’re the same reasoning process applied in different directions.

A locksmith who truly understands how a lock works also understands how to pick it. You can’t have one without the other. Mythos Preview shows this concretely on both coding and security benchmarks, measured against Claude Opus 4.6:

SWE-bench Verified: 93.9%, up from Claude Opus 4.6’s 80.8%
CyberGym (vulnerability reproduction): 0.83, up from 0.67
SWE-bench Pro: 77.8%, up from 53.4%
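
To put the three jumps on a common footing, here is a minimal sketch that computes the absolute and relative gain for each benchmark. The scores are the ones quoted above; the names SCORES and gains are illustrative only. CyberGym is quoted on a 0-1 scale while the SWE-bench figures are percentages, so the absolute gains are not directly comparable across rows, but the relative gains are.

```python
# Scores as quoted in the list above: (Claude Opus 4.6, Mythos Preview).
# CyberGym is on a 0-1 scale; the SWE-bench figures are percentages.
SCORES = {
    "SWE-bench Verified": (80.8, 93.9),
    "CyberGym": (0.67, 0.83),
    "SWE-bench Pro": (53.4, 77.8),
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain, relative gain in percent) for one benchmark."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return round(absolute, 2), round(relative, 1)


for name, (before, after) in SCORES.items():
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute} absolute, +{relative}% relative")
```

By this measure the largest relative improvement is on SWE-bench Pro, which had the lowest starting score and therefore the most room to improve.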

As one researcher working with the model put it:

“We haven’t trained it specifically to be good at cyber. We trained it to be good at code, but as a side effect of being good at code, it’s also good at cyber.”

Bugs that survived 27 years

The clearest proof of what Mythos can do is what it found. Within weeks of deployment, the model identified a vulnerability in OpenBSD that had been sitting undetected for 27 years. The flaw let an attacker remotely crash any server running the OS — a platform built specifically around security hardening.

It also found a 16-year-old bug in FFmpeg, the video processing library that powers a substantial chunk of internet video infrastructure. That vulnerability had survived five million automated security tests.

These aren’t edge cases in obscure software. They’re in foundational tools, and they survived decades of human review and automated scanning.

Chaining low-severity bugs into high-severity attacks

Security teams often dismiss individual low-risk findings. Mythos changes that calculus. The model can take three, four, or five independently minor vulnerabilities and work out how to chain them into a serious attack.

In one case, Mythos autonomously found and linked several Linux kernel vulnerabilities to show how an ordinary unprivileged user could gain full control of a machine by running a single binary. The Linux kernel runs most of the world’s servers.
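
Why chaining matters for triage can be shown with a toy attack graph. The sketch below is not how Mythos works, and every finding in it is hypothetical. It only assumes that each bug moves an attacker from one state to another, and uses breadth-first search to check whether individually minor findings connect an unprivileged user to root.

```python
from collections import deque

# Hypothetical findings, invented for illustration: (name, state required,
# state gained). None of these corresponds to a real vulnerability.
FINDINGS = [
    ("info-leak", "unprivileged user", "knows kernel addresses"),
    ("race-condition", "knows kernel addresses", "corrupts kernel memory"),
    ("missing-check", "corrupts kernel memory", "root"),
    ("harmless-crash", "unprivileged user", "unprivileged user"),
]


def find_chain(findings, start, goal):
    """Breadth-first search for the shortest chain of findings from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, chain = queue.popleft()
        if state == goal:
            return chain
        for name, required, gained in findings:
            if required == state and gained not in seen:
                seen.add(gained)
                queue.append((gained, chain + [name]))
    return None  # no combination of findings reaches the goal


print(find_chain(FINDINGS, "unprivileged user", "root"))
# Fixing any single link breaks this chain:
print(find_chain(FINDINGS[:1] + FINDINGS[2:], "unprivileged user", "root"))
```

In this toy model no single finding reaches root on its own, yet three of them together do, which is why a low severity rating for each bug in isolation can understate the combined risk.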

Nicholas Carlini, a researcher working with the model, said:

“I’ve found more bugs in the last couple of weeks than I found in the rest of my life combined.”

Why Anthropic isn’t releasing it

A model that can find these vulnerabilities can also be used to exploit them. If Mythos were publicly available today, state-sponsored groups — from China, Iran, North Korea, and Russia — could use it to find and weaponize zero-days at a volume no human security team could track.

Anthropic’s response is “Project Glasswing”: a controlled deployment rather than a public release. The project is backed by $100 million in usage credits and $4 million in direct grants to open-source security organizations including the Apache Software Foundation and the OpenSSF. The idea is to get the defensive benefits out before the offensive ones.

What this means for everyone else

Glasswing’s initial partners are large companies — Apple, NVIDIA, Cisco, CrowdStrike, JPMorganChase. But the goal isn’t to protect enterprise IT departments. It’s to harden the infrastructure those companies share with everyone else.

When Mythos finds a bug in a major browser or in the Linux kernel, a patch eventually ships to every device running that software. A small business owner doesn’t need to know what a 27-year-old remote crash bug or a kernel privilege escalation chain is. They just receive a routine software update. For the first time, the most capable AI security scanning available is being applied to the foundations of shared digital infrastructure, before the vulnerabilities become known attack vectors.

Where this is heading

Mythos Preview is the first of many models that will operate at this level. The capability curve is climbing, and the security implications will keep compounding.

Anthropic’s approach with Glasswing sets a concrete reference point: don’t release the sword before distributing the shield. Whether OpenAI, Google, and Meta adopt similar constraints remains to be seen. The answer matters more than most people realize.

References

https://red.anthropic.com/2026/mythos-preview/
https://www-cdn.anthropic.com/08ab9158070959f88f296514c21b7facce6f52bc.pdf

Benchmark notes

SWE-bench Verified: a 500-problem subset of SWE-bench, with each problem verified by human engineers as solvable
CyberGym: a benchmark that tests AI agents on targeted vulnerability reproduction, meaning their ability to find previously discovered vulnerabilities in real open-source software projects given a high-level description of the weakness
SWE-bench Pro: problems drawn from actively maintained repositories with larger, multi-file diffs and no public ground-truth leakage