The Frontier Goes Quiet: How Three AI Labs Are Locking Down Their Most Dangerous Models

John Laban

June 1, 2026 · 12 min read

When Anthropic announced in early April that it would not release its newest model to the public, the reasoning was unusual for a technology company. The model worked too well. In controlled testing, Claude Mythos Preview had identified thousands of “zero-day” vulnerabilities, flaws unknown even to the software’s own developers, across every major operating system and every major web browser. Rather than ship it, Anthropic walled it off behind a vetted-partner program called Project Glasswing and told the world it was sounding an alarm.

That decision set off a chain of events that has pulled in the White House, the Federal Reserve, a British government laboratory, and Anthropic’s two biggest rivals. Underneath the headlines sits a question that matters to anyone who runs a network: which of these models, and which of the security models built around them, should actually worry you, and how much of the alarm is real versus marketing?

What Mythos can do, and why Anthropic flinched

Mythos Preview was never built to be a hacking tool. Anthropic developed it to push the boundaries of software engineering, to create an AI capable of working with vast, complex codebases in ways previous models could not. The offensive ability came along for the ride. As the company put it, “We did not explicitly train Mythos Preview to have these capabilities. Rather, they emerged as a downstream consequence of general improvements in code, reasoning, and autonomy. The same improvements that make the model substantially more effective at patching vulnerabilities also make it substantially more effective at exploiting them.”

The specifics are striking. Mythos found a 27-year-old vulnerability in OpenBSD, an operating system with a reputation as one of the most security-hardened in the world, that let an attacker remotely crash any machine simply by connecting to it. It uncovered a 16-year-old flaw in FFmpeg, video software used almost everywhere, in a line of code that automated testing tools had run five million times without ever catching the problem. In another case, it autonomously chained together several flaws in the Linux kernel to escalate from an ordinary user account to complete control of the machine. Crucially, it identified nearly all of these vulnerabilities, and developed many of the exploits, entirely on its own, without human steering.

The scale, once partners started using it, was bigger still. In the first month of Project Glasswing, working with more than 50 partner organizations including Microsoft, Apple, Google, and Cloudflare, the model autonomously discovered over 10,000 high- and critical-severity zero-day vulnerabilities across the world’s most critical software. Mozilla reportedly used it to find and fix hundreds of flaws in Firefox, more than ten times what earlier AI tools had managed.

Anthropic’s framing is that this is a defensive gift wrapped around an offensive threat. The company argues that the same capabilities that make AI dangerous in the wrong hands make it invaluable for finding and fixing flaws, and that these models need to be in the hands of defenders before attackers get their own. The company warned of a narrow window, roughly six to twelve months, before adversaries build models that can do the same thing.

The security model Anthropic built around Mythos is the most restrictive in the industry. It chose not to make the model generally available at all, instead launching Project Glasswing as an industry consortium and granting monitored access to a group of over 40 organizations that build or maintain critical software. Even the version partners receive is throttled. According to an analysis from the Alan Turing Institute, the version released to partners has additional “harmlessness” training that reportedly reduced its completion rate on offensive tasks to near zero, with the model generally refusing to engage with such tasks from the start. Anthropic has said its eventual goal is to develop safeguards on a future, less dangerous Opus model before deploying Mythos-class systems more widely.

The government gets involved

What made the Mythos story jump from a tech-industry debate to a national one was the government’s reaction, which has been contradictory. The capabilities were alarming enough that the Federal Reserve chair and the Treasury secretary convened bank CEOs to discuss the threat.

Then the White House did something unusual: it tried to stop Anthropic from sharing the model more widely. The administration rejected Anthropic’s proposal to expand access from roughly 50 to 120 institutions, citing national security risks, potential for misuse, and concerns that Anthropic lacked the computing capacity to support a larger rollout without degrading service for the government’s own functions.

That objection sits awkwardly next to everything else happening at once. The National Security Agency was already using Mythos despite the Defense Department insisting Anthropic was a “supply chain risk,” meaning the military was broadening its use of the company’s tools while simultaneously arguing in court that those tools threatened national security. The feud’s roots are not really about cyberattacks at all. The dispute ignited in February when Anthropic refused to grant the Department of Defense unrestricted access to its models for military applications, explicitly prohibiting use for autonomous weapons and domestic surveillance. A federal judge issued a temporary injunction against the government’s designation, which the government has said it intends to appeal.

For a reader trying to separate signal from noise, this is an important tell. Some of the loudest government concern about Mythos is tangled up in an unrelated procurement and policy fight, not a clean assessment of cyber risk.

OpenAI: the same capability, a different doorway

The most revealing development came not from Anthropic at all, but from an independent referee. In late April, the UK’s AI Security Institute, a government-backed evaluation body created at the 2023 Bletchley summit, published back-to-back assessments of Mythos and OpenAI’s newest general model, GPT-5.5. The results undercut the idea that Mythos is a lone monster.

On the institute’s hardest “Expert” difficulty tier, GPT-5.5 hit a 71.4 percent average success rate, edging past Claude Mythos Preview’s 68.6 percent, with the gap inside the statistical margin of error. For comparison, the older GPT-5.4 scored 52.4 percent and Claude Opus 4.7 came in at 48.6 percent. More telling was the multi-step test. In April, AISI found Mythos was the first model to complete its corporate-network attack simulation end-to-end, a multi-step exercise the institute estimates would take a human around 20 hours. The GPT-5.5 results suggest this was not a breakthrough specific to one model, but part of a broader trend: a second model, from a different developer, now reaches a similar level.
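
To make "inside the statistical margin of error" concrete, here is a minimal sketch of one common way to check whether a gap between two success rates is distinguishable from noise. It uses a normal-approximation standard error for the difference of two independent proportions. The function name and the attempt counts are assumptions for illustration; the figures above do not include AISI's sample sizes or its actual statistical method.

```python
import math


def gap_within_margin(p_a: float, p_b: float, n_per_model: int, z: float = 1.96) -> bool:
    """Return True if |p_a - p_b| is smaller than z standard errors of the difference.

    Assumes two independent samples of n_per_model attempts each and a
    normal approximation. n_per_model is a hypothetical input, not a
    figure reported by AISI.
    """
    se = math.sqrt(
        p_a * (1 - p_a) / n_per_model + p_b * (1 - p_b) / n_per_model
    )
    return abs(p_a - p_b) < z * se


# Reported rates: GPT-5.5 at 71.4 percent, Mythos Preview at 68.6 percent.
# The attempt count of 35 is purely illustrative.
print(gap_within_margin(0.714, 0.686, n_per_model=35))
```

Under this approximation, a 2.8-point gap only becomes distinguishable with far more attempts per model than a typical hand-built evaluation tier is likely to contain, which is why the two models are best read as tied.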

One detail from that evaluation captures why defenders are uneasy about cost as much as capability. AISI highlighted a reverse-engineering challenge that a human expert would need roughly 12 hours to solve; GPT-5.5 solved it in 10 minutes and 22 seconds, at a total API cost of $1.73.
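
The arithmetic behind that comparison is worth working through. The sketch below takes only the three figures quoted above (12 hours, 10 minutes 22 seconds, $1.73) and derives a speedup factor and a cost per human-equivalent hour. The function and key names are made up for illustration, and the comparison ignores setup time, failed attempts, and the cost of verifying the result.

```python
def compare_effort(human_hours: float, model_seconds: float, api_cost_usd: float) -> dict:
    """Derive simple ratios from a human time estimate and a model run."""
    human_seconds = human_hours * 3600
    return {
        # How many times faster the model run was than the human estimate.
        "speedup": human_seconds / model_seconds,
        # API dollars spent per hour of expert work displaced.
        "usd_per_human_hour": api_cost_usd / human_hours,
    }


result = compare_effort(
    human_hours=12,
    model_seconds=10 * 60 + 22,  # 10 minutes and 22 seconds
    api_cost_usd=1.73,
)
print(f"speedup: {result['speedup']:.1f}x")
print(f"cost per human-equivalent hour: ${result['usd_per_human_hour']:.3f}")
```

On these figures the run is roughly 69 times faster than the human estimate, at about 14 cents per hour of expert time displaced.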

OpenAI’s own posture is more graduated than Anthropic’s all-or-nothing wall. The company rates its models on a published “Preparedness Framework” with tiers like Low, Medium, High, and Critical. GPT-5.3-Codex was the first model OpenAI classified as “High” capability for cybersecurity tasks, and the first it directly trained to identify software vulnerabilities. By GPT-5.5, the company landed on a careful distinction. GPT-5.5 did not independently produce a functional full-chain exploit or another verifier-confirmed Critical-level outcome against real-world targets in OpenAI’s hardest evaluation. The main bottleneck was not breadth of search but exploit-development judgment: deciding which leads merited deep investment and converting crashes into controlled attacks. In OpenAI’s words, that makes GPT-5.5 “High” but not “Critical.”

In plainer terms, as one security firm summarized it, GPT-5.5 can meaningfully assist skilled vulnerability researchers but cannot yet fully replace them for novel zero-day development — a capable junior researcher, not an autonomous exploit developer, at least not yet.

OpenAI’s security model layers controls rather than locking the door. The base GPT-5.5 ships broadly but with guardrails. Because it crossed the “High” threshold, OpenAI trained it to refuse clearly malicious requests and added classifier-based monitors that detect signals of suspicious cyber activity and route high-risk traffic to a less cyber-capable model. On top of that sits a separate, gated track for defenders. Starting with a cyber-permissive variant called GPT-5.5-Cyber, OpenAI began fine-tuning models specifically to enable defensive cybersecurity use cases, scaling defenses in lockstep with rising capability. These permissive versions reached limited preview in May, rolled out to vetted defenders responsible for securing critical infrastructure. The company also runs Aardvark, an agentic security researcher, and has committed millions in credits to open-source defenders.
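
The monitor-and-reroute layer described above can be pictured as a small routing function. The following is a toy sketch of that architecture only, not OpenAI's implementation: the keyword scorer stands in for a trained classifier, and the model names, threshold, and signal list are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical signal words; a real monitor would be a trained classifier.
RISK_SIGNALS = ("exploit", "payload", "bypass", "shellcode")


@dataclass(frozen=True)
class RoutingDecision:
    model: str
    risk_score: float


def score_cyber_risk(prompt: str) -> float:
    """Toy stand-in for a classifier: fraction of signal words present."""
    text = prompt.lower()
    hits = sum(1 for signal in RISK_SIGNALS if signal in text)
    return hits / len(RISK_SIGNALS)


def route(prompt: str, threshold: float = 0.5) -> RoutingDecision:
    """Send high-risk traffic to a less cyber-capable fallback model."""
    score = score_cyber_risk(prompt)
    # Model names below are placeholders, not real product identifiers.
    model = "fallback-less-cyber-capable" if score >= threshold else "primary"
    return RoutingDecision(model=model, risk_score=score)


print(route("Summarize this changelog for the release notes"))
print(route("Write shellcode and a payload to bypass the exploit mitigation"))
```

The design point is that the request is still served rather than refused outright; what changes is how capable the model answering it is.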

The important nuance: the cyber-permissive variants are not meant to be more powerful than the base model. They are meant to be more cooperative for legitimate work, lowering the refusal walls for vetted professionals rather than expanding raw capability beyond what GPT-5.5 already has.

Google: betting on defense and integration

Google has taken the most defense-tilted path of the three, and talks the least about a single fearsome model. Its strategy is to spread AI security capability across a suite of specialized agents woven into its existing security business.

The centerpieces are agents, not a flagship chat model. Big Sleep, developed by Google DeepMind and Project Zero, actively searches for unknown vulnerabilities and has found real-world flaws, including one that was imminently going to be used by threat actors, which Google’s threat-intel group was able to cut off beforehand. Its companion handles the other half of the problem. CodeMender, which leverages Gemini’s reasoning, takes a comprehensive approach to code security: instantly patching new vulnerabilities and proactively rewriting existing code to eliminate entire classes of bugs. In its first six months it upstreamed 72 security fixes to open-source projects, some with millions of lines of code. Notably, CodeMender only surfaces high-quality patches for human review, checking that fixes address the root cause, are functionally correct, and cause no regressions.
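
The three checks named above (root cause, functional correctness, no regressions) amount to a gate that a candidate patch must pass before a human ever sees it. A minimal sketch of such a gate follows. The class and function names are hypothetical, and how each check is actually evaluated inside CodeMender is not described here, so the booleans are taken as given inputs.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PatchChecks:
    """Outcome of the three checks for one candidate patch (hypothetical shape)."""
    addresses_root_cause: bool
    functionally_correct: bool
    no_regressions: bool


def failed_checks(checks: PatchChecks) -> list[str]:
    """Return the names of the checks that did not pass, in field order."""
    return [name for name, passed in asdict(checks).items() if not passed]


def ready_for_human_review(checks: PatchChecks) -> bool:
    """Surface a patch to reviewers only if every check passed."""
    return not failed_checks(checks)


candidate = PatchChecks(
    addresses_root_cause=True,
    functionally_correct=True,
    no_regressions=False,
)
print(ready_for_human_review(candidate))
print(failed_checks(candidate))
```

Requiring all three to pass trades volume for reviewer trust: fewer patches reach a human, but each one is more likely to be worth the review time.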

Google has been packaging these into products rather than gating a frontier model. In late May it introduced AI Threat Defense, an automated platform that fuses the Gemini models, the cloud-security firm Wiz, CodeMender, and the Mandiant threat-intelligence practice to find, prioritize, and patch vulnerabilities at machine speed, aimed at enterprises facing attackers who now compress weeks of work into hours. Its experimental security model, Sec-Gemini, has stayed deliberately narrow. It is characterized as a gated research project, with its team emphasizing the importance of research partnerships over commercial contracts and explaining how DeepMind and Google Security collaborate to keep defensive AI ahead of offensive AI.

A caution on sourcing here: some commentary refers to a “Gemini 3.1 Pro” as Google’s general model, but Google’s own recent announcements describe a newer Gemini 3.5 series, with CodeMender being folded into its enterprise Agent Platform. The takeaway is steadier than the version numbers: Google is leaning into integrated, human-in-the-loop defensive tooling and has made no public claim of a Mythos-style autonomous exploit engine.

So which security model should worry you?

Strip away the branding and a clear hierarchy emerges, but not the one the headlines imply.

If your concern is raw, unsupervised offensive power in a single system, Mythos is still the most capable thing publicly documented, and Anthropic’s restrictive security model is the most cautious response to it. That combination — most dangerous model paired with the tightest leash — is internally consistent and, on its face, reassuring.

But the security model that should actually keep a defender up at night is not any one company’s. It is the trend the UK’s evaluators identified: that frontier offensive capability is no longer a single-vendor story. As AISI’s analysis put it, the defender’s job has shifted from “watch the leading lab” to “assume the frontier itself is the threat.” When a broadly available model like GPT-5.5 reaches statistical parity with the model everyone was told was too dangerous to release, the wall around Mythos protects less than it appears to. The capability is leaking out of the general-purpose frontier on its own.

There is also a quieter structural risk worth naming: containment failures. Anthropic’s caution did not prevent leaks. News of Mythos first surfaced because details were inadvertently stored in a publicly accessible data cache due to human error, followed days later by a second lapse that exposed nearly 2,000 source-code files for the Claude Code tool for about three hours. A security model is only as strong as the organization operating it.

Hype versus real

Three things are real. The capabilities are documented by an independent government body, not just by company marketing. Mythos and GPT-5.5 both genuinely completed multi-stage attack simulations that previously stumped every model. As one analysis noted, when a government-backed institute runs the tests, the White House blocks model rollouts, and the Federal Reserve holds emergency meetings, you can be reasonably confident something real is being measured. The economics are real too: collapsing a 12-hour expert task into about 10 minutes for under two dollars changes who can afford to hunt for vulnerabilities. And the defensive upside is real, with thousands of long-buried bugs now getting patched.

Three things are hype, or at least overstated. First, the idea of a single uniquely dangerous model: parity across labs makes that framing obsolete almost as soon as it was coined. Second, the cleanliness of the government’s alarm: much of Washington’s loudest objection to Mythos is entangled in an unrelated fight over military use and procurement, not a pure read on cyber danger. Third, the notion that these models are turnkey attack machines. Even the strongest still need skilled human judgment to turn a discovered flaw into a working, reliable exploit against hardened real-world targets, which is exactly why OpenAI rates GPT-5.5 “High” rather than “Critical.”

The honest bottom line is less cinematic than “Anthropic built something too dangerous to release.” It is that the entire frontier crossed a capability threshold at roughly the same time, the three labs chose three different doorways (Anthropic’s locked vault, OpenAI’s tiered and monitored gate, Google’s distributed defensive toolkit), and none of those doorways changes the underlying fact defenders now have to plan around: AI-assisted vulnerability discovery is a baseline threat to assume, not an emerging one to watch.
