Mythos on a Leash: Inside Anthropic's Decision to Hand the Public Its Most Dangerous Model

John Laban

June 9, 2026

An investigation into Claude Fable 5, the safeguards that make it shippable, and the questions those safeguards leave open.

TL;DR

On June 9, 2026, Anthropic released Claude Fable 5, the first publicly available model in its top “Mythos” tier, which sits above the Opus class.

Fable 5 and the restricted Claude Mythos 5 are the same underlying model. The only difference is the safety layer. Fable wears a leash; Mythos does not.

The leash is a set of classifiers that detect requests touching cybersecurity, biology and chemistry, or model distillation, then automatically hand those requests to the weaker Claude Opus 4.8 instead. Anthropic says this happens in fewer than 5% of sessions.

This is a reversal. In April, executives said Mythos was too dangerous to release publicly. The launch also lands days after Anthropic confidentially filed for an IPO and shortly after it urged the industry to install a collective “brake pedal” on frontier AI.

For most users, Fable 5 is a genuine leap in coding, analysis, and vision work. For sensitive fields, you may be talking to a less capable model without realizing the implications. And the safeguards, by Anthropic’s own admission, are not perfect.

The Reversal

Two months ago, Anthropic told the world that one of its models was too powerful to let out of the building.

When the company unveiled Claude Mythos in April, it described a system so good at finding and exploiting software flaws that releasing it openly could arm criminals and hostile states. Rather than ship it, Anthropic restricted Mythos Preview to a small circle of cyber defenders and infrastructure operators under a program called Project Glasswing. The message was unambiguous: this thing is dangerous, and the public cannot have it.

On Tuesday, the public got it anyway.

Anthropic released Claude Fable 5, which it openly describes as a version of that same Mythos technology, now deemed safe enough for general use. The company’s framing is that the model did not get less dangerous. Instead, the safeguards around it got good enough. Whether you find that reassuring or alarming is, in many ways, the whole story.

One Model, Two Faces

The cleanest way to understand the launch is this: Anthropic shipped a single frontier model as two products.

Claude Fable 5 is the public flagship, available through the Claude API and to paying subscribers. Claude Mythos 5 is the same model with key safeguards removed, restricted to vetted Glasswing partners working on cyber defense and critical infrastructure, in collaboration with the US government. Anthropic even explains the naming in a footnote: Fable comes from the Latin fabula, a cousin of the Greek mythos. The words are kin. The safeguards are the only thing that separates them.

That distinction matters more than a normal product split, because it tells you exactly what the public version is. Fable 5 is not a smaller or cheaper model. It is the dangerous one, wearing a muzzle.

How the Leash Works

The muzzle is a layer of classifiers, which are separate AI systems that watch incoming requests and decide whether the main model is allowed to answer. When a classifier flags a request as touching one of three areas, the request is automatically rerouted to Claude Opus 4.8, Anthropic’s next most capable model, which answers in Fable’s place. Users are told when this redirect happens.

The three flagged areas are revealing:

Cybersecurity. Mythos-class models are exceptional at discovering and chaining software exploits, and at the broader work of an actual attack, including reconnaissance and lateral movement. Anthropic built its cyber classifiers to block any meaningful progress on offensive tasks.

Biology and chemistry. The company says it can no longer trust a narrow filter aimed only at obvious bioweapons questions, because the model is now genuinely useful at real scientific work. As a precaution, Fable 5 falls back to Opus 4.8 on most biology and chemistry requests, which is a deliberately blunt instrument that will catch legitimate science along with the rest.

Distillation. Anthropic has caught large-scale attempts to copy Claude’s abilities into rival models, often in authoritarian countries. Requests that look like distillation attacks also get rerouted.

Anthropic’s pitch is that this is far better than a flat refusal. More than 95% of sessions never trigger a fallback at all, and in those sessions, the company says, Fable 5 performs essentially like the unleashed Mythos 5. The trade is that the remaining sliver of requests, including some entirely benign ones, get answered by a weaker model instead.
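For readers who build similar systems, the control flow described above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: Anthropic has not published how its classifiers or routing work, the model labels and function names here are hypothetical, and the keyword matchers are toy stand-ins for what would in practice be learned classifiers. What the sketch captures is the shape of the leash: every request is checked against each sensitive area, any single flag sends it to the fallback model, and the user is notified.

```python
# Hypothetical sketch of classifier-gated routing; not Anthropic's implementation.
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Area(Enum):
    """The three sensitive areas the article says the classifiers watch for."""
    CYBER = "cybersecurity"
    BIO_CHEM = "biology/chemistry"
    DISTILLATION = "distillation"


# Hypothetical model labels, used only to name the two routing targets.
PRIMARY_MODEL = "fable-5"
FALLBACK_MODEL = "opus-4.8"

# A classifier takes the request text and returns True if it flags it.
Classifier = Callable[[str], bool]


@dataclass
class RoutingDecision:
    model: str
    flagged_area: Optional[Area]
    notify_user: bool


def route(request: str, classifiers: dict[Area, Classifier]) -> RoutingDecision:
    """Send the request to the primary model unless any classifier flags it."""
    for area, is_flagged in classifiers.items():
        if is_flagged(request):
            # One flag is enough: fall back to the weaker model and tell the user.
            return RoutingDecision(FALLBACK_MODEL, area, notify_user=True)
    return RoutingDecision(PRIMARY_MODEL, None, notify_user=False)


def keyword_classifier(*keywords: str) -> Classifier:
    """Toy stand-in for a learned classifier: case-insensitive keyword matching."""
    lowered = [k.lower() for k in keywords]
    return lambda text: any(k in text.lower() for k in lowered)


# Hypothetical keyword lists, chosen only to exercise each branch.
classifiers = {
    Area.CYBER: keyword_classifier("exploit", "lateral movement"),
    Area.BIO_CHEM: keyword_classifier("protein", "synthesis route"),
    Area.DISTILLATION: keyword_classifier("dump your reasoning traces"),
}

print(route("Refactor this Ruby module", classifiers).model)
print(route("Write a working exploit for this CVE", classifiers).flagged_area)
```

Because one flag from any classifier is enough to trigger the fallback, a deliberately broad filter, like the one Anthropic describes for biology and chemistry, will inevitably catch legitimate requests along with the rest.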

What It Can Actually Do

Strip away the safety drama and Fable 5 is, on the evidence Anthropic presents, a serious jump in capability. The pattern across its claims is consistent: the longer and more open-ended the task, the bigger Fable’s lead.

The most concrete data point comes from Stripe, which tested the model early and reported that it compressed months of engineering into days. In a 50-million-line Ruby codebase, Fable 5 performed a codebase-wide migration in a single day that a human team would have needed more than two months to finish. GitHub and Cursor, also early testers, described similar gains on long, autonomous coding work.

Beyond code, Anthropic points to:

Vision. Fable 5 can rebuild a web app’s source code from screenshots alone, and it beat the video game Pokémon FireRed using only raw game images, something earlier Claude models could not do even with elaborate help.

Long-running focus. Given a persistent memory file, the model improved its own performance on the strategy game Slay the Spire three times more than Opus 4.8 did with the same tools.

Finance and analysis. Early customers reported it was the strongest model they had tested on senior-level financial reasoning and complex analytics.

The scientific claims belong to the restricted Mythos 5, not Fable, and they are the ones worth watching. Anthropic says its internal protein designers used Mythos 5 to accelerate parts of drug design roughly tenfold, with the model choosing binding sites and running design tools on its own. It also says Mythos 5 produced novel molecular biology hypotheses that its own scientists preferred about 80% of the time over those from earlier models, with one hypothesis later corroborated by an independent lab.

These are the company’s own results, not independent verification, and they should be read that way. But they explain why Anthropic frames the whole release around benefit rather than risk. Its product lead told CNBC the goal is a race to the top, delivering capability while keeping the guardrails that, in the company’s view, let the good outweigh the harm.

The Catch Nobody Advertises

Here is where an honest reader should slow down.

When Fable 5 reroutes your cybersecurity or biology question to Opus 4.8, you are not getting a slightly hedged version of the frontier model. You are getting a materially weaker one. CyberScoop, reading the published Opus 4.8 system card, laid out the gap in numbers. On a test of writing complete end-to-end exploits, Opus 4.8 scored about 5 out of 16, while Mythos Preview scored closer to 10. On reproducing known vulnerabilities in real open-source software, Opus 4.8 managed nearly 80% without guardrails, a figure Anthropic’s safety measures cut to around 1%.
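The reported figures are easier to compare as rates. This short Python calculation uses only the approximate numbers quoted above, and it treats “closer to 10” as exactly 10, which is an assumption. It shows the two gaps the paragraph describes: Mythos Preview solves roughly twice as many end-to-end exploit tasks as Opus 4.8, and the safeguards cut Opus 4.8’s reproduction success by a factor of about 80.

```python
# Approximate figures as reported by CyberScoop from the Opus 4.8 system card.
EXPLOIT_TASKS = 16
opus_exploits = 5        # "about 5 out of 16"
mythos_exploits = 10     # "closer to 10" (assumed to be exactly 10 here)

opus_repro_unguarded = 0.80   # "nearly 80% without guardrails"
opus_repro_guarded = 0.01     # "around 1%" with safeguards in place


def share(solved: int, total: int) -> float:
    """Fraction of benchmark tasks solved."""
    return solved / total


opus_rate = share(opus_exploits, EXPLOIT_TASKS)
mythos_rate = share(mythos_exploits, EXPLOIT_TASKS)
guardrail_reduction = opus_repro_unguarded / opus_repro_guarded

print(f"End-to-end exploits: Opus 4.8 {opus_rate:.1%}, Mythos Preview {mythos_rate:.1%}")
print(f"Mythos Preview solves {mythos_rate / opus_rate:.1f}x as many tasks as Opus 4.8")
print(f"Safeguards cut Opus 4.8's reproduction success by roughly {guardrail_reduction:.0f}x")
```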

For a security professional, that gap is the point. It is the safeguard working as designed. But it also means the public model’s usefulness in exactly the dual-use fields people most want help with is throttled, sometimes invisibly, to the capability of a year-ago system. The legitimate researcher and the malicious actor get the same downgrade, which is the unavoidable bind of dual-use technology.

Then there is the question of whether the leash holds. Anthropic says it ran an external bug bounty of more than 1,000 hours and that no one found a universal jailbreak, meaning a reliable, general method of stripping the safeguards. That is a real and meaningful claim. But two caveats deserve attention. First, Anthropic does not specify whether testers found partial or situational jailbreaks, the narrower cracks that often precede a full break. Second, the company quietly notes that the UK’s AI Safety Institute made progress toward a universal jailbreak during a short testing window. History is not encouraging here. Researchers have reliably broken older models given enough time, and Anthropic concedes that completely preventing universal jailbreaks is likely impossible. Its real goal is to make any crack slow and expensive enough to catch before it spreads.

The Questions Worth Asking

An exposé earns its name by asking what the announcement does not.

Why now? The timing is hard to ignore. The release lands days after Anthropic confidentially filed its IPO paperwork, with a reported revenue run rate of $47 billion, up from roughly $10 billion a year earlier, and a valuation near $965 billion. A company about to face public markets has strong incentives to ship its most impressive product. It also follows, by only days, Anthropic’s own public warning that frontier AI is advancing so fast it may soon improve itself without human input, and its call for the industry to adopt a coordinated brake pedal. Releasing your most powerful model to the public while warning that the field needs to slow down is a tension the company has not fully reconciled.

What happens to your data? Anthropic is imposing a new rule for Mythos-class models. It will retain all business traffic for 30 days, both on its own platform and on third-party platforms, promising not to use it for training and to delete it after the window. CyberScoop notes this 30-day period aligns with a White House executive order creating a voluntary framework for sharing frontier models with the government. The stated purpose is defensive, to catch novel attacks and reduce false alarms. It is still a meaningful expansion of how long your interactions are kept.
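As a concrete reading of that window: a request sent on launch day, June 9, would fall outside a 30-day retention period on July 9. The Python sketch below shows a generic rolling-retention check of the kind such policies are commonly enforced with. The function names and the record shape are hypothetical illustrations, not a description of Anthropic’s systems, whose internals have not been published.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical illustration of a rolling retention window; not Anthropic's code.
RETENTION = timedelta(days=30)  # the 30-day window described above


def deletion_due(received_at: datetime) -> datetime:
    """Moment a record received at `received_at` leaves the retention window."""
    return received_at + RETENTION


def purge(records: list[tuple[str, datetime]], now: datetime) -> list[tuple[str, datetime]]:
    """Keep only (record_id, received_at) pairs still inside the window at `now`."""
    return [(rid, ts) for rid, ts in records if now < deletion_due(ts)]


launch_day = datetime(2026, 6, 9, tzinfo=timezone.utc)
print(deletion_due(launch_day).date())  # 2026-07-09
```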

Who decides what “safe enough” means? The honest answer is that Anthropic does, and it is grading its own homework. The safeguards, the testing, and the benchmarks are all the company’s own, released alongside a product it is selling. None of that makes the work wrong. It does mean independent scrutiny matters more than usual, and that the burden of proof sits with the lab, not the skeptic.

What It Means For You

If you write code, analyze documents, or do knowledge work, Fable 5 is probably the most capable model you can now access, and the gains grow with the complexity of the task. The practical advice is to lean on it for long, multi-step work where older models lost the thread.

If you work in security, biology, chemistry, or anything adjacent, expect that you may be rerouted to a weaker model, and that some legitimate requests will be caught in the net. Anthropic says it will narrow the safeguards over time and is opening trusted-access programs for vetted cyber and biology researchers, so the serious practitioner’s path forward is application rather than workaround.

And watch the calendar. Fable 5 is included at no extra cost on Pro, Max, Team, and seat-based Enterprise plans only through June 22. On June 23, Anthropic plans to pull it from those plans, after which using it will require usage credits until the company can expand capacity. Pricing on the API runs $10 per million input tokens and $50 per million output tokens, which is less than half what Mythos Preview cost. The free window is a trial, not the standing offer.
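At those rates, an output token costs five times as much as an input token, so the mix of reading and writing drives the bill. The Python sketch below estimates API spend from token counts using only the two list prices quoted above; the workload figures are hypothetical, chosen to resemble a long coding session that reads far more than it writes.

```python
# API list prices quoted above, in US dollars per million tokens.
INPUT_PRICE_PER_M = 10.00
OUTPUT_PRICE_PER_M = 50.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated Fable 5 API spend in dollars at list price."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical workload: 2M tokens read, 200K tokens written.
cost = estimate_cost(input_tokens=2_000_000, output_tokens=200_000)
print(f"${cost:.2f}")  # $30.00
```

In this example the 200,000 output tokens account for a third of the cost despite being a tenth as numerous as the input tokens.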

The Bottom Line

Claude Fable 5 is a real and significant release, not the half-rumor the first reports made it sound like. Anthropic has done something the industry will study: it took a model it once called too dangerous to ship and wrapped it tightly enough to sell, betting that strong filters can let the public touch frontier capability without handing over the sharpest edges.

See how Tidra patches a CVE across every affected repo, then tracks it to merge
