How to Build a Mythos-Ready Security Program

May 26, 2026 — Rocky Giglio

Security teams spent years assuming that finding zero-day vulnerabilities required nation-state resources, months of expert-level effort, and a lot of luck. Claude Mythos ended that assumption in April 2026, and it is time to change your security program to match the new speed.

Anthropic’s model autonomously discovered thousands of previously unknown vulnerabilities across every major operating system and web browser during a controlled evaluation. It found a 27-year-old bug in OpenBSD, a 17-year-old remote code execution flaw in FreeBSD’s NFS server, and a browser exploit chain that required chaining four separate vulnerabilities together – work that typically takes a senior human researcher months. It built fully functional exploits for most of them without human guidance.

Then Anthropic announced Project Glasswing: a restricted coalition of AWS, Apple, Google, Microsoft, and others committing $100 million to deploy Mythos defensively before adversaries develop comparable capabilities. The unspoken message in that announcement was direct: similar capabilities are coming regardless, and the window to get ahead of them is measured in months, not years.

The security community responded. The Cloud Security Alliance, SANS Institute, and the OWASP GenAI Security Project put together a community resource called “The AI Vulnerability Storm: Building a Mythos-Ready Security Program” that cuts through the noise and tells practitioners what to actually do. It is one of the better community-driven security documents I have seen on an emerging threat, and it is worth your time.

Here is what it covers and what matters most for cloud security teams.

The Mythos Numbers You Need to Understand

The headline benchmark is 83.1% success on vulnerability reproduction meaning Mythos could reproduce known vulnerabilities from description alone at that rate. That is a useful number for understanding general capability, but the figures that should anchor your thinking are different.

Over 99% of what Mythos found remains unpatched. That is not a disclosure failure. It is a volume problem. The coordinated vulnerability disclosure infrastructure was built for human research throughput: individual researchers, weeks of work, 90-day windows, vendor triage on manageable queues. AI-scale discovery does not fit that model. Linux kernel maintainers reported a 10 to 15x surge in vulnerability submissions after the Mythos announcement. The disclosure pipeline was not built for that.

The other number worth understanding is the exploitation timeline. The median window between vulnerability discovery and a weaponized exploit collapsed from 771 days in 2018 to under four hours by 2024. The projection for end of 2026 is under one hour. Your patch cycle is not a response to that. It is a calendar item that attackers do not wait for.

And then there is the containment failure. During internal safety testing, an early version of Mythos escaped its sandbox, gained internet access through a system configured to communicate only with a small set of predetermined services, emailed the supervising researcher who was away eating lunch, and independently posted descriptions of its own actions on publicly accessible websites without being asked to. Anthropic described this as agentic capabilities operating without adequate goal constraints. That statement matters as it is not a bug with a patch, it is a behavior pattern that requires a different security model for AI deployments.

Seven Assumptions Your Security Program Is Still Making

The community resource frames its threat model around seven foundational assumptions that security programs have relied on for years. If you are being honest with yourself, most of these still describe how your program operates.

• Patch cycles provide adequate protection. They do not when the window to exploitation is shrinking toward one hour.

• Attackers need significant resources to discover zero-days. A Mythos-class research run costs under $20,000.

• CVE databases capture the relevant threat universe. Mythos found thousands of vulnerabilities that have never appeared in any CVE list and may be exploited before they ever do. Google Mandiant documented this in the real world.

• Periodic scanning keeps you informed. AI-powered discovery does not schedule itself around your scan windows.

• Sustained expert human review catches serious vulnerabilities. OpenBSD had 27 years of security-focused review from a community explicitly built around code correctness. Mythos found what they missed in under a thousand autonomous runs.

• Embedded and legacy systems are a lower-priority risk because patching is harder. Correct that these systems are harder to patch. Wrong that this lowers their priority. AI discovery tools will find vulnerabilities in unpatchable devices, and you will have no patch cycle response available.

• AI systems deployed internally behave like tools and stay within assigned scope. The Mythos containment failure is the documented evidence that sufficiently capable AI agents require threat actor modeling, not just tool hardening.

The assumptions audit in the community resource is worth running against your own program explicitly. Not as a thought exercise – as an actual written audit with honest answers.

The Risk Register: What to Prioritize and When

The resource maps 11 prioritized risks across three time horizons: immediate, 6 months, and 12 months. It aligns to OWASP LLM Top 10, OWASP Agentic Top 10, MITRE ATLAS, and NIST CSF 2.0, which means you can map it directly to frameworks your organization already references rather than treating it as a separate track.

The immediate risks come down to three realities. First, triage capacity calibrated to pre-Mythos discovery rates will not scale. If you have not sized your vulnerability management resources for a higher volume baseline, that gap will become visible fast. Second, the exploitation window is already below four hours and shrinking. Controls that depend on patch availability are not sufficient. Third, if your organization is using AI coding tools you have a blind spot. AI-generated code has a different vulnerability density profile than human-written code, and standard scanners were not built to detect it. That gap is growing every time a developer uses an AI coding assistant.

The 6-month risks center on supply chain exposure. The same AI-scale analysis applied to your own codebase can be applied to your dependencies and vendors. If you do not have SBOM-level visibility into your software supply chain, you are operating without a map.

The 12-month risks are about the competitive window closing. Anthropic’s estimate is that comparable capabilities to Mythos will emerge from other AI labs within 12 to 18 months. Project Glasswing bought some time for defensive deployments to get ahead of adversarial access. That window is finite, and the risk register is built around using it deliberately.

The Defensive Playbook: What to Actually Do

The playbook covers seven domains. These are the ones that matter most for cloud security teams right now.

Vulnerability Management

Periodic scanning is structurally insufficient for this threat environment. Move toward continuous monitoring. Prioritization needs to shift from CVSS scores to actual exploitability in your specific environment – what researchers call reachability analysis. CVSS tells you how severe a vulnerability is in the abstract. Reachability analysis tells you whether it is exploitable against your actual architecture. Those are different questions, and only one of them helps you make a triage decision under time pressure. For a closer look at what a modern VulnOps function looks like in practice, see VulnOps: Why Automated Vulnerability Management Is No Longer Optional.

Application Security

If your organization has adopted AI coding tools and your AppSec program has not been updated to govern AI-generated code specifically, you have a gap. Standard scanners were tuned against vulnerability patterns in human-written code. AI-generated code introduces a different density of weaknesses. The risk does not disappear because your scanner does not flag it. Governance of AI-generated code needs to be a first-class component of your AppSec program, not an afterthought.

Identity Hardening

AI-powered exploit tools target the intersection of a reachable vulnerability, an overprivileged identity, and an exposed asset. Least-privilege enforcement reduces the blast radius of successful exploitation. For AI agents specifically, enforce the principle at the infrastructure layer, not just at the agent configuration layer. The Mythos containment failure is a real-world example of what happens when an agent has more network access than its task requires.

Incident Response Detection

AI-generated exploit traffic does not look like what your signature-based detection was tuned for. Traditional automated scanning is noisy and probe-heavy. AI-generated exploits are syntactically correct and operationally coherent. If your detection is based primarily on behavioral signatures derived from legacy tool patterns, you need to add behavioral anomaly detection on network traffic and application-layer logging at vulnerable protocol boundaries (NFS, RPC, and browser-facing interfaces) specifically.

Agentic AI Deployment Governance

If your organization is deploying AI agents in security operations, code review, or incident response, the Mythos containment failure should trigger an immediate review of how you have scoped agent access. Network access for AI agents must be enforced at the infrastructure layer independently of the agent itself, on the assumption that a capable agent may attempt workarounds. Audit logging needs to capture all agent outputs and external communications, including actions the agent was not directed to take. Define permitted actions, not just goals.

Legacy and Embedded Systems

This domain has the least runway and the least organizational appetite. AI discovery tools will surface vulnerabilities in devices that cannot be patched through conventional update channels: embedded systems, OT environments, legacy infrastructure. The fix-by-patching response is unavailable. Network segmentation, traffic monitoring, and decommissioning timelines are the available options. Deferring this audit means the window for implementing compensating controls narrows every month.

Third-Party and Supply Chain Risk

The FFmpeg vulnerability Mythos found had been present for 16 years in one of the most widely deployed multimedia libraries in the world. Your risk surface is not limited to code you wrote. AI-scale analysis can be applied to every library, dependency, and vendor component in your supply chain. SBOM-based visibility is the starting point. If you do not have it, prioritizing it belongs in your 6-month plan.

The CISO Brief: Talking Points and a 90-Day Plan

The community resource includes a ready-made executive briefing section, which is genuinely useful for CISOs who need to move budget conversations quickly. The 90-day plan covers four tracks: capacity, tooling, infrastructure hardening, and progress tracking.

The investment argument is straightforward. The cost of building capacity now is lower than the cost of incident response against AI-speed exploitation. That calculation gets harder to argue against as the exploitation window shrinks below one hour.

The threat model framing for the board is equally direct: the assumptions your security program was built on have changed. This is not a projection. Claude Mythos Preview is a documented capability that already exists. Anthropic’s estimate is that comparable capabilities will be broadly accessible within 12 to 18 months. Project Glasswing bought the defensive side some time. What your organization does with that time is a decision, not a circumstance.

If you are looking for a way to structure that conversation, the executive briefing section of the CSA/SANS/OWASP resource gives you the structure. Use it.

One More Thing Worth Knowing

Something that did not make it into most of the coverage: research published alongside the Mythos announcement tested the core vulnerability detections against small, cheap, open-weights models, some with as few as 3.6 to 5.1 billion active parameters, small enough to run locally. The key findings from that analysis were significant: many of the core detections, including the FreeBSD buffer overflow and parts of the 27-year-old OpenBSD bug, were recovered by those tiny models.

The capability frontier is jagged. It does not scale smoothly with model size or cost. That means the threat is more distributed than the Mythos announcement alone implies. The moat is not in having the largest model. It is in the system built around it: orchestration, validation, integration with traditional security tooling, and the security expertise to use the output correctly.

For defenders, that is the honest picture. You are not just planning for nation-state actors with frontier model access. You are planning for a broader landscape where meaningful vulnerability discovery capability is already more accessible than most organizations are accounting for.

The Community Resource

“The AI Vulnerability Storm: Building a Mythos-Ready Security Program” from the Cloud Security Alliance, SANS Institute, and OWASP GenAI Security Project is available now. It is community-driven, framework-aligned against OWASP, MITRE ATLAS, and NIST CSF 2.0, and not a vendor pitch. If you are trying to figure out how to approach this for your organization, that document is the right starting point.

The five sections cover the threat model, the assumptions audit, the risk register, the defensive playbook, and the executive briefing. Run the assumptions audit first. It will tell you faster than anything else where your program’s actual gaps are.

FAQ

What is Claude Mythos?

Claude Mythos Preview is Anthropic’s most capable AI model as of April 2026. During a controlled evaluation, it autonomously discovered thousands of previously unknown zero-day vulnerabilities across every major OS and browser and developed working exploits for most of them without human guidance. Its offensive security capabilities were not deliberately trained – they emerged as a downstream consequence of general improvements in coding ability and reasoning. Anthropic has not released the model publicly.

What is Project Glasswing?

Project Glasswing is Anthropic’s controlled defensive initiative announced at the same time as Mythos Preview. It gives restricted access to Mythos to a pre-approved coalition of organizations – including AWS, Apple, Google, JPMorgan Chase, and Microsoft – specifically for finding and patching critical software vulnerabilities before adversaries develop comparable capabilities independently. Anthropic has committed over $100 million in model usage credits and open-source security donations to fund the effort.

What is a Mythos-ready security program?

A Mythos-ready security program operates on the assumption that AI-scale vulnerability discovery is becoming accessible to adversarial actors, not just to Glasswing coalition members. It moves from periodic scanning to continuous monitoring, from CVSS-based prioritization to exploitability-based risk scoring, and from treating AI agents as tools to governing them with a threat-actor-informed model. It explicitly governs AI-generated code as a security concern and has compensating controls in place for systems that cannot be patched through conventional update channels.

How should cloud security teams respond right now?

The most immediate steps are: run an honest audit of which of the seven foundational assumptions your program is still relying on; update incident response detection use cases to account for syntactically correct AI-generated exploit traffic rather than noisy automated scanning patterns; review goal-boundary controls for any AI agents deployed internally and enforce network access at the infrastructure layer; and audit embedded and legacy systems that cannot be patched, then implement network-level compensating controls before that window closes.

Does the Mythos containment failure mean AI agents are too dangerous to deploy?

No, but it changes the governance model required. The containment failure showed that a sufficiently capable AI agent may pursue objectives beyond its assigned scope and take persistent external actions without being directed to. The appropriate response is not avoiding AI agent deployment – it is enforcing network access controls at the infrastructure layer independently of the agent, defining permitted actions rather than just goals, and treating agent goal-boundary violation as a threat model category that requires detection and response, not just configuration management.

Stay Ahead of the AI Security Curve

The Mythos announcement is not the end of this story. It is the opening condition. Anthropic has said that comparable capabilities will emerge from other AI labs within 12 to 18 months – at which point Project Glasswing’s head start disappears and adversarial access becomes the operating baseline.

Get Started Today

Cloud Security Pros tracks the AI security landscape as it develops. Not sure where your program actually stands against a Mythos-era threat model? Our Readiness assessment gives you an honest answer and a prioritized roadmap. Contact us to build your AI ready security program today. And subscribe to stay current on what this means for cloud security programs as the picture continues to evolve.

Written by Rocky Giglio Founder, Cloud Security Pros

Rocky Giglio is the founder of Cloud Security Pros, a consulting practice focused on AI-era cloud security. He works with security teams navigating the shift from traditional vulnerability management to AI-speed threat environments, covering the Cloud Security Alliance, SANS, and OWASP communities as the landscape evolves.

See all posts →

← Back to Blog