The First AI-Generated Zero-Day - What Google's GTIG Discovery Changes for Cloud Security

June 10, 2026 — Rocky Giglio

On May 11, 2026, Google’s Threat Intelligence Group published a report that confirmed a milestone in the history of cyberattacks. A prominent criminal group used an AI model to discover and weaponize a zero-day vulnerability in a popular open-source web administration tool. The exploit, written in Python, bypassed two-factor authentication by targeting a hardcoded trust assumption the developers had buried in the authentication logic.

For the last few years, we have known that attackers were using AI and would increasingly use it to find and exploit unknown zero-days. That time has come. Google Threat Intelligence Group documented AI being used to find a way into an application that nobody had thought of before: the model reasoned its way to an entry point by leveraging what the code actually does, even though that was not what the developers intended. The story is what the AI did to turn that gap into a working attack script. That capability shifts the economics of zero-day research permanently.

What Google Found on May 11, 2026

Google’s Threat Intelligence Group (GTIG) documented a coordinated operation by a criminal group planning to run mass exploitation against a web-based system administration tool. When researchers analyzed the Python exploit, the code carried unmistakable signs of AI authorship: educational docstrings, a hallucinated CVSS score, and textbook-style formatting (detailed help menus, the clean _C ANSI color class) that experienced human developers would not include in an attack tool.

“Based on the structure and content of these exploits, we have high confidence that the actor likely leveraged an AI model to support the discovery and weaponization of this vulnerability,” Google wrote in the report. The company noted that Gemini was not the model used. The unnamed AI had identified a logic flaw in authentication code, specifically a hardcoded exception that created a 2FA bypass route. The vulnerability required valid credentials to exploit but bypassed the second authentication factor entirely.

Google worked with the affected vendor before the criminal group could execute. The full GTIG report also documented AI usage in malware development, autonomous attack orchestration, agentic workflows for deepfake campaigns, and infrastructure deployment.

How an AI Cracked a Hardcoded Trust Assumption

The vulnerability traced to a hardcoded trust assumption in the tool’s authentication code. The developers had written logic that enforced 2FA in most cases, but included a hardcoded exception that created an invisible bypass. Standard static analysis tools and vulnerability scanners would examine that logic and pass it with flying colors. Even if the developers had run those tools, they would still have been exposed.
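
GTIG did not publish the vulnerable code, so the following is only an illustrative sketch of what a hardcoded 2FA exception of this general shape might look like. Every name in it (LoginRequest, login_allowed, TRUSTED_CLIENT_MARKER, the "internal-monitor" value) is hypothetical and not taken from the affected tool.

```python
from dataclasses import dataclass

# Hypothetical value: stands in for any "trusted" condition a developer hardcodes.
TRUSTED_CLIENT_MARKER = "internal-monitor"


@dataclass
class LoginRequest:
    username: str
    password_ok: bool         # result of the first-factor check
    otp_ok: bool              # result of the second-factor check
    client_marker: str = ""   # request field the caller controls


def login_allowed(req: LoginRequest) -> bool:
    """Intended rule: both factors are required for every login."""
    if not req.password_ok:
        return False
    # The hardcoded exception: a caller presenting the marker skips 2FA.
    # Each branch is valid code, so nothing looks like a bug in isolation.
    if req.client_marker == TRUSTED_CLIENT_MARKER:
        return True
    return req.otp_ok
```

In this sketch, a caller with valid credentials and the marker gets in without a second factor, which mirrors the shape of the flaw described in the report: valid credentials still required, 2FA skipped entirely.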

The AI did something different: it read the intent. GTIG researchers described the capability this way: “frontier LLMs have an increasing ability to perform contextual reasoning, effectively reading the developer’s intent to correlate the 2FA enforcement logic with the contradictions of its hardcoded exceptions.”

This is a measurable capability shift. Earlier AI-assisted attacks helped lower-skilled actors write phishing emails or generate malware variants. That is volume work, or what you might call level 1 engineering. This attack required the AI to reason about what the code was supposed to do versus what it actually does, and identify where those two things diverge. That level of analysis previously required skilled human researchers with deep domain knowledge, significant time, and familiarity with the target codebase.

GTIG’s conclusion: “This capability can allow models to surface dormant logic errors that appear functionally correct to traditional scanners but are strategically broken from a security perspective.”

Why This Is a Different Kind of Threat

Security teams have tracked AI as a force multiplier for established attack types: faster phishing, more convincing social engineering, quicker malware iteration. Those are volume threats. More of the same category, generated faster.

The first AI-generated zero-day exploit falls into a different category. It required original discovery of an unknown flaw in deployed software. The AI acted as a vulnerability researcher, not a content generator.

Two consequences follow. First, the barrier to zero-day exploitation has dropped. Discovering a zero-day used to require deep expertise, significant time, and familiarity with the target codebase. An AI model removes two of those three constraints. Second, the scope of flaws now in reach has expanded. Logic errors, faulty trust models, and subtle authentication exceptions have historically been hard to catch with automated tools. They are now within reach of AI reasoning systems at scale.

This is not a hypothetical. Google documented the actual exploit, analyzed the code, and confirmed the attack. The first AI-generated zero-day was caught mid-campaign. The next one may not be. That calls for a different approach to application security. It is time to create a Mythos-ready security program.

Nation-State Actors Are Running the Same Playbook

The criminal group behind the first AI-generated zero-day is not an isolated case. GTIG’s May 2026 report documented parallel activity by state-sponsored actors.

Chinese group UNC2814 attempted persona-driven jailbreaks on Gemini, directing the model to act as a senior security auditor to analyze TP-Link firmware for vulnerabilities. The group has a history of targeting telecommunications and government networks across 42 countries since 2017, gaining initial access through vulnerabilities in edge systems and web applications.

North Korean group APT45 ran thousands of prompts through Gemini to analyze existing CVEs and validate proof-of-concept exploits. GTIG described the goal as building “a more robust arsenal of exploit capabilities that would be impractical to manage without AI assistance.”

Other actors pre-loaded AI models with historical vulnerability data to improve code analysis and surface flaws that manual review would overlook. Google also observed actors experimenting with agentic tools like OpenClaw and OneClaw alongside intentionally vulnerable testing environments, refining AI-generated payloads in controlled settings before deployment.

The common thread: AI-assisted vulnerability research has moved past the experimentation stage for both criminal and nation-state threat actors. The first AI-generated zero-day confirms this.

What Cloud Security Teams Need to Do Now

The May 2026 incident points to specific gaps in how most organizations approach authentication code review and patch management. Four actions to prioritize:

Audit your trust assumptions

The 2FA bypass worked because developers hardcoded an exception that created a logical contradiction. This pattern shows up in authentication code, session management, authorization flows, and API access logic. If your codebase contains hardcoded bypass conditions, whitelisted IP ranges with elevated trust, or exception logic in authentication paths, those warrant manual review. Automated scanners will not find them. You need a human reviewing what the code is supposed to do and comparing it against what the code actually does.

Shift code review toward logic flaws

Static analysis tools excel at memory safety bugs and input validation failures. Logic errors in authentication and authorization require a different lens. Review of developer intent versus actual behavior, not just code structure, is required now. Pair your standard SAST process with manual reviews focused on trust boundaries, privilege escalation paths, and exception handling in auth code.
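
One way to make "intent versus actual behavior" concrete is to write the intent down as an invariant and check the decision logic against every input combination. The sketch below does that for a hypothetical login_allowed function that contains a hardcoded exception. The function, the marker values, and the helper name intent_violations are all illustrative assumptions.

```python
from itertools import product

# Hypothetical marker values a request might carry; the second mimics a hardcoded exception.
CLIENT_MARKERS = ("", "internal-monitor")


def login_allowed(password_ok: bool, otp_ok: bool, client_marker: str) -> bool:
    """Hypothetical decision function under review (contains a hardcoded exception)."""
    if not password_ok:
        return False
    if client_marker == "internal-monitor":
        return True
    return otp_ok


def intent_violations(decide) -> list[tuple[bool, bool, str]]:
    """Enumerate every input and collect cases where access is granted without both factors."""
    violations = []
    for password_ok, otp_ok, marker in product((True, False), (True, False), CLIENT_MARKERS):
        granted = decide(password_ok, otp_ok, marker)
        # Stated intent: access requires both factors, with no exceptions.
        if granted and not (password_ok and otp_ok):
            violations.append((password_ok, otp_ok, marker))
    return violations
```

Calling intent_violations(login_allowed) returns the single case where the marker grants access without a valid OTP. The catch is that someone still has to read the code to learn which input values matter, which is why this complements manual review rather than replacing it.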

Tighten your patching SLA for internet-facing systems

Google disrupted this campaign through coordinated disclosure. That window is narrowing. As AI accelerates vulnerability discovery, the timeline between public disclosure and active exploitation compresses. Your current patching SLA for internet-facing systems and admin tooling should reflect this. If it was set before AI-assisted exploit development was confirmed in the wild, it needs a revision.

Apply zero-trust principles to your authentication layer

Systems that grant elevated trust based on hardcoded conditions, network location, or implicit signals are more exposed than systems requiring explicit, context-aware verification at each access decision. If your authentication code contains exceptions, treat those exceptions as the highest-priority attack surface. The GTIG incident confirms that AI can find those exceptions faster than your team can review code. Review your network designs and authentication systems, and think through ways to shrink the blast radius if something gets through.
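
As a rough sketch of explicit verification at each access decision, the function below denies by default and grants access only when every named check has passed. Network location is recorded but never used as a trust input. AccessContext and authorize are hypothetical names, and a real policy would carry more signals than these two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessContext:
    password_verified: bool
    second_factor_verified: bool
    source_ip: str = ""  # kept for logging only; never used to grant trust


def authorize(ctx: AccessContext) -> tuple[bool, str]:
    """Deny by default; grant only when every explicit check has passed."""
    checks = {
        "password": ctx.password_verified,
        "second_factor": ctx.second_factor_verified,
    }
    failed = [name for name, passed in checks.items() if not passed]
    if failed:
        return False, "denied: missing " + ", ".join(failed)
    return True, "granted"
```

The design choice worth noting is that there is no branch that returns a grant early. Any future exception would have to be added as a visible entry in the checks table, where a reviewer can question it.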

The Benchmark Has Shifted

The first AI-generated zero-day did not look like a future threat. It looked like a working Python script, caught mid-campaign, with a hallucinated CVSS score in the comments. The AI that built it is being iterated on and distributed across criminal groups and state-sponsored actors now.

Audit your trust assumptions, tighten your patching timelines, and add logic-flaw-focused code review to your standard process. The economics of zero-day development changed on May 11, 2026. Your security program should reflect that change before the next campaign launches. Even if you don’t develop software, think about what these changes should look like for your system designs, CI/CD pipelines for code-driven infrastructure, endpoints, and off-the-shelf applications.

Frequently Asked Questions

What is the first AI-generated zero-day exploit?

On May 11, 2026, Google’s Threat Intelligence Group confirmed that a criminal group used an AI model to discover and weaponize a zero-day vulnerability in an open-source web administration tool. The exploit, written in Python, bypassed two-factor authentication by targeting a hardcoded trust exception in the authentication code.

Which AI model was used in the attack?

Google did not name the AI model. The company confirmed Gemini was not used. Code analysis pointed to an LLM based on the presence of educational docstrings, a hallucinated CVSS score, and formatting patterns consistent with LLM training data.

Was the vulnerability patched?

Yes. Google coordinated disclosure with the affected vendor before the criminal group could execute the planned mass exploitation campaign.

Why couldn’t traditional scanners find this vulnerability?

The flaw was a logic error in 2FA enforcement code: a hardcoded trust exception that created a bypass route. This type of flaw appears functionally correct to static analyzers because all individual authentication checks fire as intended. The error only surfaces when reasoning about developer intent versus actual code behavior across edge cases.

How does this change vulnerability management?

Zero-day discovery no longer requires human experts with deep codebase familiarity. AI can now surface logic flaws that traditional tools miss. This expands the types of vulnerabilities attackers can find at scale and shortens the window between an unknown vulnerability and active exploitation.

What should my team do right now?

Audit authentication and authorization code for hardcoded trust assumptions and bypass conditions. Update patching SLAs for internet-facing systems. Shift code review practices to cover logic flaws, not just memory safety and input validation. Apply zero-trust principles wherever code grants elevated access based on implicit conditions.

Get Started Today

Cloud Security Pros tracks the AI security landscape as it develops. Contact us to build your AI-ready security program today, and subscribe to stay current on what this means for cloud security programs as the picture continues to evolve.

Written by Rocky Giglio, Founder, Cloud Security Pros

Rocky Giglio is the founder of Cloud Security Pros, a consulting practice focused on AI-era cloud security. He works with security teams navigating the shift from traditional vulnerability management to AI-speed threat environments, covering the Cloud Security Alliance, SANS, and OWASP communities as the landscape evolves.
