Winning the AI Cyber Race: Verifiability is All You Need
When speaking at BSides Frankfurt in Summer 2025 — my hometown conference — I opened with a question that’s been haunting me: Are we entering an AI winter, or is this the beginning of springtime for AI in cybersecurity?
After the NXS supply chain attack, Anthropic’s report on nation-states weaponizing AI, and DARPA’s AIxCC cyber challenge results, I’m convinced: spring is here, and offense is winning.
But not for the reasons you think. It’s not about better models or more compute. It’s about something far more fundamental: verifiability.
A Bot That Changed Everything
Let me take you back to 2001. I was 16, playing Counter-Strike obsessively, convinced I was pretty good. Then JoeBot showed up. A 16-year-old kid from another German city had built a neural network bot as part of his Abitur project. The paper is still on archive.org — go read it.
Here’s what blew my mind: this tiny neural network had only 77 parameters. Compare that to DeepSeek today, with roughly 700 billion. Yet that bot made the game unplayable. It was that good.
The kid’s biggest challenge wasn’t building the network — it was getting training data. He tried simulating games, recording matches, scraping whatever he could to train his model. He eventually cracked it.
That experience sparked my obsession with neural networks. I bought my first book on backpropagation. I started building my own experiments. But more importantly, it taught me something crucial about machine learning that took the industry another two decades to fully understand: the bottleneck isn’t generation — it’s verification.
The Verifier’s Law
Here’s the mental model OpenAI’s Jason Wei blogged about earlier this year, and it’s reshaping how I think about AI in security:
The ease of training AI to solve a task is proportional to how verifiable that task is.
Think about it. Sudoku is hard to solve but trivially easy to verify — just check if each row, column, and box has unique digits. Chess? Easy to verify: did you win or lose? Diet advice? Easy to generate, impossible to verify for years.
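To make "trivially easy to verify" concrete, here's a minimal Sudoku checker (an illustrative sketch, not something from the talk). Finding a solution takes search; checking one takes a handful of set comparisons.

```python
def is_valid_sudoku(grid: list[list[int]]) -> bool:
    """Verify a completed 9x9 Sudoku: every row, column, and 3x3 box
    must contain the digits 1-9 exactly once. Checking is cheap even
    though finding a solution is hard."""
    digits = set(range(1, 10))

    rows = grid
    cols = [[grid[r][c] for r in range(9)] for c in range(9)]
    boxes = [
        [grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)]
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]

    return all(set(unit) == digits for unit in rows + cols + boxes)
```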
This isn’t just theory. Look at every major AI benchmark that’s been “solved” in the past five years: math competitions, coding challenges, medical exams. They all share one trait — mechanical verification with oracles is cheap.
The implication is profound: AI will win where verification is easy. Defense will lose where verification is hard.
Why Offense Is Crushing It
Cyber offense has some good verifiers built in. Did you pop a shell? Binary. Did the exploit work? Binary. Did you exfiltrate the data? Binary. Is whoami returning root? Binary.
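Here's a hypothetical illustration of how cheap those oracles are (not code from any of the systems cited below, and it assumes a Unix target): the entire success test for a privilege-escalation attempt fits in a few lines.

```python
import subprocess

def exploit_succeeded() -> bool:
    """Binary oracle: did we end up with root?
    The signal is unambiguous, which is exactly what makes
    offensive tasks so easy to train against."""
    result = subprocess.run(["whoami"], capture_output=True, text=True)
    return result.stdout.strip() == "root"
```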
The numbers are staggering. OpenAI’s system card shows models went from solving 20% of CTF challenges (GPT-4) to over 90% (o3) in just eight months. That’s not incremental improvement — that’s a phase change.
On HackerOne’s leaderboard, XBOW, an AI startup, became the #1 vulnerability finder. Not a human. A machine. Even if they’re only solving 10% of vulnerability classes, that’s already transformational value.
But here’s the critical insight: raw LLMs aren’t doing this alone. Agentic architectures are.
And it gets worse. The Incalmo paper showed that by simply splitting one agent into an orchestrator plus multiple executors, multi-host attack success rates jumped from 0/10 to 9/10 partial successes. One architectural tweak. Nine successful compromises.
DARPA’s AIxCC results tell the same story: autonomous systems went from 37% vulnerability detection to 77% in one year. Patch success doubled from 25% to 61%.
The pattern is clear: agents + tools + verifiers = results. Offense gets free oracles. Defense? We’re drowning in noise.
Why Defense Is Struggling
Here’s the uncomfortable truth: most defensive security tasks sit in the hard-to-verify quadrant.
SIEM = Suspicious Information Every Minute. Run the math: 1 million events per day, 1 in 100,000 malicious, a detector with a 99% true positive rate and a 1% false positive rate. Result? Roughly 10 true positives buried under ~10,000 false positives. Precision: about 0.1%. If you train AI models on this signal, you get garbage.
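If you want to sanity-check that arithmetic, here's the back-of-the-envelope calculation with the same assumed figures:

```python
# Base-rate arithmetic behind the SIEM example (figures assumed from the text).
events_per_day = 1_000_000
base_rate = 1 / 100_000          # 1 in 100,000 events is actually malicious
tpr, fpr = 0.99, 0.01            # detector: 99% true positive rate, 1% false positive rate

malicious = events_per_day * base_rate           # 10 malicious events per day
benign = events_per_day - malicious

true_positives = malicious * tpr                 # ~9.9 real alerts
false_positives = benign * fpr                   # ~10,000 noise alerts
precision = true_positives / (true_positives + false_positives)

print(f"true positives:  {true_positives:.1f}")
print(f"false positives: {false_positives:.0f}")
print(f"precision:       {precision:.2%}")       # ~0.10%
```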
GRC and third-party risk? Even worse. No objective truth. “Secure enough” is subjective. Assessments take weeks. Every vendor is unique. Self-reported data is noisy. Try training a model on that.
Forensics? You can generate hypotheses easily. Verifying them? Good luck stitching together timeline events across millions of log entries. Precision stays low.
This is why AI security copilots promising “99% accuracy” are fundamentally misleading. Without crisp verifiers, they’re just serving you spaghetti alerts at machine speed.
Agents + Oracles = The Unlock
Here’s the good news: we’re starting to see what happens when you wrap AI agents with proper tool scaffolding.
The contrast between two recent projects tells the story perfectly.
Google’s Sec-Gemini applies LLMs to forensic timeline analysis but struggles at ~12% precision because stitching log events into attack narratives lacks mechanical ground truth — the model must make judgment calls without definitive validation.
Microsoft’s Project Ire went further by wrapping the LLM with proper scaffolding: sandboxing (Project Freta), decompilers (Ghidra, angr), control-flow graph reconstruction, and a “chain of evidence” validator agent. The result? 98% precision and 83% recall on Windows driver classification. The verifiers do the heavy lifting — the LLM just orchestrates.
Same caliber models. Wildly different outcomes. The verifier is the variable.
Project Ire’s architecture is instructive: every claim the agent makes maps back to tool outputs. A separate validator agent cross-checks claims against expert-defined statements. When the system can’t verify something — like a suspected anti-debugging behavior — it flags the claim as unsupported rather than hallucinating confidence.
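Here's a minimal sketch of that pattern. To be clear, this is not Project Ire's code; the names are hypothetical and only illustrate the shape of a claim-to-evidence check: every claim either cites tool output that supports it or gets flagged as unsupported.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    statement: str            # e.g. "sample hooks IsDebuggerPresent"
    evidence_ids: list[str]   # IDs of the tool outputs the agent cites

def validate_claims(claims: list[Claim],
                    tool_outputs: dict[str, str]) -> dict[str, list[Claim]]:
    """Cross-check each claim against the tool outputs it cites.
    A claim with no resolvable evidence is flagged, not trusted."""
    supported, unsupported = [], []
    for claim in claims:
        evidence = [tool_outputs.get(eid) for eid in claim.evidence_ids]
        if evidence and all(e is not None for e in evidence):
            supported.append(claim)
        else:
            unsupported.append(claim)  # surfaced as unverified instead of hallucinated away
    return {"supported": supported, "unsupported": unsupported}
```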
This is the blueprint. Don’t ask “how do I add AI to my SOC?” Ask “what mechanical verifiers can I build so AI agents can actually help?”
The Path Forward: Engineer Oracles, Not Detections
The breakthrough insight is this: we can push defensive tasks into the “easy to verify” quadrant by building better mechanical verifiers. Here are six classes that work:
1. Canary Verifiers: Honeytokens, fake credentials, tripwire files. Binary signals. If one is accessed, something is very wrong. Deploy them widely.
2. Provenance & Attestation Proofs: Signed images, SLSA attestation, SBOMs. Cryptographic verification. Did this artifact come from where it claims? Binary answer.
3. Replay Harnesses (Detection Engineering): This is massively underinvested. Take your SIEM logs and your IaC configs and replay them offline. Test your detection rules against known attacks. Get binary outcomes: did the rule fire correctly or not? (A sketch follows this list.)
4. Invariants / Policy-as-Code: Rego, OPA, Checkov, Cursor rules. Define ground truth declaratively. “This bucket must not be public.” Binary check. Infinitely scalable.
5. Sandboxing & Dynamic Analysis: Yes, people keep saying sandboxing is dead. They’re wrong. We need better gyms — dynamic environments that represent real enterprise complexity, not just isolated test VMs. Behavior is ground truth.
6. Graph-Based Verifiers: Attack graphs, reachability analysis, cloud security posture. 95% of vulnerabilities aren’t reachable in production. Build the graph. Make reachability a binary question. (Also sketched after this list.)
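To make classes 3, 4, and 6 concrete, here are two minimal sketches. All names, events, and hosts are hypothetical; they only show how replaying recorded events through a declarative rule, or asking a reachability question of a graph, yields binary outcomes.

```python
# Minimal detection-replay harness (illustrative names and events).
# A detection rule is just a predicate over an event; replaying labelled
# events through it gives a binary verdict per rule per known attack.

def rule_public_bucket(event: dict) -> bool:
    """Invariant: storage buckets must never be made public."""
    return event.get("action") == "put_bucket_acl" and event.get("acl") == "public-read"

recorded_events = [
    {"action": "put_bucket_acl", "acl": "public-read", "label": "attack"},
    {"action": "put_bucket_acl", "acl": "private",     "label": "benign"},
    {"action": "list_buckets",                          "label": "benign"},
]

def replay(rule, events) -> dict:
    """Binary outcomes: did the rule fire on every attack and stay quiet otherwise?"""
    missed = [e for e in events if e["label"] == "attack" and not rule(e)]
    noisy  = [e for e in events if e["label"] == "benign" and rule(e)]
    return {"missed_attacks": len(missed), "false_alarms": len(noisy)}

print(replay(rule_public_bucket, recorded_events))  # {'missed_attacks': 0, 'false_alarms': 0}
```

And the graph-based verifier, reduced to a single reachability query over an assumed attack graph:

```python
from collections import deque

def reachable(graph: dict[str, list[str]], source: str, target: str) -> bool:
    """Binary question: is `target` reachable from `source` in the attack graph?"""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Illustrative topology: the vulnerable host sits in a segment nothing
# internet-facing can actually reach.
attack_graph = {
    "internet": ["web-frontend"],
    "web-frontend": ["app-server"],
    "app-server": [],
    "legacy-host-with-cve": [],
}
print(reachable(attack_graph, "internet", "legacy-host-with-cve"))  # False, so deprioritize
```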
There’s also growing research on “LLM-as-a-Judge” — using AI to build the verifiers themselves. Promising, but only when anchored to mechanical ground truth. Without that anchor, it’s turtles all the way down.
Three Takeaways for Security Leaders
1. Stop buying detection theater. Start demanding verification.
When evaluating AI security vendors, ask one question: How are you mechanically verifying your outputs? If they can’t answer clearly, walk away. Precision beats coverage every time.
2. Adopt offense-first thinking.
If AI offense can break it in hours, your SOC won’t stop it in weeks. Run continuous AI red teaming. Let attack results guide your roadmap. The best way to understand your verifier gaps is to let an attacker — even an AI one — show you.
3. Build cyber gyms.
Today’s LLMs fail to generalize on private enterprise data because they’ve never seen it. Either give vendors access to train on your environment (I know, the pushback is real), or invest in shared, verifiable testbed environments that represent real-world complexity. Without this, defense AI stays unfit.
The Race Is On
I opened this talk by asking whether we’re entering an AI winter or an AI spring in cybersecurity. After everything I’ve seen — the CTF benchmarks, the autonomous hacking research, the gap between offense and defense verification quality — my answer is clear, and it hasn’t changed since then.
Offense is having its spring. Defense is still serving spaghetti.
The fundamental insight isn’t about better models or more data. It’s about who owns the oracles. Who has the mechanical verifiers that produce binary, reliable signals at scale.
Hackers have them built-in. Defenders must build them deliberately.
Who owns the verifiers wins the AI race.

Sergej — thank you for writing this. The core reframing really lands: stop obsessing over "AI in the SOC" and start engineering verifiers. Your "Verifier's Law" setup, the offense-vs-defense oracle asymmetry, and the Sec-Gemini vs Project Ire contrast are unusually actionable. And yes: many defenders are still serving spaghetti alerts at machine speed.
That said, I'd like to broaden the discussion with a few boundary conditions — not to diminish the verifier roadmap (I think it's essential), but to clarify where it works brilliantly and where it may hit a hard limit.
1) The grammar problem: attackers don't just break rules — they weaponise them
Verification shines when "correct vs incorrect" is mechanically decidable. But modern attacker tradecraft is often grammatically correct while being semantically malicious.
Living-off-the-land, BEC, legitimate admin tooling misuse, valid auth flows, approved APIs — these pass rule-based checks because they aren't rule violations. They're rule exploitation. The attacker constructs "sentences" your grammar explicitly permits.
You can't enumerate your way to safety in a generative space. Verifiers work brilliantly for known attack classes — but the attacker's job is precisely to construct tomorrow's sentences your grammar didn't anticipate.
So the question becomes: what do we do when the decisive property is intent, not syntax?
2) Quis custodiet: the recursion doesn't resolve — it moves up a layer
Even when we anchor to "mechanical ground truth," every verifier encodes assumptions: what counts as evidence, what thresholds mean, what threat model is implied, what the validator is allowed to ignore.
At scale, the risk isn't only hallucination — it's institutionalised false certainty, where verifier blind spots become invisible because "the system verified it." We don't eliminate trust; we relocate it into the verifier stack, which then becomes part of the attack surface.
A practical extension of your thesis: we need verification for the verifiers — continuous adversarial testing of oracles, explicit "oracle threat models," and metrics for verifier drift and brittleness (not only detector accuracy).
3) Autonomy changes the math (L1–L2 is not L4–L5)
Your framework is extremely strong for L1–L2 autonomy (assistive AI), where humans remain the trust anchor and verifiers improve throughput.
But at higher autonomy levels — autonomous defenders facing autonomous attackers — the defender must adapt its verification criteria under adversarial pressure, in real time, while the attacker probes those very criteria. Verification doesn't scale linearly with autonomy; it becomes existential.
This raises a question we don't ask enough: what does defensive AI architecture look like when verification itself becomes the attack surface?
A complementary corollary
I love your line "who owns the verifiers wins the AI race." I'd add:
- Where verification is possible, verifiers are decisive. Build the gyms.
- Where verification is fundamentally ambiguous, resilience becomes decisive.
Meaning: pair the verifier roadmap with architectures that assume oracle failure and stay safe anyway:
- Micro-segmentation as consequence bounding — even if detection fails, lateral movement is constrained
- Immutable infrastructure — instead of verifying compromise, assume it and replace on schedule; recovery speed beats detection accuracy
- Cryptographic compartmentalisation — make exfiltration less consequential, not merely more detectable
Not "detection vs prevention," but verification + survivability. The answer to oracle asymmetry might not be "build better oracles," but "design systems where oracle failure is survivable."
Two questions if you extend this
1) Which parts of the semantic/intent layer do you think can be pulled into verifiable territory — and which remain irreducibly judgmental?
2) Do we need an explicit "verifier stack security" discipline (threat modelling and red-teaming the oracles themselves) as a first-class capability?
Looking forward to continuing this conversation — perhaps over coffee at the next conference.
—Matthias
Great post, Sergej. Counter-Strike was also my first exposure to programming, so it was really fun to read about your experience with the neural network bots! Makes perfect sense to tie AI use cases to outcomes that are easy to verify. This is the way forward.