Adversarial LLMs: Why You Should Pit AI Against AI

This week I shipped a feature for filtering documents in our procurement app. Sounds simple. It wasn't. The feature itself—letting users filter by document type, date range, and status—is table stakes. But getting it right required building it three different ways before we found one that felt natural. Even the filter labels went through iterations: "Created" or "Uploaded"? "Type" or "Category"? Small choices like these compound into either clarity or confusion for users.
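
For concreteness, the filter shape looked roughly like this (simplified, with illustrative names and values rather than the real schema):

```typescript
// Simplified sketch of the filter model; names and values are illustrative.
type DocumentType = "invoice" | "purchase_order" | "quote";
type DocumentStatus = "draft" | "pending" | "approved";

interface DocumentFilters {
  documentType?: DocumentType;
  status?: DocumentStatus;
  createdAfter?: Date;  // the "Created" vs "Uploaded" label debate lives here
  createdBefore?: Date;
}
```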

What made this feature better: I had two LLMs fight about it the entire time.

I use Claude for most implementation work. But for this feature, I introduced a second opinion from OpenAI's Codex as an adversarial reviewer.

The Workflow

  1. Claude drafts a plan
  2. Codex critiques the plan
  3. I feed Codex's objections back to Claude
  4. Claude revises or defends
  5. Repeat through implementation

This wasn't collaborative pair programming. It was closer to cross-examination.
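
In code terms, the loop looks something like this sketch, where the two ask functions stand in for however you actually reach each model (API, CLI, or chat window):

```typescript
// Rough sketch of the cross-examination loop. The "ask" functions are
// placeholders for whichever model interfaces you actually use.
type Ask = (prompt: string) => Promise<string>;

async function adversarialPlan(
  task: string,
  askClaude: Ask, // drafts and revises
  askCodex: Ask,  // critiques
  rounds = 3,
): Promise<string> {
  // 1. Claude drafts a plan
  let plan = await askClaude(`Draft an implementation plan for: ${task}`);

  for (let round = 0; round < rounds; round++) {
    // 2. Codex critiques the plan
    const critique = await askCodex(
      `Critique this plan. List concrete objections, not praise:\n\n${plan}`,
    );
    // 3 & 4. Feed the objections back; Claude revises or defends
    plan = await askClaude(
      `A reviewer raised these objections:\n\n${critique}\n\n` +
        `Revise the plan or defend your choices:\n\n${plan}`,
    );
  }
  // 5. The same loop repeats through implementation, not just planning
  return plan;
}
```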

What the Adversarial Process Caught

Naming inconsistencies: Claude proposed filter labels that made sense in isolation but conflicted with terminology elsewhere in the app. Codex flagged the mismatch. We unified the language before writing any code.
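
The fix was mechanical once we saw it: decide each label once, in one place, so the filter bar and the rest of the app can't drift apart. Roughly:

```typescript
// One shared source of UI copy; a label can't mean one thing in the filter
// bar and another in the table header. (Values here are placeholders.)
const FILTER_LABELS = {
  documentType: "Type",  // the "Type" vs "Category" decision, made once
  created: "Created",    // likewise "Created" vs "Uploaded"
  status: "Status",
} as const;

type FilterLabelKey = keyof typeof FILTER_LABELS;
```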

Overcomplicated state management: The first implementation tracked filter state in three places. Codex asked why. There was no good answer. We simplified to one source of truth.
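
The simplified version, sketched against the DocumentFilters shape above: every update flows through one function, and everything else is derived from the result rather than stored again.

```typescript
// All filter updates go through one reducer over one object.
type FilterAction =
  | { kind: "setType"; documentType?: DocumentType }
  | { kind: "setStatus"; status?: DocumentStatus }
  | { kind: "setDates"; createdAfter?: Date; createdBefore?: Date }
  | { kind: "clearAll" };

function filterReducer(state: DocumentFilters, action: FilterAction): DocumentFilters {
  switch (action.kind) {
    case "setType":
      return { ...state, documentType: action.documentType };
    case "setStatus":
      return { ...state, status: action.status };
    case "setDates":
      return { ...state, createdAfter: action.createdAfter, createdBefore: action.createdBefore };
    case "clearAll":
      return {};
  }
}

// Counts, chips, and query params are derived from the object, not duplicated:
const activeFilterCount = (f: DocumentFilters) =>
  Object.values(f).filter((v) => v !== undefined).length;
```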

Missing edge cases: What happens when a user applies a date filter, then changes document type, and the date range no longer makes sense? Claude's initial plan didn't address it. Codex did.
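
There are several reasonable ways to resolve that case; one simple option is to drop a date range that may no longer apply when the type changes, rather than silently filtering against it:

```typescript
// Illustrative handling of the stale-range case: clear the date range
// whenever the document type changes. Other resolutions are possible.
function onTypeChange(
  current: { documentType?: string; createdAfter?: Date; createdBefore?: Date },
  nextType?: string,
) {
  if (current.documentType === nextType) return current;
  return { documentType: nextType, createdAfter: undefined, createdBefore: undefined };
}
```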

UI patterns that don't scale: Both LLMs researched filter UI patterns independently. Claude favored dropdowns. Codex argued for chips with clear remove actions. We tested both and Codex was right—users need to see active filters at a glance.
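
The chip approach boils down to deriving one removable chip per active filter from the single filter object, for example:

```typescript
// Sketch only: one chip per set filter, so active filters are always visible.
// Real labels would be formatted per field rather than generically.
interface FilterChip {
  key: string;
  label: string;
}

function toChips(filters: Record<string, string | Date | undefined>): FilterChip[] {
  return Object.entries(filters)
    .filter(([, value]) => value !== undefined)
    .map(([key, value]) => ({
      key,
      label: `${key}: ${value instanceof Date ? value.toLocaleDateString() : value}`,
    }));
}
```

Removing a chip just clears that key on the one filters object, so everything derived from it updates together.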

The Pull Request Crucible: The adversarial process didn't stop at implementation. By the time code hit the PR, it had already survived multiple rounds of AI scrutiny. Then another AI reviewer examined the diff and found issues both previous models missed.

Each model has different training, different biases, different blind spots. Stack enough of them and the gaps start to close.

Why This Works

LLMs are confident. That's the problem. They'll propose a solution with the same tone whether it's brilliant or broken. They don't hedge. They don't say "I'm not sure about this part."

Treating an LLM as a trusted collaborator is a mistake. Treat it as a hostile witness instead. Challenge every assertion. Demand justification. And when possible, get a second opinion from a model trained differently.

The adversarial approach surfaces:
  • Hallucinations. One model invents a pattern, the other can't find documentation for it.
  • Overconfidence. Forced to defend a choice, the model sometimes admits alternatives exist.
  • Gaps. What one model assumes, another questions.

Practical Takeaways

Don't trust the first answer. Even if it looks right. Especially if it looks right.

Use multiple models. They fail differently. Claude is verbose and sometimes over-engineers. Codex is terse and sometimes under-specifies. Together they triangulate toward better solutions.

Make them argue. Feed critiques back explicitly. "Codex says your state management is too complex. Defend or revise." This forces the model out of its default agreeable mode.

Stack your reviewers. AI-assisted code review catches real issues. Multiple AI reviewers catch more. Human review remains essential, but it now starts from cleaner code.

I spent more time on this feature than a simpler approach would have required. Three implementations. Multiple AI reviewers during development. Another on the PR. But the shipped feature is better. The code is cleaner. The UX tested well.

Multiple LLMs seem like they might slow you down, but a willingness to challenge them, and to let them challenge each other, produced work I'm confident in.

Your AI assistant wants to help. It will agree with you. It will validate your assumptions. That's not useful.

Make it fight.
