The Adversarial Review: How Two Passes Catch What One Cannot

Why single-pass evaluation fails

When an LLM reads your document in a single pass, it finds the evidence you provided and evaluates it. What it does not find is what you did not write — the missing customer interviews, the undocumented demand signal, the acquisition channel you have not thought through yet.

This is not a failure of the model. It is a structural limitation of the evaluation design. A single pass through a document can only evaluate what is present. It cannot evaluate absence. And in early-stage evaluation, absence is usually the most important signal.

The result is systematic overscoring. A document that confidently describes a well-defined problem, a specific target customer, and a unique insight will score well on those predicates — even if the three highest-weight predicates (customer validation, demand signal, acquisition channel) are entirely absent from the document.
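The overscoring effect can be sketched in a few lines. This is an illustration only: the weights and scores below are hypothetical (the article names the three highest-weight predicates but gives no numbers), and the function names are invented for this example rather than taken from Bayescore.

```python
# Hypothetical predicate weights, chosen only to illustrate the effect.
WEIGHTS = {
    "problem_definition": 1.0,
    "target_customer": 1.0,
    "unique_insight": 1.0,
    "customer_validation": 3.0,
    "demand_signal": 3.0,
    "acquisition_channel": 3.0,
}


def single_pass_score(found):
    """Weighted average over only the predicates the document addressed."""
    total_weight = sum(WEIGHTS[name] for name in found)
    if total_weight == 0:
        return 0.0
    return sum(WEIGHTS[name] * score for name, score in found.items()) / total_weight


def absence_aware_score(found):
    """Weighted average over every predicate; a missing predicate counts as 0."""
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * found.get(name, 0.0) for name in WEIGHTS) / total_weight


# A confident document that covers only the three low-weight predicates.
confident_but_thin = {
    "problem_definition": 0.9,
    "target_customer": 0.9,
    "unique_insight": 0.9,
}

print(single_pass_score(confident_but_thin))
print(absence_aware_score(confident_but_thin))
```

Under these assumed weights, the same document scores roughly 0.9 when absence is ignored and roughly 0.225 when the three missing predicates are counted against it.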

The advocate pass

The first evaluation pass is supportive. It reads the document as an advocate would: looking for every piece of evidence that supports each predicate. If the founder mentions they spoke with potential customers, that counts. If they have a waitlist, that counts. If they can name a specific competitor, that counts.

The advocate pass is generous but honest. It does not invent evidence; it extracts only what is genuinely present. A document with strong evidence scores well on the advocate pass. A sparse document scores neutrally, because the advocate pass has little to extract and does not penalize what is missing.

The adversarial pass

The second pass is adversarial. It reads the same document from the perspective of a skeptical reviewer — someone whose job is to find the gaps, the missing evidence, and the unvalidated assumptions. For each predicate, the adversarial pass asks: what would count as evidence here, and is it present?

The adversarial pass is specifically looking for what the advocate pass could not see: the absence of customer names in a document that claims customer validation, the lack of a specific number in a document that claims a demand signal, the vague "we'll figure out distribution later" that fails the acquisition channel predicate.
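To make the three gap types above concrete, here is a deliberately crude sketch that uses keyword and regex heuristics. It is not how Bayescore works (the article describes a model-based pass); the function name find_gaps, the trigger phrases, and the gap labels are all assumptions made for illustration.

```python
import re


def find_gaps(text):
    """Return gap descriptions for claims that lack supporting specifics."""
    lower = text.lower()
    gaps = []

    # Claims validation but quotes no feedback and mentions no interviews.
    if "customer validation" in lower or "validated" in lower:
        has_quote = re.search(r'"[^"]{10,}"', text) is not None
        has_interview = re.search(r"\binterview(s|ed)?\b", lower) is not None
        if not (has_quote or has_interview):
            gaps.append("customer_validation: no quotes or interviews cited")

    # Claims demand but gives no number anywhere in the document.
    if "demand" in lower and re.search(r"\d", text) is None:
        gaps.append("demand_signal: no specific number")

    # Explicitly defers the acquisition question.
    if re.search(r"figure out (distribution|acquisition) later", lower):
        gaps.append("acquisition_channel: distribution deferred")

    return gaps


vague = "We have validated the problem and see strong demand. We'll figure out distribution later."
specific = 'We interviewed 12 clinic managers; 9 joined the waitlist, showing demand. One said "I would pay for this tomorrow."'

print(find_gaps(vague))
print(find_gaps(specific))
```

The first sample document trips all three checks; the second trips none, because it supplies an interview count, a number, and a direct quote.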

This pass is why Bayescore gives sparse documents lower scores than a single-pass evaluation would. The adversarial pass surfaces the gaps that founders, and single-pass models, systematically overlook.

How the two passes combine

The final confidence score for each predicate is determined by weighing the supporting evidence found in the advocate pass against the gaps identified in the adversarial pass. Strong evidence from the advocate pass can overcome minor gaps from the adversarial pass. But a predicate for which the advocate pass found no evidence, and the adversarial pass confirms that none exists, scores near zero regardless of how strong the surrounding predicates are.
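One simple way to picture this weighing is a multiplicative discount. The article does not give Bayescore's actual formula, so the rule below, the function name combine, and the 0-to-1 scales for both inputs are assumptions chosen only to reproduce the two behaviors described above.

```python
def combine(support, gap_severity):
    """Discount advocate-pass support by adversarial-pass gap severity.

    Both inputs are assumed to lie in [0, 1]; the result is a confidence in [0, 1].
    """
    if not (0.0 <= support <= 1.0 and 0.0 <= gap_severity <= 1.0):
        raise ValueError("support and gap_severity must be in [0, 1]")
    return support * (1.0 - gap_severity)


# Strong evidence with a minor gap keeps most of its score.
print(combine(0.9, 0.1))

# No evidence at all scores zero, whatever the gap severity.
print(combine(0.0, 1.0))
```

With a multiplicative rule, a predicate with zero support can never be rescued by a low gap severity, and strong scores on neighboring predicates have no effect because each predicate is combined on its own.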

The two-pass structure is a direct consequence of how absence works as evidence. A single-pass model that finds no customer interviews treats that absence as neutral — it simply did not encounter evidence either way. The adversarial pass treats the same absence as informative: a document that claims customer validation but names no customers, quotes no feedback, and cites no discovery sessions has failed to demonstrate the predicate. That failure is information. A single pass misses it; the adversarial pass surfaces it.
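The claim that absence is informative can be expressed as a Bayesian update. The article does not describe the underlying math, so treat this as a sketch: the function name and the probabilities are illustrative assumptions, not measured values.

```python
def posterior_given_absence(prior, p_absent_if_true, p_absent_if_false):
    """Bayes' rule: P(predicate holds | the document shows no evidence for it)."""
    numerator = prior * p_absent_if_true
    denominator = numerator + (1.0 - prior) * p_absent_if_false
    if denominator == 0.0:
        raise ValueError("absence is impossible under both hypotheses")
    return numerator / denominator


# Assumed numbers: a founder who really has customer validation rarely omits it
# (0.2), while a founder who lacks it almost always shows nothing (0.9).
print(posterior_given_absence(prior=0.5, p_absent_if_true=0.2, p_absent_if_false=0.9))

# If absence were equally likely either way, it would carry no information
# and the posterior would stay at the prior.
print(posterior_given_absence(prior=0.5, p_absent_if_true=0.7, p_absent_if_false=0.7))
```

Treating absence as neutral amounts to assuming the second case. The first case reflects the adversarial view: missing evidence is more likely when the predicate is false, so confidence drops below the prior.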

If you want a score that flatters, use a single-pass evaluator. If you want a score that maps where you actually need to do work, you need both passes.

Run the two-pass evaluation

Paste your document and Bayescore runs both passes, extracting evidence and surfacing gaps, then returns a calibrated score and the single highest-leverage finding.
