The Question Frequentists Cannot Answer

The two definitions

Frequentism defines probability as long-run relative frequency. The probability of heads is 0.5 because, in a large number of coin flips, approximately half will land heads. This is well-defined and useful for designing controlled experiments. It requires a reference class — a collection of repeatable, comparable trials whose outcomes you can count.

Bayesianism defines probability as degree of rational belief. The probability of heads is 0.5 because you have no reason to believe one outcome is more likely than the other given what you know. This works for any proposition, including ones that are not repeatable. You can assign a probability to a hypothesis, a document, a single event — anything with a truth value and uncertain evidence.

Cox (1946) showed that the degree-of-belief reading is not merely a philosophical preference. It is a theorem: any system of uncertain reasoning that represents degrees of belief as real numbers and obeys basic consistency constraints must follow the rules of probability theory. Jaynes (2003) extended this into a full epistemology: probability theory as extended logic, applicable wherever beliefs must be updated by evidence.

These are not interchangeable perspectives on one question. They answer different questions. The frequentist asks: how often does this happen? The Bayesian asks: what should I believe, given this evidence?

Document evaluation is the second question.

Why the frequentist framework breaks down here

When you hand a pitch deck to a VC, they are not running an experiment. There is no ensemble of identical startups to count outcomes across. There is one document, right now, and a question: does this document support the proposition that IS(startup, fundable) is true?

This is a question about a single event. The frequentist framework requires repeated trials. Applied to one document, it either has nothing to say — or it imports an implicit reference class without acknowledging it. Most rubric-based evaluation does the latter: the evaluator has seen enough decks to have priors, they apply those priors unconsciously, and they report the output as objective assessment.

The Bayesian framework makes this explicit. You state your priors. You update them with evidence. You report the posterior. The math is visible. The priors can be examined and challenged. This is strictly better than a rubric that buries its assumptions inside the evaluator's intuition.
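
The state, update, report loop can be sketched in a few lines of Python. This is a minimal illustration, not Bayescore's implementation: the prior, the evidence labels, and the likelihoods are hypothetical numbers, and the sketch assumes the evidence items are conditionally independent given the proposition.

```python
def update(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Bayes' rule for a binary proposition: returns P(H | evidence)."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / denominator


# Hypothetical prior and likelihoods, chosen only for illustration.
prior = 0.10
evidence = [
    # (label, P(evidence | H true), P(evidence | H false))
    ("signed pilot customers", 0.60, 0.10),
    ("named technical founder", 0.80, 0.50),
]

belief = prior
trail = [("prior", belief)]  # every intermediate belief stays inspectable
for label, p_if_true, p_if_false in evidence:
    belief = update(belief, p_if_true, p_if_false)
    trail.append((label, belief))

for label, value in trail:
    print(f"{label}: {value:.3f}")
```

Every number that produced the final belief appears in the trail, so a reader who disputes the prior or a likelihood can change it and rerun the update.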

What absence of evidence actually means

This is where the frameworks diverge most sharply, and where most scoring systems get it wrong.

A startup document that does not mention customer discovery contains no evidence that customer validation has occurred. Under a frequentist frame, this absence is neutral — there is nothing to measure, no data point to include, no test to run.

Under the Bayesian frame, absence of evidence is evidence. Not overwhelming evidence — but not neutral. The logic is direct: if customer validation had occurred, you would expect evidence of it in a document making the case for launch-readiness. The evidence is not there. That is a signal. It shifts the posterior toward zero confidence.
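
The direction of that shift follows from Bayes' rule, as the sketch below tries to show. The probabilities are hypothetical, and the function name is invented for this example. The size of the shift depends on how strongly a mention would be expected if validation had really occurred.

```python
def posterior_given_absence(prior: float,
                            p_mention_if_true: float,
                            p_mention_if_false: float) -> float:
    """P(H | document does NOT mention the evidence), for a binary H."""
    p_absent_if_true = 1.0 - p_mention_if_true
    p_absent_if_false = 1.0 - p_mention_if_false
    numerator = p_absent_if_true * prior
    return numerator / (numerator + p_absent_if_false * (1.0 - prior))


# Hypothetical numbers: a validated startup would very likely mention it.
prior = 0.50
shifted = posterior_given_absence(prior, p_mention_if_true=0.90, p_mention_if_false=0.20)

# Absence is neutral only when a mention is equally likely either way.
neutral = posterior_given_absence(prior, p_mention_if_true=0.50, p_mention_if_false=0.50)

print(f"expected mention, none found: {shifted:.3f}")
print(f"mention uninformative:        {neutral:.3f}")
```

With these inputs the posterior falls from 0.50 to roughly 0.11, and it stays at 0.50 only in the case where a mention would carry no information.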

This is the Cox-Jaynes position, and it is the only logically consistent one. A document that does not mention a criterion is not ambiguous about that criterion. It is failing to provide evidence for it. Zero confidence is the correct score.

The frequentist would either omit the predicate entirely — biasing the overall score upward by ignoring a dimension — or call it "inconclusive." Inconclusive is a failure to distinguish absence from ambiguity. They are different things. Absence scores zero. Ambiguous evidence scores partial credit. The formula does not permit inconclusive as an output.
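
The upward bias from omitting a predicate can be made concrete with a little arithmetic. In the sketch below, the 18% and 10% weights come from this article; the lumped "everything_else" weight, the confidence values, and the renormalization step are assumptions made for illustration.

```python
# Weights for customer validation and go-to-market are the published ones;
# "everything_else" is a hypothetical lump so the weights sum to 1.
weights = {"customer_validation": 0.18, "go_to_market": 0.10, "everything_else": 0.72}

# Hypothetical confidences. The document never mentions customer validation.
confidence = {"customer_validation": 0.0, "go_to_market": 0.7, "everything_else": 0.7}


def score(weights, confidence):
    """score = sum(weight * confidence) * 100"""
    return sum(w * confidence[name] for name, w in weights.items()) * 100


def score_omitting(weights, confidence, omitted):
    """Drop one predicate and renormalize the remaining weights to sum to 1."""
    kept = {name: w for name, w in weights.items() if name != omitted}
    total = sum(kept.values())
    renormalized = {name: w / total for name, w in kept.items()}
    return score(renormalized, confidence)


absent_as_zero = score(weights, confidence)
predicate_omitted = score_omitting(weights, confidence, "customer_validation")

print(f"absence scored as zero: {absent_as_zero:.1f}")
print(f"predicate omitted:      {predicate_omitted:.1f}")
```

Under these assumed inputs the same document scores about 57.4 when the missing predicate counts as zero and about 70.0 when it is silently dropped.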

On priors: the standard objection

The standard frequentist objection runs: Bayesian analysis is subjective because priors are subjective. Its premise is correct.

Priors are chosen. In Bayescore, the predicate weights are priors — they represent the relative importance of each evaluation dimension derived from empirical research on failure rates. The customer validation weight of 18% is a prior. The go-to-market weight of 10% is a prior. They are published, specific, and disputable.

The frequentist alternative is not to eliminate priors. It is to hide them. Every rubric has implicit weights. Every evaluator has a model of what matters. Bayesian analysis forces that model into the open where it can be examined. That is an epistemological improvement, not a weakness.

You can disagree with the weights in Bayescore. The raw confidence scores per predicate are always visible. You can read the adversarial reasoning and decide if the evaluation missed something. The framework does not ask you to trust the output — it shows you everything that produced it.

The scoring formula is a theorem

The formula is: score = Σ(weight × confidence) × 100

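This is not an arbitrary formula. It is an expected value computed over a set of binary predicates in a Bayesian belief network: the probability that a predicate is satisfied, averaged over which predicate turns out to be decisive.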

Each weight is the prior probability that this predicate is the decisive factor in the evaluation domain. Each confidence score is the posterior probability that this predicate is satisfied, given the evidence in the document. The sum is the expected degree to which the document's subject satisfies its evaluation criterion.
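
A minimal Python sketch of the formula follows. It assumes the weights sum to 1, which is what reading them as the probability of being the decisive predicate implies, and that each confidence lies between 0 and 1. The function name, the lumped "all_other_predicates" weight, and the confidence values are hypothetical.

```python
from math import isclose
from typing import Mapping


def weighted_score(weights: Mapping[str, float], confidence: Mapping[str, float]) -> float:
    """score = sum(weight * confidence) * 100, on a 0-100 scale."""
    if not isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights are assumed to sum to 1")
    total = 0.0
    for name, weight in weights.items():
        # A predicate with no confidence entry is treated as absent: zero.
        c = confidence.get(name, 0.0)
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence for {name!r} must be in [0, 1]")
        total += weight * c
    return total * 100


# 0.18 and 0.10 are the published weights; the rest is a hypothetical lump.
weights = {"customer_validation": 0.18, "go_to_market": 0.10, "all_other_predicates": 0.72}
confidence = {"customer_validation": 0.5, "go_to_market": 0.8, "all_other_predicates": 0.6}

result = weighted_score(weights, confidence)
print(f"score: {result:.1f}")
```

Because the weights sum to 1 and every confidence is a probability, the result always falls between 0 and 100.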

Pearl (1988, 2000) formalized exactly this structure: Bayesian networks, directed acyclic graphs over uncertain propositions where beliefs propagate via Bayes' rule. IS(startup, fundable) is a simplified Bayesian network — a single output node with a set of weighted evidence nodes feeding into it. The score is the weighted confidence sum after conditioning on the document's content.

The formula is locked because it follows from the structure. Changing it would require rejecting the theoretical foundation, not adjusting a parameter.

The adversarial structure

The evaluation runs in two passes, both at temperature zero. Pass 1 asks: what evidence does this document provide that this predicate is satisfied? Pass 2 asks: what counter-evidence, gaps, or missing information would reduce confidence?

This is Bayesian reasoning made procedural. Pass 1 is the likelihood estimation: how much does this evidence raise confidence above the prior? Pass 2 is prior resistance: what would a skeptical evaluator find to push back? The combination produces a posterior.

A frequentist evaluation has no equivalent structure. There is no "find counter-evidence" step in a t-test because t-tests are not trying to estimate belief. They are testing whether an observed result is sufficiently extreme to reject a null under an assumed data-generating process.

The two-pass adversarial structure is what makes absence of evidence score as zero rather than as mild doubt. Pass 2 specifically looks for what is not there. When a criterion is never mentioned, Pass 1 finds no supporting evidence and Pass 2 flags the gap. The score reflects that.

Why this matters practically

The difference between the frameworks produces different outputs in three places:

Single documents. A pitch deck is one document. There is no statistical power. Frequentist methods that depend on sample size say nothing meaningful about a single case. Bayesian methods assign confidence based on evidence, not sample count.
Absence of evidence. Frequentism produces no data point when evidence is missing. Bayesianism produces a signal — a shift toward zero. For document evaluation, where what is absent is often as diagnostic as what is present, this difference is decisive.
Actionability. A p-value tells you whether to reject a null. It does not tell you what to do next. A Bayesian confidence score, combined with predicate weights, tells you exactly where to put effort. The highest-leverage gap — weight × (threshold − confidence) — identifies the single intervention that would produce the largest posterior shift. That is a repair map. A p-value is a verdict with no instructions attached.
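
The repair map described above can be sketched as a ranking over weight × (threshold − confidence). The threshold value, the lumped "everything_else" weight, and the confidences below are hypothetical, and clamping negative gaps to zero is an assumption added here so that predicates already above the threshold rank last.

```python
from typing import List, Mapping, Tuple


def repair_map(weights: Mapping[str, float],
               confidence: Mapping[str, float],
               threshold: float) -> List[Tuple[str, float]]:
    """Rank predicates by weight * (threshold - confidence), largest gap first."""
    gaps = []
    for name, weight in weights.items():
        # Assumption: a predicate already at or above the threshold has no gap.
        shortfall = max(0.0, threshold - confidence[name])
        gaps.append((name, weight * shortfall))
    return sorted(gaps, key=lambda item: item[1], reverse=True)


# 0.18 and 0.10 are the published weights; everything else is hypothetical.
weights = {"customer_validation": 0.18, "go_to_market": 0.10, "everything_else": 0.72}
confidence = {"customer_validation": 0.0, "go_to_market": 0.5, "everything_else": 0.65}

ranked = repair_map(weights, confidence, threshold=0.7)
for name, gap in ranked:
    print(f"{name}: {gap:.3f}")
```

With these inputs the unmentioned customer validation predicate ranks first, ahead of the heavier but nearly satisfied lump, which is the sense in which the gap points to where effort pays off most.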

Conclusion

Frequentism is the correct tool for controlled experiments with repeatable outcomes and large samples. Bayesianism is the correct tool for single-event evaluation under uncertainty with prior knowledge and explicit evidence.

Document evaluation is the second problem. It has always been the second problem. Most evaluation tools apply frequentist intuitions — rubrics, checklists, comparison to population norms — to a fundamentally Bayesian question. They get answers that look objective but are not examinable. They treat absence of evidence as neutral when it is not. They do not distinguish between what is present, what is ambiguous, and what is missing.

Bayescore applies the correct framework. The formula follows from Cox. The predicate structure follows from Pearl. The treatment of absence follows from Jaynes. These are not design choices — they are the logical consequences of taking seriously the question being asked.

The question is: given this document, what is the probability that this proposition is true?

The frequentist has no answer.
