Pearl's Bayesian Networks
Judea Pearl's framework for probabilistic reasoning in directed acyclic graphs, introduced in Probabilistic Reasoning in Intelligent Systems (1988) and extended to causal inference via the do-calculus in Causality (2000). Pearl established how beliefs propagate through a graph of conditionally dependent variables, and how interventions differ from observations. BayesCore's evaluation architecture draws on this framework: predicate nodes correspond to variables in a Bayesian network, and the scoring formula formalizes how evidence at the predicate level updates confidence in the root hypothesis.
Posterior Probability
In Bayesian inference, the updated degree of belief in a hypothesis after incorporating evidence: P(H|E). Contrasts with the prior — the baseline belief before any evidence is considered. BayesCore's confidence scores are structured analogously: an LLM reasons over document evidence at temperature 0, defaulting to near-zero when evidence is absent and assigning higher values only when explicit, verifiable support exists. A score of 0.85 means the document provides strong explicit evidence for that predicate; 0.2 means the evidence is weak, absent, or contradictory.
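As a reference point for the definition above, the textbook form of Bayes' rule for a single binary hypothesis can be sketched in a few lines. This is only an illustration of P(H|E) in the general Bayesian sense. The function name and the example numbers are assumptions, and it is not a description of how BayesCore's LLM assigns confidence scores, which the entry describes as analogous rather than identical.

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H|E) for a binary hypothesis H via Bayes' rule.

    prior            -- P(H), belief before seeing the evidence
    p_e_given_h      -- P(E|H), probability of the evidence if H is true
    p_e_given_not_h  -- P(E|not H), probability of the evidence if H is false
    """
    # Total probability of the evidence, P(E), by the law of total probability.
    evidence = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return p_e_given_h * prior / evidence


# Hypothetical numbers: an even prior, and evidence that is nine times
# more likely when the hypothesis is true than when it is false.
updated = posterior(prior=0.5, p_e_given_h=0.9, p_e_given_not_h=0.1)
```

With these illustrative inputs the belief moves from 0.5 to roughly 0.9, which is the prior-to-posterior shift the entry refers to.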
Posterior Quality
The accuracy and calibration of beliefs after evidence is incorporated. High-quality posteriors are proportional to the evidence — neither overconfident nor underconfident. Low-quality posteriors arise from three sources: insufficient evidence (evidence bottleneck), poor updating methods (inference bottleneck), or coordination overhead that degrades belief transmission across agents. Posterior quality — not evidence volume or team size — is the proximate determinant of decision quality. In BayesCore's evaluation model, every resource allocation question ultimately reduces to: will this intervention improve posterior quality at the current bottleneck step?
Predicate
A binary variable representing one evaluation criterion in an evaluation domain. Each predicate poses a yes/no question about the document — for example, "Does the document provide validated demand signals?" or "Is the acquisition channel specified and defensible?" Predicates have two properties: a weight (relative importance in the domain) and a confidence score (LLM-assigned confidence 0.0–1.0 based on evidence in the document). The built-in Document Soundness domain has 8 fixed predicates. Pro users can extract custom predicates from any source document via DNA extraction.
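The two properties described above map naturally onto a small record type. The sketch below is a minimal illustration, not BayesCore's actual data model: the class name, field names, and the example weight are hypothetical, and only the 0.0 to 1.0 ranges come from the definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """One yes/no evaluation criterion (hypothetical structure)."""
    name: str          # short identifier for the criterion
    question: str      # the yes/no question posed about the document
    weight: float      # relative importance within the domain, 0.0 to 1.0
    confidence: float  # LLM-assigned confidence, 0.0 to 1.0

    def __post_init__(self) -> None:
        # Reject values outside the ranges stated in the definition.
        if not 0.0 <= self.weight <= 1.0:
            raise ValueError(f"weight out of range: {self.weight}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


# Example using one of the questions quoted above; the weight is made up.
demand = Predicate(
    name="demand_signals",
    question="Does the document provide validated demand signals?",
    weight=0.25,
    confidence=0.85,
)
```

Validating the ranges at construction time keeps malformed predicates from reaching whatever scoring step consumes them.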
Predicate-Based Evaluation
An evaluation methodology that decomposes a root hypothesis into independently weighted binary criteria — predicates — each assessed separately and combined via the weighted scoring formula. Predicate-based evaluation differs from holistic scoring (a single overall impression) and from checklist scoring (all items equally weighted) by treating each criterion as a probabilistic variable with its own weight and confidence. The result is a decomposable, auditable score with a clear attribution of where evidence is strong or weak.
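The exact scoring formula is not given in this entry. Assuming it is a plain weighted sum of confidences (an assumption, as are the function name and the three-predicate example), the decomposable and auditable character of the score can be sketched like this:

```python
from typing import Dict, List, Tuple


def weighted_score(
    predicates: List[Tuple[str, float, float]]
) -> Tuple[float, Dict[str, float]]:
    """Combine (name, weight, confidence) triples into one score.

    Returns the overall score and each predicate's contribution, so the
    result can be traced back to where evidence was strong or weak.
    Assumes a simple weighted sum; the real formula may differ.
    """
    total_weight = sum(weight for _, weight, _ in predicates)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1, got {total_weight}")
    contributions = {name: weight * conf for name, weight, conf in predicates}
    return sum(contributions.values()), contributions


# Hypothetical three-predicate domain.
score, parts = weighted_score([
    ("claim_stated", 0.5, 0.8),   # contributes 0.5 * 0.8
    ("evidence_given", 0.3, 0.5), # contributes 0.3 * 0.5
    ("risks_listed", 0.2, 0.1),   # contributes 0.2 * 0.1
])
```

Returning the per-predicate contributions alongside the total is what distinguishes this from holistic scoring: a low overall score can be attributed to specific criteria.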
Predicate Weight
A coefficient (0–1) assigned to each predicate reflecting its relative importance to the root hypothesis. Weights sum to 1 across all predicates in a domain. In the built-in Document Soundness domain IS(document, claims_supported), the eight predicate weights are: central_claim 18%, evidence_support 16%, scope_defined 14%, assumptions_stated 14%, success_criteria 12%, risks_acknowledged 12%, next_steps 8%, internal_consistency 6%. Custom domains (Pro) derive weights from the source document.
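The Document Soundness weights listed above can be written down and checked against the sum-to-1 constraint. The variable names below are illustrative; the predicate names and percentages are taken from this entry.

```python
import math

# Document Soundness weights as listed above, expressed as fractions of 1.
DOCUMENT_SOUNDNESS_WEIGHTS = {
    "central_claim": 0.18,
    "evidence_support": 0.16,
    "scope_defined": 0.14,
    "assumptions_stated": 0.14,
    "success_criteria": 0.12,
    "risks_acknowledged": 0.12,
    "next_steps": 0.08,
    "internal_consistency": 0.06,
}

# Weights must sum to 1 across the domain; use a tolerance because
# floating-point addition is not exact.
total = sum(DOCUMENT_SOUNDNESS_WEIGHTS.values())
weights_are_valid = math.isclose(total, 1.0, abs_tol=1e-9)

# The predicate with the largest weight has the most influence on the score.
heaviest = max(DOCUMENT_SOUNDNESS_WEIGHTS, key=DOCUMENT_SOUNDNESS_WEIGHTS.get)
```

A similar check would apply to custom domains, whatever method is used to derive their weights from the source document.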
Prior Correction
The operation performed by the IntentEngine to transform a raw LLM output into a Bayesian posterior. The LLM at temperature=0 produces a calibrated probability distribution over intents, which the IntentEngine treats as the likelihood P(E|H): how well the query evidence fits each intent. Prior correction multiplies each intent's likelihood by its prior P(H) from BeliefState, then renormalises so the probabilities sum to 1.0. The result is P(H|E) ∝ P(E|H) · P(H), a posterior that combines the LLM's in-context judgment with the kernel's accumulated history of what has worked. Without prior correction, every routing decision implicitly treats all intents as equally likely before evidence is considered. With it, a task type that has succeeded 80% of the time gets a boosted posterior relative to one that has succeeded 30% of the time, even with identical LLM likelihoods.
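The multiply-then-renormalise step can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the intent labels, and the use of plain dictionaries are hypothetical, and the real IntentEngine and BeliefState interfaces are not shown in this entry.

```python
from typing import Dict


def prior_correct(
    likelihoods: Dict[str, float], priors: Dict[str, float]
) -> Dict[str, float]:
    """Return P(H|E) proportional to P(E|H) * P(H), normalised to sum to 1."""
    # Multiply each intent's likelihood by its prior.
    unnormalised = {
        intent: likelihoods[intent] * priors[intent] for intent in likelihoods
    }
    total = sum(unnormalised.values())
    if total <= 0.0:
        raise ValueError("all intents have zero posterior mass")
    # Renormalise so the posterior is a proper distribution.
    return {intent: mass / total for intent, mass in unnormalised.items()}


# The scenario from the text: identical LLM likelihoods, but one task type
# has succeeded 80% of the time and the other 30% (hypothetical labels).
posteriors = prior_correct(
    likelihoods={"summarise": 0.5, "translate": 0.5},
    priors={"summarise": 0.8, "translate": 0.3},
)
```

In this example the posterior for the first intent is 0.8 / 1.1, roughly 0.73, against roughly 0.27 for the second. Because of the renormalisation, the priors do not themselves need to sum to 1, which is why raw success rates can be used directly in the sketch.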
Prior Overconfidence (Organizational)
A systematic bias in which resource abundance shifts organizational priors toward overconfidence: the implicit prior that the current approach is working. When resources are plentiful, the evidence threshold required to trigger belief revision rises, so teams can continue on a failing path longer before consequences force an update. Resource scarcity counteracts this by making every negative signal immediately costly, enforcing aggressive belief updating. This is the Bayesian explanation for why constrained teams frequently outperform well-resourced ones on per-unit output: scarcity suppresses the bias by collapsing the gap between evidence arrival and belief revision. The 46/100 self-score BayesCore published at launch is a deliberate defense against this bias.
Prior Probability
The baseline degree of belief in a predicate before any document evidence is considered. In BayesCore's evaluation, priors reflect the base rate of predicate satisfaction for a given document class, and the two adversarial passes update the prior into a posterior. By default, however, BayesCore uses a non-informative prior: the document must establish predicate confidence through its own content, with no assumed credit for belonging to a particular category.