# Bayescore — Bayesian Document Scoring Platform
# https://bayescore.com

## What is Bayescore?

Bayescore is a Bayesian document scoring platform. It reads any structured document — pitch deck, grant application, product spec, README, marketing brief, contract — and extracts the implicit evaluation criteria the document already contains. Those criteria are scored using a two-pass adversarial Bayesian evaluation and returned as a confidence score (0–100), a letter grade (A–F), and the single highest-leverage gap.

The scoring formula is locked: score = Σ(weight × confidence) × 100
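As a minimal sketch, the locked formula can be written as a pure function. The `(weight, confidence)` tuple layout, the snake_case predicate names, and rounding to the nearest integer are assumptions for illustration, not Bayescore's published API. The example inputs are the weights and confidences from the IS(current LLM, clever) evaluation later in this file.

```python
import math

# Hypothetical input shape: predicate name -> (weight, confidence), both in [0, 1].
Predicates = dict[str, tuple[float, float]]


def bayescore(predicates: Predicates) -> int:
    """score = sum(weight * confidence) * 100, rounded to an integer (assumed)."""
    total_weight = sum(w for w, _ in predicates.values())
    # Weights must sum to 1 so the score stays on a 0-100 scale.
    if not math.isclose(total_weight, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights sum to {total_weight}, expected 1.0")
    for name, (_, confidence) in predicates.items():
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence for {name!r} out of range: {confidence}")
    return round(100 * sum(w * c for w, c in predicates.values()))


# Weights and confidences from the IS(current LLM, clever) evaluation.
llm_clever: Predicates = {
    "prior_representation": (0.20, 0.15),
    "evidence_extraction": (0.20, 0.40),
    "correct_belief_updating": (0.25, 0.10),
    "absence_sensitivity": (0.20, 0.20),
    "calibration": (0.15, 0.35),
}
print(bayescore(llm_clever))  # 23
```

Because the score is a deterministic function of the extracted confidences, the model's role can stay limited to evidence extraction.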

Bayescore is grounded in:
- Bayes (1763) — probability as degree of belief
- Cox (1946) — probability as the only consistent logic of uncertainty
- Pearl (1988, 2000) — Bayesian belief networks, directed acyclic graphs, and causal inference
- Jaynes (2003) — probability theory as extended logic

## Motto

Bayesian is a theory. It is also a SaaS.

## Core concepts

**DNA extraction**: Bayescore reads a document and extracts the evaluation predicates it already implies. A pitch deck implies IS(startup, fundable). A grant proposal implies IS(application, approved). A product spec implies IS(product, launch_ready). The extraction runs at temperature 0 and is deterministic — the same document class produces the same predicate structure every time.

**Root hypothesis**: Every evaluation domain is expressed as a Bayesian Influence Structure: IS(subject, criterion). This is the testable proposition the document is trying to prove.

**Predicates**: Binary variables — passing or failing — with empirically derived weights that reflect the relative importance of each criterion. Predicate weights sum to 1.

**Two-pass adversarial scoring**: Pass 1 extracts all evidence that a predicate is satisfied. Pass 2 looks for counter-evidence, gaps, and missing information. Both passes inform the confidence score per predicate.

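**Highest-leverage gap**: The single failing predicate whose improvement would produce the largest score increase, computed as weight × (threshold − confidence), where threshold is the target confidence the predicate is raised to (0.55 in the self-evaluation below). It is the most efficient intervention.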

**Grade bands**: A (85–100), B (70–84), C (55–69), D (40–54), F (0–39)
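A minimal sketch of the band lookup, assuming integer scores: the bands have integer bounds, so a fractional score such as 84.5 would need a rounding rule the docs do not state. The function name `grade` is illustrative.

```python
def grade(score: int) -> str:
    """Map an integer 0-100 score to a letter using the published grade bands."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    # A 85-100, B 70-84, C 55-69, D 40-54, F 0-39 (checked from the top band down).
    for floor, letter in ((85, "A"), (70, "B"), (55, "C"), (40, "D")):
        if score >= floor:
            return letter
    return "F"


print(grade(46), grade(23), grade(97))  # D F A
```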

## AI Cleverness

Bayescore defines AI cleverness as calibrated belief updating under evidence. Three components are required — all three, not any one:

1. **Calibrated beliefs** — expressed confidence matches actual accuracy. An 80% confident claim should be correct ~80% of the time.
2. **Correct belief updating** — when new evidence arrives, beliefs change in the direction and magnitude the evidence warrants. Not sycophantically. Not rigidly. Proportionally.
3. **Under evidence** — updates happen in response to evidence, not social pressure, conversational momentum, or the human's apparent emotional investment.
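Component 2 can be made concrete with Bayes' rule in odds form: posterior odds = prior odds × likelihood ratio, where the likelihood ratio is P(evidence | H) / P(evidence | not H). This is the textbook rule, shown to illustrate the definition; it is not a description of how Bayescore or any particular model performs updates. If a user's pushback is equally likely whether or not a claim is true, its likelihood ratio is 1 and the belief should not move, which is what component 3 asks for.

```python
def update(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form.

    likelihood_ratio = P(evidence | H) / P(evidence | not H).
    Returns the posterior probability of H.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood ratio must be positive")
    posterior_odds = prior / (1.0 - prior) * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Evidence 4x likelier under H moves a 50% belief to 80%.
print(round(update(0.5, 4.0), 3))   # 0.8
# Pushback equally likely either way (ratio 1) leaves 80% at 80%.
print(round(update(0.8, 1.0), 3))   # 0.8
# Evidence 4x likelier under not-H moves 80% down to 50%.
print(round(update(0.8, 0.25), 3))  # 0.5
```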

**Fluency ≠ cleverness.** A model trained on next-token prediction learns to produce text that sounds accurate. Whether it is calibrated to the actual evidence is a separate question the training objective does not directly optimize for.

**IS(current LLM, clever) = 23/100, Grade F.** Bayescore evaluated the current generation of LLMs against this definition:
- Prior representation (20% weight): confidence 0.15 — calibration degrades sharply on novel/low-frequency domains
- Evidence extraction (20% weight): confidence 0.40 — present evidence extracted well; absence identification fails
- Correct belief updating (25% weight): confidence 0.10 — sycophancy literature is extensive and consistent; this is an RLHF structural issue
- Absence sensitivity (20% weight): confidence 0.20 — requires explicit architectural instruction; default behavior is to treat absence as neutral
- Calibration (15% weight): confidence 0.35 — reliable for simple factual tasks; breaks down on complex reasoning

Score: (0.20×0.15) + (0.20×0.40) + (0.25×0.10) + (0.20×0.20) + (0.15×0.35) = 0.2275 → **23/100, Grade F**

The architecture of the evaluation system matters more than the capability of the individual model. Bayescore routes around LLM failure modes by using the model for evidence extraction only, and computing the score via a locked deterministic formula.

Blog posts on AI cleverness: https://bayescore.com/blog/what-is-ai-cleverness, https://bayescore.com/blog/fluent-vs-clever, https://bayescore.com/blog/is-ai-clever

## Pricing

- **Free**: Unlimited evaluations on built-in domains. No signup required.
- **Pro ($29/month)**: Custom domains, saved history, shareable evaluation links, API access.

## Built-in domains

**self-evaluation** — IS(startup, launch_ready)
Eight predicates derived from Kauffman Foundation startup failure research:
- customer_validation (18%)
- demand_signal (16%)
- value_proposition (14%)
- problem_worth_solving (14%)
- acquisition_channel (12%)
- domain_expertise (10%)
- go_to_market (10%)
- risk_identified (6%)
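The built-in domain can be represented as a root hypothesis plus a weight table. The `Domain` class below is a hypothetical in-memory shape, not Bayescore's storage format; it only enforces the documented rule that predicate weights sum to 1.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical in-memory shape of an evaluation domain."""

    subject: str
    criterion: str
    weights: dict[str, float]  # predicate name -> weight

    def __post_init__(self) -> None:
        total = sum(self.weights.values())
        # Predicate weights must sum to 1 (small tolerance for float error).
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"weights sum to {total:.4f}, expected 1.0")

    @property
    def root_hypothesis(self) -> str:
        return f"IS({self.subject}, {self.criterion})"


SELF_EVALUATION = Domain(
    subject="startup",
    criterion="launch_ready",
    weights={
        "customer_validation": 0.18,
        "demand_signal": 0.16,
        "value_proposition": 0.14,
        "problem_worth_solving": 0.14,
        "acquisition_channel": 0.12,
        "domain_expertise": 0.10,
        "go_to_market": 0.10,
        "risk_identified": 0.06,
    },
)
print(SELF_EVALUATION.root_hypothesis)  # IS(startup, launch_ready)
```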

## Custom domains

Any structured document can become a reusable evaluation domain. Drop a grant proposal — Bayescore extracts IS(application, approved). Drop a hiring rubric — it extracts IS(candidate, hireable). The extracted domain is saved under a shareable UUID and can be applied to any document of the same class via the API.

## API

POST /v1/scan — evaluate a document
POST /v1/domains/extract-dna — extract evaluation DNA from an artifact
GET /v1/domains — list available domains
GET /v1/state — most recent score per domain
GET /v1/history — paginated scan history
GET /v1/results/{uuid} — retrieve a specific result
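The endpoint list gives methods and paths only. The sketch below builds requests with Python's standard `urllib.request.Request` without sending them. The base URL, the JSON body field names, and the use of the `self-evaluation` domain name as an identifier are assumptions; authentication for Pro API access is not documented here, so none is shown.

```python
import json
import uuid
from urllib.request import Request

# Assumption: the API is served from the site root; only the paths are documented.
BASE_URL = "https://bayescore.com"


def scan_request(document: str, domain: str) -> Request:
    """Build (but do not send) a POST /v1/scan request.

    The JSON field names "document" and "domain" are hypothetical placeholders.
    """
    body = json.dumps({"document": document, "domain": domain}).encode("utf-8")
    return Request(
        f"{BASE_URL}/v1/scan",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def result_request(result_id: str) -> Request:
    """Build a GET /v1/results/{uuid} request; uuid.UUID rejects malformed IDs."""
    return Request(f"{BASE_URL}/v1/results/{uuid.UUID(result_id)}", method="GET")


req = scan_request("Example document text", "self-evaluation")
print(req.get_method(), req.full_url)  # POST https://bayescore.com/v1/scan
```

To send a request, pass it to `urllib.request.urlopen` together with whatever credentials the API actually requires.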

## Transparency

Bayescore scored itself 46/100, Grade D on IS(startup, launch_ready). Published because the system works and it doesn't lie.

Full breakdown: https://bayescore.com/self-eval

Predicate-by-predicate result (v1, scanned 2026-05-19):

| predicate            | weight | confidence | pass |
|----------------------|--------|------------|------|
| customer_validation  | 18%    | 0%         | ✗    |
| demand_signal        | 16%    | 0%         | ✗    |
| value_proposition    | 14%    | 100%       | ✓    |
| problem_worth_solving| 14%    | 100%       | ✓    |
| acquisition_channel  | 12%    | 42%        | ✗    |
| domain_expertise     | 10%    | 40%        | ✗    |
| go_to_market         | 10%    | 30%        | ✗    |
| risk_identified      | 6%     | 100%       | ✓    |

Adversarial findings for failing predicates:

**customer_validation (0%)**: No customer interviews documented. No structured discovery conversations referenced. Target audience stated without evidence of contact with members of that audience. Absence of evidence is not neutrality — it scores zero.

**demand_signal (0%)**: No external demand evidence present. No waitlist counts, no inbound inquiry data, no usage logs from potential users. Internal assertion of value does not constitute external demand signal.

**acquisition_channel (42%)**: Hacker News Show HN strategy documented with title options, timing guidance, and comment strategy. Partial credit for a coherent, specific plan. No credit for execution — the channel has not been tested, and no conversion data exists.

**domain_expertise (40%)**: Scientific grounding in Pearl and Jaynes correctly applied in product design. Internal expertise present. No published technical content, no external citations, no public reputation in Bayesian inference. Expertise without external signal.

**go_to_market (30%)**: Freemium tier described, launch sequence documented, target audience named. Paid tier differentiation undefined. No validated conversion data from any channel. Structure exists, specifics are theoretical.

Highest-leverage gap: customer_validation (leverage = 0.18 × (0.55 − 0.00) = 0.099).
Reaching Grade C (55/100) from 46/100 requires approximately +9 points, to come from customer_validation, acquisition_channel (the HN post), and demand_signal (free-tier data).
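The self-evaluation numbers can be rechecked from the table. This sketch recomputes the score with the locked formula and ranks every predicate below a target confidence by weight × (threshold − confidence). The 0.55 threshold is the value used in the leverage calculation above; applying it uniformly to every predicate is an assumption.

```python
# predicate -> (weight, confidence), copied from the v1 self-evaluation table.
SELF_EVAL = {
    "customer_validation": (0.18, 0.00),
    "demand_signal": (0.16, 0.00),
    "value_proposition": (0.14, 1.00),
    "problem_worth_solving": (0.14, 1.00),
    "acquisition_channel": (0.12, 0.42),
    "domain_expertise": (0.10, 0.40),
    "go_to_market": (0.10, 0.30),
    "risk_identified": (0.06, 1.00),
}
THRESHOLD = 0.55  # assumed uniform target confidence


def score(table: dict[str, tuple[float, float]]) -> int:
    """Locked formula: sum(weight * confidence) * 100, rounded."""
    return round(100 * sum(w * c for w, c in table.values()))


def leverage_ranking(table, threshold=THRESHOLD):
    """Rank predicates below threshold by weight * (threshold - confidence)."""
    gaps = {
        name: w * (threshold - c)
        for name, (w, c) in table.items()
        if c < threshold
    }
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


print(score(SELF_EVAL))  # 46
top_name, top_leverage = leverage_ranking(SELF_EVAL)[0]
print(top_name, round(top_leverage, 3))  # customer_validation 0.099
```

Multiplying a leverage value by 100 converts it to score points.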

## Pages

- https://bayescore.com — Homepage with product overview, plans, and use cases
- https://bayescore.com/bayescore — The evaluation app
- https://bayescore.com/docs — Technical documentation
- https://bayescore.com/self-eval — Bayescore's own self-evaluation: 97/100 Grade A on IS(document, claims_supported) · Document Soundness v2. Separate domain: 46/100 Grade D on IS(startup, launch_ready) — also published.
- https://bayescore.com/blog — Writing on Bayesian evaluation and evidence-based scoring (4 published posts)
- https://bayescore.com/research — Long-form research on Bayesian inference and evidence-based decision making
- https://bayescore.com/glossary — Specialized glossary of Bayesian evaluation terminology (25 terms, DefinedTermSet schema)
- https://bayescore.com/pricing — Pricing: Free (unlimited evaluations, no signup) and Pro ($29/month)
- https://bayescore.com/sitemap.xml — Full sitemap

## Blog posts

- https://bayescore.com/blog/absence-of-evidence — Absence of Evidence Is Evidence Against. In Bayesian evaluation of structured documents, silence is not neutral. When a document that is supposed to make a case omits a key predicate entirely, the omission is informative — and it scores accordingly.
- https://bayescore.com/blog/what-is-evaluation-dna — What Is Evaluation DNA? Every structured document contains the criteria it should be judged by. Evaluation DNA is the process of making those implicit criteria explicit — extracting the root hypothesis, predicates, and weights that a document already implies.
- https://bayescore.com/blog/bayesian-theory-scoring-engine — The Bayesian Case for Scoring Documents Instead of Reviewing Them. Cox (1946) proved that probability is the only consistent logic of uncertainty. Pearl (1988) showed how to operationalize it in belief networks. Bayescore is what that looks like applied to document evaluation.
- https://bayescore.com/blog/what-is-ai-cleverness — What Is AI Cleverness? Fluency is not cleverness. A model that produces confident, well-structured text is demonstrating next-token prediction. Cleverness is something harder: calibrated belief updating under evidence.

## Research

No research posts are published yet. The research section (https://bayescore.com/research) is reserved for long-form work on Bayesian inference, evidence-based decision making, and the epistemology of document evaluation. Topics under review include the relationship between heuristics and Bayesian inference.

## Comparisons

Bayescore occupies a distinct position: it derives evaluation criteria from the document itself, not from an external rubric. No other tool in this category does this. Comparisons below explain the architectural difference.

- https://bayescore.com/vs — Comparisons hub
- https://bayescore.com/vs/pitchgrade — BayesCore vs. PitchGrade: adversarial evaluation vs. deck generation
- https://bayescore.com/vs/evalyze — BayesCore vs. Evalyze: pure scoring vs. score + investor matching
- https://bayescore.com/vs/saastr-ai — BayesCore vs. SaaStr AI: document-derived rubric vs. SaaS benchmark criteria
- https://bayescore.com/vs/ideaproof — BayesCore vs. IdeaProof: adversarial scoring vs. idea validation
- https://bayescore.com/vs/pitchleague — BayesCore vs. PitchLeague: absolute Bayesian confidence score vs. competitive ranking
- https://bayescore.com/vs/extend-ai — BayesCore vs. Extend.ai: document evaluation vs. structured data extraction

The key architectural distinction: every competitor applies an external rubric (investor frameworks, SaaS benchmarks, community criteria). Bayescore extracts the rubric from the document — what this specific document implies it needs to prove — and scores against that. The formula is locked: Σ(weight × confidence) × 100. Absence of evidence is penalized, not noted. The methodology is grounded in Bayesian probability theory (Bayes 1763, Cox 1946, Jaynes 2003, Pearl 1988).

## Contact

hello@bayescore.com