Documentation

Everything you need to connect your tools, run confidence-gated pipelines, and verify agent output.

Getting Started

BayesCore is a hosted service — no installation, no account required for your first evaluation. Go to bayescore.com/bayescore and paste any structured document. That's the entire setup.

Your first evaluation

Describe the task or paste content you want the kernel to verify. The flow is three steps:

State your intent or paste your input. Type your task or paste the content you want verified.
The kernel routes your intent. BayesCore reads the document's implicit evaluation structure — the questions any rigorous evaluator would derive from the document itself — and surfaces a root hypothesis and weighted predicates. You can review and edit before saving.
Pipeline runs with confidence gates. Each step is gated on the agent belief state before executing. Confidence, gate decision, and trace are returned for every step.

Absence of evidence reduces confidence toward zero — the kernel does not paper over missing evidence.

API access

The scoring engine is available as a REST API. See the API Reference section below for endpoint documentation and example payloads. Programmatic access requires an API key — contact [email protected].

How Scoring Works

BayesCore's evaluation architecture is a directed acyclic graph of predicate nodes — a structure informed by Bayesian network theory (Pearl, 1988). Each domain encodes one root hypothesis in IS(subject, criterion) notation with a set of weighted binary predicates. Weights are fixed per domain and set by the domain definition. No empirically derived priors are used; the document evidence alone sets confidence per predicate.

The LLM runs two passes over your document:

Supportive pass: extracts all evidence that a predicate is satisfied.
Adversarial pass: looks for counter-evidence, gaps, and missing information.

The scoring formula is locked: score = Σ(weight × confidence) × 100. No editorializing. No bonus points. The LLM sets confidence. Weights are fixed.

Grade bands

Score	Grade	Meaning
85–100	A	Strong evidence across all predicates
70–84	B	Good evidence with minor gaps
55–69	C	Mixed evidence, significant gaps
40–54	D	Weak evidence, multiple failing predicates
0–39	F	Insufficient evidence — document fails to demonstrate the criterion

Scientific basis: Bayes (1763), Cox (1946), Pearl (1988, 2000), Jaynes (2003).

Domains

A domain is a root hypothesis plus a set of weighted predicates. BayesCore ships with one built-in domain. All other domains are created via DNA extraction from your own artifacts.

task-output (built-in)

IS(output, verified)

The only built-in domain. Evaluates any structured text against eight universal soundness criteria — works on agent outputs, research notes, product specs, READMEs, security policies, contracts, and any other structured artifact.

Predicate	Weight	Question
central_claim	18%	Is the central claim or thesis explicitly and specifically stated in the document?
evidence_support	16%	Is the central claim supported by concrete, verifiable evidence present in the document?
scope_defined	14%	Is the intended audience, scope, or use case of the document clearly stated?
assumptions_stated	14%	Are the key assumptions underlying the argument or proposal made explicit?
success_criteria	12%	Are success criteria, desired outcomes, or measurable goals defined?
risks_acknowledged	12%	Are known risks, limitations, or failure modes explicitly acknowledged?
next_steps	8%	Is there a clear next step, recommendation, or call to action?
internal_consistency	6%	Is the argument internally consistent with no contradictory claims?

domain_key: task-output — use this key in API calls.

Custom Domains

Any structured text can become an evaluation domain. Drop an agent output — BayesCore extracts IS(output, verified). Drop a product spec — it extracts IS(product, claims_supported). The extracted domain is reusable: once extracted from one artifact, it applies to all artifacts of the same class.

Custom domains are identified by a share_uuid returned when the domain is saved. Pass this UUID as domain_key in API calls. Use GET /api/domains to list your saved domains and retrieve their UUIDs.

DNA extraction runs at temperature 0. The same class of artifact produces the same predicate structure on every extraction.

API Reference

POST/api/extract-dna

Extract evaluation DNA from any structured artifact. Returns a root hypothesis and ranked predicates. Run this first — the result can be saved as a reusable domain.

Request body

{
  "text": "Full text of the artifact..."
}

Response

{
  "name": "Grant Application Evaluation",
  "root_hypothesis": "IS(output, verified)",
  "predicates": [{ "question": "...", "importance": "critical" }]
}

importance values: critical, high, medium, low — converted to weights on save.

POST/api/scan

Run an evaluation on a document. Use task-output for the built-in domain, or a custom domain's share_uuid for user-created domains.

Request body

{
  "domain_key": "task-output",  // or custom share_uuid
  "document": "Full text of the artifact..."
}

Response

{
  "domain_key": "task-output",
  "score": 34,
  "grade": "F",
  "summary": "...",
  "findings": [...],
  "highest_leverage_gaps": [...]
}

Optional header: X-AIOS-API-Key: your-key

GET/api/domains

Returns the list of available domain manifests.

GET/api/state

Returns the most recent scan result per domain — score, grade, timestamp.

GET/api/history

Paginated scan history. Query param: ?page=1

Desktop App — Bayesian Kernel

The BayesCore desktop app runs a local Bayesian kernel — a five-module Python engine that implements the full Bayesian cycle: prior → likelihood → posterior → action → observe → update. The kernel is grounded in Bayes' theorem: P(H|E) = P(E|H) · P(H) / P(E).

Every component maps to one structural link in the theorem. None of this is metaphorical: the code computes P(H|E) via Beta distributions and the conjugate update rule.

BeliefState — `belief.py`

P(H): the prior distribution, maintained per task type

The kernel stores a distinct Beta(α, β) distribution for each key — task type, intent, routing target. New keys initialise at Alpha=1.0, Beta=1.0 (the uninformative prior: 50/50, maximum uncertainty). The posterior mean is α / (α + β).

The conjugate update rule fires on every outcome:

# On task completion — closes the Bayesian feedback loop
belief_state.observe("document_eval", success=True)   # α += 1
belief_state.observe("document_eval", success=False)  # β += 1

At every session boundary, a forgetting factor decays pseudocounts toward the uninformative prior — preventing early observations from permanently dominating:

# FORGETTING_FACTOR = 0.9 — applied at session end
α_new = 1.0 + (α - 1.0) × 0.9
β_new = 1.0 + (β - 1.0) × 0.9

Belief state is serialised to disk (JSON) and survives app restarts. This is the kernel's compounding moat: the more you use it, the more calibrated P(success | task_type) becomes.

IntentEngine — `intent.py`

P(H|E): posterior intent distribution after prior correction

The IntentEngine applies Bayes' theorem to intent routing. The LLM at temperature=0 produces a calibrated probability distribution over intents — this is the likelihood P(E|H). The engine multiplies each intent's likelihood by its prior P(H) from BeliefState, renormalises, and produces the posterior P(H|E):

# P(intent|query) ∝ P(query|intent) × P(intent)
posterior[intent] = llm_probability[intent] × belief_state.p_success(intent)
posterior = normalise(posterior)  # sum to 1.0

Action is gated by INTENT_COMMIT_THRESHOLD = 0.72. If the top posterior exceeds 0.72, the kernel executes. If not, it asks a minimal clarifying question designed to maximise entropy reduction over the ambiguous intents — not an open-ended “what do you mean?”

# Decision rule — not a heuristic, a Bayesian decision gate
if top_posterior >= 0.72:  # INTENT_COMMIT_THRESHOLD
    route_to_agent(top_intent)
else:
    ask_entropy_reducing_question(ambiguous_intents)

ProbabilisticScheduler — `scheduler.py`

EV-ranked execution order — same beliefs, different application

The scheduler orders tasks by expected value, using the same BeliefState posterior means that power the IntentEngine. This closes the coherence requirement: a single belief model determines both what to do and in what order.

# EV formula — every task in the queue is ranked by this
EV = P(success | task_type) × utility - cost
    # P(success) = belief_state.p_success(task_type)
    # utility and cost are set at task creation

On task completion, the scheduler calls belief_state.observe(task_type, success) — automatically, without any manual rating step. This is what closes the Bayesian feedback loop end-to-end.

Feedback Loop

The moat: every outcome sharpens the belief model

The kernel captures implicit signals — not just explicit thumbs-up ratings. A follow-up question implies an incomplete result. A re-run implies the first result was insufficient. Moving to the next task implies success. These signals all route to observe().

The explicit feedback endpoint accepts structured signals from the Electron UI:

POST /kernel/feedback

{
  "task_id": "task_abc123",
  "useful": true,          // → observe(intent, success=True)
  "intent_correct": true,  // → observe(routing.{task_type}, success)
  "result_quality": 0.85  // → observe(quality.{task_type}, partial credit)
}

Wrong routing (intent_correct=false) also penalises routing.{{task_type}} — so the kernel learns from misroutes, not just from task quality.

Kernel API — `main.py`

The kernel runs as a local FastAPI server. The Electron app communicates with it over a loopback socket on an ephemeral port assigned at startup.

POST/kernel/run

Submit an intent. Runs the full prior → infer → route → schedule → execute cycle. Returns the result if committed, or a clarification question if the posterior is below threshold.

Request body

{
  "user_input": "Evaluate this document for logical consistency",
  "document": "...",          // optional
  "domain_key": "task-output", // optional
  "history": [...]                // last 6 turns for clarification context
}

POST/kernel/feedback

Submit outcome signals for a completed task. Updates BeliefState via the conjugate rule. See Feedback Loop above for field semantics.

GET/kernel/beliefs

Returns the current belief state — all Beta distributions by key, with α, β, and posterior mean P(success).

GET/kernel/queue

Returns the EV-ranked task queue. Each task shows task_type, EV, utility, cost, and status.

GET/kernel/agents

Returns the list of available agents — eval_agent, research_agent, summarise_agent, claim_scorer_agent — with their routing keys and supported domain_keys.

Documentation

Getting Started

Your first evaluation

API access

How Scoring Works

Grade bands

Domains

task-output (built-in)

Custom Domains

API Reference

Desktop App — Bayesian Kernel

BeliefState — belief.py

IntentEngine — intent.py

ProbabilisticScheduler — scheduler.py

Feedback Loop

Kernel API — main.py

BeliefState — `belief.py`

IntentEngine — `intent.py`

ProbabilisticScheduler — `scheduler.py`

Kernel API — `main.py`