BAYESCORE CAGE · OPEN SOURCE

A confidence gate for MCP tool calls

Your agent stops acting on tool outputs it can't verify. Every MCP call comes back PROCEED, FLAG, or BLOCK — with a calibrated confidence you can reproduce.

GitHub pip install bayesian-cage

The problem

An agent calling a tool over MCP trusts two untrusted things blindly: the tool's output, and its own synthesis of it. There's no point in the loop that asks “should we actually act on this?” The cage is that point.

What you get back

{
  "result": { "...the real tool output..." },
  "_bayescore": {
    "decision": "BLOCK",
    "p": 0.26,
    "reason": "contradiction: tool said 'Sydney', sources say 'Canberra'",
    "observation_id": "9f2c1a…",
    "belief": { "model": "kb-server", "task": "lookup", "mean": 0.5, "n": 12 }
  }
}

Under enforce mode a BLOCK is withheld and returned to the host as an MCP tool error (isError: true); the default advisory mode labels the call but still passes it through.

Why not just ask the model?

Because the model's own confidence is miscalibrated. On a 55-task execution-graded text-to-SQL benchmark (phi-3 via Ollama, 5-fold, seed=7, 67.3% accuracy): the cage's calibrated confidence vs phi-3's own.

metric	cage	raw phi-3
ECE — calibration (lower better)	0.081	0.325
Brier (lower better)	0.174	0.322
catch-rate (higher better)	33%	0%
acts on wrong outputs (lower better)	12	18
AUROC (higher better)	0.544	0.583

phi-3's raw confidence isn't discriminative — nearly every answer comes back ~1.0, so AUROC is near chance either way. The cage's win is calibration: ECE ~4× tighter and a third of wrong answers caught, at zero correct answers blocked. Reproduce it yourself.

Run it

pipx install bayesian-cage          # MCP-proxy binary on PATH (Python 3.10+)

# point Claude Desktop / Cursor at the cage; it spawns your real MCP server
# as a stdio subprocess and gates every call:
#   "command": "bayesian-cage",
#   "env": {
#     "BAYESIAN_CAGE_DOWNSTREAM": "npx -y @modelcontextprotocol/server-filesystem ~/Documents",
#     "BAYESIAN_CAGE_MODE": "advisory",
#     "BAYESIAN_CAGE_VERIFIER": "filesystem"
#   }

Pure stdlib, no runtime dependencies, fully offline — bring your own LLM. MIT licensed.

The cage is the calibration layer for agent tool use — independent verification that sits outside the model's loop. Read the code.