A confidence gate for MCP tool calls
Your agent stops acting on tool outputs it can't verify. Every MCP call comes back PROCEED, FLAG, or BLOCK — with a calibrated confidence you can reproduce.
The problem
An agent calling a tool over MCP blindly trusts two things it cannot verify: the tool's output, and its own synthesis of that output. Nothing in the loop asks "should we actually act on this?" The cage is that point.
What you get back
```json
{
  "result": { "...the real tool output..." },
  "_bayescore": {
    "decision": "BLOCK",
    "p": 0.26,
    "reason": "contradiction: tool said 'Sydney', sources say 'Canberra'",
    "observation_id": "9f2c1a…",
    "belief": { "model": "kb-server", "task": "lookup", "mean": 0.5, "n": 12 }
  }
}
```

In enforce mode a BLOCK is withheld: the host receives an MCP tool error (`isError: true`) instead of the output. The default advisory mode attaches the label but still passes the output through, so a FLAG or BLOCK there is only as good as the host's handling of `_bayescore`.
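To make the envelope concrete, here is an illustrative gate that produces it and applies the two modes. This is an added example, not the bayesian-cage source: the Beta-Bernoulli belief parameterisation, the verifier likelihood ratios, the 0.3/0.7 decision thresholds, the `observation_id` derivation and the choice to carry the envelope as MCP text content are all assumptions, marked ASSUMED in the code. Only the field names of `_bayescore` and the `isError: true` behaviour come from the page.

```python
"""Illustrative MCP confidence gate: prior belief per (model, task) + verifier
evidence -> posterior p -> PROCEED/FLAG/BLOCK -> MCP tools/call result.

Added example, not the bayesian-cage source. Assumptions are marked ASSUMED."""
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Belief:
    """Beta-Bernoulli belief: how often this (model, task) pair's outputs verify as correct.
    ASSUMED parameterisation: uniform Beta(1,1) prior plus one pseudo-count per verified outcome."""
    model: str
    task: str
    alpha: float = 1.0  # 1 + verified-correct outcomes
    beta: float = 1.0   # 1 + verified-wrong outcomes

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def n(self) -> int:
        return int(self.alpha + self.beta - 2)  # observations beyond the prior

    def update(self, correct: bool) -> None:
        """Fold in a confirmed outcome once ground truth is known."""
        if correct:
            self.alpha += 1
        else:
            self.beta += 1


# ASSUMED likelihood ratios P(verdict | output correct) / P(verdict | output wrong).
LIKELIHOOD_RATIO = {"agree": 4.0, "unverifiable": 1.0, "contradiction": 0.25}
# ASSUMED decision thresholds on the posterior probability that the output is correct.
BLOCK_BELOW, FLAG_BELOW = 0.3, 0.7


def posterior_p(prior_mean: float, verdict: str) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    odds = prior_mean / (1.0 - prior_mean) * LIKELIHOOD_RATIO[verdict]
    return odds / (1.0 + odds)


def decide(p: float) -> str:
    if p < BLOCK_BELOW:
        return "BLOCK"
    if p < FLAG_BELOW:
        return "FLAG"
    return "PROCEED"


def gate(result: dict, belief: Belief, verdict: str, reason: str, mode: str = "advisory") -> dict:
    """Wrap a downstream tool result into an MCP tools/call result ({content, isError}).

    advisory: always pass the output through, annotated with _bayescore.
    enforce:  a BLOCK is withheld and surfaced as a tool error (isError: true); FLAG passes."""
    p = posterior_p(belief.mean, verdict)
    decision = decide(p)
    # ASSUMED: observation id = truncated SHA-256 over the raw result plus the verdict.
    digest = hashlib.sha256(json.dumps(result, sort_keys=True).encode() + verdict.encode()).hexdigest()
    envelope = {
        "result": result,
        "_bayescore": {
            "decision": decision,
            "p": round(p, 3),
            "reason": f"{verdict}: {reason}",
            "observation_id": digest[:12],
            "belief": {"model": belief.model, "task": belief.task,
                       "mean": round(belief.mean, 3), "n": belief.n},
        },
    }
    if mode == "enforce" and decision == "BLOCK":
        text = f"bayesian-cage blocked this output (p={p:.2f}): {envelope['_bayescore']['reason']}"
        return {"content": [{"type": "text", "text": text}], "isError": True}
    # ASSUMED: the annotated envelope travels as the tool's text content.
    return {"content": [{"type": "text", "text": json.dumps(envelope)}], "isError": False}


def score_of(mcp_result: dict) -> "dict | None":
    """Host-side helper: pull _bayescore out of a passed-through result (None on tool errors)."""
    if mcp_result.get("isError"):
        return None
    return json.loads(mcp_result["content"][0]["text"])["_bayescore"]


if __name__ == "__main__":
    # Constructed example mirroring the page's envelope: belief mean 0.5 after 12 observations.
    kb = Belief("kb-server", "lookup", alpha=7.0, beta=7.0)
    out = gate({"answer": "Sydney"}, kb, "contradiction",
               "tool said 'Sydney', sources say 'Canberra'", mode="enforce")
    print(json.dumps(out, indent=2))
```

The host-side consequence is the point of `score_of`: in advisory mode the model still sees the blocked output, so the agent loop has to read `decision` itself or the label changes nothing.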
Why not just ask the model?
Because the model's own confidence is miscalibrated. On a 55-task, execution-graded text-to-SQL benchmark (phi-3 via Ollama, 5-fold, seed=7, 67.3% accuracy), the table compares the cage's calibrated confidence with phi-3's self-reported confidence:
| metric | cage | raw phi-3 |
|---|---|---|
| ECE — calibration (lower better) | 0.081 | 0.325 |
| Brier (lower better) | 0.174 | 0.322 |
| catch-rate (higher better) | 33% | 0% |
| acts on wrong outputs (lower better) | 12 | 18 |
| AUROC (higher better) | 0.544 | 0.583 |
phi-3's raw confidence isn't discriminative: nearly every answer comes back at ~1.0, so AUROC is near chance in both columns (the raw column's 0.583 is marginally higher, but neither ranks answers usefully). The cage's win is calibration: ECE roughly 4x lower, Brier roughly halved, and a third of the wrong answers caught (6 of 18, reading 18 minus 12 off the table) at zero correct answers blocked. It is a 55-task sample, so read the percentages as indicative rather than precise. Reproduce it yourself.
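The four metrics in the table, reimplemented from scratch so the numbers can be checked on your own traces. This is an added example, not the project's benchmark harness: ECE uses 10 equal-width bins, AUROC is the rank-based form with ties scored 0.5, and the catch-rate uses an assumed BLOCK threshold of 0.5. The ten (confidence, graded-correct) rows are constructed to show the two regimes the text describes, a raw model that reports ~1.0 for everything and a gate whose confidence actually moves.

```python
"""Calibration metrics from the benchmark table, reimplemented on constructed data.
Added example; not the bayesian-cage benchmark harness. The sample rows are made up."""
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Scored:
    p: float        # confidence that the tool output is correct
    correct: bool   # execution-graded ground truth


def brier(rows):
    """Mean squared error between confidence and the 0/1 outcome (lower is better)."""
    return sum((r.p - float(r.correct)) ** 2 for r in rows) / len(rows)


def ece(rows, n_bins=10):
    """Expected calibration error: |accuracy - mean confidence| per equal-width bin,
    weighted by bin occupancy (lower is better)."""
    bins = [[] for _ in range(n_bins)]
    for r in rows:
        bins[min(int(r.p * n_bins), n_bins - 1)].append(r)
    total = 0.0
    for b in bins:
        if not b:
            continue
        acc = sum(1 for r in b if r.correct) / len(b)
        conf = sum(r.p for r in b) / len(b)
        total += len(b) / len(rows) * abs(acc - conf)
    return total


def auroc(rows):
    """Rank-based AUROC of confidence as a detector of correct outputs; ties count 0.5.
    A constant confidence (every answer ~1.0) gives exactly 0.5: no discrimination at all."""
    pos = [r.p for r in rows if r.correct]
    neg = [r.p for r in rows if not r.correct]
    if not pos or not neg:
        return math.nan
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def gate_stats(rows, block_below=0.5):
    """What a BLOCK threshold would have done. ASSUMED threshold: block when p < 0.5."""
    wrong = [r for r in rows if not r.correct]
    caught = sum(1 for r in wrong if r.p < block_below)
    return {
        "catch_rate": caught / len(wrong) if wrong else math.nan,  # share of wrong outputs blocked
        "acts_on_wrong": len(wrong) - caught,                      # wrong outputs that got through
        "correct_blocked": sum(1 for r in rows if r.correct and r.p < block_below),
    }


def report(rows):
    return {"ECE": ece(rows), "Brier": brier(rows), "AUROC": auroc(rows), **gate_stats(rows)}


# Constructed sample: 7 correct, 3 wrong outputs (roughly the benchmark's 2:1 split).
OUTCOMES = [True] * 7 + [False] * 3
CAGE = [Scored(p, c) for p, c in
        zip([0.95, 0.85, 0.85, 0.75, 0.65, 0.55, 0.95, 0.25, 0.45, 0.75], OUTCOMES)]
RAW = [Scored(1.0, c) for c in OUTCOMES]   # the "every answer comes back ~1.0" regime

if __name__ == "__main__":
    for name, rows in (("cage", CAGE), ("raw", RAW)):
        print(name, {k: round(v, 3) if isinstance(v, float) else v for k, v in report(rows).items()})
```

On the constructed rows the raw regime reproduces the table's pattern qualitatively: AUROC is exactly 0.5 because every pair is a tie, ECE equals the error rate because the single occupied bin claims 1.0 confidence, and the catch-rate is zero because nothing ever falls under the threshold.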
Run it
```sh
pipx install bayesian-cage   # MCP-proxy binary on PATH (Python 3.10+)
```

Point Claude Desktop / Cursor at the cage instead of at your real MCP server; the cage spawns that server as a stdio subprocess and gates every call. The server entry in the host's MCP config names the cage as the command and passes the downstream server through the environment:

```json
"command": "bayesian-cage",
"env": {
  "BAYESIAN_CAGE_DOWNSTREAM": "npx -y @modelcontextprotocol/server-filesystem ~/Documents",
  "BAYESIAN_CAGE_MODE": "advisory",
  "BAYESIAN_CAGE_VERIFIER": "filesystem"
}
```

`advisory` is the default and only annotates; set `BAYESIAN_CAGE_MODE` to `enforce` when you want a BLOCK actually withheld from the model.

Pure stdlib, no runtime dependencies, fully offline: bring your own LLM. MIT licensed.
The cage is the calibration layer for agent tool use — independent verification that sits outside the model's loop. Read the code.