applications
playbooks with steps, metrics, and templates. pick a lane below and ship a pilot in days.
target outcomes
- unsupported claim rate down at fixed accuracy
- calibrated abstention when tension crosses the threshold τ*
- faster human review due to attached provenance
hallucination control
stop fluent errors by surfacing compression sites and gating unsupported claims.
tools
- MirrorNet: show compression choices to users and models.
- Hallucinet: flag outputs tied to risky compression paths.
playbook
- tag compression sites: retrieval, summarization, tool calls, post-edit
- compute a per-site tension score τ and set the threshold τ*
- attach provenance to each claim and enable abstention on failure (gating sketched after the config below)
- report unsupported claim rate, accuracy, abstention rate, and review time
```yaml
hallucination_control:
  compression_sites: [retrieve, summarize, tool, generate, redact]
  tau_per_site: true
  tau_star: 0.7
  abstain_if: ["no_entailment", "tau > tau_star"]
  metrics: [unsupported_claim_rate, accuracy, abstention_rate, review_time]
```
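a minimal sketch of the gating step under the config above, assuming τ is computed upstream per compression path and that `entailment_ok` stands in for whatever entailment check you run; the `Claim` shape and field names are illustrative, not a fixed API.

```python
from dataclasses import dataclass

TAU_STAR = 0.7  # threshold from the config above


@dataclass
class Claim:
    text: str
    sources: list[str]    # provenance attached at generation time
    tau: float            # tension score for the compression path behind the claim
    entailment_ok: bool   # did at least one source entail the claim


def gate(claim: Claim) -> dict:
    """Abstain when the claim is unsupported or its tension exceeds τ*."""
    if not claim.entailment_ok or claim.tau > TAU_STAR:
        reason = "no_entailment" if not claim.entailment_ok else "tau_exceeded"
        return {"action": "abstain", "reason": reason,
                "tau": claim.tau, "provenance": claim.sources}
    return {"action": "emit", "tau": claim.tau, "provenance": claim.sources}
```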
compliance and liability
show foreseeable risk from design choices and carry receipts for every regulated output.
patterns
- attach provenance and τ to each step that touches regulated text
- tiered provenance views by role, with least-privilege access
- export audit bundles for counsel and regulators (sketched after the evidence pack)
evidence pack
| artifact | purpose |
|---|---|
| compression map | design intent and foreseeable risks |
| τ report | risk scoring per site and path |
| provenance ledger | sources and transformation history |
| abstention log | when and why the system refused |
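one way to package the four artifacts above into a single export for counsel and regulators; the function name and field layout are illustrative, not a fixed format.

```python
import json
from datetime import datetime, timezone


def export_audit_bundle(compression_map, tau_report, provenance_ledger,
                        abstention_log, path="audit_bundle.json"):
    """Bundle the evidence-pack artifacts into one reviewable file."""
    bundle = {
        "exported_at": datetime.now(timezone.utc).isoformat(),
        "compression_map": compression_map,      # design intent and foreseeable risks
        "tau_report": tau_report,                # risk scoring per site and path
        "provenance_ledger": provenance_ledger,  # sources and transformation history
        "abstention_log": abstention_log,        # when and why the system refused
    }
    with open(path, "w") as f:
        json.dump(bundle, f, indent=2)
    return path
```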
support automation and QA
reduce escalations by detecting contradictions in docs and tickets before responding.
playbook
- index knowledge with contradiction tags and version lineage
- compute τ at retrieval and answer time
- abstain and route to a human when sources disagree past τ* (routing sketched below)
- attach receipts to the reply so agents can verify quickly
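a sketch of the abstain-and-route step, assuming a pairwise `contradiction_score` function stands in for your contradiction model and that each source dict carries an `id`, version lineage, and `text`; all names are illustrative.

```python
from itertools import combinations

TAU_STAR = 0.7


def route(query: str, sources: list[dict], contradiction_score) -> dict:
    """Escalate to a human when any pair of sources disagrees past τ*."""
    tensions = [(a["id"], b["id"], contradiction_score(a["text"], b["text"]))
                for a, b in combinations(sources, 2)]
    worst = max(tensions, key=lambda t: t[2], default=(None, None, 0.0))
    receipts = [s["id"] for s in sources]  # attached to the reply either way
    if worst[2] > TAU_STAR:
        return {"route": "human", "query": query, "receipts": receipts,
                "conflict": {"sources": list(worst[:2]), "tau": worst[2]}}
    return {"route": "auto", "query": query, "receipts": receipts}
```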
kpis
- first-contact resolution up
- unsupported claim rate down
- agent handle time down with provenance links
knowledge ops and RAG governance
tune retrieval and summarization as explicit compression choices, not magic.
controls
- per-query compression budget with user steering
- a summary style that optimizes for τ, not length alone
- rankers that penalize contradictory sources (sketched after the config template)
config template
```yaml
rag_cai:
  budget:
    tokens_max: 2200
    tau_target: 0.45
  ranker:
    prefer: ["high_provenance", "low_contradiction"]
  summarizer:
    objective: "minimize_tau_at_equal_accuracy"
  user_controls:
    knobs: ["evidence_strictness", "abstention_level"]
```
contradish: the contradiction benchmark
Contradish measures whether systems detect, surface, and reconcile conflicting information. it operationalizes CAI by turning contradiction into a measurable standard.
tracks
- detection: find contradictions in mixed sources
- tension: produce τ distribution per site
- response: abstain or resolve with receipts
report schema
```json
{
  "system": "your-model",
  "run_id": "2025-10-30",
  "metrics": {
    "contradiction_detection_f1": 0.0,
    "unsupported_claim_rate": 0.0,
    "abstention_rate": 0.0
  },
  "tau_histogram": [],
  "provenance_coverage": 0.0
}
```
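a sketch of filling the schema above from one evaluation run, assuming binary contradiction labels and a 10-bin τ histogram over [0, 1]; the helper names are illustrative.

```python
import json


def f1(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def build_report(system: str, run_id: str, preds: list[bool], labels: list[bool],
                 unsupported_rate: float, abstention_rate: float,
                 tau_values: list[float], provenance_coverage: float) -> str:
    """Fill the Contradish report schema for a single run."""
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    hist = [0] * 10  # τ histogram, bins of width 0.1 over [0, 1]
    for t in tau_values:
        hist[min(int(t * 10), 9)] += 1
    return json.dumps({
        "system": system,
        "run_id": run_id,
        "metrics": {
            "contradiction_detection_f1": round(f1(tp, fp, fn), 3),
            "unsupported_claim_rate": unsupported_rate,
            "abstention_rate": abstention_rate,
        },
        "tau_histogram": hist,
        "provenance_coverage": provenance_coverage,
    }, indent=2)
```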
product analytics and review
treat product decisions as compression. expose tradeoffs so reviews are about cost of loss, not opinion.
use cases
- release notes with a compression map and τ changes
- red-team reports keyed to high-τ sites
- risk registers that track τ trends over time
review table
| area | change | τ impact | risk |
|---|---|---|---|
| retrieval | index pruning | +0.08 | loss of niche sources |
| summarizer | style update | -0.03 | better evidence density |
quick start in 48 hours
- list compression sites in one existing flow
- compute τ per site and set τ* (calibration sketched below)
- turn on abstention and attach provenance to the reply
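a sketch of the second step, choosing τ* from a small validation set so that abstention stays within a budget; the budget value and function name are illustrative.

```python
def choose_tau_star(val_taus: list[float], max_abstention: float = 0.15) -> float:
    """Pick the lowest τ* whose abstention rate on validation stays within budget."""
    if not val_taus:
        return 1.0
    for candidate in sorted(set(val_taus)):
        abstention = sum(t > candidate for t in val_taus) / len(val_taus)
        if abstention <= max_abstention:
            return candidate  # tightest gate that still fits the abstention budget
    return 1.0  # never abstain if no candidate fits the budget
```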
pilot scorecard
| metric | baseline | CAI-gated | target |
|---|---|---|---|
| unsupported claim rate | — | — | below baseline |
| task accuracy | — | — | maintain or improve |
| abstention rate | — | — | calibrated at τ* |
| mean time to review | — | — | decrease |
last updated: