applications
playbooks with steps, metrics, and templates. pick a lane below and ship a pilot in days.
target outcomes
- unsupported claim rate down at fixed accuracy
- calibrated abstention when tension crosses the threshold τ*
- faster human review due to attached provenance
hallucination control
stop fluent errors by surfacing compression sites and gating unsupported claims.
tools
- MirrorNet: show compression choices to users and models.
- Hallucinet: flag outputs tied to risky compression paths.
playbook
- tag compression sites: retrieval, summarization, tool calls, post-edit
- compute a per-site tension score τ and set the threshold τ*
- attach provenance to each claim and enable abstention on failure (gating sketched after the config below)
- report unsupported claim rate, accuracy, abstention rate, and review time
```yaml
hallucination_control:
  compression_sites: [retrieve, summarize, tool, generate, redact]
  tau_per_site: true
  tau_star: 0.7
  abstain_if: ["no_entailment", "tau > tau_star"]
  metrics: [unsupported_claim_rate, accuracy, abstention_rate, review_time]
```
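a minimal sketch of the gating step under the config above, assuming τ is computed upstream per compression path and that `entailment_ok` stands in for whatever entailment check you run; the `Claim` shape and field names are illustrative, not a fixed API.

```python
from dataclasses import dataclass

TAU_STAR = 0.7  # threshold from the config above


@dataclass
class Claim:
    text: str
    sources: list[str]    # provenance attached at generation time
    tau: float            # tension score for the compression path behind the claim
    entailment_ok: bool   # did at least one source entail the claim


def gate(claim: Claim) -> dict:
    """Abstain when the claim is unsupported or its tension exceeds τ*."""
    if not claim.entailment_ok or claim.tau > TAU_STAR:
        reason = "no_entailment" if not claim.entailment_ok else "tau_exceeded"
        return {"action": "abstain", "reason": reason,
                "tau": claim.tau, "provenance": claim.sources}
    return {"action": "emit", "tau": claim.tau, "provenance": claim.sources}
```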
compliance and liability
show foreseeable risk from design choices and carry receipts for every regulated output.
patterns
- attach provenance and τ to each step that touches regulated text
- tiered provenance views by role, with least-privilege access
- export audit bundles for counsel and regulators (sketched after the evidence pack)
evidence pack
| artifact | purpose |
|---|---|
| compression map | design intent and foreseeable risks |
| τ report | risk scoring per site and path |
| provenance ledger | sources and transformation history |
| abstention log | when and why the system refused |
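one way to package the four artifacts above into a single export for counsel and regulators; the function name and field layout are illustrative, not a fixed format.

```python
import json
from datetime import datetime, timezone


def export_audit_bundle(compression_map, tau_report, provenance_ledger,
                        abstention_log, path="audit_bundle.json"):
    """Bundle the evidence-pack artifacts into one reviewable file."""
    bundle = {
        "exported_at": datetime.now(timezone.utc).isoformat(),
        "compression_map": compression_map,      # design intent and foreseeable risks
        "tau_report": tau_report,                # risk scoring per site and path
        "provenance_ledger": provenance_ledger,  # sources and transformation history
        "abstention_log": abstention_log,        # when and why the system refused
    }
    with open(path, "w") as f:
        json.dump(bundle, f, indent=2)
    return path
```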
support automation and QA
reduce escalations by detecting contradictions in docs and tickets before responding.
playbook
- index knowledge with contradiction tags and version lineage
- compute τ at retrieval and answer time
- abstain and route to a human when sources disagree past τ* (routing sketched below)
- attach receipts to the reply so agents can verify quickly
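a sketch of the abstain-and-route step, assuming a pairwise `contradiction_score` function stands in for your contradiction model and that each source dict carries an `id`, version lineage, and `text`; all names are illustrative.

```python
from itertools import combinations

TAU_STAR = 0.7


def route(query: str, sources: list[dict], contradiction_score) -> dict:
    """Escalate to a human when any pair of sources disagrees past τ*."""
    tensions = [(a["id"], b["id"], contradiction_score(a["text"], b["text"]))
                for a, b in combinations(sources, 2)]
    worst = max(tensions, key=lambda t: t[2], default=(None, None, 0.0))
    receipts = [s["id"] for s in sources]  # attached to the reply either way
    if worst[2] > TAU_STAR:
        return {"route": "human", "query": query, "receipts": receipts,
                "conflict": {"sources": list(worst[:2]), "tau": worst[2]}}
    return {"route": "auto", "query": query, "receipts": receipts}
```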
kpis
- first-contact resolution up
- unsupported claim rate down
- agent handle time down with provenance links
knowledge ops and RAG governance
tune retrieval and summarization as explicit compression choices, not magic.
controls
- per-query compression budget with user steering
- a summary style that optimizes for τ, not length alone
- rankers that penalize contradictory sources (sketched after the config template)
config template
```yaml
rag_cai:
  budget:
    tokens_max: 2200
    tau_target: 0.45
  ranker:
    prefer: ["high_provenance", "low_contradiction"]
  summarizer:
    objective: "minimize_tau_at_equal_accuracy"
  user_controls:
    knobs: ["evidence_strictness", "abstention_level"]
```
contradish: the contradiction benchmark
Contradish measures whether systems detect, surface, and reconcile conflicting information. it operationalizes CAI by turning contradiction into a measurable standard.
tracks
- detection: find contradictions in mixed sources
- tension: produce τ distribution per site
- response: abstain or resolve with receipts
report schema
```json
{
  "system": "your-model",
  "run_id": "2025-10-30",
  "metrics": {
    "contradiction_detection_f1": 0.0,
    "unsupported_claim_rate": 0.0,
    "abstention_rate": 0.0
  },
  "tau_histogram": [],
  "provenance_coverage": 0.0
}
```
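a sketch of filling the schema above from one evaluation run, assuming binary contradiction labels and a 10-bin τ histogram over [0, 1]; the helper names are illustrative.

```python
import json


def f1(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def build_report(system: str, run_id: str, preds: list[bool], labels: list[bool],
                 unsupported_rate: float, abstention_rate: float,
                 tau_values: list[float], provenance_coverage: float) -> str:
    """Fill the Contradish report schema for a single run."""
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    hist = [0] * 10  # τ histogram, bins of width 0.1 over [0, 1]
    for t in tau_values:
        hist[min(int(t * 10), 9)] += 1
    return json.dumps({
        "system": system,
        "run_id": run_id,
        "metrics": {
            "contradiction_detection_f1": round(f1(tp, fp, fn), 3),
            "unsupported_claim_rate": unsupported_rate,
            "abstention_rate": abstention_rate,
        },
        "tau_histogram": hist,
        "provenance_coverage": provenance_coverage,
    }, indent=2)
```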
product analytics and review
treat product decisions as compression. expose tradeoffs so reviews are about cost of loss, not opinion.
use cases
- release notes with a compression map and τ changes
- red-team reports keyed to high-τ sites
- risk registers that track τ trends over time
review table
| area | change | τ impact | risk |
|---|---|---|---|
| retrieval | index pruning | +0.08 | loss of niche sources |
| summarizer | style update | -0.03 | better evidence density |
quick start in 48 hours
- list compression sites in one existing flow
- compute τ per site and set τ* (calibration sketched below)
- turn on abstention and attach provenance to the reply
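a sketch of the second step, choosing τ* from a small validation set so that abstention stays within a budget; the budget value and function name are illustrative.

```python
def choose_tau_star(val_taus: list[float], max_abstention: float = 0.15) -> float:
    """Pick the lowest τ* whose abstention rate on validation stays within budget."""
    if not val_taus:
        return 1.0
    for candidate in sorted(set(val_taus)):
        abstention = sum(t > candidate for t in val_taus) / len(val_taus)
        if abstention <= max_abstention:
            return candidate  # tightest gate that still fits the abstention budget
    return 1.0  # never abstain if no candidate fits the budget
```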
pilot scorecard
| metric | baseline | CAI-gated | target |
|---|---|---|---|
| unsupported claim rate | — | — | below baseline |
| task accuracy | — | — | maintain or improve |
| abstention rate | — | — | calibrated at τ* |
| mean time to review | — | — | decrease |
last updated: