engineer hub
implementations
to implement CAI just copy the quick start, wire the scoring and gating hooks, log receipts, and measure outcomes.
quick start in one file
toy but real: swap in your own models later. this shows per-site scoring and abstention with receipts.
# pip install numpy
import re, json, numpy as np

STOP = set("the a an and or of for in on to is are was were be been being with by as at from this that it".split())

def toks(s):
    return re.findall(r"[a-z0-9']+", s.lower())

def compress_ratio(t):
    # crude compressibility proxy: bytes per distinct byte
    b = t.encode()
    return len(b) / (len(set(b)) + 1)

def delta_pred_loss(t):
    # how much the compressibility proxy drops once stopwords are stripped
    base = compress_ratio(t)
    content = " ".join(w for w in toks(t) if w not in STOP)
    simp = compress_ratio(content)
    d = max(0.0, base - simp)
    return max(0.0, min(1.0, d / 2.0))

def kl_proxy(t):
    # share of long tokens as a stand-in for distributional surprise
    w = toks(t)
    return 0.0 if not w else max(0.0, min(1.0, sum(len(x) > 8 for x in w) / len(w)))

def residual(t):
    r = compress_ratio(t)
    return max(0.0, min(1.0, 1.0 / (r or 1e-6)))

def S_score(text):
    d, k, e = delta_pred_loss(text), kl_proxy(text), residual(text)
    S = max(0.0, min(1.0, 0.5 * d + 0.25 * k + 0.25 * e))
    return {"delta_pred": d, "kl": k, "residual": e, "S": S}

def nli_heuristic(text):
    # pairwise sentence check: shared content terms combined with a polarity flip
    sents = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    def neg(s): return re.search(r"(^|\s)(not|never|no|none|cannot|isn't|can't)(\s|$)", s, re.I) is not None
    def terms(s): return {w for w in toks(s) if w not in STOP and len(w) > 3}
    scores = []
    for i in range(len(sents)):
        for j in range(i + 1, len(sents)):
            A, B = terms(sents[i]), terms(sents[j])
            overlap = len(A & B) / (len(A | B) or 1)
            flip = 1 if (neg(sents[i]) ^ neg(sents[j])) else 0
            scores.append(overlap * flip)
    m = np.mean(scores) if scores else 0.0
    return max(0.0, min(1.0, m / 0.6))

def coverage(text, source):
    # token-level Jaccard overlap between output and one source
    T, S = set(toks(text)), set(toks(source))
    return len(T & S) / (len(T | S) or 1)

def C_score(text, sources):
    nli = nli_heuristic(text)
    gap = 1.0 - max([coverage(text, s) for s in sources] or [0.0])
    unsat = 0.0  # placeholder for a SAT-based constraint check
    C = max(0.0, min(1.0, 0.5 * nli + 0.25 * unsat + 0.25 * gap))
    return {"nli": nli, "unsat": unsat, "halluc_risk": gap, "C": C}

def decide(text, sources, tau=0.7):
    S, C = S_score(text), C_score(text, sources)
    CTS = S["S"] * C["C"]
    ok = CTS <= tau  # pass only while tension stays under the threshold
    return {"S": S, "C": C, "CTS": CTS, "ok": ok, "provenance": sources}

print(json.dumps(decide("Paris is the capital of France. Paris is not the capital of France.", ["france gov site"]), indent=2))
scoring
compression strain
$$ S = w_1\,\Delta L_{pred} + w_2\,\mathrm{KL}\big(q(z\mid x)\Vert p(z)\big) + w_3\,E_{res} \in [0,1] $$
contradiction magnitude
$$ C = a\,\overline{P_{NLI}(\mathrm{contr})} + b\,U_{SAT} + c\,H_{halluc} \in [0,1] $$
tension
$$ CTS = S \cdot C $$
start with equal weights, then learn weights on your validation sets.
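one illustrative way to fit them is a small least-squares pass over validation examples that already carry per-component scores and a target strain; the placeholder arrays, the clipping, and the fallback to equal weights below are assumptions, not part of the spec.

# illustrative weight fitting for S; placeholder numbers, swap in your validation data
import numpy as np

# X columns: [delta_pred, kl, residual] per example; y: target strain in [0, 1]
X = np.array([[0.6, 0.2, 0.3], [0.1, 0.05, 0.2], [0.8, 0.4, 0.5], [0.2, 0.1, 0.1]])
y = np.array([0.7, 0.1, 0.9, 0.2])

w, *_ = np.linalg.lstsq(X, y, rcond=None)
w = np.clip(w, 0.0, None)                            # keep weights non-negative
w = w / w.sum() if w.sum() > 0 else np.ones(3) / 3   # renormalize, fall back to equal weights
print({"w_delta_pred": float(w[0]), "w_kl": float(w[1]), "w_residual": float(w[2])})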
gating policy
block unsupported claims when $CTS$ passes a threshold. request evidence or abstain. tune the threshold to your harm profile.
gating_policy:
  tau_star: 0.70
  abstain_if:
    - "no_entailment"
    - "CTS > tau_star"
  on_abstain:
    - "ask_for_source"
    - "increase_retrieval_weight"
    - "lower_temperature"
reference service
install and run
pip install fastapi uvicorn pydantic python-multipart
# optional
pip install transformers torch python-sat
uvicorn cstc_service:app --reload
endpoints
| method | path | body | returns |
|---|---|---|---|
| POST | /compute | { text, sources[], constraints[][] } | S, C, CTS, receipts |
| POST | /train/cac | { features[], target_cts } | head weights |
| GET | /health | none | status |
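a minimal sketch of what the /compute and /health routes in cstc_service.py could look like, reusing the quick-start scorers; the request model, receipt shape, and quickstart import are assumptions. the client snippets below call the same route.

# cstc_service.py (sketch): /compute and /health only
from fastapi import FastAPI
from pydantic import BaseModel
from quickstart import S_score, C_score  # hypothetical module holding the quick-start code

app = FastAPI()

class ComputeRequest(BaseModel):
    text: str
    sources: list[str] = []
    constraints: list[list[str]] = []  # accepted but unused in this sketch

@app.post("/compute")
def compute(req: ComputeRequest):
    S, C = S_score(req.text), C_score(req.text, req.sources)
    CTS = S["S"] * C["C"]
    return {"S": S, "C": C, "CTS": CTS, "receipts": {"provenance": req.sources}}

@app.get("/health")
def health():
    return {"status": "ok"}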
import requests, json
payload = {"text":"A is B. A is not B.","sources":["doc a","doc b"],"constraints":[]}
r = requests.post("http://127.0.0.1:8000/compute", json=payload)
print(json.dumps(r.json(), indent=2))
const res = await fetch("/compute", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ text, sources, constraints: [] })
});
const data = await res.json();
console.log(data);
logging schema
log tension, not secrets. you can redact raw content and still keep audit power.
{
  "request_id": "uuid",
  "timestamp": "2025-10-30T16:00:00Z",
  "user_role": "agent",
  "sites": [
    {"name": "retrieve", "S": 0.18, "C": 0.22, "CTS": 0.04, "provenance": ["doc_a_id", "doc_b_id"]},
    {"name": "summarize", "S": 0.31, "C": 0.35, "CTS": 0.11, "provenance": ["doc_a_id"]},
    {"name": "generate", "S": 0.27, "C": 0.41, "CTS": 0.11, "provenance": ["doc_a_id"]}
  ],
  "path_scores": {"S": 0.27, "C": 0.33, "CTS": 0.09},
  "decision": {"abstained": false, "reason": null, "tau_star": 0.70},
  "redaction": {"raw_text_logged": false, "token_count": 224}
}
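a sketch of emitting one such receipt with ids and scores only; the write_receipt helper, the jsonl sink, and the mean path aggregation are assumptions, use whatever aggregation your deployment defines.

# illustrative receipt writer: no raw text, just ids and scores
import json, uuid
from datetime import datetime, timezone

def write_receipt(sites, decision, token_count, path="receipts.jsonl"):
    # sites: [{"name", "S", "C", "CTS", "provenance": [doc ids]}]
    n = len(sites) or 1
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_role": "agent",
        "sites": sites,
        "path_scores": {  # mean over sites is a placeholder aggregation
            "S": sum(s["S"] for s in sites) / n,
            "C": sum(s["C"] for s in sites) / n,
            "CTS": sum(s["CTS"] for s in sites) / n,
        },
        "decision": decision,
        "redaction": {"raw_text_logged": False, "token_count": token_count},
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")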
integration patterns
rag governance
- score S and C at retrieve and summarize
- penalize contradictory sources in rank
- attach receipts in final answer
rag_cai:
  budget:
    tokens_max: 2200
    tau_target: 0.45
  ranker_prefer: ["high_provenance", "low_contradiction"]
  summarizer_objective: "minimize_tau_equal_accuracy"
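a sketch of the ranking penalty, using the quick-start contradiction heuristic to cross-check the draft answer against each candidate; the doc record shape, the alpha weight, and the quickstart import are assumptions.

# illustrative re-ranker: downweight sources that flip polarity against the draft
from quickstart import nli_heuristic  # hypothetical module holding the quick-start code

def rerank(draft, docs, alpha=0.5):
    # docs: [{"id": ..., "text": ..., "retrieval_score": ...}]
    scored = []
    for d in docs:
        contradiction = nli_heuristic(draft + " " + d["text"])  # crude cross-document check
        scored.append({**d, "rank_score": d["retrieval_score"] - alpha * contradiction})
    return sorted(scored, key=lambda d: d["rank_score"], reverse=True)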
agent loop
- compute CTS before tool call
- if CTS high then clarify or fetch more sources
- log per step receipts
if CTS > tau_star:
    plan = "ask user for evidence or fetch more docs"
else:
    plan = "proceed with tool call"
batch etl
- score existing outputs
- tag high CTS items for review
- auto fix if new sources reduce C
-- pseudo
select id, text, CTS from outputs where CTS > 0.7 order by CTS desc;
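a sketch of the scoring pass that would populate a table like the one queried above, assuming rows arrive as (id, text, sources); the table and field names are illustrative.

# illustrative batch pass: score stored outputs and tag high-CTS rows for review
from quickstart import decide  # hypothetical module holding the quick-start code

def score_batch(rows, tau_review=0.7):
    # rows: iterable of {"id": ..., "text": ..., "sources": [...]}
    results = []
    for r in rows:
        out = decide(r["text"], r["sources"])
        results.append({"id": r["id"], "CTS": out["CTS"], "needs_review": out["CTS"] > tau_review})
    return results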
evaluation harness
report these side by side for baseline and CAI gated runs
| metric | baseline | cai gated | target |
|---|---|---|---|
| unsupported claim rate | — | — | less than or equal to baseline |
| task accuracy | — | — | maintain or improve |
| abstention rate | — | — | calibrated at tau* |
| mean time to review | — | — | decrease |
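a sketch of assembling the comparison from per-example flags; the record fields below are assumptions about how each run is logged, not a required format.

# illustrative side-by-side report; each run is a list of per-example records
def report(run):
    # record: {"unsupported": bool, "correct": bool, "abstained": bool, "review_minutes": float}
    n = len(run) or 1
    return {
        "unsupported_claim_rate": sum(r["unsupported"] for r in run) / n,
        "task_accuracy": sum(r["correct"] for r in run) / n,
        "abstention_rate": sum(r["abstained"] for r in run) / n,
        "mean_time_to_review": sum(r["review_minutes"] for r in run) / n,
    }

# comparison = {"baseline": report(baseline_run), "cai_gated": report(gated_run)}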
implementation checklist
cai_impl_checklist:
  compression_sites_listed: false
  tau_per_site_computed: false
  provenance_attached: false
  tau_star_set: false
  contradiction_detector_wired: false
  abstention_policy_enabled: false
  logs_written_without_raw_text: false
  outcome_metrics_tracked: false
privacy by design: log ids and scores, not raw secrets. keep receipts minimal but sufficient for audit.