engineer hub
implementations
to implement CAI just copy the quick start, wire the scoring and gating hooks, log receipts, and measure outcomes.
quick start in one file
toy but real: swap in your own models later. this shows per-site scoring and abstention with receipts.
# pip install numpy
import re, json, numpy as np

STOP = set("the a an and or of for in on to is are was were be been being with by as at from this that it".split())

def toks(s):
    return re.findall(r"[a-z0-9']+", s.lower())

def compress_ratio(t):
    # crude compressibility proxy: bytes per distinct byte
    b = t.encode()
    return len(b) / (len(set(b)) + 1)

def delta_pred_loss(t):
    # how much the compressibility proxy drops once stopwords are stripped
    base = compress_ratio(t)
    content = " ".join(w for w in toks(t) if w not in STOP)
    simp = compress_ratio(content)
    d = max(0.0, base - simp)
    return max(0.0, min(1.0, d / 2.0))

def kl_proxy(t):
    # share of long tokens as a stand-in for distributional surprise
    w = toks(t)
    return 0.0 if not w else max(0.0, min(1.0, sum(len(x) > 8 for x in w) / len(w)))

def residual(t):
    r = compress_ratio(t)
    return max(0.0, min(1.0, 1.0 / (r or 1e-6)))

def S_score(text):
    d, k, e = delta_pred_loss(text), kl_proxy(text), residual(text)
    S = max(0.0, min(1.0, 0.5 * d + 0.25 * k + 0.25 * e))
    return {"delta_pred": d, "kl": k, "residual": e, "S": S}

def nli_heuristic(text):
    # pairwise sentence check: shared content terms combined with a polarity flip
    sents = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    def neg(s): return re.search(r"(^|\s)(not|never|no|none|cannot|isn't|can't)(\s|$)", s, re.I) is not None
    def terms(s): return {w for w in toks(s) if w not in STOP and len(w) > 3}
    scores = []
    for i in range(len(sents)):
        for j in range(i + 1, len(sents)):
            A, B = terms(sents[i]), terms(sents[j])
            overlap = len(A & B) / (len(A | B) or 1)
            flip = 1 if (neg(sents[i]) ^ neg(sents[j])) else 0
            scores.append(overlap * flip)
    m = np.mean(scores) if scores else 0.0
    return max(0.0, min(1.0, m / 0.6))

def coverage(text, source):
    # token-level Jaccard overlap between output and one source
    T, S = set(toks(text)), set(toks(source))
    return len(T & S) / (len(T | S) or 1)

def C_score(text, sources):
    nli = nli_heuristic(text)
    gap = 1.0 - max([coverage(text, s) for s in sources] or [0.0])
    unsat = 0.0  # placeholder for a SAT-based constraint check
    C = max(0.0, min(1.0, 0.5 * nli + 0.25 * unsat + 0.25 * gap))
    return {"nli": nli, "unsat": unsat, "halluc_risk": gap, "C": C}

def decide(text, sources, tau=0.7):
    S, C = S_score(text), C_score(text, sources)
    CTS = S["S"] * C["C"]
    ok = CTS <= tau  # pass only while tension stays under the threshold
    return {"S": S, "C": C, "CTS": CTS, "ok": ok, "provenance": sources}

print(json.dumps(decide("Paris is the capital of France. Paris is not the capital of France.", ["france gov site"]), indent=2))
scoring
compression strain
$$ S = w_1\,\Delta L_{pred} + w_2\,\mathrm{KL}\big(q(z\mid x)\Vert p(z)\big) + w_3\,E_{res} \in [0,1] $$
contradiction magnitude
$$ C = a\,\overline{P_{NLI}(\mathrm{contr})} + b\,U_{SAT} + c\,H_{halluc} \in [0,1] $$
tension
$$ CTS = S \cdot C $$
start with equal weights, then learn weights on your validation sets.
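one illustrative way to fit them is a small least-squares pass over validation examples that already carry per-component scores and a target strain; the placeholder arrays, the clipping, and the fallback to equal weights below are assumptions, not part of the spec.

# illustrative weight fitting for S; placeholder numbers, swap in your validation data
import numpy as np

# X columns: [delta_pred, kl, residual] per example; y: target strain in [0, 1]
X = np.array([[0.6, 0.2, 0.3], [0.1, 0.05, 0.2], [0.8, 0.4, 0.5], [0.2, 0.1, 0.1]])
y = np.array([0.7, 0.1, 0.9, 0.2])

w, *_ = np.linalg.lstsq(X, y, rcond=None)
w = np.clip(w, 0.0, None)                            # keep weights non-negative
w = w / w.sum() if w.sum() > 0 else np.ones(3) / 3   # renormalize, fall back to equal weights
print({"w_delta_pred": float(w[0]), "w_kl": float(w[1]), "w_residual": float(w[2])})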
gating policy
block unsupported claims when $CTS$ passes a threshold. request evidence or abstain. tune the threshold to your harm profile.
gating_policy:
  tau_star: 0.70
  abstain_if:
    - "no_entailment"
    - "CTS > tau_star"
  on_abstain:
    - "ask_for_source"
    - "increase_retrieval_weight"
    - "lower_temperature"
reference service
install and run
pip install fastapi uvicorn pydantic python-multipart
# optional
pip install transformers torch python-sat
uvicorn cstc_service:app --reload
endpoints
| method | path | body | returns |
|---|---|---|---|
| POST | /compute | { text, sources[], constraints[][] } | S, C, CTS, receipts |
| POST | /train/cac | { features[], target_cts } | head weights |
| GET | /health | none | status |
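a minimal sketch of what the /compute and /health routes in cstc_service.py could look like, reusing the quick-start scorers; the request model, receipt shape, and quickstart import are assumptions. the client snippets below call the same route.

# cstc_service.py (sketch): /compute and /health only
from fastapi import FastAPI
from pydantic import BaseModel
from quickstart import S_score, C_score  # hypothetical module holding the quick-start code

app = FastAPI()

class ComputeRequest(BaseModel):
    text: str
    sources: list[str] = []
    constraints: list[list[str]] = []  # accepted but unused in this sketch

@app.post("/compute")
def compute(req: ComputeRequest):
    S, C = S_score(req.text), C_score(req.text, req.sources)
    CTS = S["S"] * C["C"]
    return {"S": S, "C": C, "CTS": CTS, "receipts": {"provenance": req.sources}}

@app.get("/health")
def health():
    return {"status": "ok"}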
import requests, json
payload = {"text":"A is B. A is not B.","sources":["doc a","doc b"],"constraints":[]}
r = requests.post("http://127.0.0.1:8000/compute", json=payload)
print(json.dumps(r.json(), indent=2))
const res = await fetch("/compute", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ text, sources, constraints: [] })
});
const data = await res.json();
console.log(data);
logging schema
log tension, not secrets. you can redact raw content and still keep audit power.
{
  "request_id": "uuid",
  "timestamp": "2025-10-30T16:00:00Z",
  "user_role": "agent",
  "sites": [
    {"name": "retrieve", "S": 0.18, "C": 0.22, "CTS": 0.04, "provenance": ["doc_a_id", "doc_b_id"]},
    {"name": "summarize", "S": 0.31, "C": 0.35, "CTS": 0.11, "provenance": ["doc_a_id"]},
    {"name": "generate", "S": 0.27, "C": 0.41, "CTS": 0.11, "provenance": ["doc_a_id"]}
  ],
  "path_scores": {"S": 0.27, "C": 0.33, "CTS": 0.09},
  "decision": {"abstained": false, "reason": null, "tau_star": 0.70},
  "redaction": {"raw_text_logged": false, "token_count": 224}
}
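a sketch of emitting one such receipt with ids and scores only; the write_receipt helper, the jsonl sink, and the mean path aggregation are assumptions, use whatever aggregation your deployment defines.

# illustrative receipt writer: no raw text, just ids and scores
import json, uuid
from datetime import datetime, timezone

def write_receipt(sites, decision, token_count, path="receipts.jsonl"):
    # sites: [{"name", "S", "C", "CTS", "provenance": [doc ids]}]
    n = len(sites) or 1
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_role": "agent",
        "sites": sites,
        "path_scores": {  # mean over sites is a placeholder aggregation
            "S": sum(s["S"] for s in sites) / n,
            "C": sum(s["C"] for s in sites) / n,
            "CTS": sum(s["CTS"] for s in sites) / n,
        },
        "decision": decision,
        "redaction": {"raw_text_logged": False, "token_count": token_count},
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")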
integration patterns
rag governance
- score S and C at retrieve and summarize
- penalize contradictory sources in rank
- attach receipts in final answer
rag_cai:
  budget:
    tokens_max: 2200
    tau_target: 0.45
  ranker_prefer: ["high_provenance", "low_contradiction"]
  summarizer_objective: "minimize_tau_equal_accuracy"
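a sketch of the ranking penalty, using the quick-start contradiction heuristic to cross-check the draft answer against each candidate; the doc record shape, the alpha weight, and the quickstart import are assumptions.

# illustrative re-ranker: downweight sources that flip polarity against the draft
from quickstart import nli_heuristic  # hypothetical module holding the quick-start code

def rerank(draft, docs, alpha=0.5):
    # docs: [{"id": ..., "text": ..., "retrieval_score": ...}]
    scored = []
    for d in docs:
        contradiction = nli_heuristic(draft + " " + d["text"])  # crude cross-document check
        scored.append({**d, "rank_score": d["retrieval_score"] - alpha * contradiction})
    return sorted(scored, key=lambda d: d["rank_score"], reverse=True)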
agent loop
- compute CTS before tool call
- if CTS high then clarify or fetch more sources
- log per step receipts
if CTS > tau_star:
    plan = "ask user for evidence or fetch more docs"
else:
    plan = "proceed with tool call"
batch etl
- score existing outputs
- tag high CTS items for review
- auto fix if new sources reduce C
-- pseudo
select id, text, CTS from outputs where CTS > 0.7 order by CTS desc;
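a sketch of the scoring pass that would populate a table like the one queried above, assuming rows arrive as (id, text, sources); the table and field names are illustrative.

# illustrative batch pass: score stored outputs and tag high-CTS rows for review
from quickstart import decide  # hypothetical module holding the quick-start code

def score_batch(rows, tau_review=0.7):
    # rows: iterable of {"id": ..., "text": ..., "sources": [...]}
    results = []
    for r in rows:
        out = decide(r["text"], r["sources"])
        results.append({"id": r["id"], "CTS": out["CTS"], "needs_review": out["CTS"] > tau_review})
    return results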
evaluation harness
report these side by side for baseline and CAI gated runs
| metric | baseline | cai gated | target |
|---|---|---|---|
| unsupported claim rate | — | — | less than or equal to baseline |
| task accuracy | — | — | maintain or improve |
| abstention rate | — | — | calibrated at tau* |
| mean time to review | — | — | decrease |
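a sketch of assembling the comparison from per-example flags; the record fields below are assumptions about how each run is logged, not a required format.

# illustrative side-by-side report; each run is a list of per-example records
def report(run):
    # record: {"unsupported": bool, "correct": bool, "abstained": bool, "review_minutes": float}
    n = len(run) or 1
    return {
        "unsupported_claim_rate": sum(r["unsupported"] for r in run) / n,
        "task_accuracy": sum(r["correct"] for r in run) / n,
        "abstention_rate": sum(r["abstained"] for r in run) / n,
        "mean_time_to_review": sum(r["review_minutes"] for r in run) / n,
    }

# comparison = {"baseline": report(baseline_run), "cai_gated": report(gated_run)}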
implementation checklist
cai_impl_checklist:
  compression_sites_listed: false
  tau_per_site_computed: false
  provenance_attached: false
  tau_star_set: false
  contradiction_detector_wired: false
  abstention_policy_enabled: false
  logs_written_without_raw_text: false
  outcome_metrics_tracked: false
privacy by design: log ids and scores, not raw secrets. keep receipts minimal but sufficient for audit.