engineer hub
implementations
to implement CAI, copy the quick start, wire the scoring and gating hooks, log receipts, and measure outcomes.
quick start in 30 lines
toy but real: you can swap in your own models. this shows per-site scoring and abstention with receipts.
```python
# pip install numpy  (re and json are stdlib)
import re, json
import numpy as np

STOP = set("the a an and or of for in on to is are was were be been being with by as at from this that it".split())

def toks(s):
    return re.findall(r"[a-z0-9']+", s.lower())

def compress_ratio(t):
    b = t.encode()
    return len(b) / (len(set(b)) + 1)

def delta_pred_loss(t):
    # strain proxy: how much does dropping stopwords change compressibility
    base = compress_ratio(t)
    content = " ".join([w for w in toks(t) if w not in STOP])
    simp = compress_ratio(content)
    d = max(0.0, base - simp)
    return max(0.0, min(1.0, d / 2.0))

def kl_proxy(t):
    # rare-word density as a cheap stand-in for KL(q||p)
    w = toks(t)
    return 0.0 if not w else max(0.0, min(1.0, sum(len(x) > 8 for x in w) / len(w)))

def residual(t):
    r = compress_ratio(t)
    return max(0.0, min(1.0, 1.0 / (r or 1e-6)))

def S_score(text):
    d, k, e = delta_pred_loss(text), kl_proxy(text), residual(text)
    S = max(0.0, min(1.0, 0.5*d + 0.25*k + 0.25*e))
    return {"delta_pred": d, "kl": k, "residual": e, "S": S}

def nli_heuristic(text):
    # heuristic contradiction: high term overlap plus a polarity flip
    sents = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    def neg(s):
        return re.search(r"(^|\s)(not|never|no|none|cannot|isn't|can't)(\s|$)", s, re.I) is not None
    def terms(s):
        return set([w for w in toks(s) if w not in STOP and len(w) > 3])
    scores = []
    for i in range(len(sents)):
        for j in range(i + 1, len(sents)):
            A, B = terms(sents[i]), terms(sents[j])
            inter = len(A & B)
            uni = len(A | B) or 1
            overlap = inter / uni
            flip = 1 if (neg(sents[i]) ^ neg(sents[j])) else 0
            scores.append(overlap * flip)
    m = np.mean(scores) if scores else 0.0
    return max(0.0, min(1.0, m / 0.6))

def coverage(text, source):
    T, S = set(toks(text)), set(toks(source))
    inter = len(T & S)
    uni = len(T | S) or 1
    return inter / uni

def C_score(text, sources):
    nli = nli_heuristic(text)
    gap = 1.0 - max([coverage(text, s) for s in sources] or [0.0])
    unsat = 0.0  # placeholder for a SAT-based constraint check
    C = max(0.0, min(1.0, 0.5*nli + 0.25*unsat + 0.25*gap))
    return {"nli": nli, "unsat": unsat, "halluc_risk": gap, "C": C}

def decide(text, sources, tau=0.7):
    S, C = S_score(text), C_score(text, sources)
    CTS = S["S"] * C["C"]
    ok = CTS <= tau
    return {"S": S, "C": C, "CTS": CTS, "ok": ok, "provenance": sources}

print(json.dumps(decide(
    "Paris is the capital of France. Paris is not the capital of France.",
    ["france gov site"]), indent=2))
```

scoring
compression strain $$ S = w_1\,\Delta L_{pred} + w_2\,\mathrm{KL}\big(q(z\mid x)\,\Vert\, p(z)\big) + w_3\,E_{res} \in [0,1] $$
contradiction magnitude $$ C = a\,\overline{P_{NLI}(contr)} + b\,U_{SAT} + c\,H_{halluc} \in [0,1] $$
tension $$ CTS = S \cdot C $$
start with equal weights, then learn weights on your validation sets.
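One way to learn the weights is a small constrained least-squares fit; a minimal sketch, assuming you have per-site feature triples (delta_pred, kl, residual) and human-labeled tension targets — the data here is illustrative:

```python
import numpy as np

def fit_weights(features, targets):
    """Least-squares fit of w in S = X @ w, clipped non-negative and
    renormalized to sum to 1 so S stays in [0, 1]."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(targets, dtype=float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    w = np.clip(w, 0.0, None)   # keep weights non-negative
    s = w.sum() or 1.0          # guard against an all-zero solution
    return w / s

# toy validation set: (delta_pred, kl, residual) per example, one label each
X = [[0.8, 0.2, 0.1], [0.1, 0.7, 0.3], [0.3, 0.3, 0.9]]
y = [0.6, 0.4, 0.5]
print(fit_weights(X, y))
```

Start from the equal-weight prior and only adopt learned weights when they beat it on a held-out slice.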
gating policy
block unsupported claims when $CTS$ exceeds the threshold; request evidence or abstain. tune the threshold to your harm profile.
```yaml
gating_policy:
  tau_star: 0.70
  abstain_if:
    - "no_entailment"
    - "CTS > tau_star"
  on_abstain:
    - "ask_for_source"
    - "increase_retrieval_weight"
    - "lower_temperature"
```

reference service
install and run
```bash
pip install fastapi uvicorn pydantic python-multipart
# optional
pip install transformers torch python-sat

uvicorn cstc_service:app --reload
```

endpoints
| method | path | body | returns |
|---|---|---|---|
| POST | /compute | { text, sources[], constraints[][] } | S, C, CTS, receipts |
| POST | /train/cac | { features[], target_cts } | head weights |
| GET | /health | none | status |
```python
import requests, json

payload = {"text": "A is B. A is not B.", "sources": ["doc a", "doc b"], "constraints": []}
r = requests.post("http://127.0.0.1:8000/compute", json=payload)
print(json.dumps(r.json(), indent=2))
```

```js
const res = await fetch("/compute", {
  method: "POST",
  headers: {"Content-Type": "application/json"},
  body: JSON.stringify({text, sources, constraints: []})
});
const data = await res.json();
console.log(data);
```

logging schema
log tension, not secrets. you can redact raw content and still keep audit power.
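A minimal sketch of a receipt builder that enforces this: scores and provenance ids go in, raw text never does. Field names follow the receipt schema in this section; the uuid4 id and whitespace token count are illustrative choices:

```python
import json, uuid
from datetime import datetime, timezone

def make_receipt(site_scores, decision, raw_text):
    """Build an audit record that keeps tension scores but drops content."""
    return {
        "request_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "sites": site_scores,  # scores + provenance ids only
        "decision": decision,
        "redaction": {
            "raw_text_logged": False,
            "token_count": len(raw_text.split()),  # count kept, content dropped
        },
    }

receipt = make_receipt(
    [{"name": "generate", "S": 0.27, "C": 0.41, "CTS": 0.11, "provenance": ["doc_a_id"]}],
    {"abstained": False, "reason": None, "tau_star": 0.70},
    raw_text="the answer text, which is never written to the log",
)
assert "never written" not in json.dumps(receipt)
```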
{ "request_id": "uuid", "timestamp": "2025-10-30T16:00:00Z", "user_role": "agent", "sites": [ {"name":"retrieve","S":0.18,"C":0.22,"CTS":0.04,"provenance":["doc_a_id","doc_b_id"]}, {"name":"summarize","S":0.31,"C":0.35,"CTS":0.11,"provenance":["doc_a_id"]}, {"name":"generate","S":0.27,"C":0.41,"CTS":0.11,"provenance":["doc_a_id"]} ], "path_scores":{"S":0.27,"C":0.33,"CTS":0.09}, "decision":{"abstained":false,"reason":null,"tau_star":0.70}, "redaction":{"raw_text_logged":false,"token_count":224} } integration patterns
rag governance
- score S and C at retrieve and summarize
- penalize contradictory sources in rank
- attach receipts in final answer
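The ranking penalty can be sketched as a simple linear re-score; `base`, `prov`, and `c` are illustrative fields, standing in for your retriever score, provenance strength, and the contradiction detector from the scoring section:

```python
def rerank(docs, prov_weight=0.3, contr_weight=0.5):
    """Reward provenance, penalize contradiction, keep the retriever's base score."""
    def score(d):
        return d["base"] + prov_weight * d["prov"] - contr_weight * d["c"]
    return sorted(docs, key=score, reverse=True)

docs = [
    {"id": "a", "base": 0.9, "prov": 0.2, "c": 0.8},  # strong match, contradictory
    {"id": "b", "base": 0.7, "prov": 0.9, "c": 0.1},  # well-sourced, consistent
]
print([d["id"] for d in rerank(docs)])  # → ['b', 'a']
```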
```yaml
rag_cai:
  budget:
    tokens_max: 2200
  tau_target: 0.45
  ranker_prefer: ["high_provenance", "low_contradiction"]
  summarizer_objective: "minimize_tau_equal_accuracy"
```

agent loop
- compute CTS before tool call
- if CTS high then clarify or fetch more sources
- log per step receipts
```python
if CTS > tau_star:
    plan = "ask user for evidence or fetch more docs"
else:
    plan = "proceed with tool call"
```

batch etl
- score existing outputs
- tag high CTS items for review
- auto fix if new sources reduce C
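A sketch of the auto-fix step: re-score against newly fetched sources and keep them only if contradiction actually drops. `try_autofix` and the 20% improvement margin are illustrative; plug in the C detector from the quick start in place of the toy one here:

```python
def try_autofix(item, new_sources, c_score, margin=0.8):
    """Attach new sources only when they cut C below margin * old C."""
    old_c = c_score(item["text"], item["sources"])
    new_c = c_score(item["text"], item["sources"] + new_sources)
    if new_c <= old_c * margin:
        item["sources"] += new_sources
        item["C"] = new_c
        return True
    return False

# toy detector: contradiction falls as source coverage grows
toy_c = lambda text, sources: 1.0 / (1 + len(sources))
item = {"text": "claim", "sources": ["doc_a"], "C": 0.5}
print(try_autofix(item, ["doc_b", "doc_c"], toy_c))  # → True
```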
```sql
-- pseudo
select id, text, CTS
from outputs
where CTS > 0.7
order by CTS desc;
```

evaluation harness
report these side by side for baseline and CAI gated runs
| metric | baseline | cai gated | target |
|---|---|---|---|
| unsupported claim rate | — | — | less than or equal to baseline |
| task accuracy | — | — | maintain or improve |
| abstention rate | — | — | calibrated at tau* |
| mean time to review | — | — | decrease |
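The table's rates can be computed from logged decisions; a minimal sketch with toy records, where the field names (`unsupported`, `correct`, `abstained`) are illustrative rather than a fixed schema:

```python
def summarize(runs):
    """Per-run flags in, table-row metrics out."""
    n = len(runs)
    return {
        "unsupported_claim_rate": sum(r["unsupported"] for r in runs) / n,
        "task_accuracy": sum(r["correct"] for r in runs) / n,
        "abstention_rate": sum(r["abstained"] for r in runs) / n,
    }

baseline = [{"unsupported": 1, "correct": 1, "abstained": 0},
            {"unsupported": 1, "correct": 0, "abstained": 0}]
gated    = [{"unsupported": 0, "correct": 1, "abstained": 0},
            {"unsupported": 0, "correct": 0, "abstained": 1}]
for name, runs in [("baseline", baseline), ("cai_gated", gated)]:
    print(name, summarize(runs))
```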
implementation checklist
```yaml
cai_impl_checklist:
  compression_sites_listed: false
  tau_per_site_computed: false
  provenance_attached: false
  tau_star_set: false
  contradiction_detector_wired: false
  abstention_policy_enabled: false
  logs_written_without_raw_text: false
  outcome_metrics_tracked: false
```

privacy by design: log ids and scores, not raw secrets. keep receipts minimal but sufficient for audit.
last updated: