engineer hub

implementations

to implement CAI, copy the quick start, wire in the scoring and gating hooks, log receipts, and measure outcomes.

quick start in 30 lines

toy but real: every signal here is a cheap heuristic you can swap for your own models. this shows per-site scoring and abstention with receipts.

```python
# pip install numpy
import re, json
import numpy as np

STOP = set("the a an and or of for in on to is are was were be been being "
           "with by as at from this that it".split())

def toks(s):
    return re.findall(r"[a-z0-9']+", s.lower())

def compress_ratio(t):
    # crude compressibility proxy: bytes per distinct byte
    b = t.encode()
    return len(b) / (len(set(b)) + 1)

def delta_pred_loss(t):
    # strain added by stopwords and filler relative to content words
    base = compress_ratio(t)
    content = " ".join(w for w in toks(t) if w not in STOP)
    d = max(0.0, base - compress_ratio(content))
    return max(0.0, min(1.0, d / 2.0))

def kl_proxy(t):
    # share of long, rare-looking tokens as a divergence stand-in
    w = toks(t)
    return 0.0 if not w else max(0.0, min(1.0, sum(len(x) > 8 for x in w) / len(w)))

def residual(t):
    r = compress_ratio(t)
    return max(0.0, min(1.0, 1.0 / (r or 1e-6)))

def S_score(text):
    d, k, e = delta_pred_loss(text), kl_proxy(text), residual(text)
    S = max(0.0, min(1.0, 0.5 * d + 0.25 * k + 0.25 * e))
    return {"delta_pred": d, "kl": k, "residual": e, "S": S}

def nli_heuristic(text):
    # pairwise: shared terms plus a negation flip approximate contradiction
    sents = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    def neg(s):
        return re.search(r"(^|\s)(not|never|no|none|cannot|isn't|can't)(\s|$)", s, re.I) is not None
    def terms(s):
        return {w for w in toks(s) if w not in STOP and len(w) > 3}
    scores = []
    for i in range(len(sents)):
        for j in range(i + 1, len(sents)):
            A, B = terms(sents[i]), terms(sents[j])
            overlap = len(A & B) / (len(A | B) or 1)
            flip = 1 if neg(sents[i]) ^ neg(sents[j]) else 0
            scores.append(overlap * flip)
    m = np.mean(scores) if scores else 0.0
    return max(0.0, min(1.0, m / 0.6))

def coverage(text, source):
    T, S = set(toks(text)), set(toks(source))
    return len(T & S) / (len(T | S) or 1)

def C_score(text, sources):
    nli = nli_heuristic(text)
    gap = 1.0 - max([coverage(text, s) for s in sources] or [0.0])
    unsat = 0.0  # placeholder for a SAT-style constraint check
    C = max(0.0, min(1.0, 0.5 * nli + 0.25 * unsat + 0.25 * gap))
    return {"nli": nli, "unsat": unsat, "halluc_risk": gap, "C": C}

def decide(text, sources, tau=0.7):
    S, C = S_score(text), C_score(text, sources)
    CTS = S["S"] * C["C"]
    return {"S": S, "C": C, "CTS": CTS, "ok": CTS <= tau, "provenance": sources}

print(json.dumps(decide("Paris is the capital of France. Paris is not the capital of France.",
                        ["france gov site"]), indent=2))
```
see the evaluation harness below

scoring

compression strain $$ S = w_1\,\Delta L_{pred} + w_2\,\mathrm{KL}\big(q(z\mid x)\,\Vert\, p(z)\big) + w_3\,E_{res} \in [0,1] $$
contradiction magnitude $$ C = a\,\overline{P_{NLI}(contr)} + b\,U_{SAT} + c\,H_{halluc} \in [0,1] $$
tension $$ CTS = S \cdot C $$

start with equal weights, then learn weights on your validation sets.
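with a labeled validation set, learning the weights can be as simple as a least-squares fit; a minimal sketch using numpy, where the feature rows and annotated strain targets are hypothetical stand-ins for your own data:

```python
import numpy as np

# hypothetical validation data: columns are delta_pred, kl, residual;
# targets are human-annotated strain scores in [0, 1]
X = np.array([[0.8, 0.2, 0.1],
              [0.1, 0.1, 0.9],
              [0.5, 0.6, 0.4],
              [0.2, 0.9, 0.3]])
y = np.array([0.55, 0.35, 0.50, 0.45])

# plain least squares, then clip negatives and renormalize so the
# weights sum to 1 and S stays in [0, 1]
w, *_ = np.linalg.lstsq(X, y, rcond=None)
w = np.clip(w, 0.0, None)
total = w.sum()
w = w / total if total > 0 else np.full(3, 1.0 / 3.0)

def S_weighted(delta_pred, kl, residual):
    return float(np.clip(w @ np.array([delta_pred, kl, residual]), 0.0, 1.0))
```

the same shape works for the C weights; swap in a calibrated regression if you need probabilistic outputs.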

gating policy

block unsupported claims when $CTS$ exceeds the threshold $\tau^*$. request evidence or abstain. tune the threshold to your harm profile.

```yaml
gating_policy:
  tau_star: 0.70
  abstain_if:
    - "no_entailment"
    - "CTS > tau_star"
  on_abstain:
    - "ask_for_source"
    - "increase_retrieval_weight"
    - "lower_temperature"
```
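the policy reduces to a small function; a sketch, where `cts` and `entailed` are assumed to come from your scoring step:

```python
def apply_gate(cts, entailed, tau_star=0.70):
    """Return the action list for one generation step.

    cts: per-site or path CTS in [0, 1]
    entailed: whether at least one source entails the claim
    """
    if entailed and cts <= tau_star:
        return ["proceed"]
    # abstain branch mirrors gating_policy.on_abstain above
    return ["ask_for_source", "increase_retrieval_weight", "lower_temperature"]
```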

reference service

install and run

```shell
pip install fastapi uvicorn pydantic python-multipart
# optional
pip install transformers torch python-sat
uvicorn cstc_service:app --reload
```
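the `cstc_service` module itself isn't listed here; a framework-agnostic sketch of what the /compute handler body might do, with `score_stub` standing in for the quick-start scoring functions (all names are assumptions):

```python
import uuid

def score_stub(text, sources):
    # stand-in for the real S_score / C_score pipeline
    S = min(1.0, len(text) / 1000.0)
    C = 0.0 if sources else 0.5
    return S, C

def compute_endpoint(body: dict) -> dict:
    """Handle a /compute request body and return the documented shape."""
    text = body.get("text", "")
    sources = body.get("sources", [])
    S, C = score_stub(text, sources)
    return {
        "request_id": str(uuid.uuid4()),
        "S": S,
        "C": C,
        "CTS": S * C,
        "receipts": {"provenance": sources,
                     "constraints": body.get("constraints", [])},
    }
```

wrapping this in a FastAPI route is then a thin layer of request validation.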

endpoints

| method | path | body | returns |
| --- | --- | --- | --- |
| POST | `/compute` | `{ text, sources[], constraints[][] }` | S, C, CTS, receipts |
| POST | `/train/cac` | `{ features[], target_cts }` | head weights |
| GET | `/health` | none | status |
```python
import requests, json

payload = {"text": "A is B. A is not B.", "sources": ["doc a", "doc b"], "constraints": []}
r = requests.post("http://127.0.0.1:8000/compute", json=payload)
print(json.dumps(r.json(), indent=2))
```
```javascript
const res = await fetch("/compute", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ text, sources, constraints: [] })
});
const data = await res.json();
console.log(data);
```

logging schema

log tension, not secrets. you can redact raw content and still keep audit power.

```json
{
  "request_id": "uuid",
  "timestamp": "2025-10-30T16:00:00Z",
  "user_role": "agent",
  "sites": [
    {"name": "retrieve", "S": 0.18, "C": 0.22, "CTS": 0.04, "provenance": ["doc_a_id", "doc_b_id"]},
    {"name": "summarize", "S": 0.31, "C": 0.35, "CTS": 0.11, "provenance": ["doc_a_id"]},
    {"name": "generate", "S": 0.27, "C": 0.41, "CTS": 0.11, "provenance": ["doc_a_id"]}
  ],
  "path_scores": {"S": 0.27, "C": 0.33, "CTS": 0.09},
  "decision": {"abstained": false, "reason": null, "tau_star": 0.70},
  "redaction": {"raw_text_logged": false, "token_count": 224}
}
```
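a helper that assembles a receipt in this schema and enforces the redaction rule; a sketch, aggregating path scores as plain per-site means (your aggregation may differ):

```python
import uuid
from datetime import datetime, timezone

def make_receipt(sites, tau_star=0.70, token_count=0):
    """Aggregate per-site scores into one log record; never store raw text."""
    n = max(len(sites), 1)
    path = {
        "S": sum(s["S"] for s in sites) / n,
        "C": sum(s["C"] for s in sites) / n,
        "CTS": sum(s["CTS"] for s in sites) / n,
    }
    abstained = path["CTS"] > tau_star
    return {
        "request_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_role": "agent",
        "sites": sites,
        "path_scores": {k: round(v, 2) for k, v in path.items()},
        "decision": {"abstained": abstained,
                     "reason": "CTS > tau_star" if abstained else None,
                     "tau_star": tau_star},
        "redaction": {"raw_text_logged": False, "token_count": token_count},
    }
```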

integration patterns

rag governance

  • score S and C at retrieve and summarize
  • penalize contradictory sources in rank
  • attach receipts in final answer
```yaml
rag_cai:
  budget:
    tokens_max: 2200
  tau_target: 0.45
  ranker_prefer: ["high_provenance", "low_contradiction"]
  summarizer_objective: "minimize_tau_equal_accuracy"
```
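the ranker preference can be a simple score that rewards provenance and penalizes contradiction; a sketch, assuming the two per-source signals are precomputed in [0, 1]:

```python
def rank_sources(sources):
    """sources: dicts with 'id', 'provenance_score', 'contradiction' in [0, 1].

    Orders high-provenance, low-contradiction sources first, matching
    ranker_prefer: ["high_provenance", "low_contradiction"].
    """
    return sorted(sources,
                  key=lambda s: s["provenance_score"] - s["contradiction"],
                  reverse=True)
```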

agent loop

  • compute CTS before tool call
  • if CTS high then clarify or fetch more sources
  • log per step receipts
```python
if CTS > tau_star:
    plan = "ask user for evidence or fetch more docs"
else:
    plan = "proceed with tool call"
```
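the same branch extends to a bounded retry loop; a sketch, with `score_step` and `fetch_more` as hypothetical callables supplied by your agent framework:

```python
def agent_step(score_step, fetch_more, tau_star=0.70, max_retries=2):
    """Retry retrieval until tension drops below tau_star, then act or abstain."""
    for _ in range(max_retries + 1):
        cts = score_step()
        if cts <= tau_star:
            return "proceed with tool call"
        fetch_more()  # pull more evidence; rescore on the next pass
    return "ask user for evidence"
```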

batch etl

  • score existing outputs
  • tag high CTS items for review
  • auto fix if new sources reduce C
```sql
-- pseudo
select id, text, CTS
from outputs
where CTS > 0.7
order by CTS desc;
```
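the same filter works in the pipeline language if your outputs live outside a database; a sketch, assuming rows carry an `id` and a precomputed `CTS`:

```python
def tag_for_review(rows, tau=0.7):
    """Return high-tension rows, worst first, mirroring the SQL above."""
    flagged = [r for r in rows if r["CTS"] > tau]
    return sorted(flagged, key=lambda r: r["CTS"], reverse=True)
```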

evaluation harness

report these side by side for baseline and CAI gated runs

use the benchmarks page for tasks and a scorecard template
| metric | baseline | cai gated | target |
| --- | --- | --- | --- |
| unsupported claim rate | | | less than or equal to baseline |
| task accuracy | | | maintain or improve |
| abstention rate | | | calibrated at tau* |
| mean time to review | | | decrease |
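two of the scorecard columns fall straight out of the receipts; a sketch over a list of log records in the logging schema, with `unsupported_flags` as hypothetical per-answer review labels:

```python
def scorecard(receipts, unsupported_flags):
    """receipts: log records with a 'decision' field;
    unsupported_flags: parallel bools marking answers judged unsupported."""
    n = max(len(receipts), 1)
    abstained = sum(r["decision"]["abstained"] for r in receipts)
    return {
        "unsupported_claim_rate": sum(unsupported_flags) / n,
        "abstention_rate": abstained / n,
    }
```

run it once on the baseline logs and once on the CAI-gated logs to fill the two middle columns.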

implementation checklist

```yaml
cai_impl_checklist:
  compression_sites_listed: false
  tau_per_site_computed: false
  provenance_attached: false
  tau_star_set: false
  contradiction_detector_wired: false
  abstention_policy_enabled: false
  logs_written_without_raw_text: false
  outcome_metrics_tracked: false
```

privacy by design: log ids and scores, not raw secrets. keep receipts minimal but sufficient for audit.
