engineer hub

implementations

to implement CAI, copy the quick start, wire in the scoring and gating hooks, log receipts, and measure outcomes.

quick start in 30 lines

toy but real: every signal here is a cheap heuristic you can swap for your own models. this shows per-site scoring and abstention with receipts.

```python
# pip install numpy
import re, json
import numpy as np

STOP = set("the a an and or of for in on to is are was were be been being "
           "with by as at from this that it".split())

def toks(s):
    return re.findall(r"[a-z0-9']+", s.lower())

def compress_ratio(t):
    # crude compressibility proxy: bytes per distinct byte
    b = t.encode()
    return len(b) / (len(set(b)) + 1)

def delta_pred_loss(t):
    # strain added by stopwords and filler relative to content words
    base = compress_ratio(t)
    content = " ".join(w for w in toks(t) if w not in STOP)
    d = max(0.0, base - compress_ratio(content))
    return max(0.0, min(1.0, d / 2.0))

def kl_proxy(t):
    # share of long, rare-looking tokens as a divergence stand-in
    w = toks(t)
    return 0.0 if not w else max(0.0, min(1.0, sum(len(x) > 8 for x in w) / len(w)))

def residual(t):
    r = compress_ratio(t)
    return max(0.0, min(1.0, 1.0 / (r or 1e-6)))

def S_score(text):
    d, k, e = delta_pred_loss(text), kl_proxy(text), residual(text)
    S = max(0.0, min(1.0, 0.5 * d + 0.25 * k + 0.25 * e))
    return {"delta_pred": d, "kl": k, "residual": e, "S": S}

def nli_heuristic(text):
    # pairwise: shared terms plus a negation flip approximate contradiction
    sents = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    def neg(s):
        return re.search(r"(^|\s)(not|never|no|none|cannot|isn't|can't)(\s|$)", s, re.I) is not None
    def terms(s):
        return {w for w in toks(s) if w not in STOP and len(w) > 3}
    scores = []
    for i in range(len(sents)):
        for j in range(i + 1, len(sents)):
            A, B = terms(sents[i]), terms(sents[j])
            overlap = len(A & B) / (len(A | B) or 1)
            flip = 1 if neg(sents[i]) ^ neg(sents[j]) else 0
            scores.append(overlap * flip)
    m = np.mean(scores) if scores else 0.0
    return max(0.0, min(1.0, m / 0.6))

def coverage(text, source):
    T, S = set(toks(text)), set(toks(source))
    return len(T & S) / (len(T | S) or 1)

def C_score(text, sources):
    nli = nli_heuristic(text)
    gap = 1.0 - max([coverage(text, s) for s in sources] or [0.0])
    unsat = 0.0  # placeholder for a SAT-style constraint check
    C = max(0.0, min(1.0, 0.5 * nli + 0.25 * unsat + 0.25 * gap))
    return {"nli": nli, "unsat": unsat, "halluc_risk": gap, "C": C}

def decide(text, sources, tau=0.7):
    S, C = S_score(text), C_score(text, sources)
    CTS = S["S"] * C["C"]
    return {"S": S, "C": C, "CTS": CTS, "ok": CTS <= tau, "provenance": sources}

print(json.dumps(decide("Paris is the capital of France. Paris is not the capital of France.",
                        ["france gov site"]), indent=2))
```
see the evaluation harness below

scoring

compression strain $$ S = w_1\,\Delta L_{pred} + w_2\,\mathrm{KL}\big(q(z\mid x)\,\Vert\, p(z)\big) + w_3\,E_{res} \in [0,1] $$
contradiction magnitude $$ C = a\,\overline{P_{NLI}(contr)} + b\,U_{SAT} + c\,H_{halluc} \in [0,1] $$
tension $$ CTS = S \cdot C $$

start with equal weights, then learn weights on your validation sets.
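with a labeled validation set, learning the weights can be as simple as a least-squares fit; a minimal sketch using numpy, where the feature rows and annotated strain targets are hypothetical stand-ins for your own data:

```python
import numpy as np

# hypothetical validation data: columns are delta_pred, kl, residual;
# targets are human-annotated strain scores in [0, 1]
X = np.array([[0.8, 0.2, 0.1],
              [0.1, 0.1, 0.9],
              [0.5, 0.6, 0.4],
              [0.2, 0.9, 0.3]])
y = np.array([0.55, 0.35, 0.50, 0.45])

# plain least squares, then clip negatives and renormalize so the
# weights sum to 1 and S stays in [0, 1]
w, *_ = np.linalg.lstsq(X, y, rcond=None)
w = np.clip(w, 0.0, None)
total = w.sum()
w = w / total if total > 0 else np.full(3, 1.0 / 3.0)

def S_weighted(delta_pred, kl, residual):
    return float(np.clip(w @ np.array([delta_pred, kl, residual]), 0.0, 1.0))
```

the same shape works for the C weights; swap in a calibrated regression if you need probabilistic outputs.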

gating policy

block unsupported claims when $CTS$ exceeds the threshold $\tau^*$. request evidence or abstain. tune the threshold to your harm profile.

```yaml
gating_policy:
  tau_star: 0.70
  abstain_if:
    - "no_entailment"
    - "CTS > tau_star"
  on_abstain:
    - "ask_for_source"
    - "increase_retrieval_weight"
    - "lower_temperature"
```
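the policy reduces to a small function; a sketch, where `cts` and `entailed` are assumed to come from your scoring step:

```python
def apply_gate(cts, entailed, tau_star=0.70):
    """Return the action list for one generation step.

    cts: per-site or path CTS in [0, 1]
    entailed: whether at least one source entails the claim
    """
    if entailed and cts <= tau_star:
        return ["proceed"]
    # abstain branch mirrors gating_policy.on_abstain above
    return ["ask_for_source", "increase_retrieval_weight", "lower_temperature"]
```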

reference service

install and run

```shell
pip install fastapi uvicorn pydantic python-multipart
# optional
pip install transformers torch python-sat
uvicorn cstc_service:app --reload
```
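the `cstc_service` module itself isn't listed here; a framework-agnostic sketch of what the /compute handler body might do, with `score_stub` standing in for the quick-start scoring functions (all names are assumptions):

```python
import uuid

def score_stub(text, sources):
    # stand-in for the real S_score / C_score pipeline
    S = min(1.0, len(text) / 1000.0)
    C = 0.0 if sources else 0.5
    return S, C

def compute_endpoint(body: dict) -> dict:
    """Handle a /compute request body and return the documented shape."""
    text = body.get("text", "")
    sources = body.get("sources", [])
    S, C = score_stub(text, sources)
    return {
        "request_id": str(uuid.uuid4()),
        "S": S,
        "C": C,
        "CTS": S * C,
        "receipts": {"provenance": sources,
                     "constraints": body.get("constraints", [])},
    }
```

wrapping this in a FastAPI route is then a thin layer of request validation.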

endpoints

| method | path | body | returns |
| --- | --- | --- | --- |
| POST | `/compute` | `{ text, sources[], constraints[][] }` | S, C, CTS, receipts |
| POST | `/train/cac` | `{ features[], target_cts }` | head weights |
| GET | `/health` | none | status |
```python
import requests, json

payload = {"text": "A is B. A is not B.", "sources": ["doc a", "doc b"], "constraints": []}
r = requests.post("http://127.0.0.1:8000/compute", json=payload)
print(json.dumps(r.json(), indent=2))
```
```javascript
const res = await fetch("/compute", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ text, sources, constraints: [] })
});
const data = await res.json();
console.log(data);
```

logging schema

log tension, not secrets. you can redact raw content and still keep audit power.

```json
{
  "request_id": "uuid",
  "timestamp": "2025-10-30T16:00:00Z",
  "user_role": "agent",
  "sites": [
    {"name": "retrieve", "S": 0.18, "C": 0.22, "CTS": 0.04, "provenance": ["doc_a_id", "doc_b_id"]},
    {"name": "summarize", "S": 0.31, "C": 0.35, "CTS": 0.11, "provenance": ["doc_a_id"]},
    {"name": "generate", "S": 0.27, "C": 0.41, "CTS": 0.11, "provenance": ["doc_a_id"]}
  ],
  "path_scores": {"S": 0.27, "C": 0.33, "CTS": 0.09},
  "decision": {"abstained": false, "reason": null, "tau_star": 0.70},
  "redaction": {"raw_text_logged": false, "token_count": 224}
}
```
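a helper that assembles a receipt in this schema and enforces the redaction rule; a sketch, aggregating path scores as plain per-site means (your aggregation may differ):

```python
import uuid
from datetime import datetime, timezone

def make_receipt(sites, tau_star=0.70, token_count=0):
    """Aggregate per-site scores into one log record; never store raw text."""
    n = max(len(sites), 1)
    path = {
        "S": sum(s["S"] for s in sites) / n,
        "C": sum(s["C"] for s in sites) / n,
        "CTS": sum(s["CTS"] for s in sites) / n,
    }
    abstained = path["CTS"] > tau_star
    return {
        "request_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_role": "agent",
        "sites": sites,
        "path_scores": {k: round(v, 2) for k, v in path.items()},
        "decision": {"abstained": abstained,
                     "reason": "CTS > tau_star" if abstained else None,
                     "tau_star": tau_star},
        "redaction": {"raw_text_logged": False, "token_count": token_count},
    }
```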

integration patterns

rag governance

  • score S and C at retrieve and summarize
  • penalize contradictory sources in rank
  • attach receipts in final answer
```yaml
rag_cai:
  budget:
    tokens_max: 2200
  tau_target: 0.45
  ranker_prefer: ["high_provenance", "low_contradiction"]
  summarizer_objective: "minimize_tau_equal_accuracy"
```
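the ranker preference can be a simple score that rewards provenance and penalizes contradiction; a sketch, assuming the two per-source signals are precomputed in [0, 1]:

```python
def rank_sources(sources):
    """sources: dicts with 'id', 'provenance_score', 'contradiction' in [0, 1].

    Orders high-provenance, low-contradiction sources first, matching
    ranker_prefer: ["high_provenance", "low_contradiction"].
    """
    return sorted(sources,
                  key=lambda s: s["provenance_score"] - s["contradiction"],
                  reverse=True)
```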

agent loop

  • compute CTS before tool call
  • if CTS high then clarify or fetch more sources
  • log per step receipts
```python
if CTS > tau_star:
    plan = "ask user for evidence or fetch more docs"
else:
    plan = "proceed with tool call"
```
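the same branch extends to a bounded retry loop; a sketch, with `score_step` and `fetch_more` as hypothetical callables supplied by your agent framework:

```python
def agent_step(score_step, fetch_more, tau_star=0.70, max_retries=2):
    """Retry retrieval until tension drops below tau_star, then act or abstain."""
    for _ in range(max_retries + 1):
        cts = score_step()
        if cts <= tau_star:
            return "proceed with tool call"
        fetch_more()  # pull more evidence; rescore on the next pass
    return "ask user for evidence"
```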

batch etl

  • score existing outputs
  • tag high CTS items for review
  • auto fix if new sources reduce C
```sql
-- pseudo
select id, text, CTS
from outputs
where CTS > 0.7
order by CTS desc;
```
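the same filter works in the pipeline language if your outputs live outside a database; a sketch, assuming rows carry an `id` and a precomputed `CTS`:

```python
def tag_for_review(rows, tau=0.7):
    """Return high-tension rows, worst first, mirroring the SQL above."""
    flagged = [r for r in rows if r["CTS"] > tau]
    return sorted(flagged, key=lambda r: r["CTS"], reverse=True)
```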

evaluation harness

report these side by side for baseline and CAI gated runs

use the benchmarks page for tasks and a scorecard template
| metric | baseline | cai gated | target |
| --- | --- | --- | --- |
| unsupported claim rate | | | less than or equal to baseline |
| task accuracy | | | maintain or improve |
| abstention rate | | | calibrated at tau* |
| mean time to review | | | decrease |
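two of the scorecard columns fall straight out of the receipts; a sketch over a list of log records in the logging schema, with `unsupported_flags` as hypothetical per-answer review labels:

```python
def scorecard(receipts, unsupported_flags):
    """receipts: log records with a 'decision' field;
    unsupported_flags: parallel bools marking answers judged unsupported."""
    n = max(len(receipts), 1)
    abstained = sum(r["decision"]["abstained"] for r in receipts)
    return {
        "unsupported_claim_rate": sum(unsupported_flags) / n,
        "abstention_rate": abstained / n,
    }
```

run it once on the baseline logs and once on the CAI-gated logs to fill the two middle columns.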

implementation checklist

```yaml
cai_impl_checklist:
  compression_sites_listed: false
  tau_per_site_computed: false
  provenance_attached: false
  tau_star_set: false
  contradiction_detector_wired: false
  abstention_policy_enabled: false
  logs_written_without_raw_text: false
  outcome_metrics_tracked: false
```

privacy by design: log ids and scores, not raw secrets. keep receipts minimal but sufficient for audit.
