engineer hub

implementations

to implement CAI, copy the quick start, wire in the scoring and gating hooks, log receipts, and measure the outcomes.

quick start in 30 lines

toy but real: you can swap in your own models. this shows per-site scoring and abstention with receipts.

# pip install numpy
import re, json
import numpy as np

STOP = set("the a an and or of for in on to is are was were be been being with by as at from this that it".split())

def toks(s): return re.findall(r"[a-z0-9']+", s.lower())
def compress_ratio(t):
    # crude compressibility proxy: bytes per unique byte
    b = t.encode(); return len(b) / (len(set(b)) + 1)
def delta_pred_loss(t):
    # proxy for delta predictive loss: how much compressibility drops once stopwords are stripped
    base = compress_ratio(t)
    content = " ".join([w for w in toks(t) if w not in STOP])
    simp = compress_ratio(content)
    d = max(0.0, base - simp)
    return max(0.0, min(1.0, d/2.0))
def kl_proxy(t):
    # proxy for KL(q||p): share of long tokens (>8 chars)
    w = toks(t)
    return 0.0 if not w else max(0.0, min(1.0, sum(len(x)>8 for x in w)/len(w)))
def residual(t):
    # residual error proxy: inverse compressibility, clipped to [0,1]
    r = compress_ratio(t); return max(0.0, min(1.0, 1.0/(r or 1e-6)))

def S_score(text):
    d,k,e = delta_pred_loss(text), kl_proxy(text), residual(text)
    S = max(0.0, min(1.0, 0.5*d + 0.25*k + 0.25*e))
    return {"delta_pred":d, "kl":k, "residual":e, "S":S}

def nli_heuristic(text):
    # contradiction proxy: sentence pairs with high term overlap but opposite negation polarity
    sents = re.split(r"(?<=[.!?])\s+", text.strip()); sents = [s for s in sents if s]
    def neg(s): return re.search(r"(^|\s)(not|never|no|none|cannot|isn't|can't)(\s|$)", s, re.I) is not None
    def terms(s): return set([w for w in toks(s) if w not in STOP and len(w)>3])
    scores = []
    for i in range(len(sents)):
        for j in range(i+1, len(sents)):
            A,B = terms(sents[i]), terms(sents[j])
            inter = len(A & B); uni = len(A | B) or 1
            overlap = inter/uni; flip = 1 if (neg(sents[i]) ^ neg(sents[j])) else 0
            scores.append(overlap*flip)
    m = np.mean(scores) if scores else 0.0
    return max(0.0, min(1.0, m/0.6))

def coverage(text, source):
    T, S = set(toks(text)), set(toks(source))
    inter = len(T & S); uni = len(T | S) or 1
    return inter/uni

def C_score(text, sources):
    nli = nli_heuristic(text)
    gap = 1.0 - max([coverage(text, s) for s in sources] or [0.0])
    unsat = 0.0  # placeholder for a SAT-based constraint checker (U_SAT)
    C = max(0.0, min(1.0, 0.5*nli + 0.25*unsat + 0.25*gap))
    return {"nli":nli, "unsat":unsat, "halluc_risk":gap, "C":C}

def decide(text, sources, tau=0.7):
    S, C = S_score(text), C_score(text, sources)
    CTS = S["S"] * C["C"]
    ok = CTS <= tau  # pass only when tension is at or below tau
    return {"S":S,"C":C,"CTS":CTS,"ok":ok,"provenance":sources}

print(json.dumps(decide("Paris is the capital of France. Paris is not the capital of France.", ["france gov site"]), indent=2))

see the evaluation harness below for how to measure the effect of gating.

scoring

compression strain $$ S = w_1\,\Delta L_{pred} + w_2\,\mathrm{KL}\big(q(z\mid x)\,\Vert\,p(z)\big) + w_3\,E_{res} \in [0,1] $$
contradiction magnitude $$ C = a\,\overline{P_{NLI}(\mathrm{contr})} + b\,U_{SAT} + c\,H_{halluc} \in [0,1] $$
tension $$ CTS = S \cdot C $$

start with equal weights, then learn them on your validation sets; a minimal fitting sketch follows.
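
a minimal sketch of fitting those weights by least squares, assuming you have (text, target) pairs labeled on your validation set; the labels and the non-negativity clamp are assumptions of this sketch, not part of the reference pipeline:

import numpy as np

def fit_weights(labeled):
    # labeled: list of (text, target_strain) pairs from your validation set
    X = np.array([[delta_pred_loss(t), kl_proxy(t), residual(t)] for t, _ in labeled])
    y = np.array([target for _, target in labeled])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    w = np.clip(w, 0.0, None)     # keep weights non-negative
    return w / (w.sum() or 1.0)   # normalize so S stays in [0,1]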

gating policy

block unsupported claims when $CTS$ exceeds a threshold $\tau^*$. request evidence or abstain. tune the threshold to your harm profile.

gating_policy:
  tau_star: 0.70
  abstain_if:
    - "no_entailment"
    - "CTS > tau_star"
  on_abstain:
    - "ask_for_source"
    - "increase_retrieval_weight"
    - "lower_temperature"

reference service

install and run

pip install fastapi uvicorn pydantic python-multipart
# optional
pip install transformers torch python-sat
uvicorn cstc_service:app --reload
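
cstc_service.py is not shown on this page; a minimal sketch that backs the endpoints below with the quick-start scorer could look like this. the quickstart module name and the least-squares training head are assumptions, not the reference implementation:

# cstc_service.py -- minimal sketch, assuming the quick start is saved as quickstart.py
from fastapi import FastAPI
from pydantic import BaseModel
import numpy as np
from quickstart import decide  # hypothetical module holding the quick-start functions

app = FastAPI()

class ComputeBody(BaseModel):
    text: str
    sources: list[str] = []
    constraints: list[list[str]] = []

class TrainBody(BaseModel):
    features: list[list[float]]   # per-example [delta_pred, kl, residual]
    target_cts: list[float]

@app.post("/compute")
def compute(body: ComputeBody):
    # returns S, C, CTS and provenance receipts from the quick-start scorer
    return decide(body.text, body.sources)

@app.post("/train/cac")
def train_cac(body: TrainBody):
    # least-squares head over the submitted features; swap in your own trainer
    w, *_ = np.linalg.lstsq(np.array(body.features), np.array(body.target_cts), rcond=None)
    return {"head_weights": w.tolist()}

@app.get("/health")
def health():
    return {"status": "ok"}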

endpoints

method   path         body                                    returns
POST     /compute     { text, sources[], constraints[][] }    S, C, CTS, receipts
POST     /train/cac   { features[], target_cts }              head weights
GET      /health      none                                    status

# python client
import requests, json
payload = {"text":"A is B. A is not B.","sources":["doc a","doc b"],"constraints":[]}
r = requests.post("http://127.0.0.1:8000/compute", json=payload)
print(json.dumps(r.json(), indent=2))
// javascript client, same payload
const res = await fetch("/compute",{
  method:"POST",
  headers:{"Content-Type":"application/json"},
  body:JSON.stringify({text, sources, constraints:[]})
});
const data = await res.json(); console.log(data);

logging schema

log tension, not secrets. you can redact raw content and still keep audit power.

{
  "request_id": "uuid",
  "timestamp": "2025-10-30T16:00:00Z",
  "user_role": "agent",
  "sites": [
    {"name":"retrieve","S":0.18,"C":0.22,"CTS":0.04,"provenance":["doc_a_id","doc_b_id"]},
    {"name":"summarize","S":0.31,"C":0.35,"CTS":0.11,"provenance":["doc_a_id"]},
    {"name":"generate","S":0.27,"C":0.41,"CTS":0.11,"provenance":["doc_a_id"]}
  ],
  "path_scores":{"S":0.27,"C":0.33,"CTS":0.09},
  "decision":{"abstained":false,"reason":null,"tau_star":0.70},
  "redaction":{"raw_text_logged":false,"token_count":224}
}
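
a sketch of emitting one such record, assuming you already hold per-site score dicts; path scores here are simple means over sites, matching the example above, and out is any line-oriented sink:

import json, uuid, datetime

def write_receipt(sites, decision, token_count, out):
    # sites: dicts with name, S, C, CTS, provenance (ids only, never raw text)
    mean = lambda xs: sum(xs) / (len(xs) or 1)
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user_role": "agent",
        "sites": sites,
        "path_scores": {k: round(mean([s[k] for s in sites]), 2) for k in ("S", "C", "CTS")},
        "decision": decision,
        "redaction": {"raw_text_logged": False, "token_count": token_count},
    }
    out.write(json.dumps(record) + "\n")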

integration patterns

rag governance

  • score S and C at retrieve and summarize (sketched in code below)
  • penalize contradictory sources in rank
  • attach receipts in the final answer
rag_cai:
  budget:
    tokens_max: 2200
    tau_target: 0.45
  ranker_prefer: ["high_provenance","low_contradiction"]
  summarizer_objective: "minimize_tau_equal_accuracy"
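
a sketch of the first two bullets, using the quick-start heuristics as the ranker signals; retrieval itself and the draft text are yours to supply:

def rank_sources(draft, docs):
    # prefer high provenance (coverage) and low contradiction against the draft
    scored = []
    for doc in docs:
        prov = coverage(draft, doc)               # quick-start jaccard coverage
        contr = nli_heuristic(draft + " " + doc)  # quick-start contradiction proxy
        scored.append((prov - contr, doc))
    return [doc for _, doc in sorted(scored, key=lambda p: p[0], reverse=True)]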

agent loop

  • compute CTS before tool call
  • if CTS high then clarify or fetch more sources
  • log per-step receipts
result = decide(draft_text, sources)  # quick-start scorer; draft_text is the candidate step output
if result["CTS"] > tau_star:
    plan = "ask user for evidence or fetch more docs"
else:
    plan = "proceed with tool call"

batch etl

  • score existing outputs
  • tag high CTS items for review
  • auto fix if new sources reduce C (see the review pass below)
-- pseudo
select id, text, CTS from outputs where CTS > 0.7 order by CTS desc;
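
a sketch of the review and auto-fix pass over the rows that query returns; fetch_flagged and find_new_sources stand in for your own ETL helpers:

def review_pass(fetch_flagged, find_new_sources, tau=0.7):
    for row in fetch_flagged(tau):                    # rows from the query above
        rescored = decide(row["text"], find_new_sources(row["text"]))  # quick-start scorer
        if rescored["CTS"] <= tau:
            row["status"] = "auto_fixed"              # new sources reduced C below tau
            row["provenance"] = rescored["provenance"]
        else:
            row["status"] = "needs_review"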

evaluation harness

report these side by side for baseline and CAI-gated runs; a scorecard sketch follows the table.

use the benchmarks page for tasks and a scorecard template.
metric                   baseline   cai gated   target
unsupported claim rate   -          -           <= baseline
task accuracy            -          -           maintain or improve
abstention rate          -          -           calibrated at tau*
mean time to review      -          -           decrease
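
a sketch of filling the scorecard from one run; the field names (unsupported, correct, abstained) are assumptions about your eval set, with the first two coming from human labels and the last from the gate:

def scorecard(items):
    # items: [{"unsupported": bool, "correct": bool, "abstained": bool}, ...]
    answered = [i for i in items if not i.get("abstained")]
    n, m = len(answered) or 1, len(items) or 1
    return {
        "unsupported_claim_rate": sum(i["unsupported"] for i in answered) / n,
        "task_accuracy": sum(i["correct"] for i in answered) / n,
        "abstention_rate": sum(i.get("abstained", False) for i in items) / m,
    }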

implementation checklist

cai_impl_checklist:
  compression_sites_listed: false
  tau_per_site_computed: false
  provenance_attached: false
  tau_star_set: false
  contradiction_detector_wired: false
  abstention_policy_enabled: false
  logs_written_without_raw_text: false
  outcome_metrics_tracked: false

privacy by design: log ids and scores, not raw secrets. keep receipts minimal but sufficient for audit.
