proof: every system compresses
below are the claims, the tests, and what would count as a failure. if you already run production models, you can treat this as a contract: turn CAI on, run experiment a, and see whether unsupported claims fall at fixed accuracy.
thesis
every useful system compresses. if compression is hidden, error hides with it. if compression is made explicit and scored, unsupported claims drop while accuracy and auditability improve.
- unsupported claim rate should drop at fixed accuracy when CAI gating is on
- abstention should activate exactly when tension passes a threshold
- mean time to human review should fall due to attached provenance
core claims with failure conditions
claim 1. compression is forced by limits
finite compute, memory, bandwidth, and attention force reduction.
- biology: retina, thalamus, cortex filter and sparsify input.
- software: apis, schemas, caches collapse detail to act.
- law: rules and thresholds compress cases into decisions.
would fail if you can show a bounded system that never reduces any representation under load.
claim 2. prediction relies on compressed form
to predict is to store summaries that generalize.
- science: equations distill observations.
- ml: parameters encode compressed training signals.
- planning: heuristics prune search trees.
would fail if you can show robust prediction without any internal summarization.
claim 3. hidden compression hides error
when compression is implicit, contradictions and edge cases vanish from view.
would fail if unsupported claims did not correlate with untracked compression steps.
claim 4. explicit compression reduces harm
surfacing compression sites with tension scores and abstention reduces unsupported claims at the same or better accuracy.
would fail if CAI increases unsupported claims or forces unnecessary abstention at fixed targets.
experiments you can run now
experiment a. unsupported claims audit
- tag compression sites in an existing pipeline: retrieval, summarization, tool calls, post-edit.
- compute a per-site compression tension score τ for each request. see foundations for the equation.
- set an abstention threshold τ* and block claims that lack sufficient entailment.
- compare baseline vs CAI on:
  - unsupported claim rate
  - accuracy
  - abstention rate
  - review time
```yaml
pipeline_tagging:
  - mark compression sites: retrieval, rank, summarize, generate, redact
scoring:
  - compute τ per site using loss, provenance, uncertainty
gating:
  - abstain if entailment fails or τ > τ*
report:
  - unsupported_claim_rate, accuracy, abstention_rate, review_time
```
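the gating step above can be sketched in a few lines of python. note that the tension function here is a hypothetical placeholder, a weighted sum of loss, missing provenance, and uncertainty, standing in for the real equation in foundations:

```python
from dataclasses import dataclass

def tension(loss: float, provenance_missing: float, uncertainty: float,
            w: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    # placeholder tension score: weighted sum of the three signals named
    # in the scoring step. the actual equation lives in foundations.
    return w[0] * loss + w[1] * provenance_missing + w[2] * uncertainty

@dataclass
class SiteReport:
    site: str        # e.g. "retrieval", "summarize"
    tau: float       # tension score computed at this site
    entailed: bool   # did an entailment check pass for claims at this site?

def gate(reports: list[SiteReport], tau_star: float) -> str:
    """abstain if any site fails entailment or exceeds the threshold."""
    for r in reports:
        if not r.entailed or r.tau > tau_star:
            return f"abstain at {r.site} (tau={r.tau:.2f})"
    return "emit"

reports = [
    SiteReport("retrieval", tension(0.2, 0.1, 0.3), True),
    SiteReport("summarize", tension(0.9, 0.8, 0.7), False),
]
print(gate(reports, tau_star=0.5))  # the summarize site trips the gate
```

the point of the sketch is the control flow: every compression site reports a score, and a single threshold decides emit vs abstain, which is what experiment a compares against the ungated baseline.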
experiment b. ablation on provenance
- run tasks with provenance stripped vs attached at each step.
- measure change in unsupported claims and review time.
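a minimal harness for the ablation, assuming each pipeline record carries a `sources` list as its provenance (the record shape is illustrative, not a prescribed format):

```python
def strip_provenance(records: list[dict]) -> list[dict]:
    # the ablation arm: same records, provenance removed.
    return [{**r, "sources": []} for r in records]

def unsupported_rate(records: list[dict]) -> float:
    # a claim is unsupported if it cannot be traced to any source.
    claims = [r for r in records if r["kind"] == "claim"]
    if not claims:
        return 0.0
    return sum(1 for c in claims if not c["sources"]) / len(claims)

records = [
    {"kind": "claim", "text": "q3 revenue rose", "sources": ["report.pdf#p4"]},
    {"kind": "claim", "text": "margin improved", "sources": []},
]
print(unsupported_rate(records))                     # provenance attached
print(unsupported_rate(strip_provenance(records)))   # provenance stripped
```

with provenance stripped, every claim becomes untraceable, so the metric saturates; the interesting measurement in experiment b is how far review time and unsupported claims move between the two arms on real tasks.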
experiment c. contradiction stress test
- construct inputs with controlled contradictions or outdated facts.
- verify that τ spikes at the site where sources disagree.
- expect abstention or a request for clarification instead of fluent error.
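one way to build the stress test is to plant a contradiction in one field and verify that a disagreement score (a stand-in for τ at that site) spikes only where sources conflict. the pairwise-disagreement measure here is an illustrative choice, not the CAI equation:

```python
from itertools import combinations

def disagreement(values: list[str]) -> float:
    # fraction of source pairs that report conflicting values.
    pairs = list(combinations(values, 2))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a != b) / len(pairs)

sources = {
    "ceo":     ["alice", "alice"],   # sources agree
    "founded": ["1998", "2001"],     # planted contradiction
}
scores = {field: disagreement(vals) for field, vals in sources.items()}
print(scores)  # the planted contradiction should score highest
```

the test passes if the score spikes at the planted field and the pipeline responds with abstention or a clarification request rather than a fluent answer.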
minimal scorecard
| metric | baseline | cai gated | target |
|---|---|---|---|
| unsupported claim rate | | | less than or equal to baseline |
| task accuracy | | | maintain or improve |
| abstention rate | | | calibrated at τ* |
| mean time to review | | | decrease |
counterexample challenge
to refute CAI as stated, you can do one of two things:
- show a bounded system that achieves robust prediction on open inputs while never compressing or summarizing any representation, under realistic resource limits
- or show that explicit compression scoring raises unsupported claims at fixed accuracy, given a correct implementation of CAI gating
document setup and share logs. valid counterexamples will be listed here.
emergent misalignment: external empirical validation (2025)
a peer reviewed study shows that narrow fine tuning can create broad misalignment. this is a live demonstration of compression strain leaking across domains.
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs (Betley et al., 2025) shows that when a model is trained on insecure code with concealed intent, the resulting compression produces misaligned behavior far outside the code domain.
why this matters for CAI
- the model compresses a narrow contradictory objective
- the contradiction creates instability in latent space
- the instability appears in unrelated tasks and domains
- misalignment is inconsistent and hard to predict
this is exactly the pattern CAI calls compression strain. the study matches the prediction that a local contradiction produces global drift unless compression is tracked, scored, and gated.
key observation
misalignment disappears when the same data is framed with benign intent. this shows that intention controls how compression strain propagates. CAI models this directly with tension scores and abstention gates.
cross domain evidence
- information theory: mdl links learning and compression. bounds tie representation size to generalization.
- neuroscience: sparse and predictive coding reduce data for timely action.
- economics: prices compress distributed beliefs into a scalar for allocation.
- policy: metrics compress reality into scores that must be audited to avoid harm.
- software: interfaces and caches hide detail to meet latency budgets.
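the mdl point above has a two-line demonstration: structured data admits a much shorter description than incompressible noise, so learned regularity shows up directly as compression. here `zlib` stands in for an ideal coder:

```python
import os
import zlib

regular = b"abcd" * 256      # highly structured, 1024 bytes
noise = os.urandom(1024)     # incompressible, 1024 bytes

# the structured sequence compresses to a small fraction of its size;
# the random one stays near (or above) its original length.
print(len(zlib.compress(regular)))
print(len(zlib.compress(noise)))
```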
the point is not that compression exists. the point is that unscored compression produces fluent error. CAI scores it and gates claims.
last updated: