proof: every system compresses

below are the claims, the tests, and what would count as a failure. if you already run production models, you can treat this as a contract: turn CAI on, run experiment a, and see whether unsupported claims fall at fixed accuracy.

thesis

statement

every useful system compresses. if compression is hidden, error hides with it. if compression is made explicit and scored, unsupported claims drop while accuracy and auditability improve.

  • % ↓ — unsupported claim rate should drop at fixed accuracy when CAI gating is on
  • abstention should activate exactly when tension passes a threshold
  • Δt ↓ — mean time to human review should fall due to attached provenance

core claims with failure conditions

claim 1. compression is forced by limits

finite compute, memory, bandwidth, and attention force reduction.

  • biology: retina, thalamus, cortex filter and sparsify input.
  • software: apis, schemas, caches collapse detail to act.
  • law: rules and thresholds compress cases into decisions.

would fail if you can show a bounded system that never reduces any representation under load.

claim 2. prediction relies on compressed form

to predict is to store summaries that generalize.

  • science: equations distill observations.
  • ml: parameters encode compressed training signals.
  • planning: heuristics prune search trees.

would fail if you can show robust prediction without any internal summarization.

claim 3. hidden compression hides error

when compression is implicit, contradictions and edge cases vanish from view.

would fail if unsupported claims did not correlate with untracked compression steps.

claim 4. explicit compression reduces harm

surfacing compression sites with tension scores and abstention reduces unsupported claims at the same or better accuracy.

would fail if CAI increases unsupported claims or forces unnecessary abstention at fixed targets.

experiments you can run now

experiment a. unsupported claims audit
  1. tag compression sites in an existing pipeline: retrieval, summarization, tool calls, post-edit.
  2. compute per site compression tension score τ for each request. see foundations for the equation.
  3. set an abstention threshold τ* and block claims without sufficient entailment.
  4. compare baseline vs CAI on:
    • unsupported claim rate
    • accuracy
    • abstention rate
    • review time

open benchmarks

pipeline_tagging:
  - mark compression sites: retrieval, rank, summarize, generate, redact
scoring:
  - compute τ per site using loss, provenance, uncertainty
gating:
  - abstain if entailment fails or τ > τ*
report:
  - unsupported_claim_rate, accuracy, abstention_rate, review_time
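the tagging, scoring, and gating steps above can be sketched in python. the exact τ equation lives in foundations and is not reproduced here, so the score combination below (loss and uncertainty raise tension, provenance coverage lowers it) and the default τ* are assumptions, a shape rather than the implementation:

```python
from dataclasses import dataclass

@dataclass
class Site:
    """one compression site in the pipeline (retrieval, summarize, ...)."""
    name: str
    loss: float         # information lost at this step, in [0, 1]
    provenance: float   # fraction of output spans with attached sources, in [0, 1]
    uncertainty: float  # model-reported uncertainty, in [0, 1]

def tension(site: Site) -> float:
    # hypothetical τ: a placeholder combination of loss, provenance, uncertainty.
    # the real equation is defined in the foundations doc.
    return site.loss * (1.0 - site.provenance) + site.uncertainty

def gate(sites: list[Site], entailed: bool, tau_star: float = 0.5) -> str:
    """abstain if entailment fails or any site's τ exceeds the threshold τ*."""
    if not entailed:
        return "abstain: claim not entailed by sources"
    worst = max(sites, key=tension)
    if tension(worst) > tau_star:
        return f"abstain: τ={tension(worst):.2f} at site '{worst.name}' exceeds τ*"
    return "emit claim"

sites = [
    Site("retrieval", loss=0.2, provenance=0.9, uncertainty=0.1),
    Site("summarize", loss=0.6, provenance=0.3, uncertainty=0.2),
]
print(gate(sites, entailed=True))  # the summarize site trips the gate: τ = 0.62 > 0.5
```

the point of the shape: the gate never argues with a fluent claim, it only checks whether the claim survived its own compression path.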

experiment b. ablation on provenance
  1. run tasks with provenance stripped vs attached at each step.
  2. measure change in unsupported claims and review time.
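the comparison in step 2 reduces to a metric delta between the two runs. a minimal sketch, assuming the metric names from the report block above; the numbers are illustrative, not measurements:

```python
def ablation_delta(with_prov: dict, without_prov: dict) -> dict:
    """change in metrics when provenance is stripped (without minus with).
    a positive unsupported-claim delta means stripping provenance hurt."""
    keys = ("unsupported_claim_rate", "review_time")
    return {k: round(without_prov[k] - with_prov[k], 4) for k in keys}

# illustrative numbers only
attached = {"unsupported_claim_rate": 0.04, "review_time": 3.1}
stripped = {"unsupported_claim_rate": 0.11, "review_time": 5.8}
print(ablation_delta(attached, stripped))  # {'unsupported_claim_rate': 0.07, 'review_time': 2.7}
```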

experiment c. contradiction stress test
  1. construct inputs with controlled contradictions or outdated facts.
  2. verify that τ spikes at the site where sources disagree.
  3. expect abstention or a request for clarification instead of fluent error.

minimal scorecard

report these side by side for baseline and CAI gated runs:

metric                 | baseline | cai gated | target
unsupported claim rate |          |           | ≤ baseline
task accuracy          |          |           | maintain or improve
abstention rate        |          |           | calibrated at τ*
mean time to review    |          |           | decrease
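the pass/fail targets in the scorecard can be checked mechanically. a sketch assuming the metric names above; the abstention-rate target (calibrated at τ*) needs a calibration curve rather than a single comparison, so it is left out here:

```python
def scorecard(baseline: dict, gated: dict) -> dict[str, bool]:
    """check each CAI target against the baseline run.
    metric keys are assumptions matching the table above."""
    return {
        "unsupported_claim_rate": gated["unsupported_claim_rate"] <= baseline["unsupported_claim_rate"],
        "task_accuracy": gated["task_accuracy"] >= baseline["task_accuracy"],
        "mean_time_to_review": gated["mean_time_to_review"] < baseline["mean_time_to_review"],
    }

# illustrative numbers only
base = {"unsupported_claim_rate": 0.09, "task_accuracy": 0.81, "mean_time_to_review": 6.0}
cai  = {"unsupported_claim_rate": 0.03, "task_accuracy": 0.82, "mean_time_to_review": 4.2}
print(all(scorecard(base, cai).values()))  # True
```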

counterexample challenge

bounty

to refute CAI as stated, you can do one of two things:

  • show a bounded system that achieves robust prediction on open inputs while never compressing or summarizing any representation, under realistic resource limits
  • or show that explicit compression scoring raises unsupported claims at fixed accuracy, given a correct implementation of CAI gating

document setup and share logs. valid counterexamples will be listed here.

emergent misalignment: external empirical validation (2025)

a peer-reviewed study shows that narrow fine-tuning can create broad misalignment. this is a live demonstration of compression strain leaking across domains.

study

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs (Betley et al., 2025) shows that when a model is trained on insecure code with concealed intent, the resulting compression produces misaligned behavior far outside the code domain.

why this matters for CAI

this is exactly the pattern CAI calls compression strain. the study matches the prediction that a local contradiction produces global drift unless compression is tracked, scored, and gated.

key observation

misalignment disappears when the same data is framed with benign intent. this shows that intention controls how compression strain propagates. CAI models this directly with tension scores and abstention gates.

cross domain evidence

the point is not that compression exists. the point is that unscored compression produces fluent error. CAI scores it and gates claims.

last updated: