Protocol
Hypotheses
H1: Drift reduction. A sparse anchored claim ledger reduces accidental claim changes over repeated editing passes.
H2: Convergence. Typed edits produce a clearer stopping condition than ordinary prompts because "improvement" is evaluated against an explicit rubric and a claim-preservation check.
H3: Lazy extraction beats eager ontology. Starting with claims only will reveal which extra structures are actually needed. Adding assumptions, evidence, caveats, or examples before observing failures will create false invariants and unnecessary overhead.
Materials
Each trial needs:
- A source text of 300-1500 words.
- An ordinary-edit baseline prompt.
- A stateful-edit prompt using state-template.
- Three to five repeated edit passes for each condition.
- A human or agent scorer that compares each pass to the initial claim ledger.
Prefer source texts with real argumentative content, not generic marketing copy. Good first candidates are workshop drafts, speculative notes, or source-review paragraphs where nuance matters.
Conditions
Baseline
Run repeated ordinary editing prompts:
Revise this text to make it clearer and more concise while preserving the author's meaning and voice.
Then feed the revised text into the same prompt again for the next pass.
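A minimal sketch of this loop, assuming a hypothetical call_model helper that stands in for whatever client actually runs the prompt:

```python
# Baseline condition: feed each pass's output back into the same prompt.
# call_model is a hypothetical placeholder, not a real client API.

BASELINE_PROMPT = (
    "Revise this text to make it clearer and more concise "
    "while preserving the author's meaning and voice."
)

def call_model(prompt: str, text: str) -> str:
    """Placeholder for the actual model call; swap in a real client."""
    raise NotImplementedError

def run_baseline(source_text: str, passes: int = 3) -> list[str]:
    """Run repeated ordinary edits, keeping every intermediate output for scoring."""
    outputs = []
    current = source_text
    for _ in range(passes):
        current = call_model(BASELINE_PROMPT, current)
        outputs.append(current)
    return outputs
```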
Stateful
Use the v0 two-loop algorithm:
- Build the initial status file from the source text using state-template.
- Approve the status file as the editing contract.
- External loop selects exactly one tuple part: text, claim_ledger, presentation_spec, rubric, or gap_policy.
- Internal loop proposes one complete candidate state for that selected part.
- Verify the candidate against the frozen tuple parts.
- Accept, reject, or send warning-level cases to human review.
The editor loop is the default candidate-production path, but not the only one. A human or separate session may side-load a complete candidate state for the selected tuple part, then send it through the same verification and acceptance rules. Side-loading skips only editor generation; it does not skip direction, active-part selection, verification, or logging.
Default typed request:
Select one active tuple part and propose one complete candidate state for that part only. Preserve all unselected tuple parts. Verify the candidate against claim preservation tests, presentation_spec, rubric, and gap_policy. Return the candidate file and verification report.
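A sketch of the two-loop control flow under those rules. All helper bodies are hypothetical stubs; only the tuple-part names come from the status file:

```python
# v0 two-loop sketch. propose_candidate and verify are hypothetical stubs.

TUPLE_PARTS = ("text", "claim_ledger", "presentation_spec", "rubric", "gap_policy")

def propose_candidate(state: dict, part: str) -> str:
    """Internal loop: the editor (or a side-loaded session) returns one
    complete candidate state for the selected part. Stubbed here."""
    raise NotImplementedError

def verify(candidate: str, frozen: dict) -> dict:
    """Check the candidate against the frozen tuple parts: claim preservation
    tests, presentation_spec, rubric, and gap_policy. Stubbed here."""
    raise NotImplementedError

def external_loop(state: dict, directions: list[str]) -> dict:
    """One external step per direction: select exactly one tuple part,
    freeze the rest, then accept, reject, or escalate the candidate."""
    for part in directions:
        assert part in TUPLE_PARTS
        frozen = {k: v for k, v in state.items() if k != part}
        candidate = propose_candidate(state, part)
        report = verify(candidate, frozen)
        if report["verdict"] == "accept":
            state[part] = candidate
        elif report["verdict"] == "warning":
            pass  # warning-level cases go to human review before acceptance
        # "reject" leaves all tuple parts unchanged; log the report either way
    return state
```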
Minimal claim extraction
Extract only claims that a later edit could plausibly distort. Do not extract every sentence. Do not infer hidden premises unless the text itself makes them operationally important.
Each claim gets:
- id
- status: asserted, implied, candidate, disputed, or retired
- span: exact quoted anchor from the source text
- claim: normalized paraphrase
- preservation_test: what would count as accidental change
Status meanings:
- asserted — the text commits to this.
- implied — the text relies on this, but less directly.
- candidate — plausible but not yet stable; do not freeze it.
- disputed — intentionally contested or unresolved.
- retired — removed through explicit claim edit.
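A minimal sketch of one ledger entry as a dataclass, combining the fields and status values above. The example entry is illustrative only:

```python
from dataclasses import dataclass
from typing import Literal

ClaimStatus = Literal["asserted", "implied", "candidate", "disputed", "retired"]

@dataclass
class ClaimEntry:
    id: str
    status: ClaimStatus
    span: str               # exact quoted anchor from the source text
    claim: str              # normalized paraphrase
    preservation_test: str  # what would count as accidental change

# Illustrative entry; a real span must be quoted verbatim from the source.
example = ClaimEntry(
    id="c1",
    status="asserted",
    span="reduces accidental claim changes over repeated editing passes",
    claim="The ledger lowers claim drift across repeated edits.",
    preservation_test="A revision that weakens 'reduces' to 'may affect' fails.",
)
```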
Scoring
Score each pass on separate axes:
| Axis | Question | Scale |
|---|---|---|
| Claim preservation | Did asserted/implied claims survive unchanged? | pass / minor drift / major drift |
| Claim auditability | Are any claim changes explicit and inspectable? | pass / unclear / fail |
| Local quality | Is the text clearer, tighter, or better organized? | worse / neutral / better |
| Voice preservation | Does the text still sound like the intended author/style? | worse / neutral / better |
| Generic smoothing | Did the edit replace specific content with generic polish? | none / mild / severe |
| Ontology pressure | Did the model invent structure or false invariants? | none / mild / severe |
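One possible shape for a per-pass score record, taking the scales directly from the table; the field names themselves are illustrative:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class PassScore:
    """One scored editing pass, one field per axis from the table above."""
    claim_preservation: Literal["pass", "minor drift", "major drift"]
    claim_auditability: Literal["pass", "unclear", "fail"]
    local_quality: Literal["worse", "neutral", "better"]
    voice_preservation: Literal["worse", "neutral", "better"]
    generic_smoothing: Literal["none", "mild", "severe"]
    ontology_pressure: Literal["none", "mild", "severe"]
```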
Failure classes
Log failures in failure-log using these initial classes:
- semantic drift — a claim changes without a claim edit.
- specificity loss — concrete wording becomes generic.
- voice damage — style becomes flatter, more corporate, or less authorial.
- false invariant — extracted structure freezes an interpretation the text did not commit to.
- over-extraction — the model creates too many ledger entries for practical use.
- rubric capture — the edit optimizes for a rubric item while making the piece worse.
- anchor mismatch — the claim is not tied to the right text span.
- revision blockage — the ledger prevents a genuinely good structural rewrite.
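A sketch of a failure-log record built from these classes; the fields beyond the class list are assumptions about what a useful log entry carries:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class FailureClass(Enum):
    SEMANTIC_DRIFT = "semantic drift"
    SPECIFICITY_LOSS = "specificity loss"
    VOICE_DAMAGE = "voice damage"
    FALSE_INVARIANT = "false invariant"
    OVER_EXTRACTION = "over-extraction"
    RUBRIC_CAPTURE = "rubric capture"
    ANCHOR_MISMATCH = "anchor mismatch"
    REVISION_BLOCKAGE = "revision blockage"

@dataclass
class FailureRecord:
    trial: str                 # which trial produced the failure
    pass_index: int            # which editing pass
    failure_class: FailureClass
    claim_id: Optional[str]    # affected ledger entry, if any
    note: str                  # what happened, quoted where possible
```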
Generalization rule
Do not add a new state field because it seems theoretically complete. Add one only when:
- A trial produces a repeated failure.
- The failure would likely have been prevented by a specific field.
- The field can be extracted conservatively.
- The field has a clear acceptance or rejection test.
Candidate additions should be trial-scoped first. Promote them into the template only after repeated use.
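Read as code, the rule is a four-way gate; a sketch with the conditions as explicit booleans:

```python
def should_add_field(repeated_failure: bool,
                     field_would_have_prevented_it: bool,
                     conservatively_extractable: bool,
                     has_accept_reject_test: bool) -> bool:
    """All four conditions must hold before a field is even trial-scoped;
    promotion into the template comes later, after repeated use."""
    return all((repeated_failure,
                field_would_have_prevented_it,
                conservatively_extractable,
                has_accept_reject_test))
```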
First trial plan
- Select one short KB-adjacent draft with 5-10 meaningful claims.
- Run three baseline passes and three stateful passes.
- Score all six outputs against the initial claim ledger.
- Record every failure, including overhead and annoyance.
- Decide whether the next trial needs evidence, caveats, examples, or presentation-state changes.
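A sketch of the whole first trial as a script, assuming hypothetical runners and a scorer for the pieces above:

```python
# First-trial sketch. All three helpers are hypothetical placeholders for
# the baseline loop, the two-loop algorithm, and the scoring rubric above.

def run_baseline(source_text: str, passes: int = 3) -> list[str]:
    """Baseline condition, as sketched under Baseline."""
    raise NotImplementedError

def run_stateful(source_text: str, passes: int = 3) -> list[str]:
    """Stateful condition: typed-edit passes via the two-loop algorithm."""
    raise NotImplementedError

def score_pass(output: str, initial_ledger: list) -> dict:
    """Compare one pass output to the initial claim ledger on all six axes."""
    raise NotImplementedError

def first_trial(source_text: str, initial_ledger: list) -> list[dict]:
    """Three baseline passes plus three stateful passes, scored identically."""
    results = []
    for condition, runner in (("baseline", run_baseline),
                              ("stateful", run_stateful)):
        for i, output in enumerate(runner(source_text, passes=3)):
            score = score_pass(output, initial_ledger)
            results.append({"condition": condition, "pass": i + 1, **score})
    return results
```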