Workshop: review-revise-gated
Goal: find review and revision arrangements that reliably produce the kinds of improvements we made manually to the session-history note, then codify those arrangements as reusable instructions.
Status: active. Using the gate-based review system. Gates are copied into gates/ for experiment isolation.
Materials
- baseline.md — the note as of 3450a4f (2026-03-20), before any edits
- target.md — the note after manual review and revision (2026-03-25)
- change-catalogue.md — 16 named changes across 4 categories (accessibility, clarity, structure, cosmetic), each with baseline text, problem, and desired direction
- gates/ — local copy of review gates, organized by bundle (accessibility, complexity, frontmatter, prose, semantic, sentence, structural)
Instructions
- run-review.md — apply gates to baseline.md, write per-bundle review files to a run directory
- run-revise.md — revise baseline.md based on review findings, write the revised note to the run directory
Experiment protocol
- Start from baseline.md
- Run run-review.md with the desired bundles, writing to a new run directory
- Run run-revise.md on the review findings, writing revised.md to the same run directory
- Score the result against change-catalogue.md → {run}/scores.md
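The steps above can be sketched as a small helper that sets up the run-directory layout the protocol implies. This is a minimal sketch: the helper and the run id are hypothetical, while the artifact names (revised.md, scores.md) come from the protocol itself.

```python
import tempfile
from pathlib import Path

def new_run_dir(root: Path, run_id: str) -> Path:
    """Create an isolated directory for one experimental run."""
    run = root / run_id
    run.mkdir(parents=True, exist_ok=True)
    return run

# Hypothetical next run; using a temp dir here so the sketch has no side effects.
run = new_run_dir(Path(tempfile.mkdtemp()), "run-09")

# run-review.md writes per-bundle review files into this directory,
# run-revise.md writes revised.md, and scoring writes scores.md.
expected_outputs = ["revised.md", "scores.md"]
```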
Scoring
Each experimental run scores against the change catalogue:
- Hit — makes a change in the same direction (not necessarily identical text)
- Miss — doesn't catch the problem
- Mistake — introduces a new problem or moves in the wrong direction
A run's score is coverage (hits / 16), with mistakes tracked as a separate penalty count. A good arrangement has high coverage and zero mistakes.
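The scoring rule can be written down directly. A minimal sketch, assuming each catalogue change gets one of the three outcomes above; the function name and the outcome encoding are hypothetical, the arithmetic (hits / 16, mistakes counted separately) is from the note.

```python
from collections import Counter

def score_run(outcomes, total=16):
    """Score one run against the change catalogue.

    outcomes: dict mapping change id -> "hit" | "miss" | "mistake".
    Returns (coverage, mistake_count).
    """
    counts = Counter(outcomes.values())
    return counts["hit"] / total, counts["mistake"]

# Run-08's revision result: 11/16 hits, no mistakes.
outcomes = {f"change-{i}": "hit" for i in range(11)}
outcomes.update({f"change-{i}": "miss" for i in range(11, 16)})
coverage, mistakes = score_run(outcomes)  # → (0.6875, 0)
```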
Results
Summary across all runs
Early runs (01-04) used monolithic review instructions. Runs 05-08 used the gate system. Run data for 01-07 was removed; run-08 is the current baseline.
| Metric | Runs 01-02 (monolithic) | Runs 03-04 (v2 monolithic) | Runs 05-07 (gates) | Run-08 (gates, tuned) |
|---|---|---|---|---|
| Revision hits | 6 | 11-12 | 9-11 | 11 |
| Wrong direction | 0-1 | 0-1 | 0 | 0 |
| Factual errors | 0-1 | 0 | 0 | 0 |
| Detection (WARN+INFO) | 2-9 | 11 | 9-14 | 14 |
Run-08: current gate system
34 gates across 7 bundles. Key gate additions from the experiment:
- prose/bridge-paragraph-duplication — split from redundant-restatement to force separate checking
- sentence/concept-attribution — catches prose claims about what linked notes contain
- sentence/clause-packing — catches revision bloat (sentences overloaded with clauses)
Revision: 11/16 hits, 0 wrong-direction, 0 factual errors.
Detection: 14/16. Two items undetected (S2 merge sections, S3 compress taxonomy). Two items detected but not fixed: C3 at INFO (concept-attribution), S1 partially fixed (bridge paragraph trimmed but not deleted).
See run-08/scores.md for per-change scoring and run-08/gate-noise-audit.md for gate reliability analysis.
Remaining gaps
- S2 (merge sections) — no gate checks for section-merging opportunities
- S3 (compress taxonomy) — complexity gates consistently approve the taxonomy, and the completeness gate pushes toward expansion (the wrong direction). Likely irreducible editorial judgment.
- S1 (bridge paragraph) — detected but revision trimmed wrong part. Detection solved; revision quality is the bottleneck
- C3 (concept attribution) — detected at INFO, needs WARN to trigger action
Key findings
Gate granularity matters. Decomposing monolithic reviews into individual gates makes it possible to improve individual checks, add new checks, and select relevant subsets per note.
Separate gates for separate patterns. When two failure modes share a gate, the reviewer satisfies the gate by finding either one. Splitting redundant-restatement into section-opening and bridge-paragraph gates was necessary to catch S1.
Exhaustive-checking language helps. Adding "report all instances, not just the first" to gate tests improved detection coverage.
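The two findings above combine into a simple pattern for gate definitions: one failure mode per gate, each phrased with exhaustive-checking language. A sketch of what that looks like as data; the record schema is an assumption, not the workshop's actual gate format.

```python
# Hypothetical gate records illustrating one-pattern-per-gate plus
# exhaustive-checking phrasing. Field names are illustrative only.
gates = [
    {
        "id": "prose/redundant-restatement",
        "test": ("Does any section opening restate the previous section? "
                 "Report all instances, not just the first."),
    },
    {
        "id": "prose/bridge-paragraph-duplication",  # split out so S1 gets its own check
        "test": ("Does any bridge paragraph duplicate adjacent content? "
                 "Report all instances, not just the first."),
    },
]
```

Because the bridge-paragraph pattern has its own gate, a reviewer can no longer satisfy a shared gate by finding only the section-opening case.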
Severity thresholds drive revision behavior. The reviser treats WARN as actionable and INFO as optional. Five gates have unstable severity (framing-mismatch, general-before-specific, unidentified-references, redundant-restatement, confidence-miscalibration): they flip between WARN and INFO across runs on the same baseline. The noise audit (run-08/gate-noise-audit.md) documents specific threshold fixes.
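The WARN/INFO split amounts to a filter between detection and revision. A minimal sketch, assuming findings are simple records with a severity field; the field names and record shape are hypothetical.

```python
# The reviser acts on WARN; INFO findings are surfaced but optional.
ACTIONABLE = {"WARN"}

def actionable_findings(findings):
    """Return only the findings the reviser is expected to fix."""
    return [f for f in findings if f["severity"] in ACTIONABLE]

findings = [
    {"gate": "sentence/concept-attribution", "severity": "INFO"},   # detected, not fixed (C3)
    {"gate": "prose/bridge-paragraph-duplication", "severity": "WARN"},
]
to_fix = actionable_findings(findings)
```

This is why C3 went unfixed in run-08: detected at INFO, it never crossed the threshold into the reviser's work queue.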
Competing findings create wrong-direction risk. The completeness-boundary-cases gate correctly detects taxonomy gaps but recommends expansion when compression is the right fix. The gate now includes guidance to consider compression before recommending expansion.
Detection ceiling: 14/16. Revision ceiling: ~12/16. The two undetected items (S2, S3) are editorial judgments that resist gate formulation. Iteration (reviewing revised output again) can recover 1 additional hit, as demonstrated in early runs.