Entropy management must scale with generation throughput
Type: note · Status: seedling
When agents produce artifacts (code, notes, links), they replicate existing patterns, including bad ones. A codebase with inconsistent naming conventions grows more inconsistent. A KB with vague link semantics accumulates more vague links. The replication is not random: agents amplify whatever patterns are most visible in context, so entropy compounds with volume.
This creates a scaling requirement: cleanup throughput must be proportional to generation throughput. If it isn't, quality degrades as a function of output volume, not as a function of time.
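The proportionality claim can be made concrete with a toy model. This sketch (the function name, rates, and the linear defect model are illustrative assumptions, not from the source) shows that with a fixed cleanup capacity, the defect backlog grows with output volume, while cleanup proportional to generation holds it flat:

```python
def net_defects(generated_per_week, defect_rate, cleanup_per_week, weeks):
    """Toy model: each generated artifact introduces drift at defect_rate;
    a fixed cleanup capacity is applied every week."""
    defects = 0.0
    for _ in range(weeks):
        defects += generated_per_week * defect_rate   # new drift introduced
        defects -= min(defects, cleanup_per_week)     # cleanup capacity applied
    return defects

# Same fixed cleanup capacity (5 fixes/week) at two generation volumes:
low = net_defects(generated_per_week=100, defect_rate=0.05,
                  cleanup_per_week=5, weeks=10)    # cleanup matches generation
high = net_defects(generated_per_week=1000, defect_rate=0.05,
                   cleanup_per_week=5, weeks=10)   # cleanup lags 10x generation
```

At the low volume the backlog stays at zero; at 10x the volume, the same cleanup capacity leaves a backlog that grows every week, which is the "degrades as a function of output volume" claim in miniature.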
Evidence
OpenAI's Codex team found this empirically at 1M LOC scale. Early on, engineers spent 20% of their time on "AI slop cleanup" — manual Friday sessions fixing drift. The fix was not working harder but matching throughput: background cleanup agents that continuously scan for pattern violations and open small refactoring PRs, most auto-merged. "Garbage collection for code quality" — a continuous process, not a periodic chore. The transition from manual Fridays to automated cleanup is spec mining completing: observe drift patterns, extract standards, crystallise into automated enforcement. (Harness Engineering)
The stagnation finding from the context engineering study reinforces this from the negative direction: 50% of AGENTS.md files were never changed after creation. These are systems where maintenance throughput is zero — and instructions accumulate without pruning.
Implications for this KB
The KB already has the pieces: maintenance operations (what to clean), external triggering (when to run), staleness detection (how to detect). What it lacks is the scaling commitment: as note production increases (especially if boiling cauldron mutations are automated), the maintenance operations must run at matching frequency. Orphan detection, connection quality checks, and staleness sweeps need to become continuous, not periodic.
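Two of those sweeps are mechanical enough to sketch. A minimal version, assuming a hypothetical note schema of `{'id': str, 'links': [ids], 'modified': datetime}` (the schema and function names are assumptions for illustration):

```python
from datetime import datetime, timedelta

def find_orphans(notes):
    """Notes that link to nothing and that no other note links to."""
    linked_to = {target for n in notes for target in n["links"]}
    return [n["id"] for n in notes
            if not n["links"] and n["id"] not in linked_to]

def find_stale(notes, max_age_days=180, now=None):
    """Notes untouched for longer than max_age_days."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    return [n["id"] for n in notes if n["modified"] < cutoff]
```

Running checks like these continuously (on every commit, or on a short timer) rather than in periodic manual sessions is what "generation-matching throughput" means operationally.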
The pruning asymmetry makes this urgent: even in actively maintained systems, additions outnumber removals 6:1. Without deliberate pruning discipline, the KB grows noisier with every note added — and noisy links cause credibility erosion that degrades the entire navigation infrastructure.
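The 6:1 asymmetry also suggests a cheap health metric: track additions against removals and flag the KB when the ratio drifts past the baseline. A minimal sketch (the function name and the idea of alerting on the ratio are assumptions, not from the source):

```python
def pruning_ratio(additions, removals):
    """Additions per removal over some window; the stagnation finding puts
    actively maintained systems at about 6:1, so ratios well above that
    signal that pruning discipline has lapsed."""
    return additions / removals if removals else float("inf")

# A window with no removals at all is the worst case: unbounded growth.
healthy = pruning_ratio(60, 10)   # 6.0, at the observed baseline
lapsed = pruning_ratio(60, 0)    # inf, nothing is being pruned
```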
Relevant Notes:
- methodology enforcement is stabilisation — connects: the stagnation evidence (50% write-once, 6:1 add-to-remove ratio) is what happens when maintenance throughput is zero
- spec mining as crystallisation — mechanism: the transition from manual cleanup to automated enforcement is spec mining applied to maintenance — observe drift, extract pattern, crystallise into check
- maintenance operations catalogue — operationalizes: the catalogue lists what needs scaling; this note argues it must scale proportionally with generation
- automating KB learning is an open problem — constrains: if boiling cauldron mutations are automated, maintenance must be automated at matching throughput or quality degrades
- quality signals for KB evaluation — detects: the credibility erosion failure mode is what happens when entropy management falls behind generation
- Harness Engineering — primary evidence: 1M LOC agent-generated codebase where background cleanup agents maintain quality at generation-matching throughput
Topics: