Distillation is transformation, not selection

Type: kb/types/note.md · Status: current

The intuitive framing of memory distillation (directed context compression) treats it as lossy selection — a trace contains some material, and the job is to pick the subset worth keeping. When distillation works, the output isn't a subset: it's an artifact of a different kind, a new shape.

The examples below are all memory distillation — the richest concrete case in hand — but the claim applies to distillation generally.

Distilled artifacts are not subsets of the trace

The same trace distills to different kinds of output, none of which are reducible to a subset of the trace's tokens:

Trace	Distilled artifact	What's new
Debugging a flaky test	Preference rule: always reproduce before attempting a fix	A universally-quantified claim — the trace contained one incident, the rule covers all future cases
Choosing between frameworks	ADR: motivation / alternatives / consequences	Structure (three roles) that wasn't in the trace; alternatives articulated more crisply than they were discussed
Recurring corrections	A linting rule in the codebase	A change of medium: natural-language correction becomes an executable check
A successful procedure	A runbook, a skill, or code	Generalization from one run to a reusable operator; sometimes a change of medium

None of these artifacts is "part of" the trace. They are different representations at different levels of abstraction, sometimes in a different medium — codification (committing a procedure to a symbolic medium) is the extreme case, but shape-change happens within a single medium too. A trace-to-preference distillation discards most of the trace and introduces structure that wasn't there: universal quantification, a condition clause, a rationale. Calling that "capturing the important parts" misses what happened.

Reach is another dimension. A preference rule extends to future instances in a bounded domain; a theory distilled from observations extends further — to cases the source didn't contain, by capturing structure rather than circumstance. Codification climbs the reliability-speed-cost compound; theorizing climbs reach. These are far from the only axes shapes differ on — no short list exhausts what distillation can produce.

Implication for memory-system design

Treating distillation as selection suggests better trace-to-summary pipelines — as if the job were to produce a shorter trace. But no amount of selection produces a preference, an ADR, or a skill. Those are different shapes; the work is in the shape-change.

Memory primitives, then, are not denser traces but different kinds of objects — each with its own structure, invariants, retrieval affordances, and lifecycle. A preference is a rule with scope and rationale that can be invoked, overridden, or retired — not a compressed trace. Designing a memory system is largely designing the inventory of shapes it supports.

"Traces → memory primitives" as a single arrow is too coarse. It reads better as a family of transformations (trace → preference, trace → ADR, trace → skill, trace → convention, ...), each with its own target structure, each potentially applicable to the same trace, none subsuming the others.

The substrate must persist

The inventory of shapes evolves. A shape that captured something well at one point may be wrong later — not because the content changed, but because the operating context has shifted enough that a different shape is now more useful. Three consequences follow:

The raw trace is what can't be re-derived. Losing a distillate means redistilling; losing a trace means the underlying material is gone.
Storage is finite, so perfect retention isn't possible — but the asymmetry argues for biasing toward trace retention over eager pruning.
Distilled artifacts are not permanent products. They are current-best representations under the inventory of shapes in force at the time, and may need to be re-derived as the inventory evolves.

Relevant Notes:

Designing a Memory System for LLM-Based Agents — exemplifies: the same raw trace can be distilled into different target artifacts depending on the future capacity the system wants to change

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search