Distillation

Type: kb/types/definition.md · Status: current · Tags: learning-theory

Distillation is compression viewed as learning — goal-oriented compression whose purpose is the capacity change it produces in the consumer. The source is already-recorded material — a teacher model's outputs, methodology, accumulated reasoning, logged observations — re-compressed for a specific downstream consumer. This distinguishes distillation from general training, which compresses data without a particular consumer in mind. It sits in the task-oriented region of classical compression theory (rate-distortion, information bottleneck, MDL). By Simon's definition of learning — a permanent change in a system's capacity for adaptation — that capacity change is the operation's point, not an incidental side effect. In deployed agentic systems, distillation is one of two co-equal learning mechanisms alongside constraining; the same structure shows up in distinct substrates (see below).

In KB methodology, distillation is directed context compression — compressing knowledge so that a specific consumer can act on it within bounded context. "Directed" because different operational contexts need different extractions from the same source; "context" because the budget is a hard constraint, not a soft guideline. Without distillation, the source material often exceeds the consumer's effective context for the task — making the operation infeasible, not merely slow. Even when source material would technically fit, undistilled methodology crowds out the actual work — consuming tokens and adding navigational complexity. The source can be any already-recorded material — raw observations, methodology, prior reasoning, accumulated understanding — and the target is always an artifact that equips a consumer (agent, collaborator) to perform a task.

Context engineering is the architecture — the loading strategy, routing, the select function in the scheduling model. Distillation is the main operation that architecture performs, though not the only one (routing, scoping, and maintenance are also context engineering operations). Most KB learning is distillation in practice: explore messily, notice patterns, extract insight, write a note.

Prior work

Distillation draws from two traditions.

Purposeful compression — classical information theory already has variants that relax strict message preservation in favor of goal-oriented preservation:

  • Rate-distortion theory (Shannon, 1959) — lossy compression minimizes rate subject to a bound on expected distortion; the distortion function is where the "goal" lives (JPEG's is perceptual, a theory's is "fails to answer queries correctly").
  • Information bottleneck (Tishby et al., 1999) — compress X into T to maximize I(T;Y) under bounded I(X;T); directed compression for a specific downstream task variable Y.
  • Minimum description length / Kolmogorov complexity — the shortest program that agrees with observations is a theory; theory-building is compression in the formal sense.
  • Simon's definition of learning — "any change in a system that produces a more or less permanent change in its capacity for adapting to its environment." The compression-viewed-as-learning framing follows: any operative compression changes capacity, so compression ⊆ learning; distillation is the region where that capacity change is the operation's point.
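The first two objectives above have compact standard forms; symbols follow the usual formulations (X the source, X̂ or T the compressed representation, Y the downstream task variable, D the distortion cap, β the trade-off weight):

```latex
\begin{aligned}
&\text{Rate-distortion:} && R(D) = \min_{p(\hat{x}\mid x)\,:\;\mathbb{E}[d(x,\hat{x})] \le D} I(X;\hat{X}) \\
&\text{Information bottleneck:} && \min_{p(t\mid x)} \; I(X;T) - \beta\, I(T;Y)
\end{aligned}
```

In both, the "direction" of the compression lives in a single term: the distortion measure d in rate-distortion, the task variable Y in the bottleneck.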

Audience-aware communication — applied fields that practice goal-oriented compression on text:

  • Technical writing — the discipline is built on audience analysis and purpose-driven restructuring. Progressive disclosure is distillation applied to documentation.
  • Pedagogical adaptation — scaffolding (Vygotsky), curriculum design, and Bloom's taxonomy all address reshaping knowledge for learners at different levels.
  • Library science / abstracting — professional abstracting and indexing is distillation optimized for retrieval decisions.
  • Knowledge management — Nonaka & Takeuchi's externalization (tacit → explicit knowledge) describes a similar transformation, though without the context-budget framing.

Three things are specific to the agent context: the context budget is a hard constraint (rate-distortion with a specific rate cap); the query class the consumer will face is open-ended rather than a fixed distribution; and the consumer is itself a reasoner that can fill gaps rather than a passive decoder.

TODO: This survey is from the agent's training data, not systematic. Revisit with deep search — technical writing and pedagogy literatures likely have results about what makes distillation effective.

How distillation works

Content is selected and compressed to fit the consumer's task and context budget. The rhetorical mode may shift if the task demands it — argumentative → procedural when the task is execution, exploratory → assertive when the task is deciding. In the KB application the medium stays constant: unlike codification, distillation typically stays in natural language consumed by an LLM. Other instances target different substrates (see below).

| Source | Distillate | Target |
| --- | --- | --- |
| Methodology | Skill | Agent performing a specific workflow |
| Workshop | Note | Future agents needing the insight |
| Research | Design principle | Decision-making in a particular area |
| Accumulated understanding | Narrative | Consumer who needs the current whole picture |
| Caller's knowledge + sub-agent's question | Refined prompt | Sub-agent facing a specific task |
| Domain artifacts (logs, patches, docs) | Detection/analysis skill | Agent diagnosing or investigating a class of problems |
| Many observations | Summary | Agent that can't fit them all in context |

Targeting is information loss — which is why the source persists. Reading only the /connect skill, you can connect notes but can't adapt the procedure to a novel situation; the methodology notes handle that.
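The mechanism can be sketched as budgeted selection: keep what serves the consumer's task, drop the rest, never exceed the hard context budget. A minimal illustrative sketch — the names (`Chunk`, `distill`, the `relevance` score) are hypothetical, not KB machinery, and a real relevance estimate would come from human or LLM judgment:

```python
# Hypothetical sketch: distillation as greedy selection under a
# hard token budget. Illustrative only -- the relevance scores
# stand in for judgment about what the consumer's task needs.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    tokens: int        # cost against the consumer's context budget
    relevance: float   # estimated value for the consumer's task

def distill(source: list[Chunk], budget: int) -> list[Chunk]:
    """Keep the most task-relevant chunks that fit the budget.
    What is dropped is the information loss -- which is why the
    source must persist alongside the distillate."""
    kept, used = [], 0
    for chunk in sorted(source, key=lambda c: c.relevance, reverse=True):
        if used + chunk.tokens <= budget:
            kept.append(chunk)
            used += chunk.tokens
    return kept
```

The hard-constraint character shows in the inner check: a chunk that doesn't fit is dropped regardless of relevance, matching "rate-distortion with a specific rate cap."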

A distillate can also look adequate while quietly losing behavioral influence: compressed experience is often less active than the raw traces it replaced (Faithful Self-Evolvers).

Relationship to constraining

Constraining and distillation are orthogonal — they operate on different dimensions of the same artifacts:

| | Not distilled | Distilled |
| --- | --- | --- |
| Not constrained | Raw capture (text file, session notes) | Extracted but loose (draft skill, rough note) |
| Constrained | Committed but not extracted (stored output, frozen config) | Extracted AND hardened (validated skill, codified script) |

Constraining asks: how constrained is this artifact? Distillation asks: was this artifact extracted from something larger?

You can distill without constraining (extract a skill — still natural language, still underspecified), and you can constrain without distilling (store an LLM output — no extraction from reasoning involved). The full compound gain comes when both apply.

The orthogonality is at the artifact level, not the decision level. The choice to impose a given constraint is often itself a distillate of observed looseness — a rate limit of "100 req/s" encodes different provenance depending on whether it came from measurement or prediction. Evidence-driven constraining carries distilled understanding into the rule without leaving a visible trace in the rule itself; predictive constraining does not. Same artifact, different epistemic status.

Instances

The general definition — goal-oriented compression whose purpose is capacity change in a bounded consumer — is realized in distinct substrates:

KB distillation (the focus of this note) — source: methodology, raw observations, prior reasoning, accumulated understanding. Target: text artifact. Consumer: a reasoning agent or collaborator. Capacity budget: the consumer's effective context for the task. Extraction mechanism: human or LLM judgment about what to preserve.

ML knowledge distillation (Hinton et al., 2015) — source: a large teacher model's output distribution (contrast general training, whose source is raw data). Target: a smaller student model's weights. Consumer: the student itself, deployed where the teacher won't fit. Capacity budget: parameter count. Extraction mechanism: gradient descent on a distillation loss.
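The extraction mechanism in the ML instance is a loss that makes the student match the teacher's temperature-softened output distribution rather than hard labels — the near-zero probabilities carry information that raw labels discard. A framework-free, pure-Python sketch of that loss (function names are illustrative, not from any library):

```python
# Illustrative sketch of a Hinton-style distillation loss:
# KL divergence between temperature-softened teacher and student
# distributions. Pure Python; no ML framework assumed.
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature flattens
    the distribution, exposing the teacher's 'dark knowledge'."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions -- zero when
    the student reproduces the teacher's distribution exactly."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

Gradient descent on this loss is the substrate-specific analogue of the KB's judgment about what to preserve: both select, from the teacher's full behavior, what the bounded consumer most needs.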

Both satisfy the general definition. They differ in substrate and extraction mechanism, not in categorical structure. General ML training is the parent category — also compression producing capacity change, but without a specific downstream consumer; distillation is the subregion where compression targets a particular consumer that can't handle the source directly. Other instances are possible (task-oriented lossy codecs, human teaching) and would decompose along the same five dimensions.


Relevant Notes: