Distilled artifacts need source tracking
Type: kb/types/note.md · Status: seedling · Tags: links
Distillation shapes an artifact for one consumer: an instruction guides an agent, a skill body runs a workflow, a checklist enforces a policy, a paper presents an argument. The shaping strips lineage by design — the consumer needs the artifact's content, not the reasoning that produced it. The executor is only one kind of consumer, but it is the most demanding and the most common: for a reader who must act on the artifact unassisted, inline provenance dilutes focus and adds indirection cost — which is why the default for distilled artifacts is no source backlinks at all, not just for instructions. A well-distilled artifact is deliberately silent about where it came from.
But distilled knowledge stays dependent on what it was distilled from. Sources keep evolving — a methodology claim is revised, a design decision reversed — and each source edit silently puts every downstream distillate at risk. Without a dependency record, a source change names nothing: there is no worklist for staleness review, and the drift is discovered only when a stale artifact misleads someone.
So the dependency record must exist — somewhere other than the artifact, whose focus is the point. The requirement any tracking design must satisfy comes from the asymmetry of the two lineage queries. The forward query — what depends on what I just changed? — must reach the editor at edit time, because its job is to interrupt: an answer the editor has to go looking for is an answer they will not look for. The reverse query — what informed this artifact? — is a deliberate, rare investigation that can afford a search. Whatever the design, the stored record must serve the path that has to interrupt; search can serve the path that can wait.
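The asymmetry can be made concrete with a minimal sketch. The names (`FORWARD`, `dependents_of`, `sources_of`) and the file paths are hypothetical and not part of this KB. The point is only that a stored forward record answers the interrupting query by direct lookup, while the reverse query can be a scan over the same record.

```python
# Hypothetical forward record: each source lists the artifacts distilled from it.
FORWARD = {
    "notes/methodology.md": ["skills/review.md", "checklists/release.md"],
    "notes/design-decision.md": ["skills/review.md"],
}


def dependents_of(source):
    """Forward query: a direct lookup, cheap enough to run on every edit."""
    return FORWARD.get(source, [])


def sources_of(artifact):
    """Reverse query: a full scan of the record, acceptable because it is rare."""
    return sorted(src for src, targets in FORWARD.items() if artifact in targets)
```

Nothing is stored on the artifact side: the reverse answer is recovered by search, which is the trade the requirement allows.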
Where the record lives is a design choice
Judged against that requirement:
- Source-side records — a "distilled into" pointer in each source — satisfy the interrupt requirement with zero machinery: the pointer is in the file the editor already has open. The cost is lineage scattered across sources, with no global view.
- A dedicated distillation-link database holds the whole graph in one place, serving global queries (all distillates of a subtree, orphaned distillates, coverage) and scaling past what footers can carry — but it satisfies the interrupt requirement only if some edit-time surface consults it: a hook, a validator, an editor integration. A record without that surface is lineage nobody sees at the moment that matters.
- Artifact-side records — the distillate listing its sources in metadata hidden from its consumer — serve the reverse query cheaply and can be indexed into a forward view, but unindexed they fail the interrupt requirement outright, and they put provenance maintenance back on the artifact the distillation was keeping clean.
The designs compose: source-side pointers as the human-visible interrupt, derived into a database for global queries — at which point the database is itself a derived copy that must be checked or absent.
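One way the composition could look, as a sketch under stated assumptions: the footer shape parsed here (the `Distilled into:` label on its own line, followed by bullets whose first token is the target path) is an assumption for illustration, and `distillates_in`, `build_index`, and `index_is_current` are hypothetical helpers rather than tooling this KB ships.

```python
FOOTER = "Distilled into:"


def distillates_in(text):
    """Return the targets listed as bullets under a 'Distilled into:' footer."""
    targets = []
    in_footer = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == FOOTER:
            in_footer = True
        elif in_footer and stripped.startswith("- "):
            # Assumed bullet shape: "- <target path> <optional description>".
            targets.append(stripped[2:].split()[0])
        elif in_footer and stripped:
            in_footer = False  # any other non-blank line ends the footer
    return targets


def build_index(sources):
    """Derive the global forward index from every source's footer.

    `sources` maps a source path to its full text.
    """
    index = {path: distillates_in(text) for path, text in sources.items()}
    return {path: targets for path, targets in index.items() if targets}


def index_is_current(stored, sources):
    """Check the derived copy by recomputing it from the sources."""
    return stored == build_index(sources)
```

Because the index is fully recomputable from the footers, a stored copy that fails `index_is_current` can simply be rebuilt or dropped; the footers remain the record of truth.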
Two audiences, one direction of flow
| | Distilled artifact | Lineage record |
|---|---|---|
| Reader | The consumer the artifact was shaped for (most demandingly: an executor) | Maintainer changing the knowledge |
| Carries | Content only — focus is the point | Forward pointers from each source to its distillates |
| Staleness signal | None it could act on | Fires at edit time, where change originates |
Staleness detection flows in the direction of change: source changes → the editor sees the downstream targets → reviews them. What that review is depends on the derivation: in the general case it is judgment (re-read and re-distill); where the derivation is mechanical, the check is free and the regime flips to enforce-or-omit.
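The two review regimes can be sketched as a dispatch over a changed source's distillates. This is an illustration only: `review`, the `mechanical` registry, and the example paths are hypothetical, and a real derivation would be whatever function actually regenerates the artifact.

```python
def review(source_text, artifacts, mechanical):
    """Split a changed source's distillates into (stale, needs_judgment).

    artifacts:  downstream path -> current content of that artifact
    mechanical: downstream path -> function recomputing it from the source text
    """
    stale, needs_judgment = [], []
    for path, content in artifacts.items():
        derive = mechanical.get(path)
        if derive is None:
            # General case: only a human re-read can decide whether to re-distill.
            needs_judgment.append(path)
        elif derive(source_text) != content:
            # Mechanical case: recomputing is free, so a mismatch can be enforced.
            stale.append(path)
    return stale, needs_judgment
```

For example, with `mechanical = {"toc.txt": lambda s: s.splitlines()[0].upper()}`, a hand-written skill body always lands in `needs_judgment`, while `toc.txt` lands in `stale` only when its content no longer matches the recomputed value.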
This KB's design choice is source-side records (the `Distilled into:` footer section, with `rg` as the reverse query), documented in link-vocabulary.md. At repo scale the zero-machinery option wins; a link database becomes worth its surface when global queries or cross-repo lineage appear.
Relevant Notes:
- skills derive from methodology through distillation — grounds: the distillation relationship that produces artifacts needing source tracking
- link graph plus timestamps enables make-like staleness detection — extends: forward lineage pointers provide the dependency edges that distilled artifacts deliberately omit
- indirection is costly in LLM instructions — grounds: why the artifact side must stay lineage-free
- frontloading spares execution context — grounds: distillation is a form of frontloading; tracked lineage preserves the dependency structure the frontloaded artifact no longer shows
- A derived copy of recomputable truth must be checked or absent — extends: the deterministic special case — when the derivation is mechanical the staleness check is free, and managed review flips to enforce-or-omit
- link-vocabulary.md — evidence: the shipped source-side `Distilled into:` convention, this KB's design choice within the space
Distilled into:
- link-vocabulary.md — the `Distilled into:` footer convention