Axes of artifact analysis

Type: kb/types/note.md · Status: current · Tags: learning-theory

A recurring comparison in this KB used to pit "repo artifacts vs weights": one system stores learned behaviour as files, another as model parameters. That contrast packs together questions that need to stay separate. A retained lesson can live in a repository, database, vector store, prompt registry, configuration service, or checkpoint store; it can act as prose, symbolic control, or distributed-parametric state; it can be canonical source or a derived view; and it can advise, instruct, enforce, rank, validate, or train.

The useful unit is therefore not always the stored object. It is the operative part or consumption path: the behavior-shaping part of a retained artifact and the route through which a future consumer can use it. The same Markdown file can be low-priority evidence in one path, high-priority instruction in another, and archival provenance in a third. The bytes did not change; the behavioral authority did.

Artifact analysis records four fields:

Field Question Why it matters
Storage substrate Where does retained state persist? Access, deletion, versioning, deployment, latency, and rollback paths
Representational form How is the operative part encoded and consumed? Default review evidence: reading, tests/static checks, or behavioral probes
Lineage What sources or derivations does this retained behavior depend on? Invalidation, regeneration, source alignment, and retirement
Behavioral authority Who consumes it, through which channel, and with what force? Security, precedence, activation, scope, and behavioral consequence

These fields replace the older shorthand of class/backend/role. Backend becomes storage substrate. Class becomes representational form. Source relation becomes lineage. Role and control path become behavioral authority. The replacement is not cosmetic: the new names force the record to say what the artifact's operative part is, which consumption path gives it force, and what evidence would actually review it.

Representational form

Representational form is the main replacement for the old opaque/prose/symbolic class axis.

  • Prose artifacts carry behavior-shaping content in natural language interpreted by a model or human: notes, prompts, memory entries, reflections, rules, policies, playbooks, workflow descriptions, and many skills. They are readable and diffable, but their effect varies with context, prompt position, retrieval order, and model behavior.
  • Symbolic artifacts carry behavior-shaping content in localized units whose consequences are assigned by a parser, interpreter, runtime, validator, schema, route table, or other defined consumer. Tests, scripts, schemas, route tables, typed records, and deterministic validators are symbolic where that consumer gives fields or operations defined consequences.
  • Distributed-parametric artifacts carry behavior-shaping content in numerical state distributed across parameters or dense representations: model checkpoints, adapters, embedding spaces, dense-vector indexes, reward models, learned controllers, and learned memory-management policies.

The term opaque should now be used for practical inspectability, not as the form name. Distributed-parametric artifacts become opaque at very small scales because their behavior is not localized in readable units. Prose and symbolic systems can also become practically opaque at sufficient scale, as opacity is a scale threshold, not a class property argues.

The prose/symbolic boundary is the phase transition codification crosses and constraining often aims toward: the medium changes, the consumer changes, and the verification regime changes. Intermediate cases - typed prose, schema-validated Markdown, prompts with required sections - are mixed artifacts. Classify the operative parts separately: prose content for interpretation, symbolic structure for validation or assembly.

Storage substrate

Storage substrate records where retained state persists: repo files, database rows, service-managed objects, graph stores, vector stores with attached records, prompt registries, runtime configuration, audit logs, or model-artifact stores. Substrate is operationally important because it determines permissions, deletion, versioning, rollback, deployment, and latency.

Substrate is not representational form. A repository may contain prose workflows, symbolic validators, generated prompt views, route tables, and pointers to checkpoints. A vector store may package readable prose records together with distributed-parametric embeddings and ranking behavior. Cognee keeps prose records in a database-backed poly-store; the substrate changed, the prose form did not.

Lineage

Lineage records whether retained behavior is source material, canonical source, derived view, generated index, compiled artifact, assembled package, learned update, or archival evidence, and what source changes should invalidate or refresh it.

Lineage is separate from substrate and form. A prose workflow may be canonical in a repo file, summarized into a prompt view, and compiled into a symbolic validator. A generated directory index may be a derived view over canonical notes. A model checkpoint may be a distributed-parametric learned update derived from trace records and reward signals. The question is whether the system can tell what the derived artifact depends on, when it should refresh, what form change occurred, and what authority it carries relative to its sources.

Derived artifacts are useful because they put material where it can act, but they introduce drift. A workflow can be revised while an old skill manifest still routes tasks to the old procedure. A prompt summary can omit an exception from the canonical policy. These are lineage failures, not merely stale-memory failures.

Behavioral authority

Behavioral authority records the consumer, channel, and force of a retained artifact's use.

Part Question Examples
Consumer Who or what uses it? Acting agent, retriever, context scheduler, planner, runtime service, validator, router, view assembler, reviewer, maintainer, learning loop
Channel How does it reach the consumer? Retrieval, prompt assembly, explicit invocation, execution, configuration, validation, routing, ranking, review, training
Force What kind of behavioral effect can it have? Advice, instruction, enforcement, selection or ranking influence, audit trigger, learning input

This field replaces the old role axis. A knowledge artifact is consumed as evidence, reference, context, explanation, or advice that a future agent or human may weigh. A system-definition artifact is consumed with instruction, enforcement, routing, validation, configuration, evaluation, memory-operation, or learning force. These are useful authority-path shorthands, but the architectural record should name the actual consumer, channel, and force.

The distinction is not between intrinsic artifact types. A Markdown file of domain terms is a knowledge artifact when retrieved as reference and a system-definition artifact when loaded as standing instruction. A schema is a knowledge artifact when read as documentation and a system-definition artifact when an interpreter uses it to validate inputs. Model weights can encode stored associations and learned policy in the same parameters. The homoiconic context is what makes the prose switch possible without changing the stored bytes.

Authority is use-specific and can diverge from declared intent. A nominally advisory memory can dominate behavior if it is always included, placed late in a prompt, repeated often, or phrased imperatively. A formally authoritative rule can have little effect if no acting component loads it. Record both assigned authority and evidence of effective authority when that distinction matters.

Eligibility and scope

Eligibility is not one of the four core fields in the paper's compact record, but it remains useful local policy metadata for memory systems. It asks whether a specific use is currently allowed: candidate, active, superseded, deprecated, retired, or archival. Scope records where the use applies: user, project, repository, organization, deployment, task class, benchmark, or runtime stage.

These attach to the consumption path. A note can be current as an artifact while one high-authority use of it is deprecated. A workflow can be active for manual reviewer use while still candidate for runtime enforcement.

Why the fields matter

Each field blocks a different category mistake.

  • Substrate, not form. Files beat a database for agent-operated knowledge bases argues that a database schema forces premature commitment to access patterns. That is a substrate claim, not a claim about whether artifacts should be prose, symbolic, or distributed-parametric.
  • Form, not storage label. Comparing trajectory-informed memory tips to AgeMem policy weights is not "tips vs weights" but prose form vs distributed-parametric form. Tips are one prose package; weights are one distributed-parametric package.
  • Lineage, not co-location. A generated prompt view and its source notes may both live in repo files. The risk is not that the substrate differs, but that the derived view drifts from the canonical source while retaining high authority.
  • Authority, not object identity. "Knowledge artifact" and "system-definition artifact" are useful shorthands for authority families, but the precise claim is that an operative part is being consumed with a named force. A reflection retrieved as evidence and the same reflection loaded as standing instruction have different authority.
  • Eligibility, not existence. A memory, validator, prompt summary, or skill package can exist without being eligible for a given future use. "Stored" does not mean "active"; "current" does not mean "safe to enforce."
  • Learning is not decided by authority. Both advice-like knowledge-artifact use and high-authority system-definition-artifact use are learning by Simon's capacity-change test. The field separates what durable change affects, not whether it counts.

Examples

Example Storage substrate / lineage Representational form Behavioral authority
AgeMem memory policy Model-artifact store; learned update from memory-operation trajectories Distributed-parametric learned controller Learning loop or controller selects memory operations
Trajectory-informed memory Memory store, DB, or file; derived from completed trajectories Prose tips Retrieved or loaded as task advice
Commonplace notes Repo; canonical library artifacts or derived notes from workshops/sources Prose, with symbolic frontmatter where validators consume it Retrieved as knowledge, loaded as guidance, or validated as typed artifacts depending on path
Commonplace skills Repo / skill directory; distilled from methodology notes Mixed prose instructions, symbolic metadata, optional scripts Invoked procedure with routing and execution policy
Commonplace validators Repo / installed package; codified from conventions Symbolic Python checks Advisory or enforcing validation depending on command or hook
Generated directory index Repo; derived view over directory contents Structured Markdown: prose labels plus symbolic frontmatter Navigation aid; should refresh when sources change
RAG corpus Vector store; canonical or derived depending on ingestion Prose records plus distributed-parametric embedding/ranking path Retriever/ranker influences selection; selected records advise the model

Consequences for memory design

Memory design adds operational policies on top of this artifact analysis: capture, derivation, activation, authority assignment, lifecycle, and evaluation. Those policies should attach to operative parts and consumption paths. The question is not simply "what memory exists?" but "which retained artifact can be used how, with what authority, in which scope, and under what eligibility state?"

That shift explains why memory labels are too coarse. A retained lesson can become a prompt patch, workflow, script, validator, route, derived prompt view, retrieval index, learned controller, or checkpoint. Those differ in storage substrate, representational form, lineage, and authority: read, loaded, invoked, executed, enforced, routed through, assembled, audited, ranked, or used as training input.


Relevant Notes: