Continual learning's open problem is behaviour, not knowledge

Type: kb/types/note.md · Status: current · Tags: learning-theory

Take Herbert Simon's definition of learning, cited in *learning is not only about generality*: any durable change in a system's capacity for adapting to its environment. Under that criterion, continual learning splits along a role axis: artifacts consumed as knowledge (facts looked up by a fixed policy) and artifacts consumed as system-definition (content that is the policy). Both count as learning; they differ in what a durable write changes.

The knowledge half is solved by ordinary data engineering: databases, file systems, vector stores, RAG indices, agent-memory records, and user profiles in aggregate hold far more than weights ever could. Adding entries grows the system's reach without changing its disposition.
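The reach-versus-disposition distinction can be made concrete. In this minimal sketch (all names hypothetical; a dict stands in for a database or vector index), the policy is fixed code and durable writes only grow the store:

```python
# The knowledge regime: a fixed policy over a growing store.
store: dict[str, str] = {}  # stands in for a DB, vector index, or memory file

def answer(query: str) -> str:
    # Fixed policy: look up, else admit ignorance. A write to `store`
    # never changes this control flow, only what it can retrieve.
    return store.get(query, "unknown")

print(answer("capital of France"))        # before the write: "unknown"
store["capital of France"] = "Paris"      # durable write: knowledge
print(answer("capital of France"))        # reach grew; disposition did not
```

However many entries are added, `answer` behaves identically; that is why this half reduces to data engineering.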

Behaviour change — where durable writes change what the system does — is the open problem. Two mechanisms achieve it:

  • Weight updates — fine-tuning, online learning, RLHF, continual pretraining. Expensive: heavy training infrastructure, cycles of days to weeks, opaque updates that can regress unrelated behaviours.
  • Readable system-definition artifacts — prompts, tips, notes, schemas, tools, tests. When an LLM (or an interpreter the agent invokes) reads such an artifact as policy, a durable write changes what the system does next session. Cheap: a commit, a diff, a revert; inspectable; driveable by runtime signals.
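The second mechanism can be sketched in a few lines (hypothetical file format and names; a text file stands in for a prompt, tips note, or schema). Because the artifact is read as policy each session, a durable write *is* a behaviour change:

```python
# The readable system-definition regime: the artifact IS the policy.
import os
import tempfile
from pathlib import Path

fd, path = tempfile.mkstemp(suffix=".md")
os.close(fd)
policy_file = Path(path)
policy_file.write_text("Always answer in one sentence.\n")

def session_prompt() -> str:
    # Each session re-reads the artifact; whatever is written there is policy.
    return "System instructions:\n" + policy_file.read_text()

before = session_prompt()
# Durable write driven by a runtime signal (say, a failed eval):
with policy_file.open("a") as f:
    f.write("Cite a source for every factual claim.\n")
after = session_prompt()

assert before != after  # the write changed what the system does next session
```

The governance properties claimed above fall out of the representation: the update is a plain-text diff, a commit records it, and a revert undoes it, with no training run involved.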

Mainstream continual-learning research targets weights (the expensive behaviour mechanism) and retrieval (the solved knowledge mechanism), leaving readable system-definition artifacts, the cheap behaviour mechanism, as engineering plumbing. Recognising that plumbing as a learning regime in its own right puts it in direct comparison with weight updates; the useful question is not which one counts as learning but how the two combine.


Relevant Notes: