Treat continual learning as substrate coevolution

Type: kb/types/note.md · Tags: learning-theory

Continual learning's open problem is behaviour, not knowledge names two behaviour-change mechanisms: expensive weight updates and cheap readable system-definition artifacts. Deploy-time learning places the readable mechanism on the timing axis. Splitting the readable side by semantic regime yields three substrate classes in all: opaque (weights and other hidden state), prose (prompts, notes, specs, rubrics), and symbolic (code, schemas, tests, tools). How should their improvement loops relate? They aren't independent: optimizing one assumes a position about the others.

Prose and symbolic cluster as the readable substrates — inspectable, editable, distinct from opaque in backend and update cost. The practical question of where to start building automated loops is taken up in The readable-substrate loop is the tractable unit for continual learning; this note is about the generic coevolution frame.

The mainstream direction: scaling the opaque loop

Computer vision provides the model. Before representation learning, features (SIFT, HOG) were hand-crafted and classifiers (SVMs) were learned — a clean separation that looked normal. Representation learning won by extending gradient descent across both, end-to-end. The general method didn't change; it covered more of the pipeline.

The bitter lesson extrapolates: general methods that leverage computation eat hand-crafted components. Applied today, mainstream research extends the opaque loop — RLHF, RLAIF, continual pretraining, online learning, fast adapters — hoping to subsume the hand-crafted prompts, tools, and evals that deployed systems depend on. This may or may not succeed; new architectures could close the tempo gap, or structural limits could keep large opaque updates on a slower cycle. This note takes no position on the outcome.

Per-substrate loops today

Current methods target individual substrates:

  • DSPy, ProTeGi — automated search over prompts (prose), weights frozen.
  • Genetic programming, FunSearch — automated search over code (symbolic), weights frozen.
  • Meta-Harness — automated search over harness code and prompt/context logic (symbolic + prose), weights frozen, benchmark traces as selection signal.
  • RLHF / RLAIF — updates weights (opaque), treating prompts and code as fixed.
  • Hand curation (Commonplace and similar) — evolves prose fast and symbolic artifacts slowly, without automated search or weight updates.

Each is partial. Even unifying two classes — a joint optimizer over weights and prompts, say — would be a significant step, analogous to what end-to-end gradient descent did for features plus classifier. The prerequisite is understanding what an improvement loop for each substrate looks like: mutation operators, selection signals, evaluation criteria.
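The shape these per-substrate methods share can be made concrete as a mutate-evaluate-select cycle. A minimal sketch, assuming nothing beyond the standard library — the names `Candidate`, `improve`, `mutate`, and `evaluate` are illustrative, not any real API:

```python
import random
from dataclasses import dataclass
from typing import Callable

@dataclass
class Candidate:
    artifact: str  # a prompt, a code snippet, a rubric...
    score: float

def improve(seed: str,
            mutate: Callable[[str], str],      # substrate-specific mutation operator
            evaluate: Callable[[str], float],  # substrate-specific selection signal
            generations: int = 10,
            pop_size: int = 8) -> Candidate:
    """Generic mutate-evaluate-select loop; each substrate plugs in its own operators."""
    pop = [Candidate(seed, evaluate(seed))]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            child = mutate(random.choice(pop).artifact)
            children.append(Candidate(child, evaluate(child)))
        # keep the best pop_size candidates across parents and children
        pop = sorted(pop + children, key=lambda c: c.score, reverse=True)[:pop_size]
    return pop[0]
```

With `mutate` as an LLM edit call and `evaluate` as a test suite or judge, this is roughly the skeleton that DSPy- or FunSearch-style systems elaborate; the per-class methods above differ mainly in which operators they plug in and which substrates they freeze.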

Difficulties

The three classes have very different dynamics:

  • Opaque state updates via gradient descent. It needs a differentiable signal and heavy training infrastructure; large updates cycle on days to weeks, though smaller add-on mechanisms can be faster.
  • Symbolic artifacts are mutated by LLMs or search, then evaluated by tests, execution, or formal checks.
  • Prose artifacts are mutated by LLMs and evaluated by execution, use, or LLM-as-judge. Semantics stay underspecified, so verification is softer.
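The verification gap between the two readable classes can be stated in code: symbolic artifacts get a hard pass/fail signal from execution, prose only a soft score. A sketch under those assumptions — the judge here is a stub standing in for an LLM call, not a real one:

```python
def evaluate_symbolic(code: str, test: str) -> bool:
    """Hard signal: execute the artifact against its test; pass/fail is unambiguous."""
    ns: dict = {}
    try:
        exec(code, ns)  # define the artifact
        exec(test, ns)  # run its test; an AssertionError means failure
        return True
    except Exception:
        return False

def evaluate_prose(text: str, judge) -> float:
    """Soft signal: a judge returns a score; the semantics stay underspecified."""
    return judge(text)  # judge is a stand-in for an LLM-as-judge call
```

The symbolic evaluator's return value is a fact about execution; the prose evaluator's is an opinion of the judge, which is why verification on that side is softer.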

A joint optimizer has to handle pace mismatch — either it runs at the slowest class's cadence, or classes coevolve asynchronously without diverging — and cross-class credit assignment: a deployment failure rarely says which substrate wants the update (prompt revision, tool extraction, memory promotion, weight update, retrieval change). Per-class methods sidestep both by fixing the substrate in advance.
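One way to picture the pace mismatch: each substrate's loop ticks on its own cadence, and on most days only the fast loops fire. A toy sketch — the cadence numbers are invented for illustration, not measurements:

```python
# Illustrative cadences in days: prose daily, symbolic weekly, opaque monthly.
CADENCE_DAYS = {"prose": 1, "symbolic": 7, "opaque": 30}

def due_updates(day: int) -> list[str]:
    """Which substrate loops fire on a given day of deployment."""
    return [cls for cls, period in CADENCE_DAYS.items() if day % period == 0]
```

Under these numbers the prose loop runs thirty times for every opaque update; a joint optimizer either throttles everything to the slowest class or lets the classes coevolve against stale snapshots of each other, which is where the asynchrony and credit-assignment problems bite.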

Starting point

Coevolution is the right conceptual frame, but a three-way joint optimizer isn't the near-term plan. The readable-substrate loop is the tractable unit for continual learning argues for starting with the prose+symbolic pair, on the basis of structural couplings that make the two a natural joint target.


Relevant Notes: