Treat continual learning as substrate coevolution
Type: kb/types/note.md · Tags: learning-theory
Continual learning's open problem is behaviour, not knowledge names two behaviour-change mechanisms: expensive weight updates and cheap readable system-definition artifacts. Deploy-time learning places the readable mechanism on the timing axis. Splitting the readable side by semantic regime yields, together with the opaque mechanism, three substrate classes — opaque (weights and other hidden state), prose (prompts, notes, specs, rubrics), and symbolic (code, schemas, tests, tools). How should their improvement loops relate? They aren't independent: optimizing one assumes a position about the others.
Prose and symbolic cluster as the readable substrates — inspectable, editable, distinct from opaque in backend and update cost. The practical question of where to start building automated loops is taken up in The readable-substrate loop is the tractable unit for continual learning. This note is about the generic coevolution frame.
The mainstream direction: scaling the opaque loop
Computer vision provides the model. Before representation learning, features (SIFT, HOG) were hand-crafted and classifiers (SVMs) were learned — a clean separation that looked normal. Representation learning won by extending gradient descent across both, end-to-end. The general method didn't change; it covered more of the pipeline.
The bitter lesson extrapolates: general methods that leverage computation eat hand-crafted components. Applied today, mainstream research extends the opaque loop — RLHF, RLAIF, continual pretraining, online learning, fast adapters — hoping to subsume the hand-crafted prompts, tools, and evals that deployed systems depend on. This may or may not succeed; new architectures could close the tempo gap, or structural limits could keep large opaque updates on a slower cycle. This note takes no position on the outcome.
Per-substrate loops today
Current methods target individual substrates:
- DSPy, ProTeGi — automated search over prompts (prose), weights frozen.
- Genetic programming, FunSearch — automated search over code (symbolic), weights frozen.
- Meta-Harness — automated search over harness code and prompt/context logic (symbolic + prose), weights frozen, benchmark traces as selection signal.
- RLHF / RLAIF — updates weights (opaque), treating prompts and code as fixed.
- Hand curation (Commonplace and similar) — evolves prose fast and symbolic artifacts slowly, without automated search or weight updates.
Each is partial. Even unifying two classes — a joint optimizer over weights and prompts, say — would be a significant step, analogous to what end-to-end gradient descent did for features plus classifier. The prerequisite is understanding what an improvement loop for each substrate looks like: mutation operators, selection signals, evaluation criteria.
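The shared skeleton behind these per-substrate methods can be made concrete. A minimal sketch, assuming hypothetical `mutate` and `score` functions — each method above instantiates the same propose-and-select loop with a different mutation operator and selection signal:

```python
import random
from dataclasses import dataclass
from typing import Callable

@dataclass
class Loop:
    # Mutation operator: e.g. an LLM rewrite (prose) or an AST/code edit (symbolic).
    mutate: Callable[[str], str]
    # Selection signal: e.g. an LLM-as-judge score (prose) or a test pass rate (symbolic).
    score: Callable[[str], float]

    def step(self, artifact: str, n_candidates: int = 4) -> str:
        # Propose variants, keep the best under the selection signal;
        # including the original makes the step monotone in score.
        candidates = [artifact] + [self.mutate(artifact) for _ in range(n_candidates)]
        return max(candidates, key=self.score)

# Toy instantiation: "mutation" appends a character, "selection" prefers length.
toy = Loop(mutate=lambda a: a + random.choice(["!", "?"]), score=len)
improved = toy.step("prompt v1")
```

DSPy, FunSearch, and hand curation differ in what fills the two slots and in how `step` is scheduled, not in the shape of the loop itself.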
Difficulties
The three classes have very different dynamics:
- Opaque artifacts update via gradient descent, which needs a differentiable signal and heavy training infrastructure; large updates cycle on days to weeks, though smaller add-on mechanisms can be faster.
- Symbolic artifacts are mutated by LLMs or search, then evaluated by tests, execution, or formal checks.
- Prose artifacts are mutated by LLMs and evaluated by execution, use, or LLM-as-judge. Semantics stay underspecified, so verification is softer.
A joint optimizer has to handle pace mismatch — either it runs at the slowest class's cadence, or classes coevolve asynchronously without diverging — and cross-class credit assignment: a deployment failure rarely says which substrate wants the update (prompt revision, tool extraction, memory promotion, weight update, retrieval change). Per-class methods sidestep both by fixing the substrate in advance.
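One way to see the pace-mismatch half of the problem is to sketch asynchronous coevolution directly. A toy sketch under assumed, illustrative cadences (the version-bumping `update` is a placeholder for a real per-substrate improvement step):

```python
# Illustrative steps between updates per substrate class (assumed, not measured).
CADENCE = {"prose": 1, "symbolic": 5, "opaque": 50}

state = {"prose": "prompt v0", "symbolic": "tool v0", "opaque": "weights v0"}

def update(substrate: str, snapshot: dict) -> str:
    # Placeholder for a real improvement step; a version bump just
    # records which loops fire when.
    prefix, version = snapshot[substrate].rsplit("v", 1)
    return f"{prefix}v{int(version) + 1}"

for t in range(1, 101):
    for substrate, period in CADENCE.items():
        if t % period == 0:
            # Each loop reads a frozen snapshot of the other substrates,
            # so a slow opaque update never blocks fast prose edits.
            state[substrate] = update(substrate, dict(state))

# After 100 steps: prose has updated 100x, symbolic 20x, opaque 2x.
```

The alternative — one joint step at the slowest class's cadence — forfeits the cheap fast edits; the asynchronous version keeps them but leaves the divergence and cross-class credit-assignment problems open, which is exactly the difficulty the paragraph above names.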
Starting point
Coevolution is the right conceptual frame, but a three-way joint optimizer isn't the near-term plan. The readable-substrate loop is the tractable unit for continual learning argues for starting with the prose+symbolic pair, on the basis of structural couplings that make the two a natural joint target.
Relevant Notes:
- Continual learning's open problem is behaviour, not knowledge — foundation: two behaviour-change mechanisms (weights, readable artifacts) — the premise that lets the readable pair count as a learning substrate at all
- Deploy-time learning is the missing middle — foundation: places the readable mechanism on the timing axis
- Axes of substrate analysis — foundation: defines the opaque/prose/symbolic split used throughout this note
- The readable-substrate loop is the tractable unit for continual learning — practical plan: the prose+symbolic pair is the tractable first slice
- In-context learning presupposes context engineering — extends: the context-engineering buildout is itself part of the joint loop
- Codification and relaxing navigate the bitter lesson boundary — operators: codify, relax, constrain, and distill are artifact-side update operators
- Meta-Harness — evidence: a fixed-weight proposer mutates harness code and context/memory logic from raw traces — a readable-substrate loop in practice
- Ingest: Meta-Harness: End-to-End Optimization of Model Harnesses — evidence: raw execution traces outperform scores-only or summarized feedback in automated harness search