Deploy-time learning is the missing middle

Type: kb/types/note.md · Tags: learning-theory

The note "Continual learning's open problem is behaviour, not knowledge" names system-definition artifacts as the cheap behaviour-change mechanism alongside expensive weight updates. This note places that mechanism on the timing axis.

Three timescales

Deployed systems adapt at three timescales, each on a different substrate:

| Timescale | When | Substrate | Properties |
| --- | --- | --- | --- |
| Training | Before deployment | Weights | Durable but opaque; heavy infrastructure; can't incorporate deployment-specific information |
| In-context | Within a session | Context window | Inspectable but ephemeral; evaporates at session end |
| Deploy-time | Across sessions, during deployment | Durable system-definition artifacts (prose + symbolic) | Durable, inspectable, versionable |
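The deploy-time row can be made concrete. A minimal sketch of the substrate, assuming nothing beyond plain files (the class and layout are illustrative, not from any real system): artifacts survive session end, diff cleanly under version control, and can be read by humans and tools alike.

```python
import json
import pathlib


class ArtifactStore:
    """Durable, inspectable substrate for deploy-time learning.

    Unlike weights (opaque) or a context window (ephemeral),
    each artifact is a plain JSON file: it persists across
    sessions, is versionable, and is readable on inspection.
    """

    def __init__(self, root="artifacts"):
        self.root = pathlib.Path(root)
        self.root.mkdir(exist_ok=True)

    def save(self, name, artifact):
        # One file per artifact: survives session end and
        # diffs cleanly under version control.
        path = self.root / f"{name}.json"
        path.write_text(json.dumps(artifact, indent=2))

    def load(self, name):
        return json.loads((self.root / f"{name}.json").read_text())
```

The point is not the code but the properties the table claims: durability (a file), inspectability (JSON), and versionability (text that diffs).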

Substrate and timing are orthogonal axes in principle. The combination the table leaves empty — weight updates at deployment pace — exists but stays rare because training infrastructure is heavy. OpenClaw-RL, which runs live RL from user interactions, is a current example.

Deploy-time learning is system-level adaptation: behaviour improves because artifacts improve — during deployment like in-context, durable like training, but inspectable and tool-compatible throughout.

Why AI researchers look past it

Traditional stateful software — CRMs, rule engines, document stores — counts as learning by Simon's criterion, but trivially: ordinary engineering handles it, so researchers filter it out. What they miss is how large a behaviour change can grow from durable system-definition artifacts.

A single prompt edit looks small, but a library of tips, schemas, tools, and tests accumulated across sessions is a different object. Context efficiency is why: progressive disclosure, skill routing, and retrieval into homoiconic context make the effective context far larger than the literal window, so stored artifacts can deliver behaviour change at weight-update scale. Researchers trained to think through gradients have mostly looked past it.
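The context-efficiency claim can be sketched mechanically. Assuming a hypothetical index of one-line skill summaries and an on-demand loader (none of these names are a real API), progressive disclosure keeps only summaries resident and pulls a full artifact into context only when the query matches it, so the reachable material far exceeds the literal window:

```python
def effective_context(query, skill_index, budget_tokens, load_skill):
    """Progressive disclosure sketch: summaries stay resident;
    a skill's full body is loaded only when the query matches
    its summary. Matching and token accounting are deliberately
    crude -- this illustrates the shape, not an implementation."""
    parts = []
    used = 0
    words = query.lower().split()
    for name, summary in skill_index.items():
        if any(w in summary for w in words):
            body = load_skill(name)            # full artifact, fetched on demand
            cost = len(body) // 4              # crude tokens ~ chars / 4
            if used + cost <= budget_tokens:
                parts.append(body)
                used += cost
                continue
        parts.append(summary)                  # cheap resident summary
    return "\n".join(parts)
```

The effective context is everything reachable through the index; only the matched slice occupies the window at any moment.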

Mechanisms

Two operators drive the updates: constraining (narrowing the interpretation space) and distillation (re-compressing prior reasoning into task-ready artifacts). Codification, the far end of constraining, is where prompts undergo a phase transition into deterministic code. Both are reversible: commitments tighten along the verifiability gradient when cross-run patterns make them safe, and loosen when new evidence shows them wrong. A system that can only tighten ratchets itself into brittleness.
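The reversibility point can be sketched as a ladder of commitment along the verifiability gradient. The levels and the evidence threshold below are illustrative assumptions, not from the source:

```python
# Loose -> codified, along the verifiability gradient.
LEVELS = ["prose", "schema", "code"]


def tighten(artifact, evidence_runs, threshold=5):
    """Constraining: promote one level only once enough cross-run
    evidence makes the narrower commitment safe (threshold is an
    illustrative assumption)."""
    i = LEVELS.index(artifact["level"])
    if evidence_runs >= threshold and i < len(LEVELS) - 1:
        artifact["level"] = LEVELS[i + 1]
    return artifact


def loosen(artifact):
    """The reversal: demote one level when new evidence shows the
    codified form is wrong. A system with tighten() but no
    loosen() ratchets itself into brittleness."""
    i = LEVELS.index(artifact["level"])
    if i > 0:
        artifact["level"] = LEVELS[i - 1]
    return artifact
```

Codification is the top rung: the prose-to-schema-to-code promotion is the phase transition the text describes, and `loosen` is what keeps it from being a one-way ratchet.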

Co-evolving prose and code

Agile was already doing deploy-time learning, with an asymmetry: code and specs co-evolved, but only code executed, so moving a concern back to prose meant taking it out of production. LLMs close the asymmetry — prompts execute, so loosening a codified behaviour back to prose keeps the system running.

You deploy with behaviour in prompts, observe what works, codify the understood parts, and the prompts evolve as the code absorbs them. The boundary between code and prose moves as understanding accumulates.
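The moving boundary is, operationally, a dispatch decision. A minimal sketch under assumed names (`codified`, `prompts`, and the two runners are all hypothetical): codified concerns execute as deterministic code, everything else still executes as a prompt, and loosening a behaviour just moves its key back from one map to the other while the system keeps running.

```python
def handle(task, codified, prompts, run_code, run_llm):
    """Dispatch across the code/prose boundary.

    `codified` maps task names to deterministic handlers;
    `prompts` maps the rest to prose instructions. Moving a key
    between the two maps is how the boundary shifts -- in either
    direction -- without taking the system out of production."""
    if task in codified:
        return run_code(codified[task])
    return run_llm(prompts[task])
```

Because prompts execute too, demoting a task from `codified` back to `prompts` is a deploy-time edit, not an outage.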

The end state also differs. Agile treats natural-language specs as temporary — stories waiting to become code. Deploy-time learning recognises that some parts should stay in prose because they require judgment deterministic code can't capture. The hybrid is the end state, not a waypoint.

Boundary

This note is the timing argument alone. How opaque, prose, and symbolic substrates should coevolve is the subject of "Treat continual learning as substrate coevolution".


Relevant Notes: