Deploy-time learning is the missing middle

Type: kb/types/note.md · Tags: learning-theory

The note "Continual learning's open problem is behaviour, not knowledge" names system-definition artifacts as the cheap behaviour-change mechanism alongside expensive weight updates. This note places that mechanism on the timing axis.

Three timescales

Deployed systems adapt at three timescales, each on a different substrate:

| Timescale | When | Substrate | Properties |
| --- | --- | --- | --- |
| Training | Before deployment | Weights | Durable but opaque; heavy infrastructure; can't incorporate deployment-specific information |
| In-context | Within a session | Context window | Inspectable but ephemeral; evaporates at session end |
| Deploy-time | Across sessions, during deployment | Durable system-definition artifacts (prose + symbolic) | Durable, inspectable, versionable |
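The deploy-time row can be made concrete. A minimal sketch of the substrate, assuming nothing beyond plain files (the class and layout are illustrative, not from any real system): artifacts survive session end, diff cleanly under version control, and can be read by humans and tools alike.

```python
import json
import pathlib


class ArtifactStore:
    """Durable, inspectable substrate for deploy-time learning.

    Unlike weights (opaque) or a context window (ephemeral),
    each artifact is a plain JSON file: it persists across
    sessions, is versionable, and is readable on inspection.
    """

    def __init__(self, root="artifacts"):
        self.root = pathlib.Path(root)
        self.root.mkdir(exist_ok=True)

    def save(self, name, artifact):
        # One file per artifact: survives session end and
        # diffs cleanly under version control.
        path = self.root / f"{name}.json"
        path.write_text(json.dumps(artifact, indent=2))

    def load(self, name):
        return json.loads((self.root / f"{name}.json").read_text())
```

The point is not the code but the properties the table claims: durability (a file), inspectability (JSON), and versionability (text that diffs).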

Substrate and timing are orthogonal axes in principle. The combination the table leaves empty — weight updates at deployment pace — exists but stays rare because training infrastructure is heavy. OpenClaw-RL, which runs live RL from user interactions, is a current example.

Deploy-time learning is system-level adaptation: behaviour improves because artifacts improve — during deployment like in-context, durable like training, but inspectable and tool-compatible throughout.

Why AI researchers look past it

Traditional stateful software — CRMs, rule engines, document stores — counts as learning by Simon's criterion, but trivially: ordinary engineering handles it, so researchers filter it out. What they miss is how large a behaviour change can grow from durable system-definition artifacts.

A single prompt edit looks small, but a library of tips, schemas, tools, and tests accumulated across sessions is a different object. Context efficiency is why: progressive disclosure, skill routing, and retrieval into homoiconic context make the effective context far larger than the literal window, so stored artifacts can deliver behaviour change at weight-update scale. Researchers trained to think through gradients have mostly looked past it.
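The context-efficiency claim can be sketched mechanically. Assuming a hypothetical index of one-line skill summaries and an on-demand loader (none of these names are a real API), progressive disclosure keeps only summaries resident and pulls a full artifact into context only when the query matches it, so the reachable material far exceeds the literal window:

```python
def effective_context(query, skill_index, budget_tokens, load_skill):
    """Progressive disclosure sketch: summaries stay resident;
    a skill's full body is loaded only when the query matches
    its summary. Matching and token accounting are deliberately
    crude -- this illustrates the shape, not an implementation."""
    parts = []
    used = 0
    words = query.lower().split()
    for name, summary in skill_index.items():
        if any(w in summary for w in words):
            body = load_skill(name)            # full artifact, fetched on demand
            cost = len(body) // 4              # crude tokens ~ chars / 4
            if used + cost <= budget_tokens:
                parts.append(body)
                used += cost
                continue
        parts.append(summary)                  # cheap resident summary
    return "\n".join(parts)
```

The effective context is everything reachable through the index; only the matched slice occupies the window at any moment.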

Mechanisms

Two operators drive the updates: constraining (narrowing the interpretation space) and distillation (re-compressing prior reasoning into task-ready artifacts). Codification, the far end of constraining, is where prompts undergo a phase transition into deterministic code. Both are reversible: commitments tighten along the verifiability gradient when cross-run patterns make them safe, and loosen when new evidence shows them wrong. A system that can only tighten ratchets itself into brittleness.
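The reversibility point can be sketched as a ladder of commitment along the verifiability gradient. The levels and the evidence threshold below are illustrative assumptions, not from the source:

```python
# Loose -> codified, along the verifiability gradient.
LEVELS = ["prose", "schema", "code"]


def tighten(artifact, evidence_runs, threshold=5):
    """Constraining: promote one level only once enough cross-run
    evidence makes the narrower commitment safe (threshold is an
    illustrative assumption)."""
    i = LEVELS.index(artifact["level"])
    if evidence_runs >= threshold and i < len(LEVELS) - 1:
        artifact["level"] = LEVELS[i + 1]
    return artifact


def loosen(artifact):
    """The reversal: demote one level when new evidence shows the
    codified form is wrong. A system with tighten() but no
    loosen() ratchets itself into brittleness."""
    i = LEVELS.index(artifact["level"])
    if i > 0:
        artifact["level"] = LEVELS[i - 1]
    return artifact
```

Codification is the top rung: the prose-to-schema-to-code promotion is the phase transition the text describes, and `loosen` is what keeps it from being a one-way ratchet.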

Co-evolving prose and code

Agile was already doing deploy-time learning, with an asymmetry: code and specs co-evolved, but only code executed, so moving a concern back to prose meant taking it out of production. LLMs close the asymmetry — prompts execute, so loosening a codified behaviour back to prose keeps the system running.

You deploy with behaviour in prompts, observe what works, codify the understood parts, and the prompts evolve as the code absorbs them. The boundary between code and prose moves as understanding accumulates.
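The moving boundary is, operationally, a dispatch decision. A minimal sketch under assumed names (`codified`, `prompts`, and the two runners are all hypothetical): codified concerns execute as deterministic code, everything else still executes as a prompt, and loosening a behaviour just moves its key back from one map to the other while the system keeps running.

```python
def handle(task, codified, prompts, run_code, run_llm):
    """Dispatch across the code/prose boundary.

    `codified` maps task names to deterministic handlers;
    `prompts` maps the rest to prose instructions. Moving a key
    between the two maps is how the boundary shifts -- in either
    direction -- without taking the system out of production."""
    if task in codified:
        return run_code(codified[task])
    return run_llm(prompts[task])
```

Because prompts execute too, demoting a task from `codified` back to `prompts` is a deploy-time edit, not an outage.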

The end state also differs. Agile treats natural-language specs as temporary — stories waiting to become code. Deploy-time learning recognises that some parts should stay in prose because they require judgment deterministic code can't capture. The hybrid is the end state, not a waypoint.

Boundary

This note is the timing argument alone. How opaque, prose, and symbolic substrates should coevolve is the subject of "Treat continual learning as substrate coevolution".


Relevant Notes: