Stabilisation during deployment is continuous learning

Type: note · Status: current

AI labs frame "continuous learning" as a weight-update problem: how do you adapt a deployed model to new data, new tasks, and shifting distributions without a full retraining cycle? The standard approaches — fine-tuning on deployment logs, online learning, experience replay — all modify the model's parameters. But stabilisation — accumulating versioned repo artifacts like prompts, schemas, evals, tools, and deterministic code — adapts deployed systems through a different mechanism entirely.

Each stabilisation step trades generality for compound gains in reliability, speed, and cost. Extracting a deterministic format_date() function makes date formatting reliable, fast, and cheap. Versioning a system prompt with house style examples makes tone consistent across sessions. Adding a validation script catches errors that previously required human review. These gains persist, accumulate, and compose — and they produce the same narrowing of the behavior distribution that fine-tuning targets, just through inspectable, rollbackable artifacts instead of opaque weight updates.
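A minimal sketch of one such step — the function name, format list, and checks are illustrative, not from any particular codebase. A task that once flowed through the model becomes a deterministic function, and the versioned assertions beside it are the eval that locks the gain in:

```python
from datetime import datetime

def format_date(raw: str) -> str:
    """Normalise common date spellings to ISO 8601 (YYYY-MM-DD)."""
    # Try a fixed, versioned list of accepted input formats.
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y", "%d %b %Y"):
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {raw!r}")

# Committed checks double as the artifact's eval: they run in CI, so the
# reliability gain persists across model swaps and prompt changes.
assert format_date("March 5, 2024") == "2024-03-05"
assert format_date("05/03/2024") == "2024-03-05"
```

Unlike a weight update, this artifact is inspectable (you can read the format list), testable (the assertions), and rollbackable (git revert).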

Herbert Simon defined learning as any change in a system that produces a more or less permanent change in its capacity for adapting to its environment. Stabilisation during deployment meets every part of this definition — it changes the system (new or modified artifacts), the change is permanent (versioned, committed), and the capacity for adaptation improves. The definition doesn't require weight updates. It requires capacity change.

This isn't hypothetical. Systems like DSPy and ProTeGi already automate one slice of stabilisation — searching over prompt components to optimize against an objective — and the ML community recognizes this as learning. Research on professional developers using AI agents shows the same pattern in manual form: developers iteratively refine prompts, tools, and workflows based on deployment experience. Agent memory systems (Claude's memory files, Cursor rules, AGENTS.md conventions) store preferences across sessions. All of this is continuous learning through stabilisation — it just isn't recognized as such.

Weight-based learning captures distributional knowledge (style, tone, world knowledge) that doesn't reduce to explicit artifacts — not all continuous learning is stabilisation. But the extractable, testable subset that stabilisation handles covers most of what deployed systems need. The manual version works; automating the judgment-heavy parts is where the real gap is.


Relevant Notes:

Topics: