Constraining during deployment is continuous learning

Type: note · Status: current · Tags: learning-theory

AI labs frame "continuous learning" as a weight-update problem: how do you adapt a deployed model to new data, new tasks, and shifting distributions without a full retraining cycle? The standard approaches — fine-tuning on deployment logs, online learning, experience replay — all modify the model's parameters. But constraining — accumulating symbolic artifacts like prompts, schemas, evals, tools, and deterministic code — adapts deployed systems through a different mechanism entirely.

Each constraining step trades generality for compound gains in reliability, speed, and cost. Extracting a deterministic format_date() function makes date formatting reliable, fast, and cheap. Versioning a system prompt with house-style examples makes tone consistent across sessions. Adding a validation script catches errors that previously required human review. These gains persist, accumulate, and compose — and they produce the same narrowing of the behavior distribution that fine-tuning targets, just through inspectable, rollbackable artifacts instead of opaque weight updates.
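The extraction step can be sketched concretely. format_date() is the note's own example; the house-style format and the validation checks below are hypothetical illustrations of what such artifacts might look like:

```python
from datetime import date

def format_date(d: date) -> str:
    """Deterministic replacement for behavior the model previously
    produced probabilistically. The "05 Mar 2025" style is an assumed
    house convention, not a prescribed one."""
    return d.strftime("%d %b %Y")

def validate_output(text: str) -> list[str]:
    """Hypothetical validation script: mechanical checks that catch
    errors which previously required human review."""
    errors = []
    if len(text) > 280:
        errors.append("exceeds length limit")
    if "TODO" in text:
        errors.append("contains placeholder text")
    return errors
```

Both artifacts are ordinary versioned code: they can be diffed, tested, and rolled back, which is exactly the property opaque weight updates lack.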

Herbert Simon: learning is any more or less permanent change in a system that improves its capacity for adapting to its environment. Constraining during deployment meets every part of this definition — it changes the system (new or modified artifacts), the change is permanent (versioned, committed), and the capacity for adaptation improves. The definition doesn't require weight updates. It requires capacity change.

This note therefore makes a subset claim, not an umbrella claim. It does not argue that symbolic artifacts are the only continuous-learning substrate. It argues that constraining is one concrete way continuous learning happens outside weights.

This isn't hypothetical. Systems like DSPy and ProTeGi already automate one slice of constraining — searching over prompt components to optimize against an objective — and the ML community recognizes this as learning. Research on professional developers using AI agents shows the same pattern in manual form: developers iteratively refine prompts, tools, and workflows based on deployment experience. Agent memory systems (Claude's memory files, Cursor rules, AGENTS.md conventions) store preferences across sessions. All of this is continuous learning through constraining — it just isn't recognized as such.
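The automated slice can be sketched as a search over prompt components against an objective. This is a minimal toy, not DSPy's or ProTeGi's actual API; the components and the scoring function are assumptions standing in for a real eval suite:

```python
import itertools

# Hypothetical prompt components; real systems search far richer spaces.
INSTRUCTIONS = ["Answer concisely.", "Answer step by step."]
FEW_SHOTS = [(), ("Q: 2+2? A: 4",)]

def score(prompt: str) -> float:
    """Stand-in objective: reward prompts that include a worked example,
    with a mild length penalty. A real objective would run the composed
    prompt against held-out evals."""
    return ("A:" in prompt) * 1.0 - len(prompt) / 1000

def search_prompts() -> str:
    """Exhaustive search over component combinations — the slice of
    constraining that prompt optimizers automate."""
    candidates = [
        "\n".join([instruction, *shots])
        for instruction, shots in itertools.product(INSTRUCTIONS, FEW_SHOTS)
    ]
    return max(candidates, key=score)
```

The output of the search is itself a symbolic artifact — a versioned prompt — which is why the ML community's recognition of this as learning supports the note's broader claim.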

Weight-based learning captures distributional knowledge (style, tone, world knowledge) that doesn't reduce to explicit artifacts — not all continuous learning is constraining. But the extractable, testable subset that constraining handles covers most of what deployed systems need. The manual version works; automating the judgment-heavy parts is where the real gap is.


Relevant Notes: