The chat-history model trades context efficiency for implementation simplicity

Type: note · Status: seedling · Tags: computational-model, tool-loop

The chat-history model became the default architecture for LLM applications because it is the cheapest way to preserve state without deciding in advance what the state should be. Append the next message, keep the full trace, let the model re-read everything. This buys implementation simplicity, auditability, and exploratory flexibility in one move.
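The loop described above can be sketched in a few lines. This is a minimal illustration, not any particular framework's API: `fake_model` stands in for a real LLM call, and the message-dict shape is an assumption for the sketch.

```python
def fake_model(messages):
    # Stand-in for a real LLM call; a real implementation would send
    # `messages` to a model endpoint and return its completion.
    return f"reply to: {messages[-1]['content']}"

class ChatSession:
    def __init__(self, system_prompt):
        # The full trace IS the state; nothing is summarized or dropped.
        self.messages = [{"role": "system", "content": system_prompt}]

    def send(self, user_text):
        self.messages.append({"role": "user", "content": user_text})
        # The model re-reads the entire transcript on every turn.
        reply = fake_model(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply

session = ChatSession("You are a helpful assistant.")
session.send("Plan a trip.")
session.send("Actually, cancel that.")
# Every turn survives into later calls, including the abandoned plan:
print(len(session.messages))  # 5: system + 2 user + 2 assistant
```

Note that `send` never decides what the state should be; it just appends. That deferral is exactly the implementation simplicity the note describes.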

That advantage is real. When the right handoff artifact is not yet known, preserving the transcript avoids premature compression. Builders do not need to define schemas, return types, or selection policies before they understand the task. Chat is a strong exploratory default.

But the property that makes chat easy to build makes it expensive to run under bounded context. The accumulated transcript is organized by time, not relevance. False starts, corrections, pleasantries, and intermediate reasoning that served its purpose three turns ago all survive into later calls. Each downstream step must re-interpret prior interaction rather than consume an artifact shaped for its own needs.

This is why mature orchestration drifts away from pure chat history even when systems begin there. Once builders understand what later stages actually need, they introduce compressed handoff artifacts, explicit return values, scoped sub-agents, or per-call prompt assembly — mechanisms that recover the context efficiency that raw transcript inheritance wastes.

The contrast is not "chat is bad" versus "structured orchestration is good." It is between two optimization targets:

  • Chat history optimizes for builder convenience and maximum information preservation
  • Bounded-context orchestration optimizes for selective loading, explicit interfaces, and task-shaped artifacts

Those targets coincide early in a design, when preserving everything is safer than guessing wrong. They diverge when the bottleneck shifts from "how do I avoid losing information?" to "how do I stop re-reading the wrong information?"

The downstream claim that session history should not be the default next context follows from this analysis but is narrower: it argues that storage and next-context loading should be separate decisions. This note explains why they were conflated in the first place — chat won because it was easy to implement, not because it was the best architecture under context scarcity.
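The separation of those two decisions can be made concrete with a small sketch: store everything, but assemble each call's context selectively. The keyword-match relevance test here is a stand-in assumption for a real retrieval or summarization policy.

```python
class SessionStore:
    def __init__(self):
        self.log = []  # storage decision: keep the full trace for auditability

    def append(self, msg):
        self.log.append(msg)

    def next_context(self, query, limit=2):
        # loading decision, made separately: pick only messages relevant
        # to the current task, capped at `limit`
        relevant = [m for m in self.log if query in m]
        return relevant[-limit:]

store = SessionStore()
store.append("user: book a flight to Oslo")
store.append("assistant: which dates?")
store.append("user: also, what's the weather in Oslo?")

context = store.next_context("Oslo")
print(len(context))  # 2 messages mention "Oslo"; the store still holds all 3
```

Nothing is lost by the split: `log` preserves the transcript in full, while `next_context` decides, per call, what the model actually re-reads.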


Relevant Notes: