The chat-history model trades context efficiency for implementation simplicity
Type: note · Status: seedling · Tags: computational-model, tool-loop
The chat-history model became the default architecture for LLM applications because it is the cheapest way to preserve state without deciding in advance what the state should be. Append the next message, keep the full trace, let the model re-read everything. This buys implementation simplicity, auditability, and exploratory flexibility in one move.
That advantage is real. When the right handoff artifact is not yet known, preserving the transcript avoids premature compression. Builders do not need to define schemas, return types, or selection policies before they understand the task. Chat is a strong exploratory default.
But the property that makes chat easy to build makes it expensive to run under bounded context. The accumulated transcript is organized by time, not relevance. False starts, corrections, pleasantries, and intermediate reasoning that served its purpose three turns ago all survive into later calls. Each downstream step must re-interpret prior interaction rather than consume an artifact shaped for its own needs.
This is why mature orchestration drifts away from pure chat history even when systems begin there. Once builders understand what later stages actually need, they introduce compressed handoff artifacts, explicit return values, scoped sub-agents, or per-call prompt assembly — mechanisms that recover the context efficiency raw transcript inheritance wastes.
The contrast is not "chat is bad" versus "structured orchestration is good." It is between two optimization targets:
- Chat history optimizes for builder convenience and maximum information preservation
- Bounded-context orchestration optimizes for selective loading, explicit interfaces, and task-shaped artifacts
Those targets coincide early in a design, when preserving everything is safer than guessing wrong. They diverge when the bottleneck shifts from "how do I avoid losing information?" to "how do I stop re-reading the wrong information?"
The downstream claim that session history should not be the default next context follows from this analysis but is narrower: it argues that storage and next-context loading should be separate decisions. This note explains why they were conflated in the first place — chat won because it was easy to implement, not because it was the best architecture under context scarcity.
Relevant Notes:
- session history should not be the default next context — sharpens: the mechanism-level claim that stored trace and next-context loading are separate decisions
- bounded-context orchestration model — contrasts: the clean model assembles each bounded call from selected state rather than inheriting transcript by default
- context efficiency is the central design concern in agent systems — grounds: bounded context makes selective loading a first-class architectural concern
- context engineering — extends: prompt assembly, session boundaries, and handoff artifacts are all architectural responses to bounded-context pressure
- conversation-vs-prompt-refinement-in-agent-to-agent-coordination — exemplifies: one local instance of the broader tradeoff between trace preservation and compressed handoff
- llm context is composed without scoping — supports: accumulated conversational state stays globally visible unless the architecture introduces explicit isolation