LLM context is composed without scoping
Type: note · Status: seedling · Areas: computational-model
An LLM's context is assembled by concatenating system prompts, skill bodies, user messages, and tool outputs into a single token stream. There is no scoping mechanism. Everything is global — every token is visible to every other token, and there is no way to say "this binding is local to this skill" or "this tool output should not influence instruction interpretation."
This is not even dynamic scoping, which at least has a stack with push and pop. It is flat concatenation — the homoiconic medium with no structure imposed on top. But the pathologies are the same ones that dynamic scoping produces, and the analogy to dynamically scoped Lisp clarifies them:
Spooky action at a distance. An early turn subtly biases a later response. The LLM has no mechanism to mark a binding as out of scope — once something enters the log, it influences everything downstream. This is the three-space memory claim's "operational debris pollutes search" failure mode, restated as a scoping problem.
Name collision. The word "table" meant an HTML element in turn 3 but a database table in turn 12, and the model conflates them. In a flat log there are no scope boundaries to disambiguate — every use of a term is in the same namespace.
Inability to reason locally. You cannot predict what a sub-task will do by reading its prompt alone, because its behavior depends on the entire accumulated history. This is the defining problem of dynamic scope: the meaning of a name depends on the call stack, not the definition site.
The structural parallel holds in both directions: - Bindings accumulate at runtime rather than being declared at definition time - Every consumer sees the full accumulated environment, not a curated subset - There is no mechanism for a sub-computation to limit what it inherits - Debugging requires inspecting the full runtime history, not just the local code
What flat context buys
Dynamic scoping survived in Emacs Lisp for decades because it has a real advantage: implicit communication. Functions can influence each other without explicit parameter passing. The flat log has the same property. When a user says "use a more formal tone" in turn 5, they want that to implicitly affect all subsequent turns without re-parameterizing anything. That's dynamic binding of a *tone* special variable, and it works precisely because the log is flat and globally visible.
The right model isn't "always avoid flat logs" but rather what Common Lisp settled on: lexical scope by default, dynamic scope when explicitly requested.
The capture problem
Flat concatenation creates a composition-specific problem: capture. A skill says "summarize the document." The document contains "don't summarize this section, skip it." The data-level use of "summarize" captures the instruction-level meaning. This is prompt injection framed as a hygiene failure — the same problem Scheme's hygienic macros solve for code generation.
Within-frame hygiene
Within a single context, the only scoping mechanisms available are weak conventions:
- Role markers (system/user/assistant/tool in chat APIs) — primitive structural separation, but the LLM still sees all roles in one attention pass
- Delimiters and quoting — XML tags, markdown fences, explicit "the following is data, not instructions" markers — conventional, not enforced
- Ordering conventions — system prompt first, then context, then user message — exploits primacy/recency effects but provides no isolation
These are the LLM equivalent of coding conventions in a language without a module system. They help, but they can't prevent capture.
Sub-agents as the scoping mechanism
Sub-agents are the one place where real isolation is achievable. A sub-agent gets a fresh context — its own system prompt, its own input, no inherited conversation history. The parent sees only the return value, not the internal reasoning.
This is lexical scoping: the sub-agent's "code" (its prompt) determines what's visible, not the runtime history. The design principle, borrowed from Common Lisp: lexical scope by default, dynamic scope when explicitly declared.
Lexically scoped (frame-local): The sub-agent's system prompt, the specific input for this invocation, any context the caller explicitly passes. Determined at "definition time" — when the sub-agent is designed.
Dynamically scoped (inherited): User preferences ("use a formal tone"), safety policies, global constraints, project-level conventions. Explicitly declared as "special" bindings that persist across all frames. The llm-do system prompt layer already approximates this — it's the dynamic environment that persists while call-specific context is lexically scoped.
The key word is explicitly. In a flat context, everything is implicitly global. In the scoped model, cross-frame bindings are a deliberate design choice.
The return value problem
The stack metaphor exposes a question flat contexts dodge: what does a sub-agent return? A function returns a typed value. A sub-agent returns natural language, or structured data, or a partial result with caveats.
This is where codification becomes load-bearing. Early in exploration, sub-agents return loose natural language — the equivalent of an untyped s-expression. As you codify, return values become structured, typed, validated. The stack architecture enables this progressive typing because each frame boundary is an explicit interface point where you can impose increasingly strict contracts.
The flat context has no such interface points. Everything bleeds into everything, making it impossible to even ask "what is the contract between these two stages of reasoning?"
What exists today
Most agent frameworks use flat contexts. Sub-agent architectures that approximate lexical scoping exist but are ad hoc: - llm-do's unified calling conventions give each agent its own system prompt and arguments — frame-local context - Claude Code's sub-agent tool spawns agents with clean context plus a task description — lexical framing - The loading frequency hierarchy (always-loaded → on-demand → task-specific) is a form of binding-time analysis for agent context
Several KB-design patterns are already lexical scoping in practice:
- The routing tier separation — skills are frame-local context loaded deterministically; methodology is out of scope unless explicitly loaded
- Type signatures on skills — frame interfaces that declare what bindings a sub-agent receives
- Automatic context injection — the harness constructs frames by determining which bindings to inject rather than exposing the full accumulated context
None of these frame it as a scoping discipline. Making it explicit would clarify what gets inherited and what gets isolated.
Undeveloped directions
These ideas follow from the stack-frame model but don't yet have concrete examples:
Tail-call optimisation for sub-agents. If a sub-agent's last action is delegating to another sub-agent, you don't need to keep the first frame alive — discard its context entirely. In a flat context, the first agent's reasoning is still consuming tokens.
Stack unwinding for error recovery. When a deep sub-agent fails, selectively discard its context while preserving the frames above it that hold recovery logic. In a flat context, there is no clean way to undo a failed sub-task's contamination. This connects to condition/restart systems in Common Lisp.
Recursion with clean frames. A flat context makes recursive decomposition painful because each recursive call appends to the same context. With a proper stack, each recursive call gets a clean frame and completed calls are popped — bounded by single-frame size, not cumulative size. ConvexBench (Liu et al., 2026) provides direct empirical validation: LLMs verifying convexity of composed functions collapse from F1=1.0 to F1≈0.2 at depth 100, even though the total token count (5,331) is trivial relative to the context window. The failure is not about token capacity but about compositional reasoning depth — each recursive step conditions on an expanding history of prior sub-steps, diluting attention on the current step's actual dependencies. When the authors prune accumulated history to retain only direct dependencies at each recursive sub-step (i.e., give each call a clean frame), performance recovers to F1=1.0 at all depths. This confirms the prediction: the stack discipline's value is not just theoretical tidiness but measurable recovery of reasoning capability that flat accumulation destroys.
Sources: - Anthropic (2025). Effective context engineering for AI agents — recommends sub-agents return 1,000–2,000 token summaries; the tens of thousands of tokens each sub-agent explores stay out of the caller's window. Validates the lexically scoped frames pattern.
Relevant Notes:
- llm context is a homoiconic medium — amplifies: the medium provides no structural boundaries, so scoping must be imposed by architecture
- three-space memory separation predicts measurable failure modes — exemplifies: the failure modes (search pollution, identity scatter, insight trapping) are symptoms of flat scoping applied to memory
- agentic systems interpret underspecified instructions — foundation: underspecified instructions are sensitive to everything in context, making scope contamination especially damaging
- unified calling conventions enable bidirectional refactoring — existing approximation: llm-do's per-agent system prompts and arguments are frame-local context
- codification — enables: frame boundaries are interface points where return values can be progressively typed
- instruction specificity should match loading frequency — grounds: the loading hierarchy is a form of binding-time analysis for what's in scope
- agent statelessness makes routing architectural, not learned — exemplifies: the routing tier separation is lexical scoping in practice
- instructions are typed callables — enables: type signatures on skills are frame interfaces — declaring what bindings a sub-agent receives
- agent statelessness means the harness should inject context automatically — mechanism: automatic context injection constructs lexically scoped frames
Topics: