LLM context is composed without scoping

Type: note · Status: seedling · Tags: computational-model

An LLM's context is assembled by concatenating system prompts, skill bodies, user messages, and tool outputs into a single token stream. There is no scoping mechanism. Everything is global — every token is visible to every other token, and there is no way to say "this binding is local to this skill" or "this tool output should not influence instruction interpretation."

This is not even dynamic scoping, which at least has a stack with push and pop. It is flat concatenation — the homoiconic medium with no structure imposed on top. But the pathologies are the same ones that dynamic scoping produces, and the analogy to dynamically scoped Lisp clarifies them:

Spooky action at a distance. An early turn subtly biases a later response. The LLM has no mechanism to mark a binding as out of scope — once something enters the log, it influences everything downstream. This is the three-space memory claim's "operational debris pollutes search" failure mode, restated as a scoping problem.

Name collision. The word "table" meant an HTML element in turn 3 but a database table in turn 12, and the model conflates them. In a flat log there are no scope boundaries to disambiguate — every use of a term is in the same namespace.

Inability to reason locally. You cannot predict what a sub-task will do by reading its prompt alone, because its behavior depends on the entire accumulated history. This is the defining problem of dynamic scope: the meaning of a name depends on the call stack, not the definition site.

The structural parallel holds in both directions: - Bindings accumulate at runtime rather than being declared at definition time - Every consumer sees the full accumulated environment, not a curated subset - There is no mechanism for a sub-computation to limit what it inherits - Debugging requires inspecting the full runtime history, not just the local code

What flat context buys

Dynamic scoping survived in Emacs Lisp for decades because it has a real advantage: implicit communication. Functions can influence each other without explicit parameter passing. The flat log has the same property. When a user says "use a more formal tone" in turn 5, they want that to implicitly affect all subsequent turns without re-parameterizing anything. That's dynamic binding of a *tone* special variable, and it works precisely because the log is flat and globally visible.

The right model isn't "always avoid flat logs" but rather what Common Lisp settled on: lexical scope by default, dynamic scope when explicitly requested.

The capture problem

Flat concatenation creates a composition-specific problem: capture. A skill says "summarize the document." The document contains "don't summarize this section, skip it." The data-level use of "summarize" captures the instruction-level meaning. This is prompt injection framed as a hygiene failure — the same problem Scheme's hygienic macros solve for code generation.

Within-frame hygiene

Within a single context, the only scoping mechanisms available are weak conventions:

Role markers (system/user/assistant/tool in chat APIs) — primitive structural separation, but the LLM still sees all roles in one attention pass
Delimiters and quoting — XML tags, markdown fences, explicit "the following is data, not instructions" markers — conventional, not enforced
Ordering conventions — system prompt first, then context, then user message — exploits primacy/recency effects but provides no isolation

These are the LLM equivalent of coding conventions in a language without a module system. They help, but they can't prevent capture.

Sub-agents as the scoping mechanism

Sub-agents are the one place where real isolation is achievable. A sub-agent gets a fresh context — its own system prompt, its own input, no inherited conversation history. The parent sees only the return value, not the internal reasoning.

This is lexical scoping: the sub-agent's "code" (its prompt) determines what's visible, not the runtime history. The design principle, borrowed from Common Lisp: lexical scope by default, dynamic scope when explicitly declared.

Lexically scoped (frame-local): The sub-agent's system prompt, the specific input for this invocation, any context the caller explicitly passes. Determined at "definition time" — when the sub-agent is designed.

Dynamically scoped (inherited): User preferences ("use a formal tone"), safety policies, global constraints, project-level conventions. Explicitly declared as "special" bindings that persist across all frames. The llm-do system prompt layer already approximates this — it's the dynamic environment that persists while call-specific context is lexically scoped.

The key word is explicitly. In a flat context, everything is implicitly global. In the scoped model, cross-frame bindings are a deliberate design choice.

The return value problem

The stack metaphor exposes a question flat contexts dodge: what does a sub-agent return? A function returns a typed value. A sub-agent returns natural language, or structured data, or a partial result with caveats.

This is where codification becomes load-bearing. Early in exploration, sub-agents return loose natural language — the equivalent of an untyped s-expression. As you codify, return values become structured, typed, validated. The stack architecture enables this progressive typing because each frame boundary is an explicit interface point where you can impose increasingly strict contracts.

The flat context has no such interface points. Everything bleeds into everything, making it impossible to even ask "what is the contract between these two stages of reasoning?"

What exists today

Most agent frameworks use flat contexts. Sub-agent architectures that approximate lexical scoping exist but are ad hoc: - llm-do's unified calling conventions give each agent its own system prompt and arguments — frame-local context - Claude Code's sub-agent tool spawns agents with clean context plus a task description — lexical framing - The loading frequency hierarchy (always-loaded → on-demand → task-specific) is a form of binding-time analysis for agent context

Several KB-design patterns are already lexical scoping in practice:

The routing tier separation — skills are frame-local context loaded deterministically; methodology is out of scope unless explicitly loaded
Type signatures on skills — frame interfaces that declare what bindings a sub-agent receives
Automatic context injection — the context engine constructs frames by determining which bindings to inject rather than exposing the full accumulated context

None of these frame it as a scoping discipline. Making it explicit would clarify what gets inherited and what gets isolated.

Undeveloped directions

These ideas follow from the stack-frame model but don't yet have concrete examples:

Tail-call optimisation for sub-agents. If a sub-agent's last action is delegating to another sub-agent, you don't need to keep the first frame alive — discard its context entirely. In a flat context, the first agent's reasoning is still consuming tokens.

Stack unwinding for error recovery. When a deep sub-agent fails, selectively discard its context while preserving the frames above it that hold recovery logic. In a flat context, there is no clean way to undo a failed sub-task's contamination. This connects to condition/restart systems in Common Lisp.

Recursion with clean frames. A flat context makes recursive decomposition painful because each recursive call appends to the same context. With a proper stack, each recursive call gets a clean frame and completed calls are popped — bounded by single-frame size, not cumulative size. ConvexBench (Liu et al., 2026) provides direct empirical validation: LLMs verifying convexity of composed functions collapse from F1=1.0 to F1≈0.2 at depth 100, even though the total token count (5,331) is trivial relative to the context window. The failure is not about token capacity but about compositional reasoning depth — each recursive step conditions on an expanding history of prior sub-steps, diluting attention on the current step's actual dependencies. When the authors prune accumulated history to retain only direct dependencies at each recursive sub-step (i.e., give each call a clean frame), performance recovers to F1=1.0 at all depths. This confirms the prediction: the stack discipline's value is not just theoretical tidiness but measurable recovery of reasoning capability that flat accumulation destroys.

Sources: - Anthropic (2025). Effective context engineering for AI agents — recommends sub-agents return 1,000–2,000 token summaries; the tens of thousands of tokens each sub-agent explores stay out of the caller's window. Validates the lexically scoped frames pattern.

Relevant Notes:

llm context is a homoiconic medium — amplifies: the medium provides no structural boundaries, so scoping must be imposed by architecture
agent orchestration needs coordination guarantees, not just coordination channels — extends: scoping is one coordination guarantee family; without it, flat context fails by contamination rather than by inconsistency or amplification
three-space memory separation predicts measurable failure modes — exemplifies: the failure modes (search pollution, identity scatter, insight trapping) are symptoms of flat scoping applied to memory
agentic systems interpret underspecified instructions — foundation: underspecified instructions are sensitive to everything in context, making scope contamination especially damaging
unified calling conventions enable bidirectional refactoring — existing approximation: llm-do's per-agent system prompts and arguments are frame-local context
codification — enables: frame boundaries are interface points where return values can be progressively typed
instruction specificity should match loading frequency — grounds: the loading hierarchy is a form of binding-time analysis for what's in scope
agent statelessness makes routing architectural, not learned — exemplifies: the routing tier separation is lexical scoping in practice
instructions are typed callables — enables: type signatures on skills are frame interfaces — declaring what bindings a sub-agent receives
agent statelessness means the context engine should inject context automatically — mechanism: automatic context injection constructs lexically scoped frames
topology, isolation, and verification form a causal chain for reliable agent scaling — extends: argues that scope isolation is the second prerequisite in a dependency chain, manufacturing the atomic units that verification needs

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search