LLM context is composed without scoping

Type: kb/types/note.md · Status: seedling · Tags: computational-model

An LLM's context is assembled by concatenating system prompts, skill bodies, user messages, and tool outputs into a single token stream. Everything is global: every token is visible to every other token, with no way to say "this binding is local to this skill" or "this tool output should not influence instruction interpretation."

This is not even dynamic scoping (name bindings resolved through the call stack rather than the source structure), which at least maintains a stack with push and pop. Flat concatenation is the homoiconic medium (instructions and data share one representation) with no structure imposed on top, yet it produces dynamic scoping's pathologies — and the Lisp analogy still clarifies them:

Spooky action at a distance. An early turn subtly biases a later response. The LLM has no mechanism to mark a binding as out of scope — once something enters the log, it influences everything downstream. This is the three-space memory claim's "operational debris pollutes search" failure mode, restated as a scoping problem.

Name collision. "Table" meant an HTML element in turn 3 but a database table in turn 12, and the model conflates them. A flat log has no scope boundaries to disambiguate — every use of a term sits in one namespace.

Inability to reason locally. You cannot predict what a sub-task will do by reading its prompt alone; its behavior depends on the entire accumulated history. This is the defining problem of dynamic scope: the meaning of a name depends on the call stack, not the definition site.

The capture problem

Flat concatenation creates a composition-specific problem: capture. A skill says "summarize the document." The document contains "don't summarize this section, skip it." The data-level use of "summarize" captures the instruction-level meaning. This is a hygiene failure that leads to prompt injection — the same problem Scheme's hygienic macros (macros that rewrite code without accidentally capturing names from the call site) solve for code generation.

Within-frame hygiene

Within a single context, the only scoping mechanisms available are weak conventions:

  • Role markers (system/user/assistant/tool in chat APIs) — primitive structural separation, but the LLM still sees all roles in one attention pass
  • Delimiters and quoting — XML tags, markdown fences, explicit "the following is data, not instructions" markers — conventional, not enforced
  • Ordering conventions — system prompt first, then context, then user message — exploits primacy/recency effects but provides no isolation

These are the LLM equivalent of coding conventions in a language without a module system. They help, but they cannot prevent capture.

What flat context buys

Flat logs have a real upside: implicit communication. When a user says "use a more formal tone" in turn 5, the effect propagates to later turns without re-parameterizing. This ambient influence is what makes flat context ergonomic at single-call granularity. The design question is not whether to have the upside, but where to contain it.

The architectural response

The scoping problem is prose-specific. Symbolic artifacts (code, schemas, types) inherit scoping from their interpreter — see axes of artifact analysis — and opaque artifacts don't have the question at all. Prose has nothing to inherit: no modules, no lexical scope, no interpreter-enforced boundaries. Scope can only be imposed architecturally.

At invocation time this surfaces as a design choice — flat (parent context) or bounded (sub-agent frame) — same class, same backend, same role, different context-efficiency profile. Flat pays the full volume and complexity cost and risks contamination; bounded trades an interface cost for isolation.

Sub-agents are the canonical architectural move: code outside the LLM constructs a fresh flat context, the LLM sees only that, and the scope lives in the orchestration code rather than in the LLM itself.

This is one specialization of the general constraining argument in agentic systems interpret underspecified instructions — enforcement is the qualitative reason to move a property to code, distinct from the quantitative reasons (cost, latency, reliability). The error-profile version is scheduler-llm-separation exploits an error-correction asymmetry: bookkeeping has catastrophic error cost on the semantic substrate (the LLM) and zero error cost on the symbolic substrate (the surrounding code). Scope is bookkeeping, so it belongs on the symbolic side.

Empirical validation comes from ConvexBench (Liu et al., 2026), a benchmark for recognizing convexity in deeply composed symbolic functions: LLMs collapse from F1=1.0 to F1≈0.2 at depth 100, even though the total token count (~5,331) is trivial relative to the context window. The failure is compositional reasoning depth, not token capacity — each recursive step conditions on an expanding history that dilutes attention on the current step. Pruning to retain only direct dependencies at each sub-step (one clean frame per call) recovers F1=1.0 at all depths.


Sources: - Anthropic (2025). Effective context engineering for AI agents — recommends sub-agents return 1,000–2,000 token summaries; the tens of thousands of tokens each sub-agent explores stay out of the caller's window. Validates the lexically scoped frames pattern.

Relevant Notes: