Bounded-context orchestration model

Type: kb/types/note.md · Status: seedling · Tags: computational-model

This is a model of the joint LLM-code system, not a model of a standalone LLM. The system being modeled includes both the symbolic code that owns state and control flow and the bounded LLM calls that perform semantic judgment. The central question is how the code side should schedule, frame, and absorb those calls when no single LLM context window can hold all relevant state.

Two observations motivate this model. First, context is the scarce resource in agent systems: the finite window of tokens the agent can attend to, with both volume and complexity costs. Second, there is reason to think that bookkeeping and semantic work have different error profiles: symbolic substrates can do bookkeeping without error, while LLMs are needed only for semantic judgment. (The second argument is conjectural; the first is well established.)

Together these imply a natural architecture: a symbolic scheduler over bounded LLM calls. This is not a restrictive design choice — any symbolic program with LLM calls is a select/call program, so the model captures the full space of such architectures.

The model

The model has two components:

  • a symbolic scheduler over unbounded exact state, which assembles prompts and orchestrates the workflow
  • bounded clean context windows for each LLM call — the expensive, stochastic operation that the architecture is designed around

The scheduler's state includes source artifacts, prior prompts, and outputs from earlier LLM calls: relevance labels, cluster summaries, extracted claims, sub-goals, partial syntheses. In practice this state may live in files, in-memory structures, databases, or a mix. The operational requirement is simple: accumulated state lives there, not in conversation history; LLM calls do judgment work and return results to code; the next prompt is assembled from stored state rather than from the model's memory of prior turns.
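
A minimal sketch of that requirement, assuming JSON-on-disk storage; the file name, the event shape, and assemble_prompt are all illustrative, and a database or in-memory store would satisfy the same requirement:

import json
from pathlib import Path

STATE = Path("state.json")    # hypothetical location for accumulated state

def load_state() -> dict:
    return json.loads(STATE.read_text()) if STATE.exists() else {"events": []}

def save_state(K: dict) -> None:
    STATE.write_text(json.dumps(K, indent=2))

def assemble_prompt(K: dict, artifact_id: str) -> str:
    # The next prompt is built from stored results, not from the model's
    # memory of prior turns.
    prior = [e for e in K["events"] if e.get("artifact") == artifact_id]
    return f"Prior findings: {prior}\nExtract the remaining claims from {artifact_id}."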

The model also accommodates architectures where the LLM emits a symbolic control program rather than a direct natural-language answer. That still fits as long as execution and state progression remain external to the conversation. A system that keeps bookkeeping inside an LLM conversation is a degraded variant that spends bounded context on work the symbolic scheduler handles for free.
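
A sketch of that control-program variant, with a hypothetical two-action alphabet; the LLM's output is treated as data and the symbolic side executes it, so state progression never re-enters the conversation:

# Hypothetical action alphabet; a real one would be richer.
ACTIONS = {
    "note":     lambda K, arg: K["events"].append({"noted": arg}),
    "retrieve": lambda K, arg: K["events"].append({"retrieved": arg}),
}

def execute(program: list[dict], K: dict) -> None:
    for step in program:                        # external, symbolic execution
        ACTIONS[step["op"]](K, step["arg"])     # results land in explicit K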

The select/call loop

Let:

  • K be the scheduler's full symbolic state — source artifacts plus everything prior calls have produced
  • P be one complete prompt, including both the requested operation and the material selected for that call
  • M be the maximum effective context budget for one call
  • ||P|| be the effective cost of complete prompt P — token count, compositional difficulty, task framing, or all three

The cost measure ||·|| is an idealized effective-cost measure over the whole prompt, not a raw token count. The cost may depend on the kind of task that P describes: a synthesis prompt and a relevance-check prompt can have different effective costs even when they contain the same source material. The note effective context is task-relative and complexity-relative, not a fixed model constant develops the empirical case.

The loop alternates between symbolic scheduling and bounded LLM calls. Symbolic scheduling happens outside LLM context: file listing, retrieval, sorting, prompt assembly, deduplication, state update, and cache maintenance. LLM calls are the bounded, stochastic steps that perform semantic judgment under focused prompts.

The select function either builds a prompt P from the current state K, subject to the feasibility constraint ||P|| ≤ M, or returns None when the scheduler has no further LLM call to make. This is where the scheduling difficulty lives.
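
A sketch of select under that constraint. Everything here is illustrative rather than part of the model: the Task shape, the chars-per-token estimate standing in for ||P||, and the naive halving decomposition used when a prompt is infeasible:

from dataclasses import dataclass

@dataclass
class Task:
    kind: str        # e.g. "relevance_check", "synthesis"
    material: str    # the source text selected for this call

def render_prompt(task: Task) -> str:
    return f"Task: {task.kind}\n\n{task.material}"

def estimated_cost(prompt: str) -> float:
    return len(prompt) / 4    # crude token estimate standing in for ||P||

def select(queue: list[Task], M: float) -> str | None:
    """Return a prompt with estimated cost <= M, or None when done."""
    while queue:
        task = queue.pop(0)
        prompt = render_prompt(task)
        if estimated_cost(prompt) <= M:
            return prompt                       # feasible bounded call
        if len(task.material) < 2:              # cannot split further
            raise ValueError("template alone exceeds the budget M")
        half = len(task.material) // 2          # naive split; real schedulers
        queue += [Task(task.kind, task.material[:half]),    # decompose far
                  Task(task.kind, task.material[half:])]    # more carefully
    return None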

The result r is incorporated back into symbolic state as K + r. In the minimal event-sourced case this is append-only: K is the complete trace, and select recomputes any derived view from that trace. Implementations usually cache derived symbolic state, such as indexes, rankings, dependency maps, queues, phase tags, parsed fields, retry metadata, or satisfaction signals. The model treats those caches as part of explicit K, not as hidden conversation state.
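
A sketch of the event-sourced case with one cached derived view; the result shape is assumed to be a plain dict with a "kind" field:

class EventSourcedState:
    def __init__(self) -> None:
        self.trace: list[dict] = []                # K: append-only trace
        self._by_kind: dict[str, list[dict]] = {}  # cached derived view

    def incorporate(self, r: dict) -> None:
        """K = K + r: append the result, refresh the cache incrementally."""
        self.trace.append(r)
        self._by_kind.setdefault(r.get("kind", "unknown"), []).append(r)

    def view(self, kind: str) -> list[dict]:
        # select consumes this symbolically; it could be recomputed from
        # self.trace at any time, so it is explicit state, not hidden state.
        return self._by_kind.get(kind, [])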

Operationally:

while (P := select(K)) is not None:    # scheduler finds a feasible prompt
    r = call(P)                        # bounded, stochastic LLM call
    K = K + r                          # result returns to explicit state

Real orchestrators routinely fan out parallel calls. Parallelism changes the scheduling problem (the scheduler must merge or arbitrate when parallel results interact), but not the core structure: prompts are still selected from K, calls still produce results, and results still return to explicit state.
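
A sketch of fan-out for independent prompts, assuming call is the bounded LLM call from the loop above and a thread pool suffices; the merge step is a trivial append here, which is exactly where arbitration logic would go when parallel results interact:

from concurrent.futures import ThreadPoolExecutor

def fan_out(prompts: list[str], call, K: list) -> None:
    """Run independent bounded calls in parallel and merge results into K."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(call, prompts))   # bounded calls in parallel
    for r in results:
        K.append(r)    # merge step; trivial because the prompts are independent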

In practice, select cannot usually compute ||P|| exactly. It uses heuristics: token counts, known prompt templates, empirical difficulty estimates, prior relevance labels, decomposition plans, or feasibility judgments returned by earlier LLM calls. When an LLM helps judge feasibility or produce a plan, that judgment is itself another bounded call whose result is incorporated into K; a later select step consumes it symbolically. Hierarchical decomposition is therefore not a separate mechanism, but a pattern of using the same loop recursively.
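
A sketch of feasibility planning as just another bounded call, assuming call returns parsed JSON; the prompt wording and the result shape are illustrative:

def plan_decomposition(goal: str, call) -> dict:
    """Obtain a decomposition plan via one bounded call."""
    prompt = ("Split the goal below into sub-goals that each fit one "
              "focused call. Return a JSON list of sub-goal strings.\n\n"
              f"Goal: {goal}")
    # The plan is itself a result r: it lands in K, and a later select
    # step consumes it symbolically. Same loop, applied recursively.
    return {"kind": "plan", "sub_goals": call(prompt)}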

The ContextProvider pattern is a concrete source-scoped instance of the loop. The parent agent keeps a small action alphabet such as query_slack or update_github; select(K) chooses the source boundary and frames the question or instruction; call(P) runs inside a provider sub-agent that owns the raw tools, source quirks, permissions, and optional skills. The article's token and latency claims cannot be verified from the snapshot, but the architecture strongly validates the model's decomposition mechanism: tool complexity can move out of the parent context when a source boundary gives the bounded call a cleaner frame.
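
A sketch of that boundary, with hypothetical names echoing the article's action alphabet; the provider body is left abstract because it is itself a bounded sub-agent:

class SlackProvider:
    """Hypothetical provider sub-agent owning raw tools and permissions."""
    def query(self, question: str) -> str:
        # Runs its own bounded call against the raw Slack tool surface;
        # only the distilled answer crosses back into the parent's context.
        raise NotImplementedError

def query_slack(provider: SlackProvider, question: str) -> str:
    # One symbol in the parent's small action alphabet: select(K) framed
    # the question, and the provider absorbs the source's quirks.
    return provider.query(question)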

What makes selection hard

The select function is where the optimisation lives. Two features make selection hard:

Sequential dependence. Each selection affects future state. A good first iteration might discover that the goal decomposes differently than expected, changing what later iterations should select. This makes the problem closer to a control problem than to one-shot packing.

Coupled selection and framing. Context cost has two dimensions — volume (how many tokens) and complexity (how hard the tokens are to use). The same knowledge, presented differently, has different value to a bounded observer: "Here are six documents, synthesise them" is less useful than "documents A and B establish X, documents C and D contradict it, resolve the tension." Same tokens, different yield for a bounded reader. See information value is observer-relative.

Scope and open questions

The full global optimisation problem is probably too rich for clean strategy theorems: goals are underspecified, LLM calls are noisy, the decision to halt or continue is itself a judgment call inside select, and the value of including item X depends on the sub-agent's stochastic interpretation. There is no clean objective function. But the model supports local comparative results: comparing two concrete strategies, or justifying a transformation from one strategy to another. The decomposition rules catalogue specific transformations that the model shows move a system in the right direction. Open questions:

  • Can the framing decisions within select be factored cleanly enough that their cost can be ignored in a first theory and reintroduced later?
  • How much selection judgment should the scheduler perform before constructing a bounded call, and how much should be delegated to the LLM inside that call?
  • What restrictions on the model (fixed decomposition templates, bounded branching, finite sub-goal depth) yield tractable optimisation while preserving enough expressiveness?
  • What heuristics make select good in practice?
  • When should the orchestrator compress state, offload it to external storage, or delegate to a sub-loop?
  • Can the loop be made self-improving — can later iterations learn from the quality of earlier selections? This would connect to deploy-time learning.

Sources:

  • Liu et al. (2026). ConvexBench: Can LLMs recognize convex functions? — scoped recursion with focused context as a clean-model implementation for compositional reasoning.
  • Meyerson et al. (2025). MAKER: Solving a million-step LLM task with zero errors — maximal decomposition (m=1) as extreme clean-model instantiation; O(s ln s) cost scaling.
  • @Vtrivedy10 (2026). The Anatomy of an Agent Harness — the Ralph Loop (prompt → execute → observe → decide) is a concrete instance of the select/call loop; the source's runtime components map to scheduler infrastructure.
  • Ashpreet Bedi (2026). Context providers: the missing layer between agents and tools — source-scoped provider sub-agents instantiate select/call by hiding raw tool surfaces behind bounded query/update calls.

Relevant Notes: