Frontloading spares execution context

Type: kb/types/note.md · Status: seedling

When instructing LLMs, frontloading means computing, before the consuming call, any instruction inputs whose values are already known, then inserting the results directly into the instruction. Because context is the central scarce resource in agent systems, frontloading spares execution context: it reserves the budget for prompt text, tool output, interpretation, reasoning, and follow-up calls for work whose result is not already available.

The context saving

Doing a procedure inside an LLM call costs more context than inserting its result. The procedure text itself may be small: "search for X in kb/notes/" is a single line. Carrying it out can still produce tool calls, search results, reasoning traces, interpretation work, and sometimes additional LLM calls whose outputs then occupy the context window. All of that competes with the call's real task, and the cost recurs on every invocation.

The saving matters before any hard token limit is reached. Frontloading can be constitutive because it shapes what fits in a consuming call's effective context: without the pre-step, the call may become less reliable through missed instructions, shallow reasoning, stale material treated as live, or budget spent interpreting setup. This follows from the broader point that agent context is constrained by soft degradation, not hard token limits. Frontloading is also economic when one build-time, install-time, or session-start computation saves many runtime calls from repeating the same work.

Discovery avoidance is the practical version of the same pattern. Pre-resolving paths, endpoints, or configuration means the agent never spends runtime context determining them. The resolution happens outside the consuming call, replacing what the agent would otherwise have to figure out with what is already known.
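As a minimal sketch of discovery avoidance, the snippet below resolves setup-time values into an instruction template and leaves only the runtime input as a placeholder. The template wording, the `SETUP_CONFIG` values, the `kb/reports/search.md` path, and the `frontload` helper are all hypothetical names chosen for illustration.

```python
from string import Template

# Hypothetical instruction template with three unresolved inputs.
INSTRUCTION = Template(
    "Search $notes_dir for notes about $topic and write results to $report_path."
)

# Hypothetical setup-time values, assumed to be known before the agent runs.
SETUP_CONFIG = {"notes_dir": "kb/notes/", "report_path": "kb/reports/search.md"}


def frontload(template: Template, known: dict[str, str]) -> Template:
    """Substitute every already-known value; leave the rest as placeholders."""
    return Template(template.safe_substitute(known))


# Done once, outside the consuming call.
resolved = frontload(INSTRUCTION, SETUP_CONFIG)

# Done per call: only the genuinely runtime input is filled in.
prompt = resolved.substitute(topic="context budgets")
print(prompt)
```

The consuming call then receives concrete paths and never has to spend context working out where the notes or the report live.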

What to frontload

The basic test is whether the value is known before the consuming call runs.

Static (frontloadable):

- Variable resolution: paths, project names, configuration values known at setup time (the indirection elimination case)
- File listings: "here are the files in kb/notes/" rather than "list the files in kb/notes/"
- Aggregations: counts, summaries of known datasets, pre-computed indexes
- Template expansion: build-time generation of skills and instructions
- Caller-resolved inputs: what a parent agent has discovered, decided, or framed at runtime, packaged into instructions for a sub-agent that doesn't see the parent's conversation
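The file-listing and aggregation cases can be sketched together. The example below is an illustration only: `frontloaded_listing` is a hypothetical helper, and a temporary directory stands in for `kb/notes/` so the snippet runs anywhere.

```python
import tempfile
from pathlib import Path


def frontloaded_listing(notes_dir: Path) -> str:
    """Compute the listing and its count up front; return prompt-ready text."""
    names = sorted(p.name for p in notes_dir.glob("*.md"))
    lines = [f"{notes_dir.name}/ contains {len(names)} notes:"]
    lines += [f"- {name}" for name in names]
    return "\n".join(lines)


# Demo with a throwaway directory standing in for kb/notes/.
with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "notes"
    notes.mkdir()
    for name in ("frontloading.md", "codification.md"):
        (notes / name).write_text("placeholder\n")
    listing_text = frontloaded_listing(notes)

print(listing_text)
```

The returned text is what gets inserted into the instruction, in place of a "list the files" step that the agent would otherwise run with a tool call.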

What counts as known-before-the-call depends on the consumer. State that is dynamic for a parent agent's LLM can be static for a sub-agent it spawns, because the parent can package its judgment as a self-contained instruction. Hybrid sub-procedures are common: frontload the known parts and instruct the rest.
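One way to picture the hybrid case is a small brief object that separates what the parent already knows from what the sub-agent still has to do. `SubAgentBrief`, its field names, and the sample facts are hypothetical; this is a sketch of the packaging idea, not a prescribed format.

```python
from dataclasses import dataclass, field


@dataclass
class SubAgentBrief:
    """Self-contained instruction for a sub-agent that cannot see the parent."""

    goal: str
    known_facts: list[str] = field(default_factory=list)  # frontloaded by the parent
    open_steps: list[str] = field(default_factory=list)   # still instructed

    def render(self) -> str:
        parts = [f"Goal: {self.goal}"]
        if self.known_facts:
            parts.append("Already established (do not re-derive):")
            parts += [f"- {fact}" for fact in self.known_facts]
        if self.open_steps:
            parts.append("Remaining work:")
            parts += [f"{i}. {step}" for i, step in enumerate(self.open_steps, 1)]
        return "\n".join(parts)


# Hypothetical example: the parent has already located the notes and
# made a scoping decision, so the sub-agent receives both as facts.
brief = SubAgentBrief(
    goal="Summarize how the notes define frontloading.",
    known_facts=[
        "Notes live in kb/notes/.",
        "The parent already ruled out archived notes.",
    ],
    open_steps=["Read the listed notes.", "Write a three-sentence summary."],
)
brief_text = brief.render()
print(brief_text)
```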

A frontloaded artifact also needs a validity window: the span during which its pre-computed inputs remain accurate. If the inputs may change before the consuming call, include enough information to refresh the artifact: its lineage (what it depends on, and when it must be regenerated), a timestamp, or a regeneration instruction.
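A validity window can be sketched as metadata carried alongside the frontloaded text. The example below assumes two illustrative checks, a maximum age and a fingerprint of the inputs; `FrontloadedArtifact`, `fingerprint`, and the 24-hour window are hypothetical choices, not part of any existing tool.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


def fingerprint(inputs: dict[str, str]) -> str:
    """Hash the inputs the artifact was computed from (order-independent)."""
    digest = hashlib.sha256()
    for key in sorted(inputs):
        digest.update(f"{key}={inputs[key]}\n".encode("utf-8"))
    return digest.hexdigest()


@dataclass(frozen=True)
class FrontloadedArtifact:
    text: str                    # the pre-computed result to insert
    depends_on: tuple[str, ...]  # lineage: which inputs it was built from
    input_fingerprint: str       # state of those inputs at build time
    generated_at: datetime       # timestamp
    max_age: timedelta           # assumed regeneration rule

    def is_valid(self, current_inputs: dict[str, str], now: datetime) -> bool:
        """Valid only while the artifact is recent and its inputs are unchanged."""
        fresh = now - self.generated_at <= self.max_age
        unchanged = fingerprint(current_inputs) == self.input_fingerprint
        return fresh and unchanged


inputs = {"kb/notes/a.md": "v1"}
generated = datetime(2024, 1, 1, tzinfo=timezone.utc)
artifact = FrontloadedArtifact(
    text="kb/notes/ contains 1 note: a.md",
    depends_on=tuple(sorted(inputs)),
    input_fingerprint=fingerprint(inputs),
    generated_at=generated,
    max_age=timedelta(hours=24),
)

still_valid = artifact.is_valid(inputs, generated + timedelta(hours=1))
expired = not artifact.is_valid(inputs, generated + timedelta(hours=25))
changed = not artifact.is_valid({"kb/notes/a.md": "v2"}, generated + timedelta(hours=1))
```

When `is_valid` returns false, the pre-step is rerun before the consuming call instead of inserting stale material.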

Possibility is not enough. Frontload when the pre-step removes repeated discovery, runtime indirection, or task-specific ambiguity from a later LLM call. Stop when the pre-step merely restates a stable skill contract, meaning an interface the callee already has loaded.

Frontloading vs codification

Frontloading can also be constraining when it narrows the interpretations available to a later consumer. It becomes codification when the result is consumed by a symbolic artifact with formal semantics or assigned consequences, such as a schema, route table, validator input, or executable function. Deterministic prose generation, by itself, is frontloading without being codification.
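The distinction can be illustrated by giving the same pre-computed mapping two consumers. In this sketch, `ROUTES`, `as_prose`, and `resolve` are hypothetical names: the prose rendering is read by an LLM as text, while the lookup function has assigned consequences, since an unknown key fails instead of being interpreted.

```python
# Hypothetical pre-computed mapping from content kind to directory.
ROUTES = {"notes": "kb/notes/", "types": "kb/types/"}


def as_prose(routes: dict[str, str]) -> str:
    """Frontloading only: deterministic prose, consumed by an LLM as text."""
    return "; ".join(f"{kind} live in {path}" for kind, path in sorted(routes.items()))


def resolve(routes: dict[str, str], kind: str) -> str:
    """Codification: a symbolic consumer where an unknown kind is an error."""
    if kind not in routes:
        raise KeyError(f"no route for kind {kind!r}")
    return routes[kind]


prose = as_prose(ROUTES)
notes_path = resolve(ROUTES, "notes")
```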

Mechanism

The most common realization is inlining: the pre-computed result is substituted directly into the instruction stream. See "Frontloading is partial evaluation, not divide-and-conquer" for a more theoretical discussion of the mechanism. At the architecture level, the symbolic scheduling model treats frontloading as the single-step case of its separation between symbolic computation and LLM calls: pre-compute what can be known, reserve the LLM call for what requires judgment.
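Inlining can be sketched as a pass over an instruction stream that runs every step ordinary code can compute and leaves the rest as instructions. The `Computable` and `Judgment` types and the `inline` function are hypothetical names for this illustration, and the file names are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Computable:
    """A step whose result ordinary code can produce before the call."""

    label: str
    compute: Callable[[], str]


@dataclass(frozen=True)
class Judgment:
    """A step that stays as an instruction for the LLM."""

    instruction: str


Step = Union[Computable, Judgment]


def inline(steps: list[Step]) -> str:
    """Replace each computable step with its result; keep judgment steps as-is."""
    lines = []
    for step in steps:
        if isinstance(step, Computable):
            lines.append(f"{step.label}: {step.compute()}")
        else:
            lines.append(step.instruction)
    return "\n".join(lines)


steps: list[Step] = [
    # Placeholder listing; a real pre-step would read the directory.
    Computable("Files in kb/notes/", lambda: ", ".join(sorted(["b.md", "a.md"]))),
    Judgment("Decide which of these files best explains frontloading."),
]
inlined = inline(steps)
print(inlined)
```

Only the judgment step reaches the model as work to do; the listing arrives as a finished value.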


Relevant Notes: