Frontloading spares execution context

Type: note · Status: seedling

When instructing LLMs, any part of the instructions whose inputs are known before the LLM runs should be computed beforehand and the result inserted directly. This spares execution context — the primary bottleneck in LLM-based systems.

The context saving

Context is the central scarce resource in agent systems. Frontloading addresses the complexity dimension of that scarcity: executing a procedure inside a bounded call costs more context than inserting its result. The procedure text itself may be small — "search for X in kb/notes/" is a single line — but execution generates artifacts that persist in the context window: tool calls, search results, reasoning traces, interpretation. All of that competes with the work that actually needs the LLM's judgment. A pre-computed listing costs only the bytes of the listing itself; the same listing produced at runtime costs the instruction, the tool call, the full output, and the LLM's interpretation of it — on every invocation.

The saving extends beyond procedure execution to discovery avoidance. When values like paths, endpoints, or configuration are pre-resolved, the agent never spends tokens determining them at runtime — no searching, no trial-and-error, no asking the user. The resolution can happen entirely outside the agent: at installation time, build time, or session start. This is the most basic form of frontloading — replacing what the agent would have to figure out with what is already known.
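A minimal sketch of the asymmetry in Python, using a temporary stand-in for kb/notes/ (the file names here are illustrative):

```python
from pathlib import Path
import tempfile

def frontload_listing(directory: str) -> str:
    """Pre-compute a file listing so the agent never derives it at runtime."""
    files = sorted(p.name for p in Path(directory).iterdir() if p.is_file())
    return "\n".join(f"- {f}" for f in files)

# Stand-in for kb/notes/ so the sketch is self-contained.
notes_dir = Path(tempfile.mkdtemp())
for name in ("frontloading.md", "codification.md"):
    (notes_dir / name).touch()

# Runtime version: a short instruction, but executing it costs a tool call,
# the full tool output, and the model's interpretation — on every invocation.
runtime_instruction = "Search the notes directory and list the files you find."

# Frontloaded version: the listing is inserted directly; the only context
# cost is the bytes of the listing itself.
frontloaded_instruction = (
    "The notes directory contains:\n" + frontload_listing(str(notes_dir))
)
```

The pre-computation runs outside the agent entirely — at install time, build time, or session start — so none of it touches the bounded call.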

What qualifies for frontloading

The test: can this be computed without the LLM's runtime state (the conversation, the user's query, the evolving task)?

Static (frontloadable):

- Variable resolution — paths, project names, configuration values known at setup time (the indirection elimination case)
- File listings — "here are the files in kb/notes/" rather than "list the files in kb/notes/"
- Aggregations — counts, summaries of known datasets, pre-computed indexes
- Template expansion — build-time generation of skills and instructions

Anything that depends on the user's current request, the conversation state, or the evolving task is dynamic and not frontloadable. The boundary isn't always sharp — some sub-procedures depend partially on known and partially on runtime information. In those cases, frontload the known parts and leave the runtime-dependent parts as instructions.
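The boundary test can be sketched as a binding-time split over an instruction spec — frontload the fields known before the LLM runs, defer the rest (the field names and values below are hypothetical):

```python
# Each field is marked static (known before the LLM runs) or
# dynamic (depends on the conversation or the user's request).
spec = {
    "project_root": ("static", "/srv/app"),    # resolved at setup time
    "file_listing": ("static", "a.md\nb.md"),  # computed at build time
    "user_query":   ("dynamic", None),         # only known at runtime
}

def frontload(spec):
    """Bind the static fields now; leave the dynamic fields as instructions."""
    resolved, deferred = {}, []
    for key, (binding_time, value) in spec.items():
        if binding_time == "static":
            resolved[key] = value
        else:
            deferred.append(key)
    return resolved, deferred

resolved, deferred = frontload(spec)
# resolved -> {'project_root': '/srv/app', 'file_listing': 'a.md\nb.md'}
# deferred -> ['user_query']
```

A partially static field would be split the same way: its known sub-parts go into `resolved`, the rest stays deferred.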

Frontloading vs codification

Indirection elimination and build-time generation are common cases of frontloading. In those cases the pre-computed result happens to be deterministic, so frontloading and codification (committing to a single deterministic output) apply simultaneously. But frontloading does not require determinism — the context saving comes from replacing derivation with insertion, whether the result is deterministic or still underspecified.

The mechanism: partial evaluation or divide-and-conquer?

Frontloading looks like divide-and-conquer: solve a subproblem, pass the result to the next stage. Any system does this. But in LLM instruction systems, frontloading can also be viewed through the lens of partial evaluation.

The key: LLM context is a homoiconic medium. Instructions and data are both natural language tokens. When you pre-compute a file listing and insert it into an instruction, the result is still a valid instruction — you've specialised a program with respect to known inputs, producing a residual program in the same medium. That's partial evaluation, not just preprocessing. In a non-homoiconic system, the pre-computed result would need a format conversion to re-enter the instruction stream; here it flows in directly because everything is text.

Standard PE specialises a program P with respect to known static inputs s, producing a residual program Ps that needs only the remaining dynamic inputs d:

[[Ps]](d) = [[P]](s, d)
| PE concept | Frontloading equivalent |
| --- | --- |
| Program P | The instruction set (CLAUDE.md, skills, prompts) |
| Static inputs s | Everything known before the LLM runs (paths, file listings, config, search results over stable content) |
| Dynamic inputs d | The user's request, conversation state, evolving task |
| Residual program Ps | The frontloaded instructions — static sub-procedures replaced with their results |
| Binding-time analysis | The author's judgment about what depends on runtime context vs what doesn't |
| Specialisation | The build-time/setup-time step that produces concrete instructions |

Template variable expansion is textbook PE. The generate-at-build-time note describes a specialiser for skill templates.
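A sketch of the specialisation step, using Python's `string.Template` as a stand-in for the instruction medium (the template text and inputs are illustrative):

```python
import string

# The "program": an instruction over both static and dynamic inputs.
TEMPLATE = string.Template(
    "Answer using the notes below.\n"
    "Notes in $notes_dir:\n$listing\n"
    "User request: $request"
)

def specialise(template: string.Template, static_inputs: dict) -> string.Template:
    """Bind the static inputs, producing a residual instruction that needs
    only the dynamic inputs. Because instructions and data are both text,
    the residual is itself a valid template (homoiconicity)."""
    return string.Template(template.safe_substitute(static_inputs))

# Build time: specialise with respect to the known (static) inputs.
residual = specialise(TEMPLATE, {
    "notes_dir": "kb/notes/",
    "listing": "- frontloading.md\n- codification.md",
})

# Runtime: only the dynamic input remains to be supplied.
prompt = residual.substitute(request="How does frontloading save context?")
```

`safe_substitute` leaves `$request` unresolved at build time, which is exactly the residual-program behaviour: the dynamic placeholder survives specialisation and is bound only at runtime.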

Where the PE analogy stretches

Standard PE assumes precise denotational semantics, exact equivalence, and time as the optimisation target. LLM instructions differ on all three points:

  • The "program" has underspecified semantics, so there is no exact [[P]]
  • Replacing a procedure with its result is only approximately equivalent
  • The gain is context and reliability, not runtime speed

Those differences matter for theory, but not for the practical benefit. Frontloading saves context by removing procedures from the bounded LLM call. The homoiconicity of the medium is what makes the PE framing precise rather than merely metaphorical — without it, "pre-compute and insert" would be just divide-and-conquer.

Relationship to the scheduling model

The symbolic scheduling model treats frontloading as the single-step case of its separation between symbolic computation and bounded LLM calls: pre-compute what can be known, reserve the bounded call for what requires judgment.


Relevant Notes: