Orchestration strategies and run-state have opposite persistence economics

Type: kb/types/note.md · Status: seedling · Tags: computational-model, learning-theory, tool-loop

A scheduler over bounded LLM calls has two symbolic parts: the accumulated state K and the select logic that decides what the next call sees and does. When the host language plays both roles, it is tempting to treat them as one substrate with one lifecycle. That instinct is wrong. Measured by cross-task reuse value (how much a later, different task gains from keeping the part around), K and select sit at opposite poles, so a system that treats their persistence symmetrically, keeping both or discarding both, gets one of them wrong.

The axis here is cross-task persistence: whether a part is worth lifting into a durable library so later tasks reuse it. That is distinct from within-run survival — whether K outlives its process or fits in memory — which the companion note treats separately and which can force K to be reified even when it has no cross-task value.

  • Run-state K is the answer to this task — source artifacts plus the relevance labels, summaries, and partial syntheses prior calls produced for it. Some of that is expensive to recompute, but recomputing it does not help the next task, because the next task asks something else. Its cross-task reuse value is near zero, so it should stay ephemeral across tasks: it may be checkpointed within a run for durability or capacity, but it is not promoted into the library.
  • select-strategies — the decomposition, partitioning, and aggregation patterns the scheduler applies — recur across tasks and are expensive to rediscover. Each is a small piece of control logic that took search to find, and the same shape pays off on the next task. These are the high-value promotion target, worth lifting into durable, tested library code.

RLM discards both halves after every query. Discarding K is correct: it is query-specific, so even when rebuilding it is costly, the cost buys nothing for the next, different query. Discarding the select-strategy along with it is the loss: a decomposition the model searched for and got right is gone before the next query arrives, so the same search is paid for again on a task where the same shape would have worked. The fix is not the opposite symmetry of reifying everything, which is the durable-execution pole. It is to split the cross-task lifecycles: let K stay ephemeral and promote the recurring select-fragments.
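The split can be pictured as two objects with different lifetimes. The following is a minimal sketch, not anything taken from RLM or a real library: `RunState`, `StrategyLibrary`, `chunk_pairs`, and `run_task` are hypothetical names, and promotion is unconditional here only to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Callable

# All names below are illustrative assumptions, not an existing API.
Strategy = Callable[[list[str]], list[list[str]]]


@dataclass
class RunState:
    """K: task-specific labels and partial syntheses. Lives for one task."""
    task: str
    notes: dict[str, str] = field(default_factory=dict)


@dataclass
class StrategyLibrary:
    """Durable home for recurring select-fragments. Outlives every run."""
    strategies: dict[str, Strategy] = field(default_factory=dict)


def chunk_pairs(items: list[str]) -> list[list[str]]:
    """Toy partitioning strategy: split the inputs into groups of two."""
    return [items[i:i + 2] for i in range(0, len(items), 2)]


def run_task(task: str, docs: list[str], library: StrategyLibrary) -> RunState:
    state = RunState(task=task)  # a fresh K for every task
    # First run registers the strategy; later runs reuse the stored one.
    # (A real system would gate this write; the gate is omitted here.)
    partition = library.strategies.setdefault("chunk_pairs", chunk_pairs)
    for i, group in enumerate(partition(docs)):
        state.notes[f"group-{i}"] = f"{task}: " + ", ".join(group)
    return state  # may be checkpointed within the run, never promoted


library = StrategyLibrary()
first = run_task("summarize", ["a", "b", "c"], library)
second = run_task("compare", ["x", "y"], library)
```

After both calls the library holds one strategy that served two different tasks, while each `RunState` contains only notes for its own task and can be dropped when the task ends.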

The control structure: a test-gated orchestrator cache

Splitting the lifecycles turns the scheduler into an orchestrator cache with a test-gated write-back, run reuse-first rather than generate-first:

  1. On a task, search the library for a fitting tested orchestrator.
  2. Hit → reuse the verified code: cheap, deterministic, no re-derivation.
  3. Miss → generate the orchestration fresh (RLM-style); if it passes its tests and clears the rest of the promotion gate described below, write it back to the library.

Two distinct gates run here, and conflating them is the trap. Fit (step 1) is a selection judgment: does this stored strategy apply to the task at hand? It does not get easier just because a fragment is tested. Trust (step 2) is what the "tested" qualifier buys: a cache of merely retained code is memoization, whereas a cache of verified code lets a later run rely on a fitting fragment without re-deriving and re-checking it. Tests certify that the fragment still does what it did, not that it suits a given task; selecting a fitting one remains a separate cost (see the retrieval problem below). Promotion is therefore movement up the verifiability gradient, from loose, model-authored REPL code toward deterministic library functions. That is the general shape of codification, and the loop it closes is deploy-time learning through the repo.
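The reuse-first loop and its two gates can be sketched as follows. This is an illustrative outline under stated assumptions: `Entry`, `OrchestratorCache`, and `generate` are hypothetical names, `generate` stands in for a model-authored orchestration, and only the test half of the write-back gate is applied.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Orchestrator = Callable[[list[int]], int]


@dataclass
class Entry:
    name: str
    fits: Callable[[str], bool]            # fit gate: does it apply to this task?
    run: Orchestrator                      # the orchestration code itself
    tests: Callable[[Orchestrator], bool]  # trust gate: does it still behave?


class OrchestratorCache:
    def __init__(self) -> None:
        self.entries: list[Entry] = []
        self.generated = 0  # how many times the expensive path ran

    def lookup(self, task: str) -> Optional[Entry]:
        # Step 1: selection is a separate judgment from test status.
        for entry in self.entries:
            if entry.fits(task):
                return entry
        return None

    def solve(self, task: str, data: list[int],
              generate: Callable[[str], Entry]) -> int:
        hit = self.lookup(task)
        if hit is not None:
            return hit.run(data)           # Step 2: reuse verified code
        self.generated += 1                # Step 3: miss, generate fresh
        candidate = generate(task)
        if candidate.tests(candidate.run):
            self.entries.append(candidate)  # test-gated write-back
        return candidate.run(data)


def generate(task: str) -> Entry:
    # Hypothetical stand-in for model-authored orchestration code.
    return Entry(
        name="sum-by-halves",
        fits=lambda t: t.startswith("total"),
        run=lambda xs: sum(xs[: len(xs) // 2]) + sum(xs[len(xs) // 2:]),
        tests=lambda f: f([1, 2, 3]) == 6 and f([]) == 0,
    )


cache = OrchestratorCache()
a = cache.solve("total sales", [1, 2, 3, 4], generate)  # miss, then write-back
b = cache.solve("total refunds", [10, 20], generate)    # hit, no generation
```

Keeping `fits` and `tests` as separate fields mirrors the distinction in the text: a passing test suite says nothing about whether the entry is the right one for a new task.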

What this costs, and why promotion must be selective

The asymmetry does not make accumulation free; it tells you which half is worth paying for. Promoting select-fragments takes back, for those fragments, the governance burden that discarding everything avoids: provenance, approval, staleness, retirement, dependency drift, and a retrieval problem. Once the library is large, finding the right orchestrator becomes its own selection cost, and similarly named entries add noise to the search. Keeping K ephemeral keeps the corresponding state-management burden off the table, so the bill is paid only on the reusable half, not on everything.

So the promotion gate cannot be "anything that worked." Passing tests is necessary (it is the trust gate above) but not sufficient: the gate must also admit only patterns that are stable, frequently recurring, and expensive to rederive, because codifying what the model will soon do better unaided is a net-negative trade. The persistence asymmetry justifies having a promotion path; the bitter-lesson boundary governs what crosses it. Run-state never crosses, not because it is cheap but because it never recurs; only the costly, recurring control strategies do.
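One way to make the gate concrete is a predicate over a few recorded signals. The sketch below is an assumption-laden illustration: the field names and every threshold in `GatePolicy` are hypothetical placeholders, and the judgment about what the model will soon do unaided is not captured by any of these counters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    tests_pass: bool         # trust gate: necessary, not sufficient
    times_rediscovered: int  # how often runs arrived at this same shape
    rederive_cost: float     # e.g. model calls spent finding it
    unchanged_runs: int      # consecutive uses without edits (stability)


@dataclass(frozen=True)
class GatePolicy:
    # Placeholder thresholds; real values would have to be tuned.
    min_recurrence: int = 3
    min_rederive_cost: float = 5.0
    min_unchanged_runs: int = 2


def should_promote(c: Candidate, policy: GatePolicy = GatePolicy()) -> bool:
    """Admit only tested, stable, recurring, costly-to-rederive patterns."""
    if not c.tests_pass:
        return False
    return (
        c.times_rediscovered >= policy.min_recurrence
        and c.rederive_cost >= policy.min_rederive_cost
        and c.unchanged_runs >= policy.min_unchanged_runs
    )


recurring = Candidate("map-reduce-summaries", True, 4, 12.0, 3)
one_off = Candidate("one-off-regex", True, 1, 0.5, 0)
untested = Candidate("flaky-merge", False, 9, 20.0, 5)

decisions = [should_promote(c) for c in (recurring, one_off, untested)]
```

Here `decisions` is `[True, False, False]`: the one-off pattern passes its tests but fails on recurrence and cost, and the frequently seen pattern with failing tests is rejected before the other criteria are consulted.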

Where it lands

This is a third mode on the authorship axis. Two poles already exist: the model re-authoring select from scratch every run (RLM), and a programmer authoring it once up front (a hand-written host-language scheduler). The third is distinct from both because authorship is split across time — the model authors a fragment during a run, and a promotion step turns the recurring, tested ones into library code that later runs reuse. It is neither purely per-run nor purely up-front: the corpus of select-functions grows from execution. So it is the host-language scheduler made self-populating — built bottom-up from exploration rather than top-down by design — and the concrete form of the combined system the persistence-boundary comparison sketches.


Relevant Notes: