# RLM has the model write ephemeral orchestrators over sub-agents

Type: note · Status: seedling · Tags: computational-model

Recursive Language Models (RLMs) have the LLM write and execute code in a REPL, with a `recursive_llm(query, context)` primitive that spawns fresh LLM calls. The pattern maps directly onto the symbolic scheduler model:

| Model component | RLM implementation |
| --- | --- |
| Symbolic state K | Python REPL namespace (variables) |
| Bounded LLM call | `recursive_llm(query, context)` |
| Inner scheduler | The code the LLM writes |
| select + prompt constructor | The LLM's decomposition logic expressed as code |
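The mapping can be made concrete with a minimal sketch. The LLM backend is stubbed out with a trivial function; in a real RLM each call would be a fresh, bounded model invocation. All names here (`make_recursive_llm`, `fake_llm`) are illustrative, not part of any published RLM API.

```python
from typing import Callable

def make_recursive_llm(llm: Callable[[str], str]):
    """Bounded LLM call: each invocation sees only the prompt it is given."""
    def recursive_llm(query: str, context: str) -> str:
        # The prompt is constructed symbolically; the call itself is bounded.
        return llm(f"{query}\n\n{context}")
    return recursive_llm

# Stub standing in for a real model call.
fake_llm = lambda prompt: prompt.upper()

# Symbolic state K: a plain Python namespace the generated code reads and writes.
namespace = {"recursive_llm": make_recursive_llm(fake_llm)}

# Inner scheduler: code the LLM writes, executed in that namespace.
exec('result = recursive_llm("summarize", "some long document")', namespace)
print(namespace["result"])
```

The point of the sketch is that every row of the table is an ordinary Python object: the state is a dict, the scheduler is a string of code, and the bounded call is a function.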

## What RLM gets right

RLM has two layers of symbolic orchestration. The outer layer is a traditional tool loop: it calls the model, the model requests code execution, the loop runs it in the REPL and feeds the result back. Inside the REPL, the model writes its own orchestrators — symbolic compositions of agents via `recursive_llm()`.
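The outer layer can be sketched as a conventional tool loop whose single tool is a persistent REPL. Everything here (`CodeRequest`, `tool_loop`, the toy model) is a hypothetical reconstruction under the description above, not RLM's actual interface.

```python
from dataclasses import dataclass

@dataclass
class CodeRequest:
    """The model's request that the loop execute code in the REPL."""
    code: str

def tool_loop(run_model, max_steps=10):
    repl_namespace = {}   # persists across steps: the symbolic state
    transcript = []       # execution results fed back to the model
    for _ in range(max_steps):
        action = run_model(transcript)
        if not isinstance(action, CodeRequest):
            return action                    # model produced a final answer
        exec(action.code, repl_namespace)    # run the requested code in the REPL
        transcript.append(repl_namespace.get("result"))  # feed the result back
    return None

# Toy model: first emits code, then answers with what the REPL produced.
def toy_model(transcript):
    if not transcript:
        return CodeRequest("result = 2 + 2")
    return f"answer: {transcript[-1]}"

print(tool_loop(toy_model))  # → answer: 4
```

Note that `repl_namespace` outlives each model call: the two layers share no context window, only this symbolic state.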

The key move is that the model writes the orchestrator rather than being it. A standard tool loop consults the model at each step: "what should we do next?" RLM has the model emit the plan as code — `results = [recursive_llm("summarize", chunk) for chunk in chunks]` — so dispatch decisions are authored by the model but executed on a symbolic substrate. Bookkeeping for the inner orchestration lives in Python variables and the REPL stack, not in the conversation. This avoids the degraded scheduler failure mode where bounded context is wasted on bookkeeping.
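A fuller version of the kind of orchestrator the model might emit — a fan-out over chunks followed by a reduce — looks like this. `recursive_llm` is stubbed so the sketch runs; in RLM each call would spawn a fresh LLM.

```python
# Stub standing in for the real recursive_llm primitive.
recursive_llm = lambda query, context: f"[{query}: {context[:10]}]"

document = "A" * 30
chunks = [document[i:i + 10] for i in range(0, len(document), 10)]

# Dispatch authored by the model, executed on a symbolic substrate:
results = [recursive_llm("summarize", chunk) for chunk in chunks]

# Bookkeeping (chunks, results, loop state) lives in Python variables,
# never in any model's context window.
final = recursive_llm("combine", " ".join(results))
print(final)
```

No single LLM call ever sees the whole document or the full list of partial results: each bounded call gets only the slice the orchestrator hands it.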

The model-authored orchestrators are structurally the same thing a programmer would write by hand — exactly the symbolic orchestration over sub-agents that the tool loop argument calls for. The elegance of RLM is how it packs this into the tool-loop model: the REPL tool gives the model a substrate for writing orchestrators without anyone having to write reentrant framework code.

## Ephemerality

The orchestrators are ephemeral. A brilliant decomposition strategy discovered for one query is gone before the next query arrives. In the framework this KB develops elsewhere — deploy-time learning, codification, spec mining — learning happens through the repo: generated artifacts enter version control, get tested, and become reusable infrastructure. RLM opts out of this entire mechanism by discarding its artifacts.

This is a genuine trade-off, not a deficiency. The repo-as-learning-substrate approach carries real costs (approval complexity, maintenance burden, the risk of codifying vision features). RLM avoids much of that burden. If the REPL is restricted to pure computation with no side effects, the approval problem becomes much simpler because the generated code is not directly changing the world; the remaining concerns are mostly about cost, resource use, and output quality rather than side effects. And it's possible that accumulation will come through other paths: improved model capabilities that make re-derivation cheap, decomposition strategies learned in weights rather than repo artifacts, or mining the ephemeral code from execution logs — an out-of-band process could gather the generated orchestrators together with their prompts and results, and distill recurring patterns into reusable knowledge.
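The log-mining path sketched above could be as simple as a wrapper that records every inner call for an out-of-band process to distill later. The log format and the frequency-count "mining" here are entirely hypothetical — real distillation would be far richer — but they show the shape of the idea.

```python
import collections

LOG = []

def logged(recursive_llm):
    """Wrap the primitive so each generated orchestration leaves a trace."""
    def wrapper(query, context):
        result = recursive_llm(query, context)
        LOG.append({"query": query, "context": context, "result": result})
        return result
    return wrapper

# Stubbed primitive, as in the sketches above.
rllm = logged(lambda q, c: f"{q}!")

# An ephemeral orchestrator runs and is discarded...
for chunk in ["a", "b", "c"]:
    rllm("summarize", chunk)
rllm("combine", "abc")

# ...but the out-of-band miner can still surface recurring patterns.
counts = collections.Counter(entry["query"] for entry in LOG)
print(counts.most_common(1))  # → [('summarize', 3)]
```

The orchestrator itself is still gone after the query; what accumulates is the trace, which is enough raw material for offline distillation into reusable strategies.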


Relevant Notes: