Ingest: The Y-Combinator for LLMs

Type: kb/sources/types/ingest-report.md

Source: the-y-combinator-for-llms-solving-long-context-rot.md Captured: 2026-06-11 From: https://arxiv.org/pdf/2603.20105

Classification

Type: scientific-paper -- arXiv preprint with formal definitions, theorems, experimental protocol, ablations, and references. Domains: computational-model, context-engineering, tool-loop, orchestration Author: Amartya Roy, Rasul Tutunov, Xiaotong Ji, Matthieu Zimmer, and Haitham Bou-Ammar. The paper is from IIT Delhi, Robert Bosch GmbH, Huawei Noah's Ark Lab, and UCL Centre for Artificial Intelligence; credibility comes from a formal paper structure and reported experiments, but the source is still a preprint and the implementation was not inspected here.

Summary

The paper introduces λ-RLM, a restricted Recursive Language Model runtime for long-context reasoning. Standard RLM stores the prompt in a REPL environment and lets the LLM write arbitrary Python to inspect, split, recurse, and compose results; λ-RLM keeps the prompt-as-environment idea but replaces free-form model-written control code with a typed library of deterministic combinators such as SPLIT, MAP, FILTER, REDUCE, CONCAT, CROSS, and PEEK. A planner chooses the task type, branching factor, leaf threshold, and composition operator, then executes a fixed recursive program while neural calls happen only at bounded leaves. The paper proves termination, cost, and accuracy-scaling bounds under simplifying assumptions, then reports that λ-RLM beats normal RLM in 29 of 36 model-task accuracy comparisons across four long-context tasks and nine open-weight models, with 3-6x latency reductions among recursive methods. For this KB, the paper is most valuable as a formal, empirical candidate for the existing claim that reliable long-context agent work comes from symbolic control over bounded LLM calls, not from larger context windows alone.

Connections Found

The companion connect report found a tight computational-model cluster rather than a reference-doc or related-system target. The strongest connection is to RLM has the model write ephemeral orchestrators over sub-agents: λ-RLM accepts that note's RLM mechanism, then attacks its weakest point by making the inner orchestration language typed and pre-verified instead of arbitrary model-written Python. It also grounds Scheduler-LLM separation exploits an error-correction asymmetry and Bounded-context orchestration model, because the paper explicitly moves recursion, splitting, filtering, aggregation, depth, and cost accounting to deterministic symbolic control while leaving semantic leaf judgments to the model. The connect report also flags Decomposition heuristics for bounded-context scheduling, Agent orchestration occupies a multi-dimensional design space, and RLM, Tendril, and llm-do place symbolic work at different persistence boundaries as likely consumers.

Extractable Value

  1. Restricted orchestration language as an axis independent of persistence -- The existing RLM cluster distinguishes ephemeral REPL code from durable generated tools/workflows. λ-RLM adds another independent variable: whether the orchestration language is arbitrary model-authored code or a small typed combinator vocabulary. This sharpens agent orchestration occupies a multi-dimensional design space: verifiability can improve without changing cross-session persistence. [quick-win]

  2. Formal support for scheduler-LLM separation -- The paper's central design is exactly the KB's clean split: symbolic runtime for bookkeeping/control, neural calls for bounded semantic leaf work. Its termination theorem and cost recurrence are not final truth for all agent systems, but they make the separation concrete enough to cite in Scheduler-LLM separation exploits an error-correction asymmetry and Bounded-context orchestration model. [quick-win]

  3. Empirical evidence that typed control helps most when model coding skill is weak -- λ-RLM wins all weak-tier comparisons and 11/12 medium-tier comparisons, but only 6/12 strong-tier comparisons; CodeQA is where strong free-form RLM wins most often. This supports a nuanced version of the KB's constraining story: symbolic constraints substitute for model control skill up to a point, while highly capable code models can sometimes recover expressivity that a fixed combinator library lacks. [just-a-reference]

  4. Partitioning can be part of the scheduler's formal cost model -- The paper treats branching factor and leaf threshold as planned parameters, not ad-hoc chunking. Its claim that random k hurts accuracy by 16.8 points on the Qwen3-8B/OOLONG ablation is useful evidence for Decomposition heuristics for bounded-context scheduling: split shape is not only a context-fit detail, it changes both accuracy and latency. [experiment]

  5. Fixed combinators expose the expressivity/reliability trade-off -- The seven normal-RLM wins mostly occur on strong models or CodeQA, where free-form repo navigation, adaptive chunking, and backtracking matter. That is a useful limitation for any future Commonplace note recommending typed runtimes: the gain comes from restricting control flow, and the cost is loss of task-specific control programs unless the combinator library grows. [quick-win]

  6. Potential synthesis target for the RLM/orchestrator cluster -- Standard RLM, λ-RLM, Claude Code dynamic workflows, Tendril, and llm-do now give enough examples to write a sharper synthesis about orchestration authorship, representational form, persistence, and verification. The likely claim is that "model writes the scheduler" is too coarse: what matters is the scheduler language, who selects it, what is verified, and what persists. [deep-dive]

Limitations (our opinion)

This is a preprint, and the implementation was not inspected during ingest. The paper reports that the complete implementation is open-sourced, but this report only uses the PDF snapshot, so claims about the runtime should not be treated as code-grounded.

The formal guarantees rest on simplifying assumptions: deterministic total combinators, bounded model halting on leaf calls, monotone costs, and accuracy/composition assumptions. These are useful for clarifying the design, but they do not automatically transfer to soft-oracle Commonplace tasks such as synthesis, note review, or connection judgment, where composition correctness is harder to define.

The empirical benchmark suite is broad enough to be useful but still design-aligned with λ-RLM's strengths: search, aggregation, pairwise reasoning, and code QA all admit explicit splitting and recomposition. The paper's own failure cases show the simpler account: fixed combinators win when the right decomposition is in the library, while normal RLM can win when creative task-specific code or code-aware navigation matters.

The k* = 2 optimal partition theorem appears under one stated simple cost model, while Algorithm 3 later shows a planner formula that can yield other branching factors. Treat this as a formal candidate and ablation-supported design signal, not as a universal chunking rule.

The paper frames λ-calculus as the foundation, but the operational contribution may be the restricted typed combinator runtime rather than λ-calculus specifically. A less ornate host-language runtime with the same fixed combinators, planner, and bounded leaf calls might deliver most of the benefit. That matters for this KB because our design guidance should promote the mechanism, not the mathematical branding.

~~Update RLM, Tendril, and llm-do place symbolic work at different persistence boundaries to add λ-RLM as an RLM variant: ephemeral per-run symbolic orchestration, selected by a planner, constrained to a pre-verified typed combinator vocabulary. Link this ingest as evidence, and explicitly separate "orchestration language restriction" from "persistence boundary" so the note does not keep collapsing verification and accumulation into one axis.~~

Completed 2026-06-11: the comparison note now treats λ-RLM as a restricted RLM variant and separates control-language restriction from persistence boundary.