Ingest: Self-Revising Discovery Systems for Science

Type: kb/sources/types/ingest-report.md

Source: self-revising-discovery-systems-agentic-ai.md
Captured: 2026-06-06
From: https://arxiv.org/html/2606.01444v1

Classification

Type: scientific-paper -- arXiv preprint (Wang & Buehler, MIT) with a formal framework plus a quantitative case study and an implemented system; methodology and a worked empirical result, not an essay.
Domains: learning-theory, discovery, verification, typed-artifacts
Author: Markus J. Buehler is an established MIT materials-science / mechanics researcher who publishes regularly on AI-for-science and graph-based scientific reasoning; the category-theory framing is the novel and least-validated part.

Summary

The paper develops a category-theoretic account of agentic scientific discovery in which a system's state is a copresheaf mapping artifact types to their current populations, and provenance is captured as the category of elements over that copresheaf. Its central move is a typology that separates three operations usually conflated: retrieval (fetch existing artifacts), search (explore within a fixed schema of admissible types and operations), and discovery (a verified regime transition that changes the admissible types themselves, transporting prior evidence into the new schema via a Kan extension). Commitment to a transition is decided by explicit gates — Minimum Description Length, AIC, peer review. Two systems instantiate the framework: Builder/Breaker, which discovers a new protein-mechanics interaction type ("mode-conditioned compliance") under MDL gates, and CategoryScienceClaw, a self-revising typed knowledge-computation graph where schemas, skills, gates, and discourse are themselves typed objects and morphisms updated across runs. Worth reading for anyone formalizing how an automated system can extend its own representational vocabulary rather than just answer within a fixed one.
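
The three-way typology above can be made concrete with a minimal sketch. It assumes a toy stand-in for the paper's copresheaf: a plain mapping from artifact type names to sets of artifacts. The names `State`, `retrieve`, `search`, and `discover`, and the example type names, are hypothetical illustrations, not the paper's API, and the sketch omits provenance and evidence transport entirely.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Toy stand-in for the copresheaf: artifact type -> current population."""
    populations: dict[str, set[str]] = field(default_factory=dict)

    @property
    def schema(self) -> frozenset[str]:
        # The admissible types are simply the keys of the mapping.
        return frozenset(self.populations)


def retrieve(state: State, type_name: str) -> set[str]:
    """Retrieval: fetch existing artifacts; nothing in the state changes."""
    return set(state.populations.get(type_name, set()))


def search(state: State, type_name: str, candidate: str) -> State:
    """Search: add an artifact of an already-admissible type; schema is fixed."""
    if type_name not in state.schema:
        raise KeyError(f"{type_name!r} is not admissible under the current schema")
    new_pops = {t: set(p) for t, p in state.populations.items()}
    new_pops[type_name].add(candidate)
    return State(new_pops)


def discover(state: State, new_type: str, gate_passed: bool) -> State:
    """Discovery: admit a new type, but only if a gate approved the transition."""
    if not gate_passed:
        return state  # rejected transitions leave the state untouched
    new_pops = {t: set(p) for t, p in state.populations.items()}
    new_pops.setdefault(new_type, set())
    return State(new_pops)


# Hypothetical example types and artifacts.
s0 = State({"measurement": {"m1"}, "hypothesis": set()})
s1 = search(s0, "hypothesis", "h1")                       # population grows
s2 = discover(s1, "interaction_type", gate_passed=True)   # schema grows
```

In this sketch only `discover` can change `schema`; `search` changes populations under a fixed schema and `retrieve` changes nothing, which is the structural difference the typology draws.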

Connections Found

The companion connect report found no agent-memory-system match (this is a theory/case-study paper, not a memory implementation) and clustered all candidates in the learning-theory area. The strongest tie is to "known-target discovery benchmarks show reachability, not discovery closure": that note names a "missing practical loop", a forward-looking verifier for unknown candidates, and the paper's gate-verified regime transitions (MDL/AIC/peer review) are a concrete instance of exactly that loop. Other evidence-grade ties:
"discovery is seeing the particular as an instance of the general" -- the schema-category revision formalizes "name the general structure, re-recognize the particulars under it".
"ad hoc prompts extend the system without schema changes" -- the paper's search-within-fixed-schema vs. discovery-changes-schema distinction is the formal complement of that note's prompt-extension-vs-schema-change line.
"automated synthesis is missing good oracles" -- the gates are worked-example oracles.
"constraining during deployment is continuous learning" -- CategoryScienceClaw is deploy-time symbolic learning outside weights.
"definitions/lineage" -- provenance-as-category-of-elements is an external formalization of lineage.
A second-pass connect against the KB's Popper/Deutsch quality-theory cluster (which the original trace did not branch into) added two further evidence ties:
"first-principles reasoning selects for explanatory reach over adaptive fit" -- the paper's "distinguish novel structure from reparameterization" under MDL gating is a worked instance of Deutsch's adaptive-fit-vs-explanatory-structure split, a second independent example alongside the SuperARC compression gap that note already cites.
"mechanistic constraints make Popperian KB recommendations actionable" -- the gated regime transition (propose schema change, subject to MDL/AIC/peer-review, commit only if it survives) is conjecture-and-refutation with a graded decorrelated-oracle panel, corroborating that note's "criticism must be structural" and "convergence depends on oracle quality" claims.
What the source adds beyond existing KB content: a precise structural separation of retrieval/search/discovery, and a model of schema change with evidence-preservation guarantees (Kan transport) plus a failure mode (Kan obstruction: isolated new types get no transported content).
The connect report also flags a cross-source compares-with to the GIANTS snapshot, a sibling discovery paper from the opposite oracle direction (retrospective target oracle vs. internally-verified transition).
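
The gated regime transition described above (propose a schema change, gate it, commit only if it survives) can be sketched with a standard AIC comparison. This is an illustration under stated assumptions, not the paper's gate definitions, which the snapshot does not give: it assumes least-squares fits with Gaussian errors, where AIC equals n ln(RSS/n) + 2k up to an additive constant. The `margin` threshold, the all-gates-must-pass rule, the boolean treatment of peer review, and all numbers are hypothetical.

```python
import math


def aic_gaussian(rss: float, n: int, k: int) -> float:
    """AIC for a least-squares fit with Gaussian errors, up to a constant.

    rss: residual sum of squares, n: number of observations,
    k: number of fitted parameters.
    """
    return n * math.log(rss / n) + 2 * k


def gate_commit(rss_old: float, k_old: int, rss_new: float, k_new: int,
                n: int, margin: float = 2.0) -> bool:
    """Pass only if the new model lowers AIC by more than `margin` (assumed)."""
    improvement = aic_gaussian(rss_old, n, k_old) - aic_gaussian(rss_new, n, k_new)
    return improvement > margin


def commit_transition(gate_results: dict[str, bool]) -> bool:
    """Assumed policy: commit only if at least one gate ran and all passed."""
    return bool(gate_results) and all(gate_results.values())


# Hypothetical numbers: the new schema adds one parameter and fits much better.
novel = gate_commit(rss_old=50.0, k_old=2, rss_new=30.0, k_new=3, n=100)

# Hypothetical reparameterization: three extra parameters, almost no gain in fit.
reparam = gate_commit(rss_old=50.0, k_old=2, rss_new=49.9, k_new=5, n=100)

# Peer review is reduced to a boolean here purely for illustration.
decision = commit_transition({"aic": novel, "peer_review": True})
```

The second call shows the "novel structure vs. reparameterization" distinction in miniature: extra parameters that barely reduce the residual are penalized, so the gate rejects the transition.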

Extractable Value

1. The retrieval / search / discovery typology as a structural distinction -- the KB currently has no note separating these three as operations that differ in kind (fetch vs. explore-within-schema vs. change-the-schema). This sharpens the discovery-vs-reachability boundary the KB already cares about and gives known-target-discovery-benchmarks a vocabulary for what its "missing loop" actually changes. High reach; transfers well beyond materials science. [deep-dive]
2. Discovery as a verified regime transition with evidence-preserving transport -- the operational unit isn't "generate a candidate" but "commit to a schema change, gated, while carrying old evidence forward." This operationalizes automated synthesis is missing good oracles and supports constraining during deployment is continuous learning with a concrete protocol. High reach. [experiment]
3. Kan obstruction as a named failure mode -- when a newly admitted artifact type is isolated (no morphisms connecting it to existing types), no prior evidence transports to it. This is a crisp, reusable warning for any KB that adds new types: a type with no links to the existing vocabulary inherits nothing. Directly relevant to KB type-system design. [just-a-reference]
4. MDL/AIC/peer-review as a graded gate set -- a worked example of oracles of differing strength deciding commitment, corroborating the oracle-strength-spectrum framing with a real discovery system rather than a thought experiment. [just-a-reference]
5. Provenance-as-category-of-elements -- an external, formal model of source-dependency tracking that the KB names as lineage; useful as evidence that lineage is a recurring formal need, not a Commonplace idiosyncrasy. [just-a-reference]
6. Two-strategy synthesis with GIANTS -- pairing this paper (internally-verified regime transition) with GIANTS (manufactured retrospective target oracle) yields a higher-order claim: there are two complementary ways to make discovery testable. The KB captures only the retrospective-oracle side today. [experiment]
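
The Kan obstruction warning in the list above can be illustrated with a deliberately simplified sketch. It is not a real Kan extension: it only pushes existing artifacts forward along explicitly declared type-to-type maps, which is enough to show that a new type with no incoming map inherits nothing. The function names, type names, and artifacts are hypothetical.

```python
from typing import Callable

# A schema morphism in this toy model: (source type, target type, map on artifacts).
Morphism = tuple[str, str, Callable[[str], str]]


def transport(populations: dict[str, set[str]],
              morphisms: list[Morphism],
              new_types: list[str]) -> dict[str, set[str]]:
    """Push existing artifacts forward into each new type along its morphisms."""
    transported: dict[str, set[str]] = {t: set() for t in new_types}
    for src, dst, fn in morphisms:
        if dst in transported:
            transported[dst] |= {fn(x) for x in populations.get(src, set())}
    return transported


def isolated_types(morphisms: list[Morphism], new_types: list[str]) -> list[str]:
    """New types that no morphism touches: the obstruction case."""
    connected = {src for src, _, _ in morphisms} | {dst for _, dst, _ in morphisms}
    return sorted(t for t in new_types if t not in connected)


# Hypothetical prior evidence and one declared map into a new type.
populations = {"force_curve": {"fc1", "fc2"}}
morphisms = [("force_curve", "compliance_mode", lambda x: f"mode({x})")]
new_types = ["compliance_mode", "orphan_type"]

carried = transport(populations, morphisms, new_types)
orphans = isolated_types(morphisms, new_types)
```

Here `carried["compliance_mode"]` holds the two transported artifacts while `carried["orphan_type"]` stays empty, and `orphans` lists `orphan_type`. A check like `isolated_types` could run before a new type is admitted, flagging types that would start with no inherited evidence.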

Limitations (our opinion)

This is editorial opinion. The central risk is the one the type spec warns about for conceptual framings: the category-theoretic apparatus (copresheaves, Kan extensions, category of elements) may be naming structure rather than explaining it. The paper does not show that the empirical result needed the formalism: Builder/Breaker's "mode-conditioned compliance" finding rests on MDL gating, which is a standard model-selection criterion that needs no category theory to apply. So the load-bearing contribution (gated discovery with evidence preservation) is separable from, and more defensible than, the categorical dressing; treat the two independently when citing. The evidence base is thin: essentially one case study (n=1 discovery in protein mechanics) plus one implemented system, with no baseline comparing the framework against a plain typed knowledge graph without the categorical machinery, and no demonstration that the framework generalizes beyond mechanics. The "discovery" claim is also graded internally by the system's own gates; whether a gate-passing regime transition constitutes genuine discovery or sophisticated search within a meta-schema is exactly the question that "known-target discovery benchmarks show reachability, not discovery closure" raises, and the paper asserts an answer rather than settling it. Finally, the snapshot is a condensed WebFetch summary, not the full paper body, so extracted claims should be re-checked against the PDF before any load-bearing promotion. (Minor: the snapshot frontmatter says "materials-science case study" while the body says "protein mechanics"; both are Buehler's domain and refer to the same Builder/Breaker study.)

Recommended Next Action

Write a new kb/notes/ note in the learning-theory cluster that separates retrieval, search, and discovery as structurally distinct operations (item 1), framed around the schema-change-with-evidence-transport mechanism, and citing this snapshot as evidence plus GIANTS as a compares-with contrast (the two-strategy synthesis from item 6). This is the one durable gap the connect report flagged that no existing note covers, and it gives the dormant reverse-edges from known-target-discovery-benchmarks and automated-synthesis-is-missing-good-oracles a natural anchor. Defer the category-theory apparatus — per the limitations, it is a one-off until more categorical-framework sources arrive.
