What the matrix shows across 148 agent memory systems

Type: kb/types/note.md · Tags: agent-memory

Across 148 code-grounded reviews, each system is classified on the same axes: storage substrate (where memory lives), lineage (how retained state was derived), behavioral authority (what force memory has), and how memory reaches the next action. The classifications live in systems.csv and the comparison table. Read together, they show that what divides the collection is less the storage substrate than how memory is activated and verified. Four findings stand out.

Storage predicts little by itself

Files-family substrates — plain files plus repo — lead at 104 of 148 systems (70%), but that number needs care. The roster was assembled largely from the llm-wiki discussions that followed Karpathy's sketch of the idea, and those over-sample file-based systems, so the majority is a fact about this collection, not the field. The durable point is the spread. Substrate runs from Agent-R, a checkpoint-learning system whose "memory" is fine-tuned weights, to supermemory, a hosted memory API — and across that range, substrate alone says little about whether memory is authored, trace-extracted, pushed, pulled, enforced, or behavior-tested. It is an operational floor, not the architectural fork it is usually treated as.

Capture and push usually travel together

Whether a system learns automatically and whether it injects memory unasked turn out to be tightly coupled: 81 of 97 trace-learning systems (84%) push memory, and 35 of 51 pull-only systems (69%) are not trace-learning. Most of the collection therefore falls into two camps: an automatic camp (81 of 148) that learns from traces and pushes, and a curated pull-only camp (35 of 148) that does not mine traces and waits to be asked.

This finding uses the stricter learning field, trace_learning, not the broader lineage field, lin_trace_extracted. Lineage says only that some retained artifact came from traces; learning says the system automatically distills traces into durable behavior-shaping memory. The gap is meaningful: in the file/repo slice, 74 of 104 systems retain trace-extracted artifacts, but only 62 of 104 have a qualifying trace-learning path. The rest keep traces as evidence, recovery state, continuity, or debugging material rather than as distilled lessons, rules, skills, validators, embeddings, adapters, rankers, or other learned memory.
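The coupling figures can be checked by rebuilding the full capture-by-activation table from the four counts quoted above. The sketch below is only arithmetic on those published counts; it does not read systems.csv, and the constant and function names are illustrative rather than part of the dataset.

```python
# Counts quoted in the text; everything else is derived from them.
TOTAL = 148
TRACE_LEARNING = 97        # systems with a qualifying trace-learning path
TRACE_AND_PUSH = 81        # of those, systems that also push memory
PULL_ONLY = 51             # systems that never push
PULL_ONLY_NOT_TRACE = 35   # of those, systems that are not trace-learning


def two_camp_table():
    """Return the four cells of the capture x activation table."""
    trace_pull = TRACE_LEARNING - TRACE_AND_PUSH            # learns, waits to be asked
    no_trace_push = (TOTAL - PULL_ONLY) - TRACE_AND_PUSH    # pushes without learning
    return {
        ("trace", "push"): TRACE_AND_PUSH,
        ("trace", "pull"): trace_pull,
        ("no_trace", "push"): no_trace_push,
        ("no_trace", "pull"): PULL_ONLY_NOT_TRACE,
    }


def pct(part, whole):
    """Whole-number percentage, as reported in the text."""
    return round(100 * part / whole)


table = two_camp_table()
camps = table[("trace", "push")] + table[("no_trace", "pull")]
```

Under these counts the four cells sum to 148, the two off-diagonal cells hold 16 systems each, and the two camps together cover 116 of 148 systems. The remaining 32 are the mixed cases, which is where the pull-only graph systems discussed next sit.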

The instructive exception is not graph storage in general but one visible pull-only graph pattern. Graphiti, Cortex, and dense-mem — three graph-memory systems — all capture automatically yet stay strictly pull-only. A graph is expensive to build and cheap to query, so they can afford to wait for an explicit lookup rather than guess what to push. Other graph-backed systems do push, which narrows the lesson: automatic capture does not force automatic activation when the retained structure has a strong query interface.

Automatic activation is shipped on faith

Pushing is the survey's most common capability and its least verified. Of the 97 systems that push memory, 75 (77%) use a coarse always-load, session-start, or generic recall path rather than selecting for the current instance, and an LLM relevance judgment appears in only 17 of 97 (18%). Scarcer still: just 5 of 97 pushing systems test whether injected memory actually changed behavior — the evaluation-first systems Reflexion, Synapptic, KBLaM, auto-harness, and Meta-Harness. Everywhere else, storage is simply assumed to imply activation.

Full lifecycle curation is rare

Capture is common; lifecycle maintenance is uneven. Of the 139 systems that write automatically, 19 run no curation operation at all: they only acquire and never touch what is already stored. Only 7 run all seven tracked operations: the full-lifecycle systems ATLAS, Clude, GBrain, LACP, Origin, Stash, and WUPHF. Most systems sit between those endpoints, so the pattern is not a clean barbell but partial maintenance. The promotion, evolution, consolidation, deduplication, and synthesize flags (the last marking attempts to derive new material from stored entries) appear often, while decay is much rarer. The hard part is not writing memory; it is giving stored memory a complete lifecycle.
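The none / partial / full split can be computed from per-system operation flags along the following lines. This is a minimal sketch, not the script behind the matrix: the dictionary layout, the flag spellings, and the system names `sys_a` to `sys_d` are all hypothetical. The operation list is passed in as a parameter because the text names only six of the seven tracked operations.

```python
from collections import Counter


def curation_profile(systems, operations):
    """Bucket systems by how many of the tracked operations they run.

    systems:    mapping of system name -> {operation name: bool}
    operations: the tracked operation names (hypothetical spellings here)
    """
    buckets = Counter()
    for flags in systems.values():
        n = sum(1 for op in operations if flags.get(op, False))
        if n == 0:
            buckets["none"] += 1          # pure acquisition
        elif n == len(operations):
            buckets["full"] += 1          # complete lifecycle
        else:
            buckets["partial"] += 1       # somewhere in between
    return buckets


# Toy data only: three operations and four made-up systems.
OPS = ["promotion", "deduplication", "decay"]
toy = {
    "sys_a": {},
    "sys_b": {"promotion": True},
    "sys_c": {"promotion": True, "deduplication": True},
    "sys_d": {"promotion": True, "deduplication": True, "decay": True},
}
profile = curation_profile(toy, OPS)
```

On the toy data this yields one system with no curation, two partial, and one full. Applied to the real flags, the same bucketing would be expected to reproduce the 19 and 7 endpoints reported above.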

For us

Commonplace sits in the curated pull-only camp: authored, files-family, review-heavy, and weak on scale. The data argues not for defecting to automatic capture but for automating lifecycle curation while keeping durable memory authored. Faithfulness testing is the cheap edge: nearly absent across 148 systems, so checking whether a loaded note actually changed an agent's behavior would measure what almost the whole field only assumes.
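One way to make faithfulness testing concrete is a paired ablation: run the same tasks with and without the note loaded and count how often the output changes. The sketch below assumes a hypothetical agent interface, a callable taking a task and a context string; `toy_agent` is a deterministic stand-in, not any real system. A real agent is non-deterministic, so exact string comparison would need to be replaced with repeated sampling and a behavior-level check.

```python
from typing import Callable, Sequence

# Hypothetical agent interface: (task, context) -> answer.
Agent = Callable[[str, str], str]


def note_effect_rate(agent: Agent, tasks: Sequence[str], note: str) -> float:
    """Fraction of tasks whose output differs when the note is loaded."""
    if not tasks:
        raise ValueError("need at least one task")
    changed = 0
    for task in tasks:
        baseline = agent(task, "")      # note withheld
        treated = agent(task, note)     # note loaded into context
        if treated != baseline:
            changed += 1
    return changed / len(tasks)


def toy_agent(task: str, context: str) -> str:
    """Deterministic stand-in: follows the note only on deploy tasks."""
    if "deploy" in task and "run tests first" in context:
        return "run tests, then deploy"
    return f"do: {task}"


rate = note_effect_rate(
    toy_agent,
    ["deploy service", "rename file"],
    "Always run tests first before a deploy.",
)
```

Here the note changes the deploy task and leaves the rename task alone, so the rate is 0.5. A rate near zero on tasks the note was written for would suggest the note is stored but not activated.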

Relevant Notes:

knowledge-storage-does-not-imply-contextual-activation - grounds: the read-back / activation distinction the second and third findings rest on
agent-memory-is-a-crosscutting-concern-not-a-separable-niche - see-also: why the dividing axes span storage, retrieval, and learning at once
trace-learning-techniques-in-related-systems - see-also: the focused survey of the automatic camp this matrix quantifies
