Agent memory is a crosscutting concern, not a separable niche
Type: note · Status: seedling · Tags: learning-theory, computational-model
Many systems self-identify as "memory systems" — Mem0, Graphiti, Letta, ClawVault, cass-memory, CrewAI Memory, and others in the related-systems index. The label suggests memory is a separable subsystem: build a memory layer, plug it in, the agent remembers things. The comparative review of eleven such systems reveals that the interesting design questions are not about memory at all — they are about context engineering, learning theory, and action capacity, wearing a memory costume.
Memory decomposes into three problems
Storage is cheap and effectively solved. Text is small. Disks are large. Appending session logs, observations, and artifacts to a filesystem or database is a well-understood engineering problem. The comparative review confirms that storage format (files vs database, vector store vs graph) is a consequential architectural choice but not the hard problem — it follows from what you're trying to do with the stored material, not the other way around.
Retrieval and activation is the hard problem, and it IS context engineering. The scarce resource in agent systems is context — the finite window through which the agent receives instructions, accesses knowledge, and reasons toward action. Memory's value is not what's stored but what gets loaded into context at the right moment. The activation gap — the regime where knowledge is stored but not surfaced when it matters — is a context engineering problem: cue match, priority arbitration, and commitment are all about what enters the bounded call and in what frame.
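The activation step described here can be made concrete as budget-bounded selection: score stored items against the current cue, arbitrate by priority, and admit items until the context budget is spent. A minimal sketch — the `activate` function, the word-overlap scoring heuristic, and the `MemoryItem` fields are all illustrative assumptions, not taken from any system in the review:

```python
from dataclasses import dataclass

@dataclass
class MemoryItem:
    text: str
    priority: float   # curation-assigned importance
    tokens: int       # cost of loading this item into the bounded call

def cue_score(item: MemoryItem, cue: str) -> float:
    """Toy cue match: fraction of cue words that appear in the item."""
    cue_words = set(cue.lower().split())
    item_words = set(item.text.lower().split())
    return len(cue_words & item_words) / max(len(cue_words), 1)

def activate(items: list[MemoryItem], cue: str, budget: int) -> list[MemoryItem]:
    """Greedy selection for one bounded call: rank by cue match x priority,
    admit matching items until the token budget is spent."""
    ranked = sorted(items, key=lambda m: cue_score(m, cue) * m.priority, reverse=True)
    chosen, spent = [], 0
    for m in ranked:
        if cue_score(m, cue) > 0 and spent + m.tokens <= budget:
            chosen.append(m)
            spent += m.tokens
    return chosen
```

The point of the sketch is that nothing in it is about storage: the hard choices are the scoring function, the arbitration rule, and the budget — context engineering decisions.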
Learning from experience — extraction, promotion, graduation of knowledge from session logs to durable artifacts — IS the learning theory cluster. Extracting corrections from session logs is accumulation. Promoting a recurring preference to a CLAUDE.md entry is constraining. Compressing fifteen methodology sessions into one procedure is distillation. Recognizing that three unrelated failures share a common cause is discovery. The oracle problem — evaluating whether a memory operation improved the system — is the same oracle problem the KB's learning theory identifies as the bottleneck for automated knowledge improvement.
And wrapping around all three: memory serves action capacity, not just retrieval. A memory system that only answers questions is solving a narrow slice. The full problem is making agent actions more competent across execution, classification, planning, communication, and pattern recognition — which is contextual competence under bounded context.
The comparative review already shows this
The comparative review's six dimensions — storage unit, agency model, link structure, temporal model, curation operations, extraction schema — span all three decomposed problems. Storage unit is the storage problem. Agency model and extraction schema are the learning problem (who decides what to remember, and how). Link structure and temporal model are the retrieval problem (how do you find and navigate stored knowledge). The review is not really a "memory systems" comparison; it is a comparison of how systems manage knowledge persistence, activation, and learning, viewed through the memory lens.
The systems themselves confirm the decomposition by specializing differently:
- Mem0 mostly solves storage + retrieval (facts in, facts out via vector similarity)
- AgeMem mostly solves learning (RL-trained policy for when to store/retrieve)
- ExpeL and Voyager mostly solve learning (trajectory-to-rule extraction, skill promotion)
- ClawVault and Cludebot span learning + retrieval (lifecycle management, reflection pipelines, promotion)
- Agent Skills mostly solves retrieval/activation (loading the right instructions at the right time)
No system solves all three well. That is the agency trilemma: no system combines high agency, high throughput, and high curation quality. The trilemma exists precisely because the subproblems are different and optimizing one trades off against another.
The runtime decomposition predicts this
The scheduler / context engine / execution substrate decomposition of agent runtimes predicts where memory's subproblems land:
- Storage belongs to the execution substrate (persistent world outside the model)
- Retrieval/activation belongs to the context engine (what enters each bounded call)
- Learning belongs to neither cleanly — it is a cross-component concern that reads from the execution substrate, writes to it, and improves the context engine's selection function over time
Memory is not a fourth component alongside these three. It is an aspect of all three, which is why treating it as a standalone subsystem produces systems that solve one subproblem well and ignore the others.
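The claim that memory is an aspect of all three components rather than a fourth one can be stated as interfaces: "memory" dissolves into the substrate's persistence, the context engine's selection function, and a learning pass that reads from one and tunes the other. A hypothetical sketch under that decomposition (all class and method names are illustrative, not from any reviewed runtime):

```python
from collections import Counter

class ExecutionSubstrate:
    """Persistent world outside the model: storage is just an append-only log."""
    def __init__(self):
        self.log: list[str] = []

    def append(self, record: str) -> None:
        self.log.append(record)

class ContextEngine:
    """Owns what enters each bounded call: retrieval/activation lives here."""
    def __init__(self):
        self.pinned: list[str] = []  # artifacts a learning pass promoted

    def select(self, substrate: ExecutionSubstrate, cue: str, limit: int) -> list[str]:
        # pinned artifacts first, then recent log entries matching the cue
        recent = [r for r in reversed(substrate.log) if cue in r]
        return (self.pinned + recent)[:limit]

def learning_pass(substrate: ExecutionSubstrate, engine: ContextEngine) -> None:
    """Cross-component by construction: reads the substrate's log and promotes
    repeated records into the engine's selection function. There is no
    separate 'memory component' to call."""
    for record, n in Counter(substrate.log).items():
        if n >= 2 and record not in engine.pinned:
            engine.pinned.append(record)
```

Note where each subproblem landed: storage is a substrate detail, activation is an engine detail, and learning is a function over both — which is the crosscutting claim in code form.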
What this means for reviewing systems
The related-systems index contains 70+ systems reviewed with a consistent format. The current review template captures enough raw material to support crosscutting analysis — the comparative review already demonstrated this. No template change is needed to accommodate this observation.
What changes is interpretation. When reviewing a system that calls itself a "memory system," the review should assess which subproblem(s) it actually addresses:
- Does it solve storage? (Format, persistence, versioning — usually yes, usually uninteresting)
- Does it solve retrieval/activation? (How does stored knowledge enter the agent's context? Progressive disclosure? Cue-based? Always-loaded?)
- Does it solve learning? (Does the system improve from experience? What oracle does it use? What artifacts does it produce?)
- Does it serve action capacity? (Does it make the agent's actions more competent, or just its answers more accurate?)
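Applied as a rubric, those four questions could be recorded as a small structured assessment per system. A sketch — the `MemorySystemAssessment` type is hypothetical, and the two example entries merely paraphrase the specialization list above, not the full reviews:

```python
from dataclasses import dataclass

@dataclass
class MemorySystemAssessment:
    name: str
    storage: bool            # format, persistence, versioning
    retrieval: bool          # does stored knowledge enter context at the right moment?
    learning: bool           # does the system improve from experience?
    action_capacity: bool    # more competent actions, or just more accurate answers?

    def is_complete(self) -> bool:
        """A 'complete' memory solution covers all three subproblems
        and serves action capacity, not just question-answering."""
        return self.storage and self.retrieval and self.learning and self.action_capacity

assessments = [
    MemorySystemAssessment("Mem0", storage=True, retrieval=True,
                           learning=False, action_capacity=False),
    MemorySystemAssessment("AgeMem", storage=True, retrieval=False,
                           learning=True, action_capacity=False),
]
```

Under this rubric, the expectation from the trilemma is that `is_complete` comes back false for every reviewed system.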
Systems that address only one subproblem are useful components, not complete memory solutions. Systems that address all three are doing context engineering and learning theory under a memory label — and should be recognized as such.
Relevant Notes:
- context efficiency is the central design concern — grounds: context scarcity is why retrieval/activation is the hard problem, not storage
- knowledge storage does not imply contextual activation — grounds: the activation gap is what makes memory a context engineering problem
- agentic memory systems comparative review — evidence: six dimensions span storage, retrieval, and learning; the agency trilemma exists because the subproblems trade off against each other
- agent runtimes decompose into scheduler, context engine, and execution substrate — grounds: the runtime decomposition assigns memory's subproblems to different components, predicting that memory is crosscutting
- learning theory index — grounds: extraction, promotion, and graduation are instances of accumulation, constraining, distillation, and discovery
- claw learning loops must improve action capacity not just retrieval — extends: memory serves contextual competence across five action modes, not just question-answering
- a good agentic KB maximizes contextual competence — grounds: the three properties (discoverable, composable, trustworthy) apply to memory-stored knowledge just as they do to authored notes
- session history should not be the default next context — exemplifies: the "store more than you load" principle is the storage/retrieval split in action
- automating KB learning is an open problem — grounds: the oracle problem for memory management is the same oracle problem as for KB learning
- memory management policy is learnable but oracle-dependent — exemplifies: AgeMem solves the learning subproblem (policy) while leaving retrieval mostly to base-model instruction following