AgeMem

Type: ../types/agent-memory-system-review.md · Status: current · Tags: trace-derived

AgeMem is an agent memory system whose distinctive contribution is a memory-management policy learned by reinforcement learning rather than hand-written or model-prompted. Coverage here is doc-grounded — drawn from the paper, local snapshot, and ingest, with no implementation repository inspected — so the mechanisms below are reported, not observed in code.

Source documents: arXiv HTML paper; local snapshot Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for LLM Agents; local ingest Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for LLM Agents; analysis note memory management policy is learnable but oracle-dependent.

Reviewed document version: arXiv 2601.01885v1, captured locally 2026-03-08 and reconciled 2026-06-02.

Core Ideas

  • The policy is the product, not the storage unit. The reviewed sources report ordinary task-state facts in memory; the distinctive retained artifact is a learned controller deciding when to store, update, retrieve, summarize, filter, and delete. Value sits in the policy over operations, not in a novel memory representation.
  • A fixed operation set over LTM and STM. The paper exposes six tool actions: Add, Update, Delete over long-term memory; Retrieve, Summary, Filter over short-term/active context. The operations are hand-designed; the policy for invoking them is learned.
  • Learning is RL against task and memory-management rewards. The sources report a three-stage curriculum and step-wise GRPO over trajectories, with task completion as the dominant signal plus context-management and memory-management rewards. The paper reports Add usage increasing after training (0.92 to 1.64 per episode for Qwen2.5-7B).
  • Context efficiency via learned STM management. Active context is reportedly kept bounded by learned Summary and Filter behavior, with Retrieve bringing long-term facts into the current context when needed. The paper reports 3.1-5.1% token reduction while maintaining task performance; no implementation was inspected to verify runtime budgeting.

Artifact analysis

Claim-level (no code inspected):

  • Storage substrate: model-weights — the central retained controller is reported as an RL-trained policy in the agent model. The LTM facts are reported as entries in a long-term memory store, but the paper does not give an inspectable persistence implementation.
  • Representational form: prose symbolic parametric — the policy is distributed-parametric; the stored facts appear as prose/tool-argument records in a reported LTM store; STM is manipulated as active context text. The most consequential operative part is the least inspectable one.
  • Lineagetrace-extracted: the policy is derived from staged task trajectories / rollouts via RL; task, context, and memory-management rewards decide what becomes signal. Stored facts are authored at runtime by the policy. Changes in task distribution, reward design, or evaluation oracle would invalidate the learned policy claim.
  • Behavioral authority — two parts: the learned policy is a system-definition artifact consumed as a controller over memory actions; the stored LTM facts are knowledge artifacts consumed as context when retrieved. Effective quality and authority are reported by the paper, not verified from code.

Comparison with Our System

AgeMem and Commonplace sit at opposite ends of the curation-agency axis. AgeMem learns its curation policy and bakes it into weights — adaptive, but opaque and not incrementally refinable. Commonplace curates through human-gated, inspectable typed notes and links — high quality, low throughput, fully auditable. AgeMem improves from experience; Commonplace improves from judgment.

Borrowable Ideas

  • "When to store/retrieve" as a first-class learnable decision. Even without adopting RL, the framing — that curation timing is a policy, not an incidental side effect of writing — is worth carrying into how we think about promotion and read-back triggers. Needs a use case before any automation.
  • Operation-set + policy separation. Hand-crafted memory operations with a separable decision layer is a clean architecture; in our terms the decision layer is exactly the read-back/curation policy we under-specify. Not ready to build; conceptually useful.

Write side

Write agency: automatic — the reported learned policy invokes memory-management actions itself, including LTM Add, Update, and Delete, rather than leaving store changes only to a human authoring channel.

Curation operations: evolve — the reported Update action lets the policy modify existing long-term memories in response to the task trajectory; other operations are reported as tool actions but not described clearly enough to map to the curation vocabulary.

Trace-derived learning

AgeMem qualifies: it derives a durable artifact (the policy) from agent traces.

  • Trace source — staged task trajectories / rollouts covering LTM construction, STM control under distractors, and integrated task execution.
  • Extraction — RL training with step-wise GRPO; the oracle is task completion, augmented by context-management and memory-management rewards. That oracle dependence bounds where the approach transfers: open-ended domains without a clear completion signal lack the training reward.
  • Distilled form — distributed-parametric (the policy in weights); the stored facts remain a separate prose/key-value store.
  • Scope and timing — cross-task policy; learned offline in training, then deployed (staged, not online).
  • Survey placement — a clean trajectory-to-weights case in the trace-derived learning survey, flagged lower-confidence because the runtime is not code-inspected.

Read-back

Read-back: pull — memory re-enters action through the reported Retrieve operation, selected inside the agent policy rather than delivered by a separate unsolicited activation service. The paper also reports proactive retrieval during staged training, but still as memory tool use inside the policy loop; it does not document a relevance-gated before-action push path from an external memory layer, so there is no push-activation tag.

Curiosity Pass

  • The operation set is hand-crafted while the policy is learned — a hybrid that raises the question of whether the gains come from better memory decisions or merely more memory activity (the Add-rate increase is usage, not demonstrated usefulness).
  • Simpler alternative worth checking: how much of AgeMem's benefit a well-prompted (un-trained) policy over the same six operations would capture — i.e., is the RL the load-bearing part?

What to Watch

  • A reachable implementation. If the repository becomes available and is inspected, this promotes to an agent-memory-system-review and the four-field claims get verified against code.
  • Oracle-free or oracle-light variants. The task-completion dependence is the main transfer limit; any move to learn the policy without a clean completion signal would change how broadly the approach applies.

Relevant Notes