Workshop: Agent Memory Design

Question

What would an ideal memory system for agents look like if storage were treated as cheap, session logs were captured in full, and almost all design effort moved from storage policy to retrieval and activation?

Why this workshop exists

This workshop starts from a strong working premise: the binding constraint in agent systems is not disk but bounded context. That suggests a design inversion:

  • store aggressively, including session logs, intermediate artifacts, corrections, and observations
  • spend design intelligence on retrieval, activation, promotion, and graduation

The goal is to work out what architecture makes that premise usable rather than noisy. A store-everything system without good retrieval is just a larger haystack.
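
As a minimal sketch of the inversion, assuming a tag-overlap relevance measure and a word-count token estimate (both are illustrative stand-ins, not part of the workshop material), the store below never evicts anything while the loader admits only what fits a context budget. The names Record, estimate_tokens, and select_for_context are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One stored item; everything is kept, nothing is evicted."""
    text: str
    tags: frozenset = field(default_factory=frozenset)


def estimate_tokens(text):
    # Assumption: one token per whitespace-separated word (crude tokenizer stand-in).
    return len(text.split())


def select_for_context(store, task_tags, budget):
    """Pick records whose tags overlap the task, best overlap first, within budget."""
    scored = [(len(rec.tags & task_tags), idx, rec) for idx, rec in enumerate(store)]
    relevant = [item for item in scored if item[0] > 0]
    # Highest overlap first; ties keep insertion order.
    relevant.sort(key=lambda item: (-item[0], item[1]))
    chosen, used = [], 0
    for _, _, rec in relevant:
        cost = estimate_tokens(rec.text)
        if used + cost <= budget:
            chosen.append(rec)
            used += cost
    return chosen


store = [
    Record("use tabs not spaces in Makefiles", frozenset({"make", "style"})),
    Record("deploy requires VPN", frozenset({"deploy"})),
    Record("the make target for docs is slow on cold caches", frozenset({"make"})),
]
loaded = select_for_context(store, {"make", "style"}, budget=6)
```

Storage grows without limit here; only the budget passed to select_for_context bounds what reaches active context. That is where the design effort would concentrate under this premise.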

The workshop is grounded in the KB's existing memory and context-engineering theory, plus the comparative review of agent memory systems. It focuses especially on the value of session logs as a substrate for decision provenance, correction consolidation, preference mining, procedure extraction, and ADR drafting.

Current grounding

What this workshop needs to resolve

  1. What layered architecture makes "store everything, load selectively" practical?
  2. What retrieval methods surface action-relevant knowledge rather than merely answering semantic queries?
  3. What useful signal types can be extracted from session logs, and which of them have clear enough oracles to automate?
  4. Where is the boundary between the memory system and standard project artifacts like code, docs, ADRs, and CLAUDE.md?
  5. What graduation pathways turn accumulated observations into durable artifacts without creating premature maintenance burden?

Working hypotheses

  • Session logs are a primary memory substrate, not disposable exhaust.
  • Raw traces should be retained for provenance but not loaded directly into active context.
  • Observation and episode layers are likely the missing middle between raw traces and curated library notes.
  • Search is a better fit for lower memory layers; navigation is a better fit for synthesized artifacts and library notes.
  • The memory system should feed manual distillation workflows like ADRs and conventions before trying to replace them.
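
One way to picture these hypotheses is the sketch below. It assumes that observations carry pointers back into retained raw traces instead of copying them, that lower layers are reached by search, and that library notes are reached by following links. All type and function names are hypothetical, the keyword match stands in for any real retriever, and the episode layer is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TraceSpan:
    """Pointer into a retained raw session log; the log text is not loaded here."""
    session_id: str
    start_line: int
    end_line: int


@dataclass
class Observation:
    """Small extracted claim that keeps its provenance."""
    text: str
    sources: list


@dataclass
class LibraryNote:
    """Curated artifact reached by following links, not by search."""
    title: str
    body: str
    links: list = field(default_factory=list)


def search_observations(observations, query):
    """Lower layer: naive keyword search; every query term must appear."""
    terms = query.lower().split()
    return [o for o in observations if all(t in o.text.lower() for t in terms)]


def navigate(notes, start, max_hops=2):
    """Upper layer: breadth-first walk over note links from a known entry point."""
    seen, frontier = [start], [start]
    for _ in range(max_hops):
        next_frontier = []
        for title in frontier:
            for linked in notes[title].links:
                # Skip dangling links and notes already visited.
                if linked in notes and linked not in seen:
                    seen.append(linked)
                    next_frontier.append(linked)
        frontier = next_frontier
    return seen
```

Because an Observation stores TraceSpan pointers and not trace text, provenance stays recoverable while the raw log stays out of active context.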

Starter artifacts

Open questions

  • How much of the retrieval policy should be inspectable, and how much should be learned from usage?
  • Can composite weak signals from session logs produce a practical soft oracle for promotion and graduation?
  • How should episode boundaries be detected when work spans multiple sessions or interleaves topics?
  • What storage medium fits each layer best without making the system operationally awkward?
  • When does storing everything become retrieval pollution rather than useful substrate?
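
To make the soft-oracle question more concrete, here is a minimal sketch, assuming a fixed weighted vote over boolean signals. The signal names, integer weights, and thresholds are all invented for illustration. Whether such signals can be extracted reliably from session logs is the open question, not something this code settles.

```python
# Hypothetical weak signals and weights (points); none come from the workshop notes.
WEIGHTS = {
    "user_corrected_agent": 4,
    "repeated_across_sessions": 3,
    "referenced_in_later_work": 2,
    "survived_without_contradiction": 1,
}


def promotion_score(signals):
    """Sum the weights of the signals that fired; unknown names are ignored."""
    return sum(w for name, w in WEIGHTS.items() if signals.get(name, False))


def decide(signals, promote_at=6, review_at=3):
    """Map the score to an action, leaving a middle band for human review."""
    score = promotion_score(signals)
    if score >= promote_at:
        return "promote"
    if score >= review_at:
        return "review"
    return "keep"


example = decide({"user_corrected_agent": True, "repeated_across_sessions": True})
```

The middle "review" band is a deliberate choice in this sketch: it routes borderline candidates to manual distillation instead of promoting them automatically.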