Agent Memory Systems

Type: index

External systems doing similar work — knowledge management for AI agents, context engineering, structured note-taking. We track these not just to borrow ideas but to watch how they evolve. Convergence across independent projects is a stronger signal than any single design argument.

Two coverage tiers. Systems with open-source repos get the deep path: clone the repo, read the code, write a review note here. Systems known only from a README or paper get the lightweight path: snapshot a single page into kb/sources/, run /ingest, and optionally add a standard note under source-only/ when the system needs a stable place in this collection. The comparative review synthesises across both tiers. Database-backed memory systems (Mem0, Graphiti, Letta, A-MEM, AgeMem) currently have only lightweight coverage via ingest reports in kb/sources/.
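The snapshot step of the lightweight path can be sketched in a few lines. This is a hedged illustration only: the `kb/sources/` layout is from this note, but the helper name, the name-plus-date filename scheme, and the provenance-comment header are invented conventions, not what /ingest actually does.

```python
import datetime
import pathlib

def snapshot(name, body, source_url, root="kb/sources"):
    """Write a one-page snapshot with a provenance header; return the file path.

    The header records where and when the page was captured, so a later
    /ingest-style pass has stable provenance to cite.
    """
    dest = pathlib.Path(root)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = datetime.date.today().isoformat()
    out = dest / f"{name}-{stamp}.md"
    out.write_text(f"<!-- source: {source_url} | captured: {stamp} -->\n\n{body}")
    return out
```

The point of the dated filename and header is that a source-only system can later be re-snapshotted without losing the earlier capture.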

Systems

  • ACE — playbook-learning loop with generator, reflector, and curator roles; strongest nearby artifact-learning analogue to Autocontext, with bullet-level helpful/harmful counters but an append-heavy maintenance path
  • Agent-R — iterative self-training pipeline that mines MCTS search trees into corrected conversation traces and fine-tuning data; clearest search-to-weights learning system in this queue
  • Agent Skills for Context Engineering — skill-based context engineering reference library loaded as agent guidance; strong on operational patterns, no learning theory
  • AgeMem — source-only paper coverage of an RL-trained LTM/STM memory-management policy; trace-derived trajectory-to-weights case, but no local code-inspected review
  • Archie — Arch Linux config repo with Stow-managed multi-root deployment, Incus dev VMs, and agent-executable work-item docs; strong operational packaging, no real knowledge-learning loop
  • Ars Contexta — Claude Code plugin that generates knowledge systems from conversation; ancestor of our KB, upstream source for link semantics and title-as-claim. Includes the "Agentic Note-Taking" article series (@molt_cornelius) — first-person agent testimony from inside the system
  • Atomic — database-backed personal KB that stores markdown atoms in SQLite/Postgres, enriches them into embeddings/tags/semantic edges, and maintains per-tag wiki articles; strongest nearby database-first counterexample with a real derived wiki layer
  • auto-harness — minimal benchmark-driven outer-loop for improving one coding-agent file with regression-suite promotion and held-out-score gating; strongest reviewed example so far of hard-oracle workshop automation kept intentionally small
  • Autocontext — closed-loop control plane for iterative agent improvement via multi-role orchestration (competitor/analyst/coach/architect), tournament evaluation, accumulated playbooks, and MLX distillation; strongest reference for automated iterative learning loops, but context "compilation" is concatenation with budget-aware trimming, not transformation
  • Awesome Agent Memory — TeleAI-UAGI curated bibliography of agent-memory products, papers, benchmarks, surveys, articles, and workshops; useful source-discovery map, not an implemented memory system
  • Binder — local-first typed knowledge graph with markdown/YAML projections, schema-as-data, and immutable transactions; clearest reviewed example here of database-first structure surfaced as editable files
  • browzy.ai — terminal personal knowledge base that compiles raw sources into a markdown wiki, uses SQLite FTS as a derived retrieval layer, and writes lightweight session-derived digests and insight drafts
  • ByteRover CLI — source-available coding-agent CLI with file-backed .brv/context-tree, tiered retrieval, live scoring/review/manifest layers, git-like context-tree VC, and four connector modes; strongest production reference so far for packaging file-backed memory into other coding-agent environments, though automatic archiving still looks less central than the paper's broader lifecycle framing
  • cass-memory — cross-agent procedural memory with three-layer cognitive architecture (episodic/working/procedural), confidence-decayed playbook bullets, and trauma guard; closest production sibling to ACE's playbook-learning loop, with genuine cross-agent session mining
  • ClawVault — TypeScript memory system with scored observations, session handoffs, and reflection pipelines; has a working workshop layer where we have theory, strongest source of borrowable patterns for ephemeral knowledge
  • Claude Context Guard — Claude Code continuity scaffold built from safeguard files, prompt-defined recovery skills, and light hooks; strongest reviewed example so far of workshop-state preservation without a dedicated runtime
  • Cludebot — Generative Agents-inspired memory SDK with five-type taxonomy, type-specific decay, six-phase dream cycles (consolidation + compaction + contradiction resolution + action learning), entity knowledge graph, Hebbian co-retrieval reinforcement, and clinamen anomaly retrieval; richest reviewed trajectory-to-lesson learning loop, heavily oriented to a social-bot use case
  • Cognee — pipeline-first knowledge engine (add/cognify/memify/search) with Pydantic-schema graph extraction, poly-store backends (graph + vector + relational), and multi-tenancy; strongest database-side counterexample to files-first architecture, but treats knowledge as a data engineering problem rather than a curation problem
  • CocoIndex — Rust-backed incremental indexing framework with a Python dataflow DSL, Postgres tracking tables, and broad vector/graph target connectors; strongest reviewed example so far of derived-index maintenance as a layer below the primary knowledge substrate
  • Context Constitution — Letta’s instruction-first governance corpus for agents, treating context management as identity, memory, and continuity policy; strongest reviewed case of a related system defined mainly by doctrine rather than code
  • CORAL — multi-agent coding harness with per-agent git worktrees, eval-gated attempt tracking, checkpointed shared notes/skills, and heartbeat prompts; clearest lightweight open-source outer loop for collaborative code search with artifact sharing
  • cq — Mozilla.ai's local-first shared agent knowledge commons with SQLite local/team stores, approval-gated team sharing, and plugin-packaged query/propose/confirm loop; strongest reviewed reference so far for lightweight cross-agent operational learning, though the richer trust and guardrails layers remain mostly conceptual
  • CrewAI Memory — unified vector-memory for agent crews with LLM-driven scope inference, composite scoring, and consolidation; sophisticated retrieval infrastructure but no learning theory, treating memory as plumbing rather than a knowledge medium
  • Decapod — Rust governance kernel for AI coding agents with proof-gated completion, workspace isolation, and 120+ embedded constitution documents; strongest reference for hard-oracle verification in agent workflows, though constitution claims transformation where the code primarily relocates
  • DocMason — repo-native document-analysis workspace with staged/published KB boundaries, multimodal evidence channels, provenance tracing, and sync-time promotion of host interaction logs into published memories
  • Dynamic Cheatsheet — test-time adaptive memory with cumulative cheatsheet carryover and optional retrieval-synthesis variants; strong artifact-learning baseline, but the actual maintenance path is whole-document rewrite rather than structured mutation
  • engraph — Obsidian vault server with SQLite hybrid index, wikilink graph expansion, section-level writes, and local MCP/HTTP surfaces; strongest local-first derived index over a human note substrate
  • EQUIPA — multi-agent coding orchestrator with git-worktree dev/test loops, SQLite run memory, trace-derived lessons/rules/prompt tuning, and partial training-data export from the same execution traces
  • ExpeL — cross-task experiential learning pipeline with separate trajectory gathering, rule extraction, prompt-time trace retrieval, and explicit ADD/EDIT/REMOVE/AGREE rule maintenance; clearest trajectory-to-rule artifact-learning example in this queue
  • Exocomp — Go coding-agent harness with role-scoped tools, sandboxed execution, and file-backed bug/changelog coordination; execution controls are real, but planning and sub-agent workflows are still stubbed
  • Fintool — AI agent for professional investors; S3-first with derived PostgreSQL, markdown skills with copy-on-write shadowing, ~2000 eval test cases; strongest production-scale evidence for filesystem-first at commercial grade (lightweight coverage only — ingest report, no repo review)
  • GBrain — personal-brain CLI and MCP layer that indexes markdown-derived compiled-truth/timeline pages into Postgres+pgvector, with agent skillpacks for trace-to-entity enrichment and brain maintenance
  • G-Memory — multi-agent memory harness with state-graph trajectory capture, task-neighborhood retrieval, and scored text insights; strongest reviewed example so far of mixed memory substrates inside one benchmark agent system
  • getsentry/skills — Sentry's shared skills repo with a skill-writer meta-skill that codifies the skill creation process: source-driven synthesis with depth gates, labeled iteration, description-as-trigger optimization, and the Agent Skills cross-tool spec; strongest reference for how to systematically create and improve agent skills
  • Graphiti — temporally-aware knowledge graph with bi-temporal edge invalidation; strongest counterexample to files-first architecture and strongest temporal model in the surveyed systems (lightweight coverage only — ingest report, no repo review)
  • Hindsight — biomimetic agent memory with LLM-driven fact extraction, four-way parallel retrieval (semantic + BM25 + graph + temporal), auto-consolidation into observations, and agentic reflection; strongest production evidence that three-space memory separation yields measurable retrieval gains (LongMemEval SOTA)
  • Hyalo — Rust CLI for Obsidian-compatible markdown vaults with single-pass scanning, ephemeral snapshot indexes, mutation-safe link operations, and one-command Claude bootstrap
  • HyperAgents — self-referential code-agent evolution harness with diff archives, Docker lineage replay, staged benchmark evaluation, and scored parent selection; strongest reference here for outer-loop self-editing over executable agent code, though the checked-in meta agent is much thinner than the framing suggests
  • LACP — local agent control plane with risk-tier routing, Claude hooks, Obsidian memory automation, and provenance receipts; strongest reviewed reference for governance-heavy local agent operations around existing CLIs
  • LLM Wiki (kenhuangus) — executable local-first markdown wiki pipeline with Python ingestion, LLM extraction/merge, BM25 query, monitors, FastAPI/React UI, and a partial prompt-optimization loop; strongest nearby contrast to the promptware-only LLM Wiki protocol
  • LLM Wiki — Claude Code plugin and portable AGENTS protocol for topic-isolated compiled markdown wikis; strongest nearby reference for packaging a whole knowledge-system workflow as prompt artifacts rather than executable software
  • Letta — agent-self-managed three-tier memory hierarchy using OS analogy (main context ≈ RAM, archival ≈ disk, recall ≈ conversation log); strongest existing exemplar of the agent-self-managed agency model (lightweight coverage only — ingest report, no repo review)
  • MentisDB — hash-chained semantic memory ledger with additive ranked retrieval, agent key registry, and immutable skill versioning; strongest reviewed example here of service-shaped durable memory plus a real skill-lifecycle layer
  • Mem0 — two-phase add pipeline (extract facts + LLM-judged CRUD reconciliation); purest production example of automated accretion-without-synthesis in the surveyed systems (lightweight coverage only — ingest report, no repo review)
  • Memori — Python/TypeScript SDK and hosted memory layer with LLM-client interception, entity/process/session scoping, BYODB storage, conversation/agent-trace augmentation into facts/triples/summaries, and compact prompt-time recall
  • MemPalace — local-first memory system with verbatim Chroma drawers, wing/room retrieval priors, a sidecar SQLite fact graph, and optional AAAK compression; strongest reviewed reminder so far that raw storage plus good retrieval can outrun heavier extraction stories
  • MiroShark — document-to-social-simulation stack with Neo4j graph extraction, cross-platform round memory, heuristic belief drift, and ReACT reporting; strongest nearby reference for graph-grounded simulation loops
  • Napkin — Obsidian-vault CLI with NAPKIN.md pinned context, TF-IDF overview maps, agent-shaped search defaults, and pi-based auto-distill; strongest reference for adapting a mainstream human note substrate into an agent-facing memory interface
  • Nuggets — Pi-coupled personal memory assistant with local HRR nugget files, chat-channel scheduling, and a MEMORY.md promotion bridge; strongest reference so far for tiny file-backed scratch memory, though the promotion loop is only partially wired
  • o-o — polyglot HTML/bash living-document system where each file carries its own update contract, rendering, source cache, and Claude dispatch; strongest reviewed example of the file-as-app pattern
  • OpenSage — Google ADK-based agent framework with runtime subagent creation, AI-written tools, Neo4j graph memory, Docker sandbox isolation, agent ensemble coordination, and RL training integration; strongest reference for self-modifying agent topology, but knowledge structure is flat and the self-programming claims outrun the implementation
  • OpenClaw-RL — live-RL framework that trains from next-state signals; TODO: repo now exists, so this should get a repo-backed review rather than source-only coverage
  • OpenViking — ByteDance/Volcengine's context database with filesystem-paradigm virtual directories, L0/L1/L2 tiered loading, hierarchical recursive retrieval, and session-driven memory extraction; first production system to make progressive disclosure a native storage primitive, but the "filesystem" is a metaphor over a database, not actual files
  • Pal — Agno-based personal knowledge agent with a dual split between routing metadata, session-derived operational learnings, structured SQL state, and a compiled wiki; strongest reviewed example so far of "map versus compass" memory separation inside a live assistant runtime
  • Pi Self-Learning — pi extension with automatic task-end reflection, scored learnings index, and context injection; purest implementation of the automated mistake-extraction loop, but the reflection pipeline primarily relocates rather than transforms
  • Playground — TribleSpace-backed shell-first agent runtime with branch-separated cognition/archive/memory, unified chat-log importers, and budget-adaptive temporal memory; strongest reviewed example here of append-only event storage plus synthetic memory turns
  • ReasoningBank — reasoning-as-memory pipeline that extracts structured memory items from both successful and failed trajectories, retrieves by embedding similarity, and proposes test-time scaling via parallel trajectory comparison; sits between Reflexion (simpler) and ExpeL (richer lifecycle) on the artifact-learning spectrum
  • Reflexion — verbal reinforcement loop that turns failed attempts into short natural-language plans; important early trajectory-to-artifact precedent, but with a much thinner memory lifecycle than newer systems
  • REM — four-database episodic memory service (Postgres + Qdrant + Neo4j + Redis) with LLM-driven consolidation from episodes to scored semantic facts and temporal graph expansion at retrieval; heaviest infrastructure footprint among reviewed systems with the thinnest knowledge transformation layer
  • SAGE — BFT-branded agent memory with CometBFT consensus, Ed25519 signing, application-level validators (sentinel, dedup, quality, consistency), confidence decay, and AES-256-GCM encryption; the consensus framing is ceremony around a deterministic validation pipeline in single-node mode, but the validation gate pattern and domain-scoped RBAC are genuinely useful
  • Semiont — document-grounded annotation kernel with W3C annotations, git-backed events, working-tree URIs, and shared human/agent flows; strongest example here of annotation-first KB infrastructure
  • sift-kg — LLM-powered document-to-knowledge-graph pipeline with schema discovery, human-gated entity resolution, and interactive visualization; strongest reference for extraction-first knowledge construction and confidence aggregation
  • Siftly — Next.js + SQLite ingestion system with deterministic-first enrichment, resumable stage markers, and hybrid retrieval; strongest reference so far for high-volume source loading patterns
  • SkillNote — self-hosted skill registry with dual version tracks, live MCP projection via PostgreSQL NOTIFY, and agent-submitted ratings; strongest reviewed reference so far for skill hosting and distribution as a product, though the offline sync and install stories are narrower than the README implies
  • Spacebot — Rust concurrent agent framework with code-level symbolic scheduling (cortex), context-forking branches, typed memory with graph edges and hybrid search; cleanest production implementation of the bounded-context orchestration model among reviewed systems
  • Supermemory — hosted memory platform with an open integration layer (MCP server, multi-framework SDK wrappers, graph UI); strongest reference for MCP-first distribution and prompt-middleware ergonomics, with core memory logic mostly behind hosted /v3 and /v4 APIs
  • Synapptic — Python CLI that mines Claude Code transcripts into weighted user/guard profiles, benchmarks guard usefulness with per-model ablations, and compiles the result into assistant-specific memory files
  • Thalo — custom plain-text language with grammar, types, validation, and LSP; makes the same programming-theory bet we do but with full compiler formalization
  • Thalo entity types compared to commonplace document types — detailed type mapping showing gaps (supersedes links, source status tracking) and borrowable patterns
  • Tracecraft — S3-backed CLI coordination layer for multi-agent systems with five primitives over object storage; cleanest exemplar of coordination-by-convention, where the coordination semantics live in naming conventions and client compliance rather than enforcement mechanisms; first entry focused purely on coordination infrastructure rather than memory/knowledge management
  • Trajectory-Informed Memory Generation — source-only paper coverage of trajectory-derived strategy/recovery/optimization tips; artifact-learning counterpart to AgeMem's weight-learning path
  • virtual-context — proxy-owned context virtualization layer with topic summarization, fact extraction, tool-chain stubbing, and demand-paged retrieval; strongest reviewed example of managing the context window itself rather than bolting retrieval onto it
  • Voyager — embodied lifelong-learning loop with automatic curriculum, critic-gated retries, and promotion of successful trajectories into retrievable JavaScript skills; clearest executable-artifact learning system in this queue
  • xMemory — research-code agent memory system that turns dialogue streams into episodes, semantic facts, and theme hierarchies, then uses coverage-based representative selection plus entropy gates for top-down retrieval
  • Zikkaron — MCP memory server for Claude Code with 26 neuroscience-branded subsystems (Hopfield retrieval, predictive coding write gate, engram allocation, hippocampal replay) all implemented as heuristic Python without LLM calls; the neuroscience framing is vocabulary over mechanism, but the 9-signal retrieval fusion and compaction hooks are genuinely useful

Patterns Across Systems

Most systems here (ours, Ars Contexta, Thalo, ClawVault, Agent-Skills) independently converge on:

  • Filesystem over databases — plain text, version-controlled, no lock-in
  • Progressive disclosure — load descriptions at startup, full content on demand
  • Start simple — architectural reduction outperforms over-engineering
  • Trace-derived learning — techniques in related systems that broaden the comparison beyond pi-adjacent session mining to include artifact-learning and weight-learning systems fed by live traces and trajectories
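The progressive-disclosure pattern is small enough to sketch. This is a hedged illustration, not any listed system's actual loader; the flat notes directory and the first-non-empty-line-as-description convention are assumptions.

```python
from pathlib import Path

def startup_index(root):
    """Cheap startup pass: map each note's name to its first non-empty line,
    treated here as the note's one-line description."""
    index = {}
    for note in sorted(Path(root).glob("*.md")):
        for line in note.read_text().splitlines():
            if line.strip():
                index[note.stem] = line.strip()
                break
    return index

def load_full(root, name):
    """On-demand step: read a full note body only when its description matched."""
    return (Path(root) / f"{name}.md").read_text()
```

At startup the agent sees only the index; `load_full` runs only for notes whose descriptions match the task, which keeps context cost proportional to relevance rather than corpus size.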

The divergences are more revealing:

  • Storage model — Cognee uses a poly-store (graph + vector + relational with pluggable backends), Siftly uses SQLite, CrewAI uses LanceDB (an embedded vector database), Hindsight uses PostgreSQL+pgvector, Zikkaron uses SQLite with FTS5+sqlite-vec, and SAGE uses SQLite+BadgerDB (personal) or PostgreSQL+pgvector (multi-node) as operational substrates, while the others keep files as the primary storage interface. OpenViking occupies a novel middle position: it presents a filesystem interface (viking:// URIs, ls/read/find operations) but the substrate is AGFS + vector index — filesystem as metaphor, not mechanism. Cludebot uses Supabase (PostgreSQL+pgvector) for its full mode but also offers a local JSON file store, the closest a database-first system gets to filesystem-first. Cognee, Hindsight, CrewAI, Zikkaron, Cludebot, and SAGE are the furthest from filesystem-first: memories are opaque database records, not readable files
  • System boundary — CocoIndex sits one layer below most systems here: it is an incremental engine for maintaining derived vector/graph/relational projections, not a primary knowledge medium. That makes it more relevant to our "operational layer beneath the KB" question than to the note/link semantics question directly
  • Agent-facing UX — Napkin is the clearest example of treating CLI output itself as part of the memory architecture: hidden scores, match-only snippets, and next-step hints are all tuned for model behavior rather than human browsing. Most other systems focus on storage and retrieval internals but leave the interaction layer human-shaped
  • Packaging unit — most systems distribute concerns across multiple files (notes, configs, scripts, indexes), but o-o pushes the opposite extreme: each document is a self-contained polyglot file carrying rendering, update contract, shell dispatch, source cache, and changelog. That maximizes portability and local inspectability at the cost of modularity and inter-document structure
  • Grounding discipline — cognitive psychology (Ars Contexta) vs programming theory (commonplace, Thalo) vs empirical operational patterns (Agent-Skills)
  • Formalization level — custom DSL (Thalo) vs YAML conventions (commonplace) vs prose instructions (Agent-Skills)
  • Governance stance — most systems treat governance as advisory (instructions the agent should follow); Decapod enforces governance with hard gates (validation must pass, VERIFIED requires proof-plan); SAGE enforces with cryptographic gates (signed transactions, validator quorum, RBAC clearance levels) — two very different enforcement models, both structurally enforced rather than instructed
  • Access control — SAGE has structured multi-agent RBAC (clearance levels, domain-scoped permissions, on-chain agent identity); Cognee has relational ACLs with tenant isolation and per-dataset permissions; most other systems either have no access control or rely on filesystem permissions
  • Cross-agent knowledge transfer — most systems are single-agent or agent-agnostic; cass-memory is the first reviewed system to make cross-agent session mining a first-class feature, indexing logs from Claude Code, Cursor, Codex, Aider, and others into a shared playbook
  • Runtime self-modification — most frameworks fix agent topology at build time; OpenSage is the first reviewed system where agents can create subagents and scaffold new tools at runtime, though without quality gates on the created artifacts
  • Self-referentiality — only our KB is simultaneously a knowledge system and a knowledge base about knowledge systems
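The storage-model middle position is easiest to see in miniature: the toy below speaks `ls`/`read` like a filesystem, but the records behind it are an opaque in-memory store standing in for a database. Everything here, class and method names included, is an invented illustration, not OpenViking's API.

```python
class VirtualFS:
    """Filesystem as metaphor: path-shaped keys over an opaque record store."""

    def __init__(self):
        self._records = {}  # "path" -> content; a database in a real system

    def write(self, path, content):
        self._records[path] = content

    def ls(self, prefix):
        """List record keys under a prefix, the way a shell lists a directory."""
        return sorted(p for p in self._records if p.startswith(prefix))

    def read(self, path):
        return self._records[path]
```

Agents get familiar navigation verbs, but nothing on disk corresponds to the "files" they see; that is the trade the storage-model divergence above describes.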

Open Questions

  • Does convergence on filesystem-first indicate a durable pattern, or a phase that will be outgrown?
  • Should high-volume ingestion in a file-first KB adopt a small operational database layer for stage state and indexing?
  • Will the programming-theory grounding produce better systems than the psychology grounding, or will they converge?
  • Are there systems we're missing that take a fundamentally different approach?

Other tagged notes