Mem0: Universal Memory Layer for AI Agents
Source: https://github.com/mem0ai/mem0 Paper: https://arxiv.org/abs/2504.19413 Type: Open-source memory system (Apache 2.0), YC S24 Retrieved: 2026-03-05
Overview
Mem0 ("mem-zero") is a memory layer for AI assistants and agents that extracts, stores, and retrieves facts from conversations. It supports multi-level memory scoped to users, agents, and runs. Production-oriented: claims +26% accuracy over OpenAI Memory on LOCOMO benchmark, 91% faster responses, 90% fewer tokens.
Architecture
Memory Model
Mem0 stores facts — short declarative statements extracted from conversations (e.g., "Name is John", "Favourite movies are Inception and Interstellar"). Each fact is embedded and stored in a vector store with metadata (user_id, agent_id, run_id, actor_id, timestamps).
Memory types: - Conversational/factual (default) — personal preferences, details, plans, professional info - Procedural — agent-specific procedural knowledge (requires agent_id) - Graph relations (optional) — entity relationships stored in a graph database alongside vector memories
Core Pipeline: Add
The add() operation follows a two-phase pipeline:
Phase 1 — Fact Extraction: 1. Parse messages (user/assistant conversation) 2. Send to LLM with a fact extraction prompt → get JSON list of facts 3. The prompt is specifically tuned: "Personal Information Organizer" that extracts preferences, details, plans, etc. 4. Separate prompts for user memory extraction vs agent memory extraction
Phase 2 — Memory Update (the key innovation): 1. For each extracted fact, embed it and search the vector store for existing similar memories (top 5) 2. Collect all retrieved old memories, deduplicate by ID 3. Send old memories + new facts to LLM with an update prompt 4. LLM returns a list of actions: ADD (new memory), UPDATE (modify existing), DELETE (remove), or NOOP 5. Execute each action against the vector store 6. Optionally, also add to graph store in parallel
This is the accretion + curation pattern — new facts don't just pile up, the LLM decides whether to add, merge, or delete.
Storage Backends
Vector stores (20+ supported): Qdrant, Pinecone, Chroma, Milvus, Faiss, pgvector, Elasticsearch, OpenSearch, Weaviate, Redis, MongoDB, Supabase, Azure AI Search, LanceDB, Cassandra, S3 Vectors, Neptune Analytics, etc.
Graph stores (optional): Neo4j, Memgraph, Kuzu — used for entity relationship extraction alongside the vector store.
Metadata store: SQLite for memory history tracking.
Search / Retrieval
search(query, user_id, limit) — embeds the query and does vector similarity search filtered by user/agent/run scope. Optional reranking via configurable reranker (Cohere, etc.).
Memory Operations (MemoryBase interface)
add(messages)— extract facts, reconcile with existing, storeget(memory_id)— retrieve single memoryget_all(user_id)— list all memories for a usersearch(query, user_id)— semantic searchupdate(memory_id, data)— direct updatedelete(memory_id)— delete single memorydelete_all(user_id)— delete all memories for a userhistory(memory_id)— get change history (ADD/UPDATE/DELETE events tracked in SQLite)reset()— clear everything
Agent Integration
Simple API: memory = Memory(); memory.add(messages, user_id="alice"); memories = memory.search("restaurants", user_id="alice"). The memory layer sits outside the agent loop — memories are retrieved before LLM calls and injected into the system prompt, new memories are extracted after.
Key Design Decisions
- Facts as unit of storage — not raw messages, not summaries, but extracted declarative facts. This is opinionated: the LLM decides what matters.
- LLM-mediated CRUD — the update step uses the LLM to decide ADD/UPDATE/DELETE, not just append. This prevents memory bloat but introduces LLM judgment into storage.
- Vector-first, graph-optional — the primary store is vector (for semantic search); graph is an add-on for relationship tracking, not the core.
- Scoped by user/agent/run — memories are namespaced by who they belong to. This is oriented toward personalization use cases.
- Stateless integration — Mem0 doesn't manage the agent loop or conversation flow. It's a memory API you call from your agent.
- Massive backend flexibility — 20+ vector stores, multiple graph stores. Integration-heavy, aimed at enterprise deployment.
Limitations
- Fact extraction prompt is tuned for personal assistant use cases (preferences, details, plans). May not capture domain-specific knowledge well.
- The LLM-mediated update step adds latency and cost to every add operation (two LLM calls: extract + reconcile).
- No temporal model — memories don't have validity intervals or decay by default (despite "dynamic forgetting" claims in marketing).
- Graph store is bolted on, not deeply integrated with the vector store.