REM

Type: ../types/agent-memory-system-review.md · Status: current · Tags: trace-derived

REM, from satyammistari, is a service-oriented memory layer for AI agents. The inspected repository implements a Go API, Python worker, Python SDK, LangChain and AutoGen integrations, and a multi-store backend around Postgres, Qdrant, Neo4j, and Redis. Its strongest implemented path is episodic write and retrieval with prompt injection; its semantic consolidation path exists in worker and schema code, but is not fully connected through the Go API routes at the reviewed commit.

Repository: https://github.com/satyammistari/REM

Reviewed commit: 935e8be0a1fca5b23dbabfe16c48b562c1cd24cc

Last checked: 2026-05-16

Core Ideas

Episodes are the primary retained trace. The public SDK writes an episode after a task or conversation turn, and the Go API persists it only after validating agent_id, user_id, and content length (sdk/python/rem_memory/client.py, go-api/internal/api/handlers/episodes.go). The canonical episode record stores raw content plus parsed entities, intent, outcome, domain, emotion signal, importance score, retrieval counters, consolidation flags, source session/team fields, and timestamps in Postgres (go-api/internal/domain/episode.go, go-api/internal/db/postgres/migrations/001_initial.sql). In artifact terms, raw episodes are trace-derived knowledge artifacts: they preserve evidence about prior agent/user interactions and become behavior-shaping only when retrieval or consolidation activates them.

Write fan-out turns one episode into several indexes. WriteEpisode parses the raw content through the Python worker, embeds the raw content with Redis caching, inserts the episode row in Postgres, upserts an episode vector in Qdrant, creates a Neo4j episode node, links it to the previous episode with FOLLOWED_BY, and then pushes a consolidation job payload to Redis (go-api/internal/services/write_service.go, python-worker/services/parse_service.py, go-api/internal/db/qdrant/client.go, go-api/internal/db/neo4j/client.go, go-api/internal/db/redis/client.go). The storage substrate is deliberately polyglot: Postgres is the canonical row store, Qdrant is distributed-parametric retrieval state, Neo4j is a symbolic temporal/semantic graph index, and Redis is cache plus job transport.

Retrieval is vector-first with graph expansion and prompt assembly. Retrieve embeds the query, searches Qdrant for episode vectors and optionally semantic-memory vectors, loads episode rows from Postgres, expands temporal neighbors from Neo4j for the top episode hits, reranks by vector score, recency, and retrieval count, increments retrieval counts asynchronously, and formats the result into an === RELEVANT MEMORY CONTEXT === prompt block (go-api/internal/services/retrieve_service.go, go-api/internal/api/handlers/retrieve.go). Episodes and semantic facts are knowledge artifacts while stored or returned as evidence; the assembled injection prompt gains temporary system-definition-artifact authority because the SDK and integrations tell callers to place it into the next model call.

Semantic memory is implemented as extracted prose facts with source IDs, but the end-to-end route is incomplete. The schema includes semantic_memories with fact, confidence, evidence_count, source_episode_ids, domain, fact_type, active, contradicted_by, and superseded; the Python consolidation engine groups unconsolidated episodes by domain and intent keyword overlap, asks an LLM for durable facts, embeds each fact, and attempts to store it through the Go API (go-api/internal/db/postgres/migrations/001_initial.sql, python-worker/services/consolidation/engine.py, python-worker/workers/tasks.py). However, the Go router registers only GET /api/v1/semantic, not the POST /api/v1/semantic endpoint the Celery task calls, and it also lacks the /api/v1/episodes/:id/consolidated patch endpoint the task calls to mark source episodes (go-api/internal/api/router.go, go-api/internal/api/handlers/semantic.go, python-worker/workers/tasks.py). There is a standalone Python /consolidate endpoint and demo script path, but the production write-to-semantic loop should be read as partial at this revision (python-worker/api/handlers/consolidate.py, scripts/seed_demo.py).

Lifecycle metadata exists, but governance is not wired into policy. Episodes have consolidated, consolidation_candidate, consolidated_into, retrieval_count, and last_retrieved_at; semantic memories have active, contradicted_by, superseded, confidence, and source_episode_ids (go-api/internal/db/postgres/migrations/001_initial.sql, go-api/internal/db/postgres/episodes.go, go-api/internal/db/postgres/semantic.go). These are good handles for lineage, activation, contradiction, and retirement, but the inspected code does not implement redaction, contradiction resolution, supersession workflows, confidence review, or a policy that decides when a fact should stop influencing prompts.

Integrations make activation the product surface. The Python SDK exposes write, retrieve, agent management, and semantic-memory listing; LangChain's REMMemory loads retrieved prompt text into relevant_memories before an LLM call and writes a combined human/assistant episode after the response; AutoGen's REMMemoryStore can build a system-message prefix and persist conversation turns (sdk/python/rem_memory/client.py, sdk/python/rem_memory/integrations/langchain.py, sdk/python/rem_memory/integrations/autogen.py). REM is therefore more than storage: it gives agents an explicit capture-before/after and retrieval-before path.

Comparison with Our System

Dimension REM Commonplace
Primary purpose Service memory for AI agents and agent frameworks Agent-operated methodology KB with durable notes, instructions, reviews, ADRs, and validation
Storage substrate Postgres rows, Qdrant vectors, Neo4j graph nodes/edges, Redis cache/jobs, API objects Git-tracked Markdown, schemas, sources, generated indexes, review outputs, scripts
Representational form Raw trace prose, parsed symbolic fields, LLM-extracted prose facts, vector embeddings, graph edges, prompt text Typed prose and frontmatter, symbolic schemas/commands/links, generated indexes, validation code
Lineage Semantic facts can carry source_episode_ids; episodes carry consolidation lifecycle columns Source-pinned reviews, authored citations, replacement archives, statuses, validation, and review gates
Activation SDK/framework retrieval before generation; prompt injection; vector ranking; graph expansion Search, indexes, authored links, skills, instructions, validation and review workflows
Behavioral authority Retrieved rows advise; embeddings rank; injection prompt conditions the next model call Artifact authority is separated across notes, instructions, skills, schemas, commands, and review gates

REM and commonplace share the premise that memory matters only when it changes later work. REM's strongest advantage is online activation: once an agent calls retrieve, prior traces and learned facts are assembled into prompt context without the agent manually searching a repository. Commonplace is stronger in inspectable durability: claims, instructions, and reviews live as source-controlled artifacts with explicit type contracts, citations, validation, replacement history, and human-readable diffs.

The sharpest difference is authority discipline. REM collapses storage, retrieval, ranking, and prompt assembly behind service calls. That is convenient for agent frameworks, but it makes the injected prompt the effective authority boundary. A retrieved episode might only be an advisory knowledge artifact in storage, yet it can become high-impact prompt context if it is always injected before generation. Commonplace makes stronger authority surfaces explicit: a note advises, a skill instructs, a schema validates, and an index routes.

The other difference is implementation maturity. REM's schema and worker design point toward trace-to-semantic learning, including source episode IDs and lifecycle fields. But the inspected API currently makes episodic write/retrieve more reliable than semantic consolidation: the worker's store and mark-consolidated calls target routes that are not registered in the Go router. Commonplace has less online memory automation, but its shipped validation and review flows are tighter around the artifacts it does claim to operate.

Read-back: both — clients can call retrieval directly, and framework integrations inject assembled memory before generation.

Trace-derived learning placement

Trace source. REM qualifies as trace-derived learning. The qualifying source traces are agent episodes: SDK calls after tasks, LangChain conversation input/output pairs, AutoGen turns, and direct /api/v1/episodes writes. Each episode stores raw content and parsed metadata, and the write path indexes it in Postgres, Qdrant, and Neo4j (sdk/python/rem_memory/integrations/langchain.py, sdk/python/rem_memory/integrations/autogen.py, go-api/internal/services/write_service.go).

Extraction. Extraction has two layers. The parse step asks gpt-4o-mini to extract intent, entities, domain, emotion signal, and importance score from each episode. The consolidation step groups at least three unconsolidated episodes by domain and intent keyword overlap, then asks gpt-4o to produce durable facts with confidence, fact type, domain, evidence count, and source episode IDs (python-worker/services/parse_service.py, python-worker/services/consolidation/engine.py). The oracle is the configured LLM plus simple clustering and thresholds, not human review or deterministic proof.

Storage substrate. Raw traces persist as Postgres episode rows, Qdrant episode vectors, Neo4j episode nodes and temporal edges, and Redis-cached embeddings. Distilled semantic memories are designed to persist as Postgres rows, Qdrant semantic vectors, and Neo4j semantic nodes linked from source episodes, although the end-to-end Go API route for creating those rows is missing at this commit (go-api/internal/db/postgres/migrations/001_initial.sql, go-api/internal/db/qdrant/client.go, go-api/internal/services/graph_service.go, go-api/internal/api/router.go).

Representational form. Raw episode content and semantic facts are prose. Episode metadata, lifecycle columns, API request/response types, graph edges, and route definitions are symbolic. Qdrant embeddings and vector search indexes are distributed-parametric retrieval state. The operative memory path is mixed: prose traces and facts are selected by symbolic filters and vector scores, expanded through graph edges, reranked by code, and converted back into prompt prose.

Lineage. REM has a useful lineage seed: semantic memories carry source_episode_ids, and source episodes can be marked consolidated with consolidated_into (go-api/internal/db/postgres/migrations/001_initial.sql, python-worker/workers/tasks.py). Lineage remains weak as governance because there is no prompt-version field, extraction-run ID, human review state, invalidation rule, contradiction workflow, or regeneration policy. The source pointer exists; the lifecycle around it is not yet enforced.

Behavioral authority. Raw episodes and semantic facts are knowledge artifacts when they are stored, listed, or returned as evidence. Qdrant scores, recency weighting, retrieval-count bonuses, and graph expansion have ranking authority because they select what becomes active. The generated injection prompt and AutoGen system-message prefix are system-definition artifacts for the next model call because they are inserted into instruction-bearing prompt channels (go-api/internal/services/retrieve_service.go, sdk/python/rem_memory/integrations/autogen.py).

Scope. The implemented scope is per user and per agent, with optional session and team fields on episodes. It is not a project repository, organization-wide KB, or governed cross-task library unless clients impose that discipline above REM.

Timing. Episode capture and retrieval happen online. Parse and embedding happen during writes. Consolidation is intended to run asynchronously after writes and periodically via Celery, but the inspected write path pushes to a Redis list while Celery uses Redis as a broker queue, and the worker's semantic-memory writeback calls missing Go routes (go-api/internal/db/redis/client.go, python-worker/workers/celery_app.py, python-worker/workers/tasks.py).

Survey placement. REM belongs on the trace-to-fact and trace-to-prompt-injection axes of the trace-derived survey. It strengthens the survey split between raw trace storage and behavior-changing activation: raw episodes become useful when parsed, embedded, clustered, consolidated into semantic facts, ranked, and injected into a prompt. It also demonstrates a failure mode worth tracking: schema and worker code can describe trace-derived learning before the API and job lifecycle make that learning operational.

Borrowable Ideas

Treat episode rows as canonical, indexes as derived views. Ready to borrow as artifact vocabulary. REM's design is clearest when Postgres rows are canonical trace records and Qdrant/Neo4j are derived activation indexes. Commonplace should keep the same distinction for any future vector or graph layer.

Use source episode IDs on semantic facts. Ready as a lineage requirement. Even weak source_episode_ids are better than free-floating learned facts; a commonplace version would add extraction-run metadata, review state, and invalidation rules.

Return an assembled prompt, not just search hits. Useful for agent integrations. REM's injection_prompt is pragmatic because framework callers need a behavior-ready surface. Commonplace should only do this for derived views with explicit source and authority metadata.

Keep lifecycle columns separate from policy. Worth borrowing cautiously. REM's active, superseded, contradicted_by, retrieval counters, and consolidation fields are useful handles, but the real design work is the policy that sets them and the consumer that respects them.

Do not borrow the poly-store stack without a strong activation need. Postgres plus Qdrant plus Neo4j plus Redis plus two runtimes increases operational surface area. Commonplace should add those substrates only when a specific workflow cannot be served by files, indexes, and scripts.

Takeaways

REM is a trace-derived memory service, but not all advertised loops are equally complete. Episode write/retrieve is concrete. Parse, embedding, vector search, temporal graph expansion, and prompt construction are wired. Semantic consolidation exists in schema and worker code, but the Go API route and job handoff are incomplete at the reviewed commit.

The useful artifact split is raw trace, parsed metadata, semantic fact, index, lifecycle state, and prompt view. Calling all of these "memory" hides the actual behavior path. Raw traces preserve evidence; parsed fields structure retrieval; semantic facts distill patterns; Qdrant and Neo4j rank and expand; lifecycle columns should govern eligibility; injection prompts condition the next model call.

Prompt injection is the main authority boundary. REM's memories are advisory while stored, but they become more forceful when the SDK or framework integration injects them into system or memory variables before generation.

Lineage is present but under-governed. source_episode_ids and consolidated_into are the right shape, but there is no review, contradiction, source invalidation, prompt-version, or retirement workflow around them.

The service orientation trades inspectability for activation. REM gives external agents an easy API and framework adapters. Commonplace gives maintainers inspectable source-controlled artifacts and validators. These are complementary, but they solve different failure modes.

Curiosity Pass

REM's most interesting design move is not the vector store. It is the attempt to put trace-derived facts, graph relationships, retrieval ranking, and prompt assembly behind one agent-facing API. That makes activation cheap for clients, but also concentrates authority in a service response that may be hard for an agent to audit.

The simpler version would be a pure Postgres/Qdrant episodic retriever. REM adds semantic facts and Neo4j lineage/temporal edges to get closer to cognitive "episodic plus semantic memory." The inspected code shows why that extra power needs integration discipline: each extra substrate needs a clear canonical source, writeback path, and lifecycle policy.

The repository contains several lifecycle fields that look more mature than the consuming logic around them. contradicted_by, superseded, active, and consolidation_candidate are good vocabulary, but the reviewed implementation does not yet prove how those states are assigned or enforced.

Open Questions

  • Will REM add the missing semantic-memory creation and episode-consolidation routes, or move consolidation storage fully behind the Python worker?
  • Should Redis consolidation jobs be sent through Celery task queues instead of a custom list that the Celery worker does not consume?
  • How will REM resolve contradictions when a later episode changes a user preference or invalidates a learned rule?
  • Should semantic memories carry extraction prompt version, model, run ID, confidence rationale, and source episode snippets?
  • How much does Neo4j graph expansion improve retrieval beyond vector search plus recency scoring?
  • Will the graph API return real nodes and edges? The inspected GetAgentGraphData currently returns an empty graph after querying.
  • Should prompt injection distinguish advisory memories from stronger instructions before they enter a system-message channel?

What to Watch

  • Whether semantic-memory writeback becomes a registered Go API path with Qdrant and Neo4j updates.
  • Whether consolidation gains source-linked review, contradiction handling, redaction, and supersession policy.
  • Whether the dashboard exposes lineage from learned facts back to source episodes, not just memory cards and graph views.
  • Whether retrieval evaluation measures downstream agent behavior rather than hit count or latency alone.
  • Whether the TypeScript SDK, managed service, and forgetting/redaction roadmap items change the authority and governance model (README.md).

Bottom Line

REM is best read as an online episodic-memory service with a partially wired trace-to-semantic layer. It captures agent episodes, parses and embeds them, stores canonical rows plus vector and graph indexes, retrieves relevant context, and returns prompt text that can condition the next model call. Commonplace should borrow REM's explicit episode/fact/index/prompt split and source-ID lineage seed, while retaining stronger artifact contracts, validation, and governance before any extracted memory gains durable instructional authority.

Relevant Notes: