Cludebot
Type: ../types/agent-memory-system-review.md · Status: current · Tags: trace-derived
Cludebot is the repository behind Clude, an agent memory SDK, hosted service, MCP server, dashboard, and autonomous X/Twitter bot built by sebbsssss. The public package presents "cognitive memory" rather than plain storage: agents store typed memories, recall them through hybrid retrieval, and can run dream cycles that consolidate episodes into semantic, procedural, and self-model memories. The code supports several deployment surfaces: hosted Cortex API, self-hosted Supabase/Postgres, default local SQLite MCP storage, and a legacy JSON-file local mode.
Repository: https://github.com/sebbsssss/cludebot
Reviewed revision: 28672beebc1f0e72f9b9fda685bb97f85d0ee422
Core Ideas
Typed memory rows are the central retained artifact. The Supabase schema stores episodic, semantic, procedural, self_model, and introspective memories with content, summary, tags, concepts, emotional valence, importance, access count, source fields, owner scoping, decay, evidence IDs, optional encryption metadata, embeddings, compaction markers, and tokenization/on-chain attestation columns (schema). The SDK API exposes the same type vocabulary through Cortex.store, Cortex.recall, summaries, hydration, stats, recent memories, self-model, links, and dream hooks (Cortex SDK). These rows are knowledge artifacts when they supply evidence or context, but procedural rows become system-definition artifacts when formatMemoryContext emits them as instructions the agent "MUST follow" (memory.ts).
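The authority shift can be sketched in a few lines. This is a minimal illustration, not the repo's `formatMemoryContext`: the `MemoryRow` shape, the `formatContextSketch` name, and the heading text are assumptions made for the example; only the type vocabulary and the "MUST follow" wording come from the reviewed code.

```typescript
// Hypothetical, simplified row shape; the real schema has many more columns.
type MemoryType = "episodic" | "semantic" | "procedural" | "self_model" | "introspective";

interface MemoryRow {
  id: number;
  type: MemoryType;
  summary: string;
}

// Illustrative formatter: procedural rows are emitted as binding instructions,
// every other type is emitted as advisory context.
function formatContextSketch(rows: MemoryRow[]): string {
  const procedural = rows.filter((r) => r.type === "procedural");
  const advisory = rows.filter((r) => r.type !== "procedural");
  const lines: string[] = [];
  if (procedural.length > 0) {
    lines.push("Learned Strategies (you MUST follow these):");
    for (const r of procedural) lines.push(`- ${r.summary}`);
  }
  if (advisory.length > 0) {
    lines.push("Context (evidence, not instructions):");
    for (const r of advisory) lines.push(`- [${r.type}] ${r.summary}`);
  }
  return lines.join("\n");
}

const contextText = formatContextSketch([
  { id: 1, type: "episodic", summary: "User asked about rate limits" },
  { id: 2, type: "procedural", summary: "Quote the documented limit before speculating" },
]);
console.log(contextText);
```

The same row content would read as advice or as a rule depending only on its `type`, which is the point the paragraph above makes.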
Decay and reinforcement are explicit, but not uniform across modes. In the Supabase path, DECAY_RATES preserve semantic/procedural/self-model memories longer than episodic memories, recall updates access counts, boosts importance with source-aware reinforcement, and co-retrieval strengthens memory links (constants, memory.ts). The newer SQLite store has its own decay engine and queue, while the --local JSON store is simpler: keyword scoring, access-count updates, and a small importance boost on recall, without the graph/dream machinery (SQLite store, dream engine, JSON local store).
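A rough sketch of type-dependent decay plus recall reinforcement, under stated assumptions: the per-day multipliers, the exponential form, and the `boost` size are placeholders chosen for illustration, and `SKETCH_DECAY_RATES` is a hypothetical stand-in for the repo's `DECAY_RATES`, whose actual values and update rule are not reproduced here.

```typescript
type MemoryType = "episodic" | "semantic" | "procedural" | "self_model";

// Placeholder per-day retention multipliers (NOT the repo's values):
// closer to 1 means the type fades more slowly.
const SKETCH_DECAY_RATES: Record<MemoryType, number> = {
  episodic: 0.93,
  semantic: 0.98,
  procedural: 0.97,
  self_model: 0.99,
};

interface DecayingMemory {
  type: MemoryType;
  importance: number;  // 0..1
  decayFactor: number; // 0..1, multiplied into recall scores
  accessCount: number;
}

// Apply `days` of exponential decay according to the memory's type.
function applyDecay(m: DecayingMemory, days: number): DecayingMemory {
  const rate = SKETCH_DECAY_RATES[m.type];
  return { ...m, decayFactor: m.decayFactor * Math.pow(rate, days) };
}

// Reinforce on recall: bump the access count and nudge importance, capped at 1.
function reinforceOnRecall(m: DecayingMemory, boost = 0.02): DecayingMemory {
  return {
    ...m,
    accessCount: m.accessCount + 1,
    importance: Math.min(1, m.importance + boost),
  };
}

const agedEpisode = applyDecay({ type: "episodic", importance: 0.5, decayFactor: 1, accessCount: 0 }, 30);
const agedFact = applyDecay({ type: "semantic", importance: 0.5, decayFactor: 1, accessCount: 0 }, 30);
const reinforced = reinforceOnRecall({ type: "semantic", importance: 0.99, decayFactor: 1, accessCount: 0 });
console.log(agedEpisode.decayFactor < agedFact.decayFactor); // episodic fades faster
```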
Recall is a layered retrieval pipeline, not just vector search. The Supabase recallMemories path expands queries, performs vector search with pgvector RPCs, fetches metadata candidates, optionally adds BM25 full-text hits, always admits curated knowledge seeds, computes a weighted score over recency, keyword relevance, importance, vector similarity, and decay, then widens through entity mentions, entity co-occurrences, typed memory links, and type-diversity repair (memory.ts, graph.ts). findClinamen is a separate lateral-retrieval path: high-importance, low-relevance memories are returned as creative anomalies rather than as direct answers (clinamen.ts).
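The scoring step can be illustrated with a small ranking function. This is a sketch under assumptions: the weights are invented, the signals are taken as already normalized to 0..1, and treating decay as a multiplier on the blended score (rather than as one more summed term) is a guess at one reasonable arrangement, not a description of `recallMemories`.

```typescript
interface Candidate {
  id: number;
  recency: number;     // 0..1, 1 = just created
  keyword: number;     // 0..1 keyword or full-text relevance
  importance: number;  // 0..1
  vectorSim: number;   // 0..1 embedding similarity
  decayFactor: number; // 0..1
}

// Placeholder weights, chosen to sum to 1 for readability.
const WEIGHTS = { recency: 0.15, keyword: 0.25, importance: 0.2, vectorSim: 0.4 };

function scoreCandidate(c: Candidate): number {
  const blended =
    WEIGHTS.recency * c.recency +
    WEIGHTS.keyword * c.keyword +
    WEIGHTS.importance * c.importance +
    WEIGHTS.vectorSim * c.vectorSim;
  // Assumption: decay scales the blended score.
  return blended * c.decayFactor;
}

// Rank a copy of the candidates so the caller's array is left untouched.
function rankCandidates(cs: Candidate[], limit: number): Candidate[] {
  return cs
    .slice()
    .sort((a, b) => scoreCandidate(b) - scoreCandidate(a))
    .slice(0, limit);
}

const ranked = rankCandidates(
  [
    { id: 1, recency: 0.9, keyword: 0.1, importance: 0.3, vectorSim: 0.2, decayFactor: 1 },
    { id: 2, recency: 0.2, keyword: 0.8, importance: 0.6, vectorSim: 0.9, decayFactor: 0.9 },
    { id: 3, recency: 0.2, keyword: 0.9, importance: 0.6, vectorSim: 0.95, decayFactor: 0.3 },
  ],
  2
);
console.log(ranked.map((c) => c.id));
```

In this toy data, candidate 3 is the best semantic match but loses to both others because its decay factor is low, which shows why decay belongs in the score at all.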
Dream cycles generate new behavior-shaping memories from traces. The full dream cycle accumulates episodic importance, then runs consolidation, compaction, reflection, contradiction resolution, action-learning, optional JEPA deep connections, and emergence (dream cycle). Consolidation turns recent episodes into semantic insights and procedural rules with evidence IDs and support links; compaction summarizes older low-importance episodes and marks originals compacted; reflection writes self-model memories; contradiction resolution writes semantic resolutions and decays weaker beliefs. Hosted agents get a lighter scheduled worker that periodically turns episodic memories into semantic insights and procedural rules per owner wallet, skipping the heavier full-cycle phases (hosted dreams).
Action-outcome learning separates raw events from procedural promotion. logAction stores an action as an episodic memory tagged action and awaiting_outcome; logOutcome stores a separate episodic outcome and updates the original action's tags and importance; refineStrategies groups recent action memories by feature, checks positive/negative outcome rates, and stores procedural [LEARNED STRATEGY] memories when a feature has enough signal (action-learning.ts). That is the cleanest trace-derived split in the repo: raw action/outcome rows remain evidence; generated procedural memories carry stronger behavioral authority during later prompt assembly.
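The promotion step can be sketched as a grouping pass over resolved actions. The record shapes, `minSamples`, `threshold`, and the rule wording are hypothetical; only the idea of grouping by feature, checking positive/negative rates, and emitting `[LEARNED STRATEGY]` text comes from the reviewed `refineStrategies`.

```typescript
type Outcome = "positive" | "neutral" | "negative";

interface ActionRecord {
  feature: string;         // which behavior produced the action
  outcome: Outcome | null; // null while the action is still awaiting an outcome
}

interface LearnedStrategy {
  feature: string;
  rule: string;
  positiveRate: number;
  sampleSize: number;
}

// Group resolved actions by feature and promote a rule only with enough signal.
function refineStrategiesSketch(
  actions: ActionRecord[],
  minSamples = 3,
  threshold = 0.6
): LearnedStrategy[] {
  const tallies: Record<string, { pos: number; neg: number; total: number }> = {};
  for (const a of actions) {
    if (a.outcome === null) continue; // unresolved actions are not evidence yet
    const t = tallies[a.feature] || (tallies[a.feature] = { pos: 0, neg: 0, total: 0 });
    t.total += 1;
    if (a.outcome === "positive") t.pos += 1;
    if (a.outcome === "negative") t.neg += 1;
  }
  const strategies: LearnedStrategy[] = [];
  for (const feature of Object.keys(tallies)) {
    const t = tallies[feature];
    if (t.total < minSamples) continue; // too little signal to promote
    const positiveRate = t.pos / t.total;
    const negativeRate = t.neg / t.total;
    let verb: string | null = null;
    if (positiveRate >= threshold) verb = "Prefer";
    else if (negativeRate >= threshold) verb = "Avoid";
    if (verb === null) continue; // mixed results: keep the raw rows, promote nothing
    strategies.push({
      feature,
      rule: `[LEARNED STRATEGY] ${verb} "${feature}" (${t.pos} positive, ${t.neg} negative of ${t.total})`,
      positiveRate,
      sampleSize: t.total,
    });
  }
  return strategies;
}

const learned = refineStrategiesSketch([
  { feature: "thread_reply", outcome: "positive" },
  { feature: "thread_reply", outcome: "positive" },
  { feature: "thread_reply", outcome: "positive" },
  { feature: "thread_reply", outcome: "negative" },
  { feature: "meme_post", outcome: "negative" },
  { feature: "meme_post", outcome: "negative" },
  { feature: "meme_post", outcome: "negative" },
  { feature: "quote_post", outcome: "positive" },
  { feature: "quote_post", outcome: null },
]);
console.log(learned.map((s) => s.rule));
```

The input rows are never modified or discarded, so the evidence for each promoted rule stays inspectable after promotion.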
The graph layer mixes explicit links, entities, and optional learned latent links. Memory-to-memory links are typed (supports, contradicts, elaborates, causes, temporal relations, and others), auto-created from evidence IDs, vector similarity, concept overlap, user overlap, and contradiction heuristics, and strengthened on co-retrieval (memory.ts, schema). Entity extraction creates entity, mention, relation, and co-occurrence surfaces for graph-aware recall (graph.ts). The JEPA "deep connection" phase is optional and environment-gated: it calls an external predictor for relation-specific latent embeddings, filters out targets already findable by ordinary recall or existing links, and writes new memory links plus a dream log (deep connection, JEPA client).
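Co-retrieval strengthening is simple enough to sketch directly. The `MemoryLink` shape, the `increment`, and the cap at 1 are assumptions for illustration; the repo's actual update rule and column names may differ.

```typescript
interface MemoryLink {
  sourceId: number;
  targetId: number;
  linkType: string; // e.g. "supports", "contradicts", "elaborates"
  strength: number; // 0..1
}

// Strengthen every existing link whose two endpoints were returned by the same recall.
// Links with only one endpoint recalled are returned unchanged.
function strengthenCoRetrieved(
  links: MemoryLink[],
  recalledIds: number[],
  increment = 0.05
): MemoryLink[] {
  return links.map((l) => {
    const bothRecalled =
      recalledIds.indexOf(l.sourceId) !== -1 && recalledIds.indexOf(l.targetId) !== -1;
    return bothRecalled ? { ...l, strength: Math.min(1, l.strength + increment) } : l;
  });
}

const updatedLinks = strengthenCoRetrieved(
  [
    { sourceId: 1, targetId: 2, linkType: "supports", strength: 0.5 },
    { sourceId: 2, targetId: 3, linkType: "elaborates", strength: 0.98 },
    { sourceId: 1, targetId: 4, linkType: "contradicts", strength: 0.4 },
  ],
  [1, 2, 3]
);
console.log(updatedLinks.map((l) => l.strength));
```

Note that the update carries no record of why the strength changed, which is one way an edge's provenance becomes hard to read later.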
The integration surface is broad but uneven. The stdio MCP server supports default SQLite, JSON local, hosted HTTP, and self-hosted Supabase modes, and exposes recall, store, stats, clinamen, delete, update, list, batch store, prompt instructions, and skill extraction where supported (MCP server). The hosted Streamable HTTP connector in the Express app is narrower: recall, store, and stats, authenticated by bearer API key and owner-scoped per request (remote MCP route). The README says MCP agents get four tools including find_clinamen, but the code only matches that claim for the stdio server, not for the hosted HTTP connector (README, remote MCP route).
Experimental confidence and reranking are implemented as opt-in wrappers. enhancedRecallMemories can wrap standard recall with IRCoT, cross-encoder/Voyage reranking, low-confidence filtering, and evidence sufficiency scoring, but all of these are behind feature flags or explicit wrapper use; the public Cortex routes and core SDK recall path call the standard retrieval function directly (enhanced recall, config, Cortex routes). Confidence gating is computation over retrieved memories, not learned state; reranking is a request-time ranking artifact, not a durable memory artifact (confidence gate, reranker).
Comparison with Our System
Cludebot is almost the inverse of commonplace. Commonplace treats a filesystem knowledge base as the source of truth and uses validation, typed notes, links, indexes, and review procedures to keep knowledge inspectable. Cludebot treats memory as an online service substrate: rows, embeddings, graph edges, dream logs, API routes, MCP tools, dashboard views, memory packs, and optional Solana attestations.
| Dimension | Cludebot | Commonplace |
|---|---|---|
| Main substrate | Supabase/Postgres, SQLite, JSON local store, hosted API | Git-tracked markdown files |
| Primary unit | Typed memory row with summary/content/metadata | Typed note or reference artifact |
| Retrieval | Vector, BM25, keyword, tags, importance, decay, entities, graph, clinamen | rg, frontmatter, authored links, indexes, validation reports |
| Distillation | Dream cycles generate semantic, procedural, self-model, compaction, contradiction-resolution memories | Human/agent curation into notes, instructions, ADRs, indexes |
| Authority path | MCP/SDK recall injects rows into prompts; procedural memories can be instructions | Agents read note content; instructions and commands carry stronger authority |
| Governance | Owner scoping, decay, evidence IDs, links, on-chain hashes, optional confidence gates | Type contracts, validation, review bundles, git history |
| Lifecycle | Decay, compaction markers, contradiction resolution, delete/update/list tools | Status fields, replacement archives, explicit review/validation workflows |
| Portability | Memory packs, smart export, local modes | Repository portability and plain-text inspectability |
Cludebot is stronger where the agent can continuously capture traces and act through an MCP tool surface. It records experiences as they happen, can reinforce memories by access, can run scheduled consolidation, and can make procedural rules immediately active in later prompt context. It also has a more diverse retrieval stack than commonplace currently does: vectors, SQL full-text search, entity graph expansion, typed link traversal, and anomaly retrieval.
Commonplace is stronger where memory must remain auditable, composed, and maintainable by humans and agents over time. Cludebot preserves metadata and evidence IDs, but most durable knowledge remains free-text rows. Its semantic/procedural outputs are generated memories, not typed library artifacts with section contracts, backlinks, status transitions, or review gates. The system has lineage fields, but not a robust regeneration or invalidation contract for a generated insight when its source episodes are edited, imported, compacted, or contradicted.
The most important design divergence is behavioral authority. In commonplace, an instruction file, a type spec, a validator, and a review note have different authority channels. In Cludebot, those distinctions collapse into memory type plus prompt formatting: episodic and semantic memories advise; procedural memories become instructions when formatted; graph edges and embeddings rank; MCP tool descriptions instruct the consuming agent when to call tools. The mechanism is pragmatic, but the authority boundary is softer.
Read-back: pull — the agent or host calls recall/MCP tools; no proactive injection path is described.
Borrowable Ideas
Treat memory type as a decay and authority parameter, not just metadata. Ready as a framing. Cludebot uses type to shape decay, recall diversity, display grouping, and prompt wording. Commonplace already has type specs; we could be more explicit about which types are merely evidential and which are instruction-bearing.
Keep raw traces and distilled behavior separate. Ready now. The action/outcome/procedural chain is a clear pattern: raw action rows and outcome rows are retained as evidence, then a generated procedural rule becomes the behavior-shaping artifact. Commonplace should preserve that split whenever trace-derived notes or instructions are produced from review logs, validation histories, or agent work traces.
Use co-retrieval reinforcement as a weak signal, not a source of truth. Needs a use case first. Cludebot strengthens links between memories that are retrieved together. A KB analogue would be tracking notes co-opened by agents and using that as a recommendation signal, while leaving authored links and type contracts as the source of truth.
Expose anomaly retrieval as a named operation. Ready as a concept, not implementation. findClinamen is useful because it does not pretend to answer the query; it explicitly asks for high-importance material outside the current relevance basin. Commonplace could support a similar "lateral search" report over tags, links, and descriptions before adding embeddings.
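A no-embeddings version of that lateral search could look like the sketch below. It is not `findClinamen`: relevance is approximated by tag overlap, and the `NoteLike` shape, the thresholds, and the function names are all hypothetical.

```typescript
interface NoteLike {
  id: number;
  importance: number; // 0..1
  tags: string[];
}

// Fraction of query tags that the note shares; 0 when the query has no tags.
function tagRelevance(queryTags: string[], tags: string[]): number {
  if (queryTags.length === 0) return 0;
  const hits = queryTags.filter((t) => tags.indexOf(t) !== -1).length;
  return hits / queryTags.length;
}

// Return important notes that sit outside the query's relevance basin.
function lateralSearchSketch(
  notes: NoteLike[],
  queryTags: string[],
  minImportance = 0.7,
  maxRelevance = 0.34,
  limit = 3
): NoteLike[] {
  return notes
    .filter(
      (n) => n.importance >= minImportance && tagRelevance(queryTags, n.tags) <= maxRelevance
    )
    .sort((a, b) => b.importance - a.importance)
    .slice(0, limit);
}

const lateral = lateralSearchSketch(
  [
    { id: 1, importance: 0.9, tags: ["memory", "decay", "recall"] }, // too relevant
    { id: 2, importance: 0.85, tags: ["governance"] },
    { id: 3, importance: 0.2, tags: ["cooking"] },                   // not important enough
    { id: 4, importance: 0.75, tags: ["recall", "authority"] },
  ],
  ["memory", "decay", "recall"]
);
console.log(lateral.map((n) => n.id));
```

Keeping this as a separately named operation means its results are never mistaken for direct answers to the query.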
Generate prompt-ready context from memory, but mark the authority. Ready as a design warning. Cludebot's formatMemoryContext makes procedural memories binding in prompt text. If commonplace ever compiles notes into assistant instructions, the compiled view should keep provenance and authority labels visible so advice is not silently upgraded into instruction.
Owner-scoped hosted memory is an adoption advantage. Not directly borrowable for this repo. Cludebot's hosted mode and remote MCP connector are practical distribution features: users get memory without running infrastructure. Commonplace's local/git substrate is better for inspectability, but worse for zero-setup adoption.
Trace-derived learning placement
Trace source. Cludebot consumes several trace streams: MCP/API/tool calls storing user or agent memories; X/Twitter interactions stored as episodic memories; raw action rows from logAction; outcome rows from logOutcome and social engagement tracking; recent episodic rows used by dream cycles; and imported MemoryPack rows. Trigger boundaries vary: per memory store for raw capture and importance accumulation, every six hours for hosted dreams, explicit clude dream or worker startup for full dream cycles, and delayed outcome tracking for social actions.
Extraction. Extraction is mostly LLM-driven but has different oracles. Consolidation asks focal-point questions over recent episodes and stores semantic insights with parsed evidence IDs. Procedural extraction and refineStrategies derive behavioral rules from episodes or action/outcome rates. Contradiction resolution uses existing contradicts links and an LLM to synthesize a semantic resolution, then weakens one source memory through decay. Hosted dreams use OpenRouter to turn recent episodes into factual insights and actionable rules. The strongest non-LLM oracle is action outcome measurement: X engagement metrics become positive/neutral/negative outcome memories, and outcome rates decide whether a procedural strategy is worth storing.
Storage substrate. Raw and distilled retained state primarily lives in the memories table, memory_links, entities, entity_mentions, dream_logs, vector columns, and related RPC-backed graph/search indexes in Supabase. Default MCP local mode uses ~/.clude/brain.db SQLite plus optional sqlite-vec embeddings and a dream queue. --local mode uses ~/.clude/memories.json. Memory packs and smart exports are portable JSON/Markdown/context documents; Solana memo/registry writes are attestations over memory content, not the operational memory store.
Representational form. Raw episodes, actions, outcomes, semantic memories, procedural memories, self-model memories, dream logs, memory packs, and smart exports are prose or prose-plus-JSON. Memory types, tags, concepts, evidence IDs, link types, owner scopes, schemas, RPCs, and MCP tool definitions are symbolic. Embeddings, vector indexes, reranker scores, confidence scores, and JEPA-predicted embeddings are distributed-parametric or numerical ranking artifacts. The operative parts are mixed: a row's content may be prose knowledge, its type and tags route it symbolically, and its embedding affects activation.
Lineage. Cludebot keeps partial lineage through source, source_id, evidence_ids, dream_logs.input_memory_ids, dream_logs.new_memories_created, memory links, compaction markers, and content hashes. This is enough to explain many generated memories after the fact. It is not yet a regeneration contract: generated semantic/procedural rows do not become derived views automatically invalidated when source memories decay, are updated, are deleted, or are superseded by contradiction resolution.
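To make the missing contract concrete, here is a sketch of the kind of staleness check that the existing lineage fields would allow. It is entirely hypothetical and does not exist in the repo: the shapes, the timestamp comparison, and the `findStaleDerived` name are assumptions, and a real design would also need to decide what "stale" triggers (re-derivation, demotion, or review).

```typescript
interface SourceState {
  id: number;
  updatedAt: number; // epoch milliseconds of the last edit
  deleted: boolean;
}

interface DerivedRow {
  id: number;
  evidenceIds: number[]; // source memories this row was generated from
  derivedAt: number;     // epoch milliseconds when the row was generated
}

// Flag derived rows whose evidence is missing, deleted, or edited after derivation.
function findStaleDerived(derived: DerivedRow[], sources: SourceState[]): number[] {
  const byId: Record<number, SourceState | undefined> = {};
  for (const s of sources) byId[s.id] = s;
  return derived
    .filter((d) =>
      d.evidenceIds.some((eid) => {
        const s = byId[eid];
        return s === undefined || s.deleted || s.updatedAt > d.derivedAt;
      })
    )
    .map((d) => d.id);
}

const staleIds = findStaleDerived(
  [
    { id: 10, evidenceIds: [1], derivedAt: 200 },    // source unchanged: fresh
    { id: 11, evidenceIds: [1, 2], derivedAt: 200 }, // source 2 edited later
    { id: 12, evidenceIds: [3], derivedAt: 200 },    // source 3 deleted
    { id: 13, evidenceIds: [99], derivedAt: 200 },   // source 99 missing
  ],
  [
    { id: 1, updatedAt: 100, deleted: false },
    { id: 2, updatedAt: 500, deleted: false },
    { id: 3, updatedAt: 100, deleted: true },
  ]
);
console.log(staleIds);
```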
Behavioral authority. Raw episodic/action/outcome rows are knowledge artifacts: they are evidence and recall context. Generated semantic memories are also knowledge artifacts unless injected as facts in prompt context. Procedural memories become system-definition artifacts when formatted as "Learned Strategies" with a mandatory instruction in formatMemoryContext. Graph links, embeddings, BM25 ranks, reranker scores, confidence gates, and JEPA links are ranking or routing artifacts. MCP tool definitions are system-definition artifacts for the consuming agent because they define when and how the agent may write, recall, edit, list, or delete memory.
Scope and timing. Scope is per owner wallet, API key, local database, or local JSON file; hosted dreams iterate over active agents. Timing is online and staged: raw traces are captured during interaction, dream cycles run later, and procedural memories shape future prompt assembly. The optional experimental rerank/confidence layer is request-time only and does not learn durable state.
On the survey's axes, Cludebot sits in the live-service trace stream branch on axis 1 and the mixed-artifact branch on axis 2. It strengthens the survey's distinction between raw trace memory and distilled behavior-shaping memory: the same memories table stores both, but authority changes when a row is transformed into procedural prompt instruction. It also splits the "artifact learning" bucket because it combines prose memories, symbolic graph/schema/tool artifacts, and numerical activation artifacts in one production-oriented service.
Curiosity Pass
The README's "local-first SQLite" and MCP tool claims need mode qualifiers. The default stdio MCP server really does use SQLite when no hosted key or Supabase URL is configured, and it exposes many tools. The hosted HTTP connector is narrower, and the --local JSON store is much simpler than the SQLite path. A reader who treats "local", "hosted", and "self-hosted" as equivalent will overestimate feature parity.
The strongest implemented learning loop is not the flashiest one. Dream cycles, JEPA, clinamen, and emergence are distinctive, but action-outcome learning has the cleanest behavioral contract: action trace, outcome trace, measured sentiment, learned strategy. That is the part most worth comparing to trace-derived learning systems.
Generated procedural memories are powerful and risky. Once a procedural row reaches formatMemoryContext, it is no longer just retrieved evidence; it becomes a behavioral rule. The system has evidence IDs and source fields, but no review gate between "LLM extracted a pattern" and "future agents MUST follow it."
Graph edges have mixed provenance. Some links are explicit evidence links from generated memories to source memories; some are heuristic auto-links; some are co-retrieval strengthened; some can come from JEPA predictions. They share the same retrieval traversal channel, so consumers may not know whether a graph boost came from evidence, heuristic similarity, or learned latent prediction.
On-chain anchoring is audit metadata, not memory authority. The content hash and Solana signature can help verify that a memory row existed in a given form, but the recall behavior still comes from database rows, embeddings, decay, links, and prompt formatting.
What to Watch
- Whether hosted and stdio MCP tool surfaces converge, especially around find_clinamen, delete/update/list, and skill extraction.
- Whether procedural memories get a promotion gate, confidence threshold, or human review path before becoming prompt-level instructions.
- Whether source lineage matures into invalidation/regeneration for generated semantic, procedural, compaction, and contradiction-resolution rows.
- Whether JEPA deep connections become a shipped service dependency or remain an environment-gated experimental phase.
- Whether local SQLite dream queues gain feature parity with the Supabase dream cycle, or stay a simpler local memory substrate.
- Whether MemoryPacks become a genuine cross-agent artifact contract rather than an export/import convenience.
Relevant Notes:
- trace-derived learning techniques in related systems — extends: Cludebot is a live-service trace-derived system where raw rows, distilled procedural memories, graph/ranking artifacts, and MCP tools carry different authority
- retained artifact — defined-in: Cludebot bundles many retained states under "memory", including rows, embeddings, graph links, dream logs, prompt text, and exported packs
- behavioral authority — sharpens: procedural memories, MCP tool definitions, graph/ranking artifacts, and knowledge rows affect behavior through different channels
- distillation — exemplifies: dream cycles compress recent traces into semantic, procedural, and self-model memories
- codification — contrasts: Cludebot mostly promotes traces into prose and symbolic routing artifacts, not executable code
- axes-of-artifact-analysis — exemplifies: the system separates storage substrate, representational form, lineage, and authority only partially, making it a useful mixed-artifact case