Semiont

Type: ../types/agent-memory-system-review.md · Status: current

Semiont is The AI Alliance's TypeScript platform for human and AI collaboration over document corpora. It is relevant to agent memory less as a personal-memory product than as annotation infrastructure: source resources live in a project working tree, W3C Web Annotations capture passage-level interpretation, an event log records domain changes, and graph/vector/read models project those records for browsing, context assembly, entity resolution, and generation. Its strongest memory-system lesson is the separation between source documents, annotation events, derived retrieval projections, and runtime actor surfaces.

Repository: https://github.com/The-AI-Alliance/semiont

Reviewed commit: d62c2d2da236280f7cfd9e6678d631e0614b378b

Core Ideas

W3C annotations are the primary semantic unit. Semiont's protocol centers the Mark flow on W3C Web Annotations with motivations such as highlighting, commenting, tagging, assessing, and linking; annotations carry selectors, bodies, creator/generator attribution, and source anchoring rather than becoming free-floating facts (W3C annotation docs, Mark flow). The backend assembles annotations through a pure helper that validates selector shape and stamps the creator before persistence (annotation assembly, assembly handler). These annotations are knowledge artifacts when a human or agent consumes them as evidence, context, links, comments, or review targets.
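
The assembly step described above can be sketched as a pure function. This is an illustration only, not Semiont's code: the type names (`WebAnnotation`, `AnnotationDraft`, `TargetSelector`) and the function `assembleAnnotation` are hypothetical, the field names follow the W3C Web Annotation Data Model, and the rule that a position selector needs `start < end` is an assumption of this sketch.

```typescript
// Hypothetical types modeled on the W3C Web Annotation Data Model.
// Semiont's own types may differ.
type Motivation = "highlighting" | "commenting" | "tagging" | "assessing" | "linking";

interface TextQuoteSelector {
  type: "TextQuoteSelector";
  exact: string;
  prefix?: string;
  suffix?: string;
}

interface TextPositionSelector {
  type: "TextPositionSelector";
  start: number;
  end: number;
}

type TargetSelector = TextQuoteSelector | TextPositionSelector;

interface AnnotationDraft {
  motivation: Motivation;
  target: { source: string; selector: TargetSelector };
  body?: { type: "TextualBody"; value: string };
}

interface WebAnnotation extends AnnotationDraft {
  "@context": "http://www.w3.org/ns/anno.jsonld";
  type: "Annotation";
  id: string;
  creator: string;
  created: string;
}

// Reject selectors that could not anchor to a passage.
function validateSelector(selector: TargetSelector): void {
  if (selector.type === "TextQuoteSelector") {
    if (selector.exact.length === 0) {
      throw new Error("TextQuoteSelector.exact must be non-empty");
    }
    return;
  }
  const { start, end } = selector;
  // Assumption of this sketch: offsets are integers with 0 <= start < end.
  if (!Number.isInteger(start) || !Number.isInteger(end) || start < 0 || end <= start) {
    throw new Error("TextPositionSelector requires integer offsets with 0 <= start < end");
  }
}

// Pure: no I/O. The caller supplies the id, the creator, and the clock.
function assembleAnnotation(
  draft: AnnotationDraft,
  creator: string,
  id: string,
  now: Date,
): WebAnnotation {
  validateSelector(draft.target.selector);
  return {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    type: "Annotation",
    id,
    creator,
    created: now.toISOString(),
    ...draft,
  };
}
```

Keeping the helper pure means the same validation can run in a request handler, a worker, or a test without touching storage.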

The storage substrate is layered, not one database. SemiontProject defines durable project-root paths for .semiont/events/ and representations/, while XDG state paths hold projections, embeddings, jobs, logs, and runtime files (project paths). The current WorkingTreeStore treats file:// URIs as stable resource identifiers and writes, registers, moves, or removes files directly in the project tree, optionally staging those changes with git (working-tree store). EventStorage appends JSONL events under the per-resource event directory, sharded by resource id, and also stages event files when [git] sync = true (event storage). Source resources and persisted events are the system of record; materialized views, graph, vectors, embeddings, and job files are operational or derived state.
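
A minimal sketch of an append-only, per-resource JSONL log follows. It is not Semiont's EventStorage: the `DomainEvent` shape, the two-character sharding rule, the `events.jsonl` file name, and the function names are all assumptions made for illustration. Only the `.semiont/events/` root and the JSONL format come from the reviewed code. The log is modeled as a string so the example runs without filesystem access.

```typescript
// Hypothetical event shape. Semiont's real event schema is not reproduced here.
interface DomainEvent {
  resourceId: string;
  sequence: number;
  type: string;
  payload: Record<string, unknown>;
}

// Assumed sharding rule: bucket by the first two characters of the resource id.
function eventLogPath(resourceId: string): string {
  const shard = (resourceId + "__").slice(0, 2);
  return `.semiont/events/${shard}/${resourceId}/events.jsonl`;
}

// Appending never rewrites earlier lines: one JSON object per line.
function appendEvent(log: string, event: DomainEvent): string {
  return log + JSON.stringify(event) + "\n";
}

// Replay parses each non-empty line and returns events in sequence order.
function replayEvents(log: string): DomainEvent[] {
  return log
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as DomainEvent)
    .sort((a, b) => a.sequence - b.sequence);
}
```

Because each event is a single line, a git diff of the log shows exactly which events were added, which is what makes commit history usable as an integrity record.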

The EventBus is the shared write and coordination surface. The Make-Meaning service wires one EventBus into the job queue, event store, Stower, Gatherer, Matcher, Browser, CloneTokenManager, and bus command handlers (service wiring). Stower is the single write gateway: it converts yield:*, mark:*, frame:*, and durable job:* commands into domain events and file operations (Stower). The HTTP bus gateway validates channels against CHANNEL_SCHEMAS, injects the authenticated principal DID as _userId, supports resource-scoped SSE replay by sequence number, and emits the same bus events used by SDK, CLI, frontend, MCP, and worker processes (bus route, EventBus).
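
The gateway behaviour described above (check the channel, validate the payload, stamp the principal) might look roughly like the sketch below. `CHANNEL_SCHEMAS`, `_userId`, and the `mark:create-request` channel name come from the reviewed code. The validator signature, the `resourceId` field, and the function `admitToBus` are hypothetical stand-ins, and the real gateway validates against schemas rather than hand-written predicates.

```typescript
type Payload = Record<string, unknown>;

// Hypothetical validator: returns null when valid, otherwise a reason.
type ChannelValidator = (payload: Payload) => string | null;

// Stand-in for the real schema table; the single entry is illustrative.
const CHANNEL_SCHEMAS = new Map<string, ChannelValidator>([
  [
    "mark:create-request",
    (p) => (typeof p["resourceId"] === "string" ? null : "resourceId must be a string"),
  ],
]);

interface AdmittedMessage {
  channel: string;
  payload: Payload & { _userId: string };
}

function admitToBus(channel: string, payload: Payload, principalDid: string): AdmittedMessage {
  const validate = CHANNEL_SCHEMAS.get(channel);
  if (validate === undefined) {
    throw new Error(`unknown channel: ${channel}`);
  }
  const problem = validate(payload);
  if (problem !== null) {
    throw new Error(`invalid payload on ${channel}: ${problem}`);
  }
  // The authenticated principal always wins over any client-supplied _userId.
  return { channel, payload: { ...payload, _userId: principalDid } };
}
```

Spreading the payload first and writing `_userId` last is the design point: a client cannot spoof attribution by including its own `_userId` field.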

Humans and agents share the same verbs but not the same affordances. The SDK exposes verb namespaces for frame, yield, mark, bind, gather, match, browse, beckon, and job over an injected transport and local event bus (SDK client). The CLI wraps those verbs for terminal workflows, including manual and delegated mark, upload or delegated yield, match, bind, and listen (CLI mark, CLI yield, CLI skill). The MCP server exposes a narrower tool set over the same SDK: browse resources, create or assist annotations, bind references, gather annotation context, create resources, and generate from annotations (MCP server, MCP handlers). The frontend is another peer over SemiontSession and the SDK, not a separate domain model (human UI docs).

Graph and vector stores are projections over the event/resource layer. The knowledge-base object groups the event store, filesystem views, working-tree content store, graph database, graph consumer, and optional vector store, rebuilding views and graph from the event log on startup unless skipped (knowledge base). The graph consumer subscribes to graph-relevant persisted event types and projects resources, annotations, body updates, entity tags, and archive state into the graph backend (graph consumer, graph interface). The Smelter actor embeds resource chunks and annotation exact text, writes an embedding cache under project state, and upserts to the vector backend (Smelter, embedding store, vector interface). Matcher then combines name search, entity-type overlap, graph neighborhood, vector similarity, and optional LLM reranking for bind candidates (Matcher).
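
To make the multi-signal ranking concrete, here is a small sketch that blends the four non-LLM signals with a weighted sum. The weighted-sum form, the weights, the Jaccard measure for entity-type overlap, and every name in the block are assumptions for illustration. The review did not establish how Matcher actually combines its signals, and the optional LLM reranking step is omitted.

```typescript
// Each signal is assumed to be normalized to the range 0..1 before blending.
interface CandidateSignals {
  id: string;
  nameMatch: number;
  entityTypeOverlap: number;
  graphProximity: number;
  vectorSimilarity: number;
}

// Illustrative weights only; not taken from Semiont.
const ASSUMED_WEIGHTS = {
  nameMatch: 0.4,
  entityTypeOverlap: 0.2,
  graphProximity: 0.15,
  vectorSimilarity: 0.25,
};

// Jaccard overlap of two entity-type lists: |A ∩ B| / |A ∪ B|.
function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  if (setA.size === 0 && setB.size === 0) return 0;
  let shared = 0;
  setA.forEach((item) => {
    if (setB.has(item)) shared++;
  });
  return shared / (setA.size + setB.size - shared);
}

function combinedScore(c: CandidateSignals): number {
  return (
    ASSUMED_WEIGHTS.nameMatch * c.nameMatch +
    ASSUMED_WEIGHTS.entityTypeOverlap * c.entityTypeOverlap +
    ASSUMED_WEIGHTS.graphProximity * c.graphProximity +
    ASSUMED_WEIGHTS.vectorSimilarity * c.vectorSimilarity
  );
}

// Returns a new array sorted from best to worst; the input is not mutated.
function rankCandidates(candidates: CandidateSignals[]): CandidateSignals[] {
  return [...candidates].sort((x, y) => combinedScore(y) - combinedScore(x));
}
```

Whatever the real blend is, the point for memory-system design is that the vector score is one advisory input among several rather than the sole arbiter of a bind candidate.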

AI workers create annotations and resources, not learned policy. Worker processes authenticate as software-agent identities, claim configured job types, call inference clients, emit job lifecycle events, create W3C annotations, or upload generated resources through the same client path as other actors (worker main, worker process). The processors detect highlights, comments, assessments, references, and tags, or generate markdown resources from gathered context (processors, annotation detection, generation). Their prompts, schemas, and job parameters are system-definition artifacts because they instruct what gets extracted or generated; the outputs are knowledge artifacts until a downstream workflow treats them as configuration or enforcement.

Boundary validation is stronger than epistemic governance. The backend authenticates humans and software peers with JWT-backed middleware, but Semiont's current RBAC docs say ordinary authenticated users have full read/write access to content, while moderator/admin flags gate administrative and exchange operations rather than resource-level permissions (auth middleware, admin routes, RBAC docs). OpenAPI validation is real for request bodies, query params, path params, and bus channel schemas (OpenAPI middleware, bus protocol schemas). Event integrity is more limited than some docs imply. Current EventStorage explicitly says integrity is provided by git commit history when gitSync is enabled, not by per-event chaining metadata. The backup verifier checks only manifest shape, stream presence, event counts, and blob counts, despite older comments that mention hash chains (event storage, verify command, backup docs).

Comparison with Our System

| Dimension | Semiont | Commonplace |
| --- | --- | --- |
| Primary retained artifact | Source files plus W3C annotations and event streams | Typed markdown notes, sources, ADRs, instructions, reviews |
| Storage substrate | Working tree, .semiont/events/, XDG state projections/jobs/embeddings, graph DB, vector DB, PostgreSQL users | Git-tracked markdown, generated indexes/reports, validation outputs |
| Knowledge artifacts | Annotations, resource files, gathered context, generated resources, graph/vector search results | Notes, snapshots, indexes, links, review reports |
| System-definition artifacts | Flow protocol, OpenAPI schemas, bus channel schemas, worker prompts/processors, entity/tag schemas, auth/RBAC config | Type specs, collection contracts, AGENTS.md, skills, commands, validators |
| Activation | Query, browse, match, gather, MCP/CLI/SDK workflows, worker jobs | rg, frontmatter filters, authored links, indexes, skills, validation/review commands |
| Source/derived split | Source resources and events are canonical; views, graph, vectors, embeddings, jobs are projections/operations | Notes and sources are canonical; indexes and reports are generated views |
| Evaluation | Structural validation, auth, schema checks, event replay, UI/worker tests | Semantic review, deterministic validation, link health, human-readable claim review |

Semiont is stronger where the core problem is grounded annotation over a shared document corpus. It has a richer operational layer than commonplace: browser sessions, SDK namespaces, CLI commands, MCP tools, worker daemons, graph projections, vector projections, and live coordination signals all speak the same protocol. That makes it a better substrate for "many actors annotate and resolve the same corpus" than a markdown-only library.

Commonplace is stronger where the memory artifact needs explicit argumentative shape. A Semiont linking annotation can say "this passage mentions this resource" and can drive graph traversal or generation, but the platform does not impose claim maturity, link labels as reader contracts, review status, retirement rationale, or theory-building structure. Its annotations are excellent anchors; they are not yet a substitute for curated notes.

The most useful comparison is authority. In Semiont, source documents and persisted events are authoritative records; graph/vector/search surfaces advise later actions as knowledge artifacts; protocol definitions, schemas, prompts, and worker code instruct or enforce behavior as system-definition artifacts. In commonplace, the human-readable retained artifact usually carries both meaning and lifecycle authority directly.

Read-back: pull — agents deliberately browse, match, gather, and inspect annotations or graph/vector projections through MCP, CLI, SDK, or UI workflows.

Borrowable Ideas

Annotation layer before note synthesis. For dense source corpora, Semiont's W3C annotation model is a cleaner intermediate layer than ad hoc excerpts. Commonplace could borrow this only when a project needs passage-scale review or source markup before writing notes.

One bus contract across human UI, CLI, MCP, and workers. Semiont shows the value of giving agents the same verbs humans use, rather than a separate automation API. If commonplace exposes MCP or a long-running daemon API, it should front the real note/review/link operations.

Derived retrieval projections with explicit source authority. Semiont keeps graph and vectors downstream from files and events. That is the right shape for any future commonplace graph/vector layer: build projections from canonical markdown and snapshots, then make regeneration cheap.
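
The "projection downstream of the log" shape can be sketched as a fold over events. Everything here is hypothetical: the `LoggedEvent` union, the `resource:created` and `mark:removed` type names, the `ResourceView` shape, and `rebuildProjection` are illustrative, with only `mark:added` taken from the reviewed code. The point is that the view is fully determined by the event sequence, so it can be discarded and regenerated at any time.

```typescript
// Hypothetical event union for illustration.
type LoggedEvent =
  | { type: "resource:created"; resourceId: string; name: string }
  | { type: "mark:added"; resourceId: string; annotationId: string }
  | { type: "mark:removed"; resourceId: string; annotationId: string };

interface ResourceView {
  name: string;
  annotationIds: string[];
}

// Deterministic fold: the same events always yield the same view.
function rebuildProjection(events: LoggedEvent[]): Map<string, ResourceView> {
  const views = new Map<string, ResourceView>();
  for (const e of events) {
    if (e.type === "resource:created") {
      views.set(e.resourceId, { name: e.name, annotationIds: [] });
      continue;
    }
    const view = views.get(e.resourceId);
    if (view === undefined) continue; // ignore marks on resources never created
    if (e.type === "mark:added") {
      // Idempotent add, so replaying a duplicate event does not double-count.
      if (view.annotationIds.indexOf(e.annotationId) === -1) {
        view.annotationIds.push(e.annotationId);
      }
    } else {
      view.annotationIds = view.annotationIds.filter((id) => id !== e.annotationId);
    }
  }
  return views;
}
```

For a commonplace equivalent, the input would be parsed markdown notes and snapshots rather than events, but the same property should hold: delete the projection, rerun the fold, get the same result.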

Deferred acknowledgements after persistence. mark:create-request resolves only after Stower persists mark:added, with the two matched by correlation ID. That is a good pattern for agent-facing write APIs where "accepted" and "durably recorded" must not be confused.
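
A minimal sketch of the correlation-ID pattern, assuming an in-memory registry of pending requests. The class `DeferredAcks`, its method names, and the `MarkAdded` shape are hypothetical. Only the idea that the caller's promise settles when the matching persisted event arrives is taken from the behaviour described above.

```typescript
// Hypothetical shape of the persisted event that closes a request.
interface MarkAdded {
  type: "mark:added";
  correlationId: string;
  annotationId: string;
}

class DeferredAcks {
  private pending = new Map<string, (persistedEvent: MarkAdded) => void>();
  private nextId = 0;

  // Register a request. The promise stays pending until onPersisted
  // sees an event carrying the same correlation id.
  request(): { correlationId: string; persisted: Promise<MarkAdded> } {
    const correlationId = `corr-${++this.nextId}`;
    const persisted = new Promise<MarkAdded>((resolve) => {
      this.pending.set(correlationId, resolve);
    });
    return { correlationId, persisted };
  }

  // Intended to be called by the write gateway after the durable append.
  // Returns false when no request is waiting on this correlation id.
  onPersisted(persistedEvent: MarkAdded): boolean {
    const resolve = this.pending.get(persistedEvent.correlationId);
    if (resolve === undefined) return false;
    this.pending.delete(persistedEvent.correlationId);
    resolve(persistedEvent);
    return true;
  }

  pendingCount(): number {
    return this.pending.size;
  }
}
```

A caller would `await` the `persisted` promise, so an agent never treats a write as done merely because the request was accepted. A production version would also need a timeout or rejection path for requests whose persistence fails, which this sketch leaves out.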

Participant coordination signals. Beckon and local UI intent channels are outside commonplace's current library model, but a workshop layer could use analogous signals for multi-agent review, handoff, or focused human inspection.

Curiosity Pass

This is not trace-derived learning. Semiont records domain events, job outcomes, progress, OpenTelemetry traces, and bus timelines, but I did not find a path that mines prior agent sessions, action traces, worker outcomes, or human corrections into durable rules, prompts, validators, rankers, model weights, or skills. The durable behavior-shaping artifacts are designed up front: protocol schemas, worker processors, prompts, entity/tag schemas, and auth/configuration. The event log can become evidence for later human or agent work, but the reviewed code does not automatically learn from it.

The code has moved faster than some architecture docs. SemiontProject and WorkingTreeStore make project-root resources and .semiont/events/ central, while docs/system/FILESYSTEM.md still describes event logs under XDG state. Backup and exchange docs still discuss hash-chain verification, but current event storage has no per-event chain metadata. The live code is coherent enough to review; the documentation needs a cleanup pass around storage and integrity.

The job queue is operational state, not memory. Job files under XDG state record pending/running/complete/failed/cancelled work, and Stower persists durable job lifecycle boundaries. That helps observability and recovery, but completed jobs are not promoted into durable lessons or policy. They are workflow traces unless another actor writes annotations or resources from them.

Vector embeddings are mixed-form derived artifacts. The embedding cache stores text plus numerical vectors, while the Qdrant or in-memory vector store ranks candidates at retrieval time. That is distributed-parametric retrieval state derived from prose resources and annotations, not canonical knowledge. Its behavioral authority is limited to ranking influence when Matcher or Gatherer consults it.

RBAC is enough for a trusted project, not a multi-tenant corpus. Authentication and admin gating are implemented, but every authenticated user shares write authority over ordinary content. That matches collaborative KB assumptions but is a major limitation if Semiont is used for regulated or adversarial corpora.

What to Watch

  • Whether storage docs converge on the current working-tree plus .semiont/events/ design, or whether events move back out to XDG state.
  • Whether event integrity grows beyond git commit history into signed events, per-event hashes, backup verification, or provenance checks.
  • Whether generated annotations get review states, confidence, correction loops, or promotion paths into higher-level durable knowledge artifacts.
  • Whether MCP remains a first-class surface over the same protocol as CLI/SDK/frontend as the API evolves.
  • Whether graph/vector projections gain explicit lineage fields that make every retrieved candidate traceable to resource content, event sequence, worker prompt, and embedding model.
  • Whether job outcomes or human corrections become inputs to trace-derived learning; current code does not do this.

Relevant Notes:

  • Knowledge artifact - classifies: source resources, annotations, gathered context, generated resources, and search results when consumed as evidence or advice
  • System-definition artifact - classifies: Semiont's flow protocol, channel schemas, OpenAPI contracts, prompts, processors, and tag/entity schemas
  • Behavioral authority - frames: graph/vector projections rank or advise, while protocol and worker code instruct or enforce
  • Storage substrate - frames: working tree, event log, state files, graph DB, vector DB, and PostgreSQL serve different artifact roles
  • Lineage - useful for assessing Semiont's source/resource/event/projection derivation story
  • Files beat a database for agent-operated knowledge bases - compares: Semiont keeps source resources and event streams in filesystem/git-adjacent form while using databases for derived or administrative surfaces