Semiont

Type: agent-memory-system-review · Status: current · Tags: related-systems

Semiont is an open-source "semantic knowledge kernel" from The AI Alliance for building shared corpora where humans and AI agents annotate documents, resolve references, and generate new resources through the same underlying system. The current checkout is a large TypeScript monorepo with a Hono backend, React frontend, CLI, MCP server, event-sourcing package, graph abstraction layer, vector search layer, and six AI workers. The important architectural fact is narrower than the project framing: Semiont is primarily a document-grounded annotation platform whose graph, vector, and AI layers are projections and workflows built around W3C Web Annotations.

Repository: https://github.com/The-AI-Alliance/semiont

Core Ideas

Annotations are the primary semantic unit. Semiont's central move is to make passage-level W3C annotations, not notes or extracted facts, the first-class semantic object. The write actor in stower.ts turns mark:create and yield:create commands into append-only domain events, while the Mark flow documentation treats highlights, comments, assessments, tags, and references as one unified annotation substrate. The knowledge graph is downstream of these annotations; it is not the source of truth.
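To make the annotation-first model concrete, here is a minimal sketch of how a `mark:create`-style command could materialize into a W3C Web Annotation with passage-level grounding. The command shape, field names, and `toWebAnnotation` helper are illustrative assumptions, not Semiont's actual types; only the annotation output follows the real W3C Web Annotation Data Model (`TextQuoteSelector` anchoring an exact passage in a source).

```typescript
// Hypothetical command shape — NOT Semiont's real stower.ts types.
interface MarkCreateCommand {
  kind: "mark:create";
  resource: string;          // file:// URI of the annotated document
  exact: string;             // the annotated passage itself
  prefix: string;            // surrounding context, for re-anchoring
  suffix: string;
  motivation: "highlighting" | "commenting" | "assessing" | "tagging" | "linking";
  bodyValue?: string;        // comment or tag text, if any
  creator: string;           // human user or AI worker id — same shape either way
}

// Project the command into a W3C Web Annotation (per the Annotation Data Model).
function toWebAnnotation(cmd: MarkCreateCommand, id: string) {
  return {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    id,
    type: "Annotation",
    motivation: cmd.motivation,
    creator: cmd.creator,
    body: cmd.bodyValue
      ? { type: "TextualBody", value: cmd.bodyValue }
      : undefined,
    target: {
      source: cmd.resource,
      selector: {
        type: "TextQuoteSelector", // passage-level grounding, not whole-document
        exact: cmd.exact,
        prefix: cmd.prefix,
        suffix: cmd.suffix,
      },
    },
  };
}
```

The key property is that the graph-relevant information (who annotated which passage of which resource, and why) lives entirely in the annotation; any downstream graph is derivable from it.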

Humans and AI really do share the same write path. The "peer collaboration" claim is not just README rhetoric. Backend routes, CLI commands, MCP tools, and async workers all converge on the same EventBus domain surface. startMakeMeaning() in service.ts wires one EventBus into Stower, Gatherer, Matcher, Browser, CloneTokenManager, and the six workers. Workers are separate polling loops, but they emit the same mark:*, yield:*, and job:* commands that any other caller would. That is a meaningful implementation choice: AI automation is not a side channel bolted onto a human system.
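The single-write-path claim can be sketched in a few lines. Everything below is a toy model, assuming only what the review describes: an event bus, a Stower-like append consumer, and callers that differ only in their actor identity — none of the class or function names are Semiont's.

```typescript
// Toy model of "one domain surface": both a human-facing route and an AI
// worker emit the same command shape onto the same bus.
type Command = { kind: string; payload: unknown; actor: string };
type Handler = (cmd: Command) => void;

class EventBus {
  private handlers: Handler[] = [];
  subscribe(h: Handler): void { this.handlers.push(h); }
  emit(cmd: Command): void { this.handlers.forEach((h) => h(cmd)); }
}

const bus = new EventBus();
const log: Command[] = [];
bus.subscribe((cmd) => log.push(cmd)); // stand-in for an append-only Stower

// A human-facing HTTP route and an AI worker converge on the same surface:
function httpCreateMark(user: string, text: string): void {
  bus.emit({ kind: "mark:create", payload: { text }, actor: `user:${user}` });
}
function workerCreateMark(worker: string, text: string): void {
  bus.emit({ kind: "mark:create", payload: { text }, actor: `worker:${worker}` });
}

httpCreateMark("alice", "key passage");
workerCreateMark("highlighter", "another passage");
// Both entries share one command shape; only the actor differs.
```

The design consequence is that validation and provenance apply uniformly: there is no agent-only shortcut to audit separately.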

The durable substrate is a project working tree plus a git-backed event log. The reviewed content path does not keep the corpus only inside an opaque service-owned database. SemiontProject defines resource files in the project working tree and event history in .semiont/events/, while WorkingTreeStore writes, moves, and removes file:// resources directly in the repo, optionally staging them with git. EventStorage writes append-only JSONL event streams and also stages them. This is a strong convergence signal with our files-first position, but with a different unit of durability: Semiont persists source documents plus event history, not authored notes plus semantic links.
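A minimal sketch of the append-only JSONL stream layout the review describes. The `.semiont/events/` path comes from the review; the per-stream file naming and the staging comment are assumptions, not EventStorage's actual behavior.

```typescript
// Sketch: one JSONL file per event stream under <project>/.semiont/events/.
import { appendFileSync, mkdirSync, readFileSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

function appendEvent(projectRoot: string, stream: string, event: object): void {
  const dir = join(projectRoot, ".semiont", "events");
  mkdirSync(dir, { recursive: true });
  // One JSON object per line; the file is only ever appended to.
  appendFileSync(join(dir, `${stream}.jsonl`), JSON.stringify(event) + "\n");
  // A real store would optionally stage the file with git here.
}

function readEvents(projectRoot: string, stream: string): object[] {
  const file = join(projectRoot, ".semiont", "events", `${stream}.jsonl`);
  return readFileSync(file, "utf8")
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}

// Demo against a throwaway project root.
const root = join(tmpdir(), `semiont-sketch-${Date.now()}`);
appendEvent(root, "resource-1", { seq: 1, kind: "mark:create" });
appendEvent(root, "resource-1", { seq: 2, kind: "yield:create" });
```

Because the log is plain files inside the working tree, git history doubles as offsite durability for both the documents and their event history.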

Graph and vector layers are projections, not the knowledge medium. The graph consumer in graph/consumer.ts subscribes to event streams and projects graph-relevant events into the graph database. The Smelter embeds resources and annotations for vector search. The Matcher in matcher.ts combines graph search, entity-type filtering, graph neighborhood expansion, vector similarity, and optional LLM reranking. This is a well-factored "files/events as system of record, graph/vectors as derived retrieval machinery" architecture, not a graph-native knowledge system.
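The projection pattern can be shown as a pure fold over the event log: replaying the same events always yields the same graph, so the graph database can be dropped and rebuilt at any time. The event shapes and node/edge model below are illustrative assumptions, not graph/consumer.ts's actual schema.

```typescript
// Illustrative event shapes — not Semiont's real event schema.
type DomainEvent =
  | { kind: "mark:create"; annotationId: string; resource: string }
  | { kind: "reference:resolved"; annotationId: string; target: string };

interface GraphProjection {
  nodes: Set<string>;
  edges: Array<{ from: string; to: string; via: string }>;
}

// A pure fold: the event log is the source of truth, the graph is derived.
function projectGraph(events: DomainEvent[]): GraphProjection {
  const graph: GraphProjection = { nodes: new Set(), edges: [] };
  const anchors = new Map<string, string>(); // annotationId -> source resource
  for (const e of events) {
    if (e.kind === "mark:create") {
      graph.nodes.add(e.resource);
      anchors.set(e.annotationId, e.resource);
    } else {
      const from = anchors.get(e.annotationId);
      if (from) {
        graph.nodes.add(e.target);
        graph.edges.push({ from, to: e.target, via: e.annotationId });
      }
    }
  }
  return graph;
}
```

Determinism is what makes the graph safely disposable: schema migrations become "delete the projection and replay," never "migrate the source of truth."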

The system grows knowledge through annotation workflows, not through learning from traces. Semiont has substantial AI automation, but it is workflow automation, not trace-derived learning. The jobs layer and worker docs define reference detection, highlighting, assessment, commenting, tagging, and generation workers that create annotations or resources from current corpus state. I did not find a comparable mechanism in the reviewed repo for mining prior agent sessions into durable lessons, rules, or updated policies. Semiont is best read as shared semantic markup infrastructure, not as a self-improving memory system.

Validation is structural and audit-oriented, but content governance is still thin. The platform has real enforcement at the boundaries: OpenAPI request validation in validate-openapi.ts, hash-linked event integrity in event-validator.ts, and explicit RBAC middleware. But docs/RBAC.md is clear that all authenticated users currently have full read/write access to all content. The system is strong on provenance and infrastructure validation, weaker on epistemic review or content-level permissioning.
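Hash-linked event integrity of the kind event-validator.ts provides can be sketched as a simple hash chain: each event commits to the hash of its predecessor, so any retroactive edit breaks verification from that point on. The field names and hashing scheme here are assumptions, not Semiont's actual format.

```typescript
// Sketch of hash-chained event integrity; field names are illustrative.
import { createHash } from "node:crypto";

interface ChainedEvent {
  payload: unknown;
  prevHash: string; // hash of the previous event, "" for the first
  hash: string;     // sha256 over (prevHash + payload)
}

function hashEvent(prevHash: string, payload: unknown): string {
  return createHash("sha256")
    .update(prevHash + JSON.stringify(payload))
    .digest("hex");
}

function appendChained(log: ChainedEvent[], payload: unknown): void {
  const prevHash = log.length ? log[log.length - 1].hash : "";
  log.push({ payload, prevHash, hash: hashEvent(prevHash, payload) });
}

// Walking the chain detects any retroactive edit to an earlier event.
function verifyChain(log: ChainedEvent[]): boolean {
  let prev = "";
  for (const e of log) {
    if (e.prevHash !== prev || e.hash !== hashEvent(prev, e.payload)) return false;
    prev = e.hash;
  }
  return true;
}
```

Note what this does and does not buy: it proves the log was not silently rewritten, but says nothing about whether the annotations it records are good — which is exactly the provenance-strong, governance-thin profile described above.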

Comparison with Our System

| Dimension | Semiont | Commonplace |
| --- | --- | --- |
| Primary semantic unit | Passage-level W3C annotation | Note with frontmatter, prose, and explicit links |
| Source grounding | Mandatory: annotations point into source passages | Optional and uneven: notes may cite sources, but many are free-standing arguments |
| Durable substrate | Working-tree files plus append-only event log | Markdown files in git |
| Derived layers | Materialized views, graph projection, vector search, job queue | Minimal derived layers; validation and search stay close to files |
| Human/agent symmetry | Strong: same event surface, same write path, same annotation model | Moderate: humans and agents edit the same files, but workflows are still instruction-shaped rather than API-shaped |
| Link semantics | Binding and graph edges derived from annotations and references | Links carry articulated relationship semantics directly in prose |
| Validation model | OpenAPI validation, event-chain hashing, middleware auth, rebuildable projections | Frontmatter/structure/link validation plus semantic review workflow |
| Knowledge evolution | Continuous annotation, linking, and generation over a shared corpus | Continuous note writing, connection, and progressive formalization |
| Learning from traces | None beyond normal event history and job outputs | Limited and mostly manual; workshop theory exists, automation is still thin |

Semiont is stronger where grounded corpora matter. If the job is "build a shared semantic layer over documents, passages, and references," its annotation-first model is much more concrete than ours. Passage grounding, unified human/AI write paths, MCP exposure, and derived graph/vector retrieval are all real.

Commonplace is stronger where authored knowledge and epistemic shape matter. Our notes encode why things relate, what claim is being made, and how mature an artifact is. Semiont's graph is grounded, but it is still mostly a projection of annotations and references; it does not have an equivalent of typed note evolution, explicit argument structure, or relationship semantics carried directly in prose.

Borrowable Ideas

Annotation-first source grounding. Semiont's best idea is making passage-grounded annotations the raw material for later synthesis. For any future commonplace workflow that starts from dense source corpora, an annotation layer could be a cleaner precursor than ad hoc excerpts and freeform notes. This needs a use case first because it implies a different primary substrate than our current markdown-first workflow.

One domain surface for humans, agents, and integrations. The EventBus-centered architecture is genuinely clean. If we add more automation, it should go through the same write path, validation path, and provenance path as human-authored changes rather than through agent-only shortcuts. Ready to borrow as a systems principle.

Derived retrieval layers that stay subordinate to durable artifacts. Semiont shows a pragmatic middle ground between "just grep files" and "database as source of truth." Resources and events stay durable; views, graph projections, and vectors are rebuildable derivatives. That pattern is relevant if our operational layers grow more complex. Needs a concrete use case first.

MCP over the real knowledge domain, not a parallel wrapper. Semiont's MCP server is not a separate abstraction; it fronts the same backend domain as the CLI and UI. If commonplace ever exposes MCP, that should be the model. Needs a use case first.

Curiosity Pass

"Semantic knowledge kernel" overstates what is implemented. The real mechanism is collaborative annotation infrastructure plus retrieval projections. That is not a criticism; it is a narrower and more defensible claim. The graph becomes meaningful because annotations are grounded in passages, not because the system has a general semantic reasoning engine.

The storage story is in flux, and the code is more coherent than the docs. The live path uses WorkingTreeStore and project-root event logs from project.ts, but surrounding docs still describe a checksum-addressed RepresentationStore in packages/content/README.md and even claim in CONFIGURATION.md that runtime state stays outside the project directory. This is not fatal, but it means the architecture is still actively settling. The working-tree-plus-event-log design is the clearest account of the real system today.

Human and AI are operational peers, not epistemic peers. They emit the same commands and produce the same annotation structures, which is a solid architectural decision. But the system still depends on human judgment for low-confidence binding, moderation, and broad corpus quality. The symmetry is about interface and provenance, not about trust or capability.

Semiont's graph is grounded better than most reviewed graph systems, but it still stops short of synthesis. The graph consumer, matcher, and gatherer are real. What is absent is a maintenance layer that turns accumulated annotations into revised higher-level knowledge artifacts. The platform makes document-grounded graph growth easier; it does not yet answer what durable understanding should emerge from that graph over time.

What to Watch

  • Whether the repo fully commits to the working-tree plus git-backed event-log architecture and retires the older representation-store framing.
  • Whether content-level access control and approval gates arrive, since the current all-authenticated-can-edit model limits serious multi-tenant use.
  • Whether the annotation substrate grows upward into durable higher-level knowledge artifacts, or remains an excellent document-grounding layer without a synthesis story.
  • Whether MCP and KB-repo adoption make Semiont a de facto standard for document-grounded shared corpora.

Relevant Notes:

  • Files beat a database for agent-operated knowledge bases — complicates: Semiont independently converges on real files and git for durable artifacts, but adds graph/vector projections and service layers above them
  • OpenViking — contrasts: both use filesystem-shaped interfaces and MCP, but Semiont keeps real project files while OpenViking virtualizes the filesystem over a database service
  • sift-kg — compares: both build document-grounded graphs, but sift-kg is a batch extraction pipeline while Semiont is an ongoing collaborative annotation system
  • Binder — contrasts: both care about auditability and agent integration, but Binder makes markdown a derived view over a database while Semiont keeps source files and annotations primary
  • A functioning knowledge base needs a workshop layer, not just a library — extends (provisional — target is seedling): Semiont is a concrete external example of a KB with a real operational layer of queues, workers, and in-flight enrichment around the durable corpus