OpenViking

Type: ../types/agent-memory-system-review.md · Status: current · Tags: trace-derived

OpenViking is Volcengine's service-backed "context database" for AI agents. The inspected repository implements a Python server/library, a Rust CLI and RAGFS filesystem layer, native/local and remote vector index adapters, session compression and memory extraction, HTTP/MCP/API surfaces, and plugins for Codex, Claude Code, OpenCode, OpenClaw, LangChain, LangGraph, and Vikingbot. Its core claim goes beyond vector memory: it presents resources, memories, skills, sessions, abstracts, overviews, files, and relations through one viking:// namespace backed by AGFS/RAGFS plus a tenant-aware vector projection.

Repository: https://github.com/volcengine/OpenViking

Reviewed commit: af4c54ff8f011611d3c60c4936a84a784f042e3f

Last checked: 2026-05-16

Core Ideas

viking:// is a virtual filesystem namespace over service state, not a mounted local worktree. The public URI grammar has scopes for resources, user, agent, and session, with internal temp and queue scopes; short user and agent URIs are expanded through request identity into canonical namespace paths (docs/en/concepts/04-viking-uri.md, openviking/core/namespace.py). VikingFS then maps URIs to backend paths, checks access, delegates POSIX-like reads/writes/lists/removes/moves to AGFS/RAGFS, and synchronizes vector records on delete and move (openviking/storage/viking_fs.py). The filesystem metaphor is real as an API contract and storage organization, but agents normally experience it through HTTP, MCP, CLI, plugins, and SDK calls rather than a kernel-mounted filesystem.
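The following minimal Python sketch makes the short-URI expansion concrete. The scope names come from the text above. The canonical layout (the caller's user or agent id inserted as the first path segment), the `RequestIdentity` class, and `canonicalize` are hypothetical. They are not OpenViking's actual rules in openviking/core/namespace.py.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

PUBLIC_SCOPES = {"resources", "user", "agent", "session"}
INTERNAL_SCOPES = {"temp", "queue"}


@dataclass(frozen=True)
class RequestIdentity:
    account_id: str
    user_id: str
    agent_id: str


def canonicalize(uri: str, ident: RequestIdentity) -> str:
    """Expand a short viking:// URI using the caller's identity (hypothetical layout)."""
    parts = urlsplit(uri)
    if parts.scheme != "viking":
        raise ValueError(f"not a viking:// URI: {uri}")
    scope = parts.netloc
    if scope not in PUBLIC_SCOPES | INTERNAL_SCOPES:
        raise ValueError(f"unknown scope: {scope}")
    rest = parts.path.strip("/")
    # Assumption: only user and agent URIs are identity-relative; the owner id
    # is inserted as the first path segment unless the caller already wrote it.
    owner = {"user": ident.user_id, "agent": ident.agent_id}.get(scope)
    if owner and rest != owner and not rest.startswith(owner + "/"):
        rest = f"{owner}/{rest}" if rest else owner
    return f"viking://{scope}/{rest}" if rest else f"viking://{scope}"
```

For example, with identity `("acme", "alice", "codex")`, `viking://user/memories/prefs.md` becomes `viking://user/alice/memories/prefs.md`, and `viking://resources/...` is left untouched because resources are account-shared.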

Content storage and retrieval storage are deliberately split. The docs describe AGFS/RAGFS as the content store for L0/L1/L2 files, multimedia, and .relations.json, while the vector index stores URIs, vectors, metadata, levels, context type, active count, and owner fields (docs/en/concepts/05-storage.md). The implementation follows that split: VikingVectorIndexBackend works over a context collection and exposes retrieval, lookup, memory-dedup, and URI-rewrite fields, while VikingFS remains the canonical read path for file content (openviking/storage/viking_vector_index_backend.py). The index is therefore a runtime/index surface with ranking and selection authority, not the source of the full knowledge artifact.
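A toy in-memory model can illustrate the content/index split and the delete/move synchronization that VikingFS performs. Everything here is an assumption for illustration. `ToyVikingFS`, `VectorRecord`, and their fields are loosely modeled on the field list in the storage docs. The real system delegates content to AGFS/RAGFS and vectors to a separate backend.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VectorRecord:
    # Hypothetical projection row; field names are illustrative only.
    uri: str
    vector: tuple
    level: int            # 0 = abstract, 1 = overview, 2 = detail
    context_type: str     # e.g. "memory", "resource", "skill"
    owner: str
    active_count: int = 0


def _under(key: str, root: str) -> bool:
    return key == root or key.startswith(root + "/")


class ToyVikingFS:
    """Content store and vector projection kept as two separate maps."""

    def __init__(self) -> None:
        self.files: dict = {}   # uri -> full text (the canonical content)
        self.index: dict = {}   # uri -> VectorRecord (retrieval projection only)

    def write(self, uri: str, text: str, record: VectorRecord) -> None:
        self.files[uri] = text
        self.index[uri] = record

    def read(self, uri: str) -> str:
        # Reads always go to the content store, never to the index.
        return self.files[uri]

    def rm(self, uri: str) -> None:
        # Deleting content must also delete its index rows, or search goes stale.
        for store in (self.files, self.index):
            for key in [k for k in store if _under(k, uri)]:
                del store[key]

    def mv(self, src: str, dst: str) -> None:
        # Moving content rewrites the URI carried inside each index row too.
        for key in [k for k in self.files if _under(k, src)]:
            self.files[dst + key[len(src):]] = self.files.pop(key)
        for key in [k for k in self.index if _under(k, src)]:
            new_uri = dst + key[len(src):]
            self.index[new_uri] = replace(self.index.pop(key), uri=new_uri)
```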

The L0/L1/L2 tiering scheme makes directories into navigable context objects. OpenViking stores .abstract.md as L0 for short vector-search summaries, .overview.md as L1 for rerank and navigation guidance, and original files/subdirectories as L2 detail (docs/en/concepts/03-context-layers.md). Resource ingestion parses material into a tree, moves it into AGFS, then queues bottom-up semantic processing that writes L0/L1 sidecars and vectorizes directory contexts asynchronously (docs/en/concepts/06-extraction.md, openviking/storage/queuefs/semantic_processor.py). The derived abstracts and overviews are knowledge artifacts when read as summaries, and index inputs when consumed for retrieval.
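Here is a minimal sketch of bottom-up sidecar generation. It assumes a directory tree held as nested dicts and uses a placeholder `first_line` summarizer in place of the LLM calls. Deriving L0 from the L1 overview is a simplification chosen here, not necessarily OpenViking's prompt flow. The real pipeline also runs through a queue rather than a recursive call.

```python
def first_line(text: str) -> str:
    """Placeholder summarizer (assumption); OpenViking would call an LLM here."""
    lines = text.splitlines()
    return lines[0][:60] if lines else ""


def build_sidecars(tree: dict, summarize=first_line) -> dict:
    """Post-order walk: every child is summarized before its parent directory.

    `tree` maps names to file text (str) or subdirectories (dict). The result
    is a new tree in which every directory gains .overview.md and .abstract.md.
    """
    out, entries = {}, []
    for name, node in sorted(tree.items()):
        if isinstance(node, dict):
            sub = build_sidecars(node, summarize)      # children first (bottom-up)
            out[name] = sub
            entries.append(f"{name}/: {sub['.abstract.md']}")
        else:
            out[name] = node                           # L2 detail is kept verbatim
            entries.append(f"{name}: {summarize(node)}")
    out[".overview.md"] = "\n".join(entries)           # L1: per-child navigation guide
    out[".abstract.md"] = summarize(out[".overview.md"])  # L0: one-line summary
    return out
```

A parent's overview is built from its children's abstracts, so a stale child abstract propagates upward. This is the trust issue the Curiosity Pass raises about generated sidecars.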

Retrieval combines typed intent, global vector starts, recursive directory search, rerank, and usage hotness. find() is a fast query path without session context; search() can use session summary and recent messages to produce up to five typed queries over memories, resources, and skills (docs/en/concepts/07-retrieval.md). The implementation starts from context-type roots or target directories, supplements them with global vector hits, recurses through directories via a priority queue, optionally reranks at each stage, and returns MatchedContext records with URI, type, leaf status, abstract, score, and relations (openviking/retrieve/hierarchical_retriever.py). This is more than flat RAG, but it is still index-mediated retrieval over generated sidecars; L2 detail is loaded separately through filesystem reads.
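The best-first descent can be sketched with a heap keyed on similarity. This is a simplified, hypothetical version. Scores are plain cosine similarity against one embedding per node. The per-stage rerank, typed-query fan-out, and hotness weighting that hierarchical_retriever.py layers on top are omitted.

```python
import heapq
import math


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hierarchical_search(query, vectors, children, roots, global_hits, limit=3):
    """Best-first descent over a directory tree (simplified sketch).

    vectors:     uri -> embedding of that node's abstract (assumption)
    children:    uri -> child uris; absent or empty means a leaf
    roots:       context-type roots or caller-supplied target directories
    global_hits: uris from a flat vector search, used as extra starting points
    """
    heap, seen, results = [], set(), []
    for uri in list(roots) + list(global_hits):
        if uri not in seen:
            seen.add(uri)
            heapq.heappush(heap, (-cosine(query, vectors[uri]), uri))
    while heap and len(results) < limit:
        neg_score, uri = heapq.heappop(heap)          # highest similarity first
        kids = children.get(uri, [])
        if not kids:                                  # leaf: a candidate result
            results.append((uri, round(-neg_score, 3)))
            continue
        for kid in kids:                              # directory: expand children
            if kid not in seen:
                seen.add(kid)
                heapq.heappush(heap, (-cosine(query, vectors[kid]), kid))
    return results
```

The global hits matter because a relevant leaf under a poorly summarized directory would otherwise never be reached.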

Sessions are raw traces first, then archives, working-memory summaries, and extracted long-term memories. A session stores messages and tool/context usage, and commit() synchronously archives a message slice under history/archive_NNN/messages.jsonl, clears or retains the live tail, returns a task id, and launches Phase 2 in the background (openviking/session/session.py, docs/en/concepts/08-session.md). Phase 2 writes .abstract.md, .overview.md, .meta.json, relations to used contexts, active-count increments, and completion markers; if extraction is enabled, it concurrently runs user-memory and agent-memory extraction. The raw session archive is evidence; the working-memory overview and extracted memories are derived knowledge artifacts with stronger prompt-time authority.
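The two-phase commit can be sketched as follows. The history/archive_NNN/messages.jsonl layout follows the text. `commit`, `keep_tail`, the `phase2` callback, and the use of a thread for the background phase are assumptions for illustration, not the session.py implementation.

```python
import json
import threading
import uuid
from pathlib import Path


def commit(session_dir: Path, messages: list, keep_tail: int, phase2):
    """Phase 1 (synchronous): archive all but the last `keep_tail` messages.
    Phase 2 (background): hand the archive directory to `phase2`.

    Returns (task_id, remaining live messages, worker thread); the thread is
    returned only so callers can wait on it. All names here are hypothetical.
    """
    history = session_dir / "history"
    history.mkdir(parents=True, exist_ok=True)
    n = sum(1 for p in history.iterdir() if p.name.startswith("archive_")) + 1
    archive = history / f"archive_{n:03d}"
    archive.mkdir()
    cut = max(len(messages) - keep_tail, 0)       # keep_tail=0 clears the live session
    with (archive / "messages.jsonl").open("w", encoding="utf-8") as f:
        for msg in messages[:cut]:
            f.write(json.dumps(msg) + "\n")
    task_id = uuid.uuid4().hex
    worker = threading.Thread(target=phase2, args=(task_id, archive), daemon=True)
    worker.start()                                # summaries/extraction run later
    return task_id, messages[cut:], worker
```

The design point is that the raw archive is durable before any model call runs, so a failed Phase 2 still leaves the evidence on disk.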

Memory extraction is category-aware and has both vector and LLM deduplication. The extractor classifies candidates into four user categories (profile, preferences, entities, events) and four agent categories (cases, patterns, tools, skills) (openviking/session/memory_extractor.py). The compressor always merges profile-style categories and uses special merge logic for tool/skill memories. All other candidates go through a deduplicator that vector-prefilters similar memories and asks an LLM for a candidate-level skip/create/none decision plus merge/delete actions for each similar existing memory (openviking/session/compressor.py, openviking/session/memory_deduplicator.py). Memory diffs in archive directories give an audit surface, but semantic correctness still depends on the LLM extractor and dedup oracle.
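The routing and vector-prefilter step can be sketched like this. The category names follow the extractor list above. `route`, `dedup`, the `Decision` shape, the 0.85 threshold, and `top_k` are hypothetical. `oracle` stands in for the LLM call, whose prompt is not reproduced here.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Decision:
    action: str                                      # candidate-level: "skip", "create", or "none"
    merge_into: list = field(default_factory=list)   # existing memory ids to merge into
    delete: list = field(default_factory=list)       # existing memory ids to delete


ALWAYS_MERGE = {"profile"}        # profile-style categories (assumed names)
AGGREGATE = {"tools", "skills"}   # usage-aggregate categories (assumed names)


def route(category: str) -> str:
    """Pick the merge path for a candidate (hypothetical labels)."""
    if category in ALWAYS_MERGE:
        return "merge_profile"
    if category in AGGREGATE:
        return "aggregate"
    return "dedup"


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def dedup(candidate_vec, existing: dict, oracle, threshold=0.85, top_k=5) -> Decision:
    """existing: memory id -> vector. Only near neighbours reach the LLM oracle."""
    scored = sorted(((cosine(candidate_vec, v), mid) for mid, v in existing.items()),
                    reverse=True)
    similar = [mid for score, mid in scored[:top_k] if score >= threshold]
    if not similar:
        return Decision("create")   # nothing close: create without an LLM call
    return oracle(similar)          # LLM decides skip/create/none plus merge/delete
```

The vector step is deterministic and cheap. It only narrows what the LLM sees, so the final judgment, and any error in it, still belongs to the oracle.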

Tenancy is part of both filesystem and vector semantics. OpenViking's identity model has account, user, and agent boundaries; resources are shared within an account, user memories and sessions are user-scoped, and agent memories/skills can be agent-scoped or agent-per-user depending on namespace policy (docs/en/concepts/11-multi-tenant.md). The namespace helpers canonicalize roots, visible roots, and accessibility, while vector search and URI owner fields keep retrieval aligned with readable storage (openviking/core/namespace.py, openviking/storage/viking_vector_index_backend.py). This makes OpenViking closer to a shared context service than a single-agent local memory folder.
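The sketch below shows how one set of visible roots can govern both path access and vector-hit filtering. The root layout, the `owner_account` field name, and the `agent_per_user` flag are assumptions chosen to mirror the policy described above. They are not OpenViking's namespace code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    account: str
    user: str
    agent: str


def visible_roots(ident: Identity, agent_per_user: bool = False) -> list:
    # Hypothetical layout; OpenViking's real roots come from its namespace helpers.
    agent_root = f"viking://agent/{ident.agent}"
    if agent_per_user:
        agent_root += f"/{ident.user}"        # agent space partitioned per user
    return [
        "viking://resources",                   # shared within the account
        f"viking://user/{ident.user}",          # user-scoped memories
        agent_root,                             # agent memories and skills
    ]


def can_read(uri: str, ident: Identity, agent_per_user: bool = False) -> bool:
    return any(uri == r or uri.startswith(r + "/")
               for r in visible_roots(ident, agent_per_user))


def filter_hits(hits: list, ident: Identity, agent_per_user: bool = False) -> list:
    # The same policy filters vector hits, so ranking never surfaces a URI the
    # caller could not read; the account check stands in for owner fields.
    return [h for h in hits
            if h["owner_account"] == ident.account
            and can_read(h["uri"], ident, agent_per_user)]
```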

The behavior-shaping surfaces are broad: HTTP, MCP, CLI, plugins, skills, and bot runtime. The REST server exposes filesystem, resources, skills, sessions, search, observer, admin, metrics, privacy, and content routes; the MCP endpoint exposes find, search, read, list, remember, add_resource, grep, glob, forget, and health with the same request identity path (openviking/server/mcp_endpoint.py, openviking/server/routers). The Codex, Claude Code, and OpenCode plugins add automatic recall, session capture, lifecycle commit, and direct tool access around those APIs (examples/codex-memory-plugin/README.md, examples/claude-code-memory-plugin/README.md, examples/opencode-plugin/README.md). Skills under viking://agent/skills/ are system-definition artifacts when agents consume them as callable workflow instructions.

Benchmarks exist as product evaluation harnesses, not as authority gates on memory promotion. The repository includes RAG evaluation for Locomo, SyllabusQA, Qasper, and FinanceBench; separate Locomo comparisons for Claude Code, mem0, OpenClaw, supermemory, and Vikingbot; and tau2, VAKA, skillsbench, and contention benchmarks (benchmark/RAG/README.md, benchmark). These test retrieval and agent-memory performance, but the reviewed core does not use benchmark outcomes as an automatic gate for whether a memory, overview, skill, or retrieval rule is promoted.

Comparison with Our System

| Lens axis | OpenViking | Commonplace |
|---|---|---|
| Primary substrate | Service-backed viking:// namespace over AGFS/RAGFS plus vector projections | Git-tracked markdown library, instructions, schemas, generated indexes, and reports |
| Main unit | Resource, memory, skill, session directory with L0/L1/L2 sidecars | Typed note, source, instruction, ADR, review, index, or generated report |
| Retrieval | Vector/hybrid search, hierarchical directory recursion, rerank, MCP/HTTP/plugin tools | Lexical search, authored links, generated indexes, validation, review workflows |
| Trace-derived loop | Session messages and tool use are committed into archives, summaries, memories, and diffs | Mostly curated artifact production; trace-derived systems are reviewed, not core default behavior |
| Authority boundary | Memory/skill/resource retrieval can be injected automatically by plugins; skills and tools can instruct agents | Stronger distinction among notes, instructions, type specs, validators, commands, and review gates |
| Tenancy | Account/user/agent identity shapes storage and retrieval | Single-repo methodology KB, with project separation outside the core library |
| Filesystem metaphor | API-shaped virtual filesystem backed by service storage | Actual files in git, directly inspectable and diffable |

OpenViking is stronger at serving many agents and users from one context service. It has tenant-aware request identity, a server-side filesystem abstraction, automatic vector projections, async semantic queues, lifecycle hooks for multiple agent shells, and built-in APIs that make memory available before the model asks for it. Commonplace is intentionally less service-like: the repo itself is the substrate, and the main affordances are inspectable files, deterministic validation, review artifacts, git history, and agent procedures.

The most important comparison is authority. In OpenViking, a generated .abstract.md, .overview.md, memory file, retrieved context, or auto-injected plugin output can shape future model behavior quickly. That is useful for agent UX, but it makes provenance and curation harder: a future agent receives ranked memories and summaries whose source session, extraction prompt, dedup decision, and benchmark validity may not be visible in the immediate context. Commonplace is slower but makes promotion into higher-authority artifacts explicit.

The second distinction is the filesystem metaphor. OpenViking borrows filesystem operations to organize context, and the Rust RAGFS trait really implements create, read, write, list, rename, remove, stat, chmod, truncate, and grep-style operations (crates/ragfs/src/core/filesystem.rs). But viking:// is still a mediated namespace whose canonical behavior depends on server identity, AGFS backend, vector sync, semantic queues, and API policy. Commonplace's files are less dynamic but easier to audit with ordinary editor, shell, and git tools.

Read-back: both. Agents can search and read through MCP or HTTP tools, while plugins can auto-recall and inject memories before later turns.

Borrowable Ideas

L0/L1/L2 sidecars as a directory contract. Ready to borrow where commonplace eventually stores large source trees, traces, or workshops. The useful idea is not the exact filenames; it is the consistent split between cheap abstract, navigational overview, and original detail, with clear lineage back to the source directory.
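If commonplace adopted the contract, a validator could enforce it. The following is a hypothetical check that reuses OpenViking's sidecar filenames for concreteness. It treats every non-hidden directory as needing both an L0 and an L1 file, and treats everything else as L2.

```python
from pathlib import Path

SIDECARS = (".abstract.md", ".overview.md")   # L0 and L1; everything else is L2


def missing_sidecars(root: Path) -> list:
    """Return (relative dir, missing sidecar names) for each non-compliant directory.

    Hidden directories such as .git are skipped (an assumption of this sketch).
    """
    dirs = [root] + sorted(
        p for p in root.rglob("*")
        if p.is_dir() and not any(part.startswith(".") for part in p.relative_to(root).parts)
    )
    problems = []
    for d in dirs:
        missing = [name for name in SIDECARS if not (d / name).is_file()]
        if missing:
            problems.append((d.relative_to(root).as_posix(), missing))
    return problems
```

Because sidecars sit beside the files they summarize, lineage back to the source directory comes from position rather than from a separate citation field.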

Hierarchical retrieval over directory summaries. Worth borrowing only when lexical search and authored indexes stop being enough. OpenViking's global-start plus recursive search model is a good answer to "where should the agent drill next?" for large imported corpora.

Lifecycle-aware commit boundaries. Ready to borrow as a design pattern for agent traces. The Codex and Claude plugins recognize that memory extraction should happen at compaction, session end, startup orphan recovery, or token thresholds, not necessarily every turn. Commonplace workshop traces would need similar lifecycle boundaries if they become first-class evidence.
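The following is a hypothetical trigger policy that captures those boundaries. The event names and the 8,000-token default are illustrative assumptions. The plugins' actual thresholds and hook names are documented in their own READMEs.

```python
from enum import Enum


class Event(Enum):
    # Hypothetical lifecycle events a plugin might observe.
    TURN = "turn"
    COMPACTION = "compaction"
    SESSION_END = "session_end"
    STARTUP = "startup"


def should_commit(event: Event, pending_tokens: int, has_orphan: bool,
                  token_threshold: int = 8000) -> bool:
    """Decide whether to commit captured turns (assumed policy, not the plugins')."""
    if event in (Event.COMPACTION, Event.SESSION_END):
        return pending_tokens > 0                 # flush before context is lost
    if event is Event.STARTUP:
        return has_orphan                         # recover a session that never committed
    return pending_tokens >= token_threshold      # ordinary turns commit only past a budget
```

The asymmetry is deliberate: compaction and session end are the points where uncommitted context would otherwise be lost. Ordinary turns only commit once enough material has accumulated to be worth an extraction pass.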

Memory diffs as archive-local audit artifacts. Useful for trace-derived extraction. OpenViking records adds, updates, deletes, and summaries under the archive that caused them. A commonplace analogue would keep extraction reports next to the source trace and link any promoted note or instruction back to that report.
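Here is a minimal sketch of an archive-local diff record. It assumes memories can be snapshotted as URI-to-text maps before and after a commit. The JSON field names are hypothetical and do not reproduce OpenViking's memory_diff.json schema.

```python
import json
from pathlib import Path


def write_memory_diff(archive_dir: Path, session_id: str,
                      before: dict, after: dict) -> dict:
    """Compare memory URI -> text maps around one commit and store the change set
    next to the archive that caused it (hypothetical field names)."""
    diff = {
        "session_id": session_id,
        "archive": archive_dir.name,
        "added": sorted(after.keys() - before.keys()),
        "updated": sorted(k for k in after.keys() & before.keys() if after[k] != before[k]),
        "deleted": sorted(before.keys() - after.keys()),
    }
    (archive_dir / "memory_diff.json").write_text(json.dumps(diff, indent=2),
                                                  encoding="utf-8")
    return diff
```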

Tenant-aware context roots. Not needed for this repo today, but relevant for a future hosted commonplace-like system. OpenViking's account/user/agent separation cleanly distinguishes shared resources, user memories, and agent skills.

Do not borrow automatic memory authority without stronger review gates. OpenViking's automation is valuable for product memory, but commonplace should not let an LLM-extracted memory become instruction-like merely because it was retrieved. Promotion into notes, instructions, schemas, skills, or validators should remain visible and reviewable.

Trace-derived learning placement

Trace source. OpenViking qualifies as trace-derived learning. The raw traces are agent/user session messages, assistant turns, context references, tool calls, skill usage, plugin-captured transcripts, and lifecycle events. Server sessions store structured messages and tool parts; plugins such as Codex and Claude Code capture turns and commit at compaction/session boundaries (openviking/session/session.py, examples/codex-memory-plugin/README.md, examples/claude-code-memory-plugin/README.md).

Extraction. Commit Phase 1 archives the raw message slice. Phase 2 generates a working-memory summary, extracts candidate memories, classifies them into eight categories, vector-prefilters similar memories, and uses an LLM dedup oracle for skip/create/merge/delete decisions. Tool and skill usage get special aggregate merge paths. The extraction oracle is therefore mixed: deterministic message boundaries, category code, vector nearest-neighbor filtering, and LLM judgment.

Storage substrate. Raw session traces live under viking://session/{session_id}/messages.jsonl and history/archive_NNN/messages.jsonl. Derived summaries and completion metadata live as .abstract.md, .overview.md, .meta.json, .done, .failed.json, and memory_diff.json in archive directories. Long-term memories live under canonical user and agent memory paths in AGFS/RAGFS, with vector rows in the context collection. Plugin state also lives in local runtime files outside the server, such as Codex/Claude/OpenCode plugin state and logs.

Representational form. Raw traces are mixed symbolic/prose artifacts: roles, message IDs, timestamps, text, tool input/output, context URIs, and usage records. L0/L1 summaries and memory bodies are prose. URI paths, categories, namespace policies, task records, relations, vector metadata, queues, and API schemas are symbolic. Dense and sparse vectors, rerank scores, active counts, and hotness signals are distributed-parametric or numeric operational state. Skills combine prose instructions with scripts and tool contracts.

Lineage. The strongest lineage is local: a memory extraction run knows the source session id and archive URI, and memory_diff.json records memory changes from that commit. Vector rows retain URI, level, owner, and timestamps, and VikingFS updates or deletes index rows when files move or disappear. The weaker lineage is semantic: extracted memory files do not appear to carry a mandatory source-span citation schema that lets a later agent reconstruct every claim without reading archive diffs and traces.

Behavioral authority. Raw sessions, resources, archive overviews, and extracted memories are knowledge artifacts when consumed as evidence, context, or advice. Vector rows and hierarchical retrieval have ranking and routing influence. MCP tools, REST endpoints, plugins, hooks, namespace policy, permissions, skills, and CLI commands are system-definition artifacts because they route, authorize, inject, delete, commit, or instruct behavior. A memory becomes more than advice when a plugin injects it automatically before a turn, but it still lacks the explicit enforcement force of code, schema, or access policy.

Scope. The learning scope is tenant/user/agent/session scoped. Resources can be shared at account level, user memories are per-user, agent memories and skills are per-agent or per-agent-per-user depending on namespace policy, and session archives are per session.

Timing. Capture happens online during agent use. Commit can happen at explicit API calls, compaction, stop/session-end hooks, startup orphan recovery, or token thresholds. Summary and memory extraction run asynchronously after archive creation. Retrieval and recall happen online before or during future turns.

Survey placement. On the trace-derived learning survey, OpenViking belongs with service-backed trace-to-memory systems rather than harness-patch systems. It strengthens the survey claim that useful trace-derived learning requires separating raw traces, distilled artifacts, storage/index surfaces, and activation channels. It also shows a high-authority injection path: plugin auto-recall can make LLM-extracted memories behavior-shaping even without promoting them into code or validated instructions.

Curiosity Pass

OpenViking's design is most persuasive when treated as a context service, not as "a filesystem for agents" in the literal local-file sense. The filesystem API gives deterministic names and familiar operations, but the behavior that matters comes from service identity, async semantic processing, vector/rerank indexes, and plugins that decide when memories enter the model context.

The L0/L1/L2 scheme is a good context-engineering primitive, but it shifts trust into generated sidecars. If the overview is stale or overconfident, hierarchical retrieval can confidently lead the agent to the wrong place. The architecture has queues and regeneration hooks, but the reviewable contract for "this overview faithfully represents L2" is weaker than a hand-authored note.

The memory extractor is unusually concrete for a product repo: category schema, special tool/skill handling, batch-internal dedup, merge/delete decisions, archive diffs, and async task tracking are all implemented. The tradeoff is that memory quality depends on several model calls whose prompts and decisions are not naturally visible to the downstream agent.

The tenancy model is operationally important. Many memory systems claim personalization; OpenViking makes account/user/agent boundaries affect both path canonicalization and vector filtering. That is a stronger product property than just putting user_id in metadata.

The benchmark folders are useful evidence that the maintainers care about retrieval and memory evaluation, but they should not be confused with in-product governance. A bad extracted memory can still be written unless the extraction/dedup path rejects it; benchmark success does not give an individual memory provenance or review status.

What to Watch

  • Whether OpenViking adds mandatory source citations from extracted memory claims back to archive message ids, tool ids, or spans.
  • Whether memory diffs become first-class rollback/review surfaces in the API, not just archive-local JSON artifacts.
  • Whether auto-injected plugin memories gain confidence, source, recency, and authority annotations visible to the model.
  • Whether viking:// becomes mountable or remains mostly HTTP/MCP/CLI-mediated.
  • Whether benchmark harnesses become promotion gates for retrieval and extraction changes.
  • Whether skill storage grows into a reviewed system-definition layer with versioning, tests, and permissioned promotion.

Bottom Line

OpenViking is a substantial service-backed context database: viking:// gives agents a unified namespace, AGFS/RAGFS stores raw and derived files, vector indexes provide ranking and navigation, sessions become archives and extracted memories, and plugins activate recall and capture in real agent runtimes. Its best idea for commonplace is the explicit separation of raw sessions/resources, derived abstracts/overviews/memories, storage substrate, index/runtime surfaces, and behavior-shaping tools. Its main caution is the same separation viewed from the other side: generated memories and summaries can shape behavior before they have the review status, lineage, or authority contract that commonplace expects from durable system-definition artifacts.

Relevant Notes: