Virtual Context
Type: ../types/agent-memory-system-review.md · Status: current · Tags: trace-derived
Virtual Context, by Y. Ahmed Kidwai, is a Python proxy and engine that virtualizes LLM context windows by rewriting provider requests, compacting old conversation/tool traces into durable memory, and paging selected memory back into later prompts. The inspected source supports trace-derived placement: raw conversations and tool outputs persist as canonical turns, request captures, chain snapshots, and tool-output rows; compaction turns those traces into segment summaries, tag/topic summaries, structured facts, fact links, embeddings, and paging hints that shape later model behavior.
Repository: https://github.com/virtual-context/virtual-context
Reviewed commit: 3f989ca2567359094e4c899697ddca21c1663f4c
Last checked: 2026-05-16
Core Ideas
The proxy owns the active context window, not just retrieval. The documented request path detects Anthropic, OpenAI Chat, OpenAI Responses, and Gemini payloads, strips envelope noise, ingests history, calls on_message_inbound, injects a <virtual-context> block, forwards the request upstream, and runs on_turn_complete afterward (architecture, proxy lifecycle). The code follows that shape: prepare_payload filters body messages, stubs tool/media payloads, injects retrieved context, injects VC tools when paging is enabled, enforces budget, and sends either the original or enriched body upstream (proxy server, format abstraction, proxy helpers).
The engine separates inbound retrieval from post-turn distillation. VirtualContextEngine wires a store, alias resolution, tag generator, monitor, segmenter, assembler, retriever, compactor, tag splitter, paging manager, semantic search, fact query layer, temporal resolver, and telemetry ledger (engine wiring). The public entry points are narrow: on_message_inbound delegates to retrieval/assembly, while on_turn_complete tags the turn and compacts if needed (engine entry points). This is the core design difference from additive RAG: old prompt material is also removed, summarized, indexed, and re-addressed.
Retrieval is topic-shaped and budgeted. The retriever tags inbound messages against conversation-scoped vocabulary, skips active tags already present in recent history, scores candidates through IDF/tag, text, and embedding signals, fetches summaries, adds alias ride-alongs, falls back to FTS, and prefetches facts by relevant tags when configured (retriever, retrieval scoring). The assembler then constructs a bounded pool of tag sections and fact lines, with working-set depth levels for summary, segment, or full-text expansion (assembler, retrieval assembler).
Compaction produces several retained artifact families from traces. The compaction pipeline loads uncompacted canonical turns outside the protected window, segments them, summarizes them, stores segment records, marks canonical turns compacted, and builds per-tag rollups with source segment and canonical-turn lineage (compaction pipeline). The compactor prompts preserve decisions, action items, entities, dates, exact numbers, code references, and facts from raw conversation text; code-specific prompts suppress investigative chatter and preserve resulting artifact state (compactor).
Storage is service-owned but schema-rich. SQLite is the default store, with PostgreSQL for multi-worker deployments, graph stores for facts and links, and a filesystem store for markdown-like segment files (README, SQLite store, Postgres store, filesystem store). Core tables include segments, segment_tags, tag_aliases, tag_summaries, engine_state, canonical_turns, ingest_batches, conversations, ingestion_episode, compaction_operation, facts, fact_tags, fact_links, tool_outputs, request_captures, tag_summary_embeddings, tool_calls, request_context, turn_tool_outputs, segment_tool_outputs, chain_snapshots, and media_outputs. That makes the storage substrate inspectable through SQL and dashboards, but not directly editable as authored KB files.
Tool-output compaction is first-class memory, not silent truncation. The proxy stubs large tool results before compaction and collapses full tool chains after compaction, stores raw outputs and chain snapshots, links them back to turns and segments, and injects vc_restore_tool or vc_find_quote when the model needs raw evidence (proxy tool stubbing, message filter, restore runtime). Assembler hints also mark segment summaries with linked tool names and restorable output counts (assembler tool hints).
Paging is exposed as model tools and MCP tools. The proxy can inject vc_expand_topic, vc_find_quote, vc_search_summaries, vc_query_facts, vc_recall_all, vc_remember_when, and vc_restore_tool, with the tool loop executing calls against the engine and suppressing repeated unproductive searches (tool definitions). The MCP server exposes the same engine through recall_context, compact_context, expand_topic, recall_all, remember_when, find_quote, search_summaries, and status resources (MCP server). These tools are system-definition artifacts when inserted into the model's tool surface because they route which retained artifacts can enter the next reasoning step.
Operator surfaces are broad for a young project. The package exposes a virtual-context CLI with status, tags, recall, compact, retrieve, transform, aliases, chat TUI, proxy, presets, onboard, daemon install/start/stop/status, import, config validation, and admin backfill commands (CLI, import command). The TUI runs an interactive chat around the same engine, with budget bar, tag panel, turn list, manual compaction, replay, turn inspection, and session save (TUI app). The proxy dashboard exposes request logs, request inspectors, compaction events, active tags, sessions, cost/latency views, replay, settings, tool-call history, and recall context records (dashboard docs, dashboard implementation).
Benchmark and regression support is unusually dense. The repo documents Locomo, LongMemEval, MRCR, AMB, and stress tests, and ships benchmark runners/providers under benchmarks/ (benchmark docs, benchmarks tree). The test tree has 180+ top-level test files covering proxy formats, compaction lifecycle, canonical turns, storage backends, fact links, retrieval, paging, tool output interception, TUI, and benchmark support; tests/REGRESSION_MAP.md maps production bugs to regression tests (tests tree, regression map).
Comparison with Our System
| Dimension | Virtual Context | Commonplace |
|---|---|---|
| Primary problem | Keep long live LLM sessions usable by virtualizing the context window | Preserve methodology and operational knowledge as reviewed KB artifacts |
| Storage substrate | SQLite/Postgres/graph stores, optional filesystem segment store, request captures, proxy/dashboard state | Git-tracked markdown, YAML frontmatter, source snapshots, validation reports, scripts |
| Raw trace artifacts | Canonical turns, raw user/assistant content, request captures, tool outputs, chain snapshots, media records | Source snapshots, logs, work artifacts, git history |
| Compacted knowledge artifacts | Segment summaries, tag summaries, facts, fact links, quote/search results, dashboard views | Notes, reviews, source summaries, indexes |
| System-definition artifacts | Proxy rewrite code, provider adapters, retrieval rankers, budget rules, compaction leases, tool definitions, MCP tools, daemon config | AGENTS.md, collection contracts, schemas, skills, validation/review commands |
| Activation | Automatic proxy injection, model tool calls, MCP, TUI, dashboard replay, CLI | Agent search/navigation, authored links, skills, validation/review workflows |
| Governance | SQL schema, lifecycle epochs, leases, request inspectors, regression tests, dashboard visibility | File diffs, review archives, explicit statuses, authored citations, validation gates |
Virtual Context is the strongest reviewed example of managing the context window itself. It is not merely a memory database behind a chat app. It intercepts the request, decides what old material to drop or stub, injects retrieved summaries/facts/tool handles, and gives the model tools to page raw material back in. Commonplace is slower and more editorial: it promotes durable knowledge into reviewed files with citation and link contracts rather than into a live proxy's operational store.
The artifact-authority split is the key comparison. In Virtual Context, raw canonical turns, request captures, tool outputs, chain snapshots, segment summaries, tag summaries, facts, fact links, and dashboard views are knowledge artifacts when consumed as evidence, context, explanation, or advice. Retrieval scoring, active-tag skipping, budget enforcement, compaction leases, tag splitters, fact supersession, provider adapters, proxy injection, MCP tools, daemon config, and model tool definitions are system-definition artifacts because they route, rank, enforce, or instruct future behavior.
The tradeoff is inspectability. Virtual Context has better runtime observability than most systems here: schemas, request inspectors, saved contexts, tool-call history, test coverage, progress snapshots, and admin backfills. But its canonical operational state is still service-owned. A generated fact can affect future prompts without becoming a reviewed claim in a collection. Commonplace gives up automatic prompt-time continuity to make durable claims easier to read, diff, cite, supersede, and govern.
Read-back: both — the proxy injects retrieved summaries, facts, and tool handles automatically, while agents can page memory through VC and MCP tools.
Borrowable Ideas
Use proxy-level context ownership for live agents. Commonplace should not become a provider proxy, but consumer projects that run long coding or assistant sessions could use a thin proxy/hook layer to activate KB context before the model repeats work. Ready as a deployment pattern where the proxy remains transparent and reversible.
Treat tool-output compaction as pointer-bearing compression. This is immediately borrowable for runtime agents: replace bulky tool outputs with visible stubs, store the raw output, and provide an explicit restore/search path. Silent truncation destroys lineage; Virtual Context's restore references keep hidden evidence addressable.
Separate raw turns, summaries, facts, topic memory, embeddings, and control rules. The repo is a useful taxonomy reference because it implements all of these surfaces separately. Commonplace can borrow the separation without adopting the database substrate.
Expose paging operations as distinct tools. find_quote, search_summaries, query_facts, remember_when, expand_topic, and restore_tool are not the same operation. Commonplace search tools would benefit from making those retrieval intentions explicit instead of offering one generic search command.
Use operation leases for background memory mutation. The ingestion_episode and compaction_operation tables, lifecycle epochs, ownership guards, heartbeats, and takeover tests are practical governance for multi-worker memory maintenance. This is more relevant to a future service layer than to the current file KB.
Do not borrow service opacity for curated methodology. Virtual Context's database is appropriate for high-volume runtime traces. It is a poor canonical substrate for commonplace's long-lived methodology claims unless paired with reviewed, file-visible promotions.
Trace-derived learning placement
Trace source. Virtual Context consumes client conversation history, live user/assistant turns, provider-specific tool calls and tool results, media blocks, request payloads, imported conversation exports, MCP-supplied message lists, dashboard replay prompts, and benchmark conversations. Canonical turns store normalized and raw user/assistant content with hashes, tags, fact signals, code refs, source batches, timestamps, and compaction state (canonical turns schema, ingest reconciler, import adapters).
Extraction. Extraction is staged. Inbound tagging uses existing conversation vocabulary and embeddings or LLM tags for retrieval. Post-turn tagging persists richer tags and fact signals. Compaction summarizes selected canonical turns into segments, extracts facts from raw conversation text, replaces per-segment facts atomically, builds tag rollups, stores embeddings, and marks canonical turns compacted. Supersession and fact links add contradiction/relationship structure rather than leaving facts as flat notes (tagging pipeline, compactor, fact query, fact link checker).
Storage substrate. Raw traces persist in canonical_turns, request_captures, tool_outputs, chain_snapshots, media_outputs, and import/ingest batch records. Distilled artifacts persist in segments, segment_tags, tag_summaries, facts, fact_tags, fact_links, tag_summary_embeddings, segment_chunks, canonical_turn_chunks, and optional graph stores. Operational state persists in engine_state, conversations, conversation_lifecycle, conversation_aliases, ingestion_episode, compaction_operation, tool_calls, request_context, and session-state providers. The default substrate is SQLite WAL; Postgres mirrors the same store shape for multi-worker/cloud deployments.
Representational form. Raw turns, tool outputs, summaries, tag summaries, quote excerpts, and fact what strings are prose. Segments, facts, fact links, tag aliases, canonical IDs, lifecycle epochs, request context, tool-call records, provider adapters, MCP tools, and config are symbolic. Embeddings and chunk vectors are distributed-parametric retrieval state. Many rows are mixed: a fact row is prose evidence plus symbolic filters and supersession fields; a tag summary is prose plus source refs, token counts, embeddings, and coverage markers.
Lineage. Lineage is stronger than in a typical vector-memory sidecar. Segment metadata carries turn ranges and canonical-turn IDs; tag summaries carry source segment refs, source turn numbers, source canonical-turn IDs, coverage through a turn, and generating turn IDs; canonical turns carry hashes and source batch IDs; tool outputs and chain snapshots carry restore refs linked to turns and segments. The weak point is epistemic review: the system can preserve where a generated fact came from, but it does not require a human/agent review state before that fact or summary influences future prompts.
Behavioral authority. Raw traces, request captures, tool outputs, media originals, segment summaries, tag summaries, facts, fact links, quote search, and dashboard views are knowledge artifacts when they advise the model. The proxy rewrite path, provider adapters, retrieval scores, active-tag skipping, budget enforcement, tool stubbing, vc_* tool definitions, MCP tools, compaction thresholds, lifecycle leases, tag splitters, supersession rules, daemon installation, and config presets are system-definition artifacts. Some artifacts cross the boundary: a tag summary advises as prose, but its embedding and source coverage also rank future context selection.
Scope. Memory is conversation-scoped by default, with aliases and VCATTACH-style conversation attachment for shared continuity. Tags, facts, summaries, canonical rows, request context, and tool outputs are keyed by conversation ID and sometimes tenant ID. This is session/project memory, not a global curated knowledge base.
Timing. Activation is online on every proxied request. Tagging and retrieval are on the inbound path; response tagging and compaction run after a turn, often in background workers; admin backfills rebuild tag summaries or session markers later; benchmarks and imports exercise the same engine offline. This makes it a live-session trace-to-working-memory system rather than a periodic archive compiler.
Survey placement. On the trace-derived learning survey, Virtual Context strengthens the "raw trace plus derived working memory" axis and splits it into more layers than most systems: raw canonical trace, compacted segment, tag/topic summary, structured fact, linked fact graph, vector/FTS index, and model-visible paging tool. It weakens any simple "trace-derived memory equals notes" framing: here the behavior-changing artifact is mostly a proxy/runtime control plane plus database-backed memory, not reviewed prose files.
Curiosity Pass
The "virtual memory" analogy is structurally real but should be read as selective reconstruction. The client can pretend to have a huge context window, but the model receives a curated bounded prompt plus paging tools. Success depends on the quality of tagging, summaries, fact extraction, restore paths, and scoring, not on literal addressability of every prior token.
The system is more governed operationally than epistemically. It has lifecycle epochs, leases, heartbeats, dashboard inspectors, request captures, SQL schemas, and many regression tests. It does not have a commonplace-style review contract for extracted facts, summaries, or tag rollups before they gain prompt-time influence.
Provider compatibility is part of the memory architecture. Anthropic/OpenAI/Gemini payload differences, tool-call schemas, streaming behavior, prompt-cache economics, and client truncation are not integration details; they determine which traces can be preserved and restored.
The benchmark story is credible infrastructure, not independent proof. The benchmark directories and docs are real, but this review did not run Locomo, LongMemEval, MRCR, AMB, or stress suites. Claims about headline accuracy should be treated as project-reported until independently reproduced.
The filesystem backend is not the center of gravity. It exists and stores markdown-like segment files, but the serious implementation energy is in SQLite/Postgres operational state, canonical rows, background compaction, and proxy/dashboard flows.
What to Watch
- Whether generated facts and tag summaries gain explicit review/confidence states before prompt-time activation.
- Whether independent benchmark runs reproduce the reported gains across provider formats and agent-tool-heavy sessions.
- Whether VCATTACH/conversation aliasing becomes a robust collaboration model or remains mostly session-continuity plumbing.
- Whether graph backends become central to retrieval or stay a parallel fact-link option beside SQLite/Postgres.
- Whether prompt-cache-aware deferral, fill pass, and budget enforcement remain stable as provider caching and tool schemas change.
- Whether dashboard and MCP responses expose enough lineage for an agent to debug bad retrieved context without direct SQL inspection.
Relevant Notes:
- Trace-derived learning techniques in related systems - places: Virtual Context is a live-session trace-to-working-memory system with raw turns, summaries, facts, topic memories, embeddings, and runtime paging tools.
- Knowledge artifact - classifies: canonical turns, tool outputs, request captures, segment summaries, tag summaries, facts, fact links, and quote results advise later behavior as evidence or context.
- System-definition artifact - classifies: proxy rewriting, provider adapters, retrieval/ranking, compaction leases, paging tools, MCP tools, daemon config, and budget rules route or enforce behavior.
- Lineage - clarifies: canonical-turn IDs, hashes, source batches, segment refs, coverage markers, chain refs, and request captures are the system's derivation controls.
- files beat a database for agent-operated knowledge bases - contrasts: Virtual Context is a serious database-backed runtime memory system, but its strongest state is operational rather than editorially curated.