DocMason
Type: agent-memory-system-review · Status: current · Tags: related-systems, trace-derived
DocMason is a Python-first, file-only, agent-native workspace for deep research over private office documents, emails, and text corpora. The repository itself is the application: original_doc/ holds the live private corpus, knowledge_base/staging/ and knowledge_base/current/ are the staged and published evidence surfaces, runtime/ holds execution and audit state, and a host agent (native Codex on macOS, or Claude Code and Copilot as compatible hosts) is treated as the runtime. The codebase in src/docmason/ is substantial (roughly 50k lines across ~40 modules, including sizeable knowledge.py, retrieval.py, ask.py, evidence_artifacts.py, interaction.py, and evaluation.py files) and implements source parsing, staged-then-published knowledge-base builds, deterministic retrieval and trace, host-thread reconciliation, and a typed interaction-memory promotion path.
Repository: https://github.com/JetXu-LLM/DocMason
Core Ideas
The repo is the app; canonical ask is the governed front door. AGENTS.md, .claude/CLAUDE.md, and skills/canonical/ define the executable contract inside the repository. The ordinary natural-language path is a single canonical skill at skills/canonical/ask/SKILL.md; operator paths like workspace-bootstrap, workspace-doctor, workspace-status, knowledge-base-sync, runtime-log-review, and adapter-sync are explicit alternates. What makes this more than branding is the Canonical Ask Contract in that skill: the host must call a hidden ./.venv/bin/python -m docmason _ask wrapper with open, progress, and finalize actions; the turn is only legal once the wrapper returns stable conversation_id, turn_id, run_id, answer_file_path, and log_context, and the front_door_state is upgraded to canonical-ask. Evidence commands (retrieve, trace) are legal only when the host exports the returned log_context as DOCMASON_* environment variables before calling them. A final business answer is allowed only when the wrapper returns completed or boundary; execute, awaiting-confirmation, waiting-shared-job, and blocked all forbid it.
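The gating rule in the last two sentences can be sketched as a fail-closed check. This is a minimal illustration, not DocMason's code: the status names and the five identity fields come from the contract as described above, while the status key, the function names, and the dict shape of the wrapper result are assumptions.

```python
# Hypothetical sketch of the Canonical Ask answer gate; names and dict shape are assumptions.
FINAL_ANSWER_ALLOWED = frozenset({"completed", "boundary"})
FINAL_ANSWER_FORBIDDEN = frozenset({"execute", "awaiting-confirmation", "waiting-shared-job", "blocked"})
REQUIRED_TURN_FIELDS = ("conversation_id", "turn_id", "run_id", "answer_file_path", "log_context")


def turn_is_legal(wrapper_result: dict) -> bool:
    """A turn counts only when every identity field is present and the front door was upgraded."""
    has_fields = all(wrapper_result.get(name) for name in REQUIRED_TURN_FIELDS)
    return has_fields and wrapper_result.get("front_door_state") == "canonical-ask"


def may_emit_final_answer(wrapper_result: dict) -> bool:
    """Fail closed: forbidden and unrecognized statuses both block a final business answer."""
    status = wrapper_result.get("status")
    if status in FINAL_ANSWER_FORBIDDEN or status not in FINAL_ANSWER_ALLOWED:
        return False
    return turn_is_legal(wrapper_result)
```

Treating unknown statuses like forbidden ones is a choice made by the sketch: a host that sees a status it does not recognize should not guess its way to an answer.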
Published truth is a separate mechanism, not a convention. The architectural spine is the boundary between original_doc/, knowledge_base/staging/, and knowledge_base/current/. docmason sync stages source bundles per document, runs validation, and publishes a single current root; ordinary ask answers read from current/, not from raw files or staged state. src/docmason/admissibility.py hard-codes knowledge_base/staging/ and knowledge_base/.staging-build/ as ILLEGAL_WORK_AREA_MARKERS that the commit barrier rejects in answer text. The sync path now also includes hybrid_enrichment modes (not-needed, covered, candidate-prepared, partially-covered) and a lane_b_follow_up.work_path packet that routes bounded hard-artifact follow-up through the same sync loop rather than letting the agent improvise raw-source fallback.
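The commit-barrier idea reduces to a plain text scan. A minimal sketch, assuming the two marker strings named above are matched as substrings of the answer text; the function names and the backslash normalization (so Windows-style paths are caught too) are illustrative additions, not confirmed DocMason behavior.

```python
# Marker strings as named in the text; the matching strategy is an assumption.
ILLEGAL_WORK_AREA_MARKERS = ("knowledge_base/staging/", "knowledge_base/.staging-build/")


def illegal_work_area_hits(answer_text: str) -> list[str]:
    """Return every illegal work-area marker that appears in the answer text."""
    normalized = answer_text.replace("\\", "/")  # treat Windows-style separators like POSIX ones
    return [marker for marker in ILLEGAL_WORK_AREA_MARKERS if marker in normalized]


def passes_commit_barrier(answer_text: str) -> bool:
    """An answer may commit only if it names no staging or build work area."""
    return not illegal_work_area_hits(answer_text)
```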
Multimodal evidence is compiled into explicit channels with render inspection as a gate. Source builders stage PDF, PPTX/PPT, DOCX/DOC, XLSX/XLS, email, markdown, and text inputs into source_manifest.json, evidence_manifest.json, extracted text_asset/structure_asset pairs per unit, rendered assets (PDF via PyMuPDF/pypdfium2, Office via a LibreOffice soffice shim), embedded media copies, and semantic_overlays/ sidecars. Affordances declare the supported channels — text, render, structure, notes, media — and the retrieval stack scores across all of them. skills/canonical/provenance-trace/SKILL.md tells the agent to inspect the rendered asset when a trace reports render_inspection_required or focus_render_assets, while admissibility.py enforces related constraints around illegal work-area paths, source scope, and support-state consistency. Evidence is not just ranked; it is typed and some channels are gated.
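One way to picture typed, gated channels from the agent's side: a trace record declares which channels support a unit, and a render flag overrides the cheapest-channel choice. This is a hypothetical sketch; the channel names and the render_inspection_required and focus_render_assets fields come from the text, while the record shape, the cost ordering, the asset path strings, and the function name are assumptions.

```python
from dataclasses import dataclass, field

# Cheapest-first ordering is an assumption made for illustration.
CHANNEL_COST_ORDER = ("text", "structure", "notes", "media", "render")


@dataclass
class TraceRecord:
    unit_id: str
    channels: set[str]
    render_inspection_required: bool = False
    focus_render_assets: list[str] = field(default_factory=list)


def next_evidence(record: TraceRecord) -> tuple[str, list[str]]:
    """Return (channel, assets) the agent should open next for this unit."""
    unknown = record.channels - set(CHANNEL_COST_ORDER)
    if unknown:
        raise ValueError(f"undeclared channels: {sorted(unknown)}")
    if record.render_inspection_required or record.focus_render_assets:
        # Escalation: the trace says extracted text is not enough, so open the rendered asset.
        return "render", record.focus_render_assets or [f"{record.unit_id}/render"]
    for channel in CHANNEL_COST_ORDER:
        if channel in record.channels:
            return channel, [f"{record.unit_id}/{channel}"]
    raise ValueError(f"{record.unit_id} declares no channels")
```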
Source-scope policy narrows what an answer may cite. src/docmason/truth_boundary.py parses each question for COMPARE_HINT_PATTERN and SINGLE_SOURCE_HINT_PATTERN, consults the reference-resolution block, and builds a persisted source-scope policy with modes source-scoped-soft, source-scoped-hard, and compare. Trace assembly honors the same policy: scope satisfaction, support manifest locality, and canonical support basis feed evaluate_commit_admissibility; render requirements stay in the provenance-trace workflow rather than becoming a separate commit-barrier check. The retrieval engine does not just return relevance-ranked hits; the admissibility gate can reject an answer whose support manifest falls outside the legal source scope or mentions an illegal work-area path.
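The three modes can be sketched as a classifier over the question plus a scope check over the support manifest. This is a hypothetical illustration: the mode names come from the text, but the regexes below are stand-ins (DocMason's actual COMPARE_HINT_PATTERN and SINGLE_SOURCE_HINT_PATTERN are not reproduced), and the satisfaction rule for each mode is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for DocMason's hint patterns.
COMPARE_HINT = re.compile(r"\b(compare|versus|vs\.?|differences? between)\b", re.IGNORECASE)
SINGLE_SOURCE_HINT = re.compile(r"\b(only|just|solely)\b.*\b(in|from)\b", re.IGNORECASE)


@dataclass(frozen=True)
class SourceScopePolicy:
    mode: Optional[str]          # None means the question is not source-scoped
    sources: frozenset[str]      # sources the reference-resolution step pinned down


def build_policy(question: str, resolved_sources: set[str]) -> SourceScopePolicy:
    sources = frozenset(resolved_sources)
    if not sources:
        return SourceScopePolicy(None, sources)
    if COMPARE_HINT.search(question) and len(sources) >= 2:
        return SourceScopePolicy("compare", sources)
    if SINGLE_SOURCE_HINT.search(question):
        return SourceScopePolicy("source-scoped-hard", sources)
    return SourceScopePolicy("source-scoped-soft", sources)


def scope_satisfied(policy: SourceScopePolicy, support_sources: set[str]) -> bool:
    """Check a support manifest's sources against the policy (per-mode semantics assumed)."""
    if policy.mode is None:
        return True
    if policy.mode == "source-scoped-hard":
        return bool(support_sources) and support_sources <= policy.sources
    if policy.mode == "compare":
        return policy.sources <= support_sources  # every compared source must be cited
    return bool(support_sources & policy.sources)  # soft: at least one in-scope source
```

A hard-scoped question that drifts into a second document fails the check, which is the synthesis-drift guard the paragraph describes.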
Host-session traces promote into typed interaction memories through sync. src/docmason/interaction.py (~2,900 lines) reconciles native Codex and Claude Code threads, writes turns into runtime/ interaction-ingest entries, and during sync groups entries by conversation_id into published interaction-memory-<digest> directories under knowledge_base/current/ with source and evidence manifests, copied attachments, structure sidecars per turn, conservative auto-authored knowledge.json and summary.md, and affordances. The notable change is that each memory now carries a normalized semantics block with memory_kind (from MEMORY_KIND_PRIORITY = {constraint, preference, correction, clarification, stakeholder-context, political-context, operator-intent, working-note}), durability, uncertainty, answer_use_policy, and retrieval_rank_prior, all derived through routing.normalize_memory_semantics. Retrieval (_normalize_memory_semantics_record) then uses answer_use_policy and durability to decide whether a memory may support an answer or is contextual-only. An idempotent interaction_input_digest lets unchanged memories be reused across sync runs rather than rebuilt. Promotion is still conservative — no rules are distilled, no contradictions resolved, no merges across conversations — but the typing is now strong enough that the retrieval path can discriminate constraints from ephemeral working notes.
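The priority list suggests a simple normalization: pick the highest-priority kind among the heuristic hints, then derive the other fields from it. A hypothetical sketch; only the kind names, their order, and the five field names come from the text. The durable-kind table, the field values (durable, may-support, and so on), and the prior formula are invented for illustration and are not routing.normalize_memory_semantics.

```python
# Order taken from MEMORY_KIND_PRIORITY as listed in the text; everything else is hypothetical.
MEMORY_KIND_PRIORITY = (
    "constraint", "preference", "correction", "clarification",
    "stakeholder-context", "political-context", "operator-intent", "working-note",
)
# Hypothetical policy table: which kinds may back an answer versus stay contextual.
DURABLE_SUPPORT_KINDS = {"constraint", "preference", "correction", "clarification"}


def normalize_semantics(candidate_kinds: set[str], uncertain: bool = False) -> dict:
    """Collapse heuristic kind hints into one typed semantics record."""
    kind = next((k for k in MEMORY_KIND_PRIORITY if k in candidate_kinds), "working-note")
    durable = kind in DURABLE_SUPPORT_KINDS
    return {
        "memory_kind": kind,
        "durability": "durable" if durable else "ephemeral",
        "uncertainty": "high" if uncertain else "low",
        "answer_use_policy": "may-support" if durable and not uncertain else "contextual-only",
        # Earlier kinds in the priority list get a larger prior (1.0 down to 0.3).
        "retrieval_rank_prior": round(1.0 - 0.1 * MEMORY_KIND_PRIORITY.index(kind), 2),
    }
```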
Comparison with Our System
| Dimension | DocMason | Commonplace |
|---|---|---|
| Primary problem | Deep multimodal research over private documents with strict provenance | Building durable agent-operated KBs: methodology, types, linking, review |
| Product boundary | Repo-native app with canonical-ask contract, stable CLI, host adapters, published bundle distribution | Repo-native KB + methodology, lighter runtime assumptions, commonplace-* CLI around a library |
| Truth surface | Explicit original_doc/ → staging/ → current/ with a commit barrier and illegal-path checks | Edited notes in git are the truth directly |
| Evidence model | Multimodal typed channels (text, render, structure, notes, media) per source unit | Authored markdown notes with descriptions, tags, links, and indexes |
| Retrieval discipline | Lexical-plus-graph scoring over published artifacts, plus source-scope policy and admissibility gates | Description-first search, indexes, explicit link semantics; weaker runtime gating |
| Learning loop | Sync-time promotion of host traces into typed interaction memories (memory_kind, durability, answer_use_policy) | Human/agent-authored notes; no built-in session-to-KB promotion |
| Front-door governance | Canonical-ask turn with governed open/progress/finalize phases and admissibility commit barrier | Skills and instructions steer agents; no runtime turn contract |
| Validation model | Validation-gated publication with staged manifests, render-inspection requirements, admissibility gate | Structural validation plus semantic review bundles on notes |
DocMason commits more structure into runtime plumbing, manifests, and governance gates; commonplace commits more structure into the authored notes themselves. DocMason's hard problem is controlling what an agent may cite when raw corpora are messy and multimodal. Commonplace's hard problem is shaping durable cross-cutting knowledge so future agents can find and recombine it cheaply.
Borrowable Ideas
A typed front-door turn contract with commit admissibility (needs a use case). The open/progress/finalize hidden-wrapper contract, together with admissibility.py rejecting answers that mention illegal staging paths or rest on support outside the legal source scope, is a concrete way to stop a host from "pretending the turn completed." Commonplace does not have an equivalent runtime, but if we ever ship a hosted review or answer workflow, this is the cleanest pattern we have seen for gating final output.
Published-truth boundary as a real mechanism, not a convention (ready as a framing pattern). DocMason's staging/current separation is load-bearing when raw inputs are not yet trustworthy KB artifacts. For heavier workshop ingestion or compiled views of commonplace, this is the right shape: never let retrieval or answer paths see half-built state.
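The "never see half-built state" rule has a cheap file-only implementation: build each snapshot in its own directory and publish by atomically swapping a small pointer file. This is a generic pattern offered as an assumption; the builds/ layout, the VALIDATED marker, and the CURRENT pointer file are hypothetical and not how DocMason lays out knowledge_base/current/.

```python
import os
import tempfile
from pathlib import Path


def publish_snapshot(kb_root: Path, build_name: str) -> None:
    """Flip readers to a fully built snapshot by atomically replacing a pointer file."""
    build_dir = kb_root / "builds" / build_name
    if not (build_dir / "VALIDATED").exists():  # hypothetical marker written by validation
        raise RuntimeError(f"{build_name} has not passed validation")
    fd, tmp_path = tempfile.mkstemp(dir=kb_root, prefix=".current-")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(build_name)
    # os.replace is atomic on POSIX when source and target share a filesystem.
    os.replace(tmp_path, kb_root / "CURRENT")


def resolve_current(kb_root: Path) -> Path:
    """Readers only ever follow the pointer, so they never see a half-built directory."""
    return kb_root / "builds" / (kb_root / "CURRENT").read_text(encoding="utf-8").strip()
```

A failed validation leaves the old pointer untouched, so readers keep serving the last good snapshot.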
Evidence channel typing in the retrieval contract (ready now). The text/render/structure/notes/media split, with render_inspection_required as an explicit escalation, is directly applicable to how we think about disclosure depth and pointer semantics. Retrieval results that declare which channels satisfy the request let the agent choose evidence economically.
Source-scope policy as a guardrail against synthesis drift (ready when we add multi-source answer workflows). truth_boundary.build_source_scope_policy with source-scoped-hard and compare modes is a concrete answer-admissibility mechanism we can port wholesale when commonplace starts generating answers over heterogeneous evidence.
Typed semantics on trace-derived artifacts (worth studying). Attaching memory_kind, durability, uncertainty, answer_use_policy, and retrieval_rank_prior to promoted interaction memories is the part of DocMason's learning loop that has moved most since the prior review. It lets retrieval demote ephemeral chatter without losing constraints and preferences. If we ever promote session artifacts into our KB, this kind of typing is worth borrowing before any content schema.
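The demotion described here can be sketched as a multiplicative prior followed by a policy split. Multiplying relevance by retrieval_rank_prior is an assumption (DocMason may combine them differently), and the MemoryHit shape and the policy strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MemoryHit:
    memory_id: str
    relevance: float            # score from whatever lexical or graph ranker ran first
    retrieval_rank_prior: float
    answer_use_policy: str      # hypothetical values: "may-support" or "contextual-only"


def rerank_memories(hits: list[MemoryHit]) -> tuple[list[str], list[str]]:
    """Scale relevance by the prior, then split hits into answer support and background context."""
    ordered = sorted(hits, key=lambda h: h.relevance * h.retrieval_rank_prior, reverse=True)
    support = [h.memory_id for h in ordered if h.answer_use_policy == "may-support"]
    context = [h.memory_id for h in ordered if h.answer_use_policy != "may-support"]
    return support, context
```

In this sketch a working note with the highest raw relevance still surfaces, but only as context, which is the demote-without-losing property the paragraph describes.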
Separate ordinary front door from operator commands (ready as a user-facing boundary). DocMason's strict split — ask for natural language, docmason <subcommand> for deterministic operator work — is worth copying if we ever expose a hosted ask surface. Commonplace already has this split informally through skills vs commonplace-* commands, but not as a governed contract.
Curiosity Pass
"The repo is the app" is real contract, not just packaging. The claimed property is that the repository itself is the governed application. The mechanism is the stable CLI, canonical skill routing, the hidden _ask wrapper, runtime state directories, host-thread reconciliation, and the admissibility commit barrier. A simpler alternative would be "a repo plus scripts and README instructions." DocMason is substantively stronger than that. But the ceiling is still bounded by the host agent's willingness to follow the contract; the repo shapes entry and evidence use, it does not enforce perfect obedience if the host skips the wrapper.
The canonical-ask contract is mostly transformation, partly ceremony. The real transformation is: evidence commands become scoped to a governed turn, answers are written to a canonical file path, admissibility runs before commit, and the turn ends as completed, boundary, or blocked. What is closer to ceremony is the insistence that reading the skill, reconciling a thread, or calling lifecycle helpers does not count as legal ask execution. That rule is useful precisely because a clever host could get almost there through side paths; the wrapper-only entry prevents that. Mechanistically it produces real properties (typed turn state, linked artifacts), not just nicer log lines.
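Scoping evidence commands to a turn is mechanically simple: the host exports the wrapper's log_context into the child environment, and each evidence command refuses to run without it. A hypothetical sketch; the text only says the variables are DOCMASON_*, so the uppercase-key naming scheme, the keys used, and both function names are assumptions.

```python
import os
from collections.abc import Mapping

PREFIX = "DOCMASON_"  # prefix from the text; the KEY-to-NAME scheme below is hypothetical


class TurnScopeError(RuntimeError):
    """Raised when an evidence command runs outside a governed turn."""


def export_log_context(log_context: Mapping[str, object]) -> dict[str, str]:
    """Turn the wrapper's log_context into environment variables for child commands."""
    return {PREFIX + key.upper(): str(value) for key, value in log_context.items()}


def require_turn_scope(env: Mapping[str, str], keys: tuple[str, ...]) -> dict[str, str]:
    """Refuse to run unless every needed log_context key was exported."""
    missing = [key for key in keys if PREFIX + key.upper() not in env]
    if missing:
        raise TurnScopeError(f"not inside a governed turn; missing {missing}")
    return {key: env[PREFIX + key.upper()] for key in keys}


# Host side: merge the exported context into the environment handed to retrieve/trace.
child_env = {**os.environ, **export_log_context({"run_id": "r-1", "turn_id": "t-7"})}
```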
The published-truth boundary is a genuine state change. Staging produces manifests, extracted text, structure JSON, renders, affordances, and overlays that do not exist in original_doc/. The simpler alternative is answering directly from raw files; DocMason correctly rejects that. The boundary proves "this snapshot passed DocMason's build and validation rules," not "the business conclusion is true" — but that lower ceiling is honest.
Multimodal evidence is real transformation bounded by extractor quality. Structure JSON, render assets, and affordance descriptors are not just relabeled text. But the pipeline still leans on LibreOffice for Office fidelity, PyMuPDF/pypdfium2 for PDFs, and heuristic affordance derivation. The system improves inspectability and answer support; it does not solve document understanding. render_inspection_required honestly surfaces where the pipeline hands off to the agent.
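The dependency chain is easier to see as a plan: Office files go through LibreOffice's PDF conversion before any page render exists, so their render fidelity is capped by that conversion. A hypothetical sketch; the soffice flags are LibreOffice's real command-line options, but "render-pdf" is a placeholder for a PyMuPDF or pypdfium2 call, and the suffix routing is an assumption rather than DocMason's builder logic. Nothing is executed; the function only returns argument lists.

```python
from pathlib import Path

OFFICE_SUFFIXES = {".pptx", ".ppt", ".docx", ".doc", ".xlsx", ".xls"}


def render_plan(source: Path, outdir: Path) -> list[list[str]]:
    """Return the commands a render step would run for one source file (not executed here)."""
    suffix = source.suffix.lower()
    if suffix == ".pdf":
        return [["render-pdf", str(source)]]  # placeholder for a PyMuPDF/pypdfium2 call
    if suffix in OFFICE_SUFFIXES:
        pdf = outdir / (source.stem + ".pdf")
        return [
            # LibreOffice headless conversion of an Office file to PDF.
            ["soffice", "--headless", "--convert-to", "pdf", "--outdir", str(outdir), str(source)],
            ["render-pdf", str(pdf)],
        ]
    return []  # text, markdown, and email inputs get no page renders in this sketch
```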
Provenance tracing is a support-boundary mechanism, not a correctness oracle. Trace artifacts know source IDs, units, artifacts, consumers, support basis, render requirements, and source-scope policy. The admissibility gate then rejects answers whose support manifest is outside the current corpus or falls in an illegal work area. This is meaningfully stronger than "ranked retrieval plus pasted citations," but even a perfect trace only proves admissible support, not semantic correctness.
Interaction-memory promotion is typed artifact learning without rule distillation. The property is that live host interactions become durable, retrievable evidence with semantics. The mechanism is real: turns are grouped by conversation, copied into memory directories with manifests, typed with memory_kind/durability/answer_use_policy, and retrieved through normalized semantics. What it still does not do is distill rules, resolve contradictions between memories, retire stale ones by explanatory reach, or mutate an existing memory. Compared to ExpeL's EDIT/REMOVE verbs, ACE's bullet counters, or Voyager's critic-gated skill promotion, DocMason's loop is closer to typed session condensation with policy-aware reinjection. The typing addition since the prior review pushes it toward — but does not cross — principled maintenance.
What to Watch
- Whether the Canonical Ask Contract stays the single governed entry, or whether competing host-adapter routes (Codex native, Claude Code, Copilot) accumulate parallel entry surfaces.
- Whether interaction-memory promotion grows a real maintenance model: deduplication across conversations, contradiction handling, retirement by explanatory reach, or explicit mutation verbs.
- Whether hybrid_enrichment and lane_b_follow_up stabilize as the authoritative hard-artifact queue, or whether the sync path accumulates alternative follow-up surfaces.
- Whether the published-truth discipline and admissibility commit barrier stay the product center as bundle update, evaluation, and adapter machinery expand.
- Whether multimodal fidelity keeps pace with messy decks and spreadsheets, or leans increasingly on semantic overlays and manual follow-up to close gaps.
Trace-derived learning placement. Trace source — reconciled native Codex and Claude Code threads, captured as runtime interaction-ingest entries keyed by conversation_id/interaction_id, with user text, assistant excerpts, attachment refs, and relation hints; trigger boundaries are per-turn reconciliation plus per-sync grouping. Extraction — build_promoted_interaction_memories groups entries by conversation, derives memory_kind/durability/uncertainty/answer_use_policy/retrieval_rank_prior through routing.normalize_memory_semantics, writes per-turn text and structure assets, auto-authors conservative knowledge.json and summary.md, and computes an interaction_input_digest oracle for idempotent reuse. No LLM judge decides rule extraction; semantics are heuristic-normalized from turn content and hint records. Promotion target — typed, inspectable interaction-memory-<digest>/ directories under knowledge_base/current/, consumable by the same retrieval and trace paths as document sources; published when docmason sync runs. No weight updates. Memories are service-owned in the sense that DocMason controls their schema, but they live as files in the repo. Scope — per-conversation groupings within one workspace; no cross-task generalization, no cross-workspace sharing, no cross-agent abstraction. Timing — staged in cycles on every sync, and reused across syncs when the input digest is unchanged; no online update during a live turn, and no offline benchmark-driven batch.
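The extraction and timing steps above hinge on grouping plus a content digest. A minimal sketch, assuming the digest is a hash over each conversation's canonicalized entries; the entry fields beyond conversation_id and interaction_id, the SHA-256 choice, and both function names are illustrative and not build_promoted_interaction_memories itself.

```python
import hashlib
import json
from collections import defaultdict


def interaction_input_digest(entries: list[dict]) -> str:
    """Order-independent SHA-256 over one conversation's ingest entries."""
    ordered = sorted(entries, key=lambda entry: entry["interaction_id"])
    canonical = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_memory_sync(entries: list[dict], previous: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map conversation_id -> (action, digest); unchanged conversations are reused."""
    by_conversation: dict[str, list[dict]] = defaultdict(list)
    for entry in entries:
        by_conversation[entry["conversation_id"]].append(entry)
    plan = {}
    for conversation_id, group in by_conversation.items():
        digest = interaction_input_digest(group)
        action = "reuse" if previous.get(conversation_id) == digest else "rebuild"
        plan[conversation_id] = (action, digest)
    return plan
```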
On the survey axes: axis 1 (ingestion) — single-session extension aggregated at sync time, with host-owned session formats (Codex, Claude Code) normalized into DocMason's own interaction schema; closer to ClawVault's vault-plus-observer shape than to cass-memory's cross-agent aggregator, because DocMason owns the workspace rather than discovering sessions across agents. Axis 2 (promotion target) — symbolic-artifact learning on a file backend, with typed durable records rather than scored flat rules or executable code. Relative to the survey: DocMason strengthens the claim that "typed durable observations" (the ClawVault/OpenViking band) can carry real retrieval-policy consequences without moving to ExpeL-style mutation verbs. It does not warrant a new subtype — the existing "typed durable observations" category fits — but the retrieval-side policy attachment is a sharpening of how typed memories can gate answer use, worth noting when the survey is next updated.
Relevant Notes:
- Files beat a database for agent-operated knowledge bases — foundation: DocMason pushes the file-first bet from KB authoring into multimodal private-document analysis and published evidence serving
- Inspectable substrate, not supervision, defeats the blackbox problem — foundation: DocMason's trust story depends on inspectable manifests, renders, and trace artifacts rather than a hidden service boundary
- Deterministic validation should be a script — exemplifies: DocMason makes validation a hard publish gate, and admissibility.py extends that discipline into answer commit
- A functioning KB needs a workshop layer, not just a library — extends: interaction-memory promotion is a concrete workshop-to-library bridge built from host-session traces
- Trace-derived learning techniques in related systems — extends: DocMason adds a repo-native live-session artifact-learning case with typed memory_kind/answer_use_policy semantics that gate retrieval
- Substrate class, backend, and artifact form are separate axes that get conflated — sharpens: DocMason's interaction memories are symbolic artifacts in a file backend, typed enough to carry policy without becoming a separate substrate