WUPHF

Type: ../types/agent-memory-system-review.md · Status: current · Tags: related-systems, trace-derived

WUPHF, from Nex CRM, is a local multi-agent office/runtime rather than only a memory library. Its memory system combines a git-backed markdown wiki under the user's WUPHF runtime home, per-agent notebooks, fact and execution JSONL logs, derived SQLite/Bleve indexes, compiled playbook skills, and broker state. It qualifies as trace-derived because committed raw artifacts and playbook execution logs can be distilled into facts, entity briefs, "What we've learned" sections, team learnings, and invokable skills.

Repository: https://github.com/nex-crm/wuphf

Reviewed commit: ea18ae5227d01411bb746a1791563872822c6ca4

Commit URL: https://github.com/nex-crm/wuphf/commit/ea18ae5227d01411bb746a1791563872822c6ca4

Last checked: 2026-05-16

Core Ideas

Fresh-session execution is a runtime contract, not a summarizer. Claude turns are launched with claude --print, a freshly built prompt, strict per-agent MCP config, and stdin carrying the current notification plus bounded memory context; there is no resume flag in the runner path (internal/team/headless_claude.go). Codex turns use codex exec --ephemeral with the office prompt and notification on stdin (internal/team/headless_codex_runner.go). The README makes this a product claim: each turn starts clean, prompt prefixes align for caching, and tool schema size is reduced by role/channel scoping (README.md).
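The fresh-turn contract can be illustrated with a minimal sketch. It is written in Python for brevity and is not WUPHF's Go runner: the `Turn` type, `build_turn` function, and character budget are hypothetical, and only the `claude --print` and `codex exec --ephemeral` invocations come from the description above. Prompt and MCP config arguments are omitted.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    argv: list[str]   # command line for a one-shot, non-resumed run
    stdin: str        # notification plus bounded memory context

def build_turn(runtime: str, notification: str, memory: list[str], budget: int = 2000) -> Turn:
    """Assemble a fresh turn; nothing from a previous session is carried over."""
    if runtime == "claude":
        argv = ["claude", "--print"]
    elif runtime == "codex":
        argv = ["codex", "exec", "--ephemeral"]
    else:
        raise ValueError(f"unknown runtime: {runtime}")
    # Bound the memory context so stdin cannot grow with office history.
    context, used = [], 0
    for item in memory:
        if used + len(item) > budget:
            break
        context.append(item)
        used += len(item)
    stdin = notification + "\n\n" + "\n".join(context)
    return Turn(argv=argv, stdin=stdin)
```

The design point is that continuity is injected as bounded input on every turn instead of being inherited from a resumed session.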

The broker is the office state authority. The broker keeps channel messages, tasks, requests, actions, scheduler jobs, skills, wiki workers, indexes, entity synthesizers, playbook synthesizers, and agent stream buffers in one process-local state object (internal/team/broker.go, internal/team/broker_types.go). It persists a JSON snapshot to ~/.wuphf/team/broker-state.json with a .last-good fallback and activity-score selection on load (internal/team/broker_persistence.go). Channel/task traces in this state are runtime coordination artifacts: they can shape the current office loop, but they are not the canonical long-term knowledge substrate until committed or distilled elsewhere.
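A snapshot loader with a fallback could look roughly like the sketch below. This is an assumption-heavy illustration in Python, not the code in internal/team/broker_persistence.go: the `activity_score` formula, the `messages`/`tasks` keys, and the exact `.last-good` file naming are all hypothetical.

```python
import json
from pathlib import Path
from typing import Optional

def activity_score(state: dict) -> int:
    # Hypothetical score: more messages and tasks means a more "lived-in" snapshot.
    return len(state.get("messages", [])) + len(state.get("tasks", []))

def read_snapshot(path: Path) -> Optional[dict]:
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None  # missing, unreadable, or corrupt file
    return data if isinstance(data, dict) else None

def load_broker_state(primary: Path) -> dict:
    """Pick the readable snapshot with the highest activity score."""
    last_good = primary.with_name(primary.name + ".last-good")  # assumed naming
    candidates = [s for s in (read_snapshot(primary), read_snapshot(last_good)) if s is not None]
    if not candidates:
        return {}
    return max(candidates, key=activity_score)  # ties keep the primary snapshot
```

Scoring both files, instead of trusting the primary whenever it parses, guards against a valid but nearly empty snapshot silently replacing a richer one.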

Workspace isolation is per task/agent. Coding agents can receive git worktrees rooted under the WUPHF runtime home, with stale worktrees pruned, prior task state overlaid, and the path exported as WUPHF_WORKTREE_PATH (internal/team/worktree.go, internal/team/headless_claude.go, internal/team/headless_codex_runner.go). Worktrees are storage substrates for active work products, not memory artifacts by themselves. They become knowledge artifacts only when their results are discussed, committed as artifacts, written to notebooks/wiki, or captured in task state.

MCP tools are scoped by channel, role, and memory backend. The launcher writes per-agent MCP config files that can restrict an agent to wuphf-office or add the legacy nex server (internal/team/mcp_config.go). The MCP server then registers different tools for 1:1, DM, office, lead, and specialist contexts; DM mode intentionally gets a smaller surface than full office mode (internal/teammcp/server.go). On the markdown backend, agents see team_wiki_*, notebook_*, lookup, playbook, learning, lint, and contradiction-resolution tools; on Nex/GBrain they see the legacy team_memory_* tools instead (internal/teammcp/server.go, README.md). Tool registration is a system-definition artifact: it routes and constrains what an agent can do.

Markdown wiki is the default, with legacy backends retained as a separate surface. Configuration names none, nex, gbrain, and markdown, and the status resolver treats markdown as the file-over-app default at ~/.wuphf/wiki (internal/config/config.go, internal/team/memory_backend.go). The older Nex and GBrain backends remain as external shared-memory adapters with their own query/write/promotion semantics, while markdown exposes wiki and notebook tools directly through the WUPHF MCP server. That matters for review: WUPHF's current distinctive memory design is the git wiki, not the legacy remote graph backends.

The wiki is a git repo plus a single-writer queue. Repo owns git initialization, layout, commits, backup mirror, fsck/recovery paths, and per-author commit identity for ~/.wuphf/wiki (internal/team/wiki_git.go). WikiWorker serializes writes through a buffered channel, commits, publishes SSE events, triggers backup mirroring, reconciles derived indexes, and routes special write kinds for notebooks, artifacts, facts, playbooks, learnings, lint reports, and human edits (internal/team/wiki_worker.go). The explicit schema says markdown is source of truth, SQLite/Bleve are rebuildable caches, and all writes should pass through the single worker (docs/specs/WIKI-SCHEMA.md).
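The single-writer pattern is easy to show in isolation. The sketch below is a Python stand-in for the idea, not WikiWorker itself: the `SingleWriter` class is hypothetical, and a list append stands in for the git commit, SSE publish, and index reconciliation steps.

```python
import queue
import threading

class SingleWriter:
    """Serialize writes from many callers through one worker thread."""

    def __init__(self, maxsize: int = 64):
        self._jobs: queue.Queue = queue.Queue(maxsize=maxsize)
        self.commits: list[tuple[str, str]] = []  # stand-in for git commits
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, author: str, path: str) -> threading.Event:
        done = threading.Event()
        self._jobs.put((author, path, done))  # blocks when the buffer is full
        return done

    def _run(self) -> None:
        while True:
            job = self._jobs.get()
            if job is None:  # shutdown sentinel
                break
            author, path, done = job
            self.commits.append((author, path))  # only this thread mutates the repo
            done.set()

    def close(self) -> None:
        self._jobs.put(None)
        self._thread.join()
```

Callers from MCP, HTTP, or background workers would all go through `submit` and wait on the returned event, so commit ordering is decided in exactly one place.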

The retained artifact taxonomy is real in the implementation. WUPHF separates channel/task traces in broker state, headless logs under ~/.wuphf/logs, raw immutable artifacts under wiki/artifacts/{kind}/{sha}.md, per-agent notebooks under agents/{slug}/notebook/, canonical wiki pages under team/..., fact logs under wiki/facts/{kind}/{slug}.jsonl, entity and graph rows in SQLite, BM25 text rows in Bleve, playbook execution logs under team/playbooks/{slug}.executions.jsonl, compiled playbook skills under team/playbooks/.compiled/{slug}/SKILL.md, and team skills under team/skills/{name}/SKILL.md (internal/team/headless_logging.go, internal/team/artifact_commit.go, internal/team/notebook_worker.go, internal/team/wiki_index_sqlite.go, internal/team/wiki_index_bleve.go, internal/team/playbook_compiler.go, internal/team/skill_crud_endpoints.go). The storage substrate is mixed: git markdown/JSONL is canonical for wiki memory, SQLite/Bleve are derived activation caches, JSON broker state is runtime coordination state, and worktrees/logs are execution support.

Raw artifact commits close the trace-to-fact loop. CommitArtifact writes source artifacts to wiki/artifacts/{kind}/{sha}.md without catalog regeneration, and the wiki worker asynchronously invokes the extractor hook after each successful artifact commit (internal/team/artifact_commit.go, internal/team/wiki_worker.go). The extractor reads the artifact, renders an entity-extraction prompt, calls the configured one-shot provider, parses JSON, resolves entities, computes deterministic fact IDs, submits facts/entities through the worker, persists new facts to append-only JSONL, writes minimal ghost briefs, and sends failures to a DLQ for replay (internal/team/wiki_extractor.go, internal/team/broker_wiki_extract.go). Raw artifacts are knowledge artifacts; extracted fact logs and entity briefs become queryable knowledge artifacts with stronger lineage.
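Deterministic IDs are what make replay from the DLQ safe: re-extracting the same artifact should not duplicate facts. The sketch below shows one way to get that property. It is an assumption, not WUPHF's scheme: the hashed fields, the 16-character truncation, and both function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path

def fact_id(artifact_sha: str, sentence_offset: int, subject: str, predicate: str, obj: str) -> str:
    # Hypothetical key fields; the point is that the same input always hashes the same.
    key = "\x1f".join([artifact_sha, str(sentence_offset), subject.lower(), predicate.lower(), obj.lower()])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]

def append_new_facts(log: Path, facts: list[dict]) -> int:
    """Append only facts whose ID is not already in the log; return how many were written."""
    seen = set()
    if log.exists():
        with log.open(encoding="utf-8") as fh:
            seen = {json.loads(line)["id"] for line in fh if line.strip()}
    written = 0
    with log.open("a", encoding="utf-8") as fh:  # append-only: existing lines are never rewritten
        for fact in facts:
            if fact["id"] in seen:
                continue
            fh.write(json.dumps(fact, sort_keys=True) + "\n")
            seen.add(fact["id"])
            written += 1
    return written
```

Running the same extraction twice leaves the log unchanged on the second pass, which is the behavior a replayable pipeline needs.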

Entity briefs are synthesized, not merely searched. Fact logs accept deterministic, append-only observations with source paths and recording identity (internal/team/entity_facts.go). EntitySynthesizer is a broker-level worker that shells out through the configured CLI, coalesces jobs per entity, uses a threshold of new facts, commits updated markdown briefs under the archivist identity, and preserves contradictions as callouts instead of resolving them automatically (internal/team/entity_synthesizer.go). Minimal ghost briefs make entity rows rebuildable from markdown if the derived index is wiped (internal/team/entity_minimal_brief.go).
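The coalescing and threshold behavior can be sketched without any LLM call. The Python below is illustrative only: `SynthesisScheduler`, its method names, and the default threshold are hypothetical and do not reflect the actual EntitySynthesizer API.

```python
from collections import defaultdict

class SynthesisScheduler:
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.pending = defaultdict(int)   # entity -> facts since the last brief
        self.queued = set()               # entities with a job already waiting

    def record_fact(self, entity: str) -> bool:
        """Return True only when this fact causes a new job to be queued."""
        self.pending[entity] += 1
        if self.pending[entity] >= self.threshold and entity not in self.queued:
            self.queued.add(entity)
            return True
        return False  # below threshold, or coalesced into the waiting job

    def run_next(self):
        """Pop one queued entity and reset its counter, as a synthesis run would."""
        if not self.queued:
            return None
        entity = sorted(self.queued)[0]
        self.queued.remove(entity)
        consumed = self.pending.pop(entity)
        return entity, consumed
```

A burst of facts about one entity therefore produces a single synthesis run that sees all of them, instead of one run per fact.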

Lookup is cited-answer retrieval over the fact index. QueryHandler classifies the query, refuses out-of-scope general questions without an LLM call, retrieves top-K facts from the wiki index, hydrates sources with staleness and source paths, escapes untrusted excerpts, renders a prompt, and parses structured JSON back to the caller (internal/team/wiki_query.go, internal/teammcp/server_wiki_tools.go). Retrieval answers advise the agent as knowledge artifacts; the lookup prompt and MCP tool contract are system-definition artifacts because they constrain answer shape, citations, and refusal behavior.
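The refusal-before-retrieval ordering and the escaping step are worth seeing together. The sketch below is a simplified Python illustration, not QueryHandler: the keyword classifier, word-overlap ranking, and `<source>` prompt format are all assumptions standing in for the real classifier, index, and template.

```python
import html

def classify(query: str, known_entities: set[str]) -> str:
    # Hypothetical rule: in scope only if the query names a known entity.
    words = {w.strip("?.,!").lower() for w in query.split()}
    return "in_scope" if words & known_entities else "out_of_scope"

def build_lookup_prompt(query, facts, known_entities, k=3):
    """Return (prompt, sources), or (None, []) when the query is refused."""
    if classify(query, known_entities) == "out_of_scope":
        return None, []   # refuse before spending an LLM call
    q_words = set(query.lower().split())
    # Toy ranking by word overlap; a real index would use BM25 or similar.
    ranked = sorted(facts, key=lambda f: -len(q_words & set(f["text"].lower().split())))[:k]
    lines = []
    for i, fact in enumerate(ranked, start=1):
        excerpt = html.escape(fact["text"])   # excerpts are untrusted input
        path = html.escape(fact["source_path"])
        lines.append(f'<source id="{i}" path="{path}">{excerpt}</source>')
    prompt = f"Answer using only the sources and cite ids.\nQuestion: {query}\n" + "\n".join(lines)
    return prompt, ranked
```

Escaping matters because a committed artifact can contain text that imitates the prompt's own delimiters; after escaping, such text stays inside its source element.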

Notebooks are draft memory with promotion requests. Notebook writes use the same wiki worker queue but are author-owned and stored under agents/{slug}/notebook/; reads and searches are intentionally cross-agent (internal/team/notebook_worker.go, internal/teammcp/notebook_tools.go). notebook_promote creates a review request from a notebook source to a team/... wiki target, and approval applies a copy-not-move promotion that writes the wiki page, updates notebook frontmatter with promotion backlinks, regenerates the catalog, and records git commits under the approver identity (internal/teammcp/notebook_tools.go, internal/team/promotion_commit.go). Notebooks are weak knowledge artifacts; approved wiki pages are canonical knowledge artifacts, and promoted playbooks/skills can become system-definition artifacts.
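Copy-not-move promotion can be sketched on plain files. This Python example is a simplification under stated assumptions: the `promote` function and the `promoted_to`/`promoted_by` frontmatter keys are hypothetical, the draft is assumed to have no existing frontmatter, and the review request, catalog regeneration, and git commits are left out.

```python
from pathlib import Path

def promote(wiki_root: Path, notebook_rel: str, target_rel: str, approver: str) -> None:
    """Copy a notebook entry to a team page and leave a backlink on the draft."""
    source = wiki_root / notebook_rel
    target = wiki_root / target_rel
    body = source.read_text(encoding="utf-8")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(body, encoding="utf-8")             # copy: the draft stays in place
    backlink = f"---\npromoted_to: {target_rel}\npromoted_by: {approver}\n---\n"
    source.write_text(backlink + body, encoding="utf-8")  # frontmatter backlink on the draft
```

Keeping the draft preserves the author's working context, while the backlink records where the canonical copy now lives.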

Playbooks compile into skills and learn from execution logs. A source playbook at team/playbooks/{slug}.md compiles deterministically into team/playbooks/.compiled/{slug}/SKILL.md; the compiled skill instructs future agents to read the canonical playbook, execute its steps, record an outcome, and optionally record team learnings (internal/team/playbook_compiler.go). playbook_execution_record appends outcome JSONL entries, and PlaybookSynthesizer uses recent executions to maintain a trailing "What we've learned" section without rewriting the author's main body (internal/team/playbook_executions.go, internal/team/playbook_synthesizer.go, internal/teammcp/playbook_tools.go). This is the clearest behavior-shaping loop: execution traces become prose lessons, and the compiled skill gives those lessons instruction authority.
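The authority limit on synthesis, where lessons may change but the author's body may not, reduces to a small text operation. The sketch below is a Python illustration, not PlaybookSynthesizer: the `##` heading level and the `update_learned_section` name are assumptions.

```python
HEADING = "## What we've learned"  # heading level is an assumption

def update_learned_section(playbook_md: str, lessons: list[str]) -> str:
    """Replace (or add) the trailing section while leaving the author's body untouched."""
    body, _, _ = playbook_md.partition(HEADING)  # everything before the section is the body
    body = body.rstrip("\n")
    bullets = "\n".join(f"- {lesson}" for lesson in lessons)
    return f"{body}\n\n{HEADING}\n\n{bullets}\n"
```

Because the function only ever rewrites text after the heading, repeated synthesis runs cannot drift the procedure itself; they can only replace the advisory section.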

Team skills have approval, safety, invocation, and publishing surfaces. Agents can create/propose skills through structured MCP; only CEO can create active skills directly, while ordinary agents propose skills for approval (internal/teammcp/skills.go). The scanner walks wiki articles, asks an LLM whether each article is a reusable skill, writes proposals with provenance, skips notebooks and existing skills, budgets LLM calls, and coalesces compile passes (internal/team/skill_scanner.go, internal/team/skill_compile.go). The guard rejects dangerous/cautionary patterns depending on trust level (internal/team/skill_guard.go), and invocation logs skill_invocation while returning canonical instructions the agent is told to follow (internal/teammcp/skills.go). Active skill content is a system-definition artifact.
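A trust-sensitive guard can be sketched as two pattern tiers. Everything specific here is hypothetical: the regexes, the `trusted`/`untrusted` level names, and the `guard` signature are illustrative choices in Python, not the rules in internal/team/skill_guard.go.

```python
import re

# Illustrative patterns only; a real guard would maintain a reviewed list.
DANGEROUS = [re.compile(p) for p in (r"rm\s+-rf\s+/", r"curl[^|\n]*\|\s*(ba)?sh")]
CAUTIONARY = [re.compile(p) for p in (r"\bsudo\b", r"git\s+push\s+--force")]

def guard(skill_body: str, trust: str) -> tuple[bool, str]:
    """Return (allowed, reason). 'trusted' and 'untrusted' are illustrative levels."""
    for pattern in DANGEROUS:
        if pattern.search(skill_body):
            return False, f"dangerous pattern: {pattern.pattern}"  # rejected at every trust level
    if trust != "trusted":
        for pattern in CAUTIONARY:
            if pattern.search(skill_body):
                return False, f"cautionary pattern needs a trusted author: {pattern.pattern}"
    return True, "ok"
```

The two tiers let a scanner-proposed skill be held to a stricter standard than one a trusted author wrote, without weakening the hard rejections.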

Linting is a maintenance loop with mutation authority. run_lint executes contradiction, orphan, stale-claim, missing-cross-ref, and dedup-review checks, commits a dated markdown report, and uses an LLM judge for semantic contradiction clusters (internal/team/wiki_lint.go, internal/team/broker_lint.go, internal/teammcp/server_wiki_tools.go). resolve_contradiction can rewrite fact JSONL to set supersedes, valid_until, or reciprocal contradicts_with fields. Lint reports are knowledge artifacts when read; the resolver is a system-definition artifact with mutation authority over the canonical fact logs.
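The mutation step can be pictured as a small rewrite of the fact log. The Python sketch below is an assumption about shape, not the resolver's code: it applies all three fields in one call for brevity, whereas the resolver is described as setting supersedes, valid_until, or contradicts_with, and the `id` key, function signature, and temp-file swap are hypothetical.

```python
import json
from pathlib import Path

def resolve_contradiction(log: Path, winner_id: str, loser_id: str, decided_at: str) -> None:
    """Mark loser as superseded by winner and link both, rewriting the JSONL file."""
    facts = [json.loads(line) for line in log.read_text(encoding="utf-8").splitlines() if line.strip()]
    by_id = {f["id"]: f for f in facts}
    winner, loser = by_id[winner_id], by_id[loser_id]
    winner["supersedes"] = loser_id
    loser["valid_until"] = decided_at
    # Keep the disagreement visible from both sides.
    for fact, other_id in ((winner, loser_id), (loser, winner_id)):
        links = fact.setdefault("contradicts_with", [])
        if other_id not in links:
            links.append(other_id)
    tmp = log.with_suffix(log.suffix + ".tmp")
    tmp.write_text("".join(json.dumps(f, sort_keys=True) + "\n" for f in facts), encoding="utf-8")
    tmp.replace(log)   # swap the rewritten file into place
```

Both facts survive the resolution, so a later synthesis pass can still see that the disagreement existed and how it was settled.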

Comparison with Our System

| Dimension | WUPHF | Commonplace |
| --- | --- | --- |
| Primary purpose | Local multi-agent office with runtime coordination, fresh turns, tools, and memory | Agent-operated KB methodology and durable review/validation system |
| Canonical substrate | Git-backed markdown/JSONL wiki plus broker JSON state | Git-backed markdown KB with typed collections and generated indexes |
| Raw traces | Channel messages, task state, headless logs, runtime artifacts, raw wiki artifacts, playbook execution logs | Source snapshots, review reports, validation outputs, git history, work notes |
| Derived knowledge | Fact JSONL, entity briefs, lookup sources, playbook lessons, team learnings | Notes, reviews, ADRs, indexes, source-derived analysis |
| System-definition artifacts | MCP tools, prompts, compiled playbook skills, team skills, guards, lint resolver, runner configs | Instructions, type specs, validators, commands, review gates, skills |
| Activation | Broker notification, scoped MCP, lookup, wiki read/search, skill invocation, playbook compile | rg, indexes, links, validation, instructions, review bundles |
| Authority model | Mixed: runtime tools coordinate, wiki facts advise, compiled skills instruct, lint resolver mutates | Explicit collection/type contracts and validation; less live runtime orchestration |

WUPHF and commonplace share a core bet: ordinary files, git history, and explicit artifact contracts are better agent memory substrates than opaque chat history. WUPHF pushes that bet into a live office runtime. Commonplace is quieter: it accumulates durable methodology and uses validation/review workflows to keep that library coherent.

The biggest difference is timing. WUPHF is online and push-driven: messages wake agents, agents run fresh sessions, and the broker mediates tools and state. Commonplace is mostly library and maintenance infrastructure: agents search and edit a durable corpus, but there is no central office broker deciding who wakes, which tools appear, or which runtime gets used.

The behavioral authority split is sharper in WUPHF than in many related systems. Channel messages and notebooks advise; raw artifacts and playbook execution logs serve as evidence; facts and briefs answer; "What we've learned" advises future playbook readers; compiled skills and active team skills instruct agents; MCP registration and runner config constrain what agents can do; lint resolution mutates canonical facts. That separation is not just this review's vocabulary: it is visible in different paths, queues, tools, commit authors, and write handlers.

WUPHF is less mature than commonplace on static type contracts for prose artifacts. The wiki schema is explicit and implementation-aware, but many wiki article shapes are enforced by prompts, path checks, and runtime handlers rather than by a deterministic repository validator comparable to commonplace-validate. Conversely, WUPHF is stronger on live feedback loops: channel events, SSE, auto-extraction, synthesis thresholds, playbook recompilation, skill invocation logging, and contradiction mutation all happen inside the running office.

Borrowable Ideas

A workshop runtime can keep fresh sessions cheap without losing continuity. WUPHF's fresh-turn runner model is a useful complement to commonplace's workshop layer: continuity comes from broker state, scoped memory lookup, task packets, notebooks, wiki facts, and skills rather than from unbounded chat resume.

Use one canonical writer for git-backed shared memory. The wiki worker's single queue is a clean pattern for any agent-maintained git repo that has concurrent MCP, HTTP, human, and background writes. Commonplace currently relies more on human/agent discipline and git operations; a single-writer queue is worth borrowing only for a live multi-agent service, not for ordinary repo editing.

Keep draft memory visually and mechanically distinct from canonical memory. Per-agent notebooks with author-owned writes and promotion backlinks are a strong pattern. In commonplace terms, notebooks are workshop artifacts; wiki pages are library artifacts; the promotion request is the authority transition.

Compile playbooks into invokable skills, but preserve the source. WUPHF's compiled playbook skill is explicitly derived and tells the agent to read the canonical source before acting. That is a safer promotion target than copying an LLM-rewritten procedure into a hidden tool registry.

Treat execution logs as evidence before instruction. WUPHF's playbook synthesizer preserves the author's body and appends learned lessons. Commonplace should borrow that authority limit if it ever synthesizes procedural improvements from repeated runs: traces can advise until a stronger review gate authorizes rewriting instructions.

Make contradiction resolution mutate structured facts, not prose summaries. The lint resolver writes supersedes, valid_until, and contradicts_with into fact logs. That is more auditable than only editing a brief paragraph, and it preserves source-level disagreement for future synthesis.

Trace-derived learning placement

Trace source. WUPHF consumes several trace families: broker channel messages, task records, agent headless stdout/logs, committed raw artifacts, per-agent notebooks, playbook execution logs, skill invocation events, and external action evidence. The qualifying trace-derived mechanisms are the artifact extraction path and the playbook/skill learning paths, not merely the existence of chat logs.

Extraction. Raw artifacts committed under wiki/artifacts/{kind}/{sha}.md trigger an extractor that uses the configured LLM CLI to produce entities and facts, resolves entities against the index, computes deterministic fact IDs, writes facts/entities, persists fact JSONL, and creates minimal ghost briefs. Playbook execution logs trigger thresholded or on-demand synthesis into "What we've learned"; wiki articles can be scanned by an LLM and proposed as reusable skills; notebook clusters and self-heal signals can become skill candidates.

Storage substrate. Raw traces live in broker JSON state, log files, runtime streams, worktrees, notebooks, raw artifact markdown, and execution JSONL. Distilled durable state lives in git-backed markdown and JSONL under the wiki repo. SQLite and Bleve indexes are derived caches; deleting them should not delete canonical memory. Compiled skills are git-backed derived views under .compiled or team/skills.
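The cache-versus-canonical contract implies that an index can be thrown away and rebuilt from the logs. The sketch below illustrates that property in Python with an in-memory SQLite database. The table schema, the `id`/`text`/`source_path` keys, and the use of the file stem as the entity slug are assumptions, not the schema in internal/team/wiki_index_sqlite.go.

```python
import json
import sqlite3
from pathlib import Path

def rebuild_fact_index(facts_dir: Path) -> sqlite3.Connection:
    """Build a throwaway index from the JSONL logs; the logs stay canonical."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE facts (id TEXT PRIMARY KEY, entity TEXT, text TEXT, source_path TEXT)")
    for log in sorted(facts_dir.rglob("*.jsonl")):
        for line in log.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            fact = json.loads(line)
            db.execute(
                "INSERT OR REPLACE INTO facts VALUES (?, ?, ?, ?)",
                (fact["id"], log.stem, fact.get("text", ""), fact.get("source_path", "")),
            )
    db.commit()
    return db
```

Two rebuilds over the same logs yield the same rows, which is the test a derived cache has to pass before its deletion can be considered harmless.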

Representational form. Raw traces are mostly prose/JSON transcripts and runtime records. Facts and indexes are symbolic JSON rows plus text fields. Entity briefs and playbook lessons are prose. Compiled playbook skills and active team skills are mixed prose-plus-frontmatter system-definition artifacts. No model weights or adapters are updated.

Lineage. Artifact-derived facts carry source path, sentence offset, artifact excerpt, created identity, and deterministic IDs. Entity briefs track synthesis state through fact counts and commit metadata. Notebook promotions preserve the source notebook path and promotion commit in frontmatter. Playbook compiled skills carry source_path, and execution logs carry recorded-by/outcome timestamps. The weakest lineage surface is ordinary channel/task broker state: useful operational context, but not self-sufficient durable evidence until committed into the wiki/artifact path.

Behavioral authority. Channel/task traces and notebooks are knowledge artifacts when they advise an agent about ongoing work. Raw artifacts are immutable knowledge artifacts used as evidence. Fact JSONL, entity briefs, and lookup answers are knowledge artifacts with stronger citation authority. Lint reports are knowledge artifacts; contradiction resolution has system-definition authority because it mutates fact validity. Compiled playbook skills, active team skills, MCP tool registration, runner prompt/config, and guard rules are system-definition artifacts because they instruct, route, constrain, or validate future agent behavior.

Scope. The scope is workspace/team-local. WUPHF learns about the user's company, agents, tasks, playbooks, skills, and operating patterns inside one local office/wiki. It is not trying to train a portable model-level memory.

Timing. The loop is online and staged. Agent work happens in fresh sessions; broker and notebook state accumulate during work; raw artifact extraction and fact submission run after artifact commits; entity/playbook synthesis fires by threshold, demand, or manual trigger; skill scanning runs by cron/manual/event-like compile passes with coalescing and cooldown.

Survey placement. On the trace-derived survey, WUPHF belongs in the operational trace-to-artifact and trace-to-skill branches. It strengthens the survey's raw/distilled split: raw office traces alone do not earn the trace-derived label, because they do not change behavior by themselves, but WUPHF does implement durable derivation into facts, briefs, playbook learnings, and skills.

Curiosity Pass

WUPHF's most interesting design move is that it combines an ephemeral-session runtime with a very file-native memory substrate. It does not solve context growth by summarizing one endless conversation; it routes each fresh turn through broker state, scoped tools, notebooks, wiki lookup, and skills.

The system also has two different "memory" eras in the same codebase. The current markdown wiki is the distinctive design; Nex/GBrain are still supported, but the new wiki/notebook/playbook/lint loops are where WUPHF has become an agent-operated KB rather than a thin adapter to an external memory backend.

The risk is authority sprawl. WUPHF has many behavior-shaping paths: prompts, MCP tools, broker state, skills, playbook compilers, lint resolvers, synthesis workers, scanner proposals, and runtime worktrees. The code often separates these well, but operators will need UI and validation discipline so that a draft note, a proposed skill, an active skill, a wiki fact, and a runtime task are not treated as equally authoritative.

The derived-index contract is better than the usual "vector store memory" pattern. SQLite and Bleve are caches over markdown/JSONL, not canonical memory. That makes failure and migration easier to reason about, but it also means every write path has to close the substrate loop; the extractor comments show that this has already been a real source of bugs.

What to Watch

  • Whether the markdown backend remains the default and the legacy Nex/GBrain surface continues to shrink or becomes a migration path.
  • Whether WUPHF adds deterministic validation for wiki schema, fact logs, playbook files, skill frontmatter, and compiled views beyond prompt discipline and runtime tests.
  • Whether artifact extraction expands beyond committed artifacts into automatic channel/task trace capture, and whether privacy/redaction gates keep up.
  • Whether notebooks gain stronger review lifecycle state, expiration, or archival policies as the number of agents grows.
  • Whether playbook synthesis can cite exact execution IDs inside each learned bullet, making the "What we've learned" section audit-ready.
  • Whether skill proposals get clearer distinction between human-authored, scanner-classified, LLM-synthesized, self-heal-derived, approved, disabled, and published authority levels.
  • Whether lint's dedup-review stub becomes a real audit log for borderline entity merges.

Bottom Line

WUPHF is a substantial trace-derived agent-memory system embedded in a local multi-agent office. Its strongest contribution is not any one store, but the artifact gradient: runtime traces and notebooks feed raw artifacts, facts, briefs, playbook lessons, compiled skills, active skills, and lint mutation paths, each with different storage substrates and behavioral authority. Compared with commonplace, it is more runtime-native and automated; compared with many "agent memory" systems, it is much more inspectable because the canonical durable layer is git markdown/JSONL rather than hidden chat state or a vector database.

Relevant Notes: