browzy.ai
Type: agent-memory-system-review · Status: current · Tags: related-systems, trace-derived
browzy.ai is a single-author terminal personal knowledge base (PKB) that turns URLs, PDFs, images, and pasted text into a compiled local wiki, then answers questions against that wiki. The repo is written in TypeScript with an Ink/React CLI on top of a core/ library. Storage is split between a readable file tree at ~/.browzy/ (raw/, wiki/, drafts/, output/, sessions/) and a derived SQLite FTS5 index at .browzy/browzy.db. The interesting architectural bets are that ahead-of-time LLM compilation into a wiki beats chatting over raw sources, and that a prompt-shaped schema file can steer domain behavior without code changes. A thin session-memory layer writes digests and crystallized insight drafts, but it is much lighter than the README's "your browzy keeps getting smarter" framing suggests.
Repository: https://github.com/VihariKanukollu/browzy.ai
Core Ideas
Compiled wiki as the middle layer. The central pipeline in compile/compiler.ts is raw source -> LLM compile step -> wiki article(s) with [[slug]] links and per-article frontmatter (title, tags, sources, backlinks, summary). Sources under ~2000 characters bypass the LLM and get a template article; longer sources go through a compiler prompt that can emit multiple articles, merge into existing ones, and cite [source-id]. A separate backlink pass scans [[...]] links across all articles and rewrites frontmatter.backlinks whenever it changes, so the wiki is self-maintaining at the link graph level. The bet is not "chat over documents" — it is "pay the LLM cost once during compile, then run cheap retrieval forever."
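The short-source bypass and the article shape described above can be sketched as follows. This is an illustrative reconstruction, not the repo's actual API: `WikiArticle`, `compileSource`, and the slug logic are assumptions; only the ~2000-character threshold and the frontmatter fields come from the description.

```typescript
// Hedged sketch of the compile-step routing; names are illustrative.
interface WikiArticle {
  slug: string;
  frontmatter: {
    title: string;
    tags: string[];
    sources: string[];
    backlinks: string[];
    summary: string;
  };
  body: string; // markdown with [[slug]] links and [source-id] citations
}

const LLM_BYPASS_THRESHOLD = 2000; // chars; short sources skip the LLM entirely

function compileSource(
  sourceId: string,
  title: string,
  text: string,
  llmCompile: (text: string) => WikiArticle[],
): WikiArticle[] {
  if (text.length < LLM_BYPASS_THRESHOLD) {
    // Short sources become a single templated article, no LLM call.
    return [{
      slug: title.toLowerCase().replace(/[^a-z0-9]+/g, "-"),
      frontmatter: {
        title, tags: [], sources: [sourceId], backlinks: [],
        summary: text.slice(0, 200),
      },
      body: text,
    }];
  }
  // Longer sources go through the compiler prompt, which may emit
  // several articles or merge content into existing ones.
  return llmCompile(text);
}
```

The backlink pass then runs separately over all emitted articles, so `backlinks` starts empty here and is rewritten later.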
Readable files plus a rebuildable SQLite FTS5 index. wiki/*.md is the user-facing substrate; storage/sqlite.ts is the operational layer. Each article write updates both articles and the articles_fts virtual table inside a single transaction. Retrieval uses FTS5 with per-column BM25 weights (title 10.0, tags 8.0, summary 5.0, content 1.0) and Porter stemming. This is a clean scoped exception to "files only" thinking: the FTS table is an index keyed off the canonical markdown, not a competing source of truth.
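The weighted retrieval query can be sketched as the kind of FTS5 SQL the storage layer would issue. Table and column names here are assumptions based on the description above, not the repo's exact statements; the weights and the Porter tokenizer are from the review. One FTS5 detail worth noting: `bm25()` returns more-negative scores for better matches, so ascending order ranks best-first.

```typescript
// Illustrative FTS5 setup and search; "articles" / "articles_fts" are
// assumed names, the weights (title 10, tags 8, summary 5, content 1)
// and Porter stemming follow the description above.
const COLUMN_WEIGHTS = { title: 10.0, tags: 8.0, summary: 5.0, content: 1.0 };

const CREATE_FTS_SQL = `
  CREATE VIRTUAL TABLE articles_fts
  USING fts5(title, tags, summary, content, tokenize='porter')
`;

const SEARCH_SQL = `
  SELECT a.slug,
         bm25(articles_fts, ${COLUMN_WEIGHTS.title}, ${COLUMN_WEIGHTS.tags},
              ${COLUMN_WEIGHTS.summary}, ${COLUMN_WEIGHTS.content}) AS score
  FROM articles_fts
  JOIN articles a ON a.rowid = articles_fts.rowid
  WHERE articles_fts MATCH ?
  ORDER BY score   -- bm25() is more negative for better matches
  LIMIT 15
`;
```

In the pattern the review describes, both the `articles` row and the `articles_fts` row are written inside a single transaction, so the index can never drift from the canonical markdown between rebuilds.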
Context is assembled from sections, not whole articles. retrieval/contextBuilder.ts is the operational core of the query path. It FTS-searches for candidates, backfills from the article index if there are fewer than five hits, ranks with multi-signal scoring in relevanceRanker.ts (keyword density, title/tag boosts, recency, backlink authority), then clips each article to at most five relevant sections before paying the final prompt cost. Caps are explicit: 15 candidate articles, 8K tokens per article, 50K token article budget overall, with an always-include intro-section rule when a single article exceeds the per-article cap. Confidence (high | medium | low) and gap terms are returned alongside the context so downstream callers can react.
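The section-clipping step can be sketched as a small budgeted selection. This is a hedged reconstruction, not `contextBuilder.ts`'s actual code: the token estimate and function names are assumptions, while the five-section cap, the per-article token cap, and the always-include-intro rule come from the description above.

```typescript
// Hedged sketch of per-article section clipping; names are illustrative.
interface Section { heading: string; text: string; score: number }

const MAX_SECTIONS_PER_ARTICLE = 5;
const MAX_TOKENS_PER_ARTICLE = 8_000;
const approxTokens = (s: string) => Math.ceil(s.length / 4); // rough heuristic

function clipArticle(intro: Section, sections: Section[]): Section[] {
  // Rank sections by relevance, keep at most five within the token budget.
  const ranked = [...sections].sort((a, b) => b.score - a.score);
  const kept: Section[] = [];
  let budget = MAX_TOKENS_PER_ARTICLE;
  for (const s of ranked) {
    if (kept.length >= MAX_SECTIONS_PER_ARTICLE) break;
    const cost = approxTokens(s.text);
    if (cost > budget) continue; // skip sections that would blow the budget
    kept.push(s);
    budget -= cost;
  }
  // Always-include-intro rule: the article's intro survives clipping
  // even when it was not among the top-scored sections.
  if (!kept.includes(intro)) kept.unshift(intro);
  return kept;
}
```

The same budgeting idea repeats one level up: the 15-candidate cap and the 50K-token overall budget bound how many clipped articles enter the final prompt.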
A prompt-shaped schema file is a real control plane. schema.ts reads browzy.schema.md from the data dir, skips comment-only templates, caps at 4000 chars, and both compile/compiler.ts and query/engine.ts call buildCompilerSystemPrompt(schema) / buildQuerySystemPrompt(schema) to inject it into system prompts. The file is advisory — there is no deterministic enforcement of article classes, terminology, or topic focus — but it is a genuine operator-editable artifact that shapes both ingest and retrieval without code edits.
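The loading rules can be sketched as a pair of small functions. Function names and the comment-detection heuristic are assumptions; the 4000-character cap, the comment-only skip, and the dual injection into compile and query prompts follow the description above.

```typescript
// Hedged sketch of the schema control plane; names are illustrative.
const SCHEMA_CHAR_CAP = 4000;

function loadSchema(raw: string | null): string | null {
  if (!raw) return null;
  // Skip comment-only templates: if stripping HTML comments leaves
  // nothing, treat the file as absent.
  const stripped = raw.replace(/<!--[\s\S]*?-->/g, "").trim();
  if (!stripped) return null;
  return raw.slice(0, SCHEMA_CHAR_CAP);
}

function buildSystemPrompt(base: string, schema: string | null): string {
  // The same injection shape serves both the compiler and the query
  // engine; the schema is advisory, so it rides along as plain prose.
  return schema ? `${base}\n\n## Operator schema (advisory)\n${schema}` : base;
}
```

Because the file is advisory prose rather than an enforced schema, the operator can change article classes or terminology mid-corpus without touching code, at the cost of no guarantee the model obeys.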
Session-derived artifacts are a real but thin lifecycle. cli/hooks/useSession.ts persists each session to ~/.browzy/sessions/<id>.json with MAX_SESSIONS=50 pruning by mtime, and writes a last-session-meta.json stats snapshot. On the next startup, if the last session had 3+ user turns, query/digest.ts generates a 2–3 sentence digest and writes it both as a loose <id>-digest.txt file and as a markdown wiki article at session-YYYY-MM-DD.md (see cli/app.tsx around line 240); the article still needs the normal indexing path before it shows up in FTS. Separately, query/crystallizer.ts attempts at most one saved crystallized insight per session, after any answer that drew on 2+ source articles; if the LLM judges the synthesis genuinely novel, the result lands in drafts/ with derived: true frontmatter, and failed or NONE attempts can be retried on later multi-source answers. Crystallized insights stay in that separate drafts/ directory with no promotion path to wiki/.
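The gating for the two learning paths is worth pinning down, since it is where the "keeps getting smarter" claim actually cashes out. A hedged sketch of the trigger conditions described above; the interface and function names are assumptions:

```typescript
// Hedged reconstruction of the two session-learning gates; illustrative names.
interface SessionMeta { userTurns: number }
interface AnswerMeta { sourcesUsed: string[] }

const MIN_USER_TURNS_FOR_DIGEST = 3;

// Digest path: runs once, at the next startup, only if an LLM provider is
// configured and the previous session had enough user turns.
function shouldDigest(last: SessionMeta | null, hasProvider: boolean): boolean {
  return hasProvider && !!last && last.userTurns >= MIN_USER_TURNS_FOR_DIGEST;
}

// Crystallizer path: at most one *saved* insight per session, and only for
// multi-source answers; failed/NONE attempts do not consume the slot.
function shouldCrystallize(answer: AnswerMeta, savedThisSession: boolean): boolean {
  return !savedThisSession && answer.sourcesUsed.length >= 2;
}
```

Everything downstream of these two booleans is a single LLM call per path, which is why the review calls the lifecycle real but thin.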
Quality checks and gap discovery are prompt-shaped and narrow. lint/linter.ts mixes three deterministic checks — broken [[wiki-links]], orphan articles without incoming links or backlinks, missing summary/tags/sources — with a single LLM-mediated consistency pass that asks for contradictions, duplicates, gaps, and inconsistent terminology as a JSON array. discovery/gapResolver.ts is small: reject blacklisted greetings and short terms, otherwise append "research paper overview" to the gap term and return the first DuckDuckGo result. These are real capabilities but they are far thinner than autonomous research or a hard-oracle quality system.
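To make "far thinner than autonomous research" concrete, the gap-resolution heuristic reduces to roughly this. A hedged sketch: the blacklist contents and minimum length are illustrative stand-ins, while the query-suffix trick and first-result behavior come from the description above.

```typescript
// Illustrative sketch of gapResolver.ts's query construction; the
// blacklist and length threshold here are assumed values.
const BLACKLIST = new Set(["hi", "hello", "thanks", "ok"]);
const MIN_TERM_LENGTH = 4;

function gapQuery(term: string): string | null {
  const normalized = term.trim().toLowerCase();
  // Reject greetings and terms too short to be meaningful gaps.
  if (normalized.length < MIN_TERM_LENGTH || BLACKLIST.has(normalized)) {
    return null;
  }
  // The resolver then takes DuckDuckGo's first result for this query.
  return `${term.trim()} research paper overview`;
}
```

One search query, one result, no dedup against existing articles: a prototype, not a research loop.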
Comparison with Our System
| Dimension | browzy.ai | Commonplace |
|---|---|---|
| Primary shape | Personal terminal PKB that compiles sources into a local wiki for Q&A | Agent-operated KB centered on authored notes, links, and instructions |
| Main storage | Files for raw/wiki/drafts plus a derived SQLite FTS5 index | Files in git with only narrow operational-database exceptions |
| Knowledge creation | LLM compile step writes and updates wiki articles from raw sources | Agent+operator authored notes, reviews, and instructions with explicit semantic links |
| Retrieval path | FTS candidate search -> multi-signal ranking -> section extraction -> token-budgeted context | Search plus authored descriptions, indexes, and explicit read-next decisions |
| Session-derived learning | Session digest saved as markdown wiki article, then indexed through the normal path; crystallized insight written to drafts/ | Workshop layer exists, but session-to-library promotion is manual and theory-led |
| Inspectability | Strong for wiki/, weaker for operational state hidden in SQLite and session caches | Stronger overall because most important state is a checked-in artifact |
| Verification | Deterministic link/orphan/field checks plus one LLM consistency pass, plus FTS/ingest tests | Deterministic note validation and review gates; weaker automated retrieval tests |
browzy is stronger where fast personal knowledge compilation matters — no note-writing discipline required, and a user who pastes URLs gets a linked wiki plus Q&A. Commonplace is stronger where knowledge needs explicit relationships, inspectable maturation, and a legible theory of why one artifact should exist at all. The deepest architectural difference is where the structure lives: browzy commits more into the runtime (compiler prompts, retrieval heuristics, FTS5 index, session logs, slash commands), while commonplace commits more into the artifacts (typed note forms, relationship semantics, library-versus-workshop distinctions).
Trace-derived learning placement. browzy.ai qualifies as trace-derived because it mines the current session's conversation to produce durable symbolic artifacts. The trace source is per-session message history — the session.messages array captured in useSession.ts — with two different trigger boundaries: digest generation is end-of-last-session, gated on the next startup having an LLM provider configured and ≥3 user turns; crystallizer is per-answer, gated to at most one saved crystallized insight per session and only when sourcesUsed.length >= 2. The extraction step is two-path: generateSessionDigest produces a 2–3 sentence freeform text digest from Q: / A: pairs under SESSION_DIGEST_PROMPT, while crystallize produces a 2+ source synthesis article in the same ===ARTICLE===/===END=== mini-format the compiler uses, with a NONE escape hatch for unsalvageable cases. The oracle is the LLM itself on both paths — no reward, no retrieval test, no human sign-off. The promotion target is split and asymmetric: digests are written as loose text and as a wiki article at session-YYYY-MM-DD.md, with the normal indexing path making them queryable later, while crystallized insights land only in drafts/ with derived: true frontmatter and no implemented promotion path to wiki/. Nothing goes into weights. The scope is per-task at best — one session's narrow conversation shapes one digest or one draft; there is no cross-session aggregation, dedup, scoring, or decay. The timing is online: digests run on the next startup (effectively at-boot), crystallization runs during live answering.
On the survey's two axes: axis 1 places browzy under single-session extension, next to Napkin and Pi Self-Learning — it mines the current session inside the same runtime and writes back into markdown artifacts, though it differs by defining its own session-file schema rather than extending a host agent. Axis 2 places it clearly in symbolic artifact learning, at the minimal-structure end alongside Reflexion-style verbal hints: digests are freeform prose, crystallized drafts are slightly more structured but inherit the compiler's article format. browzy neither strengthens nor weakens the survey's existing claims; it confirms the single-session / symbolic-artifact cell with a minor twist (split promotion between queryable wiki and inert drafts), and does not warrant a new subtype.
Borrowable Ideas
Use a small operational database as a derived layer, not a substrate replacement. browzy's wiki/ plus articles_fts split is a clean scoped exception to strict files-only thinking: readable markdown stays authoritative, the FTS table is rebuilt from it, and every write updates both atomically. This is ready to borrow wherever a commonplace subsystem hits the search-over-markdown wall; the important move is "rebuildable index beside readable files," not "move the KB into SQLite."
Assemble answer context from relevant sections, not whole documents. contextBuilder.ts does a concrete thing worth copying: retrieve candidates symbolically, score them, extract the matching sections, then clip to a token budget before paying the final prompt cost. The always-include-intro rule for oversized articles is a small but useful detail. Ready to borrow when any commonplace retrieval pipeline becomes budget-constrained.
Give the runtime a prompt-shaped schema file. browzy.schema.md is a compact control-plane move. It keeps domain adaptation in an editable artifact that is injected into both compile-time and query-time system prompts, instead of scattering it through code. Ready to borrow where a subsystem needs operator-tunable doctrine that does not yet merit a structured schema.
Write mined insights to drafts before promoting them. The crystallizer deliberately writes into drafts/ rather than mutating the trusted wiki. That is a workshop-layer pattern — generated insight first, curation later — and it is already aligned with commonplace's library/workshop split. Ready to borrow now for any automated-mining subsystem where promotion needs review.
Treat retrieval quality as something worth testing directly. The repo's src/core/__tests__/sqlite.test.ts and related tests exercise FTS behavior, stemming, migration, and sanitization. This is a useful reminder that knowledge systems benefit from runtime-behavior tests, not only artifact validation. Borrow once a commonplace retrieval path becomes stable enough to freeze.
Curiosity Pass
The most important real idea is the compiled middle layer, not "your browzy keeps learning." The compile step actually transforms raw sources into a more navigable representation with typed frontmatter, backlinks, and inter-article citations — that is a genuine data transformation. The session-learning story is real but narrow: one digest per returning session (if the LLM is available and the prior session had 3+ user turns) and at most one crystallized draft per session. Calling that "every question makes your browzy smarter" overstates what the code does.
The storage design is better read as files plus a derived database, not files versus database. browzy is a clean example of the pattern: canonical artifacts stay in markdown for Obsidian compatibility and human inspection, while the operational access pattern for search is delegated to SQLite. Commonplace reaches similar shapes in narrower domains; browzy is the strongest whole-KB instance of the pattern in the review corpus.
The session-memory layer is a workshop pattern dressed as library memory. Digests and crystallized drafts are durable, but they are not yet fully curated knowledge. The asymmetry is telling: digests become markdown wiki articles that are queryable after indexing and available to future context builds, while crystallized insights sit in drafts/ with no promotion path. That makes the "memory" loop closer to workshop-layer notes with one automatic library-promotion path and one inert path. A light review/promote lifecycle for drafts/ is the obvious missing piece.
Several ambitious features reduce to prompt-shaped heuristics. Gap discovery is "append 'research paper overview' and take DuckDuckGo's first result" after a token blacklist and length check. Contradiction detection is one LLM pass over article summaries, not a maintained structural check. Concept extraction is another LLM pass whose output only affects the concept index. These are real capabilities but are materially thinner than a maintained subsystem; treat them as useful prototypes rather than finished mechanisms.
The compile step's contradiction protocol is an unchecked prompt obligation. The compiler prompt tells the model to "follow the contradiction protocol" when a new source disagrees with existing content, but nothing in compiler.ts verifies that the protocol was followed or preserves the original text when a conflict occurs. Merge quality therefore depends entirely on model obedience — a simpler description than "contradiction resolution."
What to Watch
- Whether crystallized drafts gain a real review, promotion, or retirement lifecycle instead of remaining a sidecar draft directory.
- Whether browzy.schema.md stays prompt doctrine or grows stronger deterministic structure around article classes and compile behavior.
- Whether the linter's LLM consistency pass becomes stronger than a prompt-mediated advisory check (e.g. structural rules, article-type contracts).
- Whether gap hunting grows beyond "first safe DuckDuckGo result" into a more deliberate source-discovery loop with dedup against existing articles.
- Whether the compiled wiki stays high quality as the corpus grows, or whether prompt-only article maintenance starts to drift into inconsistency.
- Whether session-memory features expand to cross-session aggregation, scoring, or decay — which would move browzy closer to playbook-style trace-derived systems.
Relevant Notes:
- Files beat a database for agent-operated knowledge bases — complicates: browzy is a clean example of readable files as canonical substrate with SQLite FTS5 as a justified derived index
- Agents navigate by deciding what to read next — exemplifies: contextBuilder.ts is a runtime implementation of "what should I read next?" as ranking plus section extraction
- A functioning knowledge base needs a workshop layer, not just a library — sharpens: drafts/ behaves like a workshop layer, but with no implemented promotion path to the wiki library
- Deploy-time learning is the missing middle — frames: session digests are the clearest deploy-time artifact-learning path here; crystallized drafts are a weaker pre-promotion variant
- Pal — compares: both combine a compiled wiki with session-derived continuity, but PAL has a broader live-tool control plane and a sharper split between routing and learned behavior
- Siftly — compares: both use SQLite-backed operational stages around source ingestion, but browzy pushes further into compiled wiki synthesis and interactive query-time context assembly
- OpenViking — contrasts: both capture durable agent memory, but OpenViking makes service-owned memory extraction central while browzy keeps the center of gravity on a local compiled wiki