KB design

Type: index · Status: current

How agent-operated knowledge bases are built, installed, and evaluated. Architecture decisions, skill design, and the evaluation loop for the knowledge system itself. For document structure and types, see document-system. For the learning theory that knowledge bases draw on, see learning-theory.

Architecture

Skills & Methodology

Evaluation

  • what-works — proven patterns: prose-as-title, template nudges, frontmatter queries, discovery-first
  • what-doesnt-work — anti-patterns and insufficient evidence: auto-commits, queue overhead
  • needs-testing — promising but unconfirmed: extract/connect/review cycle, input classification
  • what-cludebot-teaches-us — techniques from cludebot worth borrowing, what we already cover, and what to watch for at scale
  • prompt-ablation-converts-human-insight-to-deployable-framing — methodology for testing prompt framings: vary only the framing against a known-correct target, analyze mechanisms, deploy the winner as instruction

Design Principles

Workshop Layer

Gaps

Decisions

Reference material

  • Toulmin argument — formal argumentation model (claim/grounds/warrant/qualifier/rebuttal/backing) that grounds claim-title conventions and the structured-claim type
  • Agentic Note-Taking 23: Notes Without Reasons — practitioner validation of propositional links over embedding-based adjacency; confirms the Goodhart risk in quality signals
  • A-MEM: Agentic Memory for LLM Agents — academic paper implementing Zettelkasten-inspired automated memory with link generation and memory evolution; provides empirical evidence for boiling cauldron mutations and scaling data for embedding-based linking
  • Context Engineering for AI Agents in OSS — empirical study of AGENTS.md/CLAUDE.md adoption in 466 OSS projects; validates the loading-frequency principle's content categories, provides evolution data showing constraining maturation in the wild, and confirms the dual-audience split between human READMEs and machine context files
  • document-system — types, writing conventions, and validation that the KB's documents follow
  • learning-theory — the learning mechanisms (constraining, codification, distillation) that KB operations instantiate
  • computational-model — PL concepts (scheduling, partial evaluation, scoping) that inform KB architecture; the scheduling notes now live there
  • links — linking methodology, navigation, and link contracts
  • maintenance — detection, operations, and dynamics that keep the KB healthy over time
  • related-systems — external system comparisons

All notes

  • 004-Replace areas with tags — Replaces the areas frontmatter field with freeform tags and restructures index pages to have both curated and generated sections, decoupling navigation from comparative reading
  • A functioning knowledge base needs a workshop layer, not just a library — The current type system models permanent knowledge (library) but not in-flight work with state machines, dependencies, and expiration (workshop) — tasks are a prototype of the missing layer, and a functioning knowledge base needs both plus bridges between them
  • A good agentic KB maximizes contextual competence through discoverable, composable, trustworthy knowledge — Theory of why commonplace's arrangements work — three properties (discoverable, composable, trustworthy) serve contextual competence under bounded context; accumulation is the basic learning operation; constraining, distillation, and discovery transform accumulated knowledge; Deutsch's reach criterion distinguishes knowledge that transfers (theories) from knowledge that merely fits (facts)
  • A knowledge base should support fluid resolution-switching — Good thinking requires moving between abstraction levels — broad for context, narrow for mechanism, back out for pattern. A KB's quality should be measured by how fluidly it supports this resolution-switching, not just retrieval accuracy.
  • Active-campaign understanding needs a single coherent narrative, not composed notes — Why durable-knowledge graph composition (many linked notes) is wrong for tracking understanding during active engineering — a single holistically rewritten narrative maintains the coherence that working memory requires
  • Ad hoc prompts extend the system without schema changes — When a new requirement doesn't fit existing types or skills, writing an ad hoc instructions note absorbs it without any schema change — the collections problem is a concrete example
  • Agent statelessness makes routing architectural, not learned — Agents never develop navigation intuition — every session is day one — so all knowledge routing infrastructure (skills, type templates, routing tables, naming conventions, activation triggers) is permanent architecture, not scaffolding that learners outgrow
  • Agent statelessness means the harness should inject context automatically — Since agents can't carry vocabulary or decisions between reads, the harness should auto-inject referenced context — definitions once per session, ADRs when relevant. The trigger mechanism (type, link semantics, term detection) is an open question; the need follows directly from statelessness.
  • AGENTS.md should be organized as a control plane — Theory for deciding what belongs in AGENTS.md using loading frequency and failure cost, with layers, exclusion rules, and migration paths
  • Alexander's patterns connect to knowledge system design at multiple levels — Christopher Alexander's pattern language, generative processes, and centers may connect to our knowledge system design at multiple levels — from structured document types to codification to link semantics. Vague but persistent.
  • Always-loaded context has two surfaces with different affordances — CLAUDE.md enforces universal constraints (imperative/push); skill descriptions advertise opt-in capabilities (suggestive/pull) — guidance belongs on whichever surface matches its enforcement model
  • Areas exist because useful operations require reading notes together — Areas are defined by operations that require reading notes together — orientation and comparative reading — which need sets that are both small enough for context and related enough to yield results
  • Automating KB learning is an open problem — The KB already learns through manual work (every improvement is capacity change per Simon). The open problem is automating the judgment-heavy mutations — connections, groupings, synthesis — which require oracles we can't yet manufacture.
  • Capability placement should follow autonomy readiness — Capability artifacts should be placed by autonomy readiness so AGENTS.md stays free of inventories and only routes or constrains behavior
  • Claw learning is broader than retrieval — A Claw's learning loop must improve action capacity (classification, planning, communication), not just retrieval — question-answering is one mode among many
  • Commonplace architecture — The commonplace repo's own internal layout — what exists, what's missing, and the decision to put global types in CLAUDE.md instead of kb/types/
  • Commonplace installation architecture — Design for how commonplace installs into a project — two trees (user's kb/ and framework's commonplace/), operational artifacts copied for prompt simplicity, methodology referenced for deeper reasoning
  • Context efficiency is the central design concern in agent systems — Context — not compute, memory, or storage — is the scarce resource in agent systems; context cost has two dimensions (volume and complexity) that require different architectural responses, making context efficiency the central design concern analogous to algorithmic complexity in traditional systems
  • Deep search is connection methodology applied to a temporarily expanded corpus — Design exploration for a deep search skill that reuses /connect's dual discovery and articulation testing on web search results, building a temporary research graph before bridging to KB
  • Design methodology — borrow widely, filter by first principles — We borrow from any source but adopt based on first-principles support — except programming patterns, which get a fast pass because the bet is that knowledge bases are a new kind of software system
  • Distillation status determines directory placement — Hunch that procedural artifacts distilled for execution belong in kb/instructions/ — the directory boundary is "distilled into a procedure", not "compressed" or "frequently loaded"
  • Enforcement without structured recovery is incomplete — The enforcement gradient covers detection and blocking but has no recovery column — recovery strategies (corrective → fallback → escalation) are the missing layer, and oracle strength determines which are viable at each level
  • Files beat a database for agent-operated knowledge bases — Files beat a database early on — a schema commits to access patterns before you know them, and files let you constrain incrementally while getting free browsing, versioning, and agent access from day one
  • Frontloading spares execution context — Pre-computing static parts of LLM instructions and inserting results spares execution context — the primary bottleneck in instructing LLMs; the mechanism is partial evaluation applied to instructions with underspecified semantics
  • Generate KB skills at build time, don't parameterise them — KB skills should be generated from templates at setup time, not parameterised with runtime variables — applying the general principle that indirection is costly in LLM instructions
  • Indirection is costly in LLM instructions — In code, indirection (variables, config, abstraction layers) is nearly free at runtime — in LLM instructions, every layer of indirection costs context and interpretation overhead on every read
  • Injectable configuration extends frontloading to installation-specific values — Values static within an installation but variable across installations — sibling repo paths, local tool locations — are frontloadable through configuration the orchestrator resolves and injects into sub-agent frames; the context savings depend on sub-agent isolation since injection into the main context just adds tokens
  • Instruction specificity should match loading frequency — The loading hierarchy (CLAUDE.md → skill descriptions → skill bodies → task docs) should match instruction specificity to loading frequency — always-loaded context competes for attention every session
  • Instructions are skills without automatic routing — Reusable distilled procedures that live in kb/instructions/ — same format as skills but without activation triggers or CLAUDE.md routing entries; invoked when a human points the agent at them
  • Instructions are typed callables with document type signatures — Skills and tasks are typed callables — they accept document types as input and produce types as output, and should declare their signatures like functions declare parameter types
  • MCP bundles stateless tools with a stateful runtime — MCP forces stateless tool operations through a persistent server process — most tools are pure functions that don't need session state, connections, or lifecycle management, but pay the complexity tax anyway
  • Mechanistic constraints make Popperian KB recommendations actionable — Bounded context and underspecification don't just permit conjecture-and-refutation — they require it; derives three concrete practices (falsifier blocks, contradiction-first connection, rejected-interpretation capture) from KB mechanics
  • Methodology enforcement is constraining — Instructions, skills, hooks, and scripts form a constraining gradient for methodology — from underspecified and indeterministic (LLM interprets and may not follow) to fully deterministic (code always runs), with hooks occupying a middle ground of deterministic triggers with indeterministic responses
  • Needs testing — Promising ideas without enough evidence — extract/connect/review cycle, input classification before processing
  • Prompt ablation converts human insight into deployable agent framing — Methodology for testing prompt framings — uses controlled variation against a human-verified finding to identify which cognitive moves agents can reliably execute, then deploys the winning framing as instruction
  • Scenario decomposition drives architecture — Deriving architectural requirements by decomposing concrete user stories into step-by-step context needs — not from abstract read/write operations but from what the agent actually has to load at each stage, in both the commonplace repo and installed projects
  • Scenarios — Concrete use cases for the knowledge system — upstream change analysis and proposing our own changes
  • Skills derive from methodology through distillation — The methodology→skill relationship is distillation (extracting operational procedures from discursive reasoning in the same medium) — distinct from codification (prompt→code phase transition) and constraining (narrowing output distribution)
  • The fundamental split in agent memory is not storage format but who decides what to remember — Comparative analysis of eleven agent memory systems across six architectural dimensions — storage unit, agency model, link structure, temporal model, curation operations, and extraction schema — revealing that the agency question (who decides what to remember) is the most consequential design choice and that no system combines high agency, high throughput, and high curation quality
  • Two context boundaries govern collection operations — Any note collection faces two context boundaries — a full-text boundary where all bodies can be loaded together, and an index boundary where all titles+descriptions fit — creating three operational regimes that govern areas, /connect, and whole-KB operations differently
  • Vibe-noting — Vibe coding works because code is inspectable, not just verifiable — a KB adds that same inspectability to knowledge work, enabling augmentation even where automation is blocked on oracle construction
  • What cludebot teaches us — Techniques from cludebot worth borrowing — what we already cover, what to adopt now, and what to watch for as the KB grows
  • What doesn't work — Anti-patterns and areas with insufficient evidence — auto-commits, queue overhead, validation ceremony, session rhythm
  • What works — Patterns proven valuable in practice — prose-as-title, template nudges, frontmatter queries, semantic search via qmd, discovery-first, public/internal boundary