Notes Directory
Type: index
- "Agent" is a useful technical convention, not a definition (note) — A lightweight technical convention — an agent is a tool loop (prompt, capability surface, stop condition) — sidestepping the definitional debate in favor of a unit that organizes code
- 002-inline-global-types-in-writing-guide (adr) — Decision to inline note and structured-claim templates into WRITING.md so the agent gets type structure and writing conventions in a single hop — eliminates one read for the two most common note types
- 003-connect-skill-discovery-strategy (adr) — Design options and scaling strategy for how the connect skill discovers candidate connections — index-first with semantic search backup, and what changes when the KB grows
- 004-Replace areas with tags (adr) — Replaces the areas frontmatter field with freeform tags and restructures index pages to have both curated and generated sections, decoupling navigation from comparative reading
- 005-quality-check-placement (adr) — Where quality checks belong — WRITING.md pre-save checklist vs post-write instructions (semantic-review, validate) — based on cost, false-positive tolerance, and whether the check blocks creation
- 006-two-tree-installation-layout (adr) — Decision to split installed projects into two directory trees — user content in kb/, framework in commonplace/ — with operational artifacts copied to kb/ for fast agent access and methodology kept in commonplace/ as fallback
- 007-reports-directory-for-generated-snapshots (adr) — Decision to create kb/reports/ for generated, regenerable analytical snapshots — distinct from workshop (temporal work-in-flight) and notes (durable claims)
- 008-Stdlib-only core scripts (adr) — Core scripts use only Python stdlib by defining a strict frontmatter grammar that a regex parser handles completely
- 009-Link relationship semantics (adr) — Adopts a fixed vocabulary of link relationship types (extends, grounds, contradicts, enables, exemplifies) borrowed from arscontexta and adapted for agent navigation under bounded context
- A functioning knowledge base needs a workshop layer, not just a library (note) — The current type system models permanent knowledge (library) but not in-flight work with state machines, dependencies, and expiration (workshop) — tasks are a prototype of the missing layer, and a functioning knowledge base needs both plus bridges between them
- A good agentic KB maximizes contextual competence through discoverable, composable, trustworthy knowledge (note) — Theory of why commonplace's arrangements work — three properties (discoverable, composable, trustworthy) serve contextual competence under bounded context; accumulation is the basic learning operation (reach distinguishes facts from theories); constraining, distillation, and discovery transform accumulated knowledge; Deutsch's reach criterion distinguishes knowledge that transfers from knowledge that merely fits
- A knowledge base should support fluid resolution-switching (note) — Good thinking requires moving between abstraction levels — broad for context, narrow for mechanism, back out for pattern. A KB's quality should be measured by how fluidly it supports this resolution-switching, not just retrieval accuracy.
- ACE (note) — Playbook-learning loop with generator, reflector, and curator roles — learns from execution feedback by scoring bullets and appending new playbook entries, without weight promotion
- Ad hoc prompts extend the system without schema changes (note) — Any system with an LLM agent layer can absorb new requirements through natural language prompts without changing the deterministic base
- Agent context is constrained by soft degradation, not hard token limits (note) — The binding constraint on agent context is silent degradation across multiple dimensions (volume, complexity, possibly irrelevant context), not the hard token limit providers advertise
- Agent orchestration needs coordination guarantees, not just coordination channels (note) — Coordination channels say how bounded contexts interact, but the missing discriminator is which guarantee prevents contamination, inconsistency, amplification, or liability diffusion across the composed system
- Agent orchestration occupies a multi-dimensional design space (note) — Agent orchestration is not ordered along a single ladder — scheduler placement, persistence, coordination form, coordination guarantees, and return artifacts vary independently across architectures
- Agent runtimes decompose into scheduler context engine and execution substrate (note) — Practitioner runtime taxonomies converge on three separable components — scheduler, context engine, and execution substrate — because each solves a different class of model limitation
- Agent Skills for Context Engineering (note) — Skill-based context engineering framework — 14 instructional modules covering attention mechanics, multi-agent patterns, memory, evaluation. Strong on operational patterns, weaker on learning theory.
- Agent statelessness makes routing architectural, not learned (note) — Agents never develop navigation intuition — every session is day one — so all knowledge routing infrastructure (skills, type templates, routing tables, naming conventions, activation triggers) is permanent architecture, not scaffolding that learners outgrow
- Agent statelessness means the context engine should inject context automatically (structured-claim) — Since agents can't carry vocabulary or decisions between reads, the context engine should auto-inject referenced context — definitions once per session, ADRs when relevant. The trigger mechanism is open; the need follows from statelessness
- Agent-R (note) — Iterative self-training agent that mines MCTS search trees into revision conversations and weight-update datasets, using strong environment rewards rather than persistent artifact memory
- Agentic systems interpret underspecified instructions (note) — LLM-based systems have two distinct properties — semantic underspecification of natural language specs (the deeper difference from traditional programming) and execution indeterminism (present in all practical systems) — the spec-to-program projection model captures the first, which indeterminism tends to obscure
- Agents navigate by deciding what to read next (note) — Context surrounding a pointer determines how cheaply an agent judges relevance without loading the target; inline links carry most, search results least — descriptions are load-bearing
- AGENTS.md should be organized as a control plane (note) — Theory for deciding what belongs in AGENTS.md using loading frequency and failure cost, with layers, exclusion rules, and migration paths
- Alexander's patterns connect to knowledge system design at multiple levels (note) — Alexander's pattern structure (Context/Problem/Forces/Solution) anticipates typed document contracts; his generative-process argument supports incremental codification over upfront type systems. Connection strengthens at concrete levels, grows vague at the 'centers' level.
- Always-loaded context mechanisms in agent harnesses (note) — Survey of always-loaded context mechanisms across agent harnesses — system prompt files, capability descriptions, memory, and configuration injection — cataloguing what each carries, how write policies differ, and where the gaps are
- Analysis: Adaptation of Agentic AI (arXiv:2512.16301) — Catalogues data-driven constraining and relaxing signals from an agentic AI adaptation taxonomy — maps the paper's agent/tool × execution/output signal grid onto llm-do's neural/symbolic spectrum and the constrain/relax cycle
- Any symbolic program with bounded calls is a select/call program (note) — Any program whose symbolic execution between bounded LLM calls can be reified as explicit state can be mechanically converted into the select/call loop with the same call sequence
- Apparent success is an unreliable health signal in framework-owned tool loops (note) — When framework-owned tool loops recover from broken tools via agent workarounds, final success stops being a reliable signal that the underlying scripts and workflows are healthy
- Architecture (index) — How commonplace is structured and installed — repo layout, two-tree split, control-plane design, file-based storage
- Areas exist because useful operations require reading notes together (note) — Areas are defined by operations that require reading notes together — orientation and comparative reading — which need sets that are both small enough for context and related enough to yield results
- Ars Contexta (note) — Claude Code plugin that generates knowledge systems from conversation, backed by 249 research claims. Ancestor of our KB — we borrowed link semantics, propositional titles, and three-space architecture, then diverged in theory and structure.
- Autocontext (note) — Closed-loop control plane for iterative agent improvement via multi-role orchestration (competitor/analyst/coach/architect), tournament evaluation, and accumulated playbook context — strongest reference for automated iterative learning loops, but the "context compilation" is concatenation with budget-aware trimming, not transformation
- Automated synthesis is missing good oracles (note) — Generating synthesis candidates (cross-note connections, novel combinations) is easy — LLMs do it readily. The hard part is evaluating whether a candidate is genuine insight or noise.
- Automated tests for text (note) — Text artifacts can be tested with the same pyramid as software — deterministic checks, LLM rubrics, corpus compatibility — built from real failures not taxonomy
- Automating KB learning is an open problem (note) — The KB already learns through manual work (every improvement is capacity change per Simon). The open problem is automating the judgment-heavy mutations — connections, groupings, synthesis — which require oracles we can't yet manufacture.
- Backlinks — use cases and design space (note) — Four use cases for inbound link visibility (hub identification, source-to-theory bridging, impact assessment, tension surfacing) with four design options and their maintenance trade-offs
- Bounded-context orchestration model (note) — Formalises agent orchestration as a symbolic scheduler driving bounded LLM calls through a select/call loop — explains why selection is hard while still supporting local strategy comparisons
- Brainstorming how to enrich web search (note) — Design exploration for enriching web search by reusing /connect's dual discovery and articulation testing on results, building a temporary research graph before bridging to KB
- Brainstorming: how reach informs KB design (note) — Brainstorming on Deutsch's "reach" concept applied to KB notes — reach is a maintenance risk signal (not a retrieval signal) because high-reach revisions break downstream reasoning silently
- Brainstorming: how to test whether pairwise comparison can harden soft oracles (note) — Staged test plan for whether pairwise comparison improves soft-oracle properties (discrimination, stability, calibration) in LLM evaluation loops
- Capability placement should follow autonomy readiness (note) — Three tiers — skills (autonomous-ready), instructions (reusable-but-steered), methodology notes (exploratory) — keep AGENTS.md free of capability inventories with a clear promotion path
- cass-memory (note) — Three-layer cognitive architecture (episodic/working/procedural) with confidence-decayed playbook bullets, Jaccard conflict detection, and cross-agent session mining across AI coding agents
- Changing requirements conflate genuine change with disambiguation failure (note) — Agile's 'changing requirements' hide two distinct phenomena — genuine change (world moved) and late discovery that downstream specs committed to a wrong interpretation of an underspecified upstream spec — short iterations limit interpretation-error propagation, not just change-response latency
- Claim notes should use Toulmin-derived sections for structured argument (structured-claim) — Three independent threads converged on Toulmin's argument structure — adopting Toulmin sections as the base type; structured-claim separates claim-titled notes (any note) from fully argued claims (the type)
- Claw learning is broader than retrieval (note) — A Claw's learning loop must improve action capacity (classification, planning, communication), not just retrieval — question-answering is one mode among many
- Claw learning loops must improve action capacity not just retrieval (note) — A Claw learning loop must target contextual competence (execution, classification, planning, communication), not just retrieval accuracy — question-answering is one mode among many
- ClawVault (note) — TypeScript memory system for AI agents with scored observations, session handoffs, and reflection pipelines — has a working workshop layer where we have theory, making it the strongest source of borrowable patterns for ephemeral knowledge
- Cludebot (note) — Generative Agents memory SDK with five-type decay, six-phase dream cycles, entity graph, Hebbian reinforcement, and clinamen anomaly retrieval; richest trajectory-to-lesson loop reviewed
- Codification (note) — Definition — codification is constraining that crosses a medium boundary from natural language to a symbolic medium (code), where the consumer changes (LLM → interpreter) and verification becomes exact — the far end of the constraining spectrum
- Codification and relaxing navigate the bitter lesson boundary (note) — Since you can't identify which side of the bitter lesson boundary you're on until scale tests it, practical systems must codify and relax — with spec mining avoiding the vision-feature failure mode
- Codified scheduling patterns can turn tools into hidden schedulers (note) — As agent behavior matures, deterministic next-step policies need explicit control logic; if the framework offers only tools, scheduling patterns end up there and the tools become hidden schedulers
- Cognee (note) — Pipeline-first knowledge engine (add/cognify/memify/search) with Pydantic-schema graph extraction, poly-store backends, and multi-tenancy — the strongest database-side counterexample to files-first architecture, but treats knowledge as a data engineering problem rather than a curation problem
- Commonplace architecture (note) — The repo's own layout (kb/, sources, instructions, scripts) as distinct from the two-tree installed layout; global types inlined in CLAUDE.md rather than kb/types/
- Computational model (index) — Tag index — PL concepts (scoping, homoiconicity, partial evaluation, typing) applied to LLM instructions, plus the scheduling architecture that follows from context scarcity
- Constraining (note) — Definition — constraining narrows the space of valid interpretations an underspecified spec admits, from partial narrowing (conventions, structured sections) to full commitment (stored outputs, deterministic code) — one of two co-equal learning mechanisms alongside distillation
- Constraining and distillation both trade generality for reliability, speed, and cost (note) — Constraining narrows interpretation (largest gain at codification, where substrate changes); distillation extracts under a context budget. Same capacity decomposition, different operations
- Constraining during deployment is continuous learning (note) — Continuous learning can happen outside of weights; constraining is one symbolic-artifact form where prompts, schemas, tools, and tests accumulate durable adaptive capacity during deployment
- Context efficiency is the central design concern in agent systems (note) — Context is the single scarce resource in agent systems — this note is the basis for deriving architectural responses from the soft-degradation cost model
- Context engineering (note) — Definition — context engineering is the discipline of designing systems around bounded-context constraints; its operational core is routing, loading, scoping, and maintenance for each bounded call
- Continuous learning requires durability, not weight updates (note) — The real disagreement is whether durable changes to tips, prompts, rules, schemas, tests, and memory artifacts count as learning; Simon's capacity-change definition says they do
- Conversation vs prompt refinement in agent-to-agent coordination (note) — Conversation preserves the execution trace; prompt refinement compresses it into a clean handoff. The right choice depends on architecture and how much intermediate work should survive
- Convert still requires semantic description
- cq (note) — Local-first agent knowledge commons with SQLite local/team stores, approval-gated team sharing, and a plugin-packaged query/propose/confirm loop; strongest reviewed shared-learning reference so far
- CrewAI Memory (note) — Unified vector-memory system for agent crews with LLM-driven scope inference, composite scoring, and consolidation — sophisticated retrieval engineering but no learning theory, treating memory as infrastructure rather than a knowledge medium
- Decapod (note) — Rust governance kernel for AI coding agents that forces intent codification, proof-gated completion, and workspace isolation before code touches a repo — strongest reference for hard-oracle verification in agent workflows, but constitution documents claim transformations the code does not perform
- Decomposition heuristics for bounded-context scheduling (note) — Working heuristics for symbolic scheduling over bounded LLM calls — separate selection from joint reasoning, choose representations not just subsets, save reusable intermediates in scheduler state
- Deploy-time learning is agile for human-AI systems (note) — Argues deploy-time learning and agile share the same core innovation — co-evolving prose and code — but deploy-time learning extends it by treating some prose as permanently load-bearing
- Deploy-time learning is the missing middle (note) — Deploy-time learning fills the gap between training and in-context — durable symbolic artifacts provide inspectable adaptation across sessions along a verifiability gradient
- Deterministic validation should be a script (note) — Half of /validate's checks are hard-oracle (enums, link resolution, frontmatter structure) and could run as a Python script in milliseconds instead of burning LLM tokens via the skill
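The note's hard-oracle half of /validate is directly scriptable. A hedged sketch, assuming `[[Title]]` wiki-links and a `type:` frontmatter field; the type vocabulary and function names are assumptions for illustration, not the skill's actual interface.

```python
from pathlib import Path
import re

# Hypothetical hard-oracle checks: enum membership and link resolution are
# exact tests that need no LLM. Type vocabulary assumed from this index.
VALID_TYPES = {"note", "adr", "index", "spec", "structured-claim"}
LINK = re.compile(r"\[\[([^\]]+)\]\]")

def validate(path: Path, note_titles: set[str]) -> list[str]:
    errors = []
    text = path.read_text()
    m = re.search(r"^type: (.+)$", text, re.MULTILINE)
    if m and m.group(1) not in VALID_TYPES:
        errors.append(f"{path.name}: unknown type {m.group(1)!r}")
    for target in LINK.findall(text):
        if target not in note_titles:
            errors.append(f"{path.name}: broken link [[{target}]]")
    return errors
```

Run over the whole notes directory, this finishes in milliseconds and leaves only the soft-oracle checks (semantic quality, claim support) to the LLM skill.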
- Directory-scoped types are cheaper than global types (note) — Global types tax every session's context; directory-scoped types load only when working in that directory — most structural affordances are directory-local, so the type system should match that economy
- Discovery is seeing the particular as an instance of the general (note) — Proposes that discovery has a dual structure — positing a new general concept while recognizing existing particulars as instances of it — and that similarity-based connections vary by abstraction depth (shared feature → shared structure → generative model), not link kind. Scoped to similarity connections; contrastive and causal links are a different axis.
- Distillation (note) — Definition — distillation compresses knowledge so a consumer can act on it within bounded context, making operations feasible that raw source material would exceed; co-equal learning mechanism alongside constraining
- Distillation status determines directory placement (note) — Hunch that procedural artifacts distilled for execution belong in kb/instructions/ — the directory boundary is "distilled into a procedure", not "compressed" or "frequently loaded"
- Distilled artifacts need source tracking at the source (note) — Distilled artifacts should not link back to sources (focus), but sources should link forward to distilled targets ("Distilled into:") so that source changes trigger staleness review of downstream artifacts
- Document classification (spec) — Taxonomy overview — the base types table and migration from old flat types; global field definitions, status, and traits live in types/note.md
- Document system (index) — Index of notes about document types, writing conventions, validation, and structural quality — how notes are classified, structured, and checked
- Document types should be verifiable (note) — Document types should assert verifiable structural properties, not subject matter — with a base type + traits model inspired by gradual and structural typing
- Dynamic Cheatsheet (note) — Test-time adaptive memory that carries forward a prompt-shaped cheatsheet across queries — artifact-learning via full cheatsheet rewrites and optional retrieval/synthesis, without weight updates
- Effective context is task-relative and complexity-relative not a fixed model constant (note) — Synthesizes Paulsen MECW, ConvexBench, and GSM-DC — usable context varies with task type, compositional complexity, and irrelevant context load, so nominal window size is a misleading abstraction
- Elicitation requires maintained question-generation systems (note) — Four elicitation strategies ordered by user expertise required, composable into review architectures with maintenance loops that prevent ossification
- Enforcement without structured recovery is incomplete (note) — The enforcement gradient covers detection and blocking but has no recovery column — recovery strategies (corrective → fallback → escalation) are the missing layer, and oracle strength determines which are viable at each level
- Entropy management must scale with generation throughput (note) — In agent-maintained systems, cleanup throughput must match generation throughput — agents replicate existing patterns including bad ones, so without proportional maintenance, quality degrades as a function of output volume
- Ephemeral computation prevents accumulation (note) — Ephemeral computation — discarding generated artifacts after use — trades accumulation for simplicity, making it the inverse of codification
- Ephemerality is safe where embedded operational knowledge has low reach (note) — Kirsch's barriers all mark cases where software carries decisions that must survive into future runs, users, and audits; ephemerality is safe only when that knowledge stays local
- Error correction works with above-chance oracles and decorrelated checks (note) — Error correction for LLM output is viable whenever the oracle has discriminative power (TPR > FPR) and checks are decorrelated — amplification cost scales with 1/(TPR-FPR)² and independence of errors
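The TPR > FPR condition can be made concrete with a small simulation. This is an illustration of the mechanism, not the note's derivation: majority-voting n independent checks amplifies any oracle with discriminative power, and the margin (TPR − FPR) sets how many checks a target confidence costs, which is where the 1/(TPR−FPR)² scaling comes from.

```python
import random

# Illustrative: majority vote over n independent, decorrelated checks with a
# weak oracle (TPR 0.6, FPR 0.4). Accuracy climbs toward 1 as n grows.
def majority_accept(is_good, tpr, fpr, n, rng):
    p = tpr if is_good else fpr                      # per-check accept rate
    votes = sum(rng.random() < p for _ in range(n))
    return votes > n / 2

def accuracy(n, tpr=0.6, fpr=0.4, trials=2000, seed=0):
    rng = random.Random(seed)
    hits = sum(
        majority_accept(True, tpr, fpr, n, rng)       # accept good output
        + (not majority_accept(False, tpr, fpr, n, rng))  # reject bad output
        for _ in range(trials)
    )
    return hits / (2 * trials)

for n in (1, 9, 49):
    print(n, round(accuracy(n), 2))  # accuracy rises with more checks
```

If the checks' errors were correlated, the extra votes would repeat the same mistake and the amplification would stall, which is why independence is the second condition.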
- Error messages that teach are a constraining technique (note) — In agent systems the error channel is an instruction channel — making errors teach the fix is nearly free and eliminates the agent's need to diagnose, an orthogonal axis to enforcement strength
- Evaluation (index) — What works, what doesn't, what needs testing — empirical observations about KB operations and prompt design
- Evaluation automation is phase-gated by comprehension (note) — Optimization loops require manual error analysis and judge calibration before automation can improve behavior rather than just score
- Evolving understanding needs re-distillation, not composition (note) — When understanding evolves, reconciling fragments into a coherent picture can exceed effective context; a pre-distilled narrative keeps the whole picture within feasible bounds
- Execution indeterminism is a property of the sampling process (note) — The same prompt can produce different outputs across runs due to token sampling — this is a property of the execution engine, theoretically eliminable but practically ubiquitous, and often confused with the deeper issue of underspecification
- ExpeL (note) — Cross-task experiential learning pipeline that gathers trajectories, maintains natural-language rules, and retrieves past traces at inference; reviewed source promotes artifacts, not weights
- Files beat a database for agent-operated knowledge bases (note) — Files beat a database early on — a schema commits to access patterns before you know them, and files let you constrain incrementally while getting free browsing, versioning, and agent access from day one
- First-principles reasoning selects for explanatory reach over adaptive fit (note) — Deutsch's adaptive-vs-explanatory distinction — explanatory knowledge has "reach" (transfers to new contexts) because it captures why, not just what works; grounds the KB's first-principles filter as selecting for reach over fit
- Flat memory predicts specific cross-contamination failures that are empirically testable (note) — Flat memory predicts three cross-contamination failures — search pollution, identity scatter, insight trapping — testable via an observation protocol against real agent systems

- Foundations (index) — Core theory the rest of the KB builds on — contextual competence, bounded context, reach, design methodology, composability
- Frontloading spares execution context (note) — Pre-computing static parts of LLM instructions and inserting results spares execution context — the primary bottleneck in instructing LLMs; the mechanism is partial evaluation applied to instructions with underspecified semantics
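A toy sketch of frontloading as partial evaluation, using only `string.Template`. The schema string and variable names are made up for illustration: the static part of an instruction is rendered once at build time, so the execution-time prompt carries only the dynamic residue.

```python
from string import Template

# Assumed build-time constant: a type schema known before any task arrives.
BUILD_TIME = {"type_schema": "note: title-as-claim, description, links"}

template = Template("Schema: $type_schema\nTask: $task")

# Partial evaluation: substitute what is static now, keep $task symbolic.
specialized = Template(template.safe_substitute(BUILD_TIME))

# Execution time: only the dynamic part remains to be filled in.
print(specialized.substitute(task="write a note on scoping"))
```

The specialization happens once; every later use pays neither the lookup nor the risk of the LLM misreading an unresolved variable.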
- G-Memory is a mixed-substrate multi-agent memory harness (note) — Multi-agent memory harness that combines state-graph traces, task-neighborhood retrieval, and scored text insights for prompt-time reuse across fixed agent workflows
- Generate KB skills at build time, don't parameterise them (note) — Template generation pays the flexibility cost once at setup; runtime variables pay it on every use across every substitution site, with occasional LLM misreads
- getsentry/skills (note) — Sentry's shared skills repo with a skill-writer meta-skill that codifies the skill creation process itself — source-driven synthesis with depth gates, labeled iteration, description-as-trigger optimization, and the Agent Skills cross-tool spec
- Hindsight (note) — Database-backed biomimetic agent memory with LLM-driven fact extraction, four-way parallel retrieval (semantic + BM25 + graph + temporal), auto-consolidation into observations, and agentic reflection — strongest production evidence that three-space memory separation yields measurable retrieval gains
- Human writing structures transfer to LLMs because failure modes overlap (note) — Human writing genres evolved to prevent specific reasoning failures; the same structures help LLMs because LLMs exhibit empirically demonstrated human-like failure modes (content effects on reasoning) — per-convention transfer evaluation, not wholesale analogy
- Human-LLM differences are load-bearing for knowledge system design (note) — Knowledge systems both inherit human-oriented materials and produce dual-audience documents (human + LLM), making human-LLM cognitive differences a first-class design concern rather than a generic disclaimer
- HyperAgents (note) — Meta's self-referential agent-evolution harness using git diff lineage, Docker replay, and benchmark-scored parent selection; useful for deploy-time learning comparisons, but not a knowledge system
- In-context learning presupposes context engineering (note) — In-context learning only works when the right knowledge reaches the context window — the selection machinery that ensures this is itself learned and refined over deployment
- Indirection is costly in LLM instructions (note) — In code, indirection (variables, config, abstraction layers) is nearly free at runtime — in LLM instructions, every layer of indirection costs context and interpretation overhead on every read
- Information value is observer-relative (note) — The value of information depends on the observer — prior knowledge, computational capacity, tools, and goals determine what they can extract. Grounds distillation, discovery, and context arrangement as observer-relative operations.
- Inspectable substrate, not supervision, defeats the blackbox problem (note) — Chollet frames agentic coding as ML producing blackbox codebases — codification counters this not by requiring human review but by choosing a substrate (repo artifacts) that any agent can inspect, diff, test, and verify
- Instruction specificity should match loading frequency (note) — The loading hierarchy (CLAUDE.md → skill descriptions → skill bodies → task docs) should match instruction specificity to loading frequency — always-loaded context competes for attention every session
- Instructions are typed callables with document type signatures (note) — Skills and tasks are typed callables — they accept document types as input and produce types as output, and should declare their signatures like functions declare parameter types.
- Interpretation errors are failures of the interpreter (note) — Real LLMs produce outputs that no valid interpretation of the spec allows — violating explicit constraints, hallucinating, failing at fully specified bookkeeping — a property of the interpreter itself, absent from the idealised two-phenomena model
- KB goals in always-loaded context guide inclusion decisions (note) — Without explicit goals in the always-loaded control-plane file, agents cannot reject well-written but off-scope material — WRITING.md provides quality criteria but not domain scope
- KB maintenance (index) — Index of notes about keeping the KB healthy over time — detection of staleness and quality degradation, maintenance operations, and the dynamics that govern system entropy
- Knowledge storage does not imply contextual activation (note) — Distinguishes stored knowledge (retrievable on direct probe) from contextually activated knowledge (brought to bear during task execution without being directly queried); formalizes the activation gap and the expertise gap
- Learning is not only about generality (note) — Per Simon, any capacity change is learning; accumulation is the most basic learning operation and reach is its key property — facts (low reach) vs theories (high reach); capacity also decomposes into generality vs a reliability/speed/cost compound
- Learning theory (index) — Index of notes about how systems learn, verify, and improve — accumulation, reach, constraining, distillation, discovery, oracle theory, and memory architecture
- Legal drafting solves the same problem as context engineering (note) — Legal drafting parallels context engineering because both write ambiguous natural-language specifications for judgment-based interpreters, but law develops constraining more than codification
- Link graph plus timestamps enables make-like staleness detection (note) — Existing links already encode dependency information; comparing note and target timestamps flags notes that may be stale without any new annotation, analogous to make's file-based rebuild logic.
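The make-like mechanism in this note reduces to one mtime comparison per link. A sketch under stated assumptions: `[[Title]]` links, a title-to-path mapping, and file modification time as the staleness signal; no annotation beyond the existing link graph is needed.

```python
from pathlib import Path
import re

LINK = re.compile(r"\[\[([^\]]+)\]\]")

def stale_notes(notes: dict[str, Path]) -> list[str]:
    """Flag notes whose linked targets changed after the note itself."""
    flagged = []
    for title, path in notes.items():
        mtime = path.stat().st_mtime
        for target in LINK.findall(path.read_text()):
            dep = notes.get(target)
            if dep is not None and dep.stat().st_mtime > mtime:
                flagged.append(f"{title} may be stale (links to {target})")
                break  # one newer dependency is enough to flag the note
    return flagged
```

Like make, this over-approximates (a touched dependency may not invalidate the note), so the output is a review queue, not an error list.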
- Link strength is encoded in position and prose (note) — Not all links are equal — inline premise links ("since [X]") carry more weight than footer "related" links. Position and prose encode commitment level, creating a weighted graph that affects traversal, scoring, and quality signals.
- Link-following and search impose different metadata requirements (note) — Link-following is local with context; search is long-range with titles/descriptions; indexes bridge both modes
- Linking theory (note) — Links are decision points; link quality is the reduction of navigation uncertainty per token of context consumed. Grounds our relationship vocabulary, title-as-claim, and position-encodes-strength practices under one model.
- Links (index) — Index of notes about linking — how links work as decision points, navigation modes, link contracts, and automated link management
- LLM context is a homoiconic medium (note) — LLM context windows are homoiconic — instructions and data share the same representation (natural language tokens), so there is no structural boundary between program and content, producing both the extensibility benefits and the scoping hazards of Lisp, Emacs, and Smalltalk
- LLM context is composed without scoping (note) — LLM context is flat concatenation — no scoping, everything global, producing dynamic scoping's pathologies (spooky action at a distance, name collision, inability to reason locally) but without even a stack; sub-agents are the one mechanism that provides isolation through lexically scoped frames
- LLM interpretation errors (index) — Three sources of deviation between intended and actual LLM output — prompt underspecification, execution indeterminism, and interpreter failure — plus oracle theory, error correction, and architectural responses for managing each
- LLM learning phases fall between human learning modes rather than mapping onto them (note) — Pre-training acquires both structural priors (evolution's role in humans) and world knowledge in one pass — making it and in-context learning intermediate on the evolution-to-reaction spectrum
- LLM-mediated schedulers are a degraded variant of the clean model (note) — When the agent scheduler lives inside an LLM conversation it becomes bounded and degrades; three recovery strategies — compaction, externalisation, factoring into code — restore the clean separation to increasing degrees
- Maintenance operations catalogue should stage distillation into instructions (note) — Catalogue of periodic KB maintenance operations and distillation status, used as a staging ground before promotion into kb/instructions procedures
- MCP bundles stateless tools with a stateful runtime (note) — MCP forces stateless tool operations through a persistent server process — most tools are pure functions that don't need session state, connections, or lifecycle management, but pay the complexity tax anyway
- Mechanistic constraints make Popperian KB recommendations actionable (note) — Bounded context and underspecification don't just permit conjecture-and-refutation — they require it; derives three concrete practices (falsifier blocks, contradiction-first connection, rejected-interpretation capture) from KB mechanics
- Memory management policy is learnable but oracle-dependent (note) — AgeMem stores facts in memory but learns the governing policy in weights; it is a clean subsymbolic case of durable learning, but one that depends on task-completion oracles the KB lacks
- Methodology enforcement is constraining (note) — Instructions, skills, hooks, and scripts form a constraining gradient for methodology — from underspecified and indeterministic (LLM interprets and may not follow) to fully deterministic (code always runs), with hooks occupying a middle ground of deterministic triggers with indeterministic responses
- Minimum viable vocabulary is the set of names that maximally reduces extraction cost for a bounded observer (note) — Reframes "minimum viable ontology" as an optimization problem — the vocabulary that, once acquired, maximally reduces a bounded observer's extraction cost for a domain; grounds the pedagogical intuition of "conceptual thresholds" in the KB's information-theoretic framework
- Napkin (note) — Obsidian-vault CLI with NAPKIN.md pinned context, TF-IDF overviews, agent-shaped search defaults, and pi-based auto-distill — clearest reference for adapting Obsidian into an agent memory interface
- Notes need quality scores to scale curation (note) — As the KB grows, /connect will retrieve too many candidates — note quality scores (status, type, inbound links, recency, link strength) filter candidates and prioritise what's worth connecting
- Nuggets (note) — Pi-coupled personal memory assistant with local HRR nugget files and chat-channel scheduling — strongest reference for file-backed scratch memory, though its promotion loop is only partially wired
- Observability (index) — Index of notes about making hidden state, hidden failure, and quality drift visible — runtime inspectability, degraded-execution signals, and maintenance-oriented detection mechanisms
- OpenSage (note) — ADK-based agent framework where agents create subagents and tools at runtime, with Neo4j graph memory, Docker sandboxes, and RL training hooks — strongest reference for self-modifying agent topology
- OpenViking (note) — Filesystem-paradigm context database for AI agents with L0/L1/L2 tiered loading, hierarchical recursive retrieval, three context types (Resource/Memory/Skill), and session-driven memory extraction — auto-generates and guarantees the same three-tier progressive disclosure pattern (link phrase / description / full text) that our system achieves through convention, but unifies all context into a virtual filesystem whose metaphor may promise more structure than it delivers
- Operational signals that a component is a relaxing candidate (note) — Six operational signals — five early-detection (paraphrase brittleness, isolation-vs-integration gap, process constraints, unspecifiable failure modes, distribution sensitivity) plus composition failure as late-stage confirmation — for shifting confidence about whether a component encodes theory or specification
- Oracle strength spectrum (note) — Exploratory framework — proposes oracle strength (how cheaply you can verify correctness) as a gradient underlying the bitter lesson boundary, with hypotheses about engineering priorities and an oracle-hardening pipeline
- Periodic KB hygiene should be externally triggered, not embedded in routing (note) — Routing instructions load every session for high-frequency decisions; periodic hygiene adds noise on every session while helping only occasionally, blurring routing and operations
- Pi Self-Learning (note) — Pi extension with automatic task-end reflection, scored learnings index, temporal memory hierarchy (daily/monthly/core), and context injection — purest implementation of the automated mistake-extraction loop among reviewed systems, but the reflection pipeline relocates rather than transforms
- Pointer design tradeoffs in progressive disclosure (note) — Design tradeoffs for progressive disclosure pointers — context-specificity vs precomputation cost vs reliability; fixed pointers (descriptions, abstracts) trade specificity for reliability and cheap reads, query-time pointers (re-rankers) trade cost for specificity, crafted pointers (link phrases) achieve highest density but depend on authoring discipline
- Process structure and output structure are independent levers (note) — Constraining what reasoning steps must occur (process structure) is an independent lever from constraining what the result looks like (output structure) — the KB's structured-reasoning cluster conflates the two, but the agentic-code-reasoning evidence shows process constraints driving accuracy gains where output format alone would not
- Programming patterns get a fast pass but other borrowed ideas must earn first-principles support (note) — We borrow from any source but adopt based on first-principles support — except programming patterns, which get a fast pass because the bet is that knowledge bases are a new kind of software system
- Prompt ablation converts human insight into deployable agent framing (note) — Methodology for testing prompt framings — uses controlled variation against a human-verified finding to identify which cognitive moves agents can reliably execute, then deploys the winning framing as instruction
- Quality signals for KB evaluation (note) — Catalogues graph-topology, content-proxy, and LLM-hybrid signals that could be combined into a weak composite oracle to drive a mutation-based KB learning loop without requiring usage data
- ReasoningBank (note) — Trajectory-to-memory pipeline extracting structured items from successes and failures, with embedding retrieval and parallel-trajectory test-time scaling; append-only, sits between Reflexion and ExpeL
- Reflexion (note) — Verbal reinforcement loop that converts failed attempts into short natural-language reflections reused on later tries — early trajectory-based artifact learning without weight updates
- Related Systems (index) — Comparable knowledge/agent systems tracked for evolving ideas, convergence signals, and borrowable patterns
- Reliability dimensions map to oracle-hardening stages (note) — The four reliability dimensions from Rabanser et al. (consistency, robustness, predictability, safety) each harden a different oracle question — mapping empirical agent evaluation onto the oracle-strength spectrum
- REM is a database-heavy episodic memory service with LLM consolidation (note) — Four-database episodic memory service (Postgres+Qdrant+Neo4j+Redis) with LLM consolidation from episodes to scored semantic facts and temporal graph expansion; heaviest infra, thinnest transformation
- Reverse-compression (inflation) is the failure mode where LLM output expands without adding information (note) — LLMs can inflate a compact seed into verbose prose that carries no more extractable structure — the test for whether a KB resists this is whether notes accumulate epiplexity across the network, not just token count
- RLM has the model write ephemeral orchestrators over sub-agents (note) — RLM packs orchestration over sub-agents into the tool-loop model by having the model write orchestrators in a REPL — elegant but ephemeral because the orchestrators are discarded after each run
- SAGE (Sovereign Agent Governed Experience) (note) — BFT-branded agent memory with CometBFT consensus, Ed25519 signing, application-level validators, confidence decay, and encryption — consensus is ceremony in single-node mode; real value is the validation gate pattern and domain-scoped RBAC
- Scenario decomposition drives architecture (note) — Deriving architectural requirements by decomposing concrete user stories into step-by-step context needs — not from abstract read/write operations but from what the agent actually has to load at each stage, in both the commonplace repo and installed projects
- Scheduler-LLM separation exploits an error-correction asymmetry (note) — Bookkeeping and semantic operations have different error profiles across all three phenomena (underspecification, indeterminism, bias) — symbolic substrates eliminate all three for bookkeeping; mixing forces bookkeeping onto the expensive semantic-correction substrate
- Selector-loaded review gates could let review-revise learn from accepted edits (note) — Brainstorm on learning reusable review gates from accepted note edits: mine candidate gates from before/after diffs, store them atomically, and load a bounded subset into future reviews
- Semantic review catches content errors that structural validation cannot (note) — Four specific semantic checks (enumeration completeness, grounding alignment, boundary-case coverage, internal consistency) that require LLM adversarial reading — structural validation catches form errors but misses content errors like incomplete enumerations that contradict their own grounding definitions
- Semantic sub-goals that exceed one context window become scheduling problems (note) — Some semantic sub-goals exceed one context window, so they must be partitioned into smaller semantic judgments with symbolic collection, filtering, and staged summarization between them
- Session history should not be the default next context (note) — Storing execution history and loading it into the next agent call are separate decisions; chat and framework-owned tool loops conflate them by making session history the default next context
- Short composable notes maximize combinatorial discovery (note) — The library's purpose is to produce notes that can be co-loaded for combinatorial discovery — short atomic notes are a consequence of this goal; longer synthesized artifacts belong in workshops or distilled instructions
- sift-kg (note) — LLM-powered document-to-knowledge-graph pipeline with schema discovery, human-in-the-loop entity resolution, and interactive visualization
- Siftly (note) — Next.js + SQLite bookmark ingestion system whose deterministic-first, resumable enrichment pipeline offers concrete patterns for scaling KB source loading with explicit progress state
- Silent disambiguation is the semantic analogue of tool fallback (note) — When an agent silently resolves unacknowledged material ambiguity in a spec, final success hides that the contract failed to determine the path — an extension of the tool-fallback observability problem
- Skills are instructions plus routing and execution policy (note) — Skills add structured discovery, user-facing invocation, and declarative execution policy (tool permissions, model override, context isolation) beyond the shared procedure
- Skills derive from methodology through distillation (structured-claim) — The methodology→skill relationship is distillation (extracting operational procedures from discursive reasoning in the same medium) — distinct from codification (prompt→code phase transition) and constraining (narrowing output distribution)
- Soft-bound traditions as sources for context engineering strategies (note) — Survey of twelve soft-bound traditions as candidate sources for context engineering strategies, with a three-tier assessment of what transfers, what's plausible, and what's blocked
- Solve low-degree-of-freedom subproblems first to avoid blocking better designs (note) — Ordering heuristic for decomposition: commit first to decisions with the fewest viable options, then place flexible choices around them to preserve global optionality
- Spacebot (note) — Rust concurrent agent framework whose process-type architecture (channels, branches, workers, cortex) is the cleanest production implementation of code-level scheduling over bounded LLM calls among reviewed systems
- Spec mining is codification's operational mechanism (note) — Operationalizes codification by extracting deterministic verifiers from observed stochastic behavior — the mechanism that converts blurry-zone components into calculators
- Specification strategy should follow where understanding lives (note) — Among durable artifacts, spec-first, bidirectional spec, and spec mining fit different phases: when understanding is available upfront, discovered during execution, or only visible after observation
- Specification-level separation recovers scoping before it recovers error correction (note) — OpenProse-like DSLs expose control flow and discretion boundaries while leaving scheduling and validation on the LLM substrate, creating an intermediate regime between flat prompting and symbolic scheduling
- Stale indexes are worse than no indexes (note) — An agent trusts an index as exhaustive — a missing entry doesn't trigger search, it makes the note invisible
- Stateful tools recover control by becoming hidden schedulers (note) — Granting the strongest stateful-tool escape hatch shows that recovered control comes from relocating the scheduler into an exceptional tool or runtime, not from the framework loop itself
- Storing LLM outputs is constraining (note) — Choosing to keep a specific LLM output resolves semantic underspecification to one interpretation and freezes it against execution indeterminism — the same constraining move the parent note describes for code, applied to artifacts
- Structure activates higher-quality training distributions (note) — Structured templates like Evidence/Reasoning sections steer autoregressive generation toward higher-quality training data (scientific papers, legal analyses) rather than unstructured web text — the structure acts as a distribution selector
- Structured output is easier for humans to review (note) — Separated Evidence and Reasoning sections let human reviewers check facts and logic independently — a purely readability argument that doesn't depend on LLM behavior at all
- Substrate class, backend, and artifact form are separate axes that get conflated (note) — Tips, notes, rules, prompts, schemas, and playbooks belong to one symbolic artifact substrate even when stored in repos, databases, or memory services; backend and artifact form are separate axes
- Subtasks that need different tools force loop exposure in agent frameworks (note) — When decomposition creates child tasks with different tool surfaces, the parent must construct fresh calls for each child, so a framework-owned loop is no longer the right control surface
- Supermemory (note) — Monorepo that open-sources Supermemory's MCP/SDK integration layer while delegating core memory extraction, contradiction handling, and profile synthesis to hosted /v3 and /v4 APIs
- Synthesis is not error correction (note) — Synthesis propagates errors by merging all agent outputs; voting corrects errors by discarding minorities — Kim et al.'s 17.2× amplification is a synthesis failure, not evidence against multi-agent coordination
- Systematic prompt variation serves verification and diagnosis, not explanatory-reach testing (note) — Controlled prompt variation either decorrelates checks or measures brittleness under fixed task semantics; Deutsch's variation test instead changes the explanation to test mechanism and reach
- Tags (index) — Hub for all tag indexes — browse the KB by conceptual domain rather than by directory
- Text testing framework — source material
- Thalo (note) — Custom plain-text language for knowledge management with Tree-Sitter grammar, typed entities, 27 validation rules, and LSP — makes the same programming-theory-over-psychology bet we do, but went further into formalization with a custom DSL
- Thalo entity types compared to commonplace document types (note) — Reference for borrowing recurring note shapes from Thalo — their entity types (opinion, reference, lore, journal, synthesis) map onto our types with concrete gaps still open (supersedes links, source status tracking)
- The augmentation-automation boundary is discrimination not accuracy (note) — Crossing from augmentation to automation requires per-instance discrimination, not aggregate accuracy — discrimination is empirically stagnant, so scaling capability alone cannot cross the boundary
- The bitter lesson has a boundary (note) — The boundary is whether the spec fully captures the problem — arithmetic specs define the problem (scale can't replace them), vision-feature specs approximate it (scale eats them)
- The boundary of automation is the boundary of verification (note) — Synthesis — three lines of evidence (oracle theory, labor economics, frontier-lab capability predictions) with distinct reasoning paths converge on verification cost as the primary structural determinant of automation
- The chat-history model trades context efficiency for implementation simplicity (note) — Chat history persists because appending messages preserves information and avoids interface design, but that convenience trades away selective loading under bounded context
- The fundamental split in agent memory is not storage format but who decides what to remember (note) — Comparative analysis of eleven agent memory systems across six architectural dimensions — storage unit, agency model, link structure, temporal model, curation operations, and extraction schema — revealing that the agency question (who decides what to remember) is the most consequential design choice and that no system combines high agency, high throughput, and high curation quality
- The wikiwiki principle: lowest-friction capture, then progressive refinement in place (note) — Ward Cunningham's wiki design principle — minimize capture friction, refine in place — drives the text→note→structured-claim codification ladder
- Three-space agent memory echoes Tulving's taxonomy but the analogy may be decorative (note) — The value of separating knowledge, self, and operational memory is that each has a different lifecycle — accumulation, slow evolution, and high churn; whether the Tulving mapping adds explanatory power beyond different retention policies is open
- Title as claim enables traversal as reasoning (note) — When note titles are claims rather than topics, following links between them reads as a chain of reasoning — the file tree becomes a scan of arguments, and link semantics (since, because, but) encode relationship types
- Title as claim exposes commitments, enabling Popperian maintenance (note) — When an index is a list of claims rather than topics, reviewing the KB becomes scanning hypotheses — each title exposes its commitment and invites the question "do I still believe this?" without opening the file
- Title as claim makes overlap between notes visible (note) — When note titles are claims, overlap between notes is visible at the index level — similar assertions are obvious without opening files; topical titles hide overlap behind different labels for the same territory
- Tool loop (index) — Index for the tool-loop argument — the framework-owned tool loop is useful but should yield control when tasks need different tool surfaces, exceed one context window, or codify scheduling
- Topology, isolation, and verification form a causal chain for reliable agent scaling (note) — Decomposition, scoping, and verification may form a strict dependency chain (topology → isolation → verification) rather than independent design choices — tests the simpler account that decomposition alone implies the other two
- Trace-derived learning techniques in related systems (note) — Sixteen code-inspected systems compared on trace ingestion pattern, promotion target (symbolic artifacts vs weights), artifact structure spectrum, and maintenance paths
- Traditional debugging intuitions break when tool loops can recover semantically (note) — Programmers trained on traditional software expect broken infrastructure to fail loudly; semantic recovery in agent tool loops violates that expectation, so successful outcomes can create false confidence during debugging and maintenance
- Traversal improvements should be deferred via logging to avoid mid-task context switching (note) — Loading writing methodology into an already-committed context window is expensive; a one-line log entry preserves the improvement signal at near-zero cost and lets a separate pass do the fix
- Two context boundaries govern collection operations (note) — Any note collection faces two context boundaries — a full-text boundary where all bodies can be loaded together, and an index boundary where all titles+descriptions fit — creating three operational regimes that govern areas, /connect, and whole-KB operations differently
- Type system (index) — Index of notes about the document type system — why types exist, what roles they serve, how they improve output quality, and how they're structured
- Type system enforces metadata that navigation depends on (note) — Descriptions don't appear spontaneously — they exist because the note base type requires them; without enforcement, metadata degrades and navigation collapses to opening every document
- Types give agents structural hints before opening documents (note) — Types and descriptions let agents make routing decisions without loading full documents — the type says what operations a document affords, the description filters among instances of that type
- Underspecification and indeterminism make programming practices harder in distinct ways when applied to prompting (note) — Indeterminism doubles test runs (statistical testing over distributions); underspecification doubles test targets (spec analysis for ambiguity). Conflating the two leads to misdiagnosis
- Unified calling conventions enable bidirectional refactoring between neural and symbolic (note) — When agents and tools share a calling convention, components can move between neural and symbolic without changing call sites — llm-do demonstrates this with name-based dispatch over a hybrid VM
- Unit testing LLM instructions requires mocking the tool boundary (note) — Skills are programs whose I/O boundary is tool calls — mocking that boundary creates controlled environments for testing whether instructions produce correct behavior, complementing text artifact testing with instruction-level regression detection
- Vibe-noting (note) — Vibe coding works because code is inspectable, not just verifiable — a KB adds that same inspectability to knowledge work, enabling augmentation even where automation is blocked on oracle construction
- Voyager (note) — Embodied lifelong-learning agent that turns successful Minecraft trajectories into reusable JavaScript skills with vector retrieval, automatic curriculum, and critic-gated refinement
- Why directories despite their costs (note) — Directories buy one–two orders of magnitude of human-navigable scale over flat files, and enable local conventions per subsystem — but each new directory taxes routing, search config, skills, and cross-directory linking
- Why notes have types (note) — Six roles of the type system — navigation hints, metadata enforcement, verifiable structure, local extensibility, output quality through structured writing discipline, and maturation through constraining
- Writing styles are strategies for managing underspecification (note) — The five empirically observed context-file writing styles (descriptive, prescriptive, prohibitive, explanatory, conditional) are not stylistic variation — they correspond to different strategies for narrowing the interpretation space agents face, trading off constraint against generalisability
- Zikkaron (note) — MCP memory server for Claude Code: 26 neuroscience-branded subsystems implemented as heuristic Python without LLM calls — vocabulary over mechanism, but compaction hooks and WRRF retrieval fusion are genuinely borrowable