Ingest: LLM Wiki

Type: kb/sources/types/ingest-report.md

Source: karpathy-llm-wiki.md

Captured: 2026-04-04

From: https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f

Classification

Type: conceptual-essay. Although grounded in Karpathy's own practice, the gist mainly argues a general pattern and analogy for agent-maintained wikis rather than reporting a measured implementation or a concrete software design for one named system.

Domains: knowledge-management, context-engineering, file-based-systems, agentic-workflows

Author: Andrej Karpathy is a high-signal AI practitioner whose workflow choices often propagate into wider practice, but this is still a single-author manifesto rather than an inspectable system or empirical study.

Summary

Karpathy argues for replacing query-time rediscovery over raw documents with a persistent markdown wiki that an LLM incrementally maintains. The gist makes the pattern more explicit than his earlier short post: there are three layers, not two — immutable raw sources, an LLM-owned compiled wiki, and a schema/control file (AGENTS.md or CLAUDE.md) that defines structure and workflows. It also names a minimal lifecycle around ingest, query, and lint, and distinguishes two special coordination artifacts: index.md as the content-oriented map the model reads to navigate the wiki, and log.md as the chronological trace of what has happened recently. The core thesis is that knowledge work compounds when synthesis, cross-references, and outputs become durable file artifacts instead of being re-derived in chat on every question.
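The three layers and two coordination files can be pictured as a small on-disk layout. The following is a minimal sketch, not the gist's own code: the directory names `raw/` and `wiki/`, the placement of index.md and log.md inside `wiki/`, and the function name `scaffold_wiki` are all assumptions made for illustration. Only the file names AGENTS.md, index.md, and log.md come from the source.

```python
from pathlib import Path


def scaffold_wiki(root: Path) -> dict:
    """Create a hypothetical three-layer layout and return the paths.

    Directory names and file placement are assumptions, not taken
    from the gist.
    """
    layout = {
        "raw": root / "raw",                  # layer 1: immutable raw sources
        "wiki": root / "wiki",                # layer 2: LLM-owned compiled wiki
        "schema": root / "AGENTS.md",         # layer 3: schema/control file
        "index": root / "wiki" / "index.md",  # content-oriented map
        "log": root / "wiki" / "log.md",      # chronological trace
    }
    layout["raw"].mkdir(parents=True, exist_ok=True)
    layout["wiki"].mkdir(parents=True, exist_ok=True)
    stubs = {"schema": "# Schema\n", "index": "# Index\n", "log": "# Log\n"}
    for key, text in stubs.items():
        # Never overwrite an existing file: the wiki is meant to accumulate.
        if not layout[key].exists():
            layout[key].write_text(text, encoding="utf-8")
    return layout
```

Calling `scaffold_wiki(Path("my-kb"))` a second time leaves existing files untouched, which matches the source's emphasis on durable artifacts over re-derivation.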

Connections Found

/connect placed this source in the file-first / control-plane / navigation cluster. It most directly extends the earlier Karpathy source "LLM Knowledge Bases": the April 2, 2026 X post gave the workflow, while this April 4, 2026 gist turns it into an explicit architecture with raw/wiki/schema layers and separate roles for index.md and log.md. It exemplifies "AGENTS.md should be organized as a control plane" and "Instruction specificity should match loading frequency" because the schema file is treated as a first-class routing layer rather than incidental documentation. It also exemplifies "Files beat a database for agent-operated knowledge bases", "Agents navigate by deciding what to read next", and "Vibe-noting": the wiki stays in inspectable markdown, maintained indexes and summaries guide what to load next, and Obsidian functions as the human-facing IDE over agent-maintained artifacts. The most useful extension is to "Knowledge storage does not imply contextual activation": the compiled wiki is not just storage, it is an activation scaffold. The main tension is with "A functioning knowledge base needs a workshop layer, not just a library", because Karpathy files queries, maintenance, and chronology back into one wiki, whereas our note argues that durable and temporal artifacts may need different lifecycles.

Extractable Value

[quick-win] Schema/control-plane is a first-class layer in agent-maintained wikis. The earlier Karpathy post already implied raw sources plus a compiled wiki; this gist adds the missing third layer explicitly. High reach: many KB discussions collapse storage and retrieval while leaving the control-plane layer implicit.
[quick-win] Indexes and logs are orthogonal coordination surfaces. index.md is a navigational map; log.md is a temporal trace. That separation is surprisingly crisp and transfers beyond this specific workflow. High reach: many KBs treat both as generic "metadata" and lose the distinct jobs each artifact performs.
[quick-win] "No fancy RAG" is better read as compiled symbolic navigation, not retrieval-free magic. The simpler account of the source is that the wiki itself contains the activation scaffolds — summaries, indexes, backlinks, concept pages — that replace a separate RAG stack at small-to-medium scale. High reach: this reframes a lot of "memory" discussion as pointer design and maintenance.
[experiment] Ingest/query/lint is a compact lifecycle for an agentic KB. The three-operation frame is a useful minimal operating model to compare against our own ingest/connect/validate workflow and against the workshop/library split. Medium-high reach: it is small enough to test and broad enough to generalize.
[experiment] The IDE/codebase analogy sharpens why inspectability matters for knowledge work. "Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase" is a stronger and more memorable formulation of the vibe-noting argument than we currently have. Medium reach: the analogy is useful if kept at the inspectability level rather than overextended.
[deep-dive] The pattern targets action capacity, not just Q&A. The examples span personal tracking, research, reading, and team knowledge. That broadens the evaluation target in the direction of "Claw learning loops must improve action capacity not just retrieval": the wiki is meant to support planning, organization, communication, and continuity, not just answering questions. Medium reach: the source gestures at this, but does not yet operationalize it.
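To make the ingest/query/lint frame and the index/log separation concrete, here is a toy in-memory sketch. It is an illustration under stated assumptions, not Karpathy's implementation: the `Wiki` class, its method signatures, the substring match in `query`, the broken-link check in `lint`, and the Obsidian-style `[[wikilink]]` syntax are all hypothetical choices. In the pattern described by the source, an LLM would write the summaries and page bodies; here they are passed in by hand.

```python
import re
from dataclasses import dataclass, field

# Assumption: pages cross-reference each other with [[wikilinks]].
LINK = re.compile(r"\[\[([^\]|#]+)")


@dataclass
class Wiki:
    pages: dict = field(default_factory=dict)  # page name -> markdown body
    index: dict = field(default_factory=dict)  # page name -> one-line summary (index.md)
    log: list = field(default_factory=list)    # append-only entries (log.md)

    def ingest(self, name: str, body: str, summary: str, day: str) -> None:
        """Store a page, update the map, and record the event."""
        self.pages[name] = body
        self.index[name] = summary
        self.log.append(f"{day} ingest {name}")

    def query(self, term: str, day: str) -> dict:
        """Navigate via index summaries, then load only the matching pages."""
        hits = {
            name: self.pages[name]
            for name, summary in self.index.items()
            if term.lower() in summary.lower() and name in self.pages
        }
        self.log.append(f"{day} query {term!r} -> {len(hits)} page(s)")
        return hits

    def lint(self, day: str) -> list:
        """Report (page, target) pairs whose link target does not exist."""
        broken = [
            (name, target.strip())
            for name, body in self.pages.items()
            for target in LINK.findall(body)
            if target.strip() not in self.pages
        ]
        self.log.append(f"{day} lint {len(broken)} broken link(s)")
        return broken
```

Note that every operation appends to the log, while only ingest changes the index: the index answers "where is it?" and the log answers "what happened, and when?".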

Limitations (our opinion)

This is mostly a systematization of an existing claim, not a new evidence base. The gist was created on April 4, 2026, two days after the April 2, 2026 X post, which we ingested on April 3, 2026. It sharpens the architecture, but does not add inspectable implementation details, failure cases, or comparative results.
The hard part is named, not solved. Raw/wiki/schema is a useful decomposition, but the central curation problem remains: how does the system know when a summary, backlink, contradiction flag, or article update is actually good? "Automating KB learning is an open problem" is the relevant caution: naming a control plane is easier than building the oracle that makes its mutations trustworthy.
The index-centric design inherits a maintenance burden. The gist explicitly makes index.md load-bearing for navigation. That means it inherits the failure mode in "Stale indexes are worse than no indexes": if the map drifts, the wiki becomes misleading rather than merely incomplete.
The IDE/codebase analogy is suggestive but not yet explanatory. "Obsidian is the IDE" and "the wiki is the codebase" name a resemblance, but they do not yet explain which properties of code actually transfer to knowledge artifacts and which do not. The simpler account may still just be "inspectable files plus low-friction maintenance."
Temporal and durable artifacts are still conflated. The source wants query outputs, health checks, and chronological logging all to "add up" in the wiki. Our workshop-layer note suggests this may not scale cleanly: some artifacts should probably be consumed, triaged, or promoted rather than accumulated directly in the durable library.
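The index-drift concern above is at least cheap to detect, even if the deeper curation problem is not. The sketch below compares the pages that index.md links to against the pages that exist on disk. It assumes a flat wiki directory and Obsidian-style `[[wikilinks]]` in the index. Neither assumption is confirmed by the gist, and `index_drift` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Assumption: index.md references pages as [[name]], [[name|alias]] or [[name#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def index_drift(wiki_dir: Path, index_name: str = "index.md",
                log_name: str = "log.md") -> dict:
    """Return pages the index points at but that are missing, and vice versa."""
    index_text = (wiki_dir / index_name).read_text(encoding="utf-8")
    linked = {target.strip() for target in WIKILINK.findall(index_text)}
    on_disk = {
        path.stem
        for path in wiki_dir.glob("*.md")
        if path.name not in {index_name, log_name}  # skip coordination files
    }
    return {
        "dangling": sorted(linked - on_disk),  # index entries with no page
        "unlisted": sorted(on_disk - linked),  # pages the index never mentions
    }
```

A check like this only shows that the map and the territory disagree. It says nothing about whether a listed summary is still accurate, which is the harder part of the problem.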

Recommended Next Action

Write a note titled "Indexes and logs are orthogonal coordination surfaces in agentic knowledge bases" connecting to agents-navigate-by-deciding-what-to-read-next, stale-indexes-are-worse-than-no-indexes, and traversal-improvements-should-be-deferred-via-logging-to-avoid-mid-task-context-switching. It would argue that indexes are navigational maps while logs are chronological traces, and that collapsing them into one artifact weakens both navigation and maintenance.
