Sources Directory
Type: kb/types/index.md
- "Creative Thinking" (snapshot)
- A-Mem: Agentic Memory for LLM Agents (snapshot)
- A-MEM: Learning Operations Analysis (note) - Dissects A-MEM's four fully-automatic operations (construct, link, evolve, retrieve) — all accretive, none curative — identifying the missing vocabulary (delete, split, reorganize, assess quality) that separates accumulation from curation
- Agent Behavioral Contracts for Reliable Agents (snapshot)
- Agentic Code Reasoning (snapshot) - Semi-formal reasoning templates (explicit premises, execution traces, formal conclusions) improve LLM code verification by 5-12pp across patch equivalence, fault localization, and code QA tasks
- Agentic Memory for LLM Agents (snapshot)
- Agentic Note-Taking 23: Notes Without Reasons (snapshot)
- AI Components for a Deterministic System (An Example) (snapshot)
- Andrej Karpathy talks about "Claws" (snapshot)
- Automated linking improves retrieval but may degrade navigability (note) - Triangulates A-MEM, Notes Without Reasons, and the open-problem note — automated linking improves retrieval (QA benchmarks) but degrades navigability (agent trust in link infrastructure); the distinction is adjacency versus connection
- Autoreason: Self-Refinement That Knows When to Stop (snapshot) - Self-refinement paper that makes "do nothing" a first-class candidate via blind fresh-agent Borda tournaments, finding gains mostly in mid-tier models with a generation-evaluation gap
- Beyond Transformers: Sudoku Bench (snapshot) - Pathway's BDH model achieves 97.4% accuracy on extreme Sudoku while leading LLMs score 0%, using the gap as evidence that transformer architecture has fundamental limits for constraint-satisfaction reasoning and arguing for post-transformer latent-space models.
- Coding Agents are Effective Long-Context Processors (snapshot) - Benchmark paper arguing coding agents process long contexts better by turning text corpora into file-system-native tool workflows rather than latent attention or fixed retrieval
- Cognee: Knowledge Engine for AI Agent Memory (snapshot)
- Components of A Coding Agent (snapshot) - Raschka's breakdown of the six architectural components of a coding agent harness — distinguishing the harness from the model and arguing that context quality drives apparent model quality.
- Context Engineering for AI Agents in Open-Source Software (snapshot)
- Context Is What You Need: The Maximum Effective Context Window for Real World Limits of LLMs (snapshot) - Empirical study defining and measuring Maximum Effective Context Window (MECW) across 11 frontier LLMs — finds MECW is drastically smaller than advertised MCW, shifts by task type, and that large context windows cause hallucination rates to approach 100%.
- Continual Learning in Token Space (snapshot) - Letta reframes continual learning for agents as optimization over learned context rather than weights, arguing token-space memory is the primary transferable substrate for long-lived agents
- ConvexBench: Can LLMs Recognize Convex Functions? (snapshot)
- Dario Amodei — "We are near the end of the exponential" (snapshot) - Anthropic CEO's capability-timeline predictions — verifiable domains get confident timelines, unverifiable ones get hedged, implicitly confirming oracle-strength thesis
- EsoLang-Bench (snapshot) - OOD code benchmark using esoteric languages to separate transferable reasoning from benchmark memorization and contamination
- Evaluating Long-Context Reasoning in LLM-Based WebAgents (snapshot) - Benchmark showing LLM-based web agents fail badly under long context with injected irrelevant task sequences — success rates drop from 40-50% to under 10% at 150k tokens, with loop and lost-objective failures dominating; implicit RAG provides only modest relief.
- Everything you need to know about LLM memory (snapshot) - Notion essay arguing that LLM memory needs retrieval, salience, summarization, forgetting, and memory objects rather than raw chat logs.
- Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering (snapshot) - Survey paper framing LLM agent progress as externalization into memory, skills, protocols, and harness engineering rather than only stronger model weights.
- From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence (snapshot)
- Graphiti: Temporal Knowledge Graph for AI Agents (snapshot)
- Harness Engineering Is Cybernetics (snapshot)
- Harness Engineering: Leveraging Codex in an Agent-First World (snapshot)
- How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark (snapshot) - Introduces GSM-DC, a controlled benchmark using symbolic DAGs to systematically measure how irrelevant context degrades LLM reasoning — quantifies power-law error scaling with distractor count, and shows Hard-IC training plus PRM-guided tree search are the most effective robustness interventions.
- Improving AI Skills with autoresearch & evals-skills (snapshot)
- Infinite midwit (snapshot) - Adam Mastroianni's objective-vs-subjective intelligence framing for why AI competence and benchmarks still miss taste, wisdom, and idea selection.
- Ingest: "Creative Thinking" (ingest-report) - Shannon's 1952 lecture cataloguing six explicit problem-solving operators (simplification, analogy, restatement, generalization, structural analysis, inversion) as a portable creative toolkit
- Ingest: A-MEM: Agentic Memory for LLM Agents (ingest-report) - Zettelkasten-inspired flat agent memory with embedding linking and LLM-driven evolution — benchmark success without curation operations or inspectable links
- Ingest: Agent Behavioral Contracts for Reliable Agents (ingest-report) - Formal framework (ABC) extending Design-by-Contract to autonomous agents — introduces probabilistic compliance model (p,delta,k), Lyapunov drift bounds, hard/soft constraint separation with typed recovery, and a YAML DSL for specifying behavioral contracts
- Ingest: Agentic Code Reasoning (ingest-report) - Semi-formal reasoning templates (explicit premises, execution traces, formal conclusions) improve LLM code verification by 5-12pp — empirical evidence for structure-as-distribution-selector and interpretation-narrowing with quantified cost (2.8x steps)
- Ingest: Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for LLM Agents (ingest-report) - RL-trained unified LTM/STM memory policy for LLM agents — confirms memory management is learnable when task-completion oracles exist, but operates on opaque weights and low-reach facts
- Ingest: Agentic Note-Taking 23: Notes Without Reasons (ingest-report) - First-person agent testimony that propositional link semantics differ in kind from embedding adjacency, with a Goodhart corruption argument and an unresolved curation-scaling question
- Ingest: AI Components for a Deterministic System (An Example) (ingest-report) - Evans argues that separating modeling (schema creation) from classification (schema application) tames LLM non-determinism — a practitioner case study of constraining via taxonomy freezing
- Ingest: Andrej Karpathy talks about "Claws" (ingest-report) - Willison and Karpathy framing "Claw" as a term of art for local persistent AI-agent systems with scheduling, context, tools, and personal-hardware execution.
- Ingest: Autoreason: Self-Refinement That Knows When to Stop (ingest-report) - Autoreason paper showing self-refinement improves only when candidate synthesis is paired with blind comparative judging and incumbent survival, with gains concentrated in the generation-evaluation gap
- Ingest: Beyond Transformers: Sudoku Bench (ingest-report) - Company blog using Sudoku benchmark (97.4% vs 0% LLM) to argue transformers are fundamentally limited for constraint satisfaction; undisclosed BDH architecture, weak methodology, but adds a third problem domain to the architectural-limits evidence cluster alongside Ebrahimi and ConvexBench
- Ingest: Coding Agents are Effective Long-Context Processors (ingest-report) - Benchmark paper claiming coding agents beat RAG and context scaling on long-context tasks by using filesystem-native search, slicing, and scripting
- Ingest: Cognee: Knowledge Engine for AI Agent Memory (ingest-report) - Pipeline-first knowledge engine with custom Pydantic schemas for LLM entity extraction, poly-store graph+vector design, and an undersized enrichment phase that concretely marks the boundary between automatable extraction and open enrichment problems
- Ingest: Components of A Coding Agent (ingest-report) - Practitioner decomposition of coding agent harnesses into six named components, with the central claim that apparent model quality is really context quality — independent convergent evidence for the KB's context-efficiency thesis.
- Ingest: Context Engineering for AI Agents in Open-Source Software (ingest-report) - First empirical study of AI context files across 466 OSS projects — provides naturalistic data on content categories, five writing styles as constraint strategies, add-then-modify evolution pattern, and 50% stagnation rate that grounds and challenges KB constraining theory
- Ingest: Continual Learning in Token Space (ingest-report) - Letta reframes continual learning as optimizing learned context rather than weights, but the KB's stronger frame is weight space versus repo artifacts, including codified procedures
- Ingest: ConvexBench: Can LLMs Recognize Convex Functions? (ingest-report) - Benchmark proving LLM compositional reasoning collapses with depth (not token count), recovered by recursive decomposition with focused context — quantitative evidence for scheduling model predictions
- Ingest: Dario Amodei — "We are near the end of the exponential" (ingest-report) - Anthropic CEO's capability-timeline predictions implicitly confirm oracle-strength thesis — verifiable domains (coding, math) get confident timelines while unverifiable domains (novel writing, science) get hedged ones
- Ingest: EsoLang-Bench (ingest-report) - Esoteric-language code benchmark arguing standard coding scores mostly measure pretraining fit, with interpreter feedback beating textual critique on OOD tasks
- Ingest: Evaluating Long-Context Reasoning in LLM-Based WebAgents (ingest-report) - Ingest of NeurIPS 25 workshop paper benchmarking LLM web agents under long context (25k-150k tokens) with injected irrelevant task sequences — provides agent-level empirical evidence for soft degradation, loop entrapment, and objective loss, extending GSM-DC's distractor findings to multi-session agentic tasks.
- Ingest: Everything you need to know about LLM memory (ingest-report) - Rosebud Journal memory essay reframing LLM memory as a policy stack over raw/derived artifacts, retrieval timing, curation, and forgetting propagation
- Ingest: Externalization in LLM Agents (ingest-report) - Survey paper unifying LLM agent memory, skills, protocols, and harness engineering as externalized cognitive infrastructure rather than model-weight capability alone
- Ingest: From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence (ingest-report) - Epiplexity paper formalizing extractable structure for computationally bounded observers, useful for observer-relative information value and context-efficiency theory.
- Ingest: Graphiti: Temporal Knowledge Graph for AI Agents (ingest-report) - Graph-first agent memory with bi-temporal edge invalidation — the strongest counterexample to files-first architecture in the surveyed memory systems
- Ingest: Harness Engineering Is Cybernetics (ingest-report) - Conceptual thread framing harness engineering as cybernetic feedback-loop design: sensors, actuators, constraints, and externalized judgment.
- Ingest: Harness Engineering: Leveraging Codex in an Agent-First World (ingest-report) - Practitioner report on 1M LOC fully agent-generated codebase — harness engineering as constrain/inform/verify/correct, entropy management via background cleanup agents, error messages as dual-function constraining
- Ingest: How Is LLM Reasoning Distracted by Irrelevant Context? (ingest-report) - Controlled benchmark quantifying how irrelevant context degrades LLM reasoning via power-law error scaling with distractor count — strongest empirical grounding for the soft-degradation thesis in this KB; training and inference-time mitigations tested.
- Ingest: Improving AI Skills with autoresearch & evals-skills (ingest-report) - Three-take Auto Research field report where optimization only worked after manual error analysis, failure taxonomy design, and judge calibration across the Three Gulfs.
- Ingest: Infinite midwit (ingest-report) - Objective-vs-subjective intelligence essay arguing that AI's real bottleneck is taste and boringness judgment, not benchmarked competence
- Ingest: Intelligent AI Delegation (ingest-report) - Google DeepMind delegation framework centers verifiability, liability, trust, and 11 task axes in agent delegation; notable for accountability vacuum and liability firebreaks in long chains
- Ingest: Into the Unknown: Self-Learning Large Language Models (ingest-report) - Hallucination-driven self-learning LLM paper proposing Points in the Unknown, a self-question/search/train loop, and metrics for selecting models that can discover factual knowledge gaps
- Ingest: Language Models, Like Humans, Show Content Effects on Reasoning Tasks (ingest-report) - Empirical demonstration that LLMs mirror human content effects on reasoning (syllogisms, NLI, Wason) — content bias survives scaling and instruction tuning but chain-of-thought partially restores content-independent reasoning
- Ingest: Large Language Model Agents Are Not Always Faithful Self-Evolvers (ingest-report) - Causal-intervention paper showing compressed agent memories can improve systems yet fail faithfulness tests, making behavioral dependence the missing metric for self-evolving agents
- Ingest: Lessons from Building AI Agents for Financial Services (ingest-report) - Production practitioner report on building AI agents for financial services — validates files-not-database at commercial scale (S3-first with derived PostgreSQL), documents skill shadowing as user-customization mechanism, and articulates "model eats scaffolding" as an explicit design principle with fiscal-period normalization as calculator-regime counterexample
- Ingest: Letta (MemGPT): Stateful Agents with Self-Managed Memory (ingest-report) - Agent memory platform where the LLM self-manages a three-tier memory hierarchy (core/recall/archival) using an OS analogy — the strongest existing exemplar of the agent-self-managed agency model, now evolving toward git-backed memory files
- Ingest: LLM Knowledge Bases (ingest-report) - Karpathy on agent-maintained research wikis in Obsidian — index files and brief summaries replacing fancy RAG at roughly 100-article scale
- Ingest: LLM Wiki (ingest-report) - Karpathy's long-form agent-maintained wiki manifesto — explicit raw/wiki/schema architecture plus index/log separation beyond his earlier X-post workflow sketch
- Ingest: Maximum Effective Context Window (ingest-report) - Empirical study measuring Maximum Effective Context Window (MECW) across 11 frontier LLMs — finds MECW is up to 99% smaller than advertised MCW, varies by task type, and that exceeding MECW drives hallucination rates toward 100%; directly grounds the KB's bounded-context theory with multi-model dose-response data
- Ingest: Mem0: Universal Memory Layer for AI Agents (ingest-report) - Mem0's two-phase add pipeline (extract facts + LLM-judged CRUD reconciliation) is the purest production example of automated accretion-without-synthesis — now contextualized against eleven systems in the comparative review
- Ingest: Memory Intelligence Agent (ingest-report) - MIA mixed-substrate deep-research agent memory paper — search trajectories become both workflow memory and Planner weight updates during test-time learning
- Ingest: Memory Scaling for AI Agents (ingest-report) - Databricks memory-scaling experiments showing enterprise agent gains from external memory only when retrieval, distillation, and governance scale with the store
- Ingest: Mesa Optimizers and Language Recursion (ingest-report) - Speculative essay arguing mesa optimizers may emerge suddenly because language recursion and learned search both compress many cases into reusable generative rules.
- Ingest: Meta-Harness: End-to-End Optimization of Model Harnesses (ingest-report) - Controlled ablation showing raw execution traces (10 MTok/iter) outperform summaries by 10+ points in automated harness search — first empirical evidence for diagnostic richness as binding constraint
- Ingest: Minimum Viable Ontology / Domain Maps (ingest-report) - Tweet thread proposing "minimum viable ontology" — the smallest term list to orient a newcomer in a domain — with a vibecoded prototype (domainmaps.co) and pedagogical framing via "conceptual thresholds"
- Ingest: Multi-Agent Memory from a Computer Architecture Perspective (ingest-report) - Computer-architecture analogy for multi-agent memory — shared/distributed paradigms, three-layer hierarchy, consistency protocols as the critical unsolved problem
- Ingest: Natural-Language Agent Harnesses (ingest-report) - NLAH paper externalizes agent control logic as portable natural-language artifacts — key empirical finding: explicit structure helps only when it tightens alignment with evaluator acceptance criteria, not by adding process layers
- Ingest: Novel Memory Forgetting Techniques for Autonomous AI Agents (ingest-report) - Formula-based adaptive forgetting with constrained optimization for agent memory — the inspectable alternative to RL-trained memory policy, with empirical evidence that uncontrolled accumulation causes false memory propagation
- Ingest: On the "Induction Bias" in Sequence Models (ingest-report) - 190k-run empirical study showing transformers need orders-of-magnitude more data than RNNs for state tracking due to absence of step-by-step induction bias; introduces sharing factor kappa quantifying cross-length mechanism reuse
- Ingest: OpenClaw-RL: Train Any Agent Simply by Talking (ingest-report) - RL framework that trains agents from live next-state signals (user replies, tool outputs, terminal feedback, GUI state) during deployment — collapses the training/deployment boundary and challenges the KB's three-timescale model by performing weight updates from interactions the agent is already having.
- Ingest: Post by @deepfates — LLM "memory" as context stuffing (ingest-report) - Deepfates argues LLM "memory" is just context-stuffing that creates false salience (Chekhov's gun), advocates agentic context-building, but concludes weight updates are necessary — directly contradicts this KB's durability-not-weights position
- Ingest: Post by @koylanai (ingest-report) - Argues that pairwise judging plus round-robin win rates is a better evaluation primitive than absolute scoring for open-ended LLM tasks with no hard ground truth
- Ingest: Professional Software Developers Don't Vibe, They Control (ingest-report) - Empirical study (N=112) finding experienced developers control AI agents through SE practices, not vibe coding — grounds constraining, underspecification, and programming-practices-transfer arguments
- Ingest: Prompt Stability in Code LLMs (ingest-report) - Empirical study measuring code LLM stability under emotion/personality prompt variations — finds performance and stability are decoupled objectives, smaller models can be more stable, and emotional prompting reveals confidence miscalibration invisible to standard benchmarks
- Ingest: Psychology already solved AI memory — identity isn't stored, it's constructed (ingest-report) - Thread proposing five psychology principles (Conway, Damasio, Bruner, Klein & Nichols) for AI memory as identity construction — directly engages the KB's open question about whether cognitive science analogies are decorative or mechanistic
- Ingest: Recursive Language Models - what finally gave me the 'aha' moment (ingest-report) - Detailed practitioner walkthrough of RLM architecture via six-architecture comparison (direct gen, RAG, ReAct, CodeAct, CodeAct+subagents, RLM) — the most concrete evidence for REPL-as-substrate, symbolic variable return, and scaffold-level truncation in the KB
- Ingest: Scaling Managed Agents: Decoupling the brain from the hands (ingest-report) - Anthropic Managed Agents report showing brain/hand/session interface decomposition, durable session logs, and stale harness assumptions as model capability changes
- Ingest: Self-training Large Language Models through Knowledge Detection (ingest-report) - EMNLP paper turning unknown-detection scores into filtered DPO preference data, with selective self-training reducing hallucination and limiting forgetting on Wikipedia QA
- Ingest: Skill Synthesis — Materializing Knowledge as Skills (ingest-report) - Sentry co-founder's practitioner report on synthesizing Claude Code skills from domain-specific source material (commit history, security patches, OWASP docs) — found 8 real IDORs missed by professional pen testing
- Ingest: Slate: Moving Beyond ReAct and RLM (ingest-report) - Practitioner report on thread-weaving agent architecture — bounded worker threads return compressed episodes to an orchestrator, solving working memory, strategic coherence, and task decomposition simultaneously; the strongest practitioner convergence evidence for the bounded-context orchestration model to date
- Ingest: Solving a Million-Step LLM Task with Zero Errors (ingest-report) - MAKER achieves zero errors over one million LLM steps via maximal decomposition into single-step microagents with first-to-ahead-by-k voting and red-flagging — proves O(s ln s) cost scaling when hard per-step oracles exist
- Ingest: Spacebot: AI Agent for Teams and Communities (ingest-report) - Spacebot README ingest covering process-typed concurrent agent runtime architecture, branch scoping, cortex supervision, and typed unified memory
- Ingest: Structured Test-Time Scaling: From Multi-Agent Systems to General Inference Architectures (ingest-report) - Formal proof that topology compression, scope isolation, and verification form a causal dependency chain enabling hierarchical MAS to bypass exponential error accumulation — directly grounds the KB's separate treatments of decomposition, scoping, and error correction as a unified principle
- Ingest: SuperARC — Can Complexity and Uncomputability Explain Intelligence? (ingest-report) - Ingest of SuperARC — AIT-grounded benchmark where frontier LLMs score phi ~0.03 while neuro-symbolic CTM/BDM achieves 1.000 on recursive compression; newer models regress; print-statement-only outputs demonstrate zero algorithmic abstraction
- Ingest: The "Mismanaged Geniuses" Hypothesis (ingest-report) - Hypothesis that current frontier LMs are bottlenecked by learned decomposition/scaffold policy rather than base capability, using RLMs and orchestrator-subagent systems as evidence
- Ingest: The Anatomy of an Agent Harness (ingest-report) - Practitioner taxonomy deriving harness components (filesystem, bash, sandboxes, memory, context management, long-horizon execution) from model limitations — provides the component anatomy that bridges Lopopolo's practice and the cybernetics framing
- Ingest: The Bitter Lesson (ingest-report) - Wikipedia-contextualized capture of Sutton's Bitter Lesson, useful for scaling arguments and caveats about general methods versus hand-coded knowledge.
- Ingest: The Bug That Shipped (ingest-report) - 3,700-trial practitioner evidence that coding models can diagnose deployment failures when explicitly probed but rarely surface them in undirected self-review
- Ingest: The File System Is the New Database: How I Built a Personal OS for AI Agents (ingest-report) - Practitioner report on a file-based personal OS for AI agents, useful as self-reported evidence for filesystem-first context engineering.
- Ingest: The Flawed Ephemeral Software Hypothesis (ingest-report) - Essay distinguishing vibe coding from true software ephemerality, arguing that state, integration, interface stability, and auditability keep important systems anchored to durable artifact stacks.
- Ingest: The Geometry of Forgetting (ingest-report) - Embedding-memory paper arguing that interference and low effective dimensionality, not time decay, drive forgetting and false recall in similarity retrieval.
- Ingest: The Price of Meaning: Why Every Semantic Memory System Forgets (ingest-report) - Formal no-escape theorem for semantic memory interference, with exact-record and symbolic-verifier escape clauses that sharpen retrieval-vs-verification tradeoffs.
- Ingest: The Second Brain Trap (ingest-report) - PlugLab AI founder reframes "second brain" failure as stored knowledge that never activates in context, then proposes trigger-rich graph structure as the fix
- Ingest: The Spec Is the New Code — A Guide to Spec Driven Development (ingest-report) - MercadoLibre engineering lead's practitioner guide to Spec Driven Development — the spec/plan/task/implement cascade as methodology for eliminating agent ambiguity, with ecosystem convergence evidence and maturity-level progression
- Ingest: Toulmin Argument (ingest-report) - Pedagogical treatment of Toulmin's six-part argument model — canonical source for the structured-claim type's Evidence/Reasoning/Caveats sections
- Ingest: Towards a Science of AI Agent Reliability (ingest-report) - Reliability framework paper arguing mean task success is inadequate for agents, replacing it with consistency, robustness, predictability, and safety.
- Ingest: Towards a Science of Scaling Agent Systems (ingest-report) - Controlled multi-agent scaling paper showing coordination gains depend on task decomposability, verification, and context overhead rather than agent count.
- Ingest: tracecraft (ingest-report) - S3-backed CLI coordination tool for multi-agent systems — exemplifies coordination-without-guarantees and the files-over-database bet applied to inter-agent state rather than knowledge storage
- Ingest: Trajectory-Informed Memory Generation for Self-Improving Agent Systems (ingest-report) - IBM pipeline extracts strategy/recovery/optimization tips from agent execution trajectories and injects at runtime — subtask granularity and LLM-guided retrieval drive gains, especially on complex tasks (+14.3 pp SGC); provides a concrete closed learning loop with inspectable output but narrow oracle (AppWorld task completion).
- Ingest: Transformers Learn In-Context by Gradient Descent (ingest-report) - Mechanistic ICML paper showing in-context regression can be implemented as gradient descent inside Transformer forward passes, sharpening the internal half of the KB's in-context-learning theory
- Ingest: What spec-driven development gets wrong (ingest-report) - Augment's argument that spec-driven development fails unless agents co-maintain the spec — bidirectional spec as a mechanism for matching maintenance throughput to generation throughput
- Ingest: What Survives in Multi-Agent Systems (ingest-report) - Applied bitter-lesson analysis predicting which multi-agent patterns survive stronger models — argues filesystem, forking, and spawning are structural while fixed orchestration is a vision feature
- Ingest: When code is free, research is all that matters (ingest-report) - Investor/researcher argument that oracle availability (not capability) determines automation boundary for cognitive work — research taste is unautomatable because problem selection has no ground truth
- Ingest: Why AI systems don't learn and what to do about it (ingest-report) - Position paper arguing current AI externalizes learning into human-run MLOps and proposing an A-B-M architecture where meta-control arbitrates observation and action learning for lifelong adaptation.
- Intelligent AI Delegation (snapshot) - Google DeepMind framework for intelligent AI delegation — proposes adaptive protocols covering task decomposition, multi-objective optimization, trust/reputation, verifiable completion, and security for human-AI and AI-AI delegation networks, with explicit analysis of how MCP, A2A, AP2, and UCP map onto these requirements.
- Into the Unknown: Self-Learning Large Language Models (snapshot) - Self-learning LLM paper proposing Points in the Unknown, hallucination-based unknown detection, self-questioning/search/training loop, and self-learning capability metrics
- Language Models, Like Humans, Show Content Effects on Reasoning Tasks (snapshot)
- Large Language Model Agents Are Not Always Faithful Self-Evolvers (snapshot) - Causal-intervention paper showing self-evolving agents rely on raw trajectories more faithfully than condensed experience, exposing a compression-faithfulness gap across frameworks, models, and environments
- Lessons from Building AI Agents for Financial Services (snapshot)
- Letta (MemGPT): Stateful Agents with Self-Managed Memory (snapshot)
- LLM Wiki (snapshot) - Karpathy's idea file for agent-maintained personal wikis, centered on a persistent markdown layer between raw sources and query-time chat
- Mem0: Universal Memory Layer for AI Agents (snapshot)
- Memory Intelligence Agent (snapshot) - MIA paper on converting deep-research search trajectories into workflow memory and Planner test-time training
- Memory Scaling for AI Agents (snapshot) - Databricks AI Research argument and experiments for external-memory scaling as a third agent improvement axis alongside model and inference scaling
- Mesa Optimizers and Language Recursion (snapshot) - Speculative blog post connecting mesa optimizers to language recursion by treating both as compressed generative rules that can appear as sudden capability jumps.
- Meta-Harness: End-to-End Optimization of Model Harnesses (snapshot) - Stanford/MIT paper proposing Meta-Harness, an outer-loop system that uses a coding agent with full filesystem access to prior code and execution traces to automatically search over and optimize LLM harnesses — outperforming hand-engineered baselines on text classification and TerminalBench-2.
- Multi-Agent Memory from a Computer Architecture Perspective: Visions and Challenges Ahead (snapshot) - Position paper reframing multi-agent memory management through a computer architecture lens — proposes shared vs. distributed memory paradigms, a three-layer hierarchy (I/O, cache, memory), and identifies memory consistency as the most urgent unresolved challenge for scalable multi-agent systems.
- Natural-Language Agent Harnesses (snapshot) - Proposes externalizing agent control logic (contracts, roles, stages, failure taxonomy) as portable natural-language artifacts (NLAHs) with an Intelligent Harness Runtime, evaluated on SWE-bench and OSWorld — key finding: explicit structure helps only when it tightens alignment with evaluator acceptance criteria.
- Novel Memory Forgetting Techniques for Autonomous AI Agents: Balancing Relevance and Efficiency (snapshot) - Adaptive budgeted forgetting framework for long-horizon conversational agents — relevance scoring (recency, frequency, semantic alignment) plus constrained optimization to prune memory while reducing false memory propagation.
- On the "Induction Bias" in Sequence Models (snapshot)
- OpenClaw-RL: Train Any Agent Simply by Talking (snapshot) - Framework that converts live next-state signals (user replies, tool outputs, terminal feedback, GUI state) into RL rewards and token-level supervision, enabling a single policy to personalize and improve on agentic tasks simultaneously.
- Post by @deepfates (snapshot)
- Post by @karpathy (snapshot)
- Post by @koylanai (snapshot)
- Professional Software Developers Don't Vibe, They Control: AI Agent Use for Coding in 2025 (snapshot)
- Prompt Stability in Code LLMs (snapshot)
- Psychology already solved AI memory — identity isn't stored, it's constructed (snapshot) - Thread arguing AI memory should adopt psychology's model of identity construction through autobiographical memory, citing Conway, Damasio, Bruner, and Klein & Nichols
- Recursive Language Models - what finally gave me the 'aha' moment (snapshot)
- Scaling Managed Agents: Decoupling the brain from the hands (snapshot) - Anthropic's Managed Agents architecture argues for stable brain, hand, and session interfaces that outlast changing agent harness implementations.
- Self-training Large Language Models through Knowledge Detection (snapshot) - EMNLP 2024 paper on self-training LLMs by filtering DPO preference data to unknown samples using reference-free contradiction scores.
- Skill Synthesis: Materializing Knowledge as Skills (snapshot)
- Solving a Million-Step LLM Task with Zero Errors (snapshot)
- Spacebot: AI Agent for Teams and Communities (snapshot)
- Structured Test-Time Scaling: From Multi-Agent Systems to General Inference Architectures (snapshot) - Unified theoretical framework explaining how three structural mechanisms (topology compression, scope isolation, verification) enable hierarchical multi-agent systems to bypass exponential error accumulation in test-time scaling.
- SuperARC: Recursive Compression Benchmark (snapshot) - Introduces SuperARC, an AIT-grounded benchmark showing frontier LLMs score near zero on recursive compression tasks and newer versions often regress, while neuro-symbolic CTM/BDM methods achieve perfect scores — evidence that statistical pattern matching differs fundamentally from algorithmic abstraction.
- The "Mismanaged Geniuses" Hypothesis (snapshot)
- The Anatomy of an Agent Harness (snapshot)
- The Bitter Lesson (snapshot)
- The Bug That Shipped (snapshot)
- The File System Is the New Database: How I Built a Personal OS for AI Agents (snapshot)
- The Flawed Ephemeral Software Hypothesis (snapshot) - Essay arguing AI makes software more malleable, not ephemeral, because validation, state, interface stability, and auditability remain the load-bearing bottlenecks.
- The Geometry of Forgetting (snapshot) - Embedding-space account of human-like forgetting and false memory — interference, low effective dimensionality, and semantic clustering reproduce classic memory effects.
- The Price of Meaning: Why Every Semantic Memory System Forgets (snapshot) - Formal no-escape theorem paper arguing semantic memory systems face interference-driven forgetting and false recall under finite effective dimensionality.
- The Second Brain Trap (snapshot) - PlugLab AI article arguing that second-brain systems fail when stored notes never activate in working context.
- The Spec Is the New Code. A Guide to Spec Driven Development (snapshot)
- Thread by @melodyskim (snapshot)
- Toulmin Argument (snapshot)
- Towards a Science of AI Agent Reliability (snapshot)
- Towards a Science of Scaling Agent Systems (snapshot)
- tracecraft (snapshot) - S3-backed CLI coordination layer for multi-agent AI systems — shared memory, messaging, task claiming, and barriers stored as JSON files in any S3-compatible bucket, with no servers or databases required.
- Trajectory-Informed Memory Generation for Self-Improving Agent Systems (snapshot) - IBM Research framework that extracts three categories of actionable tips (strategy, recovery, optimization) from agent execution trajectories and injects them at runtime — evaluated on AppWorld showing up to 14.3 pp gains in scenario goal completion.
- Transformers Learn In-Context by Gradient Descent (snapshot) - Mechanistic ICML paper showing linear self-attention can implement gradient descent for in-context regression and trained Transformers can recover that construction.
- What spec-driven development gets wrong (snapshot)
- What Survives in Multi-Agent Systems
- When code is free, research is all that matters (snapshot)
- Why AI systems don't learn and what to do about it (snapshot) - Dupoux, LeCun, and Malik argue current AI externalizes learning into human-run MLOps, then propose an A-B-M architecture where observation learning, action learning, and a meta-control plane are integrated for lifelong adaptation.