LLM context is a homoiconic medium
Type: note · Status: seedling
In the LLM context window, instructions and data share the same representation — natural language tokens. A system prompt, a user message, a tool output, and a piece of content being analyzed are all just text. There is no type-level distinction between "program" and "data." This is homoiconicity: the medium used to express programs is the same medium used to express the data they operate on.
Precedents
Lisp's homoiconicity. Code and data share the same representation — lists. A list is both a data structure you can inspect and a program you can evaluate. In an LLM-based system, a markdown file is both content you can read/link/analyze and instructions you can hand to a sub-agent for execution. Lisp macros — code that writes code — map to instructions that produce reports containing further instructions. Quote/eval — toggling between treating something as data vs executing it — maps to reading an instructions note vs handing it off.
Emacs as ad hoc extension culture. Emacs is written in Elisp — the system and its extension language are the same thing, so there is no boundary between built-in functionality and user extensions. The init file mixes configuration and ad hoc programs. The stabilisation trajectory is the same: inline snippet in init.el → extracted function → published package. The KB's equivalent: ad hoc instructions note → extracted skill → registered tool.
Smalltalk's live image. The image is both the program and the development environment, and you modify the running system from inside itself using the same language. The KB has the same property — its methodology is written in the same markdown it operates on.
Other homoiconic languages. Prolog (programs are clauses in the same database as facts), Tcl (everything is a string, including code), Rebol/Red (code is data blocks), XSLT (XML transforming XML). All share the property that the boundary between using and extending the system is fluid.
What homoiconicity enables
The common thread across these systems: blurring the boundary between using and extending makes the system more adaptable than systems with rigid extension points. In the KB, ad hoc prompts extend the system without schema changes precisely because instructions and content share the same medium. No registration, no type system gatekeeping, no compilation step — write a markdown file and it's both content and executable spec.
What homoiconicity costs
The same lack of boundary that enables extensibility creates hazards:
Scoping failures have no guardrails. LLM context is composed without scoping — and homoiconicity makes it worse, because there is no structural way to distinguish "this is a binding the sub-task should see" from "this is leftover from an earlier computation." Both are just tokens. The stabilisation gradient from instructions to scripts is one response — crystallising practices into deterministic code imposes structural boundaries that the homoiconic medium itself does not provide.
Prompt injection. The most direct consequence: if instructions and data are the same medium, data can contain instructions. This is the LLM equivalent of SQL injection, and it exists for the same structural reason — the program/data boundary is conventional, not enforced.
Discoverability. Emacs's .emacs files are notoriously personal and opaque, Lisp macros can make code unreadable, and ad hoc instructions notes are invisible unless you know they exist. When everything can be both program and data, it's hard to know what a system actually does.
Relevant Notes:
- ad hoc prompts extend the system without schema changes — exemplifies: the extensibility benefit of homoiconicity — instructions and content share a medium, so new requirements get absorbed without schema changes
- LLM context is composed without scoping — exemplifies: the scoping cost of homoiconicity — no structural boundary means no guardrails against scope contamination
- agentic systems interpret underspecified instructions — foundation: the underspecified semantics of the shared medium is what makes both the extensibility and the hazards distinctive
- unified calling conventions enable bidirectional refactoring — applies: the calling convention works because agents and tools share a homoiconic interface — names resolve to either neural or symbolic implementations
- programming practices apply to prompting — context: the structuring disciplines (typing, testing, compilation) that programming practices bring are especially needed in a homoiconic medium that provides no built-in structure
- methodology enforcement is stabilisation — grounds: the stabilisation gradient from instruction to script imposes structural boundaries that the homoiconic medium itself cannot provide
- context efficiency is the central design concern in agent systems — intensified by: homoiconicity means instructions and data compete as equal tokens, removing any priority mechanism for protecting instructions from dilution
Topics: