Context engineering

Type: kb/types/definition.md · Status: seedling · Tags: computational-model

Context engineering is the architectural discipline of designing systems around bounded-context computation. The immediate problem is getting the right knowledge into a bounded context at the right time, but the scope is wider than prompt assembly. If context is the governing constraint, the structures that determine what can be loaded, when, and what survives across boundaries also belong to context engineering.

Anthropic defines it as "strategies for curating and maintaining the optimal set of tokens during LLM inference" (Anthropic, 2025). This KB's treatment is consistent with that operational view but broader in architectural scope, because context efficiency is the central design concern: when bounded context is the scarce resource, whole-system structure must be designed around it.

The operational core decomposes into five components within a single bounded call:

Routing — deciding what knowledge is relevant before loading it. The instruction-specificity/loading-frequency match (always-loaded → on-reference → on-invoke → on-demand) is routing. CLAUDE.md as a router is routing. Retrieval-oriented descriptions that let agents decide "don't follow this" without loading the target are routing.
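A minimal sketch of routing under these assumptions: the `KnowledgeItem` record and tier names are hypothetical, and the point is that the router reads only the cheap description field, never the body, before deciding what is a candidate for loading.

```python
from dataclasses import dataclass

@dataclass
class KnowledgeItem:
    name: str
    tier: str          # "always" | "on-reference" | "on-invoke" | "on-demand"
    description: str   # retrieval-oriented summary, cheap to scan
    body: str          # full text, loaded only after routing approves

def route(items, task, referenced, invoked):
    """Decide which items are loading candidates without reading any bodies."""
    selected = []
    for item in items:
        if item.tier == "always":
            selected.append(item)
        elif item.tier == "on-reference" and item.name in referenced:
            selected.append(item)
        elif item.tier == "on-invoke" and item.name in invoked:
            selected.append(item)
        elif item.tier == "on-demand" and any(
            word in item.description for word in task.split()
        ):
            selected.append(item)
    return selected
```

The crude keyword match on the last branch stands in for whatever relevance signal a real router uses; the structural point is that rejection is possible before the target is ever loaded.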

Loading — assembling the prompt from selected knowledge. The select function in the bounded-context orchestration model formalizes this: given the scheduler's accumulated state and a token budget, build a prompt that fits within the budget. Loading includes both what to include and how to frame it — the same knowledge under different framing has different extractable value.
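A sketch of the select function under assumed shapes: state is a list of `(priority, fragment)` pairs, and `tokens` is a crude whitespace stand-in for a real tokenizer. The contract is the part that matters: given accumulated state and a budget, return a prompt that fits.

```python
def tokens(text):
    # crude stand-in for a real tokenizer: whitespace word count
    return len(text.split())

def select(state, budget):
    """Greedily pack the highest-priority fragments into the token budget."""
    prompt, used = [], 0
    for priority, fragment in sorted(state, key=lambda pair: pair[0]):
        cost = tokens(fragment)
        if used + cost <= budget:
            prompt.append(fragment)
            used += cost
    return "\n".join(prompt), used
```

Framing decisions live in how fragments are written before they reach `select`; a real implementation would also rewrite or compress fragments rather than only include or exclude them.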

Scoping — isolating what each consumer sees. Treating sub-agents as lexically scoped frames is scoping. A flat context has no scoping of its own; architecture must impose it.
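A sketch of scoping with hypothetical names: the child sees only its own frame, and the parent receives a result rather than the child's transcript.

```python
def run_subagent(task, frame, call_model):
    """frame is the only context this consumer sees; parent history is invisible."""
    transcript = [f"TASK: {task}"] + frame
    result = call_model(transcript)
    return result  # the parent gets the result, never the transcript

def parent(call_model):
    parent_history = ["lots of parent context the child must not see"]
    child_frame = ["only the two facts the child needs"]
    answer = run_subagent("summarize", child_frame, call_model)
    return answer, parent_history
```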

Maintenance — keeping loaded context healthy over time. Compaction, observation masking, and the workshop layer's holistic-rewrite discipline are maintenance. Without maintenance, context accumulates debris that degrades reasoning even when token counts are low.
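Two of the maintenance moves named above, sketched under an assumed `(role, text)` message shape: observation masking replaces stale tool results with a placeholder, and compaction folds a history prefix into one summary message.

```python
def mask_observations(history, keep_last=2):
    """Replace all but the most recent tool results with a short placeholder."""
    tool_idx = [i for i, (role, _) in enumerate(history) if role == "tool"]
    stale = set(tool_idx[:-keep_last]) if keep_last else set(tool_idx)
    return [
        (role, "[tool result elided]") if i in stale else (role, text)
        for i, (role, text) in enumerate(history)
    ]

def compact(history, summarize, keep_tail=4):
    """Fold everything before the tail into a single summary message."""
    head, tail = history[:-keep_tail], history[-keep_tail:]
    if not head:
        return history
    return [("system", summarize(head))] + tail
```

`summarize` would be a model call in practice; the sketch only fixes where the fold happens, not how the summary is produced.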

Observability — attribution of where tokens went and why, as a prerequisite for tuning the other four components. Per-source accounting — tool results vs. tool requests vs. assistant output vs. system prompt vs. attachments — reveals what actually dominates context. Intuition about token usage is almost always wrong: developers routinely assume model output is the biggest cost, while in agentic coding sessions tool results commonly dominate instead. Useful signals include duplicate-read detection (the model re-reading files it already saw is a direct symptom that flat history is failing as a working set) and query-source tracking (which agent or subsystem initiated each call). The other four components all have tuning surfaces, but none of those surfaces are usable without attribution to anchor them.
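The two signals above can be sketched together, assuming an event stream of `(source, token_count, path)` triples where `path` is set only for file reads (all names hypothetical):

```python
from collections import Counter

def attribute(events):
    """Per-source token accounting plus duplicate-read detection."""
    by_source = Counter()
    reads = Counter()
    for source, token_count, path in events:
        by_source[source] += token_count
        if path is not None:
            reads[path] += 1
    duplicate_reads = {p: n for p, n in reads.items() if n > 1}
    return by_source, duplicate_reads
```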

Distillation — compressing knowledge for a specific task under a context budget — is the main operation the first four components perform, but not the only one. The bounded-context orchestration model formalizes the machinery: the solve loop where a symbolic scheduler drives routing, loading, and scoping decisions for each bounded LLM call.
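The solve loop can be sketched as follows, with every scheduler method name assumed: a symbolic scheduler routes, selects within budget, and scopes each bounded call, and state lives in the scheduler rather than in an accumulating transcript.

```python
def solve(task, scheduler, call_model, budget, max_steps=10):
    state = scheduler.init(task)
    for _ in range(max_steps):
        if scheduler.done(state):
            break
        candidates = scheduler.route(state)                   # routing
        prompt = scheduler.select(state, candidates, budget)  # loading, scoped
        result = call_model(prompt)                           # one bounded call
        state = scheduler.update(state, result)               # symbolic state
    return scheduler.answer(state)
```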

Architectural scope beyond a single call

The operational core succeeds or fails based on decisions made before and after prompt assembly:

Storage format — knowledge stored in forms that are cheap to retrieve selectively. Notes, descriptions, and indexes are context-engineering structures because they determine whether routing can happen before full loading.

Knowledge lifecycle — how raw interaction becomes reusable knowledge and how that knowledge is curated over time. A KB that only accumulates transcripts has already failed the context problem upstream.

Session boundaries — a system can inherit transcript history by default or treat each call as a fresh assembly problem. The principle that session history should not be the default next context is context engineering at the boundary level, not just the prompt level.

Inter-agent communication — when sub-agents return compressed artifacts instead of full transcripts, the boundary itself becomes a context-engineering primitive. Execution boundaries are natural sites for distillation.
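A sketch of distillation at the boundary, with hypothetical names: the worker's transcript never crosses the boundary; only a compressed artifact does. The line-filtering distiller is a deliberately naive stand-in for a model-driven summary.

```python
def run_and_distill(task, worker, distill, budget):
    transcript = worker(task)               # possibly very long
    artifact = distill(transcript, budget)  # compressed for the parent
    return artifact

def naive_distill(transcript, budget):
    # stand-in for a real summarizer: keep conclusions, drop the process
    conclusions = [line for line in transcript if line.startswith("RESULT:")]
    return conclusions[:budget]
```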

Tool and interface design — tool descriptions, instruction surfaces, and generated interfaces consume context budget too. Frontloading and build-time generation shift interpretive cost out of the live context window.

A system with poor storage shape, transcript-oriented boundaries, or verbose tool surfaces cannot be rescued by a clever selector alone.


Relevant Notes: