Knowledge storage does not imply contextual activation

Type: kb/types/note.md · Status: seedling · Tags: llm-interpretation-errors, failure-modes, evaluation

An agent system can have the right knowledge and still fail to use it. The knowledge may exist in model weights, notes, memory records, documentation, source files, or even the live context window. That does not mean it will affect the next answer or action.

The missing step is contextual activation: making knowledge that is already available action-relevant in the current task. Retrieval proves that the system can produce a fact when asked. Context presence proves that the fact was visible to the model. Activation is stronger: the fact changes what the agent notices, says, checks, or does without the user naming it directly.
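
The distinction can be written down as increasingly strict levels. A minimal vocabulary sketch (the names and probe interface are illustrative, not from any particular framework):

```python
from enum import Enum

class KnowledgeStatus(Enum):
    ABSENT = "absent"            # the system cannot produce the fact even when asked
    RETRIEVABLE = "retrievable"  # produced on direct questioning only
    PRESENT = "present"          # visible in the working context, but inert
    ACTIVATED = "activated"      # changed what the agent noticed, said, checked, or did

def classify(answers_when_asked: bool,
             visible_in_context: bool,
             influenced_behavior: bool) -> KnowledgeStatus:
    """Classify one fact after three probes: a direct question, an inspection
    of the context window, and a check of whether the unprompted output or
    action trace actually reflects the fact."""
    if influenced_behavior:
        return KnowledgeStatus.ACTIVATED
    if visible_in_context:
        return KnowledgeStatus.PRESENT
    if answers_when_asked:
        return KnowledgeStatus.RETRIEVABLE
    return KnowledgeStatus.ABSENT
```

Only the last level is evidence that the knowledge will matter in an actual run.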

This is why "the model knows X" is often the wrong operational question. The useful question is: will X be brought to bear at the moment when it matters?

Two Places The Transition Fails

Activation can fail before knowledge reaches the context window, or after it is already there.

Storage-to-context failure. Relevant knowledge exists somewhere, but the workflow never retrieves or loads it. This is the ordinary second-brain failure: a note, memory, or prior lesson is stored, but nothing cues it during the task. PlugLab AI's Second Brain Trap ingest is a practitioner example: abundant stored notes still left the author "starting from zero" because the material was not available in the working context.

Context-to-action failure. Relevant knowledge is visible, but the agent does not connect it to the task, plan, or next action. Englaender et al.'s Agents Explore but Agents Ignore demonstrates this boundary with solution injection. Agents often discovered explicit task solutions in their environment but did not exploit them. In AppWorld, discovery was above 90%, while exploitation was below 7%. The problem was not missing information. The information was seen and still treated as background.
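
A rough way to instrument this boundary in one's own traces (not the paper's methodology; the trace format is hypothetical and the matching is a naive substring check):

```python
def discovery_vs_exploitation(traces: list[dict], solution: str) -> tuple[float, float]:
    """Estimate how often an injected solution is seen vs. acted on.

    Each trace is assumed to be a dict with two keys:
      "observations": strings the agent saw (tool output, files, environment text)
      "actions":      strings the agent emitted (commands, code, answers)
    """
    discovered = sum(any(solution in obs for obs in t["observations"]) for t in traces)
    exploited = sum(any(solution in act for act in t["actions"]) for t in traces)
    n = len(traces) or 1
    return discovered / n, exploited / n

# A large gap between the two rates is the context-to-action failure:
# the solution entered the context window but never entered an action.
```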

Both failures produce the same practical result: a lesson that could have changed the outcome does not enter the active computation.

The Expert-Witness Pattern

Models often behave like expert witnesses rather than advisors. An expert witness answers the question asked. An advisor raises the concern the questioner did not know to ask about. Current models are much better at the first than the second.

The gap is easiest to see in review tasks. A model may explain a failure mode perfectly when prompted directly, yet omit it during an open-ended review where that failure mode would change the decision. The knowledge is retrievable. It is not reliably self-triggering.

Humans have the same shape of failure: "I knew this, but it did not occur to me." LLM systems make the control surface more explicit. Prompt context, retrieved notes, tool observations, role assignments, and checklists are the cues that decide what becomes active.

What Helps

Different interventions target different transitions.

Storage-to-context failures need routing: indexes, search, retrieval filters, skill triggers, maintained summaries, and explicit loading rules.
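
A minimal sketch of one such loading rule, assuming notes carry trigger keywords (the note schema and example records are illustrative):

```python
# Hypothetical note records: trigger keywords decide when a stored note is
# routed into the working context, instead of waiting for the user to ask.
NOTES = [
    {"id": "retries-lesson", "triggers": {"timeout", "retry", "backoff"},
     "summary": "Previous incident: naive retries amplified the outage."},
    {"id": "schema-migration", "triggers": {"migration", "alter table"},
     "summary": "Migrations must stay backward compatible for one release."},
]

def route_notes(task_description: str, notes=NOTES) -> list[str]:
    """Return summaries whose triggers appear in the task, to be prepended
    to the agent's context before it starts planning."""
    text = task_description.lower()
    return [n["summary"] for n in notes
            if any(trigger in text for trigger in n["triggers"])]

# A task mentioning "retry" pulls the incident lesson into context
# even though the user never asked about retries.
print(route_notes("Add retry logic to the payment webhook handler"))
```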

Context-to-action failures need integration pressure: reflection prompts, "revise the plan in light of observations" steps, mandatory investigation of surprising evidence, salience checks, and process structures that make the agent ask whether visible information should change the current plan. This is one reason process structure and output structure are independent levers: changing the reasoning process can activate knowledge without changing the final answer format.
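
A sketch of integration pressure as an explicit loop step, assuming a simple `llm(prompt) -> str` wrapper (hypothetical interface; the point is that the plan-revision question is asked every iteration rather than left to chance):

```python
def step_with_integration_pressure(llm, plan: str, new_observations: list[str]) -> str:
    """Force the agent to ask whether visible information should change the plan.

    The reasoning process changes (one extra reflection call per step); the final
    output format does not, which is why process and output structure are
    independent levers.
    """
    reflection = llm(
        "Current plan:\n" + plan + "\n\n"
        "New observations:\n" + "\n".join(new_observations) + "\n\n"
        "Does anything above contradict the plan, solve a step outright, or make a "
        "step unnecessary? Answer REVISE followed by a new plan, or KEEP."
    )
    if reflection.strip().upper().startswith("REVISE") and "\n" in reflection:
        return reflection.split("\n", 1)[1]
    return plan
```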

Both transitions are affected by context scarcity. More context can help by making knowledge present, but it can also hurt by diluting cues or increasing competition. The broader mechanism is described in Agent context is constrained by soft degradation, not hard token limits: well-formed output can hide the fact that important material in the context was ignored.

Why It Matters

Most evaluations collapse these stages. They ask whether the model can answer a question, solve a task, or use information after its relevance has been made explicit. That tests capability after activation. It does not test whether the system will activate the right knowledge unprompted.
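
A sketch of an evaluation that keeps the two conditions separate (the task schema and `run_agent` interface are hypothetical):

```python
def activation_gap(run_agent, tasks: list[dict]) -> float:
    """Compare performance when relevance is made explicit vs. left implicit.

    Each task is assumed to carry:
      "prompt": the open-ended task as a user would actually state it
      "hint":   the same task with the relevant fact's relevance made explicit
      "check":  a callable scoring whether the fact was brought to bear (0 or 1)
    A large gap means the system has the capability but not the activation.
    """
    unprompted = sum(t["check"](run_agent(t["prompt"])) for t in tasks)
    prompted = sum(t["check"](run_agent(t["hint"])) for t in tasks)
    n = len(tasks) or 1
    return (prompted - unprompted) / n
```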

The expertise problem makes this worse. The user who most needs the model's latent expertise is often least able to ask the question that would activate it. That is why elicitation requires maintained question-generation systems, not just better one-off prompts. The missing questions have to come from somewhere outside the novice user and the activation-limited model.

For memory and KB design, the implication is simple: storing more knowledge is not enough, and loading more context is not enough. The system must also create reliable routes from stored knowledge to context, and from context to action.

Open Questions

  • How often does context-to-action failure occur in ordinary agent workflows, outside artificial solution-injection benchmarks?
  • Which process structures most cheaply convert visible information into plan updates?
  • Does the activation gap reliably grow with distance from the immediate artifact, from syntax to operational behavior to system-level consequences?

Relevant Notes: