Conversation vs prompt refinement in agent-to-agent coordination
Type: note · Status: seedling · Tags: computational-model
This note examines one local case of the broader handoff-artifact problem: when a sub-agent returns with a question instead of an answer, what should cross the boundary?
The calling agent has at least three options:
- Conversational Q&A — answer the question and let the sub-agent continue with its accumulated context.
- Prompt refinement — incorporate the answer into a revised, self-contained prompt and re-dispatch with a clean context.
- Hybrid — answer the question to continue the current invocation, but also capture the answer to refine the prompt template for future similar invocations.
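The three options can be sketched as dispatch loops. This is a minimal sketch, not a real agent runtime: `invoke`, `answer_question`, and `refine` are hypothetical stand-ins for the sub-agent call, the caller's answering logic, and the prompt-integration step.

```python
from dataclasses import dataclass, field

@dataclass
class Result:
    kind: str                                 # "answer" or "question"
    text: str
    history: list = field(default_factory=list)  # accumulated context

def conversational(invoke, prompt, answer_question):
    """Option 1: answer in-band; the sub-agent keeps its context."""
    result = invoke(prompt, history=[])
    while result.kind == "question":
        reply = answer_question(result.text)
        result = invoke(reply, history=result.history)  # context grows
    return result

def refined(invoke, prompt, answer_question, refine):
    """Option 2: fold the answer into a revised prompt, re-dispatch clean."""
    result = invoke(prompt, history=[])
    while result.kind == "question":
        prompt = refine(prompt, result.text, answer_question(result.text))
        result = invoke(prompt, history=[])   # fresh context each round
    return result

def hybrid(invoke, prompt, answer_question, refine, templates):
    """Option 3: answer in-band now, but also refine the stored template."""
    result = invoke(prompt, history=[])
    while result.kind == "question":
        answer = answer_question(result.text)
        templates[prompt] = refine(prompt, result.text, answer)  # for next time
        result = invoke(answer, history=result.history)
    return result
```

The structural difference is visible in the `history` argument: conversation threads it through, refinement resets it, and the hybrid threads it through while updating a template store on the side.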
Conversation feels natural because humans can't rewind. Once we've said something, we can only append corrections. Agents have no such constraint — they can cheaply re-invoke with a better prompt, effectively rewinding to before the misunderstanding.
The tradeoff
Conversation is cheaper for the caller. The caller just passes the answer string. The sub-agent continues with its existing context, including whatever useful work it did before asking the question. This preserves the execution trace in-band.
Prompt refinement is cleaner for the callee. Each invocation gets a fresh lexically scoped frame without the accumulated debris of the initial misframing, the question, and the correction. The refined prompt is a compressed handoff artifact: it preserves what mattered from the exchange without carrying the whole trace forward.
Prompt refinement is more work for the caller. The caller must: parse the sub-agent's question, formulate the answer, integrate it into a revised self-contained prompt, and re-dispatch. This is genuine coordination work that conversation avoids.
Conversation preserves intermediate results. If the sub-agent has done significant work before asking the question — 80% through a long-running task, say — refinement discards all of it. Conversation preserves it. The later in the task a question arises, the stronger the case for continuing rather than restarting.
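The progress argument can be made concrete under two simplifying assumptions: re-dispatch redoes all prior work, and a conversational turn's only cost is the extra transcript it carries in the context budget. Both the function and its parameters are hypothetical illustrations, not a calibrated rule.

```python
def should_continue(progress, transcript_tokens, context_budget_tokens):
    """Hypothetical heuristic: continue conversationally when the fraction
    of work that a restart would discard exceeds the fraction of the
    context budget that the correction transcript would consume."""
    redo_cost = progress                                  # work lost on restart
    carry_cost = transcript_tokens / context_budget_tokens  # context spent on trace
    return redo_cost > carry_cost
```

Under these assumptions a sub-agent 80% through its task should almost always be continued, while one that asks its question on the first step should almost always be re-dispatched with a refined prompt.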
Where should complexity live?
The tradeoff resolves differently depending on the architecture. In the bounded-context orchestration model, the scheduler is already the coordination layer — it holds unbounded symbolic state, assembles prompts, and orchestrates the workflow. Adding prompt-refinement logic to the scheduler is incremental complexity in the right place. Adding conversation history to the sub-agent's context is wasted tokens in the scarce resource.
When the caller is also an LLM (the degraded variant of the clean model), prompt refinement requires the caller to do integration work within its own bounded context. The "right place for complexity" argument weakens — both sides are context-constrained.
This suggests a tentative design heuristic rather than a hard principle: conversation is the natural interface for human-agent interaction; prompt refinement has advantages for agent-agent interaction when the caller is a symbolic scheduler. The qualifier matters — the heuristic depends on the caller having unbounded state to work with.
Onboarding and forking
The voooooogel multi-agent prediction proposes onboarding interviews — spawned instances ask questions back to their parent in an interactive conversation to gather context before starting. The argument: "it's just too difficult to ask a model to reliably spawn a subagent with a single prompt."
One reading through the refinement lens: the onboarding interview is useful not because conversation is the right interface, but because the caller's initial prompt was underspecified. The interview surfaces what the caller should have said. A refinement-oriented caller could capture those answers and build a better single-shot prompt, one that can be reused across many similar sub-agent invocations, which a pure conversation model cannot do.
But voooooogel's forking pattern adds a twist. The pattern is: spawn one sub-agent, onboard it via conversation, then fork into N instances that each carry "the whole onboarding conversation in context." The forked instances don't continue the dialogue — they branch from a shared prefix. This is a sub-pattern within conversation, not a third option: the onboarding is conversational, and forking is an optimization that amortizes that conversation across N workers via KV-cache sharing. The prefix is preserved, not distilled into a clean prompt, so it inherits conversation's strengths (preserves intermediate context) and weaknesses (carries the full trace, including any misframing).
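The fork pattern above can be sketched in a few lines. Here a hypothetical `fork` copies the onboarding prefix into each worker; in a real runtime the copy would be a shared KV-cache prefix rather than a literal list copy, but the structure is the same.

```python
def onboard(ask_parent, questions):
    """Build a shared prefix via a conversational onboarding interview:
    the spawned instance asks, the parent answers, and the full
    exchange (not a distilled prompt) becomes the prefix."""
    prefix = []
    for q in questions:
        prefix.append(("question", q))
        prefix.append(("answer", ask_parent(q)))
    return prefix

def fork(prefix, tasks):
    """Branch N workers from the shared prefix. Each carries the whole
    onboarding conversation in context; only the task differs."""
    return [{"context": list(prefix), "task": t} for t in tasks]
```

Because the prefix is copied verbatim rather than rewritten, any misframing captured during onboarding is inherited by all N workers, which is exactly the weakness the note attributes to the conversational side of the tradeoff.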
Open questions
- When does the sub-agent's partial work (before asking the question) outweigh the cost of re-doing it from a clean prompt? Is there a useful heuristic based on task progress?
- Can the refinement loop be made cheaper by having the caller maintain prompt templates that get progressively refined across multiple sub-agent invocations?
- Does the conversation/refinement distinction collapse with KV-cache sharing? If forked instances share cached prefixes, clean context may be achievable even within a conversational interface.
- Does forking have failure modes beyond those inherited from conversation (e.g., prefix staleness as the task evolves)?
Relevant Notes:
- LLM context is composed without scoping — foundation: sub-agents as lexically scoped frames is what makes prompt refinement produce cleaner context than conversation
- session history should not be the default next context — context: conversation, refinement, and forking are three different answers to whether execution history itself should be the unit later calls inherit
- bounded-context orchestration model — foundation: the scheduler already holds the coordination state that prompt refinement requires
- context efficiency is the central design concern in agent systems — motivation: conversation adds volume (misframing and correction transcript) to the scarce context resource
- LLM-mediated schedulers are a degraded variant of the clean model — complicates: when the caller is also an LLM, the "push complexity to the scheduler" argument weakens
- distillation — foundation: prompt refinement is distillation — targeted extraction of the caller's knowledge into a focused artifact shaped by the sub-agent's task
- evolving understanding needs re-distillation not composition — parallel: holistic rewrite of evolving narratives is the same operation (re-distillation) applied to a different target