Unified calling conventions enable bidirectional refactoring between neural and symbolic

Type: note · Status: current · Tags: learning-theory, computational-model

The underspecified instructions framing says components should move between underspecified (LLM-interpreted) and precise (code) semantics as systems evolve: stable patterns get constrained into code, and overly rigid code gets relaxed back to an LLM. But the framing doesn't say how to make the boundary movable in practice. The answer is a unified calling convention: if neural and symbolic components present the same interface, callers don't need to know which they're talking to, and refactoring across the boundary becomes a local operation.

The mechanism

llm-do implements this through a hybrid VM where agents (.agent files, LLM-backed) and tools (Python functions) share a single namespace. The LLM sees both as callable functions. A call to ticket_classifier might dispatch to an agent today and a Python function tomorrow — the prompt that invokes it doesn't change.

This requires name-based dispatch: components are identified by string name rather than direct object reference. Names enable dynamic resolution (the LLM outputs a string, the runtime looks it up), late binding (the callee needn't exist when the caller is defined), and implementation-agnostic interfaces (the same name can resolve to either neural or symbolic).
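A minimal sketch of name-based dispatch, assuming a plain dictionary as the namespace. The names `Registry`, `call`, `triage`, `fake_llm_agent`, and `rule_based_classifier` are hypothetical and are not llm-do's API; the "agent" is a stand-in function rather than a real model call.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: maps string names to callables, neural or symbolic.
Registry = Dict[str, Callable[..., Any]]


def fake_llm_agent(text: str) -> str:
    """Stand-in for an LLM-backed agent; a real one would call a model."""
    return "billing" if "invoice" in text.lower() else "general"


def rule_based_classifier(text: str) -> str:
    """Symbolic replacement with the same signature."""
    keywords = ("invoice", "refund", "charge")
    return "billing" if any(k in text.lower() for k in keywords) else "general"


def call(registry: Registry, name: str, **kwargs: Any) -> Any:
    """Resolve the name at call time (late binding), then invoke it."""
    try:
        component = registry[name]
    except KeyError:
        raise LookupError(f"no component registered as {name!r}") from None
    return component(**kwargs)


registry: Registry = {}


def triage(ticket: str) -> str:
    """Caller knows only the string name; the callee need not exist yet."""
    return call(registry, "ticket_classifier", text=ticket)


# Today the name resolves to the (stand-in) agent ...
registry["ticket_classifier"] = fake_llm_agent
before = triage("Where is my invoice?")

# ... tomorrow to plain Python. triage() is untouched.
registry["ticket_classifier"] = rule_based_classifier
after = triage("Where is my invoice?")
```

Because `triage` is defined before anything is registered, the sketch also shows late binding: the lookup happens on each call, not at definition time.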

Agent ──calls──▶ Tool ──calls──▶ Agent ──calls──▶ Tool ...
neural          symbolic         neural          symbolic

The calling convention is uniform across the chain. Each link can be independently refactored without disturbing the rest.

Why this matters for constraining

Constraining and codification describe the learning mechanisms — narrowing the interpretation space, changing medium. But without a unified interface, each codification step is a breaking change: call sites must be updated, prompt structure must change, the agent's view of available operations shifts. This friction discourages incremental refactoring and pushes toward big-bang rewrites.

With unified calling, the progression is smooth:

1. Start neural: define an agent to handle a task. Quick to add, handles ambiguity.
2. Observe patterns: the agent consistently lowercases filenames and replaces spaces with underscores. This is spec-mining: discovering that an underspecified spec consistently resolves to one interpretation, then committing to it in code.
3. Codify: extract sanitize_filename() to Python. The agent still handles ambiguous cases. The call site doesn't change.
4. Extend via relaxing: new requirements emerge (handle Unicode, detect dates). Add an LLM call for the new cases. Again, the call site doesn't change.

Each step is local. The system evolves without coordination cost.
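The progression above can be sketched as three interchangeable implementations behind one name. This is an illustration under stated assumptions: `fake_llm`, the `components` dictionary, `save_report`, and the `_v1`/`_v2`/`_v3` suffixes are hypothetical, and the canned model answer stands in for a real LLM call.

```python
import re
from typing import Callable, Dict

# Hypothetical namespace shared by neural and symbolic components.
components: Dict[str, Callable[[str], str]] = {}


def fake_llm(prompt: str) -> str:
    """Stand-in for a model call (assumption); returns a canned answer."""
    return "2024-03-01_report.pdf"


def sanitize_filename_v1(name: str) -> str:
    """Start neural: the model interprets an underspecified instruction."""
    return fake_llm(f"Make this filename safe: {name}")


def sanitize_filename_v2(name: str) -> str:
    """Codify: the observed pattern (lowercase, spaces to underscores)."""
    return re.sub(r"\s+", "_", name.strip().lower())


def sanitize_filename_v3(name: str) -> str:
    """Extend via relaxing: ASCII stays in code, the rest goes to the model."""
    if name.isascii():
        return sanitize_filename_v2(name)
    return fake_llm(f"Make this filename safe: {name}")


def save_report(title: str) -> str:
    """The call site: identical no matter which version is registered."""
    return components["sanitize_filename"](title)


components["sanitize_filename"] = sanitize_filename_v2
codified = save_report("Quarterly Report.PDF")  # "quarterly_report.pdf"

components["sanitize_filename"] = sanitize_filename_v3
relaxed = save_report("Bericht März 2024.pdf")  # falls through to fake_llm
```

Swapping the registered implementation is the whole refactoring; `save_report` never changes.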

The scheduler layer

On top of the hybrid VM, llm-do adds an imperative scheduler — Python code that owns control flow rather than a graph DSL. This is a deliberate contrast with declarative agent frameworks (LangGraph, CrewAI) where orchestration is defined as node-edge graphs.

| Aspect | Graph DSLs | llm-do Scheduler |
| --- | --- | --- |
| Orchestration | Declarative: Node A → Node B | Imperative: Agent A calls Agent B as a function |
| State | Global context through graph | Local scope: each agent gets only its arguments |
| Refactoring | Redraw edges, update graph | Change code: extract functions, inline agents |
| Control flow | DSL constructs | Native Python: `if`, `for`, `try/except` |

The imperative style means refactoring between neural and symbolic uses the same patterns as normal code refactoring — extract function, inline, rename. No graph topology to update.
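As a rough illustration of the imperative style, the sketch below orchestrates two components with ordinary Python control flow. It is not llm-do's scheduler API: `components`, `call`, `handle_tickets`, and the `draft_reply` component are hypothetical, and the lambdas stand in for components that could be either agents or plain functions.

```python
from typing import Any, Callable, Dict, List

# Stand-ins (assumptions): each entry could be an agent or plain Python.
components: Dict[str, Callable[..., Any]] = {
    "ticket_classifier": lambda text: "billing" if "invoice" in text else "general",
    "draft_reply": lambda text, category: f"[{category}] Thanks for writing in.",
}


def call(name: str, **kwargs: Any) -> Any:
    """Each component receives only the arguments passed here (local scope)."""
    return components[name](**kwargs)


def handle_tickets(tickets: List[str]) -> List[str]:
    """Orchestration is ordinary Python: a loop, a branch, error handling."""
    replies = []
    for ticket in tickets:
        try:
            category = call("ticket_classifier", text=ticket)
        except Exception:
            # Fall back to a safe default if the component fails.
            category = "general"
        if category == "billing":
            replies.append(call("draft_reply", text=ticket, category=category))
        else:
            replies.append("Routed to general queue.")
    return replies


replies = handle_tickets(["invoice missing", "hello"])
```

Extracting the loop body into a helper, or inlining `draft_reply`, would be an ordinary code refactoring with no graph to redraw.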

The connection to typed callables

The note "Instructions are typed callables" argues that prompts, skills, and tools share a callable structure with typed inputs and outputs. llm-do operationalises this: .agent files are YAML frontmatter (type signature) plus system prompt (implementation), and tools are Python functions with type annotations. Both are callables with defined interfaces. The unified calling convention is what makes the type-theoretic view practical rather than just analogical.
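One way to picture the shared structure is to extract the same signature from both kinds of component. The sketch below is an assumption-laden illustration: `AgentSpec` is a hypothetical in-memory stand-in, not the actual .agent file format, and `signature_of` is an invented helper.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class AgentSpec:
    """Hypothetical in-memory form of an agent definition."""
    name: str
    params: Dict[str, type]  # plays the role of the frontmatter signature
    returns: type
    system_prompt: str       # plays the role of the implementation


def signature_of(component: Any) -> Tuple[Dict[str, Any], Any]:
    """Return (parameter types, return type) for either kind of component."""
    if isinstance(component, AgentSpec):
        return component.params, component.returns
    sig = inspect.signature(component)
    params = {n: p.annotation for n, p in sig.parameters.items()}
    return params, sig.return_annotation


agent = AgentSpec(
    name="ticket_classifier",
    params={"text": str},
    returns=str,
    system_prompt="Classify the support ticket as 'billing' or 'general'.",
)


def ticket_classifier(text: str) -> str:
    """Symbolic component with the same typed interface as the agent."""
    return "billing" if "invoice" in text.lower() else "general"


# Both components expose the same interface, so either can back the name.
same_interface = signature_of(agent) == signature_of(ticket_classifier)
```

A runtime could use a check like this before rebinding a name, so that swapping a neural component for a symbolic one cannot silently change the interface callers rely on.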

Open Questions

Does unified calling break down at scale, when the namespace grows to hundreds of components and name collisions become likely?
How does debugging work when a call chain crosses the neural-symbolic boundary multiple times — do existing observability tools handle this, or does the hybrid VM need its own tracing?
Is the imperative scheduler pattern specific to Python, or does it transfer to other host languages?

Relevant Notes:

agentic-systems-interpret-underspecified-instructions — foundation: the underspecified instructions framing that this note makes architecturally concrete
constraining — the mechanism that unified calling makes frictionless
codification — the phase transition from neural to symbolic that unified calling makes a local operation
spec-mining-as-codification — the operational mechanism: observe agent behavior, extract to code — enabled by stable call sites
instructions-are-typed-callables — the type-theoretic view that llm-do operationalises
programming-practices-apply-to-prompting — extends: extract-function and inline refactoring transfer directly when calling conventions are unified
operational-signals-that-a-component-is-a-relaxing-candidate — extends: relaxing signals answer "when should you relax?" — unified calling makes the relaxing refactoring cheap once the signal fires
