Execution indeterminism is a property of the sampling process
Type: note · Status: seedling · Tags: llm-interpretation-errors
LLMs sample from probability distributions over tokens. The same prompt can produce different outputs across runs. This is a property of the execution engine — conceptually simpler than underspecification, and theoretically eliminable via deterministic decoding (temperature=0).
In practice, true determinism is hard to guarantee (floating-point non-determinism, batching effects, infrastructure changes) and may not even be desirable — temperature > 0 helps explore reasoning paths, enables self-consistency techniques, and avoids degenerate repetitive outputs. In practice, every deployed system exhibits some indeterminism.
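The mechanics can be made concrete with a toy sampler. This is a minimal sketch, not any particular model's decoder: `sample_token` is a hypothetical helper that applies temperature-scaled softmax to a made-up three-token logit vector. At temperature 0 it reduces to argmax, so every run returns the same token; at temperature > 0 it samples, so repeated runs can differ.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index from raw logits.
    temperature == 0 means greedy (deterministic) decoding."""
    if temperature == 0:
        # Always the highest-logit token: no run-to-run variation.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature-scaled softmax, then sample from the distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.5, 0.2]  # toy distribution over three tokens
rng = random.Random(0)

greedy = {sample_token(logits, 0, rng) for _ in range(10)}
varied = {sample_token(logits, 0.8, rng) for _ in range(10)}
print(greedy)  # always {0}: greedy decoding collapses the variation
print(varied)  # usually several indices: sampling restores it
```

Note that the greedy branch still makes a choice — index 0 here — which is the point of the next section: determinism fixes *which run you get*, not *whether the chosen output is the right one*.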
Why this matters as a distinct claim
Indeterminism is engineering noise — variation in how a chosen interpretation is executed, not variation in which interpretation is chosen. At temperature=0, the LLM still picks one interpretation from the space the spec admits; you just get the same one every time. This is why lowering temperature alone doesn't solve the "wrong interpretation" problem — it eliminates variation without ensuring the remaining interpretation is the one you wanted.
Counterintuitively, indeterminism obscures the deeper issue of underspecification. Because outputs vary across runs, people attribute the variation to randomness — "it's stochastic" — and reach for familiar tools: temperature tuning, retries, sampling strategies. This framing avoids confronting the real difference from traditional programming: that the specification language doesn't have precise semantics.
The remedy is sampling control: temperature adjustment, deterministic decoding, best-of-N selection. These address run-to-run variation but leave both underspecification and interpretation error untouched.
Relevant Notes:
- agentic-systems-interpret-underspecified-instructions — elaborates: the full framework including how indeterminism and underspecification layer on each other; covers the deeper phenomenon (underspecification as a property of the specification language) that indeterminism obscures
- interpretation errors are failures of the interpreter not the spec — sibling: the third phenomenon, also unaddressed by sampling control
- LLM interpretation errors — parent area: the three-phenomena taxonomy this note is part of
Sources:
- Ma et al. (2026). Prompt Stability in Code LLMs — cleanest empirical separation of indeterminism from underspecification: by varying prompt framing (emotion/personality) while holding task constant, they isolate the effect of interpretation choice from run-to-run sampling noise