An outcome check licenses replay; a rule needs the process verified

Type: kb/types/note.md · Status: seedling · Tags: learning-theory, agent-memory

The verify rung of trace-derived memory (the checked-episode stage before a lesson is distilled into a reusable rule) needs "an oracle that can discriminate a correct diagnosis from a plausible-but-wrong one," but that requirement leaves open what the oracle checks. There are two choices, and they are different oracles. An outcome check asks whether the final answer came out right. A process check asks whether the intermediate steps held, that is, whether the answer was reached for a reason that survives inspection. They pass different things, and the difference decides which rung an artifact can climb to.

Outcome checks admit the right answer for the wrong reason

An outcome oracle has a characteristic false positive: the right answer reached by a wrong or coincidental route, such as a lucky guess, a spurious shortcut, or two errors that cancel. The check fires "correct," but the route that produced the answer goes unexamined. A process check spends its discrimination on exactly that route, so it catches failures an outcome pass waves through. The trace-derived systems survey records a case of this: OpenClaw-RL, a source-covered policy-learning backend for agent API traffic, combines scoring over assistant output plus next-state evidence with reward-style training samples, so that process-like evidence can reject trajectories that a final-state outcome signal might otherwise reinforce.
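The "two errors that cancel" case can be made concrete with a toy trace. This is a minimal sketch, assuming a hypothetical trace format in which each step records an operation, its operands, and the value the agent claimed; `Step`, `outcome_check`, and `process_check` are illustrative names, not taken from any surveyed system.

```python
from dataclasses import dataclass
import operator

# Hypothetical trace format: one arithmetic step with the value the agent claimed.
@dataclass(frozen=True)
class Step:
    op: str
    left: int
    right: int
    claimed: int

OPS = {"+": operator.add, "*": operator.mul}

def outcome_check(trace, expected):
    """Inspect only the final claimed value."""
    return trace[-1].claimed == expected

def process_check(trace):
    """Recompute every step and compare it with what the agent claimed."""
    return all(OPS[s.op](s.left, s.right) == s.claimed for s in trace)

# Task: evaluate 3 + 4 * 2, whose correct value is 11.
# The lucky trace makes two mistakes that happen to cancel.
lucky = [Step("*", 4, 2, 9), Step("+", 3, 9, 11)]
sound = [Step("*", 4, 2, 8), Step("+", 3, 8, 11)]

lucky_verdict = (outcome_check(lucky, 11), process_check(lucky))
sound_verdict = (outcome_check(sound, 11), process_check(sound))
```

Both traces pass the outcome check, but only the sound one passes the process check. A fuller checker would also confirm that each step consumes the previous step's result; the sketch omits that for brevity.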

What each oracle licenses

The two checks license different uses of the verified episode, and the uses correspond to the two rungs above verify:

  • A success that passed only an outcome check is safe to replay verbatim in the same context. The claim is just "this produced the right result here," and replay re-runs here — claim and use match. This is the success preserved as a concrete, replayable demonstration rather than a rule.
  • Distilling a rule transfers the mechanism to new contexts: a rule asserts "do X because Y." An outcome check never inspected Y, so it cannot tell a real mechanism from a coincidence — and the coincidence is precisely the part that fails to transfer. Only a process check verifies the why, and the why is what carries past the original case.
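The difference between replaying a success and generalizing it can be illustrated with a well-known arithmetic coincidence. The sketch below assumes a hypothetical "lesson" an agent might extract from one outcome-checked episode: simplify a two-digit fraction by striking the digit shared between numerator and denominator. The function names are illustrative only.

```python
from fractions import Fraction

def cancel_shared_digit(num: int, den: int) -> Fraction:
    """Unsound shortcut: strike the numerator's units digit against the
    denominator's tens digit (for example, 16/64 becomes 1/4)."""
    n_tens, n_units = divmod(num, 10)
    d_tens, d_units = divmod(den, 10)
    if n_units != d_tens:
        raise ValueError("no shared digit to strike")
    return Fraction(n_tens, d_units)

def outcome_matches(num: int, den: int) -> bool:
    """Outcome check: does the shortcut's answer equal the true value?"""
    return cancel_shared_digit(num, den) == Fraction(num, den)

# Replay in the original context: the outcome claim still holds.
replay_ok = outcome_matches(16, 64)

# Transfer as a rule to a new context: the coincidence does not carry over.
transfer_ok = outcome_matches(12, 24)
```

Replaying on 16/64 reproduces the correct result, which is all the outcome check ever claimed. Applying the same move to 12/24 yields 1/4 instead of 1/2, because the mechanism was never valid; a process check on the original episode would have rejected the digit-striking step before it became a rule.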

So the verify and distill rungs do not merely "have different oracles" in the abstract; the distill rung specifically requires a process oracle. An outcome oracle, however hard, can license the fail→verify climb for replay but cannot license verify→distill. Generalizing a rule off an outcome check is over-generalization at its root: it stamps rule authority on a correlation whose mechanism was never checked.

This is a what-you-check axis, orthogonal to oracle hardness

The oracle-strength spectrum grades oracles by how cheaply and reliably they check, hard to soft. Process-versus-outcome is a different axis: what the check inspects. A hard outcome oracle (a passing end-to-end test) and a hard process oracle (a checker over the steps) can both be cheap and deterministic yet license different things. Hardening along the strength axis does not convert an outcome check into a process check. You have to inspect the steps. That inspection costs more, and the checker may still be soft: a process reward model — a model that scores intermediate steps — needs its accuracy measured for the target modality before it becomes a strong oracle. Process verification buys the right kind of discrimination, not necessarily a strong amount of it.
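One way to keep the two axes separate is to record them as independent fields. The following is a sketch under stated assumptions: the `Oracle` record, its enum values, and `right_kind_for_distill` are hypothetical names introduced for illustration, and the function encodes only the necessary condition argued for in this note.

```python
from dataclasses import dataclass
from enum import Enum

class Inspects(Enum):      # what the check looks at
    OUTCOME = "outcome"
    PROCESS = "process"

class Strength(Enum):      # how cheaply and reliably it checks
    HARD = "hard"
    SOFT = "soft"

@dataclass(frozen=True)
class Oracle:
    name: str
    inspects: Inspects
    strength: Strength

def right_kind_for_distill(oracle: Oracle) -> bool:
    """Necessary (not sufficient) condition for licensing a rule:
    the oracle must inspect the process. Strength plays no part here."""
    return oracle.inspects is Inspects.PROCESS

e2e_test = Oracle("end-to-end test", Inspects.OUTCOME, Strength.HARD)
step_checker = Oracle("checker over the steps", Inspects.PROCESS, Strength.HARD)
unmeasured_prm = Oracle("process reward model, accuracy unmeasured",
                        Inspects.PROCESS, Strength.SOFT)

kinds = {o.name: right_kind_for_distill(o)
         for o in (e2e_test, step_checker, unmeasured_prm)}
```

The hard end-to-end test fails the kind test despite its strength, while the unmeasured process reward model passes it despite being soft. How much to trust a passing verdict is a separate question answered by the `strength` field.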

Scope

Process verification makes a lesson's basis articulable; it does not by itself produce a statable boundary. Verifying the mechanism is necessary for a trustworthy rule, not sufficient: you still have to say where the rule stops. Where the task is an exact spec (the outcome is the thing wanted, with no hidden mechanism to generalize), the gap narrows: right-answer-wrong-reason still blocks generalization but not replay. The claim bites wherever a rule will be applied outside the context that produced it, which is the whole point of distilling one.


Relevant Notes: