The bitter lesson boundary is a gradient, not a binary
Type: structured-claim · Status: current
The bitter lesson boundary draws a line between calculators (spec is the problem) and vision features (spec is a theory about the problem). But real systems have components spread across a spectrum of oracle strength — how cheaply and reliably you can check whether output is correct.
Evidence
The spectrum
- Hard oracle: exact, cheap, deterministic check. Unit tests, type checks, cryptographic verification. The calculator regime.
- Soft oracle: proxy score that correlates but isn't the real thing. BLEU, helpfulness rubrics, heuristic checks, consistency scores.
- Interactive oracle: you can ask for feedback. User edits, thumbs up/down, preference pairs.
- Delayed oracle: you only know later. Did the user churn? Did the bug surface? Did the decision pay off?
- No oracle: vibes and anecdotes.
The bitter lesson is strongest when you have a decent training signal — hard or soft oracle. It's weakest at the no-oracle end, where there's nothing for scale to optimise against. This maps directly to the Karpathy verifiability framing that deploy-time learning builds on: a task is verifiable to the extent it is resettable, efficient to retry, and rewardable — three properties that strengthen as oracle strength increases.
Reasoning
The engineering move: manufacturing guidance
The interesting reframe: the core engineering challenge isn't "crystallise or soften?" but "convert no-oracle into some-oracle, then harden the oracle." That's crystallisation applied to the objective itself, not just to the implementation.
Examples of oracle hardening: - Logging user corrections turns no-oracle into interactive oracle - Adding schema validation turns soft-oracle ("does this look right?") into hard-oracle ("does this parse?") - Building regression test suites turns delayed-oracle into hard-oracle for known cases
This suggests a priority order: invest in telemetry and eval harnesses before investing in capability, because guidance is the bottleneck, not compute. The generator/verifier pattern is only viable when verification is cheap relative to generation — which is another way of saying oracle strength must be sufficient. A high-variance generator plus quality gate outperforms a constrained generator, but only in the hard-to-soft oracle range where the gate can actually discriminate.
Connections resolved
Spec mining is the systematic method for moving components toward the hard-oracle end: it extracts regularities from observed behavior into deterministic checks, converting soft/delayed oracles into hard ones. Meanwhile, softening signals provide testable indicators for where a component sits on this spectrum — brittleness under paraphrase, isolation-vs-integration gaps, and process-heavy constraints all suggest the oracle is softer than it appears.
Caveats
- Oracle strength is itself hard to assess. Proxy scores that seem cheap and reliable may turn out to correlate poorly with the real objective — you don't always know whether your oracle is hard or soft until you test at scale.
- Open question: does oracle strength predict bitter-lessoning? If so, the spectrum is prescriptive — invest in crystallisation where oracles are hard, invest in learned approaches where oracles are soft. But this remains conjecture.
- Open question: oracle strength and crystallisation timescales. Hard oracles crystallise fast (you can test immediately); delayed oracles crystallise slowly (you have to wait for signal). The connection to crystallisation timescales seems natural but hasn't been tested.
Relevant Notes:
- bitter-lesson-boundary — foundation: the binary distinction this note refines into a gradient
- deploy-time-learning — the verifiability gradient maps to oracle strength: harder oracles enable tighter iteration loops
- deploy-time-learning — the Karpathy verifiability framing (resettable, efficient, rewardable) is an oracle-strength argument
- spec-mining-as-crystallisation — the operational mechanism for oracle hardening: extracting deterministic rules from observed behavior
- softening-signals — provides testable indicators for where a component sits on the oracle spectrum
- storing-llm-outputs-is-stabilization — the generator/verifier pattern depends on oracle strength: verification must be cheap for the pattern to work
- quality-signals-for-kb-evaluation — concrete oracle-hardening instance: manufacturing a composite soft oracle from many no-oracle/weak-oracle signals (graph topology, content proxies, LLM judgment) to drive a KB learning loop
Topics: