Inspectable substrate, not supervision, defeats the blackbox problem

Type: note · Status: current · Tags: learning-theory, observability

The claim from ML

Chollet observes that sufficiently advanced agentic coding is essentially machine learning: an optimization process (coding agents) iterates against a goal (spec + tests) until convergence, producing a blackbox artifact (the generated codebase) that is "deployed without ever inspecting its internal logic, just as we ignore individual weights in a neural network."

He predicts classic ML failure modes will follow: overfitting to the spec, Clever Hans shortcuts that don't generalize, data leakage, concept drift. And asks: what will be the Keras of agentic coding — the optimal high-level abstractions for steering this process?

Where the framing breaks

The blackbox analogy holds only if the output substrate is opaque. Neural network weights are opaque to any inspector — human or LLM. But repo artifacts (prompts, schemas, evals, deterministic code) are inherently inspectable. They can be diffed, tested, reverted, and reviewed — by humans or by LLMs. The substrate is what matters, not who reviews it.

An LLM can review a diff and catch a Clever Hans shortcut in generated code. It can run evals and detect overfitting to the test suite. It can compare a codified function against its specification and flag edge cases. None of this is possible with weight updates — not because LLMs lack judgment, but because weights lack structure.

The failure mode mapping

Chollet's predicted ML problems map directly to codification failure modes — but with mitigations that weight-based systems can't match:

ML failure mode Codification equivalent Mitigation available
Overfitting to spec Goodharting on evals Broader eval sets, LLM-as-judge on unseen cases
Clever Hans shortcuts Bad assumptions codified confidently Diff review (human or LLM), property-based tests
Concept drift Model drift breaking codified prompts Regression evals, CI gates
Data leakage Test/train contamination in eval suites Held-out eval sets, adversarial test generation

Every mitigation relies on the same property: the artifact is inspectable. You can write a test for a function. You can't write a test for a weight.

The real question

Chollet asks "what will be the Keras of agentic coding?" — the abstraction layer that lets humans steer codebase "training" with minimal cognitive overhead. The verifiability gradient is a candidate answer: it tells you which grade of codification to use for each piece of your system, based on how verifiable you need it to be. The constrain/relax cycle is the steering mechanism — codify when patterns emerge, relax when new requirements appear. And crucially, neither the gradient nor the cycle requires a human in the loop. They require an inspectable substrate.


Relevant Notes: