Eric Evans: AI Components for a Deterministic System
Source: https://www.domainlanguage.com/articles/ai-components-deterministic-system/
This note analyzes Eric Evans' article on integrating LLM-based components into deterministic software systems and explores how its framework validates and extends llm-do's design.
Article Overview
Evans identifies a fundamental tension: LLMs produce non-deterministic outputs that resist integration into structured, conventional software. Using domain classification in code repositories as an example, he proposes separating concerns to manage this tension.
Core Principles
- Separate Modeling from Classification
  - Modeling: creating categorization schemes (exploratory, creative)
  - Classification: assigning categories within a scheme (repeatable, deterministic)
  - Treat these as fundamentally different tasks
- Create Canonical Categories First
  - Freeze a taxonomy before classification begins
  - Ensures comparable results across invocations
- Leverage Established Standards
  - Use published classification systems (NAICS, ISO, etc.) for generic domains
  - "Published languages have great advantages! They are worth looking for."
- Human-Driven Modeling for Core Domains
  - For custom categorization: "have humans drive the modeling in an exploratory, iterative process"
  - LLMs excel at classification within human-designed frameworks
Alignment with llm-do
Evans' framework maps directly to llm-do's core philosophy:
| Evans' Concept | llm-do Equivalent |
|---|---|
| Modeling (exploratory) | LLM workers exploring options |
| Classification (repeatable) | Extracted Python tools |
| Frozen taxonomy | Schemas (input_model_ref, output_model_ref) |
| "Stabilize the categories" | "Extend with LLMs, stabilize with code" |
The unified calling convention in llm-do means transitioning from modeling to classification is local—callers don't change when a worker becomes a tool.
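A minimal sketch of what that locality means in practice. The `registry`, the call shape, and both categorizers are assumptions for illustration, not llm-do's actual API: callers invoke by name, so swapping an LLM-backed worker for an extracted Python tool changes nothing on the caller's side.

```python
def llm_categorize(text: str) -> str:
    """Stand-in for an LLM-backed worker call."""
    return "billing"

def rule_categorize(text: str) -> str:
    """Deterministic tool extracted once the taxonomy is frozen."""
    return "billing" if "invoice" in text else "other"

# Callers resolve implementations by name through a registry.
registry = {"categorize": llm_categorize}

def call(name: str, text: str) -> str:
    return registry[name](text)

# Stabilization: swap the implementation; call sites are untouched.
registry["categorize"] = rule_categorize
```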
Schema-Driven Design
llm-do already supports Evans' "canonical categories first" via:
- input_model_ref / output_model_ref in worker frontmatter
- Pydantic models as frozen contracts
- Validation at trust boundaries
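A stdlib-only sketch of the "frozen contract" idea (the note's actual mechanism is a Pydantic model referenced from worker frontmatter; the taxonomy and field names here are invented for illustration). The enum is the frozen taxonomy; validation at the trust boundary rejects any category outside it.

```python
from dataclasses import dataclass
from enum import Enum

class RepoDomain(str, Enum):
    # Frozen taxonomy: fixed before classification begins
    BILLING = "billing"
    AUTH = "auth"
    INFRA = "infra"

@dataclass(frozen=True)
class ClassificationOutput:
    path: str
    domain: RepoDomain

def validate_output(raw: dict) -> ClassificationOutput:
    # Trust boundary: a category outside the frozen enum raises ValueError
    return ClassificationOutput(path=raw["path"], domain=RepoDomain(raw["domain"]))
```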
Potential Extensions
1. Leverage Established Standards for Generic Subdomains
When workers deal with generic subdomains, reference established taxonomies rather than letting LLMs invent categories:
| Domain | Standard to Consider |
|---|---|
| Business sectors | NAICS codes |
| Document types | ISO standards |
| Licenses | SPDX identifiers |
| Commit messages | Conventional Commits |
| Error categories | HTTP status codes, syslog severity |
Implementation: Document this as a pattern; add examples showing workers that use external taxonomies.
2. Judge Model Pattern for Taxonomy Selection
Evans describes iterative refinement with a "judge" model:
1. Sampling worker: generates N candidate categorization schemes
2. Judge worker: evaluates candidates against criteria (coverage, overlap, specificity)
3. Output: frozen schema for downstream classification workers
This could be:
- A documented meta-pattern
- A reusable worker template
- An example in examples/taxonomy-generation/
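The three-step loop above can be sketched as one function. Both callables are stand-ins for llm-do workers (the real sampling and judge workers would be LLM-backed); the scoring criteria and signature are assumptions.

```python
from typing import Callable

def select_taxonomy(
    sample: Callable[[], list[str]],      # sampling worker: one candidate scheme
    judge: Callable[[list[str]], float],  # judge worker: score coverage/overlap/specificity
    n_candidates: int = 5,
) -> list[str]:
    """Generate N candidate categorization schemes, score each with the
    judge, and return the best one to freeze for downstream classifiers."""
    candidates = [sample() for _ in range(n_candidates)]
    return max(candidates, key=judge)
```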
3. Explicit Modeling vs Classification Phase Markers
Consider a worker config flag:
```yaml
---
name: categorize_files
phase: classification  # vs "modeling" for exploratory work
---
```
This could:
- Enable stricter validation (same input → same output expected)
- Trigger warnings if outputs vary too much across runs
- Guide approval policies (classification = lower risk, more automatable)
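A sketch of the stricter validation a `classification` phase marker could enable: re-run the worker on the same inputs and flag any that produce varying outputs. `run_worker` and the check itself are hypothetical, not existing llm-do machinery.

```python
from collections import Counter

def repeatability_check(run_worker, inputs, runs: int = 3) -> list:
    """Re-run a classification-phase worker on each input and collect
    the inputs whose output varies (same input -> same output expected)."""
    unstable = []
    for item in inputs:
        outputs = [run_worker(item) for _ in range(runs)]
        if len(set(outputs)) > 1:
            unstable.append((item, Counter(outputs)))
    return unstable
```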
4. Two-Phase Workflow Documentation
Document the pattern explicitly:
Phase 1: Modeling (human-in-loop)
- Workers generate candidate schemas
- Human reviews, refines, selects
- Output: frozen Pydantic model or enum

Phase 2: Classification (automated)
- Workers use frozen schema
- Repeatable, testable
- Progressive stabilization candidate
Implications for Progressive Stabilization
Evans reinforces the stabilization workflow with clearer triggers:
Signals to stabilize (worker → tool):
- Classification task with frozen taxonomy
- Consistent output structure across runs
- High repeatability requirement

Signals to keep underspecified (LLM-interpreted):
- Exploratory modeling phase
- Evolving requirements
- Edge cases requiring judgment
Semantic Boundaries
Evans' modeling/classification distinction maps to llm-do's semantic boundaries — the crossings between underspecified (LLM-interpreted) and precise (deterministic code) semantics. "Freeze a taxonomy before classification" is a specific instance of the broader pattern that storing LLM outputs is stabilization — resolving semantic underspecification to a fixed interpretation, then working deterministically with the result.
| Type | Semantics | Testing Approach |
|---|---|---|
| Tools (classification) | Precise — same input, same output | assert result == expected |
| Workers (modeling) | Underspecified — spec admits multiple valid interpretations | Sample and check invariants |
Schema validation sits at the trust boundary between these. The two testing approaches map to the two distinct testing targets for stabilized artifacts: testing the interpretation space (does the prompt reliably produce good output?) vs testing a specific interpretation (is this specific output good?).
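The two testing styles in the table can be contrasted in a few lines. `classify` is a toy deterministic tool; `sample_worker` is a stochastic stub standing in for an LLM worker (the categories and functions are invented for illustration): the tool gets exact-equality asserts, the worker gets sampled and checked against an invariant.

```python
import random

def classify(code: str) -> str:
    # Tool: precise semantics, same input -> same output
    return "test" if "assert" in code else "source"

def sample_worker(prompt: str, rng: random.Random) -> str:
    # Worker stub: any member of the frozen set is a valid interpretation
    return rng.choice(["feat", "fix"])

def worker_invariant_holds(n: int = 20) -> bool:
    """Sample the worker N times and check the invariant (output stays
    inside the frozen taxonomy), rather than asserting one fixed string."""
    rng = random.Random(0)
    return all(sample_worker("categorize this change", rng) in {"feat", "fix"}
               for _ in range(n))
```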
Summary
Evans' article validates llm-do's core approach ("extend with LLMs, stabilize with code") while suggesting we could be more explicit about:
- The modeling/classification boundary
- When to use established standards
- How to generate and freeze taxonomies
- Phase markers in worker configuration
The key insight: LLMs are excellent classifiers but unreliable modelers. Design systems that leverage this asymmetry.

The bitter lesson provides the counter-argument: general-purpose methods scaling with computation have historically outperformed hand-crafted domain knowledge, which suggests that freezing human-designed taxonomies may be a temporary engineering expedient rather than a durable design principle. Whether Evans' approach survives as a wise division of labor or gets dissolved by scaling model capability is an open question.
References
- Article: https://www.domainlanguage.com/articles/ai-components-deterministic-system/
- Evans' DDD work: https://www.domainlanguage.com/
- agentic systems interpret underspecified instructions — underspecified vs precise semantics, interpretation narrowing, stabilise/soften
- Related: adaptation-agentic-ai-analysis.md