Structured-output codec for the review protocol

Type: ../types/design-proposal.md · Tags: kb-maintenance

The review protocol's output side is one encoding: sentinel-delimited markdown blocks parsed by a hand-written state machine, with the outcome recovered from a strict final result footer. Harnesses are starting to offer schema-validated structured output for sub-agent calls — Claude Code's dynamic workflows agent(prompt, {schema}) forces a validated structured-output tool call and returns a parsed object, with validation retried at the tool-call layer. If that capability generalizes across harnesses, the remaining free-text failure classes disappear at the source. This proposal holds the design for making the output encoding a codec choice rather than a protocol property.

Current state (as of 2026-07-01)

One encoding: === PAIR REVIEW START: {note} :: {criterion} === blocks (ADR 029). protocol/parser.py extracts blocks into ParsedJobOutput; protocol/outcomes.py requires exactly one final marker allowed by the pair's persisted result kind: PASS|WARN|FAIL for verdict pairs or REPORT for report pairs; ERROR fails the job without producing a pair result.
The codec is contained in protocol/ plus finalization/artifact rendering.
Consumers downstream of parsing are mostly encoding-independent already: review_pairs.outcome stores the parsed enum, a derived result artifact path points to the retained markdown review body, and warn_selector extracts findings from a ### Findings section convention in that artifact.
External executors receive the contract through rendered prompts created from selector JSON (commonplace-create-review-jobs --input ... --grouping {note,criterion}), write job-owned output files, and return control to finalization.
Trigger not yet met: schema-validated sub-agent output ships in one harness's workflow scripts; the subprocess CLIs (claude -p, codex exec) and the live-agent file-artifact path have no equivalent surface the review system could consume today.

The design

A codec is the pair (render the output contract into the prompt, decode raw output into ParsedJobOutput). Two codecs:

markdown-sentinel (today): contract rendered as sentinel instructions plus a block template; decoder is the existing parser plus outcome protocol.
structured (new): contract expressed as a JSON schema — per pair: note path, gate id, summary, findings (severity + text), optional suggested revision, outcome as an enum. The decoder validates and maps to ParsedJobOutput; the outcome parser chain is bypassed entirely because the outcome arrives as a constrained field. The per-pair markdown result file can be rendered from the structured fields, so warning extraction and human review still use the derived result artifact boundary.

ParsedJobOutput is already the codec-independent boundary type. Missing pairs and structural errors both translate to failed live finalization under the all-or-nothing policy; a structured decoder should still distinguish them so callers can report the cause.

Free choices

Where codec selection lives. A flag on prepare/ingest commands chosen by the orchestrator (it knows whether its harness supports schemas), a per-runner-adapter property, or a project-level default. The orchestrator-chooses option matches the medium-pluggability direction carried from ADR 030 through ADR 035: the parent agent or harness owns dispatch and fan-out, Commonplace owns state and parsing.
Schema shape for findings. Mirror the current markdown sections one-to-one (summary/findings/revision) or take the opportunity to constrain severity to an enum and findings to a list — more validation power, but historical rationale text and the new format diverge in expressiveness.
Canonical retained form. Keep the markdown result file as the single retained review-body representation (structured output rendered to markdown on ingest), or add a new structured artifact alongside it. The first keeps one read path; the second preserves machine-readable findings for the gate-statistics ambitions in gate learning from accepted edits.
Whether the markdown codec ever retires. Free-text markdown is the lowest common denominator every harness supports; retiring it would couple the review system to schema-capable harnesses.

Adoption criteria

Adopt when a harness medium the project actually uses for reviews exposes schema-validated output at a surface the system can consume (a workflow orchestrating review batches, or a subprocess CLI flag). Adopting earlier buys nothing: the markdown codec must stay regardless, and the structured decoder cannot be integration-tested without the capability.

Risks

Two codecs mean two prompt contracts to keep semantically aligned; the codec interface must own both sides (render + decode) so a contract change cannot touch one encoding only.
Schema validation failures look terminal but may be retryable at the harness layer; the decoder should distinguish "harness delivered invalid object" from "model omitted a pair" for diagnostics even though both fail live finalization.
Findings-as-enum tightens what reviewers can express; the markdown codec's prose findings have carried nuance (multi-severity bullets, inline suggested rewrites) that a first schema draft may flatten.

Relevant Notes:

Claude Code dynamic workflows — abstracted-from: the schema-validated agent() option whose generalization is this proposal's trigger
029-review execution unified on (note, gate) pairs — see-also: established the single grammar and ParsedJobOutput boundary a second codec would plug into
035-review jobs finalize all-or-nothing with derived artifacts — see-also: current decision carrying forward the medium-pluggability seams (parent-owned dispatch, superseding ADR 030 via ADR 034) codec selection would ride on
gate learning from accepted edits — see-also: per-gate statistics would benefit from machine-readable findings, one of the free choices here

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search