035-Review jobs finalize all-or-nothing with derived artifacts

Type: ../types/adr.md · Status: accepted

Date: 2026-07-01

Context

ADR 034 established queued review jobs, selector-JSON creation, parent-dispatched workers, and execution provenance separate from freshness identity. Its first implementation still carried extra state and workflow surface:

commonplace-claim-review-job moved jobs from queued to running before worker dispatch.
review_jobs stored artifact paths that were derivable from the job id.
review_pairs stored pair_status, so failed multi-pair jobs could retain completed pairs.
Live parsing accepted several free-text aliases and inference fallbacks.

Those features added maintenance surface without enough current operational value. There is no scheduler with leases or heartbeats, so running did not enforce ownership. Persisted artifact paths duplicated deterministic naming rules. Partial salvage made acceptance reasoning harder because a failed job could still advance freshness for a subset of pairs. Permissive parsing made model drift look like successful review output.

Decision

Review jobs now have exactly three statuses: queued, completed, and failed. Job creation prepares queued work and prompt artifacts. Worker dispatch remains parent-owned and does not mutate the review DB. commonplace-finalize-review-job records optional provenance at finalization time:

--runner records the execution medium or worker label.
--model records the concrete worker model and validates build_model_partition(--model, --effort) against the job's model_partition.
--effort requires --model.
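
The flag rules above can be sketched as a small validation step. This is a minimal illustration, not the actual finalizer: the body of build_model_partition here is a hypothetical stand-in (the ADR names the function but not its output format), and validate_provenance is an invented helper name.

```python
from typing import Dict, Optional


def build_model_partition(model: str, effort: Optional[str]) -> str:
    """Hypothetical stand-in; the real partition format is not specified in this ADR."""
    return model if effort is None else f"{model}@{effort}"


def validate_provenance(
    job_model_partition: str,
    runner: Optional[str] = None,
    model: Optional[str] = None,
    effort: Optional[str] = None,
) -> Dict[str, Optional[str]]:
    """Check optional provenance flags before anything is recorded."""
    # --effort is only meaningful alongside a concrete model.
    if effort is not None and model is None:
        raise ValueError("--effort requires --model")
    # When a model is given, it must agree with the partition the job was created for.
    if model is not None:
        partition = build_model_partition(model, effort)
        if partition != job_model_partition:
            raise ValueError(
                f"model partition {partition!r} does not match job partition "
                f"{job_model_partition!r}"
            )
    # --runner is a free label and needs no cross-check in this sketch.
    return {"runner": runner, "model": model, "effort": effort}
```

All three flags stay optional in this sketch, matching the ADR's description of provenance as optional at finalization time.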

Artifact paths are derived, not persisted. The job directory is kb/reports/bundle-reviews/review-job-{review_job_id}/; prompt, bundle output, manifest, and per-pair result paths are pure functions of the job id, packing, and complete pair set.
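
As a rough sketch of what "derived, not persisted" could look like, the helper below computes every path from the job id and the pair set. Only the job directory pattern and the MANIFEST.json name come from this ADR. The other file names, the assumption that the manifest lives in the job directory, and the function names are hypothetical, and the packing input is left out for brevity.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable

REPORT_ROOT = PurePosixPath("kb/reports/bundle-reviews")  # directory named in the ADR


def job_dir(review_job_id: int) -> PurePosixPath:
    """Job directory as a pure function of the job id."""
    return REPORT_ROOT / f"review-job-{review_job_id}"


def artifact_paths(review_job_id: int, pair_ids: Iterable[int]) -> Dict[str, PurePosixPath]:
    """Derive all artifact paths; nothing here is read from or written to the DB."""
    base = job_dir(review_job_id)
    paths = {
        "prompt": base / "prompt.md",          # hypothetical file name
        "bundle_output": base / "output.md",   # hypothetical file name
        "manifest": base / "MANIFEST.json",    # name from the ADR, location assumed
    }
    # One result file per pair; sorting keeps the derivation order-independent.
    for pair_id in sorted(pair_ids):
        paths[f"result:{pair_id}"] = base / f"pair-{pair_id}.result.md"  # hypothetical
    return paths
```

Because the mapping depends only on its arguments, two callers with the same job id and pair set always agree on where each artifact lives.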

Finalization is all-or-nothing. The finalizer validates parse coverage before mutating acceptance state, and result-file write failures roll back the DB completion. Missing, duplicate, unexpected, malformed, or result-less pair blocks fail the whole job. A failed job resets pair completion state, so decision and reviewed_at remain null, and it writes no acceptance rows.
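
The coverage rule can be illustrated with a check that runs before any state changes. This is a sketch under assumptions: pair keys are plain strings, each parsed block is a (pair key, result or None) tuple, and coverage_problems and finalize are hypothetical names rather than the project's API.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

PairKey = str  # hypothetical pair identifier


def coverage_problems(
    expected: Set[PairKey],
    blocks: List[Tuple[PairKey, Optional[str]]],
) -> List[str]:
    """List every coverage violation; an empty list means the output is complete."""
    problems: List[str] = []
    counts = Counter(key for key, _ in blocks)
    for key in sorted(expected - set(counts)):
        problems.append(f"missing pair block: {key}")
    for key in sorted(set(counts) - expected):
        problems.append(f"unexpected pair block: {key}")
    for key, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"duplicate pair block: {key}")
    for key, result in blocks:
        if result is None:
            problems.append(f"pair block without result: {key}")
    return problems


def finalize(
    expected: Set[PairKey],
    blocks: List[Tuple[PairKey, Optional[str]]],
) -> Tuple[str, Dict[PairKey, str]]:
    """Return (status, accepted results); any problem fails the whole job."""
    if coverage_problems(expected, blocks):
        return "failed", {}  # no partial salvage: the completed subset is dropped
    return "completed", {key: result for key, result in blocks if result is not None}
```

For example, a two-pair job whose output contains only one valid block comes back as ("failed", {}), even though that one block parsed cleanly.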

Live parsing is strict: each pair block must end with exactly one final result line:

## Result: PASS
## Result: WARN
## Result: FAIL
## Result: ERROR
## Result: REPORT

Aliases and inferred decisions are invalid in live finalization. Since the schema-v5 amendment below, parsing is against the pair's persisted result_kind: verdict pairs accept only PASS, WARN, FAIL, or ERROR; report pairs accept only REPORT, which is a completion marker rather than a decision.
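
A strict parser along these lines might look as follows. It is a sketch, not the live parser: parse_pair_block is a hypothetical name, and tolerating trailing whitespace after the final result line is an assumption the ADR does not state.

```python
import re
from typing import Optional

# Exactly the five tokens listed above; no aliases, no case folding.
RESULT_LINE = re.compile(r"^## Result: (PASS|WARN|FAIL|ERROR|REPORT)$")

ALLOWED = {
    "verdict": {"PASS", "WARN", "FAIL", "ERROR"},
    "report": {"REPORT"},
}


def parse_pair_block(block: str, result_kind: str) -> Optional[str]:
    """Return the decision for a verdict pair, or None for a completed report pair.

    Raises ValueError on any deviation from the strict format.
    """
    if result_kind not in ALLOWED:
        raise ValueError(f"unknown result_kind: {result_kind!r}")
    # Assumption: trailing blank lines after the result line are tolerated.
    lines = block.rstrip().splitlines()
    tokens = [m.group(1) for m in map(RESULT_LINE.match, lines) if m]
    if len(tokens) != 1:
        raise ValueError(f"expected exactly one result line, found {len(tokens)}")
    if not RESULT_LINE.match(lines[-1]):
        raise ValueError("result line must be the final line of the block")
    token = tokens[0]
    if token not in ALLOWED[result_kind]:
        raise ValueError(f"{token} is not valid for a {result_kind} pair")
    # REPORT is a completion marker, so a report pair carries no decision.
    return None if result_kind == "report" else token
```

A line such as "Result: pass" or "## Result: OK" matches nothing here, so the block is rejected instead of being coerced into a decision.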

Acceptance evidence is guarded at the SQL boundary. current_gate_acceptances joins acceptance through review_pairs and review_jobs, and only exposes rows whose parent job is completed and whose pair satisfies per-kind completion: reviewed_at plus a decision for verdict pairs, or reviewed_at plus a null decision for report pairs. This makes the freshness selector robust even if an accidental acceptance row is inserted.
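
One way to picture the guarded view is the SQLite sketch below. The table and column layout is assumed for illustration: only the names review_jobs, review_pairs, result_kind, decision, reviewed_at, and current_gate_acceptances appear in this ADR, and the acceptances table name and the id columns are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE review_jobs (
    review_job_id INTEGER PRIMARY KEY,
    status TEXT NOT NULL CHECK (status IN ('queued', 'completed', 'failed'))
);
CREATE TABLE review_pairs (
    review_pair_id INTEGER PRIMARY KEY,
    review_job_id INTEGER NOT NULL REFERENCES review_jobs(review_job_id),
    result_kind TEXT NOT NULL CHECK (result_kind IN ('verdict', 'report')),
    decision TEXT,
    reviewed_at TEXT
);
CREATE TABLE acceptances (  -- hypothetical table name
    acceptance_id INTEGER PRIMARY KEY,
    review_pair_id INTEGER NOT NULL REFERENCES review_pairs(review_pair_id)
);
-- Only rows backed by a completed job and a per-kind complete pair are exposed.
CREATE VIEW current_gate_acceptances AS
SELECT a.acceptance_id, p.review_pair_id, p.result_kind, p.decision
FROM acceptances AS a
JOIN review_pairs AS p ON p.review_pair_id = a.review_pair_id
JOIN review_jobs AS j ON j.review_job_id = p.review_job_id
WHERE j.status = 'completed'
  AND p.reviewed_at IS NOT NULL
  AND (
        (p.result_kind = 'verdict' AND p.decision IS NOT NULL)
     OR (p.result_kind = 'report' AND p.decision IS NULL)
  );
"""


def demo() -> list:
    """Insert one stray acceptance row for a failed job and show the view hides it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executescript("""
        INSERT INTO review_jobs VALUES (1, 'completed'), (2, 'failed');
        INSERT INTO review_pairs VALUES
            (10, 1, 'verdict', 'PASS', '2026-07-01T00:00:00Z'),
            (11, 1, 'report', NULL, '2026-07-01T00:00:00Z'),
            (12, 2, 'verdict', NULL, NULL);
        INSERT INTO acceptances VALUES (100, 10), (101, 11), (102, 12);
    """)
    rows = conn.execute(
        "SELECT acceptance_id FROM current_gate_acceptances ORDER BY acceptance_id"
    )
    return [row[0] for row in rows]
```

In this sketch, acceptance 102 belongs to a failed job, so demo() returns only the ids 100 and 101.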

Result files are evidence, so failing to write them remains fatal: a result-file write failure prevents DB completion and then marks the job failed in a separate failure transaction. MANIFEST.json is display/debug output, so a manifest refresh failure after DB completion does not fail the job; finalization reports it as a warning.
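
The difference between the two failure modes can be sketched with SQLite transactions. This is an assumption-laden illustration: writing result files inside the completion transaction is one possible ordering, and finalize_with_files, write_results, and refresh_manifest are hypothetical names.

```python
import sqlite3
from typing import Callable, List


def finalize_with_files(
    conn: sqlite3.Connection,
    review_job_id: int,
    write_results: Callable[[], None],
    refresh_manifest: Callable[[], None],
) -> List[str]:
    """Complete a job; result-file errors are fatal, manifest errors are warnings."""
    warnings: List[str] = []
    try:
        # Completion transaction: commits on success, rolls back on any exception.
        with conn:
            conn.execute(
                "UPDATE review_jobs SET status = 'completed' WHERE review_job_id = ?",
                (review_job_id,),
            )
            write_results()  # evidence: a failure here undoes the completion
    except OSError as exc:
        # Separate failure transaction, run only after the rollback above.
        with conn:
            conn.execute(
                "UPDATE review_jobs SET status = 'failed' WHERE review_job_id = ?",
                (review_job_id,),
            )
        raise RuntimeError(f"result-file write failed: {exc}") from exc
    try:
        refresh_manifest()  # display/debug output only
    except OSError as exc:
        warnings.append(f"manifest refresh failed: {exc}")
    return warnings
```

The manifest refresh runs only after the completion has committed, so an error there cannot change the job status.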

ADR 036 later changed successful acceptance from append-only events to a current-state upsert and moved superseded-review pruning inline with that success transaction.

Schema migration remains exceptional rather than a general compatibility promise. Stores whose historical evidence cannot be represented by the current schema must be recreated. The schema-v5 amendment below adds one recorded exception for populated v4 stores whose verdict evidence can be preserved exactly.

Amendment: result kinds and the v4→v5 migration (2026-07-11)

Review pairs persist result_kind = verdict | report, separating protocol completion from a decision. Jobs are result-kind homogeneous, finalization parses against the persisted kind, and REPORT completes a report pair with reviewed_at while leaving decision null. This extends the all-or-nothing rule without weakening it: every expected pair must still complete under its own contract before the job advances acceptance.

Because populated v4 stores contained paid-for verdict evidence already representable in v5, the v4→v5 migration upgraded them in place while preserving review-pair IDs and acceptance references. This was a narrow evidence-preservation migration, not restoration of the general migration substrate removed by this ADR.

Amendment: criterion-axis naming and schema v6 (2026-07-11)

The generic assay axis now uses criterion throughout the schema, Python API, JSON and artifact fields, protocol labels, stale reason, and CLI: criterion_path, criterion_id, current_criterion_acceptances, criterion-changed, --grouping criterion, commonplace-ack-review, and commonplace-resolve-criteria. Gate remains only for the closed-ended, verdict-kind criterion type, its authored gate_id, its catalog, and --all-gates.

This is schema v6. Existing stores are rejected and must be recreated. The narrow v4→v5 migration script remains available for one final evidence-preservation use, but normal initialization does not migrate old stores and v5 is not accepted as a v6 store. Commonplace has no external consumers that justify a general compatibility layer, and the cleaner invariant is that the concept and every generic identifier share one name.

Deferred:

selector/create consolidation into a convenience command;
schema-validated structured output;
deciding whether MANIFEST.json should shrink or remain as an inspection artifact.

Consequences

Easier:

The live workflow has one fewer required command: create jobs, dispatch workers, finalize jobs.
Job status reflects only durable review-state transitions, not unenforced worker ownership.
Artifact naming is centralized in code and cannot diverge from DB rows.
Failed jobs cannot silently advance freshness.
Strict parsing makes malformed live output visible immediately.
The freshness boundary has a defensive SQL invariant, not only a caller convention.

Harder / accepted costs:

A parent cannot mark a job as in progress inside the review DB. External orchestration must track dispatch progress itself.
A mostly complete multi-pair output with one missing pair must be rerun or repaired outside finalization; the completed subset is not accepted.
Historical prose and proposals that discuss claim/running or partial salvage must be read as superseded context unless they cite this ADR as current.

Relevant Notes:

034-Queued review jobs and execution provenance — supersedes: keeps parent-dispatched queued jobs and nullable execution provenance while moving provenance to finalization and removing running state.
033-Honest review state behind a versioned migration substrate — supersedes-in-part: keeps honest queued work but removes running/start state.
032-Review freshness uses DB snapshots, not Git — extends: reinforces DB-owned accepted baselines through the guarded current-acceptance view.
029-review execution unified on (note, gate) pairs — supersedes-in-part: keeps the pair grammar and packing model while removing partial salvage from live finalization.
review system — implemented-by: current operator-facing workflow.
