Read-back placement — backfill plan
Handoff from the read-back-placement workshop, which added a ## Read-back Placement section + push-activation tag to the agent-memory-system review type. This plan applies that section to the legacy review corpus. New reviews that already contain read-back treatment are explicitly skipped.
Goal
- Every legacy non-archived review carries a one-line direction verdict — pull / push / both, from the agent's perspective.
- Reviews of systems with a relevance-gated or engineered activation path also get the full
## Read-back placementsection and thepush-activationtag.
Scope
- In: the 96 legacy non-archived review files in
kb/agent-memory-systems/reviews/that do not already contain the new read-back treatment. - Out:
dir-index.md, all*.replaced.*archived reviews, and the four new reviews that already contain read-back treatment:a-mem.md,graphiti.md,letta.md, andmem0.md. - Target queue:
read-back-backfill-targets.txtlists the 96 in-scope review paths in deterministic order. - Inventory + clone paths:
systems.csv(100 systems;path to cloned repogives source for Tier B).
Two tiers (match the gating rule)
| Tier A — direction verdict | Tier B — full section + tag | |
|---|---|---|
| Applies to | all 96 | engineered-activation subset only |
| Source needed? | No — re-express what the existing review already documents | Yes — re-read source for the 6 axes |
last-checked |
do not bump (not a re-inspection) | bump + record reviewed_revision |
| Adds tag? | no | push-activation |
| Cost | cheap, batchable | expensive |
Freshness rule (non-negotiable): the Tier-A one-liner is a classification of existing review findings under the new axis, not a fresh source claim — so it must not touch last-checked. Only Tier B, which actually re-reads source, updates freshness metadata. If an existing review lacks enough detail to classify direction, write direction: unclear (needs source) rather than guessing — never invent a push/pull claim the prior review doesn't support.
Phases
Phase 0 — Triage from existing review text (cheap, no source)
Read each in-scope legacy review's existing retrieval/navigation prose and bucket it:
- pull-only — retrieval/query interface, no proactive injection → Tier A one-liner, no tag.
- always-load-only — unconditional context injection, no relevance gating → Tier A one-liner (name always-load as a deliberate push choice), no tag.
- engineered-activation — a matcher (embedding / action-classifier / LLM-judge / typed cue), scope budget, before-action hook, or faithfulness test → Tier B shortlist.
- unclear — prose insufficient to classify → needs a source peek before Tier A.
Output: the Tier-B shortlist + the unclear set. The 69 trace-derived reviews are the highest-yield place to look for engineered activation, but learning ≠ activation — confirm each independently.
Phase 1 — Tier-A sweep (all 96)
Add the one-line direction verdict to every in-scope legacy review (placed in Comparison with Our System or Core Ideas; no fixed heading — the one-liner is not schema-gated). Resolve unclear cases with a quick source check (clone path in CSV). Do not bump last-checked. Do not edit the four skipped new reviews.
Phase 2 — Tier-B full sections (shortlist)
For each engineered-activation system: re-read source, write the ## Read-back placement section (direction, trigger, timing, scope, authority-at-consumption, faithfulness, other consumers), add push-activation to tags, mark precision/dilution/effective-authority as not-verified-from-code, report library API surface as capability. Bump last-checked + reviewed_revision.
Phase 3 — QA + validate
uv run commonplace-validateover the reviews dir — the schema conditional enforcespush-activation⇒ body contains "Read-back placement", catching tag/section mismatches.- Spot-check a random sample of Tier-A verdicts against source for classification accuracy.
Execution mechanics
- Track in a dedicated ledger
read-back-backfill-ledger.csvkeyed by review filename (review_file,direction,tier,source_checked,notes) for the 96 processed legacy reviews — cleaner than wedging columns intosystems.csv, which is keyed by system, not by review file. See the Tier-A instructions. - Queue: assign reviews from
read-back-backfill-targets.txt. For one-by-one sub-agent execution, give each worker exactly one path from the target queue and require it to process only that path. - Batching: 96 reviews is fan-out-shaped. A multi-agent workflow (Phase 0 triage in parallel → Phase 1/2 per-review) is the efficient execution, but only on explicit opt-in — say "workflow" to authorize it; otherwise this proceeds as sequential one-file agent passes.
- Guardrail: validation after each batch, not just at the end.
Recommended integration with the review pass — hybrid
Do not run Phase 2 as a standalone sprint. Recommended:
- Now: Phase 0 + Phase 1 (the cheap one-liner sweep over all 96) as a standalone task — gives full direction coverage immediately and is the high-value, low-cost half.
- Riding the next review pass: Phase 2 full sections land when each shortlisted system is re-reviewed anyway (it needs source either way). This avoids a second source-read just for read-back and keeps
last-checkedhonest.
This matches the default already recorded in the read-back-placement workshop: new reviews carry the section; existing ones get the one-liner now, full section on re-review.
Definition of done
- All 96 in-scope legacy reviews have a direction verdict or an explicit
unclearflag. - Every engineered-activation system that has been re-reviewed has a full section +
push-activationtag. read-back-backfill-ledger.csvreflects per-review status for the 96 processed legacy reviews.- Validation passes clean across the reviews dir.
- (Until Phase 2 fully rides out the review pass, "done" for full sections is tracked per-system, not as a single sprint completion.)
Risks / cautions
- Honesty of freshness — the main trap. Batch tooling must be wired so Tier A cannot bump
last-checked. Treat any Tier-A edit that touches freshness metadata as a bug. - Learning ≠ activation — don't auto-promote
trace-derivedreviews topush-activation; classify activation on its own evidence. - Library under-determination — for SDK/library reviews, the repo may show only pull primitives; report capability, don't assert deployed push, and don't force a Tier-B section where the host wiring is invisible.
- Silent truncation — if a run caps coverage (e.g. top-N), log what was skipped; partial coverage must not read as complete.