Tracecraft
Type: ../types/agent-memory-system-review.md · Status: current
Tracecraft, from Arrmlet, is a Python CLI for coordinating multiple agents through S3-compatible object storage or HuggingFace Buckets. Its implemented system is not a long-term agent memory engine: it is a shared object layout plus command-line conventions for presence, key-value state, messages, task claims, handoff notes, dependency waiting, and file artifacts. The behavior-changing force comes from agents agreeing to run the CLI and respect the objects it writes.
Repository: https://github.com/Arrmlet/tracecraft
Reviewed commit: 2ee4631eaffce1e463eeb6caffc82feb194b2e4c
Last checked: 2026-05-16
Core Ideas
The storage substrate is object storage with a project prefix. The SDK exposes two backends behind the same small interface: a boto3 S3 wrapper and a HuggingFace HfFileSystem wrapper. Both place objects under a project namespace and support put_json, get_json, list_keys, existence checks, deletion, and file upload/download (sdk/tracecraft/s3.py, sdk/tracecraft/hf.py, sdk/tracecraft/store.py). The documented layout is agents/, memory/, messages/, steps/, artifacts/, and planned runs/ prefixes under s3://bucket/project/ (docs/s3-architecture.md).
Coordination is encoded as JSON objects and client-side conventions. tracecraft init writes local config and registers an agent object. tracecraft agents lists agents/*.json and marks heartbeats older than five minutes as stale, but the current CLI only writes the heartbeat at initialization; there is no background heartbeat updater in the inspected SDK (sdk/tracecraft/cli/init_cmd.py, sdk/tracecraft/cli/agents.py, sdk/tracecraft/config.py). Agent presence is therefore a symbolic status record, not a liveness protocol with enforced leases.
Shared memory is last-writer-wins key-value state. tracecraft memory set maps dotted keys to memory/<path>.json, stores a string value with set_by and set_at, and memory get/list read the object tree back as dot keys (sdk/tracecraft/cli/memory.py). These records are knowledge artifacts when read as current shared facts or progress hints. They are not curated durable knowledge: there is no schema beyond value/setter/time, no conflict handling, no provenance chain, no review state, and no promotion path into instructions, notes, tests, or skills.
Messages and handoffs are inbox objects, not a delivery service. tracecraft send writes timestamped JSON files under messages/<recipient>/ or messages/_broadcast/; inbox lists direct and broadcast prefixes, prints messages, optionally deletes them, and skips the current agent's own broadcasts (sdk/tracecraft/cli/messages.py). complete also writes steps/<step>/handoff.json with a freeform note (sdk/tracecraft/cli/steps.py). These objects advise and resume later agents, but ordering, acknowledgement, deduplication, retention, and deletion are left to agent behavior.
Task claiming is advisory and not actually atomic. The architecture doc says claim.json is atomic and the first writer owns the step, but the CLI implements claim as exists(claim.json) followed by put_json(claim.json) and put_json(status.json) (docs/s3-architecture.md, sdk/tracecraft/cli/steps.py). There is no conditional put, object version compare-and-set, ETag precondition, lock TTL, retry loop, or lease expiry. Under concurrent claims, two agents can both observe absence and race to write; the object store's final value becomes the apparent owner. The primitive still coordinates cooperative low-contention work, but it is not a consistency guarantee.
Dependency waiting and artifact sharing are polling and object transfer. wait-for polls each step's status.json every five seconds until all are complete or the timeout expires. artifact upload/download/list stores arbitrary files under artifacts/<step>/ or artifacts/shared/ with no manifest, checksum, dependency graph, or content metadata beyond the object key (sdk/tracecraft/cli/steps.py, sdk/tracecraft/cli/artifacts.py). Artifacts are knowledge artifacts when used as evidence or work products; they become system-definition artifacts only when an agent treats a downloaded file as executable configuration, code, or instruction outside Tracecraft's own enforcement.
The richer server architecture is mostly scaffold at this commit. server/ contains FastAPI, config, security, SeaweedFS bucket-management, and storage wrappers, but the active API exposes only health and status endpoints; the advertised REST resources, WebSockets, PostgreSQL metadata model, Redis pub/sub, CAS shared memory, and coordination services described in CLAUDE.md are not implemented in the checked files (server/tracecraft_server/main.py, server/tracecraft_server/core/config.py, server/tracecraft_server/core/security.py, CLAUDE.md). The installable surface is the Click CLI declared by the SDK package (sdk/pyproject.toml).
Docs are ahead of tests and enforcement. The README, S3 architecture doc, and example agent instructions clearly describe the intended coordination flow, but the checkout has no test files even though the SDK package config names pytest and a tests path (README.md, docs/s3-architecture.md, examples/claude-code/CLAUDE.md, sdk/pyproject.toml). That matters because the most coordination-sensitive behavior--claiming, heartbeat freshness, inbox deletion, and artifact lookup--is precisely where regressions or race assumptions would need executable checks.
Comparison with Our System
| Dimension | Tracecraft | Commonplace |
|---|---|---|
| Primary purpose | Multi-agent coordination through shared object keys | Durable agent-operated KB methodology and review system |
| Storage substrate | S3-compatible bucket or HuggingFace Bucket, plus local .tracecraft.json config |
Git-tracked markdown collections, sources, indexes, type specs, commands, reports |
| Retained objects | Agent JSON, memory JSON, message JSON, step claim/status/handoff JSON, artifact blobs | Typed notes, instructions, reference docs, sources, reviews, ADRs, generated indexes |
| Behavioral authority | CLI conventions: agents should claim, wait, read inboxes, share state, and respect status objects | Collection contracts, validation, authored links, skills, review gates, command outputs |
| Consistency model | Cooperative object-store convention; no implemented CAS or leases in CLI claims | Git/workflow discipline plus deterministic validators and explicit review lifecycle |
| Learning model | None implemented; state records coordinate current work | Manual and semi-automated distillation into durable knowledge and system-definition artifacts |
Tracecraft and commonplace share the habit of making agent state inspectable in ordinary tools. A Tracecraft bucket can be browsed in MinIO, AWS, R2, or HuggingFace; a commonplace KB can be searched and reviewed in a git checkout. Both avoid hiding all state behind a proprietary service.
The systems diverge on artifact contracts and time horizon. Tracecraft uses object names as the contract: memory/foo/bar.json, messages/agent/timestamp_sender.json, steps/task/status.json. That is enough for lightweight coordination, but the objects do not know whether they are evidence, current state, instruction, claim, or artifact except by path convention. Commonplace makes those distinctions explicit through collection types, frontmatter, indexes, validation, and review workflows.
The behavioral authority split is also different. Tracecraft's JSON records mostly advise or coordinate: a memory record tells agents what someone set, a message tells agents what someone said, a handoff note summarizes what to do next, and a status object tells waiters when to continue. The strongest system-definition artifacts are outside the bucket: the CLI implementation, the path layout, and example instruction/skill files that tell agents to use claim, inbox, wait-for, and complete (examples/claude-code/CLAUDE.md, examples/hermes-agent/SKILL.md). Commonplace pushes more authority into validated repository artifacts.
This is coordination, not durable knowledge learning. Tracecraft can preserve traces of coordination because its objects remain in the bucket, but the code does not mine messages, claims, artifacts, or runs into later behavior-changing memories. There is no extractor, judge, summarizer, retriever, ranking layer, promotion workflow, or compiled instruction update. For the trace-derived survey vocabulary, Tracecraft is a source of raw coordination records and shared knowledge artifacts, not a trace-derived learning system.
Read-back: pull — agents deliberately call CLI commands to read shared memory, inboxes, step status, handoffs, and artifacts.
Borrowable Ideas
Use object storage as a low-friction multi-agent rendezvous point. Useful for ephemeral workshops where agents do not share a filesystem. A commonplace workshop could use an S3 prefix for cross-machine messages or artifacts, but it should treat that prefix as work-in-flight state, not as the durable library.
Keep coordination primitives small and inspectable. Ready to borrow as a design heuristic. Tracecraft's useful contribution is the tiny vocabulary: agents, memory, messages, steps, barriers, handoffs, and artifacts. Each maps to obvious keys and simple commands.
Package coordination as agent-native instructions. The Claude Code and Hermes examples show that the system-definition surface can be a short startup guide or skill that teaches agents when to claim, check inboxes, share state, and verify writes. Commonplace workshops could use similar scoped instructions without promoting the bucket objects into library notes.
Do not borrow the claim implementation as a lock. The current check-then-put claim works as a courtesy protocol, not a distributed mutual-exclusion primitive. If commonplace ever needs object-store task claims, it should require conditional writes, leases, expirations, or an external coordinator.
Treat shared memory records as volatile facts. A memory set value is useful coordination context, but it should not outrank sourced notes or instructions. Borrowing this pattern requires an explicit retirement or promotion boundary.
Curiosity Pass
Tracecraft's most interesting design choice is that it takes "filesystem coordination" and moves it to S3 without adding a service. That makes adoption easy: any agent that can run a CLI can participate, and any S3 browser can inspect the state.
The term "memory" is overloaded here. The implemented memory command stores shared current-state records; it does not perform semantic retrieval, consolidation, reflection, or durable learning. In commonplace vocabulary, most Tracecraft memory objects are knowledge artifacts with weak evidence authority, not system-definition artifacts.
The architecture docs and CLAUDE.md describe a more ambitious system with CAS, locks, runs, experiment tracking, replay, Redis, PostgreSQL, WebSockets, and service APIs. The reviewed commit's CLI is simpler and more coherent than that plan: it is a small coordination protocol over object storage. The review should follow the code, not the blueprint.
The biggest operational risk is silent convention drift. Because the bucket has no schema validator, manifest, status lifecycle, or authority metadata, two agent groups can use the same prefixes with different meanings. That is acceptable for short-lived projects and risky for durable knowledge.
What to Watch
- Whether
claimgains real object-store atomicity through conditional put, ETags, version checks, or leases. - Whether agent heartbeats become an active update loop with expiry/cleanup semantics rather than init-time timestamps.
- Whether
runs/experiment tracking and session replay become implemented CLI/API surfaces. - Whether artifacts gain manifests, checksums, dependency metadata, or lineage back to producing steps.
- Whether the server grows beyond health/status into actual coordination APIs, WebSockets, CAS memory, and registry services.
- Whether Tracecraft adds any trace-derived distillation path from messages, steps, artifacts, or runs into durable rules, skills, summaries, or model updates.
Bottom Line
Tracecraft is a clean coordination-by-convention layer over S3-compatible storage. Its coordination JSON objects, shared memory records, messages, task claims, handoffs, and artifact blobs are useful retained state for active multi-agent work, but their behavioral authority comes from the CLI and agent instructions that tell participants to honor them. It should be compared with commonplace's workshop layer, not with the durable KB library: Tracecraft helps agents avoid collision and exchange state; it does not yet learn durable knowledge from traces or promote coordination records into governed system-definition artifacts.
Relevant Notes:
- A functioning KB needs a workshop layer, not just a library - compares-with: Tracecraft is a coordination substrate for active work rather than an accumulating library.
- Knowledge artifact - defined-in: Tracecraft memory records, messages, handoffs, and artifacts usually advise or evidence later work.
- System-definition artifact - defined-in: the CLI, path layout, and agent instruction examples have instruction and coordination authority.
- Behavioral authority - grounds: Tracecraft objects matter by who reads them and whether they advise, coordinate, instruct, or block.
- Storage substrate - exemplifies: Tracecraft makes S3-compatible object storage the primary persistence layer.