Dynamic Cheatsheet

Type: note · Status: current · Tags: related-systems

Dynamic Cheatsheet is a Python framework for test-time learning with an evolving cheatsheet. A model answers a query with the current cheatsheet in context, then a second prompt rewrites that cheatsheet for the next query. The repo also supports retrieval variants that pull similar past examples and optionally synthesize them into a per-query cheatsheet. Built by Mirac Suzgun and collaborators, MIT-style research code rather than a production memory service.

Repository: https://github.com/suzgunmirac/dynamic-cheatsheet

Core Ideas

The learned artifact is one carried-forward cheatsheet string. The primary state is a single cheatsheet text blob passed from one query to the next. In the cumulative mode, each call returns final_cheatsheet, and the benchmark runner feeds that text into the next sample. There is no separate note graph, ledger, or typed memory store in the repo.

Curation is full-document rewrite, not local mutation. The prompts talk about preserving good content, incrementing counts, removing redundancy, and refining the cheatsheet. But the implementation extracts a <cheatsheet>...</cheatsheet> block from the model response and replaces the old cheatsheet wholesale. This is real artifact learning, but the maintenance mechanism is rewrite-and-carry-forward, not operation-based editing.

The strongest implemented distinction is cumulative versus retrieval-shaped memory. The repo exposes several approaches: cumulative cheatsheet growth, retrieval-only, retrieval plus synthesis, and cumulative plus retrieval. The retrieval paths use embeddings and cosine similarity over prior examples, then either inject those examples directly or ask the model to synthesize a custom cheatsheet for the current query.

The trace substrate is benchmark query history, not live sessions. Dynamic Cheatsheet learns from ordered problem attempts in benchmark runs. The raw signal is the current question, the model answer, and optionally previous input-output pairs or retrieved examples. It is not mining assistant tool traces or multi-role execution logs.

Counts exist at the prompt level, not as enforced state transitions. The curator prompts require ** Count: fields on memory items, but the repo does not enforce or independently update those counts in code. The counter policy is delegated to the LLM's rewrite behavior.

Comparison with Our System

Dynamic Cheatsheet is a clear artifact-learning system, but a much looser one than the workshop-memory systems in our KB. It has persistent learning across queries, yet the substrate is one prompt-shaped document rather than a collection of separately addressable artifacts.

Dimension	Dynamic Cheatsheet	Commonplace
Trace source	Ordered benchmark queries, answers, and retrieved prior examples	Human+agent editing traces, notes, links, workshop artifacts
Learned substrate	One evolving cheatsheet text blob	Notes, links, instructions, workshop artifacts
Promotion target	Inspectable text only	Inspectable text only
Update style	Full cheatsheet rewrite each step	Manual curation and targeted file edits
Retrieval model	Optional embeddings + cosine retrieval over prior examples	Agent-driven navigation over linked markdown
Oracle strength	Usually implicit task success within benchmark workflow	Weak, mostly human judgment
Storage model	Files plus embedding cache/results outputs	Files in git

Trace-derived learning placement. On axis 1 of the survey, Dynamic Cheatsheet fits the trajectory-run pattern: it learns from repeated problem attempts, not from one live session stream. On axis 2, it is clearly trace-derived artifact-learning: the learned result is an inspectable cheatsheet, never weights. It should be added to the survey, but with a caveat: it is closer to prompt-state accumulation than to explicit memory-system curation.

Compared with Pi Self-Learning, Dynamic Cheatsheet uses a much broader freeform artifact and much weaker structural constraints. Compared with trajectory-informed memory generation, it is more online and cumulative, but less explicit about extraction categories and lifecycle.

Borrowable Ideas

Carry-forward artifact as the minimum learning loop. Ready now as a framing. Dynamic Cheatsheet shows the minimal closed loop for artifact learning: current artifact in, revised artifact out, feed it forward. That is useful as a lower bound against which richer workshop systems can be judged.

Retrieval-plus-synthesis as a hybrid mode. Needs a use case first. The retrieval-synthesis variants are a concrete pattern for combining global accumulated memory with task-local recalled examples.

Example-conditioned artifact rewrite. Needs a use case first. The retrieval-synthesis prompt is a plausible mechanism for rewriting a local playbook from a small bank of nearby precedents, rather than from the full history.

Curiosity Pass

The repo's strongest idea is not "adaptive memory" in the abstract. It is the explicit comparison between several cheap memory-update strategies: keep one growing cheatsheet, retrieve similar examples, synthesize a per-query cheatsheet, or combine both. That makes the system useful as an ablation reference for artifact-learning loops.

The weaker part is maintenance fidelity. The prompts ask the model to preserve, compress, count, and improve the cheatsheet, but the code does not enforce those invariants. Because the state update is "extract <cheatsheet> and replace the old one," the whole lifecycle depends on the rewrite quality of one model call.

So Dynamic Cheatsheet is genuinely more than retrieval, but less than a structured memory system. It is best understood as test-time artifact learning through prompt-mediated state carryover.

The follow-up paper Ingest: Large Language Model Agents Are Not Always Faithful Self-Evolvers evaluates Dynamic Cheatsheet directly and makes this maintenance-risk more concrete: agents in that setup often remain weakly sensitive to interventions on condensed experience. In other words, carrying forward a cheatsheet can improve the system while still failing to make the cheatsheet strongly behaviorally binding.

What to Watch

Whether later versions move from full cheatsheet rewrite to operation-level maintenance
Whether the retrieval variants outperform cumulative-only mode outside narrow benchmark settings
Whether the repo grows any explicit memory schema stronger than freeform XML-like blocks
Whether this line of work converges toward structured workshop artifacts or stays at prompt-state level

Relevant Notes:

trace-derived learning techniques in related systems — extends: Dynamic Cheatsheet is a clean additional artifact-learning case built from repeated task attempts and prompt-carried memory
Pi Self-Learning — contrasts: both reinject textual learnings, but Pi Self-Learning uses narrow schemas and explicit scoring while Dynamic Cheatsheet relies on freeform cheatsheet rewrites
Autocontext — contrasts: both learn across runs into artifacts, but Autocontext has richer orchestration and optional weight export, while Dynamic Cheatsheet stays at prompt-state artifacts
memory management policy is learnable but oracle-dependent — sharpens: Dynamic Cheatsheet is the artifact-side counterpart to weight-learning systems that also depend on strong recurring task signals
deploy-time learning — sharpens: this system sits squarely in the deploy-time artifact-update space, using persistent prompt-state rather than weights
Ingest: Large Language Model Agents Are Not Always Faithful Self-Evolvers — evidence: evaluates Dynamic Cheatsheet directly and finds condensed cheatsheet-style experience has limited causal influence under intervention

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search