Thalo

Type: ../types/agent-memory-system-review.md · Status: current

Thalo is rejot-dev's "Thought And Lore Language": a custom plain-text language and toolchain for structured personal knowledge. The repository implements a Tree-sitter grammar, TypeScript semantic model, schema checker, CLI, LSP, VSCode extension, Prettier plugin, VFS layer, merge driver, and synthesis automation around .thalo files and fenced thalo blocks in Markdown. Its design bet is close to commonplace's: keep the storage substrate in readable, git-versioned files, then add enough symbolic structure that agents and tools can validate, query, and regenerate higher-level views.

Repository: https://github.com/rejot-dev/thalo

Reviewed commit: cdb9aae983e6bc0b75eff1606bc99b088c3aebff

Last checked: 2026-05-16

Core Ideas

The language is compiler-shaped, not convention-shaped. The grammar has a unified entry structure with schema entries (define-entity, alter-entity) and data entries (create, update, define-synthesis, actualize-synthesis), plus tokens for timestamps, links, tags, typed metadata values, schema blocks, Markdown-ish sections, comments, and embedded content (packages/grammar/grammar.js). The docs present the same model as timestamped entries with metadata, sections, ^link IDs, #tags, entity definitions, and syntheses (apps/docs/content/docs/syntax.mdx, apps/docs/content/docs/entities.mdx). Source .thalo knowledge files are therefore mixed retained artifacts: instance entries are knowledge artifacts when read as evidence or context, while schema entries become system-definition artifacts when the checker, query engine, LSP, or actualizer consumes them with validation, completion, routing, or selection force.

Entity schemas are authored inside the knowledge base. define-entity entries declare fields, types, defaults, and required or optional sections; alter-entity entries can add or remove fields and sections. The schema registry resolves a schema by starting from the defining entry and applying later alters in timestamp order (packages/thalo/src/schema/registry.ts). The CLI's initializer seeds journal, opinion, reference, lore, and me schemas plus a starter reference file (apps/thalo-cli/src/commands/init.ts). This puts type evolution in the same plain-file/git substrate as ordinary knowledge, rather than in an external database migration or hidden service config.
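The resolution order described above can be sketched as a simple fold. This is a minimal illustration, not the code in packages/thalo/src/schema/registry.ts: the `SchemaEntry` class, its field names, and `resolve_schema` are hypothetical, and it assumes timestamps are ISO 8601 strings in one time zone so that string order matches time order.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaEntry:
    """Hypothetical stand-in for a define-entity or alter-entity entry."""
    timestamp: str                                    # ISO 8601, single time zone (assumption)
    kind: str                                         # "define" or "alter"
    add_fields: dict = field(default_factory=dict)    # field name -> type name
    remove_fields: list = field(default_factory=list)


def resolve_schema(entries):
    """Fold one entity's schema entries into its effective field map."""
    define = next(e for e in entries if e.kind == "define")
    fields = dict(define.add_fields)
    # Only alters later than the definition apply, oldest first.
    alters = sorted(
        (e for e in entries if e.kind == "alter" and e.timestamp > define.timestamp),
        key=lambda e: e.timestamp,
    )
    for alter in alters:
        for name in alter.remove_fields:
            fields.pop(name, None)
        fields.update(alter.add_fields)
    return fields


# Entries are deliberately out of order: file order should not matter.
entries = [
    SchemaEntry("2026-03-01T10:00Z", "alter", remove_fields=["mood"]),
    SchemaEntry("2026-01-01T10:00Z", "define",
                add_fields={"confidence": "string", "mood": "string"}),
    SchemaEntry("2026-02-01T10:00Z", "alter", add_fields={"source": "link"}),
]
print(resolve_schema(entries))
```

Because the result depends only on timestamps, reordering or splitting schema entries across files leaves the effective schema unchanged, which is what makes git-versioned type evolution workable.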

The semantic layer compiles files into indexes, not a new canonical store. Document owns source text and Tree-sitter blocks, including fenced thalo blocks in Markdown, with source maps back to the containing file (packages/thalo/src/model/document.ts). Workspace aggregates documents, semantic models, the schema registry, a link index, and dependency maps for incremental invalidation (packages/thalo/src/model/workspace.ts). The analyzer builds per-document link definitions, link references, and schema-entry lists (packages/thalo/src/semantic/analyzer.ts). These semantic indexes are derived system-definition artifacts while the process is running; their lineage is the source files plus parser version, and their behavioral authority is validation, navigation, completion, and query selection.
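To make the source-map idea concrete, here is a small sketch of pulling fenced thalo blocks out of a Markdown file while remembering where each block starts. It is not Thalo's Document implementation; `extract_thalo_blocks` and `to_file_line` are hypothetical names, and the sketch assumes fences are exactly three backticks with a `thalo` info string.

```python
FENCE = "`" * 3  # built indirectly so this example can sit inside a fenced block


def extract_thalo_blocks(markdown_text):
    """Return (start_line, source) pairs for each fenced thalo block.

    start_line is the 1-based line in the containing file where the
    block's content begins.
    """
    blocks = []
    current = None   # lines of the block being collected, or None outside a block
    start = 0
    for lineno, line in enumerate(markdown_text.splitlines(), start=1):
        stripped = line.strip()
        if current is None:
            if stripped == FENCE + "thalo":
                current, start = [], lineno + 1
        elif stripped == FENCE:
            blocks.append((start, "\n".join(current)))
            current = None
        else:
            current.append(line)
    return blocks


def to_file_line(start_line, block_line):
    """Map a 1-based line inside a block back to the containing file."""
    return start_line + block_line - 1


doc = "\n".join([
    "# Notes",
    "",
    FENCE + "thalo",
    "first entry line",
    "second entry line",
    FENCE,
    "prose",
])
blocks = extract_thalo_blocks(doc)
print(blocks)
print(to_file_line(blocks[0][0], 2))
```

Keeping the offset beside each block is what lets a diagnostic raised inside an embedded block point at the right line of the Markdown file.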

Validation is the main agent feedback loop. The checker collects syntax errors, builds a workspace index, runs all active rule visitors, and supports incremental checking by rule scope and dependency declarations (packages/thalo/src/checker/check.ts). The rule registry covers unknown entities, missing required fields or sections, invalid field types, unresolved or duplicate links, schema-definition errors, duplicate metadata, empty required values, date-range validity, duplicate headings, empty sections, update/create ordering, timestamps, titles, and synthesis-specific checks (packages/thalo/src/checker/rules/rules.ts). thalo check exposes this as default, JSON, compact, and GitHub-annotation output with severity and rule overrides (apps/thalo-cli/src/commands/check.ts). Validation outputs are derived artifacts, not source knowledge, but they have enforcement authority in CI or agent loops that refuse to proceed on errors.
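The feedback loop can be illustrated with a toy checker. This is a sketch under stated assumptions, not Thalo's rule registry: the `Rule` class, the two rule codes, the entry dict shape, and the severity names are all hypothetical, chosen only to show how default severities and overrides feed a pass/fail gate.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    code: str
    default_severity: str   # "error", "warning", or "off" (assumed names)
    check: Callable         # entry dict -> list of messages


def missing_title(entry):
    return [] if entry.get("title") else ["entry has no title"]


def empty_section(entry):
    return [
        f"section '{name}' is empty"
        for name, body in entry.get("sections", {}).items()
        if not body.strip()
    ]


RULES = [
    Rule("missing-title", "error", missing_title),
    Rule("empty-section", "warning", empty_section),
]


def run_check(entries, overrides=None):
    """Run every active rule and return (severity, code, entry id, message) tuples."""
    overrides = overrides or {}
    diagnostics = []
    for rule in RULES:
        severity = overrides.get(rule.code, rule.default_severity)
        if severity == "off":
            continue
        for entry in entries:
            for message in rule.check(entry):
                diagnostics.append((severity, rule.code, entry["id"], message))
    return diagnostics


def has_errors(diagnostics):
    """A CI job or agent loop would refuse to proceed when this is True."""
    return any(severity == "error" for severity, *_ in diagnostics)


entries = [
    {"id": "a", "title": "", "sections": {"Claim": "  "}},
    {"id": "b", "title": "Ok", "sections": {"Claim": "text"}},
]
for diagnostic in run_check(entries):
    print(diagnostic)
```

Passing `overrides={"missing-title": "off"}` silences that rule entirely, which mirrors the idea of per-rule severity overrides without claiming anything about the actual CLI flags.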

Synthesis is a saved query plus prompt, with actualization tracked by checkpoints. define-synthesis entries store source queries and a # Prompt section; actualize-synthesis entries record checkpoints for later runs. The synthesis service extracts sources and prompt text, finds the latest actualization for a target, and returns the raw changed entries needed for prompt construction (packages/thalo/src/services/synthesis.ts, packages/thalo/src/commands/actualize.ts). Git tracking compares current entries against a stored commit marker, guards against uncommitted source files unless forced, handles missing markers by returning all matches, and can honor .git-blame-ignore-revs to suppress formatting-only changes (packages/thalo/src/services/change-tracker/git-tracker.ts). thalo actualize prints prompts, changed entries, and instructions; it does not call an LLM or write the synthesis itself (apps/thalo-cli/src/commands/actualize.ts). The GitHub Action adds an automation shell around that abstraction: it passes synthesis JSON to a user command, commits resulting file changes, and opens or updates a PR (packages/thalo-action/src/action.ts, packages/thalo-action/action.yml).
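The checkpoint logic can be sketched as follows. This is an illustration of the behaviour described above, not the git tracker's code: `select_changed`, `UncommittedSourcesError`, the entry dict shape, and the linear `history` list are hypothetical simplifications. Treating a checkpoint that is absent from history the same as a missing marker is an assumption of this sketch.

```python
class UncommittedSourcesError(Exception):
    """Raised when matched source files have uncommitted changes."""


def select_changed(matches, checkpoint, history, dirty_files=(), force=False):
    """Pick the entries a synthesis prompt needs since the last checkpoint.

    matches:     entries selected by the synthesis queries, each a dict with
                 "id", "file", and "last_commit"
    checkpoint:  commit id stored by the previous actualization, or None
    history:     commit ids, oldest first (a linear history is assumed)
    dirty_files: files with uncommitted changes
    """
    dirty = sorted({m["file"] for m in matches} & set(dirty_files))
    if dirty and not force:
        raise UncommittedSourcesError(f"uncommitted source files: {dirty}")
    # Missing marker: everything counts as changed.
    # Assumption: an unknown checkpoint is handled the same way.
    if checkpoint is None or checkpoint not in history:
        return list(matches)
    cutoff = history.index(checkpoint)
    return [m for m in matches if history.index(m["last_commit"]) > cutoff]


history = ["c1", "c2", "c3"]
matches = [
    {"id": "a", "file": "x.thalo", "last_commit": "c1"},
    {"id": "b", "file": "y.thalo", "last_commit": "c3"},
]
print([m["id"] for m in select_changed(matches, "c2", history)])
print([m["id"] for m in select_changed(matches, None, history)])
```

The function only selects inputs. As in the reviewed command, calling a model and writing the synthesis would be someone else's job.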

Tooling is a whole language ecosystem. The CLI registers init, check, format, actualize, query, rules, lsp, and merge-driver commands (apps/thalo-cli/src/mod.ts). The LSP loads all .thalo and .md files from workspace folders and serves diagnostics, definitions, references, hover, completions, semantic tokens, and file-operation updates (packages/thalo-lsp/src/server.ts, packages/thalo-lsp/src/capabilities.ts). The VSCode extension starts the language server through thalo lsp and delegates formatting to thalo format --stdin (packages/thalo-vscode/src/mod.ts). The Prettier parser falls back from native Tree-sitter to WASM and preserves source unchanged when parse errors are present, while the printer normalizes entries, schema blocks, metadata, comments, headings, and prose wrapping (packages/thalo-prettier/src/parser.ts, packages/thalo-prettier/src/printer.ts). The VFS loader makes the same workspace model usable over Node, in-memory, browser, or custom filesystems, with default .thalo and .md discovery and hidden-directory/node_modules ignores (packages/thalo/src/vfs/loader.ts).

Comparison with Our System

| Dimension | Thalo | Commonplace |
| --- | --- | --- |
| Canonical substrate | Plain .thalo files and Markdown fences, usually in git | Markdown files in typed KB collections, also git-first |
| Structure mechanism | Custom Tree-sitter language with entity schemas and typed metadata | YAML frontmatter, type specs, collection conventions, validators, and skills |
| Validation authority | Parser, schema registry, checker rules, LSP diagnostics, CI/GitHub outputs | Type schemas, collection rules, validation scripts, review bundles, generated indexes |
| Derived indexes | In-memory semantic models, link indexes, schema registry, query results | Directory indexes, curated indexes, reports, review artifacts, lexical search surfaces |
| Synthesis model | Saved query + prompt + checkpoint; CLI/action hands changed entries to an external generator | Mostly manual/skill-mediated synthesis into notes, reviews, sources, ADRs, and indexes |
| Trace-derived learning | Not implemented as trace mining | Explicit review axis; trace-derived status requires source traces and durable distilled artifacts |

Thalo is the stronger compiler-formalization of the same filesystem-first instinct. It makes syntax, field types, required sections, links, and synthesis definitions machine-readable from the beginning. Commonplace keeps more of that structure in Markdown conventions, type-spec docs, Python validators, and agent instructions. That makes commonplace easier to evolve by writing ordinary notes, while Thalo gives editors and agents a tighter contract earlier.

The artifact authority split is unusually clear in Thalo. Source .thalo instance entries are knowledge artifacts when queried, read, or passed into a synthesis prompt. Entity schemas, grammar, checker rules, LSP completions, formatter behavior, and merge rules are system-definition artifacts because tooling consumes them to validate, configure, route, or transform behavior. define-synthesis entries are also system-definition artifacts when actualize or the GitHub Action uses them to select source entries and instruct a generator. The generated synthesis content, once committed, is a derived knowledge artifact whose lineage should point back to the source queries, prompt, included entries, and checkpoint.

The main divergence is lifecycle breadth. Thalo validates local structure well, but it does not currently model source status, review status, semantic freshness, replacement archives, link label semantics, or promotion from raw sources into more authoritative artifacts. Commonplace is heavier in exactly those governance paths. Thalo's actualize is closer to a reusable map-reduce prompt harness than to a complete knowledge lifecycle: it can select changed source entries and package a prompt, but the LLM call, write-back policy, review policy, and invalidation semantics live outside the core command.

Trace-derived status should not be assigned. The reviewed code supports agents and humans writing entries, validates those entries, tracks changed source files through git or timestamps, and can automate synthesis updates. It does not mine assistant conversations, tool traces, session logs, rollouts, or action trajectories into durable notes, rules, schemas, weights, rankings, or policies. If a user manually converts a conversation into .thalo entries, those entries are ordinary source knowledge artifacts; the repository does not implement the trace-to-artifact extraction loop.

Read-back: pull — agents deliberately query, validate, navigate, or actualize .thalo source entries through the CLI, LSP, or action.

Borrowable Ideas

Saved query + prompt + checkpoint as a first-class artifact. A commonplace analogue could define repeatable syntheses over reviews, source notes, or working sets, then generate prompts only for changed inputs. This is promising, but should wait for a concrete recurring synthesis workflow so we do not add a DSL-shaped feature prematurely.

Rule metadata for incremental validation. Thalo's rule objects declare category, default severity, scope, and dependencies. Commonplace validation could borrow that if review and fix gates need faster targeted reruns or clearer warning taxonomy.
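As a rough picture of what such metadata buys, the sketch below selects which rules to rerun after an edit. The `RuleMeta` class, the scope values, the dependency names, and the rule codes are all hypothetical; the reviewed note only establishes that Thalo rules declare category, default severity, scope, and dependencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleMeta:
    code: str
    scope: str             # e.g. "entry" or "workspace" (assumed values)
    depends_on: frozenset  # names of the indexes the rule reads (assumed)


RULE_METADATA = [
    RuleMeta("unknown-entity", "entry", frozenset({"schemas"})),
    RuleMeta("unresolved-link", "workspace", frozenset({"links"})),
    RuleMeta("empty-section", "entry", frozenset({"content"})),
]


def rules_to_rerun(rules, changed):
    """Return codes of rules whose declared dependencies overlap the change set."""
    return [rule.code for rule in rules if rule.depends_on & changed]


print(rules_to_rerun(RULE_METADATA, {"schemas"}))
print(rules_to_rerun(RULE_METADATA, {"links", "content"}))
```

The point of the design is that the checker never needs to inspect rule bodies: a declared dependency set is enough to decide what an edit invalidates.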

Editor tooling from the semantic model. The LSP demonstrates the payoff of a real semantic layer: definitions, references, tags, completions, diagnostics, and semantic tokens share the same workspace model. Commonplace should not copy the language, but a future link/type index could feed editor affordances in the same way.

Formatter-aware lineage. The git change tracker's support for .git-blame-ignore-revs is a small but strong idea: generated or formatting-only churn should not force downstream synthesis. Commonplace review and index workflows could use the same principle when deciding whether a source artifact meaningfully changed.
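The principle is easy to sketch. The parser below follows the documented format of a git ignore-revs file (one full commit hash per line, with `#` comments and blank lines ignored); `parse_ignore_revs` and `meaningfully_changed` are hypothetical helper names, and the placeholder hashes are not real commits. How a tool obtains the list of commits touching an entry is left out.

```python
def parse_ignore_revs(text):
    """Parse the body of a .git-blame-ignore-revs file into a set of hashes."""
    revs = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if line:
            revs.add(line)
    return revs


def meaningfully_changed(commits_touching_entry, ignored_revs):
    """True if at least one commit that touched the entry is not ignored."""
    return any(commit not in ignored_revs for commit in commits_touching_entry)


FORMAT_PASS = "a" * 40    # placeholder hash for a formatting-only commit
REAL_EDIT = "c" * 40      # placeholder hash for a content edit

ignore_file = "# formatting pass\n" + FORMAT_PASS + "\n\n"
ignored = parse_ignore_revs(ignore_file)

print(meaningfully_changed([FORMAT_PASS], ignored))
print(meaningfully_changed([FORMAT_PASS, REAL_EDIT], ignored))
```

An entry touched only by ignored commits is treated as unchanged, so a repository-wide reformat would not force every downstream synthesis or review to rerun.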

Plain-file schemas as local system-definition artifacts. Thalo's define-entity and alter-entity entries make schemas inspectable and versioned beside knowledge. Commonplace already has type specs, but Thalo is a useful reminder that type evolution itself can be written as a knowledge artifact with validation authority.

Merge-driver support for structured knowledge. Thalo's merge driver parses base/ours/theirs, matches entries, detects conflicts, and builds merged content (packages/thalo/src/merge/driver.ts). If commonplace gains more machine-structured note bodies, semantic merge may matter more than line-based conflict handling.
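For intuition, here is a generic entry-level three-way merge. It is a sketch of the general technique, not Thalo's driver: `merge_entries` is a hypothetical function, entries are reduced to an id-to-text map, and keeping our side as the placeholder for a conflicting entry is an arbitrary choice for this example.

```python
def merge_entries(base, ours, theirs):
    """Three-way merge of {entry_id: text} maps.

    Returns (merged, conflicts). A missing key means the entry does not
    exist on that side, so deletions are handled like any other change.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            result = o            # both sides agree, including both deleting
        elif o == b:
            result = t            # only their side changed
        elif t == b:
            result = o            # only our side changed
        else:
            conflicts.append(key)  # both sides changed differently
            result = o             # placeholder: keep ours (arbitrary choice)
        if result is not None:
            merged[key] = result
    return merged, conflicts


base = {"e1": "A", "e2": "B", "e3": "C"}
ours = {"e1": "A2", "e2": "B", "e3": "C-ours", "e4": "new"}
theirs = {"e1": "A", "e3": "C-theirs"}   # their side deleted e2

merged, conflicts = merge_entries(base, ours, theirs)
print(merged)
print(conflicts)
```

Matching by entry identity rather than by line position is what lets two branches append or edit different entries in the same file without a textual conflict.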

Curiosity Pass

The name "actualize" promises more than the core command delivers. In the implementation, actualization only packages prompt input and instructions; generation and write-back happen through a human, a script, or the separate GitHub Action. That is a reasonable boundary, but reviews should not describe core Thalo as an autonomous synthesis writer.

The source checkout contains a Thalo skill whose quick-start claims outrun thalo init. The skill says initialization creates entities.thalo, AGENTS.md, and personal-bio.md, but the CLI initializer at this commit creates entities.thalo and references.thalo only (skills/thalo/SKILL.md, apps/thalo-cli/src/commands/init.ts). Treat the code as authoritative.

The docs and language are still settling. The public examples and thalo-test/ fixtures show a coherent model of opinions, lore, journals, references, and syntheses (apps/docs/content/docs/examples.mdx, thalo-test/entities.thalo, thalo-test/syntheses.thalo). Some wording differs across docs, skill text, and implementation details, so code-grounded reviews should cite exact source files rather than relying on site copy alone.

The system is strong on local correctness, weaker on epistemic governance. It can say an opinion has required fields and sections, that a link resolves, or that a synthesis has sources and a prompt. It does not by itself know whether a claim is well supported, whether a reference was actually processed, whether a synthesis is faithful, or whether an old entry has been superseded, except where the user models those concerns in custom schemas.

The storage substrate is intentionally boring. There is no hidden database, vector store, embedding cache, daemon memory, or model checkpoint in the core knowledge path. That makes Thalo easy to inspect and adopt, but also means retrieval is query/filter/navigation-oriented rather than semantic-ranking-oriented.

What to Watch

  • Whether Thalo adds automatic trace ingestion from assistant sessions, IDE activity, tool logs, or GitHub Actions runs; that would change the trace-derived classification.
  • Whether syntheses gain a stronger generated-output contract: source lineage, prompt versioning, reviewer status, invalidation rules, and required checkpoint updates.
  • Whether entity schemas accumulate governance fields such as source status, supersession, confidence, review state, provenance, or freshness.
  • Whether the GitHub Action becomes the primary synthesis workflow, and whether its force-push/branch-management behavior is acceptable for real KB maintenance.
  • Whether the custom language remains small enough that agents can reliably author it, or whether schema/type complexity starts requiring editor-only workflows.
  • Whether semantic merge, formatter, and LSP behavior stabilize across Markdown-embedded and pure .thalo use cases.

Relevant Notes:

  • Thalo entity types compared to commonplace document types - compares-with: maps Thalo's entity schemas onto commonplace note/type shapes.
  • Storage substrate - defined-in: Thalo's retained state primarily persists as plain files in git, with semantic indexes derived in memory.
  • Knowledge artifact - defined-in: source entries and committed synthesis output are mostly consumed as evidence, reference, context, or advice.
  • System-definition artifact - defined-in: grammar, schemas, checker rules, syntheses, formatter behavior, and LSP/tool configuration have validation, routing, instruction, or transformation force.
  • Behavioral authority - defined-in: Thalo is a good case where the same file can carry knowledge authority for readers and system-definition authority for validators or generators.