nao
Type: ../types/agent-memory-system-review.md · Status: current · Tags: trace-derived
nao, from nao Labs, is an open-source analytics-agent framework. The inspected repository combines a Python nao-core CLI for creating and synchronizing analytics context projects with a TypeScript/Bun chat backend and React frontend for asking questions over that context. Its memory system is not one thing: project context lives as files, generated database documentation, rules, skills, and templates; chat state and extracted user memories live in application databases; runtime activation happens through system prompts, mentions, tools, MCP endpoints, and story artifacts.
Repository: https://github.com/getnao/nao
Reviewed commit: d0384a0c069121e6da87bb07e26c4795e7559098
Last checked: 2026-05-16
Core Ideas
A nao project is a file-system context bundle. nao init creates databases/, queries/, docs/, semantics/, repos/, agent/tools/, agent/mcps/, agent/skills/, tests/, RULES.md, and .naoignore, then saves nao_config.yaml (cli/nao_core/commands/init.py). That folder is the project-level storage substrate for raw and compiled context: database metadata, synchronized repos, docs, user-authored rules, skills, tests, and rendered templates.
Sync turns external sources into local context files. nao sync selects providers, runs each provider into its default output directory, then renders project .j2 templates with a nao Jinja context object (cli/nao_core/commands/sync/init.py, cli/nao_core/templates/render.py). Database sync writes markdown files under databases/type=.../database=.../schema=.../table=.../, optionally using query history to generate how_to_use context and profiling refresh policy to decide when expensive profiling should be recomputed (cli/nao_core/commands/sync/providers/databases/provider.py). Repo sync clones or copies configured repositories into repos/ and strips .git/, making source files ordinary context rather than nested checkout state (cli/nao_core/commands/sync/providers/repositories/provider.py).
Runtime activation is prompt assembly plus tools, not background retrieval over one memory store. The backend builds model messages by adding story mode, skill content, citation context, database mention context, the last compaction summary, resolved images, user memories, RULES.md, current database connections, available skills, and provider-specific wrappers before converting UI messages to model messages (apps/backend/src/services/agent.ts, apps/backend/src/components/ai/system-prompt.tsx). Project files are activated through grep, list, read, search, execute_sql, read_query_result, chart, story, clarification, MCP, and optional Python/sandbox tools (apps/backend/src/agents/tools/index.ts).
Rules, skills, and mentions are stronger behavior-shaping surfaces than generic context. RULES.md is read directly from the project folder and inserted into the system prompt as "User Rules"; database @ mentions append table columns.md content to the last user message; / skill mentions replace the last user message text with the selected skill file content (apps/backend/src/agents/user-rules.ts, apps/backend/src/services/agent.ts). Skills are markdown files in agent/skills/, parsed for frontmatter name and description, watched for changes, listed through tRPC, and loaded only when mentioned (apps/backend/src/services/skill.ts, apps/backend/src/trpc/skill.routes.ts).
Chat, stories, compaction, and query results are application-state artifacts. Chats, messages, message parts, feedback, stories, story versions, story data cache, memories, and LLM inference rows are modeled in both SQLite and Postgres schemas (apps/backend/src/db/sqlite-schema.ts). Story tool calls create or version durable story documents tied to a chat; before a later model call, only the latest tool occurrence carries full story code, with earlier occurrences marked stale (apps/backend/src/agents/tools/story.ts, apps/backend/src/services/agent.ts). Compaction exists as a service that summarizes older messages into a <conversation-summary> marker and records LLM inference usage, but the call from prepareStep is commented out at this commit; only previously stored compaction parts are rehydrated (apps/backend/src/services/compaction.ts, apps/backend/src/agents/compaction/compaction-llm.ts).
User memory extraction is implemented as a background chat-derived loop. After sending a request to the agent, AgentManager schedules memory extraction over recent UI messages for the chat's user and project (apps/backend/src/services/agent.ts). The extractor passes recent user/assistant messages plus existing memories to an LLM, asks for user_instructions and user_profile, persists results as global_rule or personal_fact, and can supersede older memories by ID (apps/backend/src/agents/memory/memory-extractor-llm.ts, apps/backend/src/types/memory.ts, apps/backend/src/services/memory.ts, apps/backend/src/queries/memory.ts).
Evaluation is analytics-specific and answer-checking oriented. CLI tests are YAML cases with a natural-language prompt and expected SQL; the runner calls the backend test endpoint, runs the agent, executes the expected SQL, asks an LLM to extract structured answer rows from the agent response, and compares dataframes with normalization and numeric tolerance (cli/nao_core/commands/test/case.py, cli/nao_core/commands/test/runner.py, apps/backend/src/services/test-agent.service.ts, apps/backend/src/routes/test.ts). This evaluates downstream behavior of a configured project, not the intrinsic quality of individual context files or memories.
Comparison with Our System
| Dimension | nao | Commonplace |
|---|---|---|
| Primary purpose | Deployable analytics agent over a project context folder and app database | Agent-operated methodology KB with durable notes, instructions, reviews, validation, and indexes |
| Storage substrate | Project filesystem, generated markdown, cloned/copied repos, SQLite/Postgres app rows, Docker/git context source | Git-tracked markdown, schemas, source snapshots, generated indexes, review outputs, scripts |
| Raw sources | Warehouses, repos, Notion pages, project docs, chat transcripts, query results, stories | Source snapshots, authored notes, instructions, review artifacts, git history |
| Compiled project files | Database docs, profiling files, query-history-derived how_to_use, rendered .j2 outputs, copied repos |
Indexes, type-guided notes, source-derived reviews, validation reports |
| Runtime/index/tool surfaces | Prompt assembly, @ mentions, / skills, file tools, SQL tools, story tool, MCP tools, frontend settings |
rg, directory indexes, authored links, skills, validation commands, review workflows |
| Behavior-shaping artifacts | RULES.md, skill markdown, system prompt sections, memories, agent settings, tests, tools |
Notes, instructions, skills, schemas, ADRs, validation, review gates |
| Lineage | Sync procedure and directory conventions; memory rows retain chatId but not source message IDs |
Source-pinned citations, statuses, archive/replacement history, validation and review gates |
| Activation | Automatic prompt injection plus explicit mentions and tools before each agent call | Agent navigation and skill loading through repo-native conventions and commands |
nao is closer to commonplace than many analytics agents because it treats context as inspectable files. Database schema context, synced repos, rules, skills, templates, tests, and stories are ordinary artifacts around a project, so the agent can read and search them with simple tools. That resembles commonplace's bias toward repo-visible retained artifacts.
The divergence is authority and lifecycle. In commonplace, a note, instruction, skill, type spec, index, and review have different artifact contracts and validation expectations. In nao, project context files and app rows are mostly separated by path, schema, and prompt location. RULES.md, selected skill files, extracted global rules, and the system prompt are system-definition artifacts because they instruct the agent. Database docs, synced repos, stories, query outputs, personal facts, and prior chat messages are knowledge artifacts when consumed as evidence or context. But the system does not require source citations, review state, confidence, or invalidation metadata before a memory or generated doc becomes prompt-visible.
nao's strongest design move is activation. It does not merely store analytics context; it routes it into the next model call through explicit prompt construction, mentions, tools, and UI/MCP workflows. Commonplace is stronger where retained artifacts need long-term governance, semantic review, and fine-grained lineage.
Read-back: both — agents pull project context through tools and mentions, while prompt assembly injects rules, memories, skills, and selected context.
Borrowable Ideas
Treat analytics context as a project folder first. Ready to borrow as a deployment pattern. nao's databases/type=.../database=.../schema=.../table=... convention gives agents a cheap lexical navigation layer before they query live data.
Use mentions as precise activation handles. Ready to borrow for agent-facing KB interfaces. @ table mentions and / skill mentions turn user intent into specific context injection, avoiding an all-context prompt.
Separate raw sources, compiled context, runtime tools, and database memories. Ready as vocabulary. nao works best when its surfaces are not collapsed into one "memory" bucket: warehouse metadata, generated markdown, RULES.md, skill files, chat rows, memory rows, and story versions all have different authority.
Evaluate the assembled agent, not only stored context. Useful with limits. nao's test runner measures whether the configured agent can answer questions correctly against expected SQL results. Commonplace could use analogous task-level evals, but should keep artifact-level validation for notes, instructions, and reviews.
Do not borrow automatic user-rule promotion without stronger review. The implemented extractor is careful about permanence signals and supersession, but a generated global_rule becomes prompt-visible system-definition content. That is acceptable for personal assistant preferences; it is too weak for durable methodology instructions unless paired with source-message lineage, review, and authority controls.
Trace-derived learning placement
Trace source. nao qualifies as trace-derived learning. The direct trace source is recent user/assistant chat messages in the UI or MCP-created chat flow; the broader application also retains chat messages, message parts, tool calls, feedback, query results, story versions, and LLM inference rows in app storage (apps/backend/src/db/sqlite-schema.ts, apps/backend/src/mcp/tools/agent.ts).
Extraction. The implemented extraction loop runs after an agent request is sent. It uses an annotation model over the last 17 user/assistant messages, truncates message text, appends existing memories in XML-like tags, and asks for new user_instructions and user_profile objects. The oracle is the extractor LLM constrained by a structured schema and prompt rules that default to no extraction, require permanence signals for instructions, and allow supersession by existing memory ID (apps/backend/src/agents/memory/memory-extractor-llm.ts, apps/backend/src/components/ai/memory-system-prompt.tsx, apps/backend/src/components/ai/memory-user-prompt.tsx).
Storage substrate. Raw chat traces persist in chat, message, and message-part tables. Distilled user memories persist in the memories table with userId, content, category, timestamps, chatId, and supersededBy fields; SQLite and Postgres schemas are both present. LLM extraction usage persists as llm_inference rows, while user-facing memory settings and edit/delete controls live in backend routes and frontend settings UI (apps/backend/src/db/sqlite-schema.ts, apps/backend/src/trpc/memory.routes.ts, apps/frontend/src/components/settings/memories.tsx).
Representational form. Raw traces are mixed: prose messages, structured message parts, tool calls, tool outputs, charts, stories, SQL results, and feedback. Distilled memories are prose strings with symbolic categories (global_rule, personal_fact), supersession links, and database fields. Runtime selection is symbolic and token-budgeted: active memories are ordered by category priority and included until the memory token limit is reached (apps/backend/src/components/ai/system-prompt.tsx).
Lineage. Memory rows carry chatId, timestamps, category, and supersession state, so they retain coarse source-session lineage and replacement state. They do not carry source message IDs, extractor prompt/model version, confidence, direct source excerpts, or a regeneration policy. The system records inference usage separately, but that does not make each memory auditable back to the exact trace span that justified it.
Behavioral authority. Personal facts are knowledge artifacts when injected as "User Profile": they inform later answers. global_rule memories are system-definition artifacts because the system prompt presents them under "Global User Rules" and tells the agent they were established in previous conversations. RULES.md, skill content selected by /, database context selected by @, and agent settings also act as system-definition artifacts through prompt assembly. Raw chat rows, stories, query results, and synced context files are knowledge artifacts until a tool or prompt path gives them instruction, routing, evaluation, or execution force.
Scope. Memory scope is user-level with project-level enablement. safeGetUserMemories filters active memories for a user and can exclude the current chat, while project settings can disable memory. Project files, rules, synced databases, and skills are project-scoped rather than user-scoped (apps/backend/src/queries/memory.ts).
Timing. Extraction is online and backgrounded after an agent request. Activation is online before each model call. Project context sync is an explicit CLI operation, while deployment can mount a local project folder or use git-backed context refresh in Docker configuration (Dockerfile, docker-compose.yml).
Survey placement. On the trace-derived learning survey, nao is an online chat-to-user-memory system embedded in an analytics-agent product. It strengthens the survey split between knowledge artifacts and system-definition artifacts: the same chat trace can produce personal facts that advise later answers and global rules that instruct later behavior. It also splits "file context" from "trace-derived memory": most project context is synchronized or authored, while user memory is distilled from conversational traces.
Curiosity Pass
The most commonplace-like part is not the memory table. It is the project folder: generated table docs, copied repos, skills, rules, tests, and templates are inspectable retained artifacts. The memory table is the personalized layer on top.
The system has several activation paths with different authority. A table @ mention adds evidence to the user message; a / skill replaces the last user message with skill content; RULES.md and memory rows enter the system prompt; SQL tools execute against live data. These are not equivalent context loads.
Compaction should be described cautiously. The service and prompt are implemented, and stored compaction parts are reused, but automatic compaction before generation is commented out in prepareStep at the reviewed commit. It is a present mechanism with an inactive trigger, not a fully active memory lifecycle.
Tests evaluate behavior but weakly govern artifacts. A passing nao test result says the assembled agent produced the expected answer for examples. It does not validate whether a generated database doc, user memory, or rule is true, current, or properly sourced.
The user-memory loop is more governed than a naive summarizer, but less governed than a KB. It has opt-outs, categories, supersession, editing, deletion, and conservative extraction prompts. It still lacks exact source trace lineage and review before instruction-channel activation.
What to Watch
- Whether memory rows gain source message IDs, confidence, extractor version, or explicit audit trails.
- Whether automatic compaction is re-enabled and how compaction parts are persisted, surfaced, and invalidated.
- Whether project sync records source revisions for repos, database snapshot times, template versions, and query-history derivations.
- Whether skills move from flat markdown files toward versioned, validated, or permissioned system-definition artifacts.
- Whether test results become durable project artifacts that can track regressions across context revisions.
- Whether Docker git-context refresh becomes a first-class context lifecycle with conflict, review, and rollback semantics.
Bottom Line
nao is best read as an analytics-agent product with a file-system context builder and an application-state memory layer. Its project folder shows a practical route for agent-readable analytics context, and its chat loop implements real trace-derived user memory extraction. Commonplace should borrow the file-first project context, explicit mention activation, and task-level eval pattern, while keeping stronger distinctions between raw evidence, compiled context, database memory rows, runtime tools, and authority-bearing instructions.
Relevant Notes:
- Trace-derived learning techniques in related systems - extends: nao distills chat traces into user profile facts and global user rules while separately synchronizing project context from external sources.
- Axes of artifact analysis - exemplifies: nao's context files, memory rows, prompt sections, tools, tests, and stories need separate substrate, form, lineage, and authority labels.
- Knowledge artifact - distinguishes: database docs, synced repos, stories, personal facts, and query results advise or evidence later answers.
- System-definition artifact - distinguishes:
RULES.md, selected skill files, global-rule memories, tools, and prompt sections instruct, route, execute, or evaluate behavior. - Activate behavior-changing memory - exemplifies: nao uses prompt assembly, mentions, and tools to activate retained context before action.