nao

Type: ../types/agent-memory-system-review.md · Status: current · Tags: related-systems, trace-derived

nao is an open-source analytics-agent framework from nao Labs. It is not primarily a general memory substrate; it is a deployed data-analysis agent whose context system turns databases, repositories, docs, rules, skills, and user preferences into runtime context for a chat UI. The repo's strongest memory-relevant design is the split between a file-shaped project context for analytical work and a database-backed user memory subsystem that extracts durable facts and instructions from chat traces.

Repository: https://github.com/getnao/nao · Reviewed commit: https://github.com/getnao/nao/commit/a03767cfb779144f55ee540d99266422553034e9

Core Ideas

The primary analytical context is a project folder, not a memory database. The README frames nao as a way to create an analytics-agent context with nao-core and then deploy a UI for business users to chat with that agent (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/README.md#L30-L34). The CLI initializes a folder with nao_config.yaml, .naoignore, RULES.md, databases/, queries/, docs/, semantics/, repos/, agent/tools, agent/mcps, agent/skills, and tests/ (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/cli/nao_core/commands/init.py#L76-L96). That makes the file tree the main authored context surface, even though the deployed application stores conversations and memories elsewhere.

nao sync is a context compiler over external sources. The sync command resolves active providers, runs each provider, then renders user Jinja templates after provider sync (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/cli/nao_core/commands/sync/init.py#L59-L153). The database provider connects to configured warehouses, lists schemas and tables, and writes selected template outputs such as table docs under databases/type=.../database=.../schema=.../table=.../ (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/cli/nao_core/commands/sync/providers/databases/provider.py#L103-L241). Repository sync clones or pulls configured repos, and local-path repos can be copied through include/exclude filters (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/cli/nao_core/commands/sync/providers/repositories/provider.py#L20-L182). Notion sync exports configured pages to markdown under docs/notion (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/cli/nao_core/commands/sync/providers/notion/provider.py#L86-L209). The mechanism is closer to a build step for agent context than to a passive document dump.

Runtime activation combines file tools with explicit mentions. The system prompt tells the agent that context is stored as files in the project folder, that database content is available as files to avoid unnecessary direct database querying, and that tables and skills can be mentioned via @ and / triggers (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/components/ai/system-prompt.tsx#L28-L65). At runtime, _buildModelMessages() applies story refresh, skill injection, database-context injection, the last compaction summary if one exists, image resolution, active user memories, RULES.md, connection metadata, and skill metadata before rendering the system prompt (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/services/agent.ts#L317-L360). Skills are loaded from agent/skills/*.md by parsing frontmatter and exposing name, description, and location; full skill content is read when a / mention resolves to that skill, and table column context is injected when @ mentions resolve to tables (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/services/skill.ts#L30-L96, https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/services/agent.ts#L549-L580). File access is bounded by project-folder path checks and .naoignore, and search rejects absolute paths or .. patterns (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/utils/tools.ts#L165-L218, https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/agents/tools/search.ts#L10-L61).
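The mention mechanics reduce to a small resolution step layered on top of the file tools. A rough sketch, assuming simplified `@`/`/` token syntax and in-memory lookups (the real resolver works against synced table docs and skill frontmatter):

```python
import re

def resolve_mentions(message: str, tables: dict[str, str], skills: dict[str, str]) -> list[str]:
    """Force-load context for explicit @table and /skill mentions.

    tables maps table name -> column-doc string; skills maps skill
    name -> full markdown body. Everything unmentioned stays reachable
    only through the bounded file tools.
    """
    blocks = []
    for name in re.findall(r"@(\w+)", message):
        if name in tables:
            blocks.append(f"<table name={name}>\n{tables[name]}\n</table>")
    for name in re.findall(r"/(\w+)", message):
        if name in skills:
            blocks.append(f"<skill name={name}>\n{skills[name]}\n</skill>")
    return blocks
```

The naive regexes here would also match slashes inside URLs; they stand in for the UI's structured mention triggers, which do not have that ambiguity.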

Persistent user memory is a separate database subsystem. nao's memory service fetches active memories for prompt injection only when user/project memory is enabled, then returns content and category pairs rather than raw memory rows (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/services/memory.ts#L21-L45). The prompt caps visible memories at 1000 estimated tokens and groups them into global_rule and personal_fact sections (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/components/ai/system-prompt.tsx#L22-L25, https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/components/ai/system-prompt.tsx#L184-L238). The SQLite schema stores memories as rows keyed to users and optional chats, with category and superseded_by fields (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/db/sqlite-schema.ts#L544-L570). This is a deliberate mixed substrate: files for project context; database rows for user personalization and application state.
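The cap-and-group behavior can be sketched as follows. The chars/4 token estimate is an assumption (a common heuristic, not necessarily nao's estimator); the 1000-token cap and the two category names are from the repo.

```python
def render_memories(memories: list[tuple[str, str]], cap_tokens: int = 1000) -> dict:
    """Group active memories into prompt sections under a rough token cap.

    memories: (content, category) pairs, assumed already in priority
    order. Token cost is estimated as len(content) // 4.
    """
    sections = {"global_rule": [], "personal_fact": []}
    used = 0
    for content, category in memories:
        cost = max(1, len(content) // 4)
        if used + cost > cap_tokens:
            break  # stop injecting once the budget is spent
        sections[category].append(content)
        used += cost
    return sections
```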

The trace-derived memory loop is narrow and conservative. The extractor receives existing memories plus the last 17 user/assistant messages, truncates each text message, and asks an LLM for structured user_instructions and user_profile output (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/agents/memory/memory-extractor-llm.ts#L19-L82, https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/types/memory.ts#L7-L61). The extraction prompt explicitly defaults to no memory change, requires strong permanence signals for instructions, treats identity/background facts as profile memory, and uses supersedes_id for replacement (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/components/ai/memory-system-prompt.tsx#L3-L130). Persistence maps instructions to global_rule, profile facts to personal_fact, filters supersession references against existing IDs, inserts new rows, and marks old rows as superseded in a transaction (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/services/memory.ts#L58-L132, https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/queries/memory.ts#L52-L103). The loop is real, but it is intentionally not a broad "learn everything from conversation" system.
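The persistence step is essentially insert-then-mark-superseded, with dangling supersession references filtered out. A minimal in-memory sketch (dict as a stand-in for the SQLite table and its transaction; function and field names are illustrative, though `category` and `superseded_by` mirror the schema):

```python
def persist_extraction(store: dict, extracted: list[dict], existing_ids: set) -> None:
    """Insert new memory rows and mark replaced rows as superseded.

    store: id -> row dict; extracted: items with 'content', 'category'
    (global_rule | personal_fact), and optional 'supersedes_id'.
    References to ids not in existing_ids are dropped, mirroring nao's
    filtering before the transactional write.
    """
    for item in extracted:
        new_id = max(store, default=0) + 1
        store[new_id] = {
            "content": item["content"],
            "category": item["category"],
            "superseded_by": None,
        }
        old = item.get("supersedes_id")
        if old in existing_ids and old in store:
            store[old]["superseded_by"] = new_id
```

Keeping superseded rows instead of deleting them preserves a lineage of how a preference evolved, at the cost of a filter (`superseded_by IS NULL`) on every active-memory read.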

Evaluation is unusually concrete for a context-engineering app. nao supports YAML tests with a prompt and expected SQL, runs them through the deployed backend, saves results under tests/outputs/, and exposes a results server (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/README.md#L114-L127, https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/cli/README.md#L151-L179). The CLI runner compares extracted final-answer dataframes against expected SQL output with normalization, numeric rounding, and row-order-insensitive comparison (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/cli/nao_core/commands/test/runner.py#L69-L171). The backend test route runs the agent without persisting a normal chat, executes the expected SQL through the same SQL tool path, and asks an LLM to extract a structured answer for comparison (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/routes/test.ts#L14-L67, https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/services/test-agent.service.ts#L17-L123). This does not directly test memory extraction, but it gives the context system a hard-ish behavioral regression surface.
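The comparison logic the runner needs is worth pinning down, since it is what makes the tests "hard-ish" rather than brittle. A stdlib sketch under stated assumptions (rows as plain tuples; the real runner works on extracted dataframes and its normalization rules may differ):

```python
def results_match(actual_rows, expected_rows, ndigits: int = 4) -> bool:
    """Compare query results ignoring row order, with numeric rounding.

    Strings are stripped and case-normalized as a stand-in for fuller
    normalization; floats are rounded to ndigits before comparison.
    """
    def norm_cell(v):
        if isinstance(v, float):
            return round(v, ndigits)
        if isinstance(v, str):
            return v.strip().lower()
        return v

    def norm(rows):
        # Sort by repr so rows with mixed cell types still order stably.
        return sorted((tuple(norm_cell(c) for c in row) for row in rows), key=repr)

    return norm(actual_rows) == norm(expected_rows)
```

Row-order insensitivity matters because semantically identical SQL frequently differs only in implicit ordering; without it, every test would need an ORDER BY.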

Comparison with Our System

| Dimension | nao | Commonplace |
| --- | --- | --- |
| Primary purpose | Deployed analytics agent over business data | Agent-operated KB methodology and review corpus |
| Main authored context | Project folder generated and maintained by nao-core | Markdown KB notes, instructions, sources, ADRs, and indexes |
| Memory substrate | Mixed: project files plus chat/memory/story/feedback rows in SQLite/Postgres | Mostly files in git, with narrow operational SQLite for review state |
| Activation | File tools, SQL tool, @ table mentions, / skill mentions, system prompt blocks | Search, descriptions, indexes, backlinks, and procedure-specific loading |
| Learning loop | LLM extraction of durable user instructions/profile facts from recent chat text | Human/agent curation, semantic review, validation, and limited workflow-specific trace learning |
| Evaluation | YAML analytics tests with expected SQL and result comparison | Structural validation plus semantic review gates over KB artifacts |
| Distribution | Python CLI plus web app backend/frontend/FastAPI services | Repo methodology, Python commands, and agent skills |

nao is a strong counterexample to any absolutist reading of "files first." It uses files where they give the agent an inspectable, searchable project context, but it keeps user memory, chat transcripts, tool calls, feedback, stories, and inference records in application tables (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/db/sqlite-schema.ts#L234-L342, https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/db/sqlite-schema.ts#L544-L600). That split is defensible because the access patterns differ: a data-team context should be editable and versionable as files, while per-user memory and chat state need product semantics like permissions, supersession, and UI editing.

The biggest alignment with Commonplace is the project context layer. nao sync turns external systems into readable files, then the agent navigates that tree through bounded tools. That is close to our view that agents need a shaped substrate, not just a search endpoint. The difference is that nao's corpus is application-specific and operational: database schema, previews, query history, repos, Notion docs, skills, and RULES.md. Commonplace's corpus is a cumulative conceptual library where title claims, note types, semantic links, and review lifecycle matter more than deployment-time data freshness.

The biggest divergence is the learning target. Commonplace treats durable knowledge changes as authored and reviewable artifacts by default. nao's automatic learning loop is intentionally smaller: extract durable user preferences and profile facts, then inject a token-capped subset into future prompts. It does not mine tool failures into skills, update project docs from chat, or synthesize analytics methodology. That restraint is a design strength for a business-user assistant, but it means nao's trace-derived loop is personalization memory rather than knowledge-base growth.

nao's evaluation layer is stronger than its memory lifecycle. The nao test flow gives data teams a regression surface for whether the agent can answer expected analytical questions. The memory subsystem has a careful extraction prompt and supersession schema, but no equivalent held-out test harness for whether extracted memories are correct, non-invasive, or useful in later turns.

Borrowable Ideas

Context sync as an explicit build step (ready to borrow where source volume warrants it). nao's provider-plus-template sync pipeline is a concrete pattern for turning external systems into agent-readable files before runtime. In Commonplace, this suggests keeping high-volume source ingestion as an explicit compilation step, not mixing live connector calls into every read path.

Mention-driven context injection beside tool-based navigation (ready with use cases). The @ table and / skill paths show a useful hybrid: broad project files remain available through tools, while user mentions can force-load a specific table schema or skill. Commonplace could use the same pattern for workshop artifacts or review targets when the user names an exact artifact.

Conservative user-memory extraction schema (ready as a reference). The split between instructions and profile facts, the permanence-signal rule, and supersession IDs are a useful minimal schema for personalizing an agent without turning every chat into memory. This is borrowable as a pattern, not as a replacement for KB note curation.
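The borrowable schema is small enough to state as types. A hedged sketch: the class and field names below are illustrative Python renderings of the structured output described above, not the repo's TypeScript definitions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UserInstruction:
    """Future-facing rule; extracted only when the user's phrasing
    carries strong permanence signals, per the extractor prompt."""
    content: str
    supersedes_id: Optional[int] = None  # points at the memory it replaces

@dataclass
class UserProfileFact:
    """Identity/background fact; no trigger words required."""
    content: str
    supersedes_id: Optional[int] = None

@dataclass
class ExtractionResult:
    """Both fields None is the default outcome: no memory change."""
    user_instructions: Optional[list[UserInstruction]] = None
    user_profile: Optional[list[UserProfileFact]] = None
```

Making "no change" the zero-value of the schema is what keeps the loop conservative: the extractor has to argue its way into writing anything.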

Evaluation cases that execute through the real backend (ready in spirit). The nao test route exercises the deployed agent path and compares against executable SQL. For Commonplace, the analogue would be workflow-level tests that run the real note-selection or review commands and assert concrete outcomes, rather than only validating file structure.

Git-sourced deployment context (needs caution). The git context provider can clone a shallow context repo into a deployed target and refresh it from a branch, which is useful for containerized deployments (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/cli/nao_core/context/init.py#L11-L47, https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/cli/nao_core/context/git.py#L13-L80). Its refresh path uses git fetch followed by git reset --hard FETCH_HEAD when changes exist (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/cli/nao_core/context/git.py#L118-L165). That is appropriate for an immutable deployment context, but not for an operator workspace with local edits.

Trace-derived learning placement

nao is a service-owned application trace backend on axis 1 of the trace-derived learning survey and a symbolic database-row learner on axis 2.

Trace source. The live chat stream schedules memory extraction immediately after sending the request to the agent, using the current UI messages for that chat (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/services/agent.ts#L229-L277, https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/services/agent.ts#L429-L436). The extractor consumes recent user/assistant text parts, not full tool-result state, story content, or post-response assistant output (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/agents/memory/memory-extractor-llm.ts#L60-L79). The stored chat schema itself is richer, with message parts for text, reasoning, tool input/output, approvals, and provider metadata, but the memory extractor narrows that trace to text (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/db/sqlite-schema.ts#L234-L327).
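That narrowing step (rich message parts in, plain text out) can be sketched as follows. The `parts` shape is a simplified assumption based on the schema described above, and the 2000-character truncation is a placeholder; the repo's exact per-message limit isn't quoted here. The 17-message window is from the source.

```python
def text_only_transcript(messages: list[dict], per_message_chars: int = 2000,
                         last_n: int = 17) -> str:
    """Narrow a rich chat trace to recent user/assistant text.

    messages: dicts with 'role' and 'parts', where each part has a
    'type' and possibly 'text'. Tool input/output, reasoning, and
    provider metadata parts are dropped before the extractor sees it.
    """
    lines = []
    for msg in messages[-last_n:]:
        if msg["role"] not in ("user", "assistant"):
            continue
        text = " ".join(
            p["text"] for p in msg["parts"] if p.get("type") == "text"
        )
        if text:
            lines.append(f"{msg['role']}: {text[:per_message_chars]}")
    return "\n".join(lines)
```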

Extraction. An LLM reads the recent text conversation plus existing memories and returns structured user_instructions and user_profile arrays, or null fields when nothing changed (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/agents/memory/memory-extractor-llm.ts#L52-L82). The oracle is the extractor prompt: permanence signals gate future-facing instructions, profile facts are allowed without trigger words, and uncertainty should produce no extraction (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/components/ai/memory-system-prompt.tsx#L65-L130). There is no separate judge or later outcome check.

Promotion target. Extracted items become database rows in memories, with category values global_rule and personal_fact, optional chat provenance, and supersession pointers (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/types/memory.ts#L7-L23, https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/db/sqlite-schema.ts#L544-L570). Active memories are later reloaded into the system prompt in priority order under a 1000-token cap (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/components/ai/system-prompt.tsx#L184-L238).

Scope. Per-user and project-gated. getIsMemoryEnabledForUserAndProject() requires both user memory enablement and project agent settings, and active memory retrieval is by user ID with optional current-chat exclusion (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/queries/memory.ts#L7-L49). The extracted memories are personalization state, not shared project knowledge or cross-agent playbooks.

Timing. Online during deployment. The extraction is scheduled as part of live streaming rather than as an offline batch. It is also asynchronous and failure-tolerant: safeScheduleMemoryExtraction() catches and logs extraction errors without failing the user request (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/services/memory.ts#L47-L56).
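The fire-and-forget guard around extraction is a small but load-bearing pattern. An asyncio sketch of the same contract (names and logging details are illustrative, not nao's TypeScript implementation):

```python
import asyncio
import logging

logger = logging.getLogger("memory")

async def safe_schedule(coro_fn, *args) -> asyncio.Task:
    """Run extraction in the background; never fail the user request.

    Mirrors the safeScheduleMemoryExtraction contract: any error is
    caught and logged inside the task, so the caller's response stream
    proceeds regardless of extraction outcome.
    """
    async def guarded():
        try:
            await coro_fn(*args)
        except Exception:
            logger.exception("memory extraction failed")

    return asyncio.create_task(guarded())
```

The trade-off is silent degradation: if extraction keeps failing, the only symptom is a log line, which is one reason a memory-specific test surface (see What to Watch) would matter.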

Survey placement. nao adds a product-assistant case between OpenViking's broader typed session memory and REM's episodic memory service. It owns the chat schema and memory tables like a service backend, but its extraction target is narrower: durable user instructions and profile facts from conversation text. It strengthens the survey's claim that narrow schemas make extraction concrete while leaving evaluation open; the system has good behavioral analytics tests, but no equivalent hard oracle for memory correctness.

Curiosity Pass

"All context is files" is a system-prompt simplification. For the analytical task, the statement is useful and mostly true: database docs, repos, Notion pages, rules, and skills are exposed as files. At the product level it is not literally true. Chat messages, message parts, feedback, stories, memories, and inference records live in SQLite/Postgres tables. The right reading is "task context is file-shaped," not "nao is file-only."

The memory extractor fires before seeing the assistant's new answer. The stream path schedules memory extraction immediately after the agent request is sent, before the response stream is merged back to the UI. That means the extractor sees the latest user turn and previous conversation, but not the assistant answer being generated in response to that turn. This is probably intentional: it avoids learning from speculative assistant output, but it also means it cannot learn preferences expressed through the user's reaction to that answer until a later turn.

Compaction exists but is not active in the main loop. AgentManager._prepareStep() contains a call to compactConversationIfNeeded(...), but it is commented out; the active path just prunes messages and applies provider cache hints (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/services/agent.ts#L204-L226). _buildModelMessages() can reuse a last compaction summary if one already exists, so the codebase has compaction scaffolding, but the reviewed commit should not be credited with active automatic compaction in normal agent steps.

The feedback loop stops at visibility. Message feedback can be upserted, recent feedback can be queried by admins, and project chat listings aggregate feedback counts and text (https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/trpc/feedback.routes.ts#L10-L53, https://github.com/getnao/nao/blob/a03767cfb779144f55ee540d99266422553034e9/apps/backend/src/queries/feedback.queries.ts#L8-L58). I did not find code that uses feedback to update memories, skills, rules, or tests automatically. It is operational signal for data teams, not an implemented learning loop.

The context project can be overwritten by deployment refresh. The git context provider's hard reset is a reasonable way to keep a container-mounted context in sync with a canonical branch. It would be dangerous as a local editing workflow. That distinction matters because the same file-shaped context can behave either like an editable KB or like a deployed build artifact depending on how it is sourced.

What to Watch

  • Whether automatic compaction is re-enabled in the agent loop, and if so whether it becomes a durable memory layer or just a prompt-budget mechanism.
  • Whether memory extraction starts mining tool results, SQL failures, or feedback, which would move nao from user personalization into operational trace learning.
  • Whether nao test grows memory-specific cases that check extraction and reinjection behavior, not just analytics-answer correctness.
  • Whether sync-generated files gain stronger provenance and lifecycle markers, especially for AI-generated summaries and query-history-derived docs.
  • Whether skills become editable/distributable runtime artifacts or remain markdown files loaded from agent/skills.
  • Whether feedback is connected to rule, skill, or test generation rather than staying dashboard-visible telemetry.

Relevant Notes: