# MemOS Overview (/memos_cloud/getting_started/overview)

MemOS is a managed memory platform for AI applications and Agents. After information is written to MemOS, the system automatically produces, recalls, and updates memories, then keeps providing concise and accurate context in later requests. You do not need to repeatedly solve the problem of "how AI remembers." By calling the cloud service APIs, you can add long-term memory to your application.

## What MemOS Gives You

- **Long-term continuity for AI**: preserve user facts, preferences, and task progress across sessions, so AI does not start from zero every time.
- **Useful memories over time**: continuously extract, deduplicate, update, and correct memories to keep them accurate and effective.
- **Lower engineering cost**: use a production memory platform that manages memory production, scheduling, recall, and lifecycle management instead of building the full memory stack yourself.
- **Room to extend**: support knowledge bases, Skills, tool memories, multimodal input, and Agent workflows for broader business scenarios.

## Core Workflow

![How MemOS Works](https://cdn.memtensor.com.cn/img/1779432830540_evti9q_compressed.png)

### Add Raw Information

Pass user chats, behavior events, knowledge files, images, Skills, and other raw information into MemOS.

### Produce and Update Memories

MemOS processes raw information in the background into retrievable memory content, then continuously updates memories, schedules them in real time, corrects them with natural-language feedback, and manages their full lifecycle.

### Retrieve Memories

When retrieving memories, MemOS filters and recalls the most relevant memories for model responses, Agent decisions, or business workflows, helping AI maintain continuous understanding in later requests.
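
As a rough illustration of the workflow above, the sketch below builds (without sending) the two HTTP requests an application would issue: one to add raw conversation turns and one to retrieve memories later. It assumes the base URL, the `add/message` and `search/memory` paths, and the `Token` authorization header shown in the Quick Start examples of these docs. The helper names `build_request`, `add_message_request`, and `search_memory_request` are hypothetical and not part of any MemOS SDK.

```python
import json
import urllib.request

# Assumption: base URL and "Token" auth scheme as shown in the Quick Start examples.
BASE_URL = "https://memos.memtensor.cn/api/openmem/v1"


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a MemOS endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def add_message_request(api_key: str, user_id: str, conversation_id: str,
                        messages: list) -> urllib.request.Request:
    """Workflow step 1 (hypothetical helper): hand raw conversation turns to MemOS."""
    payload = {
        "user_id": user_id,
        "conversation_id": conversation_id,
        "messages": messages,
    }
    return build_request("add/message", payload, api_key)


def search_memory_request(api_key: str, user_id: str, conversation_id: str,
                          query: str) -> urllib.request.Request:
    """Workflow step 3 (hypothetical helper): recall memories relevant to a query."""
    payload = {
        "user_id": user_id,
        "conversation_id": conversation_id,
        "query": query,
    }
    return build_request("search/memory", payload, api_key)


if __name__ == "__main__":
    add_req = add_message_request(
        "YOUR_API_KEY",
        "memos_user_123",
        "0610",
        [{"role": "user", "content": "I will choose 7 Days Inn."}],
    )
    search_req = search_memory_request(
        "YOUR_API_KEY", "memos_user_123", "0928", "Recommend a hotel for me"
    )
    # Nothing is sent here; pass a request to urllib.request.urlopen() to call the API.
    print(add_req.get_method(), add_req.full_url)
    print(search_req.get_method(), search_req.full_url)
```

Step 2 of the workflow (producing and updating memories) has no client-side call in this sketch because, as described above, MemOS performs it in the background after raw information is added.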
## Choose How to Start

- [Quick Start](/memos_cloud/getting_started/quick_start): Create your first project from here
- [How MemOS Works](/memos_cloud/introduction/mem_production): Understand the memory mechanism before integration
- [Cloud Service & Open Source](/memos_cloud/getting_started/cloud_and_opensource): Compare cloud service and open-source self-hosting options
- [Use in Agents](/memos_cloud/getting_started/agent_usage): Integrate with OpenClaw, Hermes, or other AI tools

---

# Cloud Service & Open Source (/memos_cloud/getting_started/cloud_and_opensource)

## 1. Choose the MemOS Solution That Fits You

MemOS provides two memory solutions for AI applications:

- **MemOS Cloud Service**: a managed service that simplifies development and daily operation. It is suitable when you want to quickly build and iterate AI applications.
- **MemOS Open Source**: a self-hosted solution that you deploy in your own environment. It is suitable when you need deeper customization, secondary development, or full infrastructure control.

> Whether you choose the cloud service or the open-source framework, MemOS helps your AI application gain persistent memory.
> You can start with the cloud service for a quick trial, then switch to self-hosted deployment when your business needs it.

## 2. Selection Guide

### Choose MemOS Cloud Service

- **Fast integration**: enable memory for your AI application in a few minutes, so you can focus on business logic and product features instead of maintaining storage and memory management infrastructure.
- **Low-cost validation**: use free trial quotas to validate the solution and product effect before deeper integration.
- **Logs and monitoring**: view call logs in the console and inspect the request chain for debugging, monitoring, and optimization.
- **Advanced API capabilities**: knowledge bases and continuous dialogue are available through APIs for more flexible customization and integration.
### Choose MemOS Open Source

- **Custom configuration**: freely choose LLM providers, inference backends, deployment strategies, and related infrastructure.
- **Code extension**: modify the codebase directly, extend features as needed, and contribute improvements back to the community.

## 3. Still Not Sure?

- [Try the free platform](/memos_cloud/getting_started/quick_start): register and sign in to [MemOS Cloud](https://memos-dashboard.openmem.net/quickstart) to try the core features.
- [Explore the open-source solution](/open_source/getting_started/installation): clone the repository and run MemOS locally.

---

# Use in Agents (/memos_cloud/getting_started/agent_usage)

In addition to calling the cloud APIs directly, you can connect MemOS to your AI workflow through plugins, MCP, CLI, and other integration methods if you use:

- Agent frameworks such as OpenClaw and Hermes.
- AI clients such as Cursor, VS Code, Claude Desktop, Cline, and Chatbox.
- Any Agent or development environment that can execute shell commands.

These integration methods help you save tokens while adding long-term memory to your Agent workflows.

## Quick Integration (Recommended)

You can connect MemOS automatically by chatting with your Agent in natural language. Once it is done, you can start chatting right away with no extra setup.

### Use the Plugin

MemOS currently provides a cloud plugin deeply integrated with **OpenClaw**. If you use OpenClaw, prefer the plugin integration. Copy the following prompt and paste it into your OpenClaw chat:

**View the OpenClaw plugin setup prompt**

```text api-key
Help me set up the MemOS OpenClaw plugin. Follow these steps:

1. Verify the API Key
   Check the value of MEMOS_API_KEY in the commands of step 2:
   - If it is already a real API Key starting with mpg-, go straight to step 2
   - If it is still a placeholder, guide the user to open https://memos-dashboard.openmem.net/quickstart/ to get an API Key, then replace the placeholder in the commands with the real Key

2. Configure the API Key environment variable
   Detect the current operating system first, then write it accordingly:

   macOS / Linux:
   mkdir -p ~/.openclaw
   echo 'MEMOS_API_KEY=YOUR_API_KEY' >> ~/.openclaw/.env

   Windows PowerShell:
   [System.Environment]::SetEnvironmentVariable("MEMOS_API_KEY", "YOUR_API_KEY", "User")

3. Install and enable the plugin
   openclaw plugins install @memtensor/memos-cloud-openclaw-plugin@latest
   openclaw gateway restart

4. Verify the installation
   Read ~/.openclaw/openclaw.json (%USERPROFILE%\.openclaw\openclaw.json on Windows), and confirm that under plugins.entries, memos-cloud-openclaw-plugin has enabled set to true
```

OpenClaw will install the plugin, prompt you to get an API Key, and complete all the configuration.

### Use the MemOS CLI

The MemOS CLI provides a more universal way to integrate with Agents and works with any Agent framework that can execute shell commands.

Copy the following prompt and send it to your Agent:

**View the CLI setup prompt**

```text api-key
Help me set up the MemOS CLI development environment. Follow these steps:

1. Install the MemOS CLI globally
   npm i -g @memtensor/memos-cloud-cli

2. Initialize the CLI configuration
   memos init --api-key YOUR_API_KEY --agent

For step 2, pass the --agent parameter matching the Agent the user is using. If the value of --api-key is already a real Key starting with mpg-, run it directly; if it is still a placeholder, first guide the user to open https://memos-dashboard.openmem.net/ to get an API Key and replace it.
```

Your Agent will install the MemOS CLI and configure the corresponding Skill automatically ([View supported Agents](/mcp_agent/cli/guide#_31-use-with-agents)).

## Manual Configuration

### 1. Before You Start

- Register and sign in to the [MemOS Cloud platform](https://memos-dashboard.openmem.net/quickstart).
- Get an API Key from the [API Key page](https://memos-dashboard.openmem.net/apikeys).

### 2. Use the Plugin

#### Configure the API Key

The plugin reads OpenClaw-related environment variables or `.env` files. The minimal configuration is:

```env
MEMOS_API_KEY=YOUR_API_KEY
```

You can also write it directly into the OpenClaw environment file:

```bash [macOS / Linux]
mkdir -p ~/.openclaw
echo 'MEMOS_API_KEY=YOUR_API_KEY' >> ~/.openclaw/.env
```

```powershell [Windows PowerShell]
[System.Environment]::SetEnvironmentVariable("MEMOS_API_KEY", "YOUR_API_KEY", "User")
```

#### Install and enable the plugin

```bash
openclaw plugins install @memtensor/memos-cloud-openclaw-plugin@latest
openclaw gateway restart
```

> **Tip**: Windows users: if you encounter `Error: spawn EINVAL`, see [OpenClaw Cloud Plugin - Manual Install](/openclaw/guide) for an alternative method.

Confirm that the plugin is enabled in `~/.openclaw/openclaw.json`:

```json
{
  "plugins": {
    "entries": {
      "memos-cloud-openclaw-plugin": {
        "enabled": true
      }
    }
  }
}
```

#### Start chatting

You can now have multi-turn conversations with OpenClaw:

- First session: "I prefer using Python."
- Second session after restart: "Do you remember which programming language I like?"

> **Tip**: The OpenClaw plugin also supports multi-Agent isolation, Config UI, filters, and more detailed configuration. See the [OpenClaw Cloud Plugin](/openclaw/guide) for full configuration.

### 3. Use MCP

Mainstream clients that support MCP include **Cursor, Claude Desktop, Cline, VS Code / Trae, and Chatbox**. Taking Cursor as an example, after configuration, Cursor can directly call MemOS memory tools and use memory across clients.
#### Add an MCP Server

In Cursor, go to:

```text
Cursor Settings → Tools & MCP → Add Custom MCP
```

Then add this to `mcp.json`:

```json
{
  "mcpServers": {
    "memos-api-mcp": {
      "timeout": 60,
      "type": "stdio",
      "command": "npx",
      "args": [
        "-y",
        "@memtensor/memos-api-mcp@latest"
      ],
      "env": {
        "MEMOS_API_KEY": "YOUR_API_KEY",
        "MEMOS_USER_ID": "your-user-id",
        "MEMOS_CHANNEL": "MODELSCOPE"
      }
    }
  }
}
```

After configuration, confirm that Cursor's MCP tool list shows tools such as `add_message` and `search_memory`.

#### Cursor Rules

To make Cursor use memories more reliably, add rules like these to User Rules:

```text
Before answering the user's question, call MemOS search_memory to search long-term memories related to the current task.
After answering, if this turn contains new user facts, preferences, project background, or other information useful in the long term, call add_message to write it into MemOS.
Only use memories relevant to the current task. Ignore memories that are irrelevant, outdated, or about the wrong subject.
Do not expose internal implementation details such as "memory store" or "retrieval results" to the user.
```

#### Start chatting

- First session: tell it who you are, your hobbies, and your profession, and ask it to remember.
- Second session after restart: ask it who you are.

> **Tip**: Claude Desktop, Cline, Chatbox, and other clients are configured similarly, though the entry points differ. For more examples, see the [MCP Guide](/mcp_agent/mcp/guide).

### 4. Use CLI + Skill

If your Agent framework can execute shell commands (e.g. Cursor, Codex, Claude Code, Hermes), you can use the MemOS CLI to install a memory Skill with one command, enabling your Agent to automatically search and write memories.

> **Tip**: Using OpenClaw as an example, in the LOCOMO evaluation, using MemOS CLI alone reduced token usage by about 65.5%; integrating MemOS Cloud + CLI improved accuracy from 66.60% to 77.27%.
#### Install the CLI

```bash
npm install -g @memtensor/memos-cloud-cli
```

#### Initialize and install Skill

```bash
memos init --agent cursor
memos init --api-key YOUR_API_KEY --agent cursor
```

`--agent` installs the MemOS memory Skill into the corresponding Agent's skills directory. `--agent` is currently required; if it is omitted, the command fails because the CLI needs to know where to install the Skill. Supported targets:

```bash
memos init --agent cursor     # ~/.cursor/skills/memos/
memos init --agent codex      # ~/.codex/skills/memos/
memos init --agent claude     # ~/.claude/skills/memos/
memos init --agent openclaw   # ~/.openclaw/skills/memos/
memos init --agent hermes     # ~/.hermes/skills/memos/
```

#### Start chatting

Once installed, the Agent will automatically load the Skill. During each conversation turn, the Agent will:

1. **Before answering**: automatically run `memos search` to retrieve long-term memories related to the current task
2. **After answering**: automatically run `memos add` to write new facts, preferences, etc. into MemOS

You can verify the same way:

- First session: "I prefer using Python."
- Second session after restart: "Do you remember which programming language I like?"

> **Tip**: The CLI also supports manual memory operations in a terminal (`add`, `search`, `get`, `origin`, `delete`, etc.). If you only use the CLI in a terminal and do not install an Agent Skill, configure the API Key, default user ID, and default conversation ID with `memos config set`. See [MemOS CLI](/mcp_agent/cli/guide) for the full command reference.

### Which Integration Should You Choose?
| Integration | Best for | Priority |
| --- | --- | --- |
| Plugin | OpenClaw and other Agent environments deeply integrated with MemOS | Prefer first; highest automation |
| CLI + Skill | Any Agent framework that can execute shell commands | Most portable; works across frameworks |
| MCP | Cursor, Claude Desktop, Cline, Chatbox, and other AI clients | Use when the client supports MCP |
| API / SDK | Self-built Agents, chatbots, or business applications | Most control; best for production integration |

### Next Steps

- [OpenClaw Cloud Plugin](/openclaw/guide): View full installation, enabling, and advanced configuration for the OpenClaw plugin
- [MemOS CLI](/mcp_agent/cli/guide): View the full CLI command reference and Skill installation guide
- [MCP Guide](/mcp_agent/mcp/guide): Learn how to configure MCP in Cursor, Claude Desktop, Cline, and other clients
- [API / SDK](/memos_cloud/getting_started/quick_start): Start here if you are building your own Agent or application

---

# Integrate into Your App (/memos_cloud/getting_started/quick_start)

## Use MemOS Skill to Quickly Integrate MemOS Cloud into Your AI App (Recommended)

If you are building your AI application with Agent tools such as Claude Code or Cursor, copy the prompt below and send it to your tool:

**Expand to view the Skill setup prompt**

```text
Help me integrate MemOS Cloud into this project to add long-term memory to my Agent product. Please follow these steps:

1. Install MemOS Skill (skip if already installed):
   npx skills add https://github.com/MemTensor/MemOS-Cloud-Skill --skill memos-cloud -g -y
   Auto-fill the --agent argument based on the current Agent environment.

2. Read SKILL.md under the Skill's install path, and strictly follow its instructions in order.

3. Generate complete MemOS Cloud integration code based on this project's actual tech stack and architecture.
```

Your Agent tool will automatically install and use MemOS Skill and integrate MemOS Cloud into your AI application.
## Manual Integration

When you integrate MemOS into an AI application, the full flow looks like this. MemOS provides two core APIs: [see API docs](/api_docs/core/add_message).

- `addMessage`: send raw conversations to MemOS. MemOS automatically processes and stores them as memories.
- `searchMemory`: recall memories in later conversations, so AI responses better match user needs.

![image.svg](https://cdn.memtensor.com.cn/img/1762434889291_h9co0h_compressed.png)

### 1. Before Calling the API

- Register and sign in to [MemOS Cloud](https://memos-dashboard.openmem.net/quickstart).
- Get an API Key from the [API Key page](https://memos-dashboard.openmem.net/apikeys).
- Prepare an environment that can send HTTP requests, such as Python or cURL.

### 2. Create a Memory

#### Install the SDK

If you choose the Python SDK, make sure Python 3.10+ is installed, then run:

```bash
pip install MemoryOS -U
```

#### Set the API Key

```python [Python (HTTP)]
import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://memos.memtensor.cn/api/openmem/v1"
```

```python [Python (SDK)]
from memos.api.client import MemOSClient

client = MemOSClient(api_key="YOUR_API_KEY")
```

```bash [Curl]
export MEMOS_API_KEY="YOUR_API_KEY"
export MEMOS_BASE_URL="https://memos.memtensor.cn/api/openmem/v1"
```

#### Add Raw Information

Session A happened on 2025-06-10. The user chose 7 Days Inn as the hotel for a summer trip to Guangzhou. You only need to pass the raw conversation records to MemOS.

```python [Python (HTTP)]
data = {
    "user_id": "memos_user_123",
    "conversation_id": "0610",
    "messages": [
        {"role": "user", "content": "I have booked a summer trip to Guangzhou. Which hotel chains are available?"},
        {"role": "assistant", "content": "You can consider 7 Days Inn, Ji Hotel, Hilton, and others."},
        {"role": "user", "content": "I will choose 7 Days Inn."},
        {"role": "assistant", "content": "Got it. Feel free to ask if you have other questions."}
    ]
}

res = requests.post(
    f"{BASE_URL}/add/message",
    headers={"Authorization": f"Token {API_KEY}"},
    json=data
)
print(res.json())
```

```python [Python (SDK)]
messages = [
    {"role": "user", "content": "I have booked a summer trip to Guangzhou. Which hotel chains are available?"},
    {"role": "assistant", "content": "You can consider 7 Days Inn, Ji Hotel, Hilton, and others."},
    {"role": "user", "content": "I will choose 7 Days Inn."},
    {"role": "assistant", "content": "Got it. Feel free to ask if you have other questions."}
]

res = client.add_message(
    messages=messages,
    user_id="memos_user_123",
    conversation_id="0610"
)
print(res)
```

```bash [Curl]
curl "$MEMOS_BASE_URL/add/message" \
  -H "Authorization: Token $MEMOS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "memos_user_123",
    "conversation_id": "0610",
    "messages": [
      {"role": "user", "content": "I have booked a summer trip to Guangzhou. Which hotel chains are available?"},
      {"role": "assistant", "content": "You can consider 7 Days Inn, Ji Hotel, Hilton, and others."},
      {"role": "user", "content": "I will choose 7 Days Inn."},
      {"role": "assistant", "content": "Got it. Feel free to ask if you have other questions."}
    ]
  }'
```

#### Search Relevant Memories

Session B happened on 2025-09-28. The user asks the AI to recommend a National Day travel destination and hotel. Use the user's message as the query to search MemOS memories.

```python [Python (HTTP)]
data = {
    "query": "I want to travel during the National Day holiday. Please recommend a city I have not been to and a hotel brand I have not stayed at.",
    "user_id": "memos_user_123",
    "conversation_id": "0928"
}

res = requests.post(
    f"{BASE_URL}/search/memory",
    headers={"Authorization": f"Token {API_KEY}"},
    json=data
)
print(res.json())
```

```python [Python (SDK)]
res = client.search_memory(
    query="I want to travel during the National Day holiday. Please recommend a city I have not been to and a hotel brand I have not stayed at.",
    user_id="memos_user_123",
    conversation_id="0928"
)
print(res)
```

```bash [Curl]
curl "$MEMOS_BASE_URL/search/memory" \
  -H "Authorization: Token $MEMOS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "I want to travel during the National Day holiday. Please recommend a city I have not been to and a hotel brand I have not stayed at.",
    "user_id": "memos_user_123",
    "conversation_id": "0928"
  }'
```

##### Output

MemOS automatically recalls factual memories such as where the user has been and preference memories such as hotel booking preferences, helping the AI recommend a more personalized travel plan. The following result is simplified for easier understanding.

```text
{
  preference_detail_list [
    {
      "preference_type": "implicit_preference",
      "preference": "The user may prefer cost-effective hotel options.",
      "conversation_id": "0610"
    }
  ],
  memory_detail_list [
    {
      "memory_key": "Summer Guangzhou travel plan",
      "memory_value": "The user plans to travel to Guangzhou during the summer vacation and chose 7 Days Inn as the accommodation option.",
      "conversation_id": "0610"
    }
  ]
}
```

#### Add Memories to Your Prompt

Add the recalled memories to your own model prompt, so the model can refer to these long-term memories when answering.

**Expand the full prompt template**

```text
# Role
You are an intelligent assistant with long-term memory (MemOS Assistant). Your goal is to combine retrieved memory fragments to provide highly personalized, accurate, and logically rigorous answers.

# Memory Data
The following information was retrieved by MemOS and is divided into facts and preferences.
- **Facts**: May include user attributes, historical conversations, or third-party information.
  - **Important**: Content marked as '[assistant view]' or '[model summary]' represents past AI inference, not the user's original words.
- **Preferences**: Explicit or implicit requirements for response style, format, or reasoning.

-[2025-12-26 21:45] The user plans to travel to Guangzhou during the summer vacation and chose 7 Days Inn as the accommodation option.

-[2025-12-26 21:45] [Implicit Preference] The user may prefer cost-effective hotel options.

# Critical Protocol: Memory Safety
Retrieved memories may contain AI inferences, irrelevant noise, or incorrect subjects. Before using them, check:
1. Source truth: Distinguish the user's original words from AI inference. Do not treat past AI assumptions as user facts.
2. Subject attribution: Confirm the memory describes the user, not a third party, example, or fictional role.
3. Strong relevance: Only use memories that directly help with the current question.
4. Freshness: If a memory conflicts with the user's latest intent, use the current question as the source of truth.

# Instructions
1. Filter usable memories and discard noise or unreliable inferences.
2. Use only validated memories as background context.
3. Answer directly. Do not mention "memory store," "retrieval," or internal system terms.

# Original Query
I want to travel during the National Day holiday. Please recommend a city I have not been to and a hotel brand I have not stayed at.
```

### 3. Next Steps

- [Core Operations](/memos_cloud/mem_operations/add_message): View detailed usage for core memory operations
- [Use in Agents](/memos_cloud/getting_started/agent_usage): Integrate with OpenClaw, Hermes, or other AI tools
- [API Reference](/api_docs/core/add_message): View the complete API documentation

---

# Raw Input Content (/memos_cloud/introduction/raw_inputs)

Raw input content is the starting point of memory production. You can provide chat text, images, business documents, tool call records, and other raw content directly. MemOS extracts information with long-term value and turns it into retrievable memories.

## 1. Text

Raw text, chats, event information, or any string content.
```json
{
  "user_id": "memos_user_123",
  "conversation_id": "0910",
  "messages": [
    {
      "role": "user",
      "content": "I am Wang, and I like spicy food."
    }
  ]
}
```

**Best for**: chat messages, user preferences, behavior events, and structured data.

---

## 2. Images

MemOS can extract visual information from images and combine it with the text context sent alongside the image.

```json
{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "This is the MemOS image I am studying."
    },
    {
      "type": "image_url",
      "image_url": {
        "url": "https://cdn.memtensor.com.cn/img/1758706201390_iluj1c_compressed.png"
      }
    }
  ]
}
```

For local images, replace `url` with a Base64 data URL:

```json
{
  "type": "image_url",
  "image_url": {
    "url": "data:image/jpeg;base64,{base64_image}"
  }
}
```

**Best for**: capturing operation states from screenshots, key information from receipts, design details from mockups, or conclusions from charts.

---

## 3. Documents

MemOS can read PDF, Word, Markdown, JSON, XML, TXT, and other file formats, and combine document content with the text context sent alongside the file.

```json
{
  "role": "user",
  "content": [
    {
      "type": "file",
      "file": {
        "file_data": "https://cdn.memtensor.com.cn/file/MemOS 2.pdf"
      }
    }
  ]
}
```

For local documents, set `file_data` to a Base64 string.

**Best for**: extracting conclusions from reports, constraints from requirement documents, rules from policy materials, or key settings from configuration files.

---

## 4. Tool Calls

Agent tool decisions and tool results. MemOS generates tool memories so the Agent can call tools more reliably later.
```json
{
  "messages": [
    {
      "role": "assistant",
      "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\":\"Beijing\"}"
        }
      }]
    },
    {
      "role": "tool",
      "tool_call_id": "call_123",
      "content": [{
        "type": "text",
        "text": "{\"temperature\":\"7°C\"}"
      }]
    }
  ]
}
```

**Best for**: recording tool selection, parameters, and returned results to improve future tool call success. See [Tool Memory](/memos_cloud/features/tool_calling).

---

## 5. Natural Language Feedback

Corrections and supplements from users about answers, memories, or knowledge. You can pass the feedback text directly without locating a specific memory.

```json
{
  "user_id": "memos_user_123",
  "feedback_content": "The purchase limit for office software is 600 yuan, not 800 yuan.",
  "allow_knowledgebase_ids": ["kb_xxx"]
}
```

**Best for**: correcting wrong memories, updating outdated information, and filling missing details. See [Add Feedback](/memos_cloud/mem_operations/add_feedback).

---

## 6. Knowledge Bases and Skills

Project-level knowledge documents or Agent skill packages. These are managed separately from user memories.

```json
{
  "knowledgebase_id": "kb_xxx",
  "file": [
    {
      "type": "document",
      "content": "https://cdn.memtensor.com.cn/file/MemOS 2.pdf"
    },
    {
      "type": "skill",
      "name": "return_process.md",
      "content": "data:text/markdown;base64,{base64_skill}"
    }
  ]
}
```

**Best for**: importing product documentation, policies, SOPs, FAQs, or Agent skill packages. See [Knowledge Base](/memos_cloud/features/knowledge_base) and [Self-Evolving](/memos_cloud/features/self-evolving).

---

## 7. Content Limits

| Type | Limit |
| --- | --- |
| Text messages | 40,000 tokens per request |
| Files | URL and Base64 supported, ≤ 20 files per request, each file ≤ 100 MB / 500 pages |
| Images | URL and Base64 data URL supported |
| Knowledge Base documents | URL and Base64 supported, ≤ 20 files per request, each file ≤ 100 MB / 500 pages |
| Single Skill file | URL and Base64 supported, ≤ 100 KB, must include `name` and `description` |
| Skill ZIP package | URL and Base64 supported, ≤ 20 MB, ≤ 200 files after extraction, must include `SKILL.md` |

---

## 8. API Overview

| Entry | Accepted content | Typical use |
| --- | --- | --- |
| `add/message` | Text, images, documents, Tool Calls | Write user memories such as facts, preferences, and tool experience |
| `chat` | Current user query | Recall memories + generate an answer + write the new conversation |
| `extract/memory` | Plain text messages | Extract facts/preferences only, without writing to the long-term memory store |
| `add/feedback` | Natural language feedback | Correct or supplement existing memories |
| `add/knowledgebase-file` | Knowledge documents, Skill files | Build project knowledge bases and Agent skill libraries |

---

## 9. Explore Different APIs

- [Add Message](/memos_cloud/mem_operations/add_message): Write text, images, documents, and Tool Calls
- [Chat](/memos_cloud/mem_operations/chat): Use the conversation API with memory recall
- [Knowledge Base](/memos_cloud/features/knowledge_base): Manage project documents, policies, and SOPs
- [Add Feedback](/memos_cloud/mem_operations/add_feedback): Correct wrong memories or fill missing information

---

# Memory Categories (/memos_cloud/introduction/memory_types)

MemOS generates and uses different categories of memories. They play different roles during recall: some confirm events, some describe preferences, some guide how an Agent should complete a task, and some provide supporting knowledge.
Distinguishing memory categories helps retrieval return more relevant content and gives the model more accurate context.

## 1. Fact Memories

Fact memories describe relatively objective information, usually from explicit user statements, behavior events, files, or feedback.

Common examples:

- The user lives in Shanghai.
- The user's device is a 13-inch Intel MacBook Pro.

Fact memories are useful for answering questions such as "who is the user", "what has the user done", "what state is the user currently in", and "whether something happened".

> **Note**: Memories may change over time. MemOS combines [Time Awareness](/memos_cloud/introduction/time_awareness) and [Memory Lifecycle Management](/memos_cloud/introduction/mem_lifecycle) to handle these changes.

## 2. Preference Memories

Preference memories describe long-term or stage-specific user tendencies. They may come from explicit user statements, or from summaries and inferences based on repeated behavior.

Common examples:

- The user prefers concise and direct answers.
- When planning trips, the user prefers cultural attractions and dislikes shopping malls.
- When buying pet food, the user needs to avoid chicken flavor.

Preference memories are useful for recommendation, generation, ranking, and personalized decisions. They do more than answer "what did the user say"; they help the Agent decide "what would better fit this user".

## 3. Self-Evolving Skills

MemOS automatically extracts skills from historical messages to enable self-evolving memory. You can also upload existing skill packages to a knowledge base. Tasks with stable steps, such as travel planning, reimbursement review, and customer issue triage, are good candidates for skills. See [Self-Evolving](/memos_cloud/features/self-evolving).

## 4. Tool Memories

Tool memories record "how to use tools". They are distilled from Tool Schemas, tool call parameters, tool results, and tool trajectories.
They help the Agent select tools, fill parameters, and use returned results more reliably. See [Tool Memory](/memos_cloud/features/tool_calling).

## 5. Knowledge Base Memories

Knowledge Base memories come from project-level documents, policies, manuals, FAQs, process files, or Skill files. They are not part of a single user's personal history and are better shared by multiple users or Agents.

Common examples:

- Company reimbursement policy.
- Product manual.
- Return and after-sales policy.

Knowledge Base memories can participate in recall together with user memories. For example, when an employee asks "The intranet proxy does not open. Which version should I reinstall?", the knowledge base provides installation instructions, while user fact memory adds that the employee uses an Intel MacBook Pro. Combining both leads to a more accurate answer. See [Knowledge Base](/memos_cloud/features/knowledge_base).

## 6. How to Generate and Use Them

### Generating memories

Currently, MemOS generates all memory categories by default.

- Fact, preference, and Skill memories are generated after adding messages.
- Tool memories require Tool Call information with `"role": "tool"` when adding messages.
- Knowledge Base memories require creating a knowledge base and uploading documents.

### Using memories

When searching memories, control the following parameters according to the memory categories you need:

- `include_preference`: intelligently recall user preference memories based on the query.
- `include_skill`: intelligently recall user Skill memories based on the query.
- `include_tool_memory`: intelligently recall user tool memories based on the query.
- `knowledgebase_ids`: specify the knowledge bases that can be searched.

> **Note**: For the full field list, request format, and response format, see the [Search Memory API documentation](/api_docs/core/search_memory).

## 7. Start Using Directly

- [Add Message](/memos_cloud/mem_operations/add_message): Write user conversations, files, and tool call traces
- [Search Memory](/memos_cloud/mem_operations/search_memory): Retrieve different categories of memories

---

# Multi-user / Multi-Agent Isolation (/memos_cloud/introduction/isolation_filters)

If your product is used by multiple users, Agents, projects, or business scenarios, memories need clear boundaries.

- Project boundaries separate different products or business spaces.
- User boundaries separate personal memories for different users within the same project.
- Agent / application boundaries separate memories produced by different Agents or apps for the same user.
- Conversation context marks which conversation or task a memory came from.
- Shared project knowledge can be placed in a knowledge base or public memory, and used by authorized users.

## 1. Project Boundary: Create a Project and Use an API Key

If you have not registered for the cloud service, first sign in to the [MemOS Console](https://memos-dashboard.openmem.net/quickstart). New users get a default project. If you need to isolate different products or business spaces, create a new project in the console.

Each project has its own API Key list. When you use an API Key from a project, your requests access memories under that project. Go to the [API Key page](https://memos-dashboard.openmem.net/apikeys) to get an API key quickly.

> **Note**: For project creation, switching, and API key management, see [Project Configuration](/api_docs/start/configuration).

## 2. User Memories: Use Different `user_id` Values

Each user should have a unique `user_id`. The same user can keep using the same `user_id` across different entries or applications, so MemOS can build continuous long-term memories.

Write a memory for user A:

```json
{
  "user_id": "user_a",
  "messages": [
    {
      "role": "user",
      "content": "I like spicy food."
    }
  ]
}
```

Search memories for user B:

```json
{
  "user_id": "user_b",
  "query": "Recommend a hotel for me"
}
```

This search only looks in user B's personal memories. It will not retrieve user A's memories. Even if both users are in the same project and talk to the same Agent, different `user_id` values keep the memory boundary clear.

## 3. Multi-Agent Isolation: Use `agent_id`

If the same user uses multiple Agents, such as a customer service assistant, health assistant, and coding assistant, assign different `agent_id` values to different Agents.

When adding messages for a health assistant, pass `"agent_id": "health_assistant"`:

```json
{
  "user_id": "user_123",
  "agent_id": "health_assistant",
  "messages": [
    {
      "role": "user",
      "content": "I ran 5 km today, and my knee feels a little sore."
    }
  ]
}
```

When searching memories, use `filter` to limit memories to the corresponding Agent:

```json
{
  "user_id": "user_123",
  "query": "Give me advice based on my recent exercise.",
  "filter": {
    "and": [
      { "agent_id": "health_assistant" }
    ]
  }
}
```

This keeps the user's long-term memory under the same person while still allowing memories from different Agents to be filtered when needed.

> **Note**: For more filtering options, see [Memory Filters](/memos_cloud/features/filters).

## 4. Current Conversation: Use `conversation_id`

To help MemOS understand context, pass a `conversation_id` when adding user messages. It indicates which conversation or task the message belongs to.

```json
{
  "user_id": "user_123",
  "conversation_id": "order_refund_001",
  "messages": [
    {
      "role": "user",
      "content": "I want to ask about my refund progress."
    },
    {
      "role": "assistant",
      "content": "The refund is still being processed and is expected to arrive within 24 hours."
    }
  ]
}
```

When searching memories, `conversation_id` is not a mandatory filter. Passing it clarifies the current conversation and gives memories from this conversation higher weight.
If you do not pass it, MemOS still searches the user's historical memories.

## 5. Shared Knowledge: Use Knowledge Bases or Public Memory

Not everything should be written into a user's personal memory. Project documents, policies, product manuals, SOPs, and other shared knowledge are better placed in a knowledge base or public memory.

### Knowledge Base

For project documents, policies, and product manuals, create a knowledge base and upload documents. See [Knowledge Base](/memos_cloud/features/knowledge_base) for the full workflow.

During search, pass `knowledgebase_ids` in addition to user memory so the answer can also refer to project-level knowledge.

```json
{
  "user_id": "user_123",
  "query": "Based on my situation and the company policy, can this expense be reimbursed?",
  "knowledgebase_ids": ["kb_finance_policy"]
}
```

### Public Memory

Public memory is suitable for lightweight shared information such as project announcements, team experience, and general rules. If a message should be shared by all users under the project, enable `allow_public` when writing it.

```json
{
  "user_id": "user_123",
  "allow_public": true,
  "messages": [
    { "role": "user", "content": "The reimbursement deadline for this quarter is June 25." }
  ]
}
```

During retrieval, user personal memories and public memories are searched together and relevant content is recalled.

---

# Time Awareness (/memos_cloud/introduction/time_awareness)

We treat memories as objects that evolve over time, not as static text. A user's preferences, state, and facts may change; one-time events also need to retain when they happened. With time awareness, MemOS can decide when to use the latest current understanding and when to trace back to historical versions.

## 1. What Problems It Solves

### New and old information conflicts

- The user first says "I like apples", and later says "I don't like apples anymore".
- **How MemOS handles it**: MemOS automatically detects the change. During retrieval, it returns only the currently valid understanding: "does not like apples". Historical versions are not deleted and can still be traced when needed.

### "Now" and "before" should return different answers

- The user used to live in Beijing, then moved to Shanghai.
- **How MemOS handles it**: MemOS understands time clues in the question. When the user asks "Where do I live now?", it retrieves Shanghai. When the user asks "Where did I live before?", it retrieves Beijing.

### Events should not be forcibly merged

- The user traveled to Xi'an during last year's National Day holiday and to Hangzhou during this year's May Day holiday.
- **How MemOS handles it**: MemOS keeps the complete information for each event instead of merging everything into "the user likes travel". When the user asks "Where did I travel recently?", it can prioritize Hangzhou.

## 2. Processing Flow

### At write time: use `chat_time` to anchor historical time

Real-time conversations usually do not require a timestamp. MemOS uses the message ingestion time. If you import historical data, pass `chat_time` so memories are anchored to when the event actually happened.

```json
{
  "user_id": "memos_user_123",
  "messages": [
    {
      "role": "user",
      "content": "I like spicy food.",
      "chat_time": "2025-09-12 08:00:00"
    },
    {
      "role": "user",
      "content": "I can't eat very oily or spicy food now, and I prefer something lighter.",
      "chat_time": "2025-09-25 12:00:00"
    }
  ]
}
```

### Time handling for two kinds of memories

MemOS handles memories differently:

- **States that may change**, such as where the user lives, what they like, or their current stage goal, keep different time versions. This lets retrieval use the current version while still allowing historical recall.
- **One-time events**, such as where the user went or what they did, keep the details of each event. This lets retrieval return complete event information.
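As an illustration only, the difference between the two kinds of memories can be sketched in a few lines of Python. This is not MemOS's implementation: `TimelineStore` and its methods are hypothetical names, and the timestamps are made up for the example. States keep every version so that both "now" and "before" can be answered, while events are stored separately and never merged.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TimelineStore:
    """Toy store (hypothetical): versioned states plus append-only events."""

    states: dict = field(default_factory=dict)  # key -> [(time, value), ...]
    events: list = field(default_factory=list)  # [(time, description), ...]

    def set_state(self, key, value, at):
        # Keep every version instead of overwriting the old one.
        self.states.setdefault(key, []).append((at, value))
        self.states[key].sort(key=lambda version: version[0])

    def current(self, key):
        # The latest version is the currently valid understanding.
        return self.states[key][-1][1]

    def previous(self, key):
        # Older versions stay traceable for "before" questions.
        return [value for _, value in self.states[key][:-1]]

    def add_event(self, description, at):
        # Events are never merged; each one keeps its own time.
        self.events.append((at, description))

    def recent_events(self):
        # Most recent event first.
        ordered = sorted(self.events, key=lambda event: event[0], reverse=True)
        return [description for _, description in ordered]


store = TimelineStore()
store.set_state("residence", "Beijing", datetime(2025, 1, 10, 10))
store.set_state("residence", "Shanghai", datetime(2025, 6, 1, 10))
store.add_event("Traveled to Xi'an", datetime(2025, 10, 8, 10))
store.add_event("Traveled to Hangzhou", datetime(2026, 5, 2, 10))

print(store.current("residence"))   # Shanghai
print(store.previous("residence"))  # ['Beijing']
print(store.recent_events())        # ['Traveled to Hangzhou', "Traveled to Xi'an"]
```

In this sketch a "now" question maps to `current`, a "before" question maps to `previous`, and a "recently" question maps to the first entry of `recent_events`.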
### At search time: understand the time target in the question

- **No explicit time target**: retrieve normally and prefer currently valid memories.
- **With time clues**: understand the time target and select the corresponding version or event.

If you already know the time range, you can also combine [Memory Filters](/memos_cloud/features/filters) with `create_time` to narrow candidate memories:

```json
{
  "user_id": "memos_user_123",
  "query": "Summarize my reading-related records in December 2025",
  "filter": {
    "and": [
      { "create_time": { "gte": "2025-12-01" } }
    ]
  }
}
```

## 3. Example

Try writing a set of events with time information:

```json
{
  "user_id": "test_user",
  "messages": [
    {
      "role": "user",
      "content": "I live in Beijing.",
      "chat_time": "2025-01-10 10:00:00"
    },
    {
      "role": "user",
      "content": "I moved to Shanghai.",
      "chat_time": "2025-06-01 10:00:00"
    },
    {
      "role": "user",
      "content": "I traveled to Xi'an during last year's National Day holiday.",
      "chat_time": "2025-10-08 10:00:00"
    },
    {
      "role": "user",
      "content": "I traveled to Hangzhou during this year's May Day holiday.",
      "chat_time": "2026-05-02 10:00:00"
    }
  ]
}
```

| Type | Search query | Expected result |
| --- | --- | --- |
| Current state | "Where do I live now?" | Shanghai, using the latest state |
| Historical state | "Where did I live before?" | Beijing, tracing back to the historical version |
| Up to now | "Where have I traveled?" | Xi'an and Hangzhou, with Hangzhou ranked first |

## 4. Usage Suggestions

- Pass accurate `chat_time` when importing historical data; otherwise, old events may be treated as if they just happened.
- For current-state information, let users naturally express updates. MemOS will update the current understanding automatically.

## 5. Next Steps

- [Add Message](/memos_cloud/mem_operations/add_message): Write historical messages with timestamps
- [Search Memory](/memos_cloud/mem_operations/search_memory): Search and verify time-related memories

---

# FAQ (/memos_cloud/introduction/faq)

This page answers product-level and concept-level questions about MemOS. If you are already using MemOS Cloud and need help with projects, API Keys, quotas, or API calls, see [Cloud FAQs](/memos_cloud/support/faq) and [Quotas and Limits](/memos_cloud/support/limit).

## How is MemOS different from a standard RAG framework?

| Dimension | RAG | MemOS |
| --- | --- | --- |
| Managed content | Static knowledge chunks or document passages | Memories that evolve with users, tasks, and time |
| Content shape | Usually recalls raw text passages | Converts raw input into memory units such as facts and preferences |
| Update model | Depends on document updates or re-indexing | Supports continuous writing, updates, feedback correction, and lifecycle management |
| Recall goal | Help the model know external knowledge | Help the model understand user state, preferences, and context |

RAG is better for stable external knowledge. MemOS is better for user memories that continuously change during conversations and business workflows. They can be used together.

## Can MemOS work with existing RAG systems or knowledge graphs?

Yes. RAG handles factual retrieval and knowledge augmentation, while MemOS handles continuous memory and state management.

In a business application, you can keep stable content such as policies and product documentation in a knowledge base or RAG system, and let MemOS manage dynamic information such as user conversations, preferences, and task progress. During response generation, the application can use both external knowledge and user memories.

## How does MemOS work?

The basic workflow is:

1. Write raw information through `add/message`, knowledge bases, feedback, or related capabilities.
2. MemOS processes raw input into searchable and updatable memories.
3. Later requests recall relevant memories through `search/memory`, `chat`, or Agent integrations.
4. Memories continue to update through new input, feedback, and lifecycle policies.

If you only want to integrate the cloud service quickly, start with [Integrate into Your App](/memos_cloud/getting_started/quick_start).

## What are the core capabilities of MemOS?

- **User / Agent memory management**: store user-AI interactions and isolate memories across users and Agents.
- **Memory production and updates**: generate reusable memories from conversations, behavior events, and knowledge content.
- **Memory recall and scheduling**: select memories based on relevance, freshness, and context.
- **Memory lifecycle management**: control memory quality and scale through updates, merging, and archiving.
- **Cloud and open source options**: use the managed cloud service, or self-host and extend the open-source project.

## How should I choose between Cloud and Open Source?

Use [MemOS Cloud](/memos_cloud/getting_started/quick_start) if you want quick validation, lower operational cost, and built-in console, API Key, knowledge base, and quota management.

Use [Open Source](/open_source/getting_started/installation) if you need to manage your own deployment environment, modify lower-level implementation, connect custom inference backends, or do deeper secondary development.

For a fuller comparison, see [Cloud Service & Open Source](/memos_cloud/getting_started/cloud_and_opensource).

## Does MemOS support private deployment?

Yes. For private deployment, commercial customization, or deeper business-specific adaptation, contact the MemOS team to confirm deployment mode, data boundaries, and feature scope. Teams that want to explore and modify MemOS themselves can also start from the open-source project.

## What is the relationship between lifecycle and scheduling?
Lifecycle management controls how memory units change over time, such as updates, merging, consolidation, or archiving. Scheduling decides which memories should enter the current context for a specific request.

In short: lifecycle management maintains memories over the long term; scheduling decides which memories to use now.

## How does MemOS avoid memory bloat?

MemOS does not append all raw history directly into model context. It processes raw input into shorter memory units and controls memory scale through updates, merging, and archiving. During recall, MemOS selects only memories relevant to the current request, reducing unrelated context.

## Are KV-Cache and activating memory the same thing?

No. KV-Cache is a model inference-level computation cache. Activating memory is a MemOS product concept for describing recently reusable memory state. In implementation, activating memory can use lower-level cache capabilities to improve recent-context reuse, but the two are not equivalent.

## Will MemOS slow down inference?

MemOS aims to reduce irrelevant context through memory processing and recall, instead of sending all history to the model. Actual latency depends on write volume, recall scope, filters, model calls, and business concurrency.

If you encounter quota or latency issues in cloud API calls, see [Quotas and Limits](/memos_cloud/support/limit) and [Cloud FAQs](/memos_cloud/support/faq).

## If the information is recent, such as “what I did yesterday,” is scheduling still needed?

Yes. Recent information is not always relevant, and it should not always be sent in full. Scheduling considers the current question, conversation, user memories, and relevance to choose the best memories for the current turn.

## What business scenarios is MemOS suitable for?
MemOS is suitable for AI applications that need long-term memory and continuous personalization, such as companionship, games, travel, customer service, knowledge management, investment advisory, production operations, and AI learning assistants.

You can first validate a specific scenario with Cloud APIs, then decide whether to do deeper integration or private deployment.

---

# Memory Production (/memos_cloud/introduction/mem_production)

## 1. What Is Memory Production

Memory production is the write and processing stage of the MemOS workflow. After developers submit raw information, MemOS extracts facts, preferences, tool usage processes, skill clues, and knowledge content, then generates memory units that can later be searched, filtered, scheduled, and updated.

Raw information may be stored in full first, but what enters later reasoning is usually not the whole original text. It is the memory produced after extraction, denoising, structuring, and version governance.

## 2. Why Not Just Store Raw Text

If all raw information is stored directly and pasted into the model next time, three problems appear:

- **High context cost**: raw conversations, logs, and files often contain greetings, repetition, and irrelevant details. Pasting them directly wastes tokens.
- **Unstable retrieval quality**: long unprocessed text lacks clear topics and structure, so it can recall fragments that look relevant but do not help.
- **Poor long-term consistency**: user preferences, locations, and states change. Raw text alone cannot clearly express "what should be trusted now".

The goal of memory production is to turn "what happened" into "what can be used later".

## 3. Key Processing Stages

From raw input to usable memory, the middle step is not simple text saving. It is an ongoing process of organizing information into long-term usable understanding.
It usually includes:

| Stage | Role |
| --- | --- |
| Extraction | Identify information worth keeping long term from raw conversations, events, or documents |
| Structuring | Organize information into distinguishable memory categories such as facts, preferences, tools, skills, and knowledge |
| Denoising | Filter greetings, repeated content, temporary context, and low-value logs |
| Merging | Merge identical or similar information to reduce duplicate memories |
| Evolution | Update currently trusted memories when user state or preferences change, while keeping necessary history |

## 4. Example: From Conversation to Memory

Raw conversation:

```text
User: I have booked a summer trip to Guangzhou. Which hotel chains are available?
Assistant: You can consider 7 Days Inn, Ji Hotel, Hilton, and others.
User: I'll choose 7 Days Inn.
Assistant: Got it. Feel free to ask if you have other questions.
```

Possible generated memories:

- Fact memories come from explicit statements in the original text.
- Preference memories combine context and reasoning to summarize user preferences.

```text
- Fact memory: The user plans to travel to Guangzhou during the summer vacation.
- Fact memory: The user chose 7 Days Inn from the accommodation options in Guangzhou.
- Implicit preference: The user may prefer economical and practical hotels.
```

## 5. Next Steps

- [Raw Input Content](/memos_cloud/introduction/raw_inputs): Confirm what content can be written
- [Memory Categories](/memos_cloud/introduction/memory_types): Understand facts, preferences, tools, skills, and other memory categories
- [Add Message](/memos_cloud/mem_operations/add_message): View how message writing and memory production are integrated

---

# Memory Scheduling (/memos_cloud/introduction/mem_schedule)

## 1. What Is Memory Scheduling

Memory scheduling is MemOS's runtime ability to manage memory availability. It is not just about "which memory is found".
Based on the current task, user state, historical topics, and memory heat, it decides which memories should stay closer to model context and which can remain in low-frequency storage.

You can think of memory scheduling as attention management in the memory system. When a user enters a task scenario, the system prepares the memories most likely to be needed and reduces interference from irrelevant history.

## 2. Why Scheduling Is Needed

If every request only performs a full retrieval, the system faces three problems:

- **Slower response**: waiting until the user asks before searching all history increases first-token latency.
- **Context overload**: recalling too much history may bury the information actually needed for the current task.
- **Unnatural topic switching**: when the user's recent focus changes, the system needs to raise the priority of the new topic and downgrade the old one.

The goal of scheduling is not to store more content, but to make the right memories available at the right time.

## 3. What Scheduling Looks At

| Signal | Role |
| --- | --- |
| Current task | Determines which topic or scenario the user is working on |
| Memory relevance | Identifies which memories are closer to the current input, conversation, and business goal |
| Freshness | Prioritizes information that is still valid and reduces the impact of outdated content |
| Usage frequency | Frequently used memories are more likely to be prepared in advance or kept active |
| Permission scope | Ensures scheduling respects user, Agent, tenant, and business isolation rules |

Scheduling affects later recall and context injection. Relevant, active, and trustworthy memories are more likely to be used first; low-frequency, outdated, or context-inappropriate memories are delayed.

## 4. Example: From Buying a Home to Renovation

**Earlier stage: buying a home is the core topic**

**User input**:

- "Help me check the average second-hand home price around Binjiang."
- "Remind me to view houses on Saturday."
- "Record the latest mortgage rate changes."

**Scheduling result**:

- Generate memories about communities, house-viewing schedules, and mortgage rates.
- Determine that "home buying" is a recent high-frequency topic.
- Keep home-buying memories at a higher priority.

**Recently: renovation becomes the new active topic**

**User input**:

- "I'm going to look at tiles this weekend."
- "Remind me to confirm plumbing and electrical work with the contractor."
- "Note next week's furniture delivery time."

**Scheduling result**:

- Continue generating renovation-related memories.
- Determine that "renovation" has become the new high-frequency topic.
- Move renovation memories to a higher priority.
- Keep home-buying memories, but gradually downgrade them.

The user casually says: "I feel like a lot of things are piling up. Please sort them out for me."

**Without scheduling: temporary full retrieval**:

- Needs to retrieve from all memories on the spot.
- May mix in low-relevance items such as checking housing prices, viewing houses, grocery shopping, or watching movies.
- The answer is slower and more likely to drift away from the current task.

**With scheduling: prepare the current topic first**:

- Prioritize renovation memories such as looking at tiles, confirming plumbing and electrical work, and furniture delivery.
- No need to re-evaluate the full history every time.
- Responses are faster and closer to what the user is currently worried about.

## 5. Relationship with Recall

Scheduling and recall are not the same thing:

| Capability | Focus |
| --- | --- |
| Memory scheduling | Which memories should be more active and closer to model context at the current stage |
| Memory recall | Which memories should be retrieved and used in a specific request |

Scheduling is runtime preparation and priority management. Recall is retrieval and selection for one request. When scheduling works well, recall is usually faster, more accurate, and less affected by irrelevant history.

---

# Memory Recall (/memos_cloud/introduction/mem_recall)

## 1. What Is Memory Recall

Memory recall is the core ability of MemOS when reading memories. After a user sends a new request, MemOS combines the input, conversation context, filters, and memory states to find memories that can help the model complete the current task.

The goal is not to return all history. It is to put the most useful facts, preferences, tool experience, skill clues, or knowledge content into the limited context window.

## 2. Why Context Alone Is Not Enough

If a model only relies on the current conversation window, three problems appear:

- **Users must repeat themselves**: preferences, background, and long-term matters cannot continue naturally.
- **Historical information is easily lost**: earlier conversations, cross-session behavior, and tool results do not automatically appear in the current context.
- **Input becomes overloaded**: adding all raw history is costly and makes answers less stable.

The value of memory recall is to turn long-term memories into usable input for the current task when needed.

## 3. What Recall Returns

| Memory category | Typical use |
| --- | --- |
| Fact memories | Add clear facts such as user identity, long-term matters, and business status |
| Preference memories | Continue the user's tone, style, choice habits, or constraints |
| Tool memories | Help Agents select the right tool and invocation pattern in similar tasks |
| Skill memories | Reuse execution steps and constraints distilled from multi-turn tasks |
| Knowledge content | Provide documents, images, multimodal content, or knowledge base evidence |

Recall results usually include source, time, type, tags, confidence, and status. Developers can further filter, rank, or decide whether to inject them into downstream models.

## 4. Key Stages in Recall

| Stage | Role |
| --- | --- |
| Understand the request | Decide what background, preferences, or knowledge the current input needs |
| Filter the scope | Limit candidate memories by user, conversation, time, tags, type, and other conditions |
| Retrieve candidates | Find semantically relevant or condition-matched memories from the memory base |
| Rank and select | Combine relevance, confidence, freshness, and status to choose more reliable results |
| Govern injection | Control which memories enter model context, avoiding excessive, outdated, or non-compliant content |

Together, these stages determine whether recalled memories are actually useful. Too little recall leaves the model without background; too much recall adds noise and cost.

## 5. Next Steps

- [Memory Filters](/memos_cloud/features/filters): Use filters to control recall scope and reduce irrelevant memories
- [Search Memory](/memos_cloud/mem_operations/search_memory): View how to integrate memory recall

---

# Memory Lifecycle Management (/memos_cloud/introduction/mem_lifecycle)

## 1. What Is Memory Lifecycle Management

A memory is not a record that stays unchanged forever after it is written. As user states, task stages, and business facts change, the same kind of memory may need to be updated, merged, downgraded, archived, or cleaned up.

Memory lifecycle management focuses on long-term evolution at the storage layer: which memories are still trustworthy, which are outdated, which need historical traces, and which can be removed from the active index.

## 2. Why Lifecycle Management Is Needed

If memories only grow without governance, three problems appear:

- **Duplicate memories increase**: similar facts and preferences are written repeatedly, increasing recall noise.
- **Outdated information interferes**: old addresses, old preferences, and completed tasks are still treated as current facts.
- **Conflicts are hard to judge**: when new and old information conflict, the system needs to know which one should be trusted now.

The goal of lifecycle management is to preserve memories long term without continuously bringing outdated, duplicate, or low-value information into reasoning.

## 3. Key Lifecycle States

| State | Meaning | Common handling |
| --- | --- | --- |
| Generated | A new memory is written with source, time, type, confidence, and other metadata | Enters the memory base and waits for future use |
| Activated | A memory is frequently used in recent tasks or scheduled to a higher priority | Easier to recall and inject into context |
| Merged | Similar or complementary memories are integrated into a more stable expression | Reduces duplication and forms an updated trusted memory |
| Archived | Long-term low-frequency memories, or memories not suitable for the current task, are downgraded | Participates less in reasoning by default, but remains traceable |
| Expired | A memory exceeds its validity period or is judged unusable by policy | Removed from the active index; minimal audit info may be retained |
| Frozen | Compliance, audit, or business-critical memories cannot be automatically rewritten | Preserves full history and limits updates or deletion |

Specific state names and policies may vary by integration, but the core idea is the same: memories need continuous governance based on time and usage.

## 4. Example: Changing Learning Preferences

Suppose you are building an online education assistant with MemOS to help students solve math problems.

#### Generated

The student says for the first time: "I always confuse quadratic functions with linear functions."

The system extracts a memory:

```json
{
  "content": "The student often confuses quadratic functions with linear functions",
  "confidence": 0.99,
  "create_time": "2025-09-11"
}
```

State: `Generated`

Behavior: stored in the memory base, waiting for future use.
#### Activated

In the next several problem-solving sessions, the system frequently uses this memory to assist with answers.

State: `Activated`

Behavior: prioritized by the scheduling mechanism to improve later retrieval and context injection efficiency.

#### Merged

With more interactions, the system discovers that the student not only confuses linear and quadratic functions, but also struggles with exponential functions. The system merges multiple similar memories into a new one:

```json
{
  "content": "The student is confused about function concepts, especially linear, quadratic, and exponential functions",
  "history": "The student often confuses quadratic functions with linear functions",
  "version": "v1"
}
```

State: `Merged`

Behavior: old entries are compressed into a more complete new version, reducing redundancy.

#### Archived

Three months later, the student has mastered function-related concepts, and this memory has not been scheduled for a long time.

State: `Archived`

Behavior: downgraded to a low-frequency state. It does not participate in reasoning by default, but can be used in "learning trajectory backtracking".

#### Expired

Another year later, the student advances to a new grade. The old "junior high function confusion" memory is judged invalid by policy.

State: `Expired`

Behavior: removed from the index, retaining only minimal audit information.

```json
{
  "deleted_fact_id": "12345",
  "deleted_at": "2026-09-11"
}
```

#### Frozen (special state)

At the same time, the student's "final exam evaluation report" is a compliance file and must not be automatically modified.

State: `Frozen`

Behavior: locked against automatic updates, with full modification history retained for audit and compliance review.

## 5. Relationship with Production, Scheduling, and Recall

| Capability | Focus |
| --- | --- |
| Memory production | How raw input is processed into usable memories |
| Lifecycle management | How memories are updated, merged, archived, and cleaned after writing |
| Memory scheduling | Which memories should be more active at the current stage |
| Memory recall | Which memories should be retrieved and used for one request |

Lifecycle management provides memory state for scheduling and recall. Active, trustworthy, and unexpired memories are better suited for current reasoning. Archived, expired, or frozen memories need to be handled carefully according to policy.

---

# MemOS Algorithm Overview (/memos_cloud/introduction/algorithm)

> **Note**: Reference: Paper link https://arxiv.org/abs/2507.03724

## 1. What is MemOS?

Today’s large language models (LLMs) have demonstrated strong generative and reasoning capabilities, but they generally lack true “memory.”

* In multi-turn conversations, they often forget earlier information;
* In application scenarios, they fail to retain users’ personalized preferences;
* During knowledge iteration, they update slowly and cannot flexibly adapt to new requirements.

This makes LLMs “smart,” but not yet capable of becoming true **teachers, colleagues, or assistants**.

**MemOS (Memory Operating System)** was created to address this fundamental gap. It elevates “memory” from a fragmented function to a **system resource** as important as computation, providing LLMs with:

* **A unified memory layer**: supporting long-term knowledge retention and context management beyond single conversations;
* **Persistence and structuring**: enabling memory to be stored, traced, and reused;
* **Memory-augmented reasoning**: recalling historical experiences and preferences during inference to generate answers better aligned with user needs.
Compared with traditional approaches (e.g., purely relying on parameter memory or temporary KV cache), the value of MemOS lies in: * Allowing AI to continuously evolve and learn instead of “forgetting after seeing”; * Not only answering the present question, but also improving future performance through accumulated knowledge; * Providing developers with unified APIs that turn “memory” from complex self-built logic into standardized capabilities. In short, MemOS aims to: **Transform large models from disposable dialogue tools into intelligent agents with true long-term memory and adaptive capabilities.** ## 2. MemOS Architecture Design At its core, MemOS treats “memory” as an independent system layer—like computation and storage—becoming a fundamental capability for AI applications. Its overall architecture can be summarized as a **three-layer structure**: **API & Application Interface Layer, Memory Scheduling & Management Layer, Memory Storage & Infrastructure Layer** ![art.gif](https://statics.memtensor.com.cn/memos/art.gif) * In the **API & Application Interface Layer**, MemOS provides standardized Memory APIs. Developers can perform operations such as **memory creation, deletion, and updating** through simple interfaces, giving large models persistent memory capabilities for multi-turn conversations, long-term tasks, and cross-session personalization. > Here, the `API layer` refers to standardized interface design within the framework, illustrating system principles and capability boundaries. **This is different from cloud service developer APIs** (e.g., simplified wrappers like `add`, `search`), which serve as unified endpoints abstracted from MemOS backend capabilities. * In the Memory Scheduling & Management Layer, MemOS introduces a new paradigm of **Memory Scheduling**, enabling context-based **“Next-Scene Prediction”**. 
This lets MemOS preload, during generation, the memory fragments that are likely to be needed next, which reduces response latency and improves inference efficiency. * In the **Memory Storage & Infrastructure Layer**, MemOS integrates plaintext memory, activating memory, and parameter memory through the standardized **MemCube**. It supports multiple persistent storage backends, including graph databases and vector databases, and enables **cross-model memory transfer and reuse**.
Basic structure of standardized MemCube (Memory Cube)
## 3. Why is MemOS Efficient? > **Note**: From Next-Token Prediction to Next-Scene Prediction * In traditional LLM Q&A systems, the generation process still follows the **synchronous Next-Token mechanism**: the model receives the user’s query → retrieves external fragments in real time → generates the answer token by token. * Any pauses caused by retrieval or computation directly extend the reasoning chain, tightly coupling knowledge injection and generation. This leads to GPU idle waiting and noticeable response delays for users. * Unlike this traditional paradigm, MemOS approaches from the perspective of memory modeling and introduces a **memory scheduling paradigm**. By designing an asynchronous scheduling framework, it predicts memory information likely to be needed, significantly reducing efficiency loss during real-time generation. * MemOS jointly schedules the three core memory types in MemCube (parameter memory, activating memory, plaintext memory) along with external knowledge bases (including internet retrieval and massive local knowledge). * With precise awareness of conversation turns and time gaps, the system intelligently predicts which memory elements may be needed in the next scene. It dynamically routes and preloads the required plaintext, parameter, and activating memories, ensuring immediate hits during generation and maximizing efficiency and fluency of reasoning. ![640.gif](https://statics.memtensor.com.cn/memos/ani.gif)
Core idea of memory scheduling
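The difference between fetching memory synchronously and preloading it asynchronously can be illustrated with a small toy sketch. This is not the MemOS scheduler and uses no MemOS API: `MEMORY_STORE`, `fetch_memory`, `predict_next_topics`, and `PreloadingScheduler` are hypothetical names invented for illustration, and the "next-scene" predictor is a fixed keyword lookup. It only shows the general idea that a fetch started in the background during one turn can already be finished when the next turn asks for it.

```python
import asyncio

# Hypothetical in-memory stand-in for a slow memory store (not a MemOS API).
MEMORY_STORE = {
    "travel": "The user booked a summer trip to Guangzhou.",
    "hotel": "The user chose 7 Days Inn.",
}


async def fetch_memory(topic):
    """Simulate one retrieval round trip with a short delay."""
    await asyncio.sleep(0.05)
    return MEMORY_STORE.get(topic, "")


def predict_next_topics(query):
    """Toy next-scene predictor: a fixed rule, assumed for illustration."""
    return ["hotel"] if "trip" in query else []


class PreloadingScheduler:
    """Starts fetches in the background and hands results out later."""

    def __init__(self):
        self._pending = {}  # topic -> asyncio.Task

    def preload(self, topics):
        # Must be called while an event loop is running.
        for topic in topics:
            if topic not in self._pending:
                self._pending[topic] = asyncio.create_task(fetch_memory(topic))

    async def get(self, topic):
        """Return (memory, was_preloaded)."""
        task = self._pending.pop(topic, None)
        if task is None:
            # Nothing was preloaded: fall back to a blocking fetch.
            return await fetch_memory(topic), False
        return await task, True


async def demo():
    scheduler = PreloadingScheduler()
    # Turn 1: fetch what the current query needs, then predict the next scene.
    first = await fetch_memory("travel")
    scheduler.preload(predict_next_topics("plan my trip"))
    # Stands in for the time spent generating the answer to turn 1.
    await asyncio.sleep(0.1)
    # Turn 2: the predicted memory was fetched while turn 1 was generating.
    second, preloaded = await scheduler.get("hotel")
    return first, second, preloaded


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch a wrong prediction costs only one unused background fetch, while a correct prediction removes the retrieval wait from the next turn. How MemOS actually predicts and routes memories is described in the paper referenced at the top of this page.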
## 4. MemOS-Preview Performance Evaluation ### 4.1 LoCoMo Memory Benchmark * To systematically validate MemOS in real application scenarios, the MemOS team conducted comprehensive evaluations using the **LoCoMo dataset**. * As an industry-recognized benchmark for memory management, LoCoMo has been adopted by multiple mainstream frameworks to test models’ memory access and multi-turn dialogue consistency. * Public results show that **MemOS achieves significant improvements in both accuracy and computational efficiency**. Compared with OpenAI’s global memory approach, it demonstrates superior performance on key metrics, further verifying its technological leadership in memory scheduling, management, and reasoning integration. ![image.png](https://cdn.memtensor.com.cn/img/1758687655761_blkqnr_compressed.png) ### 4.2 KV Cache Memory Evaluation * Beyond general memory ability assessments, the research team specifically examined the effectiveness of MemOS’s KV Cache mechanism in accelerating inference. * Tests were conducted across different context lengths (Short/Medium/Long) and model sizes (8B/32B/72B), systematically evaluating cache build time (Build), **Time-To-First-Token (TTFT), and overall Speedup**. * Results (see Figure 10) show that **MemOS significantly optimizes KV Cache build and reuse efficiency across configurations**, making inference more efficient and smooth. This reduces user waiting latency and achieves substantial performance acceleration in large-scale model scenarios. ![image.png](https://cdn.memtensor.com.cn/img/1758687596553_iptom0_compressed.png) ## 5. Next Steps * Learn about [Cloud Platform & Open Source](/memos_cloud/getting_started/cloud_and_opensource) and experience the power of MemOS! --- # Add Message (/memos_cloud/mem_operations/add_message) > **Note**: **Why memory matters** > > - Long-term continuity: preserve information across sessions so it is not lost after one conversation ends. 
> - Better understanding of user preferences: as interactions accumulate, AI can understand the user more accurately. > - Continuous evolution over time: user memories can be updated dynamically during conversations. > - Cross-product experience: share the same user's memories across multiple applications or products for a consistent experience. ## 1. Key Parameters - **User ID (`user_id`)**: identifies which user the messages belong to. Every added message must be associated with a unique user identifier. - **Conversation ID (`conversation_id`)**: identifies which conversation the messages belong to. Every added message must be associated with a unique conversation identifier. - **Messages (`messages`)**: an ordered list of user and AI messages to add to MemOS. ## 2. How It Works - **Information extraction**: MemOS uses LLMs internally to extract facts, preferences, and other information from messages, then processes them into memories such as fact memories, preference memories, and tool memories. - **Conflict resolution**: existing memories are checked for duplication or contradiction and updated when needed. - **Memory storage**: generated memories are stored with vector and graph databases so they can be recalled efficiently later. All of these steps are triggered by calling the `add/message` API. You do not need to manually operate on user memories. ## 3. Quick Start ```python [Python (HTTP)] import requests API_KEY = "YOUR_API_KEY" BASE_URL = "https://memos.memtensor.cn/api/openmem/v1" data = { "user_id": "memos_user_123", "conversation_id": "0610", "messages": [ {"role": "user", "content": "I have booked a summer trip to Guangzhou. Which hotel chains are available?"}, {"role": "assistant", "content": "You can consider 7 Days Inn, Ji Hotel, Hilton, and others."}, {"role": "user", "content": "I choose 7 Days Inn."}, {"role": "assistant", "content": "Got it. 
Feel free to ask if you have other questions."} ] } res = requests.post( f"{BASE_URL}/add/message", headers={"Authorization": f"Token {API_KEY}"}, json=data ) print(res.json()) ``` ```python [Python (SDK)] from memos.api.client import MemOSClient client = MemOSClient(api_key="YOUR_API_KEY") messages = [ {"role": "user", "content": "I have booked a summer trip to Guangzhou. Which hotel chains are available?"}, {"role": "assistant", "content": "You can consider 7 Days Inn, Ji Hotel, Hilton, and others."}, {"role": "user", "content": "I choose 7 Days Inn."}, {"role": "assistant", "content": "Got it. Feel free to ask if you have other questions."} ] res = client.add_message( messages=messages, user_id="memos_user_123", conversation_id="0610" ) print(res) ``` ```bash [Curl] export MEMOS_API_KEY="YOUR_API_KEY" export MEMOS_BASE_URL="https://memos.memtensor.cn/api/openmem/v1" curl "$MEMOS_BASE_URL/add/message" \ -H "Authorization: Token $MEMOS_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "user_id": "memos_user_123", "conversation_id": "0610", "messages": [ {"role": "user", "content": "I have booked a summer trip to Guangzhou. Which hotel chains are available?"}, {"role": "assistant", "content": "You can consider 7 Days Inn, Ji Hotel, Hilton, and others."}, {"role": "user", "content": "I choose 7 Days Inn."}, {"role": "assistant", "content": "Got it. Feel free to ask if you have other questions."} ] }' ``` > **Note**: Want to know which memories were generated? Copy and run the code above, then continue to [Search Memory](/memos_cloud/mem_operations/search_memory). Need the complete field list, request format, and response format? See the [Add Message API documentation](/api_docs/core/add_message). ## 4. When Should You Add Messages? Raw message content is the foundation of memory. MemOS processes added messages into memories for later retrieval and use. 
You can choose the right timing based on your scenario: - **One-time import**: import existing user conversation history into MemOS to quickly build initial memories. - **Real-time add**: add messages to MemOS whenever the user sends a message. - **Add by turns**: add user messages every few conversation turns based on your business needs. ## 5. More Usage The fields below are used to add time, categories, isolation, and business context when adding messages. You can use them separately or combine them as needed. ### `chat_time`: specify when the conversation happened By default, MemOS uses the Beijing time when a message is submitted as the memory time. If you are importing historical conversations in bulk, pass `chat_time` for each message so generated memories keep a more accurate timeline. ```python data = { "user_id": "memos_user_123", "conversation_id": "0930", "messages": [ {"role": "user", "content": "I like spicy food.", "chat_time": "2025-09-12 08:00:00"}, {"role": "assistant", "content": "I have noted that you like spicy flavors.", "chat_time": "2025-09-12 08:01:00"}, {"role": "user", "content": "I do not like heavy oil.", "chat_time": "2025-09-25 12:00:00"}, {"role": "assistant", "content": "Got it. You prefer spicy food with a lighter taste.", "chat_time": "2025-09-25 12:01:00"} ] } ``` ### `content`: write user preferences or behavior data Besides conversations, user preferences, behavior data, questionnaire information, and similar content can also be written to MemOS through `content`. 
```python data = { "user_id": "memos_user_123", "conversation_id": "0901", "messages": [ { "role": "user", "content": """ Favorite movie genres: sci-fi, action, comedy Favorite TV genres: mystery, historical drama Favorite book genres: popular science, technology, self-growth Preferred chat style: humorous, warm, casual Types of AI help wanted: suggestions, information lookup, inspiration Topics I care about most: artificial intelligence, future technology, film reviews What I want AI to help with: daily study planning, movie and book recommendations, emotional companionship """ } ] } ``` ### `agent_id`: isolate memories by Agent When adding messages, pass `agent_id` to identify which Agent the current conversation belongs to. This helps distinguish memories produced by the same user under different Agents. ```python data = { "user_id": "memos_user_123", "conversation_id": "0610", "agent_id": "health_assistant", "messages": [ {"role": "user", "content": "I ran 5 kilometers today and my knee feels a bit sore."}, {"role": "assistant", "content": "I suggest lowering the intensity tomorrow."} ] } ``` > **Note**: During later retrieval, you can pass `"agent_id":"health_assistant"` in the `filter` parameter to retrieve memories from this user's conversations with that assistant. See [Memory Filters](/memos_cloud/features/filters). ### `tags`: classify memories semantically MemOS automatically generates tags for every memory. If your business already has a tag system, you can also pass custom `tags` when adding messages, so memories better match your business classification. See [Custom Tags](/memos_cloud/features/custom_tags). 
```python data = { "user_id": "memos_user_123", "conversation_id": "0610", "tags": ["exercise advice", "fitness planning"], "messages": [ {"role": "user", "content": "I ran 5 kilometers today and my knee feels a bit sore."}, {"role": "assistant", "content": "I suggest lowering the intensity tomorrow."} ] } ``` > **Note**: During later retrieval, you can pass `"tags":"exercise advice"` in the `filter` parameter to retrieve user memories around that tag. See [Memory Filters](/memos_cloud/features/filters). ### `info`: pass custom information When adding messages, include `info` to write structured information such as business scenario, source, or status. It can later be used for precise filtering during retrieval. Common fields include: | Field | Use | | --- | --- | | `business_type` | Business type | | `biz_id` | Unique business identifier | | `scene` | Business or conversation scenario | | `custom_status` | Custom status | You can also pass other custom key-value pairs. All fields can be stored and retrieved normally. ```python data = { "user_id": "memos_user_123", "conversation_id": "0610", "messages": [ {"role": "user", "content": "Help me find flights with suitable times."}, {"role": "assistant", "content": "I found several flights from Beijing to Shanghai."} ], "info": { "scene": "flight" } } ``` > **Note**: During later retrieval, you can pass `"scene":"flight"` in the `filter` parameter to retrieve user memories around that scenario. See [Memory Filters](/memos_cloud/features/filters). ## 6. Common Errors and Troubleshooting | Error Code | Common Cause | How to Fix | | --- | --- | --- | | `40000` | The request JSON structure is invalid, or a field type is incorrect | Check whether `messages` is an array, and whether `role` / `content` are inside each message object | | `40002` | A required field is empty | Check that `user_id`, `conversation_id`, and `messages` are all provided and non-empty | | `40011` | `conversation_id` is too long | Use a short ID. 
Do not put full conversations, user input, or JSON into `conversation_id` | | `40013` | Total `messages` length exceeds the limit | Split historical conversations and write them in multiple requests | | `40305` | A single request exceeds the token limit | Shorten the content in one write request and keep the key user facts and preferences first | | `40309` | Token usage exceeds the per-time-window limit | Lower concurrency and bulk import speed, then retry in batches | | `50143` / `50144` | Memory or message writing failed | Check the request content and retry later. If it persists, contact support | ## 7. More Features If you need more complex write methods, continue with these extended capabilities. - [Multimodal Messages](/memos_cloud/features/multimodal): Support text, images, documents, and other input content. - [Tool Memory](/memos_cloud/features/tool_calling): Write tool call processes and results into user memory. - [Async Mode](/memos_cloud/features/async_mode): Control how messages are processed after writing, suitable for different latency requirements. - [Knowledge Base Memories](/memos_cloud/features/knowledge_base): Write memories generated from messages into a specified knowledge base. --- # Search Memory (/memos_cloud/mem_operations/search_memory) ## 1. What Is Memory Retrieval? Memory retrieval means that when a user asks a question, MemOS recalls the most relevant and important memories from the memory store, combined with filters predefined by developers. The model can then refer to these memories when generating an answer, making the response more accurate, contextual, and aligned with the user. > **Note**: **Why memory retrieval is needed** > > - Get correct and reliable memories directly instead of rebuilding context from scratch. > - Use filters and other controls to keep recalled memories highly relevant to the current question. ## 2. Key Parameters - **Query (`query`)**: the user's question or statement used for retrieval. 
MemOS uses semantic matching to find related memories. - **Memory filter (`filter`)**: JSON-based logical conditions used to filter fields such as `agent_id`, `create_time`, `tags`, and `info`, narrowing the retrieval scope. You can also set separate filters for user memories, public memories, and knowledge base memories. - **Relevance threshold (`relativity`)**: controls how semantically relevant a recalled memory must be. The current default threshold is `0.45`; memories below this value are filtered out. ## 3. How It Works - **Query rewriting**: MemOS cleans and semantically enhances the natural-language query, supplementing key information and retrieval intent to improve retrieval accuracy. - **Memory recall**: the system retrieves candidate memories from available memory sources. - **Hybrid retrieval and ranking**: based on the rewritten query, the system generates embeddings and combines keyword retrieval with vector semantic retrieval, then ranks candidate memories by relevance. - **Memory filtering and screening**: structured filters and comparison operators narrow the retrieval scope; the configured relevance threshold controls result quality. - **Result deduplication**: candidate memories are deduplicated and semantically aggregated across sources. - **Memory output**: final results are returned according to the configured memory limit, usually within 600 ms, for later reasoning and answer generation. All of these steps are triggered by calling the `search/memory` API. You do not need to manually operate on user memories. ## 4. Quick Start ```python [Python (HTTP)] import requests API_KEY = "YOUR_API_KEY" BASE_URL = "https://memos.memtensor.cn/api/openmem/v1" data = { "query": "I want to travel during the National Day holiday. 
Please recommend a city I have not been to and a hotel brand I have not stayed at.", "user_id": "memos_user_123", "conversation_id": "0928" } res = requests.post( f"{BASE_URL}/search/memory", headers={"Authorization": f"Token {API_KEY}"}, json=data ) print(res.json()) ``` ```python [Python (SDK)] from memos.api.client import MemOSClient client = MemOSClient(api_key="YOUR_API_KEY") res = client.search_memory( query="I want to travel during the National Day holiday. Please recommend a city I have not been to and a hotel brand I have not stayed at.", user_id="memos_user_123", conversation_id="0928" ) print(res) ``` ```bash [Curl] curl --request POST \ --url https://memos.memtensor.cn/api/openmem/v1/search/memory \ --header 'Authorization: Token YOUR_API_KEY' \ --header 'Content-Type: application/json' \ --data '{ "query": "I want to travel during the National Day holiday. Please recommend a city I have not been to and a hotel brand I have not stayed at.", "user_id": "memos_user_123", "conversation_id": "0928" }' ``` > **Note**: `user_id` is required. Each memory retrieval request currently targets a single user. Need the complete field list, request format, and response format? See the [Search Memory API documentation](/api_docs/core/search_memory). ## 5. Prompt Template with Memories Recalled memories can be added directly to the prompt of your AI application. The following template is a practical reference. **Expand the full prompt template** ```text # Role You are an intelligent assistant with long-term memory (MemOS Assistant). Your goal is to combine retrieved memory fragments to provide highly personalized, accurate, and logically rigorous answers. # System Context - Current time: 2026-01-06 15:05 (use this as the baseline for judging memory freshness) # Memory Data The following information was retrieved by MemOS and is divided into facts and preferences. - **Facts**: May include user attributes, historical conversations, or third-party information. 
- **Important**: Content marked as '[assistant view]' or '[model summary]' represents past AI inference, not the user's original words. - **Preferences**: Explicit or implicit requirements for response style, format, or reasoning. -[2025-12-26 21:45] The user plans to travel to Guangzhou during the summer vacation and chose 7 Days Inn as the accommodation option. -[2025-12-26 14:26] The user's name is Grace. -[2026-01-04 20:41] [Explicit Preference] The user likes traveling to southern regions. -[2025-12-26 21:45] [Implicit Preference] The user may prefer cost-effective hotel options. # Critical Protocol: Memory Safety Retrieved memories may contain AI inferences, irrelevant noise, or incorrect subjects. You must apply the following four checks. If a memory fails any check, discard it. 1. Source verification: - Distinguish the user's original words from AI inference. - If a memory is marked as '[assistant view]', treat it as a past hypothesis, not an absolute user fact. - Example: if a memory says '[assistant view] the user loves mangoes' but the user never said so, do not assume the user likes mangoes. - Principle: AI summaries are only references and have much lower authority than direct user statements. 2. Attribution check: - Is the subject of the memory definitely the user? - If the memory describes a third party, candidate, fictional role, or case data, never attribute those traits to the user. 3. Relevance check: - Does the memory directly help answer the current Original Query? - If it is only a keyword match with a different context, ignore it. 4. Freshness check: - Does the memory conflict with the user's latest intent? Treat the current Original Query as the highest-priority source of truth. # Instructions 1. Review first, apply the four checks, and remove noise and unreliable AI views. 2. Use only validated memories as background context. 3. Follow the style requirements in the Preferences listed above. 4. Answer directly. Do not mention "memory store," "retrieval," or "AI views."
# Original Query I want to travel during the National Day holiday. Please recommend a city I have not been to and a hotel brand I have not stayed at. ``` ## 6. More Usage ### `conversation_id`: prioritize memories from the current conversation When searching memories, you can pass a specific `conversation_id`. MemOS prioritizes memories related to the current conversation. If you omit it, MemOS searches the user's long-term memories globally, which is suitable when you need the user's overall profile. ```python data = { "user_id": "memos_user_123", "query": "Help me continue planning my National Day trip.", "conversation_id": "0928" } ``` ### `filter`: precisely narrow the retrieval scope MemOS supports `filter` to narrow retrieval by tags, time, business fields, and other conditions. You can also set separate filters for user memories, knowledge base memories, and public memories. Example 1: retrieve all conversation memories related to reading in 2025. ```python data = { "user_id": "memos_user_123", "query": "Summarize my reading-related points this year.", "filter": { "and": [ {"tags": {"contains": "reading"}}, {"create_time": {"gte": "2025-01-01"}}, {"create_time": {"lte": "2025-12-31"}}, {"scene": "chat"}, ], }, } ``` Example 2: filter knowledge base, user, and public memories separately. ```python data = { "user_id": "memos_user_123", "query": "Combine knowledge base policies, my conversation records, and project announcements to summarize compliance points.", "knowledgebase_ids": ["kb_xxx"], "filter": { "knowledgebase": { "and": [ {"tags": {"contains": "policy"}}, {"create_time": {"gte": "2025-01-01"}}, {"create_time": {"lte": "2025-12-31"}}, ] }, "user": { "and": [ {"agent_id": "compliance_assistant"}, {"scene": "chat"}, {"create_time": {"gte": "2025-06-01"}}, ] }, "public": { "and": [ {"tags": {"contains": "announcement"}}, ] }, }, } ``` > **Note**: For more filter conditions and nested syntax, see [Memory Filters](/memos_cloud/features/filters). 
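When filters are assembled from user input or business state, it can help to build the `filter` object programmatically rather than writing the nested dictionaries by hand. The sketch below is one possible approach, not part of the MemOS SDK: `date_range` and `build_search_payload` are hypothetical helper names. It uses only the operators shown in the examples above (`and`, `contains`, `gte`, `lte`) and reproduces the request body from Example 1.

```python
import json


def date_range(field, start, end):
    """Return two clauses that bound `field` to the inclusive range [start, end]."""
    return [{field: {"gte": start}}, {field: {"lte": end}}]


def build_search_payload(user_id, query, tag=None, start=None, end=None, **info):
    """Assemble a search request body; helper name and signature are illustrative."""
    clauses = []
    if tag:
        clauses.append({"tags": {"contains": tag}})
    if start and end:
        clauses.extend(date_range("create_time", start, end))
    # Remaining keyword arguments become exact-match clauses, e.g. scene="chat".
    for key, value in info.items():
        clauses.append({key: value})

    payload = {"user_id": user_id, "query": query}
    if clauses:
        # Omit `filter` entirely when no condition was requested.
        payload["filter"] = {"and": clauses}
    return payload


payload = build_search_payload(
    "memos_user_123",
    "Summarize my reading-related points this year.",
    tag="reading",
    start="2025-01-01",
    end="2025-12-31",
    scene="chat",
)
print(json.dumps(payload, indent=2))
```

The resulting `payload` can be sent to `search/memory` in the same way as the `data` dictionary in the Quick Start. Leaving out `filter` when no conditions are given keeps the request equivalent to an unfiltered search.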
### `relativity` / `memory_limit_number`: control recall quality and quantity Pass `relativity` to raise the relevance threshold. Pass `memory_limit_number` to limit the number of returned memories and reduce the token cost of later prompt injection. ```python data = { "user_id": "memos_user_123", "query": "Plan a 5-day trip to Chengdu for me.", "relativity": 0.8, "memory_limit_number": 9 } ``` ## 7. Common Errors and Troubleshooting | Error Code | Common Cause | How to Fix | | --- | --- | --- | | `40000` | The request body structure is invalid, or a field type is incorrect | Check whether `query` is a string and whether `knowledgebase_ids` is a string array | | `40002` | A required field is missing | Check that both `user_id` and `query` are provided and non-empty | | `40011` | `conversation_id` is too long | Use a short ID. Do not put the full question or chat history into `conversation_id` | | `40012` | `relativity` is invalid | Pass a number between 0 and 1 | | `40305` | A single request exceeds the token limit | Shorten `query`; do not put long documents directly into the search query | | `50123` | The knowledge base is not associated with the current project | Go to [Project Configuration](/api_docs/start/configuration) and confirm the knowledge base is associated with the project that owns the API Key | | `50005` | Search service is temporarily unavailable | Retry later. If it persists, contact support | ## 8. More Features - [Recall Tool Memories](/memos_cloud/features/tool_calling): Add tool call information and recall tool memories. - [Recall Skills](/memos_cloud/features/self-evolving): Automatically generate skills and recall reusable Skill memories. - [Search Knowledge Bases](/memos_cloud/features/knowledge_base): Enable knowledge bases and specify which knowledge bases can be searched. --- # Delete Memory (/memos_cloud/mem_operations/delete_memory) ## 1. Choose a Deletion Method MemOS Cloud supports two deletion methods. 
Pass `memory_ids[]` to delete specific memories, or pass `user_id` to delete all memories for a user. Choose one based on your scenario. | Deletion method | Field | Use case | Notes | | :--- | :--- | :--- | :--- | | Delete specific memories | `memory_ids[]` | Delete one or more memories | The `memory_id` comes from the `id` field returned by `search/memory` or `get/memory` | | Delete user memories | `user_id` | Clear all memories for a user in the current project | This deletes all fact, preference, skill, tool, and other memories for that user | > **Warning**: Note > > - Deletion only applies within the project scope of the current API Key. Before deleting, make sure the API Key, project, and target memories belong to the same project. Otherwise, authentication may fail or the `memory_id` may not exist. > > - Do not pass `conversation_id`, `knowledgebase_id`, or other IDs. Deletion by those dimensions is not supported. ## 2. Delete Specific Memories Use specific-memory deletion when you need to remove a memory that was written incorrectly, is outdated, was assigned to the wrong user, or must be deleted at the user's request. ### 2.1 Find the Memory to Delete When you call [Search Memory](/memos_cloud/mem_operations/search_memory) or get memories and find a memory that should be deleted, copy the `id` of that memory. This value is the `memory_id` used for deletion. 
```json { "memory_detail_list": [ { "id": "e2a7c194-7062-4fa5-a6c0-bbe554d05d60", "memory_key": "User ice cream preference", "memory_value": "[user opinion] The user likes ice cream.", "memory_type": "WorkingMemory", "memory_time": null, "conversation_id": "0610", "status": "activated", "confidence": 0, "tags": [ "food", "preference", "ice cream" ], "update_time": 1761315278665, "relativity": 0.7524414 } ] } ``` ### 2.2 Call the Delete API ```python [Python (HTTP)] import requests API_KEY = "YOUR_API_KEY" BASE_URL = "https://memos.memtensor.cn/api/openmem/v1" data = { "memory_ids": ["6b23b583-f4c4-4a8f-b345-58d0c48fea04"] } res = requests.post( f"{BASE_URL}/delete/memory", headers={"Authorization": f"Token {API_KEY}"}, json=data ) print(res.json()) ``` ```python [Python (SDK)] from memos.api.client import MemOSClient client = MemOSClient(api_key="YOUR_API_KEY") res = client.delete_memory( memory_ids=["6b23b583-f4c4-4a8f-b345-58d0c48fea04"] ) print(res) ``` ```bash [Curl] curl --request POST \ --url https://memos.memtensor.cn/api/openmem/v1/delete/memory \ --header 'Authorization: Token YOUR_API_KEY' \ --header 'Content-Type: application/json' \ --data '{ "memory_ids": ["6b23b583-f4c4-4a8f-b345-58d0c48fea04"] }' ``` ### 2.3 Verify the Result The API returns `data.success: true` when the delete request succeeds. After deletion, call [Search Memory](/memos_cloud/mem_operations/search_memory) again to check whether the memory still appears. ```json { "code": 0, "data": { "success": true }, "message": "ok" } ``` ## 3. Delete All Memories for a User When you need to clear all memories for a user in the current project, pass `user_id`. > **Warning**: Before running this operation, make sure: > > - `user_id` is the end-user ID whose memories you want to clear. > - The API Key belongs to the correct project. > - You no longer need this user's fact, preference, skill, tool, or other memories in the current project. 
```python [Python (HTTP)] import requests API_KEY = "YOUR_API_KEY" BASE_URL = "https://memos.memtensor.cn/api/openmem/v1" data = { "user_id": "memos_user_123" } res = requests.post( f"{BASE_URL}/delete/memory", headers={"Authorization": f"Token {API_KEY}"}, json=data ) print(res.json()) ``` ```python [Python (SDK)] from memos.api.client import MemOSClient client = MemOSClient(api_key="YOUR_API_KEY") res = client.delete_memory(user_id="memos_user_123") print(res) ``` ```bash [Curl] curl --request POST \ --url https://memos.memtensor.cn/api/openmem/v1/delete/memory \ --header 'Authorization: Token YOUR_API_KEY' \ --header 'Content-Type: application/json' \ --data '{ "user_id": "memos_user_123" }' ``` After deleting all memories for a user, search again with the same `user_id` for that user's previous facts or preferences to confirm that old memories are no longer returned. ## 4. Delete Memories in the Console If you only need to delete one or a few memories temporarily, you can delete them directly in the console: 1. Log in to the [MemOS Console](https://memos-dashboard.openmem.net/en/memoryList) and make sure the current project is the one you want to operate on. 2. Go to "Memory List" and use the search box, subject ID, subject type, or time range to find the target memory. 3. Click "View Details" and check the memory ID and memory content. Make sure it is not a similar memory under the same user. 4. Click "Delete" and confirm in the second confirmation dialog. To delete multiple memories, select multiple records first, then click "Batch Delete". 5. Refresh the list after deletion, or search again with the same conditions to confirm that the memory no longer appears. ![Delete memories in the console](https://cdn.memtensor.com.cn/img/1781505894179_et8gm6_compressed.png) ## 5. 
Common Errors and Troubleshooting | Error code | Common cause | How to fix | | :--- | :--- | :--- | | `40000` | The request body is invalid, or unsupported fields are passed together | Pass only one of `memory_ids` or `user_id`; `memory_ids` must be a non-empty string array | | `40002` | A required field is empty | Check whether `memory_ids` / `user_id` is missing, or whether you passed an empty string or empty array | | `40103` / `40132` | The API Key is invalid, expired, or cannot access the current project | Go back to [Project Configuration](/api_docs/start/configuration) and check whether the current project matches the API Key | | `40306` | Delete-memory authentication failed | Make sure the memory belongs to the project of the current API Key and that you have permission to delete it | | `40307` | The `memory_id` does not exist | Get the latest `id` from `search/memory` or `get/memory`; do not use `conversation_id`, `user_id`, or a knowledge base ID | | `40308` | The `user_id` does not exist | Confirm that this user has written memories in the current project | For more error code details, see [Error Codes](/api_docs/help/error_codes). Need the complete field list, request format, and response format? See the [Delete Memory API documentation](/api_docs/core/delete_memory). --- # Add Feedback (/memos_cloud/mem_operations/add_feedback) ## 1. When Should You Add Feedback? MemOS feedback receives natural-language feedback from users about model answers, knowledge content, or historical memories, then automatically corrects and updates memories. You do not need to manually locate a specific memory item. Just pass the user's feedback to `add/feedback`. 
| Comparison | Natural-language feedback | Direct memory edit | | --- | --- | --- | | How to use | Describe the problem or correction in natural language | Specify a memory item and edit it directly | | User barrier | Low, suitable for non-technical users | Higher, usually handled by developers or admins | | System role | The system parses, locates, links, and updates automatically | Humans lead the update | | Typical use | Conversation correction, outdated knowledge, business rule changes | Precise revision, structured maintenance | ## 2. Key Parameters - **Feedback content (`feedback_content`)**: the user's natural-language feedback on a model answer, knowledge content, or memory result. - **User ID (`user_id`)**: the unique user identifier associated with the feedback. - **Conversation ID (`conversation_id`)**: the unique conversation identifier associated with the feedback, used to provide context. - **Knowledge base scope (`allow_knowledgebase_ids`)**: the list of knowledge bases that new memories from this feedback can be written into. ## 3. How It Works In a chatbot scenario, the user can click "report an issue" below a model answer, enter feedback, and submit it. ![Feedback UI](https://cdn.memtensor.com.cn/img/1770716602140_1z3yi5_compressed.png) Based on the feedback content, your backend calls the MemOS `add/feedback` API and triggers a memory update. - **Validity analysis**: parse the feedback with the current conversation context and decide whether it is valid and related to the conversation. - **Update type recognition**: classify the requested update as keyword replacement or semantic update. - **Memory update**: write new memories and update or override existing memories that are conflicting, outdated, or corrected. ## 4. Quick Start ### Semantic Update of Knowledge Base Memory When enterprise policies, knowledge base content, or business rules change, you can pass the user's natural-language feedback directly to MemOS. 
The system generates a new high-priority memory. #### Submit Natural-language Feedback A finance manager gives feedback in the conversation: the purchase limit for office software should be 600 CNY, not 800 CNY. ```python [Python (HTTP)] import requests API_KEY = "YOUR_API_KEY" BASE_URL = "https://memos.memtensor.cn/api/openmem/v1" data = { "user_id": "memos_user_123", "conversation_id": "memos_feedback_conv", "feedback_content": "The purchase limit for office software is 600 CNY, not 800 CNY.", "allow_knowledgebase_ids": ["basee5ec9050-c964-484f-abf1-ce3e8e2aa5b7"] } res = requests.post( f"{BASE_URL}/add/feedback", headers={"Authorization": f"Token {API_KEY}"}, json=data ) print(res.json()) ``` ```python [Python (SDK)] from memos.api.client import MemOSClient client = MemOSClient(api_key="YOUR_API_KEY") res = client.add_feedback( user_id="memos_user_123", conversation_id="memos_feedback_conv", feedback_content="The purchase limit for office software is 600 CNY, not 800 CNY.", allow_knowledgebase_ids=["basee5ec9050-c964-484f-abf1-ce3e8e2aa5b7"] ) print(res) ``` ```bash [Curl] curl --request POST \ --url https://memos.memtensor.cn/api/openmem/v1/add/feedback \ --header 'Authorization: Token YOUR_API_KEY' \ --header 'Content-Type: application/json' \ --data '{ "user_id": "memos_user_123", "conversation_id": "memos_feedback_conv", "feedback_content": "The purchase limit for office software is 600 CNY, not 800 CNY.", "allow_knowledgebase_ids": ["basee5ec9050-c964-484f-abf1-ce3e8e2aa5b7"] }' ``` #### Verify the Update Through Search After feedback is processed, when another user searches for the software reimbursement policy, the result can include a new high-priority memory: the purchase limit for office software is 600 CNY, not 800 CNY. 
```python [Python (HTTP)] import requests API_KEY = "YOUR_API_KEY" BASE_URL = "https://memos.memtensor.cn/api/openmem/v1" data = { "user_id": "memos_user_123", "conversation_id": "memos_feedback_check", "query": "Help me check the reimbursement limit for software purchases.", "knowledgebase_ids": ["basee5ec9050-c964-484f-abf1-ce3e8e2aa5b7"] } res = requests.post( f"{BASE_URL}/search/memory", headers={"Authorization": f"Token {API_KEY}"}, json=data ) print(res.json()) ``` ```python [Python (SDK)] from memos.api.client import MemOSClient client = MemOSClient(api_key="YOUR_API_KEY") res = client.search_memory( user_id="memos_user_123", conversation_id="memos_feedback_check", query="Help me check the reimbursement limit for software purchases.", knowledgebase_ids=["basee5ec9050-c964-484f-abf1-ce3e8e2aa5b7"] ) print(res) ``` ```bash [Curl] curl --request POST \ --url https://memos.memtensor.cn/api/openmem/v1/search/memory \ --header 'Authorization: Token YOUR_API_KEY' \ --header 'Content-Type: application/json' \ --data '{ "user_id": "memos_user_123", "conversation_id": "memos_feedback_check", "query": "Help me check the reimbursement limit for software purchases.", "knowledgebase_ids": ["basee5ec9050-c964-484f-abf1-ce3e8e2aa5b7"] }' ``` ##### Output ```python "memory_detail_list": [ { "id": "8a4f3d2e-c417-4e53-bc25-54451abd5ac8", "memory_key": "Software purchase reimbursement policy trial version", "memory_value": "The policy requires the purchase limit for office software to be 800 CNY and applies to document editing and spreadsheet tools.", "memory_type": "LongTermMemory", "conversation_id": "default_session", "relativity": 0.8931847 }, { "id": "a72a04d1-d7ba-4ebd-9410-0097bfa6c20d", "memory_key": "Office software purchase limit", "memory_value": "The user confirmed that the purchase limit for office software is 600 CNY, not 800 CNY.", "memory_type": "WorkingMemory", "conversation_id": "memos_feedback_conv", "relativity": 0.7196722 } ] ``` The [Knowledge Base 
console](https://memos-dashboard.openmem.net/knowledgeBase/) also shows knowledge base memories corrected or supplemented through natural-language interaction. ![Knowledge base correction](https://cdn.memtensor.com.cn/img/1765970178683_5tuxe4_compressed.png) ### Keyword Replacement Memory If the user clearly indicates that a name, rule, or field should be replaced globally, you can also describe the replacement intent in natural language. ```python data = { "user_id": "memos_user_123", "conversation_id": "memos_feedback_conv", "feedback_content": "From now on, I changed my name. Replace User 1 with User 2 everywhere.", "allow_knowledgebase_ids": ["basee5ec9050-c964-484f-abf1-ce3e8e2aa5b7"] } ``` ## 5. Common Errors and Troubleshooting | Error Code | Common Cause | How to Fix | | --- | --- | --- | | `40000` | The request JSON structure is invalid, or a field type is incorrect | Check whether `feedback_content` is a string and whether `allow_knowledgebase_ids` is a string array | | `40002` | A required field is empty | Check that `user_id`, `conversation_id`, and `feedback_content` are all provided and non-empty | | `40011` | `conversation_id` is too long | Use a short ID. Do not put full conversations, user input, or JSON into `conversation_id` | | `40305` | A single request exceeds the token limit | Shorten the feedback content and keep the key correction information | | `40309` | Token usage exceeds the per-time-window limit | Lower feedback write concurrency and retry in batches | | `50123` | The knowledge base is not associated with the current project | Go to [Project Configuration](/api_docs/start/configuration) and confirm the knowledge base is associated with the project that owns the API Key | | `50145` | Failed to save feedback and write memory | Check the request content and retry later. If it persists, contact support | Need the complete field list, request format, and response format? 
See the [Add Feedback API documentation](/api_docs/message/add_feedback). --- # Continuous Conversation Chat (/memos_cloud/mem_operations/chat) ## 1. When to Use the Chat API The Chat API is suitable for quickly building AI conversation applications with long-term memory. You only pass the user's current message; MemOS automatically handles memory recall, prompt assembly, model response generation, and conversation writing. - **Integrated conversational AI**: one API completes conversation generation without a complex custom pipeline. - **Automatic memory handling**: automatically extracts, updates, and retrieves memories, reducing manual maintenance. - **Continuous context**: keeps understanding coherent across turns, days, and even sessions. ## 2. Compared with Memory Operation APIs **Use Chat**: Best for general AI conversations, business PoCs, and quick validation **Use Memory Operation APIs**: Best for complex Agents and deeper business-system integration | Dimension | Chat API | Memory operation APIs | | --- | --- | --- | | Integration complexity | Low, ready to use | Medium, requires orchestration | | Memory management | Automatic | Manually add, search, and assemble | | Model response | Generated by MemOS built-in model | Call your own external model | | Control | Good for common configuration | Good for complex pipelines and fine-grained control | ## 3. How It Works ![Chat API flow](https://cdn.memtensor.com.cn/img/1765973438090_tskx7x_compressed.png) 1. If historical user messages exist, call `add/message` to write them into MemOS first. 2. When the end user sends a message, your AI application calls `chat` with the user message and related parameters. 3. MemOS recalls historical memories related to the current user message and assembles custom instructions, current conversation context, and user memories. 4. MemOS calls the model to generate an answer and returns the result to your AI application. 5. 
By default, MemOS asynchronously processes the user message and model response in the background and writes them as memories. ## 4. Quick Start ### Optional: add historical messages If you already have conversation history, call `add/message` first. For a new user or a new conversation, skip this step and call Chat directly. ```python [Python (HTTP)] import requests API_KEY = "YOUR_API_KEY" BASE_URL = "https://memos.memtensor.cn/api/openmem/v1" data = { "user_id": "memos_user_123", "conversation_id": "0610", "messages": [ {"role": "user", "content": "I booked a summer trip to Guangzhou. Which hotel chains are available?"}, {"role": "assistant", "content": "You can consider 7 Days Inn, Ji Hotel, Hilton, and others."}, {"role": "user", "content": "I'll choose 7 Days Inn."}, {"role": "assistant", "content": "Got it. Feel free to ask if you have other questions."} ] } res = requests.post( f"{BASE_URL}/add/message", headers={"Authorization": f"Token {API_KEY}"}, json=data ) print(res.json()) ``` ```python [Python (SDK)] from memos.api.client import MemOSClient client = MemOSClient(api_key="YOUR_API_KEY") messages = [ {"role": "user", "content": "I booked a summer trip to Guangzhou. Which hotel chains are available?"}, {"role": "assistant", "content": "You can consider 7 Days Inn, Ji Hotel, Hilton, and others."}, {"role": "user", "content": "I'll choose 7 Days Inn."}, {"role": "assistant", "content": "Got it. Feel free to ask if you have other questions."} ] res = client.add_message( messages=messages, user_id="memos_user_123", conversation_id="0610" ) print(res) ``` ### Call Chat When you call `chat`, MemOS automatically retrieves relevant memories and generates an answer. ```python [Python (HTTP)] import requests API_KEY = "YOUR_API_KEY" BASE_URL = "https://memos.memtensor.cn/api/openmem/v1" data = { "user_id": "memos_user_123", "conversation_id": "0928", "query": "I want to travel during National Day. 
Recommend a city I have not visited and a hotel brand I have not stayed at." } res = requests.post( f"{BASE_URL}/chat", headers={"Authorization": f"Token {API_KEY}"}, json=data ) print(res.json()) ``` ```python [Python (SDK)] from memos.api.client import MemOSClient client = MemOSClient(api_key="YOUR_API_KEY") res = client.chat( user_id="memos_user_123", conversation_id="0928", query="I want to travel during National Day. Recommend a city I have not visited and a hotel brand I have not stayed at." ) print(res) ``` ```bash [Curl] curl --request POST \ --url https://memos.memtensor.cn/api/openmem/v1/chat \ --header 'Authorization: Token YOUR_API_KEY' \ --header 'Content-Type: application/json' \ --data '{ "user_id": "memos_user_123", "conversation_id": "0928", "query": "I want to travel during National Day. Recommend a city I have not visited and a hotel brand I have not stayed at." }' ``` For the full field list, request format, and response format, see the [Chat API documentation](/api_docs/chat/chat). ## 5. Limits - Input limit: 8,000 tokens. - Output limit: up to 25 fact memories and up to 25 preference memories can be recalled. ## 6. More Usage Options The Chat API works out of the box. The following parameters are optional and only needed when you want to control memory recall, model responses, or memory writing. ### Control memory recall scope Use these fields to control which memories are considered and how many are recalled: - `filter`: filter memories by tags, time, business fields, and other conditions. - `knowledgebase_ids`: specify which knowledge bases Chat can search. - `relativity`: control the relevance threshold for recalled memories. - `memory_limit_number`: limit the number of fact memories passed to the model. 
```python data = { "user_id": "memos_user_123", "conversation_id": "0928", "query": "Use the knowledge base to summarize travel reimbursement rules.", "knowledgebase_ids": ["kb_xxx"], "filter": { "and": [ {"tags": {"contains": "travel"}}, {"create_time": {"gte": "2025-01-01"}} ] }, "relativity": 0.8, "memory_limit_number": 9 } ``` ### Control model response behavior Use these fields to specify the model, enable streaming, or adjust generation parameters: - `model_name`: specify the conversation model. - `stream`: control whether to stream the response. - `temperature`: control randomness. - `top_p`: control candidate token selection. - `max_tokens`: limit the maximum generated length. ```python data = { "user_id": "memos_user_123", "conversation_id": "0928", "query": "Summarize my travel preferences in a concise tone.", "model_name": "qwen2.5-72b-instruct", "stream": False, "temperature": 0.7, "top_p": 0.95, "max_tokens": 1024 } ``` To fully customize model behavior, pass `system_prompt` to override the default system prompt. ### Control whether new memories are written automatically By default, Chat writes the current user message and model response into memory. If you only want to generate an answer and do not want this turn to enter memory processing, pass: - `add_message_on_answer`: whether to write this user message and model response into memory. ```python data = { "user_id": "memos_user_123", "conversation_id": "0928", "query": "Answer this once, but do not write this turn into memory.", "add_message_on_answer": False } ``` For ordinary conversations, you can ignore these fields. When you want new memories generated by Chat to carry business ownership or control where they are written, use: - `agent_id`: mark which Agent the conversation belongs to. - `app_id`: mark which application the conversation comes from. - `tags`: add tags for future retrieval and filtering. - `info`: write custom business metadata such as scene, order ID, or status. 
- `allow_public`: whether to allow writing to project-level public memory.
- `allow_knowledgebase_ids`: which knowledge bases can be written.

## 7. Common Errors and Troubleshooting

| Error Code | Common Cause | How to Fix |
| --- | --- | --- |
| `40000` | The request JSON structure is invalid, or a field type is incorrect | Check whether `query` is a string, `filter` is an object, and `knowledgebase_ids` / `allow_knowledgebase_ids` are string arrays |
| `40002` | A required field is empty | Check that `user_id`, `conversation_id`, and `query` are all provided and non-empty |
| `40010` | `user_id` is too long | Use a stable and shorter end-user ID. The length cannot exceed 100 characters |
| `40011` | `conversation_id` is too long | Use a short conversation ID. Do not put full conversations, user input, or JSON into `conversation_id` |
| `40301` / `40305` | Input content or request tokens exceed the limit | Shorten `query`, `system_prompt`, and filter conditions. Do not put long documents directly into the Chat request |
| `40302` / `40303` | Generated content or chat length exceeds the model limit | Lower `max_tokens`, shorten the expected output, or split the request into multiple turns |
| `50123` | The knowledge base is not associated with the current project | Go to [Project Configuration](/api_docs/start/configuration) and confirm the knowledge base is associated with the project that owns the API Key |
| `50144` | Message writing after the Chat response failed | If `add_message_on_answer` is enabled, check the request content and retry later. If it persists, contact support |

For more error code details, see [Error Codes](/api_docs/help/error_codes).

For the full field list, request format, and response format, see the [Chat API documentation](/api_docs/chat/chat).
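The troubleshooting table above can be turned into client-side hints. The following is a minimal sketch, not part of the MemOS SDK: it assumes that error responses use the same envelope as the successful responses shown in these docs (an integer `code` that is `0` on success, plus a `message` string). The helper name `explain_chat_error` and the hint wording are hypothetical.

```python
from typing import Optional

# Hints condensed from the Chat troubleshooting table; the wording is illustrative.
CHAT_ERROR_HINTS = {
    40000: "Invalid JSON structure or field type; check `query`, `filter`, and the ID arrays.",
    40002: "A required field is empty; provide `user_id`, `conversation_id`, and `query`.",
    40010: "`user_id` is too long; keep it within 100 characters.",
    40011: "`conversation_id` is too long; use a short ID.",
    40301: "Input exceeds the limit; shorten `query`, `system_prompt`, or the filter.",
    40305: "Input exceeds the limit; shorten `query`, `system_prompt`, or the filter.",
    40302: "Output exceeds the model limit; lower `max_tokens` or split the request.",
    40303: "Output exceeds the model limit; lower `max_tokens` or split the request.",
    50123: "Knowledge base is not associated with the project that owns the API Key.",
    50144: "Writing the turn into memory failed; retry later or contact support.",
}


def explain_chat_error(response_body: dict) -> Optional[str]:
    """Return a readable hint for a failed Chat call, or None on success.

    Assumption: the body has an integer `code` (0 means success) and a `message`.
    """
    code = response_body.get("code")
    if code == 0:
        return None
    hint = CHAT_ERROR_HINTS.get(code, "See the Error Codes page for details.")
    return f"[{code}] {response_body.get('message', '')}: {hint}"
```

A caller could pass `res.json()` from the Chat request to this helper and log the returned hint when it is not `None`.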
---

# Memory Filters (/memos_cloud/features/filters)

> **Warning**: You need to pass the relevant fields when calling the [Add Message API](/api_docs/core/add_message) before you can use them as filter conditions in the [Search Memory API](/api_docs/core/search_memory).
>
> This page focuses on the feature behavior. For complete API fields and limits, see the API documentation above.

## 1. When to Use Memory Filters

When the memory set grows large, or when a single retrieval request accesses user memories, public memories, and Knowledge Base memories at the same time, you often need to narrow the candidate scope before MemOS performs semantic retrieval. Memory Filters handle this precise pre-retrieval filtering step.

Common scenarios include:

- Retrieve only memories generated by a specific Agent or App.
- Retrieve only memories created or updated within a specific time range.
- Retrieve only memories that contain specific tags.
- Filter by business fields written when adding messages, such as `scene`, `biz_id`, or `business_type`.
- Apply different filters to user memories, public memories, and Knowledge Base memories.

## 2. How It Works

1. **Filter the scope first**: MemOS strictly filters candidate memories based on the conditions in `filter`.
2. **Then perform semantic retrieval**: Within the filtered candidates, MemOS runs [memory retrieval](/memos_cloud/mem_operations/search_memory) and returns the fragments most relevant to `query`.

A filter is therefore not a keyword search. It is a scope-control step that runs before retrieval: the stricter the filter, the fewer candidate memories enter semantic retrieval.

## 3. Two Filtering Modes

### Global Filter

If you do not need to distinguish memory sources, you can put logical conditions directly at the root of `filter`. The conditions then apply to every memory source involved in the current retrieval.
```json "filter": { "and": [ {"tags": {"contains": "reading"}}, {"create_time": {"gte": "2025-01-01"}}, {"create_time": {"lte": "2025-12-31"}}, {"scene": "chat"} ] } ``` ### Source-Specific Filter If one retrieval request accesses multiple memory types, you can set separate filter conditions for `user`, `public`, and `knowledgebase` inside `filter`. | Source | Description | | --- | --- | | `user` | User-specific memories accumulated from the user's conversation history | | `public` | Project-level public memories shared across users in the project | | `knowledgebase` | Knowledge Base memories created from uploaded documents or Skills | Source-specific filters are useful for more precise retrieval strategies, such as filtering user memories by recency, filtering Knowledge Base memories by tags, and leaving public memories unfiltered. Sources not included in `filter` will not receive extra filter conditions. ```json "filter": { "knowledgebase": { "and": [ {"tags": {"contains": "reading"}}, {"create_time": {"gte": "2025-01-01"}}, {"create_time": {"lte": "2025-12-31"}} ] }, "user": { "and": [ {"scene": "chat"}, {"create_time": {"gte": "2025-01-01"}} ] }, "public": { "and": [ {"tags": {"contains": "announcement"}} ] } } ``` > **Note**: Use either a global filter or a source-specific filter. If different memory sources need different conditions, prefer source-specific filtering. ## 4. Available Fields and Operators The root of each filter group must be `and` or `or`, followed by a list of field conditions. Specifying `user_id` in `filter` is not supported. ### Instance Fields For more details about these fields, see the advanced usage section in [Add Message](/memos_cloud/mem_operations/add_message). | Field | Type | Operator | Example | | --- | --- | --- | --- | | `agent_id` | string | `=` | `{"agent_id":"agent_123"}` | | `app_id` | string | `=` | `{"app_id":"app_123"}` | ### Metadata Fields When adding messages, you can write business metadata through `info`. 
During retrieval, use those fields directly by name in `filter`; do not wrap them in another `info` object. | Field | Type | Operator | Example | | --- | --- | --- | --- | | `business_type` | string | `=` | `{"business_type":"shopping"}` | | `biz_id` | string | `=` | `{"biz_id":"order_123456"}` | | `scene` | string | `=` | `{"scene":"payment"}` | | `custom_status` | string | `=` | `{"custom_status":"VIP3"}` | ```json // Recommended {"scene": "chat"} // Do not write it like this {"info": {"scene": "chat"}} ``` ### Tag Fields | Field | Type | Operator | Example | | --- | --- | --- | --- | | `tags` | list | `contains` | `{"tags": {"contains": "finance"}}` | ### Time Fields | Field | Type | Operator | Example | | --- | --- | --- | --- | | `create_time` | string | `lt`, `gt`, `lte`, `gte` | `{"create_time": {"gte": "2025-12-10"}}` | | `update_time` | string | `lt`, `gt`, `lte`, `gte` | `{"update_time": {"lte": "2025-12-10"}}` | ## 5. Usage Examples ### Filter Memories by Agent ```json "filter": { "or": [ {"agent_id": "agent_123"}, {"agent_id": "agent_456"} ] } ``` ### Filter Memories by Business Scenario ```json "filter": { "and": [ {"business_type": "travel"}, {"biz_id": "travel_001"}, {"scene": "payment"}, {"custom_status": "v1"} ] } ``` ### Filter by Tag and Date Range ```json "filter": { "and": [ {"tags": {"contains": "weather"}}, {"create_time": {"gte": "2025-12-01"}}, {"create_time": {"lte": "2025-12-31"}} ] } ``` ### Filter Different Memory Sources Separately The following example retrieves Knowledge Base memories tagged with "reading" and created in 2025, user memories where `scene=chat`, and public memories tagged with "announcement". 
```json
"filter": {
  "knowledgebase": {
    "and": [
      {"tags": {"contains": "reading"}},
      {"create_time": {"gte": "2025-01-01"}},
      {"create_time": {"lte": "2025-12-31"}}
    ]
  },
  "user": {
    "and": [
      {"scene": "chat"}
    ]
  },
  "public": {
    "and": [
      {"tags": {"contains": "announcement"}}
    ]
  }
}
```

---

# Async Mode (/memos_cloud/features/async_mode)

> **Warning**: This article expands on async mode in the Add Memory (addMessage) API. [View the detailed API documentation](/api_docs/core/add_message).

> **Note**: The `async_mode` parameter currently defaults to `true`. Memory additions are therefore queued for background processing by default, and the API responds without waiting for processing to complete.

## 1. Using Async Mode

### Processing Flow

When the `async_mode` parameter is set to `true`, the API returns a response immediately and queues the memory for processing in the background:

```json
{
  "code": 0,
  "data": {
    "success": true,
    "task_id": "c464e17e-f2ff-4e9a-a2c2-41cc55ab43b9",
    "status": "running"
  },
  "message": "ok"
}
```

In async mode, memory writing is divided into two stages: "Rough Processing" and "Refined Processing". The system first performs millisecond-level rough processing on the current turn of messages so that they can be retrieved in the next turn of the conversation. Refined processing, which takes seconds or longer, then continues in the background to improve memory quality. You can query processing progress through the [get/status](/api_docs/message/get_status) API: the task status is "running" during the rough processing stage and changes to "completed" after refined processing finishes.

A memory retrieved after rough processing keeps the raw conversation text and carries the `mode:fast` tag:

```json "memory_detail_list": [ { "id": "c436a738-eec9-4010-b65d-dc9c135d3a37", "memory_key": "user: [09:44 AM on 10 December, 2025 UTC]: I've booked a trip to Guangzhou for the summer vacation.
What chain hotels are available for accommodation?", "memory_value": "user: [09:44 AM on 10 December, 2025 UTC]: I've booked a trip to Guangzhou for the summer vacation. What chain hotels are available for accommodation?\nassistant: [09:44 AM on 10 December, 2025 UTC]: You can consider [7 Days Inn, Ji Hotel, Hilton], etc.\nuser: [09:44 AM on 10 December, 2025 UTC]: I'll choose 7 Days Inn\nassistant: [09:44 AM on 10 December, 2025 UTC]: Okay, let me know if you have any other questions.\n", "memory_type": "WorkingMemory", "create_time": 1765359875901, "update_time": 1765359875902, "conversation_id": "0610", "status": "activated", "confidence": 0.99, "relativity": 0.05407696, "tags": ["mode:fast"] } ] ``` Get the async task status via the [get/status](/api_docs/message/get_status) interface: ```python import os import requests import json # Replace with your MemOS API Key os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1" data = { "task_id": "c464e17e-f2ff-4e9a-a2c2-41cc55ab43b9" } headers = { "Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}" } url = f"{os.environ['MEMOS_BASE_URL']}/get/status" res = requests.post(url=url, headers=headers, data=json.dumps(data)) print(f"result: {res.json()}") ``` ### When to Use Async Mode * **Reduce Interface Response Latency**: Users do not need to wait and can continuously use memory within the application; * **Batch Add Memories**: Process large amounts of data simultaneously to avoid blocking the application; * **Background Task Processing**: Offload time-consuming memory processing operations to the background to improve system concurrency capabilities. > **Note**: Note > When a message contains multimodal content, since file memory processing takes a long time, the `async_mode` field you pass becomes invalid, and "Async Mode" is used by default. 
You can query the processing progress of file memory via the `get/status` API.

## 2. Using Sync Mode

### Processing Flow

When the `async_mode` parameter is set to `false`, the API returns the result only after memory processing is completed:

```json
{
  "code": 0,
  "data": {
    "success": true,
    "task_id": "c464e17e-f2ff-4e9a-a2c2-41cc55ab43b9",
    "status": "completed"
  },
  "message": "ok"
}
```

Memories retrieved at this point have been fully processed:

```json
"memory_detail_list":[
  {
    "memory_key": "Summer Vacation Guangzhou Travel Plan",
    "memory_value": "The user plans to travel to Guangzhou during the summer vacation and has chosen 7 Days Inn as the accommodation option.",
    "conversation_id": "0610",
    "tags": [
      "Travel",
      "Guangzhou",
      "Accommodation",
      "Hotel"
    ]
  }
],
"preference_detail_list":[
  {
    "preference_type": "implicit_preference",
    "preference": "The user may prefer high cost-performance hotel choices.",
    "reasoning": "7 Days Inn is usually known for being economical. The user's choice of 7 Days Inn may indicate a preference for high cost-performance options in accommodation. Although the user did not explicitly mention budget constraints or specific hotel preferences, choosing 7 Days Inn among the provided options may reflect an emphasis on price and practicality.",
    "conversation_id": "0610"
  }
]
```

### When to Use Sync Mode

* **Debugging and development**: view fully processed memories immediately, which makes memory retrieval easier to debug.
* **Instant query**: confirm that a memory has been created or updated as soon as the API call returns, for example in performance testing or functional verification.
* **Small-scale operations**: use sync mode when the data volume is small and the added latency is negligible.

### Important Notes

* `async_mode` defaults to `true`, so messages are processed asynchronously unless you specify otherwise.
* To use sync mode, set `async_mode=false` when adding messages.
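When you add messages in async mode but need the refined result before continuing, you can poll `get/status` until the task leaves the "running" state. The sketch below is illustrative rather than an official client: `wait_for_task` and `fetch_status_http` are hypothetical helper names, and it assumes the `get/status` response exposes the status at `data.status`, mirroring the `add/message` response shown above. Check the get/status API documentation for the actual response shape.

```python
import time
from typing import Callable


def fetch_status_http(task_id: str, api_key: str, base_url: str) -> dict:
    """POST to get/status, following the request shown earlier on this page."""
    import requests  # imported lazily so the polling logic has no hard dependency

    res = requests.post(
        f"{base_url}/get/status",
        headers={"Authorization": f"Token {api_key}"},
        json={"task_id": task_id},
    )
    return res.json()


def wait_for_task(
    fetch_status: Callable[[str], dict],
    task_id: str,
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> str:
    """Poll until the task is no longer "running", then return its status.

    Assumption: the status lives at body["data"]["status"].
    Raises TimeoutError if the task is still running after timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        body = fetch_status(task_id)
        status = (body.get("data") or {}).get("status")
        if status != "running":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still running after {timeout_s}s")
        time.sleep(interval_s)
```

For example, `wait_for_task(lambda t: fetch_status_http(t, API_KEY, BASE_URL), task_id)` blocks until refined processing finishes. Passing the fetch function as a parameter keeps the polling loop easy to test without network access.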
---

# Multimodal Messages (/memos_cloud/features/multimodal)

> **Warning**: This article expands on how to add multimodal data with the Add Memory (addMessage) API. [View the detailed API documentation](/api_docs/core/add_message).

MemOS supports multimodal data in addition to text, including documents and images. You can combine text, documents, and images in your interactions with MemOS, so the system can extract relevant information from multiple media types and enrich memory content.

## 1. How to Add Multimodal Messages

> **Note**: When a message contains multimodal content, file memory processing takes a long time, so the `async_mode` value you pass is ignored and async mode is always used. You can query the processing progress of file memory via the `get/status` API.

When a user uploads a document or image, MemOS extracts text, visual information, and other relevant details, and processes them into user memory.

> **Note**: **Multimodal Messages and Tool Memory**
>
> In addition to processing document and image content, MemOS also supports processing tool calling information. When you add tool calling information to a message, the system processes it into tool memory, including Tool Schema and Tool Trajectory Memory. See [Tool Calling](/memos_cloud/features/tool_calling) for details.

### Add Message

```python import os import requests import json # Replace with your MemOS API Key os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1" data = { "user_id": "memos_user_123", "conversation_id": "1211", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "I am studying MemOS."
}, { "type": "image_url", "image_url": { "url": "https://cdn.memtensor.com.cn/img/1758706201390_iluj1c_compressed.png" } } ] }, {"role": "assistant", "content": "Okay, do you need any help?"} ] } headers = { "Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}" } url = f"{os.environ['MEMOS_BASE_URL']}/add/message" res = requests.post(url=url, headers=headers, data=json.dumps(data)) print(json.dumps(res.json(), indent=2, ensure_ascii=False)) ``` ### Search Memory ```python import os import requests import json # Replace with your MemOS API Key os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1" data = { "query": "Help me summarize this image", "user_id": "memos_user_123", "conversation_id": "1214" } headers = { "Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}" } url = f"{os.environ['MEMOS_BASE_URL']}/search/memory" res = requests.post(url=url, headers=headers, data=json.dumps(data)) # Replace print part print("Result:") print(json.dumps(res.json(), indent=2, ensure_ascii=False)) ``` ### Output Result ```python { "code": 0, "data": { "memory_detail_list": [ { "id": "a5136287-de10-4df2-afc5-e412cdb8b649", "memory_key": "Studying MemOS", "memory_value": "The user is studying MemOS and shared a relevant image at 7:07 AM on December 18, 2025 (UTC).", "memory_type": "WorkingMemory", "create_time": 1766041646311, "conversation_id": "1211", "status": "activated", "confidence": 0.99, "tags": [ "Study", "MemOS", "Image Sharing" ], "update_time": 1766041689234, "relativity": 0.5170716 }, { "id": "4a1d42f4-c9fa-41bf-805d-2ea985bba984", "memory_key": "MemOS Feature Overview", "memory_value": "MemOS is an intelligent memory system capable of storing information by adding paths and retrieving information through query functions. 
The system supports various document formats, such as PDF and DOC, and utilizes AI for intelligent response and processing.", "memory_type": "WorkingMemory", "create_time": 1766041689091, "conversation_id": "1211", "status": "activated", "confidence": 0.99, "tags": [ "MemOS", "Intelligent Memory", "Information Storage", "Query Function", "image", "visual" ], "update_time": 1766041689234, "relativity": 0.38406307 } ], "preference_detail_list": [], "tool_memory_detail_list": [], "preference_note": "" }, "message": "ok" } ```

## 2. Media Types

MemOS currently supports the following media types:

1. **Images** - JPG, PNG, and other common image formats
2. **Documents** - PDF, DOCX, DOC, TXT, JSON, MD, XML

## 3. File Upload Limits

1. Each add-message request can include at most 20 files. Each file must be no larger than 100 MB and no longer than 200 pages.
2. If the number of files, a file's size, or a file's page count exceeds these limits, the task is marked as "Processing Failed". Adjust the files to meet the limits and send the request again.

## 4. Usage Examples

### Upload Image Message

**Use Image URL**

When adding a message, you can pass the image URL directly.
```python import os import requests import json # Replace with your MemOS API Key os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1" data = { "user_id": "memos_user_123", "conversation_id": "1211", "messages": [ { "role": "user", "content": [ { "type": "image_url", "image_url": { "url": "https://cdn.memtensor.com.cn/img/1758706201390_iluj1c_compressed.png" } } ] } ] } headers = { "Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}" } url = f"{os.environ['MEMOS_BASE_URL']}/add/message" res = requests.post(url=url, headers=headers, data=json.dumps(data)) print(json.dumps(res.json(), indent=2, ensure_ascii=False)) ``` **Upload Local Image using Base64 Encoding** To upload a local image or embed an image directly, you can use Base64 image encoding. ```python import os import requests import json import base64 # Replace with your MemOS API Key os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1" # Path to the image file image_path = "path/to/your/image.jpg" # Encode image using Base64 with open(image_path, "rb") as image_file: base64_image = base64.b64encode(image_file.read()).decode("utf-8") data = { "user_id": "memos_user_123", "conversation_id": "1211", "messages": [ { "role": "user", "content": [ { "type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"} } ] } ] } headers = { "Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}" } url = f"{os.environ['MEMOS_BASE_URL']}/add/message" res = requests.post(url=url, headers=headers, data=json.dumps(data)) print(json.dumps(res.json(), indent=2, ensure_ascii=False)) ``` ### Upload Document Message **Use Document URL** ```python import os import requests import json # Replace with your MemOS API Key os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = 
"https://memos.memtensor.cn/api/openmem/v1" data = { "user_id": "memos_user_123", "conversation_id": "1211", "messages": [ { "role": "user", "content": [ { "type": "file", "file": { "file_data": "https://cdn.memtensor.com.cn/file/MemOS 2.pdf" } } ] } ] } headers = { "Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}" } url = f"{os.environ['MEMOS_BASE_URL']}/add/message" res = requests.post(url=url, headers=headers, data=json.dumps(data)) print(json.dumps(res.json(), indent=2, ensure_ascii=False)) ``` **Upload Local Document using Base64 Encoding** ```python import os import requests import json import base64 # Replace with your MemOS API Key os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1" # Path to the document file document_path = "path/to/your/document.pdf" # Function to convert file to Base64 string def file_to_base64(file_path): with open(file_path, "rb") as file: return base64.b64encode(file.read()).decode('utf-8') # Encode document using Base64 base64_document = file_to_base64(document_path) data = { "user_id": "memos_user_123", "conversation_id": "1211", "messages": [ { "role": "user", "content": [ { "type": "file", "file": {"file_data": base64_document} } ] } ] } headers = { "Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}" } url = f"{os.environ['MEMOS_BASE_URL']}/add/message" res = requests.post(url=url, headers=headers, data=json.dumps(data)) print("Result:") print(json.dumps(res.json(), indent=2, ensure_ascii=False)) ``` ### Complete Example Here is a complete example showing how to add conversation messages containing different media types between a user and an assistant: ```python import os import json import requests # Replace with your MemOS API Key os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1" data = { "user_id": 
"memos_user_123", "conversation_id": "1211", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "I am studying MemOS." } # Text message ] }, { "role": "user", "content": [ { "type": "image_url", "image_url": { "url": "https://cdn.memtensor.com.cn/img/1758706201390_iluj1c_compressed.png" } } # Upload image ] }, { "role": "user", "content": [ { "type": "file", "file": { "file_data": "https://cdn.memtensor.com.cn/file/MemOS 2.pdf" } } # Upload document ] } ] } headers = { "Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}" } url = f"{os.environ['MEMOS_BASE_URL']}/add/message" res = requests.post(url=url, headers=headers, data=json.dumps(data)) print(json.dumps(res.json(), indent=2, ensure_ascii=False)) ``` --- # Custom Tags (/memos_cloud/features/custom_tags) > **Warning**: Note > > > > **[You need to pass the tag list when calling addMessage (Click here for detailed API documentation)](/api_docs/core/add_message)** > > > **[Only then can you use tags for filtering when calling searchMemory (Click here for detailed API documentation)](/api_docs/core/search_memory)** > > > > **This article focuses on functional description. For detailed API fields and limits, please click the text links above.** MemOS automatically generates tags for each memory, but these tags may not be completely consistent with the tags used in your business. You can pass a list of custom tags when adding messages, and MemOS will automatically apply relevant tags to the memory content based on the meaning of the tags you provide. > **Note**: When to use custom tags? > > You want MemOS to use the product team's existing tag system to label memory content. > > You need to apply these tags to generate structured content. ## 1. Tag Mechanism * **Automatic Tag Generation**: MemOS analyzes semantics when processing memories and automatically generates relevant tags for subsequent retrieval and filtering. 
* **Custom Tags**: When adding messages, you can pass a set of custom tags through the `tags` field as a candidate tag set.
* **Semantic Matching**: MemOS evaluates the semantic similarity between the memory content and the tag list you provide, selects the matching tags, and writes them into the memory's `tags` field together with the system-generated tags.

## 2. Usage Example

> **Note**: Tip
> * Keep tags concise while clearly distinguishing the meaning of each category, so they are easy to identify and match.
>
> * Use one unified tag list within the same project and avoid changing it casually, to keep retrieval and filtering consistent.

### Add Message

```python
import os
import json
import requests

# Replace with your MemOS API Key
os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY"
os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1"

data = {
    "user_id": "memos_user_123",
    "conversation_id": "1210",
    "messages": [
        {"role": "user", "content": "How is the weather today?"},
        {"role": "assistant", "content": "Shanghai, December 10th, cloudy, temperature 8-12 degrees."}
    ],
    "tags": ["Weather", "Cloudy"],
    "async_mode": False
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Token {os.environ['MEMOS_API_KEY']}"
}

url = f"{os.environ['MEMOS_BASE_URL']}/add/message"

res = requests.post(url=url, headers=headers, data=json.dumps(data))
print(f"result: {res.json()}")
```

### Search Memory

```python
import os
import json
import requests

# Replace with your MemOS API Key
os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY"
os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1"

data = {
    "user_id": "memos_user_123",
    "query": "Shanghai Weather"
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Token {os.environ['MEMOS_API_KEY']}"
}

url = f"{os.environ['MEMOS_BASE_URL']}/search/memory"

res = requests.post(url=url, headers=headers, data=json.dumps(data))
print(f"result: {res.json()}")
```

### Output Result

```json
"memory_detail_list": [
  {
    "id": "9bc102cb-76d8-4a59-86d7-8fd1c4542407",
    "memory_key": "Weather Conditions",
    "memory_value": "On December 10, 2025, the weather in Shanghai was cloudy with temperatures between 8 and 12 degrees.",
    "memory_type": "WorkingMemory",
    "create_time": 1765376340736,
    "conversation_id": "1210",
    "status": "activated",
    "confidence": 0.99,
    "tags": [
      "Weather",
      "Cloudy",
      "Temperature"
    ],
    "update_time": 1765376340737,
    "relativity": 0.82587826
  }
]
```

---

# Self-Evolving (/memos_cloud/features/self-evolving)

## What is Self-Evolving?

In the context of AI Agents, a **Skill** is a reusable task-handling method. It tells an Agent "what to do when it encounters a certain type of task", for example:

- How to plan a trip
- How to process a return ticket
- How to generate a weekly report following company standards

Skills help compensate for the fact that execution experience is hard to accumulate in long-running LLM applications:

- Maintainable: turn stable real-world workflows into structured methods that can be iterated over time.
- On-demand: let the Agent retrieve relevant skills for the current task, instead of placing every workflow into the context.
- Personalized: turn different users' preferences, habits, and constraints into reusable execution methods.

By distilling reusable Skills from conversations, MemOS enables memory self-evolution, continuously enhancing Agent capabilities.

---

## How MemOS Provides Self-Evolving Capabilities for Agents

### 1. Auto-generate Personalized Skills

MemOS holds that "memory is an asset". The solution paths and user preferences accumulated in real conversations are the most valuable raw material for skills. You do not need to prepare any files.
As long as you add the original conversation history between the user and the Agent, MemOS **automatically extracts skills from user memories** and turns scattered interaction history into reusable, personalized professional capabilities.

### 2. Upload Custom Skills

MemOS also supports uploading existing skill files directly. Upload a Markdown file or ZIP package to a knowledge base, and MemOS can return relevant skills to the Agent during retrieval.

---

## Auto-generate Personalized Skills

### How It Works

![image.png](https://cdn.memtensor.com.cn/img/1769759436251_3tx57c_compressed.png)

The diagram above shows the full interaction flow between end users, the AI Agent you build, and MemOS:

1. Call the `add/message` API to send the user's conversation messages to MemOS.
2. After receiving the request, MemOS processes the messages in sequence and generates Skill files:
   a. **Intelligent chunking**: identify task boundaries in historical conversations and split them into task text chunks.
   b. **Cluster extraction**: cluster similar task text chunks and combine them with the user's historical memories to extract structured skill text.
   c. **Skill conversion**: convert the skill into a runnable and recognizable Skill file.
3. Call the `search/memory` API to retrieve memories. MemOS returns user facts, preferences, tool memories, and matching Skill files related to the current context in a unified response.
4. Download the Skill file and pass both the memories and the Skill file to your self-hosted LLM, so that it can make effective use of long-term experience and automatically generated skills.

The entire process does not require manually uploading any skill files.

### Travel Planning Example

The following uses "Travel Planning" as an example to show how the same task generates different skills for different users.

#### 1. Add Conversations

The user expresses travel planning preferences in conversation: no backtracking, compact routes, cultural attractions, and weather checks in advance.
```python import os import json import requests os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1" data = { "user_id": "memos_user_j", "conversation_id": "travel_0127", "messages": [ { "role": "user", "content": "I'm going to Chengdu next week for 5 days. I like intense, no-backtracking trips. Also mark the must-try food along the route." }, {"role": "assistant", "content": "...omitted..."}, { "role": "user", "content": "I prefer cultural attractions. Not interested in shopping malls." }, {"role": "assistant", "content": "...omitted..."}, { "role": "user", "content": "Check the weather and temperature in advance so I can pack properly." }, {"role": "assistant", "content": "...omitted..."} ] } headers = { "Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}" } url = f"{os.environ['MEMOS_BASE_URL']}/add/message" res = requests.post(url=url, headers=headers, data=json.dumps(data)) print(f"result: {res.json()}") ``` #### 2. Retrieve Skills When the same user makes a similar request next time, pass `include_skill=true`: ```python data = { "query": "I'm planning a 7-day trip to Yunnan for the spring holiday.", "user_id": "memos_user_j", "conversation_id": "travel_0301", "include_skill": True } res = requests.post( url=f"{os.environ['MEMOS_BASE_URL']}/search/memory", headers=headers, data=json.dumps(data) ) print(f"result: {res.json()}") ``` #### 3. Generated Skill Example For the same "Travel Planning" task, MemOS does not apply one template to every user. It turns each user's long-term conversational preferences into a dedicated, reusable capability. As shown below, MemOS may generate a skill like this for the high-energy planner: ```markdown --- name: Travel Itinerary Planning description: Design multi-day itineraries for high-energy travelers, including efficient routes, cultural attractions, food spots, and weather-adapted suggestions. --- ## Procedure 1. 
Determine trip duration, destination, and user preferences 2. Collect cultural attractions, food spots, and transit info 3. Plan daily routes by area to avoid backtracking 4. Weave food spots into the transit route 5. Check weather forecast and adjust routes and packing advice ## Experience - Prioritize cultural attractions over shopping - Keep routes compact for high-energy travel - Plan each day geographically to move forward ## User Preferences - No backtracking - Prefers cultural attractions - Wants weather checked in advance ``` If another user is a "low-energy relaxed traveler" who mentions being a night owl, hating early mornings, not wanting long commutes, and preferring hidden gems, MemOS generates a noticeably different skill: ```markdown --- name: Travel Itinerary Planning description: Help low-energy travelers plan relaxed, flexible itineraries focused on afternoon and evening experiences. --- ## Procedure 1. Confirm user's energy level, wake-up time, and max commute tolerance 2. Prioritize nearby, easy-access spots that don't require early starts 3. Focus activities on afternoon, evening, and nighttime 4. Include hidden gems, avoid overly popular crowded routes 5. Keep flexible time slots for spontaneous changes ## Experience - Avoid scheduling early-morning activities - Avoid long commutes and packed schedules - Recommend places reachable by subway or short taxi rides ## User Preferences - Night owl, can't wake up early - Dislikes long commutes - Likes niche, less conventional experiences ``` --- ## Upload Custom Skills ### Customer Return Example When you already have a clear standard workflow, upload the skill file directly to a knowledge base. MemOS will retrieve and return relevant skills in a unified way. The following example uses "Customer Service Agent Return Processing" to walk through the full flow from skill upload to retrieval. #### 1. 
Upload to the Knowledge Base via API Upload a skill file that guides a customer service Agent to help users complete product returns. ```python [URL Upload] import os import json import requests os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1" headers = { "Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}" } data = { "knowledgebase_id": "kb_xxx", # Replace with your knowledge base ID "file": [ { "type": "skill", "content": "https://cdn.memtensor.com.cn/file/SKILL.md" # Replace with your public file URL } ] } url = f"{os.environ['MEMOS_BASE_URL']}/add/knowledgebase-file" res = requests.post(url=url, headers=headers, data=json.dumps(data)) print(f"result: {res.json()}") ``` ```python [Base64 Upload] import os import json import base64 import requests os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1" skill_markdown = """--- name: Customer Return Processing description: Guide customer service to handle user return requests following standard procedures --- ## Procedure 1. Verify user identity and order number 2. Confirm return reason meets policy requirements 3. Guide user to select return method (pickup / self-ship) 4. Generate return tracking number and notify user 5. Track logistics status and notify user upon refund completion ## Experience - No-reason returns accepted within 7 days of receipt - Fresh products do not support returns; use after-sales compensation - High-value items (>$70) require supervisor approval ## User Preferences - Recommend pickup service first to reduce user effort - Default refund to original payment method ## Examples ### Example 1: Standard product return User: I want to return the headphones I bought three days ago. Assistant: Got it. I've confirmed your order is within the 7-day no-reason return window. Would you prefer pickup or self-shipping? 
""" encoded_skill = base64.b64encode(skill_markdown.encode("utf-8")).decode("utf-8") data = { "knowledgebase_id": "kb_xxx", # Replace with your knowledge base ID "file": [ { "type": "skill", "name": "customer-return-sop.md", "content": f"data:text/markdown;base64,{encoded_skill}" } ] } headers = { "Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}" } url = f"{os.environ['MEMOS_BASE_URL']}/add/knowledgebase-file" res = requests.post(url=url, headers=headers, data=json.dumps(data)) print(f"result: {res.json()}") ``` #### 2. Upload Skill Files via Dashboard Go to [Dashboard - Knowledge Base](https://memos-dashboard.openmem.net/knowledgeBase/), select the target knowledge base, click "Upload Document", and drag in your `.md` or `.zip` file. Select "Skill file" as the upload type. ![image](https://cdn.memtensor.com.cn/img/1778148193764_flblvp_compressed.png) #### 3. Retrieve Skills After upload succeeds, pass `knowledgebase_ids` and enable `include_skill` during retrieval. MemOS will return skills relevant to the query. As shown below, the Agent can follow the "Customer Return Processing" flow to guide the user through the return. ```python data = { "query": "The user wants to return headphones bought three days ago", "user_id": "memos_user_123", "conversation_id": "session_001", "knowledgebase_ids": ["kb_xxx"], "include_skill": True } res = requests.post( url=f"{os.environ['MEMOS_BASE_URL']}/search/memory", headers=headers, data=json.dumps(data) ) print(f"result: {res.json()}") ``` ### Skill File Specification #### 1. `.md` Single File - The constraints are as follows: - Size limit: ≤ 100KB - File content: must include `name` and `description` - Recommended body structure: ```text --- name: (Skill name) description: (One-sentence description of purpose and scenario) --- ## Procedure 1. Step one 2. Step two 3. 
Step three ## Experience - Experience or note one - Experience or note two ## User Preferences - Preference setting one - Preference setting two ## Examples ### Example 1: (Scenario description) (Complete input/output example) ## Additional Information (Additional notes, such as reference links or special rules) ``` #### 2. `.zip` Skill Package - The constraints are as follows: | Constraint | Requirement | | --- | --- | | Format | Standard ZIP, no rar/7z | | Zip size | ≤ 20MB | | File count after extraction | ≤ 200 | | Single file after extraction | ≤ 10MB | | SKILL.md | ≤ 100KB, `name`/`description` required; must be at the first level of the archive | - Recommended structure: ```text refund-sop-v1.zip ├── SKILL.md ├── references/ │ └── return_policy_summary.md ├── scripts/ │ └── check_order.py └── assets/ └── flowchart.png ``` --- ## How to Use Retrieved Skills ### Returned Skill Details Regardless of whether the skill is auto-generated or uploaded to a knowledge base, each skill in `skill_detail_list` contains two fields: | Field | Description | | --- | --- | | `skill_value` | Structured skill content; can be converted to a string and injected into the Agent's prompt | | `skill_url` | Download link for the skill file; for ZIP packages, the Agent can download scripts, references, and other attachments | ### Usage Reference Choose the usage method based on whether your Agent can use Skill files. #### 1. The Agent Supports Skill Files Provide `skill_url` to the Agent so it can download the file. You can write it into the prompt: ```python skill_detail = result["skill_detail_list"][0] skill_url = skill_detail.get("skill_url") system_prompt = f"""You are a customer service assistant. Use the following Skill file when handling the task: {skill_url} """ ``` #### 2. 
The Agent Does Not Support Skill Files

Convert `skill_value` to a string and add it to the prompt:

```python
skill_detail = result["skill_detail_list"][0]
skill = str(skill_detail["skill_value"])

system_prompt = f"""You are a customer service assistant.

Refer to this skill when handling the task:
{skill}
"""
```

> **Tip**: During retrieval, MemOS searches both auto-generated personal skills and uploaded knowledge base skills, returning them in a unified ranked list. You do not need to distinguish the source.

---

**Start exploring MemOS Self-Evolving capabilities now!**

- Go to the [Dashboard - Skills page](https://memos-dashboard.openmem.net/skill/) to view auto-generated skills.
- Don't have any skills yet? [Add messages](/memos_cloud/mem_operations/add_message) to trigger generation.
- Want to upload custom skills? Go to [Dashboard - Knowledge Base](https://memos-dashboard.openmem.net/knowledgeBase/) to upload.

---

# Knowledge Base (/memos_cloud/features/knowledge_base)

> **Warning**: **[This article introduces the MemOS Knowledge Base feature. Click here to view the detailed API documentation directly.](/api_docs/knowledge/create_kb)**

## 1. MemOS Knowledge Base vs Traditional RAG

MemOS Knowledge Base lets developers integrate business knowledge into the long-term memory system of intelligent applications. The system uses uploaded documents as the underlying data source to build and maintain an independent memory layer that supports natural-language applications such as Q&A. As end users keep using the application, MemOS dynamically evolves and updates memories based on conversation content, so the knowledge base iterates and evolves automatically.

Unlike the static storage of traditional RAG, MemOS makes the knowledge base part of "memory". AI applications with "memory" can not only query information more accurately but also better understand the background and the user.
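The distinction can be sketched with a toy Python example. This is not MemOS's retrieval algorithm and it calls no MemOS API; the `Snippet` type, the `source` labels, and both function names are hypothetical and exist only for illustration. It shows a stateless context built solely from knowledge hits for the current query, next to a memory-aware context that also carries what is already known about the user.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    text: str
    source: str  # hypothetical label: "knowledge" or "memory"


def stateless_context(knowledge_hits: list[str]) -> list[Snippet]:
    """Stateless, RAG-style context: only what matched the current query."""
    return [Snippet(text=hit, source="knowledge") for hit in knowledge_hits]


def memory_aware_context(knowledge_hits: list[str], user_memories: list[str]) -> list[Snippet]:
    """Memory-style context: remembered user facts first, then knowledge hits."""
    context = [Snippet(text=memory, source="memory") for memory in user_memories]
    context += stateless_context(knowledge_hits)
    return context


# Illustrative data only, loosely based on the shopping scenario in this article.
hits = ["Hypoallergenic puppy food recommendation: B (Chicken), C (Salmon)."]
memories = ["The user's dog does not eat chicken-flavored dog food."]

for snippet in memory_aware_context(hits, memories):
    print(f"[{snippet.source}] {snippet.text}")
```

With the memory line present in the context, a model has what it needs to rule out the chicken option; with the stateless context alone, it does not.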
Let's look at two real scenarios to compare the two solutions:

**Shopping Customer Service Robot**

**Background**

```text
DAY 1 User asks: I have a three-month-old Golden Retriever. Which dog food is better? By the way, it doesn't eat chicken flavor.
DAY 1 User bought A (lamb puppy food) on the assistant's recommendation.
DAY 10 User asks: The dog has diarrhea eating this dog food. I want to change to another one.
```

**RAG Solution**

```text
# Retrieve snippets related to "puppy food recommendation" and "diarrhea" based on user input, but fail to recall "user's dog doesn't eat chicken flavor".
Retrieved knowledge:
1. Common reasons for dog food diarrhea: Can switch to hypoallergenic dog food.
2. Hypoallergenic puppy food recommendation: B (Chicken), C (Salmon).

# 🤦 Shopping Assistant: If diarrhea occurs now, you can try B (Chicken flavor), C (Salmon).
```

**MemOS Solution**

```text
# Retrieve relevant memories based on user input and recognize that the user's dog is a three-month-old Golden Retriever puppy that dislikes chicken and needs a puppy food that won't cause diarrhea.
Retrieved memories:
1. The user has a three-month-old Golden Retriever weighing about 12 catties.
2. The user's dog does not eat chicken-flavored dog food.
3. The user previously purchased lamb puppy food.
4. Common reasons for dog food diarrhea: Can switch to hypoallergenic dog food.
5. Hypoallergenic puppy food recommendation: B (Chicken), C (Salmon).

# ✅ Shopping Assistant: You mentioned before that the dog is a 3-month-old Golden Retriever and cannot eat chicken flavor, so I have excluded all chicken-flavored puppy foods for you. You bought lamb puppy food before. If diarrhea occurs now, you can try the milder C (Salmon Puppy Hypoallergenic).
```

**Enterprise Knowledge Base Assistant**

**Background**

```text
DAY 1 Employee asks: My computer is a MacBook Pro 13-inch, Intel chip. How do I install the company intranet proxy?
DAY 1 Assistant provided the installation steps for the Intel version.
DAY 20 Employee asks: The intranet proxy won't open. Which version should I reinstall?
```

**RAG Solution**

```text
# Retrieve content related to "intranet proxy" and "won't open" based on user input, but fail to recall "user's device model".
Retrieved knowledge:
1. Intranet proxy common troubleshooting
2. Intranet proxy installation instructions for M1/M2 (ARM) versions
3. Windows intranet proxy client installation instructions
4. Network connection and certificate issues
5. General FAQ

# 🤦 Knowledge Base Assistant: Please try to re-download and install the latest Mac M1/M2 (ARM) version or Windows intranet proxy client. Here are the installation steps: ...
```

**MemOS Solution**

```text
# Retrieve memories related to "intranet proxy" and "won't open" based on the employee's question, and automatically identify the employee's device model.
Retrieved memories:
1. The user installed the company intranet proxy 20 days ago, and their device is a 13-inch MacBook Pro (Intel).
2. Intranet proxy common troubleshooting
3. Intranet proxy installation instructions for Intel version

# ✅ Knowledge Base Assistant: You are using a MacBook Pro with an Intel chip. It is recommended to reinstall the Intel version of the intranet proxy client. Here are the download link and installation steps for the Intel version: ...
```

> **Note**: **Advantage Summary**
>
> RAG is good at retrieving information that is semantically similar to the query from the knowledge base, but it is **stateless**: every query is independent and lacks any understanding of the specific user and context.
>
> MemOS can understand information such as **relationships, time, and preferences**, linking the current question with historical memories so that knowledge is found and used "in context":
>
> * **Understand Users**: MemOS knows "who you are" and "what you are doing". Just ask a question, and MemOS will automatically complete the context.
>
> * **Personalization**: Across different roles and work habits, MemOS can remember things such as "this client dislikes overly aggressive sales", "you use Python more often than Java", and "you asked about the reimbursement policy last time; do you want to start the application process this time?".
>
> * **Knowledge Evolution**: When real-world processes rely on "rules of thumb" that are not written in any document, MemOS distills them into new memories, continuously supplementing and refining the knowledge system.

## 2. How it Works

1. **Upload**: Create a knowledge base and upload documents via the console or API.
2. **Validation**: Complete authentication and verify that the document format, size, and other properties comply with the limits.
3. **Storage**: After a successful upload, documents are saved by MemOS and enter the processing queue.
4. **Parsing**: Parse the original document content according to its file type.
5. **Intelligent Segmentation**: Split documents into finer-grained content fragments based on title, structure, and semantics.
6. **Generate Memory**: MemOS generates knowledge memories from the fragments; together with users' long-term memories, they form a complete project memory bank.
7. **Embedding and Indexing**: Write all memory content into the database and build embedding indexes to support millisecond-level retrieval.

## 3. Knowledge Base Requirements

### Capacity Limits

MemOS Cloud Service currently offers multiple pricing plans, from free to enterprise, for all developers. Each version has different limits on the number and capacity of knowledge bases.

> **Note**: Currently, all versions are free for a limited time. Visit [Official Website - Pricing](https://memos.openmem.net/pricing) to apply for the version that suits your needs.

| **Version** | **Knowledge Base Storage Limit** |
| --- | --- |
| **Free** | Knowledge Base Count: 10; Single KB Storage: 1 GB |
| **Starter** | Knowledge Base Count: 30; Single KB Storage: 10 GB |
| **Pro** | Knowledge Base Count: 100; Single KB Storage: 100 GB |

> **Warning**: Note
>
> When your service level is downgraded, if the existing knowledge bases exceed the capacity limit of the current version, MemOS **will not clear existing knowledge base data**, but will restrict the following operations:
>
> * Cannot create new knowledge bases
> * Cannot continue to upload new documents
>
> These functions are restored once usage is brought back within the capacity range of the current version.

### Document Limits

1. Supported document types: PDF, DOCX, DOC, TXT, JSON, MD, XML
2. Single file limit: no more than 100 MB and 500 pages
3. Files per upload: no more than 20

> **Warning**: Note
>
> When the number of files, the size of a single file, or the page count of a single file in one upload exceeds the above limits, the upload task is marked as **processing failed**.
> Adjust the files to meet the requirements and re-initiate the upload request.

### Skill Document Limits

In addition to regular knowledge documents, a knowledge base also supports uploading custom Skill documents. Skill documents capture reusable task-handling workflows and can be returned to the Agent together with other memories during retrieval. The format requirements are as follows:

1. **Single `.md` file**
   * Size limit: ≤ 100KB
   * File content: must include `name` and `description`
2. **Skill package `.zip`**

   | Constraint | Requirement |
   | --- | --- |
   | Format | Standard ZIP, no rar/7z |
   | Zip size | ≤ 20MB |
   | File count after extraction | ≤ 200 |
   | Single file after extraction | ≤ 10MB |
   | SKILL.md | ≤ 100KB, `name`/`description` required, must be at the first level of the archive |

## 4. Usage Example

Here is a complete knowledge base usage example to help you quickly build your own "Knowledge Base Assistant".

### Create Knowledge Base: Financial Reimbursement Knowledge Base

```python [Python (HTTP)]
import os
import requests
import json

# Replace with your MemOS API Key
os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY"
os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1"

data = {
    "knowledgebase_name": "Financial Reimbursement Knowledge Base",
    "knowledgebase_description": "Summary of all financial reimbursement related knowledge of the company"
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Token {os.environ['MEMOS_API_KEY']}"
}

url = f"{os.environ['MEMOS_BASE_URL']}/create/knowledgebase"

res = requests.post(url=url, headers=headers, data=json.dumps(data))
print(f"result: {res.json()}")
```

```text [Output]
"result": {
  "code": 0,
  "data": {
    "id": "idxxxxx"  # ID of the knowledge base that was just created
  },
  "message": "ok"
}
```

### Upload Document: Software Procurement Reimbursement Policy

```python [Python (HTTP)]
import os
import requests
import json

# Replace with your MemOS API Key
os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY"
os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1"

data = {
    "knowledgebase_id": "idxxxxx",  # Replace with the Knowledge Base ID created above
    "file": [
        {"content": "https://cdn.memtensor.com.cn/file/Software_Procurement_Reimbursement_Policy.pdf"}
    ]
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Token {os.environ['MEMOS_API_KEY']}"
}

url = f"{os.environ['MEMOS_BASE_URL']}/add/knowledgebase-file"

res = requests.post(url=url, headers=headers, data=json.dumps(data))
print(f"result: {res.json()}")
```

```text [Output]
"result": {
  "code": 0,
  "data": [
    {
      "id": "1f35642253606ed1e9dd8cd8113a8998",
      "name": "Software_Procurement_Reimbursement_Policy.pdf",
      "sizeMB": 0.06331157684326172,
      "status": "running"
    }
  ],
  "message":
"ok" } ``` ### Add User Conversation > **Note**:  Session A: Occurred on 2025-06-10 > > The designer indicated in the chat that they are a [Designer in the Creative Platform Department]. > ```python import os import requests import json # Replace with your MemOS API Key os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1" data = { "user_id": "memos_user_123", "conversation_id": "0610", "messages": [ { "role": "user", "content": "I am a designer in the Creative Platform Department." }, { "role": "assistant", "content": "Okay, I've noted that." } ] } headers = { "Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}" } url = f"{os.environ['MEMOS_BASE_URL']}/add/message" res = requests.post(url=url, headers=headers, data=json.dumps(data)) print(f"result: {res.json()}") ``` ### Retrieve Knowledge Base Memory > **Note**:  Session A: Occurred on 2025-12-12 > > In a new session, the user asks about [Software Reimbursement Policy]. MemOS will automatically recall [Knowledge Base Memory: Software Reimbursement Policy Content] and [User Memory: Creative Platform Designer], thereby providing a more specific and "user-understanding" answer about software reimbursement. 
> ```python [Python (HTTP)] import os import requests import json # Replace with your MemOS API Key os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1" data = { "user_id": "memos_user_123", "conversation_id": "1211", "query": "Check the software procurement reimbursement limit for me.", "knowledgebase_ids":["idxxxxx"] #Replace with the Knowledge Base ID created above } headers = { "Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}" } url = f"{os.environ['MEMOS_BASE_URL']}/search/memory" res = requests.post(url=url, headers=headers, data=json.dumps(data)) # Prettify JSON output json_res = res.json() print(json.dumps(json_res, indent=2, ensure_ascii=False)) ``` ```text [Output] "memory_detail_list": [ { "id": "2c760355-de4b-4a8f-b98d-b92851d23fa7", "memory_key": "Software Procurement Reimbursement Policy (Trial Version)", "memory_value": "This policy aims to standardize the procurement and reimbursement process for various software in the company, requiring all software procurement to follow the procurement amount limits for specific categories. The procurement limit for design software is 1000 yuan, applicable to graphic design, video editing, and prototype design, with examples including Photoshop and Premiere. The procurement limit for code/development software is 1500 yuan, applicable to IDEs and development frameworks, with examples like PyCharm and Visual Studio. The procurement limit for office software is 800 yuan, applicable to document editing and spreadsheet processing, with examples including Office Suite and WPS. The procurement limit for data analysis software is 1200 yuan, applicable to data statistics and visualization, with examples including Tableau and Power BI. The procurement limit for security and protection software is 1000 yuan, applicable to antivirus and firewalls. 
The procurement limit for collaboration/project management software is 900 yuan, with examples including Jira and Slack. The procurement limit for special industry software is 2000 yuan, requiring special approval. All procurement must comply with company budget and information security requirements; software exceeding the limit requires a business explanation and special approval.", "memory_type": "WorkingMemory", "create_time": 1765525947718, "conversation_id": "default_session", "status": "activated", "confidence": 0.99, "tags": [ "Software Procurement", "Reimbursement Policy", "Approval Process", "Budget", "Information Security", "mode:fine", "multimodal:file" ], "update_time": 1765525947720, "relativity": 0.89308184 }, { "id": "81fd1e79-65be-4d4e-81e0-8f76ba697c55", "memory_key": "Position Information", "memory_value": "User is a designer in the Creative Platform Department.", "memory_type": "WorkingMemory", "create_time": 1765526247112, "conversation_id": "0610", "status": "activated", "confidence": 0.99, "tags": [ "Position", "Department", "Design" ], "update_time": 1765526247113, "relativity": 1.6319022e-05 } ] ``` ### Feedback to Optimize Knowledge Base In enterprises, it is common for company policies/knowledge to be updated while the knowledge base is not updated in time. Currently, MemOS supports feedback on knowledge base memories through **natural language conversation**, used to quickly update knowledge base memories, thereby improving accuracy and timeliness. Try it out, drive the knowledge base to always stay up-to-date with the simplest interaction method. > **Note**:  Session A: Occurred on 2025-12-12 > > In another new session, the financial supervisor provides feedback that [The procurement limit for office software is 600 yuan, not 800 yuan]. 
> ```python import os import requests import json # Replace with your MemOS API Key os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1" data = { "user_id": "memos_user_123", "conversation_id": "1212", "feedback_content": "The procurement limit for office software is 600 yuan, not 800 yuan.", "allow_knowledgebase_ids":["idxxxxx"] #Replace with the Knowledge Base ID created above } headers = { "Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}" } url = f"{os.environ['MEMOS_BASE_URL']}/add/feedback" res = requests.post(url=url, headers=headers, data=json.dumps(data)) print(f"result: {res.json()}") ``` > **Note**:  Session A: Occurred on 2025-12-12 > > When any other user searches for [Software Reimbursement Policy], they get a newly added high-weight memory [The procurement limit for office software is 600 yuan, not 800 yuan]. > ```python import os import requests import json # Replace with your MemOS API Key os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1" data = { "user_id": "memos_user_123", "conversation_id": "1211", "query": "Check the software procurement reimbursement limit for me.", "knowledgebase_ids":["idxxxxx"] #Replace with the Knowledge Base ID created above } headers = { "Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}" } url = f"{os.environ['MEMOS_BASE_URL']}/search/memory" res = requests.post(url=url, headers=headers, data=json.dumps(data)) # Prettify JSON output json_res = res.json() print(json.dumps(json_res, indent=2, ensure_ascii=False)) ``` The output result is as follows (Simplified): ```python "memory_detail_list": [ { "id": "8a4f3d2e-c417-4e53-bc25-54451abd5ac8", "memory_key": "Software Procurement Reimbursement Policy (Trial Version)", "memory_value": "This policy aims to standardize the procurement and reimbursement process 
for various software in the company, requiring all software procurement to follow the procurement amount limits for specific categories. The procurement limit for design software is 1000 yuan, applicable to graphic design, video editing, and prototype design, with examples including Photoshop and Premiere. The procurement limit for code/development software is 1500 yuan, applicable to IDEs and development frameworks, with examples like PyCharm and Visual Studio. The procurement limit for office software is 800 yuan, applicable to document editing and spreadsheet processing, with examples including Office Suite and WPS. The procurement limit for data analysis software is 1200 yuan, applicable to data statistics and visualization, with examples including Tableau and Power BI. The procurement limit for security and protection software is 1000 yuan, applicable to antivirus and firewalls. The procurement limit for collaboration/project management software is 900 yuan, with examples including Jira and Slack. The procurement limit for special industry software is 2000 yuan, requiring special approval. 
All procurement must comply with company budget and information security requirements; software exceeding the limit requires a business explanation and special approval.", "memory_type": "LongTermMemory", "create_time": 1765525947718, "conversation_id": "default_session", "status": "activated", "confidence": 0.99, "tags": [ "Software Procurement", "Reimbursement Policy", "Approval Process", "Budget", "Information Security", "mode:fine", "multimodal:file" ], "update_time": 1765525947720, "relativity": 0.8931847 }, { "id": "a72a04d1-d7ba-4ebd-9410-0097bfa6c20d", "memory_key": "Office Software Procurement Limit", "memory_value": "User confirmed that the procurement limit for office software is 600 yuan, not 800 yuan.", "memory_type": "WorkingMemory", "create_time": 1765531700539, "conversation_id": "1212", "status": "activated", "confidence": 0.99, "tags": [ "Procurement", "Office Software", "Budget" ], "update_time": 1765531700540, "relativity": 0.7196722 } ] ``` The [Console - Knowledge Base](https://memos-dashboard.openmem.net/knowledgeBase/) displays details of all corrections or completions of knowledge base memories through natural language interaction. ![image.png](https://cdn.memtensor.com.cn/img/1766634697599_d1j187_compressed.png) > **Note**: For a complete list of feedback API fields, formats, etc., please refer to [Add Feedback API Documentation](/api_docs/message/add_feedback). ## 5. More: Upload Skill Documents If you want the knowledge base to return not only knowledge content but also reusable task-handling workflows, you can upload Skill documents to the knowledge base. Unlike regular documents, Skill documents must be marked with `type: "skill"` when uploaded. After upload succeeds, pass `knowledgebase_ids` and enable `include_skill` during retrieval to return both knowledge base memories and matching skills. 
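The retrieval side described above can be sketched as follows. This is a minimal, hedged example rather than the official client: `build_skill_search_request` is a hypothetical helper name, and sending `include_skill` as a boolean is an assumption based on the wording "enable `include_skill`". The endpoint path, header format, and the other field names are taken from the search examples earlier on this page. The block only builds and prints the request so that it runs without network access.

```python
import json


def build_skill_search_request(user_id, conversation_id, query,
                               knowledgebase_ids, api_key, base_url):
    """Build URL, headers, and JSON body for a search that also returns skills.

    Hypothetical helper: only the field names come from the documentation.
    """
    payload = {
        "user_id": user_id,
        "conversation_id": conversation_id,
        "query": query,
        # Knowledge bases to search; they must be associated with the project.
        "knowledgebase_ids": knowledgebase_ids,
        # Assumption: a boolean flag enables skill results.
        "include_skill": True,
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Token {api_key}",
    }
    return f"{base_url}/search/memory", headers, json.dumps(payload)


url, headers, body = build_skill_search_request(
    user_id="memos_user_123",
    conversation_id="skill-demo",  # placeholder conversation ID
    query="How do I file a software reimbursement request?",
    knowledgebase_ids=["idxxxxx"],  # Replace with your knowledge base ID
    api_key="YOUR_API_KEY",
    base_url="https://memos.memtensor.cn/api/openmem/v1",
)
print(url)
print(body)
```

To send the request, pass the three values to `requests.post(url, headers=headers, data=body)`, as in the other examples on this page.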
> **Note**: For the detailed workflow and usage examples of Skill files, see the complete examples in [Self-Evolving](/memos_cloud/features/self-evolving). ```python [Python (HTTP)] import os import requests import json # Replace with your MemOS API Key os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1" data = { "knowledgebase_id": "idxxxxx", # Replace with your knowledge base ID "file": [ { "type": "skill", "content": "https://cdn.memtensor.com.cn/file/SKILL.md" } ] } headers = { "Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}" } url = f"{os.environ['MEMOS_BASE_URL']}/add/knowledgebase-file" res = requests.post(url=url, headers=headers, data=json.dumps(data)) print(f"result: {res.json()}") ``` ## 6. Common Errors and Troubleshooting | Error Code | Common Cause | How to Fix | | --- | --- | --- | | `40002` / `40003` | Missing `knowledgebase_id`, `file`, or the file list is empty | Check whether the request body is complete. `file` must be a non-empty array | | `40007` / `50107` | Unsupported file type | Use PDF, DOCX, DOC, TXT, JSON, MD, or XML. 
For Skill files, use `.md` or `.zip` | | `40008` / `40009` | Invalid or malformed Base64 content | Make sure the Base64 content is not truncated and does not include invalid characters or an incorrect prefix | | `40305` | A single request exceeds the token limit | Split large files and reduce the content uploaded in one request | | `40309` | Token usage exceeds the per-time-window limit | Lower upload concurrency and retry in batches | | `50123` | The knowledge base is not associated with the current project | Go to [Project Configuration](/api_docs/start/configuration) and confirm the knowledge base is associated with the project that owns the API Key | --- # Tool Calling (/memos_cloud/features/tool_calling) > **Warning**: > > **[You must first pass tool memory when calling addMessage (Click here for detailed API documentation)](/api_docs/core/add_message)** > > **[Only then can you search for tool memory when calling searchMemory (Click here for detailed API documentation)](/api_docs/core/search_memory)** > > **This article focuses on functional description. For detailed API fields and limits, please click the text links above.** ## 1. When to Use This message structure is suitable when your Agent obtains external information through tools (function / tool) and you want MemOS to understand these tool calling contexts and results, associate them with each other, and store them as retrievable memories. ## 2. How it Works Step 1: Add Tool Calling Information `assistant` message: `tool_calls` describes the model's decision to call a tool and the parameters it passes. `tool` message: carries the actual tool execution result and is linked to the corresponding `tool_calls` entry via `tool_call_id`. Step 2: MemOS Processes Tool-Related Memory * **Tool Schema**: MemOS supports structured management and dynamic updates of tool information, unifying the description of different tools.
This enables the model to efficiently retrieve, understand, and discover tools without hardcoding tool details in prompts. * **Tool Trajectory Memory**: MemOS extracts and stores key trajectories during tool usage, including "what tool was called in what context, what parameters were used, and what result was returned". These trajectories can be retrieved and reused in subsequent conversations, helping the model reproduce tool usage patterns more stably and reducing repetitive trial-and-error and calling errors. ## 3. Usage Example For a complete list of API fields, formats, etc., please refer to the [Add Message API Documentation](/api_docs/core/add_message) to see how to add tool calling information. ### Add Tool Calling Information > **Note**:  Session A: User asks [How is the weather in Beijing] in the conversation. The assistant calls the [Weather Tool]. The weather tool returns the result [Beijing, Temperature 7°C, Cloudy]. ```python import os import requests import json # Replace with your MemOS API Key os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1" # Message sequence with tool_call tool_schema = [{ "name": "get_weather", "description": "Get current weather information for a given location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "City name, e.g. Beijing" } }, "required": [ "location" ] } }] data = { "user_id": "memos_user_123", "conversation_id": "demo-conv-id", "messages": [ { "role": "system", "content": f"""You are an assistant that can call tools. When a user's request can be fulfilled by a tool, you MUST call the appropriate tool. 
{json.dumps(tool_schema, indent=2, ensure_ascii=False)} """ }, {"role": "user", "content": "What's the weather like in Beijing right now?"}, { "role": "assistant", "tool_calls": [ { "id": "call_123", "type": "function", "function": { "name": "get_weather", "arguments": json.dumps({"location": "Beijing"}), }, } ], }, { "role": "tool", "tool_call_id": "call_123", "content": [ { "type": "text", "text": json.dumps( {"location": "Beijing", "temperature": "7°C", "condition": "Cloudy"} ), } ], }, ], } headers = { "Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}" } url = f"{os.environ['MEMOS_BASE_URL']}/add/message" res = requests.post(url=url, headers=headers, data=json.dumps(data)) print(json.dumps(res.json(), indent=2, ensure_ascii=False)) ``` ### Retrieve Tool Memory > **Note**:  Session B: In a new session, the user asks [What clothes are suitable for Beijing]. MemOS can recall relevant tool memories from past [Weather Tool Calls]. The model can use tool memories in the future to improve the accuracy and effectiveness of tool usage. 
```python import os import requests import json os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1" data = { "user_id": "memos_user_123", "conversation_id": "0928", "query": "What clothes are suitable for Beijing", "memory_limit_number": 10, "include_preference": True, "preference_limit_number": 10, "include_tool_memory":True, "tool_memory_limit_number":10, } headers = { "Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}" } url = f"{os.environ['MEMOS_BASE_URL']}/search/memory" res = requests.post(url=url, headers=headers, data=json.dumps(data)) print(json.dumps(res.json(), indent=2, ensure_ascii=False)) ``` ### Output Result ```python "tool_memory_detail_list": [ { "id": "7ec50fd8-19ec-42a2-a7c7-ce3cebdb70cf", "tool_type": "ToolSchemaMemory", "tool_value": {"name": "get_weather", "description": "Get current weather information for a given location", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "City name, e.g. Beijing"}}, "required": ["location"]}}, "create_time": 1766494806624, "conversation_id": "demo-conv-id", "status": "activated", "update_time": 1766494806625, "relativity": 0.44700349055540967 }, { "id": "4b208707-991a-481c-9dd6-c7f0577ff371", "tool_type": "ToolTrajectoryMemory", "tool_value": "User asked about the current weather in Beijing -> Tool 'get_weather' was called with the parameter 'location' set to 'Beijing' -> The tool returned the weather information: temperature is 7°C and condition is Cloudy.", "tool_used_status": [ { "used_tool": "get_weather", "error_type": "", "success_rate": 1.0, "tool_experience": "The 'get_weather' tool requires a valid location parameter and provides current weather information for that location." # New: Experience with this tool in the current trajectory. 
} ], "create_time": 1768390489180, "conversation_id": "demo-conv-id", "status": "activated", "update_time": 1768390489181, "relativity": 0.47883897395535013, "experience": "when encountering weather inquiry tasks, then ensure to call the 'get_weather' tool with the correct location parameter." # New: Procedural experience of the entire trajectory, serving as overall guidance for task completion. } ] ``` --- # Cloud FAQs (/memos_cloud/support/faq) This page answers common questions about using MemOS Cloud. If you want to understand MemOS concepts such as RAG, open source deployment, private deployment, or memory scheduling, see [FAQ](/memos_cloud/introduction/faq). ## Which document should a new user read first? If you have not logged in to the console yet, choose the entry point based on how you plan to use MemOS Cloud: - If you want Agent tools such as Claude Code or Cursor to help with integration, start with [Use in Agents](/memos_cloud/getting_started/agent_usage). - If you want to call Cloud APIs from your own application, start with [Integrate into Your App](/memos_cloud/getting_started/quick_start). [Configuration](/api_docs/start/configuration) is more useful when you need to manage multiple projects, edit or delete projects, or understand how knowledge bases are bound to projects. ## What is the relationship between API Keys and projects? Each project has its own API Key. When you call an API with a specific API Key, you can only access memories, messages, knowledge bases, and configuration under that project. If data is visible in the console but not returned by the API, check: - Whether the console project matches the project that owns the API Key. - Whether the request uses the same `user_id`. - Whether the search request uses an overly narrow `filter`, `knowledgebase_ids`, or a high `relativity`. ## How do I troubleshoot authentication or permission errors? 
Start by checking the request header: ```text Authorization: Token YOUR_API_KEY ``` Common issues include: - Missing `Authorization`. - Missing the `Token` prefix. - Incomplete API Key copied from the console. - API Key belongs to another project. - API Key is invalid, expired, or has no permission for the current resource. If the response returns `40100`, `40130`, or `40132`, go back to [Configuration](/api_docs/start/configuration) and check the API Key and project. ## Why did Search Memory or Chat not recall knowledge base content? First confirm that the knowledge base is bound to the current project and that the files have finished processing. Also check the API request: - If `knowledgebase_ids` is not provided, knowledge bases are not searched by default. - If `knowledgebase_ids` is provided, make sure the IDs belong to knowledge bases accessible from the current project. - If `filter` is too narrow or `relativity` is too high, relevant content may be filtered out. For upload, binding, and processing status, see [Knowledge Base](/memos_cloud/features/knowledge_base). ## What should I do when I hit quota or rate limits? First identify which limit was reached: - Single input or output is too long: reduce the current input, history, or expected output. - API call quota is exhausted: wait for quota recovery or request more quota. - Too many requests in a short period: reduce request frequency and avoid repeated retries. For quota details, see [Quotas and Limits](/memos_cloud/support/limit). ## Why can I not search a memory immediately after writing it? After memory is written, it still needs to be extracted, processed, and indexed. If you search immediately after writing, processing may not be finished yet. Check: - The write and search requests use the same `user_id`. - The request is not restricted by an overly narrow `conversation_id`, `filter`, or `knowledgebase_ids`. - If asynchronous writing is used, wait for the task to finish before searching. 
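Because extraction and indexing happen after the write request returns, one way to handle the delay is to retry the search a few times with increasing waits. The sketch below is an illustration under assumptions, not part of the MemOS SDK: `search_with_retry` and `search_fn` are hypothetical names, and `search_fn` stands for any callable you write that performs the search request and returns the list of memories (empty if nothing was recalled). The demo uses a fake search function so the block runs without network access.

```python
import time


def search_with_retry(search_fn, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call search_fn until it returns a non-empty list or attempts run out.

    The wait doubles after each empty result: base_delay, 2x, 4x, ...
    """
    for attempt in range(attempts):
        memories = search_fn()
        if memories:
            return memories
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return []


# Demo with a fake search that only succeeds on the third call.
calls = {"count": 0}


def fake_search():
    calls["count"] += 1
    return [{"id": "m1"}] if calls["count"] >= 3 else []


# sleep is replaced with a no-op so the demo finishes immediately.
result = search_with_retry(fake_search, base_delay=0.5, sleep=lambda seconds: None)
print(result, "after", calls["count"], "calls")
```

Keep the number of attempts small and the delays generous: each retry is a regular API call, so aggressive retrying can run into the rate limits described above.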
## What should I check when Delete Memory fails? When deleting individual memories, prefer `memory_ids`. Do not pass a `conversation_id`, a `user_id`, or a knowledge base ID as an entry in `memory_ids`. Only pass `user_id` when you need to delete all memories for that user under the current project. This is a high-risk operation, so confirm the API Key, project, and user ID before running it. ## Why are console and API data ranges different? Both the console and APIs isolate data by project. Inconsistent data ranges usually have one of these causes: - The selected console project is different from the project that owns the API Key. - The API request uses a different `user_id`. - Extra filters are applied in the API request. - The knowledge base is not authorized for the current project. If an API response returns a specific `code`, see [Error Codes](/api_docs/help/error_codes). --- # Limits (/memos_cloud/support/limit) ## 1. Quota ![image.png](https://cdn.memtensor.com.cn/img/1766630472243_emn5fx_compressed.png) MemOS Cloud Services currently provides multiple pricing plans, from the free tier to the enterprise tier, to meet the needs of teams of different sizes. All plans are currently free for a limited time. Visit the [MemOS Pricing page](https://memos.openmem.net/en/pricing) to apply for the plan that fits your needs. Take action now and use MemOS Cloud Services to support the growth of your projects. > **Note**: > - The free quota is provided per **developer account** and is shared across all projects under that account. > - Failed requests (authentication failure, parameter error, exceeding limits, etc.) **do not consume quota**. ## 2.
Resource Limits To ensure service stability and security, MemOS Cloud Services imposes the following limits on core API calls, calculated per account: | **API Name** | **Single Input Limit** | **Single Output Limit** | |-----------------|------------------------|-------------------------| | addMessage | 40,000 tokens | - | | searchMemory | 40,000 tokens | Factual Memory: 25 items; Preference Memory: 25 items; Tool Memory: 25 items; Skills: 25 items | In addition, the document upload feature for the knowledge base currently has the following limits: - Supported document types: PDF, DOCX, DOC, TXT, JSON, MD, XML - Maximum single-file size: no more than 100 MB and 500 pages - Maximum number of files per upload: no more than 20 files > **Note**: > Knowledge bases now also support uploading Skill files. For detailed limits, see [Knowledge Base](/memos_cloud/features/knowledge_base). If you have higher-level or special requirements, please contact the project team for further discussion. > **Note**: > - Requests exceeding the per-call limit return the corresponding error code without deducting quota. > - The total input tokens per minute must not exceed 400,000 tokens. Requests exceeding this limit will be rate-limited. > - Additionally, we recommend keeping QPS at or below 50 (up to 50 requests per second). This is not a strict limit, but high concurrency may be affected by platform capacity, so control request frequency according to actual needs. ## 3. Usage Monitoring You can view the remaining quota for each API through the **API Console**, with filters for project, API key, and date to facilitate tracking and managing usage. --- # Search memories **POST** `/product/search` Search memories for a specific user. - operationId: `search_memories_product_search_post` ## Request Body ## Response Successful Response URL: /api-reference/search-memories --- # Add memories **POST** `/product/add` Add memories for a specific user.
- operationId: `add_memories_product_add_post` ## Request Body ## Response Successful Response URL: /api-reference/add-memories --- # Get all memories for user **POST** `/product/get_all` Get all memories or subgraph for a specific user. If search_query is provided, returns a subgraph based on the query. Otherwise, returns all memories of the specified type. - operationId: `get_all_memories_product_get_all_post` ## Request Body ## Response Successful Response URL: /api-reference/get-all-memories-for-user --- # Get memories for user **POST** `/product/get_memory` Get memories for user with pagination support. - operationId: `get_memories_product_get_memory_post` ## Request Body | Parameter | Type | Required | Description | |---|---|---|---| | `user_id` | string | Yes | Unique identifier of the user whose memories are being retrieved. | | `page` | integer | No | Page number for pagination when many results are returned. | | `size` | integer | No | Number of entries returned per memory category on the current page, up to 50. | | `filter` | object | No | Filter conditions, used to precisely limit the memory scope before retrieval. Available fields include: "agent_id", "app_id", "create_time", "update_time", and specific fields in "info". Supports logical operators (and, or) and comparison operators (gte, lte, gt, lt). For the "info" field, supports filtering by "business_type", "biz_id", "scene", and other custom fields. | | `include_preference` | boolean | No | Whether preference memories should be included. | | `include_tool_memory` | boolean | No | Whether tool memories should be included. | ## Response Successful Response | Parameter | Type | Required | Description | |---|---|---|---| | `code` | number | Yes | API status code; refer to the error-code list for details. | | `data` | object | No | | | `message` | string | Yes | API message. 
| #### `data` object | Parameter | Type | Required | Description | |---|---|---|---| | `memory_detail_list` | array | No | Returned factual memories. | | `preference_detail_list` | array | No | Returned preference memories. | | `tool_memory_detail_list` | array | No | List of tool memory fragment details returned. | | `total` | integer | No | Maximum count across memory types, used to check if another page exists. | | `size` | integer | No | Number of entries per memory type on the current page. | | `current` | integer | No | Index of the current page. | | `pages` | integer | No | Total number of pages. | #### `memory_detail_list` object | Parameter | Type | Required | Description | |---|---|---|---| | `id` | string | No | Unique identifier of a factual memory entry. | | `memory_key` | string | No | Title or keyword summarizing the factual memory. | | `memory_value` | string | No | Content of the factual memory. | | `memory_type` | string (enum: WorkingMemory, LongTermMemory, UserMemory) | No | Type of factual memory. | | `create_time` | string | No | Creation time in ISO 8601 format. | | `conversation_id` | string | No | Conversation identifier linked to this memory. | | `status` | string | No | Current status, only activated is returned. | | `confidence` | number | No | Confidence score between 0 and 1; higher means more reliable. | | `tags` | array | No | Tag list for classification or retrieval. | | `update_time` | string | No | Last update time in ISO 8601 format. | | `sources` | array | No | Source message objects associated with the memory. | | `info` | object | No | Custom metadata provided when adding the message. | #### `preference_detail_list` object | Parameter | Type | Required | Description | |---|---|---|---| | `id` | string | No | Unique identifier of the preference memory. | | `preference_type` | string (enum: explicit_preference, implicit_preference) | No | Preference memory type. | | `preference` | string | No | Description of the preference. 
| | `reasoning` | string | No | Reasoning for extracting or deriving the preference. | | `create_time` | string | No | Creation time in ISO 8601 format. | | `conversation_id` | string | No | Conversation identifier linked to this preference. | | `status` | string | No | Current status, only activated is returned. | | `update_time` | string | No | Last update time in ISO 8601 format. | | `sources` | array | No | Source messages associated with the preference. | | `info` | object | No | Custom metadata provided when adding the message. | #### `tool_memory_detail_list` object | Parameter | Type | Required | Description | |---|---|---|---| | `id` | string | No | Unique identifier of the tool memory fragment. | | `tool_type` | string (enum: ToolTrajectoryMemory, ToolSchema) | No | Tool memory type. ToolTrajectoryMemory: tool trajectory memory; ToolSchema: tool info memory. | | `tool_value` | string | No | Specific content of the tool memory. | | `tool_used_status` | array | No | List of tool trajectory memories, each record contains tool used and experience info. | | `create_time` | string | No | Tool memory creation time (ISO 8601 format). | | `conversation_id` | string | No | Unique identifier of the conversation associated with the tool memory. | | `status` | string (enum: activated) | No | Tool memory status, currently activated. | | `update_time` | string | No | Last update time of the tool memory (ISO 8601 format). | | `experience` | string | No | Procedural experience of the entire trajectory, serving as overall guidance for task completion. | | `sources` | array | No | List of original message content associated with the tool memory. | | `info` | object | No | Custom metadata provided when adding the message. 
| ### Response Example ```json { "data": { "memory_detail_list": [ { "memory_type": "WorkingMemory" } ], "preference_detail_list": [ { "preference_type": "explicit_preference" } ], "tool_memory_detail_list": [ { "tool_type": "ToolTrajectoryMemory", "status": "activated" } ] } } ``` URL: /api-reference/get-memories-for-user --- # Get memory by id **GET** `/product/get_memory/{memory_id}` Get a specific memory by its ID. - operationId: `get_memory_by_id_product_get_memory__memory_id__get` ## Response Successful Response | Parameter | Type | Required | Description | |---|---|---|---| | `code` | number | Yes | API status code; refer to the error-code list for details. | | `data` | object | No | | | `message` | string | Yes | API message. | #### `data` object | Parameter | Type | Required | Description | |---|---|---|---| | `memory_detail_list` | array | No | Returned factual memories. | | `preference_detail_list` | array | No | Returned preference memories. | | `tool_memory_detail_list` | array | No | List of tool memory fragment details returned. | | `total` | integer | No | Maximum count across memory types, used to check if another page exists. | | `size` | integer | No | Number of entries per memory type on the current page. | | `current` | integer | No | Index of the current page. | | `pages` | integer | No | Total number of pages. | #### `memory_detail_list` object | Parameter | Type | Required | Description | |---|---|---|---| | `id` | string | No | Unique identifier of a factual memory entry. | | `memory_key` | string | No | Title or keyword summarizing the factual memory. | | `memory_value` | string | No | Content of the factual memory. | | `memory_type` | string (enum: WorkingMemory, LongTermMemory, UserMemory) | No | Type of factual memory. | | `create_time` | string | No | Creation time in ISO 8601 format. | | `conversation_id` | string | No | Conversation identifier linked to this memory. 
| | `status` | string | No | Current status, only activated is returned. | | `confidence` | number | No | Confidence score between 0 and 1; higher means more reliable. | | `tags` | array | No | Tag list for classification or retrieval. | | `update_time` | string | No | Last update time in ISO 8601 format. | | `sources` | array | No | Source message objects associated with the memory. | | `info` | object | No | Custom metadata provided when adding the message. | #### `preference_detail_list` object | Parameter | Type | Required | Description | |---|---|---|---| | `id` | string | No | Unique identifier of the preference memory. | | `preference_type` | string (enum: explicit_preference, implicit_preference) | No | Preference memory type. | | `preference` | string | No | Description of the preference. | | `reasoning` | string | No | Reasoning for extracting or deriving the preference. | | `create_time` | string | No | Creation time in ISO 8601 format. | | `conversation_id` | string | No | Conversation identifier linked to this preference. | | `status` | string | No | Current status, only activated is returned. | | `update_time` | string | No | Last update time in ISO 8601 format. | | `sources` | array | No | Source messages associated with the preference. | | `info` | object | No | Custom metadata provided when adding the message. | #### `tool_memory_detail_list` object | Parameter | Type | Required | Description | |---|---|---|---| | `id` | string | No | Unique identifier of the tool memory fragment. | | `tool_type` | string (enum: ToolTrajectoryMemory, ToolSchema) | No | Tool memory type. ToolTrajectoryMemory: tool trajectory memory; ToolSchema: tool info memory. | | `tool_value` | string | No | Specific content of the tool memory. | | `tool_used_status` | array | No | List of tool trajectory memories, each record contains tool used and experience info. | | `create_time` | string | No | Tool memory creation time (ISO 8601 format). 
| | `conversation_id` | string | No | Unique identifier of the conversation associated with the tool memory. | | `status` | string (enum: activated) | No | Tool memory status, currently activated. | | `update_time` | string | No | Last update time of the tool memory (ISO 8601 format). | | `experience` | string | No | Procedural experience of the entire trajectory, serving as overall guidance for task completion. | | `sources` | array | No | List of original message content associated with the tool memory. | | `info` | object | No | Custom metadata provided when adding the message. | ### Response Example ```json { "data": { "memory_detail_list": [ { "memory_type": "WorkingMemory" } ], "preference_detail_list": [ { "preference_type": "explicit_preference" } ], "tool_memory_detail_list": [ { "tool_type": "ToolTrajectoryMemory", "status": "activated" } ] } } ``` URL: /api-reference/get-memory-by-id --- # Delete memories for user **POST** `/product/delete_memory` Delete memories for a specific user. - operationId: `delete_memories_product_delete_memory_post` ## Request Body | Parameter | Type | Required | Description | |---|---|---|---| | `memory_ids` | array | No | IDs of the memories to be deleted, obtained from the `id` field returned by the `search/memory` or `get/memory` API. | | `user_id` | string | No | Unique identifier of the user whose memories are being deleted. | ## Response Successful Response | Parameter | Type | Required | Description | |---|---|---|---| | `code` | integer | No | API status code. See Error Code for details. | | `data` | object | No | Returned deletion information | | `message` | string | No | API response message | #### `data` object | Parameter | Type | Required | Description | |---|---|---|---| | `success` | boolean | No | Whether deletion was successful. true for success, false for failure. 
| ### Response Example ```json { "data": { "success": true } } ``` URL: /api-reference/delete-memories-for-user --- # Chat with MemOS (Complete Response) **POST** `/product/chat/complete` Chat with MemOS for a specific user. Returns complete response (non-streaming). - operationId: `chat_complete_product_chat_complete_post` ## Request Body ## Response Successful Response URL: /api-reference/chat-with-memos-(complete-response) --- # Chat with MemOS **POST** `/product/chat/stream` Chat with MemOS for a specific user. Returns SSE stream. - operationId: `chat_stream_product_chat_stream_post` ## Request Body | Parameter | Type | Required | Description | |---|---|---|---| | `user_id` | string | Yes | Unique identifier of the user associated with the conversation. | | `conversation_id` | string | Yes | Unique identifier of the conversation. Providing this clarifies the current session, prioritizing its memory over historical sessions. If omitted, cross-session memory retrieval is performed for more relevant responses. | | `query` | string | Yes | User input content. | | `filter` | object | No | Filter conditions, used to precisely limit the memory scope before retrieval. Available fields include: "agent_id", "app_id", "create_time", "update_time", and specific fields in "info". Supports logical operators (and, or) and comparison operators (gte, lte, gt, lt). For the "info" field, supports filtering by "business_type", "biz_id", "scene", and other custom fields. | | `knowledgebase_ids` | array | No | Specifies the scope of Knowledge Bases accessible for the current search. Defaults to empty, meaning no Knowledge Bases are searched. Pass specific Knowledge Base IDs to search ordinary documents and uploaded Skills in those Knowledge Bases; pass "all" to search across all associated Knowledge Bases within the project. 
| | `memory_limit_number` | integer | No | Maximum number of factual memories to recall. Memories that meet the relevance threshold (`relativity`) are returned up to this count. Default is 9, maximum is 25. | | `include_preference` | boolean | No | Whether to recall preference memories. When enabled, the system recalls memories related to user preferences based on the query. Default is true. | | `preference_limit_number` | integer | No | Maximum number of preference memories to recall. Memories that meet the relevance threshold (`relativity`) are returned up to this count. Default is 9, maximum is 25. | | `relativity` | number | No | Relevance threshold (0–1) for recalled memories. Filters out low-relevance memories and, together with the maximum counts for factual and preference memories, constrains the final results. When omitted, the system default threshold is used. A value of 0 disables relevance filtering. | | `model_name` | string (enum: qwen3-32b, deepseek-r1, qwen2.5-72b-instruct) | No | Name of the chat model to use. | | `system_prompt` | string | No | Custom system instructions. | | `stream` | boolean | No | Whether to enable streaming response. | | `max_tokens` | integer | No | Maximum number of tokens to generate. | | `temperature` | number | No | Controls generation randomness. Range: 0 ≤ x ≤ 2. | | `top_p` | number | No | Nucleus sampling parameter. Range: 0 ≤ x ≤ 1. | | `add_message_on_answer` | boolean | No | Whether to automatically write user and assistant conversation content into memory. When enabled, developers do not need to call the add/message API to add messages as memory. | | `app_id` | string | No | Unique identifier of the application associated with the conversation. | | `agent_id` | string | No | Unique identifier of the Agent associated with the conversation. 
| | `tags` | array | No | List of custom tags used to mark the topic or category of the conversation. | | `info` | object | No | Custom metadata field capable of storing any structured data related to the conversation, such as location, source, version, etc., primarily for precise filtering or source tracking during retrieval. | | `allow_public` | boolean | No | Whether to allow adding to public memory. | | `allow_knowledgebase_ids` | array | No | List of knowledgebases where memories generated from added messages are allowed to be written. | ## Response Successful Response URL: /api-reference/chat-with-memos --- # Chat with MemOS playground **POST** `/product/chat/stream/playground` Chat with MemOS for a specific user in playground. Returns SSE stream. - operationId: `chat_stream_playground_product_chat_stream_playground_post` ## Request Body ## Response Successful Response URL: /api-reference/chat-with-memos-playground --- # Get suggestion queries **POST** `/product/suggestions` Get suggestion queries for a specific user with language preference. - operationId: `get_suggestion_queries_product_suggestions_post` ## Request Body ## Response Successful Response URL: /api-reference/get-suggestion-queries --- # Feedback memories **POST** `/product/feedback` Feedback memories for a specific user. - operationId: `feedback_memories_product_feedback_post` ## Request Body ## Response Successful Response URL: /api-reference/feedback-memories --- # Get detailed scheduler status **GET** `/product/scheduler/allstatus` Get detailed scheduler status including running tasks and queue metrics. - operationId: `scheduler_allstatus_product_scheduler_allstatus_get` ## Response Successful Response URL: /api-reference/get-detailed-scheduler-status --- # Get scheduler running status **GET** `/product/scheduler/status` Get scheduler running status. 
- operationId: `scheduler_status_product_scheduler_status_get` ## Response Successful Response URL: /api-reference/get-scheduler-running-status --- # Get scheduler task queue status **GET** `/product/scheduler/task_queue_status` Get scheduler task queue backlog/pending status for a user. - operationId: `scheduler_task_queue_status_product_scheduler_task_queue_status_get` ## Response Successful Response URL: /api-reference/get-scheduler-task-queue-status --- # Wait until scheduler is idle for a specific user **POST** `/product/scheduler/wait` Wait until scheduler is idle for a specific user. - operationId: `scheduler_wait_product_scheduler_wait_post` ## Response Successful Response URL: /api-reference/wait-until-scheduler-is-idle-for-a-specific-user --- # Stream scheduler progress for a user **GET** `/product/scheduler/wait/stream` Stream scheduler progress via Server-Sent Events (SSE). - operationId: `scheduler_wait_stream_product_scheduler_wait_stream_get` ## Response Successful Response URL: /api-reference/stream-scheduler-progress-for-a-user --- # Get user names by memory ids **POST** `/product/get_user_names_by_memory_ids` Get user names by memory ids. - operationId: `get_user_names_by_memory_ids_product_get_user_names_by_memory_ids_post` ## Request Body ## Response Successful Response URL: /api-reference/get-user-names-by-memory-ids --- # Check if mem cube id exists **POST** `/product/exist_mem_cube_id` Check if mem cube id exists. - operationId: `exist_mem_cube_id_product_exist_mem_cube_id_post` ## Request Body ## Response Successful Response URL: /api-reference/check-if-mem-cube-id-exists --- # Installation Guide (/open_source/getting_started/installation) - [Install via Docker](/open_source/getting_started/installation#from-docker): Ideal for quick deployment: one-click startup for services and dependencies. 
- [Install from Source](/open_source/getting_started/installation#from-source): Ideal for development and contribution: editable installation, run tests, local debugging. - [Install via pip](/open_source/getting_started/installation#from-pip): The simplest installation method: get started with MemOS quickly. ## Install via Docker ```bash git clone https://github.com/MemTensor/MemOS.git cd MemOS ``` #### Create .env Configuration File > **Note**: **Please Note** > The .env file must be placed in the MemOS project root directory. #### 1. Create .env ```bash cd MemOS touch .env ``` #### 2. .env Contents Here is a quick .env configuration example: ```bash # OpenAI API Key (Required configuration) OPENAI_API_KEY=sk-xxx # OpenAI API Base URL OPENAI_API_BASE=http://xxx:3000/v1 # Default model name MOS_CHAT_MODEL=qwen3-max # Memory Reader LLM Model MEMRADER_MODEL=qwen3-max # Memory Reader API Key MEMRADER_API_KEY=sk-xxx # Memory Reader API Base URL MEMRADER_API_BASE=http://xxx:3000/v1 # Embedder Model Name MOS_EMBEDDER_MODEL=text-embedding-v4 # Configure embedding backend: ollama | universal_api MOS_EMBEDDER_BACKEND=universal_api # Embedder API Base URL MOS_EMBEDDER_API_BASE=http://xxx:8081/v1 # Embedder API Key MOS_EMBEDDER_API_KEY=xxx # Embedding Vector Dimension EMBEDDING_DIMENSION=1024 # Reranker Backend (http_bge | etc.) MOS_RERANKER_BACKEND=cosine_local # Neo4j Connection URI # Options: neo4j-community | neo4j | nebular | polardb NEO4J_BACKEND=neo4j-community # Required when backend=neo4j* NEO4J_URI=bolt://localhost:7687 NEO4J_USER=neo4j NEO4J_PASSWORD=12345678 NEO4J_DB_NAME=neo4j MOS_NEO4J_SHARED_DB=false # Whether to use redis scheduler DEFAULT_USE_REDIS_QUEUE=false # Enable Chat API ENABLE_CHAT_API=true # Chat model list, can be applied through Bailian. Models are customizable. 
CHAT_MODEL_LIST=[{"backend": "qwen", "api_base": "https://xxx/v1", "api_key": "sk-xxx", "model_name_or_path": "qwen3-max", "extra_body": {"enable_thinking": true} ,"support_models": ["qwen3-max"]}] ``` #### .env Configuration Example using Bailian ```bash # Can be applied through Bailian platform # https://bailian.console.aliyun.com/?spm=a2c4g.11186623.0.0.2f2165b08fRk4l&tab=api#/api # After successful application, get API_KEY and BASE_URL, configuration example as follows # OpenAI API Key (Use Bailian API_KEY) OPENAI_API_KEY=you_bailian_api_key # OpenAI API Base URL OPENAI_API_BASE=https://dashscope.aliyuncs.com/compatible-mode/v1 # Default model name MOS_CHAT_MODEL=qwen3-max # Memory Reader LLM Model MEMRADER_MODEL=qwen3-max # Memory Reader API Key (Use Bailian API_KEY) MEMRADER_API_KEY=you_bailian_api_key # Memory Reader API Base URL MEMRADER_API_BASE=https://dashscope.aliyuncs.com/compatible-mode/v1 # Embedder model name can refer to the link below # https://bailian.console.aliyun.com/?spm=a2c4g.11186623.0.0.2f2165b08fRk4l&tab=api#/api/?type=model&url=2846066 MOS_EMBEDDER_MODEL=text-embedding-v4 # Configure embedding backend: ollama | universal_api MOS_EMBEDDER_BACKEND=universal_api # Embedder API Base URL MOS_EMBEDDER_API_BASE=https://dashscope.aliyuncs.com/compatible-mode/v1 # Embedder API Key (Use Bailian API_KEY) MOS_EMBEDDER_API_KEY=you_bailian_api_key # Embedding Vector Dimension EMBEDDING_DIMENSION=1024 # Reranker Backend (http_bge | etc.) 
MOS_RERANKER_BACKEND=cosine_local # Neo4j Connection URI # Options: neo4j-community | neo4j | nebular | polardb NEO4J_BACKEND=neo4j-community # Required when backend=neo4j* NEO4J_URI=bolt://localhost:7687 NEO4J_USER=neo4j NEO4J_PASSWORD=12345678 NEO4J_DB_NAME=neo4j MOS_NEO4J_SHARED_DB=false # Whether to use redis scheduler DEFAULT_USE_REDIS_QUEUE=false # Enable Chat API ENABLE_CHAT_API=true CHAT_MODEL_LIST=[{"backend": "qwen", "api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1", "api_key": "you_bailian_api_key", "model_name_or_path": "qwen3-max-preview", "extra_body": {"enable_thinking": true} ,"support_models": ["qwen3-max-preview"]}] ``` ![MemOS bailian](https://cdn.memtensor.com.cn/img/get_key_url_by_bailian_compressed.png)
Example of applying for API_KEY and BASE_URL in Bailian

#### Configure Dockerfile

> **Note**: The Dockerfile is located in the docker directory.

```bash
# Enter the docker directory
cd docker
```

Two image variants are available: a lite package (quick mode) and a full package (full mode). Each variant is published for both x86 and ARM.

```text
● Lite Package: Omits large dependencies such as the NVIDIA-related packages, which keeps the image lightweight for faster local deployment.
  url: registry.cn-shanghai.aliyuncs.com/memtensor/memos-base:v1.0
  url: registry.cn-shanghai.aliyuncs.com/memtensor/memos-base-arm:v1.0

● Full Package: Bundles all MemOS dependencies into the image for the full feature set. It can be built and started directly by configuring the Dockerfile.
  url: registry.cn-shanghai.aliyuncs.com/memtensor/memos-full-base:v1.0.0
  url: registry.cn-shanghai.aliyuncs.com/memtensor/memos-full-base-arm:v1.0.0
```

```dockerfile
# This example uses the lite package url
FROM registry.cn-shanghai.aliyuncs.com/memtensor/memos-base-arm:v1.0

WORKDIR /app

ENV HF_ENDPOINT=https://hf-mirror.com
ENV PYTHONPATH=/app/src

COPY src/ ./src/

EXPOSE 8000

CMD ["uvicorn", "memos.api.server_api:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]
```

#### Start Docker Client

```bash
# If Docker is not installed, install the version for your platform.
# Download address: https://www.docker.com/

# After installation, start Docker from the desktop client or from the command line.
# Command-line start:
sudo systemctl start docker

# After startup, check Docker status
docker ps

# Check Docker images (optional)
docker images
```

#### Build and Start Service:

> **Note**: The build command must also be executed in the docker directory.

```bash
# In the docker directory
docker compose up
```

![MemOS buildComposeupSuccess](https://cdn.memtensor.com.cn/img/memos_build_composeup_success_compressed.png)
Example image, port according to custom docker configuration #### Access the API at [http://localhost:8000/docs](http://localhost:8000/docs). ![MemOS Architecture](https://cdn.memtensor.com.cn/img/memos_run_server_success_compressed.png) #### Search Memory ```bash curl --location --request POST 'http://127.0.0.1:8000/product/search' \ --header 'Content-Type: application/json' \ --data-raw '{ "query": "What do I like to eat", "user_id": "8736b16e-1d20-4163-980b-a5063c3facdc", "readable_cube_ids": ["b32d0977-435d-4828-a86f-4f47f8b55bca"], "top_k":20 }' # response { "code": 200, "message": "Search completed successfully", "data": { "text_mem": [ { "cube_id": "7231eda8-6c57-4f6e-97ce-98b699eebb98", "memories": [ { "id": "2f40be8f-736c-4a5f-aada-9489037769e0", "memory": "[user opinion] User likes strawberries.", "metadata": { "user_id": "de8215e3-3beb-4afc-9b64-ae594d62f1ea", "session_id": "root_session", "status": "activated", "type": "fact", "key": "User's preference for strawberries", "confidence": 0.99, "source": null, "tags": [ "preference", "strawberries" ], "visibility": null, "updated_at": "2025-09-18T08:23:44.625479000+00:00", "memory_type": "UserMemory", "sources": [], "embedding": [], "created_at": "2025-09-18T08:23:44.625511000+00:00", "usage": [ "{ "time": "2025-09-18T08:24:17.759748", "info": { "user_id": "de8215e3-3beb-4afc-9b64-ae594d62f1ea", "session_id": "root_session" } }" ], "background": "User expressed a preference for strawberries, indicating a tendency in dietary preferences.", "relativity": 0.6349761312470591, "vector_sync": "success", "ref_id": "[2f40be8f]", "id": "2f40be8f-736c-4a5f-aada-9489037769e0", "memory": "[user opinion] User likes strawberries." }, "ref_id": "[2f40be8f]" }, ... 
} } ], "act_mem": [], "para_mem": [] } }
```

## Install from Source

```bash
git clone https://github.com/MemTensor/MemOS.git
cd MemOS
```

#### Create .env Configuration File

The MemOS server_api relies on environment variables to start, so you need to create a .env file in the startup directory.

1. Create .env file

```bash
# Run from the MemOS project root (you are already there after the clone step)
touch .env
```

2. .env contents

For a quick configuration, refer to the [env configuration](/open_source/getting_started/installation#from-docker) in the Docker installation.
For detailed .env configuration, refer to [env configuration](/open_source/getting_started/rest_api_server/#local-deployment).

> **Note**: The .env file must be placed in the MemOS project root directory.

#### Install Dependencies

```bash
# Install MemOS in editable mode, then the server dependencies
pip install -e .
pip install --no-cache-dir -r ./docker/requirements.txt

# Set PYTHONPATH to the absolute path of the project's src directory
export PYTHONPATH=/******/MemOS/src
```

#### Neo4j Support

> **Note**: **Neo4j Desktop Requirement** If you plan to use Neo4j for graph memory, please install Neo4j Desktop.
> Additionally, you need to set **NEO4J_BACKEND=neo4j** in .env file #### Start MemOS Server ```bash # project root directory uvicorn memos.api.server_api:app --host 0.0.0.0 --port 8000 --workers 1 ``` #### Add Memory ```bash curl --location --request POST 'http://127.0.0.1:8000/product/add' \ --header 'Content-Type: application/json' \ --data-raw '{ "messages": [{ "role": "user", "content": "I like eating strawberries" }], "user_id": "8736b16e-1d20-4163-980b-a5063c3facdc", "writable_cube_ids":["b32d0977-435d-4828-a86f-4f47f8b55bca"] }' # response { "code": 200, "message": "Memory created successfully", "data": null } ``` #### Search Memory ```bash curl --location --request POST 'http://127.0.0.1:8000/product/search' \ --header 'Content-Type: application/json' \ --data-raw '{ "query": "What do I like to eat", "user_id": "8736b16e-1d20-4163-980b-a5063c3facdc", "readable_cube_ids": ["b32d0977-435d-4828-a86f-4f47f8b55bca"], "top_k":20 }' # response { "code": 200, "message": "Search completed successfully", "data": { "text_mem": [ { "cube_id": "7231eda8-6c57-4f6e-97ce-98b699eebb98", "memories": [ { "id": "2f40be8f-736c-4a5f-aada-9489037769e0", "memory": "[user opinion] User likes strawberries.", "metadata": { "user_id": "de8215e3-3beb-4afc-9b64-ae594d62f1ea", "session_id": "root_session", "status": "activated", "type": "fact", "key": "User's preference for strawberries", "confidence": 0.99, "source": null, "tags": [ "preference", "strawberries" ], "visibility": null, "updated_at": "2025-09-18T08:23:44.625479000+00:00", "memory_type": "UserMemory", "sources": [], "embedding": [], "created_at": "2025-09-18T08:23:44.625511000+00:00", "usage": [ "{ "time": "2025-09-18T08:24:17.759748", "info": { "user_id": "de8215e3-3beb-4afc-9b64-ae594d62f1ea", "session_id": "root_session" } }" ], "background": "User expressed a preference for strawberries, indicating a tendency in dietary preferences.", "relativity": 0.6349761312470591, "vector_sync": "success", "ref_id": "[2f40be8f]", 
"id": "2f40be8f-736c-4a5f-aada-9489037769e0", "memory": "[user opinion] User likes strawberries." }, "ref_id": "[2f40be8f]" }, ... } } ], "act_mem": [], "para_mem": [] } } ``` ## Install via pip The simplest way to install MemOS is using pip. #### Create and Activate Conda Environment (Recommended) To avoid dependency conflicts, it is strongly recommended to use a dedicated Conda environment. ```bash conda create -n memos python=3.11 conda activate memos ``` #### Install MemOS from PyPI Install MemOS with all optional components: ```bash pip install -U "MemoryOS[all]" ``` After installation, you can verify it was successful: ```bash python -c "import memos; print(memos.__version__)" ``` > **Note**: **Optional Dependencies** > > MemOS provides several optional dependency groups for different features. You can install them based on your needs. > > | Feature | Package Name | > | ---------------- | ------------------------- | > | Tree Memory | `MemoryOS[tree-mem]` | > | Memory Reader | `MemoryOS[mem-reader]` | > | Memory Scheduler | `MemoryOS[mem-scheduler]` | > > Example installation commands: > > ```bash > pip install MemoryOS[tree-mem] > pip install MemoryOS[tree-mem,mem-reader] > pip install MemoryOS[mem-scheduler] > pip install MemoryOS[tree-mem,mem-reader,mem-scheduler] > ``` #### Create .env Configuration File The MemOS server_api relies on environment variables to start, so you need to create a .env file in the startup directory. 1. Create .env file ```bash touch .env ``` 2. Example .env contents ```text # ========== Required Configuration ========== CHAT_MODEL_LIST='[ { "name": "default", "backend": "openai", "config": { "model": "gpt-4o-mini", "api_key": "YOUR_API_KEY" } } ]' # ========== Optional Configuration ========== MEMOS_LOG_LEVEL=INFO ``` > **Note**: **Please Note** > env notes For detailed development environment setup, workflow guidelines, and contribution best practices, please see our [Contribution Guide](/open_source/contribution/overview). 
#### Start MemOS Server

MemOS does not load .env files automatically. Start the server through python-dotenv so that the variables are exported first.

```bash
python -m dotenv run -- \
  uvicorn memos.api.server_api:app \
  --host 0.0.0.0 \
  --port 8000
```

After a successful startup, you will see output similar to:

```text
INFO:     Uvicorn running on http://0.0.0.0:8000
INFO:     Application startup complete.
```

#### Verify Service is Running

#### Ollama Support

To use MemOS with [Ollama](https://ollama.com/), first install the Ollama CLI:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

#### Transformers Support

To use functionalities based on the `transformers` library, ensure you have [PyTorch](https://pytorch.org/get-started/locally/) installed (CUDA version recommended for GPU acceleration).

#### Neo4j Support

> **Note**: **Neo4j Desktop Requirement** If you plan to use Neo4j for graph memory, please install Neo4j Desktop.

#### Download Examples

To download example code, data, and configurations, run the following command:

```bash
memos download_examples
```

---

# Your First Memory (/open_source/getting_started/your_first_memory)

## What You'll Learn

By the end of this guide, you will:

- Extract memories from plain text or chat messages.
- Store them as semantic vectors.
- Search and manage them using vector similarity.

## How It Works

### Memory Structure

Every memory is stored as a `TextualMemoryItem`:

- `memory`: the main text content (e.g., "The user loves tomatoes.")
- `metadata`: extra details that make the memory searchable and manageable: type, time, source, confidence, entities, tags, visibility, and updated_at.

These fields make each piece of memory queryable, filterable, and easy to govern.
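To make the two-part structure concrete, here is a minimal sketch in plain Python. It is not the `TextualMemoryItem` class from the `memos` package: `MemoryItemSketch` and `filter_by_tag` are hypothetical names introduced only for illustration, and the second sample memory is made up. It shows how keeping the text in `memory` and everything else in `metadata` makes filtering straightforward.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MemoryItemSketch:
    """Illustrative stand-in for a memory item; NOT the real memos class."""

    memory: str  # the main text content
    metadata: dict[str, Any] = field(default_factory=dict)  # searchable details
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


def filter_by_tag(items: list[MemoryItemSketch], tag: str) -> list[MemoryItemSketch]:
    """Return the items whose metadata lists the given tag."""
    return [item for item in items if tag in item.metadata.get("tags", [])]


items = [
    MemoryItemSketch(
        "The user loves tomatoes.",
        {"type": "opinion", "tags": ["food", "preferences"], "visibility": "private"},
    ),
    # Made-up second item so the filter has something to exclude.
    MemoryItemSketch("The user works night shifts.", {"type": "fact", "tags": ["schedule"]}),
]

food = filter_by_tag(items, "food")
print([item.memory for item in food])
```

The real item type carries more fields and validation; the point here is only that metadata is what makes a memory filterable and governable.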
For each `TextualMemoryItem`:

| Field         | Example                   | What it means                                    |
| ------------- | ------------------------- | ------------------------------------------------ |
| `type`        | `"opinion"`               | Whether the memory is a fact, event, or opinion  |
| `memory_time` | `"2025-07-02"`            | When it happened                                 |
| `source`      | `"conversation"`          | Where it came from                               |
| `confidence`  | `100.0`                   | Certainty score (0–100)                          |
| `entities`    | `["tomatoes"]`            | Key concepts                                     |
| `tags`        | `["food", "preferences"]` | Extra labels for grouping                        |
| `visibility`  | `"private"`               | Who can access it                                |
| `updated_at`  | `"2025-07-02T00:00:00Z"`  | Last modified                                    |

> **Note**: **Best Practice** You can define any metadata fields that make sense for your use case!

### The Core Steps

When you run this example:

1. **Extract:** Your messages go through an `extractor_llm`, which returns a JSON list of `TextualMemoryItem`s.
2. **Embed:** Each memory's `memory` field is turned into an embedding vector via `embedder`.
3. **Store:** The embeddings are saved into a local **Qdrant** collection.
4. **Search & Manage:** You can now `search` by semantic similarity, `update` by ID, or `delete` memories.

> **Note**: **Hint** Make sure your embedder's output dimension matches your vector DB's `vector_dimension`.
> Mismatch may cause search errors!

> **Note**: **Hint** If your search results are too noisy or irrelevant, check whether your embedder config and vector DB are properly initialized.

### Example Flow

**Input Messages:**

```json
[
  {"role": "user", "content": "I love tomatoes."},
  {"role": "assistant", "content": "Great! Tomatoes are healthy."}
]
```

**Extracted Memory:**

```json
{
  "memory": "The user loves tomatoes.",
  "metadata": {
    "type": "opinion",
    "memory_time": "2025-07-02",
    "source": "conversation",
    "confidence": 100.0,
    "entities": ["tomatoes"],
    "tags": ["food", "preferences"],
    "visibility": "private",
    "updated_at": "2025-07-02T00:00:00"
  }
}
```

Here's a minimal script to create, extract, store, and search a memory:

#### Create a Memory Config

First, create your minimal GeneralTextMemory config. It contains three key parts:

- extractor_llm: uses an LLM to extract plaintext memories from conversations.
- embedder: turns each memory into a vector.
- vector_db: stores vectors and supports similarity search.

```python
from memos.configs.memory import MemoryConfigFactory
from memos.memories.factory import MemoryFactory

config = MemoryConfigFactory(
    backend="general_text",
    config={
        "extractor_llm": {
            "backend": "ollama",
            "config": {
                "model_name_or_path": "qwen3:0.6b",
                "temperature": 0.0,
                "remove_think_prefix": True,
                "max_tokens": 8192,
            },
        },
        "vector_db": {
            "backend": "qdrant",
            "config": {
                "collection_name": "test_textual_memory",
                "distance_metric": "cosine",
                "vector_dimension": 768,
            },
        },
        "embedder": {
            "backend": "ollama",
            "config": {
                "model_name_or_path": "nomic-embed-text:latest",
            },
        },
    },
)

m = MemoryFactory.from_config(config)
```

#### Extract Memories from Messages

Give your LLM a simple dialogue and see how it extracts structured plaintext memories.

```python
memories = m.extract(
    [
        {"role": "user", "content": "I love tomatoes."},
        {"role": "assistant", "content": "Great! Tomatoes are delicious."},
    ]
)
print("Extracted:", memories)
```

You'll get a list of `TextualMemoryItem` objects, each of which looks like this:

```text
TextualMemoryItem(
    id='...',
    memory='The user loves tomatoes.',
    metadata=...
)
```

#### Add Memories to Your Vector DB

Save the extracted memories to your vector DB and demonstrate adding a custom plaintext memory manually (with a custom ID).
```python
m.add(memories)
m.add([
    {
        "id": "a19b6caa-5d59-42ad-8c8a-e4f7118435b4",
        "memory": "User is Chinese.",
        "metadata": {"type": "opinion"},
    }
])
```

#### Search Memories

Now test similarity search! Type any natural language query and find related memories.

```python
results = m.search("Tell me more about the user", top_k=2)
print("Search results:", results)
```

#### Get Memories by ID

Fetch any memory directly by its ID:

```python
print("Get one by ID:", m.get("a19b6caa-5d59-42ad-8c8a-e4f7118435b4"))
```

#### Update a Memory

Need to fix or refine a memory? Update it by ID and re-embed the new version.

```python
m.update(
    "a19b6caa-5d59-42ad-8c8a-e4f7118435b4",
    {
        "memory": "User is Canadian.",
        "metadata": {
            "type": "opinion",
            "confidence": 85,
            "memory_time": "2025-05-24",
            "source": "conversation",
            "entities": ["Canadian"],
            "tags": ["happy"],
            "visibility": "private",
            "updated_at": "2025-05-19T00:00:00",
        },
    }
)
print("Updated:", m.get("a19b6caa-5d59-42ad-8c8a-e4f7118435b4"))
```

#### Delete Memories

Remove one or more memories by ID:

```python
m.delete(["a19b6caa-5d59-42ad-8c8a-e4f7118435b4"])
print("Remaining:", m.get_all())
```

#### Dump Memories to Disk

Finally, dump all your memories to local storage:

```python
m.dump("tmp/mem")
print("Memory dumped to tmp/mem")
```

By default, your memories are saved to:

```
/
```

They can be reloaded anytime with `load()`.

> **Note**: By default, your dumped memories are saved to the file path you set in your config.
> Always check config.memory_filename if you want to customize it.

Now your agent remembers, so no more stateless chatbots!

## What's Next?

Ready to level up?

- **Try your own LLM backend:** Swap to OpenAI, HuggingFace, or Ollama.
- **Explore [TreeTextMemory](/open_source/modules/memories/tree_textual_memory):** Build a graph-based, hierarchical memory.
- **Add [Activation Memory](/open_source/modules/memories/kv_cache_memory):** Cache key-value states for faster inference.
- **Dive deeper:** Check the [API Reference](/api-reference/search-memories) and [Examples](/open_source/getting_started/examples) for advanced workflows.

> **Note**: **Try Graph Textual Memory** Try switching to
> TreeTextMemory to add a graph-based, hierarchical structure to your memories. It suits scenarios that need explainability and long-term structured knowledge.

---

# Core Concepts (/open_source/home/core_concepts)

## Overview

* [MOS (Memory Operating System)](#mos-memory-operating-system)
* [MemCube](#memcube)
* [Memory Types](#memory-types)
* [Cross-Cutting Concepts](#cross-cutting-concepts)

## MOS (Memory Operating System)

**What it is:** The orchestration layer that coordinates multiple MemCubes and memory operations. It connects your LLMs with structured, explainable memory for reasoning and planning.

**When to use:** Use MOS whenever you need to bridge users, sessions, or agents with consistent, auditable memory workflows.

## MemCube

**What it is:** A MemCube is like a flexible, swappable memory cartridge. Each user, session, or task can have its own MemCube, which can hold one or more memory types.

**When to use:** Use different MemCubes to isolate, reuse, or scale your memory as your system grows.

## Memory Types

MemOS treats memory as a living system: evolving knowledge rather than static data. Here's how the three core memory types work together:

| Memory Type    | Description                                  | When to Use                                 |
|----------------|----------------------------------------------|---------------------------------------------|
| **Parametric** | Knowledge distilled into model weights       | Evergreen skills, stable domain expertise   |
| **Activation** | Short-term KV cache and hidden states        | Fast reuse in dialogue, multi-turn sessions |
| **Plaintext**  | Text, docs, graph nodes, or vector chunks    | Searchable, inspectable, evolving knowledge |

### Parametric Memory

**What:** Knowledge embedded directly into the model's weights. Think of this as the model's "cortex".
It is always available and needs no retrieval step at inference time.

**When to use:** Best for stable domain knowledge, distilled FAQs, or skills that rarely change.

### Activation Memory

**What:** Activation Memory is your model's reusable "working memory". It includes precomputed key-value caches and hidden states that can be directly injected into the model's attention mechanism. Think of it as pre-cooked context that saves your LLM from repeatedly re-encoding static or frequently used information.

**Why it matters:** By storing stable background content (like FAQs or known facts) in a KV-cache, your model can skip redundant computation during the prefill phase. This can reduce Time To First Token (TTFT) and improve throughput for multi-turn conversations or retrieval-augmented generation.

**When to use:**

- Reuse background knowledge across many user queries.
- Speed up chatbots that rely on the same domain context each turn.
- Combine with MemScheduler to auto-promote stable plaintext memory to KV format.

### Explicit Memory

**What:** The plaintext memory type from the table above: structured or unstructured knowledge units that are user-visible and explainable. These can be documents, chat logs, graph nodes, or vector embeddings.

**When to use:** Best for semantic search, user preferences, or traceable facts that evolve over time. Supports tags, provenance, and lifecycle states.

## How They Work Together

MemOS lets you orchestrate all three memory types in a living loop:

- Hot plaintext memories can be distilled into parametric weights.
- High-frequency activation paths become reusable KV templates.
- Stale parametric or activation units can be downgraded to plaintext nodes for traceability.

With MemOS, your AI doesn't just store facts: it **remembers**, **understands**, and **grows**.

> **Note**: **Insight**
> Over time, frequently used plaintext memories can be distilled into parametric form.
> Rarely used weights or caches can be demoted to plaintext storage for auditing and retraining.
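The promotion and demotion loop described above can be pictured as a small policy function. The sketch below is a toy illustration only, not how MemOS or MemScheduler decides anything: `MemoryUnit`, `next_tier`, the `hits_last_week` counter, and both thresholds are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class MemoryUnit:
    """Hypothetical usage record for one memory unit (not a memos class)."""

    key: str
    tier: str  # "plaintext", "activation", or "parametric"
    hits_last_week: int


def next_tier(unit: MemoryUnit, hot: int = 50, stale: int = 1) -> str:
    """Pick the tier a unit should move to under a simple usage-count rule.

    The thresholds are assumptions for illustration, not MemOS defaults.
    """
    # Hot plaintext memories become candidates for distillation into weights.
    if unit.tier == "plaintext" and unit.hits_last_week >= hot:
        return "parametric"
    # Stale parametric or activation units fall back to plaintext for traceability.
    if unit.tier in ("activation", "parametric") and unit.hits_last_week <= stale:
        return "plaintext"
    return unit.tier


units = [
    MemoryUnit("refund-policy-faq", "plaintext", hits_last_week=120),
    MemoryUnit("old-campaign-context", "activation", hits_last_week=0),
    MemoryUnit("user-timezone", "plaintext", hits_last_week=3),
]

for unit in units:
    print(unit.key, unit.tier, "->", next_tier(unit))
```

A real scheduler would weigh more than raw hit counts, such as content stability and the cost of distillation, but the shape of the decision is the same: usage signals drive movement between tiers.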
## Cross-Cutting Concepts

### Hybrid Retrieval

Combines vector similarity and graph traversal for robust, context-aware search.

### Governance & Lifecycle

Every memory unit supports states (active, merged, archived), provenance tracking, and fine-grained access control, all of which are essential for auditing and compliance.

> **Note**: **Compliance Reminder**
> Always track provenance and state changes for each memory unit.
> This helps meet audit and data governance requirements.

## Key Takeaway

With MemOS, your LLM applications gain structured, evolving memory, empowering agents to plan, reason, and adapt.

---

# Architecture (/open_source/home/architecture)

## Core Modules

### MOS (Memory Operating System)

The orchestration layer of MemOS. It manages predictive, asynchronous scheduling across multiple memory types (plaintext, activation, parametric) and orchestrates **multi-user, multi-session** memory workflows.

MOS connects memory containers (**MemCubes**) with LLMs via a unified API for adding, searching, updating, transferring, or rolling back memories. It also supports cross-model, cross-device interoperability through a unified Memory Interchange Protocol (MIP).

### MemCube

A modular, portable **memory container**. Think of it as a flexible cartridge that can hold one or more memory types for a **user, agent, or session**.

MemCubes can be dynamically registered, updated, or removed. They support containerized storage that is transferable across sessions, models, and devices.

### Memories

MemOS supports several specialized memory types for different needs:

#### 1. **Parametric Memory** (**Coming Soon**)

Embedded in model weights; long-term, high-efficiency, but hard to edit.

#### 2. **Activation Memory**

Runtime hidden states & KV-cache; short-term, transient, steering dynamic behavior.

#### 3. **Plaintext Memory**

Structured or unstructured knowledge blocks; editable, traceable, suitable for fast updates, personalization & multi-agent sharing.
- **GeneralTextMemory:** Flexible, vector-based storage for unstructured textual knowledge with semantic search and metadata filtering.
- **TreeTextMemory:** Hierarchical, graph-style memory for structured knowledge, combining **tree-based hierarchy** and **cross-branch linking** for dynamic, evolving knowledge graphs. It supports long-term organization and multi-hop reasoning (often Neo4j-backed).

> **Note**: **Best Practice**
> Start simple with GeneralTextMemory, then scale to graph or KV-cache as your needs grow.

#### Basic Modules

Includes chunkers, embedders, LLM connectors, parsers, and interfaces for vector/graph databases. These provide the building blocks for memory extraction, semantic embedding, storage, and retrieval.

## Code Structure

The MemOS project is organized for clarity and plug-and-play use:

```
src/memos/
    api/             # API definitions
    chunkers/        # Text chunking utilities
    configs/         # Configuration schemas
    context/         # Log context
    embedders/       # Embedding models
    graph_dbs/       # Graph database backends (e.g., Neo4j)
    vec_dbs/         # Vector database backends (e.g., Qdrant)
    llms/            # LLM connectors
    mem_agent/       # Deep search
    mem_chat/        # Memory-augmented chat logic
    mem_cube/        # MemCube management
    mem_feedback/    # Memory feedback
    mem_os/          # MOS orchestration
    mem_reader/      # Memory readers
    mem_scheduler/   # Memory scheduling module
    memories/        # Memory type implementations
    multi_mem_cube/  # Multi-view Cube
    parsers/         # Parsing utilities
    reranker/        # Reranker module
    templates/       # Prompt templates
    types/           # Type definitions
```

> **Note**: **Pro Tip**
> Use examples/ for quick experimentation and docs/ for module deep dives.

## Extensibility

MemOS is **modular by design**. Add your own memory types, storage backends, or LLM connectors with minimal changes, thanks to its **unified config and factory patterns**.

> **Note**: **Pro Tip**
> [Contribute](/open_source/contribution/overview) a new backend or share your custom memory type. It's easy to plug in.
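
To make the "unified config and factory" idea concrete, here is a minimal, generic sketch of the pattern. It is not the MemOS implementation: `BaseMemory`, `register_backend`, `InMemoryTextMemory`, and `memory_from_config` are hypothetical names invented for this example. Only the `{"backend": ..., "config": {...}}` shape is borrowed from the configuration style used in this documentation's examples.

```python
from typing import Dict, List, Type


class BaseMemory:
    """Minimal interface that every toy backend implements (hypothetical)."""

    def add(self, text: str) -> None:
        raise NotImplementedError

    def search(self, query: str) -> List[str]:
        raise NotImplementedError


# Hypothetical registry: maps a backend name to the class that implements it.
_BACKENDS: Dict[str, Type[BaseMemory]] = {}


def register_backend(name: str):
    """Class decorator that records a backend class under `name`."""
    def decorator(cls):
        _BACKENDS[name] = cls
        return cls
    return decorator


@register_backend("in_memory_text")
class InMemoryTextMemory(BaseMemory):
    """Toy backend: substring search over a Python list."""

    def __init__(self, config: dict):
        self.case_sensitive = config.get("case_sensitive", False)
        self.items: List[str] = []

    def add(self, text: str) -> None:
        self.items.append(text)

    def search(self, query: str) -> List[str]:
        if self.case_sensitive:
            return [t for t in self.items if query in t]
        return [t for t in self.items if query.lower() in t.lower()]


def memory_from_config(config: dict) -> BaseMemory:
    """Build a memory object from {"backend": ..., "config": {...}}."""
    backend = config["backend"]
    if backend not in _BACKENDS:
        raise ValueError(f"Unknown memory backend: {backend!r}")
    return _BACKENDS[backend](config.get("config", {}))
```

With this shape, adding a backend means writing one class and registering it; callers keep using `memory_from_config` and never import the concrete class.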
--- # REST API Server (/open_source/getting_started/rest_api_server) ![MemOS Architecture](https://cdn.memtensor.com.cn/img/memos_run_server_success_compressed.png)
APIs supported by MemOS REST API Server

### Features

- Add new memory: Create a new memory for a specific user.
- Search memories: Search for memory content for a specific user.
- Get all user memories: Get all memory content for a specific user.
- Memory feedback: Submit feedback on memory content for a specific user.
- Chat with MemOS: Chat with MemOS, returning an SSE streaming response.

## Run Locally

### 1. Local Download

```bash
# Download the code to the local folder
git clone https://github.com/MemTensor/MemOS
```

### 2. Configure Environment Variables

```bash
# Enter the folder directory
cd MemOS
```

#### Create a `.env` file in the root directory and set your environment variables.

##### .env

The quick mode configuration is shown below. For the complete configuration, see [.env.example](https://github.com/MemTensor/MemOS/blob/main/docker/.env.example).

```bash
# OpenAI API Key (Custom configuration required)
OPENAI_API_KEY=sk-xxx
# OpenAI API Base URL
OPENAI_API_BASE=http://xxx:3000/v1
# Default model name
MOS_CHAT_MODEL=qwen3-max

# Memory Reader LLM model
MEMRADER_MODEL=qwen3-max
# Memory Reader API Key
MEMRADER_API_KEY=sk-xxx
# Memory Reader API Base URL
MEMRADER_API_BASE=http://xxx:3000/v1

# Embedder model name
MOS_EMBEDDER_MODEL=text-embedding-v4
# set default embedding backend default: ollama | universal_api
MOS_EMBEDDER_BACKEND=universal_api
# Embedder API Base URL
MOS_EMBEDDER_API_BASE=http://xxx:8081/v1
# Embedder API Key
MOS_EMBEDDER_API_KEY=xxx
# Embedding vector dimension
EMBEDDING_DIMENSION=1024
# Reranker backend (http_bge | etc.)
MOS_RERANKER_BACKEND=cosine_local

# Neo4j Connection URI
# Optional values: neo4j-community | neo4j | nebular | polardb
NEO4J_BACKEND=neo4j-community
# required when backend=neo4j*
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=12345678
NEO4J_DB_NAME=neo4j
MOS_NEO4J_SHARED_DB=false

# Whether to use Redis scheduler
DEFAULT_USE_REDIS_QUEUE=false

# Enable chat api
ENABLE_CHAT_API=true
# Chat model list. Models can be applied for through Bailian and are selectable.
CHAT_MODEL_LIST=[{"backend": "qwen", "api_base": "https://xxx/v1", "api_key": "sk-xxx", "model_name_or_path": "qwen3-max", "extra_body": {"enable_thinking": true}, "support_models": ["qwen3-max"]}]
```

### 3. Customize the configuration, taking Bailian as an example

```bash
# You can apply through the Bailian platform
# https://bailian.console.aliyun.com/?spm=a2c4g.11186623.0.0.2f2165b08fRk4l&tab=api#/api
# After a successful application, obtain the API_KEY and BASE_URL. The example configuration is as follows

# OpenAI API Key (Using the API_KEY of Bailian)
OPENAI_API_KEY=you_bailian_api_key
# OpenAI API Base URL
OPENAI_API_BASE=https://dashscope.aliyuncs.com/compatible-mode/v1
# Default model name
MOS_CHAT_MODEL=qwen3-max

# Memory Reader LLM model
MEMRADER_MODEL=qwen3-max
# Memory Reader API Key (Using the API_KEY of Bailian)
MEMRADER_API_KEY=you_bailian_api_key
# Memory Reader API Base URL
MEMRADER_API_BASE=https://dashscope.aliyuncs.com/compatible-mode/v1

# Embedder model name; available models are listed at the following link
# https://bailian.console.aliyun.com/?spm=a2c4g.11186623.0.0.2f2165b08fRk4l&tab=api#/api/?type=model&url=2846066
MOS_EMBEDDER_MODEL=text-embedding-v4
# set default embedding backend default: ollama | universal_api
MOS_EMBEDDER_BACKEND=universal_api
# Embedder API Base URL
MOS_EMBEDDER_API_BASE=https://dashscope.aliyuncs.com/compatible-mode/v1
# Embedder API Key (Using the API_KEY of Bailian)
MOS_EMBEDDER_API_KEY=you_bailian_api_key
# Embedding vector dimension
EMBEDDING_DIMENSION=1024
# Reranker backend (http_bge | etc.)
MOS_RERANKER_BACKEND=cosine_local

# Neo4j Connection URI
# Optional values: neo4j-community | neo4j | nebular | polardb
NEO4J_BACKEND=neo4j-community
# required when backend=neo4j*
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=12345678
NEO4J_DB_NAME=neo4j
MOS_NEO4J_SHARED_DB=false

# Whether to use Redis scheduler
DEFAULT_USE_REDIS_QUEUE=false

# Enable chat api
ENABLE_CHAT_API=true

CHAT_MODEL_LIST=[{"backend": "qwen", "api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1", "api_key": "you_bailian_api_key", "model_name_or_path": "qwen3-max-preview", "extra_body": {"enable_thinking": true}, "support_models": ["qwen3-max-preview"]}]
```

![MemOS bailian](https://cdn.memtensor.com.cn/img/get_key_url_by_bailian_compressed.png)
Example: obtaining the API_KEY and BASE_URL from Bailian

Optionally adjust dependency versions in docker/requirements.txt. For the complete list, see [requirements.txt](https://github.com/MemTensor/MemOS/blob/main/docker/requirements.txt).

### 4. Start Docker

```bash
# If Docker is not installed, install the version for your platform from https://www.docker.com/
# After installation, Docker can be started through the desktop client or from the command line
# Command line start
sudo systemctl start docker

# Check docker status
docker ps

# Check docker images (optional)
docker images
```

### Method 1: Start with Docker using the prebuilt dependency image (recommended)

```bash
# Enter the Docker directory
cd docker
```

#### Configure the .env file as described in the environment variables section above

#### Configure Dockerfile (cd docker)

Two base images are available, a simplified package and a full package, each built for x86 and ARM.

```bash
● Simplified package: Leaves out the large Nvidia-related dependencies to keep the image lightweight, so local deployment is lighter and faster.
url: registry.cn-shanghai.aliyuncs.com/memtensor/memos-base:v1.0
url: registry.cn-shanghai.aliyuncs.com/memtensor/memos-base-arm:v1.0

● Full package: Packages all MemOS dependencies into the image for complete functionality. After configuring the Dockerfile, you can build and start it directly.
url: registry.cn-shanghai.aliyuncs.com/memtensor/memos-full-base:v1.0.0 url: registry.cn-shanghai.aliyuncs.com/memtensor/memos-full-base-arm:v1.0.0 ``` #### Configure Dockerfile(cd docker) ```bash # The current example uses a simplified package url FROM registry.cn-shanghai.aliyuncs.com/memtensor/memos-base:v1.0 WORKDIR /app ENV HF_ENDPOINT=https://hf-mirror.com ENV PYTHONPATH=/app/src COPY src/ ./src/ EXPOSE 8000 CMD ["uvicorn", "memos.api.server_api:app", "--host", "0.0.0.0", "--port", "8000", "--reload"] ``` #### Build and start service using docker compose up: ```bash # Enter docker directory docker compose up ``` ![MemOS buildComposeupSuccess](https://cdn.memtensor.com.cn/img/memos_build_composeup_success_compressed.png)
Example output; the port depends on your Docker configuration.

#### Access API via [http://localhost:8000/docs](http://localhost:8000/docs).

![MemOS Architecture](https://cdn.memtensor.com.cn/img/memos_run_server_success_compressed.png)

#### Test cases (Add user memory -> Query user memory)

Refer to the example process under Method 2 (Docker Compose).

### Method 2: Client Install with Docker Compose

The development Docker Compose setup comes pre-configured with qdrant and neo4j.
Running the server requires the `OPENAI_API_KEY` environment variable.

#### Enter docker folder

```bash
# Enter docker folder from current directory
cd docker
```

#### Install corresponding dependency modules

```bash
pip install --upgrade pip && pip install --no-cache-dir -r requirements.txt
# Install dependencies using Aliyun source
# pip install --upgrade pip && pip install --no-cache-dir -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
```

#### Start the containers with Docker Compose from the docker directory (make sure your network or VPN connection is working):

```bash
# Build required for first run
docker compose up --build
# Not required for subsequent runs
docker compose up
```

#### Access API via [http://localhost:8000/docs](http://localhost:8000/docs).
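
Besides the interactive docs page, the running server can be exercised from a script. The sketch below uses only the Python standard library and targets the `/product/add` and `/product/search` endpoints shown in this guide's example process. The helper names (`build_request`, `post_json`, `add_payload`, `search_payload`) are hypothetical, and the payload fields are copied from the request examples in this guide rather than from a verified schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumes the default port used in this guide


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the given endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_request(path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def add_payload(user_id: str, mem_cube_id: str, text: str) -> dict:
    """Request body for /product/add, mirroring the fields shown in this guide."""
    return {
        "user_id": user_id,
        "mem_cube_id": mem_cube_id,
        "messages": [{"role": "user", "content": text}],
        "memory_content": "",
        "doc_path": "",
        "source": "",
        "user_profile": False,
    }


def search_payload(user_id: str, mem_cube_id: str, query: str) -> dict:
    """Request body for /product/search, mirroring the fields shown in this guide."""
    return {"query": query, "user_id": user_id, "mem_cube_id": mem_cube_id}
```

With the server running, `post_json("/product/add", add_payload(user_id, cube_id, "I like strawberry"))` followed by `post_json("/product/search", search_payload(user_id, cube_id, "What do I like"))` reproduces the add-then-query flow.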
#### Example process ##### (Query user memory (stop if none) -> Add user memory -> Query user memory) ##### Add User Memory http://localhost:8000/product/add (POST) ```bash # Request params { "user_id": "8736b16e-1d20-4163-980b-a5063c3facdc", "mem_cube_id": "b32d0977-435d-4828-a86f-4f47f8b55bca", "messages": [ { "role": "user", "content": "I like strawberry" } ], "memory_content": "", "doc_path": "", "source": "", "user_profile": false } # Response { "code": 200, "message": "Memory created successfully", "data": null } ``` ##### Query User Memory http://localhost:8000/product/search (POST) ```bash # Request params { "query": "What do I like", "user_id": "8736b16e-1d20-4163-980b-a5063c3facdc", "mem_cube_id": "b32d0977-435d-4828-a86f-4f47f8b55bca" } # Response { "code": 200, "message": "Search completed successfully", "data": { "text_mem": [ { "cube_id": "7231eda8-6c57-4f6e-97ce-98b699eebb98", "memories": [ { "id": "2f40be8f-736c-4a5f-aada-9489037769e0", "memory": "[user viewpoint] User likes strawberries.", "metadata": { "user_id": "de8215e3-3beb-4afc-9b64-ae594d62f1ea", "session_id": "root_session", "status": "activated", "type": "fact", "key": "User preference for strawberries", "confidence": 0.99, "source": null, "tags": [ "preference", "strawberry" ], "visibility": null, "updated_at": "2025-09-18T08:23:44.625479000+00:00", "memory_type": "UserMemory", "sources": [], "embedding": [], "created_at": "2025-09-18T08:23:44.625511000+00:00", "usage": [ "{ "time": "2025-09-18T08:24:17.759748", "info": { "user_id": "de8215e3-3beb-4afc-9b64-ae594d62f1ea", "session_id": "root_session" } }" ], "background": "The user expressed a preference for strawberries, indicating their inclination towards dietary preferences.", "relativity": 0.6349761312470591, "vector_sync": "success", "ref_id": "[2f40be8f]", "id": "2f40be8f-736c-4a5f-aada-9489037769e0", "memory": "[user viewpoint] User likes strawberries." }, "ref_id": "[2f40be8f]" }, ... 
} } ], "act_mem": [], "para_mem": [] } } # Response failure, troubleshooting # src/memos/api/config.py # Check "neo4j_vec_db" and "EMBEDDING_DIMENSION" configured in get_neo4j_community_config method ``` #### Modifications to server code or library code will automatically reload the server. ### Method 3:Client Install using CLI commands #### Install dependencies ```bash # pip install --upgrade pip && pip install --no-cache-dir -r ./docker/requirements.txt # Install dependencies using Aliyun source pip install --no-cache-dir -r ./docker/requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ ``` #### Open terminal and run the following command to install: ```bash # Packages that might need manual installation currently. Need to find resources for these two packages # neo4j.5.26.4.tar qdrant.v1.15.3.tar docker load -i neo4j.5.26.4.tar docker load -i qdrant.v1.15.3.tar # Check if installed successfully docker images # Check if running docker ps -a # Root directory uvicorn memos.api.server_api:app --host 0.0.0.0 --port 8000 --workers 1 # If ModuleNotFoundError: No module named 'memos' appears during startup, it is due to path matching problem, please execute export PYTHONPATH=/you-file-absolute-path/MemOS/src ``` #### Access API After startup is complete, access API via [http://localhost:8000/docs](http://localhost:8000/docs). ### Method 4: Without Docker #### Reference configuration environment variables above, .env file should be configured #### Install Poetry for dependency management: ```bash curl -sSL https://install.python-poetry.org | python3 - ``` #### Poetry environment variable configuration: ```bash # To start using, you need to find Poetry's bin directory in "PATH" (/Users/jinyunyuan/.local/bin) environment variable # Modern macOS systems default Shell is zsh. You can confirm via following command 1. Determine which Shell you are using echo $SHELL # If output is /bin/zsh or /usr/bin/env zsh, then you are zsh. 
# (If your system version is older, it might still be using bash, and the output will be /bin/bash)

# 2. Open the corresponding shell config file
# If using zsh (the vast majority of cases):
# Use the nano editor (recommended for beginners)
nano ~/.zshrc
# Or use the vim editor
# vim ~/.zshrc
# If using bash:
nano ~/.bash_profile
# Or: nano ~/.bashrc

# 3. Add the PATH environment variable
# At the very end of the opened file, start a new line and paste the command shown by the installer:
export PATH="/you-path/.local/bin:$PATH"

# 4. Save and exit the editor
# If using nano:
# Press Ctrl + O to write (save), then press Enter to confirm the filename.
# Then press Ctrl + X to exit the editor.
# If using vim:
# Press i to enter insert mode, paste the code, then press ESC to exit insert mode.
# Type :wq, then press Enter to save and exit.

# 5. Make the configuration take effect immediately
# A modified config file does not take effect automatically in an already open terminal window; run one of the following commands to reload it:
# For zsh:
source ~/.zshrc
# For bash:
source ~/.bash_profile

# 6. Verify that the installation succeeded
# Run the following command to check that everything is ready:
poetry --version
# On success, the version number is shown, e.g. Poetry (version 2.2.0)
```

#### Install all project dependencies and development tools:

```bash
make install
```

#### First start neo4j and qdrant in docker

#### Start FastAPI server (in the MemOS directory):

```bash
uvicorn memos.api.product_api:app --host 0.0.0.0 --port 8000 --reload
```

#### After the server starts, you can test the API through the OpenAPI docs at [http://localhost:8000/docs](http://localhost:8000/docs) or [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)

#### Test cases (Register user -> Add user memory -> Query user memory)

Refer to the example process under Method 2 (Docker Compose).

### Method 5: Start using PyCharm

#### Run server_api

```bash
1. Open MemOS/docker/Dockerfile and adjust the run configuration
# Start the docker
CMD ["uvicorn", "memos.api.server_api:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]

2. In the directory MemOS/src/memos/api, run server_api.py directly
```

---

# MemOS Examples (/open_source/getting_started/examples)

- [Minimal Pipeline](/open_source/getting_started/examples#example-1-minimal-pipeline): The smallest working pipeline: add, search, update and dump plaintext memories.
- [Adding and retrieving multiple information sources](/open_source/getting_started/examples#example-2-multi-modal): Add multi-source messages (text, images, files, and tool calls) to memory and retrieve them.
- [Multi-Cube addition and retrieval](/open_source/getting_started/examples#example-3-multi-cube): Add different memories to different Cubes and retrieve them simultaneously during a search.
- [KVCacheMemory Only](/open_source/getting_started/examples#example-4-kvcachememory-only): Speed up sessions with short-term KV cache for fast context injection.
- [Hybrid TreeText + KVCache](/open_source/getting_started/examples#example-5-hybrid): Combine explainable graph memory with fast KV caching in a single MemCube.
- [Multi-Memory Scheduling](/open_source/getting_started/examples#example-6-multi-memory-scheduling): Run dynamic memory orchestration for multi-user, multi-session agents.

## Example 1: Minimal Pipeline

### When to Use:

- You want the smallest possible working example.
- You only need simple plaintext memories that are stored in a vector DB and retrieved by search.

### Key Points:

- Supports basic personal memory integration and search.
### Full Example Code ```python import json from memos.api.routers.server_router import add_memories, search_memories from memos.api.product_models import APIADDRequest, APISearchRequest user_id = "test_user_1" add_req = APIADDRequest( user_id=user_id, writable_cube_ids=["cube_test_user_1"], messages = [ {"role": "user", "content": "I’ve planned to travel to Guangzhou during the summer vacation. What chain hotels are available for accommodation?"}, {"role": "assistant", "content": "You can consider [7 Days Inn, Ji Hotel, Hilton], etc."}, {"role": "user", "content": "I’ll choose 7 Days Inn."}, {"role": "assistant", "content": "Okay, feel free to ask me if you have any other questions."} ], async_mode="sync", mode="fine", ) add_rsp = add_memories(add_req) print("add_memories rsp: \n\n", add_rsp) search_req = APISearchRequest( user_id=user_id, readable_cube_ids=["cube_test_user_1"], query="Please recommend a hotel that I haven’t stayed at before.", include_preference=True, ) search_rsp = search_memories(search_req).data print("\n\nsearch_rsp: \n\n", json.dumps(search_rsp, indent=2, ensure_ascii=False)) ``` ## Example 2: Adding and retrieving multi-source memories ### When to Use: - In addition to plain text conversations, you need to add files, image content, or tool call history to memory. - At the same time, you want to retrieve memories from these multiple sources. ### Key Points: - Adding memories from multiple information sources. - Needs to include downloadable file and image URLs. - The added information must strictly follow the OpenAI Messages format. - The tool schema in the system prompt needs to be wrapped in . 
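
Because the key points above require messages in the OpenAI Messages format with downloadable file and image URLs, a quick pre-flight check can catch malformed input before it is added to memory. The sketch below is a hypothetical helper, not part of MemOS: `check_messages` and its rules are assumptions derived from the message shapes used in this section's examples, including the leading `@` that appears before file URLs there.

```python
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}
ALLOWED_PART_TYPES = {"text", "file", "image_url"}


def _is_url(value) -> bool:
    """True for http(s) URLs; a leading '@' (as in this section's examples) is tolerated."""
    return isinstance(value, str) and value.lstrip("@").startswith(("http://", "https://"))


def check_messages(messages) -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for i, message in enumerate(messages):
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {role!r}")

        content = message.get("content")
        if content is None or isinstance(content, str):
            # Plain-string content, or an assistant message that only has tool_calls.
            continue

        for j, part in enumerate(content):
            kind = part.get("type")
            if kind not in ALLOWED_PART_TYPES:
                problems.append(f"message {i}, part {j}: unexpected type {kind!r}")
            elif kind == "text" and not isinstance(part.get("text"), str):
                problems.append(f"message {i}, part {j}: text is missing")
            elif kind == "file" and not _is_url(part.get("file", {}).get("file_data")):
                problems.append(f"message {i}, part {j}: file_data is not a URL")
            elif kind == "image_url" and not _is_url(part.get("image_url", {}).get("url")):
                problems.append(f"message {i}, part {j}: image url is not a URL")
    return problems
```

Running `check_messages(messages)` before building the add request gives an early, readable error list instead of a failure inside the memory pipeline.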
### Full Example Code Adding text and files to memory ```python import json from memos.api.routers.server_router import add_memories, search_memories from memos.api.product_models import APIADDRequest, APISearchRequest user_id = "test_user_2" add_req = APIADDRequest( user_id=user_id, writable_cube_ids=["cube_test_user_2"], messages = [ { "role": "user", "content": [ { "type": "text", "text": "Please read this file, summarize the key points, and provide a final conclusion." }, { "type": "file", "file": { "file_id": "file_123", "filename": "report.md", "file_data": "@http://139.196.232.20:9090/graph-test/algorithm/2025_11_13/1763043889_1763043782_PM1%E8%BD%A6%E9%97%B4PMT%E9%9D%B4%E5%8E%8B%E8%BE%B9%E5%8E%8B%E5%8E%8B%E5%8A%9B%E6%97%A0%E6%B3%95%E5%BB%BA%E7%AB%8B%E6%95%85%E9%9A%9C%E6%8A%A5%E5%91%8A20240720.md" } }, ] }, { "role": "assistant", "content": [ { "type": "text", "text": "Final Summary: During the PMT boot-pressure startup test of the PM1 workshop on July 20, 2024, the drive could not run because the edge pressures on both sides failed to reach the 2.5-bar interlock requirement. After troubleshooting, the PLC output signals, hydraulic pipelines, and valves were all found to be normal. The root cause was ultimately identified as poor contact at the negative terminal of the proportional valve’s DC 24V power supply inside the PLC cabinet, caused by a short-jumpered terminal block. After re-connecting the negative incoming lines in parallel, the equipment returned to normal operation. It is recommended to replace terminal blocks in batches, inspect instruments with uncertain service life, and optimize the troubleshooting process by tracing common-mode issues from shared buses and power supply sources." 
} ] } ], async_mode="sync", mode="fine", ) add_rsp = add_memories(add_req) print("add_memories rsp: \n\n", add_rsp) search_req = APISearchRequest( user_id=user_id, readable_cube_ids=["cube_test_user_2"], query="Workshop PMT boot pressure startup test", include_preference=False, ) search_rsp = search_memories(search_req).data print("\n\nsearch_rsp: \n\n", json.dumps(search_rsp, indent=2, ensure_ascii=False)) ``` Adding messages from multiple mixed information sources to memory ```python import json from memos.api.routers.server_router import add_memories, search_memories from memos.api.product_models import APIADDRequest, APISearchRequest user_id = "test_user_2" add_req = APIADDRequest( user_id=user_id, writable_cube_ids=["cube_test_user_2"], messages = [ { "role": "system", "content": [ { "type": "text", "text": "You are a professional industrial fault analysis assistant. Please read the PDF, images, and instructions provided by the user and provide a professional technical summary.\n\n\n[\n {\n \"name\": \"file_reader\",\n \"description\": \"Used to read the content of files uploaded by the user and return the text data (in JSON string format).\",\n \"parameters\": [\n {\"name\": \"file_id\", \"type\": \"string\", \"required\": true, \"description\": \"The file ID to be read\"}\n ],\n \"returns\": {\"type\": \"text\", \"description\": \"Returns the extracted text content of the file\"}\n }\n]\n" } ] }, { "role": "user", "content": [ { "type": "text", "text": "Please read this file and image, summarize the key points, and provide a final conclusion." 
}, { "type": "file", "file": { "file_id": "file_123", "filename": "report.pdf", "file_data": "@http://139.196.232.20:9090/graph-test/algorithm/2025_11_13/1763043889_1763043782_PM1%E8%BD%A6%E9%97%B4PMT%E9%9D%B4%E5%8E%8B%E8%BE%B9%E5%8E%8B%E5%8E%8B%E5%8A%9B%E6%97%A0%E6%B3%95%E5%BB%BA%E7%AB%8B%E6%95%85%E9%9A%9C%E6%8A%A5%E5%91%8A20240720.md" } }, { "type": "image_url", "image_url": { "url": "https://play-groud-test-1.oss-cn-shanghai.aliyuncs.com/%E5%9B%BE%E7%89%871.jpeg" } } ] }, { "role": "assistant", "tool_calls": [ { "id": "call_file_reader_001", "type": "function", "function": { "name": "file_reader", "arguments": "{\"file_id\": \"file_123\"}" } } ] }, { "role": "tool", "tool_call_id": "call_file_reader_001", "content": [ { "type": "text", "text": "{\"file_id\":\"file_123\",\"extracted_text\":\"PM1 workshop PMT boot pressure startup test record… Final fault cause: poor contact at the negative terminal of the DC 24V power supply circuit due to a short-jumped terminal block.\"}" } ] }, { "role": "assistant", "content": [ { "type": "text", "text": "Final Summary: During the PMT boot-pressure startup test of the PM1 workshop on July 20, 2024, the drive could not run because the edge pressures on both sides failed to reach the 2.5-bar interlock requirement. After troubleshooting, the PLC output signals, hydraulic pipelines, and valves were all found to be normal. The root cause was ultimately identified as poor contact at the negative terminal of the proportional valve’s DC 24V power supply inside the PLC cabinet, caused by a short-jumpered terminal block. After re-connecting the negative incoming lines in parallel, the equipment returned to normal operation. It is recommended to replace terminal blocks in batches, inspect instruments with uncertain service life, and optimize the troubleshooting process by tracing common-mode issues from shared buses and power supply sources." 
} ] } ], async_mode="sync", mode="fine", ) add_rsp = add_memories(add_req) print("add_memories rsp: \n\n", add_rsp) search_req = APISearchRequest( user_id=user_id, readable_cube_ids=["cube_test_user_2"], query="Workshop PMT boot pressure startup test", include_preference=False, ) search_rsp = search_memories(search_req).data print("\n\nsearch_rsp: \n\n", json.dumps(search_rsp, indent=2, ensure_ascii=False)) ``` ## Example 3: Multi-Cube addition and retrieval ### When to Use: - Add memories to separate, isolated Cube spaces - You want to retrieve memories from different Cube spaces simultaneously ### Key Points: - Input a readable_cube_ids list containing multiple cube IDs during retrieval ### Full Example Code ```python import json from memos.api.routers.server_router import add_memories, search_memories from memos.api.product_models import APIADDRequest, APISearchRequest user_id = "test_user_3" add_req = APIADDRequest( user_id=user_id, writable_cube_ids=["cube_test_user_3_1"] , messages = [ {"role": "user", "content": "I’ve planned to travel to Guangzhou during the summer vacation. 
What chain hotels are available for accommodation?"}, {"role": "assistant", "content": "You can consider [7 Days Inn, Ji Hotel, Hilton], etc."}, {"role": "user", "content": "I’ll choose 7 Days Inn."}, {"role": "assistant", "content": "Okay, feel free to ask me if you have any other questions."} ], async_mode="sync", mode="fine", ) add_rsp = add_memories(add_req) print("add_memories rsp: \n\n", add_rsp) add_req = APIADDRequest( user_id=user_id, writable_cube_ids=["cube_test_user_3_2"] , messages = [ {"role": "user", "content": "I love you, I need you."}, {"role": "assistant", "content": "Wow, I love you too"}, ], async_mode="sync", mode="fine", ) add_rsp = add_memories(add_req) print("add_memories rsp: \n\n", add_rsp) search_req = APISearchRequest( user_id=user_id, readable_cube_ids=["cube_test_user_3_1", "cube_test_user_3_2"], query="Please recommend a hotel, Love u u", include_preference=True, ) search_rsp = search_memories(search_req).data print("\n\nsearch_rsp: \n\n", json.dumps(search_rsp, indent=2, ensure_ascii=False)) ``` ## Example 4: KVCacheMemory Only ### When to Use: - You want short-term working memory for faster multi-turn conversation. - Useful for chatbot session acceleration or prompt reuse. - Best for caching hidden states / KV pairs. ### Key Points: - Uses KVCacheMemory with no explicit text memory. - Demonstrates extract → add → merge → get → delete. - Shows how to dump/load KV caches. 
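
To see why prompt reuse helps multi-turn sessions, a rough count of prefill work is enough. The sketch below is a simplified model, not a benchmark: it assumes every turn is answered over the same background prefix, ignores the growing chat history, and uses made-up token counts. `prefill_tokens` is a hypothetical helper.

```python
def prefill_tokens(prefix_tokens: int, turn_tokens: list, reuse_prefix: bool) -> int:
    """Total tokens the model must encode during prefill across all turns."""
    total = 0
    for index, new_tokens in enumerate(turn_tokens):
        if reuse_prefix:
            # The prefix is encoded once (on the first turn) and then served from the cache.
            total += new_tokens + (prefix_tokens if index == 0 else 0)
        else:
            # Without a cache, the prefix is re-encoded on every turn.
            total += prefix_tokens + new_tokens
    return total


# Hypothetical numbers: a 2000-token background and three short user turns.
without_cache = prefill_tokens(2000, [50, 40, 60], reuse_prefix=False)
with_cache = prefill_tokens(2000, [50, 40, 60], reuse_prefix=True)
print(without_cache, with_cache)
```

Under these assumptions the saving grows with the number of turns, which is the case KVCacheMemory is aimed at.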
### Full Example Code ```python from memos.configs.memory import MemoryConfigFactory from memos.memories.factory import MemoryFactory # Create config for KVCacheMemory (HuggingFace backend) config = MemoryConfigFactory( backend="kv_cache", config={ "extractor_llm": { "backend": "huggingface", "config": { "model_name_or_path": "Qwen/Qwen3-0.6B", "max_tokens": 32, "add_generation_prompt": True, "remove_think_prefix": True, }, }, }, ) # Instantiate KVCacheMemory kv_mem = MemoryFactory.from_config(config) # Extract a KVCacheItem (DynamicCache) prompt = [ {"role": "user", "content": "What is MemOS?"}, {"role": "assistant", "content": "MemOS is a memory operating system for LLMs."}, ] print("===== Extract KVCacheItem =====") cache_item = kv_mem.extract(prompt) print(cache_item) # Add the cache to memory kv_mem.add([cache_item]) print("All caches:", kv_mem.get_all()) # Get by ID retrieved = kv_mem.get(cache_item.id) print("Retrieved:", retrieved) # Merge caches (simulate multi-turn) item2 = kv_mem.extract([{"role": "user", "content": "Tell me a joke."}]) kv_mem.add([item2]) merged = kv_mem.get_cache([cache_item.id, item2.id]) print("Merged cache:", merged) # Delete one kv_mem.delete([cache_item.id]) print("After delete:", kv_mem.get_all()) # Dump & load caches kv_mem.dump("tmp/kv_mem") print("Dumped to tmp/kv_mem") kv_mem.delete_all() kv_mem.load("tmp/kv_mem") print("Loaded caches:", kv_mem.get_all()) ``` ## Example 5: Hybrid ### When to Use: - You want long-term explainable memory and short-term fast context together. - Ideal for complex agents that plan, remember facts, and keep chat context. - Demonstrates multi-memory orchestration. ### How It Works: - **TreeTextMemory** stores your long-term knowledge in a graph DB (Neo4j). - **KVCacheMemory** stores recent or stable context as activation caches. - Both work together in a single **MemCube**, managed by your `MOS` pipeline. 
### Full Example Code ```python import os from memos.configs.mem_cube import GeneralMemCubeConfig from memos.configs.mem_os import MOSConfig from memos.mem_cube.general import GeneralMemCube from memos.mem_os.main import MOS # 1. Setup CUDA (if needed) — for local GPU inference os.environ["CUDA_VISIBLE_DEVICES"] = "1" # 2. Define user & paths user_id = "root" cube_id = "root/mem_cube_kv_cache" tmp_cube_path = "/tmp/default/mem_cube_5" # 3. Initialize MOSConfig mos_config = MOSConfig.from_json_file("examples/data/config/simple_treekvcache_memos_config.json") mos = MOS(mos_config) # 4. Initialize the MemCube (TreeTextMemory + KVCacheMemory) cube_config = GeneralMemCubeConfig.from_json_file( "examples/data/config/simple_treekvcache_cube_config.json" ) mem_cube = GeneralMemCube(cube_config) # 5. Dump the MemCube to disk try: mem_cube.dump(tmp_cube_path) except Exception as e: print(e) # 6. Register the MemCube explicitly mos.register_mem_cube(tmp_cube_path, mem_cube_id=cube_id, user_id=user_id) # 7. Extract and add a KVCache memory (simulate stable context) extract_kvmem = mos.mem_cubes[cube_id].act_mem.extract("I like football") mos.mem_cubes[cube_id].act_mem.add([extract_kvmem]) # 8. Start chatting — now your chat uses: # - TreeTextMemory: for structured multi-hop retrieval # - KVCacheMemory: for fast context injection while True: user_input = input("👤 [You] ").strip() print() response = mos.chat(user_input) print(f"🤖 [Assistant] {response}\n") print("📢 [System] MemChat has stopped.") ``` ## Example 6: Multi-Memory Scheduling ### When to Use: - You want to manage multiple users, multiple MemCubes, or dynamic memory flows. - Good for SaaS agents or multi-session LLMs. - Demonstrates MemScheduler + config YAMLs. ### Key Points: - Uses parse_yaml to load MOSConfig and MemCubeConfig. - Dynamic user and cube creation. - Shows runtime scheduling of memories. 
### Full Example Code ```python import shutil import uuid from pathlib import Path from memos.configs.mem_cube import GeneralMemCubeConfig from memos.configs.mem_os import MOSConfig from memos.mem_cube.general import GeneralMemCube from memos.mem_os.main import MOS from memos.mem_scheduler.utils import parse_yaml # Load main MOS config with MemScheduler config = parse_yaml("./examples/data/config/mem_scheduler/memos_config_w_scheduler.yaml") mos_config = MOSConfig(**config) mos = MOS(mos_config) # Create user with dynamic ID user_id = str(uuid.uuid4()) mos.create_user(user_id=user_id) # Create MemCube config and dump it config = GeneralMemCubeConfig.from_yaml_file( "./examples/data/config/mem_scheduler/mem_cube_config.yaml" ) mem_cube_id = "mem_cube_5" mem_cube_name_or_path = f"./outputs/mem_scheduler/{user_id}/{mem_cube_id}" # Remove old folder if exists if Path(mem_cube_name_or_path).exists(): shutil.rmtree(mem_cube_name_or_path) print(f"{mem_cube_name_or_path} is not empty, and has been removed.") # Dump new cube mem_cube = GeneralMemCube(config) mem_cube.dump(mem_cube_name_or_path) # Register MemCube for this user mos.register_mem_cube( mem_cube_name_or_path=mem_cube_name_or_path, mem_cube_id=mem_cube_id, user_id=user_id ) # Add messages messages = [ { "role": "user", "content": "I like playing football." }, { "role": "assistant", "content": "I like playing football too." 
}, ] mos.add(messages, user_id=user_id, mem_cube_id=mem_cube_id) # Chat loop: show TreeTextMemory nodes + KVCache while True: user_input = input("👤 [You] ").strip() print() response = mos.chat(user_input, user_id=user_id) retrieved_memories = mos.get_all(mem_cube_id=mem_cube_id, user_id=user_id) print(f"🤖 [Assistant] {response}") # Show WorkingMemory nodes in TreeTextMemory for node in retrieved_memories["text_mem"][0]["memories"]["nodes"]: if node["metadata"]["memory_type"] == "WorkingMemory": print(f"[WorkingMemory] {node['memory']}") # Show Activation Memory if retrieved_memories["act_mem"][0]["memories"]: for act_mem in retrieved_memories["act_mem"][0]["memories"]: print(f"⚡ [KVCache] {act_mem['memory']}") else: print("⚡ [KVCache] None\n") ``` > **Note**: **Keep in Mind** > Use `dump()` and `load()` to persist your memory cubes. > > Always check that your vector DB dimension matches your embedder. > > For graph memory, you'll need Neo4j Desktop (community version support coming soon). ## Next Steps You're just getting started! Next, try: - Pick the example that matches your use case. - Combine modules to build smarter, more persistent agents! Need more? See the API Reference or contribute your own example! --- # MemOS API Development Guide (Components & Handlers Architecture) (/open_source/modules/mos/overview) This architecture separates "system components" (Components) from "business logic execution" (Handlers), making the system easier to extend, test, and maintain. ## 1. Core Concepts ### 1.1 Components (Core Components) Components are the "organs" of MemOS. They are initialized when the server starts (via `init_server()`) and reused throughout the system lifecycle. Core components include: #### Core Memory Components 1. **MemCube**: A memory container that isolates memories across different users and application scenarios, managing multiple memory modules in a unified way. 2.
**MemReader**: A memory processor that parses user inputs (chat, documents, images) into standardized memory items that the system can persist. 3. **MemScheduler**: A background scheduler that handles asynchronous processing of memory operations—storage, indexing, and organization—supporting concurrent task execution. 4. **MemChat**: A conversation controller responsible for orchestrating the memory-augmented dialogue loop: "retrieve memory → generate response → store new memory". 5. **MemFeedback**: A memory correction engine that understands users' natural-language feedback and performs atomic-level updates to memories (correction, addition, replacement). ### 1.2 Handlers (Business Processors) Handlers are the "brain" of MemOS. They encapsulate concrete business logic by coordinating and calling the capabilities of Components to complete user-facing tasks. #### Core Handlers Overview | Handler | Purpose | Key Methods | | :--- | :--- | :--- | | **AddHandler** | Add memories (chat / documents / text) | `handle_add_memories` | | **SearchHandler** | Search memories (semantic retrieval) | `handle_search_memories` | | **ChatHandler** | Chat (with memory augmentation) | `handle_chat_complete`, `handle_chat_stream` | | **FeedbackHandler** | Feedback (correct memories / human feedback) | `handle_feedback_memories` | | **MemoryHandler** | Manage (get details / delete) | `handle_get_memory`, `handle_delete_memories` | | **SchedulerHandler** | Scheduling (query async task status) | `handle_scheduler_status`, `handle_scheduler_wait` | | **SuggestionHandler** | Suggestions (generate recommended questions) | `handle_get_suggestion_queries` | ## 2. API Details ### 2.1 Initialization Initialization is the foundation of system startup. All Handlers rely on a unified component registry and dependency-injection mechanism. 
- Component loading (`init_server`): When the system starts, it initializes all core components, including the LLM, storage layers (vector DB, graph DB), scheduler, and various Memory Cubes. - Dependency injection (`HandlerDependencies`): To ensure loose coupling and testability, all components are wrapped into a `HandlerDependencies` container. When a Handler is instantiated, it receives this container and can access needed resources—such as `naive_mem_cube`, `mem_reader`, or `feedback_server`—without duplicating initialization logic. ### 2.2 Add Memories (AddHandler) AddHandler is the brain's "memory intake instruction", responsible for converting external information into system memories. It handles not only intake and conversion of various information types, but also automatically recognizes feedback and routes it to dedicated feedback processing. - Core capabilities: - Multimodal support: Processes user conversations, documents, images, and other input types, converting them into standardized memory objects. - Sync and async modes: Controlled via `async_mode`. **Sync mode** ("sync"): processes immediately and blocks until completion, suitable for debugging. **Async mode** ("async"): pushes tasks to a background queue for concurrent processing by MemScheduler, returns a task ID immediately, suitable for production to improve response speed. - Automatic feedback routing: If the request sets `is_feedback=True`, the Handler automatically extracts the last user message as feedback content and routes it to MemFeedback processing, instead of adding it as a normal memory. - Multi-target writes: Supports writing to multiple MemCubes simultaneously. When multiple targets are specified, the system processes all write tasks in parallel; when only one target is specified, it uses a lightweight approach. 
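The routing behavior described for AddHandler can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the MemOS implementation: `AddRequestSketch`, `route_add`, `write_to_cube`, and `last_user_message` are hypothetical names, and only the field names `async_mode`, `is_feedback`, and `writable_cube_ids` are taken from this documentation.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class AddRequestSketch:
    """Hypothetical stand-in for an add request; not the real MemOS model."""
    user_id: str
    messages: list
    writable_cube_ids: list
    async_mode: str = "sync"  # "sync" or "async"
    is_feedback: bool = False


def last_user_message(messages):
    """Return the content of the most recent user message, or None."""
    for msg in reversed(messages):
        if msg.get("role") == "user":
            return msg.get("content")
    return None


def write_to_cube(cube_id, req):
    """Placeholder for a single-cube write; returns a small summary dict."""
    return {"cube_id": cube_id, "mode": req.async_mode, "count": len(req.messages)}


def route_add(req):
    """Pick a handling path for an add request, following the prose description."""
    if req.is_feedback:
        # Feedback is not stored as a normal memory; the last user message
        # is handed to the feedback path instead.
        return {"route": "feedback", "feedback": last_user_message(req.messages)}
    if len(req.writable_cube_ids) == 1:
        # Single target: lightweight direct write.
        cube_id = req.writable_cube_ids[0]
        return {"route": "single", "results": [write_to_cube(cube_id, req)]}
    # Multiple targets: run the writes in parallel, keeping input order.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda cid: write_to_cube(cid, req), req.writable_cube_ids))
    return {"route": "parallel", "results": results}
```

In this sketch the feedback check comes first, so a request flagged with `is_feedback=True` never reaches the write path, which mirrors the "instead of adding it as a normal memory" behavior described for AddHandler.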
### 2.3 Search Memories (SearchHandler) SearchHandler is the brain's "memory retrieval instruction", providing semantic-based intelligent memory query capabilities and serving as a key component for RAG (Retrieval-Augmented Generation). - Core capabilities: - Semantic retrieval: Uses embedding technology to recall relevant memories based on semantic similarity, understanding user intent more accurately than simple keyword matching. - Flexible search scope: Supports specifying the target data range for retrieval. For example, you can search only within a specific user's memory, or search across multiple users' shared public memories, meeting different privacy and business needs. - Multiple retrieval modes: Flexibly choose between speed and accuracy based on application scenarios. **Fast mode** suits scenarios requiring high real-time performance, **fine mode** suits scenarios pursuing high retrieval accuracy, and **mixed mode** balances both. - Multi-step reasoning retrieval: For complex questions, supports deep reasoning capability to progressively approach the most relevant memories through multiple rounds of understanding and retrieval. ### 2.4 Chat (ChatHandler) ChatHandler is the brain's "dialogue coordination instruction", responsible for converting user dialogue requirements into a complete business process. It does not directly operate on memories; instead, it coordinates other Handlers to complete end-to-end dialogue tasks. - Core capabilities: - Orchestration: Automatically executes the complete dialogue loop of "retrieve memory → generate response → store memory". Each user query benefits from historical memories for smarter responses, and each dialogue is crystallized as new memory, achieving "chat-as-learning". - Context management: Handles the assembly of `history` (past conversation) and `query` (current question) to ensure the LLM understands the complete dialogue context and avoids information loss. 
- Multiple interaction modes: Supports standard request-response mode and streaming response mode. Standard mode suits simple questions, streaming mode suits long-text replies, meeting different frontend interaction needs. - Message push (optional): Supports automatically pushing results to third-party platforms (such as DingTalk) after generating responses, enabling multi-channel integration. ### 2.5 Feedback and Correction (FeedbackHandler) FeedbackHandler is the brain's "feedback correction instruction", responsible for understanding users' natural-language feedback about AI performance and automatically locating and correcting relevant memory content. - Core capabilities: - Memory correction: When users point out AI errors (such as "the meeting location is Shanghai, not Beijing"), the Handler automatically updates or marks old memories. The system uses version management rather than direct deletion, maintaining traceability of modification history. - Positive and negative feedback: Supports users marking specific memory quality through upvote or downvote. The system adjusts the memory's weight and credibility accordingly, making subsequent retrieval more accurate. - Precise targeting: Supports two feedback modes. One is automatic conflict detection based on dialogue history, the other allows users to directly specify memories to correct, improving feedback effectiveness and accuracy. ### 2.6 Memory Management (MemoryHandler) MemoryHandler is the brain's "memory management instruction", providing low-level CRUD capabilities for memory data, primarily for system admin backends or data cleanup scenarios. - Core capabilities: - Fine-grained management: Unlike AddHandler's business-level writes, this Handler allows fetching detailed information of a single memory or performing physical deletion by memory ID. This direct operation bypasses business logic packaging, primarily for debugging, auditing, or system cleanup. 
- Direct backend access: Some management operations need to interact directly with the underlying memory component (`naive_mem_cube`) to provide the most efficient and lowest-latency data operations, meeting system operations needs. ### 2.7 Scheduler Status (SchedulerHandler) SchedulerHandler is the brain's "task monitoring instruction", responsible for tracking the real-time execution status of all async tasks in the system, allowing users to understand background task progress and results. - Core capabilities: - Status tracking: Tracks each task's status in real time (queued, running, completed, failed). This is important for users in async mode who need to know when tasks complete. - Result fetching: Provides a task result query interface. When async tasks complete, users can fetch the final execution result or error information through this interface to see whether operations succeeded and, if not, why they failed. - Sync wait (debugging tool): During development and integration testing, provides a tool to force async tasks into synchronous waits, allowing developers to debug async flows the same way they debug synchronous code. ### 2.8 Suggested Questions (SuggestionHandler) SuggestionHandler is the brain's "suggestion generation instruction", predicting users' potential needs and proactively recommending related questions to help users explore system capabilities and discover topics of interest. - Core capabilities: - Dual-mode generation: - Conversation-based suggestions: When users provide recent conversation records, the system analyzes dialogue context and infers potential follow-up topics of interest, generating three related recommended questions. - Memory-based suggestions: When there is no conversation context, the system infers user interests and status from recent memories, generating recommended questions related to the user's recent life or work. This suits dialogue initiation or topic transitions.
- Multi-language support: Recommended questions automatically adapt to user language settings, supporting Chinese, English, and other languages, improving the experience for different users. --- # MemCube (/open_source/modules/mem_cube) ## What is MemCube? **MemCube** contains three major types of memory: - **Textual Memory**: Stores text knowledge, supporting semantic search and knowledge management. - **Activation Memory**: Stores intermediate reasoning results, accelerating LLM responses. - **Parametric Memory**: Stores model adaptation weights, used for personalization. Each memory type can be independently configured and flexibly combined based on application needs. ## Structure MemCube is defined by a configuration (see `GeneralMemCubeConfig`), which specifies the backend and settings for each memory type. The typical structure is: ``` MemCube ├── user_id ├── cube_id ├── text_mem: TextualMemory ├── act_mem: ActivationMemory └── para_mem: ParametricMemory ``` All memory modules are accessible via the MemCube interface: - `mem_cube.text_mem` - `mem_cube.act_mem` - `mem_cube.para_mem` ## View Architecture Starting from MemOS 2.0, runtime operations (add/search) should go through the **View architecture**: ### SingleCubeView Use this to manage a single MemCube when you only need one memory space. ```python from memos.multi_mem_cube.single_cube import SingleCubeView view = SingleCubeView( cube_id="my_cube", naive_mem_cube=naive_mem_cube, mem_reader=mem_reader, mem_scheduler=mem_scheduler, logger=logger, searcher=searcher, feedback_server=feedback_server, # Optional ) # Add memories view.add_memories(add_request) # Search memories view.search_memories(search_request) ``` ### CompositeCubeView Use this to manage multiple MemCubes when you need unified operations across multiple memory spaces. ```python from memos.multi_mem_cube.composite_cube import CompositeCubeView from memos.multi_mem_cube.single_cube import SingleCubeView # Create multiple SingleCubeViews view1 = SingleCubeView(cube_id="cube_1", ...)
view2 = SingleCubeView(cube_id="cube_2", ...) # Composite view for multi-cube operations composite = CompositeCubeView(cube_views=[view1, view2], logger=logger) # Search across all cubes results = composite.search_memories(search_request) # Results contain cube_id field to identify source ``` ## API Request Fields When using the View architecture for add/search operations, specify these parameters: | Field | Type | Description | | :--- | :--- | :--- | | `writable_cube_ids` | `list[str]` | Target cubes for add operations. Can specify multiple; the system will write to all targets in parallel. | | `readable_cube_ids` | `list[str]` | Target cubes for search operations. Can search across multiple cubes; results include source information. | | `async_mode` | `str` | Execution mode: `"sync"` for synchronous processing (wait for results), `"async"` for asynchronous processing (push to background queue, return task ID immediately). | ## Core Methods (`GeneralMemCube`) **GeneralMemCube** is the standard implementation of MemCube, managing all system memories through a unified interface. Here are the main methods to complete memory lifecycle management. 
### Initialization ```python from memos.mem_cube.general import GeneralMemCube mem_cube = GeneralMemCube(config) ``` ### Static Data Operations | Method | Description | | :--- | :--- | | `init_from_dir(dir)` | Load a MemCube from a local directory | | `init_from_remote_repo(repo, base_url)` | Load a MemCube from a remote repository (e.g., Hugging Face) | | `load(dir)` | Load all memories from a directory into the existing instance | | `dump(dir)` | Save all memories to a directory for persistence | ## File Structure A MemCube directory contains the following files, with each file corresponding to a memory type: - `config.json` (MemCube configuration) - `textual_memory.json` (textual memory) - `activation_memory.pickle` (activation memory) - `parametric_memory.adapter` (parametric memory) ## Usage Examples ### Export Example (dump_cube.py) ```python import json import os import shutil from memos.api.handlers import init_server from memos.api.product_models import APIADDRequest from memos.log import get_logger from memos.multi_mem_cube.single_cube import SingleCubeView logger = get_logger(__name__) EXAMPLE_CUBE_ID = "example_dump_cube" EXAMPLE_USER_ID = "example_user" # 1. Initialize server components = init_server() naive = components["naive_mem_cube"] # 2. Create SingleCubeView view = SingleCubeView( cube_id=EXAMPLE_CUBE_ID, naive_mem_cube=naive, mem_reader=components["mem_reader"], mem_scheduler=components["mem_scheduler"], logger=logger, searcher=components["searcher"], feedback_server=components["feedback_server"], ) # 3. Add memories via View result = view.add_memories(APIADDRequest( user_id=EXAMPLE_USER_ID, writable_cube_ids=[EXAMPLE_CUBE_ID], messages=[ {"role": "user", "content": "This is a test memory"}, {"role": "user", "content": "Another memory to persist"}, ], async_mode="sync", # Use sync mode to ensure immediate completion )) print(f"✓ Added {len(result)} memories") # 4. 
Export data for the specific cube_id output_dir = "tmp/mem_cube_dump" if os.path.exists(output_dir): shutil.rmtree(output_dir) os.makedirs(output_dir, exist_ok=True) # Export graph data (only data for the current cube_id) json_data = naive.text_mem.graph_store.export_graph( include_embedding=True, # Include embeddings to support semantic search user_name=EXAMPLE_CUBE_ID, # Filter by cube_id ) # Fix embedding format: parse string to list for import compatibility import contextlib for node in json_data.get("nodes", []): metadata = node.get("metadata", {}) if "embedding" in metadata and isinstance(metadata["embedding"], str): with contextlib.suppress(json.JSONDecodeError): metadata["embedding"] = json.loads(metadata["embedding"]) print(f"✓ Exported {len(json_data.get('nodes', []))} nodes") # Save to file memory_file = os.path.join(output_dir, "textual_memory.json") with open(memory_file, "w", encoding="utf-8") as f: json.dump(json_data, f, indent=2, ensure_ascii=False) print(f"✓ Saved to: {memory_file}") ``` ### Import and Search Example (load_cube.py) > **Embedding Compatibility Note**: The sample data uses the **bge-m3** model with **1024 dimensions**. If your environment uses a different embedding model or dimension, semantic search after import may be inaccurate or fail. Ensure your `.env` configuration matches the embedding settings used during export. ```python import json import os from memos.api.handlers import init_server from memos.api.product_models import APISearchRequest from memos.log import get_logger from memos.multi_mem_cube.single_cube import SingleCubeView logger = get_logger(__name__) EXAMPLE_CUBE_ID = "example_dump_cube" EXAMPLE_USER_ID = "example_user" # 1. Initialize server components = init_server() naive = components["naive_mem_cube"] # 2. 
Create SingleCubeView view = SingleCubeView( cube_id=EXAMPLE_CUBE_ID, naive_mem_cube=naive, mem_reader=components["mem_reader"], mem_scheduler=components["mem_scheduler"], logger=logger, searcher=components["searcher"], feedback_server=components["feedback_server"], ) # 3. Load data from file into graph_store load_dir = "examples/data/mem_cube_tree" memory_file = os.path.join(load_dir, "textual_memory.json") with open(memory_file, encoding="utf-8") as f: json_data = json.load(f) naive.text_mem.graph_store.import_graph(json_data, user_name=EXAMPLE_CUBE_ID) nodes = json_data.get("nodes", []) print(f"✓ Imported {len(nodes)} nodes") # 4. Display loaded data print(f"\nLoaded {len(nodes)} memories:") for i, node in enumerate(nodes[:3], 1): # Show first 3 metadata = node.get("metadata", {}) memory_text = node.get("memory", "N/A") mem_type = metadata.get("memory_type", "unknown") print(f" [{i}] Type: {mem_type}") print(f" Content: {memory_text[:60]}...") # 5. Semantic search verification query = "test memory dump persistence demonstration" print(f'\nSearching: "{query}"') search_result = view.search_memories( APISearchRequest( user_id=EXAMPLE_USER_ID, readable_cube_ids=[EXAMPLE_CUBE_ID], query=query, ) ) text_mem_results = search_result.get("text_mem", []) memories = [] for group in text_mem_results: memories.extend(group.get("memories", [])) print(f"✓ Found {len(memories)} relevant memories") for i, mem in enumerate(memories[:2], 1): # Show first 2 print(f" [{i}] {mem.get('memory', 'N/A')[:60]}...") ``` ### Complete Examples See examples in the code repository: - `MemOS/examples/mem_cube/dump_cube.py` - Export MemCube data (add + export) - `MemOS/examples/mem_cube/load_cube.py` - Import MemCube data and perform semantic search (import + search) ### Legacy API Notes The old approach of directly calling `mem_cube.text_mem.get_all()` is deprecated. Please use the View architecture. Legacy examples have been moved to `MemOS/examples/mem_cube/_deprecated/`. 
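As a supplement to the embedding compatibility note in the import example, a quick pre-import check can catch a dimension mismatch before data reaches the graph store. This is a rough sketch, assuming the exported JSON has the `nodes` / `metadata` / `embedding` layout used in the export example; `embedding_dims` and `check_embedding_dims` are hypothetical helpers, not part of MemOS.

```python
import json


def embedding_dims(graph_json):
    """Collect the distinct embedding lengths found in exported graph nodes."""
    dims = set()
    for node in graph_json.get("nodes", []):
        emb = node.get("metadata", {}).get("embedding")
        if isinstance(emb, str):
            # Exports may serialize the vector as a JSON string.
            try:
                emb = json.loads(emb)
            except json.JSONDecodeError:
                continue
        if isinstance(emb, list):
            dims.add(len(emb))
    return dims


def check_embedding_dims(graph_json, expected_dim):
    """Raise ValueError if any node's embedding length differs from expected_dim."""
    unexpected = embedding_dims(graph_json) - {expected_dim}
    if unexpected:
        raise ValueError(
            f"expected {expected_dim}-dim embeddings, found {sorted(unexpected)}"
        )


# Tiny illustrative payload (3-dim vectors, one stored as a JSON string).
sample = {
    "nodes": [
        {"memory": "a", "metadata": {"embedding": [0.1, 0.2, 0.3]}},
        {"memory": "b", "metadata": {"embedding": "[0.4, 0.5, 0.6]"}},
    ]
}
check_embedding_dims(sample, expected_dim=3)  # passes silently
```

For the sample data described in the note, the expected dimension would be 1024; running a check like this on `json_data` before calling `import_graph` turns a silent search-quality problem into an explicit error.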
## Developer Notes * MemCube enforces schema consistency to ensure safe loading and dumping * Each memory type can be independently configured, tested, and extended * See `/tests/mem_cube/` for integration tests and usage examples --- # MemReader (/open_source/modules/mem_reader) ## 1. Overview When building AI applications, we often run into this problem: users send all kinds of things—casual chat messages, PDF documents, and images. **MemReader** turns these raw inputs (Raw Data) into standard memory blocks (Memory Items) with embeddings and metadata by "chewing" and "digesting" them. In short, it does three things: 1. **Normalization**: Whether you send a string or JSON, it first converts everything into a standard format. 2. **Chunking**: It splits long conversations or documents into appropriately sized chunks for downstream processing. 3. **Extraction**: It calls an LLM to extract unstructured information into structured knowledge points (Fine mode), or directly generates snapshots (Fast mode). --- ## 2. Core Modes MemReader provides two modes, corresponding to the needs for "speed" and "accuracy": ### ⚡ Fast Mode (speed first) * **Characteristics**: **Does not call an LLM**, only performs chunking and embeddings. * **Use cases**: * Users are sending messages quickly and the system needs millisecond-level responses. * You only need to keep "snapshots" of the conversation, without deep understanding. * **Output**: raw text chunks + vector index + provenance tracking (Sources). ### 🧠 Fine Mode (carefully crafted) * **Characteristics**: **Calls an LLM** for deeper analysis. * **Use cases**: * Long-term memory writing (needs key facts extracted). * Document analysis (needs core ideas summarized). * Multimodal understanding (needs to understand what's in an image). * **Output**: structured facts + key information extraction (Key) + background (Background) + vector index + provenance tracking (Sources) + multimodal details. --- ## 3. 
Code Structure MemReader's code structure is straightforward and mainly includes: * **`base.py`**: defines the interface contract that all Readers must follow. * **`simple_struct.py`**: **the most commonly used implementation**. Focuses on pure-text conversations and local documents; lightweight and efficient. * **`multi_modal_struct.py`**: **an all-rounder**. Handles images, file URLs, tool calls, and other complex inputs. * **`read_multi_modal/`**: contains various parsers, such as `ImageParser` for images and `FileParser` for files. --- ## 4. How to Choose? | Your need | Recommended choice | Why | | :--- | :--- | :--- | | **Only process plain text chats** | `SimpleStructMemReader` | Simple, direct, and performant. | | **Need to handle images and file links** | `MultiModalStructMemReader` | Built-in multimodal parsing. | | **Upgrade from Fast to Fine** | Any Reader's `fine_transfer` method | Supports a progressive "store first, refine later" strategy. | --- ## 5. API Overview ### Unified Factory: `MemReaderFactory` Don't instantiate readers directly; using the factory pattern is best practice: ```python from memos.configs.mem_reader import MemReaderConfigFactory from memos.mem_reader.factory import MemReaderFactory # Create a Reader from configuration cfg = MemReaderConfigFactory.model_validate({...}) reader = MemReaderFactory.from_config(cfg) ``` ### Core Method: `get_memory()` This is the method you will call most often. ```python memories = reader.get_memory( scene_data, # your input data type="chat", # type: chat or doc info=user_info, # user info (user_id, session_id) mode="fine" # mode: fast or fine (highly recommended to specify explicitly!) ) ``` **Return value**: `list[list[TextualMemoryItem]]` > **Note**: Why a nested list? > Because a long conversation may be split into multiple windows (Window). The outer list represents windows, and the inner list represents memory items extracted from that window. --- ## 6. 
Practical Development ### Scenario 1: Processing simple chat logs This is the most basic usage, with `SimpleStructMemReader`. ```python # 1. Prepare input: standard OpenAI-style conversation format conversation = [ [ {"role": "user", "content": "I have a meeting tomorrow at 3pm"}, {"role": "assistant", "content": "What is the meeting about?"}, {"role": "user", "content": "Discussing the Q4 project deadline"}, ] ] # 2. Extract memory (Fine mode) memories = reader.get_memory( conversation, type="chat", mode="fine", info={"user_id": "u1", "session_id": "s1"} ) # 3. Result # memories will include extracted facts, e.g., "User has a meeting tomorrow at 3pm about the Q4 project deadline" ``` ### Scenario 2: Processing multimodal inputs When users send images or file links, switch to `MultiModalStructMemReader`. ```python # 1. Prepare input: a complex message containing files and images scene_data = [ [ { "role": "user", "content": [ {"type": "text", "text": "Check this file and image"}, # Files support automatic download and parsing via URL {"type": "file", "file": {"file_data": "https://example.com/readme.md"}}, # Images support URL {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}}, ] } ] ] # 2. Extract memory memories = multimodal_reader.get_memory( scene_data, type="chat", mode="fine", # Only Fine mode invokes the vision model to parse images info={"user_id": "u1", "session_id": "s1"} ) ``` ### Scenario 3: Progressive optimization (Fine Transfer) For better UX, you can first store the conversation quickly in Fast mode, then "refine" it into Fine memories when the system is idle. ```python # 1. Store quickly first (millisecond-level) fast_memories = reader.get_memory(conversation, mode="fast", ...) # ... store into the database ... # 2. Refine asynchronously in the background refined_memories = reader.fine_transfer_simple_mem( fast_memories_flat_list, # Note: pass a flattened list of Items here type="chat" ) # 3. 
Replace the original fast_memories with refined_memories ``` --- ## 7. Configuration Notes In `.env` or configuration files, you can adjust these key parameters: * **`chat_window_max_tokens`**: **sliding window size**. Default is 1024. It determines how much context is packed together for processing. Too small may lose context; too large may exceed the LLM token limit. * **`remove_prompt_example`**: **whether to remove examples from the prompt**. True = save tokens but may reduce extraction quality; False = keep few-shot examples for better accuracy but consume more tokens. * **`direct_markdown_hostnames`** (multimodal only): **hostname allowlist**. If a file URL's hostname is in this list (e.g., `raw.githubusercontent.com`), the Reader treats it as Markdown text directly instead of trying OCR or conversion, which is more efficient. --- # MemScheduler (/open_source/modules/mem_scheduler) ## Key Features - 🚀 **Concurrent operation with MemOS system**: Runs in independent threads/processes without blocking main business logic. - 🧠 **Multi-memory coordination**: Intelligently manages the flow of working memory, long-term memory, and user-personalized memory. - ⚡ **Event-driven scheduling**: Asynchronous task distribution based on message queues (Redis/Local). - 🔍 **Efficient retrieval**: Integrated vector and graph retrieval for quick location of relevant memories. - 📊 **Comprehensive monitoring**: Real-time monitoring of memory utilization, task queue status, and scheduling latency. - 📝 **Detailed logging**: Full-chain tracing of memory operations for debugging and system analysis. ## MemScheduler Architecture `MemScheduler` adopts a three-layer modular architecture: ### Scheduling Layer (Core) 1. **Scheduler (Router)**: Intelligent message router that dispatches tasks to corresponding handlers based on message types (e.g., `QUERY`, `ANSWER`, `MEM_UPDATE`). 2. 
**Message Processing**: Event-driven business logic through messages with specific labels, defining message formats and processing rules.

### Execution Layer (Guarantee)

3. **Task Queue**: Supports both Redis Stream (production) and Local Queue (development/testing) modes, providing asynchronous task buffering and persistence.
4. **Memory Management**: Executes read/write, compression, forgetting, and type conversion operations on three-layer memory (Working/Long-term/User).
5. **Retrieval System**: Hybrid retrieval module combining user intent, scenario management, and keyword matching for quick memory location.

### Support Layer (Auxiliary)

6. **Monitoring**: Tracks task accumulation, processing latency, and memory health status.
7. **Logging**: Maintains full-chain memory operation logs for debugging and analysis.

## MemScheduler Initialization

In the MemOS architecture, `MemScheduler` is initialized as part of the server components during startup.

### Initialization in Server Router

In `src/memos/api/routers/server_router.py`, the scheduler is automatically loaded through the `init_server()` function:

```python
from memos.api import handlers
from memos.api.handlers.base_handler import HandlerDependencies
from memos.mem_scheduler.base_scheduler import BaseScheduler
from memos.mem_scheduler.utils.status_tracker import TaskStatusTracker
# ... other imports ...

# 1. Initialize all server components (including DB, LLM, Memory, Scheduler)
# init_server() reads environment variables and initializes global singleton components
components = handlers.init_server()

# Create dependency container for handlers
dependencies = HandlerDependencies.from_init_server(components)

# Initialize handlers...
# search_handler = SearchHandler(dependencies)
# ...

# 2. Get the scheduler instance from the components dictionary
# The scheduler is already initialized and started inside init_server (if enabled)
mem_scheduler: BaseScheduler = components["mem_scheduler"]

# 3. Users can also get other scheduling-related components from components (optional, for custom task handling)
# redis_client is used for direct Redis operations or monitoring task status
redis_client = components["redis_client"]
# ...
```

## Scheduling Tasks and Data Models

The scheduler distributes and executes tasks through a message-driven approach. This section introduces supported task types, message structures, and execution logs.

### Message Types and Handlers

The scheduler dispatches and executes tasks by registering specific task labels (Label) with handlers (Handler). The following are the default supported scheduling tasks in the current version (based on `GeneralScheduler` and `OptimizedScheduler`):

| Message Label | Constant | Handler Method | Description |
| :--- | :--- | :--- | :--- |
| `query` | `QUERY_TASK_LABEL` | `_query_message_consumer` | Processes user queries, triggers intent recognition, memory retrieval, and converts them to memory update tasks. |
| `answer` | `ANSWER_TASK_LABEL` | `_answer_message_consumer` | Processes AI responses and logs conversations. |
| `mem_update` | `MEM_UPDATE_TASK_LABEL` | `_memory_update_consumer` | Core task. Executes the long-term memory update process, including extracting Query Keywords, updating Monitor, retrieving relevant memories, and replacing Working Memory. |
| `add` | `ADD_TASK_LABEL` | `_add_message_consumer` | Handles logging of new memory additions (supports local and cloud logs). |
| `mem_read` | `MEM_READ_TASK_LABEL` | `_mem_read_message_consumer` | Deep processing and importing external memory content using `MemReader`. |
| `mem_organize` | `MEM_ORGANIZE_TASK_LABEL` | `_mem_reorganize_message_consumer` | Triggers memory reorganization and merge operations. |
| `pref_add` | `PREF_ADD_TASK_LABEL` | `_pref_add_message_consumer` | Handles extraction and addition of user preference memory (Preference Memory). |
| `mem_feedback` | `MEM_FEEDBACK_TASK_LABEL` | `_mem_feedback_message_consumer` | Processes user feedback for correcting or reinforcing preferences. |
| `api_mix_search` | `API_MIX_SEARCH_TASK_LABEL` | `_api_mix_search_message_consumer` | (OptimizedScheduler only) Executes asynchronous hybrid search tasks combining fast and fine retrieval. |

### Message Data Structure (ScheduleMessageItem)

The scheduler uses a unified `ScheduleMessageItem` structure to pass messages in the queue.

> **Note**: The `mem_cube` object itself is not directly included in the message model; instead, it is resolved by the scheduler at runtime through `mem_cube_id`.

| Field | Type | Description | Default/Remarks |
| :--- | :--- | :--- | :--- |
| `item_id` | `str` | Unique message identifier (UUID) | Auto-generated |
| `user_id` | `str` | Associated user ID | (Required) |
| `mem_cube_id` | `str` | Associated Memory Cube ID | (Required) |
| `label` | `str` | Task label (e.g., `query`, `mem_update`) | (Required) |
| `content` | `str` | Message payload (typically JSON string or text) | (Required) |
| `timestamp` | `datetime` | Message submission time | Auto-generated (UTC now) |
| `session_id` | `str` | Session ID for context isolation | `""` |
| `trace_id` | `str` | Trace ID for full-chain log association | Auto-generated |
| `user_name` | `str` | User display name | `""` |
| `task_id` | `str` | Business-level task ID (for associating multiple messages) | `None` |
| `info` | `dict` | Additional custom context information | `None` |
| `stream_key` | `str` | (Internal use) Redis Stream key name | `""` |

### Execution Log Structure (ScheduleLogForWebItem)

The scheduler generates structured log messages for frontend display or persistent storage.
| Field | Type | Description | Remarks |
| :--- | :--- | :--- | :--- |
| `item_id` | `str` | Unique log entry identifier | Auto-generated |
| `task_id` | `str` | Associated parent task ID | Optional |
| `user_id` | `str` | User ID | (Required) |
| `mem_cube_id` | `str` | Memory Cube ID | (Required) |
| `label` | `str` | Log category (e.g., `addMessage`, `addMemory`) | (Required) |
| `log_content` | `str` | Brief log description text | (Required) |
| `from_memory_type` | `str` | Source memory area | e.g., `UserInput`, `LongTermMemory` |
| `to_memory_type` | `str` | Destination memory area | e.g., `WorkingMemory` |
| `memcube_log_content` | `list[dict]` | Structured detailed content | Contains specific memory text, reference IDs, etc. |
| `metadata` | `list[dict]` | Memory item metadata | Contains confidence, status, tags, etc. |
| `status` | `str` | Task status | e.g., `completed`, `failed` |
| `timestamp` | `datetime` | Log creation time | Auto-generated |
| `current_memory_sizes` | `MemorySizes` | Current memory quantity snapshot for each area | For monitoring dashboard display |
| `memory_capacities` | `MemoryCapacities` | Memory capacity limits for each area | For monitoring dashboard display |

## Scheduling Function Examples

### 1. Message Processing and Custom Handlers

The scheduler's most powerful feature is support for registering custom message handlers. You can define specific message types (e.g., `MY_CUSTOM_TASK`) and write functions to handle them.

```python
import time
import uuid
from datetime import datetime

# 1. Import necessary type definitions and scheduler instance
# Note: mem_scheduler needs to be imported from server_router as it's a global singleton
from memos.api.routers.server_router import mem_scheduler
from memos.mem_scheduler.schemas.message_schemas import ScheduleMessageItem

# Define a custom task label
MY_TASK_LABEL = "MY_CUSTOM_TASK"


# Define a handler function
def my_task_handler(messages: list[ScheduleMessageItem]):
    """
    Function to handle custom tasks
    """
    for msg in messages:
        print(f"⚡️ [Handler] Received task: {msg.item_id}")
        print(f"📦 Content: {msg.content}")
        # Execute your business logic here, e.g., call LLM, write to database, trigger other tasks, etc.


# 2. Register the handler to the scheduler
# This step mounts your custom logic to the scheduling system
mem_scheduler.register_handlers({
    MY_TASK_LABEL: my_task_handler
})

# 3. Submit a task
task = ScheduleMessageItem(
    item_id=str(uuid.uuid4()),
    user_id="user_123",
    mem_cube_id="cube_001",
    label=MY_TASK_LABEL,
    content="This is a test message",
    timestamp=datetime.now()
)

# If the scheduler is not started, the task will be queued for processing
# or in local queue mode may require calling mem_scheduler.start() first
mem_scheduler.submit_messages([task])
print(f"Task submitted: {task.item_id}")

# Keep the main process alive so the background scheduler can process the task
time.sleep(10)
```

### 2. Redis Queue vs Local Queue

- **Local Queue**:
  - **Use case**: Unit tests, simple single-machine scripts.
  - **Characteristics**: Fast, but data is lost after process restart; does not support multi-process/multi-instance sharing.
  - **Configuration**: `MOS_SCHEDULER_USE_REDIS_QUEUE=false`
- **Redis Queue (Redis Stream)**:
  - **Use case**: Production environment, distributed deployment.
  - **Characteristics**: Data persistence, supports consumer groups allowing multiple scheduler instances to handle tasks together (load balancing).
  - **Configuration**: `MOS_SCHEDULER_USE_REDIS_QUEUE=true`
  - **Debugging**: Use the `show_redis_status.py` script to check queue accumulation.

## Comprehensive Application Scenarios

### Scenario 1: Basic Conversation Flow and Memory Update

The following is a complete example demonstrating how to initialize the environment, register custom logic, simulate conversation flow, and trigger memory updates.

```python
import asyncio
import json
import os
import sys
import time
from pathlib import Path

# --- Environment Setup ---
# 1. Add project root to sys.path to ensure memos module can be imported
FILE_PATH = Path(__file__).absolute()
BASE_DIR = FILE_PATH.parent.parent.parent
sys.path.insert(0, str(BASE_DIR))

# 2. Set necessary environment variables (simulating .env configuration)
os.environ["ENABLE_CHAT_API"] = "true"
os.environ["MOS_ENABLE_SCHEDULER"] = "true"
# Choose between Redis or Local queue
os.environ["MOS_SCHEDULER_USE_REDIS_QUEUE"] = "false"

# --- Import Components ---
# Note: Importing server_router triggers component initialization,
# ensure environment variables are set before this import
from memos.api.product_models import APIADDRequest, ChatPlaygroundRequest
from memos.api.routers.server_router import (
    add_handler,
    chat_stream_playground,
    mem_scheduler,  # mem_scheduler here is already an initialized singleton
)
from memos.log import get_logger
from memos.mem_scheduler.schemas.message_schemas import ScheduleMessageItem
from memos.mem_scheduler.schemas.task_schemas import (
    MEM_UPDATE_TASK_LABEL,
    QUERY_TASK_LABEL,
)

logger = get_logger(__name__)

# Global variable for demonstrating memory retrieval results
working_memories = []


# --- Custom Handlers ---
def custom_query_handler(messages: list[ScheduleMessageItem]):
    """
    Handle user query messages:
    1. Print query content
    2. Convert message to MEM_UPDATE task, triggering memory retrieval/update process
    """
    for msg in messages:
        print(f"\n[Scheduler 🟢] Received user query: {msg.content}")
        # Copy message and change label to MEM_UPDATE, a common "task chaining" pattern
        new_msg = msg.model_copy(update={"label": MEM_UPDATE_TASK_LABEL})
        # Submit new task back to scheduler
        mem_scheduler.submit_messages([new_msg])


def custom_mem_update_handler(messages: list[ScheduleMessageItem]):
    """
    Handle memory update tasks:
    1. Use retriever to find relevant memories
    2. Update global working memory list
    """
    global working_memories
    search_args = {}
    top_k = 2
    for msg in messages:
        print("[Scheduler 🔵] Retrieving memories for query...")
        # Call core retrieval functionality
        results = mem_scheduler.retriever.search(
            query=msg.content,
            user_id=msg.user_id,
            mem_cube_id=msg.mem_cube_id,
            mem_cube=mem_scheduler.current_mem_cube,
            top_k=top_k,
            method=mem_scheduler.search_method,
            search_args=search_args,
        )
        # Simulate working memory update
        working_memories.extend(results)
        working_memories = working_memories[-5:]  # Keep the latest 5
        for mem in results:
            # Print retrieved memory fragments
            print(f" ↳ [Memory Found]: {mem.memory[:50]}...")


# --- Mock Business Data ---
def get_mock_data():
    """Generate mock conversation data"""
    conversations = [
        {"role": "user", "content": "I just adopted a golden retriever puppy named Max."},
        {"role": "assistant", "content": "That's exciting! Max is a great name."},
        {"role": "user", "content": "He loves peanut butter treats but I am allergic to nuts."},
        {"role": "assistant", "content": "Noted. Peanut butter for Max, no nuts for you."},
    ]
    questions = [
        {"question": "What is my dog's name?", "category": "Pet"},
        {"question": "What am I allergic to?", "category": "Allergy"},
    ]
    return conversations, questions


# --- Main Flow ---
async def run_demo():
    print("==== MemScheduler Demo Start ====")
    conversations, questions = get_mock_data()
    user_id = "demo_user_001"
    mem_cube_id = "cube_demo_001"

    print(f"1. Initialize user memory library ({user_id})...")
    # Use API Handler to add initial memories (synchronous mode)
    add_req = APIADDRequest(
        user_id=user_id,
        writable_cube_ids=[mem_cube_id],
        messages=conversations,
        async_mode="sync",
    )
    add_handler.handle_add_memories(add_req)
    print(" Memory addition completed.")

    print("\n2. Start conversation testing (triggering background scheduling tasks)...")
    for item in questions:
        query = item["question"]
        print(f"\n>> User: {query}")

        # Initiate chat request
        chat_req = ChatPlaygroundRequest(
            user_id=user_id,
            query=query,
            readable_cube_ids=[mem_cube_id],
            writable_cube_ids=[mem_cube_id],
        )
        # Get streaming response
        response = chat_stream_playground(chat_req)

        # Handle streaming output (simplified)
        full_answer = ""
        buffer = ""
        async for chunk in response.body_iterator:
            if isinstance(chunk, bytes):
                chunk = chunk.decode("utf-8")
            buffer += chunk
            while "\n\n" in buffer:
                msg, buffer = buffer.split("\n\n", 1)
                for line in msg.split("\n"):
                    if line.startswith("data: "):
                        try:
                            data = json.loads(line[6:])
                            if data.get("type") == "text":
                                full_answer += data["data"]
                        except Exception:
                            # Skip malformed events without swallowing cancellation or Ctrl+C
                            pass
        print(f">> AI: {full_answer}")

        # Wait a moment for background scheduler to process tasks and print logs
        await asyncio.sleep(1)


if __name__ == "__main__":
    # 1. Register our custom handlers
    # This will override or add to the default scheduling logic
    mem_scheduler.register_handlers(
        {
            QUERY_TASK_LABEL: custom_query_handler,
            MEM_UPDATE_TASK_LABEL: custom_mem_update_handler,
        }
    )

    # 2. Ensure scheduler is started
    if not mem_scheduler._running:
        mem_scheduler.start()

    try:
        asyncio.run(run_demo())
    except KeyboardInterrupt:
        pass
    finally:
        # Give the background scheduler time to finish before stopping it
        time.sleep(10)
        print("\n==== Stopping scheduler ====")
        mem_scheduler.stop()
```

### Scenario 2: Concurrent Asynchronous Tasks and Checkpoint Restart (Redis)

This example demonstrates how to use Redis queues to achieve concurrent asynchronous task processing and checkpoint restart functionality. Running this example requires Redis environment configuration.

```python
from pathlib import Path
from time import sleep

from memos.api.routers.server_router import mem_scheduler
from memos.mem_scheduler.schemas.message_schemas import ScheduleMessageItem

# Debug: Print scheduler configuration
print("=== Scheduler Configuration Debug ===")
print(f"Scheduler type: {type(mem_scheduler).__name__}")
print(f"Config: {mem_scheduler.config}")
print(f"use_redis_queue: {mem_scheduler.use_redis_queue}")
print(f"Queue type: {type(mem_scheduler.memos_message_queue).__name__}")
print(f"Queue maxsize: {getattr(mem_scheduler.memos_message_queue, 'maxsize', 'N/A')}")
print("=====================================\n")

queue = mem_scheduler.memos_message_queue

# Defined before the handler so that tasks resumed right after registration can use them
TEST_HANDLER_LABEL = "test_handler"
tmp_dir = Path("./tmp")
tmp_dir.mkdir(exist_ok=True)


# Define handler function
def my_test_handler(messages: list[ScheduleMessageItem]):
    print(f"My test handler received {len(messages)} messages: {[one.item_id for one in messages]}")
    for msg in messages:
        # Create one file per task (item_id is the numeric ID 0..99)
        task_id = str(msg.item_id)
        file_path = tmp_dir / f"{task_id}.txt"
        try:
            sleep(5)
            file_path.write_text(f"Task {task_id} processed.\n")
            print(f"writing {file_path} done")
        except Exception as e:
            print(f"Failed to write {file_path}: {e}")


def submit_tasks():
    mem_scheduler.memos_message_queue.clear()

    # Create 100 messages (task_id 0..99)
    users = ["user_A", "user_B"]
    messages_to_send = [
        ScheduleMessageItem(
            item_id=str(i),
            user_id=users[i % 2],
            mem_cube_id="test_mem_cube",
            label=TEST_HANDLER_LABEL,
            content=f"Create file for task {i}",
        )
        for i in range(100)
    ]

    # Batch submit messages and print completion info
    print(f"Submitting {len(messages_to_send)} messages to the scheduler...")
    mem_scheduler.memos_message_queue.submit_messages(messages_to_send)
    print(f"Task submission done! tasks in queue: {mem_scheduler.get_tasks_status()}")


# Register handler function
mem_scheduler.register_handlers({TEST_HANDLER_LABEL: my_test_handler})

# 5 second restart: minimum idle time (ms) configured for this label
mem_scheduler.orchestrator.tasks_min_idle_ms[TEST_HANDLER_LABEL] = 5_000

# Test stop and restart: if tmp has >1 files, skip submission and print info
existing_count = len(list(tmp_dir.glob("*.txt")))
if existing_count > 1:
    print(f"Skip submission: found {existing_count} files in tmp (>1), continue processing")
else:
    submit_tasks()

# Wait until no tasks remain queued or running
poll_interval = 1
expected = 100
tasks_status = mem_scheduler.get_tasks_status()
mem_scheduler.print_tasks_status(tasks_status=tasks_status)
while (
    mem_scheduler.get_tasks_status()["remaining"] != 0
    or mem_scheduler.get_tasks_status()["running"] != 0
):
    count = len(list(tmp_dir.glob("*.txt")))
    tasks_status = mem_scheduler.get_tasks_status()
    mem_scheduler.print_tasks_status(tasks_status=tasks_status)
    print(f"[Monitor] Files in tmp: {count}/{expected}")
    sleep(poll_interval)
print(f"[Result] Final files in tmp: {len(list(tmp_dir.glob('*.txt')))}")

# Stop scheduler
sleep(20)
print("Stopping the scheduler...")
mem_scheduler.stop()
```

---

# MemChat (/open_source/modules/mem_chat)

## 1. Introduction

**MemChat** is the conversation control center of MemOS. It is not just a chat interface, but a bridge connecting "instant conversation" and "long-term memory".
During interactions with users, MemChat is responsible for real-time retrieval of relevant background information from MemCube (Memory Cube), building context, and crystallizing new conversation content into new memories. With it, your Agent no longer has a "goldfish memory"; it becomes an intelligent companion that can understand the past and continuously grow.

---

## 2. Core Capabilities

### Memory-Augmented Chat

Before answering user questions, MemChat automatically retrieves relevant Textual Memory from MemCube and injects it into the Prompt. This enables the Agent to answer questions based on past interaction history or knowledge bases, rather than relying solely on the LLM's pre-trained knowledge.

### Auto-Memorization

After each conversation, MemChat uses the Extractor LLM to automatically extract valuable information from the conversation flow (such as user preferences and factual knowledge) and store it in MemCube. The entire process is automated and requires no manual user intervention.

### Context Management

MemChat automatically manages the conversation history window (`max_turns_window`). When a conversation becomes too long, it trims old context and relies on retrieved long-term memory to keep the conversation coherent, which mitigates the LLM context window limit.

### Flexible Configuration

Supports configurable toggles for different types of memory (textual memory, activation memory, etc.) to adapt to different application scenarios.

---

## 3. Code Structure

Core logic is located under `memos/src/memos/mem_chat/`.

* **`simple.py`**: **Default implementation (SimpleMemChat)**. This is an out-of-the-box REPL (Read-Eval-Print Loop) implementation containing complete "retrieve -> generate -> store" loop logic.
* **`base.py`**: **Interface definition (BaseMemChat)**. Defines the basic behavior of MemChat, such as the `run()` method and the `mem_cube` property.
* **`factory.py`**: **Factory class**. Responsible for instantiating concrete MemChat objects based on configuration (`MemChatConfig`).

---

## 4. Key Interface

The main interaction entry point is the `MemChat` class (typically created by `MemChatFactory`).

### 4.1 Initialization

You need to first create a configuration object, then create an instance through the factory method. After creation, you must mount the `MemCube` instance to `mem_chat.mem_cube`.

### 4.2 `run()`

Starts an interactive command-line conversation loop. Suitable for development and debugging, it handles user input, calls memory retrieval, generates replies, and prints output.

### 4.3 Properties

* **`mem_cube`**: Associated MemCube object. MemChat reads and writes memories through it.
* **`chat_llm`**: LLM instance used to generate replies.

---

## 5. Workflow

A typical conversation round in MemChat includes the following steps:

1. **Receive Input**: Get user text input.
2. **Memory Recall**: (If `enable_textual_memory` is enabled) Use user input as Query to retrieve Top-K relevant memories from `mem_cube.text_mem`.
3. **Prompt Construction**: Concatenate system prompt, retrieved memories, and recent conversation history into a complete Prompt.
4. **Generate Response**: Call `chat_llm` to generate a reply.
5. **Memorization**: (If `enable_textual_memory` is enabled) Send this round's conversation (User + Assistant) to `mem_cube`'s extractor, which extracts new memories and stores them in the database.

---

## 6. Development Example

Below is a complete code example showing how to configure MemChat and mount a MemCube based on Qdrant and OpenAI.
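Before the full configuration, the conversation round from Section 5 can be sketched without any MemOS dependency. This is only an illustration under stated assumptions: `tokenize`, `recall`, `build_prompt`, and `chat_round` are hypothetical helpers, not part of the MemOS API. Word-overlap ranking stands in for the vector search in `mem_cube.text_mem`, a plain callable stands in for `chat_llm`, and appending the raw user input stands in for the extractor LLM. A `deque` with `maxlen` plays the role of `max_turns_window`.

```python
import re
from collections import deque
from typing import Callable


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its alphabetic words as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def recall(query: str, memories: list[str], top_k: int) -> list[str]:
    """Hypothetical stand-in retriever: rank memories by word overlap with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(m)), m) for m in memories]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [memory for _, memory in hits[:top_k]]


def build_prompt(system: str, recalled: list[str], history: deque, user_input: str) -> str:
    """Concatenate system prompt, recalled memories, recent turns, and the new input."""
    lines = [system, "Relevant memories:"]
    lines += [f"- {m}" for m in recalled]
    for user_text, assistant_text in history:
        lines += [f"User: {user_text}", f"Assistant: {assistant_text}"]
    lines.append(f"User: {user_input}")
    return "\n".join(lines)


def chat_round(
    user_input: str,
    memories: list[str],
    history: deque,
    llm: Callable[[str], str],
    top_k: int = 5,
) -> str:
    """One round: recall -> build prompt -> generate -> memorize."""
    recalled = recall(user_input, memories, top_k)  # step 2: memory recall
    prompt = build_prompt("You are a helpful assistant.", recalled, history, user_input)  # step 3
    reply = llm(prompt)  # step 4: generate response
    history.append((user_input, reply))  # deque(maxlen=...) drops the oldest turn
    memories.append(user_input)  # step 5: stand-in for LLM-based memory extraction
    return reply


if __name__ == "__main__":
    memories = ["The user's dog is named Max.", "The user is allergic to nuts."]
    history = deque(maxlen=2)  # plays the role of max_turns_window

    def fake_llm(prompt: str) -> str:
        # Echo the prompt size instead of calling a real model
        return f"(reply generated from a prompt of {len(prompt)} characters)"

    print(chat_round("What is my dog's name?", memories, history, fake_llm, top_k=1))
```

The bounded `deque` keeps trimming implicit: once the window is full, older turns disappear from the prompt and only recalled memories can bring their content back, which is the trade-off the Context Management capability describes.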
### 6.1 Code Implementation

```python
import os
import sys

# Ensure src module can be imported
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "../../../src")))

from memos.configs.mem_chat import MemChatConfigFactory
from memos.configs.mem_cube import GeneralMemCubeConfig
from memos.mem_chat.factory import MemChatFactory
from memos.mem_cube.general import GeneralMemCube


def get_mem_chat_config() -> MemChatConfigFactory:
    """Generate MemChat configuration"""
    return MemChatConfigFactory.model_validate(
        {
            "backend": "simple",
            "config": {
                "user_id": "user_123",
                "chat_llm": {
                    "backend": "openai",
                    "config": {
                        "model_name_or_path": os.getenv("MOS_CHAT_MODEL", "gpt-4o"),
                        "temperature": 0.8,
                        "max_tokens": 1024,
                        "api_key": os.getenv("OPENAI_API_KEY"),
                        "api_base": os.getenv("OPENAI_API_BASE"),
                    },
                },
                "max_turns_window": 20,
                "top_k": 5,
                "enable_textual_memory": True,  # Enable explicit memory
            },
        }
    )


def get_mem_cube_config() -> GeneralMemCubeConfig:
    """Generate MemCube configuration"""
    return GeneralMemCubeConfig.model_validate(
        {
            "user_id": "user03alice",
            "cube_id": "user03alice/mem_cube_tree",
            "text_mem": {
                "backend": "general_text",
                "config": {
                    "cube_id": "user03alice/mem_cube_general",
                    "extractor_llm": {
                        "backend": "openai",
                        "config": {
                            "model_name_or_path": os.getenv("MOS_CHAT_MODEL", "gpt-4o"),
                            "api_key": os.getenv("OPENAI_API_KEY"),
                            "api_base": os.getenv("OPENAI_API_BASE"),
                        },
                    },
                    "vector_db": {
                        "backend": "qdrant",
                        "config": {
                            "collection_name": "user03alice_mem_cube_general",
                            "vector_dimension": 1024,
                        },
                    },
                    "embedder": {
                        "backend": os.getenv("MOS_EMBEDDER_BACKEND", "universal_api"),
                        "config": {
                            "provider": "openai",
                            "api_key": os.getenv("MOS_EMBEDDER_API_KEY", "EMPTY"),
                            "model_name_or_path": os.getenv("MOS_EMBEDDER_MODEL", "bge-m3"),
                            "base_url": os.getenv("MOS_EMBEDDER_API_BASE"),
                        },
                    },
                },
            },
        }
    )


def main():
    print("Initializing MemChat...")
    mem_chat = MemChatFactory.from_config(get_mem_chat_config())

    print("Initializing MemCube...")
    mem_cube = GeneralMemCube(get_mem_cube_config())

    # Critical step: mount the memory cube
    mem_chat.mem_cube = mem_cube

    print("Starting Chat Session...")
    try:
        mem_chat.run()
    finally:
        print("Saving memory cube...")
        mem_chat.mem_cube.dump("new_cube_path")


if __name__ == "__main__":
    main()
```

---

## 7. Configuration Description

When configuring `MemChatConfigFactory`, the following parameters are crucial:

* **`user_id`**: Required. Used to identify the current user in the conversation, ensuring memory isolation.
* **`chat_llm`**: Chat model configuration. A capable model (such as GPT-4o) is recommended for better reply quality and instruction-following ability.
* **`enable_textual_memory`**: `True` / `False`. Whether to enable textual memory. If enabled, the system performs retrieval before each reply and storage after it.
* **`max_turns_window`**: Integer. Number of conversation turns to retain in history. History beyond this limit is truncated, and long-term memory is relied on to supplement context.
* **`top_k`**: Integer. Number of most relevant memory fragments to retrieve from the memory library and inject into the Prompt each time.

---

# MemFeedback (/open_source/modules/mem_feedback)

## 1. Introduction

**MemFeedback** is the "regret medicine" for MemOS.

In long-term memory systems, the biggest headache is often not "forgetting," but "remembering something wrong and being unable to change it." When a user says, "No, my birthday is tomorrow" or "Change the project code to X," simple RAG systems are usually helpless.

MemFeedback can understand these natural language instructions, automatically locate conflicting memories in the database, and execute atomic correction operations (such as archiving old memories and writing new ones). With it, your Agent can correct errors and learn continuously during interactions, just like a human.

---

## 2. Core Capabilities

It can handle four common feedback scenarios:

### Correction

When the user points out a factual error.
The system does not simply delete the old data; it **archives** it and writes the new data. This corrects the error while preserving version history (Traceability). If the memory belongs to an ongoing conversation (WorkingMemory), it is updated in place to ensure context continuity.

### Addition

If the user only supplements new information that does not conflict with old memories, the system saves it directly as a new node in the memory database.

### Keyword Replacement (Global Refactor)

Similar to "Global Refactor" in an IDE. For example, if the user says, "Change 'Zhang San' to 'Li Si' in all documents," the system uses the Reranker to determine the scope of affected documents and updates all relevant memories in batches.

### Preference Evolution

Specifically handles preferences like "I don't eat cilantro" or "I like Python." The system records the context in which the preference arose, continuously enriching the user profile so that the Agent becomes better tailored to the user.

---

## 3. Code Structure

The core logic is located under `memos/src/memos/mem_feedback/`.

* **`simple_feedback.py`**: **Recommended entry point**. It is the official encapsulated version that assembles LLM, vector database, and searcher, ready to use out of the box.
* **`feedback.py`**: Core implementation class `MemFeedback`. The heavy lifting is done here: intent recognition, conflict comparison, and security risk control.
* **`base.py`**: Interface definition.
* **`utils.py`**: Utility functions.

---

## 4. Key Interface

There is only one main entry point: `process_feedback()`. It is usually called asynchronously after the RAG process ends and the user gives feedback.

### 4.1 Input Parameters

| Parameter | Description |
| :--- | :--- |
| `user_id` / `user_name` | User identification and Cube ID. |
| `chat_history` | Conversation history, so the LLM knows what was discussed. |
| `feedback_content` | The feedback sentence from the user (e.g., "No, it's 5 o'clock"). |
| **`retrieved_memory_ids`** | **Strongly recommended**. Pass in the memory IDs retrieved in the previous RAG round. This gives the system a "target," telling it which memory to correct. If not passed, the system has to search the entire memory store again, which is slow and prone to errors. |
| `corrected_answer` | Whether to also generate a corrected response. |

### 4.2 Output Result

Returns a dictionary describing what changed in this operation:

* **`record`**: Database change details (e.g., `{ "add": [...], "update": [...] }`).
* **`answer`**: Natural language response to the user.

---

## 5. Workflow

The workflow of MemFeedback is like a rigorous editorial office:

1. **Review (Intent Recognition)**: First, determine whether the user is correcting errors, adding information, or renaming.
2. **Locate (Recall)**: Find the memory to be modified (if you passed the ID, this step is skipped).
3. **Proofread (Comparison)**: Let the LLM carefully compare new and old information to determine whether it is completely new (ADD) or needs an update (UPDATE).
4. **Risk Control (Security Check)**: Prevent the LLM from making random changes. For example, is the ID correct? Is it trying to delete an entire long document? (Threshold interception applies.)
5. **Publish (Write)**: Finally, execute graph database operations, archive the old, and write the new.

---

## 6. Development Example

Here is an abridged code snippet showing how to initialize the service, preset an "incorrect memory," and then correct it through user feedback.

### 6.1 Preparation

First, we need to initialize the `SimpleMemFeedback` service.
```python
# Assuming components like llm, embedder, graph_db are initialized via Factory
# For complete initialization code, please refer to examples/mem_feedback/example_feedback.py
from memos.mem_feedback.simple_feedback import SimpleMemFeedback

feedback_server = SimpleMemFeedback(
    llm=llm,
    embedder=embedder,
    graph_store=graph_db,
    memory_manager=memory_manager,
    mem_reader=mem_reader,
    searcher=searcher,
    reranker=mem_reranker,
    pref_mem=None,
)
```

### 6.2 Simulate Scenario and Execute Feedback

Scenario: The system incorrectly remembers "You like apples, dislike bananas," and now we want to correct it.

```python
import json

from memos.mem_feedback.utils import make_mem_item

# 1. Simulate Chat History
# User asks for preference, assistant answers wrongly
history = [
    {"role": "user", "content": "What fruits do I like and dislike?"},
    {"role": "assistant", "content": "You like apples, dislike bananas."},
]

# 2. Preset "Incorrect Memory"
# We manually insert an incorrect fact into the database
mem_text = "You like apples, dislike bananas"
# ... (Omitted detailed parameters of make_mem_item, see source code) ...
memory_manager.add([make_mem_item(mem_text, ...)], ...)

# 3. User Feedback
feedback_content = "Wrong, actually I like mangosteens."
print(f"Feedback Input: {feedback_content}")

# 4. Execute Correction
# MemFeedback will detect conflict, archive old memory, and write new memory "like mangosteens"
res = feedback_server.process_feedback(
    ...,
    chat_history=history,
    feedback_content=feedback_content,
    ...
)

# 5. View Result
print(json.dumps(res, indent=4))
```

---

## 7. Configuration Description

To make MemFeedback work, you need to prepare the configuration of the following components (usually in `.env` or YAML):

* **LLM (`extractor_llm`)**: Needs strong reasoning ability; GPT-4o level models are recommended. Set Temperature low (e.g., 0) because it performs logical analysis and should not be too divergent.
* **Embedder (`embedder`)**: Used to convert new memories into vectors.
* **GraphDB (`graph_db`)**: Stores the memories. Together, the embedder and the graph database determine where and how memories are stored.
* **MemReader (`mem_reader`)**: Used to parse entirely new memories.

---

# Memory Modules Overview (/open_source/modules/memories/overview)

The Memory Module provides Agents with essential long-term memory capabilities. Instead of acting as a static database, it mimics human cognitive processes by automatically extracting, organizing, and linking information. Choosing different memory modules allows you to customize and enhance your Agent's skills.

## 🎯 Quick Selection Guide

> **Alert**: **Not sure which to choose?** Follow this decision tree:
> - 🚀 **Quick testing/demo**: Get started easily with no additional software → [NaiveTextMemory](#naivetextmemory-simple-textual-memory)
> - 📝 **General text memory**: Retain chat history or massive documents with semantic search capabilities → [GeneralTextMemory](#generaltextmemory-general-purpose-textual-memory)
> - 👤 **User preference management**: Specifically designed for building and managing user profiles → [PreferenceTextMemory](#preferencetextmemory-preference-memory)
> - 🌳 **Structured knowledge graph**: Ideal for data with complex logical relationships and interconnections → [TreeTextMemory](#treetextmemory-hierarchical-structured-memory)
> - ⚡ **Inference acceleration**: Optimized for high-traffic scenarios to ensure stable and rapid responses → [KVCacheMemory](#kvcachememory-activation-memory)

---

## 📚 Memory Module Categories

### I. Textual Memory Series

Focused on storing and retrieving text-based memories, suitable for most application scenarios.
#### NaiveTextMemory: Simple Textual Memory

**Use Cases:** Rapid prototyping, demos, teaching, small-scale applications

**Core Features:**
- ✅ Zero dependencies, pure in-memory storage
- ✅ Keyword-based retrieval
- ✅ Minimal API, get started in 5 minutes
- ✅ File persistence support

**Limitations:**
- ❌ No vector semantic search
- ❌ Not suitable for large-scale data
- ❌ Limited retrieval precision

📖 [View Documentation](./naive_textual_memory)

#### GeneralTextMemory: General-Purpose Textual Memory

**Use Cases:** Conversational agents, personal assistants, knowledge management systems

**Core Features:**
- ✅ Vector-based semantic search
- ✅ Rich metadata support (type, time, source, etc.)
- ✅ Flexible filtering and querying
- ✅ Suitable for medium to large-scale applications

**Technical Requirements:**
- Requires vector database (Qdrant, etc.)
- Requires embedding model

📖 [View Documentation](./general_textual_memory)

#### PreferenceTextMemory: Preference Memory

**Use Cases:** Personalized recommendations, user profiling, intelligent assistants

**Core Features:**
- ✅ Automatic detection of explicit and implicit preferences
- ✅ Preference deduplication and conflict detection
- ✅ Filter by preference type and strength
- ✅ Vector semantic retrieval

**Specialized Functions:**
- Dual preference extraction (explicit/implicit)
- Preference strength scoring
- Temporal decay support

📖 [View Documentation](./preference_textual_memory)

#### TreeTextMemory: Hierarchical Structured Memory

**Use Cases:** Knowledge graphs, complex relationship reasoning, multi-hop queries

**Core Features:**
- ✅ Graph database-based structured storage
- ✅ Support for hierarchical relationships and causal chains
- ✅ Multi-hop reasoning capabilities
- ✅ Deduplication, conflict detection, memory scheduling

**Advanced Features:**
- Supports MultiModal Reader (images, URLs, files)
- Supports Internet Retrieval (BochaAI, Google, Bing)
- Working memory replacement mechanism

**Technical Requirements:**
- Requires graph database (Neo4j, etc.)
- Requires vector database and embedding model

📖 [View Documentation](./tree_textual_memory)

---

### II. Specialized Memory Modules

Memory systems optimized for specific scenarios.

#### KVCacheMemory: Activation Memory

**Use Cases:** LLM inference acceleration, high-frequency background knowledge reuse

**Core Features:**
- ⚡ Pre-computed KV Cache, skip repeated encoding
- ⚡ Significantly reduce prefill phase computation
- ⚡ Suitable for high-throughput scenarios

**Typical Use Cases:**
- FAQ caching
- Conversation history reuse
- Domain knowledge preloading

**How It Works:**

Stable text memory → Pre-convert to KV Cache → Direct injection during inference

📖 [View Documentation](./kv_cache_memory)

#### ParametricMemory: Parametric Memory

**Status:** 🚧 Under Development

**Design Goals:**
- Encode knowledge into model weights (LoRA, expert modules)
- Dynamically load/unload capability modules
- Support multi-task, multi-role architecture

**Future Features:**
- Parameter module generation and compression
- Version control and rollback
- Hot-swappable capability modules

📖 [View Documentation](./parametric_memory)

---

### III. Graph Database Backends

Provide graph storage capabilities for TreeTextMemory.
#### Neo4j Graph DB

**Recommendation:** ⭐⭐⭐⭐⭐

**Features:**
- Complete graph database functionality
- Support for vector-enhanced retrieval
- Multi-tenant architecture (v0.2.1+)
- Compatible with Community Edition

📖 [View Documentation](./neo4j_graph_db)

#### Nebula Graph DB

**Features:**
- Distributed graph database
- High availability
- Suitable for large-scale deployment

📖 [View Documentation](./nebula_graph_db)

#### PolarDB Graph DB

**Features:**
- Alibaba Cloud PolarDB graph computing
- Cloud-native architecture
- Enterprise-grade reliability

📖 [View Documentation](./polardb_graph_db)

---

## 📊 Feature Comparison Table

| Feature | Naive | General | Preference | Tree | KVCache |
|---------|-------|---------|------------|------|---------|
| **Search Method** | Keyword | Vector Semantic | Vector Semantic | Vector+Graph | N/A |
| **Metadata Support** | ⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | - |
| **Relationship Reasoning** | ❌ | ❌ | ❌ | ✅ | - |
| **Deduplication** | ❌ | ⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | - |
| **Scalability** | Small | Medium-Large | Medium-Large | Large | - |
| **Deployment Complexity** | Minimal | Medium | Medium | Higher | Medium |
| **Inference Acceleration** | - | - | - | - | ⭐⭐⭐⭐⭐ |

---

## 🛠️ Usage Scenario Recommendations

### Scenario 1: Rapid Prototyping

**Recommended:** [NaiveTextMemory](./naive_textual_memory)

```python
from memos.configs.memory import MemoryConfigFactory
from memos.memories.factory import MemoryFactory

# extractor_llm is a required setting; replace the placeholder key with your own
llm = {"backend": "openai", "config": {"model_name_or_path": "gpt-4o-mini", "api_key": "your-api-key"}}
memory = MemoryFactory.from_config(MemoryConfigFactory(backend="naive_text", config={"extractor_llm": llm}))
memory.add({"memory": "User likes coffee"})
results = memory.search("coffee", top_k=1)
```

### Scenario 2: Chatbot Memory

**Recommended:** [GeneralTextMemory](./general_textual_memory)

- Supports semantic search
- Filter by time, type, source
- Suitable for conversation history management

### Scenario 3: Personalized Recommendation System

**Recommended:** [PreferenceTextMemory](./preference_textual_memory)

- Automatic user preference extraction
- Preference conflict detection
- Strength scoring and filtering

### Scenario 4: Knowledge Graph Applications

**Recommended:** [TreeTextMemory](./tree_textual_memory)

- Multi-hop relationship queries
- Hierarchical structure management
- Complex reasoning scenarios

### Scenario 5: High-Performance LLM Services

**Recommended:** [KVCacheMemory](./kv_cache_memory)

- FAQ systems
- Customer service bots
- High-volume request processing

---

## 🔗 Advanced Features

### MultiModal Reader (Multimodal Reading)

Supported in TreeTextMemory for processing:

- 📷 Images in conversations
- 🌐 Web URLs
- 📄 Local files (PDF, DOCX, TXT, Markdown)
- 🔀 Mixed mode (text+images+URLs)

👉 [View Examples](./tree_textual_memory#using-multimodalstructmemreader-advanced)

### Internet Retrieval

Fetch real-time information from the web and add it to memory:

- 🔍 BochaAI search
- 🌍 Google search
- 🔎 Bing search

👉 [View Examples](./tree_textual_memory#retrieve-memories-from-the-internet-optional)

---

## 🚀 Quick Start

1. **Choose Memory Module** - Select the appropriate module based on the guide above
2. **Read Documentation** - Click the corresponding link to view detailed documentation
3. **Hands-On Practice** - Each module has complete code examples
4. **Production Deployment** - Refer to the best practices section

---

## 📖 Related Resources

- [API Reference](/api)
- [Best Practices Guide](/best-practices)
- [Example Code Repository](https://github.com/MemOS/examples)
- [FAQ](/faq)

---

> **Alert**: **Beginner Suggestion:** Start with NaiveTextMemory, understand the basic concepts, then explore GeneralTextMemory and TreeTextMemory.

---

# KVCacheMemory: Key-Value Cache for Activation Memory (/open_source/modules/memories/kv_cache_memory)

## KV-cache Memory Use Cases

In MemOS, KV-cache memory is best suited for storing **semantically stable and frequently reused background content** such as:

- Frequently asked questions (FAQs) or domain-specific knowledge
- Prior conversation history

These stable **plaintext memory items** are automatically identified and managed by the `MemScheduler` module. Once selected, they are converted into KV-format representations (`KVCacheItem`) ahead of time. This precomputation step stores the activation states (Key/Value tensors) of the memory in a reusable format, allowing them to be injected into the model's attention cache during inference.

Once converted, these KV memories can be **reused across queries without requiring re-encoding** of the original content. This removes the computational overhead of repeatedly processing large amounts of unchanged text, making it well suited to applications that require **rapid response times** and **high throughput**.

## Why KV-cache Memory

Integrating `MemScheduler` with KV-cache memory enables significant performance optimization, particularly in the **prefill phase** of LLM inference.

### Without KVCacheMemory

- Each new query is appended to the full prompt, including the background memory.
- The model must **recompute token embeddings and attention** over the full sequence, even for unchanged memory.

### With KVCacheMemory

- The background content is **cached once** as Key/Value tensors.
- For each query, only the new user input (query tokens) is encoded.
- The previously cached KV is injected directly into the attention mechanism.

### Benefits

This separation reduces redundant computation in the prefill phase and leads to:

- Skipping repeated encoding of background content
- Faster attention computation between query tokens and cached memory
- **Lower Time To First Token (TTFT)** latency during generation

This optimization is especially valuable in:

- Multi-turn chatbot interactions
- Retrieval-augmented or context-augmented generation (RAG, CAG)
- Assistants operating over fixed documentation or FAQ-style memory

### KVCacheMemory Acceleration Evaluation

To validate the performance impact of KV-based memory injection, we conducted a set of controlled experiments simulating real memory reuse in MemOS.
#### Experiment Setup

During typical usage, the `MemScheduler` module continuously tracks interaction patterns and promotes high-frequency, stable plaintext memory into KV format. These KV memories are loaded into GPU memory as activation caches and reused during inference.

The evaluation compares two memory injection strategies:

1. **Prompt-based injection**: background memory is prepended as raw text.
2. **KV-cache injection**: memory is injected directly into the model's attention cache.

We test these strategies across:

- **Three context sizes**: short, medium, and long
- **Three query types**: short-form, medium-form, and long-form

The primary metric is **Time To First Token (TTFT)**, a key latency indicator for responsive generation.

#### Results

The following table shows results across three models (Qwen3-8B, Qwen3-32B, Qwen2.5-72B). TTFT under KV-cache injection is consistently lower than prompt-based injection, while the output tokens remain identical across both strategies.

> **Note**: `Build (s)` refers to the one-time preprocessing cost of converting the memory to KV format, amortized across multiple queries.

| Model | Context | Context tokens | Query | Query tokens | Build (s) | KV TTFT (s) | Direct TTFT (s) | Speedup (%) |
| ----------- | ------ | ------ | ------ | ------ | --------- | ----------- | ------------ | ----------- |
| Qwen3-8B | long | 6064 | long | 952.7 | 0.92 | 0.50 | 2.37 | 79.1 |
| | | | medium | 302.7 | 0.93 | 0.19 | 2.16 | 91.1 |
| | | | short | 167 | 0.93 | 0.12 | 2.04 | 94.2 |
| | medium | 2773 | long | 952.7 | 0.41 | 0.43 | 1.22 | 64.6 |
| | | | medium | 302.7 | 0.41 | 0.16 | 1.08 | 85.1 |
| | | | short | 167 | 0.43 | 0.10 | 0.95 | 89.7 |
| | short | 583 | long | 952.7 | 0.12 | 0.39 | 0.51 | 23.0 |
| | | | medium | 302.7 | 0.12 | 0.14 | 0.32 | 55.6 |
| | | | short | 167 | 0.12 | 0.08 | 0.29 | 71.3 |
| Qwen3-32B | long | 6064 | long | 952.7 | 0.71 | 0.31 | 1.09 | 71.4 |
| | | | medium | 302.7 | 0.71 | 0.15 | 0.98 | 84.3 |
| | | | short | 167 | 0.71 | 0.11 | 0.96 | 88.8 |
| | medium | 2773 | long | 952.7 | 0.31 | 0.24 | 0.56 | 56.9 |
| | | | medium | 302.7 | 0.31 | 0.12 | 0.47 | 75.1 |
| | | | short | 167 | 0.31 | 0.08 | 0.44 | 81.2 |
| | short | 583 | long | 952.7 | 0.09 | 0.20 | 0.24 | 18.6 |
| | | | medium | 302.7 | 0.09 | 0.09 | 0.15 | 39.6 |
| | | | short | 167 | 0.09 | 0.07 | 0.14 | 53.5 |
| Qwen2.5-72B | long | 6064 | long | 952.7 | 1.26 | 0.48 | 2.04 | 76.4 |
| | | | medium | 302.7 | 1.26 | 0.23 | 1.82 | 87.2 |
| | | | short | 167 | 1.27 | 0.15 | 1.79 | 91.4 |
| | medium | 2773 | long | 952.7 | 0.58 | 0.39 | 1.05 | 62.7 |
| | | | medium | 302.7 | 0.58 | 0.18 | 0.89 | 79.2 |
| | | | short | 167 | 0.71 | 0.23 | 0.82 | 71.6 |
| | short | 583 | long | 952.7 | 0.16 | 0.33 | 0.43 | 23.8 |
| | | | medium | 302.7 | 0.16 | 0.15 | 0.27 | 43.2 |
| | | | short | 167 | 0.16 | 0.10 | 0.25 | 60.5 |

#### vLLM-based Performance

MemOS now supports using vLLM to manage activation memory.
To evaluate the impact of KV Cache prefilling for different prefix text lengths, we conducted performance tests on a system equipped with 8x `H800 80GB` GPUs (112 vCPUs, 1920 GiB memory) and a system equipped with 8x `RTX4090-24G-PCIe` GPUs (112 vCPUs, 960 GiB memory). The evaluation covered two core models: Qwen3-32B and Qwen2.5-72B.

The benchmarks were run across a range of memory and context length combinations to simulate various activation memory scenarios:

- **Memory Text Lengths (tokens)**: 500, 1000, 2000
- **Context Text Lengths (tokens)**: 500, 1000, 2000, 4000

The following tables summarize the benchmark results.

**Qwen2.5-72B**

- On 4090 (2 nodes, 16 GPUs)

| Memory tokens | Prompt tokens | TTFT without cache (ms) | TTFT with cache (ms) | TTFT speedup (%) | Absolute difference (ms) |
| --- | --- | --- | --- | --- | --- |
| 0.5k | 0.5k | 1787.21 | 851.47 | 52.358% | 935.74 |
| 0.5k | 1k | 2506.26 | 1290.68 | 48.502% | 1215.58 |
| 0.5k | 2k | 3843.48 | 2897.97 | 24.600% | 945.51 |
| 0.5k | 4k | 6078.01 | 5200.86 | 14.432% | 877.15 |
| 1k | 0.5k | 2274.61 | 920.16 | 59.546% | 1354.45 |
| 1k | 1k | 2907.17 | 1407.65 | 51.580% | 1499.52 |
| 1k | 2k | 4278.53 | 2916.47 | 31.835% | 1362.06 |
| 1k | 4k | 6897.99 | 5218.94 | 24.341% | 1679.05 |
| 2k | 0.5k | 3460.12 | 782.73 | 77.379% | 2677.39 |
| 2k | 1k | 4443.34 | 1491.24 | 66.439% | 2952.10 |
| 2k | 2k | 5733.14 | 2758.48 | 51.885% | 2974.66 |
| 2k | 4k | 8152.76 | 5627.41 | 30.975% | 2525.35 |

- On H800 (4 GPUs)

| Memory tokens | Prompt tokens | TTFT without cache (ms) | TTFT with cache (ms) | TTFT speedup (%) | Absolute difference (ms) |
| --- | --- | --- | --- | --- | --- |
| 0.5k | 0.5k | 51.65 | 52.17 | -1.007% | -0.52 |
| 0.5k | 1k | 55.70 | 57.03 | -2.388% | -1.33 |
| 0.5k | 2k | 74.23 | 78.56 | -5.833% | -4.33 |
| 0.5k | 4k | 77.56 | 77.45 | 0.142% | 0.11 |
| 1k | 0.5k | 55.90 | 55.73 | 0.304% | 0.17 |
| 1k | 1k | 55.35 | 52.89 | 4.444% | 2.46 |
| 1k | 2k | 80.14 | 73.82 | 7.886% | 6.32 |
| 1k | 4k | 82.83 | 73.51 | 11.252% | 9.32 |
| 2k | 0.5k | 75.82 | 71.31 | 5.948% | 4.51 |
| 2k | 1k | 80.60 | 78.71 | 2.345% | 1.89 |
| 2k | 2k | 83.91 | 78.60 | 6.328% | 5.31 |
| 2k | 4k | 99.15 | 80.12 | 19.193% | 19.03 |

**Qwen3-32B**

- On 4090 (1 node, 8 GPUs)

| Memory tokens | Prompt tokens | TTFT without cache (ms) | TTFT with cache (ms) | TTFT speedup (%) | Absolute difference (ms) |
| --- | --- | --- | --- | --- | --- |
| 0.5k | 0.5k | 288.72 | 139.29 | 51.756% | 149.43 |
| 0.5k | 1k | 428.72 | 245.85 | 42.655% | 182.87 |
| 0.5k | 2k | 683.65 | 538.59 | 21.218% | 145.06 |
| 0.5k | 4k | 1170.48 | 986.94 | 15.681% | 183.54 |
| 1k | 0.5k | 409.83 | 137.96 | 66.337% | 271.87 |
| 1k | 1k | 507.95 | 262.21 | 48.379% | 245.74 |
| 1k | 2k | 743.48 | 539.71 | 27.408% | 203.77 |
| 1k | 4k | 1325.34 | 1038.59 | 21.636% | 286.75 |
| 2k | 0.5k | 686.01 | 147.34 | 78.522% | 538.67 |
| 2k | 1k | 762.96 | 246.22 | 67.728% | 516.74 |
| 2k | 2k | 1083.93 | 498.05 | 54.051% | 585.88 |
| 2k | 4k | 1435.39 | 1053.31 | 26.619% | 382.08 |

- On H800 (2 GPUs)

| Memory tokens | Prompt tokens | TTFT without cache (ms) | TTFT with cache (ms) | TTFT speedup (%) | Absolute difference (ms) |
| --- | --- | --- | --- | --- | --- |
| 0.5k | 0.5k | 161.18 | 97.61 | 39.440% | 63.57 |
| 0.5k | 1k | 164.00 | 121.39 | 25.982% | 42.61 |
| 0.5k | 2k | 257.34 | 215.20 | 16.375% | 42.14 |
| 0.5k | 4k | 365.14 | 317.95 | 12.924% | 47.19 |
| 1k | 0.5k | 169.45 | 100.52 | 40.679% | 68.93 |
| 1k | 1k | 180.91 | 128.25 | 29.108% | 52.66 |
| 1k | 2k | 271.69 | 210.00 | 22.706% | 61.69 |
| 1k | 4k | 389.30 | 314.64 | 19.178% | 74.66 |
| 2k | 0.5k | 251.43 | 130.92 | 47.930% | 120.51 |
| 2k | 1k | 275.81 | 159.60 | 42.134% | 116.21 |
| 2k | 2k | 331.11 | 218.17 | 34.110% | 112.94 |
| 2k | 4k | 451.06 | 334.80 | 25.775% | 116.26 |

The results show that vLLM's KV Cache reuse substantially reduces TTFT on the 4090 systems (roughly 14% to 79%) and for Qwen3-32B on H800 (roughly 13% to 48%). For Qwen2.5-72B on H800, where TTFT without caching is already below 100 ms, the gain is small and turns slightly negative at the shortest memory length.
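The speedup and absolute-difference columns in the vLLM tables follow directly from the two TTFT measurements in each row. As a minimal sketch (the helper name `ttft_speedup` is ours and not part of MemOS), the calculation looks like this:

```python
def ttft_speedup(ttft_without_ms: float, ttft_with_ms: float) -> tuple[float, float]:
    """Return (speedup in percent, absolute difference in ms) for one table row."""
    diff = ttft_without_ms - ttft_with_ms
    return diff / ttft_without_ms * 100, diff


# First Qwen2.5-72B row on the 4090 cluster (0.5k memory, 0.5k prompt)
speedup, diff = ttft_speedup(1787.21, 851.47)
print(f"{speedup:.3f}% speedup, {diff:.2f} ms saved")

# First Qwen2.5-72B row on H800: caching is slightly slower, so both values are negative
slow_speedup, slow_diff = ttft_speedup(51.65, 52.17)
print(f"{slow_speedup:.3f}% speedup, {slow_diff:.2f} ms saved")
```

A negative result means the cached run was slower than the uncached one, which is why absolute savings in milliseconds are worth reading alongside the percentage.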
## KV-cache Memory Structure

KV-based memory reuse via `KVCacheMemory` offers substantial latency reduction across model sizes and query types, while maintaining identical output. By shifting reusable memory from plaintext prompts into precomputed KV caches, MemOS eliminates redundant context encoding and achieves faster response times, which is especially beneficial in real-time, memory-augmented LLM applications.

Each cache is stored as a `KVCacheItem`:

| Field | Type | Description |
| ------------- | -------------- | ------------------------------------------- |
| `kv_cache_id` | `str` | Unique ID for the cache (UUID) |
| `kv_cache` | `DynamicCache` | The actual key-value cache (transformers) |
| `metadata` | `dict` | Metadata (source, extraction time, etc.) |

## API Summary (`KVCacheMemory`)

### Initialization

```python
KVCacheMemory(config: KVCacheMemoryConfig)
```

### Core Methods

| Method | Description |
| ------------------------ | -------------------------------------------------------- |
| `extract(text)` | Extracts a KV cache from input text using the LLM |
| `add(memories)` | Adds one or more `KVCacheItem` to memory |
| `get(memory_id)` | Fetch a single cache by ID |
| `get_by_ids(ids)` | Fetch multiple caches by IDs |
| `get_all()` | Returns all stored caches |
| `get_cache(cache_ids)` | Merge and return a combined cache from multiple IDs |
| `delete(ids)` | Delete caches by IDs |
| `delete_all()` | Delete all caches |
| `dump(dir)` | Serialize all caches to a pickle file in directory |
| `load(dir)` | Load caches from a pickle file in directory |
| `from_textual_memory(mem)` | Convert a `TextualMemoryItem` to a `KVCacheItem` |
| `build_vllm_kv_cache(messages)` | Build a vLLM KV cache from a list of messages |

When calling `dump(dir)`, the system writes to:

```
/
```

This file contains a pickled dictionary of all KV caches, which can be reloaded using `load(dir)`.
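To make the "pickled dictionary" idea concrete, here is a minimal stand-alone sketch of the round trip. It is not the MemOS implementation: the helper names, the `caches.pkl` file name, and the string stand-ins used in place of real `DynamicCache` objects are all assumptions made for illustration.

```python
import os
import pickle
import tempfile


def dump_caches(caches: dict, directory: str, filename: str = "caches.pkl") -> str:
    """Pickle the whole id -> cache dictionary into a single file (hypothetical helper)."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    with open(path, "wb") as f:
        pickle.dump(caches, f)
    return path


def load_caches(directory: str, filename: str = "caches.pkl") -> dict:
    """Read the dictionary back; only unpickle files you trust."""
    with open(os.path.join(directory, filename), "rb") as f:
        return pickle.load(f)


# Strings stand in for real cache objects so the sketch runs without a model
caches = {"cache-1": "stand-in for a DynamicCache"}
with tempfile.TemporaryDirectory() as tmp:
    dump_caches(caches, tmp)
    restored = load_caches(tmp)

print(restored == caches)
```

Because `pickle` can execute arbitrary code while loading, dumps of this kind should only be loaded from locations you control.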
## How to Use

```python
from memos.configs.memory import KVCacheMemoryConfig
from memos.memories.activation.kv import KVCacheMemory

config = KVCacheMemoryConfig(
    extractor_llm={
        "backend": "huggingface",
        "config": {"model_name_or_path": "Qwen/Qwen3-1.7B"}
    }
)
mem = KVCacheMemory(config)

# Extract and add a cache
cache_item = mem.extract("The capital of France is Paris.")
mem.add([cache_item])

# Retrieve and merge caches
merged_cache = mem.get_cache([cache_item.kv_cache_id])

# Save/load
mem.dump("tmp/act_mem")
mem.load("tmp/act_mem")
```

## Developer Notes

* Uses HuggingFace `DynamicCache` for efficient key-value storage
* Pickle-based serialization for fast load/save
* All methods are covered by integration tests in `/tests`

---

# Parametric Memory *(Coming Soon)* (/open_source/modules/memories/parametric_memory)

> **Note**: **Coming Soon**
> This feature is still under active development. Stay tuned for updates!

`Parametric Memory` is the core **long-term knowledge and capability store** inside MemOS. Unlike plaintext or activation memories, parametric memory is embedded directly within a model's weights, encoding deep representations of language structure, world knowledge, and general reasoning abilities.

In the MemOS architecture, parametric memory does not just refer to static pre-trained weights. It also includes modular weight components such as **LoRA adapters** and plug-in expert modules. These allow you to incrementally expand or specialize your LLM's capabilities without retraining the entire model.

For example, you could distill structured or stable knowledge into parametric form, save it as a **capability block**, and dynamically load or unload it during inference. This makes it easy to create "expert sub-models" for tasks like legal reasoning, financial analysis, or domain-specific summarization, all managed by MemOS.

## Design Goals

- **Controllability**: Generate, load, swap, or compose parametric modules on demand.
- **Plasticity**: Evolve alongside plaintext and activation memories; support knowledge distillation and rollback.
- **Traceability** *(Coming Soon)*: Versioning and governance for parametric blocks.

## Current Status

`Parametric Memory` is currently under design and prototyping. APIs for generating, compressing, and hot-swapping parametric modules will be released in future versions, supporting multi-task, multi-role, and multi-agent architectures.

Stay tuned!

## Related Modules

While parametric memory is under development, try out these today:

- **[GeneralTextMemory](/open_source/modules/memories/general_textual_memory)**: Flexible vector-based semantic storage.
- **[TreeTextMemory](/open_source/modules/memories/tree_textual_memory)**: Structured, hierarchical knowledge graphs.
- **[Activation Memory](/open_source/modules/memories/kv_cache_memory)**: Efficient runtime state caching.

## Developer Note

Parametric Memory will complete MemOS's vision of a unified **Memory³** architecture:

- **Parametric**: Embedded knowledge
- **Activation**: Ephemeral runtime states
- **Plaintext**: Structured, traceable external memories

Bringing all three together enables adaptable, evolvable, and explainable intelligent systems.

---

# NaiveTextMemory: Simple Plain Text Memory (/open_source/modules/memories/naive_textual_memory)

Let's get started with the MemOS memory system in the simplest way possible!

NaiveTextMemory is a lightweight, in-memory, plain-text memory module. It stores memories in an in-memory list and retrieves them using keyword matching. It is a good starting point for learning MemOS, as well as a practical choice for demos, testing, and small-scale applications.
## Table of Contents

- [What You'll Learn](#what-youll-learn)
- [Why Choose NaiveTextMemory](#why-choose-naivetextmemory)
- [Core Concepts](#core-concepts)
  - [Memory Structure](#memory-structure)
  - [Metadata Fields](#metadata-fields-textualmemorymetadata)
  - [Search Mechanism](#search-mechanism)
- [API Reference](#api-reference)
  - [Initialization](#initialization)
  - [Core Methods](#core-methods)
  - [Configuration Parameters](#configuration-parameters)
- [Hands-On Practice](#hands-on-practice)
  - [Quick Start](#quick-start)
  - [Complete Example](#complete-example)
  - [File Storage](#file-storage)
- [Use Case Guide](#use-case-guide)
- [Comparison with Other Memory Modules](#comparison-with-other-memory-modules)
- [Best Practices](#best-practices)
- [Next Steps](#next-steps)

## What You'll Learn

By the end of this guide, you will be able to:

- Automatically extract structured memories from conversations using an LLM
- Store and manage memories in process memory (no database required)
- Search memories using keyword matching
- Persist and restore memory data
- Understand when to use NaiveTextMemory and when to upgrade to other modules

## Why Choose NaiveTextMemory

### Key Advantages

- **Zero Dependencies**: No vector database or embedding model required
- **Fast Startup**: Up and running in just a few lines of code
- **Lightweight & Efficient**: Low resource footprint, fast execution
- **Simple & Intuitive**: Keyword matching with predictable results
- **Easy to Debug**: All memories are held in process memory and are easy to inspect
- **Perfect Starting Point**: The best entry point for learning MemOS

### Suitable Scenarios

- Rapid prototyping and proof of concept
- Simple conversational agents (< 1000 memories)
- Testing and demo scenarios
- Resource-constrained environments (cannot run embedding models)
- Keyword search scenarios (queries directly match memories)

> **Note**: **Performance Tip**
> When memory count exceeds 1000, it's recommended to upgrade to [GeneralTextMemory](/open_source/modules/memories/general_textual_memory), which uses vector search for better performance.

## Core Concepts

### Memory Structure

Each memory is represented as a `TextualMemoryItem` object with the following fields:

| Field | Type | Required | Description |
| ---------- | --------------------------- | -------- | ------------------------------------ |
| `id` | `str` | ✗ | Unique identifier (auto-generated UUID) |
| `memory` | `str` | ✓ | Main text content of the memory |
| `metadata` | `TextualMemoryMetadata` | ✗ | Metadata (for categorization, filtering, and retrieval) |

### Metadata Fields (`TextualMemoryMetadata`)

Metadata provides rich contextual information for categorizing, filtering, and organizing memories:

| Field | Type | Default | Description |
| ------------- | -------------------------------------------------- | ---------- | ---------------------------------- |
| `type` | `"procedure"` / `"fact"` / `"event"` / `"opinion"` | `"fact"` | Memory type classification |
| `memory_time` | `str (YYYY-MM-DD)` | Current date | Time associated with the memory |
| `source` | `"conversation"` / `"retrieved"` / `"web"` / `"file"` | - | Source of the memory |
| `confidence` | `float (0-100)` | 80.0 | Certainty/confidence score |
| `entities` | `list[str]` | `[]` | Mentioned entities or concepts |
| `tags` | `list[str]` | `[]` | Topic tags |
| `visibility` | `"private"` / `"public"` / `"session"` | `"private"` | Access control scope |
| `updated_at` | `str` | Auto-generated | Last update timestamp (ISO 8601) |

## API Reference

### Initialization

```python
from memos.configs.memory import NaiveTextMemoryConfig
from memos.memories.textual.naive import NaiveTextMemory

# `config` is a NaiveTextMemoryConfig instance
memory = NaiveTextMemory(config)
```

### Core Methods

| Method | Parameters | Returns | Description |
| ------------------------ | ------------------------------------- | ----------------------------- | --------------------------------------------- |
| `extract(messages)` | `messages: list[dict]` | `list[TextualMemoryItem]` | Extract structured memories from conversation using LLM |
| `add(memories)` | `memories: list / dict / Item` | `None` | Add one or more memories |
| `search(query, top_k)` | `query: str, top_k: int` | `list[TextualMemoryItem]` | Retrieve top-k memories using keyword matching |
| `get(memory_id)` | `memory_id: str` | `TextualMemoryItem` | Get a single memory by ID |
| `get_by_ids(ids)` | `ids: list[str]` | `list[TextualMemoryItem]` | Batch retrieve memories by ID list |
| `get_all()` | - | `list[TextualMemoryItem]` | Return all memories |
| `update(memory_id, new)` | `memory_id: str, new: dict` | `None` | Update content or metadata of specified memory |
| `delete(ids)` | `ids: list[str]` | `None` | Delete one or more memories |
| `delete_all()` | - | `None` | Clear all memories |
| `dump(dir)` | `dir: str` | `None` | Serialize memories to JSON file |
| `load(dir)` | `dir: str` | `None` | Load memories from JSON file |

### Search Mechanism

Unlike `GeneralTextMemory`'s vector semantic search, `NaiveTextMemory` uses a **keyword matching algorithm**:

#### Step 1: Tokenization

Break down the query and each memory's content into lists of tokens.

#### Step 2: Calculate Match Score

Count the number of overlapping tokens between the query and each memory.

#### Step 3: Sort

Sort all memories by match count in descending order.

#### Step 4: Return Results

Return the top-k memories as search results.

> **Note**: **Example Comparison**
> Query: "cat"
> - **Keyword Matching**: Only matches memories containing "cat"
> - **Semantic Search**: Also matches memories about "pet", "kitten", "feline", etc.
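The four steps above can be sketched in a few lines of plain Python. This is an illustrative approximation rather than the actual `NaiveTextMemory` code: the regex tokenizer, the use of distinct-token overlap as the score, and the choice to drop zero-score memories are all assumptions made for the example.

```python
import re


def tokenize(text: str) -> list[str]:
    """Step 1: lowercase and split on non-alphanumeric characters (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_search(query: str, memories: list[str], top_k: int = 3) -> list[str]:
    query_tokens = set(tokenize(query))
    # Step 2: score = number of distinct query tokens that also occur in the memory
    scored = [(len(query_tokens & set(tokenize(m))), m) for m in memories]
    # Step 3: sort by score, highest first (the sort is stable, so ties keep insertion order)
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Step 4: return the top-k; memories with no overlap are dropped here by choice
    return [m for score, m in scored[:top_k] if score > 0]


memories = [
    "The user has a cat named Momo",
    "The user adopted a kitten",
    "The user likes tomato soup",
]
print(keyword_search("cat", memories))
```

Searching for "cat" returns only the first memory. The "kitten" memory is missed because no token overlaps, which is exactly the limitation that semantic search addresses.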
### Configuration Parameters

**NaiveTextMemoryConfig**

| Parameter | Type | Required | Default | Description |
| ------------------ | ---------------------- | -------- | ---------------------- | ---------------------------------------------- |
| `extractor_llm` | `LLMConfigFactory` | ✓ | - | LLM configuration for extracting memories from conversations |
| `memory_filename` | `str` | ✗ | `textual_memory.json` | Filename for persistent storage |

**Configuration Example**

```json
{
  "backend": "naive_text",
  "config": {
    "extractor_llm": {
      "backend": "openai",
      "config": {
        "model_name_or_path": "gpt-4o-mini",
        "temperature": 0.8,
        "max_tokens": 1024,
        "api_base": "xxx",
        "api_key": "sk-xxx"
      }
    },
    "memory_filename": "my_memories.json"
  }
}
```

## Hands-On Practice

### Quick Start

Get started with NaiveTextMemory in just 3 steps:

#### Step 1: Create Configuration

```python
from memos.configs.memory import MemoryConfigFactory

config = MemoryConfigFactory(
    backend="naive_text",
    config={
        "extractor_llm": {
            "backend": "openai",
            "config": {
                "model_name_or_path": "gpt-4o-mini",
                "api_key": "your-api-key",
                "api_base": "your-api-base"
            },
        },
    },
)
```

#### Step 2: Initialize Memory Module

```python
from memos.memories.factory import MemoryFactory

memory = MemoryFactory.from_config(config)
```

#### Step 3: Extract and Add Memories

```python
# Automatically extract memories from conversation
memories = memory.extract([
    {"role": "user", "content": "I love tomatoes."},
    {"role": "assistant", "content": "Great! Tomatoes are delicious."},
])

# Add to memory store
memory.add(memories)
print(f"✓ Added {len(memories)} memories")
```

> **Note**: **Advanced: Using MultiModal Reader**
> If you need to process multimodal content such as images, URLs, or files, use `MultiModalStructMemReader`.
> View complete example: [Using MultiModalStructMemReader (Advanced)](./tree_textual_memory#using-multimodalstructmemreader-advanced)

### Complete Example

Here's a complete end-to-end example demonstrating all core functionality:

```python
from memos.configs.memory import MemoryConfigFactory
from memos.memories.factory import MemoryFactory

# ========================================
# 1. Initialization
# ========================================
config = MemoryConfigFactory(
    backend="naive_text",
    config={
        "extractor_llm": {
            "backend": "openai",
            "config": {
                "model_name_or_path": "gpt-4o-mini",
                "api_key": "your-api-key",
            },
        },
    },
)
memory = MemoryFactory.from_config(config)

# ========================================
# 2. Extract and Add Memories
# ========================================
memories = memory.extract([
    {"role": "user", "content": "I love tomatoes."},
    {"role": "assistant", "content": "Great! Tomatoes are delicious."},
])
memory.add(memories)
print(f"✓ Added {len(memories)} memories")

# ========================================
# 3. Search Memories
# ========================================
results = memory.search("tomatoes", top_k=2)
print(f"\n🔍 Found {len(results)} relevant memories:")
for i, item in enumerate(results, 1):
    print(f"  {i}. {item.memory}")

# ========================================
# 4. Get All Memories
# ========================================
all_memories = memory.get_all()
print(f"\n📊 Total {len(all_memories)} memories")

# ========================================
# 5. Update Memory
# ========================================
if memories:
    memory_id = memories[0].id
    memory.update(
        memory_id,
        {
            "memory": "User loves tomatoes.",
            "metadata": {"type": "opinion", "confidence": 95.0}
        }
    )
    print(f"\n✓ Updated memory: {memory_id}")

# ========================================
# 6. Persist to Storage
# ========================================
memory.dump("tmp/mem")
print("\n💾 Memories saved to tmp/mem/textual_memory.json")

# ========================================
# 7. Load Memories
# ========================================
memory.load("tmp/mem")
print("✓ Memories loaded from file")

# ========================================
# 8. Delete Memories
# ========================================
if memories:
    memory.delete([memories[0].id])
    print("\n🗑️ Deleted 1 memory")

# Delete all memories
# memory.delete_all()
```

> **Note**: **Extension: Internet Retrieval**
> NaiveTextMemory focuses on local memory management. For retrieving information from the internet and adding it to your memory store, see:
> [Retrieve Memories from the Internet (Optional)](./tree_textual_memory#retrieve-memories-from-the-internet-optional)

### File Storage

When calling `dump(dir)`, the system saves memories to:

```
/
```

**Default File Structure**

```json
[
  {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "memory": "User loves tomatoes.",
    "metadata": {
      "type": "opinion",
      "confidence": 95.0,
      "entities": ["user", "tomatoes"],
      "tags": ["food", "preference"],
      "updated_at": "2026-01-14T10:30:00Z"
    }
  },
  ...
]
```

Use `load(dir)` to fully restore all memory data.

> **Note**: **Important Note**
> Memories are stored in memory and will be lost after process restart. Remember to call `dump()` regularly to save data!
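Because the dump is plain JSON, it can also be inspected without MemOS. The sketch below assumes the file is a JSON list shaped like the structure shown above; the second record, the helper names, and the temporary file location are made up for illustration.

```python
import json
import os
import tempfile

# Two records in the documented shape; the second one is invented sample data
sample = [
    {"id": "550e8400-e29b-41d4-a716-446655440000", "memory": "User loves tomatoes.",
     "metadata": {"type": "opinion", "confidence": 95.0, "tags": ["food", "preference"]}},
    {"id": "hypothetical-second-id", "memory": "User lives in Paris.",
     "metadata": {"type": "fact", "confidence": 80.0, "tags": ["location"]}},
]


def load_dump(path: str) -> list[dict]:
    """Read a dumped memory file as a list of plain dictionaries."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def filter_by_type(items: list[dict], memory_type: str) -> list[dict]:
    """Keep only the records whose metadata.type matches."""
    return [it for it in items if it.get("metadata", {}).get("type") == memory_type]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "textual_memory.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f)
    opinions = filter_by_type(load_dump(path), "opinion")

print([it["memory"] for it in opinions])
```

Reading the file this way is handy for debugging or for exporting memories to another system before an upgrade.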
## Use Case Guide ### Best Suited For - **Rapid Prototyping**: No need to configure vector databases, get started in minutes - **Simple Conversational Agents**: Small-scale applications with < 1000 memories - **Testing and Demos**: Quickly validate memory extraction and retrieval logic - **Resource-Constrained Environments**: Scenarios where embedding models or vector databases cannot run - **Keyword Search**: Scenarios where query content directly matches memory text - **Learning and Teaching**: The best starting point for understanding MemOS memory system ### Not Recommended For - **Large-Scale Applications**: More than 10,000 memories (search performance degrades) - **Semantic Search Needs**: Need to understand synonyms (e.g., "cat" and "pet") - **Production Environments**: Strict performance and accuracy requirements - **Multilingual Scenarios**: Need cross-language semantic understanding - **Complex Relationship Reasoning**: Need to understand relationships between memories > **Alert**: **Upgrade Path** > For the scenarios not recommended above, consider upgrading to: > - [GeneralTextMemory](/open_source/modules/memories/general_textual_memory) - Vector semantic search, suitable for 10K-100K memories > - [TreeTextMemory](/open_source/modules/memories/tree_textual_memory) - Graph structure storage, supports relationship reasoning and multi-hop queries ## Comparison with Other Memory Modules Choosing the right memory module is crucial for project success. 
This comparison helps you make the decision: | Feature | **NaiveTextMemory** | **GeneralTextMemory** | **TreeTextMemory** | | ------------------ | --------------------- | -------------------------- | --------------------------- | | **Search Method** | Keyword matching | Vector semantic search | Graph structure + vector search | | **Dependencies** | LLM only | LLM + Embedder + Vector DB | LLM + Embedder + Graph DB | | **Suitable Scale** | < 1K | 1K - 100K | 10K - 1M | | **Query Complexity** | O(n) linear scan | O(log n) approximate NN | O(log n) + graph traversal | | **Semantic Understanding** | ❌ | ✅ | ✅ | | **Relationship Reasoning** | ❌ | ❌ | ✅ | | **Multi-Hop Queries** | ❌ | ❌ | ✅ | | **Storage Backend** | In-memory list | Vector DB (Qdrant, etc.) | Graph DB (Neo4j/PolarDB) | | **Configuration Complexity** | Low ⭐ | Medium ⭐⭐ | High ⭐⭐⭐ | | **Learning Curve** | Minimal | Moderate | Steep | | **Production Ready** | ❌ Prototype/demo only | ✅ Suitable for most cases | ✅ Suitable for complex apps | > **Alert**: **Selection Guide** > - **Just getting started?** → Start with NaiveTextMemory > - **Need semantic search?** → Use GeneralTextMemory > - **Need relationship reasoning?** → Choose TreeTextMemory ## Best Practices Follow these recommendations to make the most of NaiveTextMemory: ### 1. Persist Data Regularly ```python # Save immediately after critical operations memory.add(new_memories) memory.dump("tmp/mem") # ✓ Persist immediately # Regular automatic backups import schedule schedule.every(10).minutes.do(lambda: memory.dump("tmp/mem")) ``` ### 2. Control Memory Scale ```python # Regularly clean old memories if len(memory.get_all()) > 1000: old_memories = sorted( memory.get_all(), key=lambda m: m.metadata.updated_at )[:100] # Oldest 100 memory.delete([m.id for m in old_memories]) print("✓ Cleaned 100 old memories") ``` ### 3. 
Optimize Search Queries

```python
# ❌ Poor: vague query
results = memory.search("thing", top_k=5)

# ✅ Good: use specific keywords
results = memory.search("tomato", top_k=5)
```

### 4. Use Metadata Wisely

```python
# Set clear metadata when adding memories
memory.add({
    "memory": "User prefers dark mode",
    "metadata": {
        "type": "opinion",                  # ✓ Clear classification
        "tags": ["UI", "preference"],       # ✓ Easy filtering
        "confidence": 90.0,                 # ✓ Mark confidence
        "entities": ["user", "dark mode"]   # ✓ Entity annotation
    }
})
```

### 5. Plan Upgrade Path

```python
# Monitor the memory count and upgrade before reaching the limit
memory_count = len(memory.get_all())
if memory_count > 800:
    print("⚠️ Memory count approaching limit, consider upgrading to GeneralTextMemory")

# Migration code reference:
# 1. Export existing memories: memory.dump("backup")
# 2. Create GeneralTextMemory configuration
# 3. Import memories to new module
```

## Next Steps

Congratulations! You've mastered the core usage of NaiveTextMemory. Next, you can:

- **Upgrade to Vector Search**: Learn about [GeneralTextMemory](/open_source/modules/memories/general_textual_memory)'s semantic retrieval capabilities
- **Explore Graph Structure**: Understand [TreeTextMemory](/open_source/modules/memories/tree_textual_memory)'s relationship reasoning features
- **Integrate into Applications**: Check [Complete API Documentation](/api-reference/search-memories) to build production-grade applications
- **Run Example Code**: Browse the `/examples/` directory for more practical cases
- **Learn Graph Databases**: If you need advanced features, learn about [Neo4j](/open_source/modules/memories/neo4j_graph_db) or [PolarDB](/open_source/modules/memories/polardb_graph_db)

> **Alert**: **Tip**
> NaiveTextMemory is the perfect starting point for learning MemOS. When your application needs more powerful features, you can seamlessly migrate to other memory modules!
--- # GeneralTextMemory: General-Purpose Textual Memory (/open_source/modules/memories/general_textual_memory) ## Table of Contents - [Memory Structure](#memory-structure) - [Metadata Fields (`TextualMemoryMetadata`)](#metadata-fields-textualmemorymetadata) - [API Summary (`GeneralTextMemory`)](#api-summary-generaltextmemory) - [Initialization](#initialization) - [Core Methods](#core-methods) - [File Storage](#file-storage) - [Example Usage](#example-usage) - [Extension: Internet Retrieval](#extension-internet-retrieval) - [Advanced: Using MultiModal Reader](#advanced-using-multimodal-reader) - [Developer Notes](#developer-notes) ## Memory Structure Each memory is represented as a `TextualMemoryItem`: | Field | Type | Description | | ---------- | --------------------------- | ---------------------------------- | | `id` | `str` | UUID (auto-generated if omitted) | | `memory` | `str` | The main memory content (required) | | `metadata` | `TextualMemoryMetadata` | Metadata for search/filtering | ### Metadata Fields (`TextualMemoryMetadata`) | Field | Type | Description | | ------------- | -------------------------------------------------- | ----------------------------------- | | `type` | `"procedure"`, `"fact"`, `"event"`, `"opinion"` | Memory type | | `memory_time` | `str (YYYY-MM-DD)` | Date/time the memory refers to | | `source` | `"conversation"`, `"retrieved"`, `"web"`, `"file"` | Source of the memory | | `confidence` | `float (0-100)` | Certainty/confidence score | | `entities` | `list[str]` | Key entities/concepts | | `tags` | `list[str]` | Thematic tags | | `visibility` | `"private"`, `"public"`, `"session"` | Access scope | | `updated_at` | `str` | Last update timestamp (ISO 8601) | All values are validated. Invalid values will raise errors. ### Search Mechanism Unlike NaiveTextMemory, which relies on keyword matching, GeneralTextMemory utilizes vector-based semantic search. 
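
To make the difference concrete, here is a small self-contained sketch contrasting literal keyword overlap with cosine similarity over embeddings. It is an illustration only: the three-dimensional vectors are hand-made stand-ins for real embedder output, and `keyword_score` and `cosine` are helper names introduced here, not MemOS functions.

```python
import math


def keyword_score(query: str, text: str) -> int:
    """Count query words that literally appear in the text (case-insensitive)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hand-made toy embeddings (assumption): "pet" and the cat sentence point in
# a similar direction, while the tomato sentence does not.
toy_embeddings = {
    "pet": [0.9, 0.1, 0.0],
    "The user has a cat": [0.8, 0.3, 0.1],
    "The user likes tomatoes": [0.0, 0.2, 0.9],
}

query = "pet"
memories = ["The user has a cat", "The user likes tomatoes"]

# Keyword matching: no memory contains the literal word "pet".
keyword_hits = [m for m in memories if keyword_score(query, m) > 0]

# Vector search: rank by similarity of embeddings instead of shared words.
best_semantic = max(
    memories, key=lambda m: cosine(toy_embeddings[query], toy_embeddings[m])
)

print(keyword_hits)
print(best_semantic)
```

In this toy setup the keyword search finds nothing for "pet", while the vector comparison ranks the cat memory first. That is the synonym gap the module comparison refers to.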
#### Algorithm Comparison

| Feature                    | Keyword Matching                      | Vector Semantic Search                     |
| -------------------------- | ------------------------------------- | ------------------------------------------ |
| **Semantic Understanding** | ❌ Doesn't understand synonyms         | ✅ Understands similar concepts             |
| **Resource Usage**         | ✅ Extremely low                       | ⚠️ Requires embedding model and vector DB  |
| **Execution Speed**        | ✅ Fast for small stores (O(n) scan)   | ⚠️ Slower (indexing + querying)            |
| **Suitable Scale**         | < 1K memories                         | 10K - 100K memories                        |
| **Predictability**         | ✅ Intuitive results                   | ⚠️ Black box model                         |

## API Summary (`GeneralTextMemory`)

### Initialization

```python
GeneralTextMemory(config: GeneralTextMemoryConfig)
```

### Core Methods

| Method                   | Description                                         |
| ------------------------ | --------------------------------------------------- |
| `extract(messages)`      | Extracts memories from a message list (LLM-based)   |
| `add(memories)`          | Adds one or more memories (items or dicts)          |
| `search(query, top_k)`   | Retrieves top-k memories using vector similarity    |
| `get(memory_id)`         | Fetches a single memory by ID                       |
| `get_by_ids(ids)`        | Fetches multiple memories by IDs                    |
| `get_all()`              | Returns all memories                                |
| `update(memory_id, new)` | Updates a memory by ID                              |
| `delete(ids)`            | Deletes memories by IDs                             |
| `delete_all()`           | Deletes all memories                                |
| `dump(dir)`              | Serializes all memories to a JSON file in directory |
| `load(dir)`              | Loads memories from a saved file                    |

## File Storage

When calling `dump(dir)`, the system stores the memories to:

```
/
```

This file contains a JSON list of all memory items, which can be reloaded using `load(dir)`.

## Example Usage

```python
import os
from memos.configs.memory import MemoryConfigFactory
from memos.memories.factory import MemoryFactory

config = MemoryConfigFactory(
    backend="general_text",
    config={
        "extractor_llm": { ... },
        "vector_db": { ... },
        "embedder": { ...
}, }, ) m = MemoryFactory.from_config(config) # Extract and add memories memories = m.extract([ {"role": "user", "content": "I love tomatoes."}, {"role": "assistant", "content": "Great! Tomatoes are delicious."}, ]) m.add(memories) # Search results = m.search("Tell me more about the user", top_k=2) # Update m.update(memory_id, {"memory": "User is Canadian.", ...}) # Delete m.delete([memory_id]) # Dump/load m.dump("tmp/mem") m.load("tmp/mem") ``` > **Note**: **Extension: Internet Retrieval** > GeneralTextMemory can be combined with Internet Retrieval to extract content from web pages and add to memory. > View example: [Retrieve Memories from the Internet](./tree_textual_memory#retrieve-memories-from-the-internet-optional) > **Note**: **Advanced: Using MultiModal Reader** > For processing images, URLs, or files within conversations, see the comprehensive MultiModal Reader examples. > View documentation: [Using MultiModalStructMemReader](./tree_textual_memory#using-multimodalstructmemreader-advanced) ## Developer Notes * Uses Qdrant (or compatible) vector DB for fast similarity search * Embedding and extraction models are configurable (Ollama/OpenAI supported) * All methods are covered by integration tests in `/tests` --- # PreferenceTextMemory: Textual Memory for User Preferences (/open_source/modules/memories/preference_textual_memory) ## Table of Contents - [Why Preference Memory is Needed](#why-preference-memory-is-needed) - [Key Features](#key-features) - [Application Scenarios](#application-scenarios) - [Core Concepts and Workflow](#core-concepts-and-workflow) - [Memory Structure](#memory-structure) - [Metadata Fields (`PreferenceTextualMemoryMetadata`)](#metadata-fields-preferencetextualmemorymetadata) - [Core Workflow](#core-workflow) - [API Reference](#api-reference) - [Initialization](#initialization) - [Core Methods](#core-methods) - [File Storage](#file-storage) - [Hands-on Practice: From Zero to One](#hands-on-practice-from-zero-to-one) - [Create 
PreferenceTextMemory Configuration](#create-preferencetextmemory-configuration) - [Initialize PreferenceTextMemory](#initialize-preferencetextmemory) - [Extract Structured Memory](#extract-structured-memory) - [Search Memory](#search-memory) - [Backup and Restore](#backup-and-restore) - [Complete Code Example](#complete-code-example) ## Why Preference Memory is Needed ### Key Features - **Dual Preference Extraction**: Automatically identifies explicit and implicit preferences - **Semantic Understanding**: Uses vector embeddings to understand the deep meaning of preferences - **Smart Deduplication**: Automatically detects and merges duplicate or conflicting preferences - **Precise Retrieval**: Semantic search based on vector similarity - **Persistent Storage**: Supports vector databases (Qdrant/Milvus) - **Scalability**: Supports large-scale preference data management - **Personalization Enhancement**: Maintains independent preference profiles for each user ### Application Scenarios - Personalized conversational agents (remembering user likes/dislikes) - Intelligent recommendation systems (recommendations based on preferences) - Customer service systems (providing customized services) - Content filtering systems (filtering content based on preferences) - Learning assistance systems (adapting to learning styles) In conclusion, when you need to build systems that can "remember" user preferences and provide personalized services accordingly, `PreferenceTextMemory` is the best choice. ## Core Concepts and Workflow ### Memory Structure In MemOS, preference memory is represented by `PreferenceTextMemory`, where each memory item is a `TextualMemoryItem` stored in Milvus database. 
- `id`: Unique memory ID (automatically generated if omitted) - `memory`: Main text content - `metadata`: Includes hierarchical structure information, embeddings, tags, entities, sources, and status Preference memory can be divided into explicit preference memory and implicit preference memory: - **Explicit Preference Memory**: Preferences that users explicitly express. **Examples**: - "I like dark mode" - "I don't eat spicy food" - "Please use short answers" - "I prefer technical documentation over video tutorials" - **Implicit Preference Memory**: Preferences inferred from user behavior and conversation patterns. **Examples**: - User always asks for code examples → prefers practice-oriented learning - User frequently requests detailed explanations → prefers in-depth understanding - User mentions environmental topics multiple times → concerned about sustainable development > **Note**: **Intelligent Extraction** > `PreferenceTextMemory` automatically extracts both explicit and implicit preferences from conversations using LLM, no manual annotation required! 
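
As a rough illustration of how the two kinds of preference could be handled downstream, the sketch below groups already-extracted items by their `preference_type`. The field names `preference_type` and `preference` follow the metadata table in the next section; the sample items and the `group_by_preference_type` helper are hypothetical and are not produced by, or part of, MemOS.

```python
from collections import defaultdict

# Hypothetical items shaped like TextualMemoryItem dicts (illustrative data only).
items = [
    {
        "memory": "I like dark mode",
        "metadata": {
            "preference_type": "explicit_preference",
            "preference": "Prefers dark mode",
        },
    },
    {
        "memory": "User always asks for code examples",
        "metadata": {
            "preference_type": "implicit_preference",
            "preference": "Prefers practice-oriented learning",
        },
    },
    {
        "memory": "Please use short answers",
        "metadata": {
            "preference_type": "explicit_preference",
            "preference": "Prefers short answers",
        },
    },
]


def group_by_preference_type(memories: list[dict]) -> dict[str, list[str]]:
    """Collect the preference text of each item under its preference_type."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for item in memories:
        meta = item["metadata"]
        grouped[meta["preference_type"]].append(meta["preference"])
    return dict(grouped)


grouped = group_by_preference_type(items)
for kind, prefs in grouped.items():
    print(kind, prefs)
```

Keeping the two groups separate can be useful when explicit statements should override inferred ones, although how an application weighs them is a design choice outside this sketch.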
### Metadata Fields (`PreferenceTextualMemoryMetadata`) | Field | Type | Description | | ------------- | -------------------------------------------------- | ----------------------------------- | | `preference_type` | `"explicit_preference"`, `"implicit_preference"` | Preference memory type, divided into explicit and implicit preference memory | | `dialog_id` | `str` | Dialog ID, used to associate preference memory with specific dialogs | | `original_text` | `str` | Original text containing user preference information | | `embedding` | `str` | Embedding vector for semantic search and retrieval | | `preference` | `str` | User preference information | | `create_at` | `str` | Creation timestamp (ISO 8601) | | `mem_cube_id` | `str` | Memory cube ID, used to associate preference memory with specific memory cubes | | `score` | `float ` | Similarity score between preference memory and query in search results | ### Core Workflow When you run this example, your workflow will: 1. **Extraction:** Use LLM to extract structured memory from raw text. 2. **Embedding:** Generate vector embeddings for similarity search. 3. **Storage:** Store preference memory in Milvus database while updating metadata fields. 4. **Search:** Return the most relevant preference memories through vector similarity queries. ## API Reference ### Initialization ```python PreferenceTextMemory(config: PreferenceTextMemoryConfig) ``` ### Core Methods | Method | Description | | --------------------------- | ----------------------------------------------------- | | `get_memory(messages)` | Extract preference memories from original dialogues. | | `search(query, top_k)` | Retrieve top-k preference memories using vector similarity. | | `load(dir)` | Load preference memories from stored files. | | `dump(dir)` | Serialize all preference memories to JSON files in the directory. | | `add(memories)` | Batch add preference memories to Milvus database. 
| | `get_with_collection_name(collection_name, memory_id)` | Get specific type of preference memory by collection name and memory ID. | | `get_by_ids_with_collection_name(collection_name, memory_ids)` | Batch get specific type of preference memory by collection name and memory IDs. | | `get_all()` | Get all preference memories. | | `get_memory_by_filter(filter)` | Get preference memories based on filter conditions. | | `delete(memory_ids)` | Delete preference memories by specified IDs. | | `delete_by_filter(filter)` | Delete preference memories based on filter conditions. | | `delete_with_collection_name(collection_name, memory_ids)` | Delete all preference memories with specified collection name and IDs. | | `delete_all()` | Delete all preference memories. | ### File Storage When calling `dump(dir)`, MemOS will serialize all preference memories to JSON files in the directory: ``` / ``` --- ## Hands-on Practice: From Zero to One ### Create PreferenceTextMemory Configuration Define: - Your embedding model (e.g., nomic-embed-text:latest), - Your Milvus database backend, - Memory extractor (based on LLM) (optional). ```python from memos.configs.memory import PreferenceTextMemoryConfig config = PreferenceTextMemoryConfig.from_json_file("examples/data/config/preference_config.json") ``` ### Initialize PreferenceTextMemory ```python from memos.memories.textual.preference import PreferenceTextMemory preference_memory = PreferenceTextMemory(config) ``` ### Extract Structured Memory Use the memory extractor to parse dialogues, files, or documents into multiple `TextualMemoryItem`. 
```python scene_data = [[ {"role": "user", "content": "Tell me about your childhood."}, {"role": "assistant", "content": "I loved playing in the garden with my dog."} ]] memories = preference_memory.get_memory(scene_data, type="chat", info={"user_id": "1234"}) preference_memory.add(memories) ``` ### Search Memory ```python results = preference_memory.search("Tell me more about the user", top_k=2) ``` ### Backup and Restore Support persistent storage and on-demand reloading of preference memories: ```python preference_memory.dump("tmp/pref_memories") preference_memory.load("tmp/pref_memories") ``` ### Complete Code Example This example integrates all the above steps, providing an end-to-end complete workflow — copy and run! ```python from memos.configs.memory import PreferenceTextMemoryConfig from memos.memories.textual.preference import PreferenceTextMemory # Create PreferenceTextMemory config = PreferenceTextMemoryConfig.from_json_file("examples/data/config/preference_config.json") preference_memory = PreferenceTextMemory(config) preference_memory.delete_all() scene_data = [[ {"role": "user", "content": "Tell me about your childhood."}, {"role": "assistant", "content": "I loved playing in the garden with my dog."} ]] # Extract preference memories from original dialogues and add to Milvus database memories = preference_memory.get_memory(scene_data, type="chat", info={"user_id": "1234"}) preference_memory.add(memories) # Search memory results = preference_memory.search("Tell me more about the user", top_k=2) # Persist preference memories preference_memory.dump("tmp/pref_memories") ``` --- # TreeTextMemory: Structured Hierarchical Textual Memory (/open_source/modules/memories/tree_textual_memory) ## Table of Contents - [What You’ll Learn](#what-youll-learn) - [Core Concepts and Workflow](#core-concepts-and-workflow) - [Memory Structure](#memory-structure) - [Metadata Fields](#metadata-fields-treenodetextualmemorymetadata) - [Core Workflow](#core-workflow) - [API 
Reference](#api-reference) - [Hands-on: From 0 to 1](#hands-on-from-0-to-1) - [Create TreeTextMemory Config](#create-treetextmemory-config) - [Initialize TreeTextMemory](#initialize-treetextmemory) - [Extract Structured Memories](#extract-structured-memories) - [Search Memories](#search-memories) - [Retrieve Memories from the Internet (Optional)](#retrieve-memories-from-the-internet-optional) - [Replace Working Memory](#replace-working-memory) - [Backup & Restore](#backup--restore) - [Full Code Example](#full-code-example) - [Why Choose TreeTextMemory](#why-choose-treetextmemory) - [What’s Next](#whats-next) ## What You’ll Learn By the end of this guide, you will: - Extract structured memories from raw text or conversations. - Store them as **nodes** in a graph database. - Link memories into **hierarchies** and semantic graphs. - Search them using **vector similarity + graph traversal**. ## Core Concepts and Workflow ### Memory Structure Every node in your `TreeTextMemory` is a `TextualMemoryItem`: - `id`: Unique memory ID (auto-generated if omitted). - `memory`: the main text. - `metadata`: includes hierarchy info, embeddings, tags, entities, source, and status. ### Metadata Fields (`TreeNodeTextualMemoryMetadata`) | Field | Type | Description | | --------------- |-------------------------------------------------------| ------------------------------------------ | | `memory_type` | `"WorkingMemory"`, `"LongTermMemory"`, `"UserMemory"` | Lifecycle category | | `status` | `"activated"`, `"archived"`, `"deleted"` | Node status | | `visibility` | `"private"`, `"public"`, `"session"` | Access scope | | `sources` | `list[str]` | List of sources (e.g. 
files, URLs) | | `source` | `"conversation"`, `"retrieved"`, `"web"`, `"file"` | Original source type | | `confidence` | `float (0-100)` | Certainty score | | `entities` | `list[str]` | Mentioned entities or concepts | | `tags` | `list[str]` | Thematic tags | | `embedding` | `list[float]` | Vector embedding for similarity search | | `created_at` | `str` | Creation timestamp (ISO 8601) | | `updated_at` | `str` | Last update timestamp (ISO 8601) | | `usage` | `list[str]` | Usage history | | `background` | `str` | Additional context | > **Note**: **Best Practice** > Use meaningful tags and background — they help organize your graph for > multi-hop reasoning. ### Core Workflow When you run this example, your workflow will: 1. **Extract:** Use an LLM to pull structured memories from raw text. 2. **Embed:** Generate vector embeddings for similarity search. 3. **Store & Link:** Add nodes to your graph database (Neo4j) with relationships. 4. **Search:** Query by vector similarity, then expand results by graph hops. > **Note**: **Hint** Graph links help retrieve context that pure vector search might miss! 
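
The search step above (vector similarity first, then expansion along graph links) can be sketched with plain Python data structures. This is a toy model under stated assumptions, not how TreeTextMemory or Neo4j implement it: the two-dimensional embeddings, the `nodes` and `edges` dictionaries, and the `search` helper with its `hops` parameter are all invented for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy graph (assumption): node id -> memory text and a hand-made embedding.
nodes = {
    "n1": {"memory": "User grew up in a house with a garden", "embedding": [0.9, 0.1]},
    "n2": {"memory": "User played with a dog as a child", "embedding": [0.2, 0.9]},
    "n3": {"memory": "User enjoys cooking", "embedding": [0.1, 0.2]},
}
# Undirected links stored as adjacency lists.
edges = {"n1": ["n2"], "n2": ["n1"], "n3": []}


def search(query_vec: list[float], top_k: int = 1, hops: int = 1) -> list[str]:
    """Return the top-k nodes by similarity, then neighbours reachable within `hops` links."""
    ranked = sorted(
        nodes, key=lambda nid: cosine(query_vec, nodes[nid]["embedding"]), reverse=True
    )
    result = ranked[:top_k]
    frontier = list(result)
    for _ in range(hops):
        next_frontier = []
        for nid in frontier:
            for neighbour in edges.get(nid, []):
                if neighbour not in result:
                    result.append(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return result


hits = search([1.0, 0.0], top_k=1, hops=1)
for nid in hits:
    print(nid, nodes[nid]["memory"])
```

Here the query vector is closest to the garden memory, and the one-hop expansion also pulls in the linked dog memory even though its embedding is not close to the query. That is the kind of context pure vector search might miss.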
## API Reference ### Initialization ```python TreeTextMemory(config: TreeTextMemoryConfig) ``` ### Core Methods | Method | Description | | --------------------------- | ----------------------------------------------------- | | `add(memories)` | Add one or more memories (items or dicts) | | `replace_working_memory()` | Replace all WorkingMemory nodes | | `get_working_memory()` | Get all WorkingMemory nodes | | `search(query, top_k)` | Retrieve top-k memories using vector + graph search | | `get(memory_id)` | Fetch single memory by ID | | `get_by_ids(ids)` | Fetch multiple memories by IDs | | `get_all()` | Export the full memory graph as dictionary | | `update(memory_id, new)` | Update a memory by ID | | `delete(ids)` | Delete memories by IDs | | `delete_all()` | Delete all memories and relationships | | `dump(dir)` | Serialize the graph to JSON in directory | | `load(dir)` | Load graph from saved JSON file | | `drop(keep_last_n)` | Backup graph & drop database, keeping N backups | ### File Storage When calling `dump(dir)`, the system writes to: ``` / ``` This file contains a JSON structure with `nodes` and `edges`. It can be reloaded using `load(dir)`. --- ## Hands-on: From 0 to 1 ### Create TreeTextMemory Config Define: - your embedder (to create vectors), - your graph DB backend (Neo4j), - and your extractor LLM (optional). ```python from memos.configs.memory import TreeTextMemoryConfig config = TreeTextMemoryConfig.from_json_file("examples/data/config/tree_config.json") ``` ### Initialize TreeTextMemory ```python from memos.memories.textual.tree import TreeTextMemory tree_memory = TreeTextMemory(config) ``` ### Extract Structured Memories Use your extractor to parse conversations, files, or docs into `TextualMemoryItem`s. 
```python
from memos.configs.mem_reader import SimpleStructMemReaderConfig
from memos.mem_reader.simple_struct import SimpleStructMemReader

# Load the reader config, then build the reader from it
# (same pattern as the Full Code Example below)
reader_config = SimpleStructMemReaderConfig.from_json_file(
    "examples/data/config/simple_struct_reader_config.json"
)
reader = SimpleStructMemReader(reader_config)

scene_data = [[
    {"role": "user", "content": "Tell me about your childhood."},
    {"role": "assistant", "content": "I loved playing in the garden with my dog."}
]]

memories = reader.get_memory(scene_data, type="chat", info={"user_id": "1234"})
for m_list in memories:
    tree_memory.add(m_list)
```

#### Using MultiModalStructMemReader (Advanced)

`MultiModalStructMemReader` processes multimodal content (text, images, URLs, files, etc.) and routes each input to the appropriate parser:

```python
from memos.configs.mem_reader import MultiModalStructMemReaderConfig
from memos.mem_reader.multi_modal_struct import MultiModalStructMemReader

# Create MultiModal Reader configuration
multimodal_config = MultiModalStructMemReaderConfig(
    llm={
        "backend": "openai",
        "config": {
            "model_name_or_path": "gpt-4o-mini",
            "api_key": "your-api-key"
        }
    },
    embedder={
        "backend": "openai",
        "config": {
            "model_name_or_path": "text-embedding-3-small",
            "api_key": "your-api-key"
        }
    },
    chunker={
        "backend": "text_splitter",
        "config": {
            "chunk_size": 1000,
            "chunk_overlap": 200
        }
    },
    extractor_llm={
        "backend": "openai",
        "config": {
            "model_name_or_path": "gpt-4o-mini",
            "api_key": "your-api-key"
        }
    },
    # Optional: specify which domains should return Markdown directly
    direct_markdown_hostnames=["github.com", "docs.python.org"]
)

# Initialize MultiModal Reader
multimodal_reader = MultiModalStructMemReader(multimodal_config)

# ========================================
# Example 1: Process conversations with images
# ========================================
scene_with_image = [[
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "This is my garden"},
            {"type": "image_url", "image_url": {"url": "https://example.com/garden.jpg"}}
        ]
    },
    {
        "role": "assistant",
        "content": "Your garden looks beautiful!"
} ]] memories = multimodal_reader.get_memory( scene_with_image, type="chat", info={"user_id": "1234", "session_id": "session_001"} ) for m_list in memories: tree_memory.add(m_list) print(f"✓ Added {len(memories)} multimodal memories") # ======================================== # Example 2: Process web URLs # ======================================== scene_with_url = [[ { "role": "user", "content": "Please analyze this article: https://example.com/article.html" }, { "role": "assistant", "content": "I'll help you analyze this article" } ]] url_memories = multimodal_reader.get_memory( scene_with_url, type="chat", info={"user_id": "1234", "session_id": "session_002"} ) for m_list in url_memories: tree_memory.add(m_list) print(f"✓ Extracted and added {len(url_memories)} memories from URL") # ======================================== # Example 3: Process local files # ======================================== # Supported file types: PDF, DOCX, TXT, Markdown, HTML, etc. file_paths = [ "./documents/report.pdf", "./documents/notes.md", "./documents/data.txt" ] file_memories = multimodal_reader.get_memory( file_paths, type="doc", info={"user_id": "1234", "session_id": "session_003"} ) for m_list in file_memories: tree_memory.add(m_list) print(f"✓ Extracted and added {len(file_memories)} memories from files") # ======================================== # Example 4: Mixed mode (text + images + URLs) # ======================================== mixed_scene = [[ { "role": "user", "content": [ {"type": "text", "text": "Here's my project documentation:"}, {"type": "text", "text": "https://github.com/user/project/README.md"}, {"type": "image_url", "image_url": {"url": "https://example.com/diagram.png"}} ] } ]] mixed_memories = multimodal_reader.get_memory( mixed_scene, type="chat", info={"user_id": "1234", "session_id": "session_004"} ) for m_list in mixed_memories: tree_memory.add(m_list) print(f"✓ Extracted and added {len(mixed_memories)} memories from mixed content") ``` > **Note**: 
**MultiModal Reader Advantages** > - **Smart Routing**: Automatically identifies content type (image/URL/file) and selects appropriate parser > - **Format Support**: Supports PDF, DOCX, Markdown, HTML, images, and more > - **URL Parsing**: Automatically extracts web content (including GitHub, documentation sites, etc.) > - **Large File Handling**: Automatically chunks oversized files to avoid token limits > - **Context Preservation**: Uses sliding window to maintain context continuity between chunks > **Note**: **Configuration Tips** > - Use the `direct_markdown_hostnames` parameter to specify which domains should return Markdown format > - Supports both `mode="fast"` and `mode="fine"` extraction modes; fine mode extracts more details > - See complete examples: `/examples/mem_reader/multimodal_struct_reader.py` ### Search Memories Try a vector + graph search: ```python results = tree_memory.search("Talk about the garden", top_k=5) for i, node in enumerate(results): print(f"{i}: {node.memory}") ``` ### Retrieve Memories from the Internet (Optional) You can also fetch real-time web content using search engines such as Google, Bing, or Bocha, and automatically extract them into structured memory nodes. MemOS provides a unified interface for this purpose. 
The following example demonstrates how to retrieve web content related to **“Alibaba 2024 ESG report”** and convert it into structured memories: ```python # Create the embedder embedder = EmbedderFactory.from_config( EmbedderConfigFactory.model_validate({ "backend": "ollama", "config": {"model_name_or_path": "nomic-embed-text:latest"}, }) ) # Configure the retriever (using BochaAI as an example) retriever_config = InternetRetrieverConfigFactory.model_validate({ "backend": "bocha", "config": { "api_key": "sk-xxx", # Replace with your BochaAI API Key "max_results": 5, "reader": { # Reader config for automatic chunking "backend": "simple_struct", "config": ..., # Your mem-reader config }, } }) # Instantiate the retriever retriever = InternetRetrieverFactory.from_config(retriever_config, embedder) # Perform internet search results = retriever.retrieve_from_internet("Alibaba 2024 ESG report") # Add results to the memory graph for m in results: tree_memory.add(m) ``` Alternatively, you can configure the `internet_retriever` field directly in the `TreeTextMemoryConfig`. For example: ```json { "internet_retriever": { "backend": "bocha", "config": { "api_key": "sk-xxx", "max_results": 5, "reader": { "backend": "simple_struct", "config": ... } } } } ``` With this setup, when you call `tree_memory.search(query)`, the system will automatically trigger an internet search (via BochaAI, Google, or Bing), and merge the results with local memory nodes in a unified ranked list — no need to manually call `retriever.retrieve_from_internet`. 
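
The idea of merging web results with local memory nodes into one ranked list can be illustrated with a short sketch. How MemOS actually scores, deduplicates, and ranks the combined results is not described here, so treat this as one plausible approach: the `merge_ranked` helper, the `(text, score)` tuple shape, and the sample scores are all assumptions made for the example.

```python
def merge_ranked(
    local: list[tuple[str, float]],
    web: list[tuple[str, float]],
    top_k: int = 5,
) -> list[tuple[str, float, str]]:
    """Merge two scored hit lists, keep the best score per text, and rank descending."""
    best: dict[str, tuple[float, str]] = {}
    for source, hits in (("local", local), ("web", web)):
        for text, score in hits:
            # Keep only the highest-scoring occurrence of a duplicated text.
            if text not in best or score > best[text][0]:
                best[text] = (score, source)
    ranked = sorted(best.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(text, score, source) for text, (score, source) in ranked[:top_k]]


# Illustrative scores only; real scores would come from the retrievers.
local_hits = [("User likes gardening", 0.82), ("User has a dog", 0.40)]
web_hits = [("ESG report summary", 0.91), ("User likes gardening", 0.60)]

merged = merge_ranked(local_hits, web_hits, top_k=3)
for text, score, source in merged:
    print(f"{score:.2f} [{source}] {text}")
```

Recording the source alongside each hit makes it easy to show users whether an answer came from stored memory or from a fresh web lookup.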
### Replace Working Memory Replace your current `WorkingMemory` nodes with new ones: ```python tree_memory.replace_working_memory( [{ "memory": "User is discussing gardening tips.", "metadata": {"memory_type": "WorkingMemory"} }] ) ``` ### Backup & Restore Dump your entire tree structure to disk and reload anytime: ```python tree_memory.dump("tmp/tree_memories") tree_memory.load("tmp/tree_memories") ``` ### Full Code Example This combines all the steps above into one end-to-end example — copy & run! ```python from memos.configs.embedder import EmbedderConfigFactory from memos.configs.memory import TreeTextMemoryConfig from memos.configs.mem_reader import SimpleStructMemReaderConfig from memos.embedders.factory import EmbedderFactory from memos.mem_reader.simple_struct import SimpleStructMemReader from memos.memories.textual.tree import TreeTextMemory # Setup Embedder embedder_config = EmbedderConfigFactory.model_validate({ "backend": "ollama", "config": {"model_name_or_path": "nomic-embed-text:latest"} }) embedder = EmbedderFactory.from_config(embedder_config) # Create TreeTextMemory tree_config = TreeTextMemoryConfig.from_json_file("examples/data/config/tree_config.json") my_tree_textual_memory = TreeTextMemory(tree_config) my_tree_textual_memory.delete_all() # Setup Reader reader_config = SimpleStructMemReaderConfig.from_json_file( "examples/data/config/simple_struct_reader_config.json" ) reader = SimpleStructMemReader(reader_config) # Extract from conversation scene_data = [[ { "role": "user", "content": "Tell me about your childhood." }, { "role": "assistant", "content": "I loved playing in the garden with my dog." 
}, ]] memory = reader.get_memory(scene_data, type="chat", info={"user_id": "1234", "session_id": "2222"}) for m_list in memory: my_tree_textual_memory.add(m_list) # Search results = my_tree_textual_memory.search( "Talk about the user's childhood story?", top_k=10 ) for i, r in enumerate(results): print(f"{i}'th result: {r.memory}") # [Optional] Add from documents doc_paths = ["./text1.txt", "./text2.txt"] doc_memory = reader.get_memory( doc_paths, "doc", info={ "user_id": "your_user_id", "session_id": "your_session_id", } ) for m_list in doc_memory: my_tree_textual_memory.add(m_list) # [Optional] Dump & Drop my_tree_textual_memory.dump("tmp/my_tree_textual_memory") my_tree_textual_memory.drop() ``` ## Why Choose TreeTextMemory - **Structured Hierarchy:** Organize memories like a mind map — nodes can have parents, children, and cross-links. - **Graph-Style Linking:** Beyond pure hierarchy — build multi-hop reasoning chains. - **Semantic Search + Graph Expansion:** Combine the best of vectors and graphs. - **Explainability:** Trace how memories connect, merge, or evolve over time. > **Note**: **Try This** Add memory nodes from documents or web content. Link them > manually or auto-merge similar nodes! ## What’s Next - **Know more about [Neo4j](/open_source/modules/memories/neo4j_graph_db):** TreeTextMemory is powered by a graph database backend. Understanding how Neo4j handles nodes, edges, and traversal will help you design more efficient memory hierarchies, multi-hop reasoning, and context linking strategies. - **Add [Activation Memory](/open_source/modules/memories/kv_cache_memory):** Experiment with runtime KV-cache for session state. - **Explore Graph Reasoning:** Build workflows for multi-hop retrieval and answer synthesis. - **Go Deep:** Check the [API Reference](/api-reference/search-memories) for advanced usage, or run more examples in `examples/`. Now your agent remembers not just facts — but the connections between them! 
---

# Neo4j Graph Database (/open_source/modules/memories/neo4j_graph_db)

## Why Graph for Memory?

Unlike flat vector stores, a graph database allows:

- Structuring memory into **chains, hierarchies, and causal links**
- Performing **multi-hop reasoning** and **subgraph traversal**
- Supporting memory **deduplication, conflict detection, and scheduling**
- Dynamically evolving a memory graph over time

This forms the backbone of long-term, explainable, and compositional memory reasoning.

## Features

- Unified interface across different graph databases
- Built-in support for Neo4j
- Support for vector-enhanced retrieval (`search_by_embedding`)
- Modular, pluggable, and testable
- [New in v0.2.1] Supports **multi-tenant graph memory architecture** (shared DB, per-user logic)
- [New in v0.2.1] Compatible with **Neo4j Community Edition** environments

## Directory Structure

```
src/memos/graph_dbs/
├── base.py      # Abstract interface: BaseGraphDB
├── factory.py   # Factory to instantiate GraphDB from config
└── neo4j.py     # Neo4jGraphDB: production implementation
```

## How to Use

```python
from memos.graph_dbs.factory import GraphStoreFactory
from memos.configs.graph_db import GraphDBConfigFactory

# Step 1: Build factory config
config = GraphDBConfigFactory(
    backend="neo4j",
    config={
        "uri": "bolt://localhost:7687",
        "user": "your_neo4j_user_name",
        "password": "your_password",
        "db_name": "memory_user1",
        "auto_create": True,
        "embedding_dimension": 768
    }
)

# Step 2: Instantiate the graph store
graph = GraphStoreFactory.from_config(config)

# Step 3: Add memory
graph.add_node(
    id="node-001",
    memory="Today I learned about retrieval-augmented generation.",
    metadata={"type": "WorkingMemory", "tags": ["RAG", "AI"], "timestamp": "2025-06-05", "sources": []}
)
```

## Pluggable Design

### Interface: `BaseGraphDB`

````
Function Introduction:
1.
Node Operations: Insert: add_node (Adds a single node) add_nodes_batch (Adds multiple nodes in batch) Query: get_node (Retrieves a single node) get_nodes (Retrieves multiple nodes) get_memory_count (Retrieves the count of nodes) node_not_exist (Checks if a node exists) search_by_embedding (Vector search supports adding filter conditions for filtering. For usage of the filter, refer to the function neo4j_example.example_complex_shared_db_search_filter for the complete method documentation.) Update: update_node (Updates a single node) Delete: delete_node (Deletes a single node) clear(deletes all associated nodes by the user_name attribute.) See neo4j_example.example_complex_shared_db_delete_memory for full method docs 2. Edge Operations: Insert: add_edge (Adds a triple/relation as a memory element) Query: get_edges (Retrieves multiple relations/edges) edge_exists (Checks if a relation/edge exists) get_children_with_embeddings (Retrieves a list of child nodes for the PARENT relation type) get_subgraph (Queries multi-hop nodes/retrieves a subgraph) Delete: delete_edge (Deletes a relation/edge) 3. Import/Export Operations: import_graph (Imports an entire graph from a serialized dictionary. Parameters: A dictionary containing all nodes and edges to load, format: {'nodes': [], 'edges': []}) export_graph (Exports all graph nodes and edges in a structured format, with pagination support) See src/memos/graph_dbs/base.py for full method docs. ```` ### Current Backend: | Backend | Status | File | | ------- | ------ | ---------- | | Neo4j | Stable | `neo4j.py` | ## Shared DB, Multi-Tenant Support By specifying the `user_name` field, MemOS can isolate memory graphs for multiple users in a single Neo4j database. 
Ideal for collaborative systems or multi-agent applications:

```python
config = GraphDBConfigFactory(
    backend="neo4j",
    config={
        "uri": "bolt://localhost:7687",
        "user": "neo4j",
        "password": "your_password",
        "db_name": "shared-graph",
        "user_name": "alice",
        "use_multi_db": False,
        "embedding_dimension": 768,
    },
)
```

User data is logically isolated via the `user_name` field. Filtering is handled automatically during reads, writes, and searches.

> **Note**: **Example? You bet.**
> No blah blah, just go check the code:
> `examples/basic_modules/neo4j_example.example_complex_shared_db(db_name="shared-traval-group-complex-new")`

## Neo4j Community Edition Support

New backend identifier: `neo4j-community`

Usage is similar to standard Neo4j, but Enterprise-only features are disabled:

- ❌ No support for `auto_create` databases
- ❌ No native vector indexes (an external vector store must be used; currently only Qdrant is supported)
- ✅ Enforces logical isolation via `user_name` (intended for cases where all users belong to the same business and do not require strong isolation)

Example configuration:

```python
config = GraphDBConfigFactory(
    backend="neo4j-community",
    config={
        "uri": "bolt://localhost:7687",
        "user": "neo4j",
        "password": "12345678",
        "db_name": "paper",
        "user_name": "bob",
        "auto_create": False,
        "embedding_dimension": 768,
        "use_multi_db": False,
        "vec_config": {
            "backend": "qdrant",
            "config": {
                "host": "localhost",
                "port": 6333,
                "collection_name": "neo4j_vec_db",
                "vector_dimension": 768,
                "distance_metric": "cosine"
            },
        },
    },
)
```

> **Note**: **Example? You bet.**
> No blah blah, just go check the code:
> `examples/basic_modules/neo4j_example.example_complex_shared_db(db_name="paper", community=True)`

## Extending

You can add support for any other graph engine (e.g., **TigerGraph**, **DGraph**, **Weaviate hybrid**) by:

1. Subclassing `BaseGraphDB`
2. Creating a config dataclass (e.g., `DgraphConfig`)
3.
Registering it in: * `GraphDBConfigFactory.backend_to_class` * `GraphStoreFactory.backend_to_class` See `src/memos/graph_dbs/neo4j.py` as a reference for implementation. --- # PolarDB Graph Database (/open_source/modules/memories/polardb_graph_db) ## Features - Complete graph database operations: node CRUD, edge management - Vector embedding search: semantic retrieval with IVFFlat index support - Connection pool management: automatic database connection management with high concurrency support - Multi-tenant isolation: supports both physical and logical isolation modes - JSONB property storage: flexible metadata storage - Batch operations: supports batch insertion of nodes and edges - Automatic timestamps: automatically maintains `created_at` and `updated_at` - SQL injection protection: built-in parameterized queries and string escaping ## Directory Structure ``` MemOS/ └── src/ └── memos/ ├── configs/ │ └── graph_db.py # PolarDBGraphDBConfig configuration class └── graph_dbs/ ├── base.py # BaseGraphDB abstract base class ├── factory.py # GraphDBFactory factory class └── polardb.py # PolarDBGraphDB implementation ``` ## Quick Start ### 1. Install Dependencies ```bash # Install psycopg2 driver (choose one) pip install psycopg2-binary # Recommended: pre-compiled version # or pip install psycopg2 # Requires PostgreSQL development libraries # Install MemOS pip install MemoryOS -U ``` ### 2. 
Configure PolarDB #### Method 1: Using Configuration File (Recommended) ```json { "graph_db_store": { "backend": "polardb", "config": { "host": "localhost", "port": 5432, "user": "postgres", "password": "your_password", "db_name": "memos_db", "user_name": "alice", "use_multi_db": true, "auto_create": false, "embedding_dimension": 1024, "maxconn": 100 } } } ``` #### Method 2: Code Initialization ```python from memos.configs.graph_db import PolarDBGraphDBConfig from memos.graph_dbs.polardb import PolarDBGraphDB # Create configuration config = PolarDBGraphDBConfig( host="localhost", port=5432, user="postgres", password="your_password", db_name="memos_db", user_name="alice", use_multi_db=True, embedding_dimension=1024, maxconn=100 ) # Initialize database graph_db = PolarDBGraphDB(config) ``` ### 3. Basic Operation Examples ```python # ======================================== # Step 1: Add Node # ======================================== node_id = graph_db.add_node( label="Memory", properties={ "content": "Python is a high-level programming language", "memory_type": "Knowledge", "tags": ["programming", "python"] }, embedding=[0.1, 0.2, 0.3, ...], # 1024-dimensional vector user_name="alice" ) print(f"✓ Node created: {node_id}") # ======================================== # Step 2: Update Node # ======================================== graph_db.update_node( id=node_id, fields={ "content": "Python is an interpreted, object-oriented high-level programming language", "updated": True }, user_name="alice" ) print("✓ Node updated") # ======================================== # Step 3: Create Relationship # ======================================== # First create a second node node_id_2 = graph_db.add_node( label="Memory", properties={ "content": "Django is a web framework for Python", "memory_type": "Knowledge" }, embedding=[0.15, 0.25, 0.35, ...], user_name="alice" ) # Create edge edge_id = graph_db.add_edge( source_id=node_id, target_id=node_id_2, edge_type="RELATED_TO", 
properties={ "relationship": "framework and language", "confidence": 0.95 }, user_name="alice" ) print(f"✓ Relationship created: {edge_id}") # ======================================== # Step 4: Vector Search # ======================================== query_embedding = [0.12, 0.22, 0.32, ...] # Query vector results = graph_db.search_by_embedding( embedding=query_embedding, top_k=5, memory_type="Knowledge", user_name="alice" ) print(f"\n🔍 Found {len(results)} similar nodes:") for node in results: print(f" - {node.get('content')} (similarity: {node.get('score', 'N/A')})") # ======================================== # Step 5: Delete Node # ======================================== graph_db.delete_node(id=node_id, user_name="alice") print(f"✓ Node {node_id} deleted") ``` ## Configuration Details ### PolarDBGraphDBConfig Parameters | Parameter | Type | Default | Required | Description | |------|------|--------|------|------| | `host` | str | - | ✓ | Database host address | | `port` | int | 5432 | ✗ | Database port | | `user` | str | - | ✓ | Database username | | `password` | str | - | ✓ | Database password | | `db_name` | str | - | ✓ | Target database name | | `user_name` | str | None | ✗ | Tenant identifier (for logical isolation) | | `use_multi_db` | bool | True | ✗ | Whether to use multi-database physical isolation | | `auto_create` | bool | False | ✗ | Whether to automatically create database | | `embedding_dimension` | int | 1024 | ✗ | Vector embedding dimension | | `maxconn` | int | 100 | ✗ | Maximum connections in connection pool | ### Multi-Tenant Mode Comparison | Feature | Physical Isolation (`use_multi_db=True`) | Logical Isolation (`use_multi_db=False`) | |------|-----------------------------------|-------------------------------------| | **Isolation Level** | Database level | Application layer tag filtering | | **Configuration Requirements** | `db_name` typically equals `user_name` | Must provide `user_name` | | **Performance** | Better (independent resources) 
| Good (shared resources) | | **Cost** | High (independent DB per tenant) | Low (shared database) | | **Use Cases** | Enterprise customers, high security requirements | SaaS multi-tenant, development testing | | **Data Migration** | Convenient (full database export) | Requires filtering by tags | ### Configuration Examples #### Example 1: Physical Isolation (Recommended for Enterprise) ```json { "graph_db_store": { "backend": "polardb", "config": { "host": "prod-polardb.example.com", "port": 5432, "user": "admin", "password": "secure_password", "db_name": "customer_001", "user_name": null, "use_multi_db": true, "auto_create": false, "embedding_dimension": 1536, "maxconn": 200 } } } ``` #### Example 2: Logical Isolation (Recommended for SaaS) ```json { "graph_db_store": { "backend": "polardb", "config": { "host": "shared-polardb.example.com", "port": 5432, "user": "app_user", "password": "app_password", "db_name": "shared_memos", "user_name": "tenant_alice", "use_multi_db": false, "auto_create": false, "embedding_dimension": 768, "maxconn": 50 } } } ``` ## Advanced Features ### 1. Batch Insert Nodes ```python # Batch add nodes (high performance) nodes_data = [ { "label": "Memory", "properties": {"content": f"Node {i}", "memory_type": "Test"}, "embedding": [0.1 * i] * 1024, } for i in range(100) ] node_ids = graph_db.add_nodes_batch( nodes=nodes_data, user_name="alice" ) print(f"✓ Batch created {len(node_ids)} nodes") ``` ### 2. 
Complex Query Examples ```python # Find memories of specific type and sort by time def get_recent_memories(graph_db, memory_type, limit=10): """Get recent memory nodes""" query = f""" SELECT * FROM "{graph_db.db_name}_graph"."Memory" WHERE properties->>'memory_type' = %s AND properties->>'user_name' = %s ORDER BY updated_at DESC LIMIT %s """ conn = graph_db._get_connection() try: with conn.cursor() as cursor: cursor.execute(query, [memory_type, "alice", limit]) results = cursor.fetchall() return results finally: graph_db._return_connection(conn) # Usage example recent = get_recent_memories(graph_db, "WorkingMemory", limit=5) print(f"Recent 5 working memories: {len(recent)} items") ``` ### 3. Vector Index Optimization ```python # Create or update vector index graph_db.create_index( label="Memory", vector_property="embedding", dimensions=1024, index_name="memory_vector_index" ) print("✓ Vector index optimized") ``` ### 4. Connection Pool Monitoring ```python # View connection pool status (for debugging only) import logging logging.basicConfig(level=logging.DEBUG) # Detailed logs will be output when acquiring connection conn = graph_db._get_connection() # [DEBUG] [_get_connection] Successfully acquired connection from pool graph_db._return_connection(conn) # [DEBUG] [_return_connection] Successfully returned connection to pool ``` ## BaseGraphDB Interface PolarDB implements all methods of the `BaseGraphDB` abstract class, ensuring interoperability with other graph database backends. 
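
To make the interoperability point concrete, here is a minimal sketch of backend-agnostic calling code. `MiniGraphDB` and `InMemoryGraphDB` are hypothetical stand-ins written for this illustration, not MemOS classes; the real `BaseGraphDB` defines many more methods, and its signatures may differ.

```python
import uuid
from abc import ABC, abstractmethod
from typing import Any, Optional


class MiniGraphDB(ABC):
    """Hypothetical, trimmed-down stand-in for an abstract graph interface."""

    @abstractmethod
    def add_node(self, properties: dict[str, Any], user_name: Optional[str] = None) -> str:
        """Store a node and return its id."""

    @abstractmethod
    def get_node(self, id: str, user_name: Optional[str] = None) -> Optional[dict[str, Any]]:
        """Return the node, or None if it is missing or owned by another user."""


class InMemoryGraphDB(MiniGraphDB):
    """Toy backend (assumption for this example): keeps nodes in a dict."""

    def __init__(self) -> None:
        self._nodes: dict[str, dict[str, Any]] = {}

    def add_node(self, properties, user_name=None):
        node_id = str(uuid.uuid4())
        self._nodes[node_id] = {**properties, "id": node_id, "user_name": user_name}
        return node_id

    def get_node(self, id, user_name=None):
        node = self._nodes.get(id)
        # Logical isolation: only the owning user_name can read the node.
        if node is None or node["user_name"] != user_name:
            return None
        return node


def remember(db: MiniGraphDB, content: str, user_name: str) -> str:
    """Depends only on the abstract interface, so any backend can be passed in."""
    return db.add_node({"content": content}, user_name=user_name)
```

Because `remember` only touches the abstract methods, swapping one backend for another should not require changes in calling code, which is the property the shared interface is meant to provide.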
### Core Methods | Method | Description | Parameters | |------|------|------| | `add_node()` | Add a single node | label, properties, embedding, user_name | | `add_nodes_batch()` | Batch add nodes | nodes, user_name | | `update_node()` | Update node properties | id, fields, user_name | | `delete_node()` | Delete node | id, user_name | | `delete_node_by_params()` | Delete nodes by conditions | params, user_name | | `add_edge()` | Create relationship | source_id, target_id, edge_type, properties, user_name | | `update_edge()` | Update relationship properties | edge_id, properties, user_name | | `delete_edge()` | Delete relationship | edge_id, user_name | | `search_by_embedding()` | Vector similarity search | embedding, top_k, memory_type, user_name | | `get_node()` | Get a single node | id, user_name | | `get_memory_count()` | Count nodes | memory_type, user_name | | `remove_oldest_memory()` | Clean old memories | memory_type, keep_latest, user_name | ### Complete Method Signature Examples ```python from typing import Any # Add node def add_node( self, label: str = "Memory", properties: dict[str, Any] | None = None, embedding: list[float] | None = None, user_name: str | None = None ) -> str: """Add a new node to the graph database""" pass # Vector search def search_by_embedding( self, embedding: list[float], top_k: int = 10, memory_type: str | None = None, user_name: str | None = None, filters: dict[str, Any] | None = None ) -> list[dict[str, Any]]: """Perform similarity search based on vector embedding""" pass # Batch operations def add_nodes_batch( self, nodes: list[dict[str, Any]], user_name: str | None = None ) -> list[str]: """Batch add multiple nodes""" pass ``` ## Extension Development Guide If you need to implement custom functionality based on PolarDB, you can inherit the `PolarDBGraphDB` class: ```python from memos.graph_dbs.polardb import PolarDBGraphDB from memos.configs.graph_db import PolarDBGraphDBConfig class CustomPolarDBGraphDB(PolarDBGraphDB): 
"""Custom PolarDB graph database implementation""" def __init__(self, config: PolarDBGraphDBConfig): super().__init__(config) # Custom initialization logic self.custom_index_created = False def create_custom_index(self): """Create custom index""" conn = self._get_connection() try: with conn.cursor() as cursor: cursor.execute(f""" CREATE INDEX IF NOT EXISTS idx_custom_field ON "{self.db_name}_graph"."Memory" ((properties->>'custom_field')); """) conn.commit() self.custom_index_created = True print("✓ Custom index created") except Exception as e: print(f"❌ Failed to create index: {e}") conn.rollback() finally: self._return_connection(conn) def search_by_custom_field(self, field_value: str): """Search based on custom field""" query = f""" SELECT * FROM "{self.db_name}_graph"."Memory" WHERE properties->>'custom_field' = %s """ conn = self._get_connection() try: with conn.cursor() as cursor: cursor.execute(query, [field_value]) results = cursor.fetchall() return results finally: self._return_connection(conn) # Use custom implementation config = PolarDBGraphDBConfig( host="localhost", port=5432, user="postgres", password="password", db_name="custom_db" ) custom_db = CustomPolarDBGraphDB(config) custom_db.create_custom_index() results = custom_db.search_by_custom_field("special_value") ``` ## Reference Resources - [Apache AGE Official Documentation](https://age.apache.org/) - [PostgreSQL Connection Pool Documentation](https://www.psycopg.org/docs/pool.html) - [PolarDB Official Documentation](https://www.alibabacloud.com/product/polardb) - [MemOS GitHub Repository](https://github.com/MemOS-AI/MemOS) ## Next Steps - Learn about using [Neo4j Graph Database](./neo4j_graph_db.md) - Check out [General Textual Memory](./general_textual_memory.md) configuration - Explore advanced features of [Tree Textual Memory](./tree_textual_memory.md) --- # Performance Tuning (/open_source/best_practice/performance_tuning) ## Embedding Optimization ```python fast_embedder = { "backend": 
"ollama", "config": { "model_name_or_path": "nomic-embed-text:latest" } } slow_embedder = { "backend": "sentence_transformer", "config": { "model_name_or_path": "nomic-ai/nomic-embed-text-v1.5" } } ``` ## Inference Speed ```python generation_config = { "max_new_tokens": 256, # Limit response length "temperature": 0.7, "do_sample": True } ``` ## System Resource Optimization ### Memory Capacity Limits ```python scheduler_config = { "memory_capacities": { "working_memory_capacity": 20, # Active context "user_memory_capacity": 500, # User storage "long_term_memory_capacity": 2000, # Domain knowledge "transformed_act_memory_capacity": 50 # KV cache items } } ``` ### Batch Processing ```python def batch_memory_operations(operations, batch_size=10): for i in range(0, len(operations), batch_size): batch = operations[i:i + batch_size] yield batch # Process in batches ``` --- # Network Workarounds (/open_source/best_practice/network_workarounds) ## **Downloading Huggingface Models** ### Mirror Site (HF-Mirror) To download Huggingface models using the mirror site, you can follow these steps: #### Install Dependencies Install the necessary dependencies by running: ```bash pip install -U huggingface_hub ``` #### Set Environment Variable Set the environment variable `HF_ENDPOINT` to `https://hf-mirror.com`. #### Download Models or Datasets Use huggingface-cli to download models or datasets. For example: - To download a model: ```bash huggingface-cli download --resume-download gpt2 --local-dir gpt2 ``` - To download a dataset: ``` huggingface-cli download --repo-type dataset --resume-download wikitext --local-dir wikitext ``` For more detailed instructions and additional methods, please refer to [this link](https://hf-mirror.com/). ### Alternative Sources You may still encounter limitations accessing some models in your regions. 
In such cases, you can use ModelScope:

#### Install ModelScope

Install the necessary dependencies by running:

```bash
pip install "modelscope[framework]"
```

#### Download Models or Datasets

Use `modelscope` to download models or datasets. For example:

- To download a model:

  ```bash
  modelscope download --model 'Qwen/Qwen2-7b' --local_dir 'path/to/dir'
  ```

- To download a dataset:

  ```bash
  modelscope download --dataset 'Tongyi-DataEngine/SA1B-Dense-Caption' --local_dir './local_dir'
  ```

For more detailed instructions and additional methods, please refer to the [official docs](https://modelscope.cn/docs/home).

## **Using Poetry**

### Network Errors During Installation

To address network errors when running `poetry install` in your region, you can follow these steps:

#### Update Configuration

Update the `pyproject.toml` file to use a mirror source by adding the following configuration:

```toml
[[tool.poetry.source]]
name = "mirrors"
url = "https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple/"
priority = "primary"
```

#### Reconfigure Poetry

Run the command `poetry lock` in the terminal to reconfigure Poetry with the new mirror source.

**Tips:** Be aware that `poetry lock` will modify both the `pyproject.toml` and `poetry.lock` files. To avoid committing redundant changes:

- Option 1: After a successful `poetry install`, revert to the git HEAD commit using `git reset --hard HEAD`.
- Option 2: When executing `git add`, exclude the `pyproject.toml` and `poetry.lock` files by specifying other files explicitly.

For future dependency management tasks such as adding packages, you can use the `poetry add` command:

```bash
poetry add <package-name>
```

Refer to the [Poetry CLI documentation](https://python-poetry.org/docs/cli/) for more commands and details.
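
Returning to the Hugging Face mirror steps earlier on this page: if you prefer to script them, the sketch below sets `HF_ENDPOINT` for a single child process and assembles the same `huggingface-cli download` commands shown above. The helper names `build_hf_download` and `run_hf_download` are hypothetical, introduced only for this example.

```python
import os
import subprocess
from typing import Optional


def build_hf_download(
    repo_id: str,
    local_dir: str,
    repo_type: Optional[str] = None,
    endpoint: str = "https://hf-mirror.com",
):
    """Return (env, argv) for a mirrored `huggingface-cli download` call."""
    # Copy the current environment and point the CLI at the mirror.
    env = dict(os.environ, HF_ENDPOINT=endpoint)
    argv = ["huggingface-cli", "download"]
    if repo_type:
        argv += ["--repo-type", repo_type]
    argv += ["--resume-download", repo_id, "--local-dir", local_dir]
    return env, argv


def run_hf_download(repo_id: str, local_dir: str, repo_type: Optional[str] = None) -> None:
    """Run the download; raises CalledProcessError if the CLI exits non-zero."""
    env, argv = build_hf_download(repo_id, local_dir, repo_type)
    subprocess.run(argv, env=env, check=True)
```

For example, `run_hf_download("wikitext", "wikitext", repo_type="dataset")` corresponds to the dataset command above. Setting the variable per process avoids changing your shell configuration.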
--- # Common Errors and Solutions (/open_source/best_practice/common_errors_solutions) ## Configuration Errors ### Missing Required Fields ```python # ✅ Always include required fields llm_config = { "backend": "openai", "config": { "api_key": "your-api-key", "model_name_or_path": "gpt-4" } } ``` ### Backend Mismatch ```python # ✅ KVCache requires HuggingFace backend kv_config = { "backend": "kv_cache", "config": { "extractor_llm": { "backend": "huggingface", "config": { "model_name_or_path": "Qwen/Qwen3-1.7B" } } } } ``` ## Service Connection Issues ```bash # Start required services as needed docker run -p 6333:6333 qdrant/qdrant ollama serve ``` ### Memory Loading Failures ```python try: mem_cube.load("memory_dir") except Exception: mem_cube = GeneralMemCube(config) mem_cube.dump("memory_dir") ``` ### GPU Out Of Memory ```python import os os.environ["CUDA_VISIBLE_DEVICES"] = "0" # Use smaller models if GPU memory is limited: Qwen/Qwen3-0.6B ``` ## User Management ```python # Register user first mos.register_mem_cube(cube_path="path", user_id="user_id", cube_id="cube_id") # Check if user exists try: user_id = mos.create_user(user_name="john", role=UserRole.USER) except ValueError: user = mos.user_manager.get_user_by_name("john") ``` --- # MemOS MCP Integration Guide (/open_source/best_practice/mcp_for_cozespace_and_tools) This guide helps you configure MemOS MCP service in platforms like Coze Space, enabling seamless integration between your agent and the memory system. ## Choose an MCP Deployment Method MemOS provides two MCP deployment options. Choose based on your needs: ### Use MemOS Cloud Service (Recommended) If you want to connect quickly without deploying your own server, MemOS official cloud service is recommended. **Advantages:** - ✅ Out of the box, no deployment required - ✅ High availability guarantees - ✅ Automatic scaling and maintenance - ✅ Supports multiple clients (Claude, Cursor, Cline, etc.) 
**How to configure:** Visit [MemOS Cloud MCP Configuration Guide](https://memos-docs.openmem.net/cn/mcp_agent/mcp/guide) for detailed instructions. Main steps: 1. Register and get an API Key in [MemOS API Console](https://memos-dashboard.openmem.net/cn/apikeys/) 2. Configure `@memtensor/memos-api-mcp` service in your MCP client 3. Set environment variables (`MEMOS_API_KEY`, `MEMOS_USER_ID`, `MEMOS_CHANNEL`) ### Deploy MCP Service Yourself If you need a private deployment or custom requirements, you can deploy MCP service on your own server. **Advantages:** - ✅ Fully private data - ✅ Configurable and customizable - ✅ Full control of the service - ✅ Suitable for internal enterprise use **Prerequisites:** - Python 3.9+ - Neo4j database (or another supported graph database) - HTTPS domain (required by platforms like Coze) Continue reading for detailed deployment steps. --- ## Self-Hosted MCP Service Configuration The content below applies to users who deploy MCP service themselves. ## Architecture Self-hosted MCP service uses the following architecture: ``` Client (Coze/Claude, etc.) ↓ [HTTPS] MCP Server (port 8002) ↓ [HTTP calls] Server API (port 8001) ↓ MemOS Core Service ``` **Component overview:** - **Server API**: provides REST APIs (`/product/*`) to handle memory CRUD - **MCP Server**: exposes the MCP protocol over HTTP and calls Server API to complete operations - **HTTPS reverse proxy**: platforms like Coze require HTTPS secure connections ### Step 1: Start Server API Server API is the backend for MCP service and provides actual memory management capabilities. ```bash cd /path/to/MemOS python src/memos/api/server_api.py --port 8001 ``` Verify whether Server API is running: ```bash curl http://localhost:8001/docs ``` If it returns the API documentation page, startup succeeded. > **Note**: **Configuration file** > Server API loads configuration automatically. Ensure Neo4j and other dependencies are configured correctly. 
You can refer to `examples/data/config/tree_config_shared_database.json` as an example configuration. ### Step 2: Start MCP HTTP Service Start MCP service in another terminal: ```bash cd /path/to/MemOS python examples/mem_mcp/simple_fastmcp_serve.py --transport http --port 8002 ``` After MCP service starts, it will show information similar to: ``` ╭──────────────────────────────────────────────────╮ │ MemOS MCP via Server API │ │ Transport: HTTP │ │ Server URL: http://localhost:8002/mcp │ ╰──────────────────────────────────────────────────╯ ``` **Environment variable configuration (optional):** You can configure the Server API address via a `.env` file or environment variables: ```bash export MEMOS_API_BASE_URL="http://localhost:8001/product" ``` > **Note**: **Tool list** > MCP service provides the following tools: > - `add_memory`: add memory > - `search_memories`: search memories > - `chat`: chat with the memory system > > For the full tool list, see `examples/mem_mcp/simple_fastmcp_serve.py` ### Step 3: Configure an HTTPS Reverse Proxy Platforms like Coze require HTTPS. You need to set up an HTTPS reverse proxy (e.g., Nginx) to forward traffic to MCP service. **Nginx configuration example:** ```nginx server { listen 443 ssl http2; server_name your-domain.com; ssl_certificate /path/to/cert.pem; ssl_certificate_key /path/to/key.pem; location /mcp { proxy_pass http://localhost:8002/mcp; proxy_http_version 1.1; proxy_set_header Upgrade $http_upgrade; proxy_set_header Connection "upgrade"; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_set_header X-Forwarded-Proto $scheme; # SSE support proxy_buffering off; proxy_cache off; } } ``` > **Warning**: **HTTPS certificate** > Make sure you use a valid SSL certificate. Self-signed certificates may not be accepted by platforms like Coze. You can use Let's Encrypt to obtain a free certificate. 
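
As a quick way to act on the certificate warning above, the following sketch uses only the Python standard library to check that a domain presents a certificate that passes default verification and to report how many days remain before it expires. The function names are hypothetical helpers for this example, and `your-domain.com` is the same placeholder used in the Nginx configuration.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's `notAfter` timestamp."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect with default verification and return the certificate's notAfter string.

    A self-signed or otherwise untrusted certificate raises
    ssl.SSLCertVerificationError during the handshake.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


if __name__ == "__main__":
    # Placeholder domain; replace with your own before running.
    host = "your-domain.com"
    try:
        remaining = days_until_expiry(fetch_not_after(host))
        print(f"{host}: certificate valid, {remaining:.0f} days remaining")
    except (ssl.SSLError, OSError) as exc:
        print(f"{host}: TLS check failed: {exc}")
```

A verification failure here is a reasonable hint that a platform such as Coze may also reject the certificate, although each platform applies its own trust rules.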
### Step 4: Test MCP Service Use the client test script to verify the service: ```bash cd /path/to/MemOS python examples/mem_mcp/simple_fastmcp_client.py ``` Example success output: ``` Working FastMCP Client ======================================== Connected to MCP server 1. Adding memory... Result: Memory added successfully 2. Searching memories... Result: [search result] 3. Chatting... Result: [AI response] ✓ All tests completed! ``` ## Configure MCP in Coze Space After the service is deployed, configure the MCP connection in Coze Space. ### Step 1: Open Coze Space and go to the tool configuration page ![Coze Space configuration page](https://statics.memtensor.com.cn/memos/coze_space_1.png) ### Step 2: Add a custom MCP tool Add a custom tool on the tool configuration page: ![Add a custom tool](https://statics.memtensor.com.cn/memos/coze_space_2.png) ### Step 3: Configure the MCP endpoint URL Configure the MCP endpoint URL with your HTTPS address: ``` https://your-domain.com/mcp ``` Available MCP tools: - **add_memory**: add a new memory - **search_memories**: search existing memories - **chat**: memory-based chat > **Note**: **Test connection** > After configuration, test whether MCP connection works in Coze. Ensure each tool can be called successfully. --- ## Use REST API Directly (Advanced) For scenarios that require more flexible integration, you can call Server API’s REST endpoints directly. ### Step 1: Start Server API ```bash cd /path/to/MemOS python src/memos/api/server_api.py --port 8001 ``` **Port notes** - Server API runs on port 8001 by default - Provides `/product/*` REST API endpoints ### Step 2: Configure custom tools in Coze IDE 1. In Coze, choose the "IDE plugin" creation method 2. 
Configure requests to your deployed Server API service

![Coze IDE plugin configuration](https://statics.memtensor.com.cn/memos/coze_tools_1.png)

### Step 3: Implement the add_memory tool

**Code example:** configure and publish the `add_memory` operation in the IDE:

![Configure add_memory operation](https://statics.memtensor.com.cn/memos/coze_tools_2.png)

The full code is as follows:

```python
import json
import requests
from runtime import Args
from typings.add_memory.add_memory import Input, Output


def handler(args: Args[Input]) -> Output:
    memory_content = args.input.memory_content
    user_id = args.input.user_id
    cube_id = args.input.cube_id

    # Call the Server API add endpoint
    url = "https://your-domain.com:8001/product/add"
    payload = json.dumps({
        "user_id": user_id,
        "messages": memory_content,  # Supports string or message array
        "writable_cube_ids": [cube_id] if cube_id else None
    })
    headers = {"Content-Type": "application/json"}

    response = requests.post(url, headers=headers, data=payload, timeout=30)
    response.raise_for_status()
    return response.json()
```

**Other tool implementations:** Similarly, implement the search and chat tools:

```python
# Search tool
def search_handler(args: Args[Input]) -> Output:
    url = "https://your-domain.com:8001/product/search"
    payload = json.dumps({
        "user_id": args.input.user_id,
        "query": args.input.query,
    })
    headers = {"Content-Type": "application/json"}

    response = requests.post(url, headers=headers, data=payload, timeout=30)
    response.raise_for_status()
    return response.json()


# Chat tool
def chat_handler(args: Args[Input]) -> Output:
    url = "https://your-domain.com:8001/product/chat/complete"
    payload = json.dumps({
        "user_id": args.input.user_id,
        "query": args.input.query
    })
    headers = {"Content-Type": "application/json"}

    # Send the already-serialized body with data=; passing it via json=
    # would encode the string a second time.
    response = requests.post(url, headers=headers, data=payload, timeout=30)
    response.raise_for_status()
    return response.json()
```

### Step 4: Publish and test tools

After publishing, you can view the plugin under "My Resources":
![Published plugin resource](https://statics.memtensor.com.cn/memos/coze_tools_3.png) ### Step 5: Integrate into agent workflow Add the plugin into the agent workflow: 1. Create a new agent or edit an existing agent 2. Add the published MemOS plugin to the tool list 3. Configure the workflow to call memory tools 4. Test memory write and retrieval functions --- ## FAQ ### Q1: MCP service cannot connect to Server API **Solution:** - Check whether Server API is running: `curl http://localhost:8001/docs` - Check whether environment variable `MEMOS_API_BASE_URL` is configured correctly - Check MCP service logs and confirm the call address ### Q2: Coze cannot connect to MCP service **Solution:** - Make sure you use HTTPS - Check whether the SSL certificate is valid - Test reverse proxy configuration: `curl https://your-domain.com/mcp` - Check firewall and security group settings ### Q3: Neo4j connection failed **Solution:** - Ensure Neo4j service is running - Check connection info in the configuration file (uri, user, password) - Refer to `examples/data/config/tree_config_shared_database.json` as an example configuration ### Q4: How to see complete API examples? 
**Reference files:**

- MCP server: `examples/mem_mcp/simple_fastmcp_serve.py`
- MCP client: `examples/mem_mcp/simple_fastmcp_client.py`
- API tests: `examples/api/server_router_api.py`

---

## Summary

With this guide, you can:

- ✅ Choose a suitable MCP deployment option (cloud or self-hosted)
- ✅ Complete the full MCP service deployment process
- ✅ Integrate MemOS memory features into platforms like Coze
- ✅ Integrate directly via REST API

No matter which option you choose, MemOS can provide your agent with powerful memory management.

> **Note**: **API parameter notes**
> - Use the standard Server API parameter format
> - `messages`: replaces the previous `memory_content`, supports string or message array
> - `writable_cube_ids`: replaces the previous `mem_cube_id`, supports multiple cubes
> - Server API runs on port 8001, and the path is `/product/add`
> - Ensure the request matches the MemOS Server API interface. You can refer to the example in `examples/api/server_router_api.py`

> **IDE configuration**
> In the IDE, you can customize tool parameters, return value formats, etc., ensuring consistency with the MemOS API. Use this method to implement the search endpoint and user registration endpoint, then click Publish.

### Publish and Use the Plugin

After publishing, you can view the plugin under "My Resources" and integrate it into the agent workflow as a plugin:

![Published plugin resource](https://statics.memtensor.com.cn/memos/coze_tools_3.png)

### Build an Agent and Test

After building the simplest agent, you can test memory operations:

1. Create a new agent
2. Add the published memory plugin
3. Configure the workflow
4. Test memory write and retrieval functions

With the above configuration, you can successfully integrate MemOS memory features in Coze Space and provide powerful memory capabilities for your agent.
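
The API parameter notes in this guide can be sketched as a small payload builder for `/product/add`. The field names come from the handler code shown earlier; the function name `build_add_payload` and the `Messages` alias are hypothetical, and the exact shape of each message in the array is an assumption that should be checked against `examples/api/server_router_api.py`.

```python
import json
from typing import Optional, Union

# A plain string or a message array, as the notes describe.
Messages = Union[str, list[dict]]


def build_add_payload(
    user_id: str,
    messages: Messages,
    cube_id: Optional[str] = None,
) -> dict:
    """Build the JSON body for POST /product/add."""
    return {
        "user_id": user_id,
        "messages": messages,
        # A list, because writable_cube_ids supports multiple cubes;
        # None when no cube is given, as in the handler above.
        "writable_cube_ids": [cube_id] if cube_id else None,
    }


if __name__ == "__main__":
    body = build_add_payload("user_123", "I prefer window seats.", cube_id="cube_a")
    print(json.dumps(body, indent=2))
```

Keeping payload construction in one function makes it easier to reuse the same shape across the add, search, and chat tools and to adjust it if the Server API changes.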
--- # Contributing to MemOS (/open_source/contribution/overview) - **First-time contributors:** Please start by reading the [Setting Up](./setting_up.md) guide to prepare your development environment. - **Ready to code?** The [Development Workflow](./development_workflow.md) guide will walk you through our process for submitting changes. - **Writing good commit messages:** See our [Commit Guidelines](./commit_guidelines.md). - **Contributing to documentation:** If you're helping us improve our docs, check out the [Writing Documentation](./writing_docs.md) guide. - **Adding or improving tests:** The [Writing Tests](./writing_tests.md) guide is for you. Your contributions make this project better! ✨ If you have any questions, feel free to open an issue or join the discussion or scan the QR codes below to connect with us on Discord or WeChat. QR Code --- # Setting Up Your Development Environment (/open_source/contribution/setting_up) #### Fork & Clone the Repository Set up the repository on your local machine: - Fork the repository on GitHub - Clone your fork to your local machine: ```bash git clone https://github.com/YOUR-USERNAME/MemOS.git cd MemOS ``` - Add the upstream repository as a remote: ```bash git remote add upstream https://github.com/MemTensor/MemOS.git ``` #### Prepare Development Dependencies Ensure the following are installed locally: - Git - Python 3.9+ - Make Verify Python: ```bash python3 --version ``` #### Install Poetry MemOS uses Poetry for dependency management. We recommend using the official installer: ```bash curl -sSL https://install.python-poetry.org | python3 - ``` Verify the installation: ```bash poetry --version ``` If you see `poetry: command not found`, please add the Poetry executable directory to your PATH as prompted by the installer, then restart your terminal and verify again. For more installation options, see the [official installation guide](https://python-poetry.org/docs/#installing-with-the-official-installer). 
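Before running the install step, it can help to confirm that the prerequisites listed above are on your PATH. The following is a minimal sketch using only the Python standard library; the function name `check_prerequisites` is hypothetical and not part of MemOS.

```python
import shutil
import sys


def check_prerequisites(tools=("git", "make", "poetry"), min_python=(3, 9)):
    """Map each prerequisite name to True (found) or False (missing)."""
    # shutil.which returns None when the executable is not on PATH.
    status = {tool: shutil.which(tool) is not None for tool in tools}
    # Compare (major, minor) of the running interpreter with the minimum.
    status["python"] = sys.version_info[:2] >= min_python
    return status


for name, ok in check_prerequisites().items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```

Note that the Python check applies to the interpreter running the script, which may differ from the one Poetry selects for the project.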
#### Install Dependencies and Set Up Pre-commit Hooks Install all project dependencies and development tools in the repository root: ```bash make install ``` Tip: - If you switch branches or dependencies change, you may need to **re-run `make install`** to keep the environment consistent. ### Understanding Memory Modules and Dependency Selection Before setting up the environment, we need to understand MemOS's memory module classification and their corresponding database dependencies. This will determine which components you need to install. #### Memory Types The MemOS memory system is mainly divided into two categories (identifiers for `backend` config are in parentheses): - **Textual Memory**: Fact-based memory, **you must choose one**. - `tree` (`tree_text`): Tree memory (recommended), highest structure. - `general` (`general_text`): General memory, based on vector retrieval. - `naive` (`naive_text`): Naive memory, no special dependencies (for testing only). - **Preference Memory**: User preferences, **optional**. - `pref`: Used for storing and retrieving user preferences. #### Database Dependency Matrix Different memory types require different database support: | Memory Type | Component Dependency | Note | | :--- | :--- | :--- | | **Tree** | **Graph Database** | Required. Supports Neo4j Desktop, Neo4j Community, PolarDB | | **General** | **Vector Database** | Required. Recommended to use Qdrant (or compatible Vector DB) | | **Naive** | None | No database installation required | | **Pref** | **Milvus** | If Preference Memory is enabled, Milvus must be installed | #### About Tree Memory and Graph Database Selection If you choose the most powerful `tree` memory (which is what most developers choose), you need to prepare a graph database. Currently, there are three options: - **Neo4j Desktop** (Recommended for PC): Install directly on PC, comes with full GUI and features, easiest solution. - **PolarDB**: Graph database service provided by Alibaba Cloud (paid). 
- **Neo4j Community**: Open source and free, suitable for server or Linux environments. **Special Note**: - If you use **Neo4j Desktop**, it usually handles graph data independently. - If you use **Neo4j Community**, **it does not have native vector retrieval capabilities**. Therefore, you need to pair it with an additional vector database (Qdrant) to supplement vector retrieval capabilities. #### Configuration Scheme for This Tutorial To help developers get started quickly, this tutorial will use the following configuration: - **Memory Type**: `tree` (`tree_text`) - **Graph Database**: **Neo4j Community** (requires you to download installer or use Docker) - **Vector Database**: **Qdrant (Local Mode)** Since Neo4j Community lacks vector capabilities, we will introduce Qdrant. To avoid running an extra Qdrant service (Docker), we will configure Qdrant to run in **Local Embedded Mode** (reading/writing local files directly), so you don't need to install an additional Qdrant server. If no external configuration is provided, the system will automatically create a local database. #### Create Configuration File For .env content, please refer to [env config](/open_source/getting_started/installation#2.-.env-content) under docker installation for quick configuration. For detailed .env configuration, please see [env configuration](/open_source/getting_started/rest_api_server/#running-locally). > **Note**: **Note** > The .env configuration file needs to be placed in the MemOS project root directory. ```bash cd MemOS touch .env ``` #### Configure Dockerfile > **Note**: **Note** > The Dockerfile is located in the docker directory. ```bash # Enter the docker directory cd docker ``` Includes fast mode and full mode, distinguishing between slim packages (arm and x86) and full packages (arm and x86). ```bash ● Slim Package: Simplifies heavy dependencies like nvidia, making the image lightweight for faster local deployment. 
- url: registry.cn-shanghai.aliyuncs.com/memtensor/memos-base:v1.0 - url: registry.cn-shanghai.aliyuncs.com/memtensor/memos-base-arm:v1.0 ● Full Package: Packages all MemOS dependencies into the image for full functionality. Can be built and started directly by configuring the Dockerfile. - url: registry.cn-shanghai.aliyuncs.com/memtensor/memos-full-base:v1.0.0 - url: registry.cn-shanghai.aliyuncs.com/memtensor/memos-full-base-arm:v1.0.0 ``` ```bash # Current example uses slim package url FROM registry.cn-shanghai.aliyuncs.com/memtensor/memos-base-arm:v1.0 WORKDIR /app ENV HF_ENDPOINT=https://hf-mirror.com ENV PYTHONPATH=/app/src COPY src/ ./src/ EXPOSE 8000 CMD ["uvicorn", "memos.api.server_api:app", "--host", "0.0.0.0", "--port", "8000", "--reload"] ``` #### Start Docker Client ```bash # If docker is not installed, please install the corresponding version from: https://www.docker.com/ # After installation, start docker via client or command line # Start docker via command line sudo systemctl start docker # After installation, check docker status docker ps # Check docker images (optional) docker images ``` #### Build and Start Service > **Note**: **Note** > Build commands are also executed in the docker directory. ```bash # In the docker directory docker compose up neo4j ``` #### Open New Terminal to Start Server ```bash cd MemOS make serve ``` --- # Development Workflow (/open_source/contribution/development_workflow) Follow these steps to contribute to the project. 
#### Sync with Upstream

If you've previously forked the repository, sync with the upstream changes:

```bash
git checkout dev        # switch to dev branch
git fetch upstream      # fetch latest changes from upstream
git pull upstream dev   # merge changes into your local dev branch
git push origin dev     # push changes to your fork
```

#### Create a Feature Branch

Create a new branch for your feature or fix:

```bash
git checkout -b feat/descriptive-name
```

#### Make Your Changes

Implement your feature, fix, or improvement in the appropriate files.

- For example, you might add a function in `src/memos/hello_world.py` and create corresponding tests in `tests/test_hello_world.py`.

#### Test Your Changes

Run the test suite to ensure your changes work correctly:

```bash
make test
```

#### Commit Your Changes

Before committing or creating a PR, rebase onto the latest `upstream/dev`:

```bash
git fetch upstream
git rebase upstream/dev   # Replay your feat branch on top of the latest dev
```

Follow the project's commit guidelines (see [Commit Guidelines](./commit_guidelines.md)) when committing your changes.

#### Push to Your Fork

Push your feature branch to your forked repository:

```bash
git push origin feat/descriptive-name
```

#### Create a Pull Request

Submit your changes for review:

- **Important:** Please create your pull request against
  - ✅ the `dev` branch of the upstream repository,
  - ❎ not the `main` branch of the upstream repository.
- Go to the original repository on GitHub
- Click on "Pull Requests"
- Click on "New Pull Request"
- Select `dev` as the base branch, and your branch as compare
- Fill in the PR description carefully.
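As an optional aid before pushing, a commit subject can be checked against the prefixes listed in the Commit Guidelines. This is a minimal sketch: the function name `is_conventional` is hypothetical, and the optional `(scope)` and `!` parts follow the Conventional Commits format rather than anything this project states explicitly.

```python
import re

# Prefixes taken from the project's Commit Guidelines.
ALLOWED_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore", "ci")

# type, optional "(scope)", optional "!", then ": " and a non-empty description.
_PATTERN = re.compile(r"^(?:%s)(?:\([^)]+\))?!?: \S.*$" % "|".join(ALLOWED_TYPES))


def is_conventional(subject):
    """Return True when the commit subject starts with an allowed prefix."""
    return bool(_PATTERN.match(subject))


print(is_conventional("feat: add user authentication"))
print(is_conventional("add user authentication"))
```

A check like this could be wired into a local Git hook, but the project's own pre-commit configuration remains the authoritative gate.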
--- # Commit Guidelines (/open_source/contribution/commit_guidelines) Please follow the [Conventional Commits](https://www.conventionalcommits.org/) format: - `feat:` for new features - `fix:` for bug fixes - `docs:` for documentation updates - `style:` for formatting changes (no code logic change) - `refactor:` for code refactoring - `test:` for adding or updating tests - `chore:` for other maintenance tasks - `ci:` for CI/CD or workflow related changes **Example:** `feat: add user authentication` --- # Documentation Writing Guidelines (/open_source/contribution/writing_docs) ## Creating New Documents ### Create Markdown File Create a new `.md` file in the `content/` directory or its subdirectories. Choose an appropriate location based on your content type. ### Add Frontmatter Add YAML frontmatter at the top of your file to provide metadata. The frontmatter supports the following fields: **Required Fields:** - `title` (string) - The document title that appears in navigation and page headers **Optional Fields:** - `desc` (string) - Brief description of the document content - `banner` (string) - URL to a banner image displayed at the top of the page - `links` (array) - Array of related links with labels, URLs, and icons ![Frontmatter Example](https://statics.memtensor.com.cn/memos/frontmatter.png) **Complete Frontmatter Example:** ```yaml --- title: MemOS Documentation desc: Welcome to the official documentation for MemOS – a Python package designed to empower large language models (LLMs) with advanced, modular memory capabilities. banner: https://statics.memtensor.com.cn/memos/memos-banner.gif links: - label: 'PyPI' to: https://pypi.org/project/MemoryOS/ target: _blank avatar: src: https://statics.memtensor.com.cn/icon/pypi.svg alt: PyPI logo - label: 'Open Source' to: https://github.com/MemTensor/MemOS target: _blank icon: i-simple-icons-github --- ``` ### Write Content Use Markdown syntax and MDC components to write your documentation content. 
Take advantage of the available components to create engaging and well-structured content. ### Update Navigation Add the new document to the `nav` section in `content/settings.yml` to make it accessible through the site navigation. ### Merge to Main Branch Once changes are merged into the `main` branch, the documentation will be automatically updated and deployed. ## Component Examples This project uses Nuxt Content's MDC (Markdown Components) syntax, which supports using Vue components within Markdown. These components help create engaging, well-structured documentation with consistent styling and improved user experience. ### Image References When adding images to your documentation, you can use several methods to reference them: #### Local Assets with Base64Image Component For images stored in the `public/assets` directory, use the `Base64Image` component. This component provides better performance by embedding the image directly in the page: ```mdc :Base64Image{src="/assets/memos-architecture.png" alt="MemOS Architecture"} ``` #### Remote Images with Markdown Syntax For remote images (hosted on external servers), use standard Markdown image syntax: ```markdown ![MemOS Architecture](https://statics.memtensor.com.cn/memos/memos-architecture.png) ``` ### Steps Use `steps` to create step-by-step guides from document headings. The `steps` component automatically numbers headings, creating a numbered guide for processes and tutorials. 
---
class: "[&>div]:*:w-full"
---

#### Install MemOS

```bash
pip install MemoryOS
```

#### Create a Minimal Config

For this Quick Start, we'll use the built-in GeneralTextMemory.

```python
from memos.configs.mem_os import MOSConfig

# init MOSConfig
mos_config = MOSConfig.from_json_file("examples/data/config/simple_memos_config.json")
```

#### Create a User & Register a MemCube

```python
import uuid

from memos.mem_os.main import MOS

mos = MOS(mos_config)

# Generate a unique user ID
user_id = str(uuid.uuid4())

# Create the user
mos.create_user(user_id=user_id)
```

:::

#code

````mdc
#### Install MemOS

```bash
pip install MemoryOS
```

#### Create a Minimal Config

For this Quick Start, we'll use the built-in GeneralTextMemory.
```python from memos.configs.mem_os import MOSConfig # init MOSConfig mos_config = MOSConfig.from_json_file("examples/data/config/simple_memos_config.json") ``` #### Create a User & Register a MemCube ```python import uuid from memos.mem_os.main import MOS mos = MOS(mos_config) # Generate a unique user ID user_id = str(uuid.uuid4()) # Create the user mos.create_user(user_id=user_id) ``` ```` ### Accordion Use `accordion` and `accordion-item` to create collapsible content sections. Accordions are useful for organizing FAQs, expandable details, or grouped information in an interactive way. --- class: "[&>div]:*:my-0" --- :accordion-item --- icon: i-lucide-circle-help label: Is MemOS compatible with LLMs accessed via API? --- Yes. MemOS is designed to be as compatible as possible with various types of models. However, it's important to note that if you're using API-based models, activation and parametric memories cannot be utilized. :::: : --- icon: i-lucide-circle-help label: How does MemOS improve the effectiveness of large language model applications? --- MemOS enhances large language model applications by providing structured, persistent memory with intelligent scheduling, long-term knowledge retention, and KV cache for fast inference. It supports fine-grained access control and user isolation, ensuring memory security in multi-user environments. Its modular architecture allows seamless integration of new memory types, LLMs, and storage backends, making it adaptable to a wide range of intelligent applications.: : MemOS open-source is free.: ::: #code ```mdc Yes. MemOS is designed to be as compatible as possible with various types of models. However, it's important to note that if you're using API-based models, activation and parametric memories cannot be utilized. MemOS enhances large language model applications by providing structured, persistent memory with intelligent scheduling, long-term knowledge retention, and KV cache for fast inference. 
It supports fine-grained access control and user isolation, ensuring memory security in multi-user environments. Its modular architecture allows seamless integration of new memory types, LLMs, and storage backends, making it adaptable to a wide range of intelligent applications.

MemOS open-source is free.
```

### Badge

Use badge to display status indicators or labels. Badges are great for highlighting version numbers, statuses, or categories within your content.

---
label: Preview
---

**v1.0.0**

#code

```mdc
**v1.0.0**
```

### Callout

Use callout to emphasize important contextual information. Callouts draw attention to notes, tips, warnings, or cautions, making key information stand out. Customize with `icon` and `color` props or use `note`, `tip`, `warning`, `caution` shortcuts for pre-defined semantic styles.

---
class: "[&>div]:*:my-0 [&>div]:*:w-full"
---

This is a `callout` with full **markdown** support.

#code

```mdc
This is a `callout` with full **markdown** support.
```

> **Note**: Basic note content

> **Note**: Note with link - click to navigate to quick start guide

> **Note**: Note with custom icon - learn more about MemCube

> **Tip**: Here's a helpful suggestion.

> **Warning**: Be careful with this action as it might have unexpected results.

> **Caution**: This action cannot be undone.

#code

```mdc
> **Note**: Basic note content

> **Note**: Note with link - click to navigate to quick start guide

> **Note**: Note with custom icon - learn more about MemCube

> **Tip**: Here's a helpful suggestion.

> **Warning**: Be careful with this action as it might have unexpected results.

> **Caution**: This action cannot be undone.
```

### Card

Use `card` to highlight content blocks. Cards are useful for showcasing features, resources, or related information in visually distinct and interactive containers. Customize with `title`, `icon`, and `color` props. Cards can also act as links through link properties such as `to` and `target`.
--- class: "[&>div]:*:my-0 [&>div]:*:w-full" --- - [Open Source](https://github.com/MemTensor/MemOS): Use our open-source version #code ```mdc --- title: Open Source icon: i-simple-icons-github to: https://github.com/MemTensor/MemOS target: _blank --- Use our open-source version ``` ### CardGroup Use `card-group` to arrange cards in a grid layout. `card-group` is ideal for displaying collections of cards in a structured, visually appealing, and responsive grid. :- [Minimal Pipeline](/open_source/getting_started/examples#example-1-minimal-pipeline): The smallest working pipeline — add, search, update and dump plaintext memories.: :- [TreeTextMemory Only](/open_source/getting_started/examples#example-2-treetextmemory-only): Use Neo4j-backed hierarchical memory to build structured, multi-hop knowledge graphs.: :- [KVCacheMemory Only](/open_source/getting_started/examples#example-3-kvcachememory-only): Speed up sessions with short-term KV cache for fast context injection.: :- [Hybrid TreeText + KVCache](/open_source/getting_started/examples#example-4-hybrid): Combine explainable graph memory with fast KV caching in a single MemCube.: #code ```mdc - [Minimal Pipeline](/open_source/getting_started/examples#example-1-minimal-pipeline): The smallest working pipeline — add, search, update and dump plaintext memories. - [TreeTextMemory Only](/open_source/getting_started/examples#example-2-treetextmemory-only): Use Neo4j-backed hierarchical memory to build structured, multi-hop knowledge graphs. - [KVCacheMemory Only](/open_source/getting_started/examples#example-3-kvcachememory-only): Speed up sessions with short-term KV cache for fast context injection. - [Hybrid TreeText + KVCache](/open_source/getting_started/examples#example-4-hybrid): Combine explainable graph memory with fast KV caching in a single MemCube. 
```

## Navigation Icons

When adding navigation entries in `content/settings.yml`, you can include icons using the syntax `(ri:icon-name)`:

```yaml
- "(ri:home-line) Home": overview.md
- "(ri:team-line) User Management": modules/mos/users.md
- "(ri:flask-line) Writing Tests": contribution/writing_tests.md
```

Available icons can be found at: [https://icones.js.org/](https://icones.js.org/)

## Local Preview

To preview the documentation locally, run the following commands from the project root:

```bash
# Make sure to install the dependencies first:
pnpm install
```

```bash
pnpm dev
```

The second command starts a local web server, usually accessible at `http://127.0.0.1:3000`.

## Learn More

### Nuxt Content and Typography

This project uses Nuxt Content and supports rich Typography components and styles. To learn more about available components and customization options, please refer to:

- [Nuxt UI Typography Documentation](https://ui.nuxt.com/getting-started/typography)

## Best Practices

> **Note**: **Documentation Writing Tips**
>
> 1. **Keep document structure clear**: Use appropriate heading levels to organize content logically
> 2. **Use components wisely**: Use note, card, and other components to improve readability and engagement
> 3. **Code examples**: Provide clear code examples for technical documentation with proper syntax highlighting
> 4. **Icon usage**: Use appropriate icons in navigation to enhance user experience and visual hierarchy

Remember to test your documentation locally before submitting. Use `pnpm dev` to preview your changes and ensure all components render correctly.

---

# How to Write Unit Tests (/open_source/contribution/writing_tests)

## Writing a Test

1. Create a new Python file in the `tests/` directory. The filename should start with `test_`.
2. Inside the file, create functions whose names start with `test_`.
3. Use the `assert` statement to check for expected outcomes.
Here is a basic example: ```python # tests/test_example.py def test_addition(): assert 1 + 1 == 2 ``` ## Running Tests To run all the tests, execute the following command from the root of the project: ```bash make test ``` This will discover and run all the tests in the `tests/` directory. ## Advanced Techniques Pytest has many advanced features, such as fixtures and mocking. ### Fixtures Fixtures are functions that can provide data or set up state for your tests. They are defined using the `@pytest.fixture` decorator. ### Mocking Mocking is used to replace parts of your system with mock objects. This is useful for isolating the code you are testing. The `unittest.mock` library is commonly used for this, often with the `patch` function. For an example of mocking, see `tests/test_hello_world.py`. --- # Usage Examples (/self_developed_model/extraction_usage_example) MemOS exposes a memory extraction API powered by the in-house **memos-extractor-0.6b** model. Pass conversation turns in and get fact and preference memories in one call. Request/response fields and OpenAPI: [Extract Memory](/api_docs/core/extract_memory). Auth, base URL, and calling conventions match [MemOS Cloud Quick Start](/memos_cloud/getting_started/quick_start). ## When to use memory extraction The extraction API fits when you need: - **Lightweight extraction**: Structured memories from dialogue without running the full add/message pipeline. - **Low latency at high QPS**: A 0.6B in-house model tuned for fast, frequent calls. - **Flexible control**: Request fact memories, preferences, or both via `extraction_types`. ## How it works ![Memory extraction flow](https://cdn.memtensor.com.cn/img/1775554708977_az90s6_compressed.png) The end-to-end flow is: 1. **Data input** You send raw dialogue as `messages` with `role` and `content` on each item. 2. **Format & clean** Content is normalized to a standard shape and the dialogue language is detected. 3. 
**Task & language selection** Using `extraction_types` and the detected language, the pipeline selects a branch: - **Memory Reader**: fact memories - **Explicit Preference**: stated preferences - **Implicit Preference**: inferred preferences 4. **Prompt build** The prompt template for the chosen branch is assembled into the final inference request. 5. **API call** The request is sent to **memos-extractor-0.6b** in the agreed format. 6. **Final results** The model returns structured fact and/or preference lists, depending on what you asked to extract. ## Get started ```python [Facts + preferences] import os, json, requests os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1" data = { "messages": [ {"role": "system", "content": "Extract key memories from the dialogue."}, {"role": "user", "content": "I'm Alex, 28, backend dev in Hangzhou, I play badminton."}, {"role": "assistant", "content": "Hi Alex!"}, {"role": "user", "content": "Keep replies short, not too wordy."}, ] } headers = {"Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}"} url = f"{os.environ['MEMOS_BASE_URL']}/extract/memory" res = requests.post(url, headers=headers, data=json.dumps(data)) print(res.json()) ``` ```python [Facts only] import os, json, requests os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1" data = { "messages": [{"role": "user", "content": "Flying to Beijing next Wed, staying at Ji Hotel Chaoyang."}], "extraction_types": ["memory"], } headers = {"Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}"} url = f"{os.environ['MEMOS_BASE_URL']}/extract/memory" res = requests.post(url, headers=headers, data=json.dumps(data)) print(res.json()) ``` ```python [Preferences only] import os, json, requests os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = 
"https://memos.memtensor.cn/api/openmem/v1" data = { "messages": [ {"role": "user", "content": "Use Markdown for docs, code blocks should have syntax highlighting."}, {"role": "assistant", "content": "Got it."}, {"role": "user", "content": "Also go easy on emoji."}, ], "extraction_types": ["preference"], } headers = {"Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}"} url = f"{os.environ['MEMOS_BASE_URL']}/extract/memory" res = requests.post(url, headers=headers, data=json.dumps(data)) print(res.json()) ``` ### Sample responses ```json [Facts + preferences] { "code": 0, "message": "ok", "data": { "success": true, "memory_detail_list": [ { "memory_key": "User profile and job", "memory_value": "User Alex, 28, backend developer in Hangzhou, plays badminton.", "memory_type": "UserMemory", "tags": ["person", "job", "location", "hobby"] } ], "preference_detail_list": [ { "preference": "Wants assistant replies to stay concise and not overly verbose.", "reasoning": "User explicitly asked to keep responses short and not too wordy.", "preference_type": "explicit_preference" } ] } } ``` ```json [Facts only] { "code": 0, "message": "ok", "data": { "success": true, "memory_detail_list": [ { "memory_key": "Travel and stay", "memory_value": "User flies to Beijing next Wednesday on business and plans to stay at Ji Hotel in Chaoyang.", "memory_type": "LongTermMemory", "tags": ["travel", "trip", "hotel"] } ] } } ``` ```json [Preferences only] { "code": 0, "message": "ok", "data": { "success": true, "preference_detail_list": [ { "preference": "Prefers Markdown documents with syntax-highlighted code blocks.", "reasoning": "User clearly asked for Markdown and highlighted code blocks.", "preference_type": "explicit_preference" }, { "preference": "Use fewer emoji.", "reasoning": "User directly asked to avoid too many emoji.", "preference_type": "explicit_preference" } ] } } ``` ## Limits - Request size: up to **8,000 tokens** for input. 
- **Synchronous only** today: the API returns when extraction finishes.
- **Text-only dialogue**: each `messages` item only supports `role` and `content`. **No multimodal** input or multimodal memory extraction through this API.

## Compared to add/message

| Dimension | Extract Memory | add/message |
| --- | --- | --- |
| Core behavior | Extracts memories from dialogue; returns results only | Writes the dialogue and extracts/stores memories |
| Storage | ❌ Does not write to the MemOS memory store | ✅ Writes into the MemOS memory store |
| Model | In-house 0.6B extractor, low latency | MemOS built-in pipeline models |
| Async | Not supported | ✅ Supported |
| Preferences | ✅ Explicit + implicit | ✅ Supported |
| Tool / skill memories | ❌ Not supported | ✅ Supported |
| Typical use | Offline analysis / pre-processing / QA | Full conversational memory lifecycle |

---

# Usage Examples (/self_developed_model/reranker_usage_example)

MemOS provides a memory reranking API based on the **memos-reranker** model series: a lightweight 0.6B version and an enhanced 4B version, both post-trained from a qwen-reranker base model. Pass a user query and a list of candidate memories, and the API reranks them by relevance in one call.

Request/response fields and OpenAPI: [Rerank Memory](/api_docs/core/rerank). Auth, base URL, and calling conventions match [MemOS Cloud Quick Start](/memos_cloud/getting_started/quick_start).

## When to use memory reranking

The reranking API fits when you need:

- **Memory recall optimization**: After retrieving a large number of candidate memories, rerank them to select the ones most relevant to the current query and improve the quality of context injection.
- **Low latency at high QPS**: The 0.6B model suits latency-sensitive and frequently invoked business scenarios.
- **Flexible sorting control**: Supports custom candidate document lists, can be used with any retrieval system, and does not rely on the MemOS memory store. Do not call the reranking API directly in these cases: - You do not have candidate documents yet and only want to search memory content. Call [Search Memory](/memos_cloud/mem_operations/search_memory) first. - You want to write content into memory. The reranking API does not write to the MemOS memory store; use [Add Message](/memos_cloud/mem_operations/add_message) instead. - You want to rank a full long document. Retrieve, split, or truncate candidate content first, then pass shorter candidate snippets to `documents`. ## How it works The memory reranking API and interaction with the model are shown in the figure below: ![Memory Reranking Process](https://cdn.memtensor.com.cn/img/1776755224177_y8jat9_compressed.png) The end-to-end flow of the reranking model is as follows: 1. **Query Input** Developers pass in the user query (`query`) and the candidate memory document list (`documents`). 2. **Encoding & Representation** After model encoding, relevance scores are output. 3. **Relevance Scoring** The relevance scores are mainly divided into 5 stages as shown in the figure. Developers can set thresholds according to actual scenarios. ## Get started ```python [Basic reranking] import os import requests import json # Replace with your MemOS API Key os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1" data = { # Available models: memos-reranker-0.6b (lightweight) or memos-reranker-4b (enhanced) "model": "memos-reranker-0.6b", "query": "Any liquor recommendations for me?", "documents": [ "User prefers Jiangxiang-flavored baijiu, like Moutai.", "I don't drink alcohol." 
] } headers = { "Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}" } url = f"{os.environ['MEMOS_BASE_URL']}/rerank" res = requests.post(url=url, headers=headers, data=json.dumps(data)) print(f"result: {res.json()}") ``` ```python [Reranking combined with memory retrieval] import os import requests import json # Replace with your MemOS API Key os.environ["MEMOS_API_KEY"] = "YOUR_API_KEY" os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1" data = { # Available models: memos-reranker-0.6b (lightweight) or memos-reranker-4b (enhanced) "model": "memos-reranker-0.6b", "query": "What are the user's hobbies?", "top_n": 3, "documents": [ "User likes playing badminton.", "User is a backend developer in Hangzhou.", "User prefers concise replies.", "User prefers Jiangxiang-flavored baijiu.", "User is going on a business trip to Beijing next Wednesday." ] } headers = { "Content-Type": "application/json", "Authorization": f"Token {os.environ['MEMOS_API_KEY']}" } url = f"{os.environ['MEMOS_BASE_URL']}/rerank" res = requests.post(url=url, headers=headers, data=json.dumps(data)) print(f"result: {res.json()}") ``` ## Limits - `query` is required. Use a clear and concise current question. - `documents` is required and must be a non-empty string array. The total token limit across all candidate documents is **8k**. - `top_n` is optional and returns the top N most relevant results. If omitted, all results are returned. - `model` is optional and supports `memos-reranker-0.6b` and `memos-reranker-4b`. - The API currently supports **synchronous mode only**. Results are returned once reranking is complete. 
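The limits above can be checked on the client side before a request is sent. The following is a minimal sketch, not part of the MemOS SDK: the function name `validate_rerank_request` is hypothetical, treating `top_n` as a positive integer is an assumption, and the 8k token limit is not checked because the tokenizer is not specified here.

```python
# Model names listed in the limits above.
ALLOWED_MODELS = ("memos-reranker-0.6b", "memos-reranker-4b")


def validate_rerank_request(data):
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []

    query = data.get("query")
    if not isinstance(query, str) or not query.strip():
        problems.append("query must be a non-empty string")

    documents = data.get("documents")
    if (
        not isinstance(documents, list)
        or not documents
        or not all(isinstance(doc, str) for doc in documents)
    ):
        problems.append("documents must be a non-empty list of strings")

    # Assumption: top_n, when present, should be a positive integer.
    top_n = data.get("top_n")
    if top_n is not None and (
        isinstance(top_n, bool) or not isinstance(top_n, int) or top_n < 1
    ):
        problems.append("top_n must be a positive integer when given")

    model = data.get("model")
    if model is not None and model not in ALLOWED_MODELS:
        problems.append("model must be one of " + ", ".join(ALLOWED_MODELS))

    return problems


print(validate_rerank_request({"query": "hobbies?", "documents": []}))
```

Catching these cases locally avoids a round trip for malformed payloads; the server's own validation remains authoritative.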
## Common Errors and Troubleshooting | Error Code | Common Cause | How to Fix | | --- | --- | --- | | `40000` | The request body structure is invalid, or a field type is incorrect | Check whether `query` is a string and whether `documents` is a string array | | `40002` / `40003` | A required field is empty, or `documents` is empty | Provide `query` and a non-empty `documents` array | | `40309` | Token usage exceeds the per-time-window limit | Reduce the number and length of candidate documents, lower concurrency, and retry in batches | | `50000` | Internal server error | Retry later. If it persists, contact support | ## Compared to Embedding Retrieval | Dimension | Reranking API | Embedding Retrieval | | --- | --- | --- | | Core behavior | Precision ranking of candidate docs, outputting relevance scores | Semantic similarity recall, fast coarse filtering | | Storage | ❌ Does not write to the MemOS memory store | ❌ Does not write to the MemOS memory store | | Model | 0.6B/4B reranking models | Embedding model | | Precision | ✅ High (cross-encoding, query-doc interaction) | General (dual-tower encoding, independent representation) | | Speed | Slower (requires pair-by-pair computation) | ✅ Fast (vector approximate retrieval) | | Async | Not supported | Not supported | | Typical use | Post-retrieval precision ranking / Memory quality assessment | Fast recall from massive memory store | --- # Cloud Plugin vs Local Plugin (/openclaw/plugin_compare) ## Overview ### Cloud Plugin Stores memories in **MemOS Cloud**. After installing the OpenClaw cloud plugin, a single MemOS Cloud API Key is all you need to get started. It supports multi-agent memory sharing across devices, and benchmarks show up to **72% reduction in Token usage** — ideal for quick setup, cross-device collaboration, and production use. ### Local Plugin The new local plugin is `@memtensor/memos-local-plugin`: a **local-first memory core shared by OpenClaw and Hermes**. 
It stores data in local SQLite and evolves it into four layers: L1 Trace, L2 Policy, L3 World Model, and callable Skills. With feedback-driven self-evolution, three-tier retrieval, and decision repair, the agent accumulates reusable experience on your own machine. It is best for developers who care most about privacy, local deployment, and observability. --- ## Core Differences | Comparison Dimension | ☁️ MemOS Cloud Plugin | 🖥️ MemOS Local Plugin | | --- | --- | --- | | 💾 **Data Storage & Privacy** | **Cloud storage**: Memory data is stored in MemOS Cloud, making cross-device and multi-instance sharing easy. | **Local storage**: Each agent has its own runtime home. OpenClaw defaults to `~/.openclaw/memos-plugin/`, and Hermes defaults to `~/.hermes/memos-plugin/`. SQLite, skill packages, logs, and config all stay on the local machine. | | 🤖 **Agent Support** | Built for the OpenClaw cloud plugin, backed by MemOS Cloud as the unified memory service. | One shared core supports both OpenClaw and Hermes: OpenClaw integrates through an in-process TypeScript plugin; Hermes integrates through a Python Provider that talks to the Node core over JSON-RPC. | | 🔑 **API & Model Config** | Uses a MemOS Cloud API Key. Memory processing, retrieval, and evolution are handled by the cloud service. | Uses the Memory Viewer's Settings panel for model and team-sharing configuration. Embeddings can use the local provider by default or OpenAI-compatible, Gemini, Cohere, Voyage, and Mistral providers. OpenClaw can inherit the host model; Hermes can configure an LLM provider and API Key in the panel. | | 🔍 **Retrieval Capability** | Cloud-based semantic vector retrieval + graph retrieval, optimized by the service. | Three-tier retrieval: Tier 1 Skill, Tier 2 Trace/Episode, and Tier 3 World Model. It combines vector, FTS5, keyword pattern, and error-signature channels, then uses RRF + MMR for relevance and diversity. 
| | 🧠 **Memory Evolution** | Automatically handled by cloud services: written memories are structured, deduplicated, and corrected in natural language. | Local Reflect2Evolve pipeline: conversations and tool calls become L1 Traces, cross-task patterns become L2 Policies, policies roll up into L3 World Models, and high-value strategies crystallize into callable Skills with active / retired lifecycle states. | | 🛠️ **Decision Repair** | Mainly relies on cloud retrieval to bring back more relevant memory and reduce repeated context. | Tool failures, negative feedback, and task outcomes enter the feedback channel. Failure patterns can trigger decision repair, injecting corrective context into the next turn so the agent avoids repeating the same mistake. | | 👥 **Multi-Agent & Sharing** | Supports multi-agent scenarios and cross-device sharing, making it suitable for teams. | Isolated by default: OpenClaw and Hermes have separate databases and viewers. Optional Hub sharing can publish locally crystallized Skills and optional trace excerpts inside a LAN / VPN; hub failures degrade back to local-only mode. | | 👀 **Visualization & Observability** | Managed through the MemOS Cloud Dashboard for API Key and cloud memory capabilities. | Includes a local Viewer with Overview, Memories, Tasks, Policies, World Models, Skills, Analytics, Logs, Import, Settings, and Help pages. HTTP + SSE streams expose events, logs, retrieval, skills, and health status in real time. | | 🛠️ **Deployment & Configuration** | **Very simple**: Done in 3 steps (install plugin, get API Key, configure env vars), mainly relying on cloud services. | **Very simple**: Installation and upgrades are both one command. The installer auto-detects installed OpenClaw / Hermes agents, installs `@memtensor/memos-local-plugin`, creates runtime folders, and restarts the target runtime. | --- ## Quick Install ### Cloud Plugin (3 steps) 1. 
**Install the plugin** ```bash openclaw plugins install @memtensor/memos-cloud-openclaw-plugin@latest ``` 2. **Get and configure API Key** Get your API Key: [MemOS Cloud Dashboard](https://memos-dashboard.openmem.net/apikeys/) ```bash mkdir -p ~/.openclaw && echo "MEMOS_API_KEY=mpg-..." > ~/.openclaw/.env ``` 3. **Restart the gateway** ```bash openclaw gateway restart ``` **Manually update the plugin**: ```bash openclaw plugins update @memtensor/memos-cloud-openclaw-plugin@latest openclaw gateway restart ``` > For more details, see the [OpenClaw Cloud Plugin documentation](/openclaw/guide#quick-start). ### Local Plugin (one command) ```bash # Install the plugin curl -fsSL https://raw.githubusercontent.com/MemTensor/MemOS/main/apps/memos-local-plugin/install.sh | bash ``` Installation and upgrades use the same command. The installer auto-detects whether OpenClaw and/or Hermes are installed. In an interactive terminal, it asks which agent to install for; in non-interactive environments, it installs for the detected agent(s). | Agent | Code directory | Data and config directory | Viewer | | --- | --- | --- | --- | | OpenClaw | `~/.openclaw/plugins/memos-local-plugin/` | `~/.openclaw/memos-plugin/` | `http://127.0.0.1:18799` | | Hermes | `~/.hermes/plugins/memos-local-plugin/` | `~/.hermes/memos-plugin/` | `http://127.0.0.1:18800` | > Upgrading or uninstalling plugin code does not delete existing local data, skill packages, or logs. OpenClaw and Hermes each run their own Viewer; there is no shared port or read-only peer view. > > Configure models, team sharing, and general options from the Memory Viewer for the target agent: OpenClaw defaults to `http://127.0.0.1:18799`, and Hermes defaults to `http://127.0.0.1:18800`. 
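
If you script against the local plugin, for example to back up data or open the right Viewer, the default locations in the table above can be captured in a small lookup. This is an illustrative sketch only: `local_plugin_layout` is a hypothetical helper, and it covers just the documented defaults, not any custom configuration.

```python
from pathlib import Path
from typing import Optional

# Defaults copied from the table above (paths relative to the home directory)
_DEFAULTS = {
    "openclaw": {
        "code_dir": ".openclaw/plugins/memos-local-plugin",
        "data_dir": ".openclaw/memos-plugin",
        "viewer_url": "http://127.0.0.1:18799",
    },
    "hermes": {
        "code_dir": ".hermes/plugins/memos-local-plugin",
        "data_dir": ".hermes/memos-plugin",
        "viewer_url": "http://127.0.0.1:18800",
    },
}


def local_plugin_layout(agent: str, home: Optional[Path] = None) -> dict:
    """Hypothetical helper: return default code/data directories and Viewer URL for an agent."""
    key = agent.lower()
    if key not in _DEFAULTS:
        raise ValueError(f"unknown agent: {agent!r} (expected 'openclaw' or 'hermes')")
    base = home if home is not None else Path.home()
    entry = _DEFAULTS[key]
    return {
        "code_dir": base / entry["code_dir"],
        "data_dir": base / entry["data_dir"],
        "viewer_url": entry["viewer_url"],
    }


for name in ("openclaw", "hermes"):
    layout = local_plugin_layout(name)
    # Report whether the data directory exists on this machine
    print(name, layout["viewer_url"], layout["data_dir"], layout["data_dir"].exists())
```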
--- # OpenClaw Plugin Changelog (/openclaw/changes) --- releases: - date: '2026-05-21' plugins: - title: 'Cloud Plugin' version: 'v0.1.16' sections: - title: 'Features' items: - '**Recall Hook compatibility with modern OpenClaw hosts**: For host versions ≥ `2026.5.7`, the recall logic automatically migrates to the new `before_prompt_build` hook, while safely maintaining backward compatibility with `before_agent_start` for legacy hosts.' - | **Automatic system-event skipping**: Both the recall and write phases now automatically detect and ignore the following system events to keep memory records clean: - Heartbeat polls (`Read HEARTBEAT.md if it exists`, `[OpenClaw heartbeat poll]`) - System commands (`/new`, `/reset`, `/stop`, `/status`, `/help`, `/dock_*`, `/undock`) - | **End-to-End Tool Memory support**: - Recall phase: Appends a new `` section to the prompt injection block, rendering structurally from `tool_memory_detail_list`. - Write phase: Implements schema conversion for `assistant.tool_calls` and `toolResult` messages into the standard MemOS `/add/message` format (`role: "tool"` with `tool_call_id`). - File data normalization: Preserves URLs within tool results as searchable text; distills base64/data URIs into metadata descriptions instead of inlining large binary payloads. - '**System Note prefix stripping**: Automatically removes the `Note: The previous agent run was aborted by the user.` prefix to improve the accuracy of recall queries and stored memories.' - title: 'Improvements' items: - '**Refactored system-event detection**: Extracted detection logic into reusable helpers (`isHeartbeatPrompt` and `isSystemCommandPrompt`), anchoring the heartbeat regex pattern to the start of the string to prevent false positives on regular user messages.' - '**Streamlined memory filtration**: Consolidated duplicate relevance-threshold evaluations within the `buildMemorySections` function.' 
- title: 'Tests' items: - 'Added `test/recall-hook-registration.test.mjs` to validate hook registration compatibility across modern and legacy OpenClaw host versions.' - title: 'Docs' items: - 'Updated the "How it Works" section in both `README.md` and `README_ZH.md` to outline the hook adaptation strategies across different OpenClaw versions.' - date: '2026-05-08' plugins: - title: 'Cloud Plugin' version: 'v0.1.15' sections: - title: 'Added' items: - 'Added `activation.onCapabilities: ["hook"]` to the OpenClaw, Moltbot, and ClawDBot plugin manifests.' - 'Added compatibility with the plugin loading mechanism introduced in OpenClaw 5.3 and later. OpenClaw evaluates capability declarations before plugin registration; this declaration ensures the plugin is recognized and loaded as a lifecycle hook plugin, allowing hooks such as `before_agent_start` and `agent_end` to continue registering correctly.' - title: 'Improved' items: - 'Adjusted the automatic `hooks.allowConversationAccess: true` patching flow to run after the gateway is ready, allowing the host config update to trigger a gateway restart and apply the required hook permission.' - date: '2026-04-29' plugins: - title: 'Cloud Plugin' version: 'v0.1.14' summary: 'Added compatibility support for the agent_end permission restriction introduced in OpenClaw 2026.4.23 and later: when the gateway starts, the plugin automatically checks the host config and adds `hooks.allowConversationAccess: true` for this plugin, helping users avoid memory-write hook failures caused by missing permissions.' - date: '2026-04-16' plugins: - title: 'Cloud Plugin' version: 'v0.1.13' summary: 'Fully supports shared knowledge base access and collaborative processing in multi-agent mode.' sections: - title: 'Shared Knowledge Base Support (Multi-Agent Scenario)' items: - '**Multi-Agent Knowledge Base Support**: Fully supported collaborative access and processing of the knowledge base by multiple agents. 
Allows different agent nodes to share, retrieve, and invoke data from the same knowledge base, improving knowledge acquisition efficiency and context consistency during multi-agent collaboration in complex tasks.' - date: '2026-04-03' plugins: - title: 'Cloud Plugin' version: 'v0.1.12' summary: 'Introduced local visual configuration interface, deeply refactored configuration resolution architecture, and adapted to OpenClaw plugin security review.' sections: - title: 'Visual Configuration UI (Config UI)' items: - '**Local Configuration Service**: Built-in HTTP service provides a plugin management backend, supporting visual configuration viewing and modification in the browser, and real-time synchronization of configuration changes (default URL is `http://127.0.0.1:38463`).' - '**Startup Stability Assurance**: Introduced gateway readiness detection (`waitForGatewayReady`) in the service startup process to ensure stable service status.' - '**UI Experience Optimization**: Added responsive layout and collapsible floating navigation tools, along with new SVG icons.' - title: 'Architecture Optimization & Security Compliance' items: - '**Security Review Adaptation (Subprocess Removed)**: To comply with strict plugin sandbox and security requirements, completely removed `child_process` `spawn`/`exec` calls. The auto-update mechanism was changed from "silent background download and force update" to "version detection only with manual update command prompts in logs", eliminating the risk of background process escape.' - '**Security Review Adaptation (Default Overstep Removed)**: Removed all `default` value settings in the `plugin.json` declaration files to ensure the plugin does not trigger unauthorized or unexpected calls when no explicit configuration is provided.' 
- '**Centralized Schema Management**: Refactored configuration resolution logic (`getConfigResolution`) to centrally manage priority strategies for environment variables, user configurations, and default values, enhancing code security and robustness.' - date: '2026-03-30' plugins: - title: 'Cloud Plugin' version: 'v0.1.11' summary: 'Strengthened fine-grained control for multi-agent scenarios and enhanced dynamic user identity extraction capabilities.' sections: - title: 'Session & User Identity Management' items: - '**Direct Session User ID Support**: Added `useDirectSessionUserId` configuration. When enabled, it directly parses and extracts the real session user ID from the `sessionKey`, meeting data isolation needs in complex agent scenarios.' - title: 'Multi-Agent Configuration Enhancements' items: - '**Agent Execution Whitelist**: Added the `allowedAgents` configuration item, allowing memory recall and recording to be triggered only for specific agents in multi-agent mode, avoiding redundant consumption caused by global interception.' - '**Differentiated Override Mechanism (Agent Overrides)**: Introduced the `agentOverrides` configuration object, supporting individual overrides for core parameters such as knowledge base IDs (`knowledgebaseIds`), recall limit (`memoryLimitNumber`), and feature switches (`recallEnabled`) for different agents.' - date: '2026-03-24' plugins: - title: 'Cloud Plugin' version: 'v0.1.10' sections: - items: - '**Improved memory ingestion quality:** Added and strengthened cleanup for OpenClaw inbound metadata, timestamp wrappers, and trailing Feishu system hints to reduce noisy writes into memory.' - '**Multi-channel message prefix cleanup improvements**: Expanded and standardized envelope/prefix stripping for channels such as WebChat, WhatsApp, Telegram, Slack, Discord, and Zalo, reducing platform wrapper noise in memory ingestion and recall quality.' 
- '**More accurate recall display**: Recall timestamps now prioritize update time for better temporal consistency.' - '**More robust Recall Filter**: Default parameters are aligned with runtime fallback values (timeout and retries), improving stability in local model scenarios.' - '**Timeout and resource management optimization**: Fixed timer cleanup behavior to prevent resource leaks on exceptional code paths.' - '**Configuration completeness**: Completed Recall Filter-related fields in the plugin schema for more complete and controllable configuration.' - '**Enhanced observability**: Added before/after filtering count logs to make recall quality and filter effect troubleshooting easier.' - date: '2026-03-13' plugins: - title: 'Cloud Plugin' version: 'v0.1.9' summary: 'Silent upgrade and memory recall optimization. This release includes the following improvements to enhance usability and Token efficiency:' sections: - title: 'Silent Self-Detection and Upgrade' items: - 'Added a plugin version self-check mechanism that periodically checks the latest version from the NPM registry in the background.' - 'When a new version is detected, a silent upgrade is triggered automatically so users can continuously receive the latest capabilities and fixes without manual actions.' - title: 'Support Custom Models for Memory Recall' items: - 'Introduced LLM-based secondary filtering for memory recall.' - 'Added configuration options such as recallFilterModel and recallFilterBaseUrl, allowing an independent model to evaluate relevance.' - 'Effectively removes noisy results and keeps only memory snippets that are truly useful for the current conversation.' - title: 'Lean Prompt Injection (System Prompt Optimization)' items: - 'Refactored memory injection logic by moving static protocols and instructions to appendSystemContext.' - 'prependContext now keeps only dynamically retrieved memory-list data.' 
- 'Significantly reduces Token usage caused by repetitive prompts and improves model focus on core memory.' - date: '2026-03-09' plugins: - title: 'Cloud Plugin' version: 'v0.1.8' summary: 'Added support for multi-agent mode, enabling agent identification from context for memory isolation, with a compatibility switch for older versions.' - date: '2026-03-05' plugins: - title: 'Cloud Plugin' version: 'v0.1.7' summary: 'Added support for user-defined relativity in the searchMemory API.' - date: '2026-02-26' plugins: - title: 'Cloud Plugin' version: 'Other Historical Versions (Core Capabilities)' summary: 'Supports searchMemory in the before_agent_start event and addMessage in the agent_end event.'
---

---

# OpenClaw Cloud Plugin (/openclaw/guide)

> **Note**: **Compatibility recommendation**
> To avoid plugin loading or runtime errors, we recommend upgrading OpenClaw only after MemOS has completed compatibility adaptation for the latest OpenClaw version.

OpenClaw has gone viral lately. But if you've actually used it for a while, you'll run into two issues you can hardly avoid:

1. **Tokens burn way too quickly**: OpenClaw can handle many long-tail tasks, but the cost is that each run consumes a huge number of tokens. When you have it monitoring your screen, running scheduled tasks, or handling complex workflows, tokens are consumed painfully fast.
   > ("u know token is money🫠")

2. **Its memory function is rather poor**: Many claim OpenClaw's memory outperforms ChatGPT. Yet in practice, you'll find it does retain some information, but often not what you need. Crucial preferences may be forgotten, while trivial chatter is remembered in vivid detail.
   > ("can u please remember something that really matters to me???")

> **Tip**: **It's NOT OpenClaw's fault. ALL AI agents suffer from this.**

This tutorial shows how to use the MemOS OpenClaw plugin to tackle these pain points in three ways:

- **Significantly reduce token consumption**: intelligently retrieve relevant memories without indiscriminately loading all history
- **Make memories genuinely useful**: professional memory categorisation and management, remembering what should be retained and forgetting what should be discarded
- **Preserve OpenClaw's core strengths**: cross-device control, proactive interaction, and human-like experience remain intact

---

## Why is OpenClaw now a Token Killer🥷?

### Issues with OpenClaw

Each conversation carries the accumulated history, so the context grows with every turn:

```plaintext
1st convo: 500 tokens
2nd convo: 500 + 800 = 1,300 tokens
3rd convo: 1,300 + 600 = 1,900 tokens
10th convo: 10,000+ tokens
```

When you have OpenClaw monitoring your screen, executing tasks, and running on a schedule, this figure increases even more rapidly.

### Three critical pain points in OpenClaw's native memory management

OpenClaw's memories reside in local `.md` files, categorised as global memories and daily memories. While this sounds promising, practical use reveals three unavoidable issues:

#### 1. Global memories keep ballooning

As global memories accumulate, context overload ensues. Moreover, these memories persistently interfere with current conversations. You might simply wish to ask a straightforward question, yet it dredges up every utterance from three months prior.

#### 2. Daily memory recall proves difficult

Accumulating daily memories invariably makes retrieval cumbersome. To recall yesterday's activities, one must undergo an additional retrieval process. Maintaining cross-session memory becomes nearly impossible.

#### 3. Memory relies on the model's proactive logging

OpenClaw's memory system relies on the model to log information itself, rather than logging automatically.
This means it frequently misses details—you mention something, and it promptly forgets. > I've encountered this several times myself: I'd explicitly emphasised a particular project configuration, yet when restarting the conversation the next day, it had no recollection whatsoever, requiring me to explain it all over again. --- ## OpenClaw vs OpenClaw + MemOS: Memory Solution Comparison ### OpenClaw Native Memory Solution #### Memory Storage Solution **Core Philosophy: File is Truth** — Abandoning opaque vector databases in favor of Markdown files as the core carrier of memory. ![Memory Storage Solution](https://cdn.memtensor.com.cn/img/1772697758585_b155tx_compressed.png) #### Memory Retrieval Solution: Dual-Engine Drive | Engine | Technology | Features | |-----|------|------| | **Vector Search** | Cosine Similarity | Captures semantic associations, excels at "concept matching", e.g., associating "login flow" with "authentication" | | **BM25 Search** (Lexical Matching) | FTS5-based lexical matching | Handles "exact tokens", such as error codes, function names, or specific IDs | **Retrieval Trigger**: Triggered via Prompt, model decides automatically **Weighted Score Fusion**: `Score = (0.7 * VectorScore) + (0.3 * BM25Score)` #### Pain Points of Existing Solutions - **Rudimentary Retrieval Algorithms**: Unstable recall, weak relevance, Agent repeats trial and error, Token accumulates rapidly - **Excessive Context Injection**: Fixed reading of today + yesterday + long-term memory, high proportion of invalid context - **Lack of Structure and Deduplication in Memory**: Tool call long outputs are written directly and re-transmitted repeatedly, costs snowball ### OpenClaw + MemOS Memory Solution ![MemOS-OpenClaw](https://cdn.memtensor.com.cn/img/1772679552943_lsuh81_compressed.png) #### Three Core Effects **Effect 1: Controllable Token Costs 💰** > From "Full Context Stuffing" to "Precise Recall per Task" OpenClaw no longer stuffs today+yesterday+long-term memory every 
time. Instead, MemOS retrieves the most relevant few memories based on the current task (recall budget/count can be set), significantly reducing the proportion of invalid context and avoiding Token snowballing. **Effect 2: More Stable and Accurate Retrieval 🎯** > Reduce repeated trial and error and re-asking, improve one-shot hit rate MemOS provides stronger memory organization and retrieval capabilities (structured, hierarchical/multi-granular, semantic retrieval + rule filtering, etc.), making OpenClaw's recalled content more relevant and stable, reducing repeated reasoning and confirmation caused by "unstable recall". **Effect 3: Cleaner and More Usable Memory ✨** > Structured + Deduplicated + High Compression, avoiding "Long Output Pollution" Long outputs from tool calls (such as traversal results, config/schema, etc.) are not written back to the context verbatim repeatedly; MemOS can summarize/compress, deduplicate, and archive, making it "cleaner" over long-term operation, with memory quality improving rather than deteriorating over time. --- ## After integrating the MemOS OpenClaw plugin👇🏻 - ✅ Retrieve only 3–5 relevant memories at a time - ✅ Maintain context stability within 2,000–3,000 tokens - ✅ Cost remains manageable regardless of dialogue length ### MemOS plugins can enhance your OpenClaw | Feature | Description | |-----|------| | **Automatically remember all conversations** | without relying on models to actively log, ensuring no critical information is missed | | **Precise recall** | retrieve relevant memories based on current task intent, avoiding irrelevant historical data | | **Remember user preferences** | categorise and store preference information specifically, remaining effective across sessions | MemOS OpenClaw has restructured the token consumption model, transforming costs from a ‘historical length function’ into a ‘task relevance function’. Your local OpenClaw costs become manageable, and the system operates more stably. 
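
The shift from a "historical length function" to a "task relevance function" can be illustrated with a toy calculation. This is a simplified sketch, not a measurement of MemOS or OpenClaw: the function names are hypothetical, the per-turn token counts reuse the example figures from earlier in this guide, and the 2,000-token recall budget is an assumed value chosen to match the 2,000–3,000 token range mentioned above.

```python
from typing import List


def full_history_context(turn_tokens: List[int]) -> List[int]:
    """Context size at each turn when every turn carries all previous history."""
    sizes, total = [], 0
    for tokens in turn_tokens:
        total += tokens
        sizes.append(total)
    return sizes


def bounded_recall_context(turn_tokens: List[int], recall_budget: int = 2000) -> List[int]:
    """Context size at each turn when recalled history is capped at a fixed budget.

    Toy model: each turn sends its own tokens plus at most `recall_budget`
    tokens of recalled memory, regardless of how long the history is.
    """
    sizes, history = [], 0
    for tokens in turn_tokens:
        sizes.append(tokens + min(history, recall_budget))
        history += tokens
    return sizes


# First three turns match the example earlier in this guide; the rest are illustrative
turns = [500, 800, 600] + [700] * 7

print(full_history_context(turns)[:3])      # [500, 1300, 1900]
print(full_history_context(turns)[-1])      # grows with the number of turns
print(max(bounded_recall_context(turns)))   # stays near turn size + recall budget
```

In this model the full-history cost keeps growing with the length of the conversation, while the bounded cost levels off at roughly the current turn plus the recall budget.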
--- ## Quick Start Three steps to boost your Agent with basic memory capabilities. ### 1. Install OpenClaw Ensure that the OpenClaw environment is installed on your system: ```bash # Install the newest version npm install -g openclaw@latest # Initialize and configure startup openclaw onboard ``` ### 2. Get and configure your API Key #### 2.1 Get your Key Log in to or register with MemOS Cloud to get your API Key 🔗 [MemOS Cloud](https://memos-dashboard.openmem.net/apikeys/) ![image.png](https://cdn.memtensor.com.cn/img/1772443326905_kkxve6_compressed.webp) #### 2.2 Set Environment Variables The plugin tries env files in order (**openclaw → moltbot → clawdbot**). For each key, the first file with a value wins. If none of these files exist (or the key is missing), it falls back to the process environment. **Where to configure** - Files (priority order): - `~/.openclaw/.env` - `~/.moltbot/.env` - `~/.clawdbot/.env` - Each line is `KEY=value` **Quick setup (shell)** ```bash echo 'export MEMOS_API_KEY="mpg-..."' >> ~/.zshrc source ~/.zshrc # or echo 'export MEMOS_API_KEY="mpg-..."' >> ~/.bashrc source ~/.bashrc ``` **Quick setup (Windows PowerShell)** ```powershell [System.Environment]::SetEnvironmentVariable("MEMOS_API_KEY", "mpg-...", "User") ``` If `MEMOS_API_KEY` is missing, the plugin will warn with setup instructions and the API key URL. **Minimal config** ```env MEMOS_API_KEY=YOUR_TOKEN ``` ### 3. Install Plugins #### Option A — NPM (Recommended) ```bash openclaw plugins install @memtensor/memos-cloud-openclaw-plugin@latest openclaw gateway restart ``` > Note for Windows Users: If you encounter Error: spawn EINVAL, this is a known issue with OpenClaw's plugin installer on Windows. Please use Option B (Manual Install) below. Make sure it’s enabled in ~/.openclaw/openclaw.json: ```json { "plugins": { "entries": { "memos-cloud-openclaw-plugin": { "enabled": true } } } } ``` #### Option B — Manual Install (Workaround for Windows) 1. 
Download the latest `.tgz` from [NPM](https://www.npmjs.com/package/@memtensor/memos-cloud-openclaw-plugin). 2. Extract it to a local folder (e.g., `C:\Users\YourName\.openclaw\extensions\memos-cloud-openclaw-plugin`). 3. Configure `~/.openclaw/openclaw.json` (or `%USERPROFILE%\.openclaw\openclaw.json`): ```json { "plugins": { "entries": { "memos-cloud-openclaw-plugin": { "enabled": true } }, "load": { "paths": [ "C:\\Users\\YourName\\.openclaw\\extensions\\memos-cloud-openclaw-plugin\\package" ] } } } ``` > **Tip**: Note: The extracted folder usually contains a package subfolder. Point to the folder containing package.json. Restart the gateway after config changes. ### 4. Update Plugin You can manually update the cloud plugin to the latest version using the following commands: ```bash openclaw plugins update @memtensor/memos-cloud-openclaw-plugin@latest openclaw gateway restart ``` ## Advanced Configuration for Open-Source Projects If you wanna unlock further possibilities, you may explore and configure additional features via the MemOS GitHub project! ### Visual Configuration UI (Config UI) Starting from version `v0.1.12`, the Cloud Plugin features a built-in local visual configuration service, allowing you to manage and modify plugin settings more intuitively. **How to access:** 1. Start your OpenClaw node or host gateway. 2. Once the plugin is successfully loaded and detects that the gateway is ready, it will automatically start the Config UI service in the background. 3. An access link will be printed in the terminal console logs (the default URL is typically `http://127.0.0.1:38463`). 4. Open this link in your browser to access the plugin's visual management backend. **Features:** - **Intuitive Editing**: Supports form-based editing of all core configurations (such as Knowledge Base IDs, LLM retrieval parameters, multi-agent override rules, etc.). 
- **Real-time Synchronization**: Configuration changes saved via the interface take effect immediately during plugin runtime, without requiring a service restart. - **Status Monitoring**: The interface provides heartbeat detection with the host gateway to ensure the configuration synchronization link is healthy. ### Multi-Agent Support & Isolation The plugin provides powerful native support for multi-agent architectures (via the `agent_id` parameter), making it ideal for complex workflows or team agent scenarios. **1. Enable & Data Isolation** - **How to enable**: Set `"multiAgentMode": true` in the config or configure the environment variable `MEMOS_MULTI_AGENT_MODE=true`. - **Automatic Isolation**: When enabled, the plugin automatically reads `ctx.agentId` from the context. This Agent identifier is attached to memory retrieval and writing, ensuring complete data isolation between different Agents under the same user (Note: the default `"main"` Agent is ignored to maintain legacy data compatibility). **2. Memory Switch per Agent (Whitelist Control)** In Multi-Agent mode, if you do not want all Agents to consume memory, you can use `allowedAgents` to precisely control the whitelist: ```json { "plugins": { "entries": { "memos-cloud-openclaw-plugin": { "enabled": true, "config": { "multiAgentMode": true, "allowedAgents": ["research-agent", "coding-agent"] } } } } } ``` *(Tip: 1. If `allowedAgents` is not configured or is an empty array `[]`, it means **all Agents** are allowed to use memory retrieval and writing. 2. If it is configured, Agents not in the configuration will be completely skipped, and only the configured Agents will be effective for memory retrieval and writing, thereby avoiding Token waste.)* **3. Per-Agent Configuration (agentOverrides)** Beyond simple toggles, you can use `agentOverrides` to **configure different memory parameters for each Agent**. 
For example, you can give a research assistant a looser retrieval threshold while restricting a coding assistant to a specific codebase knowledge base:

```json
{
  "plugins": {
    "entries": {
      "memos-cloud-openclaw-plugin": {
        "enabled": true,
        "config": {
          "multiAgentMode": true,
          "allowedAgents": ["research-agent", "coding-agent"],
          "memoryLimitNumber": 6,
          "relativity": 0.45,
          "agentOverrides": {
            "research-agent": {
              "knowledgebaseIds": ["kb-research-papers"],
              "memoryLimitNumber": 12,
              "relativity": 0.3,
              "queryPrefix": "research context: "
            },
            "coding-agent": {
              "knowledgebaseIds": ["kb-codebase"],
              "memoryLimitNumber": 9,
              "addEnabled": false
            }
          }
        }
      }
    }
  }
}
```

*(In the example above, memory writing is disabled for the `coding-agent`, and it can only retrieve the top 9 highly relevant memories from the `kb-codebase` knowledge base.)*

### Deep customisation of environment variables

In addition to the required API Key, you can adjust the plugin's behaviour via environment variables. Further configuration details can be found in [the MemTensor official plugin repo](https://github.com/MemTensor/MemOS/tree/main/apps/MemOS-Cloud-OpenClaw-Plugin).

## Testing

Now you can hold multi-turn conversations with your Agent, for example:

**First convo:**

- "My favourite programming language is Python"
- "I'm developing an e-commerce project"

**Second convo (new convo):**

- "Do you recall which programming language I prefer?"
- "How is the project I mentioned previously progressing?"

Your OpenClaw will now retrieve memories from MemOS Cloud and provide accurate responses ✅

---

## Use with the CLI Tool

If you use OpenClaw alongside other Agents that can execute shell commands (e.g. Cursor, Codex, Claude Code), the MemOS CLI lets them all share the same memory space.
### Install CLI and generate plugin-aware Skill ```bash npm install -g @memtensor/memos-cloud-cli memos init --agent openclaw --memos-plugin ``` `--memos-plugin` makes the generated Skill aware of the installed OpenClaw cloud plugin, avoiding duplicate memory writes. > **Tip**: Using OpenClaw as an example, in the LOCOMO evaluation, using MemOS CLI alone reduced token usage by about 65.5%; integrating MemOS Cloud + CLI improved accuracy from 66.60% to 77.27%. ### Install Skill for other Agents ```bash memos init --agent cursor memos init --agent codex ``` Once installed, these Agents will also automatically search and write MemOS memories through the CLI, sharing the same memory space as OpenClaw. ### Test whether the plugin and CLI work together Return to OpenClaw and start a conversation that should produce long-term memory: ```text I am recently working on an e-commerce project, mainly using Python. Please remember this. ``` Then start a new conversation and ask: ```text What project am I working on recently? Which language am I mainly using? ``` If OpenClaw answers using the previous turn, the plugin has written and recalled memory successfully. The generated CLI Skill is also aware of the installed plugin, so it avoids duplicate memory writes. > **Tip**: See the [MemOS CLI Guide](/mcp_agent/cli/guide) for the full command reference and more usage details. --- ## More Integration Methods > **Tip**: In addition to the OpenClaw plugin, MemOS also supports [MCP](/mcp_agent/mcp/guide) (for Cursor, Claude Desktop, and other clients) and [CLI](/mcp_agent/cli/guide) (for any Agent framework that can execute shell commands). See [Use in Agents](/memos_cloud/getting_started/agent_usage) for a full comparison. --- # Local Plugin (/openclaw/local_plugin) > **Note**: **Compatibility recommendation** > To avoid plugin loading or runtime errors, we recommend upgrading your Agent after MemOS has completed compatibility adaptation for that Agent version. 
`@memtensor/memos-local-plugin` is the new MemOS local plugin: one local-first memory core for both **OpenClaw** and **Hermes Agent**. It does not host your memory data in the cloud. Instead, it maintains SQLite data, skill packages, and logs on your own machine so the agent can accumulate reusable experience locally. If you want a cloud-hosted memory service for OpenClaw with the simplest API Key setup, see the [OpenClaw Cloud Plugin](/openclaw/guide). If you care more about privacy, local runtime, observability, or using the same local memory capability across OpenClaw / Hermes, use this local plugin. ## Core Capabilities | Capability | Description | | --- | --- | | Local-first | OpenClaw and Hermes each get an isolated runtime home. SQLite, skills, logs, and config stay on your machine. | | Dual-agent support | OpenClaw integrates through an in-process TypeScript plugin; Hermes integrates through a Python Provider that talks to the same Node.js memory core over JSON-RPC. | | Four memory layers | L1 Trace records each execution step, L2 Policy induces cross-task strategies, L3 World Model compresses environment knowledge, and Skill turns high-value experience into callable capabilities. | | Three-tier retrieval | Retrieval runs across Skill → Trace/Episode → World Model, combining vector, FTS5, keyword pattern, and error-signature channels with RRF + MMR. | | Feedback-driven evolution | Tool outcomes, environment feedback, and explicit user feedback update memory value and drive policy induction, skill crystallization, and decision repair. | | Local Viewer | Includes Overview, Memories, Tasks, Policies, World Models, Skills, Analytics, Logs, Import, Settings, and Help pages. | | Import and migration | Supports JSON import/export, legacy plugin migration, and agent-specific native imports for OpenClaw session JSONL or Hermes `MEMORY.md`. | | Optional team sharing | Isolated by default. 
Enable sharing from the Memory Viewer's Team Sharing panel to share crystallized Skills and optional trace excerpts over a LAN / VPN. | ## How It Works Before each task, the plugin retrieves relevant context and injects it into the agent. After the task ends, it stores conversations, tool calls, observations, and feedback in the local pipeline. High-value patterns gradually become Policies, World Models, and callable Skills. The next time a similar task appears, the agent receives guidance about what to do and what to avoid. | Stage | What Happens | Output | | --- | --- | --- | | 1. Agent adapter | OpenClaw / Hermes send conversations, tool calls, and feedback to the shared `MemoryCore` through their adapters. | Standardized turns, tool outcomes, feedback | | 2. Local capture | `MemoryCore` turns the execution process into grounded, traceable step records. | L1 Trace | | 3. Experience induction | Similar Traces are induced into cross-task strategies, then compressed into environment knowledge. | L2 Policy, L3 World Model | | 4. Skill crystallization | High-value strategies become callable Skills and keep updating reliability from later feedback. | Skill, η, lifecycle status | | 5. Retrieval injection | Before the next task, Retriever recalls context from Skill, Trace/Episode, and World Model tiers. | Local memory context injected into the agent | ## Quick Start ### Step 1: Install or Upgrade with One Command Installation and upgrades use the same command. The current installer targets macOS / Linux: ```bash curl -fsSL https://raw.githubusercontent.com/MemTensor/MemOS/main/apps/memos-local-plugin/install.sh | bash ``` The installer auto-detects whether OpenClaw and/or Hermes are installed. In an interactive terminal, it asks which agent to install for; in non-interactive environments, it installs for the detected agent(s). It deploys plugin code, installs production dependencies, and restarts the target runtime when needed. 
> Do not use direct `npm install` as the primary path. The installer handles agent detection, directory layout, config initialization, and runtime restart. ### Step 2: Open the Memory Viewer After installation, open the corresponding Memory Viewer: | Agent | Memory Viewer | | --- | --- | | OpenClaw | `http://127.0.0.1:18799` | | Hermes | `http://127.0.0.1:18800` | If you install both OpenClaw and Hermes, they use separate Viewers and separate local data directories. ### Step 3: Configure from the Panel All user-facing configuration is done from the Memory Viewer: - **Settings → AI Models**: configure Embedding, LLM, Skill Evolver, and use Test to confirm connectivity. - **Settings → Team Sharing**: enable or disable team sharing, then configure team address and tokens. - **Settings → General**: configure language, detailed logs, anonymous telemetry, and related options. After saving, the Viewer restarts the plugin and loads the new settings. ### Step 4: Start the Target Agent After installation, start the agent you selected as usual. The plugin retrieves local context before the agent builds its prompt, then writes conversations, tool calls, observations, and feedback into local memory after the turn finishes. | Agent | How to start | Plugin integration | | --- | --- | --- | | OpenClaw | Start or restart the OpenClaw gateway normally | TypeScript plugin calls `MemoryCore` in the OpenClaw process | | Hermes | Run `hermes chat` | Python Provider calls the Node.js memory core over JSON-RPC | If the Hermes machine cannot run Node.js, the Hermes Provider reports unavailable and falls back to Hermes' own in-memory mode. ### Step 5: Verify Memory Back in the Memory Viewer, check: 1. **Overview**: confirm core status, version, and event stream. 2. **Memories**: confirm conversations and tool steps are written as Traces. 3. **Tasks / Policies / World Models / Skills**: inspect how experience is induced and crystallized. 4. 
**Import**: migrate legacy data, import OpenClaw session JSONL, import Hermes `MEMORY.md`, or import/export JSON backups.
5. **Help**: look up field meanings such as `V`, `α`, `R_human`, `η`, support, and gain.

## Agent Differences

| Item | OpenClaw | Hermes |
| --- | --- | --- |
| Integration | TypeScript plugin, in-process calls to `MemoryCore` | Python `MemoryProvider`, stdio JSON-RPC to Node bridge |
| Default Viewer | `http://127.0.0.1:18799` | `http://127.0.0.1:18800` |
| Model configuration | Configure in OpenClaw Viewer Settings → AI Models | Configure in Hermes Viewer Settings → AI Models |
| Data sharing | Isolated from Hermes by default | Isolated from OpenClaw by default |

Even on the same machine, the two agents use separate databases and Viewers. They only share data after you explicitly enable team sharing (`hub.enabled`).

## Available Tools

OpenClaw and Hermes expose memory tools through their own host interfaces. Common capabilities include:

| Tool | Purpose |
| --- | --- |
| `memory_search` | Search across relevant Skills, Trace/Episodes, and World Models. |
| `memory_get` | Fetch a memory detail. |
| `memory_timeline` | Inspect an episode / task timeline. |
| `skill_list` | List callable Skills. |
| `skill_get` | Fetch a Skill invocation guide. |
| `memory_environment` | Query L3 World Models for project structure, environment behavior, and constraints. |

The plugin also records tool successes and failures for later decision repair.

## Data Management

- **Back up**: export JSON from the Viewer's Import page, or back up the current agent's runtime home (`~/.openclaw/memos-plugin/` for OpenClaw, `~/.hermes/memos-plugin/` for Hermes).
- **Clear only memory**: after confirming you have a backup, delete `data/` and `skills/` under the runtime home.
- **Clear logs**: delete regular files under `logs/`. Audit logs are gzipped monthly and kept by default.
- **Full reset**: delete the entire runtime home directory (`~/.openclaw/memos-plugin/` or `~/.hermes/memos-plugin/`). It will be recreated empty on the next start.
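
Before clearing data or doing a full reset, it can help to archive the runtime home first. The following is a minimal sketch using only the Python standard library; the helper name `backup_runtime_home` and the archive naming scheme are illustrative assumptions, not part of the plugin.

```python
import tarfile
import time
from pathlib import Path


def backup_runtime_home(runtime_home: Path, dest_dir: Path) -> Path:
    """Archive a plugin runtime home into a timestamped .tar.gz.

    Hypothetical helper for illustration; the plugin does not ship it.
    """
    if not runtime_home.is_dir():
        raise FileNotFoundError(f"runtime home not found: {runtime_home}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{runtime_home.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store entries under the directory name so extraction recreates it.
        tar.add(runtime_home, arcname=runtime_home.name)
    return archive
```

For example, `backup_runtime_home(Path.home() / ".openclaw" / "memos-plugin", Path.home() / "memos-backups")` would archive the OpenClaw runtime home; use `~/.hermes/memos-plugin` for Hermes. Stopping the agent first is probably the safer choice, so the SQLite files are not archived mid-write.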
## More - [MemOS local plugin project](https://github.com/MemTensor/MemOS/tree/main/apps/memos-local-plugin) - [Cloud Plugin vs Local Plugin](/openclaw/plugin_compare) --- # Knowledge Base Usage (/openclaw/examples/knowledge_base) ## Cloud Plugin The MemOS OpenClaw cloud plugin can recall both personal memories and selected knowledge base content before each task. This is useful for connecting product docs, company policies, project materials, coding standards, and other long-lived references to OpenClaw so the Agent can automatically use them during execution. ### 1. Create a Knowledge Base and Get Its ID Create the knowledge base in [MemOS Cloud Dashboard](https://memos-dashboard.openmem.net) first. Steps: 1. Log in to [MemOS Cloud Dashboard](https://memos-dashboard.openmem.net). 2. Go to the knowledge base management page and create a new knowledge base. 3. Upload the documents or Skill files that the Agent should retrieve from. 4. Wait until file processing is complete. 5. Copy the knowledge base ID from the knowledge base details page, for example `kb-company-handbook`. > **Warning**: Use the real knowledge base ID copied from the dashboard to replace `kb-company-handbook` in the examples below. The knowledge base name is only a display name; the plugin configuration requires the knowledge base ID. ### 2. Configure a Global Knowledge Base Knowledge base configuration has two directions: - `knowledgebaseIds` / `MEMOS_KNOWLEDGEBASE_IDS`: queries knowledge bases. During `/search/memory`, the plugin recalls content from these knowledge bases. - `allowKnowledgebaseIds` / `MEMOS_ALLOW_KNOWLEDGEBASE_IDS`: writes to knowledge bases. During `/add/message`, the plugin is allowed to write newly generated memories to these knowledge bases. 
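
As a rough illustration of the two directions, the sketch below parses both environment variables into separate read and write scopes. It assumes comma-separated IDs; the function name `knowledgebase_scopes` is hypothetical, and this is not the plugin's own code.

```python
import os
from typing import Dict, List, Mapping


def knowledgebase_scopes(env: Mapping[str, str] = os.environ) -> Dict[str, List[str]]:
    """Split the read and write knowledge base scopes (illustrative helper)."""

    def split_ids(name: str) -> List[str]:
        # Assumption: IDs are comma-separated; blanks and empty parts are dropped.
        raw = env.get(name, "")
        return [part.strip() for part in raw.split(",") if part.strip()]

    return {
        "read": split_ids("MEMOS_KNOWLEDGEBASE_IDS"),          # used when searching
        "write": split_ids("MEMOS_ALLOW_KNOWLEDGEBASE_IDS"),   # used when adding
    }


scopes = knowledgebase_scopes({
    "MEMOS_KNOWLEDGEBASE_IDS": "kb-company-handbook, kb-product-docs",
    "MEMOS_ALLOW_KNOWLEDGEBASE_IDS": "kb-company-handbook",
})
print(scopes)
```

Keeping the two scopes separate makes it harder to grant write access by accident: a knowledge base can be readable without being writable.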
If all OpenClaw sessions should retrieve from the same knowledge base, configure `knowledgebaseIds` in `~/.openclaw/openclaw.json`: ```json { "plugins": { "entries": { "memos-cloud-openclaw-plugin": { "enabled": true, "config": { "apiKey": "YOUR_MEMOS_API_KEY", "knowledgebaseIds": ["kb-company-handbook"], "memoryLimitNumber": 6, "relativity": 0.45 } } } } } ``` Restart the OpenClaw gateway after updating the config: ```bash openclaw gateway restart ``` ### 3. Configure with Environment Variables You can also configure knowledge base IDs in `~/.openclaw/.env`: ```bash MEMOS_API_KEY=YOUR_MEMOS_API_KEY MEMOS_KNOWLEDGEBASE_IDS="kb-company-handbook,kb-product-docs" ``` `MEMOS_KNOWLEDGEBASE_IDS` defines the knowledge base scope for retrieval. The plugin reads from these knowledge bases when calling `/search/memory`. If you need to write newly generated conversation memories into a knowledge base, configure the write parameter separately: ```bash MEMOS_ALLOW_KNOWLEDGEBASE_IDS="kb-company-handbook" ``` > **Warning**: Knowledge base writes can store Agent-generated information into the target knowledge base, but knowledge base content is not filtered by `agent_id`. In other words, if multiple Agents are configured to read the same knowledge base, anything written into that knowledge base may be recalled by all of them, weakening multi-agent memory isolation. > > If you need strict Agent isolation, do not write conversation memories into a shared knowledge base. Configure `allowKnowledgebaseIds` or `MEMOS_ALLOW_KNOWLEDGEBASE_IDS` only when you explicitly want the content to become shared team knowledge. ### 4. 
Bind Different Knowledge Bases to Different Agents When different Agents own different tasks, enable multi-agent mode and use `agentOverrides` to assign knowledge bases per Agent: ```json { "plugins": { "entries": { "memos-cloud-openclaw-plugin": { "enabled": true, "config": { "multiAgentMode": true, "allowedAgents": ["research-agent", "coding-agent"], "memoryLimitNumber": 6, "relativity": 0.45, "agentOverrides": { "research-agent": { "knowledgebaseIds": ["kb-research-papers"], "memoryLimitNumber": 12, "queryPrefix": "research context: " }, "coding-agent": { "knowledgebaseIds": ["kb-codebase", "kb-api-docs"], "memoryLimitNumber": 9, "includeToolMemory": true, "addEnabled": false } } } } } } } ``` This configuration means: - `research-agent` retrieves from research paper / academic knowledge bases. - `coding-agent` retrieves from codebase and API documentation knowledge bases, and disables memory writes to avoid storing temporary debugging details as long-term memory. - Agents not listed in `agentOverrides` inherit the global configuration. If you enable knowledge base writes in a multi-agent setup, make sure the write target is not a shared knowledge base read by multiple Agents. Otherwise, information from different Agents may become visible to each other through the knowledge base. ### 5. Verify Knowledge Base Recall After configuration, ask OpenClaw a question that can only be answered from the uploaded knowledge base documents: ```text You: According to the company reimbursement policy, what purchase amount for design software requires special approval? OpenClaw: According to the procurement rules in the knowledge base, design software purchases above 1000 require special approval... ``` If the answer does not use knowledge base content, check the following: - Confirm that the knowledge base files have finished processing in the dashboard. - Confirm that `knowledgebaseIds` or `MEMOS_KNOWLEDGEBASE_IDS` uses the knowledge base ID, not the knowledge base name. 
- Confirm that the OpenClaw gateway has been restarted and has loaded the latest config.
- If using multi-agent mode, confirm that the current Agent is listed in `allowedAgents` and that the corresponding `agentOverrides` config is correct.

### Config Reference

For knowledge base configuration details, see the [official MemOS-Cloud-OpenClaw-Plugin repository](https://github.com/MemTensor/MemOS-Cloud-OpenClaw-Plugin):

| Config | Purpose |
| --- | --- |
| `knowledgebaseIds` | Query knowledge bases. Plugin config for the knowledge base retrieval scope, corresponding to `/search/memory`. |
| `MEMOS_KNOWLEDGEBASE_IDS` | Query knowledge bases. Environment variable form of the knowledge base retrieval scope. Separate multiple IDs with commas. |
| `allowKnowledgebaseIds` | Write to knowledge bases. Plugin config for knowledge bases that newly generated memories may be written to, corresponding to `/add/message`. |
| `MEMOS_ALLOW_KNOWLEDGEBASE_IDS` | Write to knowledge bases. Environment variable form of knowledge bases that newly generated memories may be written to. |
| `agentOverrides.<agentId>.knowledgebaseIds` | In multi-agent scenarios, assign knowledge bases to a specific Agent. |

---

# Multi-Agent Memory Isolation (/openclaw/examples/multi_agent)

## Cloud Plugin

The MemOS OpenClaw Cloud plugin supports complete isolation of memory and message history across multiple Agents. Each Agent can only access its own memory, preventing cross-agent interference.

### How to Use in Cloud Plugin

With a simple configuration, different Agents can have independent memory spaces. Both auto-detection and static assignment are supported.

#### 1. Enable Multi-Agent Mode

Add the following to your `openclaw.json`:

```json
{
  "plugins": {
    "entries": {
      "memos-cloud-openclaw-plugin": {
        "config": {
          "multiAgentMode": true
        }
      }
    }
  }
}
```

Or set the environment variable:

```bash
MEMOS_MULTI_AGENT_MODE=true
```

#### 2.
Auto-detect Agent Once enabled, the plugin automatically reads `ctx.agentId` and isolates memory for each Agent. No extra configuration is required. #### 3. Statically Assign Agent (Optional) If you need to pin a specific Agent ID, set it in the config: ```json { "config": { "agentId": "marketing_agent" } } ``` #### 4. Configure Parameters Per Agent (Optional) Beyond isolating memory spaces, you can use `agentOverrides` to override parameters for each Agent, such as knowledge bases, recall limits, relevance threshold, and whether memory writes are enabled. Fields that are not specified inherit from the global config. ```json { "plugins": { "entries": { "memos-cloud-openclaw-plugin": { "enabled": true, "config": { "multiAgentMode": true, "allowedAgents": ["research-agent", "coding-agent"], "knowledgebaseIds": [], "memoryLimitNumber": 6, "relativity": 0.45, "agentOverrides": { "research-agent": { "knowledgebaseIds": ["kb-research-papers", "kb-academic"], "memoryLimitNumber": 12, "relativity": 0.3, "includeToolMemory": true, "captureStrategy": "full_session", "queryPrefix": "research context: " }, "coding-agent": { "knowledgebaseIds": ["kb-codebase", "kb-api-docs"], "memoryLimitNumber": 9, "relativity": 0.5, "addEnabled": false } } } } } } } ``` This configuration means: - `research-agent` uses research / academic knowledge bases, recalls more memories, and includes tool memory. - `coding-agent` only reads from codebase and API documentation knowledge bases, and disables memory writes. - Other Agents, if allowed to use memory, inherit global `knowledgebaseIds`, `memoryLimitNumber`, and `relativity`. 
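
The inheritance rule described above can be sketched as a shallow merge. This is a minimal illustration in Python rather than the plugin's actual code: the function name `resolve_agent_config` is hypothetical, and it assumes an override replaces the global field key by key, with no deep merging of arrays.

```python
from typing import Any, Dict


def resolve_agent_config(config: Dict[str, Any], agent_id: str) -> Dict[str, Any]:
    """Return the effective settings for one Agent (hypothetical helper).

    Assumption: fields in agentOverrides[agent_id] replace the global value
    key by key; anything not overridden is inherited from the global config.
    """
    # Start from the global fields, leaving out the overrides table itself.
    effective = {k: v for k, v in config.items() if k != "agentOverrides"}
    overrides = config.get("agentOverrides", {}).get(agent_id, {})
    effective.update(overrides)
    return effective


config = {
    "multiAgentMode": True,
    "knowledgebaseIds": [],
    "memoryLimitNumber": 6,
    "relativity": 0.45,
    "agentOverrides": {
        "coding-agent": {
            "knowledgebaseIds": ["kb-codebase", "kb-api-docs"],
            "memoryLimitNumber": 9,
            "relativity": 0.5,
            "addEnabled": False,
        }
    },
}

coding = resolve_agent_config(config, "coding-agent")
other = resolve_agent_config(config, "writer-agent")  # no override entry
print(coding["memoryLimitNumber"], coding["relativity"])
print(other["memoryLimitNumber"], other["relativity"])
```

Here `coding-agent` ends up with its own limits and `addEnabled` set to false, while an Agent without an override entry keeps the global values.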
You can also configure a JSON string in `.env` with `MEMOS_AGENT_OVERRIDES`, but it has lower priority than `agentOverrides` in `openclaw.json`: ```bash MEMOS_AGENT_OVERRIDES='{"research-agent":{"memoryLimitNumber":12,"relativity":0.3},"coding-agent":{"memoryLimitNumber":9,"addEnabled":false}}' ``` Common overridable fields: | Field | Description | | --- | --- | | `knowledgebaseIds` | Knowledge base IDs retrieved by the current Agent. | | `memoryLimitNumber` | Maximum number of memory items recalled by the current Agent. | | `preferenceLimitNumber` | Maximum number of preference memories recalled by the current Agent. | | `includePreference` | Whether to recall preference memories. | | `includeToolMemory` | Whether to recall tool memories. | | `toolMemoryLimitNumber` | Maximum number of tool memories to recall. | | `relativity` | Relevance threshold, usually between `0` and `1`. | | `recallEnabled` | Whether memory recall is enabled for the current Agent. | | `addEnabled` | Whether memory writes are enabled for the current Agent. | | `captureStrategy` | Capture strategy, such as `last_turn` or `full_session`. | | `queryPrefix` | Search query prefix for the current Agent. | | `recallFilterEnabled` | Whether secondary recall filtering is enabled. | | `allowKnowledgebaseIds` | Knowledge base IDs that the current Agent is allowed to write to. | | `tags` | Tags attached when writing memories. 
|

### Principles

- **/search/memory**: memory retrieval; returns only the current Agent's memories
- **/add/message**: record insertion; automatically tags data for the current Agent
- **Config merging**: the plugin reads the global config first, then applies `agentOverrides.<agentId>` for the current Agent
- **Backward compatibility**: the default Agent `"main"` is ignored to keep existing single-Agent data unaffected

### Use Cases

- **Multi-role collaboration**: Strategy, business, marketing, and engineering Agents can work in parallel
- **Business-line isolation**: Agents from different business lines run independently without interference
- **Persona consistency**: Preserve each Agent's long-term persona and behavior style

---

## Local Plugin

`@memtensor/memos-local-plugin` supports both OpenClaw and Hermes. By default, each agent uses its own runtime home and local database. If multiple sessions / agents share one runtime, retrieval is scoped toward the current agent context. For cross-instance collaboration, enable team sharing from **Viewer → Settings → Team Sharing**.

### Rules

- **Isolated by default**: OpenClaw uses `~/.openclaw/memos-plugin/`, while Hermes uses `~/.hermes/memos-plugin/`. They do not share databases automatically.
- **Current agent first**: retrieval prioritizes the current agent / session's Traces, Policies, World Models, and Skills.
- **Optional sharing**: when `hub.enabled` is on, instances can share locally crystallized Skills and optional trace excerpts over a LAN / VPN.
- **Graceful fallback**: Hub is not on the algorithm critical path. If sharing is unavailable, the plugin falls back to local-only memory.
### Example Workflow

```text
OpenClaw: memory_search("deploy config")
  → prioritizes OpenClaw's local Skill / Trace / World Model store

Hermes: memory_search("deploy config")
  → prioritizes Hermes' local Skill / Trace / World Model store

With Hub enabled:
  OpenClaw / Hermes can pull team-shared Skills
  private Traces remain local to each machine and runtime home by default
```

### Expected Results

- OpenClaw and Hermes do not read each other's local database by default
- Team members can explicitly share high-value Skills to avoid repeating mistakes
- Local writes, retrieval, and skill lookup continue to work even if Hub is unavailable

---

# Secondary Filtering for Memory Recall (/openclaw/examples/recall_filter)

## Cloud Plugin

The MemOS OpenClaw cloud plugin supports secondary filtering of recalled memories with a specified large language model. After filtering, only memories that are highly relevant to the current task are injected into context, which reduces irrelevant noise and saves tokens.

### How to Use

To turn on secondary memory filtering, configure an OpenAI-compatible model endpoint (such as local Ollama or a third-party LLM API) and enable the filter switch.

#### 1. Enable Memory Filtering

When configuring an LLM for memory filtering, you **must** configure the API Key and Base URL.

Add the following to your `openclaw.json` config:

```json
{
  "plugins": {
    "entries": {
      "memos-cloud-openclaw-plugin": {
        "config": {
          "recallFilterEnabled": true,
          "recallFilterBaseUrl": "http://127.0.0.1:11434/v1",
          "recallFilterApiKey": "sk-...",
          "recallFilterModel": "qwen2.5_7b"
        }
      }
    }
  }
}
```

Or set environment variables:

```bash
MEMOS_RECALL_FILTER_ENABLED=true
MEMOS_RECALL_FILTER_BASE_URL="http://127.0.0.1:11434/v1"
MEMOS_RECALL_FILTER_API_KEY="sk-..."
MEMOS_RECALL_FILTER_MODEL="qwen2.5_7b"
```

#### 2.
Configure Authentication and Advanced Parameters (Optional) If you need to adjust timeout and failure strategy, you can specify them in the config: ```json { "config": { "recallFilterTimeoutMs": 6000, "recallFilterFailOpen": true } } ``` ### How It Works - **Post-recall interception**: Before each conversation round, after memories are recalled from the cloud, the plugin sends candidate memory entries to your configured filtering model for secondary screening. - **Precise retention**: After model judgment, only entries marked as `keep` are retained and injected into the agent context. - **High-availability fallback**: Fail-open (`recallFilterFailOpen: true`) is enabled by default. If the filtering model times out or fails, it automatically falls back to full injection without filtering, so the current conversation is not interrupted. ### Typical Use Cases - **Pruning long-term memory**: In long-running conversations with many accumulated memories, remove content unrelated to the current prompt to significantly reduce main-model context token usage. - **Improving reasoning accuracy**: For agents handling complex tasks, filter out early irrelevant memories to improve reasoning quality on the core task. - **Working with local models**: Use a locally running small model (such as `qwen2.5_7b` via Ollama) as a low-cost pre-filter to improve memory injection quality without increasing main-model API costs. --- ## Local Plugin `@memtensor/memos-local-plugin` includes multi-stage local retrieval filtering. It first recalls candidates from Skill, Trace/Episode, and World Model tiers, then applies RRF + MMR for fusion and deduplication. If an LLM is configured, it can also run a final relevance check before injection to drop items that only share surface keywords with the current task. 
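
Reciprocal Rank Fusion (RRF) is a common way to combine ranked lists from several retrieval channels into one ranking. The sketch below is illustrative only: the channel names, memory IDs, and the constant `k = 60` (a widely used default) are assumptions for the example, not values taken from the plugin.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(rankings: Dict[str, List[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists with RRF: score(d) = sum of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, item_id in enumerate(ranked_ids, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by ID for deterministic output.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical per-channel results, best match first.
channels = {
    "vector": ["trace-7", "skill-2", "trace-3"],
    "fts5": ["skill-2", "trace-9"],
    "pattern": ["skill-2", "trace-7"],
}
fused = rrf_fuse(channels)
print(fused)  # ['skill-2', 'trace-7', 'trace-9', 'trace-3']
```

An item that several channels agree on (`skill-2` here) rises to the top even if no single channel ranks it first everywhere. An MMR pass would then typically re-rank the fused list to reduce near-duplicates; that step is omitted from this sketch.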
### How to Configure Configure this directly in the Memory Viewer for the target agent: | Agent | Memory Viewer | | --- | --- | | OpenClaw | `http://127.0.0.1:18799` | | Hermes | `http://127.0.0.1:18800` | Steps: 1. Open the Memory Viewer. 2. Go to **Settings → AI Models**. 3. In the **LLM** section, choose a provider and fill in endpoint, API Key, model, and related fields. 4. Click **Test** to confirm the model works. 5. Save the settings. The Viewer restarts the plugin and loads the new config. After saving, local retrieval can use that LLM for a relevance check after recall and RRF/MMR ranking. If no LLM is configured, the plugin still uses built-in multi-channel recall and mechanical threshold filtering. ### Local Retrieval Flow ```text User request → Build retrieval query and tags → Tier 1: Skill candidates → Tier 2: Trace / Episode candidates → Tier 3: World Model candidates → Multi-channel recall: vector / FTS5 / pattern / error signatures → RRF fusion + MMR diversity control → Optional LLM relevance check → Inject into the agent ``` ### Expected Results - Injected memory context is more focused and less noisy - Skill, Trace/Episode, and World Model hits are not selected by vector similarity alone - If the LLM is unavailable, retrieval falls back to stricter mechanical thresholds without breaking basic recall --- # Local Plugin Usage (/openclaw/examples/hermes_usage) ## Basic Usage `@memtensor/memos-local-plugin` supports both OpenClaw and Hermes. After installation, start the agent you use as usual. The plugin injects local memory context before each task and writes Trace, Policy, World Model, and Skill data after the task finishes. | Agent | How to start | Viewer | | --- | --- | --- | | OpenClaw | Start or restart the OpenClaw gateway normally | `http://127.0.0.1:18799` | | Hermes | `hermes chat` | `http://127.0.0.1:18800` | ### Verify Memory is Working 1. Have a conversation with OpenClaw or Hermes. 2. 
Open the corresponding Memory Viewer and confirm the conversation appears in **Memories** / **Tasks**. 3. In a new conversation, ask the agent to recall what you discussed: ```text You: Do you remember what I asked you to help me with before? Agent: (Calls memory_search) Yes, we previously discussed... ``` --- ## Memory Tools The local plugin exposes memory tools through each agent host. Exact tool presentation may differ by host, but the core capabilities are shared. | Tool | Purpose | | --- | --- | | `memory_search` | Search across Skill, Trace/Episode, and World Model tiers. | | `memory_get` | Fetch a memory detail. | | `memory_timeline` | Inspect an episode / task timeline. | | `skill_list` | List currently available Skills. | | `skill_get` | Fetch a Skill invocation guide. | | `memory_environment` | Query L3 World Models for project structure, environment behavior, and constraints. | ### Call Examples ```text Agent call: memory_search("Nginx deployment config") → Returns relevant Skills, Trace snippets, and environment knowledge Agent call: skill_get("nginx-proxy") → Returns executable steps, applicability, and caveats ``` The plugin also records tool successes and failures for later decision repair. --- ## Team Sharing By default, OpenClaw and Hermes use separate local databases. For collaboration, enable Team Sharing from the Memory Viewer to share locally crystallized Skills and optional trace excerpts with other instances on the same LAN / VPN. ### How to Configure Open the Memory Viewer for the target agent, go to **Settings → Team Sharing**, fill in the team address and tokens as prompted, then save. The Viewer restarts the plugin and loads the new settings. ### Expected Results - Private local data stays in the current agent's runtime home by default. - Explicitly shared Skills can be discovered and reused by other instances. - Hub is not on the algorithm critical path. If sharing fails, local writes, retrieval, and Skill lookup continue to work. 
--- ## Multi-Agent Scenarios When OpenClaw and Hermes are installed on the same machine, their ports and data are isolated: | Resource | OpenClaw | Hermes | | --- | --- | --- | | Viewer | `18799` | `18800` | | Data directory | `~/.openclaw/memos-plugin/` | `~/.hermes/memos-plugin/` | | Config entry | Viewer → Settings | Viewer → Settings | ```text OpenClaw: memory_search("deploy config") → prioritizes OpenClaw's local experience Hermes: memory_search("deploy config") → prioritizes Hermes' local experience With Hub enabled: both can explicitly reuse team-shared Skills ``` --- ## Viewer Management The Memory Viewer provides these common entry points: | Page | Purpose | | --- | --- | | Overview | Inspect core status, version, event stream, and health. | | Memories | Inspect L1 Traces and raw execution records. | | Tasks | Inspect conversations and execution results grouped by task. | | Policies | Inspect strategies induced from multiple Traces. | | World Models | Inspect environment knowledge and constraints. | | Skills | Inspect, search, or retire crystallized Skills. | | Import | Import legacy plugin data, OpenClaw session JSONL, Hermes `MEMORY.md`, or import/export JSON backups. | | Settings | Configure models, team sharing, logs, and telemetry. | | Help | Look up field meanings such as `V`, `α`, `R_human`, `η`, support, and gain. | --- # MemOS MCP Usage Guide (/mcp_agent/mcp/guide) MemOS Memory Management MCP is a powerful AI memory enhancement plugin that supports three core capabilities: **conversation memory access**, **user profile construction**, and **knowledge base full lifecycle management**. By integrating MemOS into mainstream AI clients such as Claude, Cursor, and Cline, users can enable AI to continuously accumulate personal memories, understand user preferences, and efficiently process large-scale professional documents, fundamentally improving the consistency and personalization of AI conversations. ## 1. 
Capabilities Overview ### 1.1 Conversation Memory Management Provides writing, retrieving, deleting, and quality feedback functions for conversation content, which is the foundational capability module of the MemOS MCP. | Tool | Function Description | |---|---| | `add_message` | Writes a summary of the current conversation content into the user's memory bank for future retrieval. | | `search_memory` | Searches for relevant historical memories in the user's personal memory bank based on search terms. | | `delete_memory` | Deletes specified memory entries from the memory bank. | | `add_feedback` | Submits quality feedback on memory entries to optimize memory management effects. | ### 1.2 User Profile System **`get_user_profile`**: Retrieves the user's full-dimensional memory profile with one click. Unlike single factual memories (Facts), the user profile also integrates explicit/implicit preferences (Preferences) and tool usage experience (Tool Trajectories), allowing AI to answer identity-related questions such as "Who am I?" and "What are my preferences?", achieving true personalized interaction. ### 1.3 Knowledge Base Lifecycle Management Supports creating independent namespace containers for specific projects or domains, facilitating the isolated management of structured documents. | Tool | Function Description | |---|---| | `create_knowledge_base` | Creates an independent knowledge base container for document management in a specific project or domain. | | `remove_knowledge_base` | Removes a knowledge base that is no longer needed and its associated content. | ### 1.4 Intelligent Document Upload **`add_kb_document`**: Supports injecting local files or online resources into a specified knowledge base, which is the core tool of the knowledge base capability. **Core Features:** - **Local File Direct Upload**: Pioneered the MCP internal interception mechanism, allowing LLMs to pass local paths directly. 
- **Polymorphic Path Recognition**: Perfectly supports Windows (drive letters), Unix (absolute paths), Home directory (~/), and environment variable paths. - **Zero Context Loss**: Silently completes Base64 conversion and MIME type encapsulation locally, completely avoiding long documents overwhelming the context. - **Intelligent Direct Link Completion**: Automatically corrects non-standard URLs (adds http/https) and identifies online resources. - **Circuit Breaker Security Strategy**: Built-in anti-deadlock instruction; once an interface error occurs, it immediately trips the circuit breaker to prevent consuming excess traffic. ### 1.5 Precise Document Control Supports batch query and precise deletion of uploaded documents, achieving dynamic maintenance of the knowledge base. | Tool | Function Description | |---|---| | `get_kb_documents` | Batch retrieves detailed metadata of uploaded documents via a list of File IDs. | | `delete_kb_documents` | Precisely deletes specific documents from a designated knowledge base, achieving dynamic streamlining of the document library. | ## 2. Quick Configuration Fill in the following configuration in the client: ```json { "mcpServers": { "memos-api-mcp": { "timeout": 60, "type": "stdio", "command": "npx", "args": [ "-y", "@memtensor/memos-api-mcp@latest" ], "env": { "MEMOS_API_KEY": "mpg-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "MEMOS_USER_ID": "your-user-id", "MEMOS_CHANNEL": "MODELSCOPE" } } } } ``` How to obtain environment variables: - `MEMOS_API_KEY`: Register an account on the MemOS official website's [API Console](https://memos-dashboard.openmem.net/cn/apikeys/), then create a new api-key on the API keys page and copy it here. ![Create a new api-key on the MemOS API Console](https://cdn.memtensor.com.cn/img/1763451978063_251scz_compressed.png) - `MEMOS_USER_ID`: A deterministic user-defined personal identifier. 
- For the same user, this environment variable must stay the same across devices and clients; - Do not use random values, device IDs, or chat session IDs as user identifiers; - Recommended: use a personal email address, full name, or employee ID as the user identifier. - `MEMOS_CHANNEL`: Fill in "MODELSCOPE" here. For more detailed configuration, please refer to: * [npm package](https://www.npmjs.com/package/@memtensor/memos-api-mcp) * [GitHub](https://github.com/MemTensor/memos-api-mcp) ## 3. Using MemOS MCP in Different Clients ### Using in Claude Desktop To use MemOS in Claude Desktop, click the avatar in the lower left corner -> "Settings" -> "Developer" -> "Edit Config", paste the configuration into the `claude_desktop_config.json` file, and then restart the client. Once the memos-api-mcp service shows as running, you can use it in chat. ![Verification of using MemOS in Claude](https://cdn.memtensor.com.cn/img/1763105334517_9ayhrp_compressed.png) For better results, we recommend updating the user preferences that apply to all conversations when using MemOS in Claude Desktop. To do so, click the avatar in the lower left corner -> "General", and paste the following content into the input box under "What personal preferences should Claude consider in responses?": ``` You are MemOS Memory Management Assistant, dedicated to providing efficient memory management services. It extracts memories based on users' past conversation content and enhances the consistency and personalization of users' conversations with AI through memory retrieval. Before answering each user's question, you need to call the search_memory service of memos-api-mcp and use appropriate search terms to find memories related to the current topic in the user's personal memory bank.
After completing the answer based on these memories, call the add_message service of memos-api-mcp to record a summary of the current conversation content. (Note that calling add_message is mandatory. Regardless of what the user says or asks, it must be recorded; otherwise, in subsequent conversations, search_memory will not be able to obtain more detailed user information, leading to your inability to answer the user's questions accurately.) ``` ![Modifying user preferences for using MemOS in Claude Desktop](https://cdn.memtensor.com.cn/img/1763105312212_yqu9m7_compressed.png) The following is an example of using MemOS in Claude Desktop, by which users can judge whether they have successfully configured MemOS in Claude Desktop. ![Example of using MemOS in Claude Desktop](https://cdn.memtensor.com.cn/img/1763105296073_gtqj1s_compressed.png) ### Using in Cursor To use MemOS in Cursor, go to "Cursor Settings" -> "Tools & MCP" -> "Add Custom MCP" (or "New MCP Server"), and paste the configuration into the pop-up `mcp.json` file editing page. You can use it in the Cursor chat panel when you observe that memos-api-mcp is in the started state and can see tools such as `add_message` and `search_memory` on the tool details page. ![Using MemOS in Cursor](https://cdn.memtensor.com.cn/img/1763105278297_n23ukk_compressed.png) To improve the usage effect, it is recommended that users modify User Rules when using MemOS in Cursor. The specific method is to go to "Cursor Settings" -> "Rules, Memories, Commands" -> "User Rules" -> "+ Add Rule", then copy and paste the following content and save it: ``` You are MemOS Memory Management Assistant, dedicated to providing efficient memory management services. It extracts memories based on users' past conversation content and enhances the consistency and personalization of users' conversations with AI through memory retrieval. 
Before answering each user's question, you need to call the search_memory service of memos-api-mcp and use appropriate search terms to find memories related to the current topic in the user's personal memory bank. After completing the answer based on these memories, call the add_message service of memos-api-mcp to record a summary of the current conversation content. (Note that calling add_message is mandatory. Regardless of what the user says or asks, it must be recorded; otherwise, in subsequent conversations, search_memory will not be able to obtain more detailed user information, leading to your inability to answer the user's questions accurately.) ``` ![Configuring User rules for using MemOS in Cursor](https://cdn.memtensor.com.cn/img/1763105260337_yqacto_compressed.png) The following example shows MemOS working in Cursor; use it to check that your configuration succeeded. ![Usage example of using MemOS in Cursor](https://cdn.memtensor.com.cn/img/1763105238556_p99epu_compressed.png) ### Using in Visual Studio Code or Trae To use MemOS in VS Code or Trae, install the Cline extension and configure the model, then click the "MCP Servers" icon in the upper right corner inside the Cline panel -> "Configure" -> "Configure MCP Servers", and paste the configuration into the `cline_mcp_settings.json` file. Once memos-api-mcp shows as started, you can use it in the Cline agent. We also recommend turning on the Auto-approve switch for each tool so that the agent does not ask for confirmation before every tool call. ![Configuration process of using MemOS in Cline](https://cdn.memtensor.com.cn/img/1763105211482_g1xclj_compressed.png) For better results, we recommend adding a global rule when using MemOS in Cline.
The specific method is to click the "Manage Cline Rules & Workflows" icon in the lower left corner of the Cline panel -> click the "+" icon on the right below Global Rules -> enter a rules file name in the input box, such as "memos_rules.md" -> copy and paste the following content into the pop-up editor: ```markdown # 🧠 MemOS Automatic Memory System — Mandatory Usage ## ⚠️ Always-On (No User Opt-In Required) This system must run **automatically for every turn**. Do **not** wait for the user to say "use memory", "use MCP", or "call a tool". The **client/orchestrator is responsible** for auto-invoking the tools. ## 🎯 Mandatory 3-Step Workflow (Enforced by Client Each Turn) \`\`\` Every user message → 1) 🔍 search_memory (AUTO, before answering) → 2) 💬 Answer (use only relevant memories; ignore noise) → 3) 💾 add_message (AUTO, after answering) \`\`\` ### 1) 🔍 Search Memory (Auto-invoked BEFORE answering) - **Trigger**: Must be auto-called **before** generating any answer (including simple greetings). - **Tool**: `search_memory` **Relevance rule**: The model must judge relevance and **only use relevant** memories. If results are irrelevant or noisy, **ignore them** and proceed. ### 2) 💬 Answer Use retrieved memories **only if relevant**. If none are relevant, answer normally. ### 3) 💾 Save Conversation (Auto-invoked AFTER answering) - **Trigger**: Must be auto-called after producing the final answer on **every turn**. - **Tool**: `add_message` **Purpose**: Persist Q&A for future personalization and continuity — even if no memory was used this turn. ## ✅ Non-Negotiable Client Responsibilities 1. **Auto-invoke** `search_memory` before **every** answer and `add_message` after **every** answer. 2. **No user opt-in**: Do not wait for the user to mention memory/tools/MCP. 3. **Store both user and assistant** messages every turn. 4. **Sequence** must be strictly: Search → Answer → Save. 
``` ![Modifying global rules for using MemOS in VS Code or Trae](https://cdn.memtensor.com.cn/img/1763105181443_v9kg80_compressed.png) The following is an example of using MemOS in Cline, by which users can judge whether they have successfully configured MemOS in Cline. ![Usage example of using MemOS in Cline](https://cdn.memtensor.com.cn/img/1763105156433_jz4k3t_compressed.png) ### Using in [Chatbox](https://chatboxai.app/en) To use MemOS in Chatbox, click "Settings" in the lower left corner -> "MCP" -> "Custom MCP Servers - Add Server" -> "Add Custom Server", and add the memos-api-mcp service according to the following configuration. ``` Name: MemOS Memory Management Type: Local (stdio) Command: npx -y @memtensor/memos-api-mcp@latest Environment Variables: MEMOS_API_KEY={{api_key applied for on the MemOS official website API Console}} MEMOS_USER_ID={{custom USER_ID}} ``` After filling in, click "Test". If you can see tools such as `add_message` and `search_memory` at the bottom of the dialog box, the configuration is successful. ![Verification of using MemOS in Chatbox](https://cdn.memtensor.com.cn/img/1763105136401_xbvcsh_compressed.png) To improve the usage effect, it is recommended that users modify the system_prompt when using MemOS in Chatbox. The specific method is to go to "Settings" in the lower left corner -> "Chat Settings" -> "Default Settings for New Conversation", and modify the prompt as follows: ``` You are MemOS Memory Management Assistant, dedicated to providing efficient memory management services. It extracts memories based on users' past conversation content and enhances the consistency and personalization of users' conversations with AI through memory retrieval. Before answering each user's question, you need to call the search_memory service of memos-api-mcp and use appropriate search terms to find memories related to the current topic in the user's personal memory bank. 
After completing the answer based on these memories, call the add_message service of memos-api-mcp to record a summary of the current conversation content. (Note that calling add_message is mandatory. Regardless of what the user says or asks, it must be recorded; otherwise, in subsequent conversations, search_memory will not be able to obtain more detailed user information, leading to your inability to answer the user's questions accurately.) ``` ![Modifying system_prompt when using MemOS in Chatbox](https://cdn.memtensor.com.cn/img/1763105111045_trc5fx_compressed.png) The following is an example of using MemOS in Chatbox, by which users can judge whether they have successfully configured MemOS in Chatbox. ![Effect example of using MemOS in Chatbox](https://cdn.memtensor.com.cn/img/1763104980563_q3q7v2_compressed.png) ## 4. Q&A **Q: Why do agents sometimes fail to invoke tools when they should?** A: Due to the different underlying models used, different agents have different proficiency in using tools. When the agent forgets to use the tool, you can guide the model to call the corresponding tool through instructions, or try to use other underlying models. ## 5. Contact Us ![image.png](https://cdn.memtensor.com.cn/img/1758685658684_nbhka4_compressed.png) --- # MemOS CLI (/mcp_agent/cli/guide) MemOS CLI is designed for Agents and development environments that can execute shell commands. It wraps common memory operations into the `memos` command, so you can verify memory flows in a terminal or let an Agent search memories before answering and write useful new memories afterward. ## 1. When to Use It - You want to quickly test memory operations such as `add`, `search`, `get`, and `delete` in a terminal. - Your Agent framework can execute shell commands, but does not have a dedicated MemOS plugin. - You want one reusable Skill to work across multiple Agents instead of writing a separate plugin for each framework. 
If you use OpenClaw, prefer the [OpenClaw Cloud Plugin](/openclaw/guide). If your client supports MCP natively, see the [MCP Guide](/mcp_agent/mcp/guide). ## 2. Install ```bash npm install -g @memtensor/memos-cloud-cli ``` After installation, confirm that the CLI is available: ```bash memos --help ``` ## 3. Choose a Usage Mode and Configure MemOS CLI has two usage modes: install a memory Skill for an Agent, or manually run `memos` commands in a terminal. ### 3.1 Use with Agents To let an Agent automatically search and write memories, use `memos init` to install the memory Skill. `--agent` is currently required; if it is omitted, the command fails because the CLI needs to know where to install the Skill. ```bash memos init --agent codex ``` You can also write the API Key during initialization: ```bash memos init --api-key YOUR_API_KEY --agent codex ``` The currently supported Agents are listed below. Pass the matching value via `--agent`: | Agent | `--agent` value | | --- | --- | | Codex CLI | `codex` | | Cursor | `cursor` | | Claude Code | `claude` | | OpenClaw | `openclaw` | | Hermes | `hermes` | | Trae | `trae` | | Trae (China) | `trae-cn` | | OpenCode | `opencode` | | Antigravity | `antigravity` | | CodeBuddy | `workbuddy` | | Cline | `cline` | | GitHub Copilot | `copilot` | For example, to install the memory Skill for Cursor: ```bash memos init --agent cursor ``` Once installed, the Agent will automatically load the Skill. During each conversation turn, the Agent will: 1. **Before answering** — automatically run `memos search` to retrieve long-term memories related to the current task 2. **After answering** — automatically run `memos add` to write new facts, preferences, etc. into MemOS If you already have a MemOS plugin installed (e.g. 
the OpenClaw cloud plugin), add `--memos-plugin` to generate plugin-aware Skill guidance: ```bash memos init --agent openclaw --memos-plugin ``` > **Tip**: Using OpenClaw as an example, in the LOCOMO evaluation, using MemOS CLI alone reduced token usage by about 65.5%; integrating MemOS Cloud + CLI improved accuracy from 66.60% to 77.27%. | Parameter | Description | | --- | --- | | `-k, --api-key` | MemOS API Key | | `--user-id` | Default user ID | | `--conversation-id` | Default conversation ID | | `--memos-plugin` | Generate plugin-aware Skill guidance when a MemOS memory plugin is installed | | `--agent` | Install Skill to a specific Agent directory; required | ### 3.2 Use Directly in Terminal If you only use CLI commands manually in a terminal and do not install an Agent Skill, use `memos config set` to configure CLI variables. After these values are set, later commands automatically use them when the corresponding parameter is not provided. Configure the API Key: ```bash memos config set platform.api_key YOUR_API_KEY ``` Configure the default user ID: ```bash memos config set defaults.user_id user_123 ``` Configure the default conversation ID: ```bash memos config set defaults.conversation_id conv_001 ``` ## 4. Quick Start After completing Agent initialization or terminal configuration above, use the following commands to verify memory operations. Add a memory: ```bash memos add "The user prefers Python programming" ``` Search related memories: ```bash memos search "programming language preference" ``` Chat with memory: ```bash memos chat "Do you know my preference?" ``` Get memories for a user: ```bash memos get user_123 ``` Query the original text of a memory: ```bash memos origin mem_123456 ``` Delete one memory, or delete all memories for a user: ```bash memos delete mem_123456 memos delete --user-id user_123 ``` ## 5. Command Reference ### `memos add` Write a memory into MemOS. 
```bash memos add "The user prefers Python for data analysis and often uses pandas" memos add --message "The user frequently uses Jupyter Notebook" --user-id user_123 ``` | Parameter | Description | | --- | --- | | `[MESSAGE]` | Memory content to write; use either this or `--message` | | `-m, --message` | Memory content; alternative to the positional argument | | `--user-id` | User identifier; defaults to `defaults.user_id` in config | | `--format` | Output format; defaults to `agent` | ### `memos search` Search for memories related to a query. ```bash memos search "data analysis tool preference" memos search "programming language" --format json --detail detail ``` | Parameter | Description | | --- | --- | | `[QUERY]` | Search query; use either this or `--query` | | `-q, --query` | Search query; alternative to the positional argument | | `--user-id` | User identifier; defaults to `defaults.user_id` in config | | `--include-preference` | Include preference memories (`true` / `false`); defaults to `true` | | `--include-tool-memory` | Include tool memories (`true` / `false`); defaults to `false` | | `--include-skill-memory` | Include skill memories (`true` / `false`); defaults to `false` | | `--memory-limit-number` | Max number of main memories to recall; defaults to `9` | | `--preference-limit-number` | Max number of preference memories to recall; defaults to `9` | | `--tool-memory-limit-number` | Max number of tool memories to recall; defaults to `6` | | `--skill-memory-limit-number` | Max number of skill memories to recall; defaults to `6` | | `--format` | Output format; defaults to `agent` | | `--detail` | Detail level for non-JSON output; defaults to `simple`; supports `simple`, `detail` | ### `memos get` Get memories by user. 

```bash
memos get user_123
memos get user_123 --format json --detail detail
```

| Parameter | Description |
| --- | --- |
| `[USER_ID]` | User identifier; falls back to `defaults.user_id` in config |
| `--user-id` | Alias for `[USER_ID]`; same fallback rules |
| `--page` | Page number; omitted from request body if not set |
| `--size` | Page size; omitted from request body if not set |
| `--include-preference` | Include preference memories (`true` / `false`); defaults to API default if not set |
| `--include-tool-memory` | Include tool memories (`true` / `false`); defaults to API default if not set |
| `--format` | Output format; defaults to `agent` |
| `--detail` | Detail level; defaults to `simple`; supports `simple`, `detail` |

### `memos origin`

Query the original text of a memory by memory ID.

```bash
memos origin mem_123456
memos origin mem_123456 --format json
```

| Parameter | Description |
| --- | --- |
| `MEMORY_ID` | Memory ID whose original text should be queried; required |
| `--format` | Output format; defaults to `agent` |

### `memos delete`

Delete one memory, or delete all memories for a user.

```bash
memos delete mem_123456 --format json
memos delete --user-id user_123 --format json
```

| Parameter | Description |
| --- | --- |
| `[MEMORY_ID]` | Memory ID to delete; pass this to delete a single memory |
| `--user-id` | Delete all memories for this user; use either this or `MEMORY_ID` |
| `--format` | Output format; defaults to `agent` |

### `memos chat`

Chat using MemOS memories as context.

```bash
memos chat "Do you know my preferences?"
memos chat "Do you know my preferences?" --user-id user_123 --format table
```

| Parameter | Description |
| --- | --- |
| `[QUERY]` | Chat question; use either this or `--query` |
| `-q, --query` | Chat question; alternative to the positional argument |
| `--user-id` | User identifier; defaults to `defaults.user_id` in config |
| `--format` | Output format; defaults to `agent` |

### `memos extract`

Extract candidate memories from a message without writing them.

```bash
memos extract "The user likes coffee and prefers dark mode" --format json
```

| Parameter | Description |
| --- | --- |
| `[MESSAGE]` | Message to extract from; use either this or `--message` |
| `-m, --message` | Message to extract from; alternative to the positional argument |
| `--user-id` | User identifier; defaults to `defaults.user_id` in config |
| `--format` | Output format; defaults to `agent` |

### `memos rerank`

Rerank candidate documents by relevance.

```bash
memos rerank "python backend" "Flask guide" "React guide" --format json
```

| Parameter | Description |
| --- | --- |
| `[QUERY]` | Rerank query; use either this or `--query` |
| `[DOCUMENTS]...` | Candidate document texts; multiple positional arguments |
| `-q, --query` | Rerank query; alternative to the positional argument |
| `--documents` | Candidate document texts; can be repeated |
| `--top-n` | Return only the top N results |
| `--format` | Output format; defaults to `agent` |

### `memos feedback`

Submit feedback to improve memory management quality.

```bash
memos feedback "Prefer concise, direct technical answers." --user-id user_123 --format json
```

| Parameter | Description |
| --- | --- |
| `[FEEDBACK_TEXT]` | Feedback content; use either this or `--feedback-content` |
| `--feedback-content` | Feedback content; alternative to the positional argument |
| `--user-id` | User identifier; defaults to `defaults.user_id` in config |
| `--format` | Output format; defaults to `agent` |

### `memos message`

Retrieve original conversation messages.
```bash memos message --user-id user_123 --conversation-id conv_001 memos message --user-id user_123 --limit 10 --format table ``` | Parameter | Description | | --- | --- | | `--user-id` | User identifier; required | | `--conversation-id` | Conversation ID; defaults to `defaults.conversation_id` in config | | `--limit` | Max messages to return; defaults to `6`, max `50` | | `--format` | Output format; defaults to `agent` | ### `memos status` Query the processing status of an async task. The task ID comes from the `task_id` returned by `add` or `feedback` in async mode. ```bash memos status abc123-task-id --format json ``` | Parameter | Description | | --- | --- | | `[TASK_ID]` | Async task ID; required | | `--format` | Output format; defaults to `agent` | Returned status values include `running`, `completed`, and `failed`. ### `memos kb` Knowledge base management subcommand group: create knowledge bases, upload documents, and query or delete files. #### `memos kb create` Create a knowledge base. ```bash memos kb create --name "Product FAQ" --description "Common product questions" --format json ``` | Parameter | Description | | --- | --- | | `--name` | Knowledge base name; required | | `--description` | Knowledge base description; optional | | `--format` | Output format; defaults to `agent` | #### `memos kb remove` Remove (delete) a knowledge base. ```bash memos kb remove base_xxxxx --format json ``` | Parameter | Description | | --- | --- | | `[KB_ID]` | Knowledge base ID; required | | `--format` | Output format; defaults to `agent` | #### `memos kb add-file` Upload documents to a knowledge base. Supports PDF, DOCX, DOC, TXT, JSON, MD, XML. 
```bash memos kb add-file --kb-id base_xxxxx --files '["https://example.com/doc.pdf"]' --format json memos kb add-file --kb-id base_xxxxx --files '[{"content":"https://cdn.example.com/file.docx"}]' --format json ``` | Parameter | Description | | --- | --- | | `--kb-id` | Target knowledge base ID; required | | `--files` | JSON array of file entries: URL strings or `{"content": "..."}` objects; required | | `--format` | Output format; defaults to `agent` | #### `memos kb get-file` Get knowledge base file details and processing status. ```bash memos kb get-file --file-ids '["file_id_1", "file_id_2"]' --format json ``` | Parameter | Description | | --- | --- | | `--file-ids` | JSON array of file IDs; required | | `--format` | Output format; defaults to `agent` | #### `memos kb list-file` List files in a knowledge base with pagination, optionally filtered by type. ```bash memos kb list-file --kb-id base_xxxxx memos kb list-file --kb-id base_xxxxx --type skill --page 2 --page-size 10 --format json ``` | Parameter | Description | | --- | --- | | `--kb-id` | Knowledge base ID; required | | `--type` | Filter by type: `document` or `skill`; optional | | `--page` | Page number; defaults to `1` | | `--page-size` | Items per page; defaults to `20` | | `--format` | Output format; defaults to `agent` | #### `memos kb delete-file` Delete files from a knowledge base. ```bash memos kb delete-file --kb-id base_xxxxx --file-ids '["file_id_1"]' --format json ``` | Parameter | Description | | --- | --- | | `--kb-id` | Knowledge base ID; required | | `--file-ids` | JSON array of file IDs to delete; required | | `--format` | Output format; defaults to `agent` | ## 6. Output Formats All commands support `--format`. The default format is `agent`. `search` and `get` also support `--detail`. 

| Format | Use case |
| --- | --- |
| `table` | Human-readable terminal output |
| `markdown` | Paste into documentation |
| `agent` | Default; inject directly into Agent context |
| `json` | Scripts, workflows, or structured processing |

```bash
memos search "python"
memos search "python" --format table --detail simple
memos search "python" --format markdown --detail detail
memos search "python" --format agent --detail simple
memos search "python" --format json --detail detail
```

## 7. Configuration Commands and Environment Variables

View or modify local configuration:

```bash
memos config show
memos config get platform.api_key
memos config set platform.api_key YOUR_API_KEY
memos config set defaults.user_id user_123
memos config set defaults.conversation_id conv_001
```

| Environment Variable | Description |
| --- | --- |
| `MEMOS_API_KEY` | Your API Key |
| `MEMOS_BASE_URL` | API Base URL; defaults to `https://memos.memtensor.cn/api/openmem/v1` |

Global options:

| Parameter | Description |
| --- | --- |
| `--api-key TEXT` | Override the API Key in local configuration |
| `--base-url TEXT` | Override the API Base URL |
| `--version` | Show version number |

All CLI requests include a `source=cli` tag. When the framework can be identified from environment variables or parent processes, the `framework` info is also attached to memory API requests.

## 8. CLI, Plugin, and MCP

| Integration | Best for | Characteristics |
| --- | --- | --- |
| Plugin | Agent frameworks with deep MemOS integration | Deepest integration and best experience; requires per-framework adaptation |
| CLI + Skill | Any Agent framework that can execute shell commands | Highly portable, low adaptation cost, great for cross-framework automation |
| MCP | MCP-native clients | Standardized tool protocol for clients that support MCP |

The three approaches are complementary. Plugin is best for deep integration, CLI + Skill for general automation, and MCP for MCP-native clients.
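
As a rough sketch of the CLI + Skill approach, an Agent runtime that can run shell commands might wrap `memos search` as shown below. The helper names `build_search_command` and `search_memories` are hypothetical; the flags (`--query`, `--user-id`, `--memory-limit-number`, `--format json`) are taken from the command reference in this guide. The JSON output schema is not described here, so the sketch returns the parsed output without assuming any fields.

```python
import json
import shutil
import subprocess
from typing import Any, List, Optional


def build_search_command(
    query: str,
    user_id: Optional[str] = None,
    memory_limit: Optional[int] = None,
) -> List[str]:
    """Build the argument list for `memos search` with JSON output (hypothetical helper)."""
    # The query is passed explicitly with --query rather than positionally.
    cmd = ["memos", "search", "--query", query, "--format", "json"]
    if user_id is not None:
        # Without this flag the CLI falls back to defaults.user_id in its config.
        cmd += ["--user-id", user_id]
    if memory_limit is not None:
        cmd += ["--memory-limit-number", str(memory_limit)]
    return cmd


def search_memories(
    query: str,
    user_id: Optional[str] = None,
    memory_limit: Optional[int] = None,
) -> Any:
    """Run the CLI and return its parsed JSON output; no output fields are assumed."""
    if shutil.which("memos") is None:
        raise RuntimeError("memos CLI not found; install @memtensor/memos-cloud-cli first")
    result = subprocess.run(
        build_search_command(query, user_id, memory_limit),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

Keeping command construction separate from execution lets you inspect or log the exact command before it runs, which can help when debugging an Agent that invokes the CLI on your behalf.
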
## Next Steps

- [Quick Start](/memos_cloud/getting_started/quick_start): Run the basic memory write and search flow with API / SDK
- [MCP Guide](/mcp_agent/mcp/guide): Connect MemOS memory tools through MCP clients
- [OpenClaw Cloud Plugin](/openclaw/guide): Use the OpenClaw plugin for deeper Agent integration

---

# User Guide (/mcp_agent/agent/guide)

## Coze Platform Plugin Tools

### 1. Plugin Listing Information

The MemOS Cloud Service Interface Plugin is now available on the Coze Store. [Open the plugin page](https://www.coze.cn/store/plugin/7569918012912893995?from=store_search_suggestion) to add the plugin and integrate MemOS without writing code.

### 2. Plugin Description

#### Plugin Functions

* `search_memory`: Queries user memory data and returns the snippets most relevant to the input. It supports real-time memory retrieval during user-AI conversations and can also perform global searches across the entire memory. It can be used to create user profiles or support personalized recommendations. Parameters such as conversation ID, user ID, and query text are required for queries, and the number of returned memory items can also be set.
* `add_memory`: Imports one or more messages in a batch into the MemOS memory store so they can be retrieved in future conversations. This supports chat history management, user behavior tracking, and personalized interaction. You must specify the conversation ID, message content, sender role, conversation time, and user ID.
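
To make the inputs of `add_memory` more concrete, the following sketch assembles the arguments for one conversation turn. The field names (`conversation_id`, `user_id`, `messages`, `memos_url`, `memos_key`) follow the interface tables in this guide, and the `role` / `content` / `chat_time` message shape, the time format, and the `Token ` key prefix follow the Agent call example. The helper name `build_add_memory_args` and all sample values are hypothetical, and how Coze passes these arguments to the plugin is not shown here.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_add_memory_args(
    user_id: str,
    conversation_id: str,
    query: str,
    answer: str,
    memos_key: str,
    memos_url: str = "https://memos.memtensor.cn/api/openmem/v1",
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Assemble add_memory arguments for one user/assistant turn (hypothetical helper)."""
    # Normalize to UTC so the "UTC" suffix in the formatted string is accurate.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    chat_time = now.strftime("%I:%M %p on %d %B, %Y UTC")
    return {
        "memos_url": memos_url,
        "memos_key": memos_key,  # the example in this guide uses "Token mpg-..."
        "user_id": user_id,
        "conversation_id": conversation_id,
        "messages": [
            {"role": "user", "content": query, "chat_time": chat_time},
            {"role": "assistant", "content": answer, "chat_time": chat_time},
        ],
    }
```

Both messages in a turn share the same `chat_time`, matching the message array shown in the Agent call example.
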
#### Interface Description * search_memory Interface | Parameter Name | Parameter Type | Description | Required | | --- | --- | --- | --- | | memory_limit_number | string | Limits the number of returned memory items; defaults to 6 if not provided | No | | memos_key | string | Authorization key for MemOS cloud service | Yes | | memos_url | string | URL address of MemOS cloud service | Yes | | query | string | User input | Yes | | user_id | string | Unique identifier of the user associated with the memory being queried | Yes | * add_memory Interface | Parameter Name | Parameter Type | Description | Required | | --- | --- | --- | --- | | conversation_id | string | Unique identifier for the conversation | Yes | | memos_key | string | Authorization key for MemOS cloud service | Yes | | memos_url | string | URL address of MemOS cloud service | Yes | | messages | Array | Array of message objects | Yes | | user_id | string | Unique identifier of the user associated with the memory being queried | Yes | ### 3. Agent Call Example #### Agent Persona and Reply Logic Example ``` You are a Q&A robot. Every time, you read the user's memory and interests, and reply with very clear logic to gain the user's favor. ## Workflow Content # 1. Access {search_memory} to retrieve data materials After each user input, first call the retrieval function in MemOS memory relationships -- the {search_memory} plugin, input information: Record the user's name as user_id. If it is the first visit, set user_id to a 16-character string randomly generated by UUID. Use the user's speech content as the query. # 2. Process {search_memory} output content: Get the data content. If there is a memory_detail_list field, regardless of whether the memory_detail_list is empty, directly output the memory_detail_list in JSON format; if the returned message is not "ok", prompt "Plugin retrieval failed". # 3. 
Answer the user's question based on the retrieved memory_detail_list Extract the memory_value field value of each item in memory_detail_list, concatenate all strings with "\n" as the context material for answering the user's question; the large model can answer the user's query based on the information provided by the context; if the context information is an empty string, the large model can directly answer the user's query. Then record the content answered by the large model into "answer". # 4. Access {add_memory} to store data materials Call the add_memory function to store the user's question and the corresponding answer, input information: chat_time: Call {current_time} to get the current time, format the timestamp as "%I:%M %p on %d %B, %Y UTC". conversation_id: Record the current time point chat_time accurate to the minute, and use the time point string as conversation_id. user_id: Record the user's name as user_id. messages: Record the query input by the user and all answers obtained, respectively as the content of the role and the content of the assistant in messages. chat_time uses the chat_time value just obtained, organized as a message: [ {"role": "user", "content": query, "chat_time": chat_time}, {"role": "assistant", "content": answer, "chat_time": chat_time} ] Get the {add_memory} plugin feedback. If the success field in data is True, it is successful, *no need to inform the user*; if the returned field is not True, prompt the user that add_memory access failed. ## Requirements Every time you access {search_memory} and {add_memory}, you need to pass two fixed parameters: memos_url = "https://memos.memtensor.cn/api/openmem/v1" memos_key = "Token mpg-XXXXXXXXXXXXXXXXXXXXXXXXXXX" Your role is a wise and caring memory assistant named Xiaozhi. If all plugins run smoothly, there is no need to prompt the user for success in the content answered by the large model. 
Generate user_id with UUID only during the user's first conversation, and reuse this user_id in subsequent work. ``` [Agent Example Link](https://www.coze.cn/s/85NOIg062vQ) ![Agent Workflow](https://cdn.memtensor.com.cn/img/coze_workflow_compressed.png) --- # Overview (/api_docs/start/overview) ## 1. Interface Introduction MemOS provides a complete set of interfaces. Through simple API requests, you can integrate memory-related functions into your AI applications, realizing memory production, scheduling, recall, and lifecycle management for different users and AI agents. > **Tip**: **Quick Start:** Get your API key from the [**MemOS Console**](https://memos-dashboard.openmem.net/apikeys/) and complete your first memory operation in one minute. ## 2. Getting Started Start using MemOS API through these two simple core steps: * [**Add Message**](/api_docs/core/add_message): Store original message content from user conversations and generate memories; * [**Search Memory**](/api_docs/core/search_memory): Retrieve and recall relevant user memory fragments to provide reference for model-generated responses. ## 3. Interface Categories Explore the rich functional interfaces provided by MemOS: * [**Core Operations API**](/api_docs/core/add_message): Provides core memory operation capabilities, realizing the full process from memory production to consumption. * [**Message API**](/api_docs/message/add_feedback): Used for uploading and managing original message content data. * [**Knowledge Base API**](/api_docs/knowledge/create_kb): Used for uploading and managing knowledge bases and their documents. * [**Chat API**](/api_docs/chat/chat): Used to generate chat responses with memory recall and knowledge base enhancement. * [**Self-developed Model API**](/api_docs/core/extract_memory): Used to call memory extraction and reranking model capabilities. ## 4. Authentication All API requests require authentication. 
Include your API key in the `Authorization` header of each request. Get your API key from the [**MemOS Console**](https://memos-dashboard.openmem.net/apikeys/). > **Warning**: Do not expose your API key in client-side code or public repositories. Store the key in environment variables and send requests from server-side code. ## 5. Next Steps * 👉 [**Add Message**](/api_docs/core/add_message): Generate your first memory; * 👉 [**Search Memory**](/api_docs/core/search_memory): Use memory filters to implement advanced memory retrieval. --- # Project Configuration (/api_docs/start/configuration) If you are a new user and have not logged in to the console before, start with the guide that matches your use case: | Use case | What it helps you do | | :--- | :--- | | [Use in Agent](/memos_cloud/getting_started/agent_usage) | Connect MemOS to your personal Agent through plugins, CLI, or similar integration methods | | [Integrate in Your App](/memos_cloud/getting_started/quick_start) | Use the MemOS API / SDK in your application | After login, MemOS automatically creates a default project. Copy the API Key, and you can use all features in that project. Continue reading this page if you want to understand: - How projects and API Keys are related; - How to create, switch, edit, or delete projects; - How to use knowledge bases in a project; - How to troubleshoot configuration errors. ## 1. Projects and API Keys A project is the memory isolation space defined by MemOS. Each project has its own API Key for accessing memories, messages, and request logs under that project. Projects are isolated from each other. An API Key from Project A cannot access resources in Project B. > **Note**: After switching projects, go back to the API Keys page and copy the API Key for the current project. ![API Keys page](https://cdn.memtensor.com.cn/img/1781512378168_9kxrmd_compressed.png) ## 2.
Manage Projects When you need to isolate different apps, environments, or business spaces, you can create, switch, edit, and delete projects from [Projects](https://memos-dashboard.openmem.net/projects). ![Projects page](https://cdn.memtensor.com.cn/img/1781512406018_mu57b2_compressed.png) ### 2.1 Create or Switch Projects - Click "New" on the Projects page, then enter the project name and description to create a project. - The project marked as "Current Project" is the project currently selected in the console. - After clicking "Switch to This Project", API Keys, knowledge bases, and request logs all switch to that project scope. ![Create project](https://cdn.memtensor.com.cn/img/1781512406018_mu57b2_compressed.png) ### 2.2 Delete a Project - The current project cannot be deleted directly. Switch to another project first. - Deleting a project clears its memories, messages, knowledge base associations, API Keys, and related data. - Deletion is irreversible. Only delete test projects or deprecated projects that you no longer need. ![Delete project confirmation](https://cdn.memtensor.com.cn/img/1781512428231_x11kqz_compressed.png) ## 3. Associate Knowledge Bases with a Project If your app / Agent needs to refer to fixed documents, create and associate a knowledge base with the project. A project can associate with multiple knowledge bases, and one knowledge base can also associate with multiple projects. On the [Knowledge Base](https://memos-dashboard.openmem.net/knowledgeBase) page: 1. Click "Add Knowledge Base"; 2. Choose "Create Knowledge Base" or "Associate Existing Knowledge Base"; 3. Open the knowledge base detail page, upload documents, and wait for processing to complete; 4. When calling `search/memory` or `chat`, pass `knowledgebase_ids` to specify which knowledge bases can be searched in this request. 
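As a rough illustration of step 4, the sketch below builds a `search/memory` request that limits the search to one knowledge base. It only constructs the request and does not send it. The base URL and the `Token` authorization prefix are copied from the Coze prompt example earlier in this document and may differ for your project; the `MEMOS_API_KEY` environment variable, the helper name `build_search_request`, and the sample query are hypothetical.

```python
import json
import os
import urllib.request

# Assumption: base URL taken from the Coze example in this document.
BASE_URL = "https://memos.memtensor.cn/api/openmem/v1"


def build_search_request(api_key, user_id, query, knowledgebase_ids):
    """Build (but do not send) a POST /search/memory request."""
    payload = {
        "user_id": user_id,
        "query": query,
        # Only knowledge bases associated with the API key's project
        # should be listed here.
        "knowledgebase_ids": list(knowledgebase_ids),
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/search/memory",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: "Token <key>" scheme, as in the Coze example.
            "Authorization": f"Token {api_key}",
        },
        method="POST",
    )


request = build_search_request(
    # Hypothetical variable name; keep the real key out of source code.
    api_key=os.environ.get("MEMOS_API_KEY", "mpg-placeholder"),
    user_id="memos_user_123",
    query="What does the onboarding guide say about refunds?",
    knowledgebase_ids=["kb_xxx"],
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes it easy to inspect the exact body when debugging which knowledge bases a call is allowed to search.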
![Knowledge base association page](https://cdn.memtensor.com.cn/img/1781512441179_d9uy7m_compressed.png) > **Warning**: If the knowledge base specified in `knowledgebase_ids` is not associated with the project of the current API Key, `search/memory` returns `50123`. Before using a knowledge base, associate it with the target project. For the full workflow, see [Knowledge Base](/memos_cloud/features/knowledge_base). ## 4. Common Configuration Errors | Error code | Common cause | How to fix | | :--- | :--- | :--- | | `40000` | Request field names, types, or structure do not match the API requirements | Check the JSON fields against the API docs; do not mix objects, arrays, and strings incorrectly | | `40002` | Required field is empty | Check required fields such as `user_id`, `messages`, `query`, and `conversation_id` | | `40011` | `conversation_id` is too long | Use a short ID, such as an order ID, conversation ID, or internal trace ID. Do not put the full conversation in `conversation_id` | | `40103` / `40132` | API Key is invalid, expired, or cannot access the current project | Check whether the API Key is complete, valid, and belongs to the current project | | `40300` / `40304` | API request quota or account-level request quota is exhausted | See [Quotas and Limits](/memos_cloud/support/limit), or check the current quota in the console | | `40305` | Single request input exceeds the token limit | Shorten the write, search, or upload content; do not send long conversation history or long documents in one request | | `50123` | Knowledge base is not associated with the current project | Associate the knowledge base with the project on the Knowledge Base page, or remove the incorrect `knowledgebase_ids` | For more error code details, see [Error Codes](/api_docs/help/error_codes). --- # Add Message **POST** `/add/message` This API allows you to add one or more messages to a specific conversation. 
As illustrated in the examples below, you can add messages in real time during a user-assistant interaction, import historical messages in bulk, or enrich the conversation with user preferences and behavior data. All added messages are transformed into memories by MemOS, enabling their retrieval in future conversations to support chat history management, user behavior tracking, and personalized interactions. - operationId: `add_message` ## Request Body | Parameter | Type | Required | Description | |---|---|---|---| | `user_id` | string | Yes | Unique identifier of the user associated with the message. | | `conversation_id` | string | Yes | Unique identifier of the conversation. | | `messages` | array | Yes | Array of message objects representing the memory content. The total token limit for the message array is 40k. Each object contains: | | `agent_id` | string | No | Unique identifier of the Agent associated with the added message, primarily used to query exclusive memories between a user and that Agent during retrieval. | | `app_id` | string | No | Unique identifier of the App associated with the added message, primarily used to query exclusive memories of a user under that App during retrieval. | | `tags` | array | No | List of custom tags used to mark the topic or category of the added message. | | `info` | object | No | Custom metadata field capable of storing any structured data related to the added message, such as location, source, version, etc., primarily for precise filtering or source tracking during retrieval. | | `allow_public` | boolean | No | Whether memories generated from the added message are allowed to be written to the public memory store. When enabled, generated memories can be retrieved by other users in the project. | | `allow_knowledgebase_ids` | array | No | Scope of knowledgebases where memories generated from the added message are allowed to be written. | | `async_mode` | boolean | No | Whether to enable asynchronous memory addition.
When enabled, memory will be added asynchronously in the background to avoid blocking the call chain. | #### `messages` object | Parameter | Type | Required | Description | |---|---|---|---| | `chat_time` | string | No | Time of the conversation, as a structured timestamp or Chinese text. Providing this parameter allows memory to include time information. | | `role` | string (enum: user, assistant, system, tool) | Yes | Sender role. | | `content` | string or array | Yes | Content of the message. | | `tool_call_id` | string | No | Unique identifier for the tool call message when role is tool. | | `tool_calls` | array | No | List of tool call information when role is assistant, passed when the assistant triggers tool calls. | #### `tool_calls` object | Parameter | Type | Required | Description | |---|---|---|---| | `id` | string | No | Unique identifier for the tool call, used to associate with this tool call. Required in tool_calls. | | `type` | string (enum: function) | No | Tool call type, currently only supports function. Required in tool_calls. | | `function` | object | No | Defined tool to be called. Required in tool_calls. | #### `function` object | Parameter | Type | Required | Description | |---|---|---|---| | `name` | string | No | Tool (function) name. Required in tool_calls. | | `arguments` | string | No | Tool call arguments; must be a JSON-formatted string. Required in tool_calls. | ### Request Example ```json { "user_id": "memos_user_123", "conversation_id": "0610", "messages": [ { "role": "user", "content": "I want to travel during summer vacation, can you recommend something?" }, { "role": "assistant", "content": "Sure! Are you traveling alone, with family or with friends?" }, { "role": "user", "content": "I'm bringing my kid. My family always travels together." }, { "role": "assistant", "content": "Got it, so you're traveling with your children as a family, right?"
}, { "role": "user", "content": "Yes, with both kids and elderly, we usually travel as a whole family." }, { "role": "assistant", "content": "Understood, I'll recommend destinations suitable for family trips." } ] } ``` ## Response Successful Response | Parameter | Type | Required | Description | |---|---|---|---| | `code` | integer | Yes | API status code. See Error Code for details. | | `data` | object | No | Object representing the result of adding the message. | | `message` | string | Yes | API response message, e.g., "Message added successfully". | #### `data` object | Parameter | Type | Required | Description | |---|---|---|---| | `success` | boolean | No | Indicates if the message was added successfully. true = success, false = failure. | | `task_id` | string | No | Unique identifier of the memory processing task. | | `status` | string (enum: running, completed, failed) | No | Current status of the memory processing task. | ### Response Example ```json { "code": 0, "data": { "success": true, "status": "running" }, "message": "ok" } ``` URL: /api_docs/core/add_message --- # Search Memory **POST** `/search/memory` This API queries a user’s memory and returns the fragments most relevant to the input for the Agent to use. Returned memory can include Fact Memory, Preference Memory, Tool Memory, and Skill. Skills can be auto-generated from user conversations or uploaded as custom Skill files in a Knowledge Base. - operationId: `search_memory` ## Request Body | Parameter | Type | Required | Description | |---|---|---|---| | `user_id` | string | Yes | Unique identifier of the user associated with the memory being queried. | | `conversation_id` | string | No | Unique identifier of the conversation containing the memory. Providing this gives the current conversation’s memories higher priority than other historical memories. | | `query` | string | Yes | Text content to search within the memories. The token limit for a single query is 4k.
| | `filter` | object | No | Memory filter conditions used to narrow candidate memories before semantic recall. You can use and / or at the root for global filtering, or filter by source under user, public, and knowledgebase. For complete fields and examples, see Memory Filters. | | `knowledgebase_ids` | array | No | Specifies the scope of knowledge bases accessible for the current search. Defaults to empty, meaning no knowledge bases are searched. Pass specific Knowledgebase IDs to search only those knowledge bases; pass "all" to search across all associated knowledgebases within the project. | | `memory_limit_number` | integer | No | Maximum number of memories that can be recalled: as long as the relevance threshold (relativity) is met, up to this many memories may be returned. Default is 9, maximum is 25. | | `include_preference` | boolean | No | Whether to enable preference memory recall. When enabled, the system will intelligently retrieve the user’s preference memories based on the query. Defaults to true if not provided. | | `preference_limit_number` | integer | No | Maximum number of preference memories that can be recalled: as long as the relevance threshold (relativity) is met, up to this many preference memories may be returned. Default is 9, maximum is 25. | | `include_tool_memory` | boolean | No | Whether to enable tool memory recall. When enabled, the system will intelligently retrieve tool-related memories based on the query. Default is false if not provided. | | `tool_memory_limit_number` | integer | No | Limits the number of tool memory items returned, controlling the count of recalled tool memories. Effective only when include_tool_memory=true. Default is 6 if not provided, max is 25. | | `include_skill` | boolean | No | Whether to enable Skill recall.
When enabled, the system recalls relevant Skills based on the query, including personalized Skills auto-generated from user conversations and custom Skills uploaded within the knowledgebase_ids scope. Disabled by default if not provided. | | `skill_limit_number` | integer | No | Limits the number of returned Skills, controlling the count of recalled skills. Effective only when `include_skill=true`. Default is 6 if not provided, max is 25. | | `relativity` | number | No | Relevance threshold (0–1) for recalled memories. Filters out low-relevance memories and, together with the maximum counts for fact and preference recalls, constrains the final results. When omitted, the system default threshold is used. A value of 0 disables relevance filtering. | #### `filter` object | Parameter | Type | Required | Description | |---|---|---|---| | `and` | array | No | Global AND condition array. Candidate memories enter semantic recall only when all conditions are satisfied. Common fields include agent_id, app_id, tags.contains, create_time, update_time, and custom fields from info. | | `or` | array | No | Global OR condition array. Candidate memories enter semantic recall when any condition is satisfied. | | `user` | object | No | Filter conditions for user personal memories. | | `public` | object | No | Filter conditions for project-level public memories. | | `knowledgebase` | object | No | Filter conditions for knowledge base memories. Only applies within the knowledge base scope allowed by knowledgebase_ids. | #### `user` object | Parameter | Type | Required | Description | |---|---|---|---| | `and` | array | No | AND condition array for user memories. | | `or` | array | No | OR condition array for user memories. | #### `public` object | Parameter | Type | Required | Description | |---|---|---|---| | `and` | array | No | AND condition array for public memories. | | `or` | array | No | OR condition array for public memories.
| #### `knowledgebase` object | Parameter | Type | Required | Description | |---|---|---|---| | `and` | array | No | AND condition array for knowledge base memories. | | `or` | array | No | OR condition array for knowledge base memories. | ### Request Example ```json { "query": "Summarize my memories related to reading this year", "user_id": "memos_user_123", "conversation_id": "0928", "knowledgebase_ids": [ "kb_xxx" ], "filter": { "knowledgebase": { "and": [ { "tags": { "contains": "reading" } }, { "create_time": { "gte": "2025-01-01" } }, { "create_time": { "lte": "2025-12-31" } } ] }, "user": { "and": [ { "scene": "chat" } ] } } } ``` ## Response Successful Response | Parameter | Type | Required | Description | |---|---|---|---| | `code` | integer | Yes | API status code. See Error Code for details. | | `data` | object | No | Object containing the query result. | | `message` | string | Yes | API response message. | #### `data` object | Parameter | Type | Required | Description | |---|---|---|---| | `memory_detail_list` | array | No | List of memory fragment details returned. Each item includes: | | `preference_detail_list` | array | No | List of preference memory details returned. Each item includes: | | `tool_memory_detail_list` | array | No | List of tool memory fragment details returned. | | `preference_note` | string | No | Preference memory usage note when it is retrieved. | | `skill_detail_list` | array | No | List of Skill details returned when include_skill is enabled. Sources include Skills auto-generated from user conversations and custom Skills uploaded to Knowledge Bases. | #### `memory_detail_list` object | Parameter | Type | Required | Description | |---|---|---|---| | `id` | string | No | Unique identifier of fact memory, used internally by the system. | | `memory_key` | string | No | Title or keyword summarizing fact memory. | | `memory_value` | string | No | Detailed content of fact memory. 
| | `memory_type` | string (enum: WorkingMemory, LongTermMemory, UserMemory) | No | WorkingMemory: short-term working memory, temporarily stored.
LongTermMemory: long-term memory, important information or facts stored persistently.
UserMemory: user-specific memory, personalized information related to a specific user. | | `create_time` | string | No | Creation time of fact memory, usually in ISO 8601 format. | | `conversation_id` | string | No | Unique identifier of the conversation associated with fact memory. | | `status` | string (enum: activated) | No | Fact memory status. All retrieved fact memories are currently activated.
activated: active fact memory, available for retrieval and use. | | `confidence` | number | No | Confidence score of fact memory, ranging from 0 to 1. Values closer to 1 indicate higher reliability.
The score may gradually decay over time or with repeated model reasoning to reflect uncertainty. | | `tags` | array | No | List of tags associated with fact memory for categorization or retrieval. Each element is a string, e.g., ["person", "event", "work"]. | | `update_time` | string | No | Last modification or update time of fact memory, usually in ISO 8601 format. | | `relativity` | number | No | Relevance score of fact memory to the query, ranging from 0 to 1. Higher values indicate stronger relevance. | #### `preference_detail_list` object | Parameter | Type | Required | Description | |---|---|---|---| | `id` | string | No | Unique identifier of preference memory, used internally by the system. | | `preference` | string | No | Detailed content of preference memory. | | `preference_type` | string (enum: explicit_preference, implicit_preference) | No | explicit_preference: Explicit preference memory.
implicit_preference: Implicit preference memory. | | `reasoning` | string | No | The reason why preference memory was extracted. | | `create_time` | string | No | Creation time of preference memory, usually in ISO 8601 format. | | `conversation_id` | string | No | Unique identifier of the conversation associated with preference memory. | | `status` | string (enum: activated) | No | Preference memory status. All retrieved preference memories are currently activated.
activated: active preference memory, available for retrieval and use. | | `update_time` | string | No | Last modification or update time of preference memory, usually in ISO 8601 format. | | `relativity` | number | No | Relevance score of preference memory to the query, ranging from 0 to 1. Higher values indicate stronger relevance. | #### `tool_memory_detail_list` object | Parameter | Type | Required | Description | |---|---|---|---| | `id` | string | No | Unique identifier of the tool memory fragment. | | `tool_type` | string (enum: ToolTrajectoryMemory, ToolSchema) | No | Tool memory type. ToolTrajectoryMemory: tool trajectory memory; ToolSchema: tool info memory. | | `tool_value` | string | No | Specific content of the tool memory. | | `tool_used_status` | array | No | List of tool trajectory memories, each record contains tool used and experience info. | | `create_time` | string | No | Tool memory creation time (ISO 8601 format). | | `conversation_id` | string | No | Unique identifier of the conversation associated with the tool memory. | | `status` | string (enum: activated) | No | Tool memory status, currently activated. | | `update_time` | string | No | Last update time of the tool memory (ISO 8601 format). | | `relativity` | number | No | Relevance score of the tool memory to the query content. | | `experience` | string | No | Procedural experience of the entire trajectory, serving as overall guidance for task completion. | #### `skill_detail_list` object | Parameter | Type | Required | Description | |---|---|---|---| | `id` | string | No | The unique identifier of the Skill, used internally by the system to distinguish different memory entries. | | `skill_value` | object | No | Structured Skill content. You can convert it to a string and inject it into the Agent prompt. | | `skill_url` | string | No | Download link for the Skill. 
A Markdown Skill can be downloaded as SKILL.md; a ZIP Skill package can include SKILL.md, scripts, references, assets, and other attachments. | | `skill_type` | string | No | The type of the Skill. | | `create_time` | string | No | The creation time of the Skill content, usually in ISO 8601 format. | | `conversation_id` | string | No | The unique identifier of the conversation associated with the Skill. | | `status` | string (enum: activated) | No | The status of the Skill. Currently, all retrieved Skills are 'activated'. activated: Active status, the Skill is currently retrievable and usable. | | `confidence` | number | No | The confidence score of the Skill, ranging from 0 to 1. A value closer to 1 indicates the Skill is more accurate and reliable. The confidence score gradually decays as the model infers this Skill more times, reflecting the uncertainty that may arise over time or usage frequency. | | `tags` | array | No | A list of tags associated with the Skill, used for classification, retrieval, or topic marking. Each element in the array is a string, e.g., ["Person", "Event", "Work"]. | | `update_time` | string | No | The time when the Skill was last modified or updated, usually in ISO 8601 format. | | `relativity` | string | No | The relevance score between the query content and the Skill, ranging from 0 to 1. A value closer to 1 indicates higher relevance. | ### Response Example ```json { "code": 0, "data": { "memory_detail_list": [ { "memory_type": "LongTermMemory", "status": "activated", "confidence": 0.95, "relativity": 0.87 } ], "preference_detail_list": [ { "preference_type": "explicit_preference", "status": "activated", "relativity": 0.87 } ], "tool_memory_detail_list": [ { "tool_type": "ToolTrajectoryMemory", "status": "activated" } ], "skill_detail_list": [ { "status": "activated" } ] } } ``` URL: /api_docs/core/search_memory --- # Get Memory **POST** `/get/memory` Retrieve a user’s memories, including factual, preference, and tool memories. 
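The `page` and `size` request fields and the `pages` response field documented for this endpoint suggest a simple paging loop. The sketch below is one way to walk all factual memories. It makes several assumptions: the first page is numbered 1, `post` is a hypothetical transport function that sends the body to `/get/memory` and returns the parsed JSON, and `fake_post` is only a stand-in so the example runs without network access.

```python
from typing import Callable, Iterator


def build_get_memory_body(user_id: str, page: int, size: int = 20) -> dict:
    """Return the JSON body for one POST /get/memory call."""
    if not 1 <= size <= 50:  # the docs cap `size` at 50
        raise ValueError("size must be between 1 and 50")
    return {"user_id": user_id, "page": page, "size": size}


def iter_fact_memories(
    post: Callable[[dict], dict], user_id: str, size: int = 20
) -> Iterator[dict]:
    """Yield every item of memory_detail_list, page by page."""
    page = 1  # assumption: the first page is numbered 1
    while True:
        response = post(build_get_memory_body(user_id, page, size))
        data = response.get("data") or {}
        yield from data.get("memory_detail_list") or []
        # Stop once the reported page count has been reached.
        if page >= (data.get("pages") or 0):
            break
        page += 1


def fake_post(body: dict) -> dict:
    """Stand-in transport serving two fake pages (not real API output)."""
    fake_pages = {1: [{"id": "m1"}, {"id": "m2"}], 2: [{"id": "m3"}]}
    return {
        "code": 0,
        "data": {
            "memory_detail_list": fake_pages[body["page"]],
            "pages": 2,
            "current": body["page"],
        },
    }


ids = [m["id"] for m in iter_fact_memories(fake_post, "memos_user_123", size=2)]
print(ids)  # prints ['m1', 'm2', 'm3']
```

Injecting the transport keeps the paging logic independent of any HTTP client, so the same loop can be reused with whichever client and authentication setup your application already has.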
- operationId: `get_memory` ## Request Body | Parameter | Type | Required | Description | |---|---|---|---| | `user_id` | string | Yes | Unique identifier of the user whose memories are being retrieved. | | `page` | integer | No | Page number for pagination when many results are returned. | | `size` | integer | No | Number of entries returned per memory category on the current page, up to 50. | | `filter` | object | No | Filter conditions, used to precisely limit the memory scope before retrieval. Available fields include: "agent_id", "app_id", "create_time", "update_time", and specific fields in "info". Supports logical operators (and, or) and comparison operators (gte, lte, gt, lt). For the "info" field, supports filtering by "business_type", "biz_id", "scene", and other custom fields. | | `include_preference` | boolean | No | Whether preference memories should be included. | | `include_tool_memory` | boolean | No | Whether tool memories should be included. | ## Response Successful Response | Parameter | Type | Required | Description | |---|---|---|---| | `code` | number | Yes | API status code; refer to the error-code list for details. | | `data` | object | No | | | `message` | string | Yes | API message. | #### `data` object | Parameter | Type | Required | Description | |---|---|---|---| | `memory_detail_list` | array | No | Returned factual memories. | | `preference_detail_list` | array | No | Returned preference memories. | | `tool_memory_detail_list` | array | No | List of tool memory fragment details returned. | | `total` | integer | No | Maximum count across memory types, used to check if another page exists. | | `size` | integer | No | Number of entries per memory type on the current page. | | `current` | integer | No | Index of the current page. | | `pages` | integer | No | Total number of pages. 
| #### `memory_detail_list` object | Parameter | Type | Required | Description | |---|---|---|---| | `id` | string | No | Unique identifier of a factual memory entry. | | `memory_key` | string | No | Title or keyword summarizing the factual memory. | | `memory_value` | string | No | Content of the factual memory. | | `memory_type` | string (enum: WorkingMemory, LongTermMemory, UserMemory) | No | Type of factual memory. | | `create_time` | string | No | Creation time in ISO 8601 format. | | `conversation_id` | string | No | Conversation identifier linked to this memory. | | `status` | string | No | Current status, only activated is returned. | | `confidence` | number | No | Confidence score between 0 and 1; higher means more reliable. | | `tags` | array | No | Tag list for classification or retrieval. | | `update_time` | string | No | Last update time in ISO 8601 format. | | `sources` | array | No | Source message objects associated with the memory. | | `info` | object | No | Custom metadata provided when adding the message. | #### `preference_detail_list` object | Parameter | Type | Required | Description | |---|---|---|---| | `id` | string | No | Unique identifier of the preference memory. | | `preference_type` | string (enum: explicit_preference, implicit_preference) | No | Preference memory type. | | `preference` | string | No | Description of the preference. | | `reasoning` | string | No | Reasoning for extracting or deriving the preference. | | `create_time` | string | No | Creation time in ISO 8601 format. | | `conversation_id` | string | No | Conversation identifier linked to this preference. | | `status` | string | No | Current status, only activated is returned. | | `update_time` | string | No | Last update time in ISO 8601 format. | | `sources` | array | No | Source messages associated with the preference. | | `info` | object | No | Custom metadata provided when adding the message. 
| #### `tool_memory_detail_list` object | Parameter | Type | Required | Description | |---|---|---|---| | `id` | string | No | Unique identifier of the tool memory fragment. | | `tool_type` | string (enum: ToolTrajectoryMemory, ToolSchema) | No | Tool memory type. ToolTrajectoryMemory: tool trajectory memory; ToolSchema: tool info memory. | | `tool_value` | string | No | Specific content of the tool memory. | | `tool_used_status` | array | No | List of tool trajectory memories, each record contains tool used and experience info. | | `create_time` | string | No | Tool memory creation time (ISO 8601 format). | | `conversation_id` | string | No | Unique identifier of the conversation associated with the tool memory. | | `status` | string (enum: activated) | No | Tool memory status, currently activated. | | `update_time` | string | No | Last update time of the tool memory (ISO 8601 format). | | `experience` | string | No | Procedural experience of the entire trajectory, serving as overall guidance for task completion. | | `sources` | array | No | List of original message content associated with the tool memory. | | `info` | object | No | Custom metadata provided when adding the message. | ### Response Example ```json { "data": { "memory_detail_list": [ { "memory_type": "WorkingMemory" } ], "preference_detail_list": [ { "preference_type": "explicit_preference" } ], "tool_memory_detail_list": [ { "tool_type": "ToolTrajectoryMemory", "status": "activated" } ] } } ``` URL: /api_docs/core/get_memory --- # Delete Memory **POST** `/delete/memory` This API is used to delete specified user memories, supporting batch deletion. - operationId: `delete_memory` ## Request Body | Parameter | Type | Required | Description | |---|---|---|---| | `memory_ids` | array | No | IDs of the memories to be deleted, obtained from the `id` field returned by the `search/memory` or `get/memory` API. | | `user_id` | string | No | Unique identifier of the user whose memories are being deleted. 
| ## Response Successful Response | Parameter | Type | Required | Description | |---|---|---|---| | `code` | integer | No | API status code. See Error Code for details. | | `data` | object | No | Returned deletion information | | `message` | string | No | API response message | #### `data` object | Parameter | Type | Required | Description | |---|---|---|---| | `success` | boolean | No | Whether deletion was successful. true for success, false for failure. | ### Response Example ```json { "data": { "success": true } } ``` URL: /api_docs/core/delete_memory --- # Add Feedback **POST** `/add/feedback` This API is used to add feedback to current session messages, allowing MemOS to correct memories based on user feedback. - operationId: `add_feedback` ## Request Body | Parameter | Type | Required | Description | |---|---|---|---| | `user_id` | string | Yes | Unique identifier of the user associated with the feedback content. | | `conversation_id` | string | Yes | Unique identifier of the conversation associated with the feedback content. | | `feedback_content` | string | Yes | Feedback content text. | | `agent_id` | string | No | Unique identifier of the Agent associated with the feedback content, primarily used to query exclusive memories between a user and that Agent during retrieval. | | `app_id` | string | No | Unique identifier of the App associated with the feedback content, primarily used to query exclusive memories of a user under that App during retrieval. | | `feedback_time` | string | No | Time when the feedback occurred, can be a structured timestamp or natural language time description. Providing this parameter allows memory to include time information. | | `allow_public` | boolean | No | Whether memories generated from the feedback content are allowed to be written to the public memory store. When enabled, generated memories can be retrieved by other users in the project. 
| | `allow_knowledgebase_ids` | array | No | List of knowledgebases where memories generated from the feedback content are allowed to be written. | ## Response Successful Response | Parameter | Type | Required | Description | |---|---|---|---| | `code` | integer | Yes | API status code. See Error Code for details. | | `data` | object | No | | | `message` | string | Yes | API response message | #### `data` object | Parameter | Type | Required | Description | |---|---|---|---| | `success` | boolean | No | Whether addition was successful. true for success, false for failure. | | `status` | string | No | Task status: running, completed | | `task_id` | string | No | Task ID | URL: /api_docs/message/add_feedback --- # Extract Memory **POST** `/extract/memory` Uses MemOS’s self-developed extraction model to extract and return fact and preference memories directly from conversation messages. - operationId: `extract_memory` ## Request Body | Parameter | Type | Required | Description | |---|---|---|---| | `messages` | array | Yes | Array of message objects from which to extract memories. The combined Token count of all messages must not exceed 8k. | | `extraction_types` | array | No |

Limits which memory types to extract; omit to extract all supported types. `memory`: factual memories, i.e. user-related factual information extracted from the dialogue. `preference`: preference memories, i.e. user preferences from the dialogue, including explicit preferences (clearly stated) and implicit preferences (inferred from behavior).
| #### `messages` object | Parameter | Type | Required | Description | |---|---|---|---| | `role` | string (enum: user, assistant, system) | Yes | Creator role of the message. | | `content` | string | Yes | Message body text. | ### Request Example ```json { "messages": [ { "role": "user", "content": "I’ve planned a summer trip to Guangzhou. What chain hotels can you recommend for accommodation?" }, { "role": "assistant", "content": "You can consider options like 7 Days Inn, All Seasons, Hilton, and others." }, { "role": "user", "content": "I’ll go with 7 Days Inn." }, { "role": "assistant", "content": "Alright—ask me anytime if you have more questions." } ], "extraction_types": [ "memory", "preference" ] } ``` ## Response Successful Response | Parameter | Type | Required | Description | |---|---|---|---| | `code` | integer | Yes | API status code. See Error Code for details. | | `data` | object | No | | | `message` | string | Yes | API response message. | #### `data` object | Parameter | Type | Required | Description | |---|---|---|---| | `success` | boolean | No | Whether extraction succeeded. | | `memory_detail_list` | array | No | Factual memories. Returned when the request includes memory in extraction_types, or when extraction_types is omitted. | | `preference_detail_list` | array | No | Preference memories. Returned when the request includes preference in extraction_types, or when extraction_types is omitted; may mix explicit and implicit preferences. | #### `memory_detail_list` object | Parameter | Type | Required | Description | |---|---|---|---| | `memory_key` | string | No | High-level topic or title summarizing the extracted factual memory. | | `memory_value` | string | No | Detailed content of the extracted factual memory. | | `memory_type` | string | No | Memory type classifier.
Examples: UserMemory, LongTermMemory. | | `tags` | array | No | Tags for categorizing the memory content. | #### `preference_detail_list` object | Parameter | Type | Required | Description | |---|---|---|---| | `preference` | string | No | Concrete preference text extracted from the dialogue. | | `reasoning` | string | No | How this preference was inferred from the conversation. | | `preference_type` | string (enum: explicit_preference, implicit_preference) | No | explicit_preference: clearly stated by the user; implicit_preference: inferred from behavior or context. | ### Response Example ```json { "code": 0, "data": { "memory_detail_list": [ { "memory_key": "User profile basics", "memory_value": "User Zhang San, 28, backend developer in Hangzhou, enjoys badminton.", "memory_type": "UserMemory", "tags": [ "person", "job", "hobby" ] } ], "preference_detail_list": [ { "preference": "The user asked for replies to stay concise.", "reasoning": "The user stated this preference explicitly in the chat.", "preference_type": "explicit_preference" } ] }, "message": "ok" } ``` URL: /api_docs/core/extract_memory --- # Rerank **POST** `/rerank` Provides a memory reranking API based on the memos-reranker small model. It takes a user query and a list of candidate memories and completes memory relevance reranking with a single call. - operationId: `rerank` ## Request Body | Parameter | Type | Required | Description | |---|---|---|---| | `model` | string (enum: memos-reranker-0.6b, memos-reranker-4b) | No | Model name to use. | | `query` | string | Yes | The user query. | | `documents` | array | Yes | A list of document texts to rerank. | | `top_n` | integer | No | Return the top N most relevant results. If omitted, returns all results by default. 
| ### Request Example ```json { "model": "memos-reranker-0.6b", "query": "What are the user's hobbies?", "documents": [ "User likes playing badminton.", "User is a backend developer in Hangzhou.", "User prefers concise replies.", "User prefers Jiangxiang-flavored baijiu.", "User is going on a business trip to Beijing next Wednesday." ] } ``` ## Response Successful Response | Parameter | Type | Required | Description | |---|---|---|---| | `id` | string | Yes | Unique identifier for this request. | | `model` | string | Yes | Model name used. | | `usage` | object | Yes | Token usage statistics. | | `results` | array | Yes | List of reranked results, sorted in descending order by relevance_score. | #### `usage` object | Parameter | Type | Required | Description | |---|---|---|---| | `prompt_tokens` | integer | Yes | Number of tokens consumed by the prompt. | | `total_tokens` | integer | Yes | Total number of tokens consumed. | #### `results` object | Parameter | Type | Required | Description | |---|---|---|---| | `index` | integer | Yes | The index position of this document in the original documents list. | | `document` | object | Yes | Document object. | | `relevance_score` | number | Yes | Relevance score, ranging from 0 to 1; higher values indicate greater relevance to the query. | #### `document` object | Parameter | Type | Required | Description | |---|---|---|---| | `text` | string | Yes | The original text content of the document. | URL: /api_docs/core/rerank --- # Get Message **POST** `/get/message` This API retrieves the historical conversation records between a user and the assistant for a specified session, with the option to limit the number of results. As shown in the examples below, you can pass the returned recent messages to the model as reference context when generating responses, maintaining dialogue coherence and contextual understanding.
It can also be used to restore chat context when a user refreshes or reopens the app, ensuring a seamless user experience and supporting personalized interactions. - operationId: `get_message` ## Request Body | Parameter | Type | Required | Description | |---|---|---|---| | `user_id` | string | Yes | Unique identifier of the user associated with the messages being retrieved. | | `conversation_id` | string | Yes | Unique identifier of the conversation associated with the messages. | | `message_limit_number` | integer | No | Limits the number of messages returned, controlling the length of the message list. Defaults to 6 if not provided. The maximum allowed value is 50. | ### Request Example ```json { "user_id": "memos_user_123", "conversation_id": "0928", "message_limit_number": 6 } ``` ## Response Successful Response | Parameter | Type | Required | Description | |---|---|---|---| | `code` | integer | Yes | API status code. See Error Code for details. | | `data` | object | No | Object containing the retrieved messages. | | `message` | string | Yes | API response message. | #### `data` object | Parameter | Type | Required | Description | |---|---|---|---| | `message_detail_list` | array | No | List of retrieved message details. Each item includes: | #### `message_detail_list` object | Parameter | Type | Required | Description | |---|---|---|---| | `role` | string (enum: user, assistant) | No | Role of the message sender. | | `content` | string | No | Text content of the message. | | `create_time` | string | No | Creation time of the message. | | `update_time` | string | No | Last update time of the message. | ### Response Example ```json { "code": 0, "data": { "message_detail_list": [ { "role": "user" } ] } } ``` URL: /api_docs/message/get_message --- # Get Task Status **POST** `/get/status` Get the status of an asynchronous processing task. 
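A minimal polling sketch for this endpoint, assuming the `running`, `completed`, and `failed` status values documented for it. The helper name `wait_for_task`, the injectable `fetch_status` callable, and the interval and attempt defaults are illustrative assumptions, not part of the MemOS API; in practice `fetch_status` would POST `{"task_id": ...}` to `/get/status` and return the decoded JSON.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_status: Callable[[str], dict],
    task_id: str,
    interval: float = 1.0,
    max_attempts: int = 30,
) -> str:
    """Poll until the task leaves the `running` state or attempts run out.

    `fetch_status` is a hypothetical callable that sends {"task_id": task_id}
    to POST /get/status and returns the decoded JSON response.
    """
    for attempt in range(max_attempts):
        body = fetch_status(task_id)
        # `data` is optional in the response, so guard against it being absent.
        status = (body.get("data") or {}).get("status")
        if status in ("completed", "failed"):
            return status
        if attempt < max_attempts - 1:
            time.sleep(interval)
    raise TimeoutError(f"task {task_id} still running after {max_attempts} polls")


# Usage with a stub in place of the real HTTP call.
_replies = iter([
    {"data": {"status": "running"}},
    {"data": {"status": "completed"}},
])
print(wait_for_task(lambda _task_id: next(_replies), "task-123", interval=0.0))
```

Injecting the fetch function keeps the retry logic separate from the HTTP client, so the same loop can be exercised with a stub, as here, or wired to a real request.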
- operationId: `get_status` ## Request Body | Parameter | Type | Required | Description | |---|---|---|---| | `task_id` | string | Yes | Async Task ID | ## Response Successful Response | Parameter | Type | Required | Description | |---|---|---|---| | `code` | integer | Yes | API status code. See Error Code for details. | | `data` | object | No | | | `message` | string | Yes | API response message | #### `data` object | Parameter | Type | Required | Description | |---|---|---|---| | `status` | string (enum: running, completed, failed) | No | Current status of the memory processing task. | ### Response Example ```json { "data": { "status": "running" } } ``` URL: /api_docs/message/get_status --- # Chat **POST** `/chat` - operationId: `chat` ## Request Body | Parameter | Type | Required | Description | |---|---|---|---| | `user_id` | string | Yes | Unique identifier of the user associated with the conversation. | | `conversation_id` | string | Yes | Unique identifier of the conversation. Providing this clarifies the current session, prioritizing its memory over historical sessions. If omitted, cross-session memory retrieval is performed for more relevant responses. | | `query` | string | Yes | User input content. | | `filter` | object | No | Filter conditions, used to precisely limit the memory scope before retrieval. Available fields include: "agent_id", "app_id", "create_time", "update_time", and specific fields in "info". Supports logical operators (and, or) and comparison operators (gte, lte, gt, lt). For the "info" field, supports filtering by "business_type", "biz_id", "scene", and other custom fields. | | `knowledgebase_ids` | array | No | Specifies the scope of Knowledge Bases accessible for the current search. Defaults to empty, meaning no Knowledge Bases are searched. Pass specific Knowledge Base IDs to search ordinary documents and uploaded Skills in those Knowledge Bases; pass "all" to search across all associated Knowledge Bases within the project. 
| | `memory_limit_number` | integer | No | Maximum number of memories that can be recalled: as long as the relevance threshold (relativity) is met, up to this many memories may be returned. Default is 9, maximum is 25. | | `include_preference` | boolean | No | Whether to recall preference memories. When enabled, the system intelligently recalls memories related to user preferences based on the query. Default = true. | | `preference_limit_number` | integer | No | Maximum number of preference memories that can be recalled: as long as the relevance threshold (relativity) is met, up to this many preference memories may be returned. Default is 9, maximum is 25. | | `relativity` | number | No | Relevance threshold (0–1) for recalled memories. Filters out low-relevance memories and, together with the maximum counts for factual and preference recalls, constrains the final results. When omitted, the system default threshold is used. A value of 0 disables relevance filtering. | | `model_name` | string (enum: qwen3-32b, deepseek-r1, qwen2.5-72b-instruct) | No | Specifies the concrete conversation model. | | `system_prompt` | string | No | Custom system instructions. | | `stream` | boolean | No | Whether to enable streaming response. | | `max_tokens` | integer | No | Indicates the maximum number of generated tokens. | | `temperature` | number | No | Controls generation randomness. Range: 0 ≤ x ≤ 2. | | `top_p` | number | No | Nucleus sampling parameter. Range: 0 ≤ x ≤ 1. | | `add_message_on_answer` | boolean | No | Whether to automatically write user and assistant conversation content into memory. When enabled, developers do not need to call the add/message API to add messages as memory. | | `app_id` | string | No | Unique identifier of the application associated with the conversation. | | `agent_id` | string | No | Unique identifier of the Agent associated with the conversation. 
| | `tags` | array | No | List of custom tags used to mark the topic or category of the conversation. | | `info` | object | No | Custom metadata field capable of storing any structured data related to the conversation, such as location, source, version, etc., primarily for precise filtering or source tracking during retrieval. | | `allow_public` | boolean | No | Whether to allow adding to public memory. | | `allow_knowledgebase_ids` | array | No | List of knowledgebases where memories generated from added messages are allowed to be written. | ### Request Example ```json { "user_id": "memos_user_123", "stream": false, "query": "What fun places are in Shanghai?", "model_name": "deepseek-r1", "conversation_id": "23006762-a064-456e-a33b-d2452bdfa09f", "knowledgebase_ids": [ "2448808afa784cc5f7595da21ae5a3fd" ], "filter": { "and": [ { "create_time": { "gt": "2025-09-19" } } ] }, "system_prompt": "Recommend parks", "add_message_on_answer": false, "allow_public": false, "allow_knowledgebase_ids": [], "agent_id": "agent_id_2025-12-15-01", "app_id": "app_id_2025-12-15-01", "tags": [], "info": {}, "max_tokens": 8192, "temperature": 0.7, "top_p": 0.95 } ``` ## Response Successful Response | Parameter | Type | Required | Description | |---|---|---|---| | `code` | integer | Yes | API status code. See Error Code List for details. | | `data` | object | No | | | `message` | string | Yes | Message | #### `data` object | Parameter | Type | Required | Description | |---|---|---|---| | `response` | string | No | Specific content of the answer. | URL: /api_docs/chat/chat --- # Create Knowledge Base **POST** `/create/knowledgebase` Create a knowledgebase associated with the project. 
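As a rough sketch of calling this endpoint, the snippet below prepares (without sending) a `POST /create/knowledgebase` request. The base URL and the `Token <key>` authorization header are assumptions borrowed from the Q&A assistant use case elsewhere in this document, and the helper name `build_create_kb_request` is hypothetical.

```python
import json
import urllib.request

# Assumption: base URL and "Token <key>" header as used in this document's
# Q&A assistant use case; adjust both for your own deployment.
MEMOS_BASE_URL = "https://memos.memtensor.cn/api/openmem/v1"


def build_create_kb_request(
    api_key: str, name: str, description: str = ""
) -> urllib.request.Request:
    """Prepare (but do not send) a POST /create/knowledgebase request."""
    payload = {"knowledgebase_name": name}
    if description:
        # Optional field; omitted entirely when empty.
        payload["knowledgebase_description"] = description
    return urllib.request.Request(
        url=f"{MEMOS_BASE_URL}/create/knowledgebase",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {api_key}",
        },
        method="POST",
    )


req = build_create_kb_request("mpg-xxx", "Product docs", "Manuals and release notes")
print(req.get_method(), req.full_url)
# Sending is left out of this sketch:
#   with urllib.request.urlopen(req) as resp:
#       kb_id = json.load(resp)["data"]["id"]
```

The `id` returned in `data` is the value to pass as `knowledgebase_id` when uploading files.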
- operationId: `create_knowledgebase` ## Request Body | Parameter | Type | Required | Description | |---|---|---|---| | `knowledgebase_name` | string | Yes | Knowledgebase Name | | `knowledgebase_description` | string | No | Description information | ## Response Successful Response | Parameter | Type | Required | Description | |---|---|---|---| | `code` | integer | Yes | API status code. See Error Code for details. | | `data` | object | No | | | `message` | string | Yes | API response message | #### `data` object | Parameter | Type | Required | Description | |---|---|---|---| | `id` | string | No | Knowledgebase ID returned after successful creation | URL: /api_docs/knowledge/create_kb --- # Delete Knowledge Base **POST** `/delete/knowledgebase` Delete a knowledgebase from the current project. To permanently delete it, please operate from the Console - Knowledgebase page. - operationId: `delete_knowledgebase` ## Request Body | Parameter | Type | Required | Description | |---|---|---|---| | `knowledgebase_id` | string | Yes | Knowledgebase ID to be deleted | ## Response Successful Response | Parameter | Type | Required | Description | |---|---|---|---| | `code` | integer | Yes | API status code. See Error Code for details. | | `data` | object | No | | | `message` | string | Yes | API response message | #### `data` object | Parameter | Type | Required | Description | |---|---|---|---| | `success` | boolean | No | Whether removal was successful. true for success, false for failure. | URL: /api_docs/knowledge/remove_kb --- # Create Knowledge Base File **POST** `/add/knowledgebase-file` Upload files to a specified Knowledge Base. By default, files are uploaded as documents. When file[].type is skill, upload a Markdown Skill file or ZIP Skill package. 
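One way to assemble the request body for this endpoint from local file contents is sketched below. The helper name `build_upload_payload` is hypothetical, and sending plain standard Base64 (with no data-URI prefix) in `content` is an assumption; check the Knowledge Base requirements for the exact encoding and size limits.

```python
import base64
import json


def build_upload_payload(
    knowledgebase_id: str, files: list[tuple[str, bytes, str]]
) -> dict:
    """Build the JSON body for POST /add/knowledgebase-file.

    Each entry in `files` is (name, raw_bytes, type), where type is
    "document" or "skill".
    """
    items = []
    for name, raw, file_type in files:
        if file_type not in ("document", "skill"):
            raise ValueError(f"unsupported file type: {file_type!r}")
        items.append({
            "name": name,
            "type": file_type,
            # Assumption: plain standard Base64 without a data-URI prefix.
            "content": base64.b64encode(raw).decode("ascii"),
        })
    return {"knowledgebase_id": knowledgebase_id, "file": items}


payload = build_upload_payload(
    "kb-example-id",  # placeholder, not a real Knowledge Base ID
    [("notes.txt", b"MemOS release notes", "document")],
)
print(json.dumps(payload, indent=2))
```

Validating `type` on the client side catches typos before the request is sent, rather than waiting for an unsupported-type error from the API.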
- operationId: `add_knowledgebase-file` ## Request Body | Parameter | Type | Required | Description | |---|---|---|---| | `knowledgebase_id` | string | Yes | Target Knowledgebase ID | | `file` | array | Yes | List of files to upload. Ordinary documents follow the document parsing pipeline; Skill files follow the Skill validation and parsing pipeline. | #### `file` object | Parameter | Type | Required | Description | |---|---|---|---| | `name` | string | No | File name. Recommended when uploading Base64 content; optional when uploading by URL. | | `type` | string (enum: document, skill) | No | Uploaded file type. document means ordinary Knowledge Base document; skill means Skill file, supporting a .md single file or .zip Skill package. | | `content` | string | No | File content. Supports URL or Base64 encoding. For supported formats, size limits, and quantity limits for ordinary documents and Skill files, see Knowledge Base requirements. | ## Response Successful Response | Parameter | Type | Required | Description | |---|---|---|---| | `code` | integer | Yes | API status code. See Error Code for details. | | `data` | array | No | Processing results for uploaded files. | | `message` | string | Yes | API response message | #### `data` object | Parameter | Type | Required | Description | |---|---|---|---| | `id` | string | No | File ID | | `type` | string (enum: document, skill) | No | File type. document means ordinary document; skill means Skill file. | | `name` | string | No | File name | | `sizeMB` | number | No | File size in MB | | `status` | string (enum: running, completed, failed) | No | File processing status | ### Response Example ```json { "data": [ { "type": "document", "status": "running" } ] } ``` URL: /api_docs/knowledge/add_kb_doc --- # Get Knowledge Base File **POST** `/get/knowledgebase-file` Get file information in one of two ways: pass file_ids to retrieve specific file details, or pass knowledgebase_id to list files under a Knowledge Base. 
Choose only one mode; passing both knowledgebase_id and file_ids returns an error. - operationId: `get_knowledgebase-file` ## Request Body | Parameter | Type | Required | Description | |---|---|---|---| | `knowledgebase_id` | string | No | Query mode 1: target Knowledge Base ID. Returns files under the Knowledge Base and can be used with type, page, and page_size. Do not pass it together with file_ids. | | `type` | string (enum: document, skill) | No | File type filter, effective only when querying by knowledgebase_id. document returns only ordinary documents; skill returns only Skill files; omit this field to return all files. Do not use it with the file_ids query mode. | | `page` | integer | No | Page number, effective only when querying by knowledgebase_id. Do not use it with the file_ids query mode. | | `page_size` | integer | No | Items per page, effective only when querying by knowledgebase_id. Do not use it with the file_ids query mode. | | `file_ids` | array | No | Query mode 2: file ID list. Returns details for the specified files. When using this mode, pass only file_ids and do not pass knowledgebase_id, type, page, or page_size. | ## Response Successful Response | Parameter | Type | Required | Description | |---|---|---|---| | `code` | integer | Yes | API status code. See Error Code for details. | | `data` | object | No | | | `message` | string | Yes | API response message | #### `data` object | Parameter | Type | Required | Description | |---|---|---|---| | `total` | integer | No | Total number of matched files | | `page` | integer | No | Current page number | | `page_size` | integer | No | Items per page | | `file_detail_list` | array | No | | #### `file_detail_list` object | Parameter | Type | Required | Description | |---|---|---|---| | `id` | string | No | File ID | | `type` | string (enum: document, skill) | No | File type. document means ordinary document; skill means Skill file. 
| | `name` | string | No | File name | | `size` | string | No | File size | | `status` | string (enum: running, completed, failed) | No | File processing status | ### Response Example ```json { "data": { "file_detail_list": [ { "type": "document", "status": "running" } ] } } ``` URL: /api_docs/knowledge/get_kb_doc --- # Delete Knowledge Base File **POST** `/delete/knowledgebase-file` Delete files from a specified Knowledge Base. When a Skill file is deleted, the associated Skill is deleted as well. - operationId: `delete_knowledgebase-file` ## Request Body | Parameter | Type | Required | Description | |---|---|---|---| | `file_ids` | array | Yes | List of file IDs to delete. You can pass ordinary document IDs or Skill file IDs. | ## Response Successful Response | Parameter | Type | Required | Description | |---|---|---|---| | `code` | integer | Yes | API status code. See Error Code for details. | | `data` | object | No | | | `message` | string | Yes | API response message | #### `data` object | Parameter | Type | Required | Description | |---|---|---|---| | `success` | boolean | No | Whether deletion was successful. true for success, false for failure. 
| URL: /api_docs/knowledge/delete_kb_doc --- # Error Codes (/api_docs/help/error_codes) | Code | Meaning | Resolution | | :--- | :--- | :--- | | **Parameter Errors** | | | | 40000 | Invalid request parameters | Check parameter names, types, and formats | | 40001 | Requested data not found | Check if resource ID (e.g., memory_id) is correct | | 40002 | Required field cannot be empty | Provide missing required fields | | 40003 | Field is empty | Check if the provided list or object is empty | | 40006 | Unsupported type | Check the value of the 'type' field | | 40007 | Unsupported file type | Only upload allowed formats (.pdf, .docx, .doc, .txt, .json, .md, .xml) | | 40008 | Invalid Base64 content | Check if the Base64 string contains illegal characters | | 40009 | Invalid Base64 format | Check if the Base64 encoding format is correct | | 40010 | User ID too long | user_id length cannot exceed 100 characters | | 40011 | Conversation ID too long | conversation_id length cannot exceed 100 characters | | 40020 | Invalid Project ID | Confirm the Project ID format is correct | | **Authentication & Permission Errors** | | | | 40100 | API Key authentication required | Add a valid API Key to the request header | | 40130 | API Key authentication required | Add a valid API Key to the request header | | 40132 | Invalid or expired API Key | Check API Key status or generate a new one | | **Quota & Rate Limit Errors** | | | | 40300 | Rate limit exceeded | [Get more quota](/memos_cloud/support/limit#_4-obtaining-more-quota) | | 40301 | Chat request token quota exceeded | Reduce input content or get more quota | | 40302 | Chat response token quota exceeded | Shorten expected output or get more quota | | 40303 | Chat length exceeds model limit | Reduce single input/output length | | 40304 | Total API call quota exceeded | [Get more quota](/memos_cloud/support/limit#_4-obtaining-more-quota) | | 40305 | Input content exceeds token limit | Reduce input content | | 40306 | delete_memory 
authentication failed | Confirm if you have permission to delete this memory | | 40307 | Memory for deletion does not exist | Check if the memory_id is valid | | 40308 | User for deletion does not exist | Check if the user_id is correct | | **System & Service Errors** | | | | 50000 | Internal server error | Server busy or anomaly, please contact support | | 50002 | Operation failed | Check operation logic or try again later | | 50004 | Memory service is temporarily unavailable | Retry memory write/fetch operations later | | 50005 | Search service is temporarily unavailable | Retry memory search operations later | | **Knowledge Base & Operations** | | | | 50103 | File count exceeds limit | The number of files for a single upload should not exceed 20 | | 50104 | Single file size exceeds limit | Ensure single file does not exceed 100MB | | 50105 | Total file size exceeds limit | Ensure total upload size does not exceed 300MB | | 50107 | Invalid file format | Check and change file format | | 50120 | Knowledge base not found | Confirm if the knowledgebase_id is correct | | 50123 | Knowledge base not linked to project | Confirm KB is authorized for the current project | | 50131 | Task not found | Check if task_id is correct (common in status queries) | | 50143 | Failed to add memory | Algorithm service anomaly, please try again later | | 50144 | Failed to add message | Failed to save chat history | | 50145 | Failed to save feedback and write memory | Anomaly during feedback processing | --- # Build a Memory-Enhanced Knowledge Base Q&A Assistant (/usecase/knowledge_qa_assistant) ## 1. Overview In AI application development, building a Q&A assistant that can understand context and remember historical interactions has always been a core requirement. 
Traditional large language models are powerful, but they lack long-term memory—every conversation starts over as if the model has “amnesia.” RAG (Retrieval-Augmented Generation) can retrieve relevant knowledge, but it still can’t truly “remember” user preferences or past interactions. MemOS provides a complete memory operating system ecosystem that gives AI applications real long-term memory. With a MemOS-powered knowledge base, you can provide contextual information to an LLM via prompts, enabling more accurate and personalized responses. This experience is significantly better than chatting with a generic LLM on the public internet. ### 1.1 MemOS Memory Layer vs RAG: Key Differences The core issue with traditional RAG is: **it is stateless**. Each query is independent. The system can only retrieve static knowledge based on semantic similarity, but it cannot remember “who you are,” “what you said before,” or “what you prefer.” It’s like a librarian with amnesia—every time you walk in, they have to ask your needs from scratch, and they can’t personalize recommendations based on your reading history. The core value of the MemOS memory layer is: **it gives AI applications long-term memory**. 
It can not only retrieve knowledge, but also understand relationships, time, and preferences, and connect the current question with historical memory—so it searches and uses knowledge “with background.” As users keep interacting, MemOS continuously evolves and updates memory from conversation content, enabling the knowledge base to iterate and “self-evolve.” | Dimension | Traditional RAG | MemOS Memory Layer | | --- | --- | --- | | **Memory capability** | Can retrieve, cannot remember — retrieves static knowledge by vector similarity; cannot dynamically record user interaction history | Dynamic memory — automatically captures, stores, and manages conversation history and user behavior | | **Personalization** | Lacks personalization — cannot adjust answers based on the user’s past behavior | Personalized experience — provides tailored answers based on historical preferences | | **Context management** | Fragmented context — hard to manage related info across multi-turn conversations | Intelligent association — builds relationships between memories through semantic understanding | | **Knowledge updates** | Hard to update — adding new knowledge often requires rebuilding vector indexes | Real-time updates — supports incremental memory updates and priority management | ### 1.2 Real-World Scenario Comparison: Enterprise Knowledge Base Assistant Let’s look at one real business scenario to clearly see the difference between RAG and MemOS: ```python DAY 1 Employee asks: My computer is a MacBook Pro 13-inch with an Intel chip. How do I install the company intranet proxy? DAY 1 The assistant provides Intel-version installation steps. DAY 20 Employee asks: The intranet proxy stopped working—what version should I reinstall? ``` #### The Problem with a RAG Approach ```python # Retrieve content related to "intranet proxy" and "not working", # but cannot recall "the user's device model" Retrieved knowledge: 1. Common intranet proxy troubleshooting 2. 
Mac M1/M2 (ARM) proxy installation instructions 3. Windows intranet proxy client installation instructions 4. Network connectivity and certificate issues 5. General FAQ ❌ Knowledge Base Assistant: Please reinstall the latest Mac M1/M2 (ARM) version or the Windows intranet proxy client. Here are the steps: ... ``` #### The Advantage of MemOS ```python # Retrieve memories related to "intranet proxy" and "not working", # and automatically identify the employee's device model Retrieved memories: 1. The user installed the company intranet proxy 20 days ago; their device is a MacBook Pro 13 (Intel) 2. Common intranet proxy troubleshooting 3. Intel-version intranet proxy installation instructions ✅ Knowledge Base Assistant: You’re using an Intel-based MacBook Pro, so we recommend reinstalling the Intel version of the intranet proxy client. Here’s the Intel download link and installation steps: ... ``` ### 1.3 Why MemOS? From the scenario above, you can clearly see three core advantages of MemOS over traditional RAG: 1. **Understands users: automatically fills in context** RAG is good at retrieving knowledge semantically related to a query, but it is stateless—each query stands alone, with no understanding of the user or context. Users must repeat their background every time. MemOS can understand relationships, time, and preferences. It knows “who you are” and “what you’re doing.” You can simply ask your question, and MemOS will fill in the missing context automatically—no need to repeat things like “my dog doesn’t eat chicken” or “my computer has an Intel chip.” 2. **Personalization: remembers habits and preferences** Users in different roles and work styles need different service approaches. MemOS can remember: "This customer doesn’t like overly aggressive sales pitches" "You use Python more often than Java" "You asked about reimbursement policy last time—do you want to proceed with the application flow now?" 
This personalization makes an AI application truly “your” assistant, not just a generic tool. 3. **Knowledge evolution: keeps learning from interactions** When a real process contains “experience rules” that aren’t documented, MemOS can distill them into new memories, continuously filling gaps and improving the knowledge system. As end users keep using it, MemOS evolves and updates memory based on conversations—so the knowledge base becomes part of “memory,” not just a static document store. On this foundation, MemOS 2.0 provides knowledge base and multimodal capabilities. Developers can connect business documents to MemOS and quickly build an assistant that understands users—especially when combined with open-source LLMs. ## 2. Setup Tutorial ### 2.1 Prepare the Knowledge Base (5 min) #### Create a knowledge base Create a knowledge base via the [Console](https://memos-dashboard.openmem.net/cn/knowledgeBase/) or API. This article categorizes documents based on [MemOS official docs](https://github.com/MemTensor/MemOS-Docs), MemTensor’s past announcements, and release notes to make future updates and management easier. In this example, you can create just **one** knowledge base and upload a subset of documents for testing. ![image.png](https://cdn.memtensor.com.cn/img/1768481403940_o97qz4_compressed.png) #### Upload documents Enter the knowledge base and upload your documents. Note the document requirements: MemOS-Docs are in MD format; you can use an AI tool to convert them to TXT format with one click, then upload them. When uploading, pay attention to the requirements. The rest—**storage, parsing, chunking, and memory generation**—is handled by MemOS. Just wait until processing finishes and the status shows “Available.” ![image.png](https://cdn.memtensor.com.cn/img/1768481436752_31pl0b_compressed.png) ### 2.2 Run the Code (5 min) The following demo is shown using a Python runtime. 
#### 2.2.1 Copy the complete runnable code

```python
import os
import requests
import json
from openai import OpenAI
from datetime import datetime

# Get MEMOS_API_KEY from the cloud console
os.environ["MEMOS_API_KEY"] = "mpg-xxx"
# Replace with your OpenAI API key
os.environ["OPENAI_API_KEY"] = "sk-xxx"
os.environ["MEMOS_BASE_URL"] = "https://memos.memtensor.cn/api/openmem/v1"
# Replace with your own knowledge base IDs (these are examples only)
os.environ["KNOWLEDGE_BASE_IDS"] = json.dumps([
    "based540fb25-ddf1-4456-935b-41d901518e04",
    "base3908d457-da43-4dde-989e-020be132eff4",
    "base1db3a7ea-6ecc-4925-881a-e87800da8d2e"
])

openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))


class KnowledgeBaseAssistant:
    def __init__(self):
        self.openai_client = openai_client
        self.base_url = os.getenv("MEMOS_BASE_URL")
        self.knowledge_base_ids = json.loads(os.getenv("KNOWLEDGE_BASE_IDS"))
        self.headers = {
            "Content-Type": "application/json",
            "Authorization": f"Token {os.environ['MEMOS_API_KEY']}"
        }

    def search_memory(self, query, user_id):
        """Query relevant memory"""
        data = {
            "query": query,
            "user_id": user_id,
            "conversation_id": user_id,
            "knowledgebase_ids": self.knowledge_base_ids
        }
        res = requests.post(f"{self.base_url}/search/memory", headers=self.headers, data=json.dumps(data))
        body = res.json()  # parse the response once and reuse it
        if body.get('code') != 0:
            print(f"❌ Memory search failed, {body.get('message')}")
            return [], []
        result = body.get('data') or {}
        memory_detail_list_raw = result.get('memory_detail_list') or []
        # Keep only memories with relevance >= 0.5
        memory_detail_list = [
            x for x in memory_detail_list_raw if x.get('relativity', 0) >= 0.5
        ]
        # Fall back to an empty list so len() in build_system_prompt never receives None
        preference_detail_list = result.get('preference_detail_list') or []
        return memory_detail_list, preference_detail_list

    def build_system_prompt(self, memories, preferences):
        """Build a system prompt that contains formatted memory"""
        base_prompt = """
# Role
You are the MemOS assistant, nicknamed XiaoYi🧚, an AI assistant built around a “memory operating system” created by MemTensor. MemTensor is an AI research company based in Shanghai, guided by academicians of the Chinese Academy of Sciences. MemTensor is committed to the vision of "low cost, low hallucination, high generalization", exploring AI development paths suited to China’s context and advancing trustworthy AI adoption. MemOS’s mission is to equip LLMs and autonomous agents with “human-like long-term memory,” turning memory from a black box inside model weights into a core resource that is “manageable, schedulable, and auditable.”

Your responses must comply with legal and ethical standards and applicable laws and regulations, and must not generate illegal, harmful, or biased content. If you encounter such requests, you must refuse clearly and explain the relevant legal or ethical principles.

Your goal is to combine retrieved memory snippets to provide highly personalized, accurate, and logically rigorous answers.

# System Context
- Current time: {current_time} (use this as the reference for judging memory freshness)

# Memory Data
Below is information retrieved by MemOS, divided into “Facts” and “Preferences”.
- **Facts**: may include user attributes, conversation history, or third-party info.
  - **Important**: content marked as `[assistant观点]` or `[模型总结]` represents **past AI inference** and is **not** the user’s original statement.
- **Preferences**: explicit/implicit user requirements for style, format, or logic.

{memories}

{preferences}

# Critical Protocol: Memory Safety
Retrieved memories may contain **AI’s own speculation**, **irrelevant noise**, or **wrong subject attribution**. You must strictly execute the following **“four-step judgment.”** If a memory fails **any** step, you must **discard** that memory:

1. **Source Verification**
   - **Core**: distinguish “the user’s original words” from “AI speculation.”
   - If a memory carries labels like `[assistant观点]`, it only indicates the AI’s past **assumption** and must **not** be treated as a definitive user fact.
   - *Counterexample*: a memory says `[assistant观点] The user loves mangoes`. If the user never said it, do not assume they like mangoes; avoid self-reinforcing hallucinations.
   - **Principle**: AI summaries are for reference only and have far lower weight than the user’s direct statements.

2. **Attribution Check**
   - Is the actor/subject in the memory actually “the user”?
   - If the memory describes a **third party** (e.g., “candidate”, “interviewee”, “fictional character”, “case data”), you must never attribute those properties to the user.

3. **Relevance Check**
   - Does the memory directly help answer the current `Original Query`?
   - If it’s only a keyword match (e.g., both mention “code”) but the context is completely different, you must ignore it.

4. **Freshness Check**
   - Does the memory conflict with the user’s latest intent? Treat the current `Original Query` as the highest-priority source of truth.

# Instructions
1. **Review**: read `facts memories` first and apply the “four-step judgment” to remove noise and unreliable AI views.
2. **Execute**:
   - **Prefer professional advice from the knowledge base** (e.g., product selection, technical solutions).
   - Use only the memories that pass filtering to enrich context.
   - Strictly follow the style requirements in `preferences`.
3. **Output**:
   - Answer the question directly, and **do not** mention internal system terms such as “memory store”, “retrieval”, or “AI view”.
   - If the answer is not in the current knowledge base/memory system, you must say so explicitly. Never fabricate information or give vague answers under any circumstances.
4. **Language**: respond in the same language as the user’s query.

# Markdown-to-Plain-Text Conversion Rules
- When you need to convert provided Markdown (MD) text into plain text, you must strictly follow these rules to ensure readability and avoid formatting errors:
- Core formatting: WeChat does not support native MD syntax (e.g., # headings, bold text, code blocks, tables). You must simulate hierarchy using “symbols + newline + spaces”, and avoid unsupported markers.
- Heading hierarchy: First check whether the original contains MD headings (lines starting with #). If yes: Level-1 headings use Chinese numerals like ' 一. ', ' 二. ', ' 三. ' (use '.' not '、'); Level-2 headings use Arabic numerals like '1. ', '2. ', '3. '; Level-3 and below use the '・' bullet. Numbers must auto-increment to match document structure. If the original has no MD headings: do not add heading numbering; keep the original paragraph structure. Examples: '### Title 1' → ' 一. Title 1'; '#### Title 2' → '1. Title 2'; '##### Title 3' → '・Title 3'. If no MD headings exist, keep 'Title 1' as 'Title 1'.
- Links: Keep Markdown links in 'text' form unchanged. Do not modify or delete any link content.
- Lists: Convert '- item' MD lists into ordered lists ('1. item', '2. item') or unordered lists (use '・item'). Each list item must be on its own line, and keep one blank line before and after each item for readability.
- Tables: If the MD contains tables, break them into bullet-style sections like '▶ Scenario Type A: XXX' and '▶ Scenario Type B: XXX'. Under each, list items with '1. 2. 3.' without losing info and without keeping table symbols.
- Emphasis: Do not use * or **; replace original bold content with '「XXX」'.
- Readability: Keep one blank line between major paragraphs. For overly long technical terms or complex descriptions, simplify into more conversational wording without changing meaning. Use Chinese punctuation consistently (e.g., ▶, ・, :) and avoid mixing Chinese/English punctuation that can cause confusion.
- Output: Only output the converted plain text; do not add extra notes (e.g., 'conversion complete'); do not change the original meaning, only replace formatting; preserve 100% of the core info (dimensions, features, links, data) without omission; final text must be copy-pastable without further editing.
"""
        # Build memory text (may be empty)
        if len(memories) > 0:
            formatted_memories = "## Related memories:\n"
            for i, memory in enumerate(memories, 1):
                formatted_memories += f"{i}. {memory.get('memory_value')}\n"
        else:
            formatted_memories = ""

        # Build preference text (may be empty)
        if len(preferences) > 0:
            formatted_preferences = "## Preferences:\n"
            for i, preference_detail in enumerate(preferences, 1):
                formatted_preferences += f"{i}. {preference_detail.get('preference')}\n"
        else:
            formatted_preferences = ""

        base_prompt = base_prompt.format(
            current_time=datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            memories=formatted_memories,
            preferences=formatted_preferences,
        )
        return base_prompt

    def add_message(self, messages, user_id):
        """Add messages"""
        data = {
            "messages": messages,
            "user_id": user_id,
            "conversation_id": user_id
        }
        res = requests.post(f"{self.base_url}/add/message", headers=self.headers, data=json.dumps(data))
        body = res.json()
        if body.get('code') == 0:
            print("✅ Added successfully")
        else:
            print(f"❌ Add failed, {body.get('message')}")

    def get_message(self, user_id):
        """Get messages"""
        data = {
            "user_id": user_id,
            "conversation_id": user_id,
            "message_limit_number": 15
        }
        res = requests.post(f"{self.base_url}/get/message", headers=self.headers, data=json.dumps(data))
        body = res.json()
        if body.get('code') == 0:
            return (body.get('data') or {}).get('message_detail_list') or []
        else:
            print(f"❌ Get messages failed, {body.get('message')}")
            return []

    def chat(self, query, user_id):
        """Main chat function with memory integration"""
        # 1. Fetch recent conversation history
        chat_history = self.get_message(user_id)

        # 2. Search relevant memories
        memories, preferences = self.search_memory(query, user_id)

        # 3. Build system prompt with memory
        system_prompt = self.build_system_prompt(memories, preferences)
        messages = [
            {"role": "system", "content": system_prompt},
            *chat_history,
            {"role": "user", "content": query}
        ]

        # 4. Use OpenAI to generate an answer
        response = self.openai_client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            temperature=0.3,
            top_p=0.9
        )
        answer = response.choices[0].message.content

        # 5. Save the conversation into memory
        messages = [
            {"role": "user", "content": query},
            {"role": "assistant", "content": answer}
        ]
        self.add_message(messages, user_id)

        # 6. Return the answer
        return answer


ai_assistant = KnowledgeBaseAssistant()
user_id = "memos_knowledge_base_user_123"


def demo_questions():
    return [
        'Who are you?'
    ]


def main():
    print("💡 Welcome to the Knowledge Base Q&A Assistant!\n")
    print("\n🎯 Here are some sample questions. You can continue chatting with the assistant:")
    for i, question in enumerate(demo_questions(), 1):
        print(f" {i}. {question}")

    while True:
        user_query = input("\n🤔 Enter your question (or type 'exit' to quit): ").strip()
        if user_query.lower() in ['quit', 'exit', 'q', '退出']:
            print("👋 Thanks for using the Knowledge Base Q&A Assistant!")
            break
        if not user_query:
            continue
        print("🤖 Processing...")
        answer = ai_assistant.chat(user_query, user_id)
        print(f"💡 [Assistant]: {answer}")
        print("-" * 60)


if __name__ == "__main__":
    main()
```

#### 2.2.2 Initialize the runtime environment

```shell
pip install openai requests
```

`os`, `json`, and `datetime` are part of the Python standard library and need no installation.

#### 2.2.3 Replace environment variables in the code

##### Get the key (API_KEY)

Log into the console at [https://memos-dashboard.openmem.net/cn/apikeys/](https://memos-dashboard.openmem.net/cn/apikeys/) and copy the key.
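If you would rather not paste keys into the source file, one option is to export them in your shell and have the script read and validate them at startup. The following is a minimal sketch under that assumption; `require_env` and `PLACEHOLDERS` are hypothetical names introduced for this example and are not part of MemOS or the OpenAI SDK.

```python
import os

# Placeholder values used in this tutorial; treating them as "not configured" is an assumption
PLACEHOLDERS = {"mpg-xx", "mpg-xxx", "sk-xx", "sk-xxx"}


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise if it is missing or still a placeholder."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the assistant")
    if value in PLACEHOLDERS:
        raise RuntimeError(f"{name} still holds the placeholder value from the tutorial")
    return value
```

With this helper, a call such as `require_env("MEMOS_API_KEY")` can stand in for `os.getenv("MEMOS_API_KEY")`, so a forgotten key fails fast with a readable message instead of an authentication error from the API.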
![image.png](https://cdn.memtensor.com.cn/img/1768481468406_q51iqx_compressed.png)

```python
os.environ["MEMOS_API_KEY"] = "mpg-xx"
```

##### LLM client

```python
# Replace with your OpenAI API key
os.environ["OPENAI_API_KEY"] = "sk-xx"
# [Optional] Replace with your BASE_URL
os.environ["OPEN_API_BASE_URL"] = "http://xxx.xxx"

# base_url must be passed as a keyword argument
openai_client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url=os.getenv("OPEN_API_BASE_URL"),
)
```

##### Get the knowledge base ID

Open the knowledge base you just uploaded documents to, then copy its ID and save it.

![image.png](https://cdn.memtensor.com.cn/img/1768481493435_bkwqlu_compressed.png)

```python
# Replace with your own knowledge base ID (this is an example only)
os.environ["KNOWLEDGE_BASE_IDS"] = json.dumps([
    "based540fb25-ddf1-4456-935b-41d901518e04"
])
```

##### Execute the code

```shell
python knowledge_qa_assistant.py
```

![image.png](https://cdn.memtensor.com.cn/img/1768533833272_krke26_compressed.jpeg)

### 2.3 Code walkthrough

1. Set your MemOS API key, OpenAI API key, and knowledge base IDs via environment variables.
2. Instantiate `KnowledgeBaseAssistant`.
3. Use `main()` to start an interactive chat loop.
4. The assistant calls `chat()` to handle the interaction. Inside `chat()`, it performs the following steps:
   * Call `get_message` to fetch historical chat messages.
   * Call `search_memory` to retrieve facts and preferences.
   * Call `build_system_prompt` to build a system prompt from the retrieved memories.
   * Use the LLM to generate an answer.
   * Call `add_message` to store the user query and model answer as long-term memory.
   * Return the model answer.

---

# Let the Financial Assistant Understand Customer Preferences Behind Behaviors (/usecase/financial_assistant)

## 1. Overview

In intelligent investment advisory products, users leave behind a large number of **behavioral traces**:

* **Traffic Source**: Which ad or post did the user click? (e.g., clicked the “Retirement Finance” ad)
* **In-App Operations**: Which fund products did they browse? Which financial products did they bookmark?
* **Communication Records**: Conversations with financial advisors and interactions with the AI financial assistant.

These are just raw behaviors. If stored directly as logs, they are of limited help to large models. **The key is how to abstract behaviors into "memories."**

### 1.1 How are behaviors abstracted into memories?

| User Behavior (Raw Trace) | Corresponding Memory (Semantic Abstraction) |
| --- | --- |
| Clicked “Retirement Finance” ad to enter the app | Memory: “User has potential interest in retirement finance” |
| Frequently browsed low-risk fund detail pages | Memory: “User’s risk preference is conservative” |
| Bookmarked “low-risk financial products” | Memory: “User tends to choose low-risk financial products” |
| Said in conversation: “I don’t want to take too much risk” | Memory: “Explicitly expressed low-risk demand” |

When the user later asks, “What kind of investment suits me?”, the financial assistant does not need to scan through a pile of logs but instead directly uses these semantic memories to drive the model to generate personalized answers.

### 1.2 Why not traditional RAG?

RAG is more suitable for knowledge Q&A, such as explaining “What is a bond.” But it does not summarize preferences from user behaviors:

| Traditional RAG | MemOS |
| --- | --- |
| Returns static financial knowledge snippets | Abstracts user behaviors into semantic memories (interests, preferences, profiles) |
| Cannot answer “What kind of investment suits me?” | Can combine memories to generate personalized advice |

### 1.3 Why not build it yourself?

Of course, developers can store behaviors themselves, but they will face three challenges:

* **Lack of abstraction**: Simply storing “clicked Fund A” is not useful; it needs to be transformed into “risk preference = low risk.”
* **Integration complexity**: Before calling the model, developers must manually build prompts by abstracting scattered behaviors into semantic information.
* **Poor scalability**: As more channels, products, and communication scenarios are added, the code quickly becomes unmanageable.

### 1.4 Why use MemOS?

When making a technology selection, you can directly compare three approaches:

| Approach | Characteristics | Limitations | Advantages of MemOS |
| --- | --- | --- | --- |
| **Traditional RAG** | Retrieves knowledge base documents | Does not process user behaviors, cannot build profiles | Suitable for FAQ, but not for personalized financial advisory |
| **Self-Built Storage** | Directly stores behavior logs | Requires manual abstraction from behavior → memory; high prompt engineering cost | Requires developing a large amount of glue code |
| **MemOS** | Two interfaces: `addMessage` for writing, `searchMemory` for retrieval | N/A | Automatically abstracts behavior traces into memories for direct use by the model |

### 1.5 What will this case demonstrate?

This case demonstrates how to use MemOS cloud services to quickly build an intelligent financial assistant that “turns user behaviors into memories.”

In the demo:

* **D1 Traffic Behavior**: Clicking the “Retirement Finance” ad → generates memory “Interest in retirement finance.”
* **D2 In-App Behavior**: Browsing and bookmarking low-risk funds → generates memory “Risk preference = low risk.”
* **D3 Conversational Behavior**: Saying “I don’t want to take risks” → generates memory “Explicit low-risk demand.”

When the user asks, “What kind of investment suits me?”:

* `searchMemory` retrieves the above memories
* The large model generates an answer that combines these profiles → outputs “More suitable for low-risk fixed income products.”

When running this case script, developers will see in the console:

* Each `addMessage` request/response (behaviors stored)
* Each `searchMemory` request/response (semantic memories retrieved)
* The model’s final personalized investment recommendation

## 2. Example

### 2.1 Environment Setup

Use pip to install required dependencies:

```shell
pip install MemoryOS -U
```

### 2.2 Full Code

```python
import os
from openai import OpenAI
from memos.api.client import MemOSClient

os.environ["MEMOS_API_KEY"] = "mpg-xx"  # Get MemOS_API_KEY from cloud service console
os.environ["OPENAI_API_KEY"] = "sk-xx"  # Replace with your own API_KEY

conversation_counter = 0


def generate_conversation_id():
    """Return a new sequential conversation ID (conversation_001, conversation_002, ...)"""
    global conversation_counter
    conversation_counter += 1
    return f"conversation_{conversation_counter:03d}"


class FinancialManagementAssistant:
    """AI financial management assistant with memory capability"""

    def __init__(self):
        self.memos_client = MemOSClient(api_key=os.getenv("MEMOS_API_KEY"))
        self.openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    def search_memory(self, query, user_id, conversation_id):
        """Search relevant memories based on query"""
        response = self.memos_client.search_memory(query, user_id, conversation_id)
        return [memory_detail.memory_value for memory_detail in response.data.memory_detail_list]

    def build_system_prompt(self, memories):
        """Construct a system prompt including formatted memories"""
        base_prompt = """
You are a knowledgeable and professional financial management assistant.
You can access conversational memories to help you provide more personalized answers.
Use memories to understand the user’s background, preferences, and past interactions.
If memories are provided, naturally reference them when relevant, but do not explicitly mention having memories.
"""
        if memories:
            # Format memories as a numbered list
            formatted_memories = "## Memories:\n"
            for i, memory in enumerate(memories, 1):
                formatted_memories += f"{i}. {memory}\n"
            return f"{base_prompt}\n\n{formatted_memories}"
        else:
            return base_prompt

    def add_message(self, messages, user_id, conversation_id):
        """Add messages to MemOS so they can be processed into memories"""
        self.memos_client.add_message(messages, user_id, conversation_id)

    def get_message(self, user_id, conversation_id):
        """Retrieve the raw messages stored in MemOS (for debugging/inspection)"""
        response = self.memos_client.get_message(user_id, conversation_id)
        return response.data.message_detail_list

    def chat(self, query, user_id, conversation_id):
        """Main chat function for handling conversations with memory integration"""
        # 1) Search relevant memories
        memories = self.search_memory(query, user_id, conversation_id)

        # Build system prompt with memories
        system_prompt = self.build_system_prompt(memories)

        # 2) Use OpenAI to generate an answer
        response = self.openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": query}
            ]
        )
        answer = response.choices[0].message.content

        # 3) Save the interaction back to MemOS
        messages = [
            {"role": "user", "content": query},
            {"role": "assistant", "content": answer}
        ]
        self.memos_client.add_message(messages, user_id, conversation_id)

        return answer


ai_assistant = FinancialManagementAssistant()
user_id = "memos_financial_management_user_123"


def demo_questions():
    return [
        "What is my risk preference?",
        "Recommend some investments suitable for me"
    ]


def preset_user_behaviors():
    """Show preset user behavior memories"""
    conversation_id = generate_conversation_id()
    print(f"\n📊 Preset user behavior memories (conversation_id={conversation_id}):")
    print("=" * 60)

    behaviors = [{
        "role": "user",
        "content": "Clicked 'Retirement Finance' ad to enter app"
    }, {
        "role": "user",
        "content": "Browsed and bookmarked low-risk funds"
    }]

    for i, behavior in enumerate(behaviors, 1):
        print(f"{i}. {behavior['content']}")

    # A single add_message call stores both behaviors (1 add quota)
    ai_assistant.add_message(behaviors, user_id, conversation_id)

    print("=" * 60)
    print("💡 The above behavioral memories have been recorded by MemOS. The assistant will provide personalized recommendations based on them.")


def main():
    print("💰 Welcome to see how MemOS is used in a financial management assistant!")
    print("💡 With MemOS, your financial assistant becomes smarter and more caring! 😊 \n")

    # Ask whether to preload user behavior memories (consumes 1 add quota)
    while True:
        pre_chat = input("🤔 Do you want to preload user behavior memories? This will consume 1 add quota. Proceed? (y/n): ").strip().lower()
        if pre_chat in ['y', 'yes']:
            preset_user_behaviors()
            break
        elif pre_chat in ['n', 'no']:
            print("📝 Starting a new conversation...")
            break
        else:
            print("⚠️ Please enter 'y' for yes or 'n' for no")

    print("\n⚡️ Each question you enter next will take place in a brand-new conversation (with a new conversation ID). MemOS will automatically recall your historical behavioral memories across conversations to provide you with continuous and personalized service.")
    print("\n🎯 Here are some example questions you can continue to ask the assistant:")
    for i, question in enumerate(demo_questions(), 1):
        print(f" {i}. {question}")

    while True:
        user_query = input("\n🤔 Please enter your question (or type 'exit' to quit): ").strip()
        if user_query.lower() in ['quit', 'exit', 'q']:
            print("👋 Thanks for using the financial management assistant!")
            break
        if not user_query:
            continue
        print("🤖 Processing...")
        conversation_id = generate_conversation_id()
        answer = ai_assistant.chat(user_query, user_id, conversation_id)
        print(f"\n💬 conversation_id: {conversation_id}\n💡 [Assistant]: {answer}\n")
        print("-" * 60)


if __name__ == "__main__":
    main()
```

### 2.3 Code Explanation

1. Set your MemOS API key and OpenAI API key in environment variables.
2. Instantiate `FinancialManagementAssistant`.
3. Choose whether to preload the preset user behaviors, which consumes 1 add quota.
4. Use the `main()` function to interact with the assistant through a conversation loop.
5. The assistant will call `chat`, first performing a `search` to retrieve memories, then calling OpenAI for conversation, and finally performing an `add` to store memories.

---

# A Writing Assistant with Memory is More Useful (/usecase/writting_assistant)

## 1. Overview

In writing assistant products, users often hope that the assistant can **remember their writing style and habits** instead of starting from scratch each time.

* **Writing Style**: "When helping me write a summary, keep the tone light."
* **Common Information**: "Remember that I am in charge of the Marketing Department at XX company."
* **Writing Preferences**: "From now on, always start emails with 'Dear Customer.'"
* **Context Continuity**: "Please further optimize yesterday’s proposal summary by adding the budget section."

Without memory, this information is lost once the conversation ends. Users must repeatedly remind the assistant, which makes the experience feel fragmented and unprofessional.

### 1.1 Why Not Use Traditional RAG?

Traditional RAG is a poor fit for the writing assistant scenario.

| Traditional RAG | MemOS |
| --- | --- |
| Relies on a static knowledge base, requiring constant manual document maintenance | Information generated in the conversation can be directly written in, no extra maintenance required |
| Retrieval results are usually generic knowledge fragments | Can store and retrieve personalized style, tone, and commonly used expressions |
| More suitable for “company documents/encyclopedia knowledge” | More suitable for “continuous iteration and personalization” in writing assistants |

### 1.2 Why Not Build It Yourself?
Of course, you could try to save user preferences and context in a database, but this brings several challenges:

* **Complex storage and retrieval logic**: You need to distinguish main text, preferences, and user profiles, and design retrieval strategies.
* **Troublesome integration with large models**: Storing is only the first step; before calling the large model, you still need to “insert” the relevant information into the prompt.
* **Poor scalability**: As user needs increase (writing style, common phrases, contextual links), the code will quickly become bloated.

### 1.3 Why Use MemOS?

When making a choice, you can directly compare the three approaches:

| Approach | Features | Limitations | Advantages of MemOS |
| --- | --- | --- | --- |
| **Traditional RAG** | Retrieves documents from a vector knowledge base and inserts into the prompt | Requires manual maintenance of static documents; unsuitable for personalized writing habits | Automatically captures styles and preferences revealed by users during conversations |
| **Self-built Storage Solution** | Build your own tables/caches to save preferences and content | Complex logic: must distinguish main text/preferences/profiles; manual prompt insertion needed; difficult to scale | MemOS encapsulates storage + retrieval + prompt injection, reducing development burden |
| **MemOS** | Just two APIs: `addMessage` for writing, `searchMemory` for retrieval | N/A | Supports long-term tracking of writing styles and reuse of common information; out-of-the-box and easy to expand |

### 1.4 What Will This Case Show?
This case demonstrates how to use the MemOS cloud service to quickly implement a writing assistant that “remembers the user.”

In this demo, the user may:

* Set preferences: “When helping me write a summary, keep the tone light.”
* Reuse background: “Remember that I am in charge of the Marketing Department at XX company.”
* Iterate tasks: “Please further optimize yesterday’s proposal summary by adding the budget section.”

With MemOS, the writing assistant can:

1. **Maintain Style**: Keep consistent tone and formatting as required by the user.
2. **Reuse Information**: Automatically include the user’s common background information.
3. **Iterate Quickly**: Modify based on existing content instead of starting over.

When running this case script, developers will see in the console:

* Each `addMessage` and `searchMemory` request/response
* Retrieved memories such as writing style and background information
* The final model-generated answer (if no large model is connected, it will display [No model connected])

## 2. Example

### 2.1 Environment Setup

Install the required dependencies via pip:

```shell
pip install MemoryOS -U
```

### 2.2 Complete Code

```python
import os
from openai import OpenAI
from memos.api.client import MemOSClient

os.environ["MEMOS_API_KEY"] = "mpg-xx"  # Get MemOS_API_KEY from cloud service console
os.environ["OPENAI_API_KEY"] = "sk-xx"  # Replace with your own API_KEY

conversation_counter = 0


def generate_conversation_id():
    """Return a new sequential conversation ID (conversation_001, conversation_002, ...)"""
    global conversation_counter
    conversation_counter += 1
    return f"conversation_{conversation_counter:03d}"


class WritingAssistant:
    """AI Writing Assistant, helps users write with memory capability"""

    def __init__(self):
        self.memos_client = MemOSClient(api_key=os.getenv("MEMOS_API_KEY"))
        self.openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    def search_memory(self, query, user_id, conversation_id):
        """Search relevant memories based on query"""
        response = self.memos_client.search_memory(query, user_id, conversation_id)
        return [memory_detail.memory_value for memory_detail in response.data.memory_detail_list]

    def build_system_prompt(self, memories):
        """Build a system prompt that includes formatted memories"""
        base_prompt = """
You are a professional writing assistant who can remember the user’s writing style and preferences.
You can call conversation memories to provide more personalized replies.
Please use these memories to understand the user’s background, preferences, and past interactions.
If memories are provided, naturally reference them where relevant, but do not explicitly mention having memory capabilities.
"""
        if memories:
            # Format memories as a numbered list
            formatted_memories = "## Memories:\n"
            for i, memory in enumerate(memories, 1):
                formatted_memories += f"{i}. {memory}\n"
            return f"{base_prompt}\n\n{formatted_memories}"
        else:
            return base_prompt

    def add_message(self, messages, user_id, conversation_id):
        """Add messages to MemOS so they can be processed into memories"""
        self.memos_client.add_message(messages, user_id, conversation_id)

    def get_message(self, user_id, conversation_id):
        """Retrieve the raw messages stored in MemOS (for debugging/inspection)"""
        response = self.memos_client.get_message(user_id, conversation_id)
        return response.data.message_detail_list

    def chat(self, query, user_id, conversation_id):
        """Main chat function to handle dialogue with memory integration"""
        # 1. Search relevant memories
        memories = self.search_memory(query, user_id, conversation_id)

        # Build system prompt with memories
        system_prompt = self.build_system_prompt(memories)

        # 2. Use OpenAI to generate an answer
        response = self.openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": query}
            ]
        )
        answer = response.choices[0].message.content

        # 3. Save the conversation into memories
        messages = [
            {"role": "user", "content": query},
            {"role": "assistant", "content": answer}
        ]
        self.memos_client.add_message(messages, user_id, conversation_id)

        return answer


ai_assistant = WritingAssistant()
user_id = "memos_writing_user_123"


def demo_questions():
    return [
        "Help me write a notification email for a team dinner",
        "Help me write a client email summarizing the new features of our upcoming finance app",
    ]


def pre_configured_conversations():
    """Return pre-configured dialogue pairs"""
    return [
        {
            "user": "I work in the marketing department at an internet company. Keep the tone light when writing emails, and start with 'Dear XX'."
        },
        {
            "user": "When writing summaries, I prefer to list three bullet points first."
        }
    ]


def execute_pre_conversations():
    """Execute pre-configured dialogues"""
    conversations = pre_configured_conversations()
    conversation_id = generate_conversation_id()

    print(f"\n🔄 Executing pre-configured dialogues (conversation_id={conversation_id})...")
    print("=" * 60)

    for i, conv in enumerate(conversations, 1):
        print(f"\n💬 Dialogue {i}")
        print(f"👤 User: {conv['user']}")

        # Execute dialogue (each call performs one search and one add)
        answer = ai_assistant.chat(conv['user'], user_id, conversation_id)
        print(f"🤖 Assistant: {answer}")
        print("-" * 40)

    print("\n✅ Pre-configured dialogues completed!")
    print("=" * 60)


def main():
    print("📝 Welcome to the MemOS writing assistant example!")
    print("💡 With MemOS, your writing assistant better understands your style and preferences! ✍️ \n")

    # Ask whether to execute pre-configured dialogues first
    while True:
        # Input is lowercased, so only lowercase answers need to be checked
        pre_chat = input("🤔 Do you want to execute the pre-configured dialogues first? This will consume 2 add and 2 search calls. Execute? (y/n): ").strip().lower()
        if pre_chat in ['y', 'yes']:
            execute_pre_conversations()
            break
        elif pre_chat in ['n', 'no']:
            print("📝 Starting a brand-new writing assistant dialogue...")
            break
        else:
            print("⚠️ Please enter 'y' for yes or 'n' for no")

    print("\n⚡️ Each question you enter next will take place in a brand-new conversation (with a new conversation ID). MemOS will automatically recall your historical behavioral memories across conversations to provide you with continuous and personalized service.")
    print("\n🎯 Here are some example questions. You can continue chatting with the writing assistant:")
    for i, question in enumerate(demo_questions(), 1):
        print(f" {i}. {question}")

    while True:
        user_query = input("\n🤔 Please enter your writing request (or type 'exit' to quit): ").strip()
        if user_query.lower() in ['quit', 'exit', 'q']:
            print("👋 Thank you for using the writing assistant. Happy writing!")
            break
        if not user_query:
            continue
        print("🤖 Creating...")
        conversation_id = generate_conversation_id()
        answer = ai_assistant.chat(user_query, user_id, conversation_id)
        print(f"\n💬 conversation_id: {conversation_id}\n💡 [Assistant]: {answer}\n")
        print("-" * 60)


if __name__ == "__main__":
    main()
```

### 2.3 Code Explanation

1. Set your MemOS API key and OpenAI key in environment variables.
2. Instantiate `WritingAssistant`.
3. Choose whether to run pre-configured dialogues, which will consume 2 add and 2 search calls.
4. Use the `main()` function to interact with the assistant through a dialogue loop.
5. The assistant will call `chat`: first execute `search` to retrieve memories, then call OpenAI for dialogue, and finally execute `add` to store memories.

---

# Building a Home Life Assistant with Memory (/usecase/home_assistant)

## 1. Overview

When developing a home life assistant product, developers often encounter a problem: **once the dialogue context ends, user information is lost**.

* The user casually assigns a to-do (“Take the kids to the zoo on Saturday”)
* The user expresses a habit (“When reminding, first list the key points, then give one action suggestion”)
* The user introduces family information (“My wife is Xiaoyun, the child is 6 years old”)

If the assistant cannot remember this information, it will appear “heartless”: the next day when the user asks, “What plans do I have for the weekend?”, the assistant will have no idea what they are referring to.

### 1.1 Why not traditional RAG?

Many people’s first thought is: can we use RAG (Retrieval-Augmented Generation)?
But the characteristics of traditional RAG make it a poor fit for this kind of “personalized assistant” scenario:

| Traditional RAG | MemOS |
| --- | --- |
| Relies on static knowledge bases, requiring manual document maintenance | Information generated during dialogue can be written in directly, with no extra maintenance needed |
| Can only mechanically return fragments, does not learn preferences | Automatically forms to-do items, preferences, and profiles from conversations |
| Focuses on “common knowledge”, unsuitable for personal information | Designed for individualized scenarios, supports long-term tracking and recall |

### 1.2 Why not build your own solution?

Of course, you could try to store this information yourself, but this brings several challenges:

* **Complex storage and retrieval logic**: you must distinguish dialogue content, long-term memory, preferences, and facts, and ensure they can be retrieved as needed.
* **Troublesome integration with LLMs**: besides storing data, you also have to embed the relevant information into the prompt before generating responses.
* **Poor scalability**: as features increase (to-dos, preferences, profiles), the code becomes increasingly hard to maintain.

### 1.3 Why use MemOS?
When making a technical choice, you can compare the three approaches side by side:

| Approach | Characteristics | Limitations | Advantages of MemOS |
| --- | --- | --- | --- |
| Traditional RAG | Retrieves documents from a vector database and appends them to the prompt | Requires manual static document maintenance; cannot store personal to-dos/preferences; only mechanically returns fragments | Automatically captures key information from dialogues, supports personalization and dynamic updates |
| Self-built storage solution | Custom tables/cache to save dialogue information | Complex logic: must distinguish dialogues/long-term memory/preferences/profiles; still need to manually build the prompt before model calls; poor scalability | MemOS encapsulates storage + retrieval + prompt injection, reducing developer burden |
| MemOS | Only two interfaces: `addMessage` for writing, `searchMemory` for retrieval | None | Supports long-term tracking, preference retention, and profile integration; ready-to-use and easily extendable |

Only two API calls are needed:

* `addMessage`: writes user or assistant messages into the system
* `searchMemory`: retrieves relevant memories before the model responds, so they can be injected into the prompt

With this, the assistant can genuinely appear to have memory:

* **Track to-dos**
  * User says “Take the kids to the zoo on Saturday”
  * A few days later asks “What plans do I have for the weekend?” → Assistant can answer accurately
* **Maintain preferences** (future versions will support more fine-grained instruction completion)
  * User says “When reminding, first list three key points + one short suggestion”
  * Later asks “Help me plan next week’s housework distribution” → Assistant outputs in the preferred style
* **Incorporate profiles**
  * User says “My wife is Xiaoyun, the child is 6 years old”
  * Later asks “Arrange a weekend activity for the family?” → Suggests a family-friendly activity plan

### 1.4 What does this case demonstrate?
We will use the MemOS cloud service to quickly implement a home life assistant “that remembers the user.”

When running the example script, developers will see complete logs:

* Requests/responses for each `addMessage` and `searchMemory` call
* Matched memory entries
* Concatenated full instructions (coming soon)
* Model responses (if the LLM is not connected, a message will indicate [LLM not connected])

## 2. Example

### 2.1 Environment Setup

Install required dependencies with pip:

```shell
pip install MemoryOS -U
```

### 2.2 Full Code

```python
import os
import uuid
from openai import OpenAI
from memos.api.client import MemOSClient

os.environ["MEMOS_API_KEY"] = "mpg-xx"  # Get MemOS_API_KEY from cloud service console
os.environ["OPENAI_API_KEY"] = "sk-xx"  # Replace with your own API_KEY

conversation_counter = 0


def generate_conversation_id():
    global conversation_counter
    conversation_counter += 1
    return f"conversation_{conversation_counter:03d}"


class HomeAssistant:
    def __init__(self):
        self.memos_client = MemOSClient(api_key=os.getenv("MEMOS_API_KEY"))
        self.openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    def search_memory(self, query, user_id, conversation_id):
        """Search relevant memories based on query"""
        response = self.memos_client.search_memory(query, user_id, conversation_id)

        return [memory_detail.memory_value for memory_detail in response.data.memory_detail_list]

    def add_message(self, messages, user_id, conversation_id):
        """Add messages to MemOS so they can be processed into memories"""
        self.memos_client.add_message(messages, user_id, conversation_id)

    def get_message(self, user_id, conversation_id):
        """Retrieve the raw messages stored in MemOS (for debugging/inspection)"""
        response = self.memos_client.get_message(user_id, conversation_id)
        return response.data.message_detail_list

    def build_system_prompt(self, memories):
        """Builds a system prompt containing formatted memories"""
        base_prompt = """
            You are a knowledgeable and considerate home life
            assistant. You can leverage conversation memories to provide more personalized responses.
            Use these memories to understand the user’s context, preferences, and past interactions.
            If memory content is provided, naturally reference it when relevant, but do not explicitly state you have memory functions.
        """

        if memories:
            # Format memories as a numbered list
            formatted_memories = "## Memories:\n"
            for i, memory in enumerate(memories, 1):
                formatted_memories += f"{i}. {memory}\n"
            return f"{base_prompt}\n\n{formatted_memories}"
        else:
            return base_prompt

    def chat(self, query, user_id, conversation_id):
        """Main chat function handling memory-integrated conversation"""
        # 1. Search relevant memories
        memories = self.search_memory(query, user_id, conversation_id)

        # Build system prompt including memories
        system_prompt = self.build_system_prompt(memories)

        # 2. Use OpenAI to generate response
        response = self.openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": query}
            ]
        )
        answer = response.choices[0].message.content

        # 3. Save dialogue into memory
        messages = [
            {"role": "user", "content": query},
            {"role": "assistant", "content": answer}
        ]
        self.memos_client.add_message(messages, user_id, conversation_id)

        return answer


ai_assistant = HomeAssistant()
user_id = "memos_home_management_user_123"


def demo_questions():
    return [
        "What plans do I have for the weekend?",
        "Help me plan next week’s housework distribution"
    ]


def pre_configured_conversations():
    """Return pre-configured dialogue pairs"""
    return [
        {
            "user": "Take the kids to the zoo on Saturday, please remember it.",
        },
        {
            "user": "For future reminders or plans, please first list three key points, then add one short suggestion.",
        }
    ]


def execute_pre_conversations():
    """Execute pre-configured dialogues"""
    conversations = pre_configured_conversations()
    conversation_id = generate_conversation_id()

    print(f"\n🔄 Executing pre-configured dialogues(conversation_id={conversation_id})...")
    print("=" * 60)

    for i, conv in enumerate(conversations, 1):
        print(f"\n💬 Dialogue {i}")
        print(f"👤 User: {conv['user']}")

        # Execute dialogue
        answer = ai_assistant.chat(conv['user'], user_id, conversation_id)

        print(f"🤖 Assistant: {answer}")
        print("-" * 40)

    print("\n✅ Pre-configured dialogues completed!")
    print("=" * 60)


def main():
    print("🏠 Welcome to the example of MemOS applied in a home assistant!")
    print("💡 With the power of MemOS, your product can deliver a real butler-like experience! 😊 \n")

    # Ask whether to execute pre-configured dialogues first
    while True:
        pre_chat = input("🤔 Would you like to execute the pre-configured dialogues first? This will consume 2 add calls and 2 search calls. Proceed? (y/n): ").strip().lower()
        if pre_chat in ['y', 'yes']:
            execute_pre_conversations()
            break
        elif pre_chat in ['n', 'no']:
            print("📝 Starting a new dialogue...")
            break
        else:
            print("⚠️ Please enter 'y' for yes or 'n' for no")

    print("\n⚡️ Each question you enter next will take place in a brand-new conversation (with a new conversation ID). MemOS will automatically recall your historical behavioral memories across conversations to provide you with continuous and personalized service.")
    print("\n🎯 Here are some sample questions you can continue to ask the assistant:")
    for i, question in enumerate(demo_questions(), 1):
        print(f" {i}. {question}")

    while True:
        user_query = input("\n🤔 Please enter your question (or type 'exit' to quit): ").strip()

        if user_query.lower() in ['quit', 'exit', 'q']:
            print("👋 Thank you for using the home assistant!")
            break

        if not user_query:
            continue

        print("🤖 Processing...")
        conversation_id = generate_conversation_id()
        answer = ai_assistant.chat(user_query, user_id, conversation_id)

        print(f"\n💬 conversation_id: {conversation_id}\n💡 [Assistant]: {answer}\n")
        print("-" * 60)


if __name__ == "__main__":
    main()
```

### 2.3 Code Explanation

1. Set your MemOS API key and OpenAI key in environment variables.
2. Instantiate `HomeAssistant`.
3. Choose whether to run the pre-configured dialogues (consumes 2 add calls and 2 search calls).
4. Use the `main()` function to interact with the assistant in a dialogue loop.
5. The assistant calls `chat`, which first performs `search` to retrieve memories, then uses OpenAI to generate the reply, and finally performs `add` to store the conversation as memory.

---

# Claude MCP (/usecase/frameworks/claude_mcp)

### Using in Claude Desktop

To use MemOS in Claude Desktop, click the avatar in the lower left corner -> "Settings" -> "Developer" -> "Edit Config", paste the configuration into the `claude_desktop_config.json` file, and then restart the client. Once the memos-api-mcp service shows as running, you can use it in chat.

![Verification of using MemOS in Claude](https://cdn.memtensor.com.cn/img/1763105334517_9ayhrp_compressed.png)

For better results, we recommend modifying the user preference settings that apply to all conversations when using MemOS in Claude Desktop.
To do so, click the avatar in the lower left corner -> "General", and paste the following content into the input box under "What personal preferences should Claude consider in responses?":

```
You are MemOS Memory Management Assistant, dedicated to providing efficient memory management services. It extracts memories based on users' past conversation content and enhances the consistency and personalization of users' conversations with AI through memory retrieval. Before answering each user's question, you need to call the search_memory service of memos-api-mcp and use appropriate search terms to find memories related to the current topic in the user's personal memory bank. After completing the answer based on these memories, call the add_message service of memos-api-mcp to record a summary of the current conversation content. (Note that calling add_message is mandatory. Regardless of what the user says or asks, it must be recorded; otherwise, in subsequent conversations, search_memory will not be able to obtain more detailed user information, leading to your inability to answer the user's questions accurately.)
```

![Modifying user preferences for using MemOS in Claude Desktop](https://cdn.memtensor.com.cn/img/1763105312212_yqu9m7_compressed.png)

The following example shows MemOS in use in Claude Desktop. You can compare it with your own client to check whether MemOS is configured correctly.

![Example of using MemOS in Claude Desktop](https://cdn.memtensor.com.cn/img/1763105296073_gtqj1s_compressed.png)

---

# Coze Plugin (/usecase/frameworks/coze_plugin)

## 1. Plugin Information

The MemOS Cloud Service Plugin is now available in the Coze Store! You can directly [visit the tool link](https://www.coze.cn/store/plugin/7569918012912893995?from=store_search_suggestion) to add the plugin and achieve zero-code integration.

## 2. Plugin Description

### Plugin Functions

* `search_memory`: This tool queries user memory data and returns the fragments most relevant to the input. It supports real-time memory retrieval during user-AI conversations and global searches across the entire memory. It can be used to create user profiles or support personalized recommendations. Queries require parameters such as conversation ID, user ID, and query text. You can also set the number of memory items returned.
* `add_memory`: This tool imports one or more messages in batch into the MemOS memory storage database, so they can be retrieved in future conversations. This supports chat history management, user behavior tracking, and personalized interaction. Usage requires specifying conversation ID, message content, sender role, conversation time, and user ID.

### Interface Description

* `search_memory` Interface

| Parameter Name | Type | Description | Required |
| --- | --- | --- | --- |
| memory_limit_number | string | Limits the number of returned memory items. Defaults to 6 if not provided. | No |
| memos_key | string | Authorization key for MemOS Cloud Service | Yes |
| memos_url | string | URL address for MemOS Cloud Service | Yes |
| query | string | User input | Yes |
| user_id | string | Unique identifier for the user associated with the memory being queried | Yes |

* `add_memory` Interface

| Parameter Name | Type | Description | Required |
| --- | --- | --- | --- |
| conversation_id | string | Unique identifier for the conversation | Yes |
| memos_key | string | Authorization key for MemOS Cloud Service | Yes |
| memos_url | string | URL address for MemOS Cloud Service | Yes |
| messages | Array | Array of message objects | Yes |
| user_id | string | Unique identifier for the user associated with the messages being added | Yes |

## 3. Agent Call Example

### Agent Persona and Reply Logic Example

```
You are a Q&A robot.
Each time, you will read the user's memory and content of interest, and reply with very clear logic to gain the user's favor.

## Workflow Content
# 1. Access {search_memory} to retrieve data
    After each user input, first call the retrieval function in MemOS memory relationship -- the {search_memory} plugin.
    Input information:
        Record the user's name as user_id. If it is the first visit, set user_id to a 16-character string randomly generated by UUID.
        Use the user's spoken content as the query.
# 2. Process {search_memory} output content:
    Get the data content. If it contains the memory_detail_list field, regardless of whether the memory_detail_list list is empty, directly output the memory_detail_list list in JSON format; if the returned message is not "ok", prompt "Plugin retrieval failed".
# 3. Answer the user's question based on the retrieved memory_detail_list
    Extract the memory_value field value of each item in memory_detail_list, and concatenate all strings with "\n" as the context material for answering the user's question. The large model answers the user's query based on the information provided by the context; if the context information is an empty string, the large model directly answers the user's query. Then record the content answered by the large model into "answer".
# 4. Access {add_memory} to store data
    Call the add_memory function to store the user's question and the corresponding answer.
    Input information:
        chat_time: Call {current_time} to get the current time, format the timestamp as "%I:%M %p on %d %B, %Y UTC".
        conversation_id: Record the current time point chat_time accurate to the minute, and use the time point string as conversation_id.
        user_id: Record the user's name as user_id.
        messages: Record the query input by the user and the answer obtained, as the content of the user role and the content of the assistant role in messages respectively. Use the chat_time value just obtained for chat_time, and organize it into a messages array:
        [
            {"role": "user", "content": query, "chat_time": chat_time},
            {"role": "assistant", "content": answer, "chat_time": chat_time}
        ]
    Get feedback from the {add_memory} plugin. If the success field in data is True, it is successful, *no need to inform the user*; if the returned field is not True, prompt the user that add_memory access failed.

## Requirements
    When accessing {search_memory} and {add_memory} each time, two fixed parameters must be passed:
        memos_url = "https://memos.memtensor.cn/api/openmem/v1"
        memos_key = "Token mpg-XXXXXXXXXXXXXXXXXXXXXXXXXXX"

Your role is a wise and loving memory assistant named Xiao Zhi.
If all plugins run smoothly, there is no need to prompt the user for success in the content answered by the large model.
Only generate user_id with UUID once during the user's first conversation, and reuse this user_id in subsequent work.
```

[Agent Example Link](https://www.coze.cn/s/85NOIg062vQ)

![Agent Workflow](https://cdn.memtensor.com.cn/img/coze_workflow_compressed.png)
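
To make step 4 of the prompt above more concrete, the following is a minimal Python sketch of how the `add_memory` inputs could be assembled. It is an illustration only, not the plugin's implementation: the helper names `format_chat_time` and `build_add_memory_inputs` are hypothetical, the field names simply mirror the `add_memory` interface table, and no request is sent. Since the prompt asks for a minute-precision time string as `conversation_id`, the sketch reuses the formatted `chat_time` for it.

```python
from datetime import datetime, timezone

# Format string quoted in the agent prompt. Note that %p and %B depend on the
# process locale; Python's default "C" locale yields English names.
CHAT_TIME_FORMAT = "%I:%M %p on %d %B, %Y UTC"


def format_chat_time(now: datetime) -> str:
    """Render a timezone-aware timestamp in the chat_time format, converted to UTC."""
    return now.astimezone(timezone.utc).strftime(CHAT_TIME_FORMAT)


def build_add_memory_inputs(user_id, query, answer, now, memos_url, memos_key):
    """Hypothetical helper: assemble the add_memory parameters for one Q&A turn."""
    chat_time = format_chat_time(now)
    return {
        "memos_url": memos_url,
        "memos_key": memos_key,
        "user_id": user_id,
        # The formatted time is already minute-precise, so it doubles as the ID.
        "conversation_id": chat_time,
        "messages": [
            {"role": "user", "content": query, "chat_time": chat_time},
            {"role": "assistant", "content": answer, "chat_time": chat_time},
        ],
    }


if __name__ == "__main__":
    inputs = build_add_memory_inputs(
        user_id="example_user",
        query="What plans do I have for the weekend?",
        answer="You planned to take the kids to the zoo on Saturday.",
        now=datetime.now(timezone.utc),
        memos_url="https://memos.memtensor.cn/api/openmem/v1",
        memos_key="Token mpg-XXXXXXXXXXXXXXXXXXXXXXXXXXX",  # placeholder key
    )
    print(inputs["conversation_id"])
```

One consequence of deriving `conversation_id` from a minute-precision timestamp is that every turn recorded within the same minute shares a conversation ID, while turns in different minutes are stored as separate conversations.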