Core Operations

Continuous Conversation Chat

MemOS provides a conversation API with built-in memory management, so you do not need to manually assemble context.

1. When to Use the Chat API

The Chat API is suited to quickly building AI conversation applications with long-term memory. You pass only the user's current message; MemOS automatically handles memory recall, prompt assembly, model response generation, and writing the conversation back into memory.

  • Integrated conversational AI: one API completes conversation generation without a complex custom pipeline.
  • Automatic memory handling: automatically extracts, updates, and retrieves memories, reducing manual maintenance.
  • Continuous context: keeps understanding coherent across turns, days, and even sessions.

2. Compared with Memory Operation APIs

  • Use Chat: best for general AI conversations, business PoCs, and quick validation.
  • Use Memory Operation APIs: best for complex Agents and deeper business-system integration.

Dimension              | Chat API                          | Memory operation APIs
Integration complexity | Low, ready to use                 | Medium, requires orchestration
Memory management      | Automatic                         | Manually add, search, and assemble
Model response         | Generated by MemOS built-in model | Call your own external model
Control                | Good for common configuration     | Good for complex pipelines and fine-grained control

3. How It Works

  1. If historical user messages exist, call add/message to write them into MemOS first.
  2. When the end user sends a message, your AI application calls chat with the user message and related parameters.
  3. MemOS recalls historical memories related to the current user message, then assembles custom instructions, the current conversation context, and the recalled user memories into a prompt.
  4. MemOS calls the model to generate an answer and returns the result to your AI application.
  5. By default, MemOS asynchronously processes the user message and model response in the background and writes them as memories.
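
The steps above can be sketched as one client-side function. This is a minimal sketch under stated assumptions, not an official SDK: `chat_turn`, the injectable `post` callable, and the stub `fake_post` are hypothetical names introduced for illustration. Only the `add/message` and `chat` paths and the `user_id`, `conversation_id`, `messages`, and `query` fields come from this page. For simplicity the sketch writes history and calls chat under the same `conversation_id`.

```python
def chat_turn(post, user_id, conversation_id, query, history=None):
    """Hypothetical helper: run one conversation turn in the order described above.

    `post(path, payload)` is any callable that sends a JSON POST to MemOS
    and returns the decoded response.
    """
    base = {"user_id": user_id, "conversation_id": conversation_id}
    # Step 1: write pre-existing history first, if there is any.
    if history:
        post("add/message", {**base, "messages": history})
    # Steps 2-4: send only the current message; MemOS handles recall,
    # prompt assembly, and answer generation on its side.
    # Step 5: memory writing happens inside MemOS, so no extra call here.
    return post("chat", {**base, "query": query})


# Dry run with a stub transport (no network) to show the call order.
calls = []


def fake_post(path, payload):
    calls.append(path)
    return {"path": path}


chat_turn(
    fake_post,
    user_id="memos_user_123",
    conversation_id="0928",
    query="Recommend a city I have not visited.",
    history=[{"role": "user", "content": "I'll choose 7 Days Inn."}],
)
print(calls)  # ['add/message', 'chat']
```

In a real application, `post` would be a thin wrapper around an HTTP client call that adds the `Authorization` header. Injecting it keeps the flow easy to test without network access.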

4. Quick Start

Optional: add historical messages

If you already have conversation history, call add/message first. For a new user or a new conversation, skip this step and call Chat directly.

import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://memos.memtensor.cn/api/openmem/v1"

data = {
  "user_id": "memos_user_123",
  "conversation_id": "0610",
  "messages": [
    {"role": "user", "content": "I booked a summer trip to Guangzhou. Which hotel chains are available?"},
    {"role": "assistant", "content": "You can consider 7 Days Inn, Ji Hotel, Hilton, and others."},
    {"role": "user", "content": "I'll choose 7 Days Inn."},
    {"role": "assistant", "content": "Got it. Feel free to ask if you have other questions."}
  ]
}

res = requests.post(
  f"{BASE_URL}/add/message",
  headers={"Authorization": f"Token {API_KEY}"},
  json=data
)

print(res.json())

Call Chat

When you call chat, MemOS automatically retrieves relevant memories and generates an answer.

import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://memos.memtensor.cn/api/openmem/v1"

data = {
  "user_id": "memos_user_123",
  "conversation_id": "0928",
  "query": "I want to travel during National Day. Recommend a city I have not visited and a hotel brand I have not stayed at."
}

res = requests.post(
  f"{BASE_URL}/chat",
  headers={"Authorization": f"Token {API_KEY}"},
  json=data
)

print(res.json())

For the full field list, request format, and response format, see the Chat API documentation.


5. Limits

  • Input limit: 8,000 tokens.
  • Output limit: up to 25 fact memories and up to 25 preference memories can be recalled.
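
Because the input limit is expressed in tokens, a client may want to reject oversized queries before sending them. The sketch below is illustrative only: `check_query_size` and `rough_token_count` are hypothetical names, and this section does not say which tokenizer MemOS uses or whether the limit covers more than the query. The whitespace-based counter is a crude stand-in that undercounts for many languages, so pass a real tokenizer through `count_tokens` when one is available.

```python
MAX_INPUT_TOKENS = 8000  # input limit stated above


def rough_token_count(text):
    """Crude stand-in: counts whitespace-separated words, not real tokens."""
    return len(text.split())


def check_query_size(query, count_tokens=rough_token_count):
    """Hypothetical pre-check: raise if the query appears to exceed the input limit."""
    n = count_tokens(query)
    if n > MAX_INPUT_TOKENS:
        raise ValueError(
            f"query is about {n} tokens; the input limit is {MAX_INPUT_TOKENS}"
        )
    return n
```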

6. More Usage Options

The Chat API works out of the box. The following parameters are optional and only needed when you want to control memory recall, model responses, or memory writing.


Control memory recall scope

Use these fields to control which memories are considered and how many are recalled:

  • filter: filter memories by tags, time, business fields, and other conditions.
  • knowledgebase_ids: specify which knowledge bases Chat can search.
  • relativity: control the relevance threshold for recalled memories.
  • memory_limit_number: limit the number of fact memories passed to the model.

data = {
  "user_id": "memos_user_123",
  "conversation_id": "0928",
  "query": "Use the knowledge base to summarize travel reimbursement rules.",
  "knowledgebase_ids": ["kb_xxx"],
  "filter": {
    "and": [
      {"tags": {"contains": "travel"}},
      {"create_time": {"gte": "2025-01-01"}}
    ]
  },
  "relativity": 0.8,
  "memory_limit_number": 9
}
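
When filter conditions come from application state, the nested structure can be built by a small helper instead of by hand. This is a sketch only: `build_filter` is a hypothetical helper, and it uses just the `and`, `contains`, and `gte` forms that appear in the example above. Other operators, and how the API treats an empty condition list, are not covered here.

```python
def build_filter(tags=(), created_since=None):
    """Hypothetical helper: AND together tag and creation-time conditions."""
    conditions = [{"tags": {"contains": tag}} for tag in tags]
    if created_since is not None:
        conditions.append({"create_time": {"gte": created_since}})
    return {"and": conditions}


data = {
    "user_id": "memos_user_123",
    "conversation_id": "0928",
    "query": "Use the knowledge base to summarize travel reimbursement rules.",
    "knowledgebase_ids": ["kb_xxx"],
    "filter": build_filter(tags=["travel"], created_since="2025-01-01"),
    "relativity": 0.8,
    "memory_limit_number": 9,
}
```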

Control model response behavior

Use these fields to specify the model, enable streaming, or adjust generation parameters:

  • model_name: specify the conversation model.
  • stream: control whether to stream the response.
  • temperature: control randomness.
  • top_p: control candidate token selection.
  • max_tokens: limit the maximum generated length.

data = {
  "user_id": "memos_user_123",
  "conversation_id": "0928",
  "query": "Summarize my travel preferences in a concise tone.",
  "model_name": "qwen2.5-72b-instruct",
  "stream": False,
  "temperature": 0.7,
  "top_p": 0.95,
  "max_tokens": 1024
}

To fully customize model behavior, pass system_prompt to override the default system prompt.
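
A request body that overrides the system prompt might look like the following. The prompt text is illustrative, and the sketch assumes `system_prompt` takes a plain string; check the Chat API documentation for the exact format.

```python
data = {
    "user_id": "memos_user_123",
    "conversation_id": "0928",
    "query": "Summarize my travel preferences in a concise tone.",
    # Assumption: a plain string that replaces the default system prompt.
    "system_prompt": "You are a travel assistant. Answer in at most three sentences.",
}
```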


Control whether new memories are written automatically

By default, Chat writes the current user message and model response into memory. If you only want to generate an answer and do not want this turn to enter memory processing, pass:

  • add_message_on_answer: whether to write this user message and model response into memory.

data = {
  "user_id": "memos_user_123",
  "conversation_id": "0928",
  "query": "Answer this once, but do not write this turn into memory.",
  "add_message_on_answer": False
}

The following fields are optional and can be ignored in ordinary conversations. Use them when you want new memories generated by Chat to carry business ownership information, or when you want to control where those memories are written:

  • agent_id: mark which Agent the conversation belongs to.
  • app_id: mark which application the conversation comes from.
  • tags: add tags for future retrieval and filtering.
  • info: write custom business metadata such as scene, order ID, or status.
  • allow_public: whether to allow writing to project-level public memory.
  • allow_knowledgebase_ids: which knowledge bases can be written.
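
As an illustration, a request that attaches ownership metadata and restricts where memories are written could be shaped like this. All identifier values are made up, and the value types (list for `tags`, object for `info`, boolean for `allow_public`, list for `allow_knowledgebase_ids`) are assumptions inferred from the field descriptions above, not confirmed formats.

```python
data = {
    "user_id": "memos_user_123",
    "conversation_id": "0928",
    "query": "Please confirm my hotel booking for the Guangzhou trip.",
    "agent_id": "travel_agent_01",  # hypothetical Agent identifier
    "app_id": "travel_app",  # hypothetical application identifier
    "tags": ["travel", "booking"],  # assumed type: list of strings
    # Assumed type: JSON object holding custom business metadata.
    "info": {"scene": "hotel_booking", "order_id": "ORDER_123", "status": "pending"},
    "allow_public": False,  # assumed type: boolean
    "allow_knowledgebase_ids": ["kb_xxx"],  # assumed type: list of knowledge base IDs
}
```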