Continuous Conversation Chat
1. When to Use the Chat API
The Chat API is designed for quickly building AI conversation applications with long-term memory. You pass only the user's current message. MemOS automatically handles memory recall, prompt assembly, response generation, and writing the conversation back into memory.
- Integrated conversational AI: a single API call produces the response, with no custom pipeline to build.
- Automatic memory handling: memories are extracted, updated, and retrieved automatically, which reduces manual maintenance.
- Continuous context: understanding stays coherent across turns, days, and even sessions.
2. Compared with Memory Operation APIs
- Use Chat: best for general AI conversations, business proofs of concept (PoCs), and quick validation.
- Use Memory Operation APIs: best for complex Agents and deep integration with business systems.
| Dimension | Chat API | Memory operation APIs |
|---|---|---|
| Integration complexity | Low, ready to use | Medium, requires your own orchestration |
| Memory management | Automatic | Manual: you add, search, and assemble memories |
| Model response | Generated by a MemOS built-in model | Generated by your own external model |
| Control | Covers common configuration needs | Supports complex pipelines and fine-grained control |
3. How It Works

- If historical user messages exist, call `add/message` to write them into MemOS first.
- When the end user sends a message, your AI application calls `chat` with the user message and related parameters.
- MemOS recalls historical memories related to the current user message, then assembles custom instructions, the current conversation context, and user memories into the prompt.
- MemOS calls the model to generate an answer and returns the result to your AI application.
- By default, MemOS processes the user message and model response asynchronously in the background and writes them as memories.
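The flow above can be sketched from the client side as follows. This is a minimal sketch, not the MemOS SDK. The `Transport` type, `make_requests_transport`, and `run_turn` are hypothetical names. The HTTP transport is injected so the flow can be exercised offline. The endpoints and Token header are the ones used in the Quick Start examples. For simplicity, the sketch writes history under the same `conversation_id` as the chat call, although in practice history may belong to an earlier conversation.

```python
from typing import Any, Callable, Optional

BASE_URL = "https://memos.memtensor.cn/api/openmem/v1"

# Hypothetical transport type: any callable that POSTs a JSON payload to a URL
# and returns the parsed response. Injecting it keeps the flow testable offline.
Transport = Callable[[str, dict], Any]


def make_requests_transport(api_key: str) -> Transport:
    """Build a transport on top of `requests`, using the Token auth header."""
    import requests  # imported lazily so the flow itself has no hard dependency

    def post(url: str, payload: dict) -> Any:
        res = requests.post(url, headers={"Authorization": f"Token {api_key}"}, json=payload)
        return res.json()

    return post


def run_turn(
    post: Transport,
    user_id: str,
    conversation_id: str,
    query: str,
    history: Optional[list] = None,
) -> Any:
    """One pass through the documented flow for a single end-user message."""
    if history:
        # Step 1 (optional): write existing messages into MemOS first.
        post(f"{BASE_URL}/add/message", {
            "user_id": user_id,
            "conversation_id": conversation_id,
            "messages": history,
        })
    # Step 2: send only the current message. Recall, prompt assembly, generation,
    # and (by default, asynchronously) memory writing happen on the MemOS side.
    return post(f"{BASE_URL}/chat", {
        "user_id": user_id,
        "conversation_id": conversation_id,
        "query": query,
    })
```

In a real application you would pass `make_requests_transport("YOUR_API_KEY")` as `post`.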
4. Quick Start
Optional: add historical messages
If you already have conversation history, call add/message first. For a new user or a new conversation, skip this step and call Chat directly.
Using HTTP with `requests`:

```python
import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://memos.memtensor.cn/api/openmem/v1"

# Existing conversation history to import for this user
data = {
    "user_id": "memos_user_123",
    "conversation_id": "0610",
    "messages": [
        {"role": "user", "content": "I booked a summer trip to Guangzhou. Which hotel chains are available?"},
        {"role": "assistant", "content": "You can consider 7 Days Inn, Ji Hotel, Hilton, and others."},
        {"role": "user", "content": "I'll choose 7 Days Inn."},
        {"role": "assistant", "content": "Got it. Feel free to ask if you have other questions."}
    ]
}

# Write the history into MemOS
res = requests.post(
    f"{BASE_URL}/add/message",
    headers={"Authorization": f"Token {API_KEY}"},
    json=data
)
print(res.json())
```

Using the Python SDK:

```python
from memos.api.client import MemOSClient

client = MemOSClient(api_key="YOUR_API_KEY")

# The same history, written through the SDK
messages = [
    {"role": "user", "content": "I booked a summer trip to Guangzhou. Which hotel chains are available?"},
    {"role": "assistant", "content": "You can consider 7 Days Inn, Ji Hotel, Hilton, and others."},
    {"role": "user", "content": "I'll choose 7 Days Inn."},
    {"role": "assistant", "content": "Got it. Feel free to ask if you have other questions."}
]
res = client.add_message(
    messages=messages,
    user_id="memos_user_123",
    conversation_id="0610"
)
print(res)
```
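If your history is stored in another shape, you first need to convert it into the `messages` list used above. The following is a minimal sketch with a hypothetical `to_messages` helper. It assumes, based on the example, that only the `user` and `assistant` roles are needed. Check the API documentation before sending other roles.

```python
from typing import Iterable, Tuple

# Assumption: imported history only uses "user" and "assistant",
# matching the example above.
ALLOWED_ROLES = {"user", "assistant"}


def to_messages(turns: Iterable[Tuple[str, str]]) -> list:
    """Convert (role, content) pairs into the `messages` list used by add/message."""
    messages = []
    for role, content in turns:
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {role!r}")
        if not content.strip():
            continue  # drop empty turns instead of sending blank content
        messages.append({"role": role, "content": content})
    return messages


history = [
    ("user", "I'll choose 7 Days Inn."),
    ("assistant", "Got it. Feel free to ask if you have other questions."),
]
print(to_messages(history))
```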
Call Chat
When you call chat, MemOS automatically retrieves relevant memories and generates an answer.
Using HTTP with `requests`:

```python
import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://memos.memtensor.cn/api/openmem/v1"

# Only the current user message is sent; MemOS recalls related memories itself
data = {
    "user_id": "memos_user_123",
    "conversation_id": "0928",
    "query": "I want to travel during National Day. Recommend a city I have not visited and a hotel brand I have not stayed at."
}

res = requests.post(
    f"{BASE_URL}/chat",
    headers={"Authorization": f"Token {API_KEY}"},
    json=data
)
print(res.json())
```

Using the Python SDK:

```python
from memos.api.client import MemOSClient

client = MemOSClient(api_key="YOUR_API_KEY")
res = client.chat(
    user_id="memos_user_123",
    conversation_id="0928",
    query="I want to travel during National Day. Recommend a city I have not visited and a hotel brand I have not stayed at."
)
print(res)
```

Using curl:

```shell
curl --request POST \
  --url https://memos.memtensor.cn/api/openmem/v1/chat \
  --header 'Authorization: Token YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "user_id": "memos_user_123",
    "conversation_id": "0928",
    "query": "I want to travel during National Day. Recommend a city I have not visited and a hotel brand I have not stayed at."
  }'
```
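For a continuous conversation, each turn is just another `chat` call with the same `user_id` and `conversation_id`. The sketch below wraps that in a hypothetical `ChatSession` class, which is not part of the MemOS SDK. It uses an injected `post` callable, as in the earlier flow sketch, so you can plug in `requests` or a test double. The response is returned unparsed because its exact format is defined in the Chat API documentation.

```python
from typing import Any, Callable

BASE_URL = "https://memos.memtensor.cn/api/openmem/v1"


class ChatSession:
    """Hypothetical helper that pins user_id and conversation_id across turns."""

    def __init__(self, post: Callable[[str, dict], Any], user_id: str, conversation_id: str):
        self.post = post
        self.user_id = user_id
        self.conversation_id = conversation_id

    def ask(self, query: str, **options: Any) -> Any:
        # Every turn reuses the same IDs so all turns stay in one conversation;
        # optional fields (e.g. stream, temperature) pass through unchanged.
        payload = {
            "user_id": self.user_id,
            "conversation_id": self.conversation_id,
            "query": query,
        }
        payload.update(options)
        return self.post(f"{BASE_URL}/chat", payload)
```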
For the full field list, request format, and response format, see the Chat API documentation.
5. Limits
- Input limit: 8,000 tokens.
- Recall limit: up to 25 fact memories and up to 25 preference memories can be recalled per request.
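One way to respect these limits before sending a request is sketched below. It rests on two assumptions. First, it treats the 8,000-token input limit as applying to `query` only. Second, it treats 1 to 25 as the valid range for `memory_limit_number`. The helper names are hypothetical. Because MemOS's tokenizer is not specified here, the token counter is supplied by the caller and gives only an estimate.

```python
from typing import Callable

# Documented limits. Applying the input limit to `query` alone is an assumption.
INPUT_TOKEN_LIMIT = 8000
MAX_FACT_MEMORIES = 25


def check_query(query: str, count_tokens: Callable[[str], int]) -> None:
    """Raise before sending if the query's estimated token count exceeds the limit."""
    n = count_tokens(query)
    if n > INPUT_TOKEN_LIMIT:
        raise ValueError(f"query has ~{n} tokens; limit is {INPUT_TOKEN_LIMIT}")


def clamp_memory_limit(requested: int) -> int:
    """Keep memory_limit_number within 1..25 (assumed valid range)."""
    return max(1, min(requested, MAX_FACT_MEMORIES))
```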
6. More Usage Options
The Chat API works out of the box. The following parameters are optional and only needed when you want to control memory recall, model responses, or memory writing.
Control memory recall scope
Use these fields to control which memories are considered and how many are recalled:
- `filter`: filter memories by tags, time, business fields, and other conditions.
- `knowledgebase_ids`: specify which knowledge bases Chat can search.
- `relativity`: set the relevance threshold for recalled memories.
- `memory_limit_number`: limit the number of fact memories passed to the model.
```python
data = {
    "user_id": "memos_user_123",
    "conversation_id": "0928",
    "query": "Use the knowledge base to summarize travel reimbursement rules.",
    "knowledgebase_ids": ["kb_xxx"],
    # Only memories tagged "travel" and created on or after 2025-01-01
    "filter": {
        "and": [
            {"tags": {"contains": "travel"}},
            {"create_time": {"gte": "2025-01-01"}}
        ]
    },
    "relativity": 0.8,
    "memory_limit_number": 9
}
```
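When filters are built from application state, small helpers can keep the nested shape consistent. This is a minimal sketch with hypothetical `cond` and `all_of` helpers. They only reproduce the `and` / `contains` / `gte` structure shown in the example. Whether other operators are supported should be checked in the Chat API documentation.

```python
from typing import Any, Dict, List


def cond(field: str, op: str, value: Any) -> Dict[str, Dict[str, Any]]:
    """One condition, e.g. cond("tags", "contains", "travel")."""
    return {field: {op: value}}


def all_of(*conditions: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Require every condition to match, using the "and" node from the example."""
    return {"and": list(conditions)}


recent_travel = all_of(
    cond("tags", "contains", "travel"),
    cond("create_time", "gte", "2025-01-01"),
)
print(recent_travel)
```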
Control model response behavior
Use these fields to specify the model, enable streaming, or adjust generation parameters:
- `model_name`: specify the conversation model.
- `stream`: control whether to stream the response.
- `temperature`: control randomness.
- `top_p`: control candidate token selection.
- `max_tokens`: limit the maximum generated length.
```python
data = {
    "user_id": "memos_user_123",
    "conversation_id": "0928",
    "query": "Summarize my travel preferences in a concise tone.",
    "model_name": "qwen2.5-72b-instruct",
    "stream": False,       # return the full answer in one response
    "temperature": 0.7,
    "top_p": 0.95,
    "max_tokens": 1024
}
```
To fully customize model behavior, pass system_prompt to override the default system prompt.
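A request that overrides the default system prompt might look like the sketch below. Only the `system_prompt` field name comes from this page. The prompt wording is illustrative. Whether recalled memories are still injected when the default prompt is overridden is not stated here, so check the Chat API documentation.

```python
import json

data = {
    "user_id": "memos_user_123",
    "conversation_id": "0928",
    "query": "Recommend a weekend trip.",
    # Overrides the default system prompt for this request (illustrative wording)
    "system_prompt": "You are a concise travel assistant. Answer in at most three sentences.",
}
print(json.dumps(data, indent=2))
```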
Control whether new memories are written automatically
By default, Chat writes the current user message and model response into memory. If you only want to generate an answer and do not want this turn to enter memory processing, pass:
- `add_message_on_answer`: whether to write this turn's user message and model response into memory. Set it to `False` to skip writing.
```python
data = {
    "user_id": "memos_user_123",
    "conversation_id": "0928",
    "query": "Answer this once, but do not write this turn into memory.",
    "add_message_on_answer": False  # generate an answer only; skip memory writing
}
```
Ordinary conversations can ignore the following fields. Use them when you want new memories generated by Chat to carry business ownership information, or when you want to control where those memories are written:
- `agent_id`: marks which Agent the conversation belongs to.
- `app_id`: marks which application the conversation comes from.
- `tags`: adds tags for later retrieval and filtering.
- `info`: writes custom business metadata such as scene, order ID, or status.
- `allow_public`: whether new memories may be written to project-level public memory.
- `allow_knowledgebase_ids`: which knowledge bases new memories may be written to.
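A minimal sketch of attaching this metadata follows. It uses a hypothetical `with_metadata` helper that adds only the fields you actually set. The value types are assumptions inferred from the field descriptions: a list of strings for `tags`, a flat dict for `info`, a boolean for `allow_public`, and a list of knowledge base IDs for `allow_knowledgebase_ids`. The IDs are placeholders.

```python
from typing import Any


def with_metadata(payload: dict, **meta: Any) -> dict:
    """Return a copy of a chat payload with only the metadata fields that are set."""
    merged = dict(payload)
    merged.update({key: value for key, value in meta.items() if value is not None})
    return merged


data = with_metadata(
    {
        "user_id": "memos_user_123",
        "conversation_id": "0928",
        "query": "Remind me to request the hotel invoice after checkout.",
    },
    agent_id="travel_agent",        # hypothetical Agent ID
    app_id="trip_app",              # hypothetical application ID
    tags=["travel", "invoice"],     # assumed: list of strings
    info={"scene": "after_sales", "order_id": "A1001", "status": "paid"},  # assumed: flat dict
    allow_public=False,             # keep this turn out of project-level public memory
    allow_knowledgebase_ids=None,   # left unset, so it is omitted from the payload
)
print(data)
```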