Getting Started with MemReader
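This guide walks you through using SimpleStructMemReader to extract structured memories from conversations and documents with the help of large language models (LLMs) and embedding models. It is well suited to building memory-aware conversational AI, knowledge bases, and semantic search systems.
Initialize SimpleStructMemReader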
First, configure and initialize the reader with your preferred LLM and embedder models.
Example:
from memos.configs.mem_reader import SimpleStructMemReaderConfig
from memos.mem_reader.simple_struct import SimpleStructMemReader
reader_config = SimpleStructMemReaderConfig.from_json_file(
    "examples/data/config/simple_struct_reader_config.json"
)
reader = SimpleStructMemReader(reader_config)
You can customize the model names or backends in the config file to suit your environment.
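As a minimal sketch of that customization, the helper below loads a JSON config, overrides one nested value, and writes the result to a new file, which could then be passed to SimpleStructMemReaderConfig.from_json_file. The key path llm.config.model_name_or_path, the model names, and the output file name are assumptions for illustration only; check the keys actually present in your own config file.

```python
import json
import tempfile
from copy import deepcopy
from pathlib import Path


def override_config_value(config: dict, dotted_key: str, value) -> dict:
    """Return a copy of `config` with the nested key at `dotted_key` set to `value`."""
    updated = deepcopy(config)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        # Create intermediate dicts when the path does not exist yet.
        node = node.setdefault(part, {})
    node[leaf] = value
    return updated


def write_custom_config(src_path: str, dotted_key: str, value, out_dir: str) -> Path:
    """Load a JSON config, override one value, and save the copy under `out_dir`."""
    config = json.loads(Path(src_path).read_text(encoding="utf-8"))
    custom = override_config_value(config, dotted_key, value)
    out_path = Path(out_dir) / "custom_reader_config.json"  # hypothetical file name
    out_path.write_text(json.dumps(custom, indent=2), encoding="utf-8")
    return out_path


if __name__ == "__main__":
    # Hypothetical config layout, used only to demonstrate the helper.
    sample = {"llm": {"backend": "ollama", "config": {"model_name_or_path": "model-a"}}}
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "reader_config.json"
        src.write_text(json.dumps(sample), encoding="utf-8")
        out = write_custom_config(str(src), "llm.config.model_name_or_path", "model-b", tmp)
        print(out.read_text(encoding="utf-8"))
```

Writing a modified copy rather than editing the shipped example file keeps the original config intact for comparison.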
Get Your First Chat Memory
Extract structured memories from a conversation between a user and an assistant.
Example input:
conversation_data = [
    [
        {"role": "user", "content": "I have a meeting tomorrow at 3 PM"},
        {"role": "assistant", "content": "What's the meeting about?"},
        {"role": "user", "content": "It's about the Q4 project deadline"}
    ]
]
Extract the memories:
memories = reader.get_memory(
    conversation_data,
    type="chat",
    info={"user_id": "user_001", "session_id": "session_001"}
)
Example output:
[
    TextualMemoryItem(
        id='2d5965f9-4c9b-4c24-9068-325b53db098b',
        memory='Tomorrow at 3:00 PM, the user will meet with the Q4 project team to discuss the deadline.',
        metadata=TreeNodeTextualMemoryMetadata(
            user_id='user_001',
            session_id='session_001',
            status='activated',
            type='fact',
            confidence=0.99,
            tags=['deadline', 'project'],
            visibility=None,
            updated_at='2025-07-03T14:34:33.535844',
            memory_type='UserMemory',
            key='Meeting schedule',
            sources=[
                "user: I have a meeting tomorrow at 3 PM",
                "assistant: What's the meeting about?",
                "user: It's about the Q4 project deadline"
            ],
            embedding=[0.0058597163, ..., 0.009375607],
            created_at='2025-07-03T14:34:33.535860',
            usage=[],
            background="The user plans to meet with the Q4 project team tomorrow at 3:00 PM to address the project's deadline. This action reflects their proactive approach to managing project timelines and their focus on ensuring timely completion."
        )
    )
]
The reader extracts the relevant memories and tags from the conversation session.
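Because the chat input is a list of sessions, each holding a list of role/content dictionaries, a malformed structure is an easy mistake to make. The following is a small sketch of a pre-flight check that could be run before calling the reader. The function name and the set of allowed roles are assumptions based only on the example above; the reader itself may accept other roles or perform its own validation.

```python
# Roles seen in the example input; extend this set if your data uses others (assumption).
ALLOWED_ROLES = {"user", "assistant"}


def validate_conversations(conversation_data: list) -> int:
    """Check the nested chat structure and return the total number of messages.

    Raises ValueError with the position of the first problem found.
    """
    if not isinstance(conversation_data, list):
        raise ValueError("conversation_data must be a list of sessions")
    total = 0
    for s_idx, session in enumerate(conversation_data):
        if not isinstance(session, list):
            raise ValueError(f"session {s_idx} must be a list of messages")
        for m_idx, message in enumerate(session):
            where = f"session {s_idx}, message {m_idx}"
            if not isinstance(message, dict):
                raise ValueError(f"{where}: expected a dict")
            if message.get("role") not in ALLOWED_ROLES:
                raise ValueError(f"{where}: unexpected role {message.get('role')!r}")
            content = message.get("content")
            if not isinstance(content, str) or not content.strip():
                raise ValueError(f"{where}: content must be a non-empty string")
            total += 1
    return total


if __name__ == "__main__":
    sample = [
        [
            {"role": "user", "content": "I have a meeting tomorrow at 3 PM"},
            {"role": "assistant", "content": "What's the meeting about?"},
        ]
    ]
    print(validate_conversations(sample), "messages validated")
```

A common error this catches is passing a single flat list of messages instead of a list of sessions.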
Get Your First Document Memory
Process text files to extract structured summaries and tags.
Example code:
doc_paths = [
    "examples/mem_reader/text1.txt",
    "examples/mem_reader/text2.txt",
]
doc_memories = reader.get_memory(
    doc_paths,
    type="doc",
    info={
        "user_id": "user_001",
        "session_id": "session_001",
        "chunk_size": 512,
        "chunk_overlap": 128
    }
)
Example output:
[
    TextualMemoryItem(
        id='24dabd9f-200b-40c4-84cc-2c0fccaaf8fd',
        memory='This is another sample document content for testing purposes.',
        metadata=TreeNodeTextualMemoryMetadata(
            user_id='user_001',
            session_id='session_001',
            status='activated',
            type='fact',
            memory_time=None,
            source=None,
            confidence=0.99,
            entities=None,
            tags=['Testing', 'Sample'],
            visibility=None,
            updated_at='2025-07-03T14:38:29.776147',
            memory_type='LongTermMemory',
            key='',
            sources=['examples/mem_reader/text2.txt_0'],
            embedding=[0.028731367, ..., -0.018501928],
            created_at='2025-07-03T14:38:29.776213',
            usage=[],
            background=''
        )
    )
]
Each document is split into chunks and summarized to create searchable knowledge items.
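To give a feel for what chunk_size and chunk_overlap control, here is a minimal sliding-window sketch. It counts characters and ignores sentence boundaries, which is an assumption made for simplicity; the reader's own chunker may count tokens and split differently, so treat this as an illustration of the overlap idea rather than the actual implementation.

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 128) -> list[str]:
    """Split `text` into windows of `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters, so content near a
    boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "".join(chr(97 + i % 26) for i in range(1000))
    for index, chunk in enumerate(chunk_text(sample)):
        print(index, len(chunk))
```

A larger overlap reduces the chance that a fact is cut in half at a chunk boundary, at the cost of processing more text overall.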
Supported Files
We use markitdown to convert files to Markdown text.
MarkItDown currently supports conversion from the following formats:
PDF
PowerPoint
Word
Excel
Images (EXIF metadata and OCR)
Audio (EXIF metadata and speech transcription)
HTML
Text-based formats (CSV, JSON, XML)
ZIP files (iterates over contents)
YouTube URLs
EPUBs
... and more!
(List taken from the MarkItDown GitHub repository)
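If you want to preview the Markdown text a file would yield before handing it to the reader, MarkItDown can be called directly. The sketch below wraps its convert method and reads text_content from the result. The wrapper function and its converter parameter are hypothetical conveniences added here, and this is not necessarily how the reader invokes MarkItDown internally.

```python
from pathlib import Path


def file_to_markdown(path, converter=None) -> str:
    """Convert one file to Markdown text.

    `converter` may be any object with a MarkItDown-compatible `convert`
    method; when omitted, a MarkItDown instance is created.
    """
    if converter is None:
        # Imported lazily so the third-party package is only needed when used.
        from markitdown import MarkItDown

        converter = MarkItDown()
    result = converter.convert(str(Path(path)))
    return result.text_content
```

With the markitdown package installed, calling file_to_markdown("examples/mem_reader/text1.txt") would return the text of that file as a Markdown string.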
Try It: Print the Extracted Memories
for memory_list in memories:
    for memory_item in memory_list:
        print("🧠 Memory:", memory_item.memory)
        print("🏷 Tags:", memory_item.metadata.tags)
        print("👤 User ID:", memory_item.metadata.user_id)
        print("📅 Created At:", memory_item.metadata.created_at)
        print("---")
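Beyond printing, the same nested loop can feed a quick summary. As a rough sketch, the function below counts how often each tag occurs across all extracted items. It relies only on the memory_item.metadata.tags attribute used in the loop above; the SimpleNamespace objects in the demo are stand-ins so the snippet runs without a configured reader.

```python
from collections import Counter
from types import SimpleNamespace


def count_tags(memories) -> Counter:
    """Count tag occurrences across a nested list of memory items."""
    counts = Counter()
    for memory_list in memories:
        for memory_item in memory_list:
            # Treat a missing or None tag list as empty.
            tags = getattr(memory_item.metadata, "tags", None) or []
            counts.update(tags)
    return counts


if __name__ == "__main__":
    # Stand-in items that mimic only the attributes this function reads.
    fake_memories = [
        [SimpleNamespace(metadata=SimpleNamespace(tags=["deadline", "project"]))],
        [SimpleNamespace(metadata=SimpleNamespace(tags=["project"]))],
        [SimpleNamespace(metadata=SimpleNamespace(tags=None))],
    ]
    for tag, count in count_tags(fake_memories).most_common():
        print(tag, count)
```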
You have now successfully:
- Initialized SimpleStructMemReader
- Extracted structured memories from a chat conversation
- Extracted knowledge from documents