LLMs and Embeddings
Overview
MemOS decouples model logic from runtime config via two Pydantic factories:
| Factory | Produces | Typical backends |
|---|---|---|
| `LLMFactory` | Chat-completion model | `ollama`, `openai`, `qwen`, `deepseek`, `huggingface` |
| `EmbedderFactory` | Text-to-vector encoder | `ollama`, `sentence_transformer`, `universal_api` |
Both factories accept a config blob built with the corresponding `*ConfigFactory.model_validate(...)`, so you can switch providers with a single `backend=` swap.
LLM Module
Supported LLM Backends
| Backend | Notes | Example Model Id |
|---|---|---|
| `ollama` | Local llama.cpp runner | `qwen3:0.6b`, etc. |
| `openai` | Official API or proxy | `gpt-4o-mini`, `gpt-3.5-turbo`, etc. |
| `qwen` | DashScope-compatible | `qwen-plus`, `qwen-max-2025-01-25`, etc. |
| `deepseek` | DeepSeek REST API | `deepseek-chat`, `deepseek-reasoner`, etc. |
| `huggingface` | Transformers pipeline | `Qwen/Qwen3-1.7B`, etc. |
LLM Config Schema
Common fields:
| Field | Type | Default | Description |
|---|---|---|---|
| `model_name_or_path` | `str` | – | Model id or local tag |
| `temperature` | `float` | `0.8` | Sampling temperature |
| `max_tokens` | `int` | `1024` | Maximum tokens to generate |
| `top_p` / `top_k` | `float` / `int` | `0.9` / `50` | Nucleus / top-k sampling cut-offs |
| API-specific | e.g. `api_key`, `api_base` | – | OpenAI-compatible credentials |
| `remove_think_prefix` | `bool` | `True` | Strip the model's think/reasoning content from replies |
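As a concrete illustration, a fuller config blob might combine these fields as follows (a sketch only: the values simply restate the defaults above, and whether every backend honors every field should be checked against that backend's config class):

```python
# Illustrative config blob combining the common fields above (values are the defaults).
llm_config = {
    "backend": "ollama",
    "config": {
        "model_name_or_path": "qwen3:0.6b",
        "temperature": 0.8,
        "max_tokens": 1024,
        "top_p": 0.9,
        "top_k": 50,
        "remove_think_prefix": True,
    },
}
```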
Factory Usage
```python
from memos.configs.llm import LLMConfigFactory
from memos.llms.factory import LLMFactory

cfg = LLMConfigFactory.model_validate({
    "backend": "ollama",
    "config": {"model_name_or_path": "qwen3:0.6b"},
})
llm = LLMFactory.from_config(cfg)
```
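Switching providers is then a matter of changing the `backend` key plus the backend-specific credentials; for example (a sketch: the `api_key` value is a placeholder):

```python
# Same factory, different backend: the surrounding code does not change.
cfg = LLMConfigFactory.model_validate({
    "backend": "openai",
    "config": {
        "model_name_or_path": "gpt-4o-mini",
        "api_key": "sk-...",  # placeholder credential
    },
})
llm = LLMFactory.from_config(cfg)
```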
LLM Core APIs
| Method | Purpose |
|---|---|
| `generate(messages: list)` | Return the full string response |
| `generate_stream(messages)` | Yield streaming chunks |
Streaming & CoT
```python
messages = [{"role": "user", "content": "Let's think step by step: …"}]
for chunk in llm.generate_stream(messages):
    print(chunk, end="")
```
Find all scenarios in `examples/basic_modules/llm.py`.
Performance Tips
- Use `qwen3:0.6b` for a <2 GB footprint when prototyping locally.
- Combine with KV Cache (see the KVCacheMemory doc) to cut TTFT (time to first token).
Embedding Module
Supported Embedder Backends
| Backend | Example Model | Vector Dim |
|---|---|---|
| `ollama` | `nomic-embed-text:latest` | 768 |
| `sentence_transformer` | `nomic-ai/nomic-embed-text-v1.5` | 768 |
| `universal_api` | `text-embedding-3-large` | 3072 |
Embedder Config Schema
Shared keys: `model_name_or_path`, plus optional API credentials (`api_key`, `base_url`), etc.
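For the API-backed embedder the blob looks much the same; a sketch (the key is a placeholder and the values are illustrative):

```python
# Illustrative embedder config for an OpenAI-compatible embedding endpoint.
embedder_config = {
    "backend": "universal_api",
    "config": {
        "model_name_or_path": "text-embedding-3-large",
        "api_key": "sk-...",  # placeholder credential
        "base_url": "https://api.openai.com/v1",
    },
}
```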
Factory Usage
```python
from memos.configs.embedder import EmbedderConfigFactory
from memos.embedders.factory import EmbedderFactory

cfg = EmbedderConfigFactory.model_validate({
    "backend": "ollama",
    "config": {"model_name_or_path": "nomic-embed-text:latest"},
})
embedder = EmbedderFactory.from_config(cfg)
```
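Assuming the resulting embedder exposes an `embed(...)` method that maps a list of texts to a list of vectors (check the embedder base class for the exact signature), usage looks like:

```python
# Hedged usage sketch: embed() and its return shape are assumptions based on the
# typical text-to-vector interface; verify against the embedder base class.
vectors = embedder.embed(["MemOS stores memories.", "Embeddings power retrieval."])
print(len(vectors), len(vectors[0]))  # e.g. 2 vectors of 768 dimensions for nomic-embed-text
```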
MemScheduler
MemScheduler is a concurrent memory-management system that runs in parallel with the MemOS system, coordinating memory operations between working memory, long-term memory, and activation memory. It handles memory retrieval, updates, and compaction through event-driven scheduling, which makes it particularly suited for conversational agents and reasoning systems that require dynamic memory management.
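To make the event-driven idea concrete, here is a purely illustrative sketch; the class and method names below are hypothetical and are not the MemScheduler API. Memory operations arrive as events on a queue, and a worker thread running alongside the main system dispatches them to handlers.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class MemoryEvent:
    kind: str      # e.g. "retrieve", "update", "compact"
    payload: dict


class ToyScheduler:
    """Hypothetical event-driven scheduler, for illustration only (not the MemOS API)."""

    def __init__(self) -> None:
        self.events = queue.Queue()   # pending memory operations
        self.handlers = {}            # event kind -> handler callable

    def on(self, kind, handler):
        self.handlers[kind] = handler

    def submit(self, event):
        self.events.put(event)

    def run(self):
        # Worker loop: drain the queue and dispatch each event to its handler.
        while True:
            event = self.events.get()
            handler = self.handlers.get(event.kind)
            if handler is not None:
                handler(event.payload)
            self.events.task_done()


scheduler = ToyScheduler()
scheduler.on("retrieve", lambda payload: print("retrieving:", payload))
threading.Thread(target=scheduler.run, daemon=True).start()
scheduler.submit(MemoryEvent("retrieve", {"query": "user preferences"}))
scheduler.events.join()  # block until the submitted event has been handled
```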
KV Cache Memory
KVCacheMemory is a specialized memory module in MemOS for storing and managing key-value (KV) caches, primarily used to accelerate large language model (LLM) inference and support efficient context reuse. It is especially useful for activation memory in conversational and generative AI systems.
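Conceptually, an activation-memory store of this kind maps a conversation or prefix id to a reusable KV cache (e.g. the `past_key_values` produced by an earlier forward pass), so a later turn can skip re-encoding the shared prefix. The sketch below is purely illustrative; the class and method names are hypothetical, not the KVCacheMemory API.

```python
class ToyKVCacheStore:
    """Hypothetical KV-cache store, for illustration only (not the MemOS API)."""

    def __init__(self) -> None:
        # prefix id -> KV cache object captured from an earlier forward pass
        self._caches = {}

    def put(self, prefix_id: str, past_key_values) -> None:
        self._caches[prefix_id] = past_key_values

    def get(self, prefix_id: str):
        # Reusing a stored cache lets generation start from the already-encoded
        # prefix, which is what cuts time-to-first-token (TTFT).
        return self._caches.get(prefix_id)
```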