Embedding & RAG¶
Lango supports vector embeddings for semantic search and Retrieval-Augmented Generation (RAG). When enabled, relevant knowledge entries are retrieved via vector similarity and injected into the agent's context.
Embedding Providers¶
| Provider | Config Type | Default Model | Default Dimensions | Notes |
|---|---|---|---|---|
| OpenAI | openai |
text-embedding-3-small |
1536 | Also supports text-embedding-3-large |
google |
text-embedding-004 |
768 | Via Google Generative AI API | |
| Local | local |
nomic-embed-text |
768 | Ollama-compatible, no API key required |
Setup¶
Interactive Setup¶
Configure embedding interactively through the full settings editor:
lango settings
The 5-step lango onboard wizard only covers initial provider, agent, channel, and basic security bootstrap.
Config File¶
Using a Cloud Provider¶
Reference an existing entry from the providers map via providerID:
Settings:
lango settings→ Embedding & RAG
{
"providers": {
"my-openai": {
"type": "openai",
"apiKey": "${OPENAI_API_KEY}"
}
},
"embedding": {
"providerID": "my-openai",
"model": "text-embedding-3-small",
"dimensions": 1536,
"rag": {
"enabled": true,
"maxResults": 5
}
}
}
Using Local (Ollama) Embeddings¶
For local embeddings, set provider to local instead of using providerID:
Settings:
lango settings→ Embedding & RAG
{
"embedding": {
"provider": "local",
"model": "nomic-embed-text",
"dimensions": 768,
"local": {
"baseUrl": "http://localhost:11434/v1"
},
"rag": {
"enabled": true,
"maxResults": 5
}
}
}
Local Setup
Make sure Ollama is running and the embedding model is pulled:
ollama serve
ollama pull nomic-embed-text
RAG (Retrieval-Augmented Generation)¶
When embedding.rag.enabled is true, Lango performs semantic retrieval on every agent turn:
- The user's message is embedded into a vector
- The vector is compared against stored embeddings across collections
- The most similar entries are retrieved and injected into the agent's context
Collections¶
RAG searches across these embedding collections:
| Collection | Source |
|---|---|
knowledge |
Knowledge store entries |
observation |
Observational memory observations |
reflection |
Observational memory reflections |
learning |
Learning engine entries |
You can restrict which collections are searched:
Settings:
lango settings→ Embedding & RAG
{
"embedding": {
"rag": {
"enabled": true,
"maxResults": 5,
"collections": ["knowledge", "learning"]
}
}
}
Leave collections empty to search all collections.
Distance Filtering¶
Set maxDistance to filter out low-relevance results:
Settings:
lango settings→ Embedding & RAG
{
"embedding": {
"rag": {
"enabled": true,
"maxResults": 5,
"maxDistance": 0.8
}
}
}
Set to 0.0 (default) to disable distance filtering.
Configuration Reference¶
Settings:
lango settings→ Embedding & RAG
{
"embedding": {
"providerID": "",
"provider": "",
"model": "",
"dimensions": 0,
"local": {
"baseUrl": "http://localhost:11434/v1",
"model": ""
},
"rag": {
"enabled": false,
"maxResults": 5,
"collections": [],
"maxDistance": 0.0
}
}
}
| Key | Type | Default | Description |
|---|---|---|---|
providerID |
string |
"" |
References a key in the providers map |
provider |
string |
"" |
Set to "local" for Ollama embeddings |
model |
string |
varies | Embedding model identifier |
dimensions |
int |
varies | Vector dimensionality |
local.baseUrl |
string |
http://localhost:11434/v1 |
Ollama API endpoint |
local.model |
string |
"" |
Override model for local provider |
rag.enabled |
bool |
false |
Enable RAG context injection |
rag.maxResults |
int |
5 |
Maximum results to inject per query |
rag.collections |
[]string |
[] |
Collections to search (empty = all) |
rag.maxDistance |
float32 |
0.0 |
Maximum cosine distance (0.0 = disabled) |
Embedding Cache¶
Lango includes an automatic in-memory embedding cache to reduce redundant API calls:
| Parameter | Value |
|---|---|
| TTL | 5 minutes |
| Max entries | 100 |
| Eviction | Expired entries first, then oldest |
The cache is transparent and requires no configuration. It applies to both query embeddings and content embeddings.
Verification¶
Use lango doctor to verify your embedding configuration:
lango doctor
The doctor checks:
- Provider configuration is valid
- API key is set (for cloud providers)
- Embedding model is accessible
- Vector store is operational
Related¶
- Knowledge System -- Knowledge entries are embedded for RAG retrieval
- Observational Memory -- Observations and reflections are embedded
- Knowledge Graph -- Graph RAG combines vector + graph retrieval