
How to Add Memory To Your Agent with Weaviate (Step by Step)

📖 9 min read · 1,734 words · Updated Mar 21, 2026

Adding Memory to Your Agent with Weaviate: A No-Nonsense Tutorial

If you want your intelligent agent to actually remember context between conversations, you need to add memory with Weaviate the right way, using vector search to store and recall previous interactions. We’re not just tossing snippets into some database; we’re building a semantic memory store that keeps your agent sharp.

What You’re Building and Why It Matters

We’re building an agent that uses Weaviate for memory storage so it can have meaningful back-and-forth conversations, not just stateless Q&A. Forget those shallow chatbot demos that reset every question — this is about contextual persistence your users will actually feel.

Prerequisites

  • Python 3.11+
  • Weaviate Community Edition (latest stable, I used 1.19.0)
  • Pip packages: weaviate-client (the code below uses the v3 API, so pin weaviate-client>=3.26,<4), requests, python-dotenv
  • OpenAI API key or another embedding provider’s API key (Cohere or Hugging Face, for example)
  • Docker installed (optional but recommended for running Weaviate locally)
  • Basic understanding of vector databases and embedding generation

Quick Stats on Weaviate

| Metric | Value |
| --- | --- |
| GitHub Stars | 15,839 |
| Forks | 1,227 |
| Open Issues | 582 (as of Mar 20, 2026) |
| License | BSD-3-Clause |
| Last Update | March 20, 2026 |

These numbers show Weaviate is mature but still active; you won’t be stuck waiting on some abandoned library.

Step-by-Step: Adding Memory to Your Agent with Weaviate

Step 1: Set Up Weaviate Server

# Run Weaviate with Docker for quick local testing
docker run -d \
 -p 8080:8080 \
 -e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true \
 -e QUERY_DEFAULTS_LIMIT=20 \
 -e PERSISTENCE_DATA_PATH=/var/lib/weaviate \
 -v $(pwd)/weaviate_data:/var/lib/weaviate \
 semitechnologies/weaviate:latest

Why this matters: this setup turns on persistence and anonymous access, which makes local dev easy. Setting QUERY_DEFAULTS_LIMIT=20 keeps you from hitting the default 10-result limit on queries, which bites even experienced devs. If you don’t want anonymous access, set up API keys; for local testing, this is fine.

Common errors: If the container crashes, check that port 8080 is free. Sometimes Docker Desktop or previous containers hog it. Also, if you pump embeddings faster than Weaviate’s persistence can keep up (rare locally), expect lag.
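Before creating any schema, it helps to confirm the container is actually up. Here is a minimal sketch that polls Weaviate’s `/v1/.well-known/ready` readiness endpoint; `weaviate_is_ready` and `wait_for_weaviate` are hypothetical helper names, not part of any library.

```python
import time
import requests

def weaviate_is_ready(base_url="http://localhost:8080", timeout=2.0):
    """Return True once Weaviate's readiness endpoint answers with HTTP 200."""
    try:
        resp = requests.get(f"{base_url}/v1/.well-known/ready", timeout=timeout)
        return resp.status_code == 200
    except requests.RequestException:
        # Connection refused / timed out: the server isn't up yet
        return False

def wait_for_weaviate(base_url="http://localhost:8080", attempts=10):
    """Poll roughly once per second for up to `attempts` tries before giving up."""
    for _ in range(attempts):
        if weaviate_is_ready(base_url):
            return True
        time.sleep(1)
    return False
```

Call `wait_for_weaviate()` right after `docker run` so your schema-creation code doesn’t race the container startup.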

Step 2: Define a Schema for Your Agent Memory

import weaviate

client = weaviate.Client("http://localhost:8080")

schema = {
    "classes": [
        {
            "class": "MemoryEntry",
            "description": "Stores a single memory segment for the agent",
            "properties": [
                {
                    "name": "text",
                    "dataType": ["text"],
                    "description": "The text content of the memory."
                },
                {
                    "name": "embedding",
                    "dataType": ["number[]"],
                    "description": "Vector embedding representation."
                },
                {
                    "name": "timestamp",
                    "dataType": ["date"],
                    "description": "When memory was stored."
                }
            ],
            "vectorizer": "none"  # We'll inject embeddings ourselves
        }
    ]
}

client.schema.delete_all()  # Clean slate
client.schema.create(schema)

Why no vectorizer? The main mistake newbies make is using Weaviate’s default text vectorizers blindly. They’re fine for simple text search, but when you want tight control over your embedding model (e.g., OpenAI, Cohere), you have to upload vectors yourself. This clears up the confusion from competing tutorials that overuse Weaviate’s built-in vectorizers for agents.

Deleting the schema before creating it forces a fresh start, which saves you from weird “schema already exists” errors. Sure, it wipes data, but this is local dev.
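If wiping everything on each run gets old, a guarded create is a small tweak. This is a sketch against the v3 client’s `client.schema.get()` / `client.schema.create()` calls; `ensure_memory_class` is a hypothetical helper name.

```python
def ensure_memory_class(client, schema_dict, class_name="MemoryEntry"):
    """Create the class only if it's missing, so existing memories survive restarts.

    Returns True when the class was created, False when it already existed.
    """
    existing = client.schema.get().get("classes") or []
    if not any(c.get("class") == class_name for c in existing):
        client.schema.create(schema_dict)
        return True
    return False
```

Swap this in for the `delete_all()` + `create()` pair once you care about keeping data between runs.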

Step 3: Generate and Upload Memory Embeddings

import os
import requests
from datetime import datetime, timezone

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

def get_openai_embedding(text):
    resp = requests.post(
        "https://api.openai.com/v1/embeddings",
        headers={"Authorization": f"Bearer {OPENAI_API_KEY}"},
        json={"input": text, "model": "text-embedding-ada-002"},
    )
    resp.raise_for_status()
    return resp.json()["data"][0]["embedding"]

def add_memory_entry(client, text):
    embedding = get_openai_embedding(text)
    memory_obj = {
        "text": text,
        "embedding": embedding,
        # Weaviate's date type expects RFC 3339 with a timezone offset,
        # so use an aware datetime rather than datetime.utcnow()
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    client.data_object.create(memory_obj, "MemoryEntry", vector=embedding)

# Example usage
add_memory_entry(client, "I remember that the sky is blue.")
add_memory_entry(client, "The meaning of life is 42.")

Why generate embeddings yourself? Because it’s insanely valuable to decouple embedding generation and vector storage. You might need to swap your embedding model later or aggregate vectors from different sources. Those other tutorials mix it up and you get locked in.

Common mistake here: forgetting to pass the embedding via the vector=embedding parameter in the data_object.create call. If you skip it, the object is stored without any vector at all (we set vectorizer: none, so Weaviate won’t generate one), and semantic search will silently never return it.
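Once you decouple embedding generation like this, bulk inserts become easy too. Here is a sketch using the v3 client’s batch API (`client.batch.configure` and `add_data_object`); `add_memories_batch` is a hypothetical helper, and `embed_fn` can be any callable mapping text to a vector, such as `get_openai_embedding`.

```python
from datetime import datetime, timezone

def add_memories_batch(client, texts, embed_fn):
    """Insert many memories in one batch (one flush instead of one
    HTTP round trip per object). embed_fn maps text -> vector."""
    client.batch.configure(batch_size=50)
    with client.batch as batch:
        for text in texts:
            obj = {
                "text": text,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }
            batch.add_data_object(
                data_object=obj,
                class_name="MemoryEntry",
                vector=embed_fn(text),
            )
```

Batching matters once you’re loading hundreds of memories at startup; per-object inserts get slow fast.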

Step 4: Query Memory with Semantic Search

def query_memory(client, question, top_k=3):
    question_embedding = get_openai_embedding(question)
    near_vector = {"vector": question_embedding}
    response = (
        client.query.get("MemoryEntry", ["text", "timestamp"])
        .with_near_vector(near_vector)
        .with_limit(top_k)
        .do()
    )
    return response.get("data", {}).get("Get", {}).get("MemoryEntry", [])

# Test it
results = query_memory(client, "What color is the sky?")
for res in results:
    print(f"Memory: {res['text']} (stored at {res['timestamp']})")

Why semantic search? Keyword search in memory is useless for agents that have to handle multi-turn conversations with nuanced queries. The real magic is embedding search for “similar thoughts” without exact keyword overlap.

The killer feature is how you mix & match these queries dynamically at runtime to keep your agent context-aware without choking on irrelevant memories.
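One way to avoid choking on irrelevant memories is to cut matches by distance rather than taking a fixed top-k blindly. The filtering function below is pure Python; the commented query shows where `.with_near_vector` plus `.with_additional(["distance"])` (a v3 client call) would supply the distances. The 0.25 threshold is an assumption to tune against your own data.

```python
def filter_by_distance(hits, max_distance=0.25):
    """Drop hits whose reported cosine distance exceeds the threshold.
    Hits missing a distance are treated as too far away."""
    return [
        h for h in hits
        if h.get("_additional", {}).get("distance", 1.0) <= max_distance
    ]

# To get distances back from Weaviate, request them as _additional fields:
# response = (
#     client.query.get("MemoryEntry", ["text", "timestamp"])
#     .with_near_vector({"vector": question_embedding})
#     .with_additional(["distance"])
#     .with_limit(top_k)
#     .do()
# )
# hits = response["data"]["Get"]["MemoryEntry"]
# relevant = filter_by_distance(hits, max_distance=0.25)
```

This way an off-topic question returns an empty context instead of the three least-bad memories.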

Step 5: Wire Your Agent to Use Weaviate Memory

class MemoryAgent:
    def __init__(self, weaviate_client):
        self.client = weaviate_client

    def remember(self, text):
        add_memory_entry(self.client, text)

    def recall(self, question):
        return query_memory(self.client, question, top_k=5)

    def chat(self, question):
        memories = self.recall(question)
        # Simple concatenation for the prompt; replace with your own template
        prompt_context = "\n".join([m["text"] for m in memories])
        prompt = f"Context:\n{prompt_context}\n\nQuestion: {question}\nAnswer:"

        # Send prompt to LLM (not covered here)
        # Simulate:
        return f"Simulated answer based on memories:\n{prompt}"

agent = MemoryAgent(client)
agent.remember("The agent was created on March 20, 2026.")
agent.remember("Python is the recognized programming language for AI.")

print(agent.chat("When was the agent created?"))

Why this design? Separating memory management from the agent’s “chat” logic is crucial. I hate tightly coupled code where managing context makes your generation call a nested mess. With this pattern, you track, store, and recall independently. If you want to swap in a better LLM, or add caching, it’s trivial.

The Gotchas You Won’t Hear from Other Tutorials

  1. Schema confusion. Many tutorials gloss over schema design but if your schema is off, your vectors and metadata are garbage. You’ll regret not planning for timestamps or metadata like user IDs to filter your memory.
  2. Embedding costs matter. OpenAI embeddings aren’t free. If you add thousands of memories, expect a bill. Batch your generation to cut costs and cache embeddings aggressively.
  3. Memory bloat kills performance. Querying a vector DB with tens of thousands of entries? You’ll see latency spikes and noisy recall results. Regular pruning is needed—a hard but unavoidable reality.
  4. Vector dimension mismatches. If you accidentally swap embedding models or versions, your stored vectors won’t match query vectors. You get zero results or misleading matches. Always fix your embedding model version in your pipeline.
  5. Data consistency is a pain. Without transaction support, partial writes or updates can leave dangling memory without context. This subtle bug can haunt you in production, especially if you’re updating memories.

Full Working Example

import os
import requests
from datetime import datetime, timezone
import weaviate

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
WEAVIATE_URL = "http://localhost:8080"

def get_openai_embedding(text):
    r = requests.post(
        "https://api.openai.com/v1/embeddings",
        headers={"Authorization": f"Bearer {OPENAI_API_KEY}"},
        json={"input": text, "model": "text-embedding-ada-002"},
    )
    r.raise_for_status()
    return r.json()["data"][0]["embedding"]

client = weaviate.Client(WEAVIATE_URL)
client.schema.delete_all()

schema = {
    "classes": [
        {
            "class": "MemoryEntry",
            "description": "Stores agent memory",
            "properties": [
                {"name": "text", "dataType": ["text"]},
                {"name": "embedding", "dataType": ["number[]"]},
                {"name": "timestamp", "dataType": ["date"]}
            ],
            "vectorizer": "none"
        }
    ]
}
client.schema.create(schema)

def add_memory(text):
    embedding = get_openai_embedding(text)
    data = {
        "text": text,
        "embedding": embedding,
        # Timezone-aware timestamp: Weaviate's date type needs an RFC 3339 offset
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    client.data_object.create(data, "MemoryEntry", vector=embedding)

def query_memories(question, top_k=5):
    q_emb = get_openai_embedding(question)
    res = (
        client.query.get("MemoryEntry", ["text", "timestamp"])
        .with_near_vector({"vector": q_emb})
        .with_limit(top_k)
        .do()
    )
    return res.get("data", {}).get("Get", {}).get("MemoryEntry", [])

class Agent:
    def __init__(self, client):
        self.client = client

    def remember(self, text):
        add_memory(text)

    def chat(self, question):
        memories = query_memories(question)
        context = "\n".join([m["text"] for m in memories])
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        # Dummy answer
        return prompt

agent = Agent(client)
agent.remember("The sky is blue.")
agent.remember("Water boils at 100 degrees Celsius.")
print(agent.chat("What color is the sky?"))

What’s Next?

After mastering Weaviate memory integration, your next concrete step is to build a memory pruning and refreshing pipeline. This means periodically evaluating which memories are stale or irrelevant and deleting or updating them. If you let memory grow unchecked, your agent’s responses will slow down and become inconsistent.

Consider implementing time-based decay or memory importance ranking to keep your database lean. This will hone the agent’s focus on what actually matters.
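The time-based decay idea can be sketched with the v3 client’s `client.batch.delete_objects` and a where filter on the timestamp property. `cutoff_iso` and `prune_old_memories` are hypothetical helper names, and the 30-day window is an assumption to tune.

```python
from datetime import datetime, timedelta, timezone

def cutoff_iso(max_age_days):
    """RFC 3339 timestamp marking the oldest memory we want to keep."""
    return (datetime.now(timezone.utc) - timedelta(days=max_age_days)).isoformat()

def prune_old_memories(client, max_age_days=30):
    """Batch-delete MemoryEntry objects stored before the cutoff
    (sketch against the v3 weaviate-client batch-delete API)."""
    return client.batch.delete_objects(
        class_name="MemoryEntry",
        where={
            "path": ["timestamp"],
            "operator": "LessThan",
            "valueDate": cutoff_iso(max_age_days),
        },
    )
```

Run this on a schedule (a cron job or a startup hook) rather than on every chat turn, since deletes trigger index maintenance.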

FAQ

Why should I disable Weaviate’s built-in vectorizer?

Built-in vectorizers are fine for simple demos, but they tie you to one specific embedding model and make swapping models later painful. You want to inject your own embeddings generated from models like OpenAI’s text-embedding-ada-002, so you control vector quality and API costs.

What do I do if my queries are returning zero results?

First, confirm your embedding dimensions match. OpenAI’s ada-002 embeddings are 1536-dimensional vectors. If your stored vectors and query vectors differ in dimension, similarity search returns nothing. Also check that your schema’s vectorizer is ‘none’ and that you explicitly send your own vectors when creating objects.
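A one-line guard in your ingestion path catches this early. The sketch below hard-codes ada-002’s 1536 dimensions; `assert_dims` is a hypothetical helper, and the expected size changes if you switch embedding models.

```python
EXPECTED_DIMS = 1536  # text-embedding-ada-002 output size

def assert_dims(vec, expected=EXPECTED_DIMS):
    """Fail fast on a dimension mismatch instead of silently getting zero hits."""
    if len(vec) != expected:
        raise ValueError(f"Embedding has {len(vec)} dims, expected {expected}")
    return vec
```

Wrap every embedding on both the write path and the query path: `client.data_object.create(obj, "MemoryEntry", vector=assert_dims(embedding))`.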

How do I manage embedding costs with thousands of memory entries?

Batch embed your data instead of one-off calls, and cache embeddings on disk or Redis. Preprocessing new info offline can also help. Also, be strategic in what you store — store summaries, not full conversations. Lastly, explore cheaper open-source models, but beware of accuracy.
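A minimal disk cache looks like this; `cache_key` and `get_embedding_cached` are hypothetical helpers, and `embed_fn` is whatever embedding call you already have (e.g. `get_openai_embedding`).

```python
import hashlib
import json
import os

def cache_key(text, model="text-embedding-ada-002"):
    """Stable filename: the same text + model pair always maps to the same key."""
    return hashlib.sha256(f"{model}:{text}".encode("utf-8")).hexdigest()

def get_embedding_cached(text, embed_fn, cache_dir=".emb_cache"):
    """Return a cached embedding if this exact text was embedded before;
    otherwise call embed_fn once and persist the result to disk."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, cache_key(text) + ".json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    vec = embed_fn(text)
    with open(path, "w") as f:
        json.dump(vec, f)
    return vec
```

Hashing text plus model name means a model swap naturally invalidates the cache instead of serving stale vectors, which also guards against the dimension-mismatch gotcha above.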

Recommendations For Developer Personas

| Developer Type | Recommended Next Steps |
| --- | --- |
| AI Researcher | Integrate Weaviate with custom transformer embeddings and explore hybrid search combining text and vector similarity. |
| Backend Engineer | Implement memory lifecycle management with automated pruning, index rebuilding, and durability monitoring. |
| Full-Stack Developer | Build a UI dashboard to visualize and manage memory entries, and connect Weaviate memory with your frontend chat interface. |

Data as of March 21, 2026. Sources: https://github.com/weaviate/weaviate, https://weaviate.io/product/integrations/mem0

Written by Jake Chen

Deep tech researcher specializing in LLM architectures, agent reasoning, and autonomous systems. MS in Computer Science.
