Weaviate is an open-source, cloud-native vector database that stores both objects and their vectors, which allows vector search to be combined with traditional structured filtering. Weaviate can run in the following ways:
- Cloud: Weaviate Cloud is a managed service that abstracts all hardware and deployment away from the user.
- Local (Docker): Weaviate can also run locally as a Docker container.

Core Architecture
Weaviate is organized into a 3-layer architecture that separates request handling, query execution and persistence, aiming for high performance, efficient vector search and durable storage.

1. API Layer (Top)
The client-facing layer, closest to the user, which handles all external communication.
- REST API: Standard HTTP endpoints that handle all CRUD operations.
- GraphQL: A query language in which the client specifies exactly which fields to return. It reduces payload size at the cost of more complex queries.
- gRPC: High-performance binary protocol developed by Google, suited to low-latency applications.
2. Search Layer (Middle)
The primary engine of Weaviate, responsible for executing queries.
- Vector search (HNSW): Hierarchical Navigable Small World is an approximate nearest-neighbor algorithm. It finds data points that are very close to the given query point, but not necessarily the absolute closest one, in exchange for much faster search.
- BM25: The Best Matching 25 algorithm is used for keyword search. It refines the traditional TF-IDF method with term-frequency saturation and document-length normalization.
- Hybrid Fusion: HNSW and BM25 each assign their own rank to every document. The Reciprocal Rank Fusion (RRF) algorithm aggregates these rankings into one unified ranking:
$$\mathrm{RRF}(d)=\sum_{i=1}^{N}\frac{1}{k+\mathrm{rank}_i(d)}$$
d = a document or item.
N = the number of rankings, usually equal to the number of independent retrievers.
k = smoothing parameter (usually 60) that prevents a single rank from dominating.
rank_i(d) = the position of document d in ranking i, where position 1 is the most relevant.
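The formula above can be sketched in a few lines of Python. This is an illustration of the RRF formula only, not Weaviate's internal code; the function name `reciprocal_rank_fusion` and the two sample rankings are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    rankings: list of lists, each ordered from most to least relevant.
    k: the smoothing parameter from the formula.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # rank starts at 1 for the most relevant document
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id to stay deterministic
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings from two retrievers (N = 2)
vector_ranking = ["matrix", "spirited_away", "lotr"]  # e.g. from HNSW
keyword_ranking = ["matrix", "lotr"]                  # e.g. from BM25

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

In this example "lotr" ends up ahead of "spirited_away" even though it ranks lower in the vector list, because it appears in both rankings and therefore collects two terms of the sum.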
3. Storage Layer (Bottom)
The persistence layer, made up of three specialized storage systems.
- Object Store: Stores the actual documents and their metadata. It is the main persistence layer.
- Inverted Index: Maps property values and terms to the objects that contain them, which enables property filtering and keyword search.
- Vector Index: Organizes vectors in the embedding space and is used by HNSW for vector similarity search.
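To make the role of the three stores concrete, here is a toy in-memory sketch. It is not how Weaviate stores data on disk: the dictionaries, the 2-dimensional vectors and the brute-force cosine scan (standing in for HNSW) are all assumptions chosen to keep the example small.

```python
import math
from collections import defaultdict

# Object store: id -> properties (the documents themselves)
objects = {
    1: {"title": "The Matrix", "genre": "Science Fiction"},
    2: {"title": "Spirited Away", "genre": "Animation"},
    3: {"title": "The Fellowship of the Ring", "genre": "Fantasy"},
}

# Vector index: id -> embedding (hypothetical 2-d vectors)
vectors = {1: [0.9, 0.1], 2: [0.1, 0.9], 3: [0.3, 0.7]}


def build_inverted_index(store, prop):
    """Map each lowercase token of a property to the ids that contain it."""
    index = defaultdict(set)
    for obj_id, props in store.items():
        for token in props[prop].lower().split():
            index[token].add(obj_id)
    return index


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest(query_vector, candidate_ids, top_k=1):
    """Exact scan over the candidates; HNSW would approximate this step."""
    ranked = sorted(
        candidate_ids,
        key=lambda obj_id: cosine_similarity(query_vector, vectors[obj_id]),
        reverse=True,
    )
    return ranked[:top_k]


# Filter with the inverted index first, then rank the survivors by similarity
genre_index = build_inverted_index(objects, "genre")
allowed = genre_index["fantasy"] | genre_index["animation"]
best = nearest([0.0, 1.0], allowed, top_k=1)
print([objects[obj_id]["title"] for obj_id in best])
```

The inverted index narrows the candidate set cheaply, and the vector index only has to rank what is left, which is the idea behind combining vector search with structured filtering.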
Implementation
Let's run Weaviate in Docker and perform a similarity search on a few sample objects.
Step 1: Install required libraries & check if all dependencies are installed:
- docker --version : Shows the installed Docker version
- docker info : Displays Docker system and runtime details
- pip install -U weaviate-client[agents] : Installs or updates the Weaviate Python client with agent support
- touch docker-compose.yaml : Creates an empty docker-compose configuration file
%%shell
docker --version
docker info
pip install -U weaviate-client[agents]
touch docker-compose.yaml # Create a .yaml file for config
Step 2: Start Weaviate and Ollama with Docker Compose. Paste the config below into the docker-compose.yaml file.
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.35.1
    ports:
    - 8080:8080
    - 50051:50051
    volumes:
    - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_MODULES: 'text2vec-ollama,generative-ollama'
      CLUSTER_HOSTNAME: 'node1'
      OLLAMA_API_ENDPOINT: 'http://ollama:11434'
    depends_on:
    - ollama
  ollama:
    image: ollama/ollama:0.12.9
    ports:
    - "11434:11434"
    volumes:
    - ollama_data:/root/.ollama
volumes:
  weaviate_data:
  ollama_data:
Step 3: Start the Weaviate and Ollama containers, then pull the embedding model:
!docker compose up -d
!docker compose exec ollama ollama pull nomic-embed-text # Embedding model
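The containers can take a few seconds to come up, so it may help to wait before creating a collection. The sketch below polls Weaviate's readiness endpoint (`/v1/.well-known/ready`) using only the standard library. The helper names `is_weaviate_ready` and `wait_until_ready`, the default URL and the retry settings are assumptions for this example.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_weaviate_ready(base_url="http://localhost:8080", timeout=2.0):
    """Return True if the readiness endpoint answers with HTTP 200."""
    url = f"{base_url}/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, timeout or a non-2xx answer: not ready yet
        return False


def wait_until_ready(base_url="http://localhost:8080", attempts=30, delay=1.0):
    """Poll the readiness endpoint until it succeeds or attempts run out."""
    for _ in range(attempts):
        if is_weaviate_ready(base_url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_until_ready()` before Step 4 returns `True` once the instance accepts requests, and `False` if it never becomes ready within the given number of attempts.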
Step 4: Create a Weaviate collection and generate embeddings using the Ollama nomic-embed-text model:
import weaviate
from weaviate.classes.config import Configure


def seed_movies():
    """Return the sample movie records to import."""
    movie_records = [
        {
            "title": "The Matrix",
            "description": (
                "A computer hacker discovers the true nature of reality and "
                "his part in the conflict against its hidden rulers."
            ),
            "genre": "Science Fiction",
        },
        {
            "title": "Spirited Away",
            "description": (
                "A young girl is trapped in a strange spirit realm and must "
                "rescue her parents to return home."
            ),
            "genre": "Animation",
        },
        {
            "title": "The Lord of the Rings: The Fellowship of the Ring",
            "description": (
                "A humble Hobbit and his allies embark on a dangerous quest "
                "to destroy a powerful ring and save Middle-earth."
            ),
            "genre": "Fantasy",
        },
    ]
    return movie_records


# Establish connection to the local Weaviate instance
with weaviate.connect_to_local() as client:
    # Create the Movie2 collection with Ollama-based vectorization
    client.collections.create(
        name="Movie2",
        vector_config=Configure.Vectors.text2vec_ollama(
            model="nomic-embed-text",
            # URL Weaviate uses to reach Ollama; adjust if Ollama runs
            # outside this Compose network
            api_endpoint="http://ollama:11434",
        ),
    )
    movie_collection = client.collections.use("Movie2")
    movies_to_insert = seed_movies()
    # Batch insert objects; each object is vectorized on import
    with movie_collection.batch.fixed_size(batch_size=200) as batch:
        for movie in movies_to_insert:
            batch.add_object(properties=movie)
    print(
        f"Imported and vectorized {len(movie_collection)} "
        f"records into the Movie2 collection"
    )

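Step 5: Perform a vector search. The near_text query embeds the query text with the collection's vectorizer and runs an HNSW search over the vector index, returning the closest 'k' results. This is a pure vector search; BM25 is only involved when a hybrid query is used instead.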
import json
import weaviate


def run_semantic_search(collection, text_query, top_k=2):
    """Return the top_k objects closest in meaning to text_query."""
    result = collection.query.near_text(
        query=text_query,
        limit=top_k,
    )
    return result.objects


# Open a connection to the local Weaviate instance
with weaviate.connect_to_local() as client:
    movie_collection = client.collections.use("Movie2")
    hits = run_semantic_search(
        collection=movie_collection,
        text_query="sci-fi",
        top_k=2,
    )
    for item in hits:
        print(json.dumps(item.properties, indent=2))

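For comparison with the vector search in Step 5, a hybrid query that fuses BM25 and vector results might look like the sketch below. It relies on the Python client's `collection.query.hybrid` call; the helper name `run_hybrid_search`, the example query text and the `alpha` value are assumptions, and the commented usage needs the containers from Step 3 to be running.

```python
def run_hybrid_search(collection, text_query, top_k=2, alpha=0.5):
    """Run one query that combines keyword (BM25) and vector scores.

    alpha=0 weights the result fully toward keyword search,
    alpha=1 fully toward vector search.
    """
    result = collection.query.hybrid(
        query=text_query,
        alpha=alpha,
        limit=top_k,
    )
    return result.objects


# Usage against the collection created in Step 4:
#
# import weaviate
# with weaviate.connect_to_local() as client:
#     movies = client.collections.use("Movie2")
#     for item in run_hybrid_search(movies, "hacker reality", top_k=2):
#         print(item.properties["title"])
```

Keeping the collection as a parameter means the same helper works for any collection, and lowering `alpha` is a simple way to favor exact keyword matches such as titles or names.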
Using Weaviate for RAG
Weaviate for RAG (Retrieval-Augmented Generation) means using Weaviate as the vector database / retriever in a RAG pipeline so an LLM can ground its answers in your data.
In a standard RAG setup:
1. Ingest data: chunk documents -> create embeddings.
2. Store embeddings: this is where Weaviate is used.
3. Retrieve relevant chunks for a user query.
4. Generate an answer using an LLM + retrieved context.
Weaviate handles steps 2 and 3 with its built-in embedding storage and native vector search. It also integrates well with frameworks such as LangChain and LlamaIndex.
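The retrieve-then-generate part of this pipeline can be sketched without committing to a particular LLM. In the example below, `retrieve` and `generate` are plain callables supplied by the caller; the function names, the prompt wording and the stub implementations are all assumptions. In a real setup `retrieve` would wrap a Weaviate query such as the near_text search from Step 5, and `generate` would call an LLM.

```python
def build_rag_prompt(question, chunks):
    """Assemble a prompt that grounds the LLM in the retrieved chunks."""
    context = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer_with_rag(question, retrieve, generate, top_k=3):
    """Retrieve relevant chunks, then generate an answer from them."""
    chunks = retrieve(question, top_k)      # retrieval step (vector database)
    prompt = build_rag_prompt(question, chunks)
    return generate(prompt)                 # generation step (any LLM)


# Stubs so the sketch runs without Weaviate or an LLM
def stub_retrieve(question, top_k):
    corpus = [
        "A computer hacker discovers the true nature of reality.",
        "A young girl is trapped in a strange spirit realm.",
    ]
    return corpus[:top_k]


def stub_generate(prompt):
    return prompt  # echo the prompt instead of calling a model


print(answer_with_rag("Which movie is about a hacker?", stub_retrieve, stub_generate))
```

Passing the retriever and generator in as arguments keeps the pipeline logic independent of the vector database and the model, so either can be swapped without touching the prompt assembly.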
Use Cases
- Semantic Search: Enables similarity search over text embeddings from models like OpenAI, Cohere or SentenceTransformers.
- Recommendation Systems: Matches user embeddings to item embeddings for personalized recommendations.
- Image & Video Retrieval: Finds visually similar content using feature embeddings from CNNs or CLIP models.
- Anomaly Detection: Identifies unusual data points in high-dimensional feature space.
- Generative AI & RAG Systems: Integrates with LangChain, LlamaIndex and LLMs for embedding-based retrieval augmentation.
Weaviate vs. Other Vector Databases
| Database | Best For | Key Strength |
|---|---|---|
| Weaviate | Hybrid search & RAG | Native hybrid search |
| Pinecone | Production apps | Ease of use & reliability |
| Milvus | Large-scale performance | Horizontal scaling |
| Qdrant | Advanced filtering | Rust performance |
| Chroma | Prototyping & LLMs | Developer experience |
Advantages
- Native Hybrid Search: Combines semantic and keyword-based search in a single query, without requiring a separate search engine.
- Open-source flexibility: Because it is open source, it can be self-hosted, which helps avoid vendor lock-in.
- Multi-modal support: Supports text, images and video through its multi-modal vectorizer modules.
Limitations
- Operational complexity (self-hosted): Running Weaviate yourself means handling deployment, scaling, backups and upgrades, which usually requires dedicated people or a team.
- Smaller ecosystem: Weaviate is relatively new, so it has a smaller community and less production track record than long-established databases.
- Limited free trial (cloud): The free tier on Weaviate Cloud lasts only 14 days, which may not be enough for most use cases.