
Milvus Vector Database: Modern Similarity Search for AI Applications

Every recommendation you receive, every semantic search query that “just gets it,” every AI chatbot that remembers context — they’re all quietly backed by a technology most developers have only recently begun to understand: the vector database.
At the center of this shift is Milvus, an open-source, cloud-native vector database that has become the go-to infrastructure for storing, indexing, and searching high-dimensional vector embeddings at scale. Whether you’re building a RAG pipeline, a recommendation engine, or a multimodal search system, Milvus is likely in the conversation.
This post digs into what Milvus is, why it exists, how it works under the hood, and when you should (and shouldn’t) use it.
Why Vectors? Why Now?
Modern AI models, whether language models, image encoders, or recommendation systems, represent data not as raw text, pixels, or IDs, but as vectors: dense arrays of floating-point numbers in high-dimensional space. A word, sentence, image, or user behavior profile becomes a point in 768-, 1536-, or even 4096-dimensional space.
The power of this representation is profound: things that are semantically or conceptually similar end up geometrically close to each other. “King minus Man plus Woman” famously gives you something close to “Queen” in embedding space. A photo of a sunset and the sentence “warm evening sky” land near each other when encoded by a multimodal model.
Traditional databases are built for exact matches and range queries. But with vectors, you almost never want an exact match — you want the k nearest neighbors. Finding those efficiently across billions of vectors is a fundamentally different computational problem. This is where vector databases come in — and where Milvus was purpose-built to excel.
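To make “k nearest neighbors” concrete, here is a minimal exact (brute-force) search in plain Python. The `knn` helper and the 3-dimensional toy vectors are illustrative assumptions, not Milvus code or real embeddings. Scoring every stored vector like this is essentially what a FLAT index does; its cost grows linearly with the number of stored vectors, which is why it stops being practical at billions of them.

```python
import heapq
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def knn(query: list[float], vectors: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Exact k-nearest-neighbor search: score every stored vector, keep the best k."""
    scored = ((cosine_similarity(query, vec), key) for key, vec in vectors.items())
    return [(key, score) for score, key in heapq.nlargest(k, scored)]

# Toy 3-dimensional "embeddings"; real models produce hundreds of dimensions.
stored = {
    "reset password": [0.9, 0.1, 0.0],
    "forgot login": [0.8, 0.2, 0.1],
    "python tutorial": [0.0, 0.9, 0.4],
    "sunset photo": [0.1, 0.0, 0.95],
}

query = [0.85, 0.15, 0.05]
for key, score in knn(query, stored, k=2):
    print(f"{key}: {score:.3f}")
```

The expensive part is the loop over every stored vector; the indexes described later exist to avoid exactly that loop.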
What Is a Vector Database?
A vector database stores and indexes data as vectors, usually alongside the original content and its metadata, instead of relying only on plain text, numbers, or rows. A vector is a list of numbers that represents the meaning of some data, generated by machine learning models such as Sentence Transformers models, OpenAI’s embedding models, or multimodal image encoders.
Take two sentences:
- “How to reset my password?”
- “I forgot my login password”
The words are different, but their vector representations land close to each other in embedding space. A vector database exploits this to return semantically relevant results, not just lexically matching ones. Unlike a traditional database, it’s designed to answer questions like: Which item is most similar? Which document shares the same meaning? Which image looks like this one?
Why Traditional Databases Fall Short
Traditional databases like MySQL or PostgreSQL are optimized for exact matching:
```sql
SELECT * FROM users WHERE name = 'John';
```
This works perfectly when you know the exact value. But it falls apart when you want to search by meaning. If your database contains “How to reset my password” and a user searches for “I forgot my login password,” a traditional database may return nothing — the text simply doesn’t match.
Vector search solves this by converting both texts into embeddings with an embedding model and comparing their similarity rather than their characters.
That said, traditional databases aren’t going anywhere. They remain the right tool for filtering, sorting, joins, and transactions. For semantic search, recommendation systems, and AI-powered retrieval, though, they’re not enough on their own.
What Is Milvus?
Milvus is a high-performance, open-source vector database designed for large-scale similarity search. It supports millions — and in distributed mode, billions — of vectors, with fast search, metadata filtering, and a choice of indexing algorithms to match your performance and accuracy requirements.
Its headline capabilities:
- Sub-100ms similarity search at high query throughput
- Multiple index types tuned for different scale and accuracy tradeoffs
- Metadata filtering alongside vector search
- A cloud-native distributed architecture that scales compute and storage independently
- Native support for both dense and sparse vector search
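Metadata filtering means restricting which entities are eligible before ranking them by vector similarity. In pymilvus this is the `filter` expression passed to `MilvusClient.search` (for example `filter='category == "help"'`). The plain-Python sketch below only mimics the idea with hypothetical rows and a hypothetical `filtered_search` helper; Milvus itself evaluates the filter inside the index search rather than in application code.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical entities: a scalar "category" field next to a 2-D toy vector.
ROWS = [
    {"id": 1, "category": "help", "vector": [0.9, 0.1], "text": "Reset your password"},
    {"id": 2, "category": "blog", "vector": [0.8, 0.3], "text": "Why passwords get forgotten"},
    {"id": 3, "category": "help", "vector": [0.2, 0.9], "text": "Install the desktop app"},
]

def filtered_search(query, rows, predicate, limit):
    """Keep rows that pass the scalar predicate, then rank them by similarity."""
    candidates = [row for row in rows if predicate(row)]
    candidates.sort(key=lambda row: cosine(query, row["vector"]), reverse=True)
    return [row["id"] for row in candidates[:limit]]

query = [1.0, 0.2]
print(filtered_search(query, ROWS, lambda row: True, limit=2))                       # [1, 2]
print(filtered_search(query, ROWS, lambda row: row["category"] == "help", limit=2))  # [1, 3]
```

The filter changes the answer: the blog post is the second-closest vector overall, but it is no longer eligible once only `help` entities are allowed.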
How Milvus Is Built
One of Milvus’s most distinctive architectural choices is a clean separation between storage and compute — the same philosophy behind modern cloud data warehouses like Snowflake, but applied to vector search.
```
Client
  ↓
Proxy (stateless, load-balanced entry points)
  ↓
Coordinator Layer (Root, Query, Data, Index coordinators)
  ↓
Worker Nodes (Query / Data / Index)
  ↓
Storage (Object storage + etcd + message stream)
```
Access Layer — Proxy nodes handle request routing, schema validation, and result aggregation. They’re stateless, so you can scale them horizontally without coordination overhead.
Coordinator Layer — Root, Query, Data, and Index coordinators manage cluster topology, segment assignment, and index scheduling. They’re the brain of the operation.
Worker Layer — Query Nodes run vector search. Data Nodes handle the write path. Index Nodes build indexes asynchronously in the background.
Storage Layer — Segments live in object storage (S3 or MinIO). Metadata lives in etcd. The write-ahead stream runs on Pulsar or Kafka.
The key insight: query nodes load the segments they serve from object storage and hold them in memory, while object storage remains the source of truth. This means you can scale query throughput independently of storage costs, and a query node failure doesn’t lose data, because its segments can be reloaded onto another node. It’s a fundamentally more elastic model than traditional shared-nothing architectures.
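The storage/compute split can be illustrated with a deliberately tiny model. Everything here is hypothetical (`OBJECT_STORE`, `QueryNode`, `assign`, round-robin placement) and ignores real Milvus details such as growing segments, replicas, and the message stream; it only shows why losing a query node loses cached memory but not data.

```python
from dataclasses import dataclass, field

# Stand-in for S3/MinIO: the durable copy of every sealed segment.
OBJECT_STORE: dict[str, list[str]] = {
    "seg-1": ["doc-a", "doc-b"],
    "seg-2": ["doc-c"],
    "seg-3": ["doc-d", "doc-e"],
}

@dataclass
class QueryNode:
    """Compute only: holds in-memory copies of the segments assigned to it."""
    name: str
    cache: dict[str, list[str]] = field(default_factory=dict)

    def load(self, segment_id: str) -> None:
        # Copy the segment from durable storage into this node's memory.
        self.cache[segment_id] = list(OBJECT_STORE[segment_id])

    def search(self) -> list[str]:
        # Placeholder for a vector search over the segments this node holds.
        return [row for rows in self.cache.values() for row in rows]

def assign(nodes: list[QueryNode]) -> None:
    """Coordinator role: spread all segments round-robin across live nodes."""
    for node in nodes:
        node.cache.clear()
    for i, segment_id in enumerate(sorted(OBJECT_STORE)):
        nodes[i % len(nodes)].load(segment_id)

def cluster_search(nodes: list[QueryNode]) -> list[str]:
    """Proxy role: fan out to every node and merge the partial results."""
    return sorted(row for node in nodes for row in node.search())

nodes = [QueryNode("qn-1"), QueryNode("qn-2")]
assign(nodes)
print(cluster_search(nodes))

# qn-2 crashes and its memory is gone, but object storage still has every
# segment, so reassigning onto the surviving node restores full results.
nodes = [nodes[0]]
assign(nodes)
print(cluster_search(nodes))
```

In this model, scaling query throughput means adding `QueryNode`s and calling `assign` again; the object store, and therefore storage cost, does not change.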
Core Concepts
Collection
A collection in Milvus is similar to a table in SQL. For example:
- documents
- products
- users
Field
A field is similar to a column. A collection may contain fields such as:
- id
- text
- category
- embedding
Vector Field
The vector field is where the embedding is stored.
```
[0.12, -0.45, 0.88, ...]
```
This field is what Milvus searches against.
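Milvus enforces the schema for you; the sketch below only mimics the idea in plain Python so that collection, field, and vector field have something concrete to attach to. `FieldSpec`, `SCHEMA`, and `validate` are hypothetical names, and the 4-dimensional vector stands in for a real 768-dimensional embedding. (The walkthrough later in this post uses `create_collection(..., dimension=768)` in quick-setup mode, which by default creates an `id` primary key and a `vector` field for you.)

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FieldSpec:
    """One column of a collection; `dim` is set only for the vector field."""
    name: str
    dtype: type
    dim: Optional[int] = None

# Hypothetical "documents" collection (4 dims instead of a real model's 768).
SCHEMA = [
    FieldSpec("id", int),
    FieldSpec("text", str),
    FieldSpec("category", str),
    FieldSpec("embedding", list, dim=4),
]

def validate(row: dict, schema: list) -> None:
    """Reject rows with a missing field, a wrong type, or a wrong vector length."""
    for spec in schema:
        if spec.name not in row:
            raise KeyError(f"missing field {spec.name!r}")
        value = row[spec.name]
        if not isinstance(value, spec.dtype):
            raise TypeError(f"{spec.name}: expected {spec.dtype.__name__}")
        if spec.dim is not None and len(value) != spec.dim:
            raise ValueError(f"{spec.name}: expected {spec.dim} dims, got {len(value)}")

collection = []  # the "table": one dict per entity
row = {"id": 1, "text": "How to reset password", "category": "help",
       "embedding": [0.12, -0.45, 0.88, 0.05]}
validate(row, SCHEMA)
collection.append(row)
```

Milvus performs the equivalent check at insert time; for instance, a vector whose length differs from the collection’s declared dimension is rejected.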
How Similarity Search Works
When you run a query, the query text is first turned into an embedding (with the same model used for the stored data), and Milvus compares that vector against the stored vectors to find the closest matches. Take a query like “Forgot my account password”:
| Stored Text | Similarity Score |
|---|---|
| How to reset password | 0.95 |
| Python tutorial | 0.21 |
| Milvus installation guide | 0.10 |
The top result wins, not because the words match, but because the vectors are geometrically close. For dense floating-point vectors, Milvus supports three similarity metrics, each suited to different use cases:
- L2 Distance — smaller distance means more similar; works well for raw embeddings
- Cosine Similarity — compares the angle between vectors; ideal for text embeddings where direction matters more than magnitude
- Inner Product (IP) — useful for normalized embeddings
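The three metrics are easy to compute by hand. The plain-Python sketch below uses illustrative function names (Milvus computes these internally in optimized code) and shows that the metrics can disagree about which vector is “closest”:

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    """Inner product (dot product): larger means more similar."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector length."""
    return inner_product(a, b) / math.sqrt(inner_product(a, a) * inner_product(b, b))

def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(inner_product(v, v))
    return [x / norm for x in v]

a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice as long
c = [4.0, 3.0]  # same length as a, different direction

for name, other in (("b", b), ("c", c)):
    print(f"a vs {name}: L2={l2_distance(a, other):.4f}  "
          f"cosine={cosine_similarity(a, other):.4f}  IP={inner_product(a, other):.4f}")
# a vs b: L2=5.0000  cosine=1.0000  IP=50.0000
# a vs c: L2=1.4142  cosine=0.9600  IP=24.0000

# On unit-length vectors, IP and cosine give the same score.
print(f"{inner_product(normalize(a), normalize(c)):.4f}")  # 0.9600
```

By L2, `c` is closer to `a`; by cosine or raw IP, `b` is. That is why the metric should match how the embedding model was trained, and why cosine is a common choice for text embeddings whose length carries little meaning.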
For text-based search, cosine similarity is usually the right default. When datasets grow large enough that brute-force comparison becomes too slow, you build an approximate nearest neighbor (ANN) index so Milvus can skip most of the comparisons, which brings us to indexes.
Index Types
Indexes are what make production-scale vector search feasible. Rather than comparing every stored vector against every query, indexes let Milvus prune the search space intelligently.
FLAT — Exact brute-force search. Perfectly accurate but doesn’t scale. Fine for small datasets or ground-truth benchmarking.
IVF_FLAT — Partitions the space into clusters, then searches only the most relevant ones. Good balance of speed and accuracy for medium-sized datasets.
HNSW — Hierarchical Navigable Small World graphs. Fast, highly accurate, and a common default for production Milvus deployments, at the cost of relatively high memory use.
IVF_PQ — Combines IVF clustering with product quantization to compress vectors in memory. The choice when RAM is the bottleneck.
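DiskANN — Designed for billion-scale datasets where the full vectors don’t fit in memory. It keeps compressed vectors in RAM and the graph index with full-precision vectors on SSD, so each query reads only the small set of candidates it visits from disk.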
For most production deployments, HNSW is the right starting point.
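The IVF idea is small enough to sketch in plain Python. This toy version hand-picks three centroids instead of training them with k-means, uses 2-D points instead of embeddings, and all names are hypothetical. The two knobs, `nlist` (number of clusters) and `nprobe` (clusters scanned per query), do correspond to the parameters Milvus exposes for its IVF indexes.

```python
import math
from collections import defaultdict

# Toy 2-D data; real IVF learns `nlist` centroids with k-means, here they are hand-picked.
VECTORS = {
    "a": (1, 1), "b": (0, 2),    # near centroid 0
    "c": (9, 1), "d": (11, 0),   # near centroid 1
    "e": (1, 9), "f": (0, 11),   # near centroid 2
}
CENTROIDS = [(0, 0), (10, 0), (0, 10)]  # nlist = 3

def nearest_centroids(vec, n):
    """Indices of the n centroids closest to vec (Euclidean distance)."""
    return sorted(range(len(CENTROIDS)), key=lambda i: math.dist(vec, CENTROIDS[i]))[:n]

def build_ivf():
    """Build step: file every vector under its nearest centroid (an inverted list)."""
    lists = defaultdict(list)
    for vid, vec in VECTORS.items():
        lists[nearest_centroids(vec, 1)[0]].append(vid)
    return lists

def search_ivf(lists, query, k, nprobe):
    """Search step: scan only the nprobe closest lists, then rank candidates exactly."""
    candidates = [vid for i in nearest_centroids(query, nprobe) for vid in lists[i]]
    candidates.sort(key=lambda vid: math.dist(query, VECTORS[vid]))
    return candidates[:k], len(candidates)

lists = build_ivf()
query = (4.5, 1)
print(search_ivf(lists, query, k=2, nprobe=1))  # (['a', 'b'], 2)
print(search_ivf(lists, query, k=2, nprobe=2))  # (['a', 'c'], 4)
```

With `nprobe=1` the search scans 2 of 6 vectors but misses `c`, which sits just across a cluster boundary; with `nprobe=2` it scans 4 and recovers the exact top-2. That speed-versus-recall knob is the core tradeoff of ANN indexes; for HNSW, the analogous search-time parameter in Milvus is `ef`.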
Getting Started: A Practical Walkthrough
Here’s how to set up a basic semantic search pipeline with Milvus in Python — storing documents as vector embeddings and querying them by natural language.
Installation
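```shell
# Install the Milvus Python SDK and an embedding library
pip install -U pymilvus sentence-transformers
# Milvus Lite (local, file-backed, no server needed) is used when MilvusClient
# gets a file path; if your pymilvus version doesn't bundle it: pip install -U milvus-lite

# Or run a standalone Milvus server in Docker (port 19530 for clients, 9091 for health checks).
# The official helper script starts the container with embedded etcd and local storage:
curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
bash standalone_embed.sh start
```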
Create a Collection and Insert Vectors
```python
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

# Connect to Milvus Lite database file
client = MilvusClient("./milvus_demo.db")

COLLECTION_NAME = "documents"

# Drop collection if it already exists (optional for reruns)
if client.has_collection(COLLECTION_NAME):
    client.drop_collection(COLLECTION_NAME)

# Create collection
client.create_collection(
    collection_name=COLLECTION_NAME,
    dimension=768,  # all-mpnet-base-v2 embedding dimension
    metric_type="COSINE"
)

# Load embedding model
model = SentenceTransformer("all-mpnet-base-v2")

# Documents to store
docs = [
    "Milvus is an open-source vector database",
    "HNSW is a graph-based approximate nearest neighbor algorithm",
    "LLMs represent text as high-dimensional embeddings",
    "PostgreSQL is a relational database management system",
    "Milvus supports cosine similarity and vector search",
]

# Create embeddings
embeddings = model.encode(docs).tolist()

# Prepare records
data = []
for i, (doc, embedding) in enumerate(zip(docs, embeddings)):
    data.append(
        {
            "id": i,
            "vector": embedding,
            "text": doc,
        }
    )

# Insert into collection
insert_result = client.insert(
    collection_name=COLLECTION_NAME,
    data=data
)

print("Inserted IDs:", insert_result)

# Query text
query = "What is a vector database?"

# Embed query
query_embedding = model.encode([query]).tolist()[0]

# Search
results = client.search(
    collection_name=COLLECTION_NAME,
    data=[query_embedding],
    limit=3,
    output_fields=["text"]
)

# Print results
print("\nSearch Results:")
for hit in results[0]:
    print(f"Score: {hit['distance']:.4f}")
    print(f"Text : {hit['entity']['text']}")
    print("-" * 50)
```
Query by Semantic Similarity
```python
# Embed the query
query = "What algorithm does Milvus use for fast search?"
query_vec = model.encode([query]).tolist()

# Search: returns the top 2 semantically similar docs
results = client.search(
    collection_name="documents",
    data=query_vec,
    limit=2,
    output_fields=["text"]
)

for hit in results[0]:
    print(f"Score: {hit['distance']:.4f} | {hit['entity']['text']}")

# Example output (exact scores depend on the model version):
# Score: 0.7234 | HNSW is a graph-based approximate nearest neighbor algorithm
# Score: 0.6891 | Milvus is an open-source vector database
```
The right result surfaces at the top — not because it contains the words “fast search,” but because it’s semantically closest to the query.
How Does Milvus Compare?
| Feature | Milvus | Pinecone | Weaviate | pgvector |
|---|---|---|---|---|
| Open source | ✓ Apache 2.0 | Proprietary | ✓ BSD-3-Clause | ✓ PostgreSQL License |
| Self-hosted | ✓ Full control | Managed only | ✓ | ✓ via PostgreSQL |
| Billion-scale | ✓ Native | With serverless | Possible | Difficult |
| GPU support | ✓ Native indexes | — | — | — |
| Hybrid search | ✓ Dense + Sparse | Beta | ✓ BM25 fusion | Manual |
| Horizontal scaling | ✓ Distributed mode | ✓ Managed | ✓ | Limited |
| Multi-tenancy | ✓ Partition keys | ✓ Namespaces | ✓ | Schema-level |
The short version: Milvus stands out on scale, deployment flexibility, and ownership. Pinecone wins on developer experience if you never want to think about infrastructure. pgvector wins if you’re already deep in the PostgreSQL ecosystem and your scale is modest.
A Real-World Example
Imagine you’re building a global search feature for an enterprise application — one that spans products, users, help documents, and reports. A user searches for:
“attendance report for stitching line”
A keyword search finds nothing useful. But Milvus — searching by meaning — surfaces:
“production attendance summary for sewing line”
Different words, same intent. This is the core value proposition of semantic search, and it shows up everywhere once you start looking: customer support, internal knowledge bases, e-commerce discovery, code search across large repositories.
Common Use Cases
Milvus tends to show up in the same categories of problems:
- Semantic search — finding documents, articles, or answers by meaning rather than keywords
- RAG pipelines — grounding LLM responses in retrieved, relevant context
- Recommendation systems — finding similar products, content, or users based on behavioral embeddings
- Chatbots — enabling memory and context retrieval across long conversations
- Image and multimodal search — matching images, audio, or video by learned representations
- Code search — finding semantically similar functions or snippets across a codebase
When Not to Use Milvus
Milvus is powerful, but it’s not always the right tool. If your dataset is under a few hundred thousand vectors, pgvector or a simple FAISS index in memory will likely serve you better without the operational overhead of running a distributed system.
If you need ACID transactions across multiple operations, Milvus isn’t designed for that. It offers tunable read consistency levels (from Strong to Eventually), but it is optimized for high-throughput inserts and approximate search rather than transactional semantics. For use cases that need vector search as a feature alongside relational data, pgvector integrated into PostgreSQL remains a pragmatic and underrated choice.
“The best database is the one that fits your problem — not the most impressive one.”
Milvus shines brightest when you have tens of millions to billions of vectors, need sub-100ms latency at high QPS, require flexible deployment across on-prem, multi-cloud, or hybrid environments, and want the freedom that comes with open-source.
Conclusion
The shift from keyword search to semantic search isn’t a minor upgrade — it’s a fundamental rethinking of how applications understand and retrieve information. Milvus sits at the infrastructure layer of that shift, doing the heavy lifting quietly and at scale.
If you’re building chatbots, RAG systems, recommendation engines, or any feature where meaning matters more than exact words, Milvus is worth a serious look. The barrier to entry is lower than you might think — a few lines of Python and a local Lite instance are all you need to get started.