Milvus Vector Database: Modern Similarity Search for AI Applications

Published On: 15 April 2026. By Akshat Gadodia.

Every recommendation you receive, every semantic search query that “just gets it,” every AI chatbot that remembers context — they’re all quietly backed by a technology most developers have only recently begun to understand: the vector database.

At the center of this quiet revolution is Milvus — an open-source, cloud-native vector database that has become the go-to infrastructure for storing, indexing, and searching high-dimensional vector embeddings at scale. Whether you’re building a RAG pipeline, a recommendation engine, or a multimodal search system, Milvus is likely in the conversation.

This post digs into what Milvus is, why it exists, how it works under the hood, and when you should (and shouldn’t) use it.

Why Vectors? Why Now?

Modern AI models, whether language models, image encoders, or recommendation systems, represent data not as raw text or pixels but as vectors: dense arrays of floating-point numbers in high-dimensional space. A word, sentence, image, or user behavior profile becomes a point in a 768-, 1536-, or even 4096-dimensional space.

The power of this representation is profound: things that are semantically or conceptually similar end up geometrically close to each other. “King minus Man plus Woman” famously gives you something close to “Queen” in embedding space. A photo of a sunset and the sentence “warm evening sky” land near each other when encoded by a multimodal model.

Traditional databases are built for exact matches and range queries. But with vectors, you almost never want an exact match — you want the k nearest neighbors. Finding those efficiently across billions of vectors is a fundamentally different computational problem. This is where vector databases come in — and where Milvus was purpose-built to excel.
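To make the k-nearest-neighbor problem concrete, here is a minimal brute-force sketch in plain Python. The toy 3-dimensional vectors and the helper name `knn` are illustrative assumptions, not part of Milvus; real embeddings have hundreds or thousands of dimensions.

```python
import heapq
import math


def knn(query, vectors, k):
    """Return the k (distance, index) pairs closest to the query.

    Brute force: every stored vector is compared with the query,
    so the cost grows linearly with the number of vectors.
    """
    scored = ((math.dist(query, v), i) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored)


# Toy data, made up for illustration only.
vectors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.9, 0.4],
    [0.1, 0.0, 0.95],
]
query = [1.0, 0.0, 0.0]

for dist, idx in knn(query, vectors, k=2):
    print(f"index={idx} distance={dist:.3f}")
```

Each query costs one full pass over the collection. That is fine for thousands of vectors and impractical for billions, which is the gap a vector database is meant to close.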

What Is a Vector Database?

A vector database stores data as vectors rather than relying only on plain text, numbers, or rows. A vector is a list of numbers that represents the meaning of some data, generated by machine learning models such as Sentence Transformers, OpenAI embedding models, or multimodal image encoders.

Take two sentences:

“How to reset my password?”
“I forgot my login password”

The words are different, but their vector representations land close to each other in embedding space. A vector database exploits this to return semantically relevant results, not just lexically matching ones. Unlike a traditional database, it’s designed to answer questions like: Which item is most similar? Which document shares the same meaning? Which image looks like this one?

Why Traditional Databases Fall Short

Traditional databases like MySQL or PostgreSQL are optimized for exact matching:

SELECT * FROM users WHERE name = 'John';

This works perfectly when you know the exact value. But it falls apart when you want to search by meaning. If your database contains “How to reset my password” and a user searches for “I forgot my login password,” a traditional database may return nothing — the text simply doesn’t match.

Vector databases solve this by converting both texts into embeddings and comparing their similarity rather than their characters.
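The mismatch described above can be reproduced with Python's built-in `sqlite3` module. This is a small sketch; the `articles` table and its single row are hypothetical and exist only to show that exact and substring matching both miss a paraphrased query.

```python
import sqlite3

# In-memory database with a hypothetical help-article table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute(
    "INSERT INTO articles (title) VALUES (?)",
    ("How to reset my password",),
)

user_query = "I forgot my login password"

# Exact match: succeeds only if the strings are identical.
exact = conn.execute(
    "SELECT title FROM articles WHERE title = ?", (user_query,)
).fetchall()

# Substring match: still fails because the phrasing differs.
fuzzy = conn.execute(
    "SELECT title FROM articles WHERE title LIKE ?", (f"%{user_query}%",)
).fetchall()

print("exact match rows:", exact)
print("LIKE match rows: ", fuzzy)
conn.close()
```

Both queries come back empty even though the stored article is exactly what the user needs, which is the situation embedding-based comparison is meant to handle.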

That said, traditional databases aren’t going anywhere. They remain the right tool for filtering, sorting, joins, and transactions. For semantic search, recommendation systems, and AI-powered retrieval, though, they’re not enough on their own.

What Is Milvus?

Milvus is a high-performance, open-source vector database designed for large-scale similarity search. It supports millions — and in distributed mode, billions — of vectors, with fast search, metadata filtering, and a choice of indexing algorithms to match your performance and accuracy requirements.

Its headline capabilities:

Sub-100ms similarity search at high query throughput
Multiple index types tuned for different scale and accuracy tradeoffs
Metadata filtering alongside vector search
A cloud-native distributed architecture that scales compute and storage independently
Native support for both dense and sparse vector search

How Milvus Is Built

One of Milvus’s most distinctive architectural choices is a clean separation between storage and compute — the same philosophy behind modern cloud data warehouses like Snowflake, but applied to vector search.

Client
  ↓
Proxy (stateless, load-balanced entry points)
  ↓
Coordinator Layer (Root, Query, Data, Index coordinators)
  ↓
Worker Nodes (Query / Data / Index)
  ↓
Storage (Object storage + etcd + message stream)

Access Layer — Proxy nodes handle request routing, schema validation, and result aggregation. They’re stateless, so you can scale them horizontally without coordination overhead.

Coordinator Layer — Root, Query, Data, and Index coordinators manage cluster topology, segment assignment, and index scheduling. They’re the brain of the operation.

Worker Layer — Query Nodes run vector search. Data Nodes handle the write path. Index Nodes build indexes asynchronously in the background.

Storage Layer — Segments live in object storage (S3 or MinIO). Metadata lives in etcd. The write-ahead stream runs on Pulsar or Kafka.

The key insight: query nodes load segments from object storage on demand and cache them in memory. This means you can scale query throughput independently of storage costs — and you never lose data if a query node goes down. It’s a fundamentally more elastic model than traditional shared-nothing architectures.

Core Concepts

Collection

A collection in Milvus is similar to a table in SQL. For example:

documents
products
users

Field

A field is similar to a column. A collection may contain fields such as:

id
text
category
embedding

Vector Field

The vector field is where the embedding is stored.

[0.12, -0.45, 0.88, ...]

This field is what Milvus searches against.

How Similarity Search Works

When you run a query, the query text is first converted into an embedding vector, typically by the same model that embedded the stored data. Milvus then compares that vector against the stored vectors to find the closest matches. Take a query like “Forgot my account password”:

Stored Text	Similarity Score
How to reset password	0.95
Python tutorial	0.21
Milvus installation guide	0.10

The top result wins, not because the words match, but because the vectors are geometrically close. For dense float vectors, Milvus offers three main similarity metrics, each suited to different use cases:

L2 Distance — smaller distance means more similar; works well for raw embeddings
Cosine Similarity — compares the angle between vectors; ideal for text embeddings where direction matters more than magnitude
Inner Product (IP) — useful for normalized embeddings
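The three metrics are simple enough to compute by hand. The sketch below uses plain Python and two made-up vectors that point in the same direction but differ in length; the function names are illustrative, not Milvus APIs.

```python
import math


def l2(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Dot product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: dot product divided by both vector lengths."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(inner_product(v, v))
    return [x / norm for x in v]


a = [1.0, 2.0, 2.0]  # length 3
b = [2.0, 4.0, 4.0]  # same direction, length 6

print("L2 distance     :", l2(a, b))
print("inner product   :", inner_product(a, b))
print("cosine          :", cosine(a, b))
print("IP (normalized) :", inner_product(normalize(a), normalize(b)))
```

Cosine ignores the difference in length and treats the two vectors as identical in direction, while L2 still reports a gap between them. Once vectors are normalized to unit length, inner product and cosine give the same value, which is why IP is a common choice for normalized embeddings.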

For text-based search, cosine similarity is usually the right default. When datasets grow large enough that brute-force comparison becomes too slow, you build an approximate nearest neighbor (ANN) index so that Milvus no longer has to scan every vector, which brings us to indexes.

Index Types

Indexes are what make production-scale vector search feasible. Rather than comparing the query against every stored vector, indexes let Milvus prune the search space intelligently.

FLAT — Exact brute-force search. Perfectly accurate but doesn’t scale. Fine for small datasets or ground-truth benchmarking.

IVF_FLAT — Partitions the space into clusters, then searches only the most relevant ones. Good balance of speed and accuracy for medium-sized datasets.

HNSW — Hierarchical Navigable Small World graphs. Fast, highly accurate, and the most widely used index type in production Milvus deployments.

IVF_PQ — Combines IVF clustering with product quantization to compress vectors in memory. The choice when RAM is the bottleneck.

DiskANN — Designed for billion-scale datasets where even compressed vectors don’t fit in memory. Queries hit disk, but intelligently.

For most production deployments, HNSW is the right starting point.
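To see the pruning idea behind IVF-style indexes, here is a toy sketch in plain Python. It is not how Milvus implements IVF_FLAT: centroids are sampled at random instead of being trained with k-means, and the data is random. The names `NLIST` and `nprobe` follow the usual IVF parameter names (number of clusters, number of clusters scanned per query).

```python
import heapq
import math
import random

random.seed(0)
DIM, N, NLIST, K = 8, 2000, 16, 5

# Random stand-in data; real embeddings come from a model.
vectors = [[random.random() for _ in range(DIM)] for _ in range(N)]

# Simplification: sample centroids instead of running k-means.
centroids = random.sample(vectors, NLIST)


def nearest_centroids(v, n):
    """Ids of the n centroids closest to v."""
    return heapq.nsmallest(
        n, range(NLIST), key=lambda c: math.dist(v, centroids[c])
    )


# Inverted lists: centroid id -> ids of the vectors assigned to it.
inverted = {c: [] for c in range(NLIST)}
for i, v in enumerate(vectors):
    inverted[nearest_centroids(v, 1)[0]].append(i)


def exact_search(query, k):
    """FLAT-style search: compare the query with every vector."""
    return heapq.nsmallest(
        k, range(N), key=lambda i: math.dist(query, vectors[i])
    )


def ivf_search(query, k, nprobe):
    """Scan only the vectors in the nprobe closest clusters."""
    candidates = [
        i for c in nearest_centroids(query, nprobe) for i in inverted[c]
    ]
    top = heapq.nsmallest(
        k, candidates, key=lambda i: math.dist(query, vectors[i])
    )
    return top, len(candidates)


query = [random.random() for _ in range(DIM)]
truth = set(exact_search(query, K))
for nprobe in (1, 4, NLIST):
    found, scanned = ivf_search(query, K, nprobe)
    recall = len(truth & set(found)) / K
    print(f"nprobe={nprobe:2d} scanned={scanned:4d}/{N} recall={recall:.2f}")
```

Raising `nprobe` scans more clusters, which costs more comparisons but recovers more of the true neighbors. Scanning every cluster is equivalent to brute force, which is why exact search is a handy ground truth when tuning an approximate index.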

Getting Started: A Practical Walkthrough

Here’s how to set up a basic semantic search pipeline with Milvus in Python — storing documents as vector embeddings and querying them by natural language.

Installation

# Install the Milvus Python SDK and an embedding library
pip install pymilvus sentence-transformers

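# For local development, use Milvus Lite (no server needed)
# Or spin up a Docker instance (simplified; see the official Milvus
# install guide for the full standalone command and its settings):
docker run -d --name milvus-standalone \
  -p 19530:19530 -p 9091:9091 \
  milvusdb/milvus:latest standalone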

				
					
Create a Collection and Insert Vectors

from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

# Connect to Milvus Lite database file
client = MilvusClient("./milvus_demo.db")

COLLECTION_NAME = "documents"

# Drop collection if it already exists (optional for reruns)
if client.has_collection(COLLECTION_NAME):
    client.drop_collection(COLLECTION_NAME)

# Create collection
client.create_collection(
    collection_name=COLLECTION_NAME,
    dimension=768,  # all-mpnet-base-v2 embedding dimension
    metric_type="COSINE"
)

# Load embedding model
model = SentenceTransformer("all-mpnet-base-v2")

# Documents to store
docs = [
    "Milvus is an open-source vector database",
    "HNSW is a graph-based approximate nearest neighbor algorithm",
    "LLMs represent text as high-dimensional embeddings",
    "PostgreSQL is a relational database management system",
    "Milvus supports cosine similarity and vector search",
]

# Create embeddings
embeddings = model.encode(docs).tolist()

# Prepare records
data = []
for i, (doc, embedding) in enumerate(zip(docs, embeddings)):
    data.append(
        {
            "id": i,
            "vector": embedding,
            "text": doc,
        }
    )

# Insert into collection
insert_result = client.insert(
    collection_name=COLLECTION_NAME,
    data=data
)

# insert() returns a summary (insert count plus the inserted IDs)
print("Insert result:", insert_result)

# Query text
query = "What is a vector database?"

# Embed query
query_embedding = model.encode([query]).tolist()[0]

# Search
results = client.search(
    collection_name=COLLECTION_NAME,
    data=[query_embedding],
    limit=3,
    output_fields=["text"]
)

# Print results
print("\nSearch Results:")
for hit in results[0]:
    print(f"Score: {hit['distance']:.4f}")
    print(f"Text : {hit['entity']['text']}")
    print("-" * 50)

Query by Semantic Similarity

# Embed the query
query = "What algorithm does Milvus use for fast search?"
query_vec = model.encode([query]).tolist()

# Search — returns top 2 semantically similar docs
results = client.search(
    collection_name="documents",
    data=query_vec,
    limit=2,
    output_fields=["text"]
)

for hit in results[0]:
    print(f"Score: {hit['distance']:.4f} | {hit['entity']['text']}")

# Score: 0.7234 | HNSW is a graph-based approximate nearest neighbor algorithm
# Score: 0.6891 | Milvus is an open-source vector database

The right result surfaces at the top — not because it contains the words “fast search,” but because it’s semantically closest to the query.

How Does Milvus Compare?

Feature	Milvus	Pinecone	Weaviate	pgvector
Open source	✓ Apache 2.0	Proprietary	✓ BSD	✓ PostgreSQL
Self-hosted	✓ Full control	Managed only	✓	✓ via PostgreSQL
Billion-scale	✓ Native	With serverless	Possible	Difficult
GPU support	✓ Native indexes	—	—	—
Hybrid search	✓ Dense + Sparse	Beta	✓ BM25 fusion	Manual
Horizontal scaling	✓ Distributed mode	✓ Managed	✓	Limited
Multi-tenancy	✓ Partition keys	✓ Namespaces	✓	Schema-level

The short version: Milvus wins on raw performance, flexibility, and ownership. Pinecone wins on developer experience if you never want to think about infrastructure. pgvector wins if you’re already deep in the PostgreSQL ecosystem and your scale is modest.

A Real-World Example

Imagine you’re building a global search feature for an enterprise application — one that spans products, users, help documents, and reports. A user searches for:

“attendance report for stitching line”

A keyword search finds nothing useful. But Milvus — searching by meaning — surfaces:

“production attendance summary for sewing line”

Different words, same intent. This is the core value proposition of semantic search, and it shows up everywhere once you start looking: customer support, internal knowledge bases, e-commerce discovery, code search across large repositories.

Common Use Cases

Milvus tends to show up in the same categories of problems:

Semantic search — finding documents, articles, or answers by meaning rather than keywords
RAG pipelines — grounding LLM responses in retrieved, relevant context
Recommendation systems — finding similar products, content, or users based on behavioral embeddings
Chatbots — enabling memory and context retrieval across long conversations
Image and multimodal search — matching images, audio, or video by learned representations
Code search — finding semantically similar functions or snippets across a codebase

When Not to Use Milvus

Milvus is powerful, but it’s not always the right tool. If your dataset is under a few hundred thousand vectors, pgvector or a simple FAISS index in memory will likely serve you better without the operational overhead of running a distributed system.
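A back-of-the-envelope estimate helps show where that threshold comes from. The sketch below counts only raw float32 vector storage (4 bytes per value) and ignores index structures and metadata, so real memory use will be higher; the 768 dimension and the dataset sizes are example figures, and the helper names are illustrative.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for uncompressed vectors only (float32 by default)."""
    return num_vectors * dim * bytes_per_value


def human(n_bytes: float) -> str:
    """Format a byte count using decimal units."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n_bytes < 1000 or unit == "TB":
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1000


for n in (300_000, 50_000_000, 1_000_000_000):
    size = human(raw_vector_bytes(n, 768))
    print(f"{n:>13,} vectors x 768 dims x 4 bytes ~ {size}")
```

A few hundred thousand 768-dimensional vectors come to under a gigabyte, which fits comfortably in the memory of a single machine. At tens of millions and beyond, the raw vectors alone reach hundreds of gigabytes to terabytes, and that is the range where a distributed system starts to earn its operational cost.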

If you need strong ACID transactional guarantees, Milvus isn’t designed for that. It’s optimized for high-throughput insert and approximate search — not strict consistency semantics. For use cases that need vector search as a feature alongside relational data, pgvector integrated into PostgreSQL remains a pragmatic and underrated choice.

“The best database is the one that fits your problem — not the most impressive one.”

Milvus shines brightest when you have tens of millions to billions of vectors, need sub-100ms latency at high QPS, require flexible deployment across on-prem, multi-cloud, or hybrid environments, and want the freedom that comes with open-source.

Conclusion

The shift from keyword search to semantic search isn’t a minor upgrade — it’s a fundamental rethinking of how applications understand and retrieve information. Milvus sits at the infrastructure layer of that shift, doing the heavy lifting quietly and at scale.

If you’re building chatbots, RAG systems, recommendation engines, or any feature where meaning matters more than exact words, Milvus is worth a serious look. The barrier to entry is lower than you might think — a few lines of Python and a local Lite instance are all you need to get started.

Akshat Gadodia

Akshat Gadodia is an SDE-2 at Auriga IT specialising in backend architecture, Django APIs, and full-stack development with React and Next.js. He builds scalable, production-grade systems and has been recognised as both Star Performer of the Month and Youngest Star at Auriga.
