
Milvus Vector Database: Modern Similarity Search for AI Applications

Published On: 15 April 2026

Every recommendation you receive, every semantic search query that “just gets it,” every AI chatbot that remembers context — they’re all quietly backed by a technology most developers have only recently begun to understand: the vector database.

At the center of this quiet revolution is Milvus — an open-source, cloud-native vector database that has become the go-to infrastructure for storing, indexing, and searching high-dimensional vector embeddings at scale. Whether you’re building a RAG pipeline, a recommendation engine, or a multimodal search system, Milvus is likely in the conversation.

This post digs into what Milvus is, why it exists, how it works under the hood, and when you should (and shouldn’t) use it.


Why Vectors? Why Now?

Modern AI models — whether they’re language models, image encoders, or recommendation systems — represent data not as text or numbers, but as vectors: dense arrays of floating-point numbers in high-dimensional space. A word, sentence, image, or user behavior profile becomes a point in 768, 1536, or even 4096-dimensional space.

The power of this representation is profound: things that are semantically or conceptually similar end up geometrically close to each other. “King minus Man plus Woman” famously gives you something close to “Queen” in embedding space. A photo of a sunset and the sentence “warm evening sky” land near each other when encoded by a multimodal model.

Traditional databases are built for exact matches and range queries. But with vectors, you almost never want an exact match — you want the k nearest neighbors. Finding those efficiently across billions of vectors is a fundamentally different computational problem. This is where vector databases come in — and where Milvus was purpose-built to excel.


What Is a Vector Database?

A vector database stores data as vectors rather than only as plain text, numbers, or rows. A vector is a list of numbers that represents the meaning of some data, generated by machine learning models such as Sentence Transformers, OpenAI's embedding models, or multimodal image encoders.

Take two sentences:

  • “How to reset my password?”
  • “I forgot my login password”

The words are different, but their vector representations land close to each other in embedding space. A vector database exploits this to return semantically relevant results, not just lexically matching ones. Unlike a traditional database, it’s designed to answer questions like: Which item is most similar? Which document shares the same meaning? Which image looks like this one?


Why Traditional Databases Fall Short

Traditional databases like MySQL or PostgreSQL are optimized for exact matching: looking up the rows whose column equals a given value.

This works perfectly when you know the exact value. But it falls apart when you want to search by meaning. If your database contains “How to reset my password” and a user searches for “I forgot my login password,” a traditional database may return nothing — the text simply doesn’t match.
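
To make the mismatch concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for MySQL or PostgreSQL. The `articles` table and its `title` column are hypothetical names chosen for illustration; the two sentences are the ones from the example above.

```python
import sqlite3

# In-memory database holding a single stored help article.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO articles (title) VALUES (?)", ("How to reset my password",))


def exact_match(query):
    """Return titles that are equal to the query string."""
    rows = conn.execute("SELECT title FROM articles WHERE title = ?", (query,))
    return [title for (title,) in rows]


def substring_match(query):
    """Return titles that contain the query as a substring (SQL LIKE)."""
    rows = conn.execute(
        "SELECT title FROM articles WHERE title LIKE ?", (f"%{query}%",)
    )
    return [title for (title,) in rows]


print(exact_match("How to reset my password"))        # ['How to reset my password']
print(exact_match("I forgot my login password"))      # []
print(substring_match("I forgot my login password"))  # []
```

Both lookups on the second sentence come back empty even though a human reader would treat the two sentences as the same request.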

Vector databases solve this by converting both texts into embeddings and comparing their similarity rather than their characters.

That said, traditional databases aren’t going anywhere. They remain the right tool for filtering, sorting, joins, and transactions. For semantic search, recommendation systems, and AI-powered retrieval, though, they’re not enough on their own.


What Is Milvus?

Milvus is a high-performance, open-source vector database designed for large-scale similarity search. It supports millions — and in distributed mode, billions — of vectors, with fast search, metadata filtering, and a choice of indexing algorithms to match your performance and accuracy requirements.

Its headline capabilities:

  • Sub-100ms similarity search at high query throughput
  • Multiple index types tuned for different scale and accuracy tradeoffs
  • Metadata filtering alongside vector search
  • A cloud-native distributed architecture that scales compute and storage independently
  • Native support for both dense and sparse vector search

How Milvus Is Built

One of Milvus’s most distinctive architectural choices is a clean separation between storage and compute — the same philosophy behind modern cloud data warehouses like Snowflake, but applied to vector search.

Access Layer — Proxy nodes handle request routing, schema validation, and result aggregation. They’re stateless, so you can scale them horizontally without coordination overhead.

Coordinator Layer — Root, Query, Data, and Index coordinators manage cluster topology, segment assignment, and index scheduling. They’re the brain of the operation.

Worker Layer — Query Nodes run vector search. Data Nodes handle the write path. Index Nodes build indexes asynchronously in the background.

Storage Layer — Segments live in object storage (S3 or MinIO). Metadata lives in etcd. The write-ahead stream runs on Pulsar or Kafka.

The key insight: query nodes load segments from object storage on demand and cache them in memory. This means you can scale query throughput independently of storage costs, and a query node failure does not lose data because the durable copy stays in object storage. It’s a fundamentally more elastic model than traditional shared-nothing architectures.


Core Concepts

Collection

A collection in Milvus is similar to a table in SQL. For example:

  • documents
  • products
  • users

Field

A field is similar to a column. A collection may contain fields such as:

  • id
  • text
  • category
  • embedding

Vector Field

The vector field is where the embedding is stored.

This field is what Milvus searches against.


How Similarity Search Works

When you run a query, the query text is first converted into an embedding vector, typically by the same model that embedded the stored data, and Milvus compares that vector against the stored vectors to find the closest matches. Take a query like “Forgot my account password”:

| Stored Text | Similarity Score |
| --- | --- |
| How to reset password | 0.95 |
| Python tutorial | 0.21 |
| Milvus installation guide | 0.10 |

The top result wins not because the words match, but because the vectors are geometrically close. For dense float vectors, Milvus supports three main similarity metrics, each suited to different use cases:

  • L2 Distance — smaller distance means more similar; works well for raw embeddings
  • Cosine Similarity — compares the angle between vectors; ideal for text embeddings where direction matters more than magnitude
  • Inner Product (IP) — useful for normalized embeddings

For text-based search, cosine similarity is usually the right default. When datasets grow large enough that brute-force comparison becomes too slow, you trade exact search for approximate nearest neighbor (ANN) algorithms, which brings us to indexes.
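
As a rough illustration of how the three metrics behave, the sketch below computes each one in plain Python for hand-picked three-dimensional vectors. The vectors and helper names are made up for this example; real embeddings have hundreds or thousands of dimensions, and Milvus computes these metrics internally.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector length."""
    norms = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / norms


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(inner_product(v, v))
    return [x / norm for x in v]


query = [1.0, 2.0, 2.0]
doc = [2.0, 4.0, 4.0]     # same direction as query, twice the length
other = [2.0, -1.0, 0.0]  # orthogonal to query

print(l2_distance(query, doc))          # 3.0
print(inner_product(query, doc))        # 18.0
print(cosine_similarity(query, doc))    # 1.0
print(cosine_similarity(query, other))  # 0.0

# After normalization, inner product and cosine similarity agree.
print(inner_product(normalize(query), normalize(doc)))
```

L2 treats `doc` as three units away from `query`, while cosine similarity treats them as identical in direction. Once vectors are scaled to unit length, inner product and cosine similarity produce the same ranking, which is why IP is a common choice for normalized embeddings.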


Index Types

Indexes are what make production-scale vector search feasible. Rather than comparing every stored vector against every query, indexes let Milvus prune the search space intelligently.

FLAT — Exact brute-force search. Perfectly accurate but doesn’t scale. Fine for small datasets or ground-truth benchmarking.

IVF_FLAT — Partitions the space into clusters, then searches only the most relevant ones. Good balance of speed and accuracy for medium-sized datasets.

HNSW — Hierarchical Navigable Small World graphs. Fast, highly accurate, and the most widely used index type in production Milvus deployments.

IVF_PQ — Combines IVF clustering with product quantization to compress vectors in memory. The choice when RAM is the bottleneck.

DiskANN — Designed for billion-scale datasets where even compressed vectors don’t fit in memory. Queries hit disk, but intelligently.

For most production deployments, HNSW is the right starting point.
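
To show the idea behind IVF-style pruning, here is a toy sketch in plain Python that compares a brute-force (FLAT-style) scan with a clustered scan. It is not how Milvus implements IVF_FLAT: the centroids are supplied by hand rather than learned with k-means, the data is two-dimensional, and the helper names are hypothetical. The `nprobe` argument mirrors the IVF search parameter of the same name, which controls how many clusters are scanned.

```python
import math


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_search(vectors, query, k):
    """Brute force: compare the query with every stored vector."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2(vectors[i], query))
    return ranked[:k]


def build_ivf(vectors, centroids):
    """Put each vector id into the bucket of its nearest centroid."""
    buckets = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        buckets[nearest].append(i)
    return buckets


def ivf_search(vectors, centroids, buckets, query, k, nprobe):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [i for c in probe[:nprobe] for i in buckets[c]]
    candidates.sort(key=lambda i: l2(vectors[i], query))
    return candidates[:k], len(candidates)


vectors = [
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.3],     # group near (0, 0)
    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1],     # group near (5, 5)
    [10.0, 0.0], [9.8, 0.2], [10.1, -0.1],  # group near (10, 0)
]
centroids = [[0.1, 0.1], [5.0, 5.0], [10.0, 0.0]]  # hand-picked, not trained
buckets = build_ivf(vectors, centroids)
query = [4.9, 5.0]

exact = flat_search(vectors, query, k=2)
approx, scanned = ivf_search(vectors, centroids, buckets, query, k=2, nprobe=1)
print(exact, approx, scanned)  # [3, 5] [3, 5] 3
```

With `nprobe=1` the clustered search finds the same two neighbors while comparing against 3 vectors instead of 9. A query that falls near a cluster boundary can miss a true neighbor in an unscanned bucket, which is the accuracy cost that raising `nprobe` buys back.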


Getting Started: A Practical Walkthrough

Here’s how to set up a basic semantic search pipeline with Milvus in Python — storing documents as vector embeddings and querying them by natural language.

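Installation

The Python client is published on PyPI as pymilvus and installs with the command pip install pymilvus. Depending on the release, the embedded Milvus Lite mode may also need the separate milvus-lite package.
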
Create a Collection and Insert Vectors
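
One possible way to do this step, sketched under some assumptions: the `pymilvus` `MilvusClient` interface running against a local Milvus Lite file, and the `sentence-transformers` package with the `all-MiniLM-L6-v2` model for embeddings (the post does not say which model it used). The file name `milvus_demo.db`, the collection name `documents`, and the helper names are placeholders.

```python
def build_rows(texts, vectors):
    """Pair each text with its embedding in the row format insert() accepts."""
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    return [
        {"id": i, "vector": [float(x) for x in vector], "text": text}
        for i, (text, vector) in enumerate(zip(texts, vectors))
    ]


def index_documents(texts, db_path="milvus_demo.db", collection_name="documents"):
    """Embed texts and store them in a local Milvus Lite collection."""
    # Imported here so build_rows stays usable without these packages installed.
    from pymilvus import MilvusClient
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    vectors = model.encode(texts)

    client = MilvusClient(uri=db_path)  # a local .db path selects Milvus Lite
    if client.has_collection(collection_name=collection_name):
        client.drop_collection(collection_name=collection_name)

    # Quick-setup collection: "id" primary key plus a "vector" field.
    client.create_collection(
        collection_name=collection_name,
        dimension=len(vectors[0]),
        metric_type="COSINE",
    )
    client.insert(collection_name=collection_name, data=build_rows(texts, vectors))
    return client, model
```

Calling `index_documents([...])` with a handful of strings returns the client and model for later queries. The quick-setup form of `create_collection` stores extra keys such as `text` as dynamic fields, so no explicit schema is needed for a first experiment.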

Query by Semantic Similarity
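
A hedged sketch of the query step, assuming a `MilvusClient` connected to a collection named `documents` whose rows carry a `text` field, and an embedding model object with an `encode` method such as a Sentence Transformers model. The helper names are hypothetical, and the query must be embedded with the same model that embedded the stored documents.

```python
def format_hits(results):
    """Flatten the hits for the first query into (text, score) pairs."""
    return [
        (hit["entity"]["text"], round(hit["distance"], 3))
        for hit in results[0]
    ]


def semantic_search(client, model, query, collection_name="documents", limit=3):
    """Embed the query text, then ask Milvus for its nearest stored vectors."""
    query_vector = model.encode([query])[0].tolist()
    results = client.search(
        collection_name=collection_name,
        data=[query_vector],      # one query vector, so results has one entry
        limit=limit,              # number of neighbors to return
        output_fields=["text"],   # return the stored text with each hit
    )
    return format_hits(results)
```

A call such as `semantic_search(client, model, "fast search")` returns the stored texts ranked by score. With the COSINE metric the reported distance is a similarity, so higher values mean closer matches.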

The right result surfaces at the top — not because it contains the words “fast search,” but because it’s semantically closest to the query.


How Does Milvus Compare?

| Feature | Milvus | Pinecone | Weaviate | pgvector |
| --- | --- | --- | --- | --- |
| Open source | ✓ Apache 2.0 | Proprietary | ✓ BSD | ✓ PostgreSQL |
| Self-hosted | ✓ Full control | Managed only | | ✓ via PostgreSQL |
| Billion-scale | ✓ Native | With serverless | Possible | Difficult |
| GPU support | ✓ Native indexes | | | |
| Hybrid search | ✓ Dense + Sparse | Beta | ✓ BM25 fusion | Manual |
| Horizontal scaling | ✓ Distributed mode | ✓ Managed | | Limited |
| Multi-tenancy | ✓ Partition keys | ✓ Namespaces | | Schema-level |

The short version: Milvus wins on raw performance, flexibility, and ownership. Pinecone wins on developer experience if you never want to think about infrastructure. pgvector wins if you’re already deep in the PostgreSQL ecosystem and your scale is modest.


A Real-World Example

Imagine you’re building a global search feature for an enterprise application — one that spans products, users, help documents, and reports. A user searches for:

“attendance report for stitching line”

A keyword search finds nothing useful. But Milvus — searching by meaning — surfaces:

“production attendance summary for sewing line”

Different words, same intent. This is the core value proposition of semantic search, and it shows up everywhere once you start looking: customer support, internal knowledge bases, e-commerce discovery, code search across large repositories.


Common Use Cases

Milvus tends to show up in the same categories of problems:

  • Semantic search — finding documents, articles, or answers by meaning rather than keywords
  • RAG pipelines — grounding LLM responses in retrieved, relevant context
  • Recommendation systems — finding similar products, content, or users based on behavioral embeddings
  • Chatbots — enabling memory and context retrieval across long conversations
  • Image and multimodal search — matching images, audio, or video by learned representations
  • Code search — finding semantically similar functions or snippets across a codebase

When Not to Use Milvus

Milvus is powerful, but it’s not always the right tool. If your dataset is under a few hundred thousand vectors, pgvector or a simple FAISS index in memory will likely serve you better without the operational overhead of running a distributed system.

If you need strong ACID transactional guarantees, Milvus isn’t designed for that. It’s optimized for high-throughput insert and approximate search — not strict consistency semantics. For use cases that need vector search as a feature alongside relational data, pgvector integrated into PostgreSQL remains a pragmatic and underrated choice.

“The best database is the one that fits your problem — not the most impressive one.”

Milvus shines brightest when you have tens of millions to billions of vectors, need sub-100ms latency at high QPS, require flexible deployment across on-prem, multi-cloud, or hybrid environments, and want the freedom that comes with open-source.


Conclusion

The shift from keyword search to semantic search isn’t a minor upgrade — it’s a fundamental rethinking of how applications understand and retrieve information. Milvus sits at the infrastructure layer of that shift, doing the heavy lifting quietly and at scale.

If you’re building chatbots, RAG systems, recommendation engines, or any feature where meaning matters more than exact words, Milvus is worth a serious look. The barrier to entry is lower than you might think — a few lines of Python and a local Lite instance are all you need to get started.
