Data & Analytics
Digital Engineering
Data, AI & Analytics

Why Your “Read Replica” Trick Fails for Writes — And How Cassandra Fixes It

Published On: 10 June 2026.By Sushil Kumar Sadhnani.

Why Your “Read Replica” Trick Fails for Writes — And How Cassandra Fixes It

Engineering — System Design — Databases — 10 June 2026

You are in a system design interview. "Our app is read-heavy, 95% reads, 5% writes. How do you scale the database?" Easy. One primary, a fleet of read replicas, done. Then the interviewer flips it: "Now make it write-heavy. Two million IoT events per second. Now what?" Suddenly that beautiful diagram is useless. Here is why, and here is how Apache Cassandra fixes it.

System Design Apache Cassandra Write-Heavy Systems Distributed Databases NoSQL

Topic

Scaling write-heavy databases

Database

Apache Cassandra (NoSQL)

Architecture

Masterless, peer-to-peer ring

Data placement

Consistent hashing (Murmur3)

Storage engine

LSM-tree (commit log + memtable + SSTable)

Consistency

Tunable (ONE / QUORUM / ALL)

Scaling

Near-linear, add nodes

Used by

Netflix, Apple, Instagram

01 - The Setup

Let's Start With a Question

The interviewer says your app is read-heavy: 95% reads, 5% writes. How do you scale the database? Easy. You draw the classic picture: one primary node that takes all the writes, and a fleet of read replicas that copy data from the primary and serve all the reads. Need more read throughput? Add another replica. Done.

This pattern, primary-replica replication, powers a huge chunk of the internet, and it works beautifully.

Primary-replica architecture diagram: all writes funnel through one primary node while reads fan out to read replicas

Primary-replica architecture: all writes funnel through one primary node while reads fan out to read replicas

Then the interviewer leans forward: "Cool. Now flip it. The system is write-heavy. Think IoT sensors firing 2 million events per second, or a chat app, or clickstream analytics. Now what?" And suddenly your beautiful diagram is useless.

02 - The Problem

Why the Read-Replica Pattern Collapses Under Writes

In primary-replica setups, every single write must go through one node: the primary. Read replicas do not help at all, because they only copy writes; they cannot accept them. So your write throughput is capped at whatever one machine can handle.

You can fight this in three ways, and all three hurt.

Vertical scaling

Buy a bigger machine. Works until it does not. There is a physical and financial ceiling, and you still have a single point of failure.

Manual sharding

Split data across primaries (users A-M on DB1, N-Z on DB2). Now you own routing logic, rebalancing hot shards, cross-shard queries, and 3 AM pages when shard 2 dies.

Async write queues

Buffer writes in Kafka and drain slowly. Helps with bursts, but the database is still the bottleneck. You have just moved the queue.

The root problem is architectural: a single write master is a funnel. No amount of replicas widens a funnel.

⏸ Pause and think What if there was no primary at all? What if every node could accept writes? That is exactly Cassandra's bet.

03 - Big Idea #1

Kill the Master

Cassandra uses a masterless, peer-to-peer architecture. Every node in the cluster is equal. There is no primary, no replica hierarchy, no leader election drama. Any node can accept both reads and writes, which means there is no master to become a single point of failure or a throughput funnel.

The nodes arrange themselves logically in a ring. A client can connect to any node, and that node becomes the coordinator for the request: a temporary proxy that routes the write to the right place and replies to the client.

Cassandra masterless ring of six equal nodes; a client can connect to any node, which becomes the coordinator

Cassandra masterless ring of six equal nodes; a client connects to any node, which becomes the coordinator

The immediate consequence is glorious:

10 nodes means 10 machines accepting writes in parallel.
Need more write throughput? Add a node. Throughput scales near-linearly.
A node dies? The ring keeps serving traffic. No failover ceremony.

But wait: if any node can take any write, how does data not turn into chaos? How does the cluster know where a piece of data should live?

04 - Big Idea #2

Consistent Hashing Decides Where Data Lives

Every row in Cassandra has a partition key (say, sensor_id or user_id). When a write arrives, the coordinator runs the partition key through a hash function (the default is Murmur3), which produces a token: a number on a giant circle of possible values.

Each node in the ring owns a range of tokens. Whichever node owns the range your token falls into, that is the first home for your data.

Consistent hashing token ring: sensor_42 hashes to token 60, which falls in Node C's range

Think of it like a roundabout with houses on it. The hash of your key tells you the address on the roundabout, and you simply walk clockwise to the first house. That is where your data lives.

Why is this clever?

No lookup table, no router service. Every node can compute the hash itself and instantly knows where any data belongs. Routing is math, not metadata.
Adding a node only reshuffles a slice of data, not everything. That is the "consistent" in consistent hashing.
Hot-spotting is minimized, because a good hash function sprays keys uniformly around the ring.

This is the answer to "who handles the write?" The hash decides, automatically, with zero central coordination.

05 - Big Idea #3

Replication Without a Primary

"But wait," you say, "if data for sensor_42 lives on Node 3, and Node 3 dies, didn't we just lose data?" Nope, because Cassandra never stores data on just one node.

You configure a Replication Factor (RF), typically 3. The first replica goes to the token owner, and the next replicas go to the following nodes walking clockwise around the ring.

The crucial difference from primary-replica systems: all replicas are equal. There is no "main copy" and "backup copies." The coordinator sends the write to all replica nodes simultaneously, in parallel, not in a chain.

So when is a write "successful"? You decide.

This is one of Cassandra's most elegant features: tunable consistency. With RF=3, you choose how many replicas must acknowledge before the client gets a success.

Consistency Level	Waits for	You get
ONE	1 of 3 replicas	Blazing-fast writes, maximum availability
QUORUM	2 of 3 replicas	Strong-enough consistency, still fast
ALL	3 of 3 replicas	Strongest consistency, lowest availability

Write-heavy systems typically run ONE or QUORUM on writes. The remaining replicas still receive the write; the coordinator just does not wait for them.

And if a replica node is down?

The coordinator stores a hint, a small "you missed this write" note, and replays it to the node when it comes back online. This mechanism is called hinted handoff, and it is why a dead node does not block your writes.

⏸ Pause and think Notice the trade we just made. We gained massive write availability, but two replicas might briefly disagree about a value. Cassandra embraces this: it is called eventual consistency, and for sensor data, likes, messages, and logs, it is almost always the right trade.

06 - Big Idea #4

Make Each Individual Write Stupidly Fast

Distributing writes across nodes solves the cluster-level bottleneck. But Cassandra also attacks the disk-level bottleneck, and this part is beautiful.

Traditional databases (B-tree based) often must read before they write: find the page where the row lives, possibly rewrite it in place. That means random disk I/O, the slowest thing you can ask a disk to do.

Cassandra's storage engine (an LSM-tree design) refuses to play that game. When a write lands on a node, exactly two things happen:

Append to the Commit Log — a write-ahead log on disk. This is a pure sequential append, the fastest possible disk operation. Once it is here, the write is durable; even if the node crashes, it can replay the log.
Update the Memtable — an in-memory sorted structure. This makes the data instantly readable.

That is it. The client gets an acknowledgment. One sequential disk write plus one in-memory write. No reads. No seeks. No locks on existing data.

Cassandra write path: a write hits the commit log and memtable simultaneously, then later flushes to an immutable SSTable

Cassandra write path: a write hits the commit log and memtable, then later flushes to an immutable SSTable

Later, in the background, when a memtable grows past a threshold, it is flushed to disk as an SSTable (Sorted String Table): an immutable, sorted file. Updates never modify old SSTables; they just write new data with a newer timestamp. Even deletes are writes (a marker called a tombstone). A background process called compaction periodically merges SSTables and discards stale data.

This is why Cassandra's docs call the write path one of its key strengths: the engine is literally designed so that the hot path of a write touches the disk only in the cheapest way possible.

07 - End to End

Putting It All Together: The Life of One Write

Let's trace a single event, sensor_42 reporting a temperature, through a 6-node cluster with RF=3 and consistency QUORUM.

The client's driver connects to any node, say Node 1. It becomes the coordinator.
Node 1 hashes sensor_42 and the token lands in Node 3's range. Replicas: Nodes 3, 4, 5.
Node 1 fires the write to Nodes 3, 4, and 5 in parallel.
Each replica appends to its commit log and updates its memtable, microseconds of work.
The moment 2 of 3 replicas acknowledge (QUORUM), Node 1 tells the client: done.
If Node 5 was down, Node 1 keeps a hint and delivers it later.

Meanwhile, the next write might hash to Nodes 6, 1, 2: a completely different set of machines. Multiply this by millions of writes per second, and you see the magic: the load spreads itself across the entire cluster, automatically.

08 - The Honest Section

What Does This Cost You?

No free lunches in distributed systems. Cassandra optimizes writes by making other things harder.

⚠ Reads are more work A read may need to check the memtable and several SSTables, then merge results by timestamp. Cassandra mitigates this with bloom filters, caches, and compaction, but reads are inherently pricier than writes here. (Notice the irony: the exact opposite of the primary/read-replica world we started with.)

⚠ Eventual consistency is a mindset shift If you need strict ACID transactions across rows, like banking ledgers or inventory counts, Cassandra is the wrong hammer.

⚠ Query-first data modeling No joins, no ad-hoc queries on random columns. You design tables around your queries, often duplicating data. Denormalization is not a sin in Cassandra; it is the lifestyle.

⚠ Compaction needs care and feeding Background merging consumes I/O and CPU; tuning it is part of operating Cassandra at scale.

09 - The Decision

So, When Do You Reach for Cassandra?

Great fit

IoT and sensor telemetry
Time-series data
Messaging and chat history
Activity feeds
Clickstream and event logging
Fraud-detection event stores
Anything append-heavy at scale with high availability needs

Poor fit

Strongly transactional systems
Heavy ad-hoc analytics with joins and aggregations
Small datasets that fit comfortably on one Postgres box
When you would be over-engineering, seriously

This is why Netflix, Apple, and Instagram run some of the largest Cassandra clusters on the planet.

10 - Cheat Sheet

The One-Table Summary

Problem in write-heavy systems	Cassandra's answer
Single write master is a bottleneck	Masterless ring: every node accepts writes
Who stores which data?	Consistent hashing on the partition key
Node failures losing data	Replication Factor plus all-equal replicas
Latency vs safety trade-off	Tunable consistency (ONE / QUORUM / ALL)
Replica down during a write	Hinted handoff
Slow random disk I/O per write	Commit log + memtable + SSTables (LSM-tree)
Need more throughput	Add nodes: near-linear scaling

Read-heavy scaling is about copying data outward from one writer. Write-heavy scaling is about removing the single writer entirely.

Cassandra does it with a masterless ring, hash-based data placement, parallel replication with tunable consistency, and a storage engine where every write is a cheap sequential append. That combination is why "add a node" is genuinely the answer to "we need more write throughput."

11 - Frequently Asked Questions

Cassandra and Write-Heavy Systems — FAQ

What is Apache Cassandra?

Apache Cassandra is an open-source, distributed NoSQL database designed for write-heavy workloads and high availability. It uses a masterless, peer-to-peer architecture where every node can accept reads and writes, so throughput scales near-linearly as you add nodes. Netflix, Apple, and Instagram run some of the largest Cassandra clusters in the world.

Why does primary-replica replication fail for write-heavy workloads?

In a primary-replica setup, every write must go through a single primary node. Read replicas only copy writes, they cannot accept them, so total write throughput is capped at what one machine can handle. Adding replicas widens read capacity but never write capacity, because a single write master is a funnel that no number of replicas can widen.

What is a masterless architecture in Cassandra?

Masterless means there is no primary node and no replica hierarchy. Every node in a Cassandra cluster is equal and can accept both reads and writes. Nodes arrange themselves in a logical ring, and any node a client connects to becomes the coordinator for that request. With 10 nodes, all 10 accept writes in parallel, and a single node failure does not stop the cluster.

How does Cassandra decide where data is stored?

Cassandra uses consistent hashing on the partition key. The coordinator hashes the partition key (using Murmur3 by default) to produce a token, a number on a ring of possible values. Each node owns a range of tokens, and the node owning the range your token falls into is the first home for that data. Routing is pure math, so no lookup table or router service is needed.

What is Replication Factor in Cassandra?

Replication Factor (RF) is how many copies of each piece of data Cassandra stores, typically 3. The first replica goes to the token owner and the next replicas go to the following nodes clockwise around the ring. All replicas are equal, there is no main copy and backup copies, and the coordinator writes to all replicas in parallel rather than in a chain.

What is tunable consistency (ONE, QUORUM, ALL)?

Tunable consistency lets you choose how many replicas must acknowledge a write before the client gets a success. With RF=3: ONE waits for 1 replica (fastest, most available), QUORUM waits for 2 of 3 (strong-enough consistency, still fast), and ALL waits for all 3 (strongest consistency, lowest availability). Write-heavy systems typically use ONE or QUORUM.

What is hinted handoff?

If a replica node is down during a write, the coordinator stores a hint, a small note recording the missed write, and replays it to that node when it comes back online. Hinted handoff is why a dead node does not block writes in Cassandra.

How does the Cassandra write path work?

Cassandra uses an LSM-tree storage engine. When a write lands on a node, it appends to the commit log (a sequential write-ahead log on disk, the fastest disk operation) and updates the memtable (an in-memory sorted structure). The client is acknowledged immediately, with no reads or random seeks. Later, the memtable flushes to disk as an immutable SSTable, and a background compaction process merges SSTables and discards stale data.

What is eventual consistency in Cassandra?

Eventual consistency means replicas may briefly disagree about a value after a write, but converge to the same value over time. Cassandra trades strict immediate consistency for massive write availability. For sensor data, likes, messages, and logs this is almost always the right trade, but for banking ledgers or inventory counts that need strict ACID transactions, Cassandra is the wrong choice.

When should you use Cassandra instead of a relational database?

Use Cassandra for write-heavy, append-heavy workloads at scale with high availability needs: IoT and sensor telemetry, time-series data, messaging and chat history, activity feeds, clickstream and event logging, and fraud-detection event stores. Avoid it for strongly transactional systems, heavy ad-hoc analytics with joins and aggregations, and small datasets that fit comfortably on a single PostgreSQL instance.

Which companies use Apache Cassandra?

Netflix, Apple, and Instagram run some of the largest Cassandra clusters in the world. It is widely adopted for write-heavy and high-availability use cases such as time-series data, messaging, activity feeds, and event logging.

What is a coordinator node in Cassandra?

The coordinator is whichever node a client connects to for a given request. It acts as a temporary proxy: it hashes the partition key to find which nodes own the data, forwards the read or write to those replica nodes, waits for the required number of acknowledgments based on the consistency level, and replies to the client. Any node can act as coordinator for any request.

Building Systems That Scale?

Write-heavy pipelines, distributed databases, and high-throughput backends are everyday work at Auriga IT. From NHAI processing 20M+ daily transactions to data migrations with zero data loss, scale is our home turf.

Talk to Our Engineers Read More Engineering Posts

Auriga: Leveling Up for Enterprise Growth!

By ronak|2026-05-25T14:33:24+05:303 July 2024|Categories: expert-in|

Auriga’s journey began in 2010 crafting products for India’s [...]

Comments Off

Stay Close to What We’re Building

Get insights on product engineering, AI, and real-world technology decisions shaping modern businesses.

Sushil Kumar Sadhnani

Sushil Kumar Sadhnani is a Backend Developer at Auriga IT specialising in Django and Django REST Framework. He focuses on building scalable APIs, backend systems, and database management, and writes about system design and distributed systems topics on the Auriga blog.

Read all their blogs

Why Your “Read Replica” Trick Fails for Writes — And How Cassandra Fixes It

Why Your “Read Replica” Trick Fails for Writes — And How Cassandra Fixes It

Let's Start With a Question

Why the Read-Replica Pattern Collapses Under Writes

Kill the Master

Consistent Hashing Decides Where Data Lives

Why is this clever?

Replication Without a Primary

So when is a write "successful"? You decide.

And if a replica node is down?

Make Each Individual Write Stupidly Fast

Putting It All Together: The Life of One Write

What Does This Cost You?

So, When Do You Reach for Cassandra?

The One-Table Summary

Cassandra and Write-Heavy Systems — FAQ

Building Systems That Scale?

Related content

Auriga: Leveling Up for Enterprise Growth!

Auriga: Leveling Up for Enterprise Growth!

Stay Close to What We’re Building