
- Data & Analytics
- Digital Engineering
- Data, AI & Analytics
Why Your “Read Replica” Trick Fails for Writes — And How Cassandra Fixes It

Why Your “Read Replica” Trick Fails for Writes — And How Cassandra Fixes It
You are in a system design interview. "Our app is read-heavy, 95% reads, 5% writes. How do you scale the database?" Easy. One primary, a fleet of read replicas, done. Then the interviewer flips it: "Now make it write-heavy. Two million IoT events per second. Now what?" Suddenly that beautiful diagram is useless. Here is why, and here is how Apache Cassandra fixes it.
Let's Start With a Question
The interviewer says your app is read-heavy: 95% reads, 5% writes. How do you scale the database? Easy. You draw the classic picture: one primary node that takes all the writes, and a fleet of read replicas that copy data from the primary and serve all the reads. Need more read throughput? Add another replica. Done.
This pattern, primary-replica replication, powers a huge chunk of the internet, and it works beautifully.
Then the interviewer leans forward: "Cool. Now flip it. The system is write-heavy. Think IoT sensors firing 2 million events per second, or a chat app, or clickstream analytics. Now what?" And suddenly your beautiful diagram is useless.
Why the Read-Replica Pattern Collapses Under Writes
In primary-replica setups, every single write must go through one node: the primary. Read replicas do not help at all, because they only copy writes; they cannot accept them. So your write throughput is capped at whatever one machine can handle.
You can fight this in three ways, and all three hurt.
The root problem is architectural: a single write master is a funnel. No amount of replicas widens a funnel.
Kill the Master
Cassandra uses a masterless, peer-to-peer architecture. Every node in the cluster is equal. There is no primary, no replica hierarchy, no leader election drama. Any node can accept both reads and writes, which means there is no master to become a single point of failure or a throughput funnel.
The nodes arrange themselves logically in a ring. A client can connect to any node, and that node becomes the coordinator for the request: a temporary proxy that routes the write to the right place and replies to the client.
The immediate consequence is glorious:
- 10 nodes means 10 machines accepting writes in parallel.
- Need more write throughput? Add a node. Throughput scales near-linearly.
- A node dies? The ring keeps serving traffic. No failover ceremony.
But wait: if any node can take any write, how does data not turn into chaos? How does the cluster know where a piece of data should live?
Consistent Hashing Decides Where Data Lives
Every row in Cassandra has a partition key (say, sensor_id or user_id). When a write arrives, the coordinator runs the partition key through a hash function (the default is Murmur3), which produces a token: a number on a giant circle of possible values.
Each node in the ring owns a range of tokens. Whichever node owns the range your token falls into, that is the first home for your data.
Think of it like a roundabout with houses on it. The hash of your key tells you the address on the roundabout, and you simply walk clockwise to the first house. That is where your data lives.
Why is this clever?
- No lookup table, no router service. Every node can compute the hash itself and instantly knows where any data belongs. Routing is math, not metadata.
- Adding a node only reshuffles a slice of data, not everything. That is the "consistent" in consistent hashing.
- Hot-spotting is minimized, because a good hash function sprays keys uniformly around the ring.
This is the answer to "who handles the write?" The hash decides, automatically, with zero central coordination.
Replication Without a Primary
"But wait," you say, "if data for sensor_42 lives on Node 3, and Node 3 dies, didn't we just lose data?" Nope, because Cassandra never stores data on just one node.
You configure a Replication Factor (RF), typically 3. The first replica goes to the token owner, and the next replicas go to the following nodes walking clockwise around the ring.
So when is a write "successful"? You decide.
This is one of Cassandra's most elegant features: tunable consistency. With RF=3, you choose how many replicas must acknowledge before the client gets a success.
| Consistency Level | Waits for | You get |
|---|---|---|
| ONE | 1 of 3 replicas | Blazing-fast writes, maximum availability |
| QUORUM | 2 of 3 replicas | Strong-enough consistency, still fast |
| ALL | 3 of 3 replicas | Strongest consistency, lowest availability |
Write-heavy systems typically run ONE or QUORUM on writes. The remaining replicas still receive the write; the coordinator just does not wait for them.
And if a replica node is down?
The coordinator stores a hint, a small "you missed this write" note, and replays it to the node when it comes back online. This mechanism is called hinted handoff, and it is why a dead node does not block your writes.
Make Each Individual Write Stupidly Fast
Distributing writes across nodes solves the cluster-level bottleneck. But Cassandra also attacks the disk-level bottleneck, and this part is beautiful.
Traditional databases (B-tree based) often must read before they write: find the page where the row lives, possibly rewrite it in place. That means random disk I/O, the slowest thing you can ask a disk to do.
Cassandra's storage engine (an LSM-tree design) refuses to play that game. When a write lands on a node, exactly two things happen:
- Append to the Commit Log — a write-ahead log on disk. This is a pure sequential append, the fastest possible disk operation. Once it is here, the write is durable; even if the node crashes, it can replay the log.
- Update the Memtable — an in-memory sorted structure. This makes the data instantly readable.
That is it. The client gets an acknowledgment. One sequential disk write plus one in-memory write. No reads. No seeks. No locks on existing data.
Later, in the background, when a memtable grows past a threshold, it is flushed to disk as an SSTable (Sorted String Table): an immutable, sorted file. Updates never modify old SSTables; they just write new data with a newer timestamp. Even deletes are writes (a marker called a tombstone). A background process called compaction periodically merges SSTables and discards stale data.
Putting It All Together: The Life of One Write
Let's trace a single event, sensor_42 reporting a temperature, through a 6-node cluster with RF=3 and consistency QUORUM.
- The client's driver connects to any node, say Node 1. It becomes the coordinator.
- Node 1 hashes
sensor_42and the token lands in Node 3's range. Replicas: Nodes 3, 4, 5. - Node 1 fires the write to Nodes 3, 4, and 5 in parallel.
- Each replica appends to its commit log and updates its memtable, microseconds of work.
- The moment 2 of 3 replicas acknowledge (QUORUM), Node 1 tells the client: done.
- If Node 5 was down, Node 1 keeps a hint and delivers it later.
Meanwhile, the next write might hash to Nodes 6, 1, 2: a completely different set of machines. Multiply this by millions of writes per second, and you see the magic: the load spreads itself across the entire cluster, automatically.
What Does This Cost You?
No free lunches in distributed systems. Cassandra optimizes writes by making other things harder.
So, When Do You Reach for Cassandra?
- IoT and sensor telemetry
- Time-series data
- Messaging and chat history
- Activity feeds
- Clickstream and event logging
- Fraud-detection event stores
- Anything append-heavy at scale with high availability needs
- Strongly transactional systems
- Heavy ad-hoc analytics with joins and aggregations
- Small datasets that fit comfortably on one Postgres box
- When you would be over-engineering, seriously
This is why Netflix, Apple, and Instagram run some of the largest Cassandra clusters on the planet.
The One-Table Summary
| Problem in write-heavy systems | Cassandra's answer |
|---|---|
| Single write master is a bottleneck | Masterless ring: every node accepts writes |
| Who stores which data? | Consistent hashing on the partition key |
| Node failures losing data | Replication Factor plus all-equal replicas |
| Latency vs safety trade-off | Tunable consistency (ONE / QUORUM / ALL) |
| Replica down during a write | Hinted handoff |
| Slow random disk I/O per write | Commit log + memtable + SSTables (LSM-tree) |
| Need more throughput | Add nodes: near-linear scaling |
Read-heavy scaling is about copying data outward from one writer. Write-heavy scaling is about removing the single writer entirely.
Cassandra does it with a masterless ring, hash-based data placement, parallel replication with tunable consistency, and a storage engine where every write is a cheap sequential append. That combination is why "add a node" is genuinely the answer to "we need more write throughput."
Cassandra and Write-Heavy Systems — FAQ
ONE waits for 1 replica (fastest, most available), QUORUM waits for 2 of 3 (strong-enough consistency, still fast), and ALL waits for all 3 (strongest consistency, lowest availability). Write-heavy systems typically use ONE or QUORUM.Building Systems That Scale?
Write-heavy pipelines, distributed databases, and high-throughput backends are everyday work at Auriga IT. From NHAI processing 20M+ daily transactions to data migrations with zero data loss, scale is our home turf.
Related content
Auriga: Leveling Up for Enterprise Growth!
Auriga’s journey began in 2010 crafting products for India’s [...]






