Hot Partition Saturation

When a billion-dollar distributed database gets taken down by one viral celebrity.

The idea

To store a billion user profiles, you can't use one database. You split (partition) the data across 100 database nodes based on the user_id. Node 1 gets users A-C, Node 2 gets users D-F, and so on. In theory, traffic is perfectly balanced. But what if Taylor Swift (Node 42) posts a new photo? Suddenly, millions of users try to fetch her profile at the exact same second. Nodes 1 through 41 are completely idle, but Node 42 is hammered with 100% of the system's traffic and melts down. This is a Hot Partition.

Step 1: Normal Traffic. Users are evenly distributed across the alphabet. All 3 database nodes share the load.

How it works (Salting & Caching)

You cannot solve a hot partition by just adding more database nodes, because all traffic for "Taylor Swift" mathematically hashes to the exact same node. You have two main tools to fix it:

  1. Caching: Put a CDN or Redis cluster in front of the database. When Taylor posts, the first request hits the database, but the next 999,999 requests are served from the cache. The database never sees the spike.
  2. Salting: If the data is too dynamic to cache (like live comments on the post), you append a random number (a "salt") to the Partition Key. Instead of storing all comments under Post_123, you store them under Post_123_0 through Post_123_9. This artificially forces the data to spread across 10 different nodes.
// BAD: Hot Partition Waiting to Happen
// All comments for post 42 go to the EXACT SAME NODE.
function saveComment(postId, text) {
    db.put({ partitionKey: postId, text: text });
}

// GOOD: Salting for High-Write Scenarios
// Spreads the writes for post 42 across 10 different nodes.
function saveCommentSalted(postId, text) {
    const salt = Math.floor(Math.random() * 10); // 0-9
    db.put({ partitionKey: `${postId}_${salt}`, text: text });
}
// Note: To read all comments, you now have to query all 10 nodes 
// in parallel and merge the results (Scatter-Gather).

Cost

Salting fixes write bottlenecks, but it makes reads much more expensive. Instead of querying one node to get all the comments, you have to query N nodes and sort the results in memory. It's a classic tradeoff: you sacrifice read efficiency to guarantee write availability during a viral event.

Watch out for