When a billion-dollar distributed database gets taken down by one viral celebrity.
To store a billion user profiles, you can't use one database. You split (partition) the data across 100 database nodes based on the user_id. Node 1 gets users A-C, Node 2 gets users D-F, and so on. In theory, traffic is perfectly balanced. But what if Taylor Swift (Node 42) posts a new photo? Suddenly, millions of users try to fetch her profile at the exact same second. Nodes 1 through 41 are completely idle, but Node 42 is hammered with 100% of the system's traffic and melts down. This is a Hot Partition.
You cannot solve a hot partition by just adding more database nodes, because all traffic for "Taylor Swift" mathematically hashes to the exact same node. You have two main tools to fix it:
Post_123, you store them under Post_123_0 through Post_123_9. This artificially forces the data to spread across 10 different nodes.// BAD: Hot Partition Waiting to Happen
// All comments for post 42 go to the EXACT SAME NODE.
function saveComment(postId, text) {
db.put({ partitionKey: postId, text: text });
}
// GOOD: Salting for High-Write Scenarios
// Spreads the writes for post 42 across 10 different nodes.
function saveCommentSalted(postId, text) {
const salt = Math.floor(Math.random() * 10); // 0-9
db.put({ partitionKey: `${postId}_${salt}`, text: text });
}
// Note: To read all comments, you now have to query all 10 nodes
// in parallel and merge the results (Scatter-Gather).
Salting fixes write bottlenecks, but it makes reads much more expensive. Instead of querying one node to get all the comments, you have to query N nodes and sort the results in memory. It's a classic tradeoff: you sacrifice read efficiency to guarantee write availability during a viral event.
2024-05-12) as a partition key. All traffic generated today will hit today's node. Yesterday's nodes will be perfectly safe, but completely idle. Always prepend an ID to a timestamp when partitioning.