Sharding & partitioning

Splitting a giant database across multiple servers to handle more data and traffic.

The idea

When a database gets too big for one machine, we split it into pieces called Shards (or Partitions). But how do we decide which row goes to which shard?

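If we use Range Partitioning (e.g., keys A-F go to Shard 1, G-M to Shard 2), adjacent keys live on the same shard, so range scans are fast, but we risk a "Hot Partition" when traffic concentrates on one range, for example if everyone suddenly queries keys starting with 'C'. If we use Hash Partitioning (hash(key) % N), data is spread evenly, but range queries become expensive: adjacent keys live on different servers, so a range scan has to query every shard and merge the results (Scatter-Gather).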

Simulating 1,000 requests for recent dates (keys in the A-F range): under Range Partitioning, every one of them lands on Shard 1.
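
The skew can be reproduced with a small simulation. This is a minimal sketch, not a benchmark: the helper names (range_shard, hash_shard, simulate), the key format (a letter from A-F plus a numeric suffix), and the use of zlib.crc32 as the hash function are all assumptions made for illustration.

```python
import random
import zlib
from collections import Counter

NUM_SHARDS = 3  # assumed shard count, matching the three-shard example


def range_shard(key):
    """Range partitioning on the first letter: A-F -> 0, G-M -> 1, rest -> 2."""
    first = key[:1].upper()
    if first < "G":
        return 0
    if first < "N":
        return 1
    return 2


def hash_shard(key):
    """Hash partitioning with a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def simulate(num_requests=1000, seed=42):
    """Count how many requests each shard receives under both strategies."""
    rng = random.Random(seed)
    range_load, hash_load = Counter(), Counter()
    for _ in range(num_requests):
        # Every key starts with a letter in A-F, followed by a numeric suffix.
        key = rng.choice("ABCDEF") + str(rng.randrange(100_000))
        range_load[range_shard(key)] += 1
        hash_load[hash_shard(key)] += 1
    return range_load, hash_load


range_load, hash_load = simulate()
print("range:", dict(range_load))  # all requests land on shard index 0
print("hash: ", dict(hash_load))   # requests are split across the shards
```

Under the range strategy the whole workload hits one shard while the other two sit idle; under the hash strategy the same keys are divided among all three.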

How it works (Routing logic)

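import zlib

SHARDS = ["shard_1", "shard_2", "shard_3"]  # placeholder shard handles

def route_query(key, strategy="hash"):
    if strategy == "range":
        # Fast for ranges, but risks hot spots if keys are sequential
        # or if one range is far more popular than the others.
        first = key[:1].upper()  # compare case-insensitively
        if first < "G":
            return SHARDS[0]  # A-F
        if first < "N":
            return SHARDS[1]  # G-M
        return SHARDS[2]      # N and above
    if strategy == "hash":
        # Spreads data evenly, but range queries require
        # querying ALL shards (Scatter-Gather).
        # crc32 is stable across processes; built-in hash() is salted for str.
        server_index = zlib.crc32(key.encode("utf-8")) % len(SHARDS)
        return SHARDS[server_index]
    raise ValueError(f"unknown strategy: {strategy!r}")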
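
To make the Scatter-Gather cost concrete, here is a rough sketch of a point lookup versus a range query over hash-partitioned data. The in-memory dict shards, the sample names, and the helpers point_lookup and range_query are hypothetical; a real system would send network requests to each shard, ideally in parallel.

```python
import zlib

NUM_SHARDS = 3  # assumed shard count


def hash_shard(key):
    """Stable hash routing (crc32 is an illustrative choice of hash)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


# Hypothetical in-memory shards: one dict per shard.
shards = [dict() for _ in range(NUM_SHARDS)]
for name in ["alice", "bob", "carol", "dave", "erin", "frank", "grace"]:
    shards[hash_shard(name)][name] = f"row for {name}"


def point_lookup(key):
    """A single-key read touches exactly one shard."""
    return shards[hash_shard(key)].get(key)


def range_query(low, high):
    """Keys in [low, high): scatter to every shard, then gather and sort."""
    matches = []
    for shard in shards:  # scatter: no shard can be skipped
        matches.extend(k for k in shard if low <= k < high)
    return sorted(matches)  # gather: merge the partial results


print(point_lookup("carol"))   # row for carol
print(range_query("b", "e"))   # ['bob', 'carol', 'dave']
```

The point lookup does work proportional to one shard, while the range query does work on all of them no matter how narrow the range is. That is the trade-off hash partitioning makes in exchange for even load.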