Distributed algorithms

Coordinating data across multiple machines using Consistent Hashing.

The idea

If you have a distributed cache, you map a key to a server using hash(key) % N. But if you add a new server (N+1), almost every key hashes to a different server. You have to move 99% of your data!

Consistent Hashing solves this by placing both servers and keys onto a circular ring. A key belongs to the first server it encounters moving clockwise. When a new server joins, it only takes over the keys between it and its immediate predecessor. The rest of the ring is untouched.

Ring has Servers S0, S1, S2. Keys (dots) map clockwise to the next server.

How it works (Consistent Hashing)

import bisect

class ConsistentHashRing:
    def __init__(self):
        self.server_hashes = []
        self.server_map = {} # hash -> server_ip

    def add_server(self, server_ip):
        h = hash(server_ip)
        bisect.insort(self.server_hashes, h)
        self.server_map[h] = server_ip

    def get_server(self, key):
        if not self.server_hashes: return None
        h = hash(key)
        
        # Find the first server hash >= key hash
        idx = bisect.bisect_left(self.server_hashes, h)
        
        # If we went past the end, wrap around to the first server
        if idx == len(self.server_hashes):
            idx = 0
            
        return self.server_map[self.server_hashes[idx]]