Coordinating data across multiple machines using Consistent Hashing.
If you have a distributed cache, you map a key to a server using hash(key) % N. But if you add a new server (N+1), almost every key hashes to a different server. You have to move 99% of your data!
Consistent Hashing solves this by placing both servers and keys onto a circular ring. A key belongs to the first server it encounters moving clockwise. When a new server joins, it only takes over the keys between it and its immediate predecessor. The rest of the ring is untouched.
import bisect
class ConsistentHashRing:
def __init__(self):
self.server_hashes = []
self.server_map = {} # hash -> server_ip
def add_server(self, server_ip):
h = hash(server_ip)
bisect.insort(self.server_hashes, h)
self.server_map[h] = server_ip
def get_server(self, key):
if not self.server_hashes: return None
h = hash(key)
# Find the first server hash >= key hash
idx = bisect.bisect_left(self.server_hashes, h)
# If we went past the end, wrap around to the first server
if idx == len(self.server_hashes):
idx = 0
return self.server_map[self.server_hashes[idx]]