Scaling & load balancing

Distribute incoming traffic across a fleet of stateless servers so that capacity can grow by adding machines.

The idea

Vertical scaling (buying a bigger server) has a hard limit: a single machine can only get so large. Horizontal scaling (adding more servers) can keep growing well past that limit, but it requires a Load Balancer (LB) to distribute traffic.

For this to work cleanly, the application servers must be stateless: any server should be able to handle any request. The LB uses strategies like Round Robin (each server takes the next request in turn) or Least Connections (the server with the fewest open connections takes the next request) to keep any single server from being overwhelmed, which keeps response times low.
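As a point of comparison with Round Robin, here is a minimal sketch of the Least Connections strategy. The names LeastConnectionsBalancer and StubServer are hypothetical and exist only for illustration, and the sketch assumes a synchronous, single-threaded caller, so a real balancer would need locking or an event loop around the counters.

```python
class StubServer:
    """Hypothetical stand-in for an application server."""

    def __init__(self, name):
        self.name = name

    def handle(self, request):
        return f"{self.name} handled {request}"


class LeastConnectionsBalancer:
    """Routes each request to the server with the fewest in-flight requests."""

    def __init__(self, servers):
        # Map each server to its count of in-flight requests.
        self.active = {server: 0 for server in servers}

    def route_request(self, request):
        if not self.active:
            raise RuntimeError("503 Service Unavailable")

        # Pick the server with the fewest active requests;
        # ties go to the server that was registered first.
        target = min(self.active, key=self.active.get)

        self.active[target] += 1
        try:
            return target.handle(request)
        finally:
            # The request finished (or failed), so release its slot.
            self.active[target] -= 1


a, b = StubServer("a"), StubServer("b")
lb = LeastConnectionsBalancer([a, b])
lb.active[a] = 3  # pretend server "a" already has three requests in flight
print(lb.route_request("GET /"))  # b handled GET /
```

Unlike Round Robin, this strategy reacts to how busy each server currently is, which tends to matter when some requests take much longer than others.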

How it works (Round Robin LB)

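class LoadBalancer:
    """Round Robin load balancer: servers take requests in turn."""

    def __init__(self, servers):
        self.servers = servers
        self.current = 0  # index of the server that takes the next request

    def route_request(self, request):
        if not self.servers:
            raise RuntimeError("503 Service Unavailable")

        # Re-wrap the index in case servers were removed since the last call
        self.current %= len(self.servers)

        # Pick the next server in line
        target_server = self.servers[self.current]

        # Advance the pointer, wrapping around
        self.current = (self.current + 1) % len(self.servers)

        return target_server.handle(request)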
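
To see how the wrapping pointer spreads requests, the same index arithmetic can be traced on its own. This is a small sketch, not part of the balancer itself: simulate_round_robin and the server names "app-1" to "app-3" are hypothetical, chosen only for the demonstration.

```python
from collections import Counter


def simulate_round_robin(servers, num_requests):
    """Return the server chosen for each request, in order."""
    current = 0
    order = []
    for _ in range(num_requests):
        order.append(servers[current])
        # Same pointer update as in route_request: advance and wrap around.
        current = (current + 1) % len(servers)
    return order


order = simulate_round_robin(["app-1", "app-2", "app-3"], 7)
print(order)
# ['app-1', 'app-2', 'app-3', 'app-1', 'app-2', 'app-3', 'app-1']
print(Counter(order))
# Counter({'app-1': 3, 'app-2': 2, 'app-3': 2})
```

With 7 requests over 3 servers, no server ends up more than one request ahead of any other, which is the even spread Round Robin is meant to provide when requests cost roughly the same.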