Rate limiting

Control incoming traffic by handing out request tokens at a fixed rate.

The idea

To prevent abuse or server overload, APIs enforce rate limits (e.g., 5 requests per second). A popular algorithm for this is the token bucket.

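Imagine a bucket that holds up to 5 tokens. A background process adds a new token every second, and any token that would overflow a full bucket is discarded. When a request arrives, it must take a token to proceed. If the bucket is empty, the request is denied (HTTP 429 Too Many Requests). With these numbers a client can burst up to 5 requests at once, but over time it is held to an average of 1 request per second. In practice there is usually no real background process: the refill is computed at request time from the time elapsed since the last check, as in the code below.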

Bucket fills at 1 token per second.
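To make the burst behaviour concrete, here is a minimal simulation sketch. The function name simulate and the list of arrival times are made up for illustration, and the sketch assumes the bucket starts full.

```python
def simulate(arrivals, capacity=5, refill_rate=1.0):
    """Return one allow/deny decision per arrival time (seconds, ascending)."""
    tokens = float(capacity)  # assumption: the bucket starts full
    last = arrivals[0] if arrivals else 0.0
    decisions = []
    for t in arrivals:
        # Credit the tokens earned since the previous arrival, capped at capacity
        tokens = min(capacity, tokens + (t - last) * refill_rate)
        last = t
        if tokens >= 1:
            tokens -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions


# Seven requests at t=0, two at t=1, one at t=3
arrivals = [0, 0, 0, 0, 0, 0, 0, 1, 1, 3]
print(simulate(arrivals))
# → [True, True, True, True, True, False, False, True, False, True]
```

The first five requests drain the bucket, the next two are denied, and each later second buys back exactly one request.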

How it works (Token Bucket)

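import json
import time

import redis

# Assumes a Redis server on localhost:6379; adjust the connection as needed
r = redis.Redis()

def allow_request(user_id, capacity=5, refill_rate=1.0):
    now = time.time()
    key = f"bucket:{user_id}"  # key prefix is an illustrative choice

    # Get state from Redis; a first-time user starts with a full bucket
    raw = r.get(key)
    bucket = json.loads(raw) if raw else {"tokens": capacity, "last_check": now}

    # Calculate tokens added since last check (never negative if the clock steps back)
    time_passed = max(0.0, now - bucket["last_check"])
    new_tokens = time_passed * refill_rate

    # Add tokens, capped at capacity
    bucket["tokens"] = min(capacity, bucket["tokens"] + new_tokens)
    bucket["last_check"] = now

    # Spend one token if one is available
    allowed = bucket["tokens"] >= 1
    if allowed:
        bucket["tokens"] -= 1

    # Note: this get/set pair is not atomic, so concurrent requests can race
    r.set(key, json.dumps(bucket))
    return allowed  # True: HTTP 200 OK, False: HTTP 429 Too Many Requests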
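
The function above reads the bucket, updates it, and writes it back in separate steps, so two concurrent requests could both spend the same token. With Redis, the usual remedy is to run the whole update atomically on the server, for example in a Lua script. The sketch below shows the same idea in a single process instead, using a lock. The class name TokenBucket, the injectable clock parameter, and the retry_after helper are illustrative assumptions, not part of the original code.

```python
import threading
import time


class TokenBucket:
    """In-process token bucket; `clock` is injectable so tests can control time."""

    def __init__(self, capacity=5, refill_rate=1.0, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = float(capacity)  # assumption: start with a full bucket
        self.last_check = clock()
        self.lock = threading.Lock()

    def _refill(self):
        # Credit tokens for the time elapsed since the last check, capped at capacity
        now = self.clock()
        elapsed = now - self.last_check
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_check = now

    def allow(self):
        # Refill and spend under one lock so concurrent callers cannot double-spend
        with self.lock:
            self._refill()
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    def retry_after(self):
        """Seconds until at least one token is available (0.0 if one already is)."""
        with self.lock:
            self._refill()
            return max(0.0, (1 - self.tokens) / self.refill_rate)
```

A handler could call allow() and, on a denial, use retry_after() to fill in the Retry-After header that a 429 response may carry. The default clock is time.monotonic, which is not affected by wall-clock adjustments.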