Rate limiting

Control incoming traffic by issuing a fixed number of tokens over time.

The idea

To prevent abuse and server overload, APIs enforce rate limits (e.g., 5 requests per second). A popular algorithm for this is the token bucket.

Imagine a bucket that holds up to 5 tokens. A background process adds a new token every second, never exceeding that cap. When a request arrives, it must take a token to proceed. If the bucket is empty, the request is denied (HTTP 429 Too Many Requests). This naturally allows short bursts (up to 5 requests at once from a full bucket) while enforcing an average rate of 1 request per second over time.
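To make the burst behaviour concrete, here is a minimal sketch that simulates the bucket one second at a time. The function name `simulate` and the `arrivals` list (number of requests arriving in each second) are hypothetical, chosen only for illustration.

```python
def simulate(arrivals, capacity=5):
    """Return (allowed, denied) counts for each second.

    arrivals[t] is the number of requests that arrive during second t.
    The bucket starts full and gains one token at the start of each
    later second, never exceeding `capacity`.
    """
    tokens = capacity
    results = []
    for t, n in enumerate(arrivals):
        if t > 0:
            tokens = min(capacity, tokens + 1)  # the once-per-second refill
        allowed = min(n, tokens)                # each allowed request takes one token
        tokens -= allowed
        results.append((allowed, n - allowed))
    return results


timeline = simulate([7, 2, 0, 0, 3])
print(timeline)
```

In this run, a burst of 7 requests in the first second gets 5 through and 2 denied. The next second has only the single refilled token to spend. After two quiet seconds the bucket has recovered enough to serve 3 requests at once.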

How it works (Token Bucket)

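import time
import redis

# Assumes a Redis server on localhost; decode_responses returns str instead of bytes
r = redis.Redis(decode_responses=True)

def allow_request(user_id, capacity=5, refill_rate=1.0):
    # Get state from Redis; a first-time user starts with a full bucket
    state = r.hgetall(user_id)
    now = time.time()
    tokens = float(state.get("tokens", capacity))
    last_check = float(state.get("last_check", now))

    # Calculate tokens added since last check. Refilling lazily on each
    # request gives the same result as a background process adding tokens.
    time_passed = now - last_check
    new_tokens = time_passed * refill_rate

    # Add tokens, capped at capacity
    tokens = min(capacity, tokens + new_tokens)

    # Spend one token if one is available
    allowed = tokens >= 1
    if allowed:
        tokens -= 1

    # Save state either way. Note: this read-modify-write is not atomic,
    # so concurrent requests for the same user can race.
    r.hset(user_id, mapping={"tokens": tokens, "last_check": now})
    return allowed  # True -> HTTP 200 OK, False -> HTTP 429 Too Many Requests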
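
The same logic can be tried without a Redis server. The sketch below is an assumption-laden stand-in, not the production version: it keeps buckets in a plain dict, and the names `Bucket`, `_buckets`, and `allow_request_local` are hypothetical. The caller passes `now` explicitly so the behaviour is easy to step through.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    tokens: float
    last_check: float


_buckets = {}  # user_id -> Bucket; in-memory stand-in for Redis


def allow_request_local(user_id, now, capacity=5, refill_rate=1.0):
    # A first-time user starts with a full bucket
    bucket = _buckets.setdefault(user_id, Bucket(tokens=capacity, last_check=now))

    # Refill based on elapsed time, capped at capacity
    elapsed = now - bucket.last_check
    bucket.tokens = min(capacity, bucket.tokens + elapsed * refill_rate)
    bucket.last_check = now

    # Spend one token if one is available
    if bucket.tokens >= 1:
        bucket.tokens -= 1
        return True
    return False


burst = [allow_request_local("alice", now=0.0) for _ in range(6)]
half_second_later = allow_request_local("alice", now=0.5)
one_second_later = allow_request_local("alice", now=1.0)
other_user = allow_request_local("bob", now=1.0)
```

Here the first five calls for "alice" succeed and the sixth is denied. Half a second later only half a token has accumulated, so the request is still denied. At the one-second mark a full token is available again. "bob" has a separate bucket and is unaffected.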