Saturation & capacity

Why performance doesn't degrade linearly—it falls off a cliff.

The idea

Systems don't just get steadily slower as they fill up. They follow a curve with a sharp "knee." Up to about 70-80% utilization, latency stays relatively flat. But as the system nears saturation, incoming requests start to queue up faster than they can be cleared, and every queued request waits behind the ones ahead of it.
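A minimal sketch of that queue build-up, assuming steady arrival and service rates (a deterministic approximation, not a model of any particular system). The function names and the example rates are hypothetical and chosen only for illustration.

```python
def utilization(arrival_rate, service_rate):
    """Fraction of capacity in use: requests arriving / requests the system can clear."""
    return arrival_rate / service_rate


def backlog_after(arrival_rate, service_rate, seconds):
    """Requests still waiting after `seconds`, starting from an empty queue.

    Assumes constant rates; a backlog only accumulates when arrivals
    outpace the service rate.
    """
    return max(0.0, (arrival_rate - service_rate) * seconds)


# Below saturation: 80 req/s arriving at a system that clears 100 req/s.
below = backlog_after(arrival_rate=80, service_rate=100, seconds=10)

# Past saturation: 120 req/s arriving at the same system.
above = backlog_after(arrival_rate=120, service_rate=100, seconds=10)
```

Under these assumptions the backlog stays empty below saturation and grows by 20 requests every second once arrivals exceed capacity. Real traffic is bursty, so queues start forming well before the average rate reaches 100%.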

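Queueing theory explains why: latency grows in proportion to 1 / (1 - utilization), so it climbs slowly at first and then shoots up without bound as utilization approaches 100% (the "hockey stick" curve). This applies to CPU, database connections, and thread pools. To fix it, you must either add capacity (scale out) or drop traffic (load shedding).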
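Load shedding can be sketched as a bounded queue that rejects new work once it is full, so the requests already waiting are not delayed further. This is an illustrative sketch only: `BoundedQueue`, `max_depth`, and the request names are hypothetical, and a production system would typically also return an explicit "overloaded" response to the caller.

```python
from collections import deque


class BoundedQueue:
    """Hypothetical admission queue that sheds load once it is full."""

    def __init__(self, max_depth):
        self.max_depth = max_depth
        self.items = deque()
        self.shed_count = 0

    def offer(self, request):
        """Accept the request if there is room; otherwise reject (shed) it."""
        if len(self.items) >= self.max_depth:
            self.shed_count += 1
            return False
        self.items.append(request)
        return True

    def take(self):
        """Hand the oldest waiting request to a worker, or None if the queue is empty."""
        return self.items.popleft() if self.items else None


# Five requests arrive before any worker is free; only three fit.
queue = BoundedQueue(max_depth=3)
accepted = [queue.offer(f"req-{i}") for i in range(5)]
```

Capping the queue depth bounds the worst-case wait for accepted requests, at the cost of failing the excess ones quickly.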


How it works (Queueing Theory)

# Simplified single-server queueing model (M/M/1).
# L is mean latency as a multiple of the bare service time,
# and ρ is utilization (0 <= ρ < 1):
# L = 1 / (1 - ρ)

# At 50% utilization (ρ = 0.5):
latency = 1 / (1 - 0.5)   # = 2 (fast)

# At 80% utilization (ρ = 0.8):
latency = 1 / (1 - 0.8)   # ≈ 5 (starting to queue)

# At 99% utilization (ρ = 0.99):
latency = 1 / (1 - 0.99)  # ≈ 100 (catastrophic queueing, the hockey stick)

# SOLUTION:
# Auto-scale *before* hitting the knee (e.g. at 65% CPU),
# because once you are in the knee, it's too late.
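
The same formula can be inverted to pick a scaling threshold from a latency budget. The sketch below assumes the simplified L = 1 / (1 - ρ) model above; the function names and the `headroom` parameter are hypothetical, and real systems should be tuned against measured latency rather than this formula alone.

```python
def max_utilization(max_multiplier):
    """Highest utilization that keeps latency within `max_multiplier` times
    the bare service time, obtained by solving L = 1 / (1 - rho) for rho."""
    if max_multiplier < 1:
        raise ValueError("latency cannot be lower than the service time itself")
    return 1.0 - 1.0 / max_multiplier


def should_scale_out(current_utilization, max_multiplier, headroom=0.1):
    """Trigger scaling `headroom` below the limit, since new capacity
    takes time to come online (the headroom value is an assumption)."""
    return current_utilization >= max_utilization(max_multiplier) - headroom


# A budget of 4x the service time corresponds to a 75% utilization ceiling.
ceiling = max_utilization(4.0)

scale_at_70 = should_scale_out(0.70, max_multiplier=4.0)
scale_at_50 = should_scale_out(0.50, max_multiplier=4.0)
```

With a 4x latency budget and 10 points of headroom, this sketch would start scaling at roughly 65% utilization, in line with the example threshold above.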