Why your dashboard says "60ms average" while your biggest customers are timing out.
When monitoring latency, looking at the Average (Mean) is a trap. If 99 users load a page in 10ms and 1 user takes 5,000ms (say, because their large payload triggered a full database scan), the average works out to (99 × 10 + 5,000) / 100 = 59.9ms, a healthy-looking 60ms. That single number leaves you completely blind to the fact that your heaviest users are suffering.
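A quick check of that average, using Python's standard `statistics` module on a hypothetical list of 100 page loads (99 at 10ms, one at 5,000ms):

```python
import statistics

# Hypothetical sample: 99 fast page loads and 1 slow one, in milliseconds.
latencies_ms = [10] * 99 + [5000]

mean_ms = statistics.mean(latencies_ms)      # (99 * 10 + 5000) / 100
median_ms = statistics.median(latencies_ms)  # middle of the sorted list
worst_ms = max(latencies_ms)                 # the one slow request

print(f"mean={mean_ms}ms median={median_ms}ms max={worst_ms}ms")
```

The mean lands near 60ms, a value no actual request experienced. The median says the typical load is 10ms, and only the max reveals the 5,000ms outlier.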
Instead, look at Percentiles. The p50 (Median) is the latency that half of requests come in at or under: what the typical user experiences. The p99 (Tail Latency) is the latency that 99% of requests come in at or under, so it marks where the slowest 1% begins. As traffic scales, minor bottlenecks (like garbage collection pauses or lock contention) can make the p99 blow up even while the p50 stays flat.
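A minimal sketch of computing percentiles, assuming the nearest-rank definition. This is only one of several common definitions (NumPy's `percentile`, for instance, defaults to linear interpolation, so its numbers can differ slightly). The `percentile` helper and both sample windows are hypothetical, chosen to mimic a tail that grows while the median stays put:

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("percentile of empty data")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical windows of 100 requests each (ms). Under load, the typical
# request is unchanged, but a few requests stall (e.g. GC pauses, lock waits).
quiet = [10] * 100
busy = [10] * 97 + [800, 1200, 5000]

for name, window in [("quiet", quiet), ("busy", busy)]:
    print(name, "p50 =", percentile(window, 50), "p99 =", percentile(window, 99))
```

In the busy window the median does not move, while p99 jumps from 10ms to 1,200ms. One caveat: with exactly one slow request in 100 (as in the average example), a nearest-rank p99 is still 10ms, because 99% of requests are at or under 10ms. Surfacing a single outlier takes a higher percentile such as p99.9, or the max, plus enough samples for it to mean something.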
When monitoring a system, track the four golden signals from Google's SRE book:
1. Latency: How long it takes to service a request (track p50 and p99!).
2. Traffic: The amount of demand on the system (e.g. requests per second, RPS).
3. Errors: The rate of requests that fail (e.g. HTTP 5xx responses).
4. Saturation: How "full" your system is (CPU, Memory, DB Conns).
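As a rough illustration, the four signals can be summarized from a window of request records. Everything here is a hypothetical sketch: the `Request` record, the `golden_signals` helper, the 60-second window, and the use of database-connection-pool usage as the saturation measure are assumptions for illustration, not anything prescribed by the SRE book.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float
    status: int  # HTTP status code


def nearest_rank(samples, p):
    # Nearest-rank percentile over a non-empty list.
    ordered = sorted(samples)
    return ordered[math.ceil(p * len(ordered) / 100) - 1]


def golden_signals(requests, window_s, db_conns_in_use, db_pool_size):
    """Summarize one time window. Saturation is approximated (an assumption)
    as the fraction of the DB connection pool in use."""
    if not requests:
        raise ValueError("no requests in window")
    latencies = [r.latency_ms for r in requests]
    # Only explicit 5xx failures are counted; wrong-but-200 responses are not.
    errors = sum(1 for r in requests if 500 <= r.status <= 599)
    return {
        "latency_p50_ms": nearest_rank(latencies, 50),
        "latency_p99_ms": nearest_rank(latencies, 99),
        "traffic_rps": len(requests) / window_s,
        "error_rate": errors / len(requests),
        "saturation": db_conns_in_use / db_pool_size,
    }


# Hypothetical 60-second window: 600 requests, 6 of them 5xx, 12 of them slow.
window = (
    [Request(12.0, 200)] * 582
    + [Request(12.0, 503)] * 6
    + [Request(900.0, 200)] * 12
)
print(golden_signals(window, window_s=60, db_conns_in_use=45, db_pool_size=50))
```

Here the p50 stays at 12ms while the p99 reports the 900ms stragglers, and a pool at 90% usage hints at why they might be waiting.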
# The Trap of the Average
def calc_average(latencies):
    # If latencies = [10, 10, 10, 10, 5000] (in ms)
    return sum(latencies) / len(latencies)  # Returns 1008.0ms!

# 1008ms matches no real request: it hides that 80% of users got a perfectly fast 10ms.