Finding the needle in the haystack of regular data.
Anomaly detection (or outlier detection) identifies data points that deviate significantly from normal behavior. It's used everywhere from credit card fraud detection to server monitoring. Rather than explicitly defining what "bad" looks like, we define what "normal" looks like, and flag everything else.
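A simple statistical approach is the Z-score, which measures how many standard deviations a data point lies from the mean: z = (x - mean) / std_dev. For roughly normally distributed data, about 99.7% of points fall within three standard deviations of the mean, so a point with |z| > 3 is generally considered an anomaly.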
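```python
import numpy as np

def detect_anomalies(data, threshold=3.0):
    """Return the items whose |z-score| exceeds the threshold."""
    mean = np.mean(data)
    std_dev = np.std(data)  # population standard deviation (ddof=0)
    if std_dev == 0:  # all values identical: nothing can be flagged
        return []
    anomalies = []
    for item in data:
        z_score = abs(item - mean) / std_dev
        if z_score > threshold:
            anomalies.append(item)
    return anomalies

# With n points and ddof=0, |z| can never exceed sqrt(n - 1), so a
# threshold of 3 needs at least 11 samples before anything is flagged.
cpu_usage = [20, 22, 19, 21, 24, 99, 20, 21, 23, 20, 22, 19]
print(detect_anomalies(cpu_usage))  # [99]
```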
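Time Complexity: O(N) overall, with one pass to compute the mean and standard deviation and a second pass to calculate the Z-scores. Space Complexity: O(1) if the statistics are maintained in a streaming fashion; the batch version above holds all N points in memory. More complex ML approaches (like Isolation Forests) cost more: roughly O(N log N) when the trees are built over all N points, although in practice they are usually trained on small fixed-size subsamples.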
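The streaming, O(1)-space variant mentioned above could look something like the sketch below. It uses Welford's online algorithm to keep a running mean and variance, which is one common choice and not necessarily what the original author had in mind. The class name `StreamingZScore` and the `warmup` parameter are hypothetical, introduced only for illustration. Unlike the batch function, this sketch scores each point against the points seen before it, so an outlier does not inflate the statistics it is judged by.

```python
import math

class StreamingZScore:
    """Running mean/variance via Welford's algorithm; O(1) memory."""

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold
        self.warmup = warmup  # assumption: minimum samples before flagging
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Score x against earlier points, then fold it into the statistics."""
        is_anomaly = False
        if self.n >= self.warmup:
            std_dev = math.sqrt(self.m2 / self.n)  # population std so far
            if std_dev > 0 and abs(x - self.mean) / std_dev > self.threshold:
                is_anomaly = True
        # Welford update: no past values need to be stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly

detector = StreamingZScore()
stream = [20, 22, 19, 21, 24, 20, 21, 23, 20, 22, 99, 21]
flagged = [x for x in stream if detector.update(x)]
print(flagged)  # [99]
```

Note that this sketch still folds flagged points into the running statistics; whether to exclude them is a design choice that depends on how much you trust the flag.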