WAF bot detection

Score every request on how bot-like it looks, then let it in, slow it down with a challenge, or turn it away.

The idea

A web application firewall sits in front of your app and inspects each incoming request before it reaches your code. It reads signals: how fast this client is hitting you, whether the headers look like a real browser, the TLS/JA3 fingerprint, the behaviour over time, and the IP's reputation.

Those signals combine into a single risk score. A low score means let it through. A medium score means add friction — a JavaScript or CAPTCHA challenge. A high score means block it outright. It's a graduated risk decision, not a binary on/off switch, because no single signal is ever certain.

How it works

Each signal contributes a weighted amount to the score. The weights say how much that signal matters and how much you trust it. Sum the weighted signals, clamp the result into [0, 1] (the weights below total 1.25, so the raw sum can exceed 1), then compare against two thresholds: block at or above 0.8, challenge at or above 0.5, otherwise allow.

WEIGHTS = {
    "ip_reputation":             0.35,  # known abusive / datacenter range
    "req_rate":                  0.25,  # requests per second from this client
    "missing_js_cookie":         0.20,  # never solved a JS challenge
    "tls_fingerprint_known_bot": 0.30,  # JA3 matches a scraping toolkit
    "ua_anomaly":                0.15,  # user-agent inconsistent / spoofed
}

def score(request):
    s = 0.0
    for signal, weight in WEIGHTS.items():
        s += weight * request.signals.get(signal, 0.0)  # each in [0, 1]
    return max(0.0, min(1.0, s))                          # clamp to [0, 1]

def decide(request):
    s = score(request)
    if s >= 0.8:
        return "block"      # high risk: turn it away
    if s >= 0.5:
        return "challenge"  # medium risk: add friction (JS / CAPTCHA)
    return "allow"          # low risk: let it through
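
The scoring code assumes every signal has already been normalised into [0, 1], but does not say how. As one possible sketch for the request-rate signal, the function below uses a linear ramp between two cut-offs. The name `normalise_rate` and the `human_rps` and `bot_rps` values are hypothetical, chosen only for illustration.

```python
def normalise_rate(rps, human_rps=5.0, bot_rps=50.0):
    """Map requests per second onto [0, 1] with a linear ramp.

    human_rps and bot_rps are illustrative assumptions, not tuned values:
    at or below human_rps the signal is 0, at or above bot_rps it is 1.
    """
    if rps <= human_rps:
        return 0.0
    if rps >= bot_rps:
        return 1.0
    return (rps - human_rps) / (bot_rps - human_rps)


if __name__ == "__main__":
    for rps in (2, 18, 80):
        print(rps, "req/s ->", round(normalise_rate(rps), 2))
```

A ramp keeps the signal proportional in the grey zone instead of flipping from 0 to 1 at a single rate, which suits a score built from several uncertain signals.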

The thresholds are policy, not physics. Raise them to be more permissive (fewer false blocks, more bots get in); lower them to be stricter (fewer bots, but more real users see a challenge).
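
To make the trade-off concrete, here is a minimal sketch that treats the two thresholds as configuration instead of constants. The `Policy` class, the `decide_with` helper and the strict and permissive values are assumptions made for illustration; only the 0.8 and 0.5 defaults come from the code above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    block_at: float = 0.8      # default from the decide() function above
    challenge_at: float = 0.5  # default from the decide() function above


def decide_with(policy, s):
    """Apply a threshold policy to an already-computed score s in [0, 1]."""
    if s >= policy.block_at:
        return "block"
    if s >= policy.challenge_at:
        return "challenge"
    return "allow"


# Hypothetical alternative policies, for comparison only.
POLICIES = {
    "permissive": Policy(block_at=0.9, challenge_at=0.6),
    "default": Policy(),
    "strict": Policy(block_at=0.7, challenge_at=0.4),
}

if __name__ == "__main__":
    for s in (0.45, 0.55, 0.75):
        for name, policy in POLICIES.items():
            print(f"score={s:.2f} {name:<10} -> {decide_with(policy, s)}")
```

The same score can land in a different band under each policy, which is the sense in which the thresholds are policy.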

Cost

Signal                 What it catches
Request rate           Crawlers and scrapers hammering endpoints far faster than a human could click
TLS / JA3 fingerprint  Automation toolkits whose TLS handshake differs from a real browser's, even with a faked user-agent
JS challenge cookie    Headless clients that never run JavaScript, so they never earn the proof-of-work cookie
IP reputation          Datacenter ranges and addresses seen abusing other sites recently
Behavioural            Inhuman patterns: no mouse movement, perfectly even timing, hitting only the API and never the page
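
As a rough illustration of one behavioural signal, "perfectly even timing" can be measured from the spread of the gaps between a client's requests. The sketch below is an assumption, not part of the scoring code above: `timing_regularity` is a hypothetical helper, and the signal would need its own entry in WEIGHTS before it affected the score.

```python
from statistics import mean, pstdev


def timing_regularity(timestamps):
    """Return a value in [0, 1]; higher means more evenly spaced requests.

    timestamps: request arrival times in seconds, in ascending order.
    Uses 1 minus the coefficient of variation of the gaps, floored at 0.
    """
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        return 0.0  # too little data to judge either way
    avg = mean(gaps)
    if avg <= 0:
        return 0.0  # degenerate input (all requests at the same instant)
    cv = pstdev(gaps) / avg
    return max(0.0, 1.0 - cv)


if __name__ == "__main__":
    print(timing_regularity([0, 1, 2, 3, 4]))             # metronome-like client
    print(timing_regularity([0.0, 0.4, 5.0, 5.3, 12.0]))  # bursty, human-like gaps
```

Human browsing tends to produce irregular gaps, so a value close to 1 over many requests is suspicious. Over only a handful of requests it proves little.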

Watch out for

Worked example

A request arrives from a residential IP, 2 req/s, with a valid JS challenge cookie and a browser-matching TLS fingerprint. Its signals are near zero, the score lands around 0.10, and the WAF allows it — a human browsing normally.

A second request comes from a clean-looking residential IP but at 18 req/s with no JS cookie and a slightly odd user-agent. Nothing screams "bot," but several mild signals stack up to roughly 0.62. The WAF challenges it: a real user solves the JS check in a blink and proceeds; an automated client without a JS engine stalls there.

A third request comes from a datacenter IP on a reputation list, 80 req/s, no JS cookie, and a TLS fingerprint matching a known scraping toolkit. The signals pile up past the block threshold to about 0.86, and the WAF blocks it before it ever touches the app.
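
The three cases can be replayed against the weights from "How it works". The text does not give the individual signal values, so the ones below are assumptions picked to land in the same bands; the printed scores depend on those choices and need not match the quoted figures. For brevity, `score` here takes a plain dict of signals, not a request object.

```python
WEIGHTS = {
    "ip_reputation":             0.35,
    "req_rate":                  0.25,
    "missing_js_cookie":         0.20,
    "tls_fingerprint_known_bot": 0.30,
    "ua_anomaly":                0.15,
}


def score(signals):
    """Weighted sum of signals (each in [0, 1]), clamped to [0, 1]."""
    s = sum(w * signals.get(name, 0.0) for name, w in WEIGHTS.items())
    return max(0.0, min(1.0, s))


def decide(signals):
    s = score(signals)
    if s >= 0.8:
        return "block"
    if s >= 0.5:
        return "challenge"
    return "allow"


# Assumed signal values; none of these numbers appear in the text.
browsing_human = {"ip_reputation": 0.1, "req_rate": 0.1,
                  "tls_fingerprint_known_bot": 0.1, "ua_anomaly": 0.1}
suspicious = {"ip_reputation": 0.1, "req_rate": 0.9,
              "missing_js_cookie": 1.0, "ua_anomaly": 0.5}
scraper = {"ip_reputation": 0.8, "req_rate": 0.8,
           "missing_js_cookie": 1.0, "tls_fingerprint_known_bot": 0.6}

if __name__ == "__main__":
    for name, signals in [("human", browsing_human),
                          ("suspicious", suspicious),
                          ("scraper", scraper)]:
        print(f"{name:<10} score={score(signals):.3f} -> {decide(signals)}")
```

With these weights, request rate, missing cookie and user-agent anomaly together contribute at most 0.60, so a client with a clean IP and an ordinary TLS fingerprint can reach the challenge band but cannot reach the block threshold on those three signals alone.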

Check yourself

Why challenge a medium-score request instead of just blocking it?