Score every request on how bot-like it looks, then let it in, slow it down with a challenge, or turn it away.
A web application firewall sits in front of your app and inspects each incoming request before it reaches your code. It reads signals: how fast this client is hitting you, whether the headers look like a real browser, the TLS/JA3 fingerprint, the behaviour over time, and the IP's reputation.
Those signals combine into a single risk score. A low score means let it through. A medium score means add friction — a JavaScript or CAPTCHA challenge. A high score means block it outright. It's a graduated risk decision, not a binary on/off switch, because no single signal is ever certain.
Each signal contributes a weighted amount to the score. The weights say how much that signal matters and how much you trust it. Sum them, clamp the result into [0, 1], then compare against two thresholds: at or above 0.8 block, at or above 0.5 challenge, otherwise allow. The weights do not have to sum to 1; the ones below sum to 1.25, which is why the clamp matters when most signals fire at once.
```python
WEIGHTS = {
    "ip_reputation": 0.35,              # known abusive / datacenter range
    "req_rate": 0.25,                   # request rate from this client, scaled to [0, 1]
    "missing_js_cookie": 0.20,          # never solved a JS challenge
    "tls_fingerprint_known_bot": 0.30,  # JA3 matches a scraping toolkit
    "ua_anomaly": 0.15,                 # user-agent inconsistent / spoofed
}


def score(request):
    """Return the weighted sum of the request's signals, clamped to [0, 1]."""
    s = 0.0
    for signal, weight in WEIGHTS.items():
        s += weight * request.signals.get(signal, 0.0)  # each in [0, 1]
    return max(0.0, min(1.0, s))  # clamp to [0, 1]


def decide(request):
    """Map the risk score to one of three actions."""
    s = score(request)
    if s >= 0.8:
        return "block"      # high risk: turn it away
    if s >= 0.5:
        return "challenge"  # medium risk: add friction (JS / CAPTCHA)
    return "allow"          # low risk: let it through
```
The thresholds are policy, not physics. Raise them to be more permissive (fewer false blocks, more bots get in); lower them to be stricter (fewer bots, but more real users see a challenge).
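To make the trade-off concrete, here is a minimal sketch that treats the two cut-offs as configuration instead of hard-coded constants. The `Policy` class, the `decide_with_policy` function, and the "permissive" and "strict" values are hypothetical names and numbers chosen for illustration; only the default 0.5 / 0.8 pair comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical container for the two cut-offs (illustration only)."""
    challenge_at: float = 0.5
    block_at: float = 0.8

    def __post_init__(self):
        # The challenge band must sit below the block band, inside [0, 1].
        if not 0.0 <= self.challenge_at <= self.block_at <= 1.0:
            raise ValueError("expected 0 <= challenge_at <= block_at <= 1")


def decide_with_policy(s: float, policy: Policy) -> str:
    """Same three-way decision as decide(), with configurable thresholds."""
    if s >= policy.block_at:
        return "block"
    if s >= policy.challenge_at:
        return "challenge"
    return "allow"


default = Policy()                                   # 0.5 / 0.8, as in the text
permissive = Policy(challenge_at=0.6, block_at=0.9)  # assumed example values
strict = Policy(challenge_at=0.4, block_at=0.7)      # assumed example values

for s in (0.45, 0.62, 0.86):
    print(
        s,
        decide_with_policy(s, default),
        decide_with_policy(s, permissive),
        decide_with_policy(s, strict),
    )
```

Under these assumed values, a score of 0.86 is blocked by the default and strict policies but only challenged by the permissive one, while a score of 0.45 is allowed by default and challenged under the strict policy.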
| Signal | What it catches |
|---|---|
| Request rate | Crawlers and scrapers hammering endpoints far faster than a human could click |
| TLS / JA3 fingerprint | Automation toolkits whose TLS handshake differs from a real browser's, even with a faked user-agent |
| JS challenge cookie | Non-browser clients that never run JavaScript, so they never earn the challenge cookie |
| IP reputation | Datacenter ranges and addresses seen abusing other sites recently |
| Behavioural | Inhuman patterns: no mouse movement, perfectly even timing, hitting only the API and never the page |
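The scoring code expects every signal as a number in [0, 1], so a raw measurement such as requests per second has to be scaled first. One possible sketch for the request-rate signal follows. The names `SlidingWindowRate` and `rate_signal`, the 10-second window, and the 2 req/s and 20 req/s cut-offs are all assumptions made for illustration, not values from any particular WAF.

```python
from collections import deque


def rate_signal(req_per_sec: float,
                human_ceiling: float = 2.0,
                bot_floor: float = 20.0) -> float:
    """Map a raw rate to [0, 1] with a linear ramp between two assumed cut-offs."""
    if req_per_sec <= human_ceiling:
        return 0.0
    if req_per_sec >= bot_floor:
        return 1.0
    return (req_per_sec - human_ceiling) / (bot_floor - human_ceiling)


class SlidingWindowRate:
    """Average requests per second over the last `window` seconds, per client."""

    def __init__(self, window: float = 10.0):
        self.window = window
        self.timestamps = deque()

    def observe(self, now: float) -> float:
        """Record one request at time `now` and return the current rate."""
        self.timestamps.append(now)
        # Drop timestamps that have fallen out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        return len(self.timestamps) / self.window


# 180 evenly spaced requests in just under 10 seconds, i.e. 18 req/s.
counter = SlidingWindowRate(window=10.0)
rate = 0.0
for i in range(180):
    rate = counter.observe(now=i / 18)

print(rate, rate_signal(rate))
```

With these cut-offs, 2 req/s maps to 0.0, 80 req/s saturates at 1.0, and 18 req/s lands a little below 0.9. A production system would more likely tune the ramp per endpoint than use one global pair of numbers.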
A request arrives from a residential IP, 2 req/s, with a valid JS challenge cookie and a browser-matching TLS fingerprint. Its signals are near zero, the score lands around 0.10, and the WAF allows it — a human browsing normally.
A second request comes from a clean-looking residential IP but at 18 req/s with no JS cookie and a slightly odd user-agent. Nothing screams "bot," but several mild signals, plus small residual contributions from the IP and TLS checks, stack up to roughly 0.62. The WAF challenges it: a real user solves the JS check in a blink and proceeds; an automated client without a JS engine stalls there.
A third request comes from a datacenter IP on a reputation list, 80 req/s, no JS cookie, and a TLS fingerprint matching a known scraping toolkit. The signals pile up past the block threshold to about 0.86, and the WAF blocks it before it ever touches the app.
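The three scenarios can be replayed against the scoring code as a sanity check. This is a sketch under stated assumptions: the `Request` dataclass is a hypothetical stand-in for the real request object, and the individual signal values are guesses picked to land near the scores quoted above. With these weights, rate, missing cookie, and user-agent can contribute at most 0.60 combined, so the second scenario assumes small contributions from the IP and TLS signals; likewise, four signals at full strength would clamp to 1.0, so the third scenario assumes partial confidence on some of them.

```python
from dataclasses import dataclass, field

WEIGHTS = {
    "ip_reputation": 0.35,
    "req_rate": 0.25,
    "missing_js_cookie": 0.20,
    "tls_fingerprint_known_bot": 0.30,
    "ua_anomaly": 0.15,
}


@dataclass
class Request:
    """Hypothetical stand-in for the WAF's request object."""
    signals: dict = field(default_factory=dict)


def score(request):
    s = sum(w * request.signals.get(name, 0.0) for name, w in WEIGHTS.items())
    return max(0.0, min(1.0, s))  # clamp to [0, 1]


def decide(request):
    s = score(request)
    if s >= 0.8:
        return "block"
    if s >= 0.5:
        return "challenge"
    return "allow"


# Assumed signal values, each in [0, 1]; missing signals default to 0.0.
browsing_human = Request({"ip_reputation": 0.1, "req_rate": 0.1, "ua_anomaly": 0.25})
fast_residential = Request({
    "ip_reputation": 0.1,
    "req_rate": 1.0,
    "missing_js_cookie": 1.0,
    "tls_fingerprint_known_bot": 0.15,
    "ua_anomaly": 0.6,
})
datacenter_scraper = Request({
    "ip_reputation": 0.8,
    "req_rate": 0.9,
    "missing_js_cookie": 1.0,
    "tls_fingerprint_known_bot": 0.5,
})

for name, req in [
    ("browsing_human", browsing_human),
    ("fast_residential", fast_residential),
    ("datacenter_scraper", datacenter_scraper),
]:
    print(f"{name}: score={score(req):.3f} -> {decide(req)}")
```

The decisions come out as allow, challenge, and block respectively, matching the three walkthroughs.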
Why challenge a medium-score request instead of just blocking it?