A free-form string your client mumbles on every request: “hi, I’m this browser, on this OS” — and the server tries to make sense of it.
When your browser asks a server for a page, it attaches a request header called User-Agent: one plain string that names the software, rendering engine, operating system, and device. The server reads it to guess who is calling: a desktop Chrome, an iPhone, a search-engine crawler.
The catch is that the string is free-form text the client controls. Decades of “pretend to be the other browser so sites don’t lock me out” left it stuffed with vestigial tokens: a leading Mozilla/5.0, the parenthesized (KHTML, like Gecko), and a trailing Safari even on Chrome. Servers tokenize it heuristically for analytics, feature gating, and bot detection, but because anyone can send anything, it is spoofable and unreliable. That’s why the platform is shifting to structured User-Agent Client Hints (the Sec-CH-UA headers).
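To make “anyone can send anything” concrete, here is a minimal sketch that builds a request claiming to be Safari on an iPhone. It uses only Python's standard `urllib.request`; the URL and the sample User-Agent string are illustrative assumptions, and nothing is actually sent over the network.

```python
import urllib.request

# Hypothetical endpoint: we only build the request object, we never send it.
URL = "https://example.com/"

# An illustrative iPhone Safari string. The client simply types it in.
FAKE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 "
    "Mobile/15E148 Safari/604.1"
)

req = urllib.request.Request(URL, headers={"User-Agent": FAKE_UA})

# urllib stores header names capitalized, so the key is "User-agent".
claimed = req.get_header("User-agent")
print(claimed == FAKE_UA)
```

A server receiving this request has no way to tell, from the header alone, that the sender is a script rather than a phone.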
A real parser does not “understand” the string — it runs ordered substring and regex checks and takes the first plausible match. Order is everything: Edge’s string contains Chrome, and almost every Chromium/Safari string contains Safari, so you must test the most specific token first. Bot detection is a coarse keyword sweep; device type leans on Mobile/Android/iPhone.
```python
import re

def parse_ua(ua):
    s = ua  # raw, attacker-controlled: keep the checks cheap

    # 1) Bots first: coarse keyword sweep, before any browser guess
    if re.search(r"bot|spider|crawl", s, re.I):
        ver = re.search(r"(?:bot|spider)/(\d+(?:\.\d+)?)", s, re.I)
        return {"kind": "bot", "version": ver.group(1) if ver else None}

    # 2) Browser: MOST SPECIFIC token wins, so order matters
    browser, m = "Unknown", None
    if "Edg/" in s:                            # Edge ships "Chrome" too!
        browser, m = "Edge", re.search(r"Edg/(\d+)", s)
    elif "OPR/" in s or "Opera" in s:          # Opera also carries "Chrome"
        # Strings with "Opera" but no "OPR/" get no version here
        browser, m = "Opera", re.search(r"OPR/(\d+)", s)
    elif "Chrome/" in s and "Safari" in s:     # plain Chromium Chrome
        browser, m = "Chrome", re.search(r"Chrome/(\d+)", s)
    elif "Firefox/" in s:
        browser, m = "Firefox", re.search(r"Firefox/(\d+)", s)
    elif "Safari/" in s and "Version/" in s:   # real Safari, not Chrome
        browser, m = "Safari", re.search(r"Version/(\d+)", s)
    version = m.group(1) if m else None

    # 3) OS: first substring match wins. iPhone/iPad must precede "Mac OS X"
    #    because iOS strings say "like Mac OS X"; Android must precede
    #    "Linux" because Android strings contain "Linux".
    os_name = next((name for tok, name in [
        ("Windows NT", "Windows"), ("iPhone", "iOS"),
        ("iPad", "iPadOS"), ("Mac OS X", "macOS"),
        ("Android", "Android"), ("Linux", "Linux"),
    ] if tok in s), "Unknown")

    # 4) Device: a hint, not a fact
    device = "mobile" if re.search(r"Mobile|Android|iPhone", s) else "desktop"
    return {"kind": "browser", "browser": browser, "version": version,
            "os": os_name, "device": device}
```
Notice the Safari and like Gecko tokens on a Chrome string are pure noise — they exist only so old servers don’t reject Chromium. A good parser ignores them once a more specific token has matched.
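The ordering rule can be shown in isolation. The sketch below is not part of the parser above: `first_match`, `SPECIFIC_FIRST`, and `GENERIC_FIRST` are hypothetical names, and the Edge string is an illustrative sample. It runs the same first-match logic over the same token list in two different orders.

```python
def first_match(ua, rules):
    """Return the label of the first rule whose token occurs in ua."""
    for token, label in rules:
        if token in ua:
            return label
    return "Unknown"

# Illustrative Edge string: note it carries Chrome/ and Safari/ as well.
EDGE_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0"
)

# Most specific token first, most generic last.
SPECIFIC_FIRST = [
    ("Edg/", "Edge"),
    ("OPR/", "Opera"),
    ("Chrome/", "Chrome"),
    ("Safari/", "Safari"),
]
# The same rules in the wrong order.
GENERIC_FIRST = list(reversed(SPECIFIC_FIRST))

print(first_match(EDGE_UA, SPECIFIC_FIRST))  # Edge
print(first_match(EDGE_UA, GENERIC_FIRST))   # Safari
```

The rules themselves are identical in both lists; only their order decides whether the Edge user is labeled correctly.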
| Dimension | What you get |
|---|---|
| Parse cost | O(n) over a short string (a few hundred chars) — effectively free per request. |
| Maintenance | Brittle. Hand-rolled regexes rot as browsers add or freeze tokens; a maintained library (ua-parser) is safer than DIY. |
| Trust | Zero. Fully client-controlled and trivially spoofed — never a security or auth signal. |
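| Successor | Client Hints (Sec-CH-UA, Sec-CH-UA-Platform, Sec-CH-UA-Mobile) give structured fields instead of one guessable blob. Supporting browsers send these three low-entropy hints by default on secure connections; more detailed hints (such as Sec-CH-UA-Full-Version-List) are sent only after the server requests them with Accept-CH. |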
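For contrast with the free-form string, here is a minimal sketch of reading the three default Client Hints headers. It assumes the headers arrive in a plain dict with exactly these key spellings (a hypothetical shape; real frameworks differ), uses a regex rather than a full structured-field parser, and filters the deliberately meaningless "Not A Brand" style entries with a simple heuristic that is also an assumption.

```python
import re

def parse_client_hints(headers):
    """Extract brands, mobile flag, and platform from Client Hints headers."""
    # Sec-CH-UA is a list of "brand";v="major version" pairs.
    pairs = re.findall(r'"([^"]*)"\s*;\s*v="([^"]*)"',
                       headers.get("Sec-CH-UA", ""))

    brands = {}
    for brand, version in pairs:
        name = brand.strip().lower()
        # Heuristic (assumption): skip placeholder brands such as
        # "Not_A Brand", which browsers add as intentional noise.
        if name.startswith("not") and "brand" in name:
            continue
        brands[brand.strip()] = version

    # Sec-CH-UA-Mobile is a boolean: "?1" is true, "?0" is false.
    mobile_raw = headers.get("Sec-CH-UA-Mobile")
    mobile = None if mobile_raw is None else mobile_raw.strip() == "?1"

    # Sec-CH-UA-Platform is a quoted string such as "macOS".
    platform_raw = headers.get("Sec-CH-UA-Platform")
    platform = platform_raw.strip().strip('"') if platform_raw else None

    return {"brands": brands, "mobile": mobile, "platform": platform}

sample = {
    "Sec-CH-UA": '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
    "Sec-CH-UA-Mobile": "?0",
    "Sec-CH-UA-Platform": '"macOS"',
}
print(parse_client_hints(sample))
```

These fields are still client-supplied, so they are easier to parse than the User-Agent string but no more trustworthy as a security signal.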
Edge contains Chrome; Opera contains Chrome; nearly everything contains Safari. Check the most specific token (Edg/, OPR/) before the generic one, or you’ll mislabel every Edge user as Chrome.
For feature gating, test the capability directly (for example, 'IntersectionObserver' in window), since the UA lies and lags behind the actual engine.
Take Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36. We sweep for bot|spider|crawl and find no match, so it’s not a crawler. We look for Edg/ and OPR/; both are absent, so it isn’t Edge or Opera. We see Chrome/ alongside Safari, so browser = Chrome, and Chrome/(\d+) pulls version 120. For OS, Mac OS X is the first token that matches, so the OS is macOS. There is no Mobile, Android, or iPhone token, so device = desktop. The leading Mozilla/5.0, the AppleWebKit engine tag, the (KHTML, like Gecko) aside, and the trailing Safari/537.36 are all compatibility cruft we read past once Chrome is identified.
1. An Edge user-agent contains the token Chrome/120. Your parser checks "Chrome/" in s before "Edg/" in s. What does it report, and why is that wrong?
2. Your login endpoint blocks any request whose User-Agent doesn’t look like a real browser. Is that a sound defense?