The User-Agent header

A free-form string your client mumbles on every request: “hi, I’m this browser, on this OS” — and the server tries to make sense of it.

The idea

When your browser asks a server for a page, it attaches a request header called User-Agent, a single plain string that names the software, rendering engine, operating system, and device. The server reads it to guess who is calling: a desktop Chrome, an iPhone, a search-engine crawler.

The catch is that the string is free-form text the client controls. Decades of “pretend to be the other browser so sites don’t lock me out” left it stuffed with vestigial tokens like Mozilla/5.0, KHTML, like Gecko, and a trailing Safari even on Chrome. Servers tokenize it heuristically for analytics, feature gating, and bot detection — but because anyone can send anything, it is spoofable and unreliable. That’s why the platform is shifting to structured Sec-CH-UA Client Hints.

How it works

A real parser does not “understand” the string — it runs ordered substring and regex checks and takes the first plausible match. Order is everything: Edge’s string contains Chrome, and almost every Chromium/Safari string contains Safari, so you must test the most specific token first. Bot detection is a coarse keyword sweep; device type leans on Mobile/Android/iPhone.

import re

def parse_ua(ua):
    s = ua  # raw, attacker-controlled — keep checks cheap and anchored

    # 1) Bots first: cheap keyword sweep, before any browser guess
    if re.search(r"bot|spider|crawl", s, re.I):
        ver = re.search(r"(?:bot|spider)/(\d+(?:\.\d+)?)", s, re.I)
        return {"kind": "bot", "version": ver.group(1) if ver else None}

    # 2) Browser — MOST SPECIFIC token wins, so order matters
    browser, m = "Unknown", None
    if "Edg/" in s:                         # Edge ships "Chrome" too!
        browser, m = "Edge", re.search(r"Edg/(\d+)", s)
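    elif "OPR/" in s or "Opera" in s:       # Opera also carries "Chrome"
        # Legacy Opera has no OPR/ token; fall back to its Version/ token
        browser = "Opera"
        m = re.search(r"OPR/(\d+)", s) or re.search(r"Version/(\d+)", s)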
    elif "Chrome/" in s and "Safari" in s:  # plain Chromium Chrome
        browser, m = "Chrome", re.search(r"Chrome/(\d+)", s)
    elif "Firefox/" in s:
        browser, m = "Firefox", re.search(r"Firefox/(\d+)", s)
    elif "Safari/" in s and "Version/" in s:  # real Safari, not Chrome
        browser, m = "Safari", re.search(r"Version/(\d+)", s)
    version = m.group(1) if m else None

    # 3) OS: first substring match wins, so order matters here too.
    #    iPhone/iPad strings also contain "like Mac OS X"; Android contains "Linux".
    os_name = next((name for tok, name in [
        ("Windows NT", "Windows"), ("iPhone", "iOS"),
        ("iPad", "iPadOS"), ("Android", "Android"),
        ("Mac OS X", "macOS"), ("Linux", "Linux"),
    ] if tok in s), "Unknown")

    # 4) Device — a hint, not a fact
    device = "mobile" if re.search(r"Mobile|Android|iPhone", s) else "desktop"

    return {"kind": "browser", "browser": browser, "version": version,
            "os": os_name, "device": device}

Notice the Safari and like Gecko tokens on a Chrome string are pure noise — they exist only so old servers don’t reject Chromium. A good parser ignores them once a more specific token has matched.
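
To make the ordering point concrete, here is a small self-contained sketch that runs a generic-first check and a specific-first check against the same Edge string. The function names `naive_browser` and `ordered_browser` and the sample string are illustrative assumptions, not part of the parser above.

```python
# Illustrative Edge-on-Windows string (assumed sample; real version numbers vary).
EDGE_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0"
)


def naive_browser(ua: str) -> str:
    """Checks the generic token first, so Edge is mislabeled as Chrome."""
    if "Chrome/" in ua:
        return "Chrome"
    if "Edg/" in ua:
        return "Edge"
    return "Unknown"


def ordered_browser(ua: str) -> str:
    """Checks the most specific tokens before the generic one."""
    for token, name in (("Edg/", "Edge"), ("OPR/", "Opera"), ("Chrome/", "Chrome")):
        if token in ua:
            return name
    return "Unknown"


print(naive_browser(EDGE_UA))    # Chrome (wrong)
print(ordered_browser(EDGE_UA))  # Edge
```

Both functions see exactly the same tokens; only the order of the checks differs.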

Cost / trade-offs

Dimension	What you get
Parse cost	O(n) over a short string (a few hundred chars) — effectively free per request.
Maintenance	Brittle. Hand-rolled regexes rot as browsers add or freeze tokens; a maintained library (ua-parser) is safer than DIY.
Trust	Zero. Fully client-controlled and trivially spoofed — never a security or auth signal.
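Successor	Client Hints (`Sec-CH-UA`, `Sec-CH-UA-Platform`, `Sec-CH-UA-Mobile`) give structured fields instead of one guessable blob. Chromium-based browsers send these three low-entropy hints by default over HTTPS; more detailed ones (for example `Sec-CH-UA-Full-Version-List`) are sent only when the server requests them via `Accept-CH`.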
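
As a rough illustration of what "structured" buys you, the sketch below reads the three Client Hints headers named in the table. The function name `parse_client_hints`, the lowercase-keyed `headers` dict, and the sample values are assumptions for the example; the regex is a simplification, not a full Structured Fields parser.

```python
import re

# Matches one  "Brand";v="123"  entry. Simplified for illustration only.
_BRAND_RE = re.compile(r'"([^"]*)";\s*v="([^"]*)"')


def parse_client_hints(headers: dict) -> dict:
    """Extract brands, platform, and mobile flag from UA Client Hints headers.

    `headers` is assumed to be a plain dict with lowercase header names.
    """
    brands = dict(_BRAND_RE.findall(headers.get("sec-ch-ua", "")))
    platform = headers.get("sec-ch-ua-platform", "").strip('"') or None
    mobile_raw = headers.get("sec-ch-ua-mobile")
    # The header carries a boolean written as ?1 (true) or ?0 (false).
    mobile = None if mobile_raw is None else mobile_raw.strip() == "?1"
    return {"brands": brands, "platform": platform, "mobile": mobile}


# Sample values are illustrative; real brand lists vary by browser and version.
sample = {
    "sec-ch-ua": '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"macOS"',
}
hints = parse_client_hints(sample)
print(hints["brands"].get("Google Chrome"), hints["platform"], hints["mobile"])
```

No ordering tricks are needed here because each fact arrives in its own field. The values are still sent by the client, so the Trust row applies to them as well.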

Watch out for

Never use it for auth or security. It is free-form text the client picks. Basing access control, CSRF defenses, or rate-limit trust on the UA means trusting a value the attacker writes.
Ordering bugs. Edge contains Chrome; Opera contains Chrome; nearly everything contains Safari. Check the most specific token (Edg/, OPR/) before the generic one, or you’ll mislabel every Edge user as Chrome.
UA freezing / reduced strings. Modern Chrome reports the minor, build, and patch version as 0.0.0 and pins platform detail to fixed values. Parsers that depend on exact sub-versions or fine OS info silently break; the missing data now lives in Client Hints.
Feature detection. Don’t infer “does this browser support X” from the UA. Test the capability directly ('IntersectionObserver' in window), since the UA lies and lags behind the actual engine.
Bots impersonate browsers, and regexes can hang. Scrapers happily send a real Chrome UA, so the string proves nothing. And a greedy, backtracking regex run on attacker-controlled input can blow up into ReDoS — keep patterns anchored and simple.
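
One way to keep the ReDoS risk low is to bound the input length and stick to a pattern with no nested quantifiers. The sketch below assumes a hypothetical helper `looks_like_declared_bot` and an arbitrary cap `MAX_UA_LEN`; both are illustrative choices, not recommendations from any library.

```python
import re

MAX_UA_LEN = 512  # assumed cap; pick a limit that suits your traffic

# Plain alternation with no nested quantifiers, so matching time stays
# linear in the (already bounded) input length.
_BOT_RE = re.compile(r"bot|spider|crawl", re.IGNORECASE)


def looks_like_declared_bot(ua) -> bool:
    """True only if the UA announces itself as a bot.

    A False result proves nothing: a scraper can send a stock Chrome string.
    """
    if not ua:
        return False
    # Only the first MAX_UA_LEN characters are inspected.
    return _BOT_RE.search(ua[:MAX_UA_LEN]) is not None
```

The trade-off is that a keyword placed beyond the cap is missed. That is acceptable here because the result is only a hint in either direction.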

Worked example

Take Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36. We sweep for bot|spider|crawl — no match, so it’s not a crawler. We look for Edg/ and OPR/ — absent, so it isn’t Edge or Opera. We see Chrome/ alongside Safari, so browser = Chrome, and Chrome/(\d+) pulls version 120. For OS, Mac OS X matches first → macOS. No Mobile/Android/iPhone token, so device = desktop. The leading Mozilla/5.0, the AppleWebKit engine tag, the KHTML, like Gecko aside, and the trailing Safari/537.36 are all compatibility cruft we read past once Chrome was identified.
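
To see the cruft laid out, the string above can be split into its `name/version` products and their parenthesized comments. This is a minimal sketch: `split_products` is a hypothetical helper, and the regex only handles the common shape shown here, not every string found in the wild.

```python
import re

# One product token: name/version, optionally followed by a (comment).
_PRODUCT_RE = re.compile(r"([^\s/()]+)/([^\s()]+)(?:\s*\(([^)]*)\))?")


def split_products(ua: str):
    """Return (name, version, comment) triples in the order they appear."""
    return _PRODUCT_RE.findall(ua)


UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

for name, version, comment in split_products(UA):
    print(f"{name:12} {version:10} {comment}")
```

Of the four products this yields, only Chrome names the browser, and the OS sits inside the comment attached to Mozilla/5.0.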

Check yourself

1. An Edge user-agent contains the token Chrome/120. Your parser checks "Chrome/" in s before "Edg/" in s. What does it report, and why is that wrong?

2. Your login endpoint blocks any request whose User-Agent doesn’t look like a real browser. Is that a sound defense?