The User-Agent header

A free-form string your client mumbles on every request: “hi, I’m this browser, on this OS” — and the server tries to make sense of it.

The idea

When your browser asks a server for a page, it attaches a request header called User-Agent: one plain string that names the software, rendering engine, operating system, and device. The server reads it to guess who is calling: a desktop Chrome, an iPhone, a search-engine crawler.

The catch is that the string is free-form text the client controls. Decades of “pretend to be the other browser so sites don’t lock me out” left it stuffed with vestigial tokens like Mozilla/5.0, KHTML, like Gecko, and a trailing Safari even on Chrome. Servers tokenize it heuristically for analytics, feature gating, and bot detection — but because anyone can send anything, it is spoofable and unreliable. That’s why the platform is shifting to structured Sec-CH-UA Client Hints.
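To make the "vestigial tokens" concrete, here is a minimal sketch of a tokenizer that splits a User-Agent string into product tokens (Name/version) and parenthesized comments. The helper name tokenize_ua and the regex are illustrative assumptions, and the pattern is a simplification rather than the full HTTP grammar.

```python
import re

# Hypothetical helper, not from any library. Matches either a parenthesized
# comment or a product token with an optional "/version" suffix.
_TOKEN = re.compile(r"\(([^)]*)\)|([^\s/()]+)(?:/([^\s()]+))?")

def tokenize_ua(ua):
    """Return ("comment", text) and ("product", name, version) tuples in order."""
    tokens = []
    for comment, name, version in _TOKEN.findall(ua):
        if name:
            tokens.append(("product", name, version or None))
        else:
            tokens.append(("comment", comment))
    return tokens

sample = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
for token in tokenize_ua(sample):
    print(token)
```

On a Chrome string like the sample, only one of the four product tokens names the actual browser; the rest are compatibility leftovers.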

How it works

A real parser does not “understand” the string — it runs ordered substring and regex checks and takes the first plausible match. Order is everything: Edge’s string contains Chrome, and almost every Chromium/Safari string contains Safari, so you must test the most specific token first. Bot detection is a coarse keyword sweep; device type leans on Mobile/Android/iPhone.

import re

def parse_ua(ua):
    s = ua  # raw, attacker-controlled — keep checks cheap and anchored

    # 1) Bots first: cheap keyword sweep, before any browser guess
    if re.search(r"bot|spider|crawl", s, re.I):
        ver = re.search(r"(?:bot|spider)/(\d+(?:\.\d+)?)", s, re.I)
        return {"kind": "bot", "version": ver.group(1) if ver else None}

    # 2) Browser — MOST SPECIFIC token wins, so order matters
    browser, m = "Unknown", None
    if "Edg/" in s:                         # Edge ships "Chrome" too!
        browser, m = "Edge", re.search(r"Edg/(\d+)", s)
    elif "OPR/" in s or "Opera" in s:       # Opera also carries "Chrome"
        browser, m = "Opera", re.search(r"OPR/(\d+)", s)
    elif "Chrome/" in s and "Safari" in s:  # plain Chromium Chrome
        browser, m = "Chrome", re.search(r"Chrome/(\d+)", s)
    elif "Firefox/" in s:
        browser, m = "Firefox", re.search(r"Firefox/(\d+)", s)
    elif "Safari/" in s and "Version/" in s:  # real Safari, not Chrome
        browser, m = "Safari", re.search(r"Version/(\d+)", s)
    version = m.group(1) if m else None

    # 3) OS: most specific token first. iPhone/iPad strings also contain
    #    "like Mac OS X", and Android strings also contain "Linux".
    os_name = next((name for tok, name in [
        ("Windows NT", "Windows"), ("iPhone", "iOS"),
        ("iPad", "iPadOS"), ("Android", "Android"),
        ("Mac OS X", "macOS"), ("Linux", "Linux"),
    ] if tok in s), "Unknown")

    # 4) Device — a hint, not a fact
    device = "mobile" if re.search(r"Mobile|Android|iPhone", s) else "desktop"

    return {"kind": "browser", "browser": browser, "version": version,
            "os": os_name, "device": device}

Notice the Safari and like Gecko tokens on a Chrome string are pure noise — they exist only so old servers don’t reject Chromium. A good parser ignores them once a more specific token has matched.
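One way to keep the ordering explicit is to hold the browser checks in a list and walk it top to bottom. The sketch below is an assumption-laden variant of the browser step only: BROWSER_RULES and detect_browser are hypothetical names, and the rule set is deliberately small.

```python
import re

# Hypothetical rule table: (browser name, version regex), most specific first.
BROWSER_RULES = [
    ("Edge",    r"Edg/(\d+)"),
    ("Opera",   r"OPR/(\d+)"),
    ("Chrome",  r"Chrome/(\d+)"),
    ("Firefox", r"Firefox/(\d+)"),
    ("Safari",  r"Version/(\d+).*Safari/"),
]

def detect_browser(ua, rules=BROWSER_RULES):
    """Return (name, major_version) for the first rule that matches."""
    for name, pattern in rules:
        m = re.search(pattern, ua)
        if m:
            return name, m.group(1)
    return "Unknown", None

EDGE_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
           "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0")

print(detect_browser(EDGE_UA))               # ('Edge', '120')

# Moving the generic Chrome rule to the front misreports the same string.
wrong_order = [BROWSER_RULES[2]] + BROWSER_RULES[:2] + BROWSER_RULES[3:]
print(detect_browser(EDGE_UA, wrong_order))  # ('Chrome', '120')
```

With the rules as data, reordering or adding a browser is a one-line change, and the order dependence is visible at a glance.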

Cost / trade-offs

Dimension | What you get
--- | ---
Parse cost | O(n) over a short string (a few hundred chars); effectively free per request.
Maintenance | Brittle. Hand-rolled regexes rot as browsers add or freeze tokens; a maintained library (ua-parser) is safer than DIY.
Trust | Zero. Fully client-controlled and trivially spoofed; never a security or auth signal.
Successor | Client Hints (Sec-CH-UA, Sec-CH-UA-Platform, Sec-CH-UA-Mobile) give structured fields instead of one guessable blob; more detailed hints are sent only when the server requests them with Accept-CH.
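For contrast with the regex sweep over a free-form string, here is a rough sketch of reading the three Client Hints headers named above. It assumes the headers arrive as a plain dict with exactly these key spellings; parse_client_hints is a hypothetical helper, and the regex is a shortcut that ignores escaped quotes, which a full Structured Fields parser would handle.

```python
import re

# Matches one `"brand";v="version"` item in a Sec-CH-UA value.
# Simplified: assumes no escaped quotes inside the brand or version.
_BRAND = re.compile(r'"([^"]*)"\s*;\s*v="([^"]*)"')

def parse_client_hints(headers):
    """Read Sec-CH-UA, Sec-CH-UA-Mobile and Sec-CH-UA-Platform from a dict."""
    brands = dict(_BRAND.findall(headers.get("Sec-CH-UA", "")))
    mobile = headers.get("Sec-CH-UA-Mobile")      # "?1", "?0", or absent
    platform = headers.get("Sec-CH-UA-Platform")  # quoted string, or absent
    return {
        "brands": brands,
        "mobile": {"?1": True, "?0": False}.get(mobile),  # None if absent
        "platform": platform.strip('"') if platform else None,
    }

example = {
    "Sec-CH-UA": '"Chromium";v="120", "Not_A Brand";v="8", "Google Chrome";v="120"',
    "Sec-CH-UA-Mobile": "?0",
    "Sec-CH-UA-Platform": '"macOS"',
}
print(parse_client_hints(example))
```

Each field answers one question, so there is no ordering puzzle to solve. The values are still supplied by the client, so the Trust row applies to them as well.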

Worked example

Take Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36. We sweep for bot|spider|crawl and find no match, so it’s not a crawler. We look for Edg/ and OPR/; both are absent, so it isn’t Edge or Opera. We see Chrome/ alongside Safari, so browser = Chrome, and Chrome/(\d+) pulls version 120. For OS, Mac OS X is the first token that matches, so os = macOS. There is no Mobile/Android/iPhone token, so device = desktop. The leading Mozilla/5.0, the AppleWebKit engine tag, the KHTML, like Gecko aside, and the trailing Safari/537.36 are all compatibility cruft we read past once Chrome is identified.

Check yourself

1. An Edge user-agent contains the token Chrome/120. Your parser checks "Chrome/" in s before "Edg/" in s. What does it report, and why is that wrong?

2. Your login endpoint blocks any request whose User-Agent doesn’t look like a real browser. Is that a sound defense?