Regular expression denial of service (ReDoS)

A tiny input string can make a regex engine try billions of paths and freeze the server.

The idea

Most regex engines that everyday languages ship (Python's re, JavaScript, Java, Ruby, PCRE) match by backtracking: when an attempt fails, they back up and try a different way of dividing the input among the pattern's pieces.

For most patterns that's fine. But a nested quantifier like (a+)+$ creates many different ways to split the same run of characters. When the input almost matches but fails right at the end, the engine retries every one of those splits before giving up — and the number of splits grows like 2ⁿ. A 30-character string can cost over a billion attempts, hanging the request and starving the whole server.

See it work

Number of a's (n): 6

Pick n, then run the engine on n a’s followed by a non-matching X.

How it works

The pattern (a+)+$ has two stacked greedy quantifiers. The inner a+ can grab some a's; the outer (…)+ can repeat the group to grab more. For a string of n a's there are 2ⁿ⁻¹ ways to partition those a's between the two quantifiers — and all of them ultimately consume the same n a's.

Append a character that can't match (here X, which fails the $ end-anchor). Now the engine reaches the X, fails, and backtracks: it returns to the last quantifier, hands one fewer a to it, and tries the next partition. Each fails the same way. The engine grinds through every partition — roughly 2ⁿ of them — before it can answer "no match." That is catastrophic backtracking.

# DANGEROUS: nested quantifier — exponential backtracking
import re
re.match(r'(a+)+$', s)          # "aaaaaaaaaaaaaaaaX" can hang for minutes

# SAFER fixes
re.match(r'a+$', s)             # un-nest: one quantifier, linear time
re.match(r'(?>(a+))+$', s)      # atomic group: no backtrack into the group
import re2                       # RE2-style linear engine (no backtracking)
re2.match(r'(a+)+$', s)         # always O(n), immune to ReDoS

A linear engine (Google's RE2, Rust's regex, or a classic Thompson NFA) sidesteps this entirely. It tracks all possible match positions at once in a single left-to-right pass, so its cost is O(n) no matter how the pattern nests. The trade-off: it drops a few backtracking-only features such as backreferences.

Cost

Worst-case attempts for (a+)+$ against a string of n a's plus one failing character. "Time" assumes a fast engine at roughly 10 million attempts per second.

Input length n	Backtracking (~2ⁿ)	RE2 / linear (O(n))	Approx. wall time
10	~1,000	10	instant
20	~1 million	20	~0.1 s
30	~1 billion	30	~2 minutes
40	~1 trillion	40	~1.5 days
50	~10¹⁵	50	~3,500 years

Each row adds one character to the input but doubles the work for the backtracking engine. The linear engine grows by exactly one unit. That gap is the whole vulnerability: an attacker pays for a few extra bytes; the server pays exponentially.

Watch out for

Nested quantifiers: (a+)+, (a*)*, (a+)* — a quantifier applied to a group that itself contains a quantifier. This is the classic ReDoS shape.
Overlapping alternation under a quantifier: (a|a)*, (\d|[0-9])+, (.*|.*x)+ — when two branches match the same characters, the engine has multiple ways to split each repetition.
Unbounded user input into a regex: validating an uploaded field or URL with a risky pattern lets an attacker pick the worst-case string. Bound the input length before you match.
Untrusted regex patterns: letting users supply the pattern (search filters, rules engines) means they can hand you a catastrophic one. Never run an arbitrary user-supplied regex on a backtracking engine.
No timeout or safe engine: a backtracking match has no built-in cap. Run matches with a timeout, on a linear engine (RE2, Rust regex), or both.

Worked example

Take the pattern (a+)+$ and the 17-character input "aaaaaaaaaaaaaaaaX" — sixteen a's then an X.

The engine first lets the quantifiers swallow all 16 a's, reaches the X, and fails the $ anchor. It backtracks: give the last repetition one fewer a, retry — fail again. There are 2¹⁶⁻¹ = 32,768 partitions of those 16 a's between the inner and outer quantifier, and the engine tries essentially every one before concluding "no match." So a 17-byte string drives roughly 2¹⁶ ≈ 65,000 attempts.

n = 16   ->  ~2^16  =      65,536 attempts   (milliseconds)
n = 24   ->  ~2^24  =  16,777,216 attempts   (seconds)
n = 32   ->  ~2^32  = 4,294,967,296 attempts (minutes — request hangs)

Drag the slider above from 6 up to 12 and re-run: the "paths tried" counter climbs 2, 4, 8, 16, …, doubling with every extra a. That doubling is exactly what makes a few attacker-controlled bytes able to freeze a server.

Check yourself

1. Which of these patterns is most exposed to catastrophic backtracking?

2. You must keep using a backtracking engine. Which change actually removes the ReDoS risk for (a+)+$?