A tiny input string can make a regex engine try billions of paths and freeze the server.
Most regex engines that everyday languages ship (Python's re, JavaScript, Java, Ruby, PCRE) match by backtracking: when an attempt fails, they back up and try a different way of dividing the input among the pattern's pieces.
For most patterns that's fine. But a nested quantifier like (a+)+$ creates many different ways to split the same run of characters. When the input almost matches but fails right at the end, the engine retries every one of those splits before giving up — and the number of splits grows like 2ⁿ. A 30-character string can cost over a billion attempts, hanging the request and starving the whole server.
The pattern (a+)+$ has two stacked greedy quantifiers. The inner a+ can grab some a's; the outer (…)+ can repeat the group to grab more. For a string of n a's there are 2n−1 ways to partition those a's between the two quantifiers — and all of them ultimately consume the same n a's.
Append a character that can't match (here X, which fails the $ end-anchor). Now the engine reaches the X, fails, and backtracks: it returns to the last quantifier, hands one fewer a to it, and tries the next partition. Each fails the same way. The engine grinds through every partition — roughly 2ⁿ of them — before it can answer "no match." That is catastrophic backtracking.
# DANGEROUS: nested quantifier — exponential backtracking
import re
re.match(r'(a+)+$', s) # "aaaaaaaaaaaaaaaaX" can hang for minutes
# SAFER fixes
re.match(r'a+$', s) # un-nest: one quantifier, linear time
re.match(r'(?>(a+))+$', s) # atomic group: no backtrack into the group
import re2 # RE2-style linear engine (no backtracking)
re2.match(r'(a+)+$', s) # always O(n), immune to ReDoS
A linear engine (Google's RE2, Rust's regex, or a classic Thompson NFA) sidesteps this entirely. It tracks all possible match positions at once in a single left-to-right pass, so its cost is O(n) no matter how the pattern nests. The trade-off: it drops a few backtracking-only features such as backreferences.
Worst-case attempts for (a+)+$ against a string of n a's plus one failing character. "Time" assumes a fast engine at roughly 10 million attempts per second.
| Input length n | Backtracking (~2ⁿ) | RE2 / linear (O(n)) | Approx. wall time |
|---|---|---|---|
| 10 | ~1,000 | 10 | instant |
| 20 | ~1 million | 20 | ~0.1 s |
| 30 | ~1 billion | 30 | ~2 minutes |
| 40 | ~1 trillion | 40 | ~1.5 days |
| 50 | ~10¹⁵ | 50 | ~3,500 years |
Each row adds one character to the input but doubles the work for the backtracking engine. The linear engine grows by exactly one unit. That gap is the whole vulnerability: an attacker pays for a few extra bytes; the server pays exponentially.
(a+)+, (a*)*, (a+)* — a quantifier applied to a group that itself contains a quantifier. This is the classic ReDoS shape.(a|a)*, (\d|[0-9])+, (.*|.*x)+ — when two branches match the same characters, the engine has multiple ways to split each repetition.regex), or both.Take the pattern (a+)+$ and the 17-character input "aaaaaaaaaaaaaaaaX" — sixteen a's then an X.
The engine first lets the quantifiers swallow all 16 a's, reaches the X, and fails the $ anchor. It backtracks: give the last repetition one fewer a, retry — fail again. There are 216−1 = 32,768 partitions of those 16 a's between the inner and outer quantifier, and the engine tries essentially every one before concluding "no match." So a 17-byte string drives roughly 216 ≈ 65,000 attempts.
n = 16 -> ~2^16 = 65,536 attempts (milliseconds)
n = 24 -> ~2^24 = 16,777,216 attempts (seconds)
n = 32 -> ~2^32 = 4,294,967,296 attempts (minutes — request hangs)
Drag the slider above from 6 up to 12 and re-run: the "paths tried" counter climbs 2, 4, 8, 16, …, doubling with every extra a. That doubling is exactly what makes a few attacker-controlled bytes able to freeze a server.
1. Which of these patterns is most exposed to catastrophic backtracking?
2. You must keep using a backtracking engine. Which change actually removes the ReDoS risk for (a+)+$?