Regex Catastrophic
Backtracking
What it is, why it freezes your app, how to identify a vulnerable pattern, and how to fix it permanently.
Regex catastrophic backtracking freezes your app when a pattern with nested quantifiers hits certain input — the engine retries exponentially many match paths before admitting failure, turning a 40-character string into a minutes-long hang. This guide shows you how to identify, benchmark, and fix vulnerable patterns.
What Catastrophic Backtracking Actually Is
Most modern regex engines — JavaScript's V8, Python's re, Java's java.util.regex — are built on NFA (Nondeterministic Finite Automaton) implementations. An NFA engine tries one path through the pattern, backtracks when it fails, and tries the next path. This is what makes features like capture groups, backreferences, and lookaheads possible. It also introduces a performance trap.
The classic demonstration is the pattern (a+)+ applied to the string "aaaaaX".
// Pattern: (a+)+ Input: "aaaaaX"
//
// The outer + can repeat the inner group any number of times.
// The inner group (a+) can consume 1, 2, 3, 4, or 5 'a' chars.
// Together they can partition "aaaaa" in 2^(n-1) different ways:
//
// Iteration 1: [aaaaa]
// Iteration 2: [aaaa][a]
// Iteration 3: [aaa][aa]
// Iteration 4: [aaa][a][a]
// Iteration 5: [aa][aaa]
// ... 2^4 = 16 combinations for 5 'a' chars
//
// 'X' fails every combination. Engine tries ALL of them.
// n=30 chars → 2^29 ≈ 500 million attempts.
const re = /(a+)+/;
re.test('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaX');
// ↑ This will hang your process for seconds or minutes
The time complexity is O(2n) where n is the number of characters the ambiguous portion can match. Each additional character doubles the work. At n=20 the engine makes roughly one million attempts; at n=40 it makes over a trillion.
A Real-World Example
The toy (a+)+ pattern makes the problem obvious. Real-world catastrophic backtracking is subtler — it hides inside patterns that look reasonable until a specific input shape hits them.
Consider this common attempt at matching a repeating word pattern:
// Intended: match one or more repetitions of "x" followed by "y"
const re = /(x+x+)+y/;
// Fine on matching input:
re.test('xxxxy'); // true — fast
// Catastrophic on non-matching input:
re.test('xxxxxxxxxxxxxxxxxxxxxxxxx');
// ↑ No trailing 'y' — engine backtracks through 2^n partitions
Notice the inner group x+x+: two adjacent + quantifiers over the same character class. The inner group can match "xx", "xxx", "xxxx", etc. — and it can split those characters between the two x+ halves in countless ways. Wrapping it in an outer + multiplies the ambiguity further.
The same issue appears in naive email-like patterns:
// Vulnerable: nested quantifiers with overlapping character classes
const vulnerable = /^([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+)+$/;
// ^^^^^^^^^^^^^^^^ inner + ^^^^^^^^^^ outer +
//
// Input "aaaaaaaaaaaaaaaaaaaaaa@" has no valid ending.
// Engine backtracks through every way to partition the 'a's.
This is exactly why production regex for user-controlled input — form fields, API parameters, file names — must be audited before deployment. The attacker just needs to submit a string designed to trigger maximum backtracking.
How to Spot a Vulnerable Pattern
Three structural red flags indicate catastrophic backtracking risk:
A quantified group containing another quantified element over the same or overlapping character set.
(a+)+ (\w+)+ ([a-z]+)* (\d{1,3})+
Branches that can match the same characters, repeated with a quantifier. The engine tries every branch at every position.
(ab|a)+ (x|x?)+ (\w|\d)+
An optional element inside a repeated group means the group can match the same substring in multiple ways — zero times the optional or once — multiplying the paths the engine must explore.
(\w+\s?)+ ([a-z]+[_-]?)+ (https?://)?(\w+\.)+
The quick mental test: "Can two different sequences of match steps produce the same overall matched string?" If yes, the engine will explore all of them when forced to backtrack.
How to Test for It
The fastest way to confirm a pattern is vulnerable is a timing benchmark with increasingly long inputs. A healthy pattern's match time grows linearly with input length; a catastrophic one grows exponentially.
function benchmarkRegex(pattern, buildInput, maxN = 30) {
console.log(`Testing: ${pattern}`);
for (let n = 5; n <= maxN; n += 5) {
const input = buildInput(n);
const start = performance.now();
pattern.test(input);
const ms = (performance.now() - start).toFixed(3);
console.log(` n=${n.toString().padStart(2)}: ${ms.padStart(8)}ms`);
if (parseFloat(ms) > 500) { console.log(' *** CATASTROPHIC — stopping ***'); break; }
}
}
// Test the vulnerable pattern
benchmarkRegex(/(x+x+)+y/, n => 'x'.repeat(n));
// Typical output:
// n= 5: 0.012ms
// n=10: 0.089ms
// n=15: 2.910ms
// n=20: 94.203ms
// n=25: 3041.770ms
// *** CATASTROPHIC — stopping ***
If you'd rather use an automated checker, tools like vuln-regex-detector (Node.js) and the online ReDoS checker at devina.io can analyse patterns statically. They won't catch every case, but they'll flag the obvious structural vulnerabilities without you having to construct adversarial inputs manually.
How to Fix It
There are four reliable techniques, each suited to different situations:
1. Atomic Groups — (?>...)
An atomic group commits to its match and discards the saved backtrack positions. Once the group matches, the engine will never retry different partitions of the consumed characters. This eliminates the exponential blowup at its source.
// Vulnerable:
(a+)+
// Fixed with atomic group (Java, PCRE, .NET, Ruby, Python regex module):
(?>a+)+
// The inner match is locked — no re-partitioning on failure.
// Java example:
Pattern p = Pattern.compile("(?>a+)+");
// Now safe against adversarial input.
2. Possessive Quantifiers — a++
Possessive quantifiers are syntactic sugar for atomic groups over a single element. a++ means "match as many a characters as possible and never give any back." Available in Java, PCRE (PHP), and Python's regex module.
// Vulnerable: (a+)+
// Possessive fix: a++ (no outer group needed)
// PHP (PCRE):
preg_match('/a++/', $input);
// Java:
Pattern.compile("a++");
// Python regex module:
import regex
regex.match(r'a++', text)
3. Eliminate Ambiguity with Specific Character Classes
Often the best fix requires no special syntax — just make the character classes mutually exclusive so no two branches of the pattern can match the same characters. This is the most portable fix because it works in every engine.
// Vulnerable: domain matching with overlapping classes
/(\w+\.)+\w+/
// \w matches letters, digits AND underscore — overlaps everywhere
// Fixed: separate the label and dot explicitly
/[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?(\.[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?)+/
// Each segment is defined precisely — no ambiguous overlap
// Vulnerable: words with optional separator
/(\w+\s?)+/
// Fixed: separate the two roles
/\w+(\s\w+)*/
// First word then zero or more (space + word) — unambiguous
4. Anchor Aggressively and Fail Fast
Anchors limit the positions the engine attempts. A pattern with ^ and $ can only start at position 0 and must consume the entire string — it fails immediately on long non-matching inputs instead of trying from every character position.
// Anchoring doesn't eliminate exponential backtracking on its own,
// but it prevents the engine from retrying the whole pattern from
// each successive character position in a longer string.
// Unanchored (tries from every position):
/(a+)+b/
// Anchored (tries once from position 0 only):
/^(a+)+b$/
// For validation use cases, always anchor both ends.
// For search-within-text use cases, consider splitting into
// a simple find-candidate step + a precise validation step.
Language-by-Language Notes
| Language / Engine | Vulnerable? | Atomic Groups | Possessive Quantifiers |
|---|---|---|---|
| JavaScript (V8) | Yes | (?>...) — ES2025+V8 ≥12.4, Node ≥22 |
Not supported |
Python re |
Yes | Not supported | Not supported |
Python regex module |
Fixable | Supported (?>...) |
Supported a++ |
Java (java.util.regex) |
Yes | Supported (?>...) |
Supported a++ |
| PHP (PCRE) | Yes | Supported (?>...) |
Supported a++ |
Go (regexp, RE2) |
Immune | N/A — linear time | N/A — linear time |
Rust (regex crate) |
Immune | N/A — linear time | N/A — linear time |
| Ruby | Yes | Supported (?>...) |
Supported a++ |
Go and Rust deliberately omit features (backreferences, lookarounds) that would require NFA-style backtracking. This is the deliberate RE2 trade-off: linear-time safety in exchange for a smaller feature set.
JavaScript note
Prior to ES2025, JavaScript had no atomic groups or possessive quantifiers. The practical mitigation for older runtimes is to rewrite the pattern using unambiguous character classes (fix #3 above) and apply it only after a simpler pre-filter validates input length and rough shape. Never apply a complex regex directly to unbounded user input in pre-ES2025 JavaScript.
Python note
The standard library re module has no protection. Install the third-party regex module (pip install regex), which is API-compatible with re but adds atomic groups and possessive quantifiers. Swap the import on any code path that handles user-controlled strings with complex patterns.
Quick Decision Checklist
- User-controlled input (form fields, API params, file names)
- Pattern has nested quantifiers
(x+)+or(x|y)+ - Pattern processes untrusted log lines or webhook payloads
- Regex runs in a request handler (synchronous blocking)
- NFA engine: JS, Python re, Java, PHP, Ruby
- Fixed literal strings (no user input)
- Simple patterns: anchored, no nested quantifiers
- Go or Rust (RE2 engine — linear time always)
- Pattern applies to bounded input (e.g., max 20 chars enforced upstream)
- Offline/batch processing where a slow match is acceptable
Test Your Pattern for Backtracking
Paste your regex and a long non-matching string into the tester and watch the timing display.
Runs entirely in your browser — no data leaves your machine.
Open Regex Tester →