What is catastrophic backtracking in regex?

Catastrophic backtracking occurs when an NFA-based regex engine encounters a pattern with nested or ambiguous quantifiers that can match the same characters in an exponential number of ways. When the overall match fails, the engine must explore every possible combination before giving up, causing matching time to grow as O(2^n) or worse relative to input length. A string of 30 characters can take seconds; 40 characters can take hours.

Which programming languages are vulnerable to catastrophic backtracking?

JavaScript, Python (re module), Java, Ruby, and PHP all use NFA-based regex engines and are vulnerable. Go and Rust use RE2-based engines that guarantee linear-time matching regardless of pattern complexity, making them immune. Python's third-party regex module also supports atomic groups and possessive quantifiers that prevent the issue.

How do I fix a catastrophic backtracking regex?

The most reliable fixes are: (1) use atomic groups (?>...) where supported to prevent the engine from revisiting positions, (2) use possessive quantifiers like a++ instead of a+ so matches are not given back on failure, (3) eliminate ambiguity by using mutually exclusive character classes so no two branches can match the same character, (4) anchor the pattern with ^ and $ to fail fast, or (5) replace the regex with a dedicated parser for complex formats like email or HTML.

What is the (a+)+ catastrophic backtracking example?

The pattern (a+)+ applied to a string like 'aaaaaX' demonstrates catastrophic backtracking. The outer + repeats the inner group (a+) one or more times. Because the inner group can consume 1, 2, 3, or more 'a' characters at each position, and the outer + can repeat the group any number of times, there are 2^n ways to partition n 'a' characters between the iterations. When the trailing 'X' fails the match, the engine exhausts all 2^n partitions before giving up.

Regex Guide

Regex Catastrophic
Backtracking

What it is, why it freezes your app, how to identify a vulnerable pattern, and how to fix it permanently.

9 min read·Updated May 2026

Regex catastrophic backtracking freezes your app when a pattern with nested quantifiers hits certain input — the engine retries exponentially many match paths before admitting failure, turning a 40-character string into a minutes-long hang. This guide shows you how to identify, benchmark, and fix vulnerable patterns.

What Catastrophic Backtracking Actually Is

Most modern regex engines — JavaScript's V8, Python's re, Java's java.util.regex — are built on NFA (Nondeterministic Finite Automaton) implementations. An NFA engine tries one path through the pattern, backtracks when it fails, and tries the next path. This is what makes features like capture groups, backreferences, and lookaheads possible. It also introduces a performance trap.

The classic demonstration is the pattern (a+)+ applied to the string "aaaaaX".

The Classic Example

// Pattern: (a+)+    Input: "aaaaaX"
//
// The outer + can repeat the inner group any number of times.
// The inner group (a+) can consume 1, 2, 3, 4, or 5 'a' chars.
// Together they can partition "aaaaa" in 2^(n-1) different ways:
//
//   Iteration 1: [aaaaa]
//   Iteration 2: [aaaa][a]
//   Iteration 3: [aaa][aa]
//   Iteration 4: [aaa][a][a]
//   Iteration 5: [aa][aaa]
//   ... 2^4 = 16 combinations for 5 'a' chars
//
// 'X' fails every combination. Engine tries ALL of them.
// n=30 chars → 2^29 ≈ 500 million attempts.

const re = /(a+)+/;
re.test('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaX');
// ↑ This will hang your process for seconds or minutes

The time complexity is O(2ⁿ) where n is the number of characters the ambiguous portion can match. Each additional character doubles the work. At n=20 the engine makes roughly one million attempts; at n=40 it makes over a trillion.

A Real-World Example

The toy (a+)+ pattern makes the problem obvious. Real-world catastrophic backtracking is subtler — it hides inside patterns that look reasonable until a specific input shape hits them.

Consider this common attempt at matching a repeating word pattern:

Real-World Pattern

// Intended: match one or more repetitions of "x" followed by "y"
const re = /(x+x+)+y/;

// Fine on matching input:
re.test('xxxxy');     // true — fast

// Catastrophic on non-matching input:
re.test('xxxxxxxxxxxxxxxxxxxxxxxxx');
// ↑ No trailing 'y' — engine backtracks through 2^n partitions

Notice the inner group x+x+: two adjacent + quantifiers over the same character class. The inner group can match "xx", "xxx", "xxxx", etc. — and it can split those characters between the two x+ halves in countless ways. Wrapping it in an outer + multiplies the ambiguity further.

The same issue appears in naive email-like patterns:

// Vulnerable: nested quantifiers with overlapping character classes
const vulnerable = /^([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+)+$/;
//                      ^^^^^^^^^^^^^^^^ inner +    ^^^^^^^^^^ outer +
//
// Input "aaaaaaaaaaaaaaaaaaaaaa@" has no valid ending.
// Engine backtracks through every way to partition the 'a's.

This is exactly why production regex for user-controlled input — form fields, API parameters, file names — must be audited before deployment. The attacker just needs to submit a string designed to trigger maximum backtracking.

How to Spot a Vulnerable Pattern

Three structural red flags indicate catastrophic backtracking risk:

Red Flag 1 — Nested quantifiers

A quantified group containing another quantified element over the same or overlapping character set.

(a+)+ (\w+)+ ([a-z]+)* (\d{1,3})+

Red Flag 2 — Alternation with overlapping branches

Branches that can match the same characters, repeated with a quantifier. The engine tries every branch at every position.

(ab|a)+ (x|x?)+ (\w|\d)+

Red Flag 3 — Quantified group containing optional parts

An optional element inside a repeated group means the group can match the same substring in multiple ways — zero times the optional or once — multiplying the paths the engine must explore.

(\w+\s?)+ ([a-z]+[_-]?)+ (https?://)?(\w+\.)+

The quick mental test: "Can two different sequences of match steps produce the same overall matched string?" If yes, the engine will explore all of them when forced to backtrack.

How to Test for It

The fastest way to confirm a pattern is vulnerable is a timing benchmark with increasingly long inputs. A healthy pattern's match time grows linearly with input length; a catastrophic one grows exponentially.

Node.js Timing Benchmark

function benchmarkRegex(pattern, buildInput, maxN = 30) {
  console.log(`Testing: ${pattern}`);
  for (let n = 5; n <= maxN; n += 5) {
    const input = buildInput(n);
    const start = performance.now();
    pattern.test(input);
    const ms = (performance.now() - start).toFixed(3);
    console.log(`  n=${n.toString().padStart(2)}: ${ms.padStart(8)}ms`);
    if (parseFloat(ms) > 500) { console.log('  *** CATASTROPHIC — stopping ***'); break; }
  }
}

// Test the vulnerable pattern
benchmarkRegex(/(x+x+)+y/, n => 'x'.repeat(n));

// Typical output:
//   n= 5:    0.012ms
//   n=10:    0.089ms
//   n=15:    2.910ms
//   n=20:   94.203ms
//   n=25: 3041.770ms
//   *** CATASTROPHIC — stopping ***

If you'd rather use an automated checker, tools like vuln-regex-detector (Node.js) and the online ReDoS checker at devina.io can analyse patterns statically. They won't catch every case, but they'll flag the obvious structural vulnerabilities without you having to construct adversarial inputs manually.

How to Fix It

There are four reliable techniques, each suited to different situations:

1. Atomic Groups — `(?>...)`

An atomic group commits to its match and discards the saved backtrack positions. Once the group matches, the engine will never retry different partitions of the consumed characters. This eliminates the exponential blowup at its source.

// Vulnerable:
(a+)+

// Fixed with atomic group (Java, PCRE, .NET, Ruby, Python regex module):
(?>a+)+
// The inner match is locked — no re-partitioning on failure.

// Java example:
Pattern p = Pattern.compile("(?>a+)+");
// Now safe against adversarial input.

2. Possessive Quantifiers — `a++`

Possessive quantifiers are syntactic sugar for atomic groups over a single element. a++ means "match as many a characters as possible and never give any back." Available in Java, PCRE (PHP), and Python's regex module.

// Vulnerable:        (a+)+
// Possessive fix:    a++  (no outer group needed)

// PHP (PCRE):
preg_match('/a++/', $input);

// Java:
Pattern.compile("a++");

// Python regex module:
import regex
regex.match(r'a++', text)

3. Eliminate Ambiguity with Specific Character Classes

Often the best fix requires no special syntax — just make the character classes mutually exclusive so no two branches of the pattern can match the same characters. This is the most portable fix because it works in every engine.

// Vulnerable: domain matching with overlapping classes
/(\w+\.)+\w+/
// \w matches letters, digits AND underscore — overlaps everywhere

// Fixed: separate the label and dot explicitly
/[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?(\.[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?)+/
// Each segment is defined precisely — no ambiguous overlap

// Vulnerable: words with optional separator
/(\w+\s?)+/

// Fixed: separate the two roles
/\w+(\s\w+)*/
// First word then zero or more (space + word) — unambiguous

4. Anchor Aggressively and Fail Fast

Anchors limit the positions the engine attempts. A pattern with ^ and $ can only start at position 0 and must consume the entire string — it fails immediately on long non-matching inputs instead of trying from every character position.

// Anchoring doesn't eliminate exponential backtracking on its own,
// but it prevents the engine from retrying the whole pattern from
// each successive character position in a longer string.

// Unanchored (tries from every position):
/(a+)+b/

// Anchored (tries once from position 0 only):
/^(a+)+b$/

// For validation use cases, always anchor both ends.
// For search-within-text use cases, consider splitting into
// a simple find-candidate step + a precise validation step.

Language-by-Language Notes

Language / Engine	Vulnerable?	Atomic Groups	Possessive Quantifiers
JavaScript (V8)	Yes	`(?>...)` — ES2025+ V8 ≥12.4, Node ≥22	Not supported
Python `re`	Yes	Not supported	Not supported
Python `regex` module	Fixable	Supported `(?>...)`	Supported `a++`
Java (`java.util.regex`)	Yes	Supported `(?>...)`	Supported `a++`
PHP (PCRE)	Yes	Supported `(?>...)`	Supported `a++`
Go (`regexp`, RE2)	Immune	N/A — linear time	N/A — linear time
Rust (`regex` crate)	Immune	N/A — linear time	N/A — linear time
Ruby	Yes	Supported `(?>...)`	Supported `a++`

Go and Rust deliberately omit features (backreferences, lookarounds) that would require NFA-style backtracking. This is the deliberate RE2 trade-off: linear-time safety in exchange for a smaller feature set.

JavaScript note

Prior to ES2025, JavaScript had no atomic groups or possessive quantifiers. The practical mitigation for older runtimes is to rewrite the pattern using unambiguous character classes (fix #3 above) and apply it only after a simpler pre-filter validates input length and rough shape. Never apply a complex regex directly to unbounded user input in pre-ES2025 JavaScript.

Python note

The standard library re module has no protection. Install the third-party regex module (pip install regex), which is API-compatible with re but adds atomic groups and possessive quantifiers. Swap the import on any code path that handles user-controlled strings with complex patterns.

Quick Decision Checklist

Audit these cases

User-controlled input (form fields, API params, file names)
Pattern has nested quantifiers (x+)+ or (x|y)+
Pattern processes untrusted log lines or webhook payloads
Regex runs in a request handler (synchronous blocking)
NFA engine: JS, Python re, Java, PHP, Ruby

Lower risk

Fixed literal strings (no user input)
Simple patterns: anchored, no nested quantifiers
Go or Rust (RE2 engine — linear time always)
Pattern applies to bounded input (e.g., max 20 chars enforced upstream)
Offline/batch processing where a slow match is acceptable

Test Your Pattern for Backtracking

Paste your regex and a long non-matching string into the tester and watch the timing display.

Runs entirely in your browser — no data leaves your machine.

Open Regex Tester →

Regex Catastrophic
Backtracking

What Catastrophic Backtracking Actually Is

A Real-World Example

How to Spot a Vulnerable Pattern

How to Test for It

How to Fix It

1. Atomic Groups — `(?>...)`

2. Possessive Quantifiers — `a++`

3. Eliminate Ambiguity with Specific Character Classes

4. Anchor Aggressively and Fail Fast

Language-by-Language Notes

JavaScript note

Python note

Quick Decision Checklist

Test Your Pattern for Backtracking

Related

What should we build next?

What Catastrophic Backtracking Actually Is

A Real-World Example

How to Spot a Vulnerable Pattern

How to Test for It

How to Fix It

1. Atomic Groups — (?>...)

2. Possessive Quantifiers — a++

3. Eliminate Ambiguity with Specific Character Classes

4. Anchor Aggressively and Fail Fast

Language-by-Language Notes

JavaScript note

Python note

Quick Decision Checklist

Test Your Pattern for Backtracking

Related

What should we build next?

1. Atomic Groups — `(?>...)`

2. Possessive Quantifiers — `a++`