Regex in Python
The re Module Explained with Examples

match() vs search() vs fullmatch(), findall() vs finditer(), re.compile(), named groups, flags, and raw strings — all in one place.

10 min read·Updated May 2026

Regex in Python lives in the re module. The most important thing to understand is the difference between re.match() (anchored to the start), re.search() (scans the whole string), and re.fullmatch() (must match the entire string). Picking the wrong one is a common source of "my regex works on regex101 but not in Python" bugs.

Quick-Reference: re Module Functions

Seven functions cover almost every use case. Here is what each one does and what it returns.

| Function | Returns | Behaviour |
| --- | --- | --- |
| re.match(pattern, string) | Match object or None | Anchored to the start of the string. Only checks position 0. |
| re.search(pattern, string) | Match object or None | Scans the whole string. Returns the first match anywhere. |
| re.fullmatch(pattern, string) | Match object or None | The entire string must match the pattern, as if it were anchored with \A and \Z. (^ and $ are not quite equivalent: $ also matches just before a trailing newline.) |
| re.findall(pattern, string) | List of strings (or tuples) | Returns all non-overlapping matches. Returns tuples when the pattern has multiple groups. |
| re.finditer(pattern, string) | Iterator of match objects | Like findall but yields match objects, which preserve full group and position info. |
| re.sub(pattern, repl, string) | New string | Replaces every match. repl can be a string (with \1 back-refs) or a callable. |
| re.compile(pattern, flags) | Compiled Pattern object | Returns a Pattern with the same methods. Compile once, reuse everywhere. |

match() vs search() vs fullmatch()

The clearest way to see the difference is to run all three on the same input. The pattern r'\d+' and the string "price: 42 dollars" produce three completely different outcomes.

import re

s = "price: 42 dollars"
p = r'\d+'

# re.match() — anchored to position 0
re.match(p, s)          # None  (string starts with 'p', not a digit)

# re.search() — scans the whole string
m = re.search(p, s)
m.group()             # '42'
m.start()             # 7  (index where match begins)

# re.fullmatch() — entire string must match
re.fullmatch(p, s)     # None  (string has non-digit characters)
re.fullmatch(p, "42") # <re.Match object; span=(0, 2), match='42'>

# Safe pattern: always check before calling .group()
m = re.search(p, s)
if m:
    print(m.group())     # '42'

Rule of thumb: use re.fullmatch() for input validation, re.search() for extraction from arbitrary text, and re.match() only when you specifically want to assert the match begins at position 0 (rare in practice — re.fullmatch() or adding a ^ anchor is usually clearer).
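
The rule of thumb above can be sketched as three small helpers. The names QUANTITY, is_valid_quantity, first_number, and starts_with_number are hypothetical, chosen only for illustration.

```python
import re
from typing import Optional

# Hypothetical pattern for this sketch: one or more digits
QUANTITY = re.compile(r'\d+')

def is_valid_quantity(value: str) -> bool:
    """Validation: the whole input must consist of digits."""
    return QUANTITY.fullmatch(value) is not None

def first_number(text: str) -> Optional[str]:
    """Extraction: return the first run of digits anywhere in text."""
    m = QUANTITY.search(text)
    return m.group() if m else None

def starts_with_number(text: str) -> bool:
    """Prefix check: the digits must begin at position 0."""
    return QUANTITY.match(text) is not None

print(is_valid_quantity("42"))              # True
print(is_valid_quantity("42 dollars"))      # False
print(first_number("price: 42 dollars"))    # 42
print(starts_with_number("price: 42"))      # False
```

Each helper converts the match result to a plain value straight away, so callers never touch a possibly-None match object.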

findall() vs finditer()

re.findall() is convenient for simple patterns but loses information when you have multiple groups. re.finditer() returns full match objects so you always have access to individual groups, positions, and the full match.

import re

log = "2026-05-02 ERROR timeout | 2026-05-02 INFO connected"

# findall() with NO groups → list of full match strings
re.findall(r'\d{4}-\d{2}-\d{2}', log)
# ['2026-05-02', '2026-05-02']

# findall() with ONE group → list of the group's matches
re.findall(r'(\d{4}-\d{2}-\d{2})', log)
# ['2026-05-02', '2026-05-02']

# findall() with TWO groups → list of TUPLES (each group becomes an element)
re.findall(r'(\d{4}-\d{2}-\d{2}) (\w+)', log)
# [('2026-05-02', 'ERROR'), ('2026-05-02', 'INFO')]

# finditer() → iterator of match objects — full access to groups + position
for m in re.finditer(r'(\d{4}-\d{2}-\d{2}) (\w+)', log):
    print(m.group(1), m.group(2), "at pos", m.start())
# 2026-05-02 ERROR at pos 0
# 2026-05-02 INFO at pos 27

Use findall() for quick one-liner extraction when you have zero or one group and don't need positions. Use finditer() whenever you have multiple groups, need match positions, or are processing large strings (it is memory-efficient — it yields lazily rather than building a full list).
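
As a rough sketch of the lazy behaviour described above, finditer() can be wrapped in a generator that yields one record per match. The names ENTRY and iter_entries are assumptions made for this example.

```python
import re

# Hypothetical pattern: a date followed by a log level
ENTRY = re.compile(r'(\d{4}-\d{2}-\d{2}) (\w+)')

def iter_entries(text):
    """Yield (date, level, (start, end)) for each match, one at a time."""
    for m in ENTRY.finditer(text):
        yield m.group(1), m.group(2), m.span()

log = "2026-05-02 ERROR timeout | 2026-05-02 INFO connected"

entries = list(iter_entries(log))
# [('2026-05-02', 'ERROR', (0, 16)), ('2026-05-02', 'INFO', (27, 42))]

# Because the generator is lazy, scanning stops at the first hit
first_error = next(e for e in iter_entries(log) if e[1] == 'ERROR')
```

The same records cannot be rebuilt from findall() output, because the spans are not part of what it returns.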

re.compile() — When and Why

Module-level functions such as re.search(pattern, ...) look the pattern up in an internal cache on every call and compile it only on a cache miss. When the same pattern appears in a loop, compiling it once outside the loop skips that lookup, gives the pattern a name, and is the cleaner approach.

import re

# Sample data so the snippet runs on its own
log_lines = ["10:00 ERROR disk full", "10:01 INFO ok", "10:02 WARN slow"]
text = "\n".join(log_lines)

# Without compile: the pattern is looked up in re's cache on every iteration
for line in log_lines:
    m = re.search(r'ERROR|WARN', line)

# With compile: pattern compiled ONCE, reused on every iteration
SEVERITY = re.compile(r'ERROR|WARN')
for line in log_lines:
    m = SEVERITY.search(line)   # same methods, no pattern arg needed

# Compiled patterns have all the same methods:
SEVERITY.match(line)
SEVERITY.fullmatch(line)
SEVERITY.findall(text)
SEVERITY.finditer(text)
SEVERITY.sub('[REDACTED]', text)

# Compile with flags
WORD = re.compile(r'\bpython\b', re.IGNORECASE)
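
One way to apply this is a module-level constant used inside a function. The sketch below assumes a hypothetical count_severities helper and made-up sample lines; since re also caches recently used patterns internally, the gain is mainly readability and a skipped cache lookup.

```python
import re
from collections import Counter

# Compiled once at import time, then reused by every call
SEVERITY = re.compile(r'ERROR|WARN')

def count_severities(lines):
    """Count how many lines contain each severity keyword."""
    counts = Counter()
    for line in lines:
        m = SEVERITY.search(line)
        if m:
            counts[m.group()] += 1
    return counts

# Made-up sample data for illustration
sample_lines = [
    "10:00 ERROR disk full",
    "10:01 INFO heartbeat",
    "10:02 WARN slow response",
    "10:03 ERROR timeout",
]

counts = count_severities(sample_lines)
print(counts["ERROR"], counts["WARN"])   # 2 1

# The original pattern text stays available on the compiled object
print(SEVERITY.pattern)                  # ERROR|WARN
```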

Named Groups in Python

Python uses the (?P<name>...) syntax for named capture groups. Access them via match.group('name') or match.groupdict() to get all named groups as a dictionary.

import re

# Parse an Apache-style access log line
LOG_PATTERN = re.compile(
    r'(?P<ip>\d{1,3}(?:\.\d{1,3}){3})'
    r' - - \[(?P<timestamp>[^\]]+)\]'
    r' "(?P<method>\w+) (?P<path>[^"]+) HTTP/[\d.]+"'
    r' (?P<status>\d{3})'
)

line = '192.168.1.1 - - [02/May/2026:10:00:00 +0000] "GET /api/users HTTP/1.1" 200'

m = LOG_PATTERN.search(line)
if m:
    print(m.group('ip'))        # '192.168.1.1'
    print(m.group('method'))    # 'GET'
    print(m.group('status'))    # '200'

    # groupdict() — all named groups as a dict
    data = m.groupdict()
    # {'ip': '192.168.1.1', 'timestamp': '02/May/2026:10:00:00 +0000',
    #  'method': 'GET', 'path': '/api/users', 'status': '200'}
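
Named groups also combine well with type conversion and with replacement strings. The following is a small sketch; DATE and parse_date are names invented for this example.

```python
import re

# Hypothetical pattern with three named groups
DATE = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')

def parse_date(value):
    """Return the date parts as ints, or None if value is not YYYY-MM-DD."""
    m = DATE.fullmatch(value)
    if m is None:
        return None
    return {name: int(part) for name, part in m.groupdict().items()}

parsed = parse_date('2026-05-02')
# {'year': 2026, 'month': 5, 'day': 2}

# In a replacement string, \g<name> refers to a named group
reformatted = DATE.sub(r'\g<day>/\g<month>/\g<year>', 'due 2026-05-02')
# 'due 02/05/2026'
```

Referring to groups by name keeps the replacement string valid even if another group is later added to the pattern.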

re.sub() with a Callable

The replacement argument to re.sub() can be a function (or lambda) instead of a string. The function receives the match object and must return the replacement string — this unlocks transformations that are impossible with a plain replacement string.

import re

# Uppercase every match
text = "the quick brown fox"
re.sub(r'\b\w{5}\b', lambda m: m.group().upper(), text)
# 'the QUICK BROWN fox'

# Redact credit card numbers — keep last 4 digits
def redact_card(m):
    full = m.group().replace('-', '')
    return '****-****-****-' + full[-4:]

log = "Card 4111-1111-1111-1234 was charged"
re.sub(r'\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}', redact_card, log)
# 'Card ****-****-****-1234 was charged'

# re.sub() with back-references in a plain string replacement
# Reformat date YYYY-MM-DD → DD/MM/YYYY
re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', '2026-05-02')
# '02/05/2026'
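
A callable replacement is also a convenient way to do dictionary lookups. This is a minimal sketch of a template filler; PLACEHOLDER and render are hypothetical names, and leaving unknown placeholders untouched is a design choice made for this example.

```python
import re

# Hypothetical placeholder syntax: {name}
PLACEHOLDER = re.compile(r'\{(\w+)\}')

def render(template, values):
    """Replace each {key} with values[key]; unknown keys are left as-is."""
    def lookup(m):
        key = m.group(1)
        return str(values.get(key, m.group()))
    return PLACEHOLDER.sub(lookup, template)

message = render("Hello {name}, you have {count} new messages",
                 {"name": "Ada", "count": 3})
# 'Hello Ada, you have 3 new messages'

# re.subn() works like re.sub() but also reports how many replacements ran
masked, replaced = re.subn(r'\d', '#', 'a1b22')
# ('a#b##', 3)
```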

Flags

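Flags modify how a pattern is interpreted. Pass them with the flags= keyword to module-level functions, or as the second argument to re.compile(). For re.match(), re.search(), re.fullmatch(), re.findall(), and re.finditer(), flags is also the third positional argument, but for re.sub() and re.split() the positional argument after the string is count or maxsplit, so always use flags= there. Combine multiple flags with |.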

| Flag | Short | Effect |
| --- | --- | --- |
| re.IGNORECASE | re.I | Case-insensitive matching. a–z matches A–Z. |
| re.MULTILINE | re.M | ^ and $ match the start/end of each line, not just the whole string. |
| re.DOTALL | re.S | . matches any character including newline (by default . skips \n). |
| re.VERBOSE | re.X | Allow whitespace and # comments inside the pattern for readability. |
| re.ASCII | re.A | \w, \d, \s etc. match only ASCII characters, not full Unicode. |
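
The effect of each flag is easiest to see side by side. This sketch uses made-up sample strings. It also shows why flags are best passed by keyword to re.sub(): its fourth positional parameter is count, and re.IGNORECASE has the integer value 2.

```python
import re

text = "first line\nSecond line\nthird LINE"

# MULTILINE: ^ and $ apply to every line, not just the whole string
default_start = re.findall(r'^\w+', text)                      # ['first']
multiline_start = re.findall(r'^\w+', text, re.MULTILINE)      # ['first', 'Second', 'third']

# Flags combine with |
line_ends = re.findall(r'line$', text, flags=re.M | re.I)      # ['line', 'line', 'LINE']

# DOTALL: . also matches newline
no_dotall = re.search(r'first.*third', text)                   # None
with_dotall = re.search(r'first.*third', text, re.DOTALL)      # a match

# ASCII: \w no longer matches non-ASCII letters
unicode_words = re.findall(r'\w+', 'café')                     # ['café']
ascii_words = re.findall(r'\w+', 'café', re.ASCII)             # ['caf']

# Passing re.IGNORECASE positionally to re.sub() would act like count=2
sample = "Python python PYTHON"
as_count = re.sub(r'python', 'py', sample, count=2)                # 'Python py PYTHON'
as_flags = re.sub(r'python', 'py', sample, flags=re.IGNORECASE)    # 'py py py'
```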

re.VERBOSE is especially useful for complex patterns — it lets you add whitespace and comments to make the pattern self-documenting:

import re

# Without re.VERBOSE — hard to read
EMAIL = re.compile(r'[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}')

# With re.VERBOSE — same pattern, self-documenting
EMAIL = re.compile(r"""
    [a-zA-Z0-9._%+\-]+   # local part (before the @)
    @                     # literal @
    [a-zA-Z0-9.\-]+       # domain name
    \.                    # literal dot
    [a-zA-Z]{2,}          # top-level domain
""", re.VERBOSE)

# Combine VERBOSE with IGNORECASE
PATTERN = re.compile(r"""
    \b
    python  # the word python
    \b
""", re.VERBOSE | re.IGNORECASE)

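One thing to watch for with re.VERBOSE is that unescaped spaces and # characters in the pattern are ignored, so literal ones have to be escaped or placed in a character class. The sketch below uses hypothetical LOOSE and STRICT patterns to show the difference.

```python
import re

# The space before "items" is dropped in VERBOSE mode
LOOSE = re.compile(r"""
    \d+ items        # this space is NOT part of the pattern
""", re.VERBOSE)

# A character class keeps the space; a backslash keeps the hash
STRICT = re.compile(r"""
    \d+ [ ] items    # [ ] matches one literal space
    \s* \#\d+        # an escaped hash is a literal character
""", re.VERBOSE)

loose_no_space = LOOSE.fullmatch("3items")        # a match
loose_with_space = LOOSE.fullmatch("3 items")     # None
strict_match = STRICT.fullmatch("3 items #7")     # a match
```
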
Raw Strings — Always Use r'...'

Raw strings are the single most important Python-specific regex habit. A regular string interprets backslash sequences before the regex engine ever sees them (\n = newline, \t = tab, \b = backspace), and unrecognised ones such as \d are kept as-is but trigger a warning. A raw string passes the backslashes straight through to the regex engine.

import re

text = "a word and 42"   # sample input so the snippet runs on its own

# The double-backslash problem

# Without raw string: you must double every backslash
re.search('\\d+', text)      # works but ugly
re.search('\\bword\\b', text)  # cluttered

# With raw string: write exactly what the regex engine sees
re.search(r'\d+', text)       # clean
re.search(r'\bword\b', text)   # clean

# What goes wrong without raw strings
print('\n')    # newline character (1 char)
print(r'\n')   # backslash + n (2 chars), which is what regex wants

# Unrecognised escape sequences such as '\d' in a normal string have raised
# DeprecationWarning since Python 3.6 and SyntaxWarning since Python 3.12;
# they are expected to become a SyntaxError in a future version.
# Use raw strings now and never worry about it
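
The most damaging case is \b, which is a valid escape in a normal string and so fails without any warning. This is a short sketch with a made-up sample string.

```python
import re

text = "a word here"

plain = '\bword\b'    # \b becomes a backspace character (0x08)
raw = r'\bword\b'     # backslash + b reaches the regex engine intact

plain_len = len(plain)   # 6: backspace, w, o, r, d, backspace
raw_len = len(raw)       # 8: each \b stays two characters

plain_result = re.search(plain, text)        # None: text has no backspaces
raw_result = re.search(raw, text).group()    # 'word'

# Doubling the backslashes produces the same string as the raw literal
same = ('\\bword\\b' == raw)                 # True
```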

Common Mistakes

Using match() when you want search()
re.match() is anchored to position 0. If your string has any leading text before the pattern, match() returns None. Unless you need the match to start at position 0, use re.search(). When you do need it, re.search() with an explicit ^ anchor makes the intent visible in the pattern itself.
findall() with multiple groups returns tuples, not strings
When your pattern has two or more groups, re.findall() returns a list of tuples — one tuple per match. This surprises many people who expect a flat list of strings. If you need string results and have groups, either drop the parentheses (if you don't need extraction) or use re.finditer() and access .group(1), .group(2) explicitly.
Forgetting raw strings — r'\n' vs '\n'
A plain string '\n' is a newline character. A raw string r'\n' is two characters, backslash and n, which the regex engine interprets itself. For \n both forms happen to match a newline, but '\b' in a plain string is a backspace character, so '\bword\b' silently never matches as a word boundary. Always prefix your pattern strings with r. It is a zero-cost habit that prevents a class of subtle bugs.
Using re.match() for validation without anchoring the end
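re.match(r'\d+', '42abc') succeeds: it matches '42' at position 0 and ignores the rest. For input validation you want the whole string to match. Use re.fullmatch(r'\d+', value) instead. Adding ^ and $ anchors, as in re.match(r'^\d+$', value), is close but not identical, because $ also matches just before a trailing newline and so accepts '42\n'. If you anchor by hand, use \Z for the end: re.match(r'\d+\Z', value).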
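
The validation pitfall is easy to reproduce. In this sketch the helper names is_digits_loose and is_digits_strict are hypothetical.

```python
import re

def is_digits_loose(value):
    """Uses ^...$, which tolerates one trailing newline."""
    return re.match(r'^\d+$', value) is not None

def is_digits_strict(value):
    """Uses fullmatch(), which requires the entire string to match."""
    return re.fullmatch(r'\d+', value) is not None

# match() without an end anchor accepts a valid prefix
prefix_only = re.match(r'\d+', '42abc').group()     # '42'

loose_newline = is_digits_loose('42\n')             # True
strict_newline = is_digits_strict('42\n')           # False

# \Z anchors at the very end of the string, newline or not
z_anchor = re.match(r'\d+\Z', '42\n')               # None
```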

Test Your Python Patterns

The regex tester lets you iterate on a pattern in real time — paste any example from this page and see matches, groups, and spans highlighted instantly.
