How to Parse Logs with Regex
Apache, Nginx, JSON & Custom Formats
Copy-paste patterns for the formats you actually encounter — with named groups so the output is immediately usable.
The key to parsing logs with regex is named capture groups: each extracted field gets a meaningful name rather than a positional index, so the output is immediately usable. This guide covers production-ready patterns for Apache Combined Log Format, Nginx, JSON structured logs, Python tracebacks and custom application log formats — with JavaScript and Python implementations for each.
Before sharing parsed logs: strip emails, IPs and credentials with the Log Sanitizer — especially if you're pasting into ChatGPT or a ticket system.
Apache Combined Log Format
The most common web server log format. Each line looks like:
192.168.1.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "http://ref.example.com/" "Mozilla/5.0"
# Python
import re

APACHE_CLF = re.compile(r'''
    ^(?P<ip>\S+)\s+          # client IP
    \S+\s+                   # ident (usually -)
    (?P<user>\S+)\s+         # auth user
    \[(?P<time>[^\]]+)\]\s+  # timestamp
    "(?P<method>\S+)\s+      # HTTP method
    (?P<path>\S+)\s+\S+"\s+  # request path
    (?P<status>\d{3})\s+     # status code
    (?P<size>\S+)            # response size (may be "-")
''', re.VERBOSE)

def parse_apache(line):
    m = APACHE_CLF.match(line)
    return m.groupdict() if m else None
// JavaScript
const APACHE_CLF = /^(?<ip>\S+)\s+\S+\s+(?<user>\S+)\s+\[(?<time>[^\]]+)\]\s+"(?<method>\S+)\s+(?<path>\S+)\s+\S+"\s+(?<status>\d{3})\s+(?<size>\S+)/;
const parseApache = line => line.match(APACHE_CLF)?.groups ?? null;
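Applying the pattern to the sample line from the top of this section is a quick sanity check that the groups land where expected (a compact single-string version of the same regex):

```python
import re

APACHE_CLF = re.compile(
    r'^(?P<ip>\S+)\s+\S+\s+(?P<user>\S+)\s+\[(?P<time>[^\]]+)\]\s+'
    r'"(?P<method>\S+)\s+(?P<path>\S+)\s+\S+"\s+(?P<status>\d{3})\s+(?P<size>\S+)'
)

line = ('192.168.1.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /index.html HTTP/1.1" 200 2326 "http://ref.example.com/" "Mozilla/5.0"')

m = APACHE_CLF.match(line)
print(m.groupdict())
# {'ip': '192.168.1.1', 'user': 'frank', 'time': '10/Oct/2000:13:55:36 -0700',
#  'method': 'GET', 'path': '/index.html', 'status': '200', 'size': '2326'}
```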
Nginx Access Log (Default Format)
Nginx's default format is similar to Apache CLF but with subtle differences:
# Nginx default: $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
NGINX_DEFAULT = re.compile(r'''
    ^(?P<ip>\S+)\s+-\s+
    (?P<user>\S+)\s+
    \[(?P<time>[^\]]+)\]\s+
    "(?P<request>[^"]+)"\s+
    (?P<status>\d{3})\s+
    (?P<bytes>\d+)\s+
    "(?P<referer>[^"]*)"\s+
    "(?P<ua>[^"]*)"
''', re.VERBOSE)

# Split request into method + path + protocol separately
def parse_nginx(line):
    m = NGINX_DEFAULT.match(line)
    if not m:
        return None
    data = m.groupdict()
    parts = data['request'].split(' ', 2)
    data['method'] = parts[0]
    data['path'] = parts[1] if len(parts) > 1 else ''
    return data
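A quick check against a made-up line in the default format (the IP, user and request below are hypothetical sample values, not from any real log):

```python
import re

NGINX_DEFAULT = re.compile(
    r'^(?P<ip>\S+)\s+-\s+(?P<user>\S+)\s+\[(?P<time>[^\]]+)\]\s+'
    r'"(?P<request>[^"]+)"\s+(?P<status>\d{3})\s+(?P<bytes>\d+)\s+'
    r'"(?P<referer>[^"]*)"\s+"(?P<ua>[^"]*)"'
)

# Hypothetical sample line in the default nginx format
line = ('203.0.113.7 - alice [01/May/2026:14:32:10 +0000] '
        '"POST /api/login HTTP/1.1" 401 152 "-" "curl/8.5.0"')

data = NGINX_DEFAULT.match(line).groupdict()
method, path = data['request'].split(' ', 2)[:2]
print(method, path, data['status'])  # POST /api/login 401
```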
Timestamp Extraction (Multiple Formats)
# ISO 8601 — 2026-05-01T14:32:10.123Z or 2026-05-01T14:32:10+00:00
ISO8601 = r'(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?)'
# Apache/Nginx bracket format — [10/Oct/2000:13:55:36 -0700]
BRACKET_TS = r'\[(?P<ts>\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}\s[+-]\d{4})\]'
# Syslog — May 1 14:32:10
SYSLOG_TS = r'(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})'
# Unix epoch (milliseconds) — 1746105130000
EPOCH_MS = r'(?P<ts>\d{13})'
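Each pattern above can be verified against its own example string with re.search — useful as a quick self-test before wiring a pattern into a pipeline:

```python
import re

ISO8601 = r'(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?)'
BRACKET_TS = r'\[(?P<ts>\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}\s[+-]\d{4})\]'
SYSLOG_TS = r'(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})'
EPOCH_MS = r'(?P<ts>\d{13})'

# Pair each pattern with the example from its comment
samples = [
    (ISO8601, '2026-05-01T14:32:10.123Z'),
    (BRACKET_TS, '[10/Oct/2000:13:55:36 -0700]'),
    (SYSLOG_TS, 'May  1 14:32:10'),
    (EPOCH_MS, '1746105130000'),
]
for pattern, text in samples:
    print(re.search(pattern, text).group('ts'))
```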
Structured / JSON Logs
When each log line is a JSON object, use your language's JSON parser — not regex — for the full line. Regex is still useful for extracting a single field without a full parse:
# Extract 'level' from JSON log line — fast, no full parse
LEVEL = re.compile(r'"level"\s*:\s*"(?P<level>[^"]+)"')
MSG = re.compile(r'"(?:msg|message)"\s*:\s*"(?P<msg>(?:[^"\\]|\\.)*)"')
# Full parse — correct approach for structured logs
import json
for line in log_file:
    entry = json.loads(line.strip())
    print(entry['level'], entry['message'])
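The two approaches combine well: use the cheap regex to skip lines you don't care about, and pay for a full JSON parse only on the survivors (a sketch; the sample lines are made up):

```python
import json
import re

LEVEL = re.compile(r'"level"\s*:\s*"(?P<level>[^"]+)"')

# Hypothetical JSON log lines
lines = [
    '{"level": "info", "message": "started"}',
    '{"level": "error", "message": "db timeout"}',
]

# Regex pre-filter first, json.loads only for matching lines
errors = []
for line in lines:
    m = LEVEL.search(line)
    if m and m.group('level') == 'error':
        errors.append(json.loads(line))

print(errors[0]['message'])  # db timeout
```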
Custom Application Log Formats
Most application logs follow a pattern like [TIMESTAMP] [LEVEL] [SERVICE] message. Build the pattern incrementally using named groups:
# Log line: 2026-05-01 14:32:10 ERROR auth Failed login for user@example.com from 192.168.1.1
APP_LOG = re.compile(r'''
    ^(?P<date>\d{4}-\d{2}-\d{2})\s+            # date
    (?P<time>\d{2}:\d{2}:\d{2})\s+             # time
    (?P<level>DEBUG|INFO|WARN|ERROR|FATAL)\s+  # level
    (?P<service>\w+)\s+                        # service name
    (?P<msg>.+)$                               # rest is message
''', re.VERBOSE)
# Output: {'date': '2026-05-01', 'time': '14:32:10',
# 'level': 'ERROR', 'service': 'auth',
# 'msg': 'Failed login for user@example.com from 192.168.1.1'}
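Once the groups match, a natural next step is converting the string fields into typed values — for example, combining date and time into a real datetime for sorting and filtering (the post-processing shown here is one possible approach, not part of the pattern itself):

```python
import re
from datetime import datetime

APP_LOG = re.compile(r'''
    ^(?P<date>\d{4}-\d{2}-\d{2})\s+
    (?P<time>\d{2}:\d{2}:\d{2})\s+
    (?P<level>DEBUG|INFO|WARN|ERROR|FATAL)\s+
    (?P<service>\w+)\s+
    (?P<msg>.+)$
''', re.VERBOSE)

line = '2026-05-01 14:32:10 ERROR auth Failed login for user@example.com from 192.168.1.1'
entry = APP_LOG.match(line).groupdict()

# Combine date + time into a datetime so entries can be sorted/filtered
entry['timestamp'] = datetime.strptime(
    f"{entry['date']} {entry['time']}", '%Y-%m-%d %H:%M:%S'
)
print(entry['level'], entry['timestamp'])
```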
Python Traceback — Extract Exception and Location
# Match traceback frame lines
FRAME = re.compile(r'File "(?P<file>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')
# Match the exception line
EXCEPTION = re.compile(r'^(?P<exc>[\w.]+(?:Error|Exception|Warning)): (?P<msg>.+)$', re.MULTILINE)
# Strip PII before using — see /how-to-remove-secrets-from-python-tracebacks
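Running both patterns over a traceback string gives the call chain and the final exception; the last frame is usually where the error was raised (the traceback below is a contrived example):

```python
import re

FRAME = re.compile(r'File "(?P<file>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')
EXCEPTION = re.compile(r'^(?P<exc>[\w.]+(?:Error|Exception|Warning)): (?P<msg>.+)$',
                       re.MULTILINE)

# Contrived traceback for illustration
tb = '''Traceback (most recent call last):
  File "app.py", line 42, in main
    connect()
  File "db.py", line 7, in connect
    raise ConnectionError("refused")
ConnectionError: refused
'''

frames = [m.groupdict() for m in FRAME.finditer(tb)]
exc = EXCEPTION.search(tb)
print(frames[-1]['file'], frames[-1]['line'])  # db.py 7
print(exc.group('exc'), exc.group('msg'))      # ConnectionError refused
```

Note that re.MULTILINE keeps the EXCEPTION pattern from matching the indented `raise` line: `^` only matches at line starts, and that line begins with whitespace.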
Performance Tips for Large Log Files
- Compile patterns once — re.compile() in Python; store the regex object outside the loop. Recompiling on every line is 10–50× slower.
- Anchor patterns — start with ^ so the engine doesn't scan the whole line for a possible match position.
- Use non-capturing groups for parts you don't need — (?:...) avoids the overhead of storing the match.
- Process line by line — don't read the whole file into memory. Iterate line by line with a generator for multi-GB logs.
- Consider re.VERBOSE — the performance cost is negligible and the readability gain for complex patterns is large.
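These tips combine into one streaming idiom: compile once at module level, anchor the pattern, and yield matches lazily (a sketch with a simplified placeholder pattern; swap in any parser from this guide):

```python
import re

# Compiled once at module level — never inside the loop
APP_LOG = re.compile(r'^(?P<level>DEBUG|INFO|WARN|ERROR|FATAL)\s+(?P<msg>.+)$')

def parse_lines(lines):
    """Lazily yield parsed entries; works on a file object or any iterable."""
    for line in lines:
        m = APP_LOG.match(line)  # anchored: fails fast on non-matching lines
        if m:
            yield m.groupdict()

# Works the same on open('app.log') as on an in-memory list
sample = ['ERROR disk full', 'not a log line', 'INFO started']
for entry in parse_lines(sample):
    print(entry['level'])
```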
Test Your Log Pattern
Paste any pattern from this guide and a sample log line to verify your capture groups before putting them in production.
Open Regex Tester →