How to Parse Logs with Regex
Apache, Nginx, JSON & Custom Formats
Copy-paste patterns for the formats you actually encounter — with named groups so the output is immediately usable.
The key to parsing logs with regex is named capture groups: each extracted field gets a meaningful name rather than a positional index, so the output is immediately usable. This guide covers production-ready patterns for Apache Combined Log Format, Nginx, JSON structured logs, Python tracebacks and custom application log formats — with JavaScript and Python implementations for each.
Before sharing parsed logs: strip emails, IPs and credentials with the Log Sanitizer — especially if you're pasting into ChatGPT or a ticket system.
Apache Combined Log Format
The most common web server log format. Each line looks like:
192.168.1.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "http://ref.example.com/" "Mozilla/5.0"
# Python
import re

APACHE_CLF = re.compile(r'''
    ^(?P<ip>\S+)\s+          # client IP
    \S+\s+                   # ident (usually -)
    (?P<user>\S+)\s+         # auth user
    \[(?P<time>[^\]]+)\]\s+  # timestamp
    "(?P<method>\S+)\s+      # HTTP method
    (?P<path>\S+)\s+\S+"\s+  # request path
    (?P<status>\d{3})\s+     # status code
    (?P<size>\S+)            # response size (may be "-")
''', re.VERBOSE)

def parse_apache(line):
    m = APACHE_CLF.match(line)
    return m.groupdict() if m else None
// JavaScript
const APACHE_CLF = /^(?<ip>\S+)\s+\S+\s+(?<user>\S+)\s+\[(?<time>[^\]]+)\]\s+"(?<method>\S+)\s+(?<path>\S+)\s+\S+"\s+(?<status>\d{3})\s+(?<size>\S+)/;
const parseApache = line => line.match(APACHE_CLF)?.groups ?? null;
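Applying the pattern to the sample line from the top of this section is a quick sanity check that the groups land where expected (a compact single-string version of the same regex):

```python
import re

APACHE_CLF = re.compile(
    r'^(?P<ip>\S+)\s+\S+\s+(?P<user>\S+)\s+\[(?P<time>[^\]]+)\]\s+'
    r'"(?P<method>\S+)\s+(?P<path>\S+)\s+\S+"\s+(?P<status>\d{3})\s+(?P<size>\S+)'
)

line = ('192.168.1.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /index.html HTTP/1.1" 200 2326 "http://ref.example.com/" "Mozilla/5.0"')

m = APACHE_CLF.match(line)
print(m.groupdict())
# {'ip': '192.168.1.1', 'user': 'frank', 'time': '10/Oct/2000:13:55:36 -0700',
#  'method': 'GET', 'path': '/index.html', 'status': '200', 'size': '2326'}
```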
Nginx Access Log (Default Format)
Nginx's default format is similar to Apache CLF but with subtle differences:
# Nginx default: $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
NGINX_DEFAULT = re.compile(r'''
    ^(?P<ip>\S+)\s+-\s+
    (?P<user>\S+)\s+
    \[(?P<time>[^\]]+)\]\s+
    "(?P<request>[^"]+)"\s+
    (?P<status>\d{3})\s+
    (?P<bytes>\d+)\s+
    "(?P<referer>[^"]*)"\s+
    "(?P<ua>[^"]*)"
''', re.VERBOSE)

# Split request into method + path + protocol separately
def parse_nginx(line):
    m = NGINX_DEFAULT.match(line)
    if not m:
        return None
    data = m.groupdict()
    parts = data['request'].split(' ', 2)
    data['method'] = parts[0]
    data['path'] = parts[1] if len(parts) > 1 else ''
    return data
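A quick check against a made-up line in the default format (the IP, user and request below are hypothetical sample values, not from any real log):

```python
import re

NGINX_DEFAULT = re.compile(
    r'^(?P<ip>\S+)\s+-\s+(?P<user>\S+)\s+\[(?P<time>[^\]]+)\]\s+'
    r'"(?P<request>[^"]+)"\s+(?P<status>\d{3})\s+(?P<bytes>\d+)\s+'
    r'"(?P<referer>[^"]*)"\s+"(?P<ua>[^"]*)"'
)

# Hypothetical sample line in the default nginx format
line = ('203.0.113.7 - alice [01/May/2026:14:32:10 +0000] '
        '"POST /api/login HTTP/1.1" 401 152 "-" "curl/8.5.0"')

data = NGINX_DEFAULT.match(line).groupdict()
method, path = data['request'].split(' ', 2)[:2]
print(method, path, data['status'])  # POST /api/login 401
```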
Timestamp Extraction (Multiple Formats)
# ISO 8601 — 2026-05-01T14:32:10.123Z or 2026-05-01T14:32:10+00:00
ISO8601 = r'(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?)'
# Apache/Nginx bracket format — [10/Oct/2000:13:55:36 -0700]
BRACKET_TS = r'\[(?P<ts>\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}\s[+-]\d{4})\]'
# Syslog — May 1 14:32:10
SYSLOG_TS = r'(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})'
# Unix epoch (milliseconds) — 1746105130000
EPOCH_MS = r'(?P<ts>\d{13})'
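Each pattern above can be verified against its own example string with re.search — useful as a quick self-test before wiring a pattern into a pipeline:

```python
import re

ISO8601 = r'(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?)'
BRACKET_TS = r'\[(?P<ts>\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}\s[+-]\d{4})\]'
SYSLOG_TS = r'(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})'
EPOCH_MS = r'(?P<ts>\d{13})'

# Pair each pattern with the example from its comment
samples = [
    (ISO8601, '2026-05-01T14:32:10.123Z'),
    (BRACKET_TS, '[10/Oct/2000:13:55:36 -0700]'),
    (SYSLOG_TS, 'May  1 14:32:10'),
    (EPOCH_MS, '1746105130000'),
]
for pattern, text in samples:
    print(re.search(pattern, text).group('ts'))
```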
Structured / JSON Logs
When each log line is a JSON object, use your language's JSON parser — not regex — for the full line. Regex is still useful for extracting a single field without a full parse:
# Extract 'level' from JSON log line — fast, no full parse
LEVEL = re.compile(r'"level"\s*:\s*"(?P<level>[^"]+)"')
MSG = re.compile(r'"(?:msg|message)"\s*:\s*"(?P<msg>(?:[^"\\]|\\.)*)"')
# Full parse — correct approach for structured logs
import json
for line in log_file:
    entry = json.loads(line.strip())
    print(entry['level'], entry['message'])
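The two approaches combine well: use the cheap regex to skip lines you don't care about, and pay for a full JSON parse only on the survivors (a sketch; the sample lines are made up):

```python
import json
import re

LEVEL = re.compile(r'"level"\s*:\s*"(?P<level>[^"]+)"')

# Hypothetical JSON log lines
lines = [
    '{"level": "info", "message": "started"}',
    '{"level": "error", "message": "db timeout"}',
]

# Regex pre-filter first, json.loads only for matching lines
errors = []
for line in lines:
    m = LEVEL.search(line)
    if m and m.group('level') == 'error':
        errors.append(json.loads(line))

print(errors[0]['message'])  # db timeout
```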
Custom Application Log Formats
Most application logs follow a pattern like [TIMESTAMP] [LEVEL] [SERVICE] message. Build the pattern incrementally using named groups:
# Log line: 2026-05-01 14:32:10 ERROR auth Failed login for user@example.com from 192.168.1.1
APP_LOG = re.compile(r'''
    ^(?P<date>\d{4}-\d{2}-\d{2})\s+            # date
    (?P<time>\d{2}:\d{2}:\d{2})\s+             # time
    (?P<level>DEBUG|INFO|WARN|ERROR|FATAL)\s+  # level
    (?P<service>\w+)\s+                        # service name
    (?P<msg>.+)$                               # rest is message
''', re.VERBOSE)
# Output: {'date': '2026-05-01', 'time': '14:32:10',
# 'level': 'ERROR', 'service': 'auth',
# 'msg': 'Failed login for user@example.com from 192.168.1.1'}
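Once the groups match, a natural next step is converting the string fields into typed values — for example, combining date and time into a real datetime for sorting and filtering (the post-processing shown here is one possible approach, not part of the pattern itself):

```python
import re
from datetime import datetime

APP_LOG = re.compile(r'''
    ^(?P<date>\d{4}-\d{2}-\d{2})\s+
    (?P<time>\d{2}:\d{2}:\d{2})\s+
    (?P<level>DEBUG|INFO|WARN|ERROR|FATAL)\s+
    (?P<service>\w+)\s+
    (?P<msg>.+)$
''', re.VERBOSE)

line = '2026-05-01 14:32:10 ERROR auth Failed login for user@example.com from 192.168.1.1'
entry = APP_LOG.match(line).groupdict()

# Combine date + time into a datetime so entries can be sorted/filtered
entry['timestamp'] = datetime.strptime(
    f"{entry['date']} {entry['time']}", '%Y-%m-%d %H:%M:%S'
)
print(entry['level'], entry['timestamp'])
```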
Python Traceback — Extract Exception and Location
# Match traceback frame lines
FRAME = re.compile(r'File "(?P<file>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')
# Match the exception line
EXCEPTION = re.compile(r'^(?P<exc>[\w.]+(?:Error|Exception|Warning)): (?P<msg>.+)$', re.MULTILINE)
# Strip PII before using — see /how-to-remove-secrets-from-python-tracebacks
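Running both patterns over a traceback string gives the call chain and the final exception; the last frame is usually where the error was raised (the traceback below is a contrived example):

```python
import re

FRAME = re.compile(r'File "(?P<file>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')
EXCEPTION = re.compile(r'^(?P<exc>[\w.]+(?:Error|Exception|Warning)): (?P<msg>.+)$',
                       re.MULTILINE)

# Contrived traceback for illustration
tb = '''Traceback (most recent call last):
  File "app.py", line 42, in main
    connect()
  File "db.py", line 7, in connect
    raise ConnectionError("refused")
ConnectionError: refused
'''

frames = [m.groupdict() for m in FRAME.finditer(tb)]
exc = EXCEPTION.search(tb)
print(frames[-1]['file'], frames[-1]['line'])  # db.py 7
print(exc.group('exc'), exc.group('msg'))      # ConnectionError refused
```

Note that re.MULTILINE keeps the EXCEPTION pattern from matching the indented `raise` line: `^` only matches at line starts, and that line begins with whitespace.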
Performance Tips for Large Log Files
- Compile patterns once — re.compile() in Python; store the regex object outside the loop. Recompiling on every line is 10–50× slower.
- Anchor patterns — start with ^ so the engine doesn't scan the whole line for a possible match position.
- Use non-capturing groups for parts you don't need — (?:...) avoids the overhead of storing the match.
- Process line by line — don't read the whole file into memory. Iterate line by line with a generator for multi-GB logs.
- Consider re.VERBOSE — the performance cost is negligible and the readability gain for complex patterns is large.
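These tips combine into one streaming idiom: compile once at module level, anchor the pattern, and yield matches lazily (a sketch with a simplified placeholder pattern; swap in any parser from this guide):

```python
import re

# Compiled once at module level — never inside the loop
APP_LOG = re.compile(r'^(?P<level>DEBUG|INFO|WARN|ERROR|FATAL)\s+(?P<msg>.+)$')

def parse_lines(lines):
    """Lazily yield parsed entries; works on a file object or any iterable."""
    for line in lines:
        m = APP_LOG.match(line)  # anchored: fails fast on non-matching lines
        if m:
            yield m.groupdict()

# Works the same on open('app.log') as on an in-memory list
sample = ['ERROR disk full', 'not a log line', 'INFO started']
for entry in parse_lines(sample):
    print(entry['level'])
```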
Test Your Log Pattern
Paste any pattern from this guide and a sample log line to verify your capture groups before putting them in production.
Open Regex Tester →