What Is Structured Logging
JSON Logs, Better Search, Safer PII Handling
Structured logging means treating every log event as data — not text. Here is how it works, why unstructured logs break at scale, and how structlog, pino, and zap implement it in Python, Node.js, and Go.
What is structured logging: instead of writing a formatted string like INFO 2026-05-02 User alice@example.com logged in from 203.0.113.5, you emit a JSON object where every piece of information is a named field — {"level":"info","event":"user.login","user_id":42,"ip":"203.0.113.5"}. Log aggregation platforms can index those fields directly, dashboards can filter on them precisely, and PII processors can drop a field like user.email from every event without touching a single log statement.
Unstructured vs Structured: The Same Event
Consider a login event. Here is what it looks like written in the two styles side by side:
INFO 2026-05-02T14:22:31Z User alice@example.com
logged in successfully from
203.0.113.42 using Chrome/124
on Windows (attempt 1 of 3)
{
  "level": "info",
  "ts": "2026-05-02T14:22:31Z",
  "event": "user.login",
  "user_id": 9182,
  "ip": "203.0.113.42",
  "browser": "Chrome/124",
  "os": "Windows",
  "attempt": 1
}
The structured version contains exactly the same information. But now every field has a name, a type, and a predictable location. Elasticsearch can index user_id as a keyword field. A Loki LogQL query can filter on {app="auth"} | json | event = "user.login" | ip != ip("10.0.0.0/8"), using LogQL's ip() matcher for real CIDR comparison. A pipeline processor can drop the ip field entirely before the event reaches long-term storage.
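Because every field is a named key, even a few lines of standard-library scripting can query these events without a regex in sight. A minimal sketch (the first sample line is the event above; the other lines and user_id 7731 are invented for illustration):

```python
import json

raw_logs = [
    '{"level":"info","event":"user.login","user_id":9182,"ip":"203.0.113.42"}',
    '{"level":"info","event":"user.logout","user_id":9182}',
    '{"level":"error","event":"user.login","user_id":7731,"ip":"198.51.100.7"}',
]

# Parse each NDJSON line once, then filter on named, typed fields -- no regex
events = [json.loads(line) for line in raw_logs]
logins = [e for e in events if e["event"] == "user.login"]

print([e["user_id"] for e in logins])  # -> [9182, 7731]
```

The same filter expressed as a grep pattern would depend on field order and message wording; here it depends only on the key names.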
Notice that the structured version uses user_id: 9182 instead of logging the email address at all — that is intentional and covered in the PII section below.
Why Unstructured Logs Break at Scale
Unstructured logging works fine when one engineer tails a single log file to debug one service. It breaks in three specific ways when you try to operate it at scale.
Regex parsing is fragile. Log aggregation pipelines like Logstash use grok patterns to extract fields from free text. A grok pattern for the unstructured line above might look like this:
# Matches: "User alice@example.com logged in successfully from 203.0.113.42"
grok {
  match => { "message" => "User %{EMAILADDRESS:email} logged in %{WORD} from %{IP:client_ip}" }
}
# The pattern above breaks silently the moment the message changes to:
# "User alice@example.com signed in from 203.0.113.42" — "signed" ≠ "logged"
# "User alice@example.com login attempt from 203.0.113.42" — "attempt" != match
# "alice@example.com logged in from 203.0.113.42" — "User " prefix gone
# Each variation silently drops the event into _grokparsefailure.
# Your dashboard shows zero logins. Nobody notices for three days.
Log format changes silently break dashboards. When a developer changes a log message from "logged in" to "authenticated", every grok pattern, every Splunk rex command, and every Loki line filter that matched the old string stops working — with no error, no alert, and no obvious cause. Dashboards go to zero. Alerts stop firing. The failure is invisible until someone manually investigates.
You cannot reliably extract a field like user_id from free text. If a user ID appears in different positions depending on the code path that generated the message, no single regex extracts it reliably. You get partial extraction, silent failures on unexpected formats, and — worst of all — the occasional false positive where a number that looks like a user ID is actually a duration or an HTTP status code.
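The fragility is easy to demonstrate in a few lines. A sketch, using a simplified stand-in for the grok pattern above (the regex and the message variants are illustrative):

```python
import json
import re

# Simplified stand-in for the grok pattern above
pattern = re.compile(r"User (?P<email>\S+@\S+) logged in \w+ from (?P<ip>[\d.]+)")

messages = [
    "User alice@example.com logged in successfully from 203.0.113.42",
    "User alice@example.com signed in from 203.0.113.42",   # wording changed
    "alice@example.com logged in from 203.0.113.42",        # "User " prefix gone
]

# Two of the three variants silently fail to match
matches = [bool(pattern.search(m)) for m in messages]
print(matches)  # -> [True, False, False]

# The structured equivalent never depends on wording or position
event = json.loads('{"event":"user.login","user_id":9182,"ip":"203.0.113.42"}')
print(event["ip"])  # -> 203.0.113.42
```

The two failures produce no exception and no warning, which is exactly the _grokparsefailure scenario described above.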
Python: structlog
structlog is the de facto standard structured logging library for Python. Its core idea is a processor chain: each log call passes through a list of callables that transform the event dictionary before it is rendered and written. This makes it trivial to add, remove, or transform any field.
import structlog

structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,      # merge request-scoped context
        structlog.processors.add_log_level,           # adds "level" field
        structlog.processors.TimeStamper(fmt="iso"),  # adds "timestamp" field
        structlog.processors.JSONRenderer(),          # renders to JSON string
    ],
    wrapper_class=structlog.make_filtering_bound_logger(20),  # 20 == logging.INFO
    context_class=dict,
    logger_factory=structlog.PrintLoggerFactory(),
)

log = structlog.get_logger()

# Basic event with key=value context fields
log.info("user.login", user_id=9182, action="login", success=True)

# Output:
{
  "level": "info",
  "timestamp": "2026-05-02T14:22:31.004Z",
  "event": "user.login",
  "user_id": 9182,
  "action": "login",
  "success": true
}
For request-scoped context — data you want attached to every log call within a single HTTP request without passing it explicitly — use structlog.contextvars:
import uuid

import structlog
from structlog.contextvars import bind_contextvars, clear_contextvars

log = structlog.get_logger()

def handle_request(request):
    # Bind fields that should appear on EVERY log call in this request
    clear_contextvars()
    bind_contextvars(
        request_id=str(uuid.uuid4()),
        method=request.method,
        path=request.path,
    )
    # Every log call in this request automatically includes request_id etc.
    log.info("request.start")
    process(request)
    log.info("request.end", status=200, duration_ms=42)

# Output for request.end:
# {"event":"request.end","request_id":"a3f9...","method":"GET",
#  "path":"/api/users","status":200,"duration_ms":42,"level":"info"}
To drop a field from every event — for example, removing ip before events reach long-term storage — add a processor to the chain:
# Fields to strip before any event is written to the output sink
SENSITIVE_FIELDS = {"ip", "email", "user_email", "phone"}

def drop_sensitive(logger, method, event_dict):
    """structlog processor: remove sensitive keys from every event."""
    for field in SENSITIVE_FIELDS:
        event_dict.pop(field, None)
    return event_dict

structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        drop_sensitive,  # <-- insert anywhere before the renderer
        structlog.stdlib.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ],
    ...
)

# Now even if code calls:
#     log.info("user.login", user_id=9182, ip="203.0.113.42")
# the output will be:
#     {"event":"user.login","user_id":9182,"level":"info","timestamp":"..."}
# The "ip" field is gone — without editing the log call site.
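Dropping a field discards it entirely; an alternative is to mask the value while keeping the key, so dashboards can still count how many events carried it. A variant of the processor above (the mask_sensitive name and the "[REDACTED]" placeholder are this sketch's conventions, not structlog defaults):

```python
SENSITIVE_FIELDS = {"ip", "email", "user_email", "phone"}

def mask_sensitive(logger, method, event_dict):
    """structlog processor: replace sensitive values but keep the keys."""
    for field in SENSITIVE_FIELDS:
        if field in event_dict:
            event_dict[field] = "[REDACTED]"
    return event_dict

# A processor is a plain function on a plain dict -- easy to unit test
event = {"event": "user.login", "user_id": 9182, "ip": "203.0.113.42"}
print(mask_sensitive(None, "info", event))
# -> {'event': 'user.login', 'user_id': 9182, 'ip': '[REDACTED]'}
```

Because processors are ordinary callables, either version can be tested in isolation without configuring structlog at all.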
Node.js: pino
pino is one of the fastest JSON loggers available for Node.js. It writes newline-delimited JSON by default, with no configuration needed. The object you pass as the first argument becomes the structured fields; the string becomes the msg field.
import pino from 'pino';

const logger = pino({
  level: 'info',
  // Redact sensitive fields before serialisation
  redact: {
    paths: ['user.email', 'req.headers.authorization'],
    censor: '[REDACTED]',
  },
});

// Object first = structured fields; string second = msg
logger.info({ userId: 9182, action: 'login', success: true }, 'User logged in');

// Output:
{
  "level": 30,
  "time": 1746187351004,
  "pid": 12345,
  "hostname": "app-01",
  "userId": 9182,
  "action": "login",
  "success": true,
  "msg": "User logged in"
}
Child loggers inherit context and are the idiomatic way to attach a requestId to every log call within a single request without explicitly passing it through the call stack:
import { randomUUID } from 'crypto';

function handleRequest(req, res) {
  // Child logger: every call on reqLog includes requestId automatically
  const reqLog = logger.child({ requestId: randomUUID(), path: req.url });
  reqLog.info('request.start');
  try {
    const result = processRequest(req);
    reqLog.info({ statusCode: 200, durationMs: 38 }, 'request.end');
    res.send(result);
  } catch (err) {
    reqLog.error({ err, statusCode: 500 }, 'request.error');
    res.status(500).send('Internal Server Error');
  }
}

// All three calls above produce JSON lines containing:
//   "requestId": "a3f9c2d1-...", "path": "/api/users"
// — without you passing those values to each log call explicitly.
If your codebase already uses winston, you can add structured output by setting format: winston.format.json() and using the metadata object pattern: logger.info('user.login', { userId: 9182 }). With the json() format, winston merges that metadata object into the JSON output, giving you the same field-per-key structure as pino with minimal changes to existing call sites.
Go: zap
zap is Uber's production Go logger. zap.NewProduction() gives you a zero-allocation JSON logger out of the box. Fields are declared with typed constructors (zap.String, zap.Int, zap.Bool), which allows the compiler to verify types and the runtime to avoid interface allocation.
package main

import "go.uber.org/zap"

func main() {
	logger, _ := zap.NewProduction()
	defer logger.Sync()

	// Typed field constructors — no fmt.Sprintf, no interface{}
	logger.Info("user.login",
		zap.String("user_id", "9182"),
		zap.String("action", "login"),
		zap.Bool("success", true),
		zap.Int("attempt", 1),
	)
}

// Output:
{
  "level": "info",
  "ts": 1746187351.004,
  "caller": "main/main.go:12",
  "msg": "user.login",
  "user_id": "9182",
  "action": "login",
  "success": true,
  "attempt": 1
}
For teams that find the typed-field API verbose, zap provides a sugared logger that accepts key, value pairs with runtime type inference — slower than the core logger but still structured:
sugar := logger.Sugar()

// key-value pairs, alternating — still produces JSON
sugar.Infow("user.login",
	"user_id", "9182",
	"action", "login",
	"success", true,
)

// Named fields attached to every call from this logger instance
requestLogger := logger.With(
	zap.String("request_id", "a3f9c2d1"),
	zap.String("path", "/api/login"),
)
requestLogger.Info("request.start")
requestLogger.Info("request.end", zap.Int("status", 200), zap.Int("ms", 12))
The PII Advantage — Why Structured Logs Make Redaction Reliable
This is the most operationally important reason to switch to structured logging, and it is under-appreciated. Consider two scenarios where you need to remove a user's email address from your logs — perhaps in response to a GDPR deletion request, or because you discovered it should never have been logged.
# These are all the same email in your logs:
INFO User alice@corp.com logged in
INFO Login failed for alice@corp.com
INFO Sent password reset to alice@corp.com
INFO alice@corp.com: session expired after 30m
INFO Processing request from user alice@corp.com
INFO user=alice@corp.com action=profile_update
# A regex like:
# \b[a-z.]+@corp\.com\b
# catches most of them but misses:
# - URL-encoded: alice%40corp.com
# - Quoted: "alice@corp.com"
# - Split across a line wrap
# - Anywhere in a JSON string embedded in the message
# Every event uses user_id — email was never logged
{"event":"user.login","user_id":9182}
{"event":"login.failed","user_id":9182}
{"event":"password.reset_sent","user_id":9182}
{"event":"session.expired","user_id":9182,"dur_s":1800}
{"event":"profile.update","user_id":9182}
# To drop ip from all future events, one processor change:
# SENSITIVE_FIELDS.add("ip")
# Done. No grep. No regex. No log format to parse.
# To redact historical logs: jq 'del(.ip)' app.log
# This works 100% reliably because ip is always a key.
The structured approach has two distinct advantages. First, you never logged the email in the first place — you logged a user_id that can be linked to the email only through your database, which means the logs themselves contain no directly identifiable data. Second, if you do need to drop a field (like an IP address after your retention period), it is always at a predictable key — a single pipeline change or jq del(.ip) command handles every historical event uniformly.
With unstructured logs, a regex to remove an email will miss variants, and a format change six months ago means some events put the email in a different position. Reliable redaction becomes genuinely difficult.
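If jq is not available, the same historical redaction can be sketched with the Python standard library alone (the sample lines mirror the events shown above):

```python
import json

lines = [
    '{"event":"user.login","user_id":9182,"ip":"203.0.113.42"}',
    '{"event":"session.expired","user_id":9182,"dur_s":1800}',
]

# Equivalent of: jq 'del(.ip)' app.log
# Works 100% reliably because "ip" is always a key, never free text
redacted = []
for line in lines:
    event = json.loads(line)
    event.pop("ip", None)
    redacted.append(json.dumps(event, separators=(",", ":")))

print("\n".join(redacted))
# -> {"event":"user.login","user_id":9182}
#    {"event":"session.expired","user_id":9182,"dur_s":1800}
```

In a real pipeline this would stream file to file line by line rather than hold a list in memory, but the parse-delete-serialise step is identical.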
The structlog processor approach in a sentence
Add SENSITIVE_FIELDS = {"ip", "email"} and a single processor that calls event_dict.pop(field, None) for each. Every log call in every file is now redacted — without touching a single call site. That is the architectural value of treating logs as data.
Structured Logging Best Practices
Switching to structured logging is a format change. The practices below determine whether you get the full benefit or just structured noise.
Log events, not state. "user.login" and "payment.declined" describe things that happened. "User is logged in" and "payment status is declined" describe state that should live in your database, not your logs. Events are easier to query, count, and alert on.
Correlate with request IDs. Bind a random UUID as request_id at the start of every request. Use it to correlate all log lines for that request. This lets you trace an entire request without any personally identifiable data in your logs.
Never log request or response bodies. Even with structured logging, bodies are dangerous: they may contain PII, secrets, or large payloads. Log the content type, the byte size, and the status code — not the body itself.
Keep field types consistent. A duration should be a number (duration_ms: 42), not a string ("42ms"). A boolean should be true/false, not "yes"/"no". Consistent types mean you can aggregate and alert on fields without parsing.
Namespace your event names. "user.login", "user.logout", "payment.created", "payment.failed" — namespacing lets you filter by prefix (all user events, all payment events) without string matching on partial names.
Use log levels deliberately. INFO for normal events that operators might care about. WARN for abnormal-but-handled situations (rate limit hit, retry succeeded). ERROR for events that require human attention. DEBUG for detail that is only useful when actively debugging — and should be disabled in production.
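Taken together, an event that follows all of these practices might look like the following sketch (the field names and values are illustrative, not a required schema):

```python
import json
import uuid

# One event assembled the way the practices above describe
event = {
    "event": "payment.created",       # namespaced name: payment.*
    "level": "info",                  # a normal event operators may care about
    "request_id": str(uuid.uuid4()),  # correlates all lines for one request
    "user_id": 9182,                  # an ID, not an email -- no PII
    "amount_cents": 4999,             # number, not "$49.99"
    "duration_ms": 42,                # number, not "42ms"
    "body_bytes": 312,                # size of the request body, never its content
    "success": True,                  # boolean, not "yes"
}
print(json.dumps(event))
```

Every field here can be aggregated, filtered, or alerted on directly, and none of it identifies the user outside your own database.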
Migrating an Existing Application Without Rewriting Everything
The most common objection to structured logging is the cost of migration. If your application has thousands of console.log() or logging.info() calls using string formatting, rewriting each one is impractical. The good news is you do not have to.
Step 1: Replace the logger, keep the call sites. In Python, use structlog's stdlib integration: attach a structlog ProcessorFormatter to the root logging handler, so all existing logging.info("message") calls are intercepted and emitted as JSON with the message in the event field. The output is now structured even though the call sites are unchanged.
import logging
import structlog

# Configure structlog; instead of rendering directly, the last processor
# hands the event dict to the ProcessorFormatter attached below
structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        structlog.stdlib.filter_by_level,
        structlog.stdlib.add_logger_name,
        structlog.stdlib.add_log_level,
        structlog.stdlib.PositionalArgumentsFormatter(),
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.StackInfoRenderer(),
        structlog.processors.format_exc_info,
        structlog.processors.UnicodeDecoder(),
        structlog.stdlib.ProcessorFormatter.wrap_for_formatter,
    ],
    wrapper_class=structlog.stdlib.BoundLogger,
    context_class=dict,
    logger_factory=structlog.stdlib.LoggerFactory(),
)

# ProcessorFormatter renders plain stdlib records as JSON too;
# foreign_pre_chain runs on records that did NOT come from structlog
formatter = structlog.stdlib.ProcessorFormatter(
    processor=structlog.processors.JSONRenderer(),
    foreign_pre_chain=[
        structlog.stdlib.add_logger_name,
        structlog.stdlib.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
    ],
)
handler = logging.StreamHandler()
handler.setFormatter(formatter)

# Route all stdlib logging through the structlog formatter
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)

# Now existing code that calls:
#     import logging
#     logger = logging.getLogger(__name__)
#     logger.info("User %s logged in from %s", user_id, ip)
#
# automatically produces JSON output:
#     {"event":"User 9182 logged in from 203.0.113.42","logger":"auth","level":"info","timestamp":"..."}
Step 2: Add context incrementally. Once the stdlib bridge is in place, migrate high-value call sites one at a time. Change logger.info("User %s logged in", email) to log.info("user.login", user_id=user_id) as you touch each file. Both styles work simultaneously — you are not blocked until the migration is complete.
Step 3: Configure the renderer last. In structlog, the renderer (JSON vs pretty-print) is always the last processor. During local development you can swap JSONRenderer for ConsoleRenderer to get human-readable coloured output, without changing anything else. Switch back to JSON for staging and production. This separation means developer experience never compromises production log format.
In Node.js, the migration path is similar: install pino and create a logger.js that wraps it. Wherever code previously called console.log(), route it through the new logger. For express applications, pino-http adds automatic request/response logging with zero changes to route handlers.
import express from 'express';
import pino from 'pino';
import pinoHttp from 'pino-http';

const logger = pino({ level: 'info' });
const app = express();

// One line adds structured request/response logging to every route
app.use(pinoHttp({ logger }));

app.get('/api/users/:id', (req, res) => {
  // req.log is a child logger pre-bound with request context
  req.log.info({ userId: req.params.id }, 'user.fetch');
  res.json({ id: req.params.id });
});

// pino-http automatically logs:
//   {"level":30,"time":...,"req":{"method":"GET","url":"/api/users/9182","id":1},
//    "res":{"statusCode":200},"responseTime":4,"msg":"request completed"}
Sanitise Your Logs Before Sharing
Even with structured logging, you may need to share a log snippet externally. The Log Sanitizer strips emails, IPs, tokens, and API keys instantly — client-side, nothing uploaded.
Open Log Sanitizer — Free →
FAQ
What is structured logging?
Structured logging means emitting each log event as a machine-readable data object — usually JSON or key=value pairs — rather than a formatted human-readable string. Each piece of information (event name, user ID, duration, status code) is a separate field with a consistent name and type, making it directly queryable in log aggregation platforms without any regex parsing.
Why is structured logging better for PII redaction?
With structured logs, sensitive data like an email address or user ID lives in a named field — for example user.email. A log processor or pipeline can drop or redact that specific field reliably across every event. With unstructured logs, the same email address might appear anywhere inside a free-text message string, at different positions depending on the code path, and reliably removing it requires regex patterns that break whenever the message wording changes.
What is the difference between structlog, pino, and zap?
All three are structured logging libraries for different languages: structlog is for Python and uses a configurable processor pipeline; pino is a high-performance Node.js logger that emits JSON by default; zap is Go's production logger from Uber, designed for zero-allocation performance with typed field constructors. Each produces JSON output but uses language-idiomatic APIs appropriate to their runtime.
Can I add structured logging to an existing application without rewriting it?
Yes. In Python, configuring structlog with a stdlib bridge intercepts all existing logging.getLogger() calls and emits them as JSON without touching call sites. In Node.js, pino-http adds structured request logging to Express with one middleware line. You can then migrate individual call sites incrementally while both formats work simultaneously.