How do I find PII in log files?

Start with grep patterns for the most common PII types: email addresses (grep -E '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'), credit card patterns (grep -E '\b4[0-9]{12}([0-9]{3})?\b'), and auth tokens (grep -E '(Bearer|token)[=:[:space:]][A-Za-z0-9._-]{20,}'). grep -E does not support (?:...) groups or \s inside brackets, so these patterns use plain groups and [:space:]. Run them against rotated log archives, not just live logs.

How do I search Elasticsearch logs for PII?

Use Kibana's Discover with a KQL query like: message: *@*.* to find email-shaped strings, or use the Elasticsearch query_string query with wildcard patterns. For structured logs, use an exists query on fields that should never contain PII — like error.message or request.body — and sample the results.

What PII most commonly appears in logs by accident?

The most common sources are: authentication errors that log the submitted username/password, SQL error messages that echo query parameters, request body logging enabled at DEBUG level (capturing form fields), stack traces that include user input in the message, and URL parameters containing tokens or email addresses (e.g. /verify?email=user@example.com).


How to Audit Logs for PII
Grep, Elasticsearch, Automated Scanning

Find what's already in your logs before a regulator does — practical grep one-liners, Kibana queries, and a Python scanner you can run in CI.

9 min read·Updated May 2026

Auditing logs for PII means actively searching existing log archives for personal data that shouldn't be there — before a breach, a GDPR audit, or a compliance review forces the issue. Most teams only think about this after an incident. This guide gives you the tools to do it proactively.

Where PII Enters Logs

Auth error logging

Failed login attempts logged as: "Invalid credentials for user@example.com" — the email goes straight into the log.

SQL error messages

ORMs and query builders often echo the full query in error messages, including WHERE email = 'user@example.com' or bound parameters.

Request body logging

DEBUG-level middleware that logs the full request body captures registration forms, profile updates, and payment fields.

URL parameters

Password reset links (/reset?token=abc&email=user@example.com), email verification, and OAuth redirects log PII in the URL.

Stack traces

Exception messages that include user input — "Invalid date: 1980-13-45 for user John Smith" — propagate PII up the stack.

Third-party SDK logs

Stripe, Twilio, and analytics SDKs often emit verbose logs at DEBUG level that include phone numbers, emails, and addresses.

GraphQL query logging

Logging full GraphQL queries exposes field arguments: query { user(email: "x@x.com") { name } }.

Webhook payloads

Stripe, GitHub, and payment processors send full payloads that may include card details, billing addresses, or PII in event data.
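
Of the sources above, URL parameters are among the easiest to check mechanically, because access logs have a predictable shape. The sketch below assumes a common/combined-format access log and a hypothetical list of parameter names (SENSITIVE_PARAMS) that you would adjust to your own routes. It reports only the parameter names, never the values.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Assumption: parameter names worth flagging. Adjust for your own endpoints.
SENSITIVE_PARAMS = {"email", "token", "password", "api_key", "access_token"}

# Extracts the request target from a line such as:
#   ... "GET /reset?token=abc HTTP/1.1" 200 512
REQUEST_RE = re.compile(
    r'"(?:GET|POST|PUT|PATCH|DELETE|HEAD|OPTIONS) (\S+) HTTP/[0-9.]+"'
)


def sensitive_params_in_line(line: str) -> list[str]:
    """Return the sensitive query parameter names found in one access log line."""
    match = REQUEST_RE.search(line)
    if not match:
        return []
    query = urlsplit(match.group(1)).query
    names = {name.lower() for name, _ in parse_qsl(query, keep_blank_values=True)}
    return sorted(names & SENSITIVE_PARAMS)


if __name__ == "__main__":
    sample = (
        '203.0.113.7 - - [10/May/2026:12:00:00 +0000] '
        '"GET /reset?token=abc&email=user@example.com HTTP/1.1" 200 512'
    )
    print(sensitive_params_in_line(sample))  # prints ['email', 'token']
```

Matching on parameter names rather than values catches tokens that no regex would recognise as PII by shape alone.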

Grep One-Liners

Start here — run against rotated log archives, not just today's live log. Use -r to scan a directory recursively and -l to list files rather than print every match.

# Email addresses
grep -rE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' /var/log/app/ \
  --include='*.log' -l

# Auth tokens in headers or query strings
grep -rE '(Bearer|token|api[_-]?key)[=:[:space:]][A-Za-z0-9._-]{20,}' /var/log/app/ -l

# Visa/Mastercard credit card numbers (basic pattern)
grep -rE '\b(4[0-9]{12}([0-9]{3})?|5[1-5][0-9]{14})\b' /var/log/app/ -l

# UK National Insurance numbers
grep -rE '\b[A-Z]{2}[0-9]{6}[A-D]\b' /var/log/app/ -l

# US Social Security Numbers
grep -rE '\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b' /var/log/app/ -l

# Phone numbers (loose international pattern; needs GNU grep -P for \d and the lookahead)
grep -rP '\+?[\d\s\-().]{10,15}(?=\s|$|")' /var/log/app/ -l

# Passwords in auth error messages (adjust for your log format; -P so \s works inside brackets)
grep -rP '(password|passwd|pwd)[=:\s]["'"'"']?[^\s"'"'"']{6,}' /var/log/app/ -l

# Count matches by file
grep -rcE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' /var/log/app/ \
  | grep -v ':0$' | sort -t: -k2 -rn | head -20
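
The card pattern above is deliberately basic, so it also matches order IDs and other long digit runs. One common way to cut false positives is a Luhn checksum on each candidate. The sketch below reuses the Visa/Mastercard regex from the one-liner; the helper names are illustrative, and matches are masked to the last four digits so the output does not repeat the card number.

```python
import re

# Same Visa/Mastercard shape as the grep one-liner, in Python regex syntax.
CARD_RE = re.compile(r'\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14})\b')


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def likely_cards(line: str) -> list[str]:
    """Return masked card-shaped numbers in the line that pass the Luhn check."""
    return [
        "*" * (len(m) - 4) + m[-4:]
        for m in CARD_RE.findall(line)
        if luhn_valid(m)
    ]


if __name__ == "__main__":
    # 4111111111111111 is a well-known test number; 4000000000000000 fails Luhn.
    print(likely_cards("paid with 4111111111111111, ref 4000000000000000"))
```

A Luhn pass does not prove a string is a real card number, but a failure is a strong sign that it is not.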

Elasticsearch / Kibana Queries

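If your logs are in an ELK stack, use KQL in Kibana Discover or the Elasticsearch query API. Wildcard patterns match individual indexed terms, so they are most reliable on keyword fields; on a text field analyzed with the standard tokenizer, an address such as user@example.com is split at the @ and *@*.* may miss it: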

## KQL in Kibana Discover

# Email-shaped strings anywhere in the message field
message: *@*.*

# Specific field that should never contain email
error.message: *@*.*

# SQL errors (often contain query parameters)
message: "ERROR" AND message: "syntax" AND message: *@*.*

## Elasticsearch Query DSL (via API)
POST /logs-*/_search
{
  "query": {
    "query_string": {
      "query": "*@*.*",
      "default_field": "message"
    }
  },
  "size": 100,
  "_source": ["@timestamp", "message", "service"]
}

# Count documents with email-shaped strings across all matching indices (one total, not per index)
GET /logs-*/_count?q=message:*%40*.*
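
The _count call returns a single total, so it does not show which index holds the most matches. One way to get a per-index breakdown is a terms aggregation on the _index metadata field. The sketch below only builds the request body, using the same query_string pattern as above; the function name and defaults are illustrative, and you would send the result as the body of POST /logs-*/_search with whichever client you already use.

```python
import json


def pii_count_by_index(field: str = "message", top_n: int = 20) -> dict:
    """Build a search body that counts email-shaped matches per index."""
    return {
        # size 0: return only the aggregation, not the matching documents.
        "size": 0,
        "query": {
            "query_string": {"query": "*@*.*", "default_field": field}
        },
        "aggs": {
            # Buckets are keyed by index name, largest doc_count first.
            "by_index": {"terms": {"field": "_index", "size": top_n}}
        },
    }


if __name__ == "__main__":
    print(json.dumps(pii_count_by_index(), indent=2))
```

Because size is 0, no log lines come back in the response, which keeps the PII out of your terminal history while you triage.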

Python Automated Scanner

Run this in CI against a sample of production logs — flag any matches and fail the build or send an alert:

import re, sys, json
from pathlib import Path

PATTERNS = {
    "email":        re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'),
    "bearer_token": re.compile(r'Bearer\s+[A-Za-z0-9._\-]{20,}'),
    "visa_card":    re.compile(r'\b4[0-9]{12}(?:[0-9]{3})?\b'),
    "ssn":          re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),
    "api_key":      re.compile(r'(api[_-]?key|secret)[=:\s][A-Za-z0-9]{16,}', re.I),
}

def scan_file(path: Path) -> list[dict]:
    findings = []
    with path.open("r", errors="replace") as f:
        for lineno, line in enumerate(f, 1):
            for pii_type, pattern in PATTERNS.items():
                if pattern.search(line):
                    findings.append({
                        "file":    str(path),
                        "line":    lineno,
                        "type":    pii_type,
                        # Mask the match so the CI output does not repeat the PII
                        "preview": pattern.sub("[REDACTED]", line)[:120].strip(),
                    })
    return findings

if __name__ == "__main__":
    log_dir  = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("./logs")
    all_hits = []
    for log_file in log_dir.rglob("*.log"):
        all_hits.extend(scan_file(log_file))

    if all_hits:
        print(json.dumps(all_hits, indent=2))
        print(f"\n{len(all_hits)} potential PII matches found.", file=sys.stderr)
        sys.exit(1)  # non-zero exit fails CI
    else:
        print("No PII patterns detected.")
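
The scanner above only reads *.log files, while rotated archives are often named app.log.1 or compressed as app.log.2.gz. A minimal sketch of one way to cover those, assuming gzip is the only compression in use and that rotated files keep ".log" in their name; open_log and count_email_lines are illustrative helpers, not part of the scanner above.

```python
import gzip
import re
from pathlib import Path

EMAIL_RE = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')


def open_log(path: Path):
    """Open a plain or gzip-compressed log file in text mode."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return path.open("r", encoding="utf-8", errors="replace")


def count_email_lines(log_dir: Path) -> dict[str, int]:
    """Map file name to the number of lines containing an email-shaped string."""
    counts: dict[str, int] = {}
    # "*.log*" also matches rotated names such as app.log.1 and app.log.2.gz.
    for path in sorted(log_dir.rglob("*.log*")):
        if not path.is_file():
            continue
        with open_log(path) as f:
            hits = sum(1 for line in f if EMAIL_RE.search(line))
        if hits:
            counts[path.name] = hits
    return counts


if __name__ == "__main__":
    for name, hits in count_email_lines(Path("./logs")).items():
        print(f"{name}: {hits}")
```

Reporting counts per file rather than the matching lines keeps the summary safe to paste into a ticket.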

Audit Checklist

☐ Does your auth service log the submitted username on failure?

Check login error handlers — they often log "Invalid credentials for {email}" before returning 401.

☐ Does your ORM log full queries in dev mode?

Django DEBUG=True, SQLAlchemy echo=True, ActiveRecord logger — these log every query with bound parameters. Ensure dev config doesn't ship to staging/prod.

☐ Is request body logging enabled at any log level?

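Search for request-logging middleware: morgan (Node; logs the URL by default and the body only through a custom token), django-request-logging (Python), Spring's CommonsRequestLoggingFilter (logs the payload when setIncludePayload(true) is set). If present, verify that it strips sensitive fields.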

☐ Are URL parameters logged in access logs?

Nginx and Apache log full URLs including query strings. A /verify?email=x or /reset?token=y endpoint leaks PII in the access log.

☐ Do your Elasticsearch retention settings cover replicas and snapshots?

An index lifecycle policy deletes whole indices by age; it cannot remove a single field. Any PII already indexed is also copied to every replica shard and into every snapshot taken in the meantime, so check replica counts and snapshot retention too.

☐ Have you run the grep scan against log archives, not just live logs?

A bug fixed 6 months ago may have logged PII for months before the fix. Archive logs are often the largest exposure.
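
Once a checklist item is fixed, a small regression test can help keep it fixed. The sketch below captures what a login-failure handler writes to its logger and checks that no email-shaped string appears. handle_failed_login is a hypothetical stand-in for your own handler, and the logger name is arbitrary.

```python
import io
import logging
import re

EMAIL_RE = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')


def handle_failed_login(email: str, logger: logging.Logger) -> None:
    """Hypothetical handler: records the failure without the submitted email."""
    logger.warning("Invalid credentials (identifier withheld)")


def captured_log_output(func, *args) -> str:
    """Run func(*args, logger) and return everything it logged as one string."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger = logging.getLogger("auth_audit_demo")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the demo output out of the root logger
    logger.addHandler(handler)
    try:
        func(*args, logger)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


if __name__ == "__main__":
    output = captured_log_output(handle_failed_login, "user@example.com")
    assert not EMAIL_RE.search(output), "login failure leaked an email address"
    print("login failure log is clean")
```

The same capture helper can wrap any code path from the checklist, such as an ORM error handler or request middleware.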

Redact PII Before Sharing Logs

Found PII in a log file you need to share? The Log Sanitizer strips emails, tokens, IPs, and card numbers entirely in your browser — nothing is uploaded.
