Secret logged at scale

One careless log line carries a live secret — and your logging pipeline faithfully copies it everywhere.

The idea

A secret is anything that grants access: an API key, a database password, a session token. Logs are meant for plain, replayable facts — not for credentials. The trouble is that a log line is not a single object you can quietly delete. It is a fact that fans out: to the app log on disk, to a central index you can search, to a security pipeline (SIEM), and to cold storage kept for months.

So a single line like log.info("request", req) — where req happens to carry an Authorization header — becomes a credential sitting in four places, readable by everyone with log access. Incident response is then a sequence: detect it, contain it (rotate the secret, scrub the lines, stop the bleed), and find the root cause (the statement that dumped a whole object).
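The fan-out can be reproduced in a few lines. This is a minimal sketch, assuming Python's standard logging module; the three in-memory streams and their sink names are hypothetical stand-ins for the app log, the search index, and the SIEM feed, and the token value is made up.

```python
import io
import logging

# Hypothetical sinks: in-memory streams standing in for real destinations.
sinks = {name: io.StringIO() for name in ("app_log", "search_index", "siem")}

logger = logging.getLogger("fanout-demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output away from the root logger
for stream in sinks.values():
    logger.addHandler(logging.StreamHandler(stream))

# A made-up request object that happens to carry a credential.
req = {"path": "/pay", "headers": {"Authorization": "Bearer tok_EXAMPLE0000"}}
logger.info("request %s", req)  # one call, one record, three copies

# Collect the names of the sinks that now contain the token.
leaked = [name for name, stream in sinks.items() if "tok_EXAMPLE0000" in stream.getvalue()]
print(leaked)
```

Every handler receives the same record, so the credential lands in each sink at once; deleting it from one of them leaves the others untouched.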

How it works

The fix is structural, not a one-time scrub. You make secrets impossible to log by default, and you only ever log fields from a known allow-list. Never hand a whole request, header set, or arbitrary object to the logger — that is how secrets sneak in. Mark sensitive fields so any formatter redacts them, and run a secret scanner in CI so a leaking line never ships.

# A redaction filter applied at the logging boundary.
# Attached to a handler, it sees every record before that sink writes it.

import logging
import re

SENSITIVE_KEYS = {"authorization", "password", "api_key", "token", "set-cookie"}
TOKEN_RE = re.compile(r"(tok|sk|ghp)_[A-Za-z0-9]{6,}")

def mask(match: re.Match) -> str:
    # keep only the known prefix for correlation, drop the rest
    return match.group(1) + "_****"

class RedactFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # 1) structured fields: redact by key name (a deny-list backstop)
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            for key in fields:
                if str(key).lower() in SENSITIVE_KEYS:
                    fields[key] = "****"
        # 2) free-text safety net: mask anything shaped like a token
        try:
            text = record.getMessage()  # interpolate args so they are masked too
        except Exception:
            text = str(record.msg)  # malformed template: fall back to the raw message
        record.msg = TOKEN_RE.sub(mask, text)
        record.args = None  # already interpolated above
        return True  # never drop the line, only sanitise it

# Logger-level filters are skipped for records propagated from child loggers,
# so the filter goes on the handler (one per sink).
handler = logging.StreamHandler()
handler.addFilter(RedactFilter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

log = logging.getLogger("checkout-svc")

# Good: only named, non-sensitive fields are logged.
log.info("request", extra={"fields": {"method": "POST", "path": "/pay", "status": 200}})

# Bad: dumps the whole request, headers and all -> Authorization leaks.
# log.info("request", req)

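Two layers matter. Structured logs are covered by logging only named fields (you log status, never headers), with key-name redaction as a backstop. The token regex is a safety net for free text. Exception tracebacks are rendered later by the formatter, so covering them needs the same substitution in a custom Formatter. Neither layer replaces rotation: once a value reaches a sink, treat it as compromised.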
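The allow-list side of the boundary can be sketched as a thin wrapper around the logger. This is an illustration under assumptions, not a prescribed API: ALLOWED_FIELDS, safe_fields, and log_event are hypothetical names, and the field list is invented for the example.

```python
import logging

# Hypothetical allow-list: the only field names this service may log.
ALLOWED_FIELDS = frozenset({"method", "path", "status", "duration_ms", "request_id"})

def safe_fields(raw: dict) -> dict:
    """Keep allow-listed keys with scalar values; drop everything else."""
    kept = {}
    for key, value in raw.items():
        if key in ALLOWED_FIELDS and isinstance(value, (str, int, float, bool)):
            kept[key] = value
    return kept

def log_event(logger: logging.Logger, event: str, **fields) -> dict:
    """Log an event with allow-listed fields only and return what was logged."""
    kept = safe_fields(fields)
    logger.info(event, extra={"fields": kept})
    return kept

logger = logging.getLogger("allowlist-demo")
logged = log_event(
    logger,
    "request",
    method="POST",
    path="/pay",
    status=200,
    authorization="Bearer tok_EXAMPLE0000",  # not allow-listed, so dropped
    headers={"Cookie": "session=abc"},  # not allow-listed, so dropped
)
print(logged)
```

Because unknown keys are dropped rather than redacted, a new sensitive field is safe by default; someone has to add it to the allow-list on purpose before it can appear in a log. Values of allowed fields can still carry secrets (a path with a token in its query string, for instance), which is why the regex layer stays in place.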

Signals

| Signal | What it likely means |
| --- | --- |
| Secret scanner hits in CI or repo history | A pattern like tok_… or AKIA… reached code or logs. The leak path is reproducible, so fix it at the source. |
| The key appears in the log index search | Anyone with read access to the index can find it. Blast radius is everyone who can query logs, not just whoever saw the request. |
| Downstream alert from a key vendor | The provider scanned public surfaces and flagged your key. Assume it is already known externally: rotate now, do not wait to scrub. |
| Retention copies in cold storage or backups | The line was archived. Scrubbing the live index does not touch month-old snapshots; rotation is the only thing that neutralises the value. |
| Unusual use of the credential | Auth from a new region, or a spike on that token, suggests the leaked value is already being used. Contain immediately. |
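The first signal comes from pattern matching over text. A minimal sketch of such a scan over a log export follows; the pattern names and regexes are assumptions for illustration (the tok_ shape used in this article, plus the AKIA prefix followed by 16 uppercase letters or digits), and a production scanner would carry a much larger rule set.

```python
import re

# Hypothetical rule set: two patterns only, for illustration.
PATTERNS = {
    "partner_token": re.compile(r"tok_[A-Za-z0-9]{6,}"),
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
}

def scan_lines(lines):
    """Return (line_number, pattern_name) for every match, never the secret itself."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits

# Made-up log export; both credential values are placeholders.
sample = [
    "INFO request method=POST path=/pay status=200",
    "INFO incoming headers={'Authorization': 'Bearer tok_EXAMPLE0000'}",
    "DEBUG aws key AKIAIOSFODNN7EXAMPLE loaded",
]
hits = scan_lines(sample)
print(hits)
```

The scan reports locations and rule names rather than matched values, so the scanner's own output does not become one more copy of the secret.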

Worked example

A payments engineer adds log.info("incoming", req) while chasing a flaky webhook, and ships it. req carries an Authorization: Bearer tok_9f3a…b2 header, so every webhook now writes the live partner token into the app log.

Detect. Twelve hours later the secret scanner flags tok_ in a sampled log export, and the key vendor sends an automated “exposed credential” alert. Two independent signals: this is real.

Contain. The on-call rotates the partner token first: the leaked value is revoked and a fresh tok_4c81… is issued, so anything copied out is now dead. They deploy the redaction filter so future lines show tok_****, then scrub the matched lines from the live log index. They confirm the value also sits in last night’s cold-storage snapshot — which is exactly why rotation, not scrubbing, was the move that mattered.

Root cause. The offending statement passed a whole request object to the logger. The lasting fix is a logging boundary that only emits allow-listed fields, plus a CI check that flags logger calls passing whole objects, which would have blocked the change. The postmortem notes that the value was exposed for twelve hours across four sinks, so rotation was non-negotiable regardless of how thorough the scrub looked.
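A CI check for the root cause can work on the code rather than on secret values. The sketch below uses Python's standard ast module to flag logger calls that pass a whole object as an argument. It is a heuristic under stated assumptions: RISKY_NAMES and find_whole_object_logging are hypothetical, any method named like a log level is treated as a logger call, and only bare variable names are inspected.

```python
import ast

LOG_METHODS = {"debug", "info", "warning", "error", "exception", "critical"}
# Hypothetical names that usually hold whole objects rather than single fields.
RISKY_NAMES = {"req", "request", "headers", "response", "environ"}

def find_whole_object_logging(source: str) -> list:
    """Return line numbers of logger calls that pass a risky bare name as an argument."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        if node.func.attr not in LOG_METHODS:
            continue
        # Skip the first positional argument: it is the message template.
        for arg in node.args[1:]:
            if isinstance(arg, ast.Name) and arg.id in RISKY_NAMES:
                flagged.append(node.lineno)
                break
    return sorted(flagged)

# The changed code under review; line 1 of the string is blank.
changed_code = '''
log.info("incoming", req)
log.info("request", extra={"fields": {"status": 200}})
'''
flagged_lines = find_whole_object_logging(changed_code)
print(flagged_lines)
```

Run against the diff in CI, a non-empty result fails the build. The check catches the statement from this incident but not every variant (a renamed variable or an f-string would slip through), so it complements the allow-list boundary rather than replacing it.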

Check yourself

You scrubbed every matching log line from the live index within minutes of detection. Is the incident contained?

Which change actually stops the next leak at the source?