Swap a sensitive value for a meaningless token, and keep the real data locked in a vault.
A token is a surrogate that stands in for a sensitive value — a credit card number, an SSN — but carries no exploitable meaning on its own. The real value lives in a secured token vault; everywhere else in your systems you store only the token. If a downstream database, backup, or log leaks, attackers get tokens, not real PII.
Unlike ciphertext, a token is not mathematically derived from the value, so there is no key to steal that reverses it. Detokenizing is a lookup in the vault, which is access-controlled and audited. You can even mint format-preserving tokens that keep the last 4 digits for display, so receipts still read •••• 4242 without exposing the rest. Only those four digits are copied from the original; the remaining digits are random.
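To make the format-preserving idea concrete, here is a minimal sketch, assuming the sensitive value is a digit-only string such as a card number. The names `format_preserving_token` and `mask_for_display` are hypothetical, chosen only for this illustration, and the card number is a made-up test value. The sketch covers only how the surrogate is shaped; a real system would still store the token-to-value mapping in the vault and would need to handle token collisions, which are ignored here.

```python
import secrets

def format_preserving_token(pan: str, keep_last: int = 4) -> str:
    """Build a surrogate with the same length as `pan`.

    The leading digits are random; only the last `keep_last` digits
    are copied from the real value so they can be shown on receipts.
    """
    if not pan.isdigit() or len(pan) <= keep_last:
        raise ValueError("expected a digit-only value longer than keep_last")
    random_len = len(pan) - keep_last
    random_part = "".join(secrets.choice("0123456789") for _ in range(random_len))
    return random_part + pan[-keep_last:]

def mask_for_display(token: str, keep_last: int = 4) -> str:
    """Render the token the way a receipt would show it."""
    return "•••• " + token[-keep_last:]

token = format_preserving_token("4000123412344242")  # made-up test number
print(mask_for_display(token))  # prints "•••• 4242"
```

The random digits come from `secrets` rather than `random` because the token must be unpredictable, not just varied.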
Tokenizing generates a fresh random token and stores the mapping in the vault keyed by that token. Detokenizing authorizes the caller, writes an audit record, then returns the value — a lookup, never a decryption.
import secrets
import time

vault = {}  # token -> real value, lives ONLY here
audit = []  # every detokenize is recorded
ALLOWED_CALLERS = {"settlement"}  # assumed allowlist; real authz is policy-driven

def authorized(caller, token):
    return caller in ALLOWED_CALLERS

def tokenize(value):
    token = "tok_" + secrets.token_urlsafe(16)  # random, NOT derived from value
    vault[token] = value                        # mapping stays in the vault
    return token                                # store this everywhere else

def detokenize(token, caller):
    if not authorized(caller, token):           # access-controlled
        raise PermissionError(caller)
    audit.append((time.time(), caller, token))  # when, who, which
    return vault[token]                         # a lookup, not a decrypt
| Operation | Cost | Why |
|---|---|---|
| tokenize(value) | O(1) | Mint a random token, one write into the vault |
| detokenize(token, caller) | O(1) | One vault lookup plus an authz check and an audit append |
| Storage | O(N) | The vault holds all N token-to-value pairs; everywhere else holds only tokens |
| Leak blast radius | tokens only | A downstream breach yields surrogates with zero PII value |
A card number is tokenized at the payment edge into tok_8Kd2. The order service, the data warehouse, and the log pipeline all store tok_8Kd2, never the real PAN. Months later a warehouse backup leaks: attackers walk away with tokens and no usable card data. Meanwhile the settlement service, which is authorized, calls detokenize(tok_8Kd2, settlement) to charge the card. That single call shows up in the audit log with the timestamp, the caller, and the token, so you know exactly who touched real data and when.
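The scenario above can be walked through in a few lines. This is a sketch under stated assumptions, not a production vault: the `TokenVault` class, the caller names `"settlement"` and `"analytics"`, and the simple allowlist check are all hypothetical stand-ins for a real vault service and its authorization policy, and the card number is a made-up test value.

```python
import secrets
from datetime import datetime, timezone

class TokenVault:
    """In-memory stand-in for a token vault (illustration only)."""

    def __init__(self, allowed_callers):
        self._values = {}                     # token -> real value, never leaves the vault
        self._allowed = set(allowed_callers)  # assumed allowlist in place of real authz
        self.audit = []                       # (timestamp, caller, token) per detokenize

    def tokenize(self, value):
        token = "tok_" + secrets.token_urlsafe(8)  # random, not derived from value
        self._values[token] = value
        return token

    def detokenize(self, token, caller):
        if caller not in self._allowed:
            raise PermissionError(f"{caller} may not detokenize")
        self.audit.append((datetime.now(timezone.utc), caller, token))
        return self._values[token]            # a lookup, not a decryption

vault = TokenVault(allowed_callers={"settlement"})
pan = "4000123412344242"                      # made-up test number
token = vault.tokenize(pan)                   # happens once, at the payment edge

# Downstream systems store only the token.
orders = [{"order_id": 1, "card": token}]
warehouse = [{"order_id": 1, "card": token}]
logs = [f"order 1 paid with {token}"]

# A leaked warehouse backup contains the token, not the card number.
leaked_backup = str(warehouse)
pan_in_leak = pan in leaked_backup

# An unauthorized caller is refused.
denied = False
try:
    vault.detokenize(token, caller="analytics")
except PermissionError:
    denied = True

# The authorized settlement service gets the real value, and the call is audited.
charged_pan = vault.detokenize(token, caller="settlement")
print(pan_in_leak, denied, charged_pan == pan, len(vault.audit))
```

As in the pseudocode earlier in this lesson, the refused call raises before the audit append, so only the successful settlement call is recorded. Logging denied attempts too would be a reasonable extension.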
A downstream analytics database is breached, and it stored only tokens. What did the attacker get?
Coach note: if this didn't click yet, replay the visual and watch the warm path (the real value) stay inside the vault while only the green token fans out — that separation is the whole point.