If your server fetches whatever URL a user hands it, the attacker borrows your server's place inside the trusted network — and points it at things they could never reach themselves.
Plenty of features take a URL from the user and have the server go fetch it: "import from URL", webhook validators, link previews, thumbnail generators. The user types a link; the backend opens a connection and pulls the bytes back.
The trouble is where that fetch comes from. The attacker's browser sits outside, blocked by firewalls. Your server sits inside — it can reach localhost admin panels, private 10.0.0.0/8 hosts, and the cloud metadata endpoint 169.254.169.254. Ask the server to fetch one of those and it does the attacker's reconnaissance from a trusted vantage point. The fix is an allowlist: resolve the destination, re-check the final IP, and only fetch hosts you explicitly trust.
The defense never trusts the hostname string. It confirms the host is on the allowlist, resolves it to an IP, rejects any address in a private, loopback, or link-local range, and re-checks the IP after every redirect. Only then does it open the real connection, to the IP it just checked. Checking the name while the fetch goes to whatever IP that name resolves to is exactly the gap attackers slip through.

    def safe_fetch(user_url):
        url = parse(user_url)
        if url.scheme not in ("http", "https"):    # no file://, gopher://, etc.
            return BLOCK("scheme")
        if url.host not in ALLOWED_HOSTS:          # allowlist, not denylist
            return BLOCK("not allowed")
        ip = resolve(url.host)                     # DNS -> actual address
        # reject 10/8, 127/8, 169.254/16, ::1 and the other internal ranges
        if is_private(ip) or is_loopback(ip) or is_link_local(ip):
            return BLOCK("internal range")
        # follow redirects yourself, re-checking each hop's resolved IP
        return fetch(url, pin_ip=ip, on_redirect=safe_fetch)

Two ideas carry the weight: decide on the resolved IP range, not on a string the attacker controls, and re-resolve and re-check at fetch time so a redirect or a rebinding swap can't slip an internal address past the gate.
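The "decide on the resolved IP range" idea can be sketched with Python's standard `ipaddress` and `socket` modules. This is a minimal illustration, not a complete defense: the helper names `is_internal`, `resolve_all`, and `pick_safe_ip` are hypothetical, and the set of ranges treated as internal is an assumption you would tune for your own network.

```python
import ipaddress
import socket


def is_internal(ip_text: str) -> bool:
    """Return True if the address should never be fetched on a user's behalf."""
    ip = ipaddress.ip_address(ip_text)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so the IPv4 rules apply to it.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_unspecified
        or ip.is_multicast
        or ip.is_reserved
    )


def resolve_all(host: str, port: int = 443) -> list[str]:
    """Every address the name currently resolves to (IPv4 and IPv6)."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def pick_safe_ip(host: str) -> str:
    """Resolve once, reject if ANY answer is internal, return the IP to pin."""
    addresses = resolve_all(host)
    if not addresses or any(is_internal(a) for a in addresses):
        raise ValueError(f"refusing to fetch {host!r}: internal or unresolvable")
    return addresses[0]
```

The caller would then connect to the returned IP rather than handing the hostname back to the HTTP client. Otherwise a second DNS lookup at fetch time could return a different, internal address.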
| Choice | Buys you | Costs you |
|---|---|---|
| Allowlist of hosts | Default-deny; nothing internal is reachable | Less flexible — every new destination needs adding |
| Denylist of internal ranges | Flexible; arbitrary external hosts work | Easy to miss a range (IPv6, decimal IP, metadata) |
| Resolve then pin the IP | Decision matches what actually gets fetched | Must re-resolve per hop; a little more plumbing |
| Egress proxy or no redirects | One enforced choke point for all outbound calls | Operational overhead; breaks naive redirect-following |
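The "re-resolve per hop" cost in the table amounts to a small loop that follows redirects by hand. The sketch below assumes two caller-supplied functions that are not part of any library: `is_allowed(url)`, which runs the allowlist and resolved-IP checks, and `send(url)`, which performs one HTTP request without following redirects and returns a `(status, location, body)` tuple.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# (status code, Location header or None, body): a stand-in for a real HTTP call.
Response = Tuple[int, Optional[str], bytes]


def fetch_with_checked_redirects(
    url: str,
    is_allowed: Callable[[str], bool],
    send: Callable[[str], Response],
    max_hops: int = 5,
) -> bytes:
    """Follow redirects by hand, validating every hop before contacting it."""
    for _ in range(max_hops + 1):
        if urlsplit(url).scheme not in ("http", "https"):
            raise ValueError(f"blocked scheme: {url}")
        if not is_allowed(url):
            raise ValueError(f"blocked destination: {url}")
        status, location, body = send(url)  # send() must NOT auto-follow redirects
        if status in (301, 302, 303, 307, 308) and location:
            url = urljoin(url, location)  # resolve relative Location headers
            continue
        return body
    raise ValueError("too many redirects")
```

Because the check runs at the top of every iteration, a redirect target gets the same scrutiny as the URL the user originally submitted.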
- localhost and 127.0.0.1 have many spellings (raw decimal IP, IPv6 ::1, 0.0.0.0), so block by resolved IP range, not by text match.
- The cloud metadata endpoint lives at 169.254.169.254. It's the classic SSRF target because it hands out instance credentials with no auth.
- A 30x redirect can point at 10.0.0.5, so re-validate every hop.
- file:// reads local files and gopher:// can forge arbitrary TCP, so restrict to http and https.

An attacker submits http://169.254.169.254/latest/meta-data/iam/security-credentials/ to a link-preview endpoint.

With the allowlist on: the server resolves the host, sees 169.254.169.254 falls in the link-local 169.254.0.0/16 range and isn't on the allowed list, and blocks the request before any packet leaves. No credentials, no preview.

With no allowlist: the same endpoint dutifully fetches the metadata path, the response body contains the instance's temporary access keys, and they come straight back to the attacker.

Same input, same server: the allowlist is the entire difference between a blocked request and a leaked credential.
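The metadata example can be traced in a few lines of Python using only `urllib.parse` and `ipaddress`. The allowlist entry `images.example.com` and the `verdict` helper are hypothetical, chosen only for illustration.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"images.example.com"}  # hypothetical allowlist
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def verdict(user_url: str) -> str:
    """Apply the scheme check, then the allowlist check, to a submitted URL."""
    parts = urlsplit(user_url)
    if parts.scheme not in ("http", "https"):
        return "BLOCK: scheme"
    host = parts.hostname or ""
    if host not in ALLOWED_HOSTS:
        return "BLOCK: not allowed"
    return "ALLOW"


url = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
host = urlsplit(url).hostname
print(host)                                      # 169.254.169.254
print(ipaddress.ip_address(host) in LINK_LOCAL)  # True
print(verdict(url))                              # BLOCK: not allowed
print(verdict("file:///etc/passwd"))             # BLOCK: scheme
```

Either check alone would stop this particular request. Running both means a mistake in one list does not open the door.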
Your denylist blocks localhost and 127.0.0.1, yet an SSRF still reaches an internal service. Why?
Coach note: a string denylist only covers the spellings you thought of. Decide on the resolved IP's range and an explicit allowlist instead. Take another pass if the difference feels slippery — it's the heart of the defense.
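To make the difference concrete, here is a small comparison of a string denylist against a range check, using Python's `ipaddress` module. It is a sketch under assumptions: the function names are hypothetical, and treating an all-digit host as a 32-bit integer address mirrors what many lenient parsers do, not what every resolver does.

```python
import ipaddress

STRING_DENYLIST = {"localhost", "127.0.0.1"}


def string_check_blocks(host: str) -> bool:
    """The naive defense: block only exact text matches."""
    return host.lower() in STRING_DENYLIST


def to_ip(spelling: str):
    """Parse an IP literal, also accepting a plain decimal integer form."""
    if spelling.isdigit():
        return ipaddress.ip_address(int(spelling))
    return ipaddress.ip_address(spelling)


def range_check_blocks(spelling: str) -> bool:
    """Block by the range the address falls in, regardless of spelling."""
    ip = to_ip(spelling)
    mapped = getattr(ip, "ipv4_mapped", None)  # unwrap ::ffff:a.b.c.d
    if mapped is not None:
        ip = mapped
    return ip.is_loopback or ip.is_private or ip.is_link_local or ip.is_unspecified


for s in ["127.0.0.1", "2130706433", "::1", "0.0.0.0", "::ffff:127.0.0.1"]:
    print(f"{s:20} string={string_check_blocks(s)} range={range_check_blocks(s)}")
```

Only the first spelling is caught by the string denylist, while the range check blocks all five.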