Data exfiltration

The break-in is loud; the theft is quiet — data leaving slowly looks almost like normal traffic.

The idea

Once someone is inside — a real attacker, a malicious insider, or a compromised service account — the dangerous step isn't getting in, it's copying sensitive data out. They rarely smash a hole in the wall. They lean on channels you already allow: HTTPS to a cloud bucket, DNS lookups, a legitimate SaaS API.

Data exfiltration is the unauthorized transfer of data off your systems, often slow, encrypted, and blended into ordinary egress so it stays under your alerts. The perimeter firewall sees nothing wrong — the data is leaving through a door marked "open". Detection lives in egress baselines and data-loss monitoring, not in the front gate.

How it works

The failure mode: nobody watches what leaves. Reads are over-permissioned, outbound traffic has no baseline, and the destination list is open, so a service account can stream a customer table to an unfamiliar bucket and look like ordinary HTTPS.

The fix is to treat egress like a tap you can measure and close. Keep a rolling per-destination baseline, allow-list where data is allowed to go, and alert or block when a destination is new or its volume jumps to several times normal. The check below gates on the allow-list first, then compares volume against the baseline.

SPIKE_MULTIPLIER = 5   # alert when volume exceeds 5x the rolling baseline

# block, throttle, alert, and enforce_read_quota are hooks into your
# enforcement and alerting layer; they are not defined here.
def egress_guard(flow, baselines, allow_list):
    # flow = one outbound transfer: destination + bytes, this window.
    dest = flow.destination
    mb = flow.bytes / 1_000_000
    base = baselines.rolling_mb(dest)        # normal MB for this dest

    # 1. Destination not on the allow-list: block, don't just log.
    if dest not in allow_list:
        block(flow)
        alert("egress.new_destination", dest, mb)
        return

    # 2. Allowed dest, but no history or volume far above baseline: bulk export.
    if base == 0 or mb > SPIKE_MULTIPLIER * base:
        throttle(flow)
        # No baseline means no meaningful ratio, so report infinity.
        ratio = mb / base if base > 0 else float("inf")
        alert("egress.volume_spike", dest, mb, ratio=ratio)

    # 3. Least privilege caps the blast radius even if this slips through.
    enforce_read_quota(flow.principal)       # one credential must not read the whole table
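
The guard calls baselines.rolling_mb(dest) without showing what sits behind it. One possible shape is sketched below, assuming the baseline is a simple mean over the last few recorded periods. The class name RollingBaseline, the record method, the window size, and the destination name are all illustrative assumptions, not part of the original check.

```python
from collections import defaultdict, deque


class RollingBaseline:
    """Rolling per-destination baseline: mean MB over the last `window` periods."""

    def __init__(self, window: int = 7):
        # One bounded history per destination; the oldest period drops off
        # automatically once the deque reaches `window` entries.
        self._history = defaultdict(lambda: deque(maxlen=window))

    def record(self, dest: str, mb: float) -> None:
        """Store the MB sent to `dest` in the period that just ended."""
        self._history[dest].append(mb)

    def rolling_mb(self, dest: str) -> float:
        """Return the mean MB per period, or 0.0 for a destination never seen."""
        # .get avoids creating an empty history just by asking about a destination.
        history = self._history.get(dest)
        if not history:
            return 0.0
        return sum(history) / len(history)


# Hypothetical nightly totals to a backup endpoint; only the last 3 are kept.
baselines = RollingBaseline(window=3)
for nightly_mb in (180, 190, 200, 210):
    baselines.record("backup.example.internal", nightly_mb)

known = baselines.rolling_mb("backup.example.internal")    # mean of 190, 200, 210
unknown = baselines.rolling_mb("new-bucket.example.com")   # never recorded
```

A plain mean is easy to reason about but is pulled upward by a single large night. A median or a percentile may be a sturdier choice if your traffic is bursty.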

Signals

Signal	What exfiltration looks like
Outbound volume	Egress far above the normal baseline for that window
Destination	Large transfers to new or rarely-seen external endpoints
Timing	Data leaving at odd hours, off the usual batch schedule
Access breadth	One role reading far more records than it normally touches
Shape of traffic	Compressed / encrypted blobs to cloud storage, or DNS tunneling

Watch out for

Watching only the perimeter. Inbound rules and the front-door firewall say nothing about data leaving. Instrument egress as a first-class signal, not an afterthought.
No baseline. Without a sense of "normal" outbound per destination, you have nothing to compare a spike against, so anomalies hide in plain sight.
Slow-and-low transfers. A patient thief paces the copy to stay under fixed thresholds. Watch cumulative volume and rare-destination patterns, not just instantaneous rate.
Exfil over allowed channels. HTTPS, DNS, and legitimate SaaS are trusted by default. Allow-list where data may go, not just whether the protocol is permitted.
Over-broad read access. When one credential can SELECT the whole table, a single compromise drains everything. Least-privilege reads and quotas cap the blast radius.
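
The slow-and-low case is the one a per-transfer threshold misses, so it helps to see cumulative tracking spelled out. The sketch below sums volume per destination over a sliding time window and flags when the total crosses a limit. The class name CumulativeEgress, the 24-hour window, the 1000 MB limit, the 300 MB/hour pace, and the destination name are assumptions chosen for illustration. It also assumes transfers arrive in timestamp order.

```python
from collections import defaultdict, deque


class CumulativeEgress:
    """Sum of MB sent to each destination over a sliding time window."""

    def __init__(self, window_seconds: float, limit_mb: float):
        self.window_seconds = window_seconds
        self.limit_mb = limit_mb
        self._events = defaultdict(deque)   # dest -> deque of (timestamp, mb)
        self._totals = defaultdict(float)   # dest -> MB currently inside the window

    def add(self, dest: str, timestamp: float, mb: float) -> bool:
        """Record one transfer; return True if the windowed total exceeds the limit."""
        events = self._events[dest]
        events.append((timestamp, mb))
        self._totals[dest] += mb
        # Drop transfers that have aged out of the window.
        while events and events[0][0] <= timestamp - self.window_seconds:
            _, old_mb = events.popleft()
            self._totals[dest] -= old_mb
        return self._totals[dest] > self.limit_mb


# A paced copy: 300 MB every hour, each transfer well under a 1000 MB limit.
tracker = CumulativeEgress(window_seconds=24 * 3600, limit_mb=1000)
results = [
    tracker.add("files.example.com", timestamp=hour * 3600, mb=300)
    for hour in range(6)
]
# The running total reaches 1200 MB on the fourth transfer, so the flag
# turns True there even though no single transfer looked unusual.
```

The same tracker can be keyed by principal instead of destination to catch one credential spreading a copy across several endpoints.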

Worked example

Your normal nightly egress to known backups is about 200 MB. A compromised service account starts copying a customer table to an unfamiliar bucket at roughly 2 GB/hour. Over six hours that's about 12 GB — millions of rows — every one of them riding ordinary HTTPS, so the perimeter firewall stays green the whole time.

An egress baseline that alerts at 5x normal puts the threshold at 1 GB against the 200 MB nightly norm, so it would have fired in the first hour, when 2 GB went to an endpoint nobody had allow-listed. The slow-and-low variant, paced to a few hundred MB/hour to a known-looking domain, slides under a fixed threshold, which is why destination allow-listing and cumulative-volume tracking matter alongside rate. And a role-based read cap is the cleanest stop of all: if that service account could never SELECT the whole table, the bulk export never assembles in the first place.
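
The numbers in this example can be checked in a few lines. This is a sketch under two assumptions: 1 GB is treated as 1000 MB (matching the guard's bytes / 1_000_000 conversion), and the 200 MB nightly figure is the baseline the 5x rule is applied to. The 300 MB/hour slow rate stands in for "a few hundred MB/hour", and the constant names are illustrative.

```python
NIGHTLY_BASELINE_MB = 200     # normal nightly egress to known backups
SPIKE_MULTIPLIER = 5          # alert at 5x normal
FAST_RATE_MB_PER_HOUR = 2_000 # the bulk copy, about 2 GB/hour
SLOW_RATE_MB_PER_HOUR = 300   # assumed pace for the slow-and-low variant
HOURS = 6

threshold_mb = SPIKE_MULTIPLIER * NIGHTLY_BASELINE_MB   # 1000 MB
total_mb = FAST_RATE_MB_PER_HOUR * HOURS                # 12000 MB, about 12 GB
ratio = FAST_RATE_MB_PER_HOUR / NIGHTLY_BASELINE_MB     # 10x the nightly norm

# First hour in which the cumulative fast copy exceeds the threshold.
first_alert_hour = next(
    hour for hour in range(1, HOURS + 1)
    if FAST_RATE_MB_PER_HOUR * hour > threshold_mb
)

# The slow variant never trips an hourly rate check on its own.
slow_trips_rate_check = SLOW_RATE_MB_PER_HOUR > threshold_mb

print(threshold_mb, total_mb, ratio, first_alert_hour, slow_trips_rate_check)
# prints: 1000 12000 10.0 1 False
```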

Check yourself

Your perimeter firewall shows nothing unusual, but customer data is quietly leaking. Where do you look first?

Coach note: the theft leaves through doors you already allow, so the signal is in what goes out and where it's headed — compared against what "normal" looks like.