Temp file leaks

A temp file is a guest, not a tenant — if nobody shows it the door, it stays forever.

The idea

Lots of jobs write a scratch file: an upload buffer, a resized image, a render of a PDF. The plan is always “use it, then delete it.” But when a request throws an exception between creating the file and deleting it, the delete never runs. The file is orphaned on disk.

One orphan is harmless. A leak is the same bug repeating thousands of times a day. Disk usage creeps up, inodes run out, and one morning the partition is full — and the service that mattered can no longer write. The fix is to make cleanup guaranteed, not best-effort: tie the file's lifetime to a scope that always unwinds.

How it works

The leaky version deletes only on the happy path, so any exception skips the delete. The robust version binds the temp file to a scope that the language guarantees will unwind — a with block, try/finally, or a self-deleting temp handle. When the scope exits, normally or by exception, the file is removed.

import os
import tempfile

# process(req, path) stands in for the application's own work.

# Leaky: delete only runs if nothing throws
def handle(req):
    fd, path = tempfile.mkstemp()
    os.close(fd)             # only the path is needed from here on
    process(req, path)       # raises? path is orphaned
    os.remove(path)          # never reached on error

# Robust: cleanup is tied to scope exit
def handle(req):
    fd, path = tempfile.mkstemp()
    os.close(fd)             # close first so a failed close cannot skip the remove
    try:
        process(req, path)
    finally:
        os.remove(path)      # runs on success AND on exception

# Even simpler: a self-deleting temp file
def handle(req):
    with tempfile.NamedTemporaryFile() as tmp:
        # Note: on Windows the file may not be reopenable by name while it is open.
        process(req, tmp.name)
    # file is gone the moment the block exits
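
To see the difference in numbers, here is a small self-contained sketch that runs the same batch of requests through a leaky handler and a robust one and counts what is left on disk. The names handle_leaky, handle_robust, and count_orphans, and the failure pattern, are made up for illustration; process() is a stand-in for real work.

```python
import os
import tempfile


def process(path, fail):
    """Stand-in for real work; raises when asked to simulate a bad request."""
    with open(path, "w") as f:
        f.write("scratch data")
    if fail:
        raise ValueError("simulated mid-job failure")


def handle_leaky(scratch_dir, fail):
    fd, path = tempfile.mkstemp(dir=scratch_dir)
    os.close(fd)
    process(path, fail)
    os.remove(path)  # skipped whenever process() raises


def handle_robust(scratch_dir, fail):
    fd, path = tempfile.mkstemp(dir=scratch_dir)
    os.close(fd)
    try:
        process(path, fail)
    finally:
        os.remove(path)  # runs on success and on exception


def count_orphans(handler, failures):
    """Run one request per flag in `failures`; return how many files remain."""
    with tempfile.TemporaryDirectory() as scratch_dir:
        for fail in failures:
            try:
                handler(scratch_dir, fail)
            except ValueError:
                pass  # the request failed, but the service keeps running
        return len(os.listdir(scratch_dir))


# 8 requests, 3 of which fail mid-job (an arbitrary pattern for the demo).
FAILURES = [False, True, False, False, True, False, True, False]
leaky_orphans = count_orphans(handle_leaky, FAILURES)
robust_orphans = count_orphans(handle_robust, FAILURES)
print(leaky_orphans, robust_orphans)  # prints "3 0"
```

The leaky handler leaves exactly one orphan per failed request, which is why the leak rate follows the error rate.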

A reaper job (delete temp files older than N hours) is a good safety net, but it is a backstop, not the primary fix.
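
A reaper can be as small as the sketch below. It is one possible shape, not a production tool: the function name reap_old_files, its parameters, and the demo file names are assumptions, it judges age by modification time, and it only touches regular files directly inside the given directory.

```python
import os
import tempfile
import time


def reap_old_files(directory, max_age_hours, now=None):
    """Delete regular files in `directory` whose mtime is older than the cutoff.

    Returns the list of removed paths. Subdirectories and symlinks are skipped.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    removed = []
    # Snapshot the listing first so deletions do not disturb the iteration.
    with os.scandir(directory) as it:
        entries = list(it)
    for entry in entries:
        try:
            if not entry.is_file(follow_symlinks=False):
                continue
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                os.remove(entry.path)
                removed.append(entry.path)
        except FileNotFoundError:
            continue  # another process removed the file first
    return removed


# Demo: one file backdated by two days, one fresh file.
with tempfile.TemporaryDirectory() as scratch_dir:
    old_path = os.path.join(scratch_dir, "old.tmp")
    fresh_path = os.path.join(scratch_dir, "fresh.tmp")
    for p in (old_path, fresh_path):
        open(p, "w").close()
    two_days_ago = time.time() - 48 * 3600
    os.utime(old_path, (two_days_ago, two_days_ago))

    removed_names = [os.path.basename(p)
                     for p in reap_old_files(scratch_dir, max_age_hours=24)]
    remaining_names = sorted(os.listdir(scratch_dir))

print(removed_names, remaining_names)  # prints "['old.tmp'] ['fresh.tmp']"
```

Pick N comfortably longer than the slowest legitimate job, otherwise the reaper can delete a file that is still in use.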

Signals

Symptom                                   What it usually means
Disk usage rises monotonically            Temp files created but not removed
“No space left” with small real data      Orphans filling the partition
“No inodes” though disk has free bytes    Many tiny orphaned files
Leak rate tracks the error rate           Cleanup is on the happy path only
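
The first three signals can be checked from code. The sketch below reports the fraction of bytes and of inodes in use on the filesystem that holds a given path. The name scratch_health is hypothetical, and os.statvfs is available on Unix only, so treat this as a starting point for a Linux or macOS host.

```python
import os
import shutil
import tempfile


def scratch_health(path):
    """Return (bytes_used_fraction, inodes_used_fraction) for path's filesystem.

    Either value is None if the filesystem does not report the totals.
    """
    usage = shutil.disk_usage(path)
    bytes_used = usage.used / usage.total if usage.total else None

    st = os.statvfs(path)  # Unix-only
    if st.f_files:
        inodes_used = (st.f_files - st.f_ffree) / st.f_files
    else:
        inodes_used = None  # some filesystems do not report inode counts
    return bytes_used, inodes_used


bytes_used, inodes_used = scratch_health(tempfile.gettempdir())
print(f"bytes used: {bytes_used}, inodes used: {inodes_used}")
```

High inode usage next to low byte usage points at many tiny orphans rather than a few large ones.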

Worked example

An image service writes a temp file per upload and deletes it after a successful resize. The resize throws on corrupt images about 2% of the time. At 100,000 uploads/day that's ~2,000 orphans/day, each a few hundred KB, or roughly 0.4 to 1 GB of leaked scratch space per day. Within two to four months the 50 GB scratch volume is full and all uploads start failing, even the valid ones. Wrapping the resize in a try/finally that removes the file drops the leak rate to zero overnight; the orphans already on disk still need a one-off cleanup.
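
The fill time depends heavily on the file size, so it is worth running the numbers. This is a back-of-the-envelope sketch that reads "a few hundred KB" as 200 to 500 KB (an assumption) and uses decimal units; days_to_fill is a hypothetical helper.

```python
def days_to_fill(volume_gb, uploads_per_day, failure_rate, file_kb):
    """Days until leaked files alone fill the volume (decimal units)."""
    orphans_per_day = uploads_per_day * failure_rate
    leaked_gb_per_day = orphans_per_day * file_kb / 1_000_000  # 1 GB = 1,000,000 KB
    return volume_gb / leaked_gb_per_day


for file_kb in (200, 300, 500):
    days = days_to_fill(volume_gb=50, uploads_per_day=100_000,
                        failure_rate=0.02, file_kb=file_kb)
    print(f"{file_kb} KB per orphan -> about {round(days)} days")
# 200 KB per orphan -> about 125 days
# 300 KB per orphan -> about 83 days
# 500 KB per orphan -> about 50 days
```

The estimate ignores everything else on the volume, so a real partition that already holds other data fills sooner.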

Check yourself

You move os.remove(path) to the very last line of the function. Is the leak fixed?