A temp file is a guest, not a tenant — if nobody shows it the door, it stays forever.
Lots of jobs write a scratch file: an upload buffer, a resized image, a render of a PDF. The plan is always “use it, then delete it.” But when a request throws an exception between creating the file and deleting it, the delete never runs. The file is orphaned on disk.
One orphan is harmless. A leak is the same bug repeating thousands of times a day. Disk usage creeps up, inodes run out, and one morning the partition is full — and the service that mattered can no longer write. The fix is to make cleanup guaranteed, not best-effort: tie the file's lifetime to a scope that always unwinds.
The leaky version deletes only on the happy path, so any exception skips the delete. The robust version binds the temp file to a scope that the language guarantees will unwind — a with block, try/finally, or a self-deleting temp handle. When the scope exits, normally or by exception, the file is removed.
import os
import tempfile

# Leaky: delete only runs if nothing throws
def handle(req):
    path = make_temp()
    process(req, path)  # raises? path is orphaned
    os.remove(path)     # never reached on error

# Robust: cleanup is tied to scope exit
def handle(req):
    fd, path = tempfile.mkstemp()
    try:
        process(req, path)
    finally:
        os.close(fd)
        os.remove(path)  # runs on success AND on exception

# Even simpler: a self-deleting temp file
def handle(req):
    with tempfile.NamedTemporaryFile() as tmp:
        process(req, tmp.name)
    # file is gone the moment the block exits
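If several handlers need the same guarantee, the try/finally pattern can be wrapped once in a small context manager. The sketch below is one possible shape, not a prescribed API: `temp_path` and `demo_failure` are hypothetical names introduced here for illustration. It closes the descriptor from `mkstemp` immediately, so the caller works with the path alone.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temp_path(suffix=""):
    """Yield the path of a fresh, uniquely named temp file and delete it on exit.

    Hypothetical helper: the name and signature are illustrative only.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is handed out, so release the descriptor now
    try:
        yield path
    finally:
        # The caller may already have moved or deleted the file; ignore that case.
        with contextlib.suppress(FileNotFoundError):
            os.remove(path)


def demo_failure():
    """Simulate a failing request and report whether the temp file survived."""
    seen = None
    try:
        with temp_path(suffix=".tmp") as path:
            seen = path
            raise RuntimeError("simulated processing failure")
    except RuntimeError:
        pass
    return seen, os.path.exists(seen)
```

Because every call gets its own name from `mkstemp`, concurrent requests do not share a file, and the `finally` clause runs whether the body returns, raises, or exits early.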
A reaper job (delete temp files older than N hours) is a good safety net, but it is a backstop, not the primary fix.
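A minimal sketch of such a reaper, assuming the scratch files live directly in one directory and that modification time is a fair proxy for age. The function name `reap_old_files` and its parameters are hypothetical; a real deployment would typically run something like this from cron or a scheduler.

```python
import os
import time


def reap_old_files(directory, max_age_hours, now=None):
    """Delete regular files in `directory` last modified before the cutoff.

    Not recursive, and symlinks are skipped. Returns the paths removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    removed = []
    with os.scandir(directory) as entries:
        for entry in entries:
            try:
                if not entry.is_file(follow_symlinks=False):
                    continue
                if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                    os.remove(entry.path)
                    removed.append(entry.path)
            except FileNotFoundError:
                # Another process removed the file between listing and deletion.
                continue
    return removed
```

The age threshold should comfortably exceed the longest job that might still be using a file, otherwise the reaper can delete scratch data out from under live work.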
| Symptom | What it usually means |
|---|---|
| Disk usage rises monotonically | Temp files created but not removed |
| “No space left” with small real data | Orphans filling the partition |
| “No inodes” though disk has free bytes | Many tiny orphaned files |
| Leak rate tracks the error rate | Cleanup is on the happy path only |
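To tell the "no space" and "no inodes" rows apart, it can help to look at both numbers for the scratch filesystem. The sketch below uses `os.statvfs`, which exists on Unix-like systems only; the function name `scratch_headroom` and the returned keys are hypothetical.

```python
import os


def scratch_headroom(path):
    """Return the free-byte and free-inode fractions for the filesystem at `path`."""
    st = os.statvfs(path)
    bytes_free = st.f_bavail / st.f_blocks if st.f_blocks else None
    # Some filesystems report zero total inodes; treat that as "not applicable".
    inodes_free = st.f_favail / st.f_files if st.f_files else None
    return {
        "bytes_free_fraction": bytes_free,
        "inodes_free_fraction": inodes_free,
    }
```

A low inode fraction alongside a healthy byte fraction points at many tiny orphaned files rather than a few large ones.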
- Putting os.remove at the end of the function instead of in finally: an early return or raise jumps right over it.
- Writing to a fixed path such as /tmp/work.tmp: concurrent requests stomp each other and you get corruption on top of the leak.
- Relying on the OS to clear /tmp on reboot. Long-running servers may not reboot for months, and the disk fills first.

An image service writes a temp file per upload and deletes it after a successful resize. The resize throws on corrupt images about 2% of the time. At 100,000 uploads/day that's ~2,000 orphans/day, each a few hundred KB, or roughly 0.6 GB/day at 300 KB apiece. At that rate the 50 GB scratch volume fills in under three months, and then all uploads start failing, even the valid ones. Wrapping the resize in a try/finally that removes the file drops the leak to zero overnight.
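The time to fill the volume in this example is easy to check by hand. The sketch below redoes the arithmetic; the orphan size is the sensitive input, and since the example gives it only as "a few hundred KB", the sizes tried here are assumptions, as are the decimal units (1 KB = 1,000 bytes) and the function name `days_to_fill`.

```python
def days_to_fill(uploads_per_day, failure_rate, orphan_bytes, volume_bytes):
    """Days until leaked files alone fill an initially empty volume."""
    orphans_per_day = uploads_per_day * failure_rate
    leaked_bytes_per_day = orphans_per_day * orphan_bytes
    return volume_bytes / leaked_bytes_per_day


# Assumed orphan sizes spanning "a few hundred KB".
for size_kb in (200, 300, 500):
    days = days_to_fill(100_000, 0.02, size_kb * 1_000, 50 * 10**9)
    print(f"{size_kb} KB per orphan: volume full in about {days:.0f} days")
```

The same function can be pointed at real numbers from a service's error rate and typical temp-file size to estimate how long a leak can run before it becomes an outage.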
You move os.remove(path) to the very last line of the function. Is the leak fixed?