Stale Mounts (Block Storage)

When the operating system is lying to the application about the disk.

The idea

In cloud environments, a virtual hard drive (a block volume, such as AWS EBS) is attached to a virtual machine (such as an EC2 instance) over the network. If that network connection blips, or the volume is forcibly detached via the cloud console, the operating system (Linux) might not realize it immediately. Applications trying to read from or write to the disk can freeze indefinitely (a "stale mount"), which can cause a catastrophic outage.

Step 1: The application reads/writes to /mnt/data perfectly fine.

How it works (Uninterruptible Sleep)

When an application reads a file, it makes a system call into the Linux kernel. If the underlying cloud disk disappears, the kernel waits for the outstanding I/O to complete or time out. While it waits, the application process sits in the "D" state (Uninterruptible Sleep). Signals are not delivered in this state, so even `kill -9` has no effect until the I/O call returns!
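To make the "D" state concrete, here is a minimal sketch that lists processes currently in uninterruptible sleep by reading the state field of `/proc/<pid>/stat`. The function name `find_d_state_processes` and its `proc_root` parameter are hypothetical, chosen for illustration; the sketch assumes a Linux host with `/proc` mounted.

```python
import os


def find_d_state_processes(proc_root="/proc"):
    """Return (pid, command) pairs for processes whose state is 'D'.

    Hypothetical helper for illustration; assumes a Linux-style /proc layout.
    """
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are process IDs
        try:
            with open(os.path.join(proc_root, entry, "stat"), errors="replace") as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # Layout: "pid (comm) state ...". The command name may itself contain
        # spaces or parentheses, so split around the LAST closing parenthesis.
        comm_start = stat.find("(")
        comm_end = stat.rfind(")")
        fields = stat[comm_end + 2:].split()
        if fields and fields[0] == "D":
            stuck.append((int(entry), stat[comm_start + 1:comm_end]))
    return stuck
```

Calling `find_d_state_processes()` on an affected host and printing the result shows which processes are blocked. A short-lived "D" state is normal during ordinary disk I/O; the warning sign is a process that stays there across repeated checks.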

# Scenario: The underlying EBS volume is detached.

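# The application runs:
with open("/mnt/data/logs.txt") as f:
    data = f.read()  # blocks inside the read() system call

# The kernel tries to fetch blocks from the missing network disk.
# Depending on driver timeouts, it may keep retrying indefinitely.
# Meanwhile, the Python script HANGS inside read().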

# Even running `ls /mnt/data` in the terminal will hang your terminal!

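# A mitigation: lazily unmount (detach) the stale filesystem.
# This removes the mount point right away, so new accesses no longer reach the dead disk.
# Processes already stuck in D state stay blocked until their I/O returns or fails.
# $ umount -l /mnt/data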
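Because a blocked call cannot be interrupted from inside the process, one way to check a mount without risking the caller is to probe it from a disposable child process and give up after a timeout. The sketch below is a rough illustration under that assumption; `mount_responds` and `timeout_s` are hypothetical names. A directory listing may also be served from the kernel's cache, so a healthy result is only a hint, not proof that the disk is reachable.

```python
import subprocess
import sys


def mount_responds(path, timeout_s=5.0):
    """Return True if a child process can list `path` within timeout_s seconds.

    Hypothetical helper for illustration, not a definitive health check.
    """
    # The probe runs in a child process so that, if the kernel blocks it in
    # D state, only the child is stuck and the caller keeps running.
    probe = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.listdir(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Exit code 0 means the listing succeeded; anything else is a failure.
        return probe.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Deliberately no wait() after kill(): a child in D state will not
        # exit until its I/O returns, so waiting here would hang the caller.
        probe.kill()
        return False
```

`Popen` is used rather than `subprocess.run(..., timeout=...)` because `run` waits for the child to exit after killing it, and that wait would itself hang if the child is stuck in D state. The trade-off is that a stuck probe lingers as a leftover process until the I/O finally returns.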

Cost

This is an availability cost. If you rely on attached block storage (EBS) rather than object storage (S3) or managed databases (RDS), your application code is at the mercy of the operating system's filesystem driver locking up your threads.

Watch out for