Network incidents

The network is never perfectly reliable; cables get cut and routers crash.

The idea

When you send data over the internet, it hops across multiple physical routers. If a fiber cable is cut, or a router crashes, the packets currently on that wire simply vanish (a Blackhole).

Modern networks are designed to heal. Load balancers use Health Checks to detect dead paths. Once a health check fails (after a few seconds), the routing tables update to pull the bad route, and traffic flows around the damage.

Client Router A Router B Server
Health Check: OK (Path A)
Traffic is flowing normally through the primary route (Router A).

How it works (BGP and Health Checks)

# The reality of the internet:
# - Packets are just fire-and-forget.
# - If a router dies, packets sent to it vanish (Blackhole).

# To fix this, load balancers continuously probe routes:
def health_check(router_ip):
    try:
        ping(router_ip, timeout=2.0)
        return True
    except Timeout:
        return False

# BGP Routing Logic:
if not health_check(ROUTER_A):
    # Route is withdrawn!
    routing_table.remove(ROUTER_A)
    # Traffic instantly shifts to the fallback (ROUTER_B)
    routing_table.use_primary(ROUTER_B)