Dependency & vendor failures

Why a 3rd-party analytics API going down can take your entire checkout page with it.

The idea

Your app likely calls external APIs (Stripe, Twilio, an analytics provider). If one of them fails fast, for example by returning an HTTP 500 immediately, your app can show an error message and move on. That case is easy.

The deadly scenario is a slow dependency. If the vendor API starts taking 60 seconds to respond, every request that calls it holds a worker thread for the full 60 seconds, blocked on an open connection. With a fixed-size worker pool, new requests keep arriving until every thread is stuck waiting on the vendor. At that point your server cannot serve any requests, including ones that never touch the vendor. The defense is to set aggressive timeouts on every external network call and to have a fallback ready for when a timeout fires.
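The exhaustion described above can be reproduced without any network at all. The following is a minimal sketch, not a model of any particular web server: `call_vendor`, `healthy_request`, and `time_healthy_request` are hypothetical names, `time.sleep` stands in for a blocking vendor call, and a 4-thread `ThreadPoolExecutor` stands in for the server's worker pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_vendor(latency_s):
    """Stand-in for a blocking vendor call; sleeps instead of doing network I/O."""
    time.sleep(latency_s)
    return "vendor ok"


def healthy_request():
    """A request that never touches the vendor."""
    return "healthy ok"


def time_healthy_request(vendor_latency_s, workers=4):
    """Occupy every worker with a vendor call, then time one healthy request."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(workers):
            pool.submit(call_vendor, vendor_latency_s)
        start = time.monotonic()
        # This task has to queue until one of the workers is free again.
        pool.submit(healthy_request).result()
        return time.monotonic() - start


fast = time_healthy_request(0.01)  # vendor responds in 10 ms
slow = time_healthy_request(0.5)   # vendor responds in 500 ms
print(f"healthy request waited {fast:.3f}s with a fast vendor")
print(f"healthy request waited {slow:.3f}s with a slow vendor")
```

The healthy request does no vendor work, yet its wait time tracks the vendor's latency because it cannot get a thread. With a 60-second vendor and steady traffic, the queue never drains.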

[Interactive demo: an app with 4 worker threads calls a vendor API. Initial state: 0 / 4 threads in use, vendor latency 10 ms. The system is healthy and requests flow quickly.]

How it works (Timeouts)

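import requests

# BAD: no timeout
# requests applies no timeout unless you pass one, so this call can block
# indefinitely. If the vendor hangs, this thread is stuck for as long as
# the connection stays open.
response = requests.get("https://api.vendor.com/data")

# GOOD: strict timeouts
try:
    # Fail fast: allow at most 1 second to connect and 1 second of silence
    # while reading. Note that this is not a cap on total response time.
    response = requests.get("https://api.vendor.com/data", timeout=1.0)
    response.raise_for_status()  # treat 4xx/5xx responses as failures too
    data = response.json()
except requests.exceptions.RequestException:
    # Covers Timeout as well as connection and HTTP errors.
    # Fallback: serve stale data from cache, or disable the feature.
    # We sacrificed the feature, but saved the server.
    data = {"status": "degraded_mode"}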
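The "serve stale data from cache" fallback mentioned in the comments could look roughly like the sketch below. It is deliberately independent of any HTTP client: `fetch_with_stale_fallback`, the in-process `_cache` dict, the `max_stale_s` parameter, and the `"fresh"` / `"stale"` status values are all assumptions made for illustration (only `"degraded_mode"` comes from the example above). The `fetch` argument is any callable that performs the vendor call with its own timeout.

```python
import time

# Hypothetical in-process cache: key -> (time stored, last good value).
_cache = {}


def fetch_with_stale_fallback(key, fetch, max_stale_s=300.0):
    """Call fetch(); if it fails, serve the last good value when recent enough."""
    try:
        value = fetch()
    except Exception:
        # Broad catch keeps the sketch client-agnostic. With requests, you
        # would catch requests.exceptions.RequestException here instead.
        entry = _cache.get(key)
        if entry is not None and time.monotonic() - entry[0] <= max_stale_s:
            return {"status": "stale", "data": entry[1]}
        return {"status": "degraded_mode", "data": None}
    # Success: remember the value so a later failure has something to serve.
    _cache[key] = (time.monotonic(), value)
    return {"status": "fresh", "data": value}


def vendor_up():
    return {"price": 10}


def vendor_hung():
    raise TimeoutError("vendor did not respond in time")


print(fetch_with_stale_fallback("prices", vendor_up))
print(fetch_with_stale_fallback("prices", vendor_hung))
```

Returning a status alongside the data lets the caller decide whether to label the result as possibly out of date or to hide the feature entirely.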