Configuration Errors

When perfectly good code is destroyed by a bad JSON file.

The idea

We test our code rigorously, but we often treat configuration (JSON, YAML, feature flags, or environment variables) as "just data." A developer changes max_connections: 100 to max_connections: 0 in a production YAML file. The code is flawless, but the application crashes as soon as it reads the config. Configuration changes are code changes, and they are a common cause of severe outages.

Step 1: The app is running fine. A developer edits config.yaml directly in production.

How it works (Static Config Validation)

To prevent this, treat configuration as code. It must live in Git, go through code review, and, most importantly, be validated at startup. If an app starts up and detects a bad configuration, it should crash early (fail fast), ideally before the orchestrator (like Kubernetes) routes any traffic to it.

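# Python example of failing fast on bad config
import os
import sys

def load_config():
    raw = os.getenv("MAX_CONNECTIONS", "100")

    # Static validation (fail fast): reject non-numeric and non-positive values.
    # sys.exit(message) prints the message to stderr and exits with status 1.
    try:
        max_conn = int(raw)
    except ValueError:
        sys.exit(f"CRITICAL: MAX_CONNECTIONS must be an integer, got {raw!r}. Shutting down.")
    if max_conn <= 0:
        sys.exit(f"CRITICAL: MAX_CONNECTIONS must be > 0, got {max_conn}. Shutting down.")

    return {"max_connections": max_conn}

# Call this BEFORE binding to the HTTP port. On a non-zero exit the pod never
# becomes ready, so a Kubernetes rolling update stalls instead of taking traffic.
config = load_config()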
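
The startup check above catches a bad value at boot. The same rules can also run earlier, against the config file itself, for example as a CI step during code review. The following is a minimal sketch of that idea, not a definitive implementation: the RULES table, the timeout_seconds key, and the validate_config function are hypothetical names chosen for illustration, and JSON is used only because Python parses it without third-party packages.

```python
import json

# Hypothetical rules for illustration: key -> minimum allowed integer value.
RULES = {
    "max_connections": 1,
    "timeout_seconds": 1,
}

def validate_config(text):
    """Return a list of problems found in a JSON config; an empty list means valid."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]

    errors = []
    for key, minimum in RULES.items():
        if key not in data:
            errors.append(f"{key}: missing")
            continue
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{key}: expected int, got {type(value).__name__}")
        elif value < minimum:
            errors.append(f"{key}: must be >= {minimum}, got {value}")
    return errors

# The bad change from the example above is reported before it ever ships.
problems = validate_config('{"max_connections": 0, "timeout_seconds": 30}')
print(problems)  # ['max_connections: must be >= 1, got 0']
```

Collecting every problem in a list, rather than stopping at the first one, lets a reviewer see all the mistakes in a single run. A CI job could exit non-zero whenever the list is non-empty, and the application could call the same function at startup so both checks stay in sync.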

Cost

Treating config as code (GitOps) slows down operational changes. If you need to quickly flip a feature flag during an incident, waiting 10 minutes for a CI/CD pipeline to deploy a YAML file feels agonizing. The trade-off is stability vs. agility.

Watch out for