Log Volume Management

Why logging every HTTP request is the fastest way to bankrupt your startup.

The idea

When you build a small app, you print everything to the console: "User logged in", "Database queried", "Button clicked". When your app goes viral, you are suddenly processing 10,000 requests per second. Every single log line is sent over the network to a central log server (like Datadog or Splunk). Very quickly, the network is choked, the log server runs out of disk space, and you get a bill for $50,000 at the end of the month. To survive, you must aggressively filter and sample your logs before they leave your servers.

Step 1: Low Traffic. 10 requests/sec. We log every single event (INFO, WARN, ERROR). The Log Server is happy.

How it works (Levels & Sampling)

Managing log volume requires strict discipline at the application layer:

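Log Levels: In production, disable DEBUG and INFO globally and emit only WARN and ERROR. Most of the system noise disappears.
Sampling: If you absolutely must log HTTP access (e.g. for analytics), don't log 100% of requests. Log a random 1 in every 100 and, when calculating metrics later, multiply the counts by 100. A fixed rate like this is plain probabilistic sampling; dynamic sampling goes a step further and adjusts the rate as traffic changes. Either way, sampled access logs need a logger that is exempt from the global WARN threshold, or they are filtered out along with every other INFO line.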
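To make the "dynamic" part of dynamic sampling concrete, here is a minimal sketch that adapts the rate to traffic so that the number of emitted logs stays near a budget. The `DynamicSampler` class, its method names, and the one-window-per-second convention are all hypothetical, not part of any particular logging library.

```javascript
// Hypothetical helper: keeps emitted logs near `targetPerWindow` by
// recomputing the sampling rate from the previous window's traffic.
class DynamicSampler {
    constructor(targetPerWindow, random = Math.random) {
        this.targetPerWindow = targetPerWindow;
        this.random = random;   // injectable so the behavior can be tested
        this.rate = 1;          // start by keeping everything
        this.seenInWindow = 0;  // requests observed in the current window
    }

    // Call once per window (for example, once per second).
    rollWindow() {
        if (this.seenInWindow > 0) {
            this.rate = Math.min(1, this.targetPerWindow / this.seenInWindow);
        }
        this.seenInWindow = 0;
    }

    // Returns a weight (1 / rate) if the request should be logged, else 0.
    sample() {
        this.seenInWindow += 1;
        return this.random() < this.rate ? 1 / this.rate : 0;
    }
}
```

A caller would invoke `rollWindow()` once per window and, for each request, write a log entry only when `sample()` returns a non-zero weight. Storing that weight in the entry lets later metrics sum the weights instead of multiplying by a fixed 100, which stays correct while the rate moves.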

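// Assumes a winston-style createLogger; adapt to your logging library.
const { createLogger, transports } = require('winston');

// 1. Strict Log Levels
const logger = createLogger({
    // Only emit warnings and errors in production.
    level: process.env.NODE_ENV === 'production' ? 'warn' : 'debug',
    transports: [new transports.Console()]
});

// Access logs get their own logger: with the level above, logger.info()
// is dropped in production and the sample would never be emitted.
const accessLogger = createLogger({
    level: 'info',
    transports: [new transports.Console()]
});

// 2. Head-based Sampling (1% of traffic)
const SAMPLE_RATE = 0.01;
function logHttpRequest(req, res) {
    if (Math.random() < SAMPLE_RATE) {
        accessLogger.info(`Access: ${req.url} - ${res.statusCode}`);
    }
}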

// 3. Log Aggregation (Batching)
// Never send 1 log at a time over HTTP. Buffer them in memory 
// and send 500 at once to save network overhead.
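
The batching idea in point 3 could look roughly like the sketch below. It assumes Node.js, and the `LogBatcher` class, its options, and the `send` callback are hypothetical names chosen for illustration; `send` stands in for whatever actually ships an array of entries to the log server.

```javascript
// Hypothetical batcher: buffers log entries and hands them to `send`
// in groups, either when the buffer is full or when a timer fires.
class LogBatcher {
    constructor(send, { maxBatch = 500, flushMs = 2000 } = {}) {
        this.send = send;        // e.g. a function that POSTs an array of entries
        this.maxBatch = maxBatch;
        this.buffer = [];
        // Flush on a timer too, so a quiet service doesn't hold logs forever.
        this.timer = setInterval(() => this.flush(), flushMs);
        this.timer.unref();      // don't keep the process alive just for logging
    }

    add(entry) {
        this.buffer.push(entry);
        if (this.buffer.length >= this.maxBatch) this.flush();
    }

    flush() {
        if (this.buffer.length === 0) return;
        const batch = this.buffer;
        this.buffer = [];        // swap first so new entries go to a fresh buffer
        this.send(batch);
    }

    // Call on shutdown so buffered entries are not lost.
    close() {
        clearInterval(this.timer);
        this.flush();
    }
}
```

The price of batching is that anything still in the buffer disappears if the process crashes, so the batch size and flush interval bound how much you can lose.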

Cost

Sampling means you intentionally throw away data. If a customer reports a bug ("My payment failed at 2:00 PM"), and that specific request wasn't part of the 1% sample, you have no access log for it, and nothing at all if the failure never raised a WARN or ERROR. The trade-off is between perfect observability (astronomical cost) and statistical observability (affordable, but missing needles in the haystack).

Watch out for

Tail-based Sampling: Random sampling is dangerous because a 1% sample also drops about 99% of the requests that failed, which are exactly the ones you need. Advanced systems use "tail-based sampling": they hold all of a request's logs in a temporary in-memory buffer until the request finishes. If the request succeeds, the logs are thrown away. If the request throws an exception, every log line for that specific trace is saved retroactively.
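
A minimal sketch of that buffer-then-decide pattern, applied to a single request, might look like this. `RequestLogBuffer`, `withTailSampling`, and the `emit` callback are hypothetical names for illustration; real tail-based samplers usually run in a separate collector and decide per distributed trace, which this sketch does not attempt.

```javascript
// Hypothetical per-request buffer: collect log lines in memory and decide
// at the end of the request whether to emit or discard them.
class RequestLogBuffer {
    constructor(emit) {
        this.emit = emit;   // called once per kept line, e.g. (line) => logger.warn(line)
        this.lines = [];
    }

    log(message) {
        this.lines.push(message);
    }

    // Call exactly once when the request finishes.
    finish(error) {
        if (error) {
            for (const line of this.lines) this.emit(line);
            this.emit(`Request failed: ${error.message}`);
        }
        this.lines = [];    // success or failure, release the memory
    }
}

// Wraps a handler so its buffered logs are kept only when it throws.
async function withTailSampling(emit, handler) {
    const buffer = new RequestLogBuffer(emit);
    try {
        const result = await handler(buffer);
        buffer.finish(null);
        return result;
    } catch (err) {
        buffer.finish(err);
        throw err;
    }
}
```

The cost moves from the network to memory: every in-flight request holds its full log history until it completes, so long-running or very chatty requests need a cap on buffered lines.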