Cold Storage for Logs

Because keeping 3 years of server logs in a hot database will bankrupt you.

The idea

Application logs and metrics are most valuable in the first 7 days, when you are actively debugging incidents. After that they are almost never queried, but compliance teams often require you to keep them for 1 to 7 years. You must aggressively move old logs out of expensive "Hot" indexing systems (like Datadog or Elasticsearch) into cheap "Cold" object storage (like AWS S3 Glacier).


How it works (Data Lifecycle Policies)

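Data is migrated automatically based on its age. Hot storage keeps data on fast SSDs and indexes every field. Warm storage moves it to cheaper, slower disks with fewer replicas. Cold storage compresses the data into chunked files (e.g., GZIP-compressed text or columnar Parquet) and writes them to slow, cheap archival storage where nothing is indexed.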
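To make the "compress into chunks" step concrete, here is a minimal sketch using only the Python standard library. It assumes each log record is a JSON object and writes a batch as gzip-compressed newline-delimited JSON (Parquet would need a third-party library such as pyarrow). The function names and the one-chunk-per-service-per-day key layout are hypothetical.

```python
import gzip
import json
from datetime import date


def compress_log_chunk(records: list[dict]) -> bytes:
    """Serialize records as newline-delimited JSON and gzip them into one chunk."""
    ndjson = "\n".join(json.dumps(r, sort_keys=True) for r in records) + "\n"
    return gzip.compress(ndjson.encode("utf-8"))


def read_log_chunk(blob: bytes) -> list[dict]:
    """Decompress a chunk and parse it back into records (what a restore + scan does)."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


def chunk_key(service: str, day: date) -> str:
    """Hypothetical object-key layout: one chunk per service per day."""
    return f"logs/{service}/{day:%Y/%m/%d}.ndjson.gz"
```

Keying chunks by date means a later restore can target only the days an auditor asks for instead of the whole archive.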

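# Typical Data Lifecycle Policy
# Sketch: `log_index` and `s3` are hypothetical stand-ins for a search-index
# handle and an object-storage client; the methods called on them are illustrative.
from datetime import date

HOT_DAYS = 7
WARM_DAYS = 30
RETENTION_DAYS = 7 * 365  # 7-year compliance window (2555 days)

def evaluate_log_retention(log_index, s3, today=None):
    # log_index.creation_date is assumed to be a datetime.date
    today = today or date.today()
    age_days = (today - log_index.creation_date).days

    if age_days < HOT_DAYS:
        # Keep in Elasticsearch on fast NVMe SSDs ($$$)
        # Allows sub-second search during active incidents.
        log_index.tier = "HOT"

    elif age_days < WARM_DAYS:
        # Move to slower HDDs, reduce index replication ($)
        # Search takes ~5 seconds, but cheaper.
        log_index.tier = "WARM"

    elif age_days < RETENTION_DAYS:
        # Compress to .gz files and send to S3 Glacier (¢)
        # Search requires a restore job that can take 12+ hours on the cheapest tiers.
        # Archive only once, so repeated runs don't re-upload the same index.
        if log_index.tier != "COLD":
            s3.upload(log_index.compress())
            log_index.delete_from_elasticsearch()
            log_index.tier = "COLD"

    else:
        # Compliance period is over. Delete permanently to avoid liability.
        s3.delete(log_index.s3_path)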
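If logs are shipped to S3 as they arrive (rather than uploaded by a script at day 30), the archive and delete steps above can be delegated to an S3 lifecycle rule instead of a scheduled job. The sketch below only builds the configuration dictionary in the shape S3's lifecycle API expects; the `logs/` prefix, the 30-day and 7-year defaults, and the helper name are assumptions. Note that S3 counts `Days` from each object's creation.

```python
def build_log_lifecycle_config(
    prefix: str,
    archive_after_days: int = 30,
    expire_after_days: int = 7 * 365,
    storage_class: str = "GLACIER",  # or "DEEP_ARCHIVE" for the cheapest tier
) -> dict:
    """Build one S3 lifecycle rule: archive objects under `prefix`, then expire them."""
    if not 0 <= archive_after_days < expire_after_days:
        raise ValueError("archive_after_days must be non-negative and before expiry")
    return {
        "Rules": [
            {
                "ID": f"archive-and-expire-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move to archival storage once the warm window has passed.
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": storage_class}
                ],
                # Delete permanently when the compliance period ends.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }
```

With boto3 this would be applied via `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=config)`. That call replaces the bucket's entire lifecycle configuration, so merge with any existing rules first.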

Cost

The cost savings are monumental (often around 100x cheaper per GB). The trade-off is retrieval time: if auditors ask for a log from 2 years ago, an engineer has to trigger a data restoration job and wait for hours (12-24 on the cheapest archive tiers) before they can even run a search query.
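On S3, the restoration job is started per object with boto3's `restore_object`, and progress shows up in the `Restore` field returned by `head_object`. The sketch below only builds the `RestoreRequest` payload and interprets that field, so it runs without AWS credentials; the helper names and the 7-day default are assumptions.

```python
# Retrieval tiers trade speed for price; Deep Archive does not offer "Expedited".
VALID_TIERS = {"Expedited", "Standard", "Bulk"}


def build_restore_request(days_available: int = 7, tier: str = "Bulk") -> dict:
    """Payload for restore_object(..., RestoreRequest=...): how long the restored
    copy stays readable, and which (slower = cheaper) retrieval tier to use."""
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown retrieval tier: {tier!r}")
    if days_available < 1:
        raise ValueError("days_available must be at least 1")
    return {"Days": days_available, "GlacierJobParameters": {"Tier": tier}}


def restore_status(head_response: dict) -> str:
    """Interpret the 'Restore' field of a head_object response."""
    restore = head_response.get("Restore")
    if restore is None:
        return "not-requested"
    if 'ongoing-request="true"' in restore:
        return "in-progress"
    return "available"
```

A restore script would call `restore_object` once per chunk, then poll `head_object` until every chunk reports `"available"` before running the search.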

Watch out for

Retrieval Fees: Cloud providers charge very little to store data in cold storage, but they charge per GB (and per request) to retrieve it. If a misconfigured script tries to scan 3 years of cold logs, you will incur a massive surprise bill.
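One way to guard against that surprise bill is a pre-flight check that sums the size of everything a restore would touch and refuses to proceed above a budget. This is a hypothetical sketch: `ArchivedChunk`, `plan_restore`, and the GiB limit are illustrative, and the chunk inventory would in practice come from listing the bucket. Multiply the selected size by your provider's published per-GB retrieval rate to estimate the cost.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArchivedChunk:
    key: str
    day: date
    size_bytes: int


def plan_restore(
    chunks: list[ArchivedChunk], start: date, end: date, max_gib: float
) -> list[ArchivedChunk]:
    """Select only the chunks in [start, end]; refuse if they exceed the size budget."""
    selected = [c for c in chunks if start <= c.day <= end]
    total_gib = sum(c.size_bytes for c in selected) / 2**30
    if total_gib > max_gib:
        raise RuntimeError(
            f"Refusing to restore {total_gib:.1f} GiB (limit {max_gib} GiB); "
            "narrow the date range or raise the limit explicitly"
        )
    return selected
```

Failing loudly forces whoever asked for "all of 2022" to either narrow the request or consciously approve the larger retrieval.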