Moving data from operational databases into analytical warehouses.
You shouldn't run massive analytical queries ("Total sales by month") on your live production database—it will slow down user traffic. Instead, data is extracted, transformed, and loaded (ETL) into a Data Warehouse.
Older systems run Batch Pipelines every night (e.g. at 2 AM). Modern systems use Streaming / CDC (Change Data Capture) to capture database changes in real-time. Good pipelines also include Quality Gates to quarantine bad data before it ruins executive dashboards.
def transform_data(record):
# 1. Standardize formatting
record['date'] = parse_iso(record['date'])
# 2. Quality Gate (Data Contract)
if record['amount'] < 0:
# Invalid data! Do NOT load to warehouse.
# Quarantine it for human review.
send_to_dead_letter_queue(record)
return None
return record