Question
Your storage cluster's periodic integrity scrub flags a rising number of checksum mismatches: stored object checksums no longer match the data on disk for a growing set of objects, all concentrated on two older storage nodes. There's no outage and reads still 'succeed', but some clients have recently reported occasional corrupted downloads. The two nodes are past their planned hardware refresh and one has logged ECC memory errors this week. Data is 3x replicated with per-object checksums. How do you triage this silent-corruption signal, protect data integrity, and recover any damaged copies?
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.