Question
Your primary-replica datastore uses a separate failover controller that promotes a replica when it cannot reach the primary. At 03:10, a brief network partition isolates the failover controller from the primary, even though the primary itself is healthy and still serving app writes from clients on its side of the partition. The controller promotes the replica. The partition heals at 03:14. You now have two nodes that both accepted writes for ~4 minutes and hold conflicting data. The dashboards showed nothing wrong: write success was ~100% the whole time on BOTH nodes, there were no errors, and clients on each side were happy. How do you triage and mitigate the damage?
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
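Since both nodes reported ~100% write success, the damage will not show up in error rates; it has to be found by comparing what each node accepted during the 03:10 to 03:14 window. The following is a minimal sketch of that comparison, not a definitive procedure. The `Write` record and the `classify_divergence` function are hypothetical names, and the sketch assumes you can export per-node write logs with a key, a value, and a commit timestamp. It also assumes the two nodes' clocks are close enough to filter by wall-clock time; if the datastore exposes a log position at the moment of promotion, that is likely a safer boundary.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Write:
    """Hypothetical shape of one exported write-log record."""
    key: str
    value: str
    at: datetime  # commit time recorded by the node that accepted the write


def classify_divergence(old_primary, promoted, start, end):
    """Group keys written inside [start, end] by which node(s) wrote them."""

    def last_write_per_key(writes):
        # Keep only the final in-window write for each key on this node.
        latest = {}
        for w in sorted(writes, key=lambda w: w.at):
            if start <= w.at <= end:
                latest[w.key] = w
        return latest

    a = last_write_per_key(old_primary)
    b = last_write_per_key(promoted)
    both = a.keys() & b.keys()
    return {
        # Written on one side only: candidates for a straightforward replay.
        "only_old_primary": sorted(a.keys() - b.keys()),
        "only_promoted": sorted(b.keys() - a.keys()),
        # Written on both sides with different final values: need a decision.
        "conflicting": sorted(k for k in both if a[k].value != b[k].value),
        # Written on both sides but ending at the same value.
        "same_value_on_both": sorted(k for k in both if a[k].value == b[k].value),
    }
```

One-sided keys can usually be merged mechanically, while the `conflicting` bucket is the part that needs a business-level rule or a human decision, so its size is a useful number to include in status updates.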