Question
You operate thousands of Raft/Paxos consensus groups (one per data shard) and routinely need to MOVE replicas between nodes or regions: to rebalance load, drain a node for maintenance, or relocate a shard's replicas closer to where its traffic has moved. Doing this naively (removing the old member and adding the new one as two separate steps, or swapping multiple members at once) has historically caused a group to lose quorum or, worse, briefly admit two disjoint majorities. Design safe online membership reconfiguration for these consensus groups so that a live group never loses availability and never risks split-brain during the change, including how you relocate a replica across regions without a window in which the group cannot make progress.
Clarify scale and constraints first. Propose a clean component breakdown, then go deep on the hard parts — data model, bottlenecks, consistency, failure modes — and name the trade-offs you are making.