On 2021-08-05 2:25 p.m., Andrei Borzenkov wrote:
Three nodes: A, B, C. Communication between A and B is blocked (completely - no packets can pass in either direction). A and B can both communicate with C. I expected that the result would be two partitions - (A, C) and (B, C). To my surprise, A went offline, leaving (B, C) running. It was always the same node (with node id 1 if it matters, out of 1, 2, 3). How is the surviving partition determined in this case? Can I be sure the same will also work with more nodes? I.e., if I have two sites with an equal number of nodes and a third site as witness, and connectivity between the multi-node sites is lost but each site can still communicate with the witness, will one site go offline? Which one?
In your case, your nodes were otherwise healthy, so quorum alone
was enough. To properly avoid a split brain (when a node is not
behaving properly, i.e. lockups, bad RAM/CPU, etc.) you reallllly
need actual fencing. In such a case, whichever nodes maintain
quorum will fence the lost node (be it because it became inquorate
or stopped behaving properly).
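
To illustrate just the vote arithmetic (not how corosync picks which node ends up outside the membership), here is a minimal sketch. It assumes one vote per node and a simple majority rule, quorum = expected_votes // 2 + 1. The function names are hypothetical and are not part of corosync or pacemaker.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(partition: set, expected_votes: int, votes_per_node: int = 1) -> bool:
    """True if the partition holds at least a majority of the expected votes.

    Assumption: every node carries the same number of votes.
    """
    return len(partition) * votes_per_node >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    nodes = {"A", "B", "C"}
    # The outcome reported in the question: (B, C) kept running, A went offline.
    for partition in ({"B", "C"}, {"A"}):
        state = "quorate" if is_quorate(partition, len(nodes)) else "inquorate"
        print(sorted(partition), state)
```

With three expected votes the threshold is two, so a two-node membership stays quorate while a lone node does not. The sketch says nothing about which of A or B is left out.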
As for the mechanics of how quorum is determined in your case
above, I'll let one of the corosync people answer.
--
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
