On 2021-08-05 2:25 p.m., Andrei Borzenkov wrote:
Three nodes A, B, C. Communication between A and B is blocked
(completely - no packets can pass in either direction). A and B can
both communicate with C.

I expected the result to be two partitions - (A, C) and (B, C). To my
surprise, A went offline, leaving (B, C) running. It was always the same
node (the one with node id 1, if it matters, out of 1, 2, 3).

How is the surviving partition determined in this case?

Can I be sure the same will also work with multiple nodes? I.e., if
I have two sites with an equal number of nodes and a third site as a
witness, and connectivity between the multi-node sites is lost but each
site can still communicate with the witness, will one site go offline?
Which one?

In your case, your nodes were otherwise healthy, so quorum worked. To properly avoid a split brain (when a node is not behaving properly, e.g. lockups, bad RAM/CPU, etc.), you reallllly need actual fencing. In such a case, whichever nodes maintain quorum will fence the lost node (be it because it became inquorate or because it stopped behaving properly).
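
To make the quorum arithmetic concrete, here is a minimal sketch in Python. It assumes one vote per node and the usual majority rule (quorum = expected votes / 2 + 1, integer division). The names quorum_threshold and is_quorate are made up for illustration and are not corosync APIs. It only shows why (B, C) is quorate and a lone A is not; it says nothing about why the membership settled on (B, C) rather than (A, C).

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(partition: set, votes: dict) -> bool:
    """True if the nodes in `partition` together hold a majority of all votes."""
    expected = sum(votes.values())
    present = sum(votes[node] for node in partition)
    return present >= quorum_threshold(expected)


# Assumption: three nodes with one vote each, as in the scenario above.
votes = {"A": 1, "B": 1, "C": 1}

print(quorum_threshold(sum(votes.values())))  # 2
print(is_quorate({"B", "C"}, votes))          # True
print(is_quorate({"A"}, votes))               # False
```

With fencing configured, the quorate side is the one that fences the node that dropped out.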

As for the mechanics of how quorum is determined in your case above, I'll let one of the corosync people answer.

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops." - Stephen Jay Gould