On 4/7/20 1:16 PM, Andrei Borzenkov wrote:
07.04.2020 00:21, Sherrard Burton wrote:

It looks like some timing issue or race condition. After reboot, the node
manages to contact qnetd first, before the connection to the other node
is established. Qnetd behaves as documented: it sees two equal-size
partitions and favors the partition that includes the tie breaker
(lowest node id). So the existing node goes out of quorum. A second
later both nodes see each other, and quorum is regained.
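For anyone wanting to see where that tie-breaker behavior comes from: it is governed by the quorum/device section of corosync.conf. A typical two-node-plus-qnetd configuration looks roughly like the sketch below (the hostname is a placeholder; with the ffsplit algorithm, the tie_breaker defaults to the lowest node id, which matches the behavior described above):

```
quorum {
    provider: corosync_votequorum
    device {
        model: net
        net {
            # placeholder address of the qnetd host
            host: qnetd.example.com
            # on an even split, ffsplit favors the partition
            # containing the tie breaker (default: lowest node id)
            algorithm: ffsplit
        }
    }
}
```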


Define the right problem to solve?

An educated guess is that your problem is not corosync but pacemaker
stopping resources. In that case, just do what has been done for years in
two-node clusters: set no-quorum-policy=ignore and rely on stonith to
resolve split-brain.
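For anyone following along, a minimal sketch of that approach using the pcs CLI (crmsh equivalents exist; the property names are the standard pacemaker cluster properties):

```
# keep resources running when quorum is lost
# (only sane in two-node clusters with working fencing)
pcs property set no-quorum-policy=ignore

# stonith is now the only split-brain protection, so it must be enabled
pcs property set stonith-enabled=true
```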

I dropped the idea of using qdevice in a two-node cluster. If you have a
reliable stonith device it is not needed, and without stonith, relying
on watchdog suicide has too many problems.


Andrei,
in a two-node cluster with stonith only, but no qdevice, how do you avoid the dreaded stonith death match and the resulting flip-flopping of services?

and are you using this configuration with stateful services? my main use case is DRBD, so i am very cautious about making sure that there is no data corruption or disruption. the qdevice is part of my "belt and suspenders" approach.
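for context, the usual belt-and-suspenders addition on the DRBD side is DRBD-level fencing wired into pacemaker, so a node that loses its replication link also loses the ability to promote stale data. a rough sketch for DRBD 9 (resource name is illustrative; the handler script paths vary by version and distribution):

```
resource r0 {
    net {
        # freeze I/O and require the peer to be fenced
        # before continuing after a replication link loss
        fencing resource-and-stonith;
    }
    handlers {
        # add a pacemaker location constraint that pins the
        # master role away from the outdated peer ...
        fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
        # ... and remove it once the peer has resynced
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.9.sh";
    }
}
```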
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
