> -----Original Message-----
> From: Jehan-Guillaume de Rorthais <[email protected]>
> Sent: Saturday, November 5, 2022 3:45 PM
> To: [email protected]
> Cc: Robert Hayden <[email protected]>
> Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests
>
> On Sat, 5 Nov 2022 20:53:09 +0100
> Valentin Vidić via Users <[email protected]> wrote:
>
> > On Sat, Nov 05, 2022 at 06:47:59PM +0000, Robert Hayden wrote:
> > > That was my impression as well...so I may have something wrong. My
> > > expectation was that SBD daemon should be writing to the /dev/watchdog
> > > within 20 seconds and the kernel watchdog would self fence.
> >
> > I don't see anything unusual in the config except that pacemaker mode is
> > also enabled. This means that the cluster is providing signal for sbd even
> > when the storage device is down, for example:
> >
> > 883 ? SL 0:00 sbd: inquisitor
> > 892 ? SL 0:00 \_ sbd: watcher: /dev/vdb1 - slot: 0 - uuid: ...
> > 893 ? SL 0:00 \_ sbd: watcher: Pacemaker
> > 894 ? SL 0:00 \_ sbd: watcher: Cluster
> >
> > You can strace different sbd processes to see what they are doing at any
> > point.
>
> I suspect both watchers should detect the loss of network/communication with
> the other node.
>
> BUT, when sbd is in Pacemaker mode, it doesn't reset the node if the
> local **Pacemaker** is still quorate (via corosync). See the full chapter:
> «If Pacemaker integration is activated, SBD will not self-fence if **device**
> majority is lost [...]»
> https://documentation.suse.com/sle-ha/15-SP4/html/SLE-HA-all/cha-ha-storage-protect.html
>
> Would it be possible that no node is shutting down because the cluster is in
> two-node mode? Because of this mode, both would keep the quorum expecting the
> fencing to kill the other one... Except there's no active fencing here, only
> "self-fencing".
>
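The two-node question comes down to quorum arithmetic. The Python sketch that follows is a simplified model, not corosync code. It assumes votequorum's usual rule that a partition needs expected_votes // 2 + 1 votes, and that "two_node: 1" lowers that threshold to a single vote in a 2-node cluster. The function names are hypothetical. The sketch does not model which partition actually receives a quorum device's vote (the ffsplit or lms algorithm in corosync-qnetd). It simply assumes one side gets it.

```python
# Hypothetical, simplified model of corosync votequorum arithmetic.
# Assumptions: threshold = expected_votes // 2 + 1, and two_node mode
# drops the threshold to 1 vote for a 2-node cluster.

def quorum_threshold(expected_votes: int, two_node: bool = False) -> int:
    """Return the number of votes a partition needs to be quorate."""
    if two_node and expected_votes == 2:
        return 1
    return expected_votes // 2 + 1


def is_quorate(partition_votes: int, expected_votes: int, two_node: bool = False) -> bool:
    """True if a partition holding partition_votes votes is quorate."""
    return partition_votes >= quorum_threshold(expected_votes, two_node)


if __name__ == "__main__":
    # Two nodes with two_node enabled, network split: each side holds 1 vote.
    print("two_node, node A:", is_quorate(1, 2, two_node=True))
    print("two_node, node B:", is_quorate(1, 2, two_node=True))
    # Two nodes plus a quorum device (3 expected votes). Assume the qdevice
    # vote goes to node A's partition only.
    print("qdevice, node A:", is_quorate(2, 3))
    print("qdevice, node B:", is_quorate(1, 3))
```

Under these assumptions, both halves of a split two_node cluster remain quorate, which is the situation described above. With a quorum device, only the partition that holds the qdevice vote stays quorate.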
I failed to mention that I also have a Quorum Device set up to add its vote to the quorum, so two_node is not enabled. I suspect Valentin was onto something with Pacemaker keeping the watchdog device updated because it thinks the cluster is OK. I need to research and test that theory, and I will try to carve out some time next week for that.

I appreciate all of the feedback. I have been dealing with Cluster Suite for a decade+, but focused on the company's setup. I still have lots to learn, which keeps me interested.

> To verify this guess, check the corosync conf for the "two_node" parameter and
> if both nodes still report as quorate during network outage using:
>
>   corosync-quorumtool -s
>
> If this turn to be a good guess, without **active** fencing, I suppose a cluster
> can not rely on the two-node mode. I'm not sure what would be the best setup
> though.
>
> Regards,

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
