On Wed, Feb 4, 2026 at 4:36 PM Anton Gavriliuk via Users <[email protected]> wrote:
> Hello
>
> There is a two-node (HPE DL345 Gen12 servers), shared-nothing, DRBD-based synchronous-replication (Protocol C), distributed active/standby Pacemaker storage metro-cluster. The metro-cluster is configured with qdevice, heuristics (parallel fping) and fencing - fence_ipmilan and diskless sbd (hpwdt, /dev/watchdog). All cluster resources are configured to always run together on the same node.
>
> The two storage cluster nodes and the qdevice run on Rocky Linux 10.1
> Pacemaker version 3.0.1
> Corosync version 3.1.9
> DRBD version 9.3.0
>
> So, the question is - what is the most correct way of implementing STONITH/fencing with fence_ipmilan + diskless sbd (hpwdt, /dev/watchdog)?

The correct way of using diskless sbd with a two-node cluster is not to use it ;-)

Diskless sbd (watchdog-fencing) requires 'real' quorum. The quorum provided by corosync in two-node mode would introduce split-brain, which is why sbd recognizes two-node operation and replaces corosync's quorum with the information that the peer node is currently in the cluster. That is fine for poison-pill fencing - a single shared disk then doesn't become a single point of failure as long as the peer is there. But for watchdog-fencing it doesn't help, because the peer going away would mean you have to commit suicide.

An alternative with a two-node cluster is to step away from the actual two-node design and go with qdevice for 'real' quorum. You'll need some kind of 3rd node, but it doesn't have to be a full cluster node.
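Just to sketch that qdevice variant: the quorum section in corosync.conf would look roughly like the one below. The qnetd hostname, the heuristics command and the chosen algorithm are placeholders/assumptions, not taken from your setup, and two_node is not supposed to be combined with a quorum device:

quorum {
    provider: corosync_votequorum
    # no two_node here - the quorum device supplies the third vote
    device {
        model: net
        votes: 1
        net {
            # placeholder for the 3rd box running corosync-qnetd
            host: qnetd.example.com
            # the usual algorithm for an even number of cluster nodes
            algorithm: ffsplit
        }
        heuristics {
            mode: sync
            # placeholder for your parallel fping check
            exec_ping: /usr/bin/fping -q 192.168.0.1
        }
    }
}

On a running cluster pcs can do the same with something like "pcs quorum device add model net host=qnetd.example.com algorithm=ffsplit"; afterwards "pcs quorum status" or "corosync-qdevice-tool -s" should show whether the qdevice vote actually arrives.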
Regards,
Klaus

> I’m not sure about two-level fencing topology, because diskless sbd is not an external agent/resource…
>
> Currently it works without a fencing topology, and both run in “parallel”. It really doesn’t matter who wins. I just want to make sure the fenced node is powered off or rebooted.
>
> Here is a log of how it works now in “parallel”:
>
> [root@memverge2 ~]# cat /var/log/messages|grep -i fence
> Feb 2 12:46:07 memverge2 pacemaker-fenced[3902]: notice: Node memverge state is now lost
> Feb 2 12:46:07 memverge2 pacemaker-fenced[3902]: notice: Removed 1 inactive node with cluster layer ID 27 from the membership cache
> Feb 2 12:46:10 memverge2 pacemaker-schedulerd[3905]: warning: Cluster node memverge will be fenced: peer is no longer part of the cluster
> Feb 2 12:46:10 memverge2 pacemaker-schedulerd[3905]: warning: ipmi-fence-memverge2_stop_0 on memverge is unrunnable (node is offline)
> Feb 2 12:46:10 memverge2 pacemaker-schedulerd[3905]: warning: ipmi-fence-memverge2_stop_0 on memverge is unrunnable (node is offline)
> Feb 2 12:46:10 memverge2 pacemaker-schedulerd[3905]: notice: Actions: Fence (reboot) memverge 'peer is no longer part of the cluster'
> Feb 2 12:46:10 memverge2 pacemaker-schedulerd[3905]: notice: Actions: Stop ipmi-fence-memverge2 ( memverge ) due to node availability
> Feb 2 12:46:10 memverge2 pacemaker-fenced[3902]: notice: Client pacemaker-controld.3906 wants to fence (reboot) memverge using any device
> Feb 2 12:46:10 memverge2 pacemaker-fenced[3902]: notice: Requesting peer fencing (reboot) targeting memverge
> Feb 2 12:46:10 memverge2 pacemaker-fenced[3902]: notice: Requesting that memverge2 perform 'reboot' action targeting memverge
> Feb 2 12:46:10 memverge2 pacemaker-fenced[3902]: notice: Waiting 25s for memverge to self-fence (reboot) for client pacemaker-controld.3906
> Feb 2 12:46:10 memverge2 pacemaker-fenced[3902]: notice: Delaying 'reboot' action targeting memverge using ipmi-fence-memverge for 5s
> Feb 2 12:46:36 memverge2 pacemaker-fenced[3902]: notice: Self-fencing (reboot) by memverge for pacemaker-controld.3906 assumed complete
> Feb 2 12:46:36 memverge2 pacemaker-fenced[3902]: notice: Operation 'reboot' targeting memverge by memverge2 for pacemaker-controld.3906@memverge2: OK (Done)
> Feb 2 12:46:36 memverge2 kernel: drbd ha-nfs memverge: helper command: /sbin/drbdadm fence-peer
> Feb 2 12:46:36 memverge2 kernel: drbd ha-iscsi memverge: helper command: /sbin/drbdadm fence-peer
> Feb 2 12:46:36 memverge2 crm-fence-peer.9.sh[7332]: DRBD_BACKING_DEV_1=/dev/mapper/object_block_nfs_vg-ha_nfs_exports_lv_with_vdo_1x8 DRBD_BACKING_DEV_2=/dev/mapper/object_block_nfs_vg-ha_nfs_internal_lv_without_vdo DRBD_BACKING_DEV_5=/dev/mapper/object_block_nfs_vg-ha_samba_exports_lv_with_vdo_1x8 DRBD_CONF=/etc/drbd.conf DRBD_CSTATE=Connecting DRBD_LL_DISK=/dev/mapper/object_block_nfs_vg-ha_nfs_exports_lv_with_vdo_1x8\ /dev/mapper/object_block_nfs_vg-ha_nfs_internal_lv_without_vdo\ /dev/mapper/object_block_nfs_vg-ha_samba_exports_lv_with_vdo_1x8 DRBD_MINOR=1\ 2\ 5 DRBD_MINOR_1=1 DRBD_MINOR_2=2 DRBD_MINOR_5=5 DRBD_MY_ADDRESS=192.168.0.8 DRBD_MY_AF=ipv4 DRBD_MY_NODE_ID=28 DRBD_NODE_ID_27=memverge DRBD_NODE_ID_28=memverge2 DRBD_PEER_ADDRESS=192.168.0.6 DRBD_PEER_AF=ipv4 DRBD_PEER_NODE_ID=27 DRBD_RESOURCE=ha-nfs DRBD_VOLUME=1\ 2\ 5 UP_TO_DATE_NODES=0x10000000 /usr/lib/drbd/crm-fence-peer.9.sh
> Feb 2 12:46:36 memverge2 crm-fence-peer.9.sh[7333]: DRBD_BACKING_DEV_3=/dev/mapper/object_block_nfs_vg-ha_block_exports_lv_with_vdo_1x8 DRBD_BACKING_DEV_4=/dev/mapper/object_block_nfs_vg-ha_block_exports_lv_without_vdo DRBD_CONF=/etc/drbd.conf DRBD_CSTATE=Connecting DRBD_LL_DISK=/dev/mapper/object_block_nfs_vg-ha_block_exports_lv_with_vdo_1x8\ /dev/mapper/object_block_nfs_vg-ha_block_exports_lv_without_vdo DRBD_MINOR=3\ 4 DRBD_MINOR_3=3 DRBD_MINOR_4=4 DRBD_MY_ADDRESS=192.168.0.8 DRBD_MY_AF=ipv4 DRBD_MY_NODE_ID=28 DRBD_NODE_ID_27=memverge DRBD_NODE_ID_28=memverge2 DRBD_PEER_ADDRESS=192.168.0.6 DRBD_PEER_AF=ipv4 DRBD_PEER_NODE_ID=27 DRBD_RESOURCE=ha-iscsi DRBD_VOLUME=3\ 4 UP_TO_DATE_NODES=0x10000000 /usr/lib/drbd/crm-fence-peer.9.sh
> Feb 2 12:46:36 memverge2 crm-fence-peer.9.sh[7333]: INFO Concurrency check: Peer is already marked clean/fenced by another resource. Returning success (Exit 4).
> Feb 2 12:46:36 memverge2 crm-fence-peer.9.sh[7332]: INFO Concurrency check: Peer is already marked clean/fenced by another resource. Returning success (Exit 4).
> Feb 2 12:46:36 memverge2 kernel: drbd ha-iscsi memverge: helper command: /sbin/drbdadm fence-peer exit code 4 (0x400)
> Feb 2 12:46:36 memverge2 kernel: drbd ha-iscsi memverge: fence-peer helper returned 4 (peer was fenced)
> Feb 2 12:46:36 memverge2 kernel: drbd ha-nfs memverge: helper command: /sbin/drbdadm fence-peer exit code 4 (0x400)
> Feb 2 12:46:36 memverge2 kernel: drbd ha-nfs memverge: fence-peer helper returned 4 (peer was fenced)
> Feb 2 12:46:37 memverge2 pacemaker-fenced[3902]: notice: Operation 'reboot' [7068] targeting memverge using ipmi-fence-memverge returned 0
> Feb 2 12:46:37 memverge2 pacemaker-fenced[3902]: notice: Operation 'reboot' targeting memverge by memverge2 for pacemaker-controld.3906@memverge2: Result arrived too late
> [root@memverge2 ~]#
>
> Anton
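And regarding making sure the fenced node really is powered off or rebooted: what the quoted log shows is the watchdog path (the 25s "Waiting ... to self-fence", i.e. stonith-watchdog-timeout) and the IPMI device (delayed 5s) running side by side, with the self-fence assumption completing first and the IPMI result arriving a second too late. The knobs behind that look roughly like the sketch below - BMC address, credentials and the sysconfig values are placeholders, not taken from your cluster:

# /etc/sysconfig/sbd (diskless - no SBD_DEVICE)
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=10

# property behind the "Waiting 25s ... to self-fence" message;
# a common rule of thumb is about 2 x SBD_WATCHDOG_TIMEOUT
pcs property set stonith-watchdog-timeout=25

# IPMI device for fencing node "memverge"; pcmk_delay_base matches the
# 5s "Delaying 'reboot' action" in the log, the rest are placeholders
pcs stonith create ipmi-fence-memverge fence_ipmilan \
    ip=10.0.0.10 username=admin password=secret lanplus=1 \
    pcmk_host_list=memverge pcmk_delay_base=5s

As for the two-level topology question: as you say yourself, watchdog self-fencing is not a fence agent you configure as a resource, so whether it can be referenced as an (implicit) device in a fencing level at all is something to verify against the documentation of your Pacemaker 3.0.1 build.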
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
