Thanks! I tried the first option, adding pcmk_delay_base to the two stonith primitives: the first has 1 second, the second 5 seconds. It didn't work :( they still killed each other :( Anything wrong with the way I did it? Here's the config:

node 1: xstha1 \
        attributes standby=off maintenance=off
node 2: xstha2 \
        attributes standby=off maintenance=off
primitive xstha1-stonith stonith:external/ipmi \
        params hostname=xstha1 ipaddr=192.168.221.18 userid=ADMIN passwd="***" interface=lanplus pcmk_delay_base=1 \
        op monitor interval=25 timeout=25 start-delay=25 \
        meta target-role=Started
primitive xstha1_san0_IP IPaddr \
        params ip=10.10.10.1 cidr_netmask=255.255.255.0 nic=san0
primitive xstha2-stonith stonith:external/ipmi \
        params hostname=xstha2 ipaddr=192.168.221.19 userid=ADMIN passwd="***" interface=lanplus pcmk_delay_base=5 \
        op monitor interval=25 timeout=25 start-delay=25 \
        meta target-role=Started
primitive xstha2_san0_IP IPaddr \
        params ip=10.10.10.2 cidr_netmask=255.255.255.0 nic=san0
primitive zpool_data ZFS \
        params pool=test \
        op start timeout=90 interval=0 \
        op stop timeout=90 interval=0 \
        meta target-role=Started
location xstha1-stonith-pref xstha1-stonith -inf: xstha1
location xstha1_san0_IP_pref xstha1_san0_IP 100: xstha1
location xstha2-stonith-pref xstha2-stonith -inf: xstha2
location xstha2_san0_IP_pref xstha2_san0_IP 100: xstha2
order zpool_data_order inf: zpool_data ( xstha1_san0_IP )
location zpool_data_pref zpool_data 100: xstha1
colocation zpool_data_with_IPs inf: zpool_data xstha1_san0_IP
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.15-e174ec8 \
        cluster-infrastructure=corosync \
        stonith-action=poweroff \
        no-quorum-policy=stop

Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
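[Editor's note, not part of the original message: a hedged troubleshooting sketch. One thing worth double-checking is whether this Pacemaker build supports pcmk_delay_base at all: the config reports dc-version 1.1.15, and if memory serves pcmk_delay_base only arrived in a later 1.1.x release, while pcmk_delay_max is older. The crmsh commands below are illustrative; the values (15, 30) are assumptions, not recommendations from the thread.]

```
# Check whether the delay parameter actually reached each fence device
# (crm_resource -g / --get-parameter prints a single parameter value):
crm_resource --resource xstha1-stonith --get-parameter pcmk_delay_base
crm_resource --resource xstha2-stonith --get-parameter pcmk_delay_base

# Widen the gap between the two static delays; a 1s vs 5s difference
# may be swallowed by fencing query latency (assumption):
crm resource param xstha2-stonith set pcmk_delay_base 15

# Alternatively, fall back to a random delay window on both devices,
# which older Pacemaker releases already understand:
crm resource param xstha1-stonith set pcmk_delay_max 30
crm resource param xstha2-stonith set pcmk_delay_max 30
```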
----------------------------------------------------------------------------------

From: Andrei Borzenkov <[email protected]>
To: [email protected]
Date: 13 December 2020, 7:50:57 CET
Subject: Re: [ClusterLabs] Antw: [EXT] Recoveing from node failure

12.12.2020 20:30, Gabriele Bulfon writes:
> Thanks, I will experiment this.
>
> Now, I have a last issue about stonith.
> I tried to reproduce a stonith situation by disabling the network interface
> used for HA on node 1.
> Stonith is configured with ipmi poweroff.
> What happens is that once the interface is down, both nodes try to stonith
> the other node, causing both to power off...

Yes, this is expected. The options are basically

1. Have a separate stonith resource for each node and configure static
(pcmk_delay_base) or random dynamic (pcmk_delay_max) delays to avoid both
nodes starting stonith at the same time. This does not take resources into
account.

2. Use a fencing topology and create a pseudo-stonith agent that does not
attempt to do anything, but just delays for some time before continuing with
the actual fencing agent. The delay can be based on anything, including the
resources running on the node.

3. If you are using pacemaker 2.0.3+, you could use the new
priority-fencing-delay feature that implements resource-based priority
fencing:

  + controller/fencing/scheduler: add new feature 'priority-fencing-delay'
    Optionally derive the priority of a node from the resource-priorities
    of the resources it is running. In a fencing race, the node with the
    highest priority has a certain advantage over the others, as fencing
    requests for that node are executed with an additional delay,
    controlled via the cluster option priority-fencing-delay (default = 0).

See also https://www.mail-archive.com/[email protected]/msg10328.html

> I would like the node running all resources (zpool and nfs ip) to be the
> first trying to stonith the other node.
> Or is there anything else better?
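[Editor's note: a hedged sketch of how the three options above might look in crm shell syntax. Every name and value here is illustrative, not from the thread; "delay-xstha1"/"delay-xstha2" are hypothetical delay-only fence resources that would have to exist, and option 3 requires Pacemaker 2.0.3+.]

```
# Option 1: per-device static delays so both nodes never fire at once.
crm resource param xstha1-stonith set pcmk_delay_base 5
crm resource param xstha2-stonith set pcmk_delay_base 15

# Option 2: fencing topology; devices joined by a comma form one level
# and must all succeed in order, so a delay-only pseudo-agent runs
# before the real IPMI device on each node.
crm configure fencing_topology \
        xstha1: delay-xstha1,xstha1-stonith \
        xstha2: delay-xstha2,xstha2-stonith

# Option 3 (Pacemaker 2.0.3+): resource-based priority fencing.
# The node whose running resources carry the highest total priority
# gets the extra delay working in its favour.
crm configure property priority-fencing-delay=15s
crm resource meta zpool_data set priority 10
```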
>
> Here is the current crm config show:
>

It is unreadable

> node 1: xstha1 \
>         attributes standby=off maintenance=off
> node 2: xstha2 \
>         attributes standby=off maintenance=off
> primitive xstha1-stonith stonith:external/ipmi \
>         params hostname=xstha1 ipaddr=192.168.221.18 userid=ADMIN passwd="******" interface=lanplus \
>         op monitor interval=25 timeout=25 start-delay=25 \
>         meta target-role=Started
> primitive xstha1_san0_IP IPaddr \
>         params ip=10.10.10.1 cidr_netmask=255.255.255.0 nic=san0
> primitive xstha2-stonith stonith:external/ipmi \
>         params hostname=xstha2 ipaddr=192.168.221.19 userid=ADMIN passwd="******" interface=lanplus \
>         op monitor interval=25 timeout=25 start-delay=25 \
>         meta target-role=Started
> primitive xstha2_san0_IP IPaddr \
>         params ip=10.10.10.2 cidr_netmask=255.255.255.0 nic=san0
> primitive zpool_data ZFS \
>         params pool=test \
>         op start timeout=90 interval=0 \
>         op stop timeout=90 interval=0 \
>         meta target-role=Started
> location xstha1-stonith-pref xstha1-stonith -inf: xstha1
> location xstha1_san0_IP_pref xstha1_san0_IP 100: xstha1
> location xstha2-stonith-pref xstha2-stonith -inf: xstha2
> location xstha2_san0_IP_pref xstha2_san0_IP 100: xstha2
> order zpool_data_order inf: zpool_data ( xstha1_san0_IP )
> location zpool_data_pref zpool_data 100: xstha1
> colocation zpool_data_with_IPs inf: zpool_data xstha1_san0_IP
> property cib-bootstrap-options: \
>         have-watchdog=false \
>         dc-version=1.1.15-e174ec8 \
>         cluster-infrastructure=corosync \
>         stonith-action=poweroff \
>         no-quorum-policy=stop
>
> Thanks!
> Gabriele
>
> Sonicle S.r.l.
> : http://www.sonicle.com
> Music: http://www.gabrielebulfon.com
> eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
>
> ----------------------------------------------------------------------------------
>
> From: Andrei Borzenkov <[email protected]>
> To: [email protected]
> Date: 11 December 2020, 18:30:29 CET
> Subject: Re: [ClusterLabs] Antw: [EXT] Recoveing from node failure
>
> 11.12.2020 18:37, Gabriele Bulfon writes:
>> I found I can do this temporarily:
>>
>> crm config property cib-bootstrap-options: no-quorum-policy=ignore
>>
>
> All two-node clusters I remember run with this setting forever :)
>
>> then once node 2 is up again:
>>
>> crm config property cib-bootstrap-options: no-quorum-policy=stop
>>
>> so that I make sure nodes will not mount in another strange situation.
>>
>> Is there any better way?
>
> "Better" is subjective, but ...
>
>> (such as ignore until everything is back to normal, then consider stop again)
>>
>
> That is what stonith does. Because quorum is pretty much useless in a
> two-node cluster, as I already said, all clusters I have seen used
> no-quorum-policy=ignore and stonith-enabled=true. It means that when a node
> boots and the other node is not available, stonith is attempted; if stonith
> succeeds, pacemaker continues with starting resources; if stonith fails, the
> node is stuck.
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
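[Editor's note, not from the thread: the no-quorum-policy=ignore pattern described above is, on corosync 2.x and later, usually expressed instead with votequorum's two_node option. A hedged corosync.conf fragment, values illustrative only:]

```
quorum {
    provider: corosync_votequorum
    # two_node lets the surviving node retain quorum when its peer dies;
    # it implicitly enables wait_for_all, so a freshly booted node waits
    # until it has seen its peer at least once before claiming quorum.
    two_node: 1
}
```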
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
