Thanks, I will experiment with this.
Now I have one last issue, about stonith.
I tried to reproduce a stonith situation by disabling the network interface
used for HA on node 1.
Stonith is configured with IPMI poweroff.
What happens is that once the interface is down, both nodes try to stonith
the other node, causing both to power off...
I would like the node running all the resources (zpool and NFS IP) to be the
first to try to stonith the other node; see the sketch after the config below.
Or is there anything better?
Here is the current crm config show:
node 1: xstha1 \
        attributes standby=off maintenance=off
node 2: xstha2 \
        attributes standby=off maintenance=off
primitive xstha1-stonith stonith:external/ipmi \
        params hostname=xstha1 ipaddr=192.168.221.18 userid=ADMIN passwd="******" interface=lanplus \
        op monitor interval=25 timeout=25 start-delay=25 \
        meta target-role=Started
primitive xstha1_san0_IP IPaddr \
        params ip=10.10.10.1 cidr_netmask=255.255.255.0 nic=san0
primitive xstha2-stonith stonith:external/ipmi \
        params hostname=xstha2 ipaddr=192.168.221.19 userid=ADMIN passwd="******" interface=lanplus \
        op monitor interval=25 timeout=25 start-delay=25 \
        meta target-role=Started
primitive xstha2_san0_IP IPaddr \
        params ip=10.10.10.2 cidr_netmask=255.255.255.0 nic=san0
primitive zpool_data ZFS \
        params pool=test \
        op start timeout=90 interval=0 \
        op stop timeout=90 interval=0 \
        meta target-role=Started
location xstha1-stonith-pref xstha1-stonith -inf: xstha1
location xstha1_san0_IP_pref xstha1_san0_IP 100: xstha1
location xstha2-stonith-pref xstha2-stonith -inf: xstha2
location xstha2_san0_IP_pref xstha2_san0_IP 100: xstha2
order zpool_data_order inf: zpool_data ( xstha1_san0_IP )
location zpool_data_pref zpool_data 100: xstha1
colocation zpool_data_with_IPs inf: zpool_data xstha1_san0_IP
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.15-e174ec8 \
        cluster-infrastructure=corosync \
        stonith-action=poweroff \
        no-quorum-policy=stop
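Would adding a fencing delay on the device that fences xstha1 be the right
approach? The idea would be to delay fencing actions that target xstha1, so
that xstha1 (the node normally running the resources) wins the race and gets
to fence xstha2 first. A rough sketch, not tested, assuming pcmk_delay_max is
available in this Pacemaker version and with 15s as an example value only:

# hypothetical change: delay fencing that targets xstha1,
# so xstha1 can fence xstha2 first when both nodes race
primitive xstha1-stonith stonith:external/ipmi \
        params hostname=xstha1 ipaddr=192.168.221.18 userid=ADMIN \
               passwd="******" interface=lanplus \
               pcmk_delay_max=15s \
        op monitor interval=25 timeout=25 start-delay=25 \
        meta target-role=Started

My understanding is that pcmk_delay_max adds a random delay up to that value
before the device is used, and that newer Pacemaker versions also have
pcmk_delay_base for a fixed delay, which would make the outcome deterministic
rather than just likely; please correct me if I am wrong.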
Thanks!
Gabriele
Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
----------------------------------------------------------------------------------
From: Andrei Borzenkov <[email protected]>
To: [email protected]
Date: 11 December 2020 18:30:29 CET
Subject: Re: [ClusterLabs] Antw: [EXT] Recovering from node failure
11.12.2020 18:37, Gabriele Bulfon wrote:
> I found I can do this temporarily:
>
> crm config property cib-bootstrap-options: no-quorum-policy=ignore
>
All two-node clusters I remember run with this setting forever :)
> then once node 2 is up again:
>
> crm config property cib-bootstrap-options: no-quorum-policy=stop
>
> so that I make sure nodes will not mount in another strange situation.
>
> Is there any better way?
"better" us subjective, but ...
> (such as ignore until everything is back to normal, then consider stop again)
>
That is what stonith does. Because quorum is pretty much useless in a
two-node cluster, as I already said, all clusters I have seen used
no-quorum-policy=ignore and stonith-enabled=true. It means that when a node
boots and the other node is not available, stonith is attempted; if stonith
succeeds, pacemaker continues with starting resources; if stonith fails, the
node is stuck.
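In crm shell syntax that combination would look roughly like this (just a
sketch of the two properties, not taken from a running cluster):

crm configure property stonith-enabled=true
crm configure property no-quorum-policy=ignore

If I read your config right, stonith devices are already defined and
stonith-enabled defaults to true, so only no-quorum-policy would change.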
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/