[ClusterLabs] controlling cluster behavior on startup

Faaland, Olaf P. via Users Mon, 29 Jan 2024 10:05:23 -0800

Hi,

I have configured clusters of node pairs, so each cluster has 2 nodes.  The 
cluster members are statically defined in corosync.conf before corosync or 
pacemaker is started, and quorum {two_node: 1} is set.


When both nodes are powered off and I power them on, they do not start 
pacemaker at exactly the same time.  The time difference may be a few minutes 
depending on other factors outside the nodes.

My goals are (I call the first node to start pacemaker "node1"):
1) I want to control how long pacemaker on node1 waits before fencing node2 if 
node2 does not start pacemaker.
2) If node1 is part-way through that waiting period, and node2 starts pacemaker 
so they detect each other, I would like them to proceed immediately to probing 
resource state and starting resources which are down, not wait until the end of 
that "grace period".

It looks from the documentation like dc-deadtime is how #1 is controlled, and 
#2 is expected normal behavior.  However, I'm seeing fence actions before 
dc-deadtime has passed.

Am I misunderstanding Pacemaker's expected behavior and/or how dc-deadtime 
should be used?

One possibly unusual aspect of this cluster is that these two nodes are 
stateless - they PXE boot from an image on another server - and I build the 
cluster configuration at boot time with a series of pcs commands, because the 
nodes have no local storage for this purpose.  The commands are:

['pcs', 'cluster', 'start']
['pcs', 'property', 'set', 'stonith-action=off']
['pcs', 'property', 'set', 'cluster-recheck-interval=60']
['pcs', 'property', 'set', 'start-failure-is-fatal=false']
['pcs', 'property', 'set', 'dc-deadtime=300']
['pcs', 'stonith', 'create', 'fence_gopher11', 'fence_powerman', 
'ip=192.168.64.65', 'pcmk_host_check=static-list', 
'pcmk_host_list=gopher11,gopher12']
['pcs', 'stonith', 'create', 'fence_gopher12', 'fence_powerman', 
'ip=192.168.64.65', 'pcmk_host_check=static-list', 
'pcmk_host_list=gopher11,gopher12']
['pcs', 'resource', 'create', 'gopher11_zpool', 'ocf:llnl:zpool', 
'import_options="-f -N -d /dev/disk/by-vdev"', 'pool=gopher11', 'op', 'start', 
'timeout=805']
...
['pcs', 'property', 'set', 'no-quorum-policy=ignore']

I could, instead, generate a CIB so that when Pacemaker is started, it has a 
full config.  Is that better?
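
To make the "full config" alternative concrete, here is a minimal Python sketch of one way it could be done. It assumes pcs's `-f` option (apply a change to a CIB file rather than the live cluster) together with `pcs cluster cib` and `pcs cluster cib-push`. The scratch path and the helper name `staged_commands` are hypothetical, and I have not tested this on these nodes. The idea is only that the live cluster would receive dc-deadtime, the fence devices and the resources in a single update instead of one at a time.

```python
CIB_FILE = "/tmp/cib-staged.xml"  # hypothetical scratch path, not from my setup


def staged_commands(commands, cib_file=CIB_FILE):
    """Rewrite live 'pcs ...' commands so they edit cib_file, then push it once."""
    # Dump the running cluster's CIB to a file to use as the starting point.
    staged = [["pcs", "cluster", "cib", cib_file]]
    for cmd in commands:
        if cmd[:3] == ["pcs", "cluster", "start"]:
            # Starting the cluster is a live action; it cannot be staged in a file.
            continue
        # 'pcs -f FILE ...' applies the change to the file only.
        staged.append(["pcs", "-f", cib_file] + cmd[1:])
    # Load the fully edited file into the cluster in one update.
    staged.append(["pcs", "cluster", "cib-push", cib_file])
    return staged


if __name__ == "__main__":
    live = [
        ["pcs", "cluster", "start"],
        ["pcs", "property", "set", "stonith-action=off"],
        ["pcs", "property", "set", "dc-deadtime=300"],
        ["pcs", "property", "set", "no-quorum-policy=ignore"],
    ]
    for cmd in staged_commands(live):
        print(" ".join(cmd))
```

Each returned list could be passed to subprocess.run(cmd, check=True) after 'pcs cluster start' has completed.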

thanks,
Olaf

=== corosync.conf:
totem {
    version: 2
    cluster_name: gopher11
    secauth: off
    transport: udpu
}
nodelist {
    node {
        ring0_addr: gopher11
        name: gopher11
        nodeid: 1
    }
    node {
        ring0_addr: gopher12
        name: gopher12
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}

=== Log excerpt

Here's an excerpt from the Pacemaker logs that reflects what I'm seeing.  These 
are from gopher12, the node that came up first.  The other node, which is not 
yet up, is gopher11.

Jan 25 17:55:38 gopher12 pacemakerd          [116033] (main)    notice: Starting Pacemaker 2.1.7-1.t4 | build=2.1.7 features:agent-manpages ascii-docs compat-2.0 corosync-ge-2 default-concurrent-fencing generated-manpages monotonic nagios ncurses remote systemd
Jan 25 17:55:39 gopher12 pacemaker-controld  [116040] (peer_update_callback)    info: Cluster node gopher12 is now member (was in unknown state)
Jan 25 17:55:43 gopher12 pacemaker-based     [116035] (cib_perform_op)  info: ++ /cib/configuration/crm_config/cluster_property_set[@id='cib-bootstrap-options']:  <nvpair id="cib-bootstrap-options-dc-deadtime" name="dc-deadtime" value="300"/>
Jan 25 17:56:00 gopher12 pacemaker-controld  [116040] (crm_timer_popped)        info: Election Trigger just popped | input=I_DC_TIMEOUT time=300000ms
Jan 25 17:56:01 gopher12 pacemaker-based     [116035] (cib_perform_op)  info: ++ /cib/configuration/crm_config/cluster_property_set[@id='cib-bootstrap-options']:  <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
Jan 25 17:56:01 gopher12 pacemaker-controld  [116040] (abort_transition_graph)  info: Transition 0 aborted by cib-bootstrap-options-no-quorum-policy doing create no-quorum-policy=ignore: Configuration change | cib=0.26.0 source=te_update_diff_v2:464 path=/cib/configuration/crm_config/cluster_property_set[@id='cib-bootstrap-options'] complete=true
Jan 25 17:56:01 gopher12 pacemaker-controld  [116040] (controld_execute_fence_action)   notice: Requesting fencing (off) targeting node gopher11 | action=11 timeout=60
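
To put numbers on the excerpt above, here is a small Python sketch that computes the gaps between the logged events. The timestamps are copied from the log lines. The year is an assumption, since the log lines do not carry one, and the helper name `seconds_between` is just for illustration.

```python
from datetime import datetime

YEAR = "2024"  # assumption: the log timestamps have no year
DC_DEADTIME = 300  # seconds, as set with 'pcs property set dc-deadtime=300'


def seconds_between(start, end):
    """Return whole seconds between two syslog-style stamps like 'Jan 25 17:55:43'."""
    fmt = "%Y %b %d %H:%M:%S"
    t0 = datetime.strptime(f"{YEAR} {start}", fmt)
    t1 = datetime.strptime(f"{YEAR} {end}", fmt)
    return int((t1 - t0).total_seconds())


# Timestamps taken from the excerpt.
events = {
    "controld_member": "Jan 25 17:55:39",
    "dc_deadtime_set": "Jan 25 17:55:43",
    "election_trigger": "Jan 25 17:56:00",
    "fence_requested": "Jan 25 17:56:01",
}

set_to_fence = seconds_between(events["dc_deadtime_set"], events["fence_requested"])
member_to_trigger = seconds_between(events["controld_member"], events["election_trigger"])

print(set_to_fence)       # 18
print(member_to_trigger)  # 21
print(set_to_fence < DC_DEADTIME)  # True
```

So the fence request comes 18 seconds after dc-deadtime=300 was written to the CIB, and the Election Trigger pops 21 seconds after the controller saw gopher12 become a member, even though the message reports time=300000ms.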


