>>> Piotr Szafarczyk <[email protected]> wrote on 12.07.2022 at 12:34 in message
<[email protected]>:
> Hi,
>
> I used to have a working cluster with 3 nodes (and stonith disabled).
The SLES guide says:

  Important: No Support Without STONITH
  You must have a node fencing mechanism for your cluster. The global
  cluster options stonith-enabled and startup-fencing must be set to
  true. When you change them, you lose support.

Maybe that helps.

> After an unexpected restart of one node, the cluster split. Node #2
> started to see the others as unclean. Nodes 1 and 3 were cooperating
> with each other, showing #2 as offline. There were no network connection
> problems.
>
> I removed #2 (operating from #1) with
> pcs cluster node remove n2
>
> I verified that it had removed all configuration from #2, both for
> corosync and for pacemaker. The cluster appears to be working correctly
> with two nodes (and no traces of #2).
>
> Now I am trying to add the third node back.
> pcs cluster node add n2
> Disabling SBD service...
> n2: sbd disabled
> Sending 'corosync authkey', 'pacemaker authkey' to 'n2'
> n2: successful distribution of the file 'corosync authkey'
> n2: successful distribution of the file 'pacemaker authkey'
> Sending updated corosync.conf to nodes...
> n3: Succeeded
> n2: Succeeded
> n1: Succeeded
> n3: Corosync configuration reloaded
>
> I am able to start #2 operating from #1
>
> pcs cluster pcsd-status
> n2: Online
> n3: Online
> n1: Online
>
> pcs cluster enable n2
> pcs cluster start n2
>
> I can see that corosync's configuration has been updated, but
> pacemaker's not.
>
> _Checking from #1:_
>
> pcs config
> Cluster Name: n
> Corosync Nodes:
> n1 n3 n2
> Pacemaker Nodes:
> n1 n3
> [...]
>
> pcs status
> * 2 nodes configured
> Node List:
> * Online: [ n1 n3 ]
> [...]
>
> pcs cluster cib scope=nodes
> <nodes>
> <node id="1" uname="n1"/>
> <node id="3" uname="n3"/>
> </nodes>
>
> _#2 is seeing the state differently:_
>
> pcs config
> Cluster Name: n
> Corosync Nodes:
> n1 n3 n2
> Pacemaker Nodes:
> n1 n2 n3
>
> pcs status
> * 3 nodes configured
> Node List:
> * Online: [ n2 ]
> * OFFLINE: [ n1 n3 ]
> Full List of Resources:
> * No resources
> [...]
> (there are resources configured on #1 and #3)
>
> pcs cluster cib scope=nodes
> <nodes>
> <node id="1" uname="n1"/>
> <node id="3" uname="n3"/>
> <node id="2" uname="n2"/>
> </nodes>
>
> Help me diagnose it please. Where should I look for the problem? (I have
> already tried a few things more - I see nothing helpful in log files,
> pcs --debug shows nothing suspicious, tried even editing the CIB manually)
>
> Best regards,
>
> Piotr Szafarczyk

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
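
The two "pcs cluster cib scope=nodes" outputs quoted above disagree, and that difference can be checked mechanically rather than by eye. The following is only a rough sketch in Python, not a diagnosis of the cause: it parses the two <nodes> sections exactly as quoted and reports which node names one side has that the other lacks. The names node_names, missing_from, CIB_ON_N1 and CIB_ON_N2 are made up for illustration; on a real cluster you would paste in the output captured on each node.

```python
import xml.etree.ElementTree as ET

# <nodes> sections copied from the quoted "pcs cluster cib scope=nodes"
# output. The constant names are hypothetical, chosen for this sketch.
CIB_ON_N1 = """<nodes>
  <node id="1" uname="n1"/>
  <node id="3" uname="n3"/>
</nodes>"""

CIB_ON_N2 = """<nodes>
  <node id="1" uname="n1"/>
  <node id="3" uname="n3"/>
  <node id="2" uname="n2"/>
</nodes>"""


def node_names(nodes_xml):
    """Return the set of uname values found in a CIB <nodes> section."""
    root = ET.fromstring(nodes_xml)
    return {node.get("uname") for node in root.iter("node")}


def missing_from(reference_xml, other_xml):
    """Return, sorted, the names present in other_xml but absent from reference_xml."""
    return sorted(node_names(other_xml) - node_names(reference_xml))


if __name__ == "__main__":
    print("in n2's CIB but not in n1's:", missing_from(CIB_ON_N1, CIB_ON_N2))
    print("in n1's CIB but not in n2's:", missing_from(CIB_ON_N2, CIB_ON_N1))
```

With the quoted data, the first list contains only n2 and the second is empty, which matches what "pcs config" shows: n1's copy of the CIB never received the node entry for n2, while n2 has its own entry and is running with a CIB that the other two nodes do not share.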
