On Tue, 2019-01-22 at 16:52 +0100, Lentes, Bernd wrote:
> Hi,
>
> we have a new UPS which has enough charge to supply our 2-node
> cluster and the periphery (SAN, switches ...) for a reasonable time.
> I'm currently thinking about the shutdown and restart procedure for
> the complete cluster when the power is lost and does not come back
> soon. The cluster is then supplied by the UPS, but that does not last
> forever, so I have to shut down the complete cluster.
> I have the possibility to run scripts on each node which are
> triggered by the UPS.
>
> My shutdown procedure is:
> crm -w node standby node1
>     resources are migrated to node2
> systemctl stop pacemaker
>     also stops corosync
>     node is not fenced ! (because of standby ?)

Clean shutdowns don't get fenced. As long as the exiting node can tell
the rest of the cluster that it's leaving, everything can be
coordinated gracefully.

> systemctl poweroff
>     clean shutdown of node1
>
> crm -w node standby node2
>     clean stop of resources
> systemctl stop pacemaker
> systemctl poweroff
>
> The scripts would be executed from node2, via ssh for node1.
> What do you think about it ?

Good plan, though perhaps there should be some allowance for the case
in which only node1 is running when the power dies.
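For illustration, a rough sketch of what such a UPS-triggered shutdown
script could look like when run from node2. The host names, the
passwordless ssh access and the reachability check are assumptions,
not a tested recipe:

#!/bin/bash
# Sketch: whole-cluster shutdown triggered by the UPS, run on node2.
# Assumes "node1" resolves and is reachable via passwordless ssh;
# adjust names to your environment.

# If node1 is still up, put it in standby (resources migrate to
# node2), stop the cluster stack cleanly, then power it off.
if ping -c1 -W2 node1 >/dev/null 2>&1; then
    ssh node1 'crm -w node standby node1'
    ssh node1 'systemctl stop pacemaker'   # also stops corosync
    ssh node1 'systemctl poweroff'
fi

# Then shut down node2 itself the same way.
crm -w node standby node2
systemctl stop pacemaker
systemctl poweroff

Note that this still assumes node2 is alive when the UPS fires; per
the point above, you'd also want a variant that node1 can run on its
own.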
> Now the restart, which gives me trouble.
> Currently I want to restart the cluster manually, because I'm not
> completely familiar with pacemaker and a bit afraid of getting into
> constellations I didn't think of before due to automation.
>
> I can do that from anywhere because both nodes have ILO cards.
>
> I start e.g. node1 with the power button.
>
> systemctl start corosync
> systemctl start pacemaker
>     corosync and pacemaker don't start automatically; I read that
>     recommendation several times.
>
> Now my first problem. Let's assume the other node is broken, but I
> still want to get resources running. My no-quorum-policy is ignore.
> That should be fine. But I have this setup now and don't get the
> resources running automatically.

I'm guessing you have corosync 2's wait_for_all set (probably
implicitly by two_node). This is a safeguard for the situation where
both nodes are booted up but can't see each other. If you're sure the
other node is down, you can disable wait_for_all before starting the
node. (I'm not sure if this can be changed while corosync is already
running.)
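For reference, the relevant part of corosync.conf on a two-node setup
commonly looks roughly like this (shown as an illustration, not taken
from your configuration):

quorum {
    provider: corosync_votequorum
    two_node: 1
    # two_node: 1 implicitly enables wait_for_all. Setting it to 0
    # explicitly, before starting corosync, removes the wait for the
    # other node to be seen at least once, at the cost of losing that
    # split-brain safeguard.
    # wait_for_all: 0
}

Once corosync is running, "corosync-quorumtool -s" shows whether the
2Node and WaitForAll flags are in effect.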
> crm_mon says:
> =====================================================================
> Stack: corosync
> Current DC: ha-idg-1 (version
> 1.1.19+20180928.0d2680780-1.8-1.1.19+20180928.0d2680780) - partition
> WITHOUT quorum
> Last updated: Tue Jan 22 15:34:19 2019
> Last change: Tue Jan 22 13:39:14 2019 by root via crm_attribute on
> ha-idg-1
>
> 2 nodes configured
> 13 resources configured
>
> Node ha-idg-1: online
> Node ha-idg-2: UNCLEAN (offline)
>
> Inactive resources:
>
> fence_ha-idg-2  (stonith:fence_ilo2):   Stopped
> fence_ha-idg-1  (stonith:fence_ilo4):   Stopped
> Clone Set: cl_share [gr_share]
>     Stopped: [ ha-idg-1 ha-idg-2 ]
> vm_mausdb       (ocf::heartbeat:VirtualDomain): Stopped
> vm_sim          (ocf::heartbeat:VirtualDomain): Stopped
> vm_geneious     (ocf::heartbeat:VirtualDomain): Stopped
> Clone Set: cl_SNMP [SNMP]
>     Stopped: [ ha-idg-1 ha-idg-2 ]
>
> Node Attributes:
> * Node ha-idg-1:
>     + maintenance : off
>
> Migration Summary:
> * Node ha-idg-1:
>
> Failed Fencing Actions:
> * Off of ha-idg-2 failed: delegate=, client=crmd.9938,
>   origin=ha-idg-1, last-failed='Tue Jan 22 15:34:17 2019'
>
> Negative Location Constraints:
>     loc_fence_ha-idg-1 prevents fence_ha-idg-1 from running on ha-idg-1
>     loc_fence_ha-idg-2 prevents fence_ha-idg-2 from running on ha-idg-2
> =====================================================================
>
> The cluster does not have quorum, but that shouldn't be a problem.
> corosync and pacemaker are started.
> Why don't the resources start automatically ? All target-roles are
> set to "started".
> Is it because the fencing didn't succeed ? The status of ha-idg-2
> isn't clear to the cluster ?
> If yes, what can I do ?
>
> Bernd

_______________________________________________
Users mailing list: [email protected]
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
