[ClusterLabs] shutdown and restart of complete cluster due to power outage with UPS

Lentes, Bernd Tue, 22 Jan 2019 07:53:23 -0800

Hi,

we have a new UPS with enough capacity to power our 2-node cluster and its
periphery (SAN, switches ...) for a reasonable time.
I'm currently thinking about the shutdown and restart procedure for the
complete cluster when the power is lost and does not come back soon.
The cluster is then powered by the UPS, but that does not last indefinitely,
so I have to shut down the complete cluster.
I can run scripts on each node which are triggered by the UPS.


My shutdown procedure is:
crm -w node standby node1
  resources are migrated to node2
systemctl stop pacemaker
  stops also corosync
  node is not fenced ! (because of standby ?)
systemctl poweroff
  clean shutdown of node1

crm -w node standby node2
  clean stop of resources
systemctl stop pacemaker
systemctl poweroff

The scripts would be executed from node2, via ssh for node1.
What do you think about it ?
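To make the procedure concrete, here is a rough sketch of the script that the
UPS could trigger on node2. It is not tested on a real cluster. The names
node1/node2, passwordless root ssh to node1, and the DRY_RUN switch are
assumptions for illustration; with DRY_RUN=1 (the default here) the commands
are only printed, not executed.

```shell
#!/bin/sh
# Sketch of the UPS-triggered shutdown, meant to run on node2.
# Assumptions (illustration only): node names node1/node2,
# passwordless root ssh to node1, DRY_RUN defaulting to 1.
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Put a node into standby, stop the cluster stack, power the node off.
# $1 = node name, $2 = optional command prefix (e.g. "ssh root@node1").
shutdown_node() {
    node="$1"
    prefix="$2"
    # $prefix is left unquoted on purpose so that it splits into words
    # (or disappears when empty). -w makes crm wait for the transition.
    run $prefix crm -w node standby "$node" || return 1
    run $prefix systemctl stop pacemaker || return 1
    # The ssh session may drop during poweroff, so its status is not checked.
    run $prefix systemctl poweroff
}

# node1 first (remotely), then the local node2.
shutdown_cluster() {
    shutdown_node node1 "ssh root@node1" || return 1
    shutdown_node node2 ""
}

shutdown_cluster
```

Setting DRY_RUN=0 would execute the commands for real. node1 is handled first
so that its resources can move to node2 before node2 stops them.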

Now the restart, which is giving me trouble.
Currently I want to restart the cluster manually, because I'm not completely
familiar with pacemaker and a bit afraid that automation could lead to
situations I didn't think of beforehand.
I can do that from anywhere because both nodes have iLO cards.

I start e.g. node1 with the power button.

systemctl start corosync
systemctl start pacemaker
  corosync and pacemaker are not started automatically at boot; I have read
  that recommendation several times.
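
The manual start above could be wrapped like this. This is only a sketch: the
function name and the DRY_RUN switch are assumptions for illustration, and
with DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/bin/sh
# Sketch of the manual cluster start after power-on (run as root on the node).
# DRY_RUN=1 (default here) only prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

start_cluster_stack() {
    # corosync first, then pacemaker on top of it
    run systemctl start corosync || return 1
    run systemctl start pacemaker || return 1
    # one-shot status output so the result is visible right away
    run crm_mon -1
}

start_cluster_stack
```

With DRY_RUN=0 the three commands would actually be executed in this order.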
Now my first problem. Let's assume the other node is broken, but I still want
to get the resources running. My no-quorum-policy is ignore, so that should be
fine. But I have exactly this situation now, and the resources are not started
automatically.

crm_mon says:
========================================================================
Stack: corosync
Current DC: ha-idg-1 (version 1.1.19+20180928.0d2680780-1.8-1.1.19+20180928.0d2680780) - partition WITHOUT quorum
Last updated: Tue Jan 22 15:34:19 2019
Last change: Tue Jan 22 13:39:14 2019 by root via crm_attribute on ha-idg-1

2 nodes configured
13 resources configured

Node ha-idg-1: online
Node ha-idg-2: UNCLEAN (offline)

Inactive resources:

fence_ha-idg-2  (stonith:fence_ilo2):   Stopped
fence_ha-idg-1  (stonith:fence_ilo4):   Stopped
 Clone Set: cl_share [gr_share]
     Stopped: [ ha-idg-1 ha-idg-2 ]
vm_mausdb       (ocf::heartbeat:VirtualDomain): Stopped
vm_sim  (ocf::heartbeat:VirtualDomain): Stopped
vm_geneious     (ocf::heartbeat:VirtualDomain): Stopped
 Clone Set: cl_SNMP [SNMP]
     Stopped: [ ha-idg-1 ha-idg-2 ]

Node Attributes:
* Node ha-idg-1:
    + maintenance                       : off

Migration Summary:
* Node ha-idg-1:

Failed Fencing Actions:
* Off of ha-idg-2 failed: delegate=, client=crmd.9938, origin=ha-idg-1,
    last-failed='Tue Jan 22 15:34:17 2019'

Negative Location Constraints:
 loc_fence_ha-idg-1     prevents fence_ha-idg-1 from running on ha-idg-1
 loc_fence_ha-idg-2     prevents fence_ha-idg-2 from running on ha-idg-2
=====================================================================
The cluster does not have quorum, but that shouldn't be a problem. corosync and
pacemaker are started.
Why don't the resources start automatically? All target-roles are set to
"started".
Is it because the fencing didn't succeed, so that the status of ha-idg-2 isn't
clear to the cluster?
If yes, what can I do?
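
To check for the state I am asking about from a script, a small sketch that
lists the nodes crm_mon reports as UNCLEAN. The function name is made up for
illustration, and it assumes node lines of the form "Node <name>: UNCLEAN
(...)" as in the output quoted above; other crm_mon options or versions may
print nodes differently.

```shell
#!/bin/sh
# Print the names of nodes that are reported as UNCLEAN.
# Reads crm_mon-style text on stdin and matches lines such as
#   Node ha-idg-2: UNCLEAN (offline)
# (format assumed from the output quoted in this mail).
unclean_nodes() {
    awk '$1 == "Node" && $3 == "UNCLEAN" {
        sub(/:$/, "", $2)   # strip the trailing colon from the node name
        print $2
    }'
}
```

It could be fed with something like "crm_mon -1 | unclean_nodes"; an empty
result would mean that no node is listed as UNCLEAN.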

Bernd

-- 

Bernd Lentes 
System Administration
Institut für Entwicklungsgenetik
Building 35.34 - Room 208
HelmholtzZentrum münchen 
phone: +49 89 3187 1241 
fax: +49 89 3187 2294 
http://www.helmholtz-muenchen.de/idg

Those who make mistakes can learn something;
those who do nothing can learn nothing either.
 

