Re: [Pacemaker] Creating a safe cluster-node shutdown script (for when UPS goes OnBattery+LowBattery)

Giuseppe Ragusa Wed, 09 Jul 2014 05:35:25 -0700

On Tue, Jul 8, 2014, at 02:59, Andrew Beekhof wrote:
> 
> On 4 Jul 2014, at 3:16 pm, Giuseppe Ragusa <[email protected]> 
> wrote:
> 
> > Hi all,
> > I'm trying to create a script as per subject (on CentOS 6.5, 
> > CMAN+Pacemaker, only DRBD+KVM active/passive resources; SNMP-UPS monitored 
> > by NUT).
> > 
> > Ideally I think that each node should stop (disable) all locally-running 
> > VirtualDomain resources (doing so cleanly demotes then downs the DRBD 
> > resources underneath), then put itself in standby and finally shut down.
> 
> Since the end goal is shutdown, why not just run 'pcs cluster stop' ?


I thought that this action would interrupt cluster communication (since 
Corosync would no longer respond to the peer) and so cause the other node to 
stonith us. I know that ideally the other node should also perform "pcs cluster 
stop" shortly afterwards, since the same UPS powers both, but I worry about 
timing issues (and "races") in UPS monitoring, since it is a large enterprise 
UPS monitored via SNMP.

Furthermore, I do not know what happens to running resources at "pcs cluster 
stop": I infer from your suggestion that resources are brought down and not 
migrated to the other node, correct?

> Possibly with 'pcs cluster standby' first if you're worried that stopping the 
> resources might take too long.

I thought that "pcs cluster standby" would usually migrate the resources to the 
other node (I actually tried it and confirmed the expected behaviour), so it 
would risk racing against the other node's own standby. This is why I took the 
trouble of explicitly stopping all locally-running resources, in order, in my 
script BEFORE putting the local node in standby.
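
A minimal sketch of the bounded wait that the TODO in the script quoted 
further down asks for, to be taken as an illustration only. It assumes that 
"pcs status" prints one line per resource ending with the node name, exactly 
as the grep in the original script already assumes. The names STATUS_CMD, 
count_local_vms and wait_until_stopped are hypothetical and not part of pcs.

```shell
#!/bin/bash

# Hypothetical hook: the command that prints cluster status.
# Defaults to "pcs status"; overridable so the logic can be tried offline.
STATUS_CMD="${STATUS_CMD:-pcs status}"

# Print how many VirtualDomain resources are still reported on the given node.
# Assumption: resource lines end with the node name, as in the original grep.
# Note: if the status command fails or prints nothing, the count is 0.
count_local_vms() {
    local node="$1"
    ${STATUS_CMD} 2>/dev/null | \
        grep -c "ocf::heartbeat:VirtualDomain.*${node}[[:space:]]*\$"
}

# wait_until_stopped TIMEOUT_SECONDS NODE [POLL_INTERVAL_SECONDS]
# Returns 0 once no VirtualDomain resource is reported on NODE,
# or 1 if TIMEOUT_SECONDS elapse first (so the loop can never be endless).
wait_until_stopped() {
    local timeout="$1" node="$2" interval="${3:-5}" waited=0
    while [ "$(count_local_vms "${node}")" -gt 0 ]; do
        if [ "${waited}" -ge "${timeout}" ]; then
            return 1
        fi
        sleep "${interval}"
        waited=$((waited + interval))
    done
    return 0
}
```

In the script this would sit between the "pcs resource disable" loop and 
"pcs cluster standby", for example as: wait_until_stopped 300 "${local_node}". 
The 300-second limit is an arbitrary placeholder; a sensible value depends on 
how long the guests take to shut down and on the remaining UPS runtime.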

> Pacemaker will stop everything in the required order and stop the node when 
> done... problem solved?

I thought that after a "pcs cluster standby" a regular "shutdown -h" of the 
operating system would cleanly bring down the cluster too, without the need for 
a "pcs cluster stop", given that both Pacemaker and CMAN are correctly 
configured for automatic startup/shutdown as operating system services (SysV 
initscripts controlled by CentOS 6.5 Upstart, in my case).
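
To make the comparison concrete, the two sequences under discussion can be 
written side by side. This is only a sketch: the "run" wrapper, the DRY_RUN 
variable and the two function names are hypothetical additions for 
illustration, and only the pcs and shutdown invocations come from this thread.

```shell
#!/bin/bash

# Hypothetical safety switch: with DRY_RUN=1 (the default here) commands are
# only printed, never executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "${DRY_RUN}" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Variant A (as suggested in the quoted reply): standby first, then stop the
# cluster stack explicitly, then halt the machine.
shutdown_with_cluster_stop() {
    run pcs cluster standby "$1"
    run pcs cluster stop
    run /sbin/shutdown -h +0
}

# Variant B (my assumption): standby first, then let the pacemaker and cman
# init scripts stop the cluster stack during the regular OS shutdown.
shutdown_via_initscripts() {
    run pcs cluster standby "$1"
    run /sbin/shutdown -h +0
}
```

Calling shutdown_via_initscripts with the local node name while DRY_RUN=1 just 
prints the two commands, which makes it possible to review the sequence 
before wiring it into the UPS monitoring.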

Many thanks again for your always thought-provoking and informative answers!

Regards,
Giuseppe

> > 
> > On the next startup, manual intervention would be required to unstandby all 
> > nodes and enable resources (nodes that were already in standby and resources 
> > that were already disabled before the blackout would have to be told apart 
> > manually).
> > 
> > Is this strategy conceptually safe?
> > 
> > Unfortunately, various searches have turned up no "prior art" :)
> > 
> > This is my tentative script (consider it in the public domain):
> > 
> > ------------------------------------------------------------------------------------------------------------------------------------
> > #!/bin/bash
> > 
> > # Note: "pcs cluster status" still has a small bug vs. CMAN-controlled
> > # Corosync and would always return != 0
> > pcs status > /dev/null 2>&1
> > STATUS=$?
> > 
> > # Detect if cluster is running at all on local node
> > # TODO: detect node already in standby and bypass this
> > if [ "${STATUS}" = 0 ]; then
> >     # Extract the local node name from the "Node name:" line
> >     local_node="$(cman_tool status | grep -i 'Node[[:space:]]*name:' | \
> >         sed -e 's/^.*Node\s*name:\s*\([^[:space:]]*\).*$/\1/i')"
> >     # Disable every VirtualDomain resource currently listed on this node
> >     for local_resource in $(pcs status 2>/dev/null | \
> >         grep "ocf::heartbeat:VirtualDomain.*${local_node}\\s*\$" | \
> >         awk '{print $1}'); do
> >         pcs resource disable "${local_resource}"
> >     done
> >     # TODO: each resource disabling above may return without waiting for
> >     # complete stop - wait here for "no more resources active"? (but avoid
> >     # endless loops)
> >     pcs cluster standby "${local_node}"
> > fi
> > 
> > # Shut down gracefully anyway at the end
> > /sbin/shutdown -h +0
> > 
> > ------------------------------------------------------------------------------------------------------------------------------------
> > 
> > Comments/suggestions/improvements are more than welcome.
> > 
> > Many thanks in advance.
> > 
> > Regards,
> > Giuseppe
> > 
> > _______________________________________________
> > Pacemaker mailing list: [email protected]
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > 
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
> 
-- 
  Giuseppe Ragusa
  [email protected]


