[Pacemaker] Long failover

Dmitry Matveichev Fri, 14 Nov 2014 04:03:13 -0800

Hello,

We have a cluster configured via pacemaker+corosync+crm. The configuration is:


node master
node slave
primitive HA-VIP1 IPaddr2 \
        params ip=192.168.22.71 nic=bond0 \
        op monitor interval=1s
primitive HA-variator lsb:variator \
        op monitor interval=1s \
        meta migration-threshold=1 failure-timeout=1s
group HA-Group HA-VIP1 HA-variator
property cib-bootstrap-options: \
        dc-version=1.1.10-14.el6-368c726 \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes=2 \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        last-lrm-refresh=1383871087
rsc_defaults rsc-options: \
        resource-stickiness=100

First I bring the variator service down on the master node (I actually delete 
the service binary and kill the variator process, so variator fails to 
restart). The resources move to the slave node very quickly, as expected. Then I 
restore the binary on the master and restart the variator service. Next I do 
the same thing with the binary and the service on the slave node. The crm status 
command quickly shows HA-variator (lsb:variator): Stopped, but it takes too 
long (for us) before the resources are switched to the master node (around 1 
min). Then the lines
Failed actions:
    HA-variator_monitor_1000 on slave 'unknown error' (1): call=-1, 
status=Timed Out, last-rc-change='Sat Dec 21 03:59:45 2013', queued=0ms, 
exec=0ms
appear in crm status and the resources are switched.
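
For reading the failed-action entry: the operation key is made up of the 
resource name, the action, and the interval in milliseconds, so _monitor_1000 
corresponds to the "op monitor interval=1s" line in the configuration. Below is 
a minimal sketch that splits such an entry into its fields. The function name 
parse_failed_action and the regular expression are illustrative assumptions, 
not part of any Pacemaker tool, and the sample line is the entry quoted above 
with its wrapped lines joined.

```python
import re

# Sample entry modelled on the crm status output quoted above (wrapped lines joined).
LINE = ("HA-variator_monitor_1000 on slave 'unknown error' (1): call=-1, "
        "status=Timed Out, last-rc-change='Sat Dec 21 03:59:45 2013', "
        "queued=0ms, exec=0ms")

# Assumed layout: <resource>_<action>_<interval-ms> on <node> '<reason>' (<rc>): k=v, k=v, ...
PATTERN = re.compile(
    r"^(?P<resource>.+)_(?P<action>[a-z-]+)_(?P<interval_ms>\d+) on (?P<node>\S+) "
    r"'(?P<reason>[^']*)' \((?P<rc>\d+)\): (?P<fields>.*)$"
)


def parse_failed_action(line):
    """Return a dict of the entry's fields, or None if the line does not match."""
    match = PATTERN.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["interval_ms"] = int(info["interval_ms"])
    info["rc"] = int(info["rc"])
    # The trailing part is a ", "-separated list of key=value pairs.
    for pair in info.pop("fields").split(", "):
        key, _, value = pair.partition("=")
        info[key] = value.strip("'")
    return info


if __name__ == "__main__":
    parsed = parse_failed_action(LINE)
    print(parsed["resource"], parsed["action"], parsed["interval_ms"], parsed["status"])
```

The parsed entry shows a monitor action with a 1000 ms interval whose status is 
Timed Out and whose exec time is 0ms. The sketch only restates what the status 
output already says; it does not identify which timeout is involved.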

What is that timeout? Where can I change it?

------------------------
Kind regards,
Dmitriy Matveichev.

_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
