Hi,
I am configuring Pacemaker on CentOS 6.5 with these packages:
pacemaker-cli-1.1.10-14.el6_5.3.x86_64
pacemaker-1.1.10-14.el6_5.3.x86_64
pacemaker-libs-1.1.10-14.el6_5.3.x86_64
pacemaker-cluster-libs-1.1.10-14.el6_5.3.x86_64
This is my config:
Cluster Name: ybrp
Corosync Nodes:
Pacemaker Nodes:
 devrp1 devrp2
Resources:
 Resource: ybrpip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.172.214.50 cidr_netmask=24 nic=eth0 clusterip_hash=sourceip-sourceport
  Meta Attrs: stickiness=0,migration-threshold=3,failure-timeout=600s
  Operations: monitor on-fail=restart interval=5s timeout=20s (ybrpip-monitor-interval-5s)
 Clone: ybrpstat-clone
  Meta Attrs: globally-unique=false clone-max=2 clone-node-max=1
  Resource: ybrpstat (class=ocf provider=yb type=proxy)
   Operations: monitor on-fail=restart interval=5s timeout=20s (ybrpstat-monitor-interval-5s)
Stonith Devices:
Fencing Levels:
Location Constraints:
Ordering Constraints:
 start ybrpstat-clone then start ybrpip (Mandatory) (id:order-ybrpstat-clone-ybrpip-mandatory)
Colocation Constraints:
 ybrpip with ybrpstat-clone (INFINITY) (id:colocation-ybrpip-ybrpstat-clone-INFINITY)
Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.10-14.el6_5.3-368c726
 last-lrm-refresh: 1404892739
 no-quorum-policy: ignore
 stonith-enabled: false
I have my own resource agent script, and I start and stop the proxy service
outside of Pacemaker.
I hit an interesting problem: a VMware update on the Linux box interrupted
network activity.
The monitor function in my script does two things: 1) it tests whether the
proxy process is running, and 2) it fetches a status page from the proxy and
confirms that the response code is 200.
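For reference, the monitor logic described above looks roughly like the sketch below. This is not my real script: the process name, the status URL and the helper function names are placeholders, and returning a generic error when the process is up but the status page is not 200 is just one possible choice. The numeric return codes are the standard OCF ones.

```shell
#!/bin/sh
# Sketch of the monitor logic only; not the real resource agent.

# Standard OCF return codes.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical placeholders: adjust to the real proxy.
PROXY_PROCESS="${PROXY_PROCESS:-proxy}"
PROXY_STATUS_URL="${PROXY_STATUS_URL:-http://127.0.0.1:8080/status}"

# Step 1: succeed if a process with exactly this name is running.
proxy_process_running() {
    pgrep -x "$PROXY_PROCESS" >/dev/null 2>&1
}

# Step 2: print the HTTP status code of the status page.
# curl prints 000 when it cannot get a response at all.
proxy_status_code() {
    curl -s -o /dev/null -m 10 -w '%{http_code}' "$PROXY_STATUS_URL"
}

# Combine both checks into one OCF-style result.
proxy_monitor() {
    if ! proxy_process_running; then
        return $OCF_NOT_RUNNING
    fi
    code=$(proxy_status_code)
    if [ "$code" = "200" ]; then
        return $OCF_SUCCESS
    fi
    # Assumption: process is up but unhealthy, so report a generic error.
    return $OCF_ERR_GENERIC
}
```

In an agent, the monitor action would call `proxy_monitor` and exit with its return code. Keeping the two checks in separate functions makes it easy to see which one failed during a network interruption.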
This is what I got in /var/log/messages
Jul 9 06:16:13 devrp1 crmd[6849]: warning: update_failcount: Updating
failcount for ybrpstat on devrp2 after failed monitor: rc=7
(update=value++, time=1404850573)
Jul 9 06:16:13 devrp1 crmd[6849]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jul 9 06:16:13 devrp1 pengine[6848]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing
failed op monitor for ybrpstat:0 on devrp2: not running (7)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing
failed op start for ybrpstat:1 on devrp1: unknown error (1)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness:
Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness:
Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Restart
ybrpip#011(Started devrp2)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Recover
ybrpstat:0#011(Started devrp2)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: process_pe_message: Calculated
Transition 1054: /var/lib/pacemaker/pengine/pe-input-235.bz2
Jul 9 06:16:13 devrp1 pengine[6848]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing
failed op monitor for ybrpstat:0 on devrp2: not running (7)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing
failed op start for ybrpstat:1 on devrp1: unknown error (1)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness:
Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness:
Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Restart
ybrpip#011(Started devrp2)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Recover
ybrpstat:0#011(Started devrp2)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: process_pe_message: Calculated
Transition 1055: /var/lib/pacemaker/pengine/pe-input-236.bz2
Jul 9 06:16:13 devrp1 pengine[6848]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing
failed op monitor for ybrpstat:0 on devrp2: not running (7)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: unpack_rsc_op: Processing
failed op start for ybrpstat:1 on devrp1: unknown error (1)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness:
Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul 9 06:16:13 devrp1 pengine[6848]: warning: common_apply_stickiness:
Forcing ybrpstat-clone away from devrp1 after 1000000 failures (max=1000000)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Restart
ybrpip#011(Started devrp2)
Jul 9 06:16:13 devrp1 pengine[6848]: notice: LogActions: Recover
ybrpstat:0#011(Started devrp2)
It stayed this way for the next 12 hours, until I logged on.
I poked around, and to fix it I ran this:
/usr/sbin/pcs resource cleanup ybrpip
/usr/sbin/pcs resource cleanup ybrpstat
Basically, I cleaned up the errors and it recovered all by itself.
So my question is: how do I configure the cluster, or what do I need to change
in the resource script, to send a temporary error back to Pacemaker so that it
keeps trying to check the status of the proxy?
It seems to me that it tried once and then gave up, although the log says it
failed after 1000000 failures. How can I change that to infinite, and where is
the interval setting for this? From the config above, it looks to me like it
should already be infinite.
Thanks
Alex
_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org