I have two Pacemaker resources, A and B. For environmental reasons, their
start and monitor methods always return failure (OCF_ERR_GENERIC). Their
configurations are shown below (the cluster property start-failure-is-fatal
is set to false):
primitive A A \
    op monitor interval=20 timeout=120 \
    op stop interval=0 timeout=120 on-fail=restart \
    op start interval=0 timeout=240 on-fail=restart \
    meta failure-timeout=60s
primitive B B \
    op monitor interval=20 timeout=120 \
    op stop interval=0 timeout=120 on-fail=restart \
    op start interval=0 timeout=240 on-fail=restart \
    meta failure-timeout=60s
clone A_cl A
clone B_cl B
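For completeness, the start-failure-is-fatal setting mentioned above would typically be expressed in the same crm configure syntax as the line below. This is a sketch based only on the property name and value stated in this post; it is not copied from the actual cluster configuration.

```
property start-failure-is-fatal=false
```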
The execution times of their methods differ:
A: start = 60s, monitor < 1s, stop = 80s
B: start < 1s, monitor < 1s, stop < 1s
Resource A is scheduled normally: it is repeatedly started and stopped. For
resource B, however, only the recurring monitor keeps failing; B is never
stopped or started again. In addition, no fail-count for B is shown in
"crm status -f".
Two changes can solve the problem of B not being scheduled:
1. Change failure-timeout of B from 60s to 600s.
2. Modify the OCF agent of A so that its stop method returns as soon as
   possible.
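As an illustration of change 1, the definition of B would look roughly like this in the same crm configure syntax. Only the failure-timeout value differs from the original definition; everything else is carried over unchanged, and this is a sketch rather than a tested configuration.

```
primitive B B \
    op monitor interval=20 timeout=120 \
    op stop interval=0 timeout=120 on-fail=restart \
    op start interval=0 timeout=240 on-fail=restart \
    meta failure-timeout=600s
```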
I tested this several times, and the results were always the same. Why is the
resource not scheduled when failure-timeout is set too short? And what does
that have to do with the time-consuming stop of another resource? Is this a
bug?
My Pacemaker version is 1.1.16. Any suggestions are welcome. Thank you!
James
2018-05-20