Hello, I currently have Pacemaker v2.0.3-3ubuntu4.2 running on two Ubuntu 20.04 LTS systems. My config consists of two service groups, each of which has an LSB resource and a floating IP resource. The LSB resource is configured with a monitor operation, so that "/etc/init.d/<lsb-resource-name> status" is run at 30-second intervals. The "status" portion of the script only returns a healthy exit code when it determines that the PID behind a PID file is active. I have also set an 'rsc_location' constraint so that the service group for VIP A prefers node A, and the group for VIP B prefers node B; ideally, with both nodes active and healthy, VIP A will always be running on node A, and VIP B on node B.
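For concreteness, here is a minimal crmsh sketch of that layout. Every name, address, and score here is a placeholder, as is the 60s 'failure-timeout' mentioned below; the real (obfuscated) config is in the pastebin linked further down:

    primitive svc-a lsb:service-a \
        op monitor interval=30s \
        meta failure-timeout=60s
    primitive vip-a ocf:heartbeat:IPaddr2 \
        params ip=192.0.2.10 cidr_netmask=24 \
        op monitor interval=30s
    group grp-a svc-a vip-a
    location loc-grp-a grp-a 100: node-a
    # ...plus the mirror image: svc-b and vip-b in grp-b, preferring node-b.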
The problem I'm having is that if I intentionally shut down the service that my "/etc/init.d/<lsb-resource-name> status" script is checking against, I get the following behavior:

- I shut down the backing service on node B.
- Pacemaker performs a status check, which returns a bad result.
- Pacemaker then correctly migrates the VIP and the LSB resource for the now-'offline' service group from node B to node A.
- Pacemaker's 'failure-timeout' interval expires.
- Pacemaker shuts down the VIP B service group on node A.
- Pacemaker attempts to start the VIP B service group on node B, which fails.
- Pacemaker starts the VIP B service group on node A.
- Pacemaker's 'failure-timeout' interval expires.
- Pacemaker shuts down the VIP B service group on node A.
- Pacemaker attempts to start the VIP B service group on node B, which fails.
- Pacemaker starts the VIP B service group on node A.
- ...and so on.

What I would LIKE to happen is for Pacemaker to attempt to run a "status" on node B PRIOR to stopping the service group on node A and attempting to start the service group on node B. Something like this:

- Pacemaker's 'failure-timeout' interval expires.
- Pacemaker checks the status of the LSB service ("/etc/init.d/<lsb-resource-name> status"), which returns a bad exit code.
- Pacemaker's 'failure-timeout' interval expires.
- Pacemaker checks the status of the LSB service ("/etc/init.d/<lsb-resource-name> status"), which returns a bad exit code.

At that point an administrator or an automated script could intervene and bring the backing service online, after which we would have this behavior:

- Pacemaker's 'failure-timeout' interval expires.
- Pacemaker checks the status of the LSB service ("/etc/init.d/<lsb-resource-name> status"), which returns a HEALTHY exit code.
- Pacemaker shuts down the VIP B service group on node A.
- Pacemaker starts the VIP B service group on node B.

I have attached an obfuscated pastebin of my current Pacemaker configuration, as well as a copy of the logs for the pacemaker service, capturing both the initial failure and the repetitive failed attempts to start the LSB resource.

Obfuscated "crm configure show": https://pastebin.com/emAw8juQ
Obfuscated "journalctl -fu pacemaker": https://pastebin.com/kcnfCrjf

Please let me know if there is a configuration parameter I can place in my config that would tell Pacemaker to perform a status check on the LSB resource PRIOR to attempting to start the service group on its preferred node.

--
Michael Romero
Lead Infrastructure Engineer
Engineering | Convoso
562-338-9868
[email protected]
www.convoso.com
LinkedIn: https://linkedin.com/in/romerom
