Hello Team,
Hope you are doing well.
We are running into an issue with a multi-state resource: when the corosync
process is killed on a node, the resource's stop function is not run on that
node, yet the resource fails over and is started on another node in the cluster.
Note: in the output below, the actual resource names and hostnames have been
changed from the originals.
Snippet of pcs status before corosync is killed:
$ hostname
pace_node_a
Snippet of "pcs status":
 colocated-resource (ocf::xxx:colocated-resource): Started pace_node_a
 Master/Slave Set: main-multi-state-resource [main-multi]
     Masters: [ pace_node_a ]
     Stopped: [ pace_node_b ]
Next, the corosync process was killed with kill -9 on "pace_node_a".
Resulting snippet of "pcs status":
 colocated-resource (ocf::xxx:colocated-resource): Started pace_node_b
 Master/Slave Set: main-multi-state-resource [main-multi]
     Stopped: [ pace_node_a ]
     Masters: [ pace_node_b ]
As you can see, pcs status indicates that "main-multi-state-resource" stopped
on "pace_node_a", where corosync was killed, and started on "pace_node_b".
Although this reported status is what we would expect, the underlying resource
managed by "main-multi-state-resource" never actually stopped on "pace_node_a".
There were also no logs from crmd or other components indicating that a stop
was even attempted on "pace_node_a". Interestingly, the crmd logs show that the
colocated resource, "colocated-resource", was being stopped, and there is
evidence that the resource it manages actually stopped.
Is this a known issue?
Please let us know if any additional information is needed.
Thanks for your help!
-Raghav