Hello Team,

Hope you are doing well.

We are running into an issue where a multi-state resource's stop action is never 
run on a node, even though the resource fails over and starts on another node in 
the cluster, when the corosync process is killed.

Note: the resource names and hostnames below have been changed from the 
originals.

Before corosync is killed:

$ hostname
pace_node_a

Snippet of "pcs status":
colocated-resource (ocf::xxx:colocated-resource):  Started pace_node_a
Master/Slave Set: main-multi-state-resource [main-multi]
     Masters: [ pace_node_a ]
     Stopped: [ pace_node_b ]

We then killed the corosync process with kill -9 on "pace_node_a".
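For reference, a sketch of the kill step and the checks we ran afterwards (the 
log path and grep pattern are assumptions and may differ by distribution):

```shell
# On pace_node_a: abruptly kill corosync, simulating sudden loss of
# cluster membership without a clean shutdown.
kill -9 "$(pidof corosync)"

# On the surviving node: confirm the resulting cluster state.
pcs status

# Look for any attempted stop of the multi-state resource on pace_node_a
# (assumed default Pacemaker log location; adjust as needed).
grep -E 'main-multi.*stop' /var/log/pacemaker/pacemaker.log
```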

Resulting snippet of "pcs status":

colocated-resource (ocf::xxx:colocated-resource):  Started pace_node_b
Master/Slave Set: main-multi-state-resource [main-multi]
     Stopped: [ pace_node_a ]
     Masters: [ pace_node_b ]

As you can see, "pcs status" indicates that "main-multi-state-resource" stopped 
on "pace_node_a", where corosync was killed, and started on "pace_node_b". While 
this indication is correct from the cluster's point of view, the underlying 
resource managed by "main-multi-state-resource" never actually stopped on 
"pace_node_a". There were also no logs from crmd or other components indicating 
that a stop was even attempted on "pace_node_a". Interestingly, the crmd logs 
did show the colocated resource, "colocated-resource", being stopped, and there 
is evidence that the resource it manages actually stopped.

Is this a known issue?

Please let us know if any additional information is needed.

Thanks for your help!

-Raghav