> So a monitor failure on the fence agent rendered the cluster effectively
> unresponsive? How would I normally recover from this?

Actually, the cluster bans the stonith resource from a node once the resource's
fail count on that node reaches the maximum. Once the stonith resource is banned
from all nodes, no node is able to use it for fencing.

You can use the 'failure-timeout' meta attribute to have the fail count reset
automatically. I'm using it for the IPMI fencing devices.
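
As a rough sketch of what that could look like with pcs (the resource name
fence_ipmi_node1 and the 600s value are placeholders I made up, not anything
from your configuration; on newer pcs releases the meta command for fence
devices may be 'pcs stonith meta' instead):

```shell
# Hypothetical stonith resource name; substitute the id of your fence device.
STONITH_ID=fence_ipmi_node1

# Let recorded failures expire after 10 minutes without a new failure
# (600s is an example value, not a recommendation).
pcs resource meta "$STONITH_ID" failure-timeout=600s

# Show the current fail counts for the device.
pcs resource failcount show "$STONITH_ID"

# Manual recovery: clear the failure history so the bans are lifted right away.
pcs resource cleanup "$STONITH_ID"
```

The last command is the usual manual way out once the device is already
banned everywhere. Expired failures are only acted on when the cluster
re-evaluates its state, so the reset may not happen at exactly the timeout.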

Of course, the best approach is to make that stonith device more reliable, but
usually this is out of our control.

Another approach is to define a second stonith method and use a stonith topology.
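
A minimal sketch of such a topology with pcs, assuming two fence devices
already exist for the node (node1, fence_ipmi_node1 and fence_pdu_node1 are
hypothetical names used only for illustration):

```shell
# Hypothetical node and device names; both devices must already be defined.
NODE=node1

# Level 1 is tried first: the IPMI device.
pcs stonith level add 1 "$NODE" fence_ipmi_node1

# Level 2 is tried only if level 1 fails: a second, independent device.
pcs stonith level add 2 "$NODE" fence_pdu_node1

# List the configured fencing levels to verify the result.
pcs stonith level
```

With this in place, a failure of the first device no longer leaves the node
unfenceable, as long as the second device is still usable.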

Best Regards,
Strahil Nikolov


_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
