Aaron Bush wrote:
> I am mostly concerned that I ended up with a node that had no associated
> stonith resource available to shoot it if it was truly down, since the
> resource did not restart like I thought it should once the network cable
> was reconnected.
Hi Aaron,

Without knowing the details: is the stonith plugin implemented to time out and return FALSE? In that case the failure count for that stonith plugin resource should be raised, and you get a
change in the resource score.
A list member once contributed a script, showscore.sh, that shows the current score of a resource in the cluster. You should use it to watch your stonith resource in that failure case. My guess is that the score drops so low that the resource can't be started anywhere, but that is just a guess. The best you can do, IMHO, is to ignore the failures in the score calculation but react to them externally (e.g. with Nagios monitoring). The failure count would rise with each try, but the score would
stay constant.
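To make the guess above concrete, here is a hypothetical, simplified Python model (not Pacemaker's actual allocation code) of two options: letting each failure subtract a fixed penalty from the resource's score, or tracking the failure count while leaving the score untouched. The names `effective_score`, `can_run`, `failure_penalty` and the stand-in value for -INFINITY are all illustrative assumptions; in this model a resource is only placed on a node with a non-negative score.

```python
# Hypothetical model: how repeated stonith failures could drag a resource's
# score down, versus ignoring failures for scoring and alerting externally.
# Names and numbers are illustrative, not taken from Pacemaker.

NEG_INFINITY = -1_000_000  # stand-in for the cluster's -INFINITY score


def effective_score(base_score: int, fail_count: int, failure_penalty: int,
                    ignore_failures: bool = False) -> int:
    """Return the resource's score on a node after applying failures.

    With ignore_failures=True the fail count is still tracked elsewhere
    (e.g. for external monitoring) but does not change the score.
    """
    if ignore_failures:
        return base_score
    # Each failure subtracts a fixed penalty; clamp at the -INFINITY stand-in.
    return max(NEG_INFINITY, base_score - fail_count * failure_penalty)


def can_run(score: int) -> bool:
    # In this model, only nodes with a non-negative score may host the resource.
    return score >= 0


if __name__ == "__main__":
    for fails in range(4):
        penalized = effective_score(100, fails, 50)
        ignored = effective_score(100, fails, 50, ignore_failures=True)
        print(fails, penalized, can_run(penalized), ignored, can_run(ignored))
```

In the penalized case the resource becomes unplaceable after the third failure, while with failures ignored for scoring the resource stays placeable and the rising failure count is left for an external monitor to act on.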

But Dejan can probably shed more light on this.  :-)

Best regards
Andreas Mock


_______________________________________________
Pacemaker mailing list
[email protected]
http://list.clusterlabs.org/mailman/listinfo/pacemaker
