Aaron Bush wrote:
I am mostly concerned that I ended up with a node that had no associated
stonith resource available to shoot it if it was truly down since the
resource did not restart like I thought it should once the network cable
was reconnected.
Hi Aaron,
Without knowing the details: is the stonith plugin implemented to time
out and return FALSE?
In that case the failure count for that stonith plugin resource should
be incremented, which in turn changes the resource's score.
A list member once contributed a script, showscore.sh, which shows the
current score of a resource in the cluster. You should watch the score
of your stonith resource while that failure case is happening.
The score probably drops so low that the resource can't be started
anywhere, but that is just a guess.
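To illustrate that guess, here is a toy model in Python. It is only a
sketch: it assumes every recorded failure subtracts a fixed penalty from
a per-node base score and that a node with a negative score may not run
the resource. The real calculation depends on the Pacemaker version and
configuration, and all names and numbers below are made up.

```python
def effective_score(base_score, fail_count, failure_penalty):
    """Toy model (assumption): each failure subtracts a fixed penalty."""
    return base_score - fail_count * failure_penalty


def startable_nodes(base_scores, fail_counts, failure_penalty):
    """Return the nodes whose toy score is still non-negative."""
    return sorted(
        node
        for node, base in base_scores.items()
        if effective_score(base, fail_counts.get(node, 0), failure_penalty) >= 0
    )


# Hypothetical two-node cluster; the stonith resource scores 100 on each node.
base_scores = {"node1": 100, "node2": 100}

# Failures on one node only: the other node can still run the resource.
one_node_failing = startable_nodes(base_scores, {"node1": 3}, failure_penalty=50)

# Failures on both nodes: no node is left where the resource may start.
both_nodes_failing = startable_nodes(
    base_scores, {"node1": 3, "node2": 3}, failure_penalty=50
)
```

If the monitor fails on every node often enough, the second case is
reached and the resource stays stopped until the failure counts are
reset.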
The best thing you can do, IMHO, is to ignore the failures for score
calculation but react to them externally (e.g. with Nagios monitoring).
The failure count would rise with each try, but the score should be
kept constant.
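A minimal sketch of such an external check in Python follows. It
assumes the failure count has already been read from the cluster by
some other means, and only maps it to the usual Nagios plugin exit
codes (0 = OK, 1 = WARNING, 2 = CRITICAL). The function name and the
thresholds are hypothetical.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_fail_count(fail_count, warn_at=1, crit_at=3):
    """Map a stonith fail count to a (nagios_exit_code, message) pair.

    warn_at and crit_at are made-up thresholds; tune them as needed.
    """
    if fail_count >= crit_at:
        return CRITICAL, f"CRITICAL - stonith fail count is {fail_count}"
    if fail_count >= warn_at:
        return WARNING, f"WARNING - stonith fail count is {fail_count}"
    return OK, "OK - stonith resource has no recorded failures"


# Example: two failed attempts so far gives a warning, not yet critical.
code, message = check_fail_count(2)
```

A wrapper script would print the message and exit with the returned
code, so Nagios raises the alarm while the cluster itself keeps the
resource's score unchanged.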
But Dejan can probably shed more light on this. :-)
Best regards
Andreas Mock
_______________________________________________
Pacemaker mailing list
http://list.clusterlabs.org/mailman/listinfo/pacemaker