On Jun 16, 2008, at 11:36 AM, Dejan Muhamedagic wrote:
Hi Junko-san,
On Mon, Jun 16, 2008 at 01:44:28PM +0900, Junko IKEDA wrote:
Hi,
A stonith resource is started only in the current stonithd
instance. If the stonithd process is gone, the status of all its
stonith resources is gone with it. A started stonith resource
would more properly be termed "enabled", and that state is only
valid in the current stonithd process.
In other words, there's no use trying a monitor operation with a
new stonithd instance: it is "empty" and will always return "not
running". The only way to proceed, once crmd realises that the
stonithd process has died, is to consider all stonith resources
which were "started" on that node as stopped and to start them
again. It should probably also not update the fail_count, since
the resources themselves didn't fail, just the stonithd process.
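
To make the point above concrete, here is a toy model in Python. It is
only a sketch: ToyStonithd, recover and the resource id "stonith-node1"
are made-up names for illustration, not the stonithd or crmd API. It
assumes the "started" state lives purely in the daemon's memory, as
described above.

```python
class ToyStonithd:
    """Toy stand-in for one stonithd process (hypothetical, not the real API).

    The set of enabled resources exists only in this object's memory,
    so it disappears together with the process.
    """

    def __init__(self):
        self._enabled = set()

    def start(self, rsc_id):
        # "Starting" a stonith resource just enables it in this instance.
        self._enabled.add(rsc_id)

    def monitor(self, rsc_id):
        return "running" if rsc_id in self._enabled else "not running"


def recover(started_ids, fail_counts):
    """Hypothetical recovery after the daemon died and was respawned.

    Probes the fresh instance (always "not running"), starts every
    resource again, and returns the fail counts unchanged.
    """
    fresh = ToyStonithd()
    probe = {rsc: fresh.monitor(rsc) for rsc in started_ids}
    for rsc in started_ids:
        fresh.start(rsc)
    return fresh, probe, dict(fail_counts)


old = ToyStonithd()
old.start("stonith-node1")

# The old process dies; a new, empty one takes its place.
fresh, probe, counts = recover(["stonith-node1"], {"stonith-node1": 0})
```

Here probe shows why a monitor against the new instance is pointless:
it can only report "not running" until the resource is started again.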
You mean this is stonithd's correct behavior under the current
specification?
stonithd has no configuration itself. There's simply no other way
stonithd can behave.
Is it possible for crmd, by design, to restart stonith resources
when stonithd dies and comes back up?
I certainly hope so.
Or should we contrive a way to do this with migration-threshold and
an expiring fail-count?
Basically, yes.
In this particular case, it probably makes sense not to set a
migration-threshold for the stonith resource.
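
A rough way to see why, as a Python sketch. This is a simplified model
and not Pacemaker's policy engine: allowed_on_node and simulate are
made-up helpers, and the model assumes that every stonithd death is
recorded as one failure of the stonith resource and that a resource is
banned from the node once its fail-count reaches migration-threshold.

```python
def allowed_on_node(fail_count, migration_threshold=None):
    """Simplified rule: None means no threshold is set, so failures
    never ban the resource from the node."""
    if migration_threshold is None:
        return True
    return fail_count < migration_threshold


def simulate(daemon_deaths, migration_threshold=None):
    """Return (fail_count, still_allowed) after a number of daemon deaths."""
    fail_count = 0
    for _ in range(daemon_deaths):
        # Assumption: each daemon death counts as one resource failure.
        fail_count += 1
        if not allowed_on_node(fail_count, migration_threshold):
            return fail_count, False
    return fail_count, True


with_threshold = simulate(5, migration_threshold=3)
without_threshold = simulate(5)
```

In this model the resource with a threshold of 3 is banned from the
node at the third daemon death, while the one without a threshold is
simply restarted every time.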
I'd say that it should be done by crmd. Don't know how complex it
may be though.
By design, the CRM does not (and will not) try to understand the
resources it manages.
That the CRM also has a connection to stonithd (and knows when it
dies) does not mean that stonith resources will be treated any
differently.
We restart them (if possible given the configuration) when they fail
just like any other resource. That's it.
The CRM's design won't be changing to optimize its behavior for an
artificial test scenario ;-)