On Sep 8, 2008, at 10:11 AM, Junko IKEDA wrote:
Hi,
This is a request about showing monitor failures (NG) in crm_mon.
One dummy resource is running, like this:
# crm_mon -fot -i1 -r
Node: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3): online
Node: node-a (b3852a23-c10b-440a-a8e0-263b0185d657): online
dummy (ocf::heartbeat:Dummy): Started node-a
Operations:
* Node node-b:
* Node node-a:
dummy:
+ start: rc=0 (ok)
+ monitor: interval=10000ms rc=0 (ok)
Remove its status file so that the monitor fails and dummy fails over.
# rm -f /var/run/heartbeat/rsctmp/Dummy-dummy.state
Node: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3): online
Node: node-a (b3852a23-c10b-440a-a8e0-263b0185d657): online
dummy (ocf::heartbeat:Dummy): Started node-b
Operations:
* Node node-b:
dummy:
+ start: rc=0 (ok)
+ monitor: interval=10000ms rc=0 (ok)
* Node node-a:
dummy: fail-count=1
+ start: rc=0 (ok)
+ monitor: interval=10000ms rc=7 (not running)
+ stop: rc=0 (ok)
Failed actions:
dummy_monitor_10000 (node=node-a, call=4, rc=7): complete
After that, the node where the resource is now running (node-b) is
stopped manually.
# service heartbeat stop
Node: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3): OFFLINE
Node: node-a (b3852a23-c10b-440a-a8e0-263b0185d657): online
Operations:
* Node node-b:
dummy:
+ start: rc=0 (ok)
+ monitor: interval=10000ms rc=0 (ok)
+ stop: rc=0 (ok)
* Node node-a:
dummy: fail-count=1
+ start: rc=0 (ok)
+ stop: rc=0 (ok)
At this point, node-a's monitor failure disappears from the crm_mon output.
Because it is no longer in the current start/stop series for the
resource.
This might be expected behavior for now,
It is.
but it would be convenient if crm_mon could keep showing some past failures.
It can't display them forever. They are not (and should not be) kept
in the CIB forever, as that would cause the CIB size to explode.
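
Since the CIB will not keep old failures, one option is to remember them outside the cluster. The following is only a rough sketch of that idea, not anything crm_mon itself does: it reads text laid out like the "Operations:" sections quoted above and collects every operation with a non-zero rc into a set that outlives any single snapshot. The names failed_ops and record_failures and the two snapshot constants are made up for illustration, and the parsing assumes the output layout shown in this thread, which may differ between versions.

```python
import re

# Layout assumptions are taken from the crm_mon output quoted in this thread.
NODE_LINE = re.compile(r"^\*\s*Node\s+(?P<node>\S+):$")
OP_LINE = re.compile(
    r"^\+\s*(?P<op>\w+):.*?\brc=(?P<rc>\d+)\s*\((?P<status>[^)]*)\)"
)


def failed_ops(crm_mon_text):
    """Return (node, resource, op, rc, status) for each operation with rc != 0."""
    node = resource = None
    in_operations = False
    failures = []
    for raw in crm_mon_text.splitlines():
        line = raw.strip()
        if line == "Operations:":
            in_operations = True
            continue
        if line.startswith("Failed actions:"):
            break  # the per-node operation list has ended
        if not in_operations or not line:
            continue
        node_match = NODE_LINE.match(line)
        if node_match:
            node, resource = node_match.group("node"), None
            continue
        op_match = OP_LINE.match(line)
        if op_match:
            rc = int(op_match.group("rc"))
            if rc != 0:
                failures.append(
                    (node, resource, op_match.group("op"), rc, op_match.group("status"))
                )
            continue
        if ":" in line:
            # Resource header such as "dummy:" or "dummy: fail-count=1".
            resource = line.split(":", 1)[0]
    return failures


def record_failures(history, crm_mon_text):
    """Add the failures found in one snapshot to the caller's history set."""
    history.update(failed_ops(crm_mon_text))
    return history


# Hypothetical constants holding the two outputs quoted earlier in this mail.
SNAPSHOT_AFTER_FAILOVER = """\
Operations:
* Node node-b:
dummy:
+ start: rc=0 (ok)
+ monitor: interval=10000ms rc=0 (ok)
* Node node-a:
dummy: fail-count=1
+ start: rc=0 (ok)
+ monitor: interval=10000ms rc=7 (not running)
+ stop: rc=0 (ok)
Failed actions:
dummy_monitor_10000 (node=node-a, call=4, rc=7): complete
"""

SNAPSHOT_AFTER_NODE_STOP = """\
Operations:
* Node node-b:
dummy:
+ start: rc=0 (ok)
+ monitor: interval=10000ms rc=0 (ok)
+ stop: rc=0 (ok)
* Node node-a:
dummy: fail-count=1
+ start: rc=0 (ok)
+ stop: rc=0 (ok)
"""

history = set()
for snapshot in (SNAPSHOT_AFTER_FAILOVER, SNAPSHOT_AFTER_NODE_STOP):
    record_failures(history, snapshot)
print(sorted(history))
```

The second snapshot no longer contains the rc=7 monitor, but the entry recorded from the first snapshot stays in history. Writing that set to a file between runs would keep it across restarts of the script, at the cost of having to prune it by hand.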
_______________________________________________
Pacemaker mailing list
http://list.clusterlabs.org/mailman/listinfo/pacemaker