>>> Ken Gaillot <[email protected]> wrote on 20.08.2018 at 16:49 in message
<[email protected]>:
> On Mon, 2018-08-20 at 10:51 +0200, Ulrich Windl wrote:
>> Hi!
>>
>> I wonder whether it's possible to run a monitoring op only if some
>> specific resource is up.
>> Background: We have a resource that runs fine without NFS, but its
>> start, stop and monitor operations will just hang if NFS is down. In
>> effect the monitor operation will time out, and the cluster will try to
>> recover by calling the stop operation, which in turn will time out,
>> making things worse (i.e., causing a node fence).
>>
>> So my idea was to pause the monitoring operation while NFS is down
>> (NFS itself is controlled by the cluster and should recover "rather
>> soon" TM).
>>
>> Is that possible?
>
> A possible mitigation would be to set on-fail=block on the dependent
> resource's monitor, so if NFS is down, the monitor will still time out,
> but the cluster will not try to stop the resource. Of course, you then
> lose the ability to recover automatically from an actual resource
> failure.
>
> The only other thing I can think of probably wouldn't be reliable: you
> could put the NFS resource in a group with an ocf:pacemaker:attribute
> resource. That way, whenever NFS is started, a node attribute will be
> set, and whenever NFS is stopped, the attribute will be unset. Then
> you can set a rule using that attribute. For example, you could make the
> dependent resource's is-managed property depend on the node attribute's
> value. The reason I think it wouldn't be reliable is that if NFS
> failed, there would be some time between the NFS failure and the moment
> the cluster stops the NFS resource and updates the node attribute, and
> the dependent resource's monitor could run during that time. But it
> would at least diminish the problem space.
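[For the archives, a rough, untested sketch of what Ken describes might look
like this in CIB XML. All IDs and agent types here ("nfs-group", "nfs-up",
"dependent", nfsserver, Dummy) are placeholders, and the attribute RA's
parameter names as well as the rule semantics inside meta_attributes should be
verified against your Pacemaker version's documentation before use:]

```xml
<!-- Sketch only, untested. Group NFS with an ocf:pacemaker:attribute
     resource so that a node attribute ("nfs-up", a placeholder name)
     is set while NFS runs on a node and unset once it is stopped. -->
<group id="nfs-group">
  <primitive id="nfs" class="ocf" provider="heartbeat" type="nfsserver"/>
  <primitive id="nfs-up" class="ocf" provider="pacemaker" type="attribute">
    <instance_attributes id="nfs-up-params">
      <nvpair id="nfs-up-name" name="name" value="nfs-up"/>
      <nvpair id="nfs-up-active" name="active_value" value="1"/>
      <nvpair id="nfs-up-inactive" name="inactive_value" value="0"/>
    </instance_attributes>
  </primitive>
</group>

<!-- The dependent resource: a rule makes it managed only while the
     node attribute is "1"; a second meta_attributes block without a
     rule supplies the fallback value. Ken's simpler mitigation,
     on-fail=block on the monitor, is shown on the op line. -->
<primitive id="dependent" class="ocf" provider="heartbeat" type="Dummy">
  <meta_attributes id="dependent-managed">
    <rule id="dependent-managed-rule" score="1">
      <expression id="dependent-managed-expr"
                  attribute="nfs-up" operation="eq" value="1"/>
    </rule>
    <nvpair id="dependent-managed-nv" name="is-managed" value="true"/>
  </meta_attributes>
  <meta_attributes id="dependent-unmanaged" score="0">
    <nvpair id="dependent-unmanaged-nv" name="is-managed" value="false"/>
  </meta_attributes>
  <operations>
    <op id="dependent-monitor" name="monitor" interval="30s"
        on-fail="block"/>
  </operations>
</primitive>
```

[As Ken notes, even if this works as sketched, there is a window between the
NFS failure and the attribute being cleared during which the monitor can
still fire and hang.]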
Hi!

That sounds interesting, even though it's still a workaround and not a
solution to the original problem. Could you show a sketch of the mechanism:
how to set the attribute with the resource, and how to make the monitor
operation depend on it?

>
> Probably any dynamic solution would have a similar race condition --
> the NFS will be failed in reality for some amount of time before the
> cluster detects the failure, so the cluster could never prevent the
> monitor from running during that window.

I agree completely.

Regards,
Ulrich

>
>> And before you ask: No, I have not written the RA that has the
>> problem; a multi-million-dollar company wrote it. (Years before, I had
>> written a monitor for HP-UX's cluster that did not have this problem,
>> even though the configuration files were read from NFS. It's not
>> magic: just periodically copy them to shared memory, and read the
>> config from shared memory.)
>>
>> Regards,
>> Ulrich
> --
> Ken Gaillot <[email protected]>
> _______________________________________________
> Users mailing list: [email protected]
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
