On Tue, 2021-06-01 at 13:18 +0200, Ulrich Windl wrote:
> Hi!
>
> I can't answer, but I doubt the usefulness of "no-quorum-policy=stop":
> If nodes lose quorum, they try to stop all resources, but "remain" in the
> cluster (they will respond to network queries, if any arrive).
> If one of those "stop"s fails, the other part of the cluster never knows.
> So what can be done? Should the "other (left)" part of the cluster start
> resources, assuming the "other (right)" part of the cluster had stopped
> resources successfully?

no-quorum-policy only affects what the non-quorate partition will do.
The quorate partition will still fence the non-quorate part if it is
able, regardless of no-quorum-policy, and won't recover resources until
fencing succeeds.

>
> Regards,
> Ulrich
>
> >>> Lars Ellenberg <[email protected]> wrote on 01.06.2021 at 12:52 in
> message <canr6vz-rbs3bnujsxhqzrnmpje1u+nphqp+ejnjwnhdsczw...@mail.gmail.com>:
> > pcmk 2.0.5, corosync 3.1.0, knet, rhel8
> > I know fencing "solves" this just fine.
> >
> > what I'd like to understand though is: what exactly is corosync or
> > pacemaker waiting for here,
> > why does it not manage to get to the stage where it would even attempt
> > to "stop" stuff?
> >
> > two "rings" aka knet interfaces.
> > node isolation test with iptables,
> > INPUT/OUTPUT -j DROP on one interface, shortly after on the second as well.
> > node loses quorum (obviously).
> >
> > pacemaker is expected to no-quorum-policy=stop,
> > but is "stuck" in Election -> Integration,
> > while corosync "cycles" between "new membership" (with only itself,
> > obviously)
> > and "token has not been received in ...", "sync members ...", "new
> > membership has formed ..."
> >
> > I would have expected corosync to come back with a "stable non-quorate
> > membership" of just itself
> > within a very short period of time, and pacemaker winning the
> > "election"/"integration" with just itself,
> > and then trying to call "stop" on everything it knows about.

That's what I'd expect, too. I'm guessing the corosync cycling is what's
causing the pacemaker cycling, so I'd focus on corosync first.

> > I'm asking for hints what to look for in the logs, or how to drill
> > down further as to why that is not the case.
> >
> > Lars
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/

-- 
Ken Gaillot <[email protected]>
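The point that no-quorum-policy only governs the non-quorate side, while the quorate side fences and waits, can be sketched as a toy Python model. This is a hypothetical illustration, not Pacemaker or corosync code: the names `Partition`, `has_quorum`, `plan` and `NO_QUORUM_ACTIONS` are invented here, the vote count assumes a plain majority, and votequorum options such as two_node or wait_for_all are ignored.

```python
from dataclasses import dataclass

# Toy model only: all names are hypothetical, NOT Pacemaker/corosync APIs.
# Assumes plain majority voting; ignores votequorum options such as
# two_node, wait_for_all, last_man_standing and auto_tie_breaker.

# Loose summaries of a subset of Pacemaker's no-quorum-policy values.
NO_QUORUM_ACTIONS = {
    "stop": "stop all resources in this partition",
    "freeze": "keep running resources, but start nothing new",
    "ignore": "continue managing resources as if quorate",
    "suicide": "fence all nodes in this partition",
}


@dataclass
class Partition:
    name: str
    votes: int


def has_quorum(votes: int, expected_votes: int) -> bool:
    """Simple majority: more than half of the expected votes."""
    return votes >= expected_votes // 2 + 1


def plan(part: Partition, expected_votes: int, no_quorum_policy: str,
         peer_fenced: bool) -> str:
    """Return what `part` would do after a split, in this toy model."""
    if no_quorum_policy not in NO_QUORUM_ACTIONS:
        raise ValueError(f"policy not modeled here: {no_quorum_policy}")
    if not has_quorum(part.votes, expected_votes):
        # Only the non-quorate side consults no-quorum-policy.
        return NO_QUORUM_ACTIONS[no_quorum_policy]
    # The quorate side ignores no-quorum-policy: it fences the other side
    # first and recovers resources only once fencing has succeeded.
    if not peer_fenced:
        return "fence the non-quorate nodes; hold recovery"
    return "recover resources from the fenced nodes"


if __name__ == "__main__":
    expected = 3  # a 3-node cluster split 2 / 1
    left, right = Partition("left", 2), Partition("right", 1)
    for part in (left, right):
        print(part.name, "->", plan(part, expected, "stop", peer_fenced=False))
```

In this model, a failed "stop" on the right-hand side never changes what the left-hand side does, because the left waits for fencing to succeed rather than trusting the stop. That is the answer to the question of whether the left part should simply assume the right part stopped its resources.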
