>>> Zoran Bošnjak <[email protected]> wrote on 07.06.2022 at 10:26 in
message <[email protected]>:
> Hi, I need some help with a correct fencing configuration in a 5-node
> cluster.
>
> The specific issue is that there are 3 rooms, where in addition to the
> node failure scenario, each room can fail too (for example in case of a
> room power failure or room network failure).
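Very roughly, and untested from here (the device paths and the resource
name are placeholders, and if I remember correctly the Debian sbd defaults
live in /etc/default/sbd), such a three-device sbd setup could look like:

---
# initialize the three sbd "poison pill" devices, one per room
# (paths are placeholders; use stable /dev/disk/by-id/ names in practice)
sudo sbd -d /dev/disk/by-id/sbd-room0 \
         -d /dev/disk/by-id/sbd-roomA \
         -d /dev/disk/by-id/sbd-roomB create

# point the sbd daemon at the devices on every node
# (Debian: /etc/default/sbd)
# SBD_DEVICE="/dev/disk/by-id/sbd-room0;/dev/disk/by-id/sbd-roomA;/dev/disk/by-id/sbd-roomB"

# one stonith resource that can fence any node through the disks
sudo pcs stonith create fence_sbd_disks fence_sbd \
    devices="/dev/disk/by-id/sbd-room0,/dev/disk/by-id/sbd-roomA,/dev/disk/by-id/sbd-roomB"
---

The point of the disks is that a surviving partition can still fence a node
in a dead room even when neither that node's network nor its ipmi is
reachable, which is exactly the situation you simulated.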
>
> room0: [ node0 ]
> roomA: [ node1, node2 ]
> roomB: [ node3, node4 ]

First, it's good that even after a complete room fails, you will still have
quorum.

> - ipmi board is present on each node
> - watchdog timer is available
> - shared storage is not available

The last one sounds adventurous to me, but I'll read on...

> Please advise what would be a proper fencing configuration in this case.

sbd using shared storage ;-)

> The intention is to configure ipmi fencing (using the "fence_idrac" agent)
> plus the watchdog timer as a fallback. In other words, I would like to tell
> pacemaker: "If fencing is required, try to fence via ipmi. In case of ipmi
> fence failure, after some timeout assume the watchdog has rebooted the
> node, so it is safe to proceed as if the (self-)fencing had succeeded."

An interesting question would be how to reach any node in a room if that
room failed. A perfect solution would be to have shared storage in every
room and configure 3-way sbd disks. In addition you could use three-way
mirroring of your data, just to be paranoid ;-)

> From the documentation it is not clear to me whether this would be:
> a) multiple fencing, where ipmi would be the first level and sbd would be
>    a second level of fencing (where sbd always succeeds)
> b) or this is considered a single level of fencing with a timeout
>
> I have tried to follow option b) and create a stonith resource for each
> node and set the stonith-watchdog-timeout, like this:
>
> ---
> # for each node... [0..4]
> export name=...
> export ip=...
> export password=...
> sudo pcs stonith create "fence_ipmi_$name" fence_idrac \
>     lanplus=1 ip="$ip" \
>     username="admin" password="$password" \
>     pcmk_host_list="$name" op monitor interval=10m timeout=10s
>
> sudo pcs property set stonith-watchdog-timeout=20
>
> # start dummy resource
> sudo pcs resource create dummy ocf:heartbeat:Dummy op monitor interval=30s
> ---
>
> I am not sure if additional location constraints have to be specified for
> the stonith resources. For example: I have noticed that pacemaker will
> start a stonith resource on the same node as its fencing target. Is this
> OK?
>
> Should there be any location constraints regarding fencing and rooms?
>
> 'sbd' is running, properties are as follows:
>
> ---
> $ sudo pcs property show
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: debian
>  dc-version: 2.0.3-4b1f869f0f
>  have-watchdog: true
>  last-lrm-refresh: 1654583431
>  stonith-enabled: true
>  stonith-watchdog-timeout: 20
> ---
>
> Ipmi fencing (when the ipmi connection is alive) works correctly for each
> node. The watchdog timer also seems to be working correctly. The problem
> is that the dummy resource is not restarted as expected.

My favourite here is "crm_mon -1Arfj" ;-)

> In the test scenario, the dummy resource is currently running on node1. I
> have simulated node failure by unplugging the ipmi AND host network
> interfaces from node1. The result was that node1 gets rebooted (by the
> watchdog), but the rest of the pacemaker cluster was unable to fence node1
> (this is expected, since node1's ipmi is not accessible). The problem is
> that the dummy resource remains stopped and node1 stays unclean.

I was expecting that "unclean" means fencing is either in progress, or did
not succeed (like when you have no fencing at all).

> I was expecting that stonith-watchdog-timeout kicks in, so that the dummy
> resource gets restarted on some other node which has quorum.

So that actually does the fencing. Logs could be interesting to read, too.
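On your location constraint question: as far as I know it is harmless that
a stonith resource runs on its own fencing target. The placement mainly
decides where the device is monitored, and the cluster will not ask a node
to fence itself with that device. If you prefer to keep each device away
from its target anyway, an "avoids" constraint does it. And for the stopped
dummy resource, the fencing history and the logs should tell you why the
watchdog fallback did not kick in. Untested from here, resource and node
names as in your commands, option spellings from memory (check the man
pages):

---
# keep each ipmi fence device off the node it is meant to fence
# (repeat for node0..node4)
sudo pcs constraint location fence_ipmi_node1 avoids node1

# one-shot detailed cluster status
sudo crm_mon -1Arfj

# which fencing actions were attempted, and how they ended
sudo stonith_admin --history '*' --verbose

# logs around the failed fencing attempt
sudo journalctl -u pacemaker -u corosync --since "2 hours ago"

# last resort, ONLY if you are certain node1 is really down:
# tell the cluster the fencing is done, so resources can be recovered
# sudo pcs stonith confirm node1
---

If the watchdog fallback works as intended, the history should show the
failed ipmi attempts and then, after stonith-watchdog-timeout, the node
being treated as fenced.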
> Obviously there is something wrong with my configuration, since this
> seems to be a reasonably simple scenario for pacemaker. Appreciate your
> help.

See above.

Regards,
Ulrich

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
