On Sun, 2020-08-09 at 21:11 +0200, Adam Cécile wrote:
> Hello,
>
> I'm experiencing an issue with corosync/pacemaker running on Debian
> Buster. The cluster has three nodes running in VMWare virtual
> machines, and the cluster fails when VEEAM backs up a virtual machine
> (I know it's doing bad things, like freezing the VM completely for a
> few minutes to make a disk snapshot).
>
> My biggest issue is that once the backup has completed, the cluster
> stays in a split-brain state, and I'd like it to heal itself. Here
Fencing is how the cluster prevents split-brain. When one node is lost,
the other nodes will not recover any resources from it until it has
been fenced. For VMWare there's a fence_vmware_soap fence agent.
However, that's intended for failure scenarios, not a planned outage
like a backup snapshot.

For planned outages, you can set the cluster-wide property
"maintenance-mode" to true. The cluster won't start, monitor, or stop
resources while in maintenance mode. You can use rules to put the
cluster in maintenance mode automatically at specific times. However, I
believe that even in maintenance mode, the node will get fenced if it
drops out of the corosync membership.

Ideally, you'd put the cluster in maintenance mode, stop pacemaker and
corosync on the node, do the backup, then start pacemaker and corosync,
wait for them to come up, and take the cluster out of maintenance mode.

Alternatively, if you want the resources to move to other nodes while
the backup is being done, you could put the node in standby rather than
set maintenance mode.

> current status:
>
> One node is isolated:
>
> Stack: corosync
> Current DC: host2.domain.com (version 2.0.1-9e909a5bdd) - partition
> WITHOUT quorum
> Last updated: Sat Aug  8 11:59:46 2020
> Last change: Fri Jul 24 07:18:12 2020 by root via cibadmin on
> host1.domain.com
>
> 3 nodes configured
> 6 resources configured
>
> Online: [ host2.domain.com ]
> OFFLINE: [ host3.domain.com host1.domain.com ]
>
> The other two see each other:
>
> Stack: corosync
> Current DC: host3.domain.com (version 2.0.1-9e909a5bdd) - partition
> with quorum
> Last updated: Sat Aug  8 12:07:56 2020
> Last change: Fri Jul 24 07:18:12 2020 by root via cibadmin on
> host1.domain.com
>
> 3 nodes configured
> 6 resources configured
>
> Online: [ host3.domain.com host1.domain.com ]
> OFFLINE: [ host2.domain.com ]
>
> The problem is that one of the resources is a floating IP address
> which is currently assigned to two different hosts...
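For the fencing side, a fence_vmware_soap device could be set up along
these lines. This is an untested sketch -- the vCenter address,
credentials, and the VM names in pcmk_host_map are placeholders you'd
replace with your own, and the VM names must match what vCenter reports:

```shell
# Placeholder values throughout: substitute your vCenter host,
# a vCenter account allowed to power VMs on/off, and the VM names.
pcs stonith create vmfence fence_vmware_soap \
    ip=vcenter.example.com ssl=1 ssl_insecure=1 \
    username=fenceuser password=secret \
    pcmk_host_map="host1.domain.com:host1-vm;host2.domain.com:host2-vm;host3.domain.com:host3-vm"

# Verify the agent can reach vCenter and see the VMs before relying on it:
fence_vmware_soap -a vcenter.example.com -l fenceuser -p secret -z -o list
```

With working fencing, the partition without quorum would have its node
fenced instead of keeping the floating IP up alongside the other copy.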
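To make the planned-outage procedure concrete, it might look like this
with pcs (untested sketch; run the systemctl commands on the node being
snapshotted):

```shell
# 1. Tell the cluster to stop managing resources anywhere
pcs property set maintenance-mode=true

# 2. On the node about to be snapshotted, leave the membership cleanly
systemctl stop pacemaker corosync

# 3. ... VEEAM snapshot/backup of the VM runs here ...

# 4. Rejoin the cluster
systemctl start corosync pacemaker

# 5. Check that the node shows as online again before proceeding
crm_mon -1

# 6. Resume normal resource management
pcs property set maintenance-mode=false
```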
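And the standby alternative (again untested; with the pcs shipped in
Buster it's "pcs node standby", older releases used "pcs cluster
standby" instead):

```shell
pcs node standby host2.domain.com     # resources move off the node
# ... run the backup ...
pcs node unstandby host2.domain.com   # node can host resources again
```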
> Can you help me configure the cluster correctly so this cannot
> occur?
>
> Thanks in advance,
> Adam.
>
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot <[email protected]>

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
