Andrei Borzenkov wrote on 05/04/2023 08:36:
> On Fri, Mar 31, 2023 at 12:42 AM Ronny Adsetts
> <[email protected]> wrote:
>>
>> Hi,
>>
>> I wonder if someone more familiar with the workings of pacemaker/corosync
>> would be able to assist in solving an issue.
>>
>> I have a 3-node NFS cluster which exports several iSCSI LUNs. The LUNs are
>> presented to the nodes via multipathd.
>>
>> This all works fine except that I can't stop just one export. Sometimes I
>> need to take a single filesystem offline for maintenance for example. Or if
>> there's an issue and a filesystem goes offline and can't come back.
>>
>> There's a trimmed down config below but essentially I want all the NFS
>> exports on one node but I don't want any of the exports to block. So it's OK
>> to stop (or fail) a single export.
>>
>> My config has a group for each export and filesystem and another group for
>> the NFS server and VIP. I then co-locate them together.
>>
>> Cut-down config to limit the number of exports:
>>
>> node 1: nfs-01
>> node 2: nfs-02
>> node 3: nfs-03
>> primitive NFSExportAdminHomes exportfs \
>>     params clientspec="172.16.40.0/24" options="rw,async,no_root_squash"
>>     directory="/srv/adminhomes" fsid=dcfd1bbb-c026-4d6d-8541-7fc29d6fef1a \
>>     op monitor timeout=20 interval=10 \
>>     op_params interval=10
>> primitive NFSExportArchive exportfs \
>>     params clientspec="172.16.40.0/24" options="rw,async,no_root_squash"
>>     directory="/srv/archive" fsid=3abb6e34-bff2-4896-b8ff-fc1123517359 \
>>     op monitor timeout=20 interval=10 \
>>     op_params interval=10 \
>>     meta target-role=Started
>> primitive NFSExportDBBackups exportfs \
>>     params clientspec="172.16.40.0/24" options="rw,async,no_root_squash"
>>     directory="/srv/dbbackups" fsid=df58b9c0-593b-45c0-9923-155b3d7d9483 \
>>     op monitor timeout=20 interval=10 \
>>     op_params interval=10
>> primitive NFSFSAdminHomes Filesystem \
>>     params device="/dev/mapper/adminhomes-part1"
>>     directory="/srv/adminhomes" fstype=xfs \
>>     op start interval=0 timeout=120 \
>>     op monitor interval=60 timeout=60 \
>>     op_params OCF_CHECK_LEVEL=20 \
>>     op stop interval=0 timeout=240
>> primitive NFSFSArchive Filesystem \
>>     params device="/dev/mapper/archive-part1" directory="/srv/archive"
>>     fstype=xfs \
>>     op start interval=0 timeout=120 \
>>     op monitor interval=60 timeout=60 \
>>     op_params OCF_CHECK_LEVEL=20 \
>>     op stop interval=0 timeout=240 \
>>     meta target-role=Started
>> primitive NFSFSDBBackups Filesystem \
>>     params device="/dev/mapper/dbbackups-part1"
>>     directory="/srv/dbbackups" fstype=xfs \
>>     op start timeout=60 interval=0 \
>>     op monitor interval=20 timeout=40 \
>>     op stop timeout=60 interval=0 \
>>     op_params OCF_CHECK_LEVEL=20
>> primitive NFSIP-01 IPaddr2 \
>>     params ip=172.16.40.17 cidr_netmask=24 nic=ens14 \
>>     op monitor interval=30s
>> group AdminHomes NFSFSAdminHomes NFSExportAdminHomes \
>>     meta target-role=Started
>> group Archive NFSFSArchive NFSExportArchive \
>>     meta target-role=Started
>> group DBBackups NFSFSDBBackups NFSExportDBBackups \
>>     meta target-role=Started
>> group NFSServerIP NFSIP-01 NFSServer \
>>     meta target-role=Started
>> colocation NFSMaster inf: NFSServerIP AdminHomes Archive DBBackups
>
> This is entirely equivalent to defining a group and says that
> resources must be started in strict order on the same node. Like with
> a group, if an earlier resource cannot be started, all following
> resources are not started either.
>
>> property cib-bootstrap-options: \
>>     have-watchdog=false \
>>     dc-version=2.0.1-9e909a5bdd \
>>     cluster-infrastructure=corosync \
>>     cluster-name=nfs-cluster \
>>     stonith-enabled=false \
>>     last-lrm-refresh=1675344768
>> rsc_defaults rsc-options: \
>>     resource-stickiness=200
>>
>>
>> The problem is that if one export fails, none of the following exports will
>> be attempted. Reading the docs, that's to be expected as each item in the
>> colocation needs the preceding item to succeed.
>>
>> I tried changing the colocation line like so to remove the dependency:
>>
>> colocation NFSMaster inf: NFSServerIP ( AdminHomes Archive DBBackups )
>>
>
> 1. The ( AdminHomes Archive DBBackups ) creates a set with
> sequential=false. Now, the documentation for "sequential" is one of
> the most obscure I have seen, but judging by "the individual members
> within any one set may or may not be colocated relative to each other
> (determined by the set’s sequential property)" and "A colocated set
> with sequential="false" makes sense only if there is another set in
> the constraint. Otherwise, the constraint has no effect" members of a
> set with sequential=false are not colocated on the same node.
>
> 2. The condition is backward. You colocate NFSServerIP *with* set (
> AdminHomes Archive DBBackups ), while you actually want to colocate
> set ( AdminHomes Archive DBBackups ) *with* NFSServerIP.
>
> So the
>
> colocation NFSMaster inf: ( AdminHomes Archive DBBackups ) ( NFSServerIP )
>
> may work.
>
> The pacemaker behavior is rather puzzling though. According to
> documentation "in order for any member of one set in the constraint to
> be active, all members of sets listed after it must also be active
> (and naturally on the same node)", but in your case members of set are
> on the same node which would imply that NFSServerIP (which is a sole
> member of an implicit set) should not be active.

Thanks for the explainer here, that's really useful. I don't spend lots of time tinkering with pacemaker as it's only a tiny part of what I do so I do suffer from lack of in-depth knowledge which can be both painful and annoying. :-). This particular issue only came to the fore and therefore became more urgent to solve when one of the LUNs failed to mount.

> Anyway, an alternative is to define separate colocation for each group
> which likely makes configuration more clear

Yes, this seems the sensible way forward. I'll reconfigure and give it a go. I've no idea why I did it the way I did - it was a couple of years ago now. Probably some aversion to having NFSServerIP listed in multiple colocation lines.

Ronny
-- 
Ronny Adsetts
Technical Director
Amazing Internet Ltd, London
t: +44 20 8977 8943
w: www.amazinginternet.com

Registered office: 85 Waldegrave Park, Twickenham, TW1 4TJ
Registered in England. Company No. 4042957
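
For reference, the "separate colocation for each group" approach might look roughly like the following in crm shell syntax. This is an untested sketch that would replace the single NFSMaster constraint; the constraint IDs (AdminHomesWithNFS and so on) are made up for illustration, and only the group names come from the config quoted above. With two resources in a crmsh colocation, the first is placed relative to the second, so each export group follows NFSServerIP without depending on the other export groups.

```
# Hypothetical constraint IDs; group names are taken from the quoted config.
# Each export group is colocated *with* NFSServerIP, independently of the others.
colocation AdminHomesWithNFS inf: AdminHomes NFSServerIP
colocation ArchiveWithNFS inf: Archive NFSServerIP
colocation DBBackupsWithNFS inf: DBBackups NFSServerIP
```

If this behaves as the documentation suggests, stopping or failing one export group should leave the other groups and NFSServerIP running. Ordering constraints are not covered here; the original config did not define any.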
