On Thu, 2020-10-01 at 10:40 +0200, Riccardo Manfrin wrote:
> Ciao,
>
> I'm among the people who have to deal with the infamous two-node
> problem (http://www.beekhof.net/blog/2018/two-node-problems).
> I am not sure whether to open a bug for this, so I'm first reporting
> on the list, in the hope of getting fast feedback.
>
> Problem statement
>
> I have a cluster made of two nodes with a DRBD shared partition to
> which some resources (systemd services) have to stick.
>
> Software versions
>
> corosync -v
> Corosync Cluster Engine, version '2.4.5'
> Copyright (c) 2006-2009 Red Hat, Inc.
>
> pacemakerd --version
> Pacemaker 1.1.21-4.el7
>
> drbdadm --version
> DRBDADM_BUILDTAG=GIT-hash:\ fb98589a8e76783d2c56155c645dbaf02ac7ece7\ build\ by\ mockbuild@\,\ 2020-04-05\ 03:21:05
> DRBDADM_API_VERSION=2
> DRBD_KERNEL_VERSION_CODE=0x090010
> DRBD_KERNEL_VERSION=9.0.16
> DRBDADM_VERSION_CODE=0x090c02
> DRBDADM_VERSION=9.12.2
>
> corosync.conf nodes:
>
> nodelist {
>     node {
>         ring0_addr: 10.1.3.1
>         nodeid: 1
>     }
>     node {
>         ring0_addr: 10.1.3.2
>         nodeid: 2
>     }
> }
> quorum {
>     provider: corosync_votequorum
>     two_node: 1
> }
>
> drbd nodes config:
>
> resource myresource {
>
>     volume 0 {
>         device /dev/drbd0;
>         disk /dev/mapper/vg0-res--etc;
>         meta-disk internal;
>     }
>
>     on 123z555666y0 {
>         node-id 0;
>         address 10.1.3.1:7789;
>     }
>
>     on 123z555666y1 {
>         node-id 1;
>         address 10.1.3.2:7789;
>     }
>
>     connection {
>         host 123z555666y0;
>         host 123z555666y1;
>     }
>
>     handlers {
>         before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh";
>         after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";
>     }
>
> }
>
> I need to reconfigure the hostname of both nodes of the cluster.
> I've gathered some literature around:
>
> https://pacemaker.oss.clusterlabs.narkive.com/csHZkR5R/change-hostname
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-node-name.html
> https://www.suse.com/support/kb/doc/?id=000018878 <- DIDN'T WORK
> https://bugs.clusterlabs.org/show_bug.cgi?id=5265 <- DIDN'T WORK
>
> but have not yet found a way to address this (other than a
> simultaneous reboot of both nodes).
>
> The procedure:
>
> Update the hostname on both Master and Slave nodes
>     update /etc/hostname
>     update /etc/hosts
>     update the system with hostname -F /etc/hostname
>
> Reconfigure drbd on Master and Slave nodes
>     modify drbd.01.conf (attached) to reflect the new hostname
>     invoke drbdadm adjust all
>
> Update pacemaker config on Master node only
>     crm configure property maintenance-mode=true
>     crm configure delete --force 1
>     crm configure delete --force 2
>     crm configure xml '<node id="1" uname="newhostname0">
>         <instance_attributes id="node-1">
>             <nvpair id="node-1-standby" name="standby" value="off"/>
>         </instance_attributes>
>     </node>'
>     crm configure xml '<node id="2" uname="newhostname1">
>         <instance_attributes id="node-2">
>             <nvpair id="node-2-standby" name="standby" value="off"/>
>         </instance_attributes>
>     </node>'
>     crm resource reprobe
>     crm configure refresh
>     crm configure property maintenance-mode=false
>
> Let's say, for example, that I migrate the hostnames like this:
>     hostname10 -> hostname20
>     hostname11 -> hostname21
>
> After the above procedure is concluded, the cluster is correctly
> reconfigured, and when I check with crm_mon, crm status, crm configure
> show xml, or even by inspecting the cib.xml, I find the proper new
> hostnames picked up by pacemaker/corosync (hostname20 and hostname21).
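As a quick cross-check after a rename, the following compares what each
layer currently reports for the local node. This is only a rough sketch,
assuming corosync 2.x (cmap) and the stock pacemaker CLI tools, and uses
the example names from the migration above; on EL7 the rename itself is
better done with hostnamectl (as noted further down) rather than
hostname -F:

    # rename this node (first node in the example; hostname21 on the second)
    hostnamectl set-hostname hostname20

    # then compare what each layer reports
    uname -n                           # kernel hostname
    crm_node -n                        # node name as pacemaker sees this node
    crm_node -l                        # node IDs and names known to the cluster
    corosync-cmapctl | grep nodelist   # nodelist as currently held in corosync's cmap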
> The documentation reports that the pacemaker node name is taken from:
>
> 1. corosync.conf nodelist->ring0_addr, if not an IP address: NOT MY CASE => skip
> 2. corosync.conf nodelist->name, if available: NOT MY CASE => skip
> 3. uname -n [SHOULD BE IN HERE]
>
> Apparently case number 3 does not apply:
>
> [root@hostname20 ~]# crm_node -n
> hostname10
> [root@hostname20 ~]# uname -n
> hostname20
>
> This becomes evident as soon as I reboot/poweroff one of the two
> nodes: crm_mon, which after the reconfiguration was correctly showing
>
> Online: [ hostname21 hostname20 ]
>
> "rolls back" the configuration without any notice and starts showing
> the old one:
>
> Online: [ hostname10 ]
> OFFLINE: [ hostname11 ]
>
> Do you have any idea of where on earth pacemaker is recovering the
> old hostnames from?
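Regarding where the old names might be coming back from: besides
corosync's cmap, pacemaker's daemons keep an in-memory cache of node
names and IDs, and crm_node can drop stale entries by name. A possible
cleanup step once the old names are gone from corosync.conf and the CIB
(just a sketch; exact crm_node options differ a bit across versions):

    # run on one node after the rename has been completed everywhere
    crm_node -R hostname10 --force   # remove the cached entry for the old name
    crm_node -R hostname11 --force
    crm_node -l                      # check that only hostname20/hostname21 remain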
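Separately, one way to take uname -n out of the picture for future
renames is to give each node an explicit name: in the corosync.conf
nodelist, which pacemaker uses in preference to the kernel hostname. A
sketch with the example names above (corosync on both nodes has to be
restarted, or its configuration reloaded, for the change to take effect):

    nodelist {
        node {
            ring0_addr: 10.1.3.1
            name: hostname20    # explicit node name, used by pacemaker
            nodeid: 1
        }
        node {
            ring0_addr: 10.1.3.2
            name: hostname21
            nodeid: 2
        }
    }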
Does "uname -n" also revert?

It looks like you're using RHEL 7 or a derivative -- if so, use
hostnamectl to change the host name. That will make sure it's updated in
the right places.

> I've even checked the code and see that there are cmaps involved, so I
> suspect there's some caching issue involved in this.
> It looks like it is retaining the old hostnames in memory, and when
> something... "fails", it restores them.
>
> Besides, don't blame me for this use case (reconfiguring hostnames in a
> two-node cluster), as I didn't make it up. I just carry the pain.
>
> R
>
> Riccardo Manfrin
> R&D DEPARTMENT
> t +39 (0)444 750045
> e [email protected]
> ATHONET | Via Cà del Luogo, 6/8 - 36050 Bolzano Vicentino (VI) Italy
--
Ken Gaillot <[email protected]>

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
