Ciao,

I'm among the people who have to deal with the infamous two-node
problem (http://www.beekhof.net/blog/2018/two-node-problems).

I am not sure whether to open a bug for this, so I'm first reporting it
on the list, in the hope of getting quick feedback.

Problem statement

I have a cluster made up of two nodes with a DRBD-replicated partition
to which some resources (systemd services) have to stick.
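For context, the stickiness is the usual DRBD master/slave colocation
pattern; a minimal crm sketch of what I mean (resource and service
names here are made up):

    primitive p_drbd ocf:linbit:drbd \
        params drbd_resource=myresource
    ms ms_drbd p_drbd \
        meta master-max=1 clone-max=2 notify=true
    primitive p_service systemd:myservice
    colocation col_service_with_drbd inf: p_service ms_drbd:Master
    order ord_drbd_before_service inf: ms_drbd:promote p_service:start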

Software versions

    corosync -v
    Corosync Cluster Engine, version '2.4.5'
    Copyright (c) 2006-2009 Red Hat, Inc.

    pacemakerd --version
    Pacemaker 1.1.21-4.el7

    drbdadm --version
    DRBDADM_BUILDTAG=GIT-hash:\
fb98589a8e76783d2c56155c645dbaf02ac7ece7\ build\ by\ mockbuild@\,\
2020-04-05\ 03:21:05
    DRBDADM_API_VERSION=2
    DRBD_KERNEL_VERSION_CODE=0x090010
    DRBD_KERNEL_VERSION=9.0.16
    DRBDADM_VERSION_CODE=0x090c02
    DRBDADM_VERSION=9.12.2

corosync.conf nodes:

nodelist {
    node {
        ring0_addr: 10.1.3.1
        nodeid: 1
    }
    node {
        ring0_addr: 10.1.3.2
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}

drbd nodes config:

resource myresource {

  volume 0 {
    device    /dev/drbd0;
    disk      /dev/mapper/vg0-res--etc;
    meta-disk internal;
  }

  on 123z555666y0 {
    node-id 0;
    address 10.1.3.1:7789;
  }

  on 123z555666y1 {
    node-id 1;
    address 10.1.3.2:7789;
  }

  connection {
    host 123z555666y0;
    host 123z555666y1;
  }

  handlers {
    before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh";
    after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";
  }

}

I need to reconfigure the hostname of both the nodes of the cluster.
I've gathered some literature around

    https://pacemaker.oss.clusterlabs.narkive.com/csHZkR5R/change-hostname

https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-node-name.html
    https://www.suse.com/support/kb/doc/?id=000018878 <- DIDN'T WORK
    https://bugs.clusterlabs.org/show_bug.cgi?id=5265 <- DIDN'T WORK

but I have not yet found a way to address this (other than a
simultaneous reboot of both nodes).

The procedure:

    Update the hostname on both Master and Slave nodes
        update /etc/hostname
        update /etc/hosts
        update system with hostname -F /etc/hostname
    Reconfigure drbd on Master and Slave nodes
        modify drbd.01.conf (attached) to reflect new hostname
        invoke drbdadm adjust all
    Update pacemaker config on Master node only
        crm configure property maintenance-mode=true
        crm configure delete --force 1
        crm configure delete --force 2
        crm configure xml '<node id="1" uname="newhostname0">
                <instance_attributes id="node-1">
                  <nvpair id="node-1-standby" name="standby" value="off"/>
                </instance_attributes>
              </node>'
        crm configure xml '<node id="2" uname="newhostname1">
                <instance_attributes id="node-2">
                  <nvpair id="node-2-standby" name="standby" value="off"/>
                </instance_attributes>
              </node>'
        crm resource reprobe
        crm configure refresh
        crm configure property maintenance-mode=false
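
For reference, steps 1 and 2 condense to roughly the following on each
node (the old/new names and the drbd config path are just examples from
my setup):

    OLD=hostname10 NEW=hostname20          # example names
    echo "$NEW" > /etc/hostname
    sed -i "s/$OLD/$NEW/g" /etc/hosts
    hostname -F /etc/hostname
    sed -i "s/$OLD/$NEW/g" /etc/drbd.d/drbd.01.conf
    drbdadm adjust all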

Let's say, for example, that I migrate the hostnames like this:

hostname10 -> hostname20
hostname11 -> hostname21

After the above procedure has completed, the cluster is correctly
reconfigured: when I check with crm_mon, crm status, crm configure
show xml, or even by inspecting cib.xml, I find the new hostnames
picked up by pacemaker/corosync (hostname20 and hostname21).
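
Concretely, these are the checks I run (the --scope option is standard
cibadmin usage):

    crm_mon -1 | grep -i online
    cibadmin --query --scope nodes | grep uname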

The documentation reports that the pacemaker node name is taken from:

    1. corosync.conf nodelist->ring0_addr, if not an IP address: NOT MY
       CASE => skip
    2. corosync.conf nodelist->name, if available: NOT MY CASE => skip
    3. uname -n [SHOULD BE IN HERE]

Apparently case number 3 does not apply:

[root@hostname20 ~]# crm_node -n
hostname10
[root@hostname20 ~]# uname -n
hostname20

This becomes evident as soon as I reboot/poweroff one of the two nodes:
crm_mon, which after the reconfiguration was correctly showing

Online: [ hostname21 hostname20 ]

"rolls back" the configuration without any notice and starts showing the
old one

Online: [ hostname10 ]
OFFLINE: [ hostname11 ]

Do you have any idea where on earth pacemaker is recovering the old
hostnames from?

I've even checked the code and seen that cmaps are involved, so I
suspect there is some caching issue at play.
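
If it helps, this is how I inspect the runtime values (standard
corosync-cmapctl / crm_node usage):

    # nodelist keys as corosync currently holds them in its cmap
    corosync-cmapctl | grep nodelist
    # what pacemaker thinks the local node is called, vs. the kernel
    crm_node -n
    uname -n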

It looks like it retains the old hostnames in memory and restores them
when something... "fails".

And please don't blame me for this use case (renaming the hosts of a
two-node cluster); I didn't make it up, I just carry the pain.

R
________________________________

Riccardo Manfrin
R&D DEPARTMENT
Web<https://www.athonet.com/> | 
LinkedIn<https://www.linkedin.com/company/athonet/>     t +39 (0)444 750045
e [email protected]<mailto:[email protected]>
ATHONET | Via Cà del Luogo, 6/8 - 36050 Bolzano Vicentino (VI) Italy
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/