Ciao,

I'm among the people that have to deal with the infamous two-node problem (http://www.beekhof.net/blog/2018/two-node-problems).

I am not sure whether to open a bug for this, so I'm reporting it on the list first, in the hope of getting fast feedback.

Problem statement

I have a cluster made up of two nodes with a shared DRBD partition to which some resources (systemd services) have to stick.

Software versions

  • corosync -v
    Corosync Cluster Engine, version '2.4.5'
    Copyright (c) 2006-2009 Red Hat, Inc.
  • pacemakerd --version
    Pacemaker 1.1.21-4.el7
  • drbdadm --version
    DRBDADM_BUILDTAG=GIT-hash:\ fb98589a8e76783d2c56155c645dbaf02ac7ece7\ build\ by\ mockbuild@\,\ 2020-04-05\ 03:21:05
    DRBDADM_API_VERSION=2
    DRBD_KERNEL_VERSION_CODE=0x090010
    DRBD_KERNEL_VERSION=9.0.16
    DRBDADM_VERSION_CODE=0x090c02
    DRBDADM_VERSION=9.12.2

corosync.conf (nodelist and quorum sections):

nodelist {
    node {
        ring0_addr: 10.1.3.1
        nodeid: 1
    }
    node {
        ring0_addr: 10.1.3.2
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}

DRBD resource config:

resource myresource {

  volume 0 {
    device    /dev/drbd0;
    disk      /dev/mapper/vg0-res--etc;
    meta-disk internal;
  }

  on 123z555666y0 {
    node-id 0;
    address 10.1.3.1:7789;
  }

  on 123z555666y1 {
    node-id 1;
    address 10.1.3.2:7789;
  }

  connection {
    host 123z555666y0;
    host 123z555666y1;
  }

  handlers {
    before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh";
    after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";
  }

}
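
As far as I understand, drbdadm matches the "on <name>" sections against the local hostname (uname -n), which is why this file has to be touched as part of the rename described below. Using the example names from further down, the relevant sections would end up looking like this (a sketch only, addresses and node-ids unchanged):

  on hostname20 {
    node-id 0;
    address 10.1.3.1:7789;
  }

  on hostname21 {
    node-id 1;
    address 10.1.3.2:7789;
  }

  connection {
    host hostname20;
    host hostname21;
  }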

I need to reconfigure the hostnames of both nodes of the cluster. I've gathered some literature around this, but have not yet found a way to address it (other than a simultaneous reboot of both nodes).

The procedure (condensed into a single shell transcript after this list):

  • Update the hostname on both Master and Slave nodes
    • update /etc/hostname
    • update /etc/hosts
    • update system with hostname -F /etc/hostname
  • Reconfigure drbd on Master and Slave nodes
    • modify drbd.01.conf (attached) to reflect new hostname
    • invoke drbdadm adjust all
  • Update pacemaker config on Master node only
    • crm configure property maintenance-mode=true
    • crm configure delete --force 1
    • crm configure delete --force 2
    • crm configure xml ' <node id="1" uname="newhostname0">
              <instance_attributes id="node-1">
                <nvpair id="node-1-standby" name="standby" value="off"/>
              </instance_attributes>
            </node>'
    • crm configure xml ' <node id="2" uname="newhostname1">
              <instance_attributes id="node-2">
                <nvpair id="node-2-standby" name="standby" value="off"/>
              </instance_attributes>
            </node>'
    • crm resource reprobe
    • crm configure refresh
    • crm configure property maintenance-mode=false
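
Put together, the above as a condensed shell transcript (a sketch only: the sed one-liners and the bare drbd.01.conf path stand in for the manual edits, and the hostnames are the example ones used below):

# On BOTH nodes (shown for the node going hostname10 -> hostname20;
# the other node does hostname11 -> hostname21 accordingly)
echo hostname20 > /etc/hostname
sed -i 's/hostname10/hostname20/g; s/hostname11/hostname21/g' /etc/hosts
hostname -F /etc/hostname

# On BOTH nodes: make the DRBD "on"/"host" entries carry the new names, then re-read the config
sed -i 's/hostname10/hostname20/g; s/hostname11/hostname21/g' drbd.01.conf
drbdadm adjust all

# On the Master node only
crm configure property maintenance-mode=true
crm configure delete --force 1
crm configure delete --force 2
crm configure xml '<node id="1" uname="hostname20"><instance_attributes id="node-1"><nvpair id="node-1-standby" name="standby" value="off"/></instance_attributes></node>'
crm configure xml '<node id="2" uname="hostname21"><instance_attributes id="node-2"><nvpair id="node-2-standby" name="standby" value="off"/></instance_attributes></node>'
crm resource reprobe
crm configure refresh
crm configure property maintenance-mode=false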

Let's say, for example, that I migrate the hostnames like this:

hostname10 -> hostname20
hostname11 -> hostname21

After the above procedure has completed, the cluster is correctly reconfigured: when I check with crm_mon, crm status, crm configure show xml, or even by inspecting cib.xml, I find the proper new hostnames picked up by pacemaker/corosync (hostname20 and hostname21).
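
For reference, these are the checks (the cibadmin query is just an equivalent way of reading the nodes section of the CIB):

crm_mon -1
crm status
crm configure show xml | grep uname
cibadmin --query --scope nodes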

The documentation reports that the pacemaker node name is taken from:

  1. corosync.conf nodelist->ring0_addr, if it is not an IP address: NOT MY CASE => skip
  2. corosync.conf nodelist->name, if available: NOT MY CASE => skip (see the sketch after this list)
  3. uname -n [MY CASE SHOULD FALL HERE]
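
For reference, rule 2 would only kick in with explicit name: entries in the nodelist, i.e. something like this (a sketch, not my current config):

nodelist {
    node {
        ring0_addr: 10.1.3.1
        name: hostname20
        nodeid: 1
    }
    node {
        ring0_addr: 10.1.3.2
        name: hostname21
        nodeid: 2
    }
}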

Apparently case number 3 does not apply:

[root@hostname20 ~]# crm_node -n
hostname10
[root@hostname20 ~]# uname -n
hostname20

This becomes evident as soon as I reboot/power off one of the two nodes: crm_mon, which after the reconfiguration was correctly showing

Online: [ hostname21 hostname20 ]

"rolls back" the configuration without any notice and starts showing the old one

Online: [ hostname10 ]
OFFLINE: [ hostname11 ]

Do you have any idea where on earth pacemaker is recovering the old hostnames from?

I've even checked the code and can see that there are cmaps involved, so I suspect some caching issue is at play.

It looks like it is retaining the old hostnames in memory and, when something "fails", it restores them.
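
A quick way to compare what the different layers believe (corosync-cmapctl dumps the whole cmap, so grepping the node-related keys is enough):

corosync-cmapctl | grep -i node    # corosync's view (nodelist and runtime members)
crm_node -l                        # the nodes pacemaker knows about
crm_node -n                        # the local node name pacemaker resolves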

Besides, don't blame me for this use case (reconfiguring hostnames in a two-node cluster); I didn't make it up, I just carry the pain.

R





Riccardo Manfrin
R&D DEPARTMENT
t +39 (0)444 750045
e [email protected]
ATHONET | Via Cà del Luogo, 6/8 - 36050 Bolzano Vicentino (VI) Italy