Hi Ken, Thanks for the explanation.
As additional information, we are using a daemon (*1) that registers Corosync's ring status as node attributes, so I want to avoid situations in which attributes are not displayed.

*1 This is ifcheckd, a daemon that is always running (it is not a resource) and registers attributes while Pacemaker is running.
   ( https://github.com/linux-ha-japan/pm_extras/tree/master/tools )

Attribute example:

Node Attributes:
* Node rhel73-1:
    + ringnumber_0 : 192.168.101.131 is UP
    + ringnumber_1 : 192.168.102.131 is UP
* Node rhel73-2:
    + ringnumber_0 : 192.168.101.132 is UP
    + ringnumber_1 : 192.168.102.132 is UP

Regards,
Kazunori INOUE

> -----Original Message-----
> From: Ken Gaillot [mailto:[email protected]]
> Sent: Tuesday, August 15, 2017 2:42 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] Updated attribute is not displayed in crm_mon
>
> On Mon, 2017-08-14 at 12:33 -0500, Ken Gaillot wrote:
> > On Wed, 2017-08-02 at 09:59 +0000, 井上 和徳 wrote:
> > > Hi,
> > >
> > > In Pacemaker-1.1.17, the attribute updated while starting pacemaker is
> > > not displayed in crm_mon.
> > > In Pacemaker-1.1.16, it is displayed and results are different.
> > >
> > > https://github.com/ClusterLabs/pacemaker/commit/fe44f400a3116a158ab331a92a49a4ad8937170d
> > > This commit is the cause, but the following result (3.) is expected
> > > behavior?
> >
> > This turned out to be an odd one. The sequence of events is:
> >
> > 1. When the node leaves the cluster, the DC (correctly) wipes all its
> > transient attributes from attrd and the CIB.
> >
> > 2. Pacemaker is newly started on the node, and a transient attribute is
> > set before the node joins the cluster.
> >
> > 3. The node joins the cluster, and its transient attributes (including
> > the new value) are sync'ed with the rest of the cluster, in both attrd
> > and the CIB. So far, so good.
> >
> > 4. Because this is the node's first join since its crmd started, its
> > crmd wipes all of its transient attributes again.
> > The idea is that the
> > node may have restarted so quickly that the DC hasn't yet done it (step
> > 1 here), so clear them now to avoid any problems with old values.
> > However, the crmd wipes only the CIB -- not attrd (arguably a bug).
>
> Whoops, clarification: the node may have restarted so quickly that
> corosync didn't notice it left, so the DC would never have gotten the
> "peer lost" message that triggers wiping its transient attributes.
>
> I suspect the crmd wipes only the CIB in this case because we assumed
> attrd would be empty at this point -- missing exactly this case where a
> value was set between start-up and first join.
>
> > 5. With the older pacemaker version, both the joining node and the DC
> > would request a full write-out of all values from attrd. Because step 4
> > only wiped the CIB, this ends up restoring the new value. With the newer
> > pacemaker version, this step is no longer done, so the value winds up
> > staying in attrd but not in CIB (until the next write-out naturally
> > occurs).
> >
> > I don't have a solution yet, but step 4 is clearly the problem (rather
> > than the new code that skips step 5, which is still a good idea
> > performance-wise). I'll keep working on it.
> >
> > > [test case]
> > > 1. Start pacemaker on two nodes at the same time and update the attribute
> > > during startup.
> > > In this case, the attribute is displayed in crm_mon.
> > >
> > > [root@node1 ~]# ssh -f node1 'systemctl start pacemaker ; attrd_updater -n KEY -U V-1' ; \
> > >                 ssh -f node3 'systemctl start pacemaker ; attrd_updater -n KEY -U V-3'
> > > [root@node1 ~]# crm_mon -QA1
> > > Stack: corosync
> > > Current DC: node3 (version 1.1.17-1.el7-b36b869) - partition with quorum
> > >
> > > 2 nodes configured
> > > 0 resources configured
> > >
> > > Online: [ node1 node3 ]
> > >
> > > No active resources
> > >
> > >
> > > Node Attributes:
> > > * Node node1:
> > >     + KEY : V-1
> > > * Node node3:
> > >     + KEY : V-3
> > >
> > >
> > > 2. Restart pacemaker on node1, and update the attribute during startup.
> > >
> > > [root@node1 ~]# systemctl stop pacemaker
> > > [root@node1 ~]# systemctl start pacemaker ; attrd_updater -n KEY -U V-10
> > >
> > >
> > > 3. The attribute is registered in attrd but it is not registered in CIB,
> > > so the updated attribute is not displayed in crm_mon.
> > >
> > > [root@node1 ~]# attrd_updater -Q -n KEY -A
> > > name="KEY" host="node3" value="V-3"
> > > name="KEY" host="node1" value="V-10"
> > >
> > > [root@node1 ~]# crm_mon -QA1
> > > Stack: corosync
> > > Current DC: node3 (version 1.1.17-1.el7-b36b869) - partition with quorum
> > >
> > > 2 nodes configured
> > > 0 resources configured
> > >
> > > Online: [ node1 node3 ]
> > >
> > > No active resources
> > >
> > >
> > > Node Attributes:
> > > * Node node1:
> > > * Node node3:
> > >     + KEY : V-3
> > >
> > >
> > > Best Regards
> > >
> > > _______________________________________________
> > > Users mailing list: [email protected]
> > > http://lists.clusterlabs.org/mailman/listinfo/users
> > >
> > > Project Home: http://www.clusterlabs.org
> > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > Bugs: http://bugs.clusterlabs.org
> > >
> --
> Ken Gaillot <[email protected]>
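
For reference, the mismatch reported in step 3 of the test case (a value present in attrd but missing from the CIB) could be checked with a small shell sketch like the one below. It is only an illustration and not part of Pacemaker or ifcheckd: the function names are hypothetical, it assumes the `name="..." host="..." value="..."` output format that `attrd_updater -Q -n KEY -A` prints in the test case, and it assumes that `crm_attribute -N NODE -n NAME -l reboot -G -q` prints the transient value stored in the CIB (and nothing on stdout when the attribute is absent).

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not Pacemaker tools.

# Print the value attrd holds for host $1.
# Reads the output of `attrd_updater -Q -n NAME -A` on stdin.
attrd_value_for_host() {
    sed -n "s/^name=\"[^\"]*\" host=\"$1\" value=\"\(.*\)\"\$/\1/p"
}

# Succeed (exit 0) only when the attrd value ($1) and the CIB value ($2) agree.
values_in_sync() {
    [ "$1" = "$2" ]
}

# Compare attrd and the CIB for transient attribute $2 on node $1.
# Assumption: a missing CIB attribute yields empty output from crm_attribute.
check_attr() {
    node=$1
    name=$2
    attrd_val=$(attrd_updater -Q -n "$name" -A | attrd_value_for_host "$node")
    cib_val=$(crm_attribute -N "$node" -n "$name" -l reboot -G -q 2>/dev/null)
    if values_in_sync "$attrd_val" "$cib_val"; then
        echo "$node/$name: in sync ($attrd_val)"
    else
        echo "$node/$name: MISMATCH attrd='$attrd_val' cib='$cib_val'"
        return 1
    fi
}
```

With these functions loaded, `check_attr node1 KEY` run after step 2 of the test case would be expected to report a mismatch on 1.1.17, since attrd holds V-10 while the CIB has no value for node1. The script only detects the condition; it does not repair it.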
