Hi Ken,

Thanks for your comment. I agree that network fencing is a valid approach,
but it depends heavily on the hardware: since we do not have an SNMP-capable
network switch in our environment, we cannot try it right away.
Thanks,
Yusuke

> -----Original Message-----
> From: Users [mailto:[email protected]] On Behalf Of Ken Gaillot
> Sent: Friday, April 06, 2018 11:12 PM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] How can I prevent multiple start of IPaddr2 in an
> environment using fence_mpath?
>
> On Fri, 2018-04-06 at 04:30 +0000, 飯田 雄介 wrote:
> > Hi, all
> > I am testing an environment that uses fence_mpath with the following
> > settings.
> >
> > =======
> > Stack: corosync
> > Current DC: x3650f (version 1.1.17-1.el7-b36b869) - partition with quorum
> > Last updated: Fri Apr  6 13:16:20 2018
> > Last change: Thu Mar  1 18:38:02 2018 by root via cibadmin on x3650e
> >
> > 2 nodes configured
> > 13 resources configured
> >
> > Online: [ x3650e x3650f ]
> >
> > Full list of resources:
> >
> >  fenceMpath-x3650e (stonith:fence_mpath): Started x3650e
> >  fenceMpath-x3650f (stonith:fence_mpath): Started x3650f
> >  Resource Group: grpPostgreSQLDB
> >      prmFsPostgreSQLDB1 (ocf::heartbeat:Filesystem): Started x3650e
> >      prmFsPostgreSQLDB2 (ocf::heartbeat:Filesystem): Started x3650e
> >      prmFsPostgreSQLDB3 (ocf::heartbeat:Filesystem): Started x3650e
> >      prmApPostgreSQLDB (ocf::heartbeat:pgsql): Started x3650e
> >  Resource Group: grpPostgreSQLIP
> >      prmIpPostgreSQLDB (ocf::heartbeat:IPaddr2): Started x3650e
> >  Clone Set: clnDiskd1 [prmDiskd1]
> >      Started: [ x3650e x3650f ]
> >  Clone Set: clnDiskd2 [prmDiskd2]
> >      Started: [ x3650e x3650f ]
> >  Clone Set: clnPing [prmPing]
> >      Started: [ x3650e x3650f ]
> > =======
> >
> > When split-brain occurs in this environment, x3650f executes fencing and
> > the resources are started on x3650f.
> >
> > === view of x3650e ====
> > Stack: corosync
> > Current DC: x3650e (version 1.1.17-1.el7-b36b869) - partition WITHOUT quorum
> > Last updated: Fri Apr  6 13:16:36 2018
> > Last change: Thu Mar  1 18:38:02 2018 by root via cibadmin on x3650e
> >
> > 2 nodes configured
> > 13 resources configured
> >
> > Node x3650f: UNCLEAN (offline)
> > Online: [ x3650e ]
> >
> > Full list of resources:
> >
> >  fenceMpath-x3650e (stonith:fence_mpath): Started x3650e
> >  fenceMpath-x3650f (stonith:fence_mpath): Started [ x3650e x3650f ]
> >  Resource Group: grpPostgreSQLDB
> >      prmFsPostgreSQLDB1 (ocf::heartbeat:Filesystem): Started x3650e
> >      prmFsPostgreSQLDB2 (ocf::heartbeat:Filesystem): Started x3650e
> >      prmFsPostgreSQLDB3 (ocf::heartbeat:Filesystem): Started x3650e
> >      prmApPostgreSQLDB (ocf::heartbeat:pgsql): Started x3650e
> >  Resource Group: grpPostgreSQLIP
> >      prmIpPostgreSQLDB (ocf::heartbeat:IPaddr2): Started x3650e
> >  Clone Set: clnDiskd1 [prmDiskd1]
> >      prmDiskd1 (ocf::pacemaker:diskd): Started x3650f (UNCLEAN)
> >      Started: [ x3650e ]
> >  Clone Set: clnDiskd2 [prmDiskd2]
> >      prmDiskd2 (ocf::pacemaker:diskd): Started x3650f (UNCLEAN)
> >      Started: [ x3650e ]
> >  Clone Set: clnPing [prmPing]
> >      prmPing (ocf::pacemaker:ping): Started x3650f (UNCLEAN)
> >      Started: [ x3650e ]
> >
> > === view of x3650f ====
> > Stack: corosync
> > Current DC: x3650f (version 1.1.17-1.el7-b36b869) - partition WITHOUT quorum
> > Last updated: Fri Apr  6 13:16:36 2018
> > Last change: Thu Mar  1 18:38:02 2018 by root via cibadmin on x3650e
> >
> > 2 nodes configured
> > 13 resources configured
> >
> > Online: [ x3650f ]
> > OFFLINE: [ x3650e ]
> >
> > Full list of resources:
> >
> >  fenceMpath-x3650e (stonith:fence_mpath): Started x3650f
> >  fenceMpath-x3650f (stonith:fence_mpath): Started x3650f
> >  Resource Group: grpPostgreSQLDB
> >      prmFsPostgreSQLDB1 (ocf::heartbeat:Filesystem): Started x3650f
> >      prmFsPostgreSQLDB2 (ocf::heartbeat:Filesystem): Started x3650f
> >      prmFsPostgreSQLDB3 (ocf::heartbeat:Filesystem): Started x3650f
> >      prmApPostgreSQLDB (ocf::heartbeat:pgsql): Started x3650f
> >  Resource Group: grpPostgreSQLIP
> >      prmIpPostgreSQLDB (ocf::heartbeat:IPaddr2): Started x3650f
> >  Clone Set: clnDiskd1 [prmDiskd1]
> >      Started: [ x3650f ]
> >      Stopped: [ x3650e ]
> >  Clone Set: clnDiskd2 [prmDiskd2]
> >      Started: [ x3650f ]
> >      Stopped: [ x3650e ]
> >  Clone Set: clnPing [prmPing]
> >      Started: [ x3650f ]
> >      Stopped: [ x3650e ]
> > =======
> >
> > However, IPaddr2 on x3650e will not stop until a pgsql monitor error
> > occurs. During that time, IPaddr2 is running on both nodes at once.
> >
> > === view of after pgsql monitor error ===
> > Stack: corosync
> > Current DC: x3650e (version 1.1.17-1.el7-b36b869) - partition WITHOUT quorum
> > Last updated: Fri Apr  6 13:16:56 2018
> > Last change: Thu Mar  1 18:38:02 2018 by root via cibadmin on x3650e
> >
> > 2 nodes configured
> > 13 resources configured
> >
> > Node x3650f: UNCLEAN (offline)
> > Online: [ x3650e ]
> >
> > Full list of resources:
> >
> >  fenceMpath-x3650e (stonith:fence_mpath): Started x3650e
> >  fenceMpath-x3650f (stonith:fence_mpath): Started [ x3650e x3650f ]
> >  Resource Group: grpPostgreSQLDB
> >      prmFsPostgreSQLDB1 (ocf::heartbeat:Filesystem): Started x3650e
> >      prmFsPostgreSQLDB2 (ocf::heartbeat:Filesystem): Started x3650e
> >      prmFsPostgreSQLDB3 (ocf::heartbeat:Filesystem): Started x3650e
> >      prmApPostgreSQLDB (ocf::heartbeat:pgsql): Stopped
> >  Resource Group: grpPostgreSQLIP
> >      prmIpPostgreSQLDB (ocf::heartbeat:IPaddr2): Stopped
> >  Clone Set: clnDiskd1 [prmDiskd1]
> >      prmDiskd1 (ocf::pacemaker:diskd): Started x3650f (UNCLEAN)
> >      Started: [ x3650e ]
> >  Clone Set: clnDiskd2 [prmDiskd2]
> >      prmDiskd2 (ocf::pacemaker:diskd): Started x3650f (UNCLEAN)
> >      Started: [ x3650e ]
> >  Clone Set: clnPing [prmPing]
> >      prmPing (ocf::pacemaker:ping): Started x3650f (UNCLEAN)
> >      Started: [ x3650e ]
> >
> > Node Attributes:
> > * Node x3650e:
> >     + default_ping_set          : 100
> >     + diskcheck_status          : normal
> >     + diskcheck_status_internal : normal
> >
> > Migration Summary:
> > * Node x3650e:
> >    prmApPostgreSQLDB: migration-threshold=1 fail-count=1
> >        last-failure='Fri Apr  6 13:16:39 2018'
> >
> > Failed Actions:
> > * prmApPostgreSQLDB_monitor_10000 on x3650e 'not running' (7):
> >     call=60, status=complete, exitreason='Configuration file
> >     /dbfp/pgdata/data/postgresql.conf doesn't exist',
> >     last-rc-change='Fri Apr  6 13:16:39 2018', queued=0ms, exec=0ms
> > ======
> >
> > We regard this behavior as a problem.
> > Is there a way to avoid this behavior?
> >
> > Regards,
> > Yusuke
>
> Hi Yusuke,
>
> One possibility would be to implement network fabric fencing as well, e.g.
> fence_snmp with an SNMP-capable network switch. You can make a fencing
> topology level with both the storage and network devices.
>
> The main drawback is that unfencing isn't automatic. After a fenced node is
> ready to rejoin, you have to clear the block at the switch yourself.
> --
> Ken Gaillot <[email protected]>

_______________________________________________
Users mailing list: [email protected]
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
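For anyone following the thread: the fencing-topology approach Ken describes could be sketched with pcs roughly as below. This is only a sketch, not a tested configuration for this cluster: the fence agent (fence_ifmib is one common SNMP-based choice, but the right agent depends on the switch), the switch address 192.168.0.250, the SNMP community, the port numbers, and the fenceSnmp-* resource names are all assumptions; only the fenceMpath-* device names come from the cluster above.

```shell
# Sketch only: add network-fabric fencing alongside the existing
# fence_mpath devices, then require both at the same topology level.
# fence_ifmib, 192.168.0.250, community/port values, and the
# fenceSnmp-* names are assumed, not taken from the thread.

pcs stonith create fenceSnmp-x3650e fence_ifmib \
    ipaddr=192.168.0.250 community=private port=1 \
    pcmk_host_list=x3650e
pcs stonith create fenceSnmp-x3650f fence_ifmib \
    ipaddr=192.168.0.250 community=private port=2 \
    pcmk_host_list=x3650f

# Fencing topology: all devices listed at one level must succeed for
# the fencing of that node to be considered successful, so level 1
# combines the storage and network devices for each node.
pcs stonith level add 1 x3650e fenceMpath-x3650e,fenceSnmp-x3650e
pcs stonith level add 1 x3650f fenceMpath-x3650f,fenceSnmp-x3650f
```

As Ken notes, unfencing at the switch is manual with this setup: after the fenced node is repaired, the blocked switch port has to be re-enabled by hand (for example with the fence agent's "on" action, or directly on the switch) before the node can rejoin.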
