On 07/31/2017 11:13 PM, Ulrich Windl [Masked] wrote:
>> I am experimenting with pacemaker for high availability for some load
>> balancers. I was able to successfully get two CentOS (6.9) machines
>> (scahadev01da and scahadev01db) to form a cluster and the shared IP was
>> assigned to scahadev01da. I simulated a failure by halting the primary
>> and the secondary eventually noticed, bringing up the shared IP on its
>> eth0. So far, so good.
>>
>> A problem arises when the primary comes back up and, for some reason,
>> each node thinks the other is offline. This leads to both nodes adding
>
> If a node thinks the other is unexpectedly offline, it will fence it, and
> then it will be offline! Thus the IP can't run there. I guess you have no
> fencing configured, right?

No. I didn't realize it was necessary unless there was shared storage
involved. I guess it is time to go back to the drawing board. Can
clustering even be done reliably on CentOS 6? I have no objection to
moving to 7, but I was hoping I could get this up quicker than building
out a bunch of new balancers.

On a related note: I tried rebooting both nodes and each node still
thinks the other is offline. For future reference, is there a way to
clear that?

> Regards,
> Ulrich
>
>> the duplicate IP to its own eth0. I probably do not need to tell you
>> the mischief that can cause if these were production servers.
>>
>> I tried restarting cman, pcsd and pacemaker on both machines with no
>> effect on the situation.
>>
>> I've found several mentions of it in the search engines but I've been
>> unable to find how to fix it. Any help is appreciated.
>>
>> Both nodes have quorum disabled in /etc/sysconfig/cman
>>
>> CMAN_QUORUM_TIMEOUT=0
>>
>> #------------------------------------------------
>> Node 1
>>
>> scahadev01da# sudo pcs status
>> Cluster name: scahadev01d
>> Stack: cman
>> Current DC: scahadev01da (version 1.1.15-5.el6-e174ec8) - partition WITHOUT quorum
>> Last updated: Mon Jul 31 10:43:54 2017 Last change: Mon Jul 31 10:30:46 2017 by root via cibadmin on scahadev01da
>>
>> 2 nodes and 1 resource configured
>>
>> Online: [ scahadev01da ]
>> OFFLINE: [ scahadev01db ]
>>
>> Full list of resources:
>>
>> VirtualIP (ocf::heartbeat:IPaddr2): Started scahadev01da
>>
>> Daemon Status:
>> cman: active/enabled
>> corosync: active/disabled
>> pacemaker: active/enabled
>> pcsd: active/enabled
>>
>> #------------------------------------------------
>> Node 2
>>
>> scahadev01db ~]$ sudo pcs status
>> Cluster name: scahadev01d
>> Stack: cman
>> Current DC: scahadev01db (version 1.1.15-5.el6-e174ec8) - partition WITHOUT quorum
>> Last updated: Mon Jul 31 10:43:47 2017 Last change: Sat Jul 29 13:45:15 2017 by root via cibadmin on scahadev01da
>>
>> 2 nodes and 1 resource configured
>>
>> Online: [ scahadev01db ]
>> OFFLINE: [ scahadev01da ]
>>
>> Full list of resources:
>>
>> VirtualIP (ocf::heartbeat:IPaddr2): Started scahadev01db
>>
>> Daemon Status:
>> cman: active/enabled
>> corosync: active/disabled
>> pacemaker: active/enabled
>> pcsd: active/enabled
>>
>> --
>> Stephen Carville
>>
>> _______________________________________________
>> Users mailing list: [email protected]
>> http://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
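
To make the symptom in the two status listings concrete (each node lists the other as OFFLINE while both report VirtualIP as started), here is a minimal Python sketch that compares two `pcs status` captures. The names `parse_pcs_status` and `find_split_brain` are hypothetical helpers, not part of pcs or pacemaker, and the regular expressions assume the output layout shown in this post. It only detects the condition; it does not fix it or replace fencing.

```python
import re

def parse_pcs_status(text):
    """Pull online nodes, offline nodes, and started resources out of
    `pcs status` text. Assumes the layout quoted in this thread."""
    online = re.search(r"Online:\s*\[([^\]]*)\]", text)
    offline = re.search(r"OFFLINE:\s*\[([^\]]*)\]", text)
    # Matches lines such as: VirtualIP (ocf::heartbeat:IPaddr2): Started node
    started = re.findall(r"(\S+)\s+\(\S+\):\s+Started\s+(\S+)", text)
    return {
        "online": set(online.group(1).split()) if online else set(),
        "offline": set(offline.group(1).split()) if offline else set(),
        "started": dict(started),
    }

def find_split_brain(view_a, view_b):
    """Return {resource: (node_a, node_b)} for resources that the two views
    report as started on different nodes, where each view also lists the
    other view's node as offline."""
    conflicts = {}
    for name, node_a in view_a["started"].items():
        node_b = view_b["started"].get(name)
        if (node_b and node_b != node_a
                and node_b in view_a["offline"]
                and node_a in view_b["offline"]):
            conflicts[name] = (node_a, node_b)
    return conflicts

# Trimmed copies of the two listings from this post.
NODE1 = """
Online: [ scahadev01da ]
OFFLINE: [ scahadev01db ]
VirtualIP (ocf::heartbeat:IPaddr2): Started scahadev01da
"""
NODE2 = """
Online: [ scahadev01db ]
OFFLINE: [ scahadev01da ]
VirtualIP (ocf::heartbeat:IPaddr2): Started scahadev01db
"""

conflicts = find_split_brain(parse_pcs_status(NODE1), parse_pcs_status(NODE2))
for resource, nodes in conflicts.items():
    print(f"{resource} is active on both {nodes[0]} and {nodes[1]}")
```

Run against the two listings above, this flags VirtualIP as active on both nodes, which is the duplicate-IP situation described in the original message.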
