> On 11 Nov 2014, at 1:32 am, Zach Wolf <[email protected]> wrote:
>
> Hey Team,
>
> I’m receiving some strange intermittent failovers on a two-node cluster
> (happens once every week or two). When this happens, both nodes are
> unavailable; one node will be marked offline and the other will be shown as
> unclean. Any help on this would be massively appreciated. Thanks.
>
> Running Ubuntu 12.04 (64-bit)
> Pacemaker 1.1.6-2ubuntu3.3
> Corosync 1.4.2-2ubuntu0.2
>
> Here are the logs:
> Nov 08 14:26:26 corosync [pcmk ] info: pcmk_ipc_exit: Client crmd
> (conn=0x12bebe0, async-conn=0x12bebe0) left
> Nov 08 14:26:26 corosync [pcmk ] WARN: route_ais_message: Sending message to
> local.crmd failed: ipc delivery failed (rc=-2)
> Nov 08 14:26:27 corosync [pcmk ] info: pcmk_ipc_exit: Client attrd
> (conn=0x12d0230, async-conn=0x12d0230) left
> Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc_exit: Client cib
> (conn=0x12c7d80, async-conn=0x12c7d80) left
> Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc_exit: Client stonith-ng
> (conn=0x12c3a20, async-conn=0x12c3a20) left
> Nov 08 14:26:32 corosync [pcmk ] WARN: route_ais_message: Sending message to
> local.crmd failed: ipc delivery failed (rc=-2)
> Nov 08 14:26:32 corosync [pcmk ] WARN: route_ais_message: Sending message to
> local.cib failed: ipc delivery failed (rc=-2)

Nothing at all from the crmd, cib, attrd or stonith-ng processes?

> Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc: Recorded connection
> 0x12bebe0 for stonith-ng/0
> Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc: Recorded connection
> 0x12c2f40 for attrd/0
> Nov 08 14:26:33 corosync [pcmk ] info: pcmk_ipc: Recorded connection
> 0x12c72a0 for cib/0
> Nov 08 14:26:33 corosync [pcmk ] info: pcmk_ipc: Sending membership update
> 12 to cib
> Nov 08 14:26:33 corosync [pcmk ] info: pcmk_ipc: Recorded connection
> 0x12cb600 for crmd/0
> Nov 08 14:26:33 corosync [pcmk ] info: pcmk_ipc: Sending membership update
> 12 to crmd
>
> Output of crm configure show:
> node p-sbc3 \
>     attributes standby="off"
> node p-sbc4 \
>     attributes standby="off"
> primitive fs lsb:FSSofia \
>     op monitor interval="2s" enabled="true" timeout="10s" on-fail="standby" \
>     meta target-role="Started"
> primitive fs-ip ocf:heartbeat:IPaddr2 \
>     params ip="10.100.0.90" nic="eth0:0" cidr_netmask="24" \
>     op monitor interval="10s"
> primitive fs-ip2 ocf:heartbeat:IPaddr2 \
>     params ip="10.100.0.99" nic="eth0:1" cidr_netmask="24" \
>     op monitor interval="10s"
> group cluster_services fs-ip fs-ip2 fs \
>     meta target-role="Started"
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="false" \
>     last-lrm-refresh="1348755080" \
>     no-quorum-policy="ignore"
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="100"
> _______________________________________________
> Pacemaker mailing list: [email protected]
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
