On 1 Aug 2014, at 2:04 pm, Andrew Beekhof <[email protected]> wrote:
>
> On 1 Aug 2014, at 7:47 am, Andrew Beekhof <[email protected]> wrote:
>
>>
>> On 31 Jul 2014, at 4:46 pm, Cédric Dufour - Idiap Research Institute
>> <[email protected]> wrote:
>>
>>> On 31/07/14 00:17, Andrew Beekhof wrote:
>>>> On 31 Jul 2014, at 2:48 am, Cédric Dufour - Idiap Research Institute
>>>> <[email protected]> wrote:
>>>>
>>>>> After packaging pacemaker 1.1.12 for Debian/Wheezy (along corosync 1.4.6
>>>>> and libqb 0.17.0), I have successfully initialized a new cluster.
>>>>>
>>>>> Back to a very simple test cluster, the only problem I have is with
>>>>> fencing, which fails altogether with "route_ais_message: Sending message
>>>>> to local.stonith-ng failed: ipc delivery failed (rc=-2)" messages:
>>>>>
>>>>> root@bc1hs22a01:~ # tail /var/log/corosync.rsyslog
>>>>> Jul 30 18:41:41 bc1hs22a01 stonith_admin[5411]: notice: crm_log_args:
>>>>> Invoked: stonith_admin -F bc1hs22a02
>>>>> Jul 30 18:41:41 bc1hs22a01 stonithd[4754]: notice: handle_request:
>>>>> Client stonith_admin.5411.fe1388ed wants to fence (off) 'bc1hs22a02' with
>>>>> device '(any)'
>>>>> Jul 30 18:41:41 bc1hs22a01 stonithd[4754]: notice:
>>>>> initiate_remote_stonith_op: Initiating remote operation off for
>>>>> bc1hs22a02: 48b69f82-29ad-4c9a-af57-0e60ae5242e4 (0)
>>>>> Jul 30 18:41:41 bc1hs22a01 corosync[4686]: [pcmk ] WARN:
>>>>> route_ais_message: Sending message to local.stonith-ng failed: ipc
>>>>> delivery failed (rc=-2)
>>>> rc=-2 is coming from send_client_ipc(void *conn, const AIS_Message *
>>>> ais_msg)
>>>>
>>>> specifically:
>>>>
>>>>     if (conn == NULL) {
>>>>         rc = -2;
>>>>
>>>> So the plugin thinks that stonith-ng isn't connected.
>>>> More logs?
>>>>
>>>
>>> I have completed a full restart of the cluster in order to provide the logs
>>> at each step; see attached log files:
>>> (from node_1/DC)
>>> - node_1-corosync-start.log
>>> - node_1-pacemaker-start.log
>>> - node_1-corosync-node_2_join.log
>>> - node_1-pacemaker-node_2_join.log
>>> (from node_2)
>>> - node_2-corosync-start.log
>>> - node_2-pacemaker-start.log
>>>
>>> The problem manifests itself already in DC start log - because of previous
>>> fencing attempt - at 08:19:21 and 08:19:42:
>>>
>>> root@bc1hs22a01:~ # fgrep 'ipc delivery failed' node_1-corosync-start.log
>>> Jul 31 08:19:21 bc1hs22a01 corosync[31057]: [pcmk ] WARN:
>>> route_ais_message: Sending message to local.stonith-ng failed: ipc delivery
>>> failed (rc=-2)
>>> Jul 31 08:19:42 bc1hs22a01 corosync[31057]: [pcmk ] WARN:
>>> route_ais_message: Sending message to local.stonith-ng failed: ipc delivery
>>> failed (rc=-2)
>>>
>>> While it would seem (to me) that the stonith plugin successfully connected
>>> to the CIB:
>>
>> It's not the CIB that's the issue:
>>
>>>>> Jul 30 18:41:41 bc1hs22a01 corosync[4686]: [pcmk ] WARN:
>>>>> route_ais_message: Sending message to local.stonith-ng failed: ipc
>>>>> delivery failed (rc=-2)
>>
>> That's the pacemaker plugin inside corosync (which uses a completely
>> different IPC mechanism).
>
> It looks like there is a name mismatch:
>
> Jul 31 08:19:20 bc1hs22a01 corosync[31057]: [pcmk ] info: pcmk_ipc:
> Recorded connection 0x2543e30 for stonithd/0
> Jul 31 08:19:20 bc1hs22a01 corosync[31057]: [pcmk ] debug:
> process_ais_message: Msg[1] (dest=local:ais, from=bc1hs22a01:stonithd.31092,
> remote=true, size=6): 31092
> ...
> Jul 31 08:19:21 bc1hs22a01 corosync[31057]: [pcmk ] WARN:
> route_ais_message: Sending message to local.stonith-ng failed: ipc delivery
> failed (rc=-2)
> Jul 31 08:19:42 bc1hs22a01 corosync[31057]: [pcmk ] WARN:
> route_ais_message: Sending message to local.stonith-ng failed: ipc delivery
> failed (rc=-2)
>
> Could you try the following patch?

Actually, try this one instead: https://github.com/beekhof/pacemaker/commit/21830a0

>
> diff --git a/lib/ais/plugin.c b/lib/ais/plugin.c
> index 3d4f369..560e18b 100644
> --- a/lib/ais/plugin.c
> +++ b/lib/ais/plugin.c
> @@ -1508,6 +1508,9 @@ route_ais_message(const AIS_Message * msg, gboolean local_origin)
>          /* te messages are routed via the crm */
>          dest = crm_msg_crmd;
>
> +    } else if (dest == crm_msg_stonith_ng) {
> +        dest = crm_msg_stonithd;
> +
>      } else if (dest >= SIZEOF(pcmk_children)) {
>          /* Transient client */
>
>>
>> FWIW, the plugin is extremely deprecated, you're encouraged to use
>> pacemaker+cman or begin working towards corosync2 + pacemakerd.
>>
>>>
>>> root@bc1hs22a01:~ # fgrep cib_native_signon_raw node_1-pacemaker-start.log
>>> Jul 31 08:19:20 [31096] bc1hs22a01 crmd: debug:
>>> cib_native_signon_raw: Connection unsuccessful (0 (nil))
>>> Jul 31 08:19:20 [31096] bc1hs22a01 crmd: debug:
>>> cib_native_signon_raw: Connection to CIB failed: Transport endpoint is
>>> not connected
>>> Jul 31 08:19:20 [31092] bc1hs22a01 stonithd: debug:
>>> cib_native_signon_raw: Connection unsuccessful (0 (nil))
>>> Jul 31 08:19:20 [31092] bc1hs22a01 stonithd: debug:
>>> cib_native_signon_raw: Connection to CIB failed: Transport endpoint is
>>> not connected
>>> Jul 31 08:19:21 [31096] bc1hs22a01 crmd: debug:
>>> cib_native_signon_raw: Connection to CIB successful
>>> Jul 31 08:19:21 [31092] bc1hs22a01 stonithd: debug:
>>> cib_native_signon_raw: Connection to CIB successful
>>> Jul 31 08:19:25 [31094] bc1hs22a01 attrd: debug:
>>> cib_native_signon_raw: Connection to CIB successful
>>>
>>> Best,
>>>
>>> Cédric
>>>
>>> <node_1-corosync-start.log><node_1-pacemaker-start.log><node_1-corosync-node_2_join.log><node_1-pacemaker-node_2_join.log><node_2-corosync-start.log><node_2-pacemaker-start.log>
>>> _______________________________________________
>>> Pacemaker mailing list: [email protected]
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
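
The name mismatch discussed in this thread can be illustrated with a minimal C sketch. This is not the plugin's code: the enum values, the `connections` table, and the function names `send_to_client` and `route_message` are all hypothetical. It models only two behaviours taken from the thread: a NULL connection yields rc=-2, and the quoted diff remaps the stonith-ng destination to stonithd before the connection lookup. The quoted diff was itself superseded by commit 21830a0, whose contents are not shown here, so the actual fix may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical destination ids; the real plugin defines its own enum. */
enum msg_dest {
    DEST_CRMD = 0,
    DEST_STONITHD,   /* name under which the daemon's IPC connection was recorded */
    DEST_STONITH_NG, /* name used as the destination of the fencing message */
    DEST_COUNT
};

/* One IPC connection slot per destination; NULL means "not connected". */
static void *connections[DEST_COUNT];

/* Mirrors the check quoted in the thread: a NULL connection gives rc = -2. */
int send_to_client(void *conn)
{
    if (conn == NULL) {
        return -2;
    }
    return 0; /* a real implementation would deliver the message here */
}

/*
 * Route a message to its destination slot. When alias_stonith_ng is
 * non-zero, apply the same remapping as the quoted diff before the lookup.
 */
int route_message(enum msg_dest dest, int alias_stonith_ng)
{
    assert((int)dest >= 0 && (int)dest < DEST_COUNT);

    if (alias_stonith_ng && dest == DEST_STONITH_NG) {
        dest = DEST_STONITHD;
    }
    return send_to_client(connections[dest]);
}
```

With a connection recorded only in the `DEST_STONITHD` slot, `route_message(DEST_STONITH_NG, 0)` returns -2, matching the "ipc delivery failed (rc=-2)" symptom, while `route_message(DEST_STONITH_NG, 1)` returns 0.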
