Hello Ulrich,

Can you reproduce this issue reliably? If so, please share your steps. We have also encountered a similar issue: it looks as if cmirrord cannot join the CPG (closed process group, a corosync concept), so the resource operation times out and the node is fenced.
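
If it helps when collecting data for a reproduction, here is a minimal sketch (plain Python, not part of cmirrord or corosync) that counts how often the retry message quoted below appears per second, per host, and per retry number. The regular expression is an assumption based only on the log lines in this thread, and the names `RETRY_RE` and `count_retries` are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, taken from the syslog lines quoted in this thread:
# "Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN"
RETRY_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"cmirrord\[\d+\]:\s+\[\S+\]\s+Retry #(?P<n>\d+) of\s+"
    r"cpg_mcast_joined:\s+(?P<err>\S+)"
)


def count_retries(lines):
    """Count retry messages per (timestamp, host, retry number)."""
    counts = Counter()
    for line in lines:
        m = RETRY_RE.match(line)
        if m:
            counts[(m["ts"], m["host"], int(m["n"]))] += 1
    return counts


# Small sample built from the messages in this thread.
sample = [
    "Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN",
    "Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN",
    "Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN",
    "Nov 10 22:04:18 h01 crmd[16596]: notice: throttle_mode: High CIB load detected: 1.246333",
]

for (ts, host, n), seen in sorted(count_retries(sample).items()):
    print(f"{ts} {host} retry #{n}: {seen} message(s)")
```

To run it against a real log, pass an open file instead of `sample`, for example `count_retries(open("/var/log/messages", errors="replace"))`; the path is only an example. A high count for retry #1 with no higher retry numbers in the same second would at least confirm the pattern described below.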

Thanks
Gang

>>> On 2018/11/12 at 15:46, in message <[email protected]>, "Ulrich Windl" <[email protected]> wrote:
> Hi!
>
> While analyzing some odd cluster problem in SLES11 SP4, I found this message
> repeating quite a lot (several times per second) with the same text:
>
> [...more...]
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX] Retry #1 of cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> [...many more...]
>
> I wonder: Shouldn't the retry number be incremented? Or are these different
> retries? If so, where is it visible?
>
> The situation I'm analyzing is when a node should have been fenced, but
> somehow it wasn't, but also just stopped working (seemed like frozen). The
> last message a few minutes(!) before the other nodes complained was:
>
> Nov 10 22:04:18 h01 crmd[16596]: notice: throttle_mode: High CIB load detected: 1.246333
> (After this the node seemed dead/frozen).
>
> Regards,
> Ulrich
>
> _______________________________________________
> Users mailing list: [email protected]
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
