Hello Ulrich,

Can you reproduce this issue reliably? If so, please share your steps.
We have encountered a similar issue: it looks as if cmirrord cannot join
the CPG (closed process group, a corosync concept), so the resource
operation times out and the node is fenced.
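
On the repeated "Retry #1" lines quoted below: one possible reading, and it
is only an assumption since I have not checked the cmirrord source for this,
is that the retry counter is local to each send, so every new message that
gets TRY_AGAIN once starts again at #1. A minimal sketch of that behaviour
(send_with_retry and make_flaky_send are hypothetical names, not cmirrord
or corosync functions):

```python
TRY_AGAIN = "SA_AIS_ERR_TRY_AGAIN"
OK = "SA_AIS_OK"


def send_with_retry(send, log, max_retries=10):
    """Hypothetical sender whose retry counter is local to one call."""
    retries = 0  # reset for every message, never shared between calls
    while True:
        result = send()
        if result != TRY_AGAIN:
            return result
        retries += 1
        log(f"Retry #{retries} of cpg_mcast_joined: {result}")
        if retries >= max_retries:
            return result  # give up and hand the error to the caller


def make_flaky_send():
    """Simulated transport: rejects the first attempt of every message."""
    state = {"calls": 0}

    def send():
        state["calls"] += 1
        # Odd-numbered calls fail, even-numbered calls succeed.
        return TRY_AGAIN if state["calls"] % 2 == 1 else OK

    return send


messages = []
send = make_flaky_send()
for _ in range(3):  # three independent messages
    send_with_retry(send, messages.append)

for line in messages:
    print(line)
```

Under that assumption the log shows "Retry #1" once per message and never a
higher number, which would mean these are different retries rather than a
counter that fails to increment.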

Thanks
Gang

>>> On 2018/11/12 at 15:46, in message
<[email protected]>, "Ulrich Windl"
<[email protected]> wrote:
> Hi!
> 
> While analyzing some odd cluster problem in SLES11 SP4, I found this message 
> repeating quite a lot (several times per second) with the same text:
> 
> [...more...]
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> Nov 10 22:10:47 h05 cmirrord[17741]: [yEa32lLX]  Retry #1 of 
> cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN
> [...many more...]
> 
> I wonder: Shouldn't the retry number be incremented? Or are these different 
> retries? If so, where is it visible?
> 
> The situation I'm analyzing is one where a node should have been fenced but 
> somehow wasn't; instead it just stopped working (it seemed frozen). The 
> last message, a few minutes(!) before the other nodes complained, was:
> 
> Nov 10 22:04:18 h01 crmd[16596]:   notice: throttle_mode: High CIB load 
> detected: 1.246333
> (After this the node seemed dead/frozen).
> 
> Regards,
> Ulrich
> 
> 
> 
> _______________________________________________
> Users mailing list: [email protected] 
> https://lists.clusterlabs.org/mailman/listinfo/users 
> 
> Project Home: http://www.clusterlabs.org 
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
> Bugs: http://bugs.clusterlabs.org
