Honza,

On 4/24/18 6:38 PM, Jan Friesse wrote:
>> On 4/6/18 10:59 AM, Jan Friesse wrote:
>>> Thomas Lamprecht wrote:
>>>> On 03/09/2018 at 05:26 PM, Jan Friesse wrote:
>>>>> I've tested it too and yes, you are 100% right. Bug is there and it's
>>>>> pretty easy to reproduce when node with lowest nodeid is paused. It's
>>>>> slightly harder when node with higher nodeid is paused.
>>>>>
>>>>
>>>> Were you able to make some progress on this issue?
>>>
>>> Ya, kind of. Sadly I had to work on a different problem, but I'm expecting
>>> to send a patch next week.
>>>
>>
>> I guess the different problems were the ones related to the issued CVEs :)
>
> Yep.
>
> Also I've spent quite a lot of time thinking about the best possible
> solution. CPG is quite old, it was full of weird bugs and the risk of
> breakage is very high.
>
> Anyway, I've decided not to try to hack what is apparently broken and just go
> for a risky but proper solution (= needs a LOT more testing, but so far looks
> good).
>

I did not look deeply into how your revert interacts with the mentioned
commits of the heuristics approach, but this fix would mean bringing corosync
back to a state it already had, and one that was thus already battle tested?

The patch and approach seem good to me, with my limited knowledge, when
looking at the various "bandaid" fix commits you mentioned.

> Patch is in PR (needle): https://github.com/corosync/corosync/pull/347
>

Many thanks! First tests work well here. I could not yet reproduce the problem
with the patch applied in either testcpg or our cluster configuration file
system.

I'll let it run

cheers,
Thomas

_______________________________________________
Users mailing list: [email protected]
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
