Hi Thomas,
Hi,
On 04/25/2018 at 09:57 AM, Jan Friesse wrote:
Thomas Lamprecht wrote:
On 4/24/18 6:38 PM, Jan Friesse wrote:
On 4/6/18 10:59 AM, Jan Friesse wrote:
Thomas Lamprecht wrote:
On 03/09/2018 at 05:26 PM, Jan Friesse wrote:
I've tested it too and yes, you are 100% right. The bug is there and
it's pretty easy to reproduce when the node with the lowest nodeid is
paused. It's slightly harder when a node with a higher nodeid is paused.
Were you able to make some progress on this issue?
Yes, kind of. Sadly I had to work on a different problem, but I'm
expecting to send a patch next week.
I guess the different problems were the ones related to the issued
CVEs :)
Yep.
Also, I've spent quite a lot of time thinking about the best possible
solution. CPG is quite old, it was full of weird bugs, and the risk of
breakage is very high.
Anyway, I've decided not to try to hack around what is apparently
broken and to just go for the risky but proper solution (= needs a LOT
more testing, but so far looks good).
I did not look deeply into how your revert plays out with the
mentioned commits of the heuristics approach, but this fix would
mean bringing corosync back to a state it already had, and which
was thus already battle-tested?
Yep, but not fully. The important change was to use joinlists as the
authoritative source of information about other nodes' clients, so I
believe that solved the problems which should have been "solved" by
the downlist heuristics.
The patch and approach seem good to me, with my limited knowledge,
when looking at the various "bandaid" fix commits you mentioned.
Patch is in PR (needle): https://github.com/corosync/corosync/pull/347
Many thanks! First tests work well here.
I could not yet reproduce the problem with the patch applied, in both
testcpg and our cluster configuration file system.
That's good to hear :)
I'll let it run
Perfect.
Just wanted to give some quick feedback.
We deployed this to our community repository about a week ago (after
another week of successful testing). We have had no negative feedback,
and no issues reported or seen yet, with (strong lower bound) > 10k
systems running the fix by now.
Thanks, that's exciting news.
I saw just now that you merged it into needle and master, so, while a
bit late, this just backs up the confidence in the fix.
Definitely not late until it's released :)
Many thanks for your work, and that of the reviewers!
Yep, you are welcome.
Honza
cheers,
Thomas
_______________________________________________
Users mailing list: [email protected]
https://lists.clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org