Re: REBALANCELEADERS is not reliable

Erick Erickson Tue, 08 Jan 2019 17:33:28 -0800

It's weirder than that. In the current test on master, the
assumption is that the node recorded as leader in ZK
is actually the leader, see
TestRebalanceLeaders.checkZkLeadersAgree(). The theory
is that the identified leader node in ZK is actually the leader
after the rebalance command. But you're right, I don't see
an actual check that the collection's status agrees.


That aside, though, there are several problems I'm uncovering

1> BALANCESHARDUNIQUE can wind up with multiple
"preferredLeader" properties defined. Some time between
the original code and now someone refactored a bunch of
code and missed removing a unique property if it was
already assigned and being assigned to another replica
in the same slice.

2> to make it much worse, I've rewritten the tests
extensively and I can beast the rewritten tests 1,000
times and no failures. If I test manually by just issuing
the commands, everything works fine. By "testing manually"
I mean (working with 4 Vms, 10 shards 4 replicas)
> create the collection
> issue the BALANCESHARDUNIQUE command
> issue the REBALANCELEADERS command


However, if instead I
> create the collection
> issue the BALANCESHARDUNIQUE command
> shut down 3 of 4 Solr instances so all the leaders
   are on the same host.
> restart the 3 instances
> issue the REBALANCELEADERS command then
   it doesn't work.

At least that's what I think I'm seeing, but it makes no
real sense yet.

So I'm first trying to understand why my manual test
fails so regularly, then I can incorporate that setup
into the unit test (I'm thinking of just shutting down
and restarting some of the Jetty instances).

But it's a total mystery to me why restarting Solr instances
should have any effect. But that's certainly not
something that happens in the current test so I have
hopes that tracking that down will lead to understanding
what the invalid assumption I'm making is and we can
test for that too.,

On Tue, Jan 8, 2019 at 1:42 AM Bernd Fehling
<bernd.fehl...@uni-bielefeld.de> wrote:
>
> Hi Erick,
>
> after some more hours of debugging the rough result is, who ever invented
> this leader election did not check if an action returns the estimated
> result. There are only checks for exceptions, true/false, new sequence
> numbers and so on, but never if a leader election to the preferredleader
> really took place.
>
> If doing a rebalanceleaders to preferredleader I also have to check if:
> - a rebalance took place
> - the preferredleader has really become leader (and not anyone else)
>
> Currently this is not checked and the call rebalanceleaders to preferredleader
> is like a shot into the dark with hope of success. And thats why any
> problems have never been discovered or reported.
>
> Bernd
>
>
> Am 21.12.18 um 18:00 schrieb Erick Erickson:
> > I looked at the test last night and it's...disturbing. It succeeds
> > 100% of the time. Manual testing seems to fail very often.
> > Of course it was late and I was a bit cross-eyed, so maybe
> > I wasn't looking at the manual tests correctly. Or maybe the
> > test is buggy.
> >
> > I beasted the test 100x last night and all of them succeeded.
> >
> > This was with all NRT replicas.
> >
> > Today I'm going to modify the test into a stand-alone program
> > to see if it's something in the test environment that causes
> > it to succeed. I've got to get this to fail as a unit test before I
> > have confidence in any fixes, and also confidence that things
> > like this will be caught going forward.
> >
> > Erick
> >
> > On Fri, Dec 21, 2018 at 3:59 AM Bernd Fehling
> > <bernd.fehl...@uni-bielefeld.de> wrote:
> >>
> >> As far as I could see with debugger there is still a problem in requeing.
> >>
> >> There is a watcher and it is recognized that the watcher is not a 
> >> preferredleader.
> >> So it tries to locate a preferredleader with success.
> >> It then calls makeReplicaFirstWatcher and gets a new sequence number for
> >> the preferredleader replica. But now we have two replicas with the same
> >> sequence number. One replica which already owns that sequence number and
> >> the replica which got the new (and the same) number as new sequence number.
> >> It now tries to solve this with queueNodesWithSameSequence.
> >> Might be something in rejoinElection.
> >> At least the call to rejoinElection seems right. For preferredleader it
> >> is true for rejoinAtHead and for the other replica with same sequence 
> >> number
> >> it is false for rejoinAtHead.
> >>
> >> A test case should have 3 shards with 3 cores per shard and should try to
> >> set preferredleader to different replicas at random. And then try to
> >> rebalance and check the results.
> >>
> >> So far, regards, Bernd
> >>
> >>
> >> Am 21.12.18 um 07:11 schrieb Erick Erickson:
> >>> I'm reworking the test case, so hold off on doing that. If you want to
> >>> raise a JIRA, though. please do and attach your patch...
> >>>
> >>> On Thu, Dec 20, 2018 at 10:53 AM Erick Erickson <erickerick...@gmail.com> 
> >>> wrote:
> >>>>
> >>>> Nothing that I know of was _intentionally_ changed with this between
> >>>> 6x and 7x. That said, nothing that I know of was done to verify that
> >>>> TLOG and PULL replicas (added in 7x) were handled correctly. There's a
> >>>> test "TestRebalanceLeaders" for this functionality that has run since
> >>>> the feature was put in, but it has _not_ been modified to create TLOG
> >>>> and PULL replicas and test with those.
> >>>>
> >>>> For this patch to be complete, we should either extend that test or
> >>>> make another that fails without this patch and succeeds with it.
> >>>>
> >>>> I'd probably recommend modifying TestRebalanceLeaders to randomly
> >>>> create TLOG and (maybe) PULL replicas so we'd keep covering the
> >>>> various cases.
> >>>>
> >>>> Best,
> >>>> Erick
> >>>>
> >>>>
> >>>> On Thu, Dec 20, 2018 at 8:06 AM Bernd Fehling
> >>>> <bernd.fehl...@uni-bielefeld.de> wrote:
> >>>>>
> >>>>> Hi Vadim,
> >>>>> I just tried it with 6.6.5.
> >>>>> In my test cloud with 5 shards, 5 nodes, 3 cores per node it missed
> >>>>> one shard to become leader. But noticed that one shard already was
> >>>>> leader. No errors or exceptions in logs.
> >>>>> May be I should enable debug logging and try again to see all logging
> >>>>> messages from the patch.
> >>>>>
> >>>>> Might be they also changed other parts between 6.6.5 and 7.6.0 so that
> >>>>> it works for you.
> >>>>>
> >>>>> I also just changed from zookeeper 3.4.10 to 3.4.13 which works fine,
> >>>>> even with 3.4.10 dataDir. No errors no complains. Seems to be 
> >>>>> compatible.
> >>>>>
> >>>>> Regards, Bernd
> >>>>>
> >>>>>
> >>>>> Am 20.12.18 um 12:31 schrieb Vadim Ivanov:
> >>>>>> Yes! It works!
> >>>>>> I have tested RebalanceLeaders today with the patch provided by Endika 
> >>>>>> Posadas. 
> >>>>>> (http://lucene.472066.n3.nabble.com/Rebalance-Leaders-Leader-node-deleted-when-rebalancing-leaders-td4417040.html)
> >>>>>> And at last it works as expected on my collection with 5 nodes and 
> >>>>>> about 400 shards.
> >>>>>> Original patch was slightly incompatible with 7.6.0
> >>>>>> I hope this patch will help to try this feature with 7.6
> >>>>>> https://drive.google.com/file/d/19z_MPjxItGyghTjXr6zTCVsiSJg1tN20
> >>>>>>
> >>>>>> RebalanceLeaders was not very useful feature before 7.0 (as all 
> >>>>>> replicas were NRT)
> >>>>>> But new replica types made it very helpful to keep big clusters in 
> >>>>>> order...
> >>>>>>
> >>>>>> I wonder, why there is no any jira about this case (or maybe I missed 
> >>>>>> it)?
> >>>>>> Anyone who cares, please, help to create jira and improve this feature 
> >>>>>> in the nearest releaase
> >>>>>>

Re: REBALANCELEADERS is not reliable

Reply via email to