Bernd: I just attached a patch to https://issues.apache.org/jira/browse/SOLR-13091. It's still rough, the response from REBALANCELEADERS needs quite a bit of work (lots of extra stuff in it now, and no overall verification). I haven't run all the tests, nor precommit.
I wanted to get something up so if you have a test environment that you can easily test it in, you'd have an early chance to play with it.

It's against master; I also haven't tried to backport to 8.0 or 7x yet. I doubt it'll be a problem, but if it doesn't apply cleanly let me know.

Best,
Erick

On Fri, Jan 11, 2019 at 8:33 AM Erick Erickson <erickerick...@gmail.com> wrote:
>
> bq: You have to check if the cores, participating in leadership
> election, are _really_ in sync. And this must be done before starting
> any rebalance. Sounds ugly... :-(
>
> This _should_ not be necessary. I'll add parenthetically that leader
> election has been extensively re-worked in Solr 7.3+, though, because
> "interesting" things could happen.
>
> Manipulating the leader election queue is really no different than
> having to deal with, say, someone killing the leader un-gracefully.
> It should "just work". That said, if you're seeing evidence to the
> contrary, that's reality.
>
> What do you mean by "stats", though? It's perfectly ordinary for there
> to be different numbers of _deleted_ documents on various replicas,
> and consequently things like term frequencies and doc frequencies
> being different. What's emphatically _not_ expected is for there to be
> different numbers of "live" docs.
>
> "Making sure nodes are in sync" is certainly an option. That should
> all be automatic if you pause indexing and issue a commit, _then_
> do a rebalance.
>
> I certainly agree that the code is broken and needs to be fixed, but I
> also have to ask how many shards we are talking about here. The code
> was originally written for the case where 100s of leaders could be on
> the same node; until you get to a significant number of leaders on a
> single node (10s at least) there haven't been reliable stats showing
> that it's a performance issue. If you have threshold numbers where
> you've seen it make a material difference, it'd be great to share them.
>
> And I won't be getting back to this until the weekend, other urgent
> stuff has come up...
>
> Best,
> Erick
>
> On Fri, Jan 11, 2019 at 12:58 AM Bernd Fehling
> <bernd.fehl...@uni-bielefeld.de> wrote:
> >
> > Hi Erick,
> > yes, I would be happy to test any patches.
> >
> > Good news, I got rebalance working. After running the rebalance
> > about 50 times with the debugger and watching the behavior of my
> > problem shard and its core_nodes within my test cloud, I came to
> > the point of failure. I solved it and now it works.
> >
> > Bad news, rebalance is still not reliable and there are many more
> > problems and points of failure initiated by rebalanceLeaders, or
> > rather by the re-queueing of the watch list.
> >
> > How I located _my_ problem:
> > The test cloud is 5 servers (VMs), 5 shards, 3 replicas per shard,
> > 1 Java instance per server, plus 3 separate ZooKeepers.
> > My problem: shard2 wasn't willing to rebalance to a specific
> > core_node. The core_nodes involved were core_node1, core_node2 and
> > core_node10; core_node10 was the preferredLeader. It was just moving
> > the leadership between core_node1 and core_node2, back and forth,
> > whenever I called rebalanceLeaders.
> > First step: I stopped the server holding core_node2. Result: the
> > leadership stayed at core_node1 whenever I called rebalanceLeaders.
> > Second step: from the debugger I _forced_ the system, during
> > rebalanceLeaders, to give the leadership to core_node10. Result:
> > there was no leader anymore for that shard. Yes, it can happen:
> > you can end up with a shard having no leader but active core_nodes!!!
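(For concreteness, the operations being exercised above, together with Erick's advice to pause indexing and commit before rebalancing, come down to a handful of Update/Collections API calls. Below is a rough Java sketch; the base URL, collection, shard and replica names are placeholders and not taken from the thread, and, as noted later in the discussion, the REBALANCELEADERS response should still be verified against the cluster state.)

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Sketch of the "commit, mark preferredLeader, rebalance" sequence.
 * Base URL, collection, shard and replica names are placeholders.
 */
public class RebalanceSketch {
    private static final String SOLR = "http://localhost:8983/solr";
    private static final HttpClient HTTP = HttpClient.newHttpClient();

    static String get(String url) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).build();
        return HTTP.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        String collection = "testcloud";   // placeholder collection name

        // 1. With indexing paused, issue a hard commit so the replicas of a
        //    shard see the same index before leadership is moved.
        get(SOLR + "/" + collection + "/update?commit=true&waitSearcher=true");

        // 2. Mark the replica that should become leader (core_node10 in the
        //    discussion above) with the preferredLeader property.
        get(SOLR + "/admin/collections?action=ADDREPLICAPROP"
                + "&collection=" + collection
                + "&shard=shard2"
                + "&replica=core_node10"
                + "&property=preferredLeader"
                + "&property.value=true");

        // 3. Ask Solr to move leadership to the preferred leaders. As the
        //    thread points out, a "success" status in this response does not
        //    by itself guarantee the leadership actually changed.
        System.out.println(get(SOLR + "/admin/collections?action=REBALANCELEADERS"
                + "&collection=" + collection
                + "&maxWaitSeconds=60"));

        // 4. Check the cluster state rather than trusting the response alone.
        System.out.println(get(SOLR + "/admin/collections?action=CLUSTERSTATUS"
                + "&collection=" + collection));
    }
}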
> > To fix this I was giving preferredLeader to core_node1 and called
> > rebalanceLeaders. After that, preferredLeader was set back to
> > core_node10 and I was back at the point where I started: all calls
> > to rebalanceLeaders kept the leader at core_node1.
> >
> > From the debug logs I got the hint about PeerSync of cores and
> > IndexFingerprint. The stats from my problem core_node10 showed that
> > they differ from leader core_node1. The system notices the
> > difference, starts a PeerSync and ends with success. But actually
> > the PeerSync seems to fail, because the stats of core_node1 and
> > core_node10 still differ afterwards.
> > Solution: I also stopped the server holding my problem core_node10,
> > wiped all data directories and started that server again. The
> > core_nodes were rebuilt from the leader and now they are really in
> > sync. Calling rebalanceLeaders now ended with success and the
> > preferredLeader became leader.
> >
> > My guess: you have to check if the cores participating in leadership
> > election are _really_ in sync. And this must be done before starting
> > any rebalance. Sounds ugly... :-(
> >
> > Next question: why is PeerSync not reporting an error? There is an
> > info about "PeerSync START", "PeerSync Received 0 versions from ...
> > fingerprint:null" and "PeerSync DONE. sync succeeded", but the cores
> > are not really in sync.
> >
> > Another test I did (with my new knowledge about synced cores):
> > - Removing all preferredLeader properties
> > - Stopping, wiping the data directory, and starting all servers one
> >   by one to get all cores of all shards in sync
> > - Setting one preferredLeader for each shard, but different from the
> >   actual leader
> > - Calling rebalanceLeaders succeeded at only 2 shards on the first
> >   run, not for all 5 shards (even with really all cores in sync)
> > - After calling rebalanceLeaders again, the other shards succeeded
> >   as well
> > Result: rebalanceLeaders is still not reliable.
> >
> > I have to mention that I have about 520,000 docs per core in my test
> > cloud and that there might also be a timing issue between calling
> > rebalanceLeaders, detecting that the cores to become leader are not
> > in sync with the actual leader, and resyncing while waiting for the
> > new leader election.
> >
> > So far,
> > Bernd
> >
> >
> > On 10.01.19 at 17:02, Erick Erickson wrote:
> > > Bernd:
> > >
> > > Don't feel bad about missing it, I wrote the silly stuff and it
> > > took me some time to remember.....
> > >
> > > Those are the rules.
> > >
> > > It's always humbling to look back at my own code and say "that
> > > idiot should have put some comments in here..." ;)
> > >
> > > Yeah, I agree there are a lot of moving parts here. I have a note
> > > to myself to provide better feedback in the response. You're
> > > absolutely right that we fire all these commands and hope they all
> > > work. Just returning a "success" status doesn't guarantee a
> > > leadership change.
> > >
> > > I'll be on another task the rest of this week, but I should be
> > > able to dress things up over the weekend. That'll give you a patch
> > > to test if you're willing.
> > >
> > > The actual code changes are pretty minimal, the bulk of the patch
> > > will be the reworked test.
> > >
> > > Best,
> > > Erick
> > >
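(Bernd's "are the cores really in sync?" check can be approximated from outside Solr by asking each replica of the shard for its live-doc count directly, bypassing the distributed query path. A sketch follows; the core URLs are made up, and equal counts are only a necessary condition, since fingerprints and version lists can still disagree, which is what PeerSync is meant to reconcile. In a real check the replica core URLs would come from CLUSTERSTATUS rather than being hard-coded.)

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: compare live-doc counts across the replicas of one shard. */
public class ShardSyncCheck {
    private static final HttpClient HTTP = HttpClient.newHttpClient();

    // Placeholder core URLs for the three replicas of the problem shard.
    private static final List<String> REPLICA_CORES = List.of(
            "http://host1:8983/solr/testcloud_shard2_replica_n1",
            "http://host2:8983/solr/testcloud_shard2_replica_n2",
            "http://host3:8983/solr/testcloud_shard2_replica_n10");

    private static final Pattern NUM_FOUND = Pattern.compile("\"numFound\":(\\d+)");

    static long liveDocs(String coreUrl) throws Exception {
        // distrib=false keeps the query on this single core instead of
        // fanning out across the collection.
        String url = coreUrl + "/select?q=*:*&rows=0&distrib=false&wt=json";
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).build();
        String body = HTTP.send(req, HttpResponse.BodyHandlers.ofString()).body();
        Matcher m = NUM_FOUND.matcher(body);
        if (!m.find()) throw new IllegalStateException("no numFound in: " + body);
        return Long.parseLong(m.group(1));
    }

    public static void main(String[] args) throws Exception {
        long first = -1;
        boolean countsMatch = true;
        for (String core : REPLICA_CORES) {
            long docs = liveDocs(core);
            System.out.println(core + " -> " + docs + " live docs");
            if (first < 0) first = docs;
            else if (docs != first) countsMatch = false;
        }
        // Matching counts are necessary but not sufficient for "really in
        // sync"; differing counts are a clear signal not to rebalance yet.
        System.out.println(countsMatch
                ? "live-doc counts match"
                : "replicas differ; do not rebalance yet");
    }
}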