Bernd:

I just committed fixes for SOLR-13091 and SOLR-10935 to the repo, so if you want to give it a whirl it's ready. By tonight (Sunday) I expect to change the response format a bit and update the ref guide, although you'll have to look at the doc changes for the format. There's a new summary section that reports "Success" or "Failure", which is supposed to be the only thing you really need to check...

One judgement call I made was that if a replica on a down node is the preferredLeader, it _can't_ be made leader, but this is still labeled "Success".

Best,
Erick
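For anyone following along, here is a minimal sketch (Python with requests) of calling REBALANCELEADERS and checking the summary section described above from a client. The base URL, collection name, and the exact key holding the summary are assumptions; the response format is still changing per the note above, so check the updated ref guide for the final shape.

import requests

SOLR = "http://localhost:8983/solr"   # assumed Solr base URL
COLLECTION = "collection1"            # assumed collection name

resp = requests.get(
    f"{SOLR}/admin/collections",
    params={
        "action": "REBALANCELEADERS",
        "collection": COLLECTION,
        "maxWaitSeconds": 60,   # documented REBALANCELEADERS parameter
        "wt": "json",
    },
).json()

# "Summary" is a guess at the key of the new overall section; per the note
# above it should report "Success" or "Failure" once the format settles.
summary = resp.get("Summary", resp)
print("REBALANCELEADERS summary:", summary)
if "Failure" in str(summary):
    print("At least one leadership change did not succeed")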
On Sun, Jan 13, 2019 at 7:43 PM Erick Erickson <erickerick...@gmail.com> wrote:
>
> Bernd:
>
> I just attached a patch to https://issues.apache.org/jira/browse/SOLR-13091. It's still rough; the response from REBALANCELEADERS needs quite a bit of work (lots of extra stuff in it now, and no overall verification). I haven't run all the tests, nor precommit.
>
> I wanted to get something up so that, if you have a test environment you can easily test it in, you'd have an early chance to play with it.
>
> It's against master; I also haven't tried to backport to 8.0 or 7x yet. I doubt it'll be a problem, but if it doesn't apply cleanly let me know.
>
> Best,
> Erick
>
> On Fri, Jan 11, 2019 at 8:33 AM Erick Erickson <erickerick...@gmail.com> wrote:
> >
> > bq: You have to check if the cores participating in leadership election are _really_ in sync. And this must be done before starting any rebalance. Sounds ugly... :-(
> >
> > This _should_ not be necessary. I'll add parenthetically, though, that leader election has been extensively reworked in Solr 7.3+ because "interesting" things could happen.
> >
> > Manipulating the leader election queue is really no different from having to deal with, say, someone killing the leader ungracefully. It should "just work". That said, if you're seeing evidence to the contrary, that's reality.
> >
> > What do you mean by "stats", though? It's perfectly ordinary for there to be different numbers of _deleted_ documents on various replicas, and consequently for things like term frequencies and doc frequencies to differ. What's emphatically _not_ expected is for there to be different numbers of "live" docs.
> >
> > "Making sure nodes are in sync" is certainly an option. That should all be automatic if you pause indexing and issue a commit, _then_ do a rebalance.
> >
> > I certainly agree that the code is broken and needs to be fixed, but I also have to ask how many shards we're talking about here. The code was originally written for the case where hundreds of leaders could be on the same node; until you get into a significant number of leaders on a single node (tens, at least) there haven't been reliable stats showing that it's a performance issue. If you have threshold numbers where you've seen it make a material difference, it'd be great to share them.
> >
> > And I won't be getting back to this until the weekend; other urgent stuff has come up...
> >
> > Best,
> > Erick
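As a concrete illustration of the "pause indexing, commit, then rebalance" advice above, and of Erick's point that replicas should not disagree on the number of "live" docs, here is a rough Python sketch against the Collections API. The URLs and collection name are placeholders, and comparing per-core numFound values with distrib=false is an illustrative check, not part of the patch being discussed.

import requests

SOLR = "http://localhost:8983/solr"   # assumed Solr base URL
COLLECTION = "collection1"            # assumed collection name

# Hard commit so all replicas see the same "live" documents
# (indexing is assumed to be paused at this point).
requests.get(f"{SOLR}/{COLLECTION}/update", params={"commit": "true"})

# Fetch the cluster state to find every replica of every shard.
cluster = requests.get(
    f"{SOLR}/admin/collections",
    params={"action": "CLUSTERSTATUS", "collection": COLLECTION, "wt": "json"},
).json()

# Query each core directly (distrib=false) and compare live-doc counts;
# replicas of the same shard should agree before rebalancing.
shards = cluster["cluster"]["collections"][COLLECTION]["shards"]
for shard_name, shard in shards.items():
    counts = {}
    for core_node, replica in shard["replicas"].items():
        r = requests.get(
            f"{replica['base_url']}/{replica['core']}/select",
            params={"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"},
        ).json()
        counts[core_node] = r["response"]["numFound"]
    if len(set(counts.values())) > 1:
        print(f"{shard_name}: replicas disagree on live docs: {counts}")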
> > On Fri, Jan 11, 2019 at 12:58 AM Bernd Fehling <bernd.fehl...@uni-bielefeld.de> wrote:
> > >
> > > Hi Erick,
> > > yes, I would be happy to test any patches.
> > >
> > > Good news: I got rebalance working. After running the rebalance about 50 times with the debugger and watching the behavior of my problem shard and its core_nodes within my test cloud, I came to the point of failure. I solved it and now it works.
> > >
> > > Bad news: rebalance is still not reliable, and there are many more problems and points of failure triggered by rebalanceLeaders, or rather by re-queueing the watchlist.
> > >
> > > How I located _my_ problem:
> > > Test cloud is 5 servers (VMs), 5 shards, 3 replicas per shard, 1 Java instance per server, and 3 separate ZooKeepers.
> > > My problem: shard2 wasn't willing to rebalance to a specific core_node. The related core_nodes were core_node1, core_node2, and core_node10; core_node10 was the preferredLeader. Leadership just kept changing between core_node1 and core_node2, back and forth, whenever I called rebalanceLeaders.
> > > First step, I stopped the server holding core_node2. Result: the leadership stayed at core_node1 whenever I called rebalanceLeaders.
> > > Second step, from the debugger I _forced_ the system during rebalanceLeaders to give the leadership to core_node10. Result: there was no leader anymore for that shard. Yes, it can happen: you can end up with a shard having no leader but active core_nodes!!!
> > > To fix this I gave preferredLeader to core_node1 and called rebalanceLeaders. After that, preferredLeader was set back to core_node10 and I was back at the point where I started: all calls to rebalanceLeaders kept the leader at core_node1.
> > >
> > > From the debug logs I got the hint about PeerSync of cores and IndexFingerprint. The stats from my problem core_node10 showed that it differs from the leader, core_node1. The system notices the difference, starts a PeerSync, and ends with success. But the PeerSync actually seems to fail, because the stats of core_node1 and core_node10 still differ afterwards.
> > > Solution: I also stopped the server holding my problem core_node10, wiped all data directories, and started that server again. The core_nodes were rebuilt from the leader and now they are really in sync. Calling rebalanceLeaders now ended with success, giving leadership to the preferredLeader.
> > >
> > > My guess:
> > > You have to check if the cores participating in leadership election are _really_ in sync. And this must be done before starting any rebalance. Sounds ugly... :-(
> > >
> > > Next question: why is PeerSync not reporting an error? There is an info about "PeerSync START", "PeerSync Received 0 versions from ... fingerprint:null" and "PeerSync DONE. sync succeeded", but the cores are not really in sync.
> > >
> > > Another test I did (with my new knowledge about synced cores):
> > > - removing all preferredLeader properties
> > > - stopping, wiping the data directory, and starting all servers one by one to get all cores of all shards in sync
> > > - setting one preferredLeader for each shard, but different from the actual leader
> > > - calling rebalanceLeaders succeeded for only 2 shards on the first run, not for all 5 shards (even with all cores really in sync)
> > > - after calling rebalanceLeaders again, the other shards succeeded as well
> > > Result: rebalanceLeaders is still not reliable.
> > >
> > > I have to mention that I have about 520,000 docs per core in my test cloud, and that there might also be a timing issue between calling rebalanceLeaders, detecting that the core about to become leader is not in sync with the actual leader, and resyncing while waiting for the new leader election.
> > >
> > > So far,
> > > Bernd
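To make the procedure Bernd describes concrete, here is a hedged sketch of setting preferredLeader on one replica, rebalancing, and then checking CLUSTERSTATUS to confirm that the preferred replica really became leader; as Erick notes further down, a "success" status alone doesn't guarantee the leadership actually changed. The URL, collection name, and the shard/replica names (taken from Bernd's example) are placeholders.

import requests

SOLR = "http://localhost:8983/solr"        # assumed Solr base URL
COLLECTION = "collection1"                 # assumed collection name
SHARD, REPLICA = "shard2", "core_node10"   # placeholders from Bernd's example

admin = f"{SOLR}/admin/collections"

# 1. Mark the replica as preferredLeader.
requests.get(admin, params={
    "action": "ADDREPLICAPROP", "collection": COLLECTION,
    "shard": SHARD, "replica": REPLICA,
    "property": "preferredLeader", "property.value": "true",
})

# 2. Ask Solr to make preferredLeader replicas the actual leaders.
requests.get(admin, params={
    "action": "REBALANCELEADERS", "collection": COLLECTION,
    "maxWaitSeconds": 60, "wt": "json",
})

# 3. Verify that the preferred replica is now really the leader.
state = requests.get(admin, params={
    "action": "CLUSTERSTATUS", "collection": COLLECTION, "wt": "json",
}).json()
replicas = state["cluster"]["collections"][COLLECTION]["shards"][SHARD]["replicas"]
print(f"{REPLICA} is leader of {SHARD}:",
      replicas.get(REPLICA, {}).get("leader") == "true")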
> > > On 10.01.19 at 17:02, Erick Erickson wrote:
> > > > Bernd:
> > > >
> > > > Don't feel bad about missing it, I wrote the silly stuff and it took me some time to remember.....
> > > >
> > > > Those are the rules.
> > > >
> > > > It's always humbling to look back at my own code and say "that idiot should have put some comments in here..." ;)
> > > >
> > > > Yeah, I agree there are a lot of moving parts here. I have a note to myself to provide better feedback in the response. You're absolutely right that we fire all these commands and hope they all work; just returning a "success" status doesn't guarantee a leadership change.
> > > >
> > > > I'll be on another task for the rest of this week, but I should be able to dress things up over the weekend. That'll give you a patch to test, if you're willing.
> > > >
> > > > The actual code changes are pretty minimal; the bulk of the patch will be the reworked test.
> > > >
> > > > Best,
> > > > Erick