Bernd:

I just committed fixes for SOLR-13091 and SOLR-10935 to the repo, so if you want to give it a whirl it's ready. By tonight (Sunday) I expect to change the response format a bit and update the ref guide, although you'll have to look at the doc changes for the format. There's a new summary section that reports "Success" or "Failure", which is supposed to be the only thing you really need to check...

One judgement call I made was that if a replica on a down node is the preferredLeader, it _can't_ be made leader, but this is still labeled "Success".

Best,
Erick
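For anyone following along, here is a minimal sketch (Python with requests) of calling REBALANCELEADERS and checking the summary section described above from a client. The base URL, collection name, and the exact key holding the summary are assumptions; the response format is still changing per the note above, so check the updated ref guide for the final shape.

import requests

SOLR = "http://localhost:8983/solr"   # assumed Solr base URL
COLLECTION = "collection1"            # assumed collection name

resp = requests.get(
    f"{SOLR}/admin/collections",
    params={
        "action": "REBALANCELEADERS",
        "collection": COLLECTION,
        "maxWaitSeconds": 60,   # documented REBALANCELEADERS parameter
        "wt": "json",
    },
).json()

# "Summary" is a guess at the key of the new overall section; per the note
# above it should report "Success" or "Failure" once the format settles.
summary = resp.get("Summary", resp)
print("REBALANCELEADERS summary:", summary)
if "Failure" in str(summary):
    print("At least one leadership change did not succeed")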
On Sun, Jan 13, 2019 at 7:43 PM Erick Erickson <erickerick...@gmail.com> wrote:
>
> Bernd:
>
> I just attached a patch to https://issues.apache.org/jira/browse/SOLR-13091. It's still rough; the response from REBALANCELEADERS needs quite a bit of work (lots of extra stuff in it now, and no overall verification). I haven't run all the tests, nor precommit.
>
> I wanted to get something up so that, if you have a test environment you can easily test it in, you'd have an early chance to play with it.
>
> It's against master; I also haven't tried to backport to 8.0 or 7x yet. I doubt it'll be a problem, but if it doesn't apply cleanly let me know.
>
> Best,
> Erick
>
> On Fri, Jan 11, 2019 at 8:33 AM Erick Erickson <erickerick...@gmail.com> wrote:
> >
> > bq: You have to check if the cores participating in leadership election are _really_ in sync. And this must be done before starting any rebalance. Sounds ugly... :-(
> >
> > This _should_ not be necessary. I'll add parenthetically, though, that leader election has been extensively reworked in Solr 7.3+ because "interesting" things could happen.
> >
> > Manipulating the leader election queue is really no different from having to deal with, say, someone killing the leader ungracefully. It should "just work". That said, if you're seeing evidence to the contrary, that's reality.
> >
> > What do you mean by "stats", though? It's perfectly ordinary for there to be different numbers of _deleted_ documents on various replicas, and consequently for things like term frequencies and doc frequencies to differ. What's emphatically _not_ expected is for there to be different numbers of "live" docs.
> >
> > "Making sure nodes are in sync" is certainly an option. That should all be automatic if you pause indexing and issue a commit, _then_ do a rebalance.
> >
> > I certainly agree that the code is broken and needs to be fixed, but I also have to ask how many shards we're talking about here. The code was originally written for the case where hundreds of leaders could be on the same node; until you get into a significant number of leaders on a single node (tens, at least) there haven't been reliable stats showing that it's a performance issue. If you have threshold numbers where you've seen it make a material difference, it'd be great to share them.
> >
> > And I won't be getting back to this until the weekend; other urgent stuff has come up...
> >
> > Best,
> > Erick
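As a concrete illustration of the "pause indexing, commit, then rebalance" advice above, and of Erick's point that replicas should not disagree on the number of "live" docs, here is a rough Python sketch against the Collections API. The URLs and collection name are placeholders, and comparing per-core numFound values with distrib=false is an illustrative check, not part of the patch being discussed.

import requests

SOLR = "http://localhost:8983/solr"   # assumed Solr base URL
COLLECTION = "collection1"            # assumed collection name

# Hard commit so all replicas see the same "live" documents
# (indexing is assumed to be paused at this point).
requests.get(f"{SOLR}/{COLLECTION}/update", params={"commit": "true"})

# Fetch the cluster state to find every replica of every shard.
cluster = requests.get(
    f"{SOLR}/admin/collections",
    params={"action": "CLUSTERSTATUS", "collection": COLLECTION, "wt": "json"},
).json()

# Query each core directly (distrib=false) and compare live-doc counts;
# replicas of the same shard should agree before rebalancing.
shards = cluster["cluster"]["collections"][COLLECTION]["shards"]
for shard_name, shard in shards.items():
    counts = {}
    for core_node, replica in shard["replicas"].items():
        r = requests.get(
            f"{replica['base_url']}/{replica['core']}/select",
            params={"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"},
        ).json()
        counts[core_node] = r["response"]["numFound"]
    if len(set(counts.values())) > 1:
        print(f"{shard_name}: replicas disagree on live docs: {counts}")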
> > On Fri, Jan 11, 2019 at 12:58 AM Bernd Fehling <bernd.fehl...@uni-bielefeld.de> wrote:
> > >
> > > Hi Erick,
> > > yes, I would be happy to test any patches.
> > >
> > > Good news: I got rebalance working. After running the rebalance about 50 times with the debugger and watching the behavior of my problem shard and its core_nodes within my test cloud, I came to the point of failure. I solved it and now it works.
> > >
> > > Bad news: rebalance is still not reliable, and there are many more problems and points of failure triggered by rebalanceLeaders, or rather by re-queueing the watchlist.
> > >
> > > How I located _my_ problem:
> > > Test cloud is 5 servers (VMs), 5 shards, 3 replicas per shard, 1 Java instance per server, and 3 separate ZooKeepers.
> > > My problem: shard2 wasn't willing to rebalance to a specific core_node. The related core_nodes were core_node1, core_node2, and core_node10; core_node10 was the preferredLeader. Leadership just kept changing between core_node1 and core_node2, back and forth, whenever I called rebalanceLeaders.
> > > First step, I stopped the server holding core_node2. Result: the leadership stayed at core_node1 whenever I called rebalanceLeaders.
> > > Second step, from the debugger I _forced_ the system during rebalanceLeaders to give the leadership to core_node10. Result: there was no leader anymore for that shard. Yes, it can happen: you can end up with a shard having no leader but active core_nodes!!!
> > > To fix this I gave preferredLeader to core_node1 and called rebalanceLeaders. After that, preferredLeader was set back to core_node10 and I was back at the point where I started: all calls to rebalanceLeaders kept the leader at core_node1.
> > >
> > > From the debug logs I got the hint about PeerSync of cores and IndexFingerprint. The stats from my problem core_node10 showed that it differs from the leader, core_node1. The system notices the difference, starts a PeerSync, and ends with success. But the PeerSync actually seems to fail, because the stats of core_node1 and core_node10 still differ afterwards.
> > > Solution: I also stopped the server holding my problem core_node10, wiped all data directories, and started that server again. The core_nodes were rebuilt from the leader and now they are really in sync. Calling rebalanceLeaders now ended with success, giving leadership to the preferredLeader.
> > >
> > > My guess:
> > > You have to check if the cores participating in leadership election are _really_ in sync. And this must be done before starting any rebalance. Sounds ugly... :-(
> > >
> > > Next question: why is PeerSync not reporting an error? There is an info about "PeerSync START", "PeerSync Received 0 versions from ... fingerprint:null" and "PeerSync DONE. sync succeeded", but the cores are not really in sync.
> > >
> > > Another test I did (with my new knowledge about synced cores):
> > > - removing all preferredLeader properties
> > > - stopping, wiping the data directory, and starting all servers one by one to get all cores of all shards in sync
> > > - setting one preferredLeader for each shard, but different from the actual leader
> > > - calling rebalanceLeaders succeeded for only 2 shards on the first run, not for all 5 shards (even with all cores really in sync)
> > > - after calling rebalanceLeaders again, the other shards succeeded as well
> > > Result: rebalanceLeaders is still not reliable.
> > >
> > > I have to mention that I have about 520,000 docs per core in my test cloud, and that there might also be a timing issue between calling rebalanceLeaders, detecting that the core about to become leader is not in sync with the actual leader, and resyncing while waiting for the new leader election.
> > >
> > > So far,
> > > Bernd
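To make the procedure Bernd describes concrete, here is a hedged sketch of setting preferredLeader on one replica, rebalancing, and then checking CLUSTERSTATUS to confirm that the preferred replica really became leader; as Erick notes further down, a "success" status alone doesn't guarantee the leadership actually changed. The URL, collection name, and the shard/replica names (taken from Bernd's example) are placeholders.

import requests

SOLR = "http://localhost:8983/solr"        # assumed Solr base URL
COLLECTION = "collection1"                 # assumed collection name
SHARD, REPLICA = "shard2", "core_node10"   # placeholders from Bernd's example

admin = f"{SOLR}/admin/collections"

# 1. Mark the replica as preferredLeader.
requests.get(admin, params={
    "action": "ADDREPLICAPROP", "collection": COLLECTION,
    "shard": SHARD, "replica": REPLICA,
    "property": "preferredLeader", "property.value": "true",
})

# 2. Ask Solr to make preferredLeader replicas the actual leaders.
requests.get(admin, params={
    "action": "REBALANCELEADERS", "collection": COLLECTION,
    "maxWaitSeconds": 60, "wt": "json",
})

# 3. Verify that the preferred replica is now really the leader.
state = requests.get(admin, params={
    "action": "CLUSTERSTATUS", "collection": COLLECTION, "wt": "json",
}).json()
replicas = state["cluster"]["collections"][COLLECTION]["shards"][SHARD]["replicas"]
print(f"{REPLICA} is leader of {SHARD}:",
      replicas.get(REPLICA, {}).get("leader") == "true")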
> > > On 10.01.19 at 17:02, Erick Erickson wrote:
> > > > Bernd:
> > > >
> > > > Don't feel bad about missing it, I wrote the silly stuff and it took me some time to remember.....
> > > >
> > > > Those are the rules.
> > > >
> > > > It's always humbling to look back at my own code and say "that idiot should have put some comments in here..." ;)
> > > >
> > > > Yeah, I agree there are a lot of moving parts here. I have a note to myself to provide better feedback in the response. You're absolutely right that we fire all these commands and hope they all work; just returning a "success" status doesn't guarantee a leadership change.
> > > >
> > > > I'll be on another task for the rest of this week, but I should be able to dress things up over the weekend. That'll give you a patch to test, if you're willing.
> > > >
> > > > The actual code changes are pretty minimal; the bulk of the patch will be the reworked test.
> > > >
> > > > Best,
> > > > Erick