Hi Erick,
yes, I would be happy to test any patches.

Good news: I got rebalance working.
After running rebalance about 50 times with the debugger and watching
the behavior of my problem shard and its core_nodes within my test
cloud, I found the point of failure. I fixed it and now it works.

Bad news: rebalance is still not reliable, and there are many more
problems and points of failure triggered by rebalanceLeaders, or
rather by the re-queueing of the watch list.

How I located _my_ problem:
The test cloud is 5 servers (VMs), 5 shards, 3 replicas per shard,
1 Java instance per server, and 3 separate ZooKeepers.
My problem: shard2 was not willing to rebalance to a specific
core_node. The core_nodes involved were core_node1, core_node2 and
core_node10; core_node10 was the preferredLeader.
Leadership just kept switching back and forth between core_node1 and
core_node2 whenever I called rebalanceLeaders.
First step: I stopped the server holding core_node2.
Result: leadership stayed at core_node1 whenever I called
rebalanceLeaders.
Second step: from the debugger I _forced_ the system, during
rebalanceLeaders, to hand leadership to core_node10.
Result: there was no leader anymore for that shard. Yes, it can
happen: you can end up with a shard that has no leader but active
core_nodes!
To fix this I set preferredLeader on core_node1 and called
rebalanceLeaders. After that I set preferredLeader back to
core_node10 and was back where I started: every call to
rebalanceLeaders kept the leader at core_node1.
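For reference, what I am issuing are essentially the Collections API
ADDREPLICAPROP and REBALANCELEADERS calls. A minimal sketch (host,
port and collection name are placeholders for my test cloud, error
handling omitted):

import requests

SOLR = "http://localhost:8983/solr"    # placeholder host/port
COLLECTION = "mycollection"            # placeholder collection name

def set_preferred_leader(shard, replica):
    # ADDREPLICAPROP marks the replica that rebalanceLeaders should
    # try to make leader for this shard.
    requests.get(f"{SOLR}/admin/collections", params={
        "action": "ADDREPLICAPROP",
        "collection": COLLECTION,
        "shard": shard,
        "replica": replica,
        "property": "preferredLeader",
        "property.value": "true",
        "wt": "json",
    }).raise_for_status()

def rebalance_leaders():
    # REBALANCELEADERS asks Solr to move leadership to the
    # preferredLeader replicas; a "success" response alone does not
    # prove that leadership actually changed.
    r = requests.get(f"{SOLR}/admin/collections", params={
        "action": "REBALANCELEADERS",
        "collection": COLLECTION,
        "maxWaitSeconds": "60",
        "wt": "json",
    })
    r.raise_for_status()
    return r.json()

# e.g. set_preferred_leader("shard2", "core_node10"); rebalance_leaders()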

From the debug logs I got a hint about the PeerSync of cores and the
IndexFingerprint. The stats of my problem core_node10 differed from
those of the leader core_node1.
The system notices the difference, starts a PeerSync and reports
success. But the PeerSync actually seems to fail, because the stats
of core_node1 and core_node10 still differ afterwards.
Solution: I also stopped the server holding my problem core_node10,
wiped all data directories and started that server again. The
core_nodes were rebuilt from the leader and now they are really in
sync.
Calling rebalanceLeaders now succeeded and leadership went to the
preferredLeader.

My guess:
You have to check whether the cores participating in the leader
election are _really_ in sync, and this must be done before starting
any rebalance. Sounds ugly... :-(
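
The kind of pre-check I have in mind could look roughly like the
sketch below. It only compares per-replica doc counts via
distrib=false queries, which is weaker than a real IndexFingerprint
comparison, and host/collection names are placeholders again:

import requests

SOLR = "http://localhost:8983/solr"    # placeholder host/port
COLLECTION = "mycollection"            # placeholder collection name

def shard_in_sync(shard):
    # Read the shard's replicas from CLUSTERSTATUS, then ask every
    # active core directly (distrib=false) for its doc count.
    state = requests.get(f"{SOLR}/admin/collections", params={
        "action": "CLUSTERSTATUS", "collection": COLLECTION,
        "wt": "json",
    }).json()
    replicas = (state["cluster"]["collections"][COLLECTION]
                ["shards"][shard]["replicas"])
    counts = {}
    for name, info in replicas.items():
        if info.get("state") != "active":
            continue
        core_url = info["base_url"] + "/" + info["core"]
        resp = requests.get(f"{core_url}/select", params={
            "q": "*:*", "rows": "0", "distrib": "false", "wt": "json",
        }).json()
        counts[name] = resp["response"]["numFound"]
    # All active replicas should report the same doc count.
    return len(set(counts.values())) <= 1, counts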

Next question: why is PeerSync not reporting an error?
There are INFO messages like "PeerSync START", "PeerSync Received 0
versions from ... fingerprint:null" and "PeerSync DONE. sync
succeeded", but the cores are not really in sync.

Another test I did (with my new knowledge about synced cores):
- removed all preferredLeader properties
- stopped each server, wiped its data directory and started it again,
  one by one, to get all cores of all shards in sync
- set one preferredLeader per shard, different from the current
  leader
- called rebalanceLeaders: it succeeded for only 2 of the 5 shards on
  the first run (even with really all cores in sync)
- after calling rebalanceLeaders again, the other shards succeeded as
  well
Result: rebalanceLeaders is still not reliable (see the verification
sketch below).
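
Since a "success" response apparently does not mean that leadership
really moved, my test now checks the outcome itself, roughly like
this sketch (placeholder host/collection again; the exact
"property.preferredLeader" key in the CLUSTERSTATUS output is an
assumption on my side):

import time
import requests

SOLR = "http://localhost:8983/solr"    # placeholder host/port
COLLECTION = "mycollection"            # placeholder collection name

def shards():
    state = requests.get(f"{SOLR}/admin/collections", params={
        "action": "CLUSTERSTATUS", "collection": COLLECTION,
        "wt": "json",
    }).json()
    return state["cluster"]["collections"][COLLECTION]["shards"]

def shards_not_on_preferred_leader():
    # A shard is "bad" if the replica carrying the preferredLeader
    # property is not the current leader.
    bad = []
    for shard, info in shards().items():
        for replica in info["replicas"].values():
            if (replica.get("property.preferredLeader") == "true"
                    and replica.get("leader") != "true"):
                bad.append(shard)
    return bad

remaining = shards_not_on_preferred_leader()
for attempt in range(3):
    if not remaining:
        break
    requests.get(f"{SOLR}/admin/collections", params={
        "action": "REBALANCELEADERS", "collection": COLLECTION,
        "maxWaitSeconds": "60", "wt": "json",
    }).raise_for_status()
    time.sleep(5)   # give the leader election some time to settle
    remaining = shards_not_on_preferred_leader()
print("shards still not on their preferredLeader:", remaining)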

I have to mention that I have about 520,000 docs per core in my test
cloud, and that there might also be a timing issue between calling
rebalanceLeaders, detecting that the cores which should become leader
are not in sync with the current leader, and resyncing while waiting
for the new leader election.

So far,
Bernd


On 10.01.19 at 17:02, Erick Erickson wrote:
Bernd:

Don't feel bad about missing it, I wrote the silly stuff and it took me
some time to remember.....

Those are  the rules.

It's always humbling to look back at my own code and say "that
idiot should have put some comments in here..." ;)

yeah, I agree there are a lot of moving parts here. I have a note to
myself to provide better feedback in the response. You're absolutely
right that we fire all these commands and hope they all work.  Just
returning "success" status doesn't guarantee leadership change.

I'll be on another task the rest of this week, but I should be able
to dress things up over the weekend. That'll give you a patch to test
if you're willing.

The actual code changes are pretty minimal, the bulk of the patch
will be the reworked test.

Best,
Erick
