Hi Erick,
the patches and the new comments look good.
Unfortunately I'm on 6.6.5 and can't test this with my cloud.
Replica (o.a.s.common.cloud.Replica) in 6.6.5 is too far away from 7.6 and up,
and a backport to 6.6.5 would be too much rework, if it's possible at all.
Thanks for solving this issue.
Regards,
Be
Bernd:
I just committed fixes for SOLR-13091 and SOLR-10935 to the repo; if
you want to give it a whirl, it's ready. By tonight (Sunday) I expect
to change the response format a bit and update the ref guide, although
you'll have to look at the doc changes in the format. There's a new
summary secti
Bernd:
I just attached a patch to
https://issues.apache.org/jira/browse/SOLR-13091. It's still rough;
the response from REBALANCELEADERS needs quite a bit of work (lots of
extra stuff in it now, and no overall verification).
I haven't run all the tests, nor precommit.
I wanted to get something up
bq: You have to check if the cores participating in leadership
election are _really_ in sync. And this must be done before starting any rebalance.
Sounds ugly... :-(
This _should_ not be necessary. I'll add parenthetically that leader
election has been extensively reworked in Solr 7.3+ though b
Hi Erick,
yes, I would be happy to test any patches.
Good news: I got rebalance working.
After running the rebalance about 50 times with the debugger and watching
the behavior of my problem shard and its core_nodes within my test cloud,
I found the point of failure. I solved it and now it works.
Bad
Bernd:
Don't feel bad about missing it; I wrote the silly stuff and it took me
some time to remember.
Those are the rules.
It's always humbling to look back at my own code and say "that
idiot should have put some comments in here..." ;)
Yeah, I agree there are a lot of moving parts here. I
Hi Erick,
that is very valuable info that I missed.
Shouldn't that go into an issue about reworking REBALANCELEADERS?
With your explanation the use of a queue makes sense, and now I see some of
the logic behind it.
- there is the leader and the firstWatcher
- if firstWatcher goes down or is inactive t
Executive summary:
The central problem is "how can I insert an ephemeral node
in a specific place in a ZK queue?" The code could be much,
much simpler if there were a reliable way to do just that. I haven't
looked at more recent ZK versions to see if it's possible. I'd love it if
there were a better way.
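To make the "insert at a specific place" problem concrete, here is a minimal, ZooKeeper-free simulation. It assumes the usual ZK election recipe: each candidate creates an EPHEMERAL_SEQUENTIAL node, the lowest sequence number is the leader, and every other node watches its predecessor (watches are not modeled here). The node-name format `<id>-n_<10-digit seq>` and the `join_at_head` trick (creating a plain ephemeral node that copies the current first watcher's sequence number) are my assumptions about how the duplicate-sequence electionNodes mentioned in the RebalanceLeaders.java comments can arise, not Solr's actual code; `SimulatedElectionQueue` and all its methods are hypothetical.

```python
import re

# Hypothetical election node name format: "<id>-n_<zero-padded sequence>".
SEQ_RE = re.compile(r".*-n_(\d+)$")


def seq_of(name: str) -> int:
    """Extract the numeric sequence suffix from an election node name."""
    m = SEQ_RE.match(name)
    if m is None:
        raise ValueError(f"not an election node name: {name}")
    return int(m.group(1))


class SimulatedElectionQueue:
    """In-memory stand-in for a ZK election directory (no real ZK involved)."""

    def __init__(self) -> None:
        self._counter = 0  # plays the role of ZK's per-parent sequence counter
        self.children: list[str] = []

    def join(self, node_id: str) -> str:
        # EPHEMERAL_SEQUENTIAL: the server picks the next number, so a
        # (re)joining node can only ever land at the tail of the queue.
        name = f"{node_id}-n_{self._counter:010d}"
        self._counter += 1
        self.children.append(name)
        return name

    def join_at_head(self, node_id: str) -> str:
        # Assumed workaround: a plain (non-sequential) ephemeral node that
        # reuses the sequence number of the current first watcher (index 1).
        ordered = self.sorted_nodes()
        if len(ordered) < 2:
            return self.join(node_id)
        name = f"{node_id}-n_{seq_of(ordered[1]):010d}"
        self.children.append(name)
        return name

    def leave(self, name: str) -> None:
        # An ephemeral node disappears when its session ends.
        self.children.remove(name)

    def sorted_nodes(self) -> list[str]:
        # Order by sequence number only. Ties keep insertion order in this
        # simulation, but ZK's getChildren() promises no order for them.
        return sorted(self.children, key=seq_of)

    def leader(self) -> str:
        # The lowest sequence number is the leader.
        return self.sorted_nodes()[0]
```

Because the server assigns sequence numbers, leaving and rejoining always puts a node at the end of the line; copying a number instead produces a tie, and which of the tied nodes takes over when the leader goes away depends on an ordering ZK does not define.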
Yes, your findings are also very strange.
I wonder if we can find the "inventor" of all this and ask him
how it should work, or better, how he originally intended it to work.
Comments in the code (RebalanceLeaders.java) state that it is possible
to have more than one electionNode with the same se
It's weirder than that. In the current test on master, the
assumption is that the node recorded as leader in ZK
is actually the leader, see
TestRebalanceLeaders.checkZkLeadersAgree(). The theory
is that the identified leader node in ZK is actually the leader
after the rebalance command. But you're
Hi Erick,
after some more hours of debugging, the rough result is that whoever invented
this leader election did not check whether an action returns the expected
result. There are only checks for exceptions, true/false, new sequence
numbers and so on, but never whether a leader election to the preferredleader
I looked at the test last night and it's...disturbing. It succeeds
100% of the time. Manual testing seems to fail very often.
Of course it was late and I was a bit cross-eyed, so maybe
I wasn't looking at the manual tests correctly. Or maybe the
test is buggy.
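A scripted version of that manual check could look like the sketch below: after REBALANCELEADERS, walk the cluster state and list every shard whose preferredLeader replica is not the current leader. It assumes a CLUSTERSTATUS response that has already been parsed from JSON, where replicas carry the string flags `"leader": "true"` and `"property.preferredleader": "true"`. The function name `preferred_leader_mismatches` is hypothetical, and this is not how TestRebalanceLeaders does its verification.

```python
def preferred_leader_mismatches(cluster_status: dict, collection: str) -> list[str]:
    """Return the shard names whose preferredLeader replica is not the leader.

    `cluster_status` is assumed to be a parsed CLUSTERSTATUS-style response.
    """
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    mismatches = []
    for shard_name, shard in shards.items():
        preferred = [
            r for r in shard["replicas"].values()
            if r.get("property.preferredleader") == "true"
        ]
        if not preferred:
            continue  # no preference set for this shard, nothing to verify
        if preferred[0].get("leader") != "true":
            mismatches.append(shard_name)
    return sorted(mismatches)
```

Running this a few seconds after each rebalance, rather than eyeballing the admin UI, would at least make "fails very often" countable.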
I beasted the test 100x last night an
As far as I could see with the debugger, there is still a problem in requeuing.
There is a watcher, and it is recognized that the watcher is not a
preferredleader.
So it tries to locate a preferredleader, and succeeds.
It then calls makeReplicaFirstWatcher and gets a new sequence number for
the preferre
I'm reworking the test case, so hold off on doing that. If you want to
raise a JIRA, though, please do and attach your patch...
On Thu, Dec 20, 2018 at 10:53 AM Erick Erickson wrote:
Nothing that I know of was _intentionally_ changed with this between
6x and 7x. That said, nothing that I know of was done to verify that
TLOG and PULL replicas (added in 7x) were handled correctly. There's a
test "TestRebalanceLeaders" for this functionality that has run since
the feature was put
Hi Vadim,
I just tried it with 6.6.5.
In my test cloud with 5 shards, 5 nodes and 3 cores per node, it missed
one shard becoming leader. But I noticed that one shard already was
leader. There were no errors or exceptions in the logs.
Maybe I should enable debug logging and try again to see all logging
messages from
; nearest release
> --
> Vadim
>
> > -Original Message-
> > From: Vadim Ivanov [mailto:vadim.iva...@spb.ntk-intourist.ru]
> > Sent: Friday, December 07, 2018 6:13 PM
> > To: solr-user@lucene.apache.org
> > Subject: RE: REBALANCELEADERS is not reliable
> I'm waiting for 7.6 or 7.5.1 and plan to apply the patch from Endika Posadas to
> it.
> Then I'll test again and hope it'll help.
> Subject: Re: REBALANCELEADERS is not reliable
>
> Thanks for looking this up.
> It could be a hint where to jump into the code.
> I wonder why they rejected a jira ticket about this problem?
>
> Regards, Bernd
>
> On 06.12.18 at 16:31, ... wrote:
-Leader-node-deleted-when-rebalancing-leaders-td4417040.html
Maybe it will shed some light?
-Original Message-
From: Atita Arora [mailto:atitaar...@gmail.com]
Sent: Thursday, November 29, 2018 11:03 PM
To: solr-user@lucene.apache.org
Subject: Re: REBALANCELEADERS is not reliable
Indeed, I tried that on 7.4 and 7.5 too, and it did not work for me either,
even with the preferredLeader property set as recommended in the documentation.
I handled it with a little hack, but this certainly didn't work as expected.
I can provide more details if there's a ticket.
On Thu, Nov 29, 2018 at 8
++ correction
On Fri, Nov 30, 2018, 01:10 Aman Tandon wrote:
> For me today, I deleted the leader replica of one shard of a two-shard
> collection. Then the other replicas of that shard weren't getting elected as
> leader.
>
> After waiting a long time, I tried setting the preferredLeader property with ADDREPLICAPROP
> on one of th
For me today, I deleted the leader replica of one shard of a two-shard
collection. Then the other replica of that shard was getting elected as leader.
After waiting a long time, I tried setting the preferredLeader property with ADDREPLICAPROP on
one of the replicas, then tried FORCELEADER, but no luck. Then I also tried
rebalan
Hi Vadim,
thanks for confirming.
So it seems to be a general problem with Solr 6.x and 7.x, and it might
still be there in the most recent versions.
But where should I start debugging this problem? Is something not
stored correctly in ZooKeeper, or is the Overseer the problem?
I was also reading something abou
Hi, Bernd
I have tried REBALANCELEADERS with Solr 6.3 and 7.5.
I had very similar results and the impression that it's not reliable :(
--
Br, Vadim
> -Original Message-
> From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
> Sent: Tuesday, November 27, 2018 5:13 PM
> To: solr-user@lucene.