Thanks for the info. We were looking to move to a stable release soon (we
are on an old nightly build from April!). Has this issue existed since
then? Do we have an idea when Solr 4.1 will be made available? I am just
trying to get an idea of whether we should wait or not.


On Thu, Dec 6, 2012 at 9:11 PM, Mark Miller <markrmil...@gmail.com> wrote:

> I should have sent this some time ago:
>
> https://issues.apache.org/jira/browse/SOLR-3940 "Rejoining the leader
> election incorrectly triggers the code path for a fresh cluster start
> rather than fail over."
>
> The above is a somewhat ugly bug.
>
> It means that if you are playing around with recovery and you kill a
> replica in a shard, it will take 3 minutes before a new leader takes over.
>
> This will be fixed in the upcoming 4.1 release (and has been fixed on the
> 4x branch since early October).
>
> This wait is only meant for cluster startup. The idea is that you might
> introduce some random, old, out of date shard and then start up your
> cluster - you don't want that shard to be a leader - so we wait around for
> all known shards to start up so they can all participate in the initial
> leader election and the best one can be chosen. It's meant as a protective
> measure against a fairly unlikely event. But it's kicking in when it
> shouldn't.
>
> You can just accept the 3 minute wait, or you can lower the wait from 3
> minutes (to like 10 seconds or to 0 seconds - just avoid the scenario I
> mention above if you do).
>
> You can set the wait time in solr.xml by adding the attribute
> leaderVoteWait={whatever milliseconds} to the cores node.
>
> Sorry about this - completely my fault.
>
> - Mark
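
For anyone applying the workaround above, here is a rough sketch of what
the solr.xml change might look like. Only the leaderVoteWait attribute on
the cores node comes from Mark's message. The 10000 value is just his
"like 10 seconds" suggestion expressed in milliseconds, and the other
attributes and the core entry are assumed placeholders, so keep whatever
your own solr.xml already has.

```xml
<!-- Sketch only: everything except leaderVoteWait is an assumed placeholder. -->
<solr persistent="true">
  <!-- leaderVoteWait is in milliseconds; 10000 = 10 seconds.
       The 3 minute wait described above would be 180000. -->
  <cores adminPath="/admin/cores" leaderVoteWait="10000">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>
```

If you lower the wait this far, remember the caveat in the quoted message:
avoid starting the cluster with an old, out-of-date shard, since it could
then be chosen as leader.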
