Hey Jamie - long time, no see.

On Dec 8, 2012, at 5:19 AM, Jamie Johnson <jej2...@gmail.com> wrote:

> thanks for the info.  we were looking to move to a stable release soon (we
> are on an old nightly build from April!).  Has this issue existed since
> then?  

It was introduced shortly before 4.0 was released, so no, I don't think so.

> Do we have an idea when solr 4.1 will be made available?  I am just
> trying to get an idea if we should wait or not.

I hope very, very soon…just have to herd a few cats…

- Mark

> 
> 
> On Thu, Dec 6, 2012 at 9:11 PM, Mark Miller <markrmil...@gmail.com> wrote:
> 
>> I should have sent this some time ago:
>> 
>> https://issues.apache.org/jira/browse/SOLR-3940 "Rejoining the leader
>> election incorrectly triggers the code path for a fresh cluster start
>> rather than fail over."
>> 
>> The above is a somewhat ugly bug.
>> 
>> It means that if you are playing around with recovery and you kill a
>> replica in a shard, it will take 3 minutes before a new leader takes over.
>> 
>> This will be fixed in the upcoming 4.1 release (and it has been fixed on
>> 4x since early October).
>> 
>> This wait is only meant for cluster startup. The idea is that you might
>> introduce some random, old, out-of-date shard and then start up your
>> cluster - you don't want that shard to be a leader - so we wait around for
>> all known shards to start up so they can all participate in the initial
>> leader election and the best one can be chosen. It's meant as a protective
>> measure against a fairly unlikely event. But it's kicking in when it
>> shouldn't.
>> 
>> You can just accept the 3 minute wait, or you can lower the wait from 3
>> minutes (to like 10 seconds or to 0 seconds - just avoid the scenario I
>> mention above if you do).
>> 
>> You can set the wait time in solr.xml by adding the attribute
>> leaderVoteWait={whatever milliseconds} to the cores node.
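>> 
>> A minimal sketch of what that might look like, assuming a 4.x-style
>> solr.xml. The core name, instanceDir, adminPath and the 10-second value
>> are illustrative placeholders, not taken from any particular setup:
>> 
>> ```xml
>> <solr persistent="true">
>>   <!-- leaderVoteWait is in milliseconds; 10000 = 10 seconds (example value) -->
>>   <cores adminPath="/admin/cores" leaderVoteWait="10000">
>>     <!-- hypothetical core entry, shown only to make the fragment complete -->
>>     <core name="collection1" instanceDir="collection1" />
>>   </cores>
>> </solr>
>> ```
>> 
>> Keep whatever other attributes your cores node already has; only the
>> leaderVoteWait attribute is the addition here.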
>> 
>> Sorry about this - completely my fault.
>> 
>> - Mark
