Attention Solr 4.0 SolrCloud users

Mark Miller Thu, 06 Dec 2012 18:12:47 -0800

I should have sent this some time ago:

https://issues.apache.org/jira/browse/SOLR-3940 "Rejoining the leader election 
incorrectly triggers the code path for a fresh cluster start rather than fail 
over."


The above is a somewhat ugly bug.

It means that if you are playing around with recovery and you kill a replica in 
a shard, it will take 3 minutes before a new leader takes over.

This will be fixed in the upcoming 4.1 release (and has been fixed on the 4x branch since
early October).

This wait is only meant for cluster startup. The idea is that you might
introduce some random, old, out-of-date shard and then start up your cluster.
You don't want that shard to become the leader, so we wait for all known
shards to start up so that they can all participate in the initial leader
election and the best one can be chosen. It's meant as a protective measure
against a fairly unlikely event, but it's kicking in when it shouldn't.

You can just accept the 3-minute wait, or you can lower it (to, say, 10
seconds, or even 0 seconds; just avoid the scenario I mentioned above if you
do).

You can set the wait time in solr.xml by adding the attribute
leaderVoteWait={whatever milliseconds} to the <cores> element.
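For illustration, here is a minimal sketch of a 4.0-style solr.xml with the wait lowered to 10 seconds. The adminPath, core name, and instanceDir values are placeholders assumed from a typical example setup, not taken from this post. Keep whatever attributes your own <cores> element already has and just add leaderVoteWait.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <!-- leaderVoteWait is in milliseconds: 10000 = 10 seconds (illustrative value) -->
  <!-- adminPath, core name and instanceDir below are placeholder assumptions -->
  <cores adminPath="/admin/cores" leaderVoteWait="10000">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>
```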

Sorry about this - completely my fault.

- Mark
