Thanks for the info. We were looking to move to a stable release soon (we are on an old nightly build from April!). Has this issue existed since then? Do we have an idea of when Solr 4.1 will be made available? I am just trying to get an idea of whether we should wait or not.
On Thu, Dec 6, 2012 at 9:11 PM, Mark Miller <markrmil...@gmail.com> wrote:
> I should have sent this some time ago:
>
> https://issues.apache.org/jira/browse/SOLR-3940 "Rejoining the leader
> election incorrectly triggers the code path for a fresh cluster start
> rather than fail over."
>
> The above is a somewhat ugly bug.
>
> It means that if you are playing around with recovery and you kill a
> replica in a shard, it will take 3 minutes before a new leader takes over.
>
> This will be fixed in the upcoming 4.1 release (And has been fixed on 4x
> since early October).
>
> This wait is only meant for cluster startup. The idea is that you might
> introduce some random, old, out of date shard and then start up your
> cluster - you don't want that shard to be a leader - so we wait around for
> all known shards to startup so they can all participate in the initial
> leader election and the best one can be chosen. It's meant as a protective
> measure against a fairly unlikely event. But it's kicking in when it
> shouldn't.
>
> You can just accept the 3 minute wait, or you can lower the wait from 3
> minutes (to like 10 seconds or to 0 seconds - just avoid the scenario I
> mention above if you do).
>
> You can set the wait time in solr.xml by adding the attribute
> leaderVoteWait={whatever milliseconds} to the cores node.
>
> Sorry about this - completely my fault.
>
> - Mark
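
For reference, here is a rough sketch of what the solr.xml change described above might look like. It assumes the legacy 4.0-style solr.xml layout; the adminPath value and the core name/instanceDir are placeholders rather than anything from this thread, and 10000 is simply the 10-second option mentioned above expressed in milliseconds. Only the leaderVoteWait attribute on the cores node comes from the message itself.

```xml
<!-- Sketch only: everything except leaderVoteWait is a placeholder;
     keep whatever attributes your existing solr.xml already has. -->
<solr persistent="true">
  <!-- leaderVoteWait is in milliseconds; 10000 = 10 seconds.
       The 3-minute wait mentioned above would be 180000. -->
  <cores adminPath="/admin/cores" leaderVoteWait="10000">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>
```

As noted above, a low value (or 0) is only safe if you are not bringing an old, out-of-date shard into the cluster at startup.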