Hey Jamie - long time, no see.

On Dec 8, 2012, at 5:19 AM, Jamie Johnson <jej2...@gmail.com> wrote:

> Thanks for the info. We were looking to move to a stable release soon
> (we are on an old nightly build from April!). Has this issue existed
> since then?

It was introduced shortly before 4.0 was released, so no, I don't think so.

> Do we have an idea when Solr 4.1 will be made available? I am just
> trying to get an idea if we should wait or not.

I hope very, very soon…just have to herd a few cats…

- Mark

>
> On Thu, Dec 6, 2012 at 9:11 PM, Mark Miller <markrmil...@gmail.com> wrote:
>
>> I should have sent this some time ago:
>>
>> https://issues.apache.org/jira/browse/SOLR-3940 "Rejoining the leader
>> election incorrectly triggers the code path for a fresh cluster start
>> rather than fail over."
>>
>> The above is a somewhat ugly bug.
>>
>> It means that if you are playing around with recovery and you kill a
>> replica in a shard, it will take 3 minutes before a new leader takes
>> over.
>>
>> This will be fixed in the upcoming 4.1 release (and has been fixed on
>> the 4x branch since early October).
>>
>> This wait is only meant for cluster startup. The idea is that you might
>> introduce some random, old, out-of-date shard and then start up your
>> cluster. You don't want that shard to become the leader, so we wait for
>> all known shards to start up; that way they can all participate in the
>> initial leader election and the best one can be chosen. It's meant as a
>> protective measure against a fairly unlikely event, but it's kicking in
>> when it shouldn't.
>>
>> You can just accept the 3-minute wait, or you can lower it (to something
>> like 10 seconds, or even 0 seconds; just avoid the scenario I mention
>> above if you do).
>>
>> You can set the wait time in solr.xml by adding the attribute
>> leaderVoteWait={whatever milliseconds} to the cores node.
>>
>> Sorry about this - completely my fault.
>>
>> - Mark
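
As a rough illustration of the workaround in the quoted message, here is a minimal Python sketch that sets leaderVoteWait on the cores node of a legacy-style solr.xml. The helper name set_leader_vote_wait and the sample file contents are hypothetical, not taken from this thread or from Solr itself; the only details from the thread are that the attribute goes on the cores node and takes a value in milliseconds. For reference, the 3-minute wait works out to 3 × 60 × 1000 = 180000 ms.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a legacy (Solr 4.0-era) solr.xml; real files
# usually carry more attributes on <cores>.
SAMPLE_SOLR_XML = """<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="collection1">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>"""

# The 3-minute wait described in the thread, expressed in milliseconds.
THREE_MINUTES_MS = 3 * 60 * 1000  # 180000


def set_leader_vote_wait(solr_xml: str, wait_ms: int) -> str:
    """Return solr_xml with leaderVoteWait=wait_ms set on the <cores> node.

    Hypothetical helper for illustration only; it is not part of Solr.
    """
    if wait_ms < 0:
        raise ValueError("wait_ms must be >= 0")
    root = ET.fromstring(solr_xml)
    # In this sample layout, <cores> is a direct child of the <solr> root.
    cores = root.find("cores")
    if cores is None:
        raise ValueError("no <cores> element directly under the root")
    # XML attribute values are strings, so store the number as text.
    cores.set("leaderVoteWait", str(wait_ms))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Lower the wait from 3 minutes to 10 seconds, as suggested in the thread.
    print(set_leader_vote_wait(SAMPLE_SOLR_XML, 10 * 1000))
```

Note that ElementTree's default parser discards XML comments, so for a real, commented solr.xml, editing the attribute by hand may be simpler. Also keep in mind the trade-off Mark describes: a shorter wait weakens the startup protection against an old, out-of-date shard being chosen as leader.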