I figured out that most of the startup time seems to be spent waiting for
replicas to recover. It waits anywhere from 6 seconds up to 600 seconds
for replicas to recover before trying again; sometimes it succeeds, and
otherwise it marks the core as down. Is there a way to reduce the timeout
during recovery? Also, can anyone explain why the recovery takes so long?
Can't it mark itself as the leader and not wait for some replica to be
available?

*Logs*:

ERROR - 2014-03-22 19:34:07.852; org.apache.solr.common.SolrException;
Error while trying to recover.
core=testcollection_shard5_replica1:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
I was asked to wait on state recovering for 10.1.1.100:8983_solr but I
still do not see the requested state. I see state: active live:true

    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
    at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:202)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:346)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)


ERROR - 2014-03-22 19:34:07.853; org.apache.solr.cloud.RecoveryStrategy;
Recovery failed - trying again... (6) core= testcollection_shard5_replica1

INFO  - 2014-03-22 19:34:07.853; org.apache.solr.cloud.RecoveryStrategy;
Wait 128.0 seconds before trying to recover again (7)
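
The 128.0-second wait at attempt 7 in the log above is consistent with a
delay that doubles on every retry (2^7 = 128) and is capped at the 600
seconds I mentioned. I have not confirmed this against the
RecoveryStrategy source, so the sketch below is only an illustration of
that assumption. The class name, the waitSeconds helper, and the cap
constant are all hypothetical and are not Solr APIs.

```java
public class RecoveryBackoffSketch {
    // Assumed cap, taken from the 600-second upper bound observed during startup.
    static final double MAX_WAIT_SECONDS = 600.0;

    // Hypothetical helper: the wait before the given retry attempt,
    // assuming the delay doubles each time until it reaches the cap.
    static double waitSeconds(int retries) {
        return Math.min(Math.pow(2, retries), MAX_WAIT_SECONDS);
    }

    public static void main(String[] args) {
        // Print the assumed wait for the first ten retry attempts.
        for (int retries = 1; retries <= 10; retries++) {
            System.out.printf("attempt %d: wait %.1f seconds%n",
                    retries, waitSeconds(retries));
        }
    }
}
```

If the doubling assumption holds, a replica that keeps failing spends most
of its time in the last few waits, which would explain why startup is
dominated by recovery.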

On Fri, Mar 21, 2014 at 1:05 PM, Chris W <chris1980....@gmail.com> wrote:

> Sorry for the piecemeal approach, but I had another question. I have a
> 3-node zk ensemble. Does making 2 of the zk nodes observers help speed up
> bootup of Solr (by decreasing the time it takes to decide leaders for shards)?
>
>
> On Fri, Mar 21, 2014 at 11:49 AM, Chris W <chris1980....@gmail.com> wrote:
>
>> Thanks Tim. I will definitely try that next time. I have seen a few
>> instances where the overseer_queue was not getting processed, but that
>> looks like an existing bug which got fixed in 4.6 (the overseer doesn't
>> process requests when a collection reload fails).
>>
>> One question: Assuming our cluster can tolerate downtime of about 10-15
>> minutes, is it OK to restart all Solr nodes at the same time? Or will
>> there be race conditions during recovery?
>>
>>
>>
>>
>> On Fri, Mar 21, 2014 at 11:08 AM, Mark Miller <markrmil...@gmail.com> wrote:
>>
>>>
>>> On March 21, 2014 at 1:46:13 PM, Tim Potter (tim.pot...@lucidworks.com)
>>> wrote:
>>>
>>> We've seen instances where you end up restarting the overseer node each
>>> time as you restart the cluster, which causes all kinds of craziness.
>>>
>>>
>>> That would be a great test to add to the suite.
>>>
>>> --
>>> Mark Miller
>>> about.me/markrmiller
>>>
>>>
>>
>>
>> --
>> Best
>> --
>> C
>>
>
>
>
> --
> Best
> --
> C
>



-- 
Best
-- 
C
