I forgot to attach the log from when the recovery failed:

solr.log.129:1625677:ERROR - 2014-03-06 13:29:31.909; org.apache.solr.common.SolrException; Error while trying to recover:org.apache.solr.common.SolrException: Replication for recovery failed.
solr.log.129-1625849-    at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:156)
solr.log.129-1625929-    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409)
solr.log.129-1626010-    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)
solr.log.129-1626085-INFO - 2014-03-06 13:29:31.910; org.apache.solr.update.UpdateLog; Dropping buffered updates FSUpdateLog{state=BUFFERING, tlog=tlog{file=/mnt/search/solr/testcollection_shard1_replica2/data/tlog/tlog.0000000000000000000 refcount=1}}
solr.log.129-1626353-ERROR - 2014-03-06 13:29:31.910; org.apache.solr.cloud.RecoveryStrategy; Recovery failed - trying again... (7) core=testcollection_shard1_replica2

On Fri, Mar 7, 2014 at 11:24 AM, Veera Raghavan <veera.raghavan...@gmail.com> wrote:

> Hi there
>
> I have a 6 node solrcloud cluster with 50 collections. All collections
> are sharded across all the 6 nodes. I am seeing a weird behavior where both
> the replicas for a shard go down or go to a "recovering" state and
> never come back (no specific correlation to writes or reads).
>
> I am manually unloading and recreating the cores to band-aid the problem.
>
> In the solr logs I see this:
>
> org.apache.solr.servlet.SolrDispatchFilter; [admin] webapp=null path=/admin/cores params={coreNodeName=<ip>:8983_solr_testcollection_shard1_replica1&state=recovering&nodeName=<ip>:8983_solr&action=PREPRECOVERY&checkLive=true&core=solr_testcollection_shard1_replica2&wt=javabin&onlyIfLeader=true&version=2} status=0 QTime=99
>
> Have any of you seen this issue before? Is it a known bug that can be
> fixed with an upgrade? Should I increase the zookeeper timeout, maybe?
>
> Any pointers are much appreciated.
> Thanks
> Veera