I am also facing the same issue. My Solr version is 4.10.2.

On Tue, Jan 20, 2015 at 11:33 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> What version of Solr?
>
> On Tue, Jan 20, 2015 at 7:07 AM, anand.mahajan <an...@zerebral.co.in> wrote:
> > Hi all,
> >
> > I have a cluster with 36 shards and 3 replicas per shard. I recently had to
> > restart the entire cluster. Most of the shards and replicas are back up, but
> > a few shards did not have any leader for a long time (close to 18 hours now).
> > I tried reloading these cores and even restarting the servlet containers
> > hosting these cores. It's only now that all the shards have leaders allocated,
> > but a few of these leaders are still shown with Recovery Failed status in the
> > Solr Cloud tree view.
> >
> > I see the following in the logs for these shards:
> >
> > INFO - 2015-01-20 14:38:19.797;
> > org.apache.solr.handler.admin.CoreAdminHandler; In WaitForState(recovering):
> > collection=collection1, shard=shard1, thisCore=collection1_shard1_replica3,
> > leaderDoesNotNeedRecovery=false, isLeader? true, live=true, checkLive=true,
> > currentState=recovering, localState=recovery_failed,
> > nodeName=10.68.77.9:8983_solr, coreNodeName=core_node2,
> > onlyIfActiveCheckResult=true, nodeProps:
> > core_node2:{"state":"recovering","core":"collection1_shard1_replica1","node_name":"10.68.77.9:8983_solr","base_url":"http://10.68.77.9:8983/solr"}
> >
> > And on the other server hosting the replica for this shard:
> >
> > ERROR - 2015-01-20 14:38:20.768; org.apache.solr.common.SolrException;
> > org.apache.solr.common.SolrException: I was asked to wait on state
> > recovering for shard3 in collection1 on 10.68.77.9:8983_solr but I still do
> > not see the requested state. I see state: recovering live:true leader from
> > ZK: http://10.68.77.3:8983/solr/collection1_shard3_replica3/
> >     at org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler.java:999)
> >     at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:245)
> >     at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:188)
> >     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >     at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:258)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> >     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> >     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> >     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> >     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> >     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> >     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> >     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> >     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> >     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> >     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> >     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> >     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> >     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> >     at org.eclipse.jetty.server.Server.handle(Server.java:368)
> >     at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> >     at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> >     at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
> >     at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
> >     at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
> >     at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> >     at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> >     at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> >     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> >     at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> >     at java.lang.Thread.run(Unknown Source)
> >
> > I see that there is no replica catch-up going on between any of these
> > servers now.
> >
> > A couple of questions:
> > 1. What is Solr Cloud waiting on before it allocates leaders for such
> >    shards?
> > 2. Why do a few of these shards show leaders in Recovery Failed state, and
> >    how do I recover such shards?
> >
> > Thanks,
> > Anand
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Leaders-in-Recovery-Failed-state-tp4180611.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
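
In case it helps anyone else hitting this: rather than clicking through the Cloud tree view, the affected shards can be listed by querying the Collections API CLUSTERSTATUS action and filtering the result. Below is a minimal sketch, not a recovery procedure. It assumes the response is nested as cluster -> collections -> shards -> replicas, that each replica carries a "state" field, and that the leader is marked with "leader":"true". Please check those assumptions against your own cluster's output. The function names (fetch_cluster_status, find_problem_shards) are made up for this example.

```python
import json
from urllib.request import urlopen


def fetch_cluster_status(base_url):
    """Fetch CLUSTERSTATUS as a dict.

    base_url is assumed to look like http://10.68.77.9:8983/solr
    """
    url = base_url.rstrip("/") + "/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def find_problem_shards(status):
    """Return (collection, shard, replica, reason) tuples for shards that
    either have no leader or whose leader is not in the 'active' state.

    The key layout used here is an assumption about the response shape.
    """
    problems = []
    collections = status.get("cluster", {}).get("collections", {})
    for coll_name, coll in sorted(collections.items()):
        for shard_name, shard in sorted(coll.get("shards", {}).items()):
            replicas = sorted(shard.get("replicas", {}).items())
            # The leader flag may be the string "true" or a boolean.
            leaders = [
                (name, rep)
                for name, rep in replicas
                if str(rep.get("leader", "false")).lower() == "true"
            ]
            if not leaders:
                problems.append((coll_name, shard_name, None, "no leader"))
            for name, rep in leaders:
                state = rep.get("state")
                if state != "active":
                    problems.append((coll_name, shard_name, name, state))
    return problems
```

Usage would be something like find_problem_shards(fetch_cluster_status("http://10.68.77.9:8983/solr")), pointed at any live node. The filtering is kept separate from the HTTP call so it can also be run against a saved copy of the cluster state.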