Hi, in this screenshot I have a shard with two replicas and no leader:
http://picpaste.com/qf2jdkj8.png

On the machine where the replica is green (active), I found this exception:

INFO - dat5 - 2013-10-18 22:48:04.775; org.apache.solr.handler.admin.CoreAdminHandler; Going to wait for coreNodeName: 192.168.20.106:8983_solr_statistics-13_shard18_replica4, state: recovering, checkLive: true, onlyIfLeader: true
ERROR - dat5 - 2013-10-18 22:48:04.775; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: We are not the leader
    at org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler.java:824)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:192)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:655)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:246)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    --
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)

On the machine where the replica is in the recovering state, I found this:

INFO - dat6 - 2013-10-18 22:48:44.131; org.apache.solr.cloud.ShardLeaderElectionContext; Running the leader process for shard shard18
INFO - dat6 - 2013-10-18 22:48:44.137; org.apache.solr.cloud.ShardLeaderElectionContext; Checking if I should try and be the leader.
INFO - dat6 - 2013-10-18 22:48:44.138; org.apache.solr.cloud.ShardLeaderElectionContext; My last published State was recovering, I won't be the leader.
INFO - dat6 - 2013-10-18 22:48:44.139; org.apache.solr.cloud.ShardLeaderElectionContext; There may be a better leader candidate than us - going back into recovery
INFO - dat6 - 2013-10-18 22:48:44.142; org.apache.solr.update.DefaultSolrCoreState; Running recovery - first canceling any ongoing recovery
WARN - dat6 - 2013-10-18 22:48:44.142; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for zkNodeName=192.168.20.106:8983_solr_statistics-13_shard18_replica4core=statistics-13_shard18_replica4
INFO - dat6 - 2013-10-18 22:48:45.131; org.apache.solr.cloud.RecoveryStrategy; Finished recovery process. core=statistics-13_shard18_replica4
INFO - dat6 - 2013-10-18 22:48:45.131; org.apache.solr.cloud.RecoveryStrategy; Starting recovery process. core=statistics-13_shard18_replica4 recoveringAfterStartup=false
INFO - dat6 - 2013-10-18 22:48:45.131; org.apache.solr.cloud.ZkController; publishing core=statistics-13_shard18_replica4 state=recovering
INFO - dat6 - 2013-10-18 22:48:45.132; org.apache.solr.cloud.ZkController; numShards not found on descriptor - reading it from system property
INFO - dat6 - 2013-10-18 22:48:45.141; org.apache.solr.client.solrj.impl.HttpClientUtil; Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
ERROR - dat6 - 2013-10-18 22:48:45.143; org.apache.solr.common.SolrException; Error while trying to recover. core=statistics-13_shard18_replica4:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: We are not the leader
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:424)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
    at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:198)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:342)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:219)

Without a leader we can't index data into this shard: update requests return HTTP status 503. Is this normal behaviour or a bug?

-----
Best regards
--
View this message in context: http://lucene.472066.n3.nabble.com/Leader-election-fails-in-some-point-tp4096514.html
Sent from the Solr - User mailing list archive at Nabble.com.