I'm experiencing the same problem in Solr 4.1 during bulk loading. After 50 minutes of indexing, the following error starts to occur:
INFO: [core] webapp=/solr path=/update params={} {} 0 4
Feb 02, 2013 11:36:15 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ClusterState says we are the leader, but locally we don't think so
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:295)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:230)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:343)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
    at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:387)
    at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:112)
    at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:96)
    at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:60)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:365)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)
Feb 02, 2013 11:36:15 PM org.apache.solr.common.SolrException log
Feb 02, 2013 11:36:31 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=50699

Then the leader tries to sync with the replica, and after that finishes I can continue loading. None of the SolrCloud nodes was restarted during that time. I don't remember such behaviour in Solr 4.0. Could it be related to the number of fields indexed during loading? I have a collection with about 2400 fields, and I can't reproduce the same issue for other collections with far fewer fields per record.

Regards.

On 11 December 2012 19:50, Sudhakar Maddineni <maddineni...@gmail.com> wrote:

> Just an update on this issue:
> We tried by increasing zookeeper client timeout settings to 30000ms in
> solr.xml (i think default is 15000ms), and haven't seen any issues from our
> tests.
> <cores ......... zkClientTimeout="30000" >
>
> Thanks, Sudhakar.
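For context, zkClientTimeout lives on the <cores> element of the legacy 4.x-style solr.xml, and it bounds how long a Solr node's ZooKeeper session may be interrupted before it expires and a new leader election is triggered; raising it can mask long pauses (e.g. GC) at the cost of slower failure detection. A minimal sketch of where the setting goes, with every other attribute being an illustrative assumption rather than Sudhakar's actual file:

<solr persistent="true">
  <!-- zkClientTimeout: how long (ms) the Solr -> ZooKeeper session may be
       interrupted before it expires. Host/port/core values below are placeholders. -->
  <cores adminPath="/admin/cores" defaultCoreName="core1"
         host="solr-node-1" hostPort="8080" hostContext="solr"
         zkClientTimeout="30000">
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>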
> On Fri, Dec 7, 2012 at 4:55 PM, Sudhakar Maddineni <maddineni...@gmail.com> wrote:
>
> > We saw this error again today during our load test - basically, whenever
> > session is getting expired on the leader node, we are seeing the error.
> > After this happens, leader(001) is going into 'recovery' mode and all
> > the index updates are failing with "503 - service unavailable" error
> > message. After some time (once recovery is successful), roles are swapped
> > i.e. 001 acting as the replica and 003 as leader.
> >
> > Btw, do you know why the connection to zookeeper [solr->zk] is getting
> > interrupted in the middle? Is it because of the load (no of updates) we
> > are putting on the cluster?
> >
> > Here is the exception stack trace:
> >
> > Dec 7, 2012 2:28:03 PM org.apache.solr.cloud.Overseer$ClusterStateUpdater amILeader
> > WARNING: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer_elect/leader
> >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
> >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> >     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
> >     at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:244)
> >     at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:241)
> >     at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
> >     at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:241)
> >     at org.apache.solr.cloud.Overseer$ClusterStateUpdater.amILeader(Overseer.java:195)
> >     at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:119)
> >     at java.lang.Thread.run(Unknown Source)
> >
> > Thx, Sudhakar.
> >
> > On Fri, Dec 7, 2012 at 3:16 PM, Sudhakar Maddineni <maddineni...@gmail.com> wrote:
> >
> >> Erick:
> >> Not seeing any page caching related issues...
> >>
> >> Mark:
> >> 1. Would this "waiting" on 003 (replica) cause any inconsistencies in the
> >> zookeeper cluster state? I was also looking at the leader (001) logs at that
> >> time and seeing errors related to "SEVERE: ClusterState says we are the
> >> leader, but locally we don't think so".
> >> 2. Also, all of our servers in the cluster went down when the index
> >> updates were running in parallel along with this issue. Do you see this
> >> related to the session expiry on 001?
> >>
> >> Here are the logs on 001
> >> =========================
> >>
> >> Dec 4, 2012 12:12:29 PM org.apache.solr.cloud.Overseer$ClusterStateUpdater amILeader
> >> WARNING: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer_elect/leader
> >>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
> >>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> >>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
> >> Dec 4, 2012 12:12:29 PM org.apache.solr.cloud.Overseer$ClusterStateUpdater amILeader
> >> INFO: According to ZK I (id=232887758696546307-<001>:8080_solr-n_0000000005) am no longer a leader.
> >>
> >> Dec 4, 2012 12:12:29 PM org.apache.solr.cloud.OverseerCollectionProcessor run
> >> WARNING: Overseer cannot talk to ZK
> >>
> >> Dec 4, 2012 12:13:00 PM org.apache.solr.common.SolrException log
> >> SEVERE: There was a problem finding the leader in zk:org.apache.solr.common.SolrException: Could not get leader props
> >>     at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:709)
> >>     at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:673)
> >> Dec 4, 2012 12:13:32 PM org.apache.solr.common.SolrException log
> >> SEVERE: There was a problem finding the leader in zk:org.apache.solr.common.SolrException: Could not get leader props
> >>     at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:709)
> >>     at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:673)
> >> Dec 4, 2012 12:15:17 PM org.apache.solr.common.SolrException log
> >> SEVERE: There was a problem making a request to the leader:org.apache.solr.common.SolrException: I was asked to wait on state down for <001>:8080_solr but I still do not see the request state. I see state: active live:true
> >>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:401)
> >> Dec 4, 2012 12:15:50 PM org.apache.solr.common.SolrException log
> >> SEVERE: There was a problem making a request to the leader:org.apache.solr.common.SolrException: I was asked to wait on state down for <001>:8080_solr but I still do not see the request state. I see state: active live:true
> >>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:401)
> >> ....
> >> ....
> >> Dec 4, 2012 12:19:10 PM org.apache.solr.common.SolrException log
> >> SEVERE: There was a problem finding the leader in zk:org.apache.solr.common.SolrException: Could not get leader props
> >>     at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:709)
> >> ....
> >> ....
> >> Dec 4, 2012 12:21:24 PM org.apache.solr.common.SolrException log
> >> SEVERE: :org.apache.solr.common.SolrException: There was a problem finding the leader in zk
> >>     at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1080)
> >>     at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:273)
> >> Dec 4, 2012 12:22:30 PM org.apache.solr.cloud.ZkController getLeader
> >> SEVERE: Error getting leader from zk
> >> org.apache.solr.common.SolrException: There is conflicting information about the leader of shard: shard1 our state says:http://<001>:8080/solr/core1/ but zookeeper says:http://<003>:8080/solr/core1/
> >>     at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:647)
> >>     at org.apache.solr.cloud.ZkController.register(ZkController.java:577)
> >> Dec 4, 2012 12:22:30 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
> >> INFO: Running the leader process.
> >> ....
> >> ....
> >>
> >> Thanks for your inputs.
> >> Sudhakar.
> >>
> >> On Thu, Dec 6, 2012 at 5:35 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>
> >>> Yes - it means that 001 went down (or more likely had it's connection to
> >>> ZooKeeper interrupted? that's what I mean about a session timeout - if the
> >>> solr->zk link is broken for longer than the session timeout that will
> >>> trigger a leader election and when the connection is reestablished, the
> >>> node will have to recover). That waiting should stop as soon as 001 came
> >>> back up or reconnected to ZooKeeper.
> >>>
> >>> In fact, this waiting should not happen in this case - but only on
> >>> cluster restart. This is a bug that is fixed in 4.1 (hopefully coming very
> >>> soon!):
> >>>
> >>> * SOLR-3940: Rejoining the leader election incorrectly triggers the code path
> >>>   for a fresh cluster start rather than fail over. (Mark Miller)
> >>>
> >>> - Mark
> >>>
> >>> On Dec 5, 2012, at 9:41 PM, Sudhakar Maddineni <maddineni...@gmail.com> wrote:
> >>>
> >>> > Yep, after restarting, cluster came back to normal state. We will run
> >>> > couple of more tests and see if we could reproduce this issue.
> >>> >
> >>> > Btw, I am attaching the server logs before that 'INFO: Waiting until
> >>> > we see more replicas' message. From the logs, we can see that leader
> >>> > election process started on 003 which was the replica for 001
> >>> > initially. That means leader 001 went down at that time?
> >>> >
> >>> > logs on 003:
> >>> > ========
> >>> > 12:11:16 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
> >>> > INFO: Running the leader process.
> >>> > 12:11:16 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
> >>> > INFO: Checking if I should try and be the leader.
> >>> > 12:11:16 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
> >>> > INFO: My last published State was Active, it's okay to be the leader.
> >>> > 12:11:16 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
> >>> > INFO: I may be the new leader - try and sync
> >>> > 12:11:16 PM org.apache.solr.cloud.RecoveryStrategy close
> >>> > WARNING: Stopping recovery for zkNodeName=<003>:8080_solr_core core=core1.
> >>> > 12:11:16 PM org.apache.solr.cloud.SyncStrategy sync
> >>> > INFO: Sync replicas to http://<003>:8080/solr/core1/
> >>> > 12:11:16 PM org.apache.solr.update.PeerSync sync
> >>> > INFO: PeerSync: core=core1 url=http://<003>:8080/solr START replicas=[<001>:8080/solr/core1/] nUpdates=100
> >>> > 12:11:16 PM org.apache.solr.common.cloud.ZkStateReader$3 process
> >>> > INFO: Updating live nodes -> this message is on 002
> >>> > 12:11:46 PM org.apache.solr.update.PeerSync handleResponse
> >>> > WARNING: PeerSync: core=core1 url=http://<003>:8080/solr exception talking to <001>:8080/solr/core1/, failed
> >>> > org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: <001>:8080/solr/core1
> >>> >     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:409)
> >>> >     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> >>> >     at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
> >>> >     at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
> >>> >     at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
> >>> >     at java.util.concurrent.FutureTask.run(Unknown Source)
> >>> >     at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> >>> >     at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
> >>> >     at java.util.concurrent.FutureTask.run(Unknown Source)
> >>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
> >>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> >>> >     at java.lang.Thread.run(Unknown Source)
> >>> > Caused by: java.net.SocketTimeoutException: Read timed out
> >>> >     at java.net.SocketInputStream.socketRead0(Native Method)
> >>> >     at java.net.SocketInputStream.read(Unknown Source)
> >>> > 12:11:46 PM org.apache.solr.update.PeerSync sync
> >>> > INFO: PeerSync: core=core1 url=http://<003>:8080/solr DONE. sync failed
> >>> > 12:11:46 PM org.apache.solr.common.SolrException log
> >>> > SEVERE: Sync Failed
> >>> > 12:11:46 PM org.apache.solr.cloud.ShardLeaderElectionContext rejoinLeaderElection
> >>> > INFO: There is a better leader candidate than us - going back into recovery
> >>> > 12:11:46 PM org.apache.solr.update.DefaultSolrCoreState doRecovery
> >>> > INFO: Running recovery - first canceling any ongoing recovery
> >>> > 12:11:46 PM org.apache.solr.cloud.RecoveryStrategy run
> >>> > INFO: Starting recovery process. core=core1 recoveringAfterStartup=false
> >>> > 12:11:46 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
> >>> > INFO: Attempting to PeerSync from <001>:8080/solr/core1/ core=core1 - recoveringAfterStartup=false
> >>> > 12:11:46 PM org.apache.solr.update.PeerSync sync
> >>> > INFO: PeerSync: core=core1 url=http://<003>:8080/solr START replicas=[<001>:8080/solr/core1/] nUpdates=100
> >>> > 12:11:46 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
> >>> > INFO: Running the leader process.
> >>> > 12:11:46 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
> >>> > INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=179999
> >>> > 12:11:47 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
> >>> > INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=179495
> >>> > 12:11:48 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
> >>> > INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=178985
> >>> > ....
> >>> > ....
> >>> >
> >>> > Thanks for your help.
> >>> > Sudhakar.
> >>> >
> >>> > On Wed, Dec 5, 2012 at 6:19 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>> > The waiting logging had to happen on restart unless it's some kind of bug.
> >>> >
> >>> > Beyond that, something is off, but I have no clue why - it seems your clusterstate.json is not up to date at all.
> >>> >
> >>> > Have you tried restarting the cluster then? Does that help at all?
> >>> >
> >>> > Do you see any exceptions around zookeeper session timeouts?
> >>> >
> >>> > - Mark
> >>> >
> >>> > On Dec 5, 2012, at 4:57 PM, Sudhakar Maddineni <maddineni...@gmail.com> wrote:
> >>> >
> >>> > > Hey Mark,
> >>> > >
> >>> > > Yes, I am able to access all of the nodes under each shard from solrcloud
> >>> > > admin UI.
> >>> > >
> >>> > > - It kind of looks like the urls solrcloud is using are not accessible.
> >>> > >   When you go to the admin page and the cloud tab, can you access the urls it
> >>> > >   shows for each shard? That is, if you click on of the links or copy and
> >>> > >   paste the address into a web browser, does it work?
> >>> > >
> >>> > > Actually, I got these errors when my document upload task/job was running,
> >>> > > not during the cluster restart. Also, job ran fine initially for the first
> >>> > > one hour and started throwing these errors after indexing some docx.
> >>> > >
> >>> > > Thx, Sudhakar.
> >>> > >
> >>> > > On Wed, Dec 5, 2012 at 5:38 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>> > >
> >>> > >> It kind of looks like the urls solrcloud is using are not accessible. When
> >>> > >> you go to the admin page and the cloud tab, can you access the urls it
> >>> > >> shows for each shard? That is, if you click on of the links or copy and
> >>> > >> paste the address into a web browser, does it work?
> >>> > >>
> >>> > >> You may have to explicitly set the host= in solr.xml if it's not auto
> >>> > >> detecting the right one. Make sure the ports look right too.
> >>> > >>
> >>> > >>> waitForReplicasToComeUp
> >>> > >>> INFO: Waiting until we see more replicas up: total=2 found=1
> >>> > >>> timeoutin=179999
> >>> > >>
> >>> > >> That happens when you stop the cluster and try to start it again - before
> >>> > >> a leader is chosen, it will wait for all known replicas for a shard to come
> >>> > >> up so that everyone can sync up and have a chance to be the best leader. So
> >>> > >> at this point it was only finding one of 2 known replicas and waiting for
> >>> > >> the second to come up. After a couple minutes (configurable) it will just
> >>> > >> continue anyway without the missing replica (if it doesn't show up).
> >>> > >>
> >>> > >> - Mark
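Both of Mark's points map onto attributes of the <cores> element in the 4.x-style solr.xml: host/hostPort/hostContext control the address a node registers in ZooKeeper, and, if I recall correctly, leaderVoteWait is the configurable wait he mentions, defaulting to roughly three minutes, which lines up with the timeoutin values near 180000 in the logs above. A sketch with placeholder values, not the configuration from this thread:

<solr persistent="true">
  <!-- host/hostPort/hostContext: the address this node publishes to ZooKeeper;
       set them explicitly if auto-detection picks the wrong interface or port.
       leaderVoteWait: how long (ms) a would-be leader waits for the other known
       replicas of its shard to show up before taking over anyway. -->
  <cores adminPath="/admin/cores"
         host="solr-node-1" hostPort="8080" hostContext="solr"
         leaderVoteWait="180000" zkClientTimeout="15000">
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>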
> >>> > >>
> >>> > >> On Dec 5, 2012, at 4:21 PM, Sudhakar Maddineni <maddineni...@gmail.com> wrote:
> >>> > >>
> >>> > >>> Hi,
> >>> > >>> We are uploading solr documents to the index in batches using 30 threads
> >>> > >>> and using ThreadPoolExecutor, LinkedBlockingQueue with max limit set to 10000.
> >>> > >>> In the code, we are using HttpSolrServer and add(inputDoc) method to add docx.
> >>> > >>> And, we have the following commit settings in solrconfig:
> >>> > >>>
> >>> > >>> <autoCommit>
> >>> > >>>   <maxTime>300000</maxTime>
> >>> > >>>   <maxDocs>10000</maxDocs>
> >>> > >>>   <openSearcher>false</openSearcher>
> >>> > >>> </autoCommit>
> >>> > >>>
> >>> > >>> <autoSoftCommit>
> >>> > >>>   <maxTime>1000</maxTime>
> >>> > >>> </autoSoftCommit>
> >>> > >>>
> >>> > >>> Cluster Details:
> >>> > >>> ----------------------------
> >>> > >>> solr version - 4.0
> >>> > >>> zookeeper version - 3.4.3 [zookeeper ensemble with 3 nodes]
> >>> > >>> numshards=2
> >>> > >>> 001, 002, 003 are the solr nodes and these three are behind the loadbalancer <vip>
> >>> > >>> 001, 003 assigned to shard1; 002 assigned to shard2
> >>> > >>>
> >>> > >>> Logs: Getting the errors in the below sequence after uploading some docx:
> >>> > >>> -----------------------------------------------------------------------------------------------------------
> >>> > >>> 003
> >>> > >>> Dec 4, 2012 12:11:46 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
> >>> > >>> INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=179999
> >>> > >>>
> >>> > >>> 001
> >>> > >>> Dec 4, 2012 12:12:59 PM org.apache.solr.update.processor.DistributedUpdateProcessor doDefensiveChecks
> >>> > >>> SEVERE: ClusterState says we are the leader, but locally we don't think so
> >>> > >>>
> >>> > >>> 003
> >>> > >>> Dec 4, 2012 12:12:59 PM org.apache.solr.common.SolrException log
> >>> > >>> SEVERE: forwarding update to <001>:8080/solr/core1/ failed - retrying ...
> >>> > >>>
> >>> > >>> 001
> >>> > >>> Dec 4, 2012 12:12:59 PM org.apache.solr.common.SolrException log
> >>> > >>> SEVERE: Error uploading: org.apache.solr.common.SolrException: Server at <vip>/solr/core1. returned non ok status:503, message:Service Unavailable
> >>> > >>>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
> >>> > >>>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> >>> > >>>
> >>> > >>> 001
> >>> > >>> Dec 4, 2012 12:25:45 PM org.apache.solr.common.SolrException log
> >>> > >>> SEVERE: Error while trying to recover.
> >>> > >>> core=core1:org.apache.solr.common.SolrException: We are not the leader
> >>> > >>>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:401)
> >>> > >>>
> >>> > >>> 001
> >>> > >>> Dec 4, 2012 12:44:38 PM org.apache.solr.common.SolrException log
> >>> > >>> SEVERE: Error uploading: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at <vip>/solr/core1
> >>> > >>>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
> >>> > >>>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> >>> > >>>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
> >>> > >>>     at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
> >>> > >>>     ... 5 lines omitted ...
> >>> > >>>     at java.lang.Thread.run(Unknown Source)
> >>> > >>> Caused by: java.net.SocketException: Connection reset
> >>> > >>>
> >>> > >>> After sometime, all the three servers are going down.
> >>> > >>>
> >>> > >>> Appreciate, if someone could let us know what we are missing.
> >>> > >>>
> >>> > >>> Thx, Sudhakar.
> >>> >
> >>> > <logs_error.txt>
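For reference, the upload pattern described in Sudhakar's first message (30 indexing threads, a ThreadPoolExecutor over a LinkedBlockingQueue capped at 10000, and HttpSolrServer.add() per document) corresponds roughly to the sketch below. The field names, document count, and rejection policy are assumptions for illustration, not details taken from the thread.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkUploader {
    public static void main(String[] args) throws Exception {
        // SolrJ 4.x client pointed at the load-balanced core; the URL is a placeholder.
        final HttpSolrServer solr = new HttpSolrServer("http://<vip>/solr/core1");

        // 30 worker threads fed through a bounded queue of 10000 pending tasks,
        // matching the setup described in the original post. CallerRunsPolicy is an
        // assumption about how back-pressure is handled when the queue fills up.
        ExecutorService pool = new ThreadPoolExecutor(
                30, 30, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>(10000),
                new ThreadPoolExecutor.CallerRunsPolicy());

        for (int i = 0; i < 1000000; i++) {
            final int id = i;
            pool.submit(new Runnable() {
                public void run() {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", Integer.toString(id));   // field names are assumptions
                    doc.addField("title_s", "document " + id);
                    try {
                        // No explicit commit per add(); hard/soft commits come from the
                        // autoCommit / autoSoftCommit settings quoted above.
                        solr.add(doc);
                    } catch (Exception e) {
                        // 503s here typically mean the shard leader is in recovery;
                        // production code would retry or re-queue instead of just logging.
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        solr.shutdown();
    }
}

Under this kind of sustained load, any add() that arrives while a leader is mid-election or mid-recovery can surface as the 503 / "we are not the leader" errors shown in the logs, which is why the retry-or-requeue comment matters in practice.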