Hi - at first I didn't recreate the ZooKeeper data, but I got it to work. I'll check the removal of the LOG line.
Thanks

-----Original message-----
> From: Sami Siren <ssi...@gmail.com>
> Sent: Wed 19-Sep-2012 17:45
> To: solr-user@lucene.apache.org
> Subject: Re: Nodes cannot recover and become unavailable
>
> also, did you re create the cluster after upgrading to a newer
> version? I believe there were some changes made to the
> clusterstate.json recently that are not backwards compatible.
>
> --
> Sami Siren
>
>
>
> On Wed, Sep 19, 2012 at 6:21 PM, Sami Siren <ssi...@gmail.com> wrote:
> > Hi,
> >
> > I am having troubles understanding the reason for that NPE.
> >
> > First you could try removing the line #102 in HttpClientUtility so
> > that logging does not prevent creation of the http client in
> > SyncStrategy.
> >
> > --
> > Sami Siren
> >
> > On Wed, Sep 19, 2012 at 5:29 PM, Markus Jelsma
> > <markus.jel...@openindex.io> wrote:
> >> Hi,
> >>
> >> Since the 2012-09-17 11:10:41 build shards start to have trouble coming
> >> back online. When i restart one node the slices on the other nodes are
> >> throwing exceptions and cannot be queried. I'm not sure how to remedy the
> >> problem but stopping a node or restarting it a few times seems to help it.
> >> The problem is when i restart a node, and it happens, i must not restart
> >> another node because that may trigger other slices becoming unavailable.
> >>
> >> Here are some parts of the log:
> >>
> >> 2012-09-19 14:13:18,149 ERROR [solr.cloud.RecoveryStrategy] -
> >> [RecoveryThread] - : Recovery failed - trying again... core=oi_i
> >> 2012-09-19 14:13:25,818 WARN [solr.cloud.RecoveryStrategy] -
> >> [main-EventThread] - : Stopping recovery for
> >> zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
> >> 2012-09-19 14:13:44,497 WARN [solr.cloud.RecoveryStrategy] - [Thread-4] -
> >> : Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_jcore=oi_j
> >> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] -
> >> [RecoveryThread] - : Error while trying to recover.
> >> core=oi_i:org.apache.solr.common.SolrException: We are not the leader
> >> at
> >> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
> >> at
> >> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
> >> at
> >> org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
> >> at
> >> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
> >> at
> >> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
> >>
> >> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] -
> >> [RecoveryThread] - : Recovery failed - trying again... core=oi_i
> >> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] -
> >> [RecoveryThread] - : Recovery failed - max retries exceeded. core=oi_i
> >> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] -
> >> [RecoveryThread] - : Recovery failed - I give up. core=oi_i
> >> 2012-09-19 14:14:00,333 WARN [solr.cloud.RecoveryStrategy] -
> >> [RecoveryThread] - : Stopping recovery for
> >> zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
> >> ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - : Sync request
> >> error: java.lang.NullPointerException
> >> ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - :
> >> http://nl10.host:8080/solr/oi_i/: Could not tell a replica to
> >> recover:java.lang.NullPointerException
> >> at
> >> org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305)
> >> at
> >> org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:102)
> >> at
> >> org.apache.solr.client.solrj.impl.HttpSolrServer.<init>(HttpSolrServer.java:155)
> >> at
> >> org.apache.solr.client.solrj.impl.HttpSolrServer.<init>(HttpSolrServer.java:128)
> >> at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:262)
> >> at
> >> org.apache.solr.cloud.SyncStrategy.requestRecovery(SyncStrategy.java:272)
> >> at
> >> org.apache.solr.cloud.SyncStrategy.syncToMe(SyncStrategy.java:203)
> >> at
> >> org.apache.solr.cloud.SyncStrategy.syncReplicas(SyncStrategy.java:125)
> >> at org.apache.solr.cloud.SyncStrategy.sync(SyncStrategy.java:87)
> >> at
> >> org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:169)
> >> at
> >> org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:158)
> >> at
> >> org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:102)
> >> at
> >> org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:275)
> >> at
> >> org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:326)
> >> at
> >> org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:159)
> >> at
> >> org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:158)
> >> at
> >> org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:102)
> >> at
> >> org.apache.solr.cloud.LeaderElector.access$000(LeaderElector.java:56)
> >> at
> >> org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:131)
> >> at
> >> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
> >> at
> >> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
> >>
> >> ERROR [apache.zookeeper.ClientCnxn] - [main-EventThread] - : Error while
> >> calling watcher
> >> java.lang.NullPointerException
> >> at
> >> org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:139)
> >> at
> >> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
> >> at
> >> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
> >> ERROR [apache.zookeeper.ClientCnxn] - [main-EventThread] - : Error while
> >> calling watcher
> >> java.lang.NullPointerException
> >> at
> >> org.apache.solr.common.cloud.ZkStateReader$3.process(ZkStateReader.java:238)
> >> at
> >> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
> >> at
> >> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
> >> ERROR [apache.zookeeper.ClientCnxn] - [main-EventThread] - : Error while
> >> calling watcher
> >> java.lang.NullPointerException
> >> at
> >> org.apache.solr.common.cloud.ZkStateReader$2.process(ZkStateReader.java:189)
> >> at
> >> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
> >> at
> >> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
> >> 2012-09-19 14:14:05,304 WARN [solr.core.CoreContainer] - [main] - : Log
> >> watching is not yet implemented for log4j
> >> 2012-09-19 14:14:08,504 WARN [solr.core.SolrCore] - [main] - : New index
> >> directory detected: old=null
> >> new=/opt/solr/cores/oi_j/data/index.20120823134824608
> >> 2012-09-19 14:14:10,895 WARN [solr.core.SolrCore] - [main] - : New index
> >> directory detected: old=null new=/opt/solr/cores/oi_i/data/index/
> >> 2012-09-19 14:14:41,203 ERROR [solr.cloud.ZkController] - [main] - : Error
> >> getting leader from zk
> >> org.apache.solr.common.SolrException: Could not get leader props
> >> at
> >> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:762)
> >> at
> >> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:722)
> >> at
> >> org.apache.solr.cloud.ZkController.getLeader(ZkController.java:687)
> >> at
> >> org.apache.solr.cloud.ZkController.register(ZkController.java:626)
> >> at
> >> org.apache.solr.cloud.ZkController.register(ZkController.java:576)
> >> at
> >> org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:721)
> >> at
> >> org.apache.solr.core.CoreContainer.register(CoreContainer.java:705)
> >> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:547)
> >> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:365)
> >> at
> >> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:314)
> >> at
> >> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
> >> at
> >> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
> >> at
> >> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
> >> at
> >> org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
> >> at
> >> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
> >> at
> >> org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
> >> at
> >> org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
> >> at
> >> org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
> >> at
> >> org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
> >> at
> >> org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
> >> at
> >> org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
> >> at
> >> org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
> >> at
> >> org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
> >> at
> >> org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
> >> at
> >> org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
> >> at
> >> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
> >> at
> >> org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
> >> at
> >> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
> >> at
> >> org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
> >> at
> >> org.apache.catalina.core.StandardService.start(StandardService.java:525)
> >> at
> >> org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
> >> at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> at
> >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >> at
> >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >> at java.lang.reflect.Method.invoke(Method.java:616)
> >> at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
> >> at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
> >> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> >> KeeperErrorCode = NoNode for /collections/oi/leaders/shard9
> >> at
> >> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> >> at
> >> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> >> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
> >> at
> >> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:301)
> >> at
> >> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:298)
> >> at
> >> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:67)
> >> at
> >> org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:298)
> >> at
> >> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:740)
> >> ... 37 more
> >> 2012-09-19 14:14:41,239 ERROR [solr.core.CoreContainer] - [main] - :
> >> :org.apache.solr.common.SolrException: Error getting leader from zk
> >> at
> >> org.apache.solr.cloud.ZkController.getLeader(ZkController.java:711)
> >> at
> >> org.apache.solr.cloud.ZkController.register(ZkController.java:626)
> >> at
> >> org.apache.solr.cloud.ZkController.register(ZkController.java:576)
> >> at
> >> org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:721)
> >> at
> >> org.apache.solr.core.CoreContainer.register(CoreContainer.java:705)
> >> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:547)
> >> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:365)
> >> at
> >> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:314)
> >> at
> >> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
> >> at
> >> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
> >> at
> >> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
> >> at
> >> org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
> >> at
> >> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
> >> at
> >> org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
> >> at
> >> org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
> >> at
> >> org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
> >> at
> >> org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
> >> at
> >> org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
> >> at
> >> org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
> >> at
> >> org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
> >> at
> >> org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
> >> at
> >> org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
> >> at
> >> org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
> >> at
> >> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
> >> at
> >> org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
> >> at
> >> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
> >> at
> >> org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
> >> at
> >> org.apache.catalina.core.StandardService.start(StandardService.java:525)
> >> at
> >> org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
> >> at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> at
> >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >> at
> >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >> at java.lang.reflect.Method.invoke(Method.java:616)
> >> at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
> >> at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
> >> Caused by: org.apache.solr.common.SolrException: Could not get leader props
> >> at
> >> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:762)
> >> at
> >> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:722)
> >> at
> >> org.apache.solr.cloud.ZkController.getLeader(ZkController.java:687)
> >> ... 35 more
> >> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> >> KeeperErrorCode = NoNode for /collections/oi/leaders/shard9
> >> at
> >> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> >> at
> >> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> >> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
> >> at
> >> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:301)
> >> at
> >> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:298)
> >> at
> >> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:67)
> >> at
> >> org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:298)
> >> at
> >> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:740)
> >> ... 37 more
> >> 2012-09-19 14:14:41,239 ERROR [solr.core.CoreContainer] - [main] - :
> >> null:org.apache.solr.common.cloud.ZooKeeperException:
> >> at
> >> org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:744)
> >> at
> >> org.apache.solr.core.CoreContainer.register(CoreContainer.java:705)
> >> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:547)
> >> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:365)
> >> at
> >> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:314)
> >> at
> >> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
> >> at
> >> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
> >> at
> >> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
> >> at
> >> org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
> >> at
> >> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
> >> at
> >> org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
> >> at
> >> org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
> >> at
> >> org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
> >> at
> >> org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
> >> at
> >> org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
> >> at
> >> org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
> >> at
> >> org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
> >> at
> >> org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
> >> at
> >> org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
> >> at
> >> org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
> >> at
> >> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
> >> at
> >> org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
> >> at
> >> org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
> >> at
> >> org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
> >> at
> >> org.apache.catalina.core.StandardService.start(StandardService.java:525)
> >> at
> >> org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
> >> at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> at
> >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >> at
> >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >> at java.lang.reflect.Method.invoke(Method.java:616)
> >> at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
> >> at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
> >> Caused by: org.apache.solr.common.SolrException: Error getting leader from
> >> zk
> >> at
> >> org.apache.solr.cloud.ZkController.getLeader(ZkController.java:711)
> >> at
> >> org.apache.solr.cloud.ZkController.register(ZkController.java:626)
> >> at
> >> org.apache.solr.cloud.ZkController.register(ZkController.java:576)
> >> at
> >> org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:721)
> >> ... 32 more
> >> Caused by: org.apache.solr.common.SolrException: Could not get leader props
> >> at
> >> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:762)
> >> at
> >> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:722)
> >> at
> >> org.apache.solr.cloud.ZkController.getLeader(ZkController.java:687)
> >> ... 35 more
> >> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> >> KeeperErrorCode = NoNode for /collections/oi/leaders/shard9
> >> at
> >> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> >> at
> >> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> >> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
> >> at
> >> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:301)
> >> at
> >> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:298)
> >> at
> >> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:67)
> >> at
> >> org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:298)
> >> at
> >> org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:740)
> >> ... 37 more
> >>
> >> This does not happen with an older build of september 11th.
> >>
> >> Thanks,
> >> Markus
>