Hi,

We have a SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes, with 6 ZooKeeper instances. We are planning to change to an odd number of ZooKeeper instances.
With Solr 4.3.0, if not all ZooKeeper instances are up, the Solr node never connects to ZooKeeper (we can't see the admin page) until all ZooKeeper instances are up and we restart all Solr nodes. It was suggested that this could be due to https://issues.apache.org/jira/browse/SOLR-4899, which is fixed in Solr 4.4. We upgraded to Solr 4.4 but still see the issue.

We brought up 4 out of 6 ZooKeeper instances and then brought up all ten Solr nodes. We kept seeing this exception in the Solr logs:

751395 [main-SendThread] WARN org.apache.zookeeper.ClientCnxn ? Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

After a while we saw these messages:

INFO - 2013-08-05 22:24:07.582; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@5140709 name:ZooKeeperConnection Watcher:qa-zk1.services.gs.com,qa-zk2.services.gs.com,qa-zk3.services.gs.com,qa-zk4.services.gs.com,qa-zk5.services.gs.com,qa-zk6.services.gs.com got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
INFO - 2013-08-05 22:24:07.662; org.apache.solr.common.cloud.ConnectionManager; Client->ZooKeeper status change trigger but we are already closed
754311 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ? Client->ZooKeeper status change trigger but we are already closed

We then brought up all ZooKeeper instances, but the cloud never came up until all Solr nodes were restarted. Do we need to change any settings?

After the weekend reboot, all ZooKeeper instances come up one by one.
While the ZooKeeper instances are coming up, the Solr nodes are also being started. Because of this issue, we have to add checks to make sure all ZooKeeper instances are up before we bring up any Solr node.

Thanks!!

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Tuesday, June 11, 2013 10:42 AM
To: solr-user@lucene.apache.org
Subject: Re: external zookeeper with SolrCloud

On Jun 11, 2013, at 10:15 AM, "Joshi, Shital" <shital.jo...@gs.com> wrote:

> Thanks Mark.
>
> Looks like this bug is fixed in Solr 4.4. Do you have any date for official
> release of 4.4?

Looks like it might come out in a couple of weeks.

> Is there any instruction available on how to build Solr 4.4 from SVN
> repository?

It's java, so it's pretty easy - you might find some help here: http://wiki.apache.org/solr/HowToContribute

- Mark

>
> -----Original Message-----
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Monday, June 10, 2013 8:05 PM
> To: solr-user@lucene.apache.org
> Subject: Re: external zookeeper with SolrCloud
>
> This might be https://issues.apache.org/jira/browse/SOLR-4899
>
> - Mark
>
> On Jun 10, 2013, at 5:59 PM, "Joshi, Shital" <shital.jo...@gs.com> wrote:
>
>> Hi,
>>
>> We're setting up 5 shard SolrCloud with external zoo keeper. When we bring
>> up Solr nodes while the zookeeper instance is not up and running, we see
>> this error in Solr logs.
>>
>> java.net.ConnectException: Connection refused
>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
>>
>> INFO - 2013-06-10 15:03:35.422;
>> org.apache.solr.common.cloud.ConnectionManager; Watcher 592147
>> [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager ?
>> Watcher org.apache.solr.common.cloud.ConnectionManager@530d0eae
>> name:ZooKeeperConnection Watcher: ................. got event WatchedEvent
>> state:SyncConnected type:None path:null path:null type:None
>>
>> INFO - 2013-06-10 15:03:35.423;
>> org.apache.solr.common.cloud.ConnectionManager; Client->ZooKeeper status
>> change trigger but we are already closed
>>
>> 592148 [main-EventThread] INFO
>> org.apache.solr.common.cloud.ConnectionManager ? Client->ZooKeeper status
>> change trigger but we are already closed
>>
>> After we bring up zookeeper instance, the node never connects to zookeeper
>> and we can't see the solr admin page, until we restart the node.
>>
>> Does the zookeeper instance has to be up when we bring up Solr node? That's
>> not what the documentation say though.
>>
>> Thanks.
>
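
Regarding the workaround in the newest message of this thread (checking that the ZooKeeper instances are up before starting any Solr node), a rough sketch of such a check might look like the following. It is only an illustration under several assumptions: it is written in Python, the function names are hypothetical, the client port is assumed to be ZooKeeper's default 2181 (the thread does not give ports), and it relies on ZooKeeper's "ruok" four-letter command being enabled (newer ZooKeeper releases may require it to be whitelisted). Note that an "imok" reply only indicates the server process is running, not that it has joined the quorum.

```python
import socket
import time


def quorum_size(ensemble_size):
    """Smallest majority of an ensemble, e.g. 4 of 6 or 3 of 5."""
    return ensemble_size // 2 + 1


def zk_is_ok(host, port=2181, timeout=2.0):
    """Send the 'ruok' four-letter command; True if the reply is 'imok'.

    Port 2181 is an assumed default, not taken from the thread.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            return sock.recv(16) == b"imok"
    except OSError:
        # Connection refused, timeout, DNS failure: treat as "not up yet".
        return False


def wait_for_zookeeper(hosts, required=None, probe=zk_is_ok,
                       poll_seconds=5.0, max_wait_seconds=600.0):
    """Poll until at least `required` hosts pass the probe.

    Returns True on success, False if max_wait_seconds elapses first.
    By default every host must answer, matching the "all instances up"
    check described in the thread.
    """
    if required is None:
        required = len(hosts)
    deadline = time.monotonic() + max_wait_seconds
    while True:
        healthy = sum(1 for host in hosts if probe(host))
        if healthy >= required:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

A startup script could call wait_for_zookeeper() with the same host list that is passed to Solr and launch the Solr node only when it returns True. Passing required=quorum_size(len(hosts)) would wait for a majority instead of every instance; whether Solr 4.4 then connects cleanly is exactly the open question in this thread, so that variant is untested here.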