Hi,

We have a SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes. We 
have 6 ZooKeeper instances and are planning to change to an odd number of 
ZooKeeper instances. 
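
As background for the move to an odd count: ZooKeeper needs a strict majority 
of the ensemble to be up, so 6 instances tolerate no more failures than 5 do. 
A minimal sketch of that arithmetic (the function names are illustrative and 
not part of any ZooKeeper API):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble of the given size."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many instances can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (5, 6, 7):
        print(f"ensemble={n} quorum={quorum_size(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

By this arithmetic, the 4 out of 6 instances mentioned below are exactly a 
majority.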

With Solr 4.3.0, if not all ZooKeeper instances are up, the Solr 4 node never 
connects to ZooKeeper (we can't see the admin page) until all ZooKeeper 
instances are up and we restart all Solr nodes. It was suggested that this 
could be due to the bug https://issues.apache.org/jira/browse/SOLR-4899, which 
is fixed in Solr 4.4.

We upgraded to Solr 4.4 but still see this issue. We brought up 4 out of 6 
ZooKeeper instances and then brought up all ten Solr 4 nodes. We kept seeing 
this exception in the Solr logs:

751395 [main-SendThread] WARN  org.apache.zookeeper.ClientCnxn  - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

After a while, we saw these messages:

INFO  - 2013-08-05 22:24:07.582; org.apache.solr.common.cloud.ConnectionManager; Watcher org.apache.solr.common.cloud.ConnectionManager@5140709 name:ZooKeeperConnection Watcher:qa-zk1.services.gs.com,qa-zk2.services.gs.com,qa-zk3.services.gs.com,qa-zk4.services.gs.com,qa-zk5.services.gs.com,qa-zk6.services.gs.com got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
INFO  - 2013-08-05 22:24:07.662; org.apache.solr.common.cloud.ConnectionManager; Client->ZooKeeper status change trigger but we are already closed
754311 [main-EventThread] INFO  org.apache.solr.common.cloud.ConnectionManager  - Client->ZooKeeper status change trigger but we are already closed

We brought up all ZooKeeper instances, but the cloud never came up until all 
Solr nodes were restarted. Do we need to change any settings? After the weekend 
reboot, the ZooKeeper instances come up one by one, and the Solr nodes start 
while the ZooKeeper instances are still coming up. Because of this issue, we 
have to add checks to make sure all ZooKeeper instances are up before we bring 
up any Solr node. 
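
A minimal sketch of such a pre-start check, under several assumptions: it uses 
ZooKeeper's "ruok" four-letter command (a running server answers "imok"), it 
assumes the default client port 2181 (the port is not shown in the logs above), 
and the function names zk_is_ok and wait_for_zookeepers are hypothetical. Note 
that "ruok" only shows the process is alive, not that it has joined a quorum, 
and that newer ZooKeeper releases may require four-letter commands to be 
whitelisted.

```python
import socket
import time
from typing import Callable, Iterable, Tuple

Server = Tuple[str, int]  # (host, client port)


def zk_is_ok(server: Server, timeout: float = 2.0) -> bool:
    """Send 'ruok' to one ZooKeeper server and expect 'imok' back."""
    try:
        with socket.create_connection(server, timeout=timeout) as sock:
            sock.sendall(b"ruok")
            return sock.recv(16) == b"imok"
    except OSError:
        # Connection refused, timeout, or DNS failure: treat as not up.
        return False


def wait_for_zookeepers(
    servers: Iterable[Server],
    probe: Callable[[Server], bool] = zk_is_ok,
    max_wait: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every server passes the probe, or max_wait seconds elapse."""
    servers = list(servers)
    deadline = time.monotonic() + max_wait
    while True:
        if all(probe(s) for s in servers):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Hostnames taken from the log above; port 2181 is an assumption.
    ensemble = [(f"qa-zk{i}.services.gs.com", 2181) for i in range(1, 7)]
    raise SystemExit(0 if wait_for_zookeepers(ensemble) else 1)
```

A Solr start script could run this first and launch the node only when it 
exits with status 0.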

Thanks!!

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Tuesday, June 11, 2013 10:42 AM
To: solr-user@lucene.apache.org
Subject: Re: external zookeeper with SolrCloud


On Jun 11, 2013, at 10:15 AM, "Joshi, Shital" <shital.jo...@gs.com> wrote:

> Thanks Mark.
> 
> Looks like this bug is fixed in Solr 4.4. Do you have any date for official 
> release of 4.4?

Looks like it might come out in a couple of weeks.

> Is there any instruction available on how to build Solr 4.4 from SVN 
> repository?

It's java, so it's pretty easy - you might find some help here: 
http://wiki.apache.org/solr/HowToContribute

- Mark

> 
> -----Original Message-----
> From: Mark Miller [mailto:markrmil...@gmail.com] 
> Sent: Monday, June 10, 2013 8:05 PM
> To: solr-user@lucene.apache.org
> Subject: Re: external zookeeper with SolrCloud
> 
> This might be https://issues.apache.org/jira/browse/SOLR-4899
> 
> - Mark
> 
> On Jun 10, 2013, at 5:59 PM, "Joshi, Shital" <shital.jo...@gs.com> wrote:
> 
>> Hi,
>> 
>> 
>> 
>> We're setting up a 5-shard SolrCloud with an external ZooKeeper. When we bring 
>> up Solr nodes while the ZooKeeper instance is not up and running, we see 
>> this error in the Solr logs.
>> 
>> 
>> 
>> java.net.ConnectException: Connection refused
>>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>       at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>>       at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
>>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
>> 
>> 
>> 
>> INFO  - 2013-06-10 15:03:35.422; 
>> org.apache.solr.common.cloud.ConnectionManager; Watcher 592147 
>> [main-EventThread] INFO  org.apache.solr.common.cloud.ConnectionManager  ? 
>> Watcher org.apache.solr.common.cloud.ConnectionManager@530d0eae 
>> name:ZooKeeperConnection Watcher: ................. got event WatchedEvent 
>> state:SyncConnected type:None path:null path:null type:None
>> 
>> 
>> 
>> INFO  - 2013-06-10 15:03:35.423; 
>> org.apache.solr.common.cloud.ConnectionManager; Client->ZooKeeper status 
>> change trigger but we are already closed
>> 
>> 592148 [main-EventThread] INFO  
>> org.apache.solr.common.cloud.ConnectionManager  ? Client->ZooKeeper status 
>> change trigger but we are already closed
>> 
>> 
>> 
>> After we bring up zookeeper instance, the node never connects to zookeeper 
>> and we can't see the solr admin page, until we restart the node.
>> 
>> 
>> 
>> Does the ZooKeeper instance have to be up when we bring up a Solr node? That's 
>> not what the documentation says, though.
>> 
>> 
>> 
>> Thanks.
> 
