I'm setting up a Solr cluster in AWS and I need help with the
ZooKeeper configuration. The cluster has 3 ZK nodes and 3 Solr nodes.

There are two behaviors that are of concern:

*1 - ZK ensemble not accepting return of node*
Currently, when a ZK node in the ensemble goes down, the remaining two
nodes keep working as they should. However, when I bring the 3rd node
back online, the other two nodes reject connection requests from it
until I restart them as well. The sequence is:


   1. Bring 3rd node back on line
   2. Restart follower in existing ensemble
   3. Restart leader in existing ensemble

When this is done the third node happily becomes part of the ensemble.
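
To confirm what each node reports after each step of that sequence, a check along these lines can be used. This is only a minimal sketch: it assumes ZooKeeper's four-letter `srvr` command is enabled on the client port (2181 in our config), and the host names in the usage comment are hypothetical placeholders, not our real addresses.

```python
import socket


def four_letter_word(host, port=2181, cmd="srvr", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(srvr_reply):
    """Return the value of the 'Mode:' line (e.g. leader, follower), or None."""
    for line in srvr_reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def ensemble_modes(hosts, port=2181):
    """Map each host to its reported mode, or to an 'unreachable' note."""
    modes = {}
    for host in hosts:
        try:
            modes[host] = parse_mode(four_letter_word(host, port))
        except OSError as exc:  # refused, timed out, DNS failure, ...
            modes[host] = "unreachable ({})".format(exc)
    return modes


# Hypothetical usage (placeholder host names):
# print(ensemble_modes(["zk1.internal", "zk2.internal", "zk3.internal"]))
```

A value of `None` for a host means the node answered but did not report a `Mode:` line, which is what I would expect from a node that is up but has not joined the quorum.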

*2 - Solr nodes unable to connect*
When setting up the cluster for the first time, the ensemble rejects the
Solr connection requests until ZooKeeper is restarted on each of the
ensemble members.

So the sequence is:


   1. Setup ensemble
   2. Bring up solr nodes
   3. Restart followers on ZK ensemble
   4. Restart leader on ZK ensemble


When I do this everything is fine and the cluster is now stable.

However, we have also seen that if a problem with the Solr nodes requires
restarting more than one of them, we have to restart ZK to reconnect the
nodes with the ensemble again.

We are trying to achieve a self-correcting cluster. In other words, we
would like to get to the point where if a node goes down, all that is
necessary is to restart it (after the issue is resolved) and it will add
itself back into the cluster. Obviously this is not possible if ZK has to
be restarted each time.

Is there a configuration that I am missing? Why is ZK so finicky?

Our ZK config is very simple:

clientPort=2181
dataDir=/var/opt/zookeeper/data
tickTime=2000
autopurge.purgeInterval=24
initLimit=100
syncLimit=5
server.1=<AWS internal IP1>:2888:3888
server.2=<AWS internal IP2>:2888:3888
server.3=<AWS internal IP3>:2888:3888
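
For reference, here is how I read the timing values in that config: `initLimit` and `syncLimit` are counted in ticks, so with `tickTime=2000` they come to 200 seconds and 10 seconds respectively. The sketch below just does that arithmetic and lists the ensemble members. The `10.0.0.x` addresses are made-up stand-ins for our internal IPs, and the parser assumes the plain `host:port:port` form of the `server.N` lines shown above.

```python
# Stand-in for our zoo.cfg; the 10.0.0.x addresses are hypothetical.
SAMPLE_CFG = """\
clientPort=2181
dataDir=/var/opt/zookeeper/data
tickTime=2000
autopurge.purgeInterval=24
initLimit=100
syncLimit=5
server.1=10.0.0.1:2888:3888
server.2=10.0.0.2:2888:3888
server.3=10.0.0.3:2888:3888
"""


def parse_zoo_cfg(text):
    """Split a zoo.cfg into plain settings and server.N entries."""
    settings, servers = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.startswith("server."):
            # Expected form: host:quorumPort:electionPort
            host, quorum_port, election_port = value.rsplit(":", 2)
            server_id = int(key.split(".", 1)[1])
            servers[server_id] = (host, int(quorum_port), int(election_port))
        else:
            settings[key] = value
    return settings, servers


def timeouts_ms(settings):
    """Convert the tick-based limits into milliseconds."""
    tick = int(settings["tickTime"])
    return {
        "init": tick * int(settings["initLimit"]),
        "sync": tick * int(settings["syncLimit"]),
    }


settings, servers = parse_zoo_cfg(SAMPLE_CFG)
print(timeouts_ms(settings))
for server_id, (host, quorum_port, election_port) in sorted(servers.items()):
    print(server_id, host, quorum_port, election_port)
```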


Any help would be greatly appreciated.


Jim K.

-- 
Jim Keeney
President, FitterWeb
E: j...@fitterweb.com
M: 703-568-5887

*FitterWeb Consulting*
*Are you lean and agile enough? *
