Hi Shawn

Thanks for informing me. I guess the worst-case scenario is that all 3 ZK 
services are down, which is unlikely. At this juncture, as you said, the 
viable workaround is to start the services manually, in a sequence that 
ensures ZK has quorum before Solr starts. So the proper sequence in a 3 ZK 
+ Solr server setup (both ZK and Solr on each server) will be as follows:

Downed situation with one or more ZK services
1. Restart all ZK Services first on all three machines
2. Restart all Solr Services on all three machines
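
The two steps above could be scripted roughly as follows. This is only a 
sketch: restart_cluster, start_zk, quorum_ok and start_solr are hypothetical 
names, and the callables passed in stand for whatever service commands and 
quorum check the environment actually uses.

```python
import time


def restart_cluster(hosts, start_zk, quorum_ok, start_solr,
                    attempts=30, delay=2.0, sleep=time.sleep):
    """Start ZK everywhere, wait for quorum, then start Solr everywhere.

    All callables are supplied by the caller (hypothetical hooks):
      start_zk(host)   -- starts the ZK service on one machine
      quorum_ok()      -- returns True once the ZK ensemble has quorum
      start_solr(host) -- starts the Solr service on one machine
    """
    # Step 1: bring up ZK on all machines first.
    for host in hosts:
        start_zk(host)

    # Gate: do not touch Solr until ZK reports quorum.
    for _ in range(attempts):
        if quorum_ok():
            break
        sleep(delay)
    else:
        raise RuntimeError("ZK quorum not reached; not starting Solr")

    # Step 2: only now bring up Solr on all machines.
    for host in hosts:
        start_solr(host)
```

The point of the gate in the middle is that no Solr node is started until 
the quorum check passes, which is the workaround Shawn describes.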

Please confirm whether the above is correct; if so, I will be happy to take 
this approach and communicate it to my customer.

Many thanks.

Regards,
Adrian 

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Wednesday, October 7, 2015 4:09 PM
To: solr-user@lucene.apache.org
Subject: Re: If zookeeper is down, SolrCloud nodes will not start correctly, 
even if zookeeper is started later

On 10/6/2015 10:22 PM, Adrian Liew wrote:
> Hence, the issue is that upon startup of three machines, the startup 
> of ZK and Solr is out of sequence that causes SolrCloud to behave 
> unexpectedly. Noting there is Jira ticket addressed here for Solr 4.9 
> above to include an improvement to the issue above. 
> (https://issues.apache.org/jira/browse/SOLR-5129)

That issue is unresolved, so it has not been fixed in any Solr version.

At this time, if you do not have Zookeeper quorum (a majority of your ZK nodes 
fully operational), you will not be able to successfully start SolrCloud nodes. 
The issue has low priority because there is a viable workaround -- ensure that 
ZK has quorum before starting or restarting any Solr node.
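
One possible way to check for quorum before starting Solr is to send the 
"srvr" four-letter command to each ZK node and count how many report a mode 
of leader or follower. The sketch below assumes the default client port 2181 
and that four-letter commands are enabled (newer ZK releases require them to 
be whitelisted); the function names are illustrative only.

```python
import socket


def parse_mode(srvr_reply):
    """Extract the value of the 'Mode:' line from a 'srvr' reply, or None."""
    for line in srvr_reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    # A node that is up but not serving requests returns no Mode line.
    return None


def query_mode(host, port=2181, timeout=2.0):
    """Send 'srvr' to one ZK node; return its mode, or None if unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"srvr")
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # server closes the connection after replying
                    break
                chunks.append(data)
    except OSError:
        return None
    return parse_mode(b"".join(chunks).decode("utf-8", "replace"))


def has_quorum(modes, ensemble_size):
    """True if a majority of the ensemble is serving as leader or follower."""
    serving = sum(1 for m in modes if m in ("leader", "follower"))
    return serving >= ensemble_size // 2 + 1
```

For a 3-node ensemble this would be used as 
has_quorum([query_mode(h) for h in hosts], 3), where hosts is the list of ZK 
machines; the result is True once at least 2 of the 3 nodes are serving.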

Thinking out loud:  Until this issue is fixed, I think this means that a 3-node 
setup where all three nodes use the zookeeper embedded in Solr will require a 
strange startup sequence if none of the nodes are running:

* Start node 1. Solr will not start correctly -- no ZK quorum.
* Start node 2. Solr might start correctly, not sure.
* Start node 3. This should start correctly.
* Restart node 1. With ZK nodes 2 and 3 running, this will work.
* Restart node 2 if it did not start properly the first time.

I really have no idea whether the second node startup will work properly.

Thanks,
Shawn
