SolrCloud with Zookeeper ensemble in production environment: SEVERE problems.

Luis Cappa Banda Wed, 13 Mar 2013 10:16:59 -0700

Hello, guys!

I've been experiencing some annoying behavior in my current production
scenario. Here is a snapshot:



   - SolrCloud: 2 shards
   - Zookeeper ensemble: 3 nodes on *different machines* (most of the
   tutorials install all 3 Zookeeper nodes on the same machine).
   - This is the zoo.cfg on every node:

# I've also tried tickTime=60000
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=9000
server.1=zoohost1:2888:3888
server.2=zoohost2:2888:3888
server.3=zoohost3:2888:3888
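For reference, a client such as CloudSolrServer reaches the ensemble through each node's clientPort (9000 here), not through the 2888/3888 quorum and election ports in the server.N lines, and it accepts all members as one comma-separated zkHost string. The sketch below derives that string from zoo.cfg text with java.util.Properties, which reads the same key=value format and treats '#' lines as comments. The class and method names are hypothetical, and the hostnames zoohost1, zoohost2 and zoohost3 are assumed to be the three distinct machines.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper: turns zoo.cfg text into a CloudSolrServer-style zkHost string.
final class ZkConnectString {

    static String fromZooCfg(String zooCfg) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(zooCfg)); // '#' lines are treated as comments
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String clientPort = props.getProperty("clientPort");
        if (clientPort == null) {
            throw new IllegalArgumentException("zoo.cfg has no clientPort");
        }
        // Entries look like server.N=host:quorumPort:electionPort; keep the host, ordered by N.
        Map<Integer, String> hostsById = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("server.")) {
                int id = Integer.parseInt(key.substring("server.".length()));
                String host = props.getProperty(key).split(":")[0];
                hostsById.put(id, host);
            }
        }
        // Clients connect on clientPort, not on the quorum/election ports.
        List<String> members = new ArrayList<>();
        for (String host : hostsById.values()) {
            members.add(host + ":" + clientPort);
        }
        return String.join(",", members);
    }

    public static void main(String[] args) {
        String zooCfg = String.join("\n",
                "tickTime=2000",
                "clientPort=9000",
                "server.1=zoohost1:2888:3888",
                "server.2=zoohost2:2888:3888",
                "server.3=zoohost3:2888:3888");
        System.out.println(fromZooCfg(zooCfg)); // zoohost1:9000,zoohost2:9000,zoohost3:9000
    }
}
```

Listing every member, rather than a single host, lets the ZooKeeper client move to another node if the one it is connected to becomes unreachable.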



   - I've developed a Java application with a REST API (let's call it
   *engine*) that dispatches queries to SolrCloud. It's a wrapper around
   CloudSolrServer, so some Zookeeper configuration params must be specified
   too. They are loaded dynamically when the application is deployed to a
   Tomcat server; the values I'm currently using are as follows:

cloudSolrServer.setZkConnectTimeout(60000);
cloudSolrServer.setZkClientTimeout(60000);
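One detail worth checking: ZooKeeper does not simply grant the session timeout a client asks for. The server clamps it into a window that, unless minSessionTimeout and maxSessionTimeout are set in zoo.cfg (they are not in the config above), defaults to 2 * tickTime through 20 * tickTime. As far as I understand, setZkClientTimeout is the session timeout CloudSolrServer requests, so with tickTime=2000 the 60000 ms value would be negotiated down to 40000 ms, and with tickTime=60000 it would be raised to 120000 ms. A minimal sketch of that clamping rule, with hypothetical class and method names:

```java
// Hypothetical helper illustrating ZooKeeper's session-timeout negotiation.
final class ZkSessionTimeout {

    // Assumes the default window: minSessionTimeout = 2 * tickTime and
    // maxSessionTimeout = 20 * tickTime (neither is set in the zoo.cfg above).
    static int negotiated(int tickTimeMs, int requestedMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    public static void main(String[] args) {
        int requested = 60000; // the value passed to setZkClientTimeout(...)
        for (int tickTime : new int[] {2000, 60000}) {
            System.out.println("tickTime=" + tickTime + " -> session timeout "
                    + negotiated(tickTime, requested) + " ms");
        }
        // tickTime=2000 -> session timeout 40000 ms
        // tickTime=60000 -> session timeout 120000 ms
    }
}
```

setZkConnectTimeout, by contrast, is as I understand it a client-side limit on how long to wait for the connection to be established, so the server-side window does not apply to it.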

*THE PROBLEM*
Everything works fine, but after roughly two days (yes, I've checked that
this happens periodically) the *engine blocks* and cannot dispatch any
query to SolrCloud.

   - The *engine* log prints "updating Zookeeper..." one last time, but the
   update never completes.
   - I've checked SolrCloud via the Solr Admin interface and it's OK:
   everything is green, and I can execute queries directly against Solr.
   - Since Solr appears to be OK, the next step is to restart the *engine*,
   but "updating Zookeeper..." appears again. Unfortunately, switching it off
   and on again doesn't help here :-(
   - I've also checked the Zookeeper logs: they show some connection
   logouts, but the ensemble appears to be OK too.
   - *The end:* If I restart the Zookeeper nodes one by one, then restart
   SolrCloud, and then restart the engine, the problem is solved. I'm using
   Amazon AWS for hosting, so I rule out connection problems between instances.


Has anyone experienced something similar? Can anybody shed some light on
this problem?

Thank you very much.


Regards,


- Luis Cappa
