Hello, guys! I've been experiencing some annoying behavior in my current production scenario. Here is the snapshot:

- SolrCloud: 2 shards.
- Zookeeper ensemble: 3 nodes on *different machines* (most tutorials install the 3 Zookeeper nodes on the same machine).
- This is the zoo.cfg from every node:

  tickTime=2000 // I've also tried with 60000
  initLimit=10
  syncLimit=5
  dataDir=/var/lib/zookeeper
  clientPort=9000
  server.1=zoohost1:2888:3888
  server.2=zoohost1:2888:3888
  server.3=zoohost1:2888:3888

- I've developed a Java application with a REST API (let's call it *engine*) that dispatches queries to SolrCloud. It's a wrapper around CloudSolrServer, so it's mandatory to specify some Zookeeper configuration params too. They are loaded dynamically when the application is deployed in a Tomcat server, but the current values I'm using are as follows:

  cloudSolrServer.setZkConnectTimeout(60000)
  cloudSolrServer.setZkClientTimeout(60000)

*THE PROBLEM*

Everything goes OK, but after about two days (yes, I've checked that this behavior recurs periodically, more or less) the *engine* blocks and cannot dispatch any query to SolrCloud.

- The *engine* log outputs "updating Zookeeper..." one last time, but the update never completes.
- I've checked SolrCloud via the Solr Admin interface and it's OK: everything is green, and I can execute queries directly against Solr.
- Since Solr appears to be OK, the next step is to restart the *engine*, but "updating Zookeeper..." appears again. Unfortunately, switching it off and on again doesn't work here :-(
- I've also checked the Zookeeper logs and some connection logouts appear, but the ensemble appears to be OK too.
- *The end:* If I restart the Zookeeper nodes one by one, then restart SolrCloud, and then restart the engine, the problem is solved.

I'm using Amazon AWS for hosting, so I rule out connection problems between instances. Has anyone experienced something similar? Can anybody shed some light on this problem?

Thank you very much.

Regards,

- Luis Cappa
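
P.S. One thing worth double-checking in the zoo.cfg pasted above: all three server.N lines name zoohost1, although the nodes are meant to run on different machines. That may only be a copy/paste slip in this message. Below is a minimal sketch of a sanity check, assuming the file can be read as plain Java properties. The class ZooCfgCheck and the method duplicateHosts are hypothetical names for illustration, not part of ZooKeeper or SolrJ.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeSet;

/** Hypothetical helper (not a ZooKeeper or SolrJ API): reports hosts reused across server.N entries. */
class ZooCfgCheck {

    /** Returns host -> server ids, keeping only hosts named by more than one server.N entry. */
    static Map<String, List<String>> duplicateHosts(String zooCfgText) {
        Properties props = new Properties();
        try {
            // zoo.cfg uses key=value lines, so java.util.Properties can read it.
            props.load(new StringReader(zooCfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        Map<String, List<String>> idsByHost = new LinkedHashMap<>();
        // TreeSet sorts the keys so the result order is stable.
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            if (!key.startsWith("server.")) {
                continue;
            }
            // Value looks like host:peerPort:electionPort; this simple split ignores IPv6 literals.
            String host = props.getProperty(key).split(":")[0].trim();
            String id = key.substring("server.".length());
            idsByHost.computeIfAbsent(host, h -> new ArrayList<>()).add(id);
        }
        // Drop hosts that are used by a single server entry only.
        idsByHost.values().removeIf(ids -> ids.size() < 2);
        return idsByHost;
    }

    public static void main(String[] args) {
        String cfg = String.join("\n",
                "clientPort=9000",
                "server.1=zoohost1:2888:3888",
                "server.2=zoohost1:2888:3888",
                "server.3=zoohost1:2888:3888");
        System.out.println(duplicateHosts(cfg)); // prints {zoohost1=[1, 2, 3]}
    }
}
```

An empty result means every server.N entry points at a distinct host. A non-empty one, like the output for the entries as pasted, only flags something to verify against the real files on each node.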