Could you capture some thread stack traces in the 'engine' and see if there are any blocking methods?
- Mark

On Mar 13, 2013, at 1:34 PM, Luis Cappa Banda <luisca...@gmail.com> wrote:

> Just one correction:
>
> When I said:
>
> - I've checked SolrCloud via Solr Admin interface and it's OK:
>   everything is green, and I cant execute queries directly into Solr.
>
> I mean:
>
> - I've checked SolrCloud via Solr Admin interface and it's OK:
>   everything is green, and *I can* execute queries directly into Solr.
>
> Thanks!
>
> - Luis Cappa
>
> 2013/3/13 Luis Cappa Banda <luisca...@gmail.com>
>
>> Hello, guys!
>>
>> I've been experiencing some annoying behavior with my current production
>> scenario. Here is the snapshot:
>>
>> - SolrCloud: 2 shards
>> - Zookeeper ensemble: 3 nodes on *different machines* (most of the
>>   tutorials install 3 Zookeeper nodes on the same machine).
>> - This is the zoo.cfg from every node:
>>
>>   tickTime=2000 // I've also tried with 60000
>>   initLimit=10
>>   syncLimit=5
>>   dataDir=/var/lib/zookeeper
>>   clientPort=9000
>>   server.1=zoohost1:2888:3888
>>   server.2=zoohost1:2888:3888
>>   server.3=zoohost1:2888:3888
>>
>> - I've developed a Java application with a REST API (let's call it the
>>   *engine*) that dispatches queries into SolrCloud. It's a wrapper around
>>   CloudSolrServer, so it's mandatory to specify some Zookeeper configuration
>>   params too. They are loaded dynamically when the application is deployed in
>>   a Tomcat server, but the current values that I'm using are as follows:
>>
>>   cloudSolrServer.*setZkConnectTimeout(60000)*
>>   cloudSolrServer.*setZkClientTimeout(60000)*
>>
>> *THE PROBLEM*
>>
>> Everything goes OK, but after two days, more or less (yes, I've checked
>> that this behavior occurs periodically), the *engine blocks* and cannot
>> dispatch any query to SolrCloud.
>>
>> - The *engine* log only outputs "updating Zookeeper..." one last time,
>>   but never updates.
>> - I've checked SolrCloud via Solr Admin interface and it's OK:
>>   everything is green, and I cant execute queries directly into Solr.
>> - So Solr appears to be OK, and the next step is to restart the *engine*,
>>   but "updating Zookeeper..." appears again. Unfortunately switch
>>   off + switch on doesn't work here, :-(
>> - I've also checked the Zookeeper logs and there are some connection
>>   logouts, but the ensemble appears to be OK too.
>> - *The end:* If I restart the Zookeeper nodes one by one, and I restart
>>   SolrCloud, plus I restart the engine, the problem is solved. I'm using
>>   Amazon AWS for hosting, so I rule out connection problems between instances.
>>
>> Has anyone experienced something similar? Can anybody shed some light on
>> this problem?
>>
>> Thank you very much.
>>
>> Regards,
>>
>> - Luis Cappa
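
On the thread stack traces asked for at the top of this message: the usual way is to run the JDK's jstack tool against the Tomcat process while the engine is hung. If that is not convenient, a dump can also be produced from inside the JVM with the standard java.lang.management API. The following is only a minimal sketch under assumptions: the class name ThreadDumpSketch and its methods are hypothetical, nothing here comes from the engine or from SolrJ, and how it gets triggered (a debug REST endpoint, a timer, etc.) is left open.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the engine or of SolrJ.
final class ThreadDumpSketch {

    /** Returns a text dump of every live thread with its full stack trace. */
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // (false, false): skip locked-monitor/synchronizer details,
        // which keeps the call supported on every JVM.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            append(out, info);
        }
        return out.toString();
    }

    /**
     * Returns threads that are BLOCKED, WAITING or TIMED_WAITING.
     * Idle pool threads also show up here, so the stack frames still
     * have to be read to tell a stuck call from a parked worker.
     */
    static List<ThreadInfo> findWaitingThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<ThreadInfo> waiting = new ArrayList<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            Thread.State state = info.getThreadState();
            if (state == Thread.State.BLOCKED
                    || state == Thread.State.WAITING
                    || state == Thread.State.TIMED_WAITING) {
                waiting.add(info);
            }
        }
        return waiting;
    }

    // Formats one thread by hand; ThreadInfo.toString() truncates long stacks.
    private static void append(StringBuilder out, ThreadInfo info) {
        out.append('"').append(info.getThreadName()).append('"')
           .append(" state=").append(info.getThreadState());
        if (info.getLockName() != null) {
            out.append(" waiting on ").append(info.getLockName());
        }
        if (info.getLockOwnerName() != null) {
            out.append(" held by \"").append(info.getLockOwnerName()).append('"');
        }
        out.append('\n');
        for (StackTraceElement frame : info.getStackTrace()) {
            out.append("    at ").append(frame).append('\n');
        }
        out.append('\n');
    }

    public static void main(String[] args) {
        System.out.println(dumpAllThreads());
    }
}
```

When reading the output, the interesting entries would be request threads whose top frames sit in ZooKeeper or CloudSolrServer calls and never move between two dumps taken a few seconds apart.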