I had a lot of problems with the stability of my cloud. To improve stability:
- Move zookeeper to another disk; the I/O from solr.home can kill your ensemble.
- Raise zkClientTimeout to 60s.
- Don't use a very big heap if you don't need it. Try values around 4g and increase until the OOMs stop.
- Use the heap tuning recommendations from http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning; they fixed 99% of my problems with zookeeper.
- Log GC times. I discovered pauses of 32s on my boxes, a total killer for zookeeper. The result: tons of expired sessions.

--
Yago Riveiro

On Thursday, December 19, 2013 at 5:45 PM, Shawn Heisey wrote:

> On 12/19/2013 3:44 AM, ilay raja wrote:
> > I have deployed solr cloud with an external zookeeper ensemble (5
> > instances). I am running solr instances on two servers with a single
> > shard index. There are 6 replicas. I often see solr going down during
> > high search load or whenever I index documents. I tried tuning
> > hard commits (kept at 15 mins) and soft commits (12 mins). Also, I set
> > zkClientTimeout to 30 secs. I sometimes observed OOM, socket exceptions,
> > and EOF exceptions in the solr logs while the instance was going down.
> > Also, zookeeper recovery for the solr instance is going in a loop ....
> > My use case is sort of high search (100 queries per sec) / heavy
> > indexing (10 K docs per minute). What is the best way to keep solr
> > cloud instances stable with an external ensemble? Should we try running
> > zookeeper internally, because it looks like zookeeper handshaking
> > might be an issue as well? Is solr cloud stable for production, or are
> > there still open issues? Please guide me.
>
> You definitely do not want to run zookeeper embedded in Solr. The
> simple reason is that if you stop Solr, you also stop zookeeper.
> Zookeeper works best if it remains up all the time, so an
> external ensemble is highly recommended.
>
> It's probably a good idea to set the max heap on the zookeeper startup
> ...
> one of my zk java instances is using 65MB resident memory, so unless
> it's a very large cloud, a low number like 128MB would probably be enough.
>
> I've heard that heavy I/O on the disk with the zookeeper data can cause
> problems for zookeeper. This is the one danger that can come from
> putting both Solr and an external zookeeper on the same host, which is
> usually a very safe thing to do. Unless you've got very fast I/O, it's
> recommended that the zookeeper data is put on separate disk spindles
> from anything else. When Solr has performance problems, it's usually
> from heavy I/O, and if heavy I/O is causing problems with zookeeper,
> then the problem just compounds itself.
>
> You haven't indicated how big the java heap for Solr is. Severe
> stability problems can result from GC pauses, so it's extremely
> important to tune your garbage collection unless your Solr max heap is
> very very small (less than 1GB). Here's my personal wiki page with
> settings that work for me, they seem to work for others too:
>
> http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
>
> Severe GC pause problems can also result from the Solr java heap being
> too small. Here's a more involved wiki page on performance issues that
> I have seen:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Thanks,
> Shawn
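
A footnote on the GC logging tip at the top of this message: below is a minimal Python sketch of one way to pull long pauses out of a GC log. It assumes a Java 7/8-era HotSpot JVM started with -XX:+PrintGCApplicationStoppedTime, which writes lines of the form "Total time for which application threads were stopped: N seconds". The function name long_pauses is made up for illustration, and other JVMs or newer logging setups will need a different pattern.

```python
import re

# Assumed log format (HotSpot with -XX:+PrintGCApplicationStoppedTime), e.g.
# "Total time for which application threads were stopped: 32.0412340 seconds"
STOPPED_RE = re.compile(r"application threads were stopped: ([0-9]+(?:\.[0-9]+)?) seconds")


def long_pauses(lines, threshold_s):
    """Return (line_number, pause_seconds) for each pause of at least threshold_s.

    `lines` is any iterable of log lines, for example an open file.
    """
    found = []
    for number, line in enumerate(lines, start=1):
        match = STOPPED_RE.search(line)
        if match is None:
            continue  # not a stopped-time line
        pause = float(match.group(1))
        if pause >= threshold_s:
            found.append((number, pause))
    return found
```

Calling it as long_pauses(open("gc.log"), 30.0) (gc.log is a placeholder name) would list every stop of 30 seconds or more. A threshold close to your zkClientTimeout is a reasonable starting point, since a pause of that length can be enough to expire the zookeeper session.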