On 12/19/2013 3:44 AM, ilay raja wrote:
> I have deployed SolrCloud with an external zookeeper ensemble (5
> instances). I am running Solr instances on two servers with a single-shard
> index. There are 6 replicas. I often see Solr going down during high search
> load, or whenever I run document indexing. I tried tuning hardCommit
> (kept at 15 mins) and softCommit (12 mins). I also set zkClientTimeout to 30
> secs. I sometimes observed OOM, socket exceptions, and EOF exceptions in the
> Solr logs while the instance is going down. Also, zookeeper recovery for the
> Solr instance goes into a loop. My use case is high search load (100
> queries per sec) with heavy indexing (10K docs per minute). What is the best
> way to keep SolrCloud instances stable with an external ensemble? Should we
> try running zookeeper internally? It looks like zookeeper handshaking
> might be an issue as well. Is SolrCloud stable for production, or are there
> still open issues? Please guide me.
You definitely do not want to run zookeeper embedded in Solr, for the simple reason that if you stop Solr, you also stop zookeeper. Zookeeper works best if it stays up all the time, so an external ensemble is highly recommended.

It's probably a good idea to set the max heap when starting zookeeper. One of my zk Java instances is using 65MB of resident memory, so unless it's a very large cloud, a low number like 128MB would probably be enough.

I've heard that heavy I/O on the disk holding the zookeeper data can cause problems for zookeeper. Putting Solr and an external zookeeper on the same host is usually a very safe thing to do, and this is the one danger that can come from it. Unless you've got very fast I/O, it's recommended that the zookeeper data be put on separate disk spindles from anything else. When Solr has performance problems, they usually come from heavy I/O. If heavy I/O is also causing problems for zookeeper, the two problems compound each other.

You haven't indicated how big the Java heap for Solr is. Severe stability problems can result from GC pauses, so it's extremely important to tune your garbage collection unless your Solr max heap is very small (less than 1GB). Here's my personal wiki page with settings that work for me; they seem to work for others too:

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

Severe GC pause problems can also result from the Solr Java heap being too small. Here's a more involved wiki page on performance issues that I have seen:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn