Hi,
I have deployed SolrCloud with an external ZooKeeper ensemble (5 instances). The Solr instances run on two servers and serve a single-shard index with 6 replicas.

Solr instances often go down under heavy search load or whenever I index documents. I have tried tuning the hard commit (set to 15 minutes) and the soft commit (set to 12 minutes), and I set zkClientTimeout to 30 seconds. When an instance goes down, the Solr logs sometimes show OOM errors, socket exceptions, and EOF exceptions. ZooKeeper recovery for the failed instance also goes into a loop.

My use case combines heavy search (about 100 queries per second) with heavy indexing (about 10K documents per minute).

What is the best way to keep SolrCloud instances stable with an external ensemble? Should we try running the embedded ZooKeeper instead? It looks like ZooKeeper handshaking may also be part of the problem. Is SolrCloud stable enough for production, or are there still open issues? Please advise.
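For reference, here are the workload and settings described above, converted to milliseconds. Solr's `autoCommit`/`autoSoftCommit` `maxTime` and `zkClientTimeout` are expressed in milliseconds. This is only a sketch: the helper `to_ms` is hypothetical, and only the numbers come from this post.

```python
# Workload and settings as described in this post.
QUERIES_PER_SEC = 100
DOCS_PER_MINUTE = 10_000
HARD_COMMIT_MINUTES = 15

def to_ms(minutes: float = 0, seconds: float = 0) -> int:
    """Hypothetical helper: convert minutes/seconds to whole milliseconds."""
    return int((minutes * 60 + seconds) * 1000)

hard_commit_ms = to_ms(minutes=HARD_COMMIT_MINUTES)  # autoCommit maxTime
soft_commit_ms = to_ms(minutes=12)                   # autoSoftCommit maxTime
zk_client_timeout_ms = to_ms(seconds=30)             # zkClientTimeout

# Indexing rate, and documents indexed between two hard commits at that rate.
docs_per_sec = DOCS_PER_MINUTE / 60
docs_per_hard_commit = DOCS_PER_MINUTE * HARD_COMMIT_MINUTES

print(f"autoCommit maxTime:     {hard_commit_ms} ms")
print(f"autoSoftCommit maxTime: {soft_commit_ms} ms")
print(f"zkClientTimeout:        {zk_client_timeout_ms} ms")
print(f"indexing rate:          {docs_per_sec:.1f} docs/sec")
print(f"docs per hard commit:   {docs_per_hard_commit}")
print(f"query rate:             {QUERIES_PER_SEC} queries/sec")
```

At the stated rate, roughly 150,000 documents are indexed between hard commits.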