Hi Erick, The older message seems to be deleted so I am sending a new one http://osdir.com/ml/solr-user.lucene.apache.org/2015-01/msg00773.html
In solr.xml file I had zk timeout set to* <int name="zkClientTimeout">${zkClientTimeout:450000}</int>* One thing that made a it a bit better now is the zk tick time and syncLimit settings. I set it to a higher value as below. This may not be advisable though. tickTime=30000 initLimit=30 syncLimit=20 Now we observed that replicas do not go in recovery that often as before. In the whole cluster at a given time I would have a couple of replicas in recovery whereas earlier it were multiple replicas from every shard . On the wiki https://wiki.apache.org/solr/SolrCloud it says the "The maximum is 20 times the tickTime." in the FAQ so I decided to increase the tick time. Is this the correct approach ? One question I have is that if auto commit settings has anything to do with this or not ? Does it induce extra work for the searchers because of which this would happen? I have tried with following settings * <autoSoftCommit>* * <maxDocs>500000</maxDocs>* * <maxTime>900000</maxTime>* * </autoSoftCommit>* * <autoCommit>* * <maxDocs>200000</maxDocs>* * <maxTime>30000</maxTime>* * <openSearcher>false</openSearcher>* * </autoCommit>* I have increased the heap size to 15GB for each JVM instance . I monitored during full indexing how the heap usage looks like and it never goes beyond 8 GB . I don't see any Full GC happening at any point . Our rate is a variable rate . It is not a sustained rate of 6000/second , however there are intervals where it would reach that much and come down and grow again and come down. So if I would take an average it would be 600/second only but that is not real rate at any given time. Version of solr cloud is *4.10*. All indexers are basically java programs running on different host using CloudSolrServer api. As I mentioned it is much better now than before , however not completely as expected .If possible we would like to have none of them go in recovery I captured some logs before and after recovery 14:16:52,774 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [IFD][recoveryExecutor-7-thread-1]: delete "_5r2_1.del" 14:16:52,774 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [IFD][recoveryExecutor-7-thread-1]: 0 msec to checkpoint 14:16:52,774 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [IFD][recoveryExecutor-7-thread-1]: now checkpoint "_4qe(4.10.0):C4312879/1nts ; isCommit = false] 14:16:52,774 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [IFD][recoveryExecutor-7-thread-1]: delete "_5r4_1.del" 14:16:52,774 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [IFD][recoveryExecutor-7-thread-1]: 0 msec to checkpoint 14:16:52,775 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [DW][recoveryExecutor-7-thread-1]: recoveryExecutor-7-thread-1 finishFullFlush success=true 14:16:52,775 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: findMerges: 34 segments 14:16:52,777 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_554(4.10.0):C3995865/780418:delGen=23 size=3669.307 MB [skip: too large] 14:16:52,777 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_4qe(4.10.0):C4312879/1370113:delGen=57 size=3506.254 MB [skip: too large] 14:16:52,777 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5co(4.10.0):C871785/93995:delGen=11 size=853.668 MB 14:16:52,777 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5kb(4.10.0):C424868/49572:delGen=12 size=518.704 MB 14:16:52,777 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5hm(4.10.0):C457977/83353:delGen=12 size=470.422 MB 14:16:52,778 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_56u(4.10.0):C286775/11906:delGen=15 size=312.952 MB 14:16:52,778 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5f5(4.10.0):C116528/43621:delGen=2 size=95.529 MB 14:16:52,778 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5m7(4.10.0):C122852/54010:delGen=12 size=84.949 MB 14:16:52,778 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5p7(4.10.0):C37457/649:delGen=2 size=54.241 MB 14:16:52,778 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5of(4.10.0):C38545/5677:delGen=1 size=50.672 MB 14:16:52,778 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5qm(4.10.0):C29770/4:delGen=2 size=34.052 MB 14:16:52,778 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5o8(4.10.0):C27155/7531:delGen=1 size=31.008 MB 14:16:52,778 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5r2(4.10.0):C32769/4:delGen=2 size=27.410 MB 14:16:52,778 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5r3(4.10.0):C26057 size=23.893 MB 14:16:52,778 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5r4(4.10.0):C23934/3:delGen=2 size=22.004 MB 14:16:52,778 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5nx(4.10.0):C33236/20669:delGen=2 size=19.861 MB 14:16:52,778 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5r5(4.10.0):C24222/1:delGen=1 size=19.546 MB 14:16:52,778 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5q0(4.10.0):C13610/2625:delGen=7 size=12.480 MB 14:16:52,778 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5qv(4.10.0):C3973/1:delGen=1 size=3.402 MB 14:16:52,778 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5qo(4.10.0):C3456/2:delGen=2 size=3.147 MB 14:16:52,778 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5qq(4.10.0):C1997 size=1.781 MB [floored] 14:16:52,778 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5q2(4.10.0):C779 size=1.554 MB [floored] 14:16:52,778 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5qh(4.10.0):C1765/1:delGen=1 size=1.549 MB [floored] 14:16:52,778 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5qc(4.10.0):C1828/1:delGen=1 size=1.401 MB [floored] 14:16:52,779 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5qr(4.10.0):C1468 size=1.390 MB [floored] 14:16:52,779 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5qp(4.10.0):C1729/1:delGen=1 size=1.351 MB [floored] 14:16:52,779 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5qa(4.10.0):C967 size=1.235 MB [floored] 14:16:52,779 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5qi(4.10.0):C1241 size=1.146 MB [floored] 14:16:52,779 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5qy(4.10.0):C1407 size=1.050 MB [floored] 14:16:52,779 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5q1(4.10.0):C402/1:delGen=1 size=0.954 MB [floored] 14:16:52,779 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5qw(4.10.0):C802/1:delGen=1 size=0.821 MB [floored] 14:16:52,779 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5qx(4.10.0):C638 size=0.818 MB [floored] 14:16:52,779 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5qt(4.10.0):C30 size=0.072 MB [floored] 14:16:52,779 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: seg=_5qu(4.10.0):C27 size=0.063 MB [floored] 14:16:52,779 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: allowedSegmentCount=31 vs count=34 (eligible count=32) tooBigCount=2 14:16:52,779 INFO [org.apache.solr.update.LoggingInfoStream] (recoveryExecutor-7-thread-1) [TMP][recoveryExecutor-7-thread-1]: maybe=_5co(4.10.0):C871785/93995:delGen=11 _5kb(4.10.0):C424868/49572:delGen=12 _5hm(4.10.0):C457977/83353:delGen=12 _56u(4.10.0):C286775/11906:delGen=15 _5f5(4.10.0):C116528/43621:delGen=2 _5m7(4.10.0):C122852/54010:delGen=12 _5p7(4.10.0):C37457/649:delGen=2 _5of(4.10.0):C38545/5677:delGen=1 _5qm(4.10.0):C29770/4:delGen=2 _5o8(4.10.0):C27155/7531:delGen=1 score=0.7313777596572674 skew=0.341 nonDelRatio=0.852 tooLarge=false size=2506.203 MB ..... 14:16:58,367 INFO [org.apache.solr.update.LoggingInfoStream] (searcherExecutor-6-thread-1) [IFD][searcherExecutor-6-thread-1]: delete "_5p7_1.del" 14:16:58,367 INFO [org.apache.solr.update.LoggingInfoStream] (searcherExecutor-6-thread-1) [IFD][searcherExecutor-6-thread-1]: delete "_5nx_1.del" 14:16:58,367 INFO [org.apache.solr.update.LoggingInfoStream] (searcherExecutor-6-thread-1) [IFD][searcherExecutor-6-thread-1]: delete "_5co_a.del" 14:16:58,367 INFO [org.apache.solr.update.LoggingInfoStream] (searcherExecutor-6-thread-1) [IFD][searcherExecutor-6-thread-1]: delete "_5q0_6.del" 14:16:58,367 INFO [org.apache.solr.update.LoggingInfoStream] (searcherExecutor-6-thread-1) [ -- ********************************************* Vijay Sekhri *********************************************