Hi Shawn,

Thanks for the input. I have uploaded the GC logs from one of the slaves here: https://ufile.io/ecvag (the link should work until 18th Oct '18).
I also uploaded the logs to gceasy, and it reports that the problem is consecutive full GCs. Its recommended fix is to increase the heap size, but I am already running on a pretty big heap, so I don't think increasing it further is going to be a long-term solution.

From what I understood after looking around a bit more, this is a concurrent mode failure in CMS. I found an old blog post mentioning the use of -XX:CMSFullGCsBeforeCompaction=1 to make sure that compaction is done before the next collection is triggered. So if this is a fragmentation problem, I hope this will solve it.

I will also try enabling docValues, as suggested by Ere, on a couple of fields that we facet on heavily, to reduce memory usage on the slaves.

Please share any ideas that you may have from analysing the GC logs.

Thanks,
Yasoob

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html