We have two slaves replicating off one master every 2 minutes.

Both using the CMS + ParNew Garbage collector. Specifically

-server -XX:+UseConcMarkSweepGC -XX:+UseParNewGC 
-XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing

but periodically they both get into a GC storm and just keel over.

Looking through the GC logs the amount of memory reclaimed in each GC 
run gets less and less until we get a concurrent mode failure and then 
Solr effectively dies.

Is it possible there's a memory leak? I note that later versions of 
Lucene have fixed a few leaks. Our current versions are relatively old

        Solr Implementation Version: 1.4.1 955763M - mark - 2010-06-17 
18:06:42

        Lucene Implementation Version: 2.9.3 951790 - 2010-06-06 01:30:55

so I'm wondering if upgrading to later version of Lucene might help (of 
course it might not but I'm trying to investigate all options at this 
point). If so what's the best way to go about this? Can I just grab the 
Lucene jars and drop them somewhere (or unpack and then repack the solr 
war file?). Or should I use a nightly solr 1.4?

Or am I barking up completely the wrong tree? I'm trawling through heap 
logs and gc logs at the moment trying to to see what other tuning I can 
do but any other hints, tips, tricks or cluebats gratefully received. 
Even if it's just "Yeah, we had that problem and we added more slaves 
and periodically restarted them"

thanks,

Simon

Reply via email to