If possible, could you please share some details of your setup: the
number of shards, how big they are in terms of size and document count,
and the user load per second?

On Fri, Jun 24, 2011 at 1:39 AM, Shawn Heisey <s...@elyograg.org> wrote:

> In the past I have told people on this list and in the IRC channel #solr
> what I use for Java GC settings.  A couple of days ago, I cleaned up my
> testing methodology to more closely mimic real production queries, and
> discovered that my GC settings were woefully inadequate.  Here's what I was
> using on a virtual machine with 9GB of RAM.  I've been using this for
> several months, and chose it because I had read several things praising it.
>  I should have done more research.
>
> -Xms512M -Xmx2048M -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
>
> On my backup servers, I am in the process of getting 3.2.0 ready to replace
> our 1.4.1 index.  I ran into a situation where committing a delta-import of
> only a few thousand records took longer than 3 minutes (Perl LWP default
> timeout) on every shard, where normally in production on 1.4.1 it only takes
> a few seconds.  This was shortly after I had hit the distributed index
> pretty hard with my improved benchmarking.
>
> Using jstat, I found that while under benchmarking load, the system was
> spending 10-15% of its time doing garbage collection, and that most of the
> garbage collections were from the young generation.  First I tried
> increasing the young generation size with the -XX:NewSize=1024M parameter.
>  This helped on the total GC count, but didn't really help with how much
> time was spent doing them.
>
> A good command to see these statistics on Linux, and an Oracle link
> explaining what it all means:
>
> jstat -gc -t `pgrep java` 5000
> http://download.oracle.com/javase/6/docs/technotes/tools/share/jstat.html
>
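
For anyone who wants to turn that jstat output into a single "percent of
time spent in GC" figure, here is a rough sketch. It assumes the output was
produced with -gc -t, so that the first column is Timestamp (seconds since
JVM start) and the last column is GCT (total GC time in seconds). The
function names parse_jstat and gc_overhead_percent are made up for this
example and are not part of any tool.

```python
def parse_jstat(text):
    """Parse `jstat -gc -t` output into a list of {column: float} rows."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Timestamp":
            # Header line; jstat repeats it when the -h option is used.
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue  # skip anything that does not look like a sample row
        # Some locales print a decimal comma, so normalise before converting.
        values = [float(f.replace(",", ".")) for f in fields]
        rows.append(dict(zip(header, values)))
    return rows


def gc_overhead_percent(rows):
    """Percent of wall-clock time spent in GC between first and last sample."""
    if len(rows) < 2:
        raise ValueError("need at least two samples")
    first, last = rows[0], rows[-1]
    elapsed = last["Timestamp"] - first["Timestamp"]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return 100.0 * (last["GCT"] - first["GCT"]) / elapsed
```

One way to use it would be to save a bounded run such as
"jstat -gc -t <pid> 5000 12" to a file while the benchmark is running, then
pass the file contents to parse_jstat and the result to gc_overhead_percent.
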
> I've learned that Solr will keep most of its data in the young generation
> (eden) unless that memory pool is too small, in which case it will move
> data to the tenured generation.  The key for good performance seems to be creating a
> large enough young generation.  You do need to have a good chunk of tenured
> available, unless the solr instance has no index itself and exists only to
> distribute queries to shards living on other solr instances.  In that case,
> it hardly uses the tenured generation.  It turns out that CMSIncrementalMode
> causes more young generation collections and makes them take longer, which
> is exactly what Solr does NOT need.
>
> After messing around with it for quite a while, I came up with the
> following settings, which included an increase in heap size:
>
> -Xms3072M -Xmx3072M -XX:NewSize=1536M -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
>
> With these settings, it spends very little time doing garbage collections.
>  One of my shards has been up for nearly 24 hours, has been hit with the
> benchmarking script repeatedly, and it has only done 62 young generation
> collections, and zero full collections, with 6.8 seconds total GC time.  I
> am thinking of increasing the NewSize yet again, because the tenured
> generation (1.5GB in size) is only one third utilized after nearly 24 hours.
>
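
The numbers quoted above can be cross-checked with a little arithmetic. The
sketch below assumes the young generation stays at the NewSize value, so the
tenured generation is simply the maximum heap minus NewSize, and it treats
"nearly 24 hours" as exactly 24 hours. The helper names size_to_mb and
heap_layout are hypothetical and exist only for this illustration.

```python
def size_to_mb(value):
    """Convert a JVM size string such as '3072M' or '2G' to megabytes."""
    units = {"K": 1.0 / 1024, "M": 1.0, "G": 1024.0}
    return float(value[:-1]) * units[value[-1].upper()]


def heap_layout(flags):
    """Derive heap, young and tenured sizes (MB) from -Xmx and -XX:NewSize."""
    max_heap = new_size = None
    for flag in flags.split():
        if flag.startswith("-Xmx"):
            max_heap = size_to_mb(flag[len("-Xmx"):])
        elif flag.startswith("-XX:NewSize="):
            new_size = size_to_mb(flag.split("=", 1)[1])
    if max_heap is None or new_size is None:
        raise ValueError("both -Xmx and -XX:NewSize are required")
    # Assumption: the young generation is not resized away from NewSize.
    return {
        "heap_mb": max_heap,
        "young_mb": new_size,
        "tenured_mb": max_heap - new_size,
    }


# Flags exactly as given in the message above.
FLAGS = (
    "-Xms3072M -Xmx3072M -XX:NewSize=1536M -XX:+UseParNewGC "
    "-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled"
)
layout = heap_layout(FLAGS)

# 6.8 seconds of total GC time over roughly 24 hours of uptime.
overhead_percent = 100.0 * 6.8 / (24 * 3600)
```

Under those assumptions the tenured size comes out at 1536 MB, which matches
the 1.5GB mentioned, and the GC overhead is well under a hundredth of a
percent, compared with the 10-15% seen with the old settings.
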
> My settings will probably not work for everyone, but I hope this post will
> make it easier for others to find the right solution for themselves.
>
> Thanks,
> Shawn
>
>


-- 
Regards,

Dmitry Kan
