On 4/11/2017 2:56 PM, Chetas Joshi wrote:
> I am using Solr (5.5.0) on HDFS. SolrCloud of 80 nodes. Solr collection
> with number of shards = 80 and replicationFactor = 2
>
> Solr JVM heap size = 20 GB
> solr.hdfs.blockcache.enabled = true
> solr.hdfs.blockcache.direct.memory.allocation = true
> MaxDirectMemorySize = 25 GB
>
> I am querying a solr collection with index size = 500 MB per core.

I see that you and I have traded messages before on the list.

How much total system memory is there per server?  How many of these
500MB cores are on each server?  How many docs are in a 500MB core?  The
answers to these questions may affect the other advice that I give you.

> The off-heap memory (25 GB) is huge so that it can hold the entire index.

I still know very little about how HDFS handles caching and memory.  You
want to be sure that as much data as possible from your indexes is
sitting in local memory on the server.

> Using the cursor approach (number of rows = 100K), I read 2 fields (a
> total of 40 bytes per Solr doc) from the docs that satisfy the query.
> The docs are sorted by "id" and then by those 2 fields.
>
> I am not able to understand why the heap memory is getting full and Full
> GCs are consecutively running with long GC pauses (> 30 seconds). I am
> using CMS GC.

A 20GB heap is quite large.  Do you actually need it to be that large? 
If you graph JVM heap usage over a long period of time, what are the low
points in the graph?
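
If you don't already have something graphing the heap, the GC log or
jstat against the Solr pid will give you those numbers.  Purely as an
illustration of what I mean by watching the low points, here is a small
sketch that polls a Solr JVM's heap over remote JMX once a minute -- the
host, the port, and the assumption that remote JMX is enabled on the
Solr process are placeholders of mine, not something from your setup:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;
    import javax.management.MBeanServerConnection;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class HeapWatcher {
      public static void main(String[] args) throws Exception {
        // Placeholder host/port; requires remote JMX enabled on the Solr JVM.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://solr-host:18983/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        MBeanServerConnection conn = jmxc.getMBeanServerConnection();
        MemoryMXBean mem = ManagementFactory.newPlatformMXBeanProxy(
            conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
        while (true) {
          // Sample and print heap usage once a minute.
          MemoryUsage heap = mem.getHeapMemoryUsage();
          System.out.printf("heap used=%d MB committed=%d MB max=%d MB%n",
              heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
          Thread.sleep(60000L);
        }
      }
    }

The low points on that graph, especially right after a full GC, are a
good indication of how much heap the node actually needs.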

A result containing 100K docs is going to be pretty large, even with a
limited number of fields.  At roughly 40 bytes of field data per doc,
that is about 4 MB of raw field data per page, and the in-memory
representation will be considerably bigger.  The whole response has to
be built in heap memory before it is sent to the client -- both as
Lucene/Solr data structures (which will probably be much larger than the
serialized response because of Java object overhead) and as the actual
response format.  Then all of it becomes garbage as soon as the response
is sent.  Repeat this enough times, and you're going to go through even
a 20GB heap pretty fast and need a full GC.  Full GCs on a 20GB heap
are slow.

You could try switching to G1, as long as you realize that you're going
against advice from Lucene experts.... but honestly, I do not expect
this to really help, because you would probably still need full GCs due
to the rate that garbage is being created.  If you do try it, I would
strongly recommend the latest Java 8, either Oracle or OpenJDK.  Here's
my wiki page where I discuss this:

https://wiki.apache.org/solr/ShawnHeisey#G1_.28Garbage_First.29_Collector
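
If you do experiment with G1, the change is just a different set of GC
tuning flags on the Solr startup.  In a typical install they go into the
GC_TUNE variable in solr.in.sh, replacing the CMS flags that are there
now.  The values below are only common starting points, not a
recommendation tuned for your nodes -- the wiki page above has more
detail and the caveats:

    -XX:+UseG1GC
    -XX:+ParallelRefProcEnabled
    -XX:G1HeapRegionSize=8m
    -XX:MaxGCPauseMillis=250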

Reducing the heap size (which may not be possible -- I'd need to know
the answer to the heap graphing question above) and reducing the number
of rows per query are the only quick solutions I can think of.
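
On the rows front, the cursorMark mechanism works exactly the same with
a much smaller page -- you simply make more round trips, and each
response stays small enough that it doesn't flood the heap.  A rough
SolrJ sketch; the zkHost string, collection name, field names, query,
and the 5000-row page size are all placeholders, and error handling is
left out:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.params.CursorMarkParams;

    public class CursorDump {
      public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181/solr");
        client.setDefaultCollection("mycollection");

        SolrQuery q = new SolrQuery("your_query_here");
        q.setFields("id", "field_a", "field_b");   // id plus the two fields
        q.setRows(5000);                           // much smaller page than 100K
        // The cursor requires the uniqueKey in the sort; add your other
        // sort fields after it if you need them.
        q.setSort(SolrQuery.SortClause.asc("id"));

        String cursorMark = CursorMarkParams.CURSOR_MARK_START;
        boolean done = false;
        while (!done) {
          q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
          QueryResponse rsp = client.query(q);
          for (SolrDocument doc : rsp.getResults()) {
            // process doc.getFieldValue("field_a"), doc.getFieldValue("field_b")
          }
          String next = rsp.getNextCursorMark();
          done = cursorMark.equals(next);   // unchanged cursorMark means we're finished
          cursorMark = next;
        }
        client.close();
      }
    }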

Thanks,
Shawn
