The first thing is to stop using CMS and switch to G1GC. We've been running the settings below on over a hundred machines in prod for nearly four years.
SOLR_HEAP=8g
# Use G1 GC -- wunder 2017-01-23
# Settings from https://wiki.apache.org/solr/ShawnHeisey
GC_TUNE=" \
  -XX:+UseG1GC \
  -XX:+ParallelRefProcEnabled \
  -XX:G1HeapRegionSize=8m \
  -XX:MaxGCPauseMillis=200 \
  -XX:+UseLargePages \
  -XX:+AggressiveOpts \
"
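On the docValues question at the bottom of the thread: yes, that is the first schema change I would try. Without docValues, sorting, faceting, and grouping un-invert the fields onto the JVM heap at query time, which is typically where the collector footprint in your heap dump comes from; with docValues, that data is read from disk through the OS page cache instead. A rough sketch for schema.xml -- the field names and types here are placeholders for whatever you actually sort, facet, or group on, and a full reindex is required after the change:

  <field name="category" type="string"  indexed="true" stored="true" docValues="true"/>
  <field name="price"    type="tdouble" indexed="true" stored="true" docValues="true"/>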
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 7, 2020, at 2:39 AM, Karol Grzyb <grz...@gmail.com> wrote:
> 
> Hi Matthew, Erick!
> 
> Thank you very much for the feedback, I'll try to convince them to
> reduce the heap size.
> 
> Current GC settings:
> 
> -XX:+CMSParallelRemarkEnabled
> -XX:+CMSScavengeBeforeRemark
> -XX:+ParallelRefProcEnabled
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseConcMarkSweepGC
> -XX:+UseParNewGC
> -XX:CMSInitiatingOccupancyFraction=50
> -XX:CMSMaxAbortablePrecleanTime=6000
> -XX:ConcGCThreads=4
> -XX:MaxTenuringThreshold=8
> -XX:NewRatio=3
> -XX:ParallelGCThreads=4
> -XX:PretenureSizeThreshold=64m
> -XX:SurvivorRatio=4
> -XX:TargetSurvivorRatio=90
> 
> Kind regards,
> Karol
> 
> 
> On Tue, Oct 6, 2020 at 4:52 PM Erick Erickson <erickerick...@gmail.com> wrote:
>> 
>> 12G is not that huge, it's surprising that you're seeing this problem.
>> 
>> However, there are a couple of things to look at:
>> 
>> 1> If you're saying that you have 16G total physical memory and are
>> allocating 12G to Solr, that's an anti-pattern. See:
>> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>> If at all possible, you should allocate between 25% and 50% of your physical
>> memory to Solr...
>> 
>> 2> What garbage collector are you using? G1GC might be a better choice.
>> 
>>> On Oct 6, 2020, at 10:44 AM, matthew sporleder <msporle...@gmail.com> wrote:
>>> 
>>> Your index is so small that it should easily get cached into OS memory
>>> as it is accessed. Having a too-big heap is a known problem.
>>> 
>>> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-HowmuchheapspacedoIneed?
>>> 
>>> On Tue, Oct 6, 2020 at 9:44 AM Karol Grzyb <grz...@gmail.com> wrote:
>>>> 
>>>> Hi Matthew,
>>>> 
>>>> Thank you for the answer. I cannot reproduce the setup locally, but I'll
>>>> try to convince them to reduce Xmx; I guess they won't agree to 1GB,
>>>> but certainly to something less than 12G.
>>>> And to set up a proper dev environment, because for now we can only test
>>>> on prod or stage, which are difficult to adjust.
>>>> 
>>>> Is being stuck in GC common behaviour under heavier load when the index
>>>> is small compared to the available heap? I was more worried about the
>>>> ratio of heap to total host memory.
>>>> 
>>>> Regards,
>>>> Karol
>>>> 
>>>> 
>>>> On Tue, Oct 6, 2020 at 2:39 PM matthew sporleder <msporle...@gmail.com> wrote:
>>>>> 
>>>>> You have a 12G heap for a 200MB index? Can you just try changing Xmx
>>>>> to, like, 1g?
>>>>> 
>>>>> On Tue, Oct 6, 2020 at 7:43 AM Karol Grzyb <grz...@gmail.com> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I'm involved in investigating an issue of huge GC overhead during
>>>>>> performance tests on Solr nodes. The Solr version is 6.1. The last
>>>>>> tests were done on a staging env, and we ran into problems at
>>>>>> <100 requests/second.
>>>>>> 
>>>>>> The size of the index itself is ~200MB (~50K docs).
>>>>>> The index gets small updates every 15 min.
>>>>>> 
>>>>>> Queries involve sorting and faceting.
>>>>>> 
>>>>>> I've gathered some heap dumps; I can see from them that most of the
>>>>>> heap memory is retained by objects of the following classes:
>>>>>> 
>>>>>> - org.apache.lucene.search.grouping.term.TermSecondPassGroupingCollector (>4G, 91% of heap)
>>>>>> - org.apache.lucene.search.grouping.AbstractSecondPassGroupingCollector$SearchGroupDocs
>>>>>> - org.apache.lucene.search.FieldValueHitQueue$MultiComparatorsFieldValueHitQueue
>>>>>> - org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector (>3.7G, 76% of heap)
>>>>>> 
>>>>>> Based on the information above, is there anything generic that can be
>>>>>> looked at as a source of potential improvement without diving deeply
>>>>>> into the schema and queries (which may be very difficult to change at
>>>>>> this moment)? I don't see docValues being enabled - could this help?
>>>>>> If I read the docs correctly, it's specifically helpful when there is
>>>>>> a lot of sorting/grouping/faceting.
>>>>>> 
>>>>>> Additionally, I see that many threads are blocked on LRUCache.get;
>>>>>> should I recommend switching to FastLRUCache?
>>>>>> 
>>>>>> Also, I wonder whether -Xmx12288m for the Java heap is too much for
>>>>>> 16G of memory? I see some (~5/s) page faults in Dynatrace during the
>>>>>> heaviest traffic.
>>>>>> 
>>>>>> Thank you very much for any help,
>>>>>> Kind regards,
>>>>>> Karol
>> 
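On the LRUCache.get contention above: I would shrink the heap and add docValues before touching the caches, but if the filter or query result cache really is the hot spot, FastLRUCache is a drop-in swap in solrconfig.xml. Its gets are cheaper at the cost of more expensive eviction, so it tends to help exactly when many threads block on reads. A sketch with made-up sizes -- tune them to your actual hit ratios:

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>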