I don’t see anything in your description that requires a large heap. This is a terrible JVM configuration.

Do this:

* Use the GC configuration I recommended, with an 8 GB heap.
* Run one copy of Solr that hosts both shard1 and the shard2 replica. That increases the RAM available for the OS and file buffers from 38 GB to 158 GB.

Solr is slow when the index must be fetched from disk. It is fast when the index can be cached in RAM file buffers. Assuming 2 GB for the OS and other daemons, right now you can fit 17% of the two indexes in file buffers. With a single, smaller JVM, you can fit 74% of the indexes into RAM. This should make a huge speed difference. You’ll also see GC pauses of 200 ms or less.
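Here is a quick back-of-the-envelope sketch of that arithmetic. The sizes come from the memory breakdown quoted below; the 2 GB OS/daemon allowance is an assumption.

// Rough sketch of the file-buffer arithmetic above. All sizes in GB.
// Application and index sizes are taken from the breakdown quoted below;
// the 2 GB OS/daemon allowance is an assumption.
public class FileBufferMath {
    public static void main(String[] args) {
        double totalRam = 256;
        double otherApps = 50 + 30 + 8 + 2;   // Application1..Application4
        double indexOnDisk = 115 + 96;        // shard1 + shard2 replica on this server
        double osAllowance = 2;

        // Current layout: two Solr JVMs, 64 GB heap each.
        double cacheNow = totalRam - otherApps - 2 * 64 - osAllowance;
        // Proposed layout: one Solr JVM with an 8 GB heap.
        double cacheProposed = totalRam - otherApps - 8 - osAllowance;

        System.out.printf("now:      %.0f GB for file buffers, %.0f%% of the indexes%n",
                cacheNow, 100 * cacheNow / indexOnDisk);
        System.out.printf("proposed: %.0f GB for file buffers, %.0f%% of the indexes%n",
                cacheProposed, 100 * cacheProposed / indexOnDisk);
    }
}

That works out to roughly 36 GB (17%) today versus 156 GB (74%) with a single small-heap JVM, which is where the figures above come from.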
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Feb 13, 2020, at 9:40 PM, vishal patel <vishalpatel200...@outlook.com> wrote:
>
> Total memory of the server is 256 GB, and the applications below run on this server:
>
> Application1 50 GB
> Application2 30 GB
> Application3 8 GB
> Application4 2 GB
> Solr shard1 64 GB
> Solr shard2 replica 64 GB
>
> Note: Solr shard2 and the shard1 replica run on another server. Normally 35 to 40 GB of memory is in constant use in one Solr instance, so we keep the 64 GB. We are using NRT.
>
> How big are your indexes on disk? - [shard1: 115 GB, shard2 replica: 96 GB] [shard1 replica: 114 GB, shard2: 100 GB]
> How many docs per replica? - Approx 30959714 docs
> How many replicas per host? - One server has one shard and one replica.
>
> Regards,
> Vishal
>
> ________________________________
> From: Erick Erickson <erickerick...@gmail.com>
> Sent: Friday, February 14, 2020 4:00 AM
> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
> Subject: Re: Replica is going into recovery in Solr 6.1.0
>
> What Walter said. Also, you _must_ leave quite a bit of free RAM for the OS due to Lucene using MMapDirectory space, see:
>
> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Basically, until you can get your GC pauses under control, you’ll have an unstable collection.
>
> How big are your indexes on disk? How many docs per replica? How many replicas per host?
>
> Best,
> Erick
>
>> On Feb 13, 2020, at 5:16 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>>
>> You have a 64 GB heap. That is extremely unusual. You can only do that if the instance has 80 GB or more of RAM. If you don’t have enough RAM, the JVM will start using swap space and cause extremely long GC pauses.
>>
>> How much RAM do you have?
>>
>> How did you choose these GC settings?
>>
>> We have been using these settings with Java 8 in prod for three years with no GC problems.
>>
>> SOLR_HEAP=8g
>> # Use G1 GC -- wunder 2017-01-23
>> # Settings from https://wiki.apache.org/solr/ShawnHeisey
>> GC_TUNE=" \
>>   -XX:+UseG1GC \
>>   -XX:+ParallelRefProcEnabled \
>>   -XX:G1HeapRegionSize=8m \
>>   -XX:MaxGCPauseMillis=200 \
>>   -XX:+UseLargePages \
>>   -XX:+AggressiveOpts \
>> "
>>
>> If you don’t have a very, very good reason for your GC settings, use these instead.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
>>
>>> On Feb 12, 2020, at 10:47 PM, vishal patel <vishalpatel200...@outlook.com> wrote:
>>>
>>> My configuration:
>>>
>>> -XX:+AggressiveOpts -XX:ConcGCThreads=12 -XX:G1HeapRegionSize=33554432
>>> -XX:G1ReservePercent=20 -XX:InitialHeapSize=68719476736
>>> -XX:InitiatingHeapOccupancyPercent=10 -XX:+ManagementServer
>>> -XX:MaxHeapSize=68719476736 -XX:ParallelGCThreads=36
>>> -XX:+ParallelRefProcEnabled -XX:PrintFLSStatistics=1 -XX:+PrintGC
>>> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
>>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
>>> -XX:+PrintTenuringDistribution -XX:ThreadStackSize=256 -XX:+UseG1GC
>>> -XX:-UseLargePages -XX:-UseLargePagesIndividualAllocation
>>> -XX:+UseStringDeduplication
>>>
>>> ________________________________
>>> From: Rajdeep Sahoo <rajdeepsahoo2...@gmail.com>
>>> Sent: Thursday, February 13, 2020 10:03 AM
>>> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
>>> Subject: Re: Replica is going into recovery in Solr 6.1.0
>>>
>>> What is your memory configuration?
>>>
>>> On Thu, 13 Feb, 2020, 9:46 AM vishal patel, <vishalpatel200...@outlook.com> wrote:
>>>
>>>> Is there anyone looking at this?
>>>>
>>>> ________________________________
>>>> From: vishal patel <vishalpatel200...@outlook.com>
>>>> Sent: Wednesday, February 12, 2020 3:45 PM
>>>> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
>>>> Subject: Replica is going into recovery in Solr 6.1.0
>>>>
>>>> I am using Solr 6.1.0 with Java 8 and G1 GC in production. We have 2 shards and each shard has 1 replica. Suddenly one replica goes into recovery mode and requests become slow in our production.
>>>> I analyzed that the maximum minor GC pause time was 1 min 6 sec 800 ms at that time, and there were also multiple minor GC pauses.
>>>>
>>>> My logs:
>>>>
>>>> https://drive.google.com/file/d/158z3nzLsnHGouyRnXgfzCjwD4iadgKSp/view?usp=sharing
>>>>
>>>> https://drive.google.com/file/d/1E4jyffvIWVJB7EeEMXBXyqaK2ZfAA8kk/view?usp=sharing
>>>>
>>>> I do not know why the long GC pause happened. Heavy searching and indexing are performed on our platform.
>>>> Do long GC pauses happen because of searching or indexing?
>>>> If the GC pause is long, why does the replica go into recovery? Can we set the waiting time of the update request?
>>>> What is the minimum GC pause time that triggers recovery mode?
>>>>
>>>> Is this useful for my problem? https://issues.apache.org/jira/browse/SOLR-9310
>>>>
>>>> Regards,
>>>> Vishal Patel