[ https://issues.apache.org/jira/browse/SOLR-13862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959006#comment-16959006 ]

Uwe Schindler commented on SOLR-13862:
--------------------------------------

It looks like you are using a far too large heap. With a heap of that size and 
this GC, there are several problems:
- Above roughly 32 GiB the JVM can no longer use compressed object pointers, 
so every reference takes 8 bytes instead of 4, increasing heap usage by 
approximately 30-40% (due to the larger pointers with many small objects). It 
is better to stay below the 31 GiB limit: a 32 GiB heap effectively holds only 
about as much as a 20 GiB one, and you would need to go to about 48 GiB to get 
the same effective capacity as 31.9999 GiB (see the quick check after this 
list).
- Shenandoah is not tested with Lucene/Solr at all. We have not yet set up 
builds for this; I will work on that on Policeman Jenkins. In addition, because 
it parallelizes more, Shenandoah's CPU usage is much higher than that of G1GC 
or CMS. Lucene needs more of a throughput-oriented collector: it does not 
create many objects to be collected, but it requires a lot of CPU, so 
Shenandoah slows you down because of its additional memory barriers.
- The heap should never be larger than half of system memory, preferably only 
25%. If you allocate too much, the risk of OOM is much higher because more 
work is spent in GC and it may fall behind. You are also stealing memory pages 
from the filesystem cache, which is needed for memory mapping of index files 
to work correctly. In short, unless your machine has 256 GiB of physical RAM, 
a 64 GiB heap is a catastrophic idea.
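
You can verify the compressed-oops cutoff for your own JDK with a quick check 
(only a sketch; it assumes the java binary you run Solr with is on the PATH):

  java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops
  # expected: UseCompressedOops = true  (compressed 4-byte references)
  java -Xmx32g -XX:+PrintFlagsFinal -version | grep UseCompressedOops
  # expected: UseCompressedOops = false (full 8-byte references)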

Can you explain why you need such a large heap?
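
For comparison, a more conservative solr.in.sh would look roughly like this 
(only an illustrative sketch, not a tuned recommendation; the heap value has 
to match your actual index size and query load):

  SOLR_HEAP="16g"   # well below the 31 GiB compressed-oops limit
  GC_TUNE="-XX:+UseG1GC -XX:MaxGCPauseMillis=250"   # G1 instead of Shenandoah, lower GC CPU overhead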

> JDK 13 stability/recovery problems
> ----------------------------------
>
>                 Key: SOLR-13862
>                 URL: https://issues.apache.org/jira/browse/SOLR-13862
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 8.2
>            Reporter: Bernd Wahlen
>            Priority: Major
>
> after updating my cluster (centos 7.7, solr 8.2, jdk12) to JDK 13 (3 nodes, 4 
> collections, 1 shard), everything was running well (with lower p95) for some 
> hours. Then 2 nodes (not the leader) went into recovery state with ~"Recovery 
> failed Error opening new searcher". I tried a rolling restart of the cluster, 
> but recovery did not work. After I switched to JDK 11, recovery worked again. 
> In summary, JDK 11 and JDK 12 were running stable, JDK 13 was not.
> This is my solr.in.sh:
> GC_TUNE="-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC"
>  SOLR_TIMEZONE="CET"
>  
> GC_LOG_OPTS="-Xlog:gc*:file=/var/log/solr/solr_gc.log:time:filecount=9,filesize=20M:safepoint"
> I also tried ADDREPLICA during my attempt to repair the cluster, which caused 
> Out of Memory on JDK 13 and worked after going back to JDK 11.


