[ https://issues.apache.org/jira/browse/SOLR-13862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959540#comment-16959540 ]
Uwe Schindler commented on SOLR-13862:
--------------------------------------

Hi,
the reason why G1GC pauses are long is the large heap size. In general, Solr never needs much heap space, unless you have *many* indexes and you don't use docvalues for all fields that you sort or aggregate/facet on. You can easily run a huge index of 100 gigabytes on a node with 8 GiB of heap. The problems only start when you have many concurrent requests.

I'd try the following:
- Go with G1GC.
- Reduce heap as much as possible.
- Check your schema and enable docvalues for all fields that are used for aggregations or sorting. It's easy to find out whether a field that should use docvalues does not: go to the cache statistics and look at FieldCache. In an ideal configuration, the FieldCache should be empty. Every field showing up there should have docvalues enabled, and then it disappears. The only downside: you need to reindex to get the docvalues persisted.
- To get the best performance on your index, keep as much free buffer space available as possible, because most of Lucene's index is memory mapped outside of heap. If there is not enough physical RAM available, the searcher will swap pages in and out all the time, while heap space is mostly unused. If the index on disk fits into the remaining buffer space next to heap, it's ideal. This is the reason why heap should be as small as possible.

I will keep this issue open to enable Shenandoah for OpenJDK versions that support it on Policeman Jenkins.

> JDK 13 stability/recovery problems
> ----------------------------------
>
>                 Key: SOLR-13862
>                 URL: https://issues.apache.org/jira/browse/SOLR-13862
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>    Affects Versions: 8.2
>            Reporter: Bernd Wahlen
>            Priority: Major
>
> after updating my cluster (centos 7.7, solr 8.2, jdk12) to JDK13 (3 nodes, 4
> collections, 1 shard) everything was running good (with lower p95) for some
> hours.
> Then 2 nodes (not the leader) went into recovery state, but ~"Recovery
> failed Error opening new searcher". I tried a rolling restart of the cluster, but
> recovery is not working. After I switched to jdk11 recovery works again. In
> summary jdk11 or jdk12 was running stable, jdk13 not.
> This is my solr.in.sh:
> GC_TUNE="-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC"
> SOLR_TIMEZONE="CET"
> GC_LOG_OPTS="-Xlog:gc*:file=/var/log/solr/solr_gc.log:time:filecount=9,filesize=20M:safepoint"
> I also tried ADDREPLICA during my attempt to repair the cluster, which
> causes Out of Memory on JDK 13 and worked after going back to JDK 11.
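
For illustration only: the first two suggestions in the comment (go with G1GC, reduce heap) might look roughly like the following in a solr.in.sh. This is a sketch, not the reporter's or the commenter's actual configuration. The 8g figure is an assumption borrowed from the "8 GiB of heap" example in the comment and would need to be tuned per cluster; SOLR_HEAP and GC_TUNE are the usual solr.in.sh variables.

```shell
# Sketch of a solr.in.sh fragment following the comment's advice.
# Assumption: 8g is only the example size mentioned in the comment,
# not a measured recommendation for this cluster.
SOLR_HEAP="8g"

# Use G1GC instead of the experimental Shenandoah collector.
GC_TUNE="-XX:+UseG1GC"

# Unchanged from the reporter's file.
SOLR_TIMEZONE="CET"
```

Keeping the heap small leaves the rest of physical RAM to the operating system's buffer cache, which is where the memory-mapped Lucene index files are served from.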