[jira] [Commented] (SOLR-13862) JDK 13 stability/recovery problems

Uwe Schindler (Jira) Fri, 25 Oct 2019 01:20:07 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-13862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959540#comment-16959540
 ]


Uwe Schindler commented on SOLR-13862:
--------------------------------------

Hi,
the reason why G1GC pauses are slow is the large heap size.
In general, Solr never needs much heap space, unless you have *many* 
indexes and you don't use docvalues for all fields that you sort or 
aggregate/facet on. You can easily run a huge index of 100 gigabytes on a 
node with 8 GiB of heap.

The problems only start when you have many concurrent requests.

I'd try the following:
- Go with G1GC.
- Reduce heap as much as possible.
- Check your schema and enable docvalues for all fields that are used for 
aggregations or sorting. It's easy to find out whether a field that should 
use docvalues does not: go to the cache statistics and look at FieldCache. 
In an ideal configuration, the FieldCache should be empty. Every field 
showing up there should have docvalues enabled, after which it disappears. 
The only downside: you need to reindex to get the docvalues persisted.
- To get the best performance from your index, keep as much free buffer 
space (OS page cache) available as possible, because most of Lucene's index 
is memory-mapped outside of the heap. If there is not enough physical RAM 
available, the searcher will swap pages in and out all the time, while heap 
space is mostly unused. If the index on disk fits into the remaining buffer 
space next to the heap, it's ideal. This is the reason why the heap should 
be as small as possible.
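
As a rough sketch of the advice above, the fragment below shows what the 
relevant solr.in.sh settings might look like, plus a small sanity check of the 
"index fits next to the heap" rule. The 8g value is only taken from the 
"8 GiB of heap" remark and is not a recommendation for any particular cluster. 
The function index_fits_in_page_cache is a hypothetical helper, not part of 
Solr, and it ignores memory used by other processes and JVM off-heap overhead.

```shell
# Sketch of solr.in.sh settings following the advice above.
# SOLR_HEAP and GC_TUNE are read by Solr's start script; the values here
# are illustrative assumptions, size them for your own workload.
SOLR_HEAP="8g"
GC_TUNE="-XX:+UseG1GC"

# Hypothetical helper (not part of Solr): succeeds if the on-disk index
# fits into the RAM left over for the OS page cache once the heap is
# subtracted. All three arguments are whole GiB.
index_fits_in_page_cache() {
  total_ram_gib=$1
  heap_gib=$2
  index_gib=$3
  [ $((total_ram_gib - heap_gib)) -ge "$index_gib" ]
}
```

For example, `index_fits_in_page_cache 128 8 100` succeeds, because 120 GiB 
remain for a 100 GiB index, while `index_fits_in_page_cache 64 8 100` fails, 
because only 56 GiB remain.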

I will keep this issue open to enable Shenandoah on Policeman Jenkins for 
OpenJDK versions that support it.

> JDK 13 stability/recovery problems
> ----------------------------------
>
>                 Key: SOLR-13862
>                 URL: https://issues.apache.org/jira/browse/SOLR-13862
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 8.2
>            Reporter: Bernd Wahlen
>            Priority: Major
>
> After updating my cluster (CentOS 7.7, Solr 8.2, JDK 12; 3 nodes, 4 
> collections, 1 shard) to JDK 13, everything ran well (with lower p95) for 
> some hours. Then 2 nodes (not the leader) went into recovery state, but with 
> ~"Recovery failed Error opening new searcher". I tried a rolling restart of 
> the cluster, but recovery did not work. After I switched to JDK 11, recovery 
> worked again. In summary, JDK 11 and JDK 12 ran stably; JDK 13 did not.
> This is my solr.in.sh:
> GC_TUNE="-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC"
>  SOLR_TIMEZONE="CET"
>  
> GC_LOG_OPTS="-Xlog:gc*:file=/var/log/solr/solr_gc.log:time:filecount=9,filesize=20M:safepoint"
> I also tried ADDREPLICA during my attempt to repair the cluster, which 
> caused an out-of-memory error on JDK 13 and worked after going back to JDK 11.



