On 12/2/2017 8:43 AM, Dominique Bejean wrote:
I would like to have some advice on best practices related to Heap Size,
MMap, direct memory, GC algorithm and OS Swap.

For the most part, there is no generic advice we can give you for these things. What you need is going to be highly dependent on exactly what you are doing with Solr and how much index data you have. There are no formulas for calculating these values based on information about your setup.

Experienced Solr users can make *guesses* if you provide some information, but those guesses might turn out to be completely wrong.

https://lucidworks.com/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

About JVM heap size setting

The JVM heap size setting is related to the use case, so there is no other advice
than to reduce it to the minimum possible size in order to avoid GC issues.
Reducing the heap size to its minimum will be achieved mainly by:

The max heap size should be as large as you need, and no larger. Figuring out what you need may require trial and error on an installation that has all the index data and is receiving production queries.

On this wiki page, I wrote a small section about one way you MIGHT be able to figure out what heap size you need:

https://wiki.apache.org/solr/SolrPerformanceProblems#How_much_heap_space_do_I_need.3F
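As a rough illustration of the trial-and-error approach, the sketch below takes heap occupancy readings observed right after full collections (read off a GC log or a JVM monitoring graph while production-like queries are running) and suggests a starting max heap with some headroom. The function name, the 25% headroom, the 256 MB rounding step and the sample readings are all assumptions made for illustration; this is not a formula taken from the wiki page, only a first value to test and adjust.

```python
import math

def suggest_max_heap_mb(post_gc_used_mb, headroom=0.25):
    """Suggest a starting max heap from post-full-GC occupancy samples.

    post_gc_used_mb: heap still in use (MB) right after full collections,
    gathered under production-like load.
    headroom: assumed safety margin on top of the highest reading.
    """
    if not post_gc_used_mb:
        raise ValueError("need at least one post-GC sample")
    peak = max(post_gc_used_mb)
    # Round up to the next 256 MB step so the value is easy to pass to -Xmx.
    return int(math.ceil(peak * (1 + headroom) / 256.0) * 256)

samples = [1800, 2100, 1950, 2300]  # hypothetical readings in MB
print(suggest_max_heap_mb(samples))  # prints 3072
```

Whatever number comes out is only a candidate. It still has to be verified on an installation with the full index and real query traffic.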

- Optimize the schema by removing unused fields and not indexing / storing fields if it is not necessary
- Enable docValues on fields used for faceting, sorting and grouping
- Do not oversize the Solr caches
- Be careful with the rows and fl query parameters

These are good ideas. But sometimes you find that you need a lot of fields, and you need a lot of them to be stored. The schema and config should be designed around what you need Solr to do. Designing them for the lowest possible memory usage might result in a config that doesn't do what you want.

About MMap setting

According to the great article
“http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html”
from Uwe Schindler, the only tasks that have to be done at the OS settings
level are to check that “ulimit -v” and “ulimit -m” both report “unlimited” and
to increase the vm.max_map_count setting from its default of 65536.

The default directory implementation that recent Solr versions use is NRTCachingDirectoryFactory. This wraps another implementation with a small memory cache. The implementation that is wrapped by default DOES use MMAP.

The amount of memory used for caching MMAP access cannot be configured in the application. The OS handles that caching completely automatically, without any configuration at all. All modern operating systems are designed so that the disk cache can use *all* available memory in the system. This is because the cache will instantly give up memory if a program requests it. The cache never keeps memory that programs want.
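Since the disk cache simply uses whatever memory programs are not using, a quick way to reason about it is to compare that memory with the size of the index. The sketch below is a minimal example under stated assumptions: it parses text in the format of Linux's /proc/meminfo and uses the MemAvailable field (the kernel's estimate of memory that can be handed out without swapping, including reclaimable cache). The sample numbers and the 80 GB index size are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values

def cacheable_fraction(meminfo_kb, index_size_kb):
    """Fraction of the index that could sit in memory not claimed by programs."""
    if index_size_kb <= 0:
        raise ValueError("index size must be positive")
    return min(1.0, meminfo_kb["MemAvailable"] / index_size_kb)

# Hypothetical sample; on a real host use open("/proc/meminfo").read().
sample = """MemTotal:       65842316 kB
MemFree:         1203948 kB
MemAvailable:   41876540 kB
Cached:         39012384 kB
"""
info = parse_meminfo(sample)
index_size_kb = 80 * 1024 * 1024  # assumed 80 GB of index data
print(round(cacheable_fraction(info, index_size_kb), 2))
```

A low fraction does not necessarily mean trouble, because how much of the index needs to be cached depends on the queries. It only shows how much room the OS has to work with.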

I suppose the best value is related to available off heap memory. I
generally set it to 262144. Is it a good value or is there a better way to
determine this value ?

Solr doesn't use any off heap memory as far as I'm aware. There was a fork of Solr for a short time named heliosearch, which DID use off-heap memory. Java itself will use some off-heap memory for its own operation. I do not know whether that is configurable, and if so, how it's done.
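On the vm.max_map_count question, one way to judge a value is to see how many mappings the running Solr process actually holds, since on Linux /proc/<pid>/maps lists one mapping per line. The following is a minimal sketch, not a sizing rule: the function names are made up for illustration, the sample excerpt and its file paths are hypothetical, and 262144 is simply the value mentioned in the question.

```python
def count_mappings(maps_text):
    """Count memory mappings in /proc/<pid>/maps-style text (one per line)."""
    return sum(1 for line in maps_text.splitlines() if line.strip())

def map_usage(maps_text, max_map_count):
    """Return (mappings in use, fraction of the vm.max_map_count limit)."""
    used = count_mappings(maps_text)
    return used, used / max_map_count

# Hypothetical excerpt; on a real host read /proc/<solr pid>/maps and
# /proc/sys/vm/max_map_count instead of using literals.
sample_maps = """7f2c1c000000-7f2c1c021000 rw-p 00000000 00:00 0
7f2c20000000-7f2c24000000 r--s 00000000 08:01 131 /var/solr/data/core1/data/index/_0.fdt
7f2c24000000-7f2c28000000 r--s 00000000 08:01 132 /var/solr/data/core1/data/index/_0.tim
"""
used, fraction = map_usage(sample_maps, 262144)
print(used, fraction)
```

If the count stays far below the limit under normal load, the limit is not what is holding the system back.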

About Direct Memory

According to a response on the Solr mailing list from Uwe Schindler (again), I
understand that MMapDirectory is not Direct Memory.

The only places where I read that the MaxDirectMemorySize JVM setting has to be
set for Solr are a Cloudera blog post and the Solr mailing list, when using
Solr with HDFS.

Is it necessary to change the default MaxDirectMemorySize JVM setting ? If
yes, how to determine the appropriate value ?

I have never heard of this "direct memory." Solr probably doesn't use it. I really have no idea what happens when the index is in HDFS. You'd have to ask somebody who knows Hadoop.

About OS Swap setting

Linux generally starts swapping when less than 30% of the memory is free.
To keep the OS from working against Solr's off-heap memory management, I
usually change the OS swappiness value to 0. Can you confirm this is a good thing?

If the OS starts swapping, performance of everything on the machine is going to drop significantly. Setting swappiness to 0 is probably a good idea. Most Linux distributions default to 60 here, which means the OS is going to aggressively start swapping anything it thinks isn't being used, even before memory pressure becomes extreme.
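For reference, the current value lives in /proc/sys/vm/swappiness on Linux and a persistent change is usually made with a vm.swappiness entry in /etc/sysctl.conf. The sketch below only parses a value and prints the entry that would be needed. The helper names are invented for illustration, and the target of 0 is the value discussed above rather than a universal recommendation.

```python
def parse_swappiness(text):
    """Parse the contents of /proc/sys/vm/swappiness into an int."""
    value = int(text.strip())
    if value < 0:
        raise ValueError("swappiness cannot be negative")
    return value

def needs_change(current, target=0):
    """True when the running value differs from the wanted one."""
    return current != target

def sysctl_entry(target=0):
    """Format the line to place in /etc/sysctl.conf."""
    return "vm.swappiness = {}".format(target)

# "60\n" stands in for open("/proc/sys/vm/swappiness").read() on a host
# that still has the common distribution default.
current = parse_swappiness("60\n")
if needs_change(current):
    print(sysctl_entry())  # prints: vm.swappiness = 0
```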

About CMS GC vs G1 GC

The default Solr settings use CMS GC.

According to the post from Shawn Heisey in the old Solr wiki (
https://wiki.apache.org/solr/ShawnHeisey), can we consider that G1 GC can
definitely be used with Solr for heap sizes over about 4GB?

I've never had any problems with G1, and my experiments suggest that it does a better job of reducing GC pauses than CMS does, if it is tuned correctly. Just enabling G1 isn't much better than Java's defaults, and Solr's CMS settings are definitely better than untuned G1.
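To make "tuned" concrete: tuning G1 means passing more than just the flag that enables it. The sketch below assembles a GC_TUNE line of the kind that can go in solr.in.sh. The flags are standard HotSpot options, but the function name is hypothetical and the values passed in (250 ms, 8 MB regions) are placeholders to be replaced by whatever your own experiments support. The region size check assumes Java 8 era JVMs, which accept powers of two from 1 to 32 MB.

```python
def build_gc_tune(pause_ms, region_size_mb):
    """Assemble a GC_TUNE line for solr.in.sh that enables G1 with tuning.

    Both arguments are values to find by experiment; nothing here is a
    recommended setting.
    """
    if region_size_mb not in (1, 2, 4, 8, 16, 32):
        raise ValueError("G1 region size must be a power of two from 1 to 32 MB")
    if pause_ms <= 0:
        raise ValueError("pause target must be positive")
    flags = [
        "-XX:+UseG1GC",                                     # switch collector
        "-XX:+ParallelRefProcEnabled",                      # parallel reference processing
        "-XX:G1HeapRegionSize={}m".format(region_size_mb),  # fixed region size
        "-XX:MaxGCPauseMillis={}".format(pause_ms),         # pause time goal
    ]
    return 'GC_TUNE="{}"'.format(" ".join(flags))

print(build_gc_tune(pause_ms=250, region_size_mb=8))
```

Comparing GC logs before and after a change like this is the only reliable way to tell whether a given set of values helps on a particular index.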

Thanks,
Shawn
