On 5/27/2019 9:49 AM, Joe Doupnik wrote:
    A few more numbers to contemplate. An experiment here, adding 80 PDF and PPTX files into an empty index.

    Solr v8.0, regular settings: 1.7GB quiescent memory consumption, 1.9GB while indexing, 2.92 minutes to do the job.
    Solr v8.0, using GC_TUNE from v8.1 solr.in.sh: 1.1GB quiescent, 1.3GB while indexing, 2.97 minutes.
    Solr v8.1, regular settings: 4.3GB quiescent, 4.4GB while indexing, 1.67 minutes.
    Solr v8.1, using GC_TUNE from v8.1 solr.in.sh: 1.0GB quiescent, 1.3GB while indexing, 1.53 minutes.

    It is clear that the GC_TUNE settings from v8.1 are beneficial to v8.0, saving about 600MB of memory. That's not small change.

GC tuning will not change the amount of memory the program needs. It *can't* change it. All it can do is affect how the garbage collector works. Different collectors can result in differences in how much memory an outside observer will see allocated, because one may be more aggressive about early collection than the other, but the amount of heap actually required by the program will not change.

The commented-out GC_TUNE settings in the 8.1 "bin/solr.in.sh" file are the old CMS settings that earlier versions of Solr used.

When you tell a Java program that it is allowed to use 4GB of memory, it's going to use that memory. Eventually. Maybe not in three minutes, but eventually. Even the settings that you are seeing use less memory WILL eventually use all of it that they have been allowed. That is the nature of Java.
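
To make the distinction concrete, here is a minimal sketch (not anything that ships with Solr; the class name HeapSnapshot is made up for illustration) that prints the three numbers a JVM exposes through java.lang.Runtime: the ceiling it has been allowed, the heap it has reserved so far, and the part of that heap currently occupied. The "used" figure includes garbage that has not been collected yet, which is why different collectors can show different numbers for the same workload.

```java
class HeapSnapshot {
    // Returns {max, committed, used} heap sizes in bytes for the running JVM.
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();               // ceiling the JVM will try to stay under (normally -Xmx)
        long committed = rt.totalMemory();       // heap the JVM currently holds
        long used = committed - rt.freeMemory(); // live objects plus garbage not yet collected
        return new long[] { max, committed, used };
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        long[] s = snapshot();
        System.out.printf("max=%dMB committed=%dMB used=%dMB%n",
                s[0] / mb, s[1] / mb, s[2] / mb);
    }
}
```

Running it with different -Xmx values should change the first number directly; the other two depend on the collector and on how long the program has been running.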

    Also clear is that Solr v8.1 is slightly faster than v8.0 when both use those TUNE values. A hidden benefit.

    Without GC_TUNE settings Solr v8.1 shows its appetite for much memory, several GB more than v8.0.

The CMS collector will be removed from Java at some point in the future. We can't use it any more.

When you note that, for a given sequential process, certain settings accomplish that process faster, that's a measure of throughput: how much data is pushed through in a given timeframe. We really don't care about that metric for Solr. We care about latency.

Let's say that setting 1 produces a typical processing time per request of 90 milliseconds, and setting 2 produces a typical processing time per request of 100 milliseconds. You might think setting 1 is better. But what if 1 percent of the requests with setting 1 take ten seconds, and EVERY request with setting 2 takes 120 milliseconds or less? As a project, we are going to prefer setting 2. That's not a theoretical situation. It's how things really work out with different garbage collectors, and it's why Solr has the default settings that it does.
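
The hypothetical numbers above can be put into a small sketch. This is only an illustration: the sample arrays are invented to match the example (100 requests per setting, one of them slow), the class name LatencyDemo is made up, and the percentile is a simple nearest-rank calculation, not anything Solr computes itself.

```java
import java.util.Arrays;

class LatencyDemo {
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it (p from 0 to 100).
    static long percentile(long[] samples, int p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (p * sorted.length + 99) / 100; // integer ceiling of p% of n
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Setting 1 (assumed data): 99 requests at 90 ms, 1 request at ten seconds.
        long[] setting1 = new long[100];
        Arrays.fill(setting1, 90);
        setting1[99] = 10_000;

        // Setting 2 (assumed data): 99 requests at 100 ms, the slowest at 120 ms.
        long[] setting2 = new long[100];
        Arrays.fill(setting2, 100);
        setting2[99] = 120;

        System.out.println("setting 1: median=" + percentile(setting1, 50)
                + "ms worst=" + percentile(setting1, 100) + "ms");
        System.out.println("setting 2: median=" + percentile(setting2, 50)
                + "ms worst=" + percentile(setting2, 100) + "ms");
    }
}
```

With this made-up data, setting 1 has the better median (90 ms against 100 ms) but a worst case of 10000 ms, while setting 2 never exceeds 120 ms. Looking only at the median or at total elapsed time would hide that difference.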

Shawn
