My comments are inserted in-line this time. Thanks for the
amplifications, Shawn.
On 27/05/2019 17:39, Shawn Heisey wrote:
On 5/27/2019 9:49 AM, Joe Doupnik wrote:
A few more numbers to contemplate. An experiment here, adding 80
PDF and PPTX files into an empty index.
Solr v8.0, regular settings, 1.7GB quiescent memory consumption, 1.9GB
while indexing, 2.92 minutes to do the job.
Solr v8.0, using GC_TUNE from v8.1 solr.in.sh, 1.1GB quiescent, 1.3GB
while indexing, 2.97 minutes.
Solr v8.1, regular settings, 4.3GB quiescent, 4.4GB while indexing,
1.67 minutes.
Solr v8.1, using GC_TUNE from v8.1 solr.in.sh, 1.0GB quiescent, 1.3GB
while indexing, 1.53 minutes.
It is clear that the GC_TUNE settings from v8.1 are beneficial
to v8.0, saving about 600MB of memory. That's not small change.
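For anyone wanting to check the arithmetic behind these figures, here is a minimal Python sketch that tabulates the four runs reported above and computes the differences. The names RUNS and gc_tune_effect and the dictionary layout are made up for illustration; only the numbers come from the experiment.

```python
# Figures from the four runs above: (quiescent GB, indexing GB, minutes).
# The keys ("regular", "gc_tune") are illustrative labels, not Solr settings.
RUNS = {
    ("8.0", "regular"): (1.7, 1.9, 2.92),
    ("8.0", "gc_tune"): (1.1, 1.3, 2.97),
    ("8.1", "regular"): (4.3, 4.4, 1.67),
    ("8.1", "gc_tune"): (1.0, 1.3, 1.53),
}


def gc_tune_effect(version):
    """Return regular-minus-GC_TUNE differences for one Solr version."""
    regular = RUNS[(version, "regular")]
    tuned = RUNS[(version, "gc_tune")]
    return {
        "quiescent_gb_saved": round(regular[0] - tuned[0], 2),
        "indexing_gb_saved": round(regular[1] - tuned[1], 2),
        "minutes_saved": round(regular[2] - tuned[2], 2),
    }


for version in ("8.0", "8.1"):
    print(version, gc_tune_effect(version))
```

For v8.0 this gives about 0.6GB saved at rest, with the indexing time essentially unchanged (0.05 minutes slower with GC_TUNE).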
Well, the numbers observed here tell a slightly different story:
TUNEing can help Solr v8.0. Confirmatory values from other folks would
be good to have. The memory concerned is what is taken from the system
as real memory, and the rest of the system is directly affected by that.
Java can subdivide its part as it wishes.
Yes, the TUNE values were from Solr v8.1. To me that says those
values are late-arriving for v8.0 and prior, but we have them now and
can use them to save system resources. Also, it means that Solr v8.1's
G1 collector (G1GC) needs more baking time; the new GC is not quite
ready for normal production work (to put it mildly).
GC tuning will not change the amount of memory the program needs. It
*can't* change it. All it can do is affect how the garbage collector
works. Different collectors can result in differences in how much
memory an outside observer will see allocated, because one may be more
aggressive about early collection than the other, but the amount of
heap actually required by the program will not change.
The commented out GC_TUNE settings in the 8.1 "bin/solr.in.sh" file
are the old CMS settings that earlier versions of Solr used.
When you tell a Java program that it is allowed to use 4GB of memory,
it's going to use that memory. Eventually. Maybe not in three
minutes, but eventually. Even the settings that you are seeing use
less memory WILL eventually use all of it that they have been
allowed. That is the nature of Java.
Data here says there is a quiescent consumption value, a higher one
during intensive indexing, and a smaller one during routine query
handling. The point is the consumption peaks go away, memory is returned
to the system. That's what garbage collection is all about.
Also clear is that Solr v8.1 is slightly faster than v8.0 when
both use those TUNE values. A hidden benefit.
Without the GC_TUNE settings, Solr v8.1 shows its appetite for
memory, several GB more than v8.0.
The CMS collector will be removed from Java at some point in the
future. We can't use it any more.
Meanwhile we in the field can improve our current systems with the
TUNE settings. Solr v8.1 isn't ready yet for that workload, in my opinion.
The latency discussion below is in need of hard experimental
evidence. That does not mean your analysis is incorrect, but rather we
simply don't know and ought not make decisions based on such
assumptions. I look forward to seeing decent test results.
Thanks,
Joe D.
When you note that, for a given sequential process, certain settings
accomplish that process faster, that's a measure of throughput --
how much data is pushed through in a given timeframe. We really don't
care about that metric for Solr. We care about latency. Let's say
that setting 1 produces a typical processing time per request of 90
milliseconds, and setting 2 produces a typical processing time per
request of 100 milliseconds. You might think setting 1 is better.
But what if 1 percent of the requests with setting 1 take ten seconds,
and EVERY request with setting 2 takes 120 milliseconds or less? As a
project, we are going to prefer setting 2. That's not a theoretical
situation -- it's how things really work out with different garbage
collectors, and it's why Solr has the default settings that it does.
Shawn
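
The latency example above can be put into numbers with a small Python sketch. The sample sizes and the exact split of request times are assumptions chosen to match the description (setting 1: typically 90 ms, with 1 percent of requests taking ten seconds; setting 2: typically 100 ms, never more than 120 ms). These are not measurements from any Solr instance, and the names setting_1, setting_2 and summarize are hypothetical.

```python
import statistics

# Hypothetical request times in milliseconds, 1000 requests per setting.
# Setting 1: 99% of requests take 90 ms, 1% take ten seconds.
setting_1 = [90] * 990 + [10_000] * 10
# Setting 2: typically 100 ms, and no request exceeds 120 ms.
setting_2 = [100] * 900 + [120] * 100


def summarize(latencies_ms):
    """Summarize a list of request times: typical case versus the tail."""
    return {
        "median_ms": statistics.median(latencies_ms),
        "mean_ms": statistics.mean(latencies_ms),
        "max_ms": max(latencies_ms),
        "over_1s": sum(1 for t in latencies_ms if t > 1000),
    }


for name, data in (("setting 1", setting_1), ("setting 2", setting_2)):
    print(name, summarize(data))
```

The median favors setting 1, but the maximum and the count of requests over one second show the long pauses that setting 2 avoids, which is the property being argued for.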