An interesting supplement to this discussion. The experiment this
time was to use Solr v8.1, omit the GC_TUNE items, and instead adjust
SOLR_HEAP. I had set the heap to 4GB, based on good intentions, and as
we have seen Solr v8.1 gobbles it up and does not return a farthing.
Thus I tried indexing a large collection (2600 docs) of .pdf, .ppt, etc.
files, but with the heap size gradually reduced from 4GB to 1GB. That
worked smoothly, and while indexing Solr consumes about 1.5 to 1.6GB
and works hard. So, if a little is good then less must be better, yes?
512MB is too little and Solr barely starts and then shuts down. 1GB
seems to be a safe value for the heap, and no GC_TUNE settings. This is
true on my machines for both Oracle jdk 1.8 and openjdk 10.
In passing, recommendations on the net suggest watching the action
via jconsole (in the Oracle jdk bundle and in the openjdk material).
Well, it has pretty pictures and many numbers which are far far away
from the basic values we see with top and ps aux | grep solr. Not
useful, even less believable if one asks my simple consumption question.
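For anyone wanting to log the same basic number that ps aux | grep solr shows, here is a minimal Python sketch. It assumes the usual Linux ps aux column layout, with RSS in KiB as the sixth column. The sample text, the PID, and the "start.jar" search string are made up for illustration; they are not measurements from the tests described here.

```python
import subprocess

# Made-up sample in the usual Linux `ps aux` layout (illustrative only).
SAMPLE_PS_AUX = """\
USER       PID %CPU %MEM     VSZ     RSS TTY   STAT START   TIME COMMAND
solr     12345  3.2 10.1 7340032 1572864 ?     Sl   10:02   1:15 java -server -Xmx1g -jar start.jar
joe      12400  0.0  0.0    9032     720 pts/0 S+   10:05   0:00 grep solr
"""


def rss_mb(ps_aux_text, needle):
    """Sum the RSS column (KiB) of `ps aux` lines containing needle; return MiB."""
    total_kib = 0
    for line in ps_aux_text.splitlines():
        if needle not in line:
            continue
        # The 11th field (COMMAND) may contain spaces, so split at most 10 times.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[5].isdigit():
            continue  # header line or unexpected layout
        total_kib += int(fields[5])
    return total_kib / 1024


def live_rss_mb(needle):
    """Run `ps aux` on this machine and report the resident memory in MiB."""
    out = subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout
    return rss_mb(out, needle)


print(rss_mb(SAMPLE_PS_AUX, "start.jar"))
```

Calling live_rss_mb("start.jar") at intervals during indexing and afterwards would give a simple record of resident memory over time. This is the memory taken from the system as the OS sees it, not the heap occupancy that jconsole graphs.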
So then, this leaves us with the usual question of just how much
heap space a Java app requires. The answer seems to be that no one really
knows; only experiments will reveal practical values.
Thus we choose a heap value tested to be safe and observe that memory
use persists at that value until Solr is restarted, after which it
consumes a smaller amount, sufficient for answering queries rather than
indexing files. If the openjdk folks get their reduction work (below)
into our hands then idle memory may shrink further.
In closing, Solr v8.1 has one very nice advantage over its
predecessors: indexing speed, about double that of v8.0.
Thanks,
Joe D.
On 27/05/2019 18:38, Joe Doupnik wrote:
An interesting note on the memory returning issue for the G1
collector.
https://openjdk.java.net/jeps/346
Entitled "JEP 346: Promptly Return Unused Committed Memory from G1"
with a summary saying "Enhance the G1 garbage collector to
automatically return Java heap memory to the operating system when idle."
It goes on to say the following, and more:
"Motivation
Currently the G1 garbage collector may not return committed Java heap
memory to the operating system in a timely manner. G1 only returns
memory from the Java heap at either a full GC or during a concurrent
cycle. Since G1 tries hard to completely avoid full GCs, and only
triggers a concurrent cycle based on Java heap occupancy and
allocation activity, it will not return Java heap memory in many cases
unless forced to do so externally.
This behavior is particularly disadvantageous in container
environments where resources are paid by use. Even during phases where
the VM only uses a fraction of its assigned memory resources due to
inactivity, G1 will retain all of the Java heap. This results in
customers paying for all resources all the time, and cloud providers
not being able to fully utilize their hardware.
If the VM were able to detect phases of Java heap under-utilization
("idle" phases), and automatically reduce its heap usage during that
time, both would benefit.
Shenandoah and OpenJ9's GenCon collector already provide similar
functionality.
Tests with a prototype in Bruno et al., section 5.5, shows that based
on the real-world utilization of a Tomcat server that serves HTTP
requests during the day, and is mostly idle during the night, this
solution can reduce the amount of memory committed by the Java VM by
85%."
Please read the full web page to have a rounded view of that
discussion.
Thanks,
Joe D.
On 27/05/2019 18:17, Joe Doupnik wrote:
My comments are inserted in-line this time. Thanks for the
amplifications Shawn.
On 27/05/2019 17:39, Shawn Heisey wrote:
On 5/27/2019 9:49 AM, Joe Doupnik wrote:
A few more numbers to contemplate. An experiment here, adding
80 PDF and PPTX files into an empty index.
Solr v8.0, regular settings, 1.7GB quiescent memory consumption,
1.9GB while indexing, 2.92 minutes to do the job.
Solr v8.0, using GC_TUNE from v8.1 solr.in.sh, 1.1GB quiescent,
1.3GB while indexing, 2.97 minutes.
Solr v8.1, regular settings, 4.3GB quiescent, 4.4GB while indexing,
1.67 minutes.
Solr v8.1, using GC_TUNE from v8.1 solr.in.sh, 1.0GB quiescent,
1.3GB while indexing, 1.53 minutes.
It is clear that the GC_TUNE settings from v8.1 are beneficial
to v8.0, saving about 600MB of memory. That's not small change.
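The arithmetic behind those four runs can be laid out as a small Python sketch. The figures are the ones reported above for the 80-file test; the variable and function names are only illustrative.

```python
# (version, settings, quiescent GB, indexing GB, minutes) as reported above.
RUNS = [
    ("v8.0", "regular", 1.7, 1.9, 2.92),
    ("v8.0", "GC_TUNE", 1.1, 1.3, 2.97),
    ("v8.1", "regular", 4.3, 4.4, 1.67),
    ("v8.1", "GC_TUNE", 1.0, 1.3, 1.53),
]


def find(version, settings):
    """Return the row for one version/settings pair."""
    for row in RUNS:
        if row[0] == version and row[1] == settings:
            return row
    raise KeyError((version, settings))


def quiescent_saving_gb(version):
    """Quiescent memory saved by the GC_TUNE settings, in GB."""
    return round(find(version, "regular")[2] - find(version, "GC_TUNE")[2], 1)


def speedup(settings):
    """How many times faster v8.1 finished than v8.0 with the same settings."""
    return find("v8.0", settings)[4] / find("v8.1", settings)[4]


for v in ("v8.0", "v8.1"):
    print(v, "quiescent saving with GC_TUNE:", quiescent_saving_gb(v), "GB")
for s in ("regular", "GC_TUNE"):
    print(s, "v8.1 speedup over v8.0:", round(speedup(s), 2))
```

This is a single run per configuration on one machine, so the differences are indicative rather than statistically solid.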
Well, the numbers observed here tell a slightly different story:
TUNEing can help Solr v8.0. Confirmatory values from other folks
would be good to have. The memory concerned is what is taken from the
system as real memory, and the rest of the system is directly
affected by that. Java can subdivide its part as it wishes.
Yes, the TUNE values were from Solr v8.1. To me that says those
values are late arriving for v8.0 and prior, but we have them now and
can use them to save system resources. Also, it means that Solr
v8.1's G1 collector needs more baking time; the new GC is not quite ready for
normal production work (to put it mildly).
GC tuning will not change the amount of memory the program needs.
It *can't* change it. All it can do is affect how the garbage
collector works. Different collectors can result in differences in
how much memory an outside observer will see allocated, because one
may be more aggressive about early collection than the other, but
the amount of heap actually required by the program will not change.
The commented out GC_TUNE settings in the 8.1 "bin/solr.in.sh" file
are the old CMS settings that earlier versions of Solr used.
When you tell a Java program that it is allowed to use 4GB of
memory, it's going to use that memory. Eventually. Maybe not in
three minutes, but eventually. Even the settings that you are
seeing use less memory WILL eventually use all of it that they have
been allowed. That is the nature of Java.
Data here says there is a quiescent consumption value, a higher
one during intensive indexing, and a smaller one during routine query
handling. The point is the consumption peaks go away, memory is
returned to the system. That's what garbage collection is all about.
Also clear is that Solr v8.1 is slightly faster than v8.0 when
both use those TUNE values. A hidden benefit.
Without GC_TUNE settings Solr v8.1 shows its appetite for much
memory, several GB more than v8.0.
The CMS collector will be removed from Java at some point in the
future. We can't use it any more.
Meanwhile we in the field can improve our current systems with
the TUNE settings. Solr v8.1 isn't ready yet for that workload, in my
opinion.
The latency discussion below is in need of hard experimental
evidence. That does not mean your analysis is incorrect, but rather
we simply don't know and ought not make decisions based on such
assumptions. I look forward to seeing decent test results.
Thanks,
Joe D.
When you note that for a given sequential process, certain settings
accomplish that process faster, that's a measure of throughput --
how much data is pushed through in a given timeframe. We really
don't care about that metric for Solr. We care about latency. Let's
say that setting 1 produces a typical processing time per request of
90 milliseconds, and setting 2 produces a typical processing time
per request of 100 milliseconds. You might think setting 1 is
better. But what if 1 percent of the requests with setting 1 take
ten seconds, and EVERY request with setting 2 takes 120 milliseconds
or less? As a project, we are going to prefer setting 2. That's
not a theoretical situation -- it's how things really work out with
different garbage collectors, and it's why Solr has the default
settings that it does.
Shawn
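The throughput versus latency point above can be made concrete with a small Python sketch. The request times below are synthetic, chosen only to match the hypothetical in the message (setting 1: typically 90 ms with 1 percent of requests at ten seconds; setting 2: typically 100 ms and never above 120 ms). They are not measurements from Solr.

```python
import statistics

# Synthetic request times in milliseconds (illustrative, not measured).
setting_1 = [90] * 99 + [10_000]      # 1 request in 100 stalls for ten seconds
setting_2 = [100] * 90 + [120] * 10   # every request finishes within 120 ms


def summarize(times_ms):
    """Return the typical (median), mean, and worst-case request time."""
    return {
        "median": statistics.median(times_ms),
        "mean": statistics.mean(times_ms),
        "worst": max(times_ms),
    }


for name, times in (("setting 1", setting_1), ("setting 2", setting_2)):
    print(name, summarize(times))
```

On typical request time setting 1 looks better (90 ms against 100 ms), but its worst case is ten seconds against 120 ms, which is the property the message says the project cares about.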