Thank you for the detailed response Shawn! I've read it several times.
Yes, that particular machine has 12 cores that are hyper-threaded. Does
Solr do something special when not running on HDFS to allocate memory
in a way that would make VIRT reflect the index data size?
In my experience the VIRT shows (for java anyway) what the JVM wanted to
allocate. If I specify -Xms75G, VIRT will show 75G, but RES may show
much less if the program doesn't do anything.
For example, I wrote a program that sleeps and then exits. If I run it
with java -Xms75G -jar blah.jar, top reports a VIRT of ~80G (notice PID
29566):
top - 17:09:05 up 50 days, 4:24, 2 users, load average: 9.82, 11.31, 12.41
Tasks: 410 total, 1 running, 409 sleeping, 0 stopped, 0 zombie
Cpu(s): 39.8%us, 0.7%sy, 20.1%ni, 39.3%id, 0.0%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 82505408k total, 76160560k used, 6344848k free, 356212k buffers
Swap: 33554428k total, 115756k used, 33438672k free, 14011992k cached

  PID USER PR NI  VIRT  RES SHR S  %CPU %MEM    TIME+ COMMAND
14260 solr 20  0 41.6g  33g 19m S 629.4 42.3 57412:59 java
29566 joeo 20  0 80.2g 275m 12m S   0.3  0.3  0:00.93 java
Note that the OS didn't actually give PID 29566 80G of physical memory;
it gave it 275m. Right? Thanks again!
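For the record, the toy program was essentially this (a minimal,
hypothetical reconstruction; the class name and sleep duration are
made up):

```java
// SleepTest.java — hypothetical sketch of the experiment above: a JVM that
// does nothing but sleep, so RES stays small even when -Xms asks the OS to
// reserve a huge heap (which shows up in VIRT).
public class SleepTest {
    public static void main(String[] args) throws InterruptedException {
        long seconds = args.length > 0 ? Long.parseLong(args[0]) : 300;
        Thread.sleep(seconds * 1000L);  // touch no memory; just sleep, then exit
    }
}
```

Packaged as blah.jar and started with java -Xms75G -jar blah.jar, top
shows VIRT around the requested heap plus JVM overhead, while RES stays
in the hundreds of megabytes.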
-Joe
On 8/18/2017 4:15 PM, Shawn Heisey wrote:
On 8/18/2017 1:05 PM, Joe Obernberger wrote:
Thank you Shawn. Please see:
http://www.lovehorsepower.com/Vesta
for screen shots of top
(http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_top.jpg) and
several screen shots over various times of jvisualvm.
There is also the GC log and the regular solr.log for one server
(named Vesta). Please note that we are using HDFS for storage. I
love top, but also use htop and atop as they show additional
information. In general we are RAM limited and therefore do not have
much cache for OS/disk as we would like, but this issue is CPU
related. After restarting the one node, the CPU usage stayed low for
a while, but then eventually comes up to ~800% where it will stay.
Your GC log does not show any evidence of extreme GC activity. The
longest pause in the whole thing is 1.4 seconds, and the average pause
is only seven milliseconds. Looking at percentile statistics, GC
performance is amazing, especially given the rather large heap size.
Problems with insufficient disk caching memory do frequently manifest as
high CPU usage, because that situation will require waiting on I/O.
When the CPU spends a lot of time in iowait, total CPU usage tends to be
very high. The iowait CPU percentage on the top output when that
screenshot was taken was 8.5. This sounds like a small number, but in
fact it is quite high. Very healthy Solr installs will have an
extremely low iowait percentage -- possibly zero -- because they will
rarely read off the disk. I can see that on the atop screenshot, iowait
percentage is 172.
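If you want to watch iowait over time outside of top/atop, top's %wa is
derived from a cumulative counter in /proc/stat. A minimal, Linux-only
sketch of reading it (hypothetical class name; returns -1 where procfs
does not exist):

```java
// IoWait.java — hypothetical sketch: read the aggregate iowait counter from
// /proc/stat (Linux only). top's %wa is the rate of change of this counter.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class IoWait {
    // Returns cumulative iowait jiffies from the aggregate "cpu" line,
    // or -1 if /proc/stat is unavailable (non-Linux system).
    static long iowaitJiffies() throws IOException {
        Path stat = Path.of("/proc/stat");
        if (!Files.exists(stat)) return -1;
        for (String line : Files.readAllLines(stat)) {
            if (line.startsWith("cpu ")) {
                String[] f = line.trim().split("\\s+");
                // fields: cpu user nice system idle iowait irq softirq steal ...
                return Long.parseLong(f[5]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("cumulative iowait jiffies: " + iowaitJiffies());
    }
}
```

Sampling this twice and dividing the delta by elapsed jiffies gives the
same percentage top reports.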
The load average on the system is well above 11. The atop output shows
24 CPU cores (which might actually be 12 physical cores if the CPUs have
hyperthreading). Even with all those CPUs, that load average is high
enough to be a concern.
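As an aside, you can make this same load-per-CPU comparison from inside
the JVM with the standard java.lang.management API (a sketch, not
anything Solr does itself):

```java
// LoadPerCore.java — hypothetical sketch: compare the 1-minute load average
// to the number of logical CPUs the JVM sees, the comparison made above.
import java.lang.management.ManagementFactory;

public class LoadPerCore {
    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();  // counts hyperthreads
        double load = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
        if (load < 0) {
            System.out.println("load average not available on this platform");
        } else {
            System.out.printf("load %.2f across %d logical CPUs = %.2f per CPU%n",
                    load, cpus, load / cpus);
        }
    }
}
```

Note that availableProcessors() counts logical CPUs, so with
hyperthreading it reports 24 on a 12-core box.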
I can see that the system has about 70GB of memory directly allocated to
various Java processes, leaving about 30GB for disk caching purposes.
Walter has noted that those same Java processes have allocated over
200GB of virtual memory. If we subtract the 70GB of allocated heap,
this would tend to indicate that those processes, one of which is Solr,
are accessing about 130GB of data.
I have no idea how the memory situation works with HDFS, or how this
screenshot should look on a healthy system. Having 30GB of memory to
cache the 130GB of data opened by these Java processes might be enough,
or it might not. If this were a system NOT running HDFS, then I would
say that there isn't enough memory. Putting HDFS into this mix makes it
difficult for me to say anything useful, simply because I do not know
much about it. You should consult with an HDFS expert and ask them how
to make sure that actual disk accesses are rare -- you want as much of
the index data sitting in RAM on the Solr server as you can possibly get.
Addressing a message later in the thread: The concern with high virtual
memory is actually NOT swapping. It's effective use of disk caching
memory. Let's examine a hypothetical situation with a machine running
nothing but Solr, using a standard filesystem for data storage.
The "top" output in this hypothetical situation indicates that total
system memory is 128GB and there is no swap usage. The Solr process has
a RES memory size of 25GB, a SHR size of a few megabytes, and a VIRT
size of 1000GB. This tells me that their heap is approximately 25GB,
and that Solr is accessing 975GB of index data. At that point, I know
that they have about 103GB of memory to cache nearly a terabyte of index
data. This is a situation where there is nowhere near enough memory for
good performance.
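The arithmetic in this hypothetical (and in the 200GB-of-VIRT case
earlier) boils down to two subtractions. As a sketch, with the example's
numbers baked in (they are illustrative, not measurements):

```java
// HeapVsCache.java — worked arithmetic from the hypothetical above:
// mapped index data ≈ VIRT - heap (RES); cache headroom ≈ total RAM - RES.
public class HeapVsCache {
    static long mappedIndexGb(long virtGb, long resGb)    { return virtGb - resGb; }
    static long cacheHeadroomGb(long totalGb, long resGb) { return totalGb - resGb; }

    public static void main(String[] args) {
        // Hypothetical machine from the example: 128GB RAM, 25GB RES, 1000GB VIRT.
        System.out.println(mappedIndexGb(1000, 25));   // 975GB of index data
        System.out.println(cacheHeadroomGb(128, 25));  // 103GB left to cache it
    }
}
```

When the second number is a small fraction of the first, as here, most
index reads will miss the OS cache and hit the disk.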
Thanks,
Shawn