On 3/26/2014 10:26 PM, Darrell Burgan wrote:
> Okay well it didn't take long for the swapping to start happening on one of 
> our nodes.  Here is a screen shot of the Solr console:
> 
> https://s3-us-west-2.amazonaws.com/panswers-darrell/solr.png
> 
> And here is a shot of top, with processes sorted by VIRT:
> 
> https://s3-us-west-2.amazonaws.com/panswers-darrell/top.png
> 
> As shown, we have used up more than 25% of the swap space (over 1GB), even 
> though there is 16GB of OS RAM available and the Solr JVM has been allocated 
> only 10GB. Further, we're only using about 1.5GB of the 4GB the JVM has 
> committed out of that 10GB max heap.
> 
> Top shows that the Solr process 21582 is using 2.4GB resident but has a 
> virtual size of 82.4GB. Presumably that virtual size is due to the 
> memory-mapped index files. The other Java process, 27619, is ZooKeeper.
> 
> So my question remains - why did we use any swap space at all? Doesn't seem 
> like we're experiencing memory pressure at the moment ... I'm confused.  :-)

The virtual memory value is indeed that large because of the memory-mapped
index files.
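
If you want to confirm that yourself, pmap will show the individual
mappings that add up to the VIRT number.  A quick sketch (21582 is the
Solr PID from your top screenshot, so substitute whatever the current
PID is):

  # List the mappings for the Solr process, largest first, top 20.
  # The big entries should be the mmapped index files.
  pmap -x 21582 | grep -v '^total' | sort -k2 -n -r | head -n 20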

There is definitely something wrong here.  I don't know whether it's
Java, RHEL, or something strange in the virtual machine environment,
possibly a bad interaction with the older kernel.  With your -Xmx value,
Java should never use much more than about 10.5GB of physical memory,
and the top output indicates that it's only using 2.4GB resident, with
roughly 13GB going to the OS disk cache.
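
If you want to double-check where the rest of the 16GB is going, free
will break it down.  The cached figure is the OS disk cache, which the
kernel hands back to applications on demand, so it doesn't represent
memory pressure:

  # Show memory usage in megabytes; "cached" is the reclaimable disk cache.
  free -m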

You might notice that I'm not mentioning Solr in the list of possible
problems.  This is because an unmodified Solr install only utilizes the
Java heap, so it's Java that is in charge of allocating memory from the
operating system.

Here is a script that will tell you what's using swap and how much.
This will let you be absolutely sure about whether or not Java is the
problem child:

http://stackoverflow.com/a/7180078/2665648

There are instructions in the comments of the script for sorting the output.
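
In case that link ever goes away, the general idea is simple: add up
the Swap lines in /proc/<pid>/smaps for every process.  This is a
minimal sketch of that approach, not the exact script from the answer:

  #!/bin/bash
  # Report per-process swap usage by summing the Swap: lines in
  # /proc/<pid>/smaps.  Run as root so every process can be read.
  for dir in /proc/[0-9]*; do
      pid=${dir#/proc/}
      swap=$(awk '/^Swap:/ { total += $2 } END { print total + 0 }' \
          "$dir/smaps" 2>/dev/null)
      if [ "${swap:-0}" -gt 0 ]; then
          name=$(tr '\0' ' ' < "$dir/cmdline" | cut -c1-60)
          echo "$swap kB   PID $pid   $name"
      fi
  done | sort -n -r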

The only major thing I saw in your JVM config (aside from perhaps
reducing the max heap) that I would change is the garbage collector
tuning.  I'm the original author mentioned in this wiki page:

http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems
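
For reference, the kind of CMS tuning described on that page generally
takes a shape along these lines.  The values below are only an
illustration (not necessarily what the wiki recommends), JAVA_OPTS is
just a placeholder for however you pass options to the JVM that runs
Solr, and anything like this needs testing against your own query load:

  # Illustrative CMS garbage collector flags for a large Solr heap.
  # Values are examples only; tune for your own workload.
  JAVA_OPTS="$JAVA_OPTS \
    -XX:+UseConcMarkSweepGC \
    -XX:+UseParNewGC \
    -XX:+CMSParallelRemarkEnabled \
    -XX:CMSInitiatingOccupancyFraction=70 \
    -XX:+UseCMSInitiatingOccupancyOnly"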

----------------

Here's a screenshot from my dev Solr server, where you can see that
there is zero swap usage:

https://www.dropbox.com/s/mftgi3q2hn7w9qp/solr-centos6-top.png

This is a bare-metal server with 16GB of RAM, running CentOS 6.5 and a
pre-release snapshot of Solr 4.7.1.  With an Intel Xeon X3430, I'm
pretty sure the processor architecture is NUMA, but the motherboard only
has one CPU socket, so there's only one NUMA node.  As you can see from
my virtual memory value, I have a lot more index data on this machine
than you have on yours.  My heap is 7GB.  The other three Java processes
you can see running are in-house software related to Solr.

Performance is fairly slow with that much index data and so little disk
cache, but it's a dev server.  The production environment has plenty of
RAM to cache the entire index.

Thanks,
Shawn
