We have a webapp running with a very large heap (24GB) and we have had no
problems with it since we enabled the new GC that is meant to replace the
CMS GC sometime in the future. You do need a certain Java 6 update to be able
to use it (some number I couldn't find, but the latest should cover it). To try it:
1. Remove all GC options you have and...
2. Replace them with "-XX:+UseG1GC -XX:MaxGCPauseMillis=50"
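For example, applied to the java invocation Patrick posted further down in this
thread, the GC-related flags would become something like this (just a sketch;
everything other than the GC options left as he already has it):

# sketch only: the two suggested G1 flags in place of the CMS options
/usr/java/jre/bin/java \
-verbose:gc \
-XX:+PrintGCDetails \
-server \
-Dcom.sun.management.jmxremote \
-XX:+UseG1GC \
-XX:MaxGCPauseMillis=50 \
-Xms30720M \
-Xmx30720M \
... (the Tomcat-specific options unchanged)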
As a test, of course. You can read more in the following (and interesting)
article. We also have Solr running with these options: no more pauses and no
more heap size hitting the sky.
Don't get bored reading the first (and short) introduction page of the
article; pages 2 and 3 will make a lot of sense:
http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061
HTH,
Guido.
On 26/11/13 21:59, Patrick O'Lone wrote:
We do perform a lot of sorting - on multiple fields in fact. We have
different kinds of Solr configurations - our news searches do little
with regards to faceting, but heavily sort. We provide classified ad
searches and that heavily uses faceting. I might try reducing the JVM
memory some, and the size of the perm generation, as suggested earlier. It feels
like a GC issue, and loading the cache just happens to be the victim of a
stop-the-world event at the worst possible time.
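Something along these lines, presumably (illustrative sizes only, they would
need tuning against our real usage; MaxPermSize assumes the Java 6 permanent
generation):

# example only: smaller heap plus an explicit perm gen cap
-Xms10240M -Xmx10240M -XX:MaxPermSize=256M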
My gut instinct is that your heap size is way too high. Try decreasing it to
like 5-10G. I know you say it uses more than that, but that just seems bizarre
unless you're doing something like faceting and/or sorting on every field.
-Michael
-----Original Message-----
From: Patrick O'Lone [mailto:pol...@townnews.com]
Sent: Tuesday, November 26, 2013 11:59 AM
To: solr-user@lucene.apache.org
Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache
I've been tracking a problem in our Solr environment for a while now: periodic
stalls of Solr 3.6.1. I'm running up against a wall on ideas to try, and thought
I might get some insight from others on this list.
The load on the server is normally anywhere between 1-3. It's an 8-core machine
with 40GB of RAM. I have about 25GB of index data that is replicated to this
server every 5 minutes. It's taking about 200 connections per second and
roughly every 5-10 minutes it will stall for about 30 seconds to a minute. The
stall causes the load to go as high as 90. It is all CPU-bound in user space
- all cores go to 99% utilization (spinlock?). When doing a thread dump, the
following line is blocked in all running Tomcat threads:
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:230)
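(For anyone wanting to reproduce this, a dump can be captured during a stall
with something like the following, assuming a full JDK is installed on the box
rather than just the JRE:

# <tomcat-pid> is the id of the Tomcat/Solr java process
jstack -l <tomcat-pid> > /tmp/solr-threads-$(date +%s).txt

or with "kill -3 <tomcat-pid>", which writes the dump to catalina.out.)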
Looking at the source code in 3.6.1, that call sits inside a synchronized
block, which serializes all the threads and causes the backlog. I've tried to
correlate these events to the replication events - but even with replication
disabled - this still happens. We run multiple data centers using Solr and I
was comparing garbage collection behavior between them, and noted that the old
generation is collected very differently in this data center versus the others.
Here, the old generation is collected in one massive collection event (several
gigabytes' worth); the other data center shows a more sawtoothed pattern,
collecting only 500MB-1GB at a time. Here are my parameters to java (the same
in all environments):
/usr/java/jre/bin/java \
-verbose:gc \
-XX:+PrintGCDetails \
-server \
-Dcom.sun.management.jmxremote \
-XX:+UseConcMarkSweepGC \
-XX:+UseParNewGC \
-XX:+CMSIncrementalMode \
-XX:+CMSParallelRemarkEnabled \
-XX:+CMSIncrementalPacing \
-XX:NewRatio=3 \
-Xms30720M \
-Xmx30720M \
-Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \
-classpath /usr/local/share/apache-tomcat/bin/bootstrap.jar \
-Dcatalina.base=/usr/local/share/apache-tomcat \
-Dcatalina.home=/usr/local/share/apache-tomcat \
-Djava.io.tmpdir=/tmp \
org.apache.catalina.startup.Bootstrap start
I've tried a few GC option changes from this (it's been running this way for a
couple of years now), primarily removing CMS incremental mode, since we have 8
cores and remarks on the internet suggest that it is only intended for smaller
SMP setups. Removing the incremental mode did not fix anything.
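For what it's worth, the GC logging could be made a little more detailed so the
stalls can be lined up against actual pause times, e.g. by adding something
like this to the existing flags (HotSpot flag names for Java 6; the log path is
just an example):

# appended to the java command line above
-Xloggc:/var/log/solr-gc.log \
-XX:+PrintGCTimeStamps \
-XX:+PrintGCApplicationStoppedTime \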
I've considered that the heap is way too large (30GB from 40GB) and may not
leave enough memory for mmap operations (MMap appears to be used in the field
cache). Based on active memory utilization in Java, it seems like I might be
able to reduce the heap to 22GB safely, but I'm not sure if that will help with
the CPU issues.
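If I do try that, it would just be a matter of something like (sketch only):

# 22GB heap, leaving the rest of the 40GB of RAM to the OS
-Xms22528M -Xmx22528M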
I think field cache is used for sorting and faceting. I've started to
investigate facet.method, but from what I can tell, this doesn't seem to
influence sorting at all, only facet queries. I've tried setting
useFilterForSortedQuery, and it seems to require less field cache, but it
doesn't address the stalling issues.
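For reference, those knobs are set roughly like this: useFilterForSortedQuery
lives in the <query> section of solrconfig.xml, and facet.method is a request
parameter, e.g. (host, core path and field name below are placeholders):

# illustrative request only
curl 'http://localhost:8080/solr/select?q=*:*&facet=true&facet.field=section&facet.method=enum'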
Is there something I am overlooking? Perhaps the system is becoming
oversubscribed in terms of resources? Thanks for any help that is offered.
--
Patrick O'Lone
Director of Software Development
TownNews.com
E-mail ... pol...@townnews.com
Phone .... 309-743-0809
Fax ...... 309-743-0830