Unfortunately, in a test environment, this happens in version 4.4.0 of Solr as well.
> I was trying to locate the release notes for 3.6.x but it is too old. If I
> were you I would update to 3.6.2 (from 3.6.1); it shouldn't affect you
> since it is a minor release. Locate the release notes and see if
> something that was affecting you got fixed. Also, I would be thinking of
> moving on to 4.x, which is quite stable and fast.
>
> Like anything with Java and concurrency, it will just get better (and
> faster) with bigger numbers as concurrency frameworks become more and
> more reliable, standard and stable.
>
> Regards,
>
> Guido.
>
> On 09/12/13 15:07, Patrick O'Lone wrote:
>> I have a new question about this issue - I create filter queries of
>> the form:
>>
>> fq=start_time:[* TO NOW/5MINUTE]
>>
>> This is used to restrict the set of documents to only items that have a
>> start time within the next 5 minutes. Most of my indexes have millions
>> of documents, with few documents that start sometime in the future.
>> Nearly all of my queries include this. Would this cause every other
>> search thread to block until the filter query is re-cached every 5
>> minutes, and if so, is there a better way to do it? Thanks for any
>> continued help with this issue!
>>
>>> We have a webapp running with a very high HEAP size (24GB) and we have
>>> no problems with it AFTER we enabled the new GC that is meant to
>>> replace the CMS GC sometime in the future, but you have to have Java 6
>>> update "some number I couldn't find, but the latest should cover it"
>>> to be able to use it:
>>>
>>> 1. Remove all GC options you have and...
>>> 2. Replace them with "-XX:+UseG1GC -XX:MaxGCPauseMillis=50"
>>>
>>> As a test, of course. More information you can read in the following
>>> (and interesting) article; we also have Solr running with these
>>> options - no more pauses or HEAP size hitting the sky.
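To illustrate Patrick's fq=start_time:[* TO NOW/5MINUTE] question: NOW/5MINUTE rounds down to the current 5-minute boundary, so the rendered filter string (which acts as the filter-cache key) is stable inside a window and changes at every boundary, at which point the filter must be recomputed for the whole index. A minimal sketch of that rounding in plain Java (not Solr code; the filterKey format is illustrative):

```java
import java.time.Instant;

public class FilterBucket {
    // Round an instant down to its 5-minute boundary, as Solr's
    // NOW/5MINUTE date-math rounding does on the server side.
    static Instant round5Min(Instant now) {
        long bucket = now.getEpochSecond() / 300; // 300 s = 5 minutes
        return Instant.ofEpochSecond(bucket * 300);
    }

    // The filter cache keys on the resolved filter; it only changes
    // when the rounded timestamp crosses a 5-minute boundary.
    static String filterKey(Instant now) {
        return "start_time:[* TO " + round5Min(now) + "]";
    }

    public static void main(String[] args) {
        Instant t1 = Instant.parse("2013-12-09T15:07:10Z");
        Instant t2 = Instant.parse("2013-12-09T15:09:59Z"); // same window
        Instant t3 = Instant.parse("2013-12-09T15:10:00Z"); // next window
        System.out.println(filterKey(t1).equals(filterKey(t2))); // cache hit
        System.out.println(filterKey(t1).equals(filterKey(t3))); // miss: rebuild
    }
}
```

Common workarounds discussed on this list are to keep the rounding coarse so rebuilds are rare, to warm the new filter before it is needed, or to mark such a clause non-caching (the `{!cache=false}` local param) so it stops churning the shared filter cache - whether that helps here depends on how expensive the range filter itself is.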
>>>
>>> Don't get bored reading the 1st (and small) introduction page of the
>>> article; pages 2 and 3 will make a lot of sense:
>>> http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061
>>>
>>> HTH,
>>>
>>> Guido.
>>>
>>> On 26/11/13 21:59, Patrick O'Lone wrote:
>>>> We do perform a lot of sorting - on multiple fields, in fact. We have
>>>> different kinds of Solr configurations - our news searches do little
>>>> with regard to faceting, but sort heavily. We provide classified ad
>>>> searches and those heavily use faceting. I might try reducing the JVM
>>>> memory some, and the amount of perm generation, as suggested earlier.
>>>> It feels like a GC issue, and loading the cache just happens to be
>>>> the victim of a stop-the-world event at the worst possible time.
>>>>
>>>>> My gut instinct is that your heap size is way too high. Try
>>>>> decreasing it to like 5-10G. I know you say it uses more than that,
>>>>> but that just seems bizarre unless you're doing something like
>>>>> faceting and/or sorting on every field.
>>>>>
>>>>> -Michael
>>>>>
>>>>> -----Original Message-----
>>>>> From: Patrick O'Lone [mailto:pol...@townnews.com]
>>>>> Sent: Tuesday, November 26, 2013 11:59 AM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache
>>>>>
>>>>> I've been tracking a problem in our Solr environment for a while,
>>>>> with periodic stalls of Solr 3.6.1. I'm running up against a wall on
>>>>> ideas to try and thought I might get some insight from others on
>>>>> this list.
>>>>>
>>>>> The load on the server is normally anywhere between 1-3. It's an
>>>>> 8-core machine with 40GB of RAM. I have about 25GB of index data
>>>>> that is replicated to this server every 5 minutes. It's taking about
>>>>> 200 connections per second, and roughly every 5-10 minutes it will
>>>>> stall for about 30 seconds to a minute. The stall causes the load to
>>>>> go as high as 90.
>>>>> It is all CPU-bound in user space - all cores go to 99% utilization
>>>>> (spinlock?). When doing a thread dump, the following line is blocked
>>>>> in all running Tomcat threads:
>>>>>
>>>>> org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:230)
>>>>>
>>>>> Looking at the source code in 3.6.1, that is a synchronized block
>>>>> which blocks all threads and causes the backlog. I've tried to
>>>>> correlate these events to the replication events - but even with
>>>>> replication disabled, this still happens. We run multiple data
>>>>> centers using Solr, and comparing garbage collection between them I
>>>>> noted that the old generation is collected very differently on this
>>>>> data center versus the others. Here the old generation is collected
>>>>> in one massive collection event (several gigabytes' worth) - the
>>>>> other data center is more sawtoothed and collects only 500MB-1GB at
>>>>> a time. Here are my parameters to java (the same in all
>>>>> environments):
>>>>>
>>>>> /usr/java/jre/bin/java \
>>>>> -verbose:gc \
>>>>> -XX:+PrintGCDetails \
>>>>> -server \
>>>>> -Dcom.sun.management.jmxremote \
>>>>> -XX:+UseConcMarkSweepGC \
>>>>> -XX:+UseParNewGC \
>>>>> -XX:+CMSIncrementalMode \
>>>>> -XX:+CMSParallelRemarkEnabled \
>>>>> -XX:+CMSIncrementalPacing \
>>>>> -XX:NewRatio=3 \
>>>>> -Xms30720M \
>>>>> -Xmx30720M \
>>>>> -Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \
>>>>> -classpath /usr/local/share/apache-tomcat/bin/bootstrap.jar \
>>>>> -Dcatalina.base=/usr/local/share/apache-tomcat \
>>>>> -Dcatalina.home=/usr/local/share/apache-tomcat \
>>>>> -Djava.io.tmpdir=/tmp \
>>>>> org.apache.catalina.startup.Bootstrap start
>>>>>
>>>>> I've tried a few GC option changes from this (we've been running
>>>>> this way for a couple of years now) - primarily removing CMS
>>>>> incremental mode, as we have 8 cores and remarks on the internet
>>>>> suggest that it is only for smaller SMP setups.
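The FieldCacheImpl$Cache.get bottleneck described above is the classic global-monitor cache pattern: one lock guards the whole cache, so while one thread populates an expensive entry (uninverting a field over millions of documents), every other search thread blocks on the same monitor, even for keys that are already cached. A simplified sketch (not Lucene's actual implementation) contrasting it with a per-key variant:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class CacheContention {
    // Global-lock cache: the shape of the 3.6.1 bottleneck (simplified).
    // Every reader serializes on one monitor, and the lock is held for
    // the full duration of a slow load.
    static class GlobalLockCache<K, V> {
        private final Map<K, V> map = new HashMap<>();
        synchronized V get(K key, Function<K, V> loader) {
            return map.computeIfAbsent(key, loader);
        }
    }

    // Per-key alternative: only threads asking for the same missing key
    // wait; hits on populated keys proceed without the global lock.
    static class PerKeyCache<K, V> {
        private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
        V get(K key, Function<K, V> loader) {
            return map.computeIfAbsent(key, loader); // locks one bin, not the whole map
        }
    }

    public static void main(String[] args) {
        PerKeyCache<String, Integer> cache = new PerKeyCache<>();
        System.out.println(cache.get("start_time", k -> 42)); // loaded once
        System.out.println(cache.get("start_time", k -> -1)); // cached: loader not run
    }
}
```

This is why the stall lines up with cache-entry invalidation (a new searcher after replication, or a new filter window): everyone piles up behind the one thread doing the rebuild. Later Lucene versions narrowed this locking, which is consistent with upgrading past 3.6.x helping.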
>>>>> Removing CMS incremental mode did not fix anything.
>>>>>
>>>>> I've considered that the heap is way too large (30GB of 40GB) and
>>>>> may not leave enough memory for mmap operations (mmap appears to be
>>>>> used in the field cache). Based on active memory utilization in
>>>>> Java, it seems like I might be able to reduce down to 22GB safely -
>>>>> but I'm not sure if that will help with the CPU issues.
>>>>>
>>>>> I think the field cache is used for sorting and faceting. I've
>>>>> started to investigate facet.method, but from what I can tell, this
>>>>> doesn't seem to influence sorting at all - only facet queries. I've
>>>>> tried setting useFilterForSortedQuery, and it seems to require less
>>>>> field cache but doesn't address the stalling issues.
>>>>>
>>>>> Is there something I am overlooking? Perhaps the system is becoming
>>>>> oversubscribed in terms of resources? Thanks for any help that is
>>>>> offered.
>>>>>
>>>>> --
>>>>> Patrick O'Lone
>>>>> Director of Software Development
>>>>> TownNews.com
>>>>>
>>>>> E-mail ... pol...@townnews.com
>>>>> Phone .... 309-743-0809
>>>>> Fax ...... 309-743-0830
>>>>>
>>>
>>
>
>
--
Patrick O'Lone
Director of Software Development
TownNews.com

E-mail ... pol...@townnews.com
Phone .... 309-743-0809
Fax ...... 309-743-0830
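On the field cache and sorting: the cache uninverts each sorted field into an array with one slot per document, so its footprint grows with index size times the number of distinct sort fields. A back-of-envelope sketch with assumed numbers - the thread only says "millions of documents" and "sorting on multiple fields", so the figures below are illustrative, not measurements:

```java
public class FieldCacheEstimate {
    // Rough per-field field-cache cost for sorting on a numeric field:
    // one fixed-width array entry per document in the index. String
    // fields cost more (ordinal arrays plus the term bytes themselves).
    static long bytesForNumericField(long numDocs) {
        return numDocs * Long.BYTES; // e.g. a long[] sized to maxDoc
    }

    public static void main(String[] args) {
        long numDocs = 10_000_000L; // assumed: "millions of documents"
        int sortedFields = 4;       // assumed: "sorting on multiple fields"
        long total = sortedFields * bytesForNumericField(numDocs);
        System.out.printf("~%d MB of field cache just for sorting%n",
                total / (1024 * 1024));
    }
}
```

Even under generous assumptions this lands in the hundreds of megabytes, not tens of gigabytes - which supports Michael's instinct above that a 30GB heap is oversized for this workload, and that shrinking it (leaving more RAM for the OS to mmap the 25GB index) is worth testing.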