Unfortunately, in a test environment, this happens in version 4.4.0 of Solr as well.
> I was trying to locate the release notes for 3.6.x but it is too old. If I
> were you I would update to 3.6.2 (from 3.6.1); it shouldn't affect you
> since it is a minor release. Locate the release notes and see if
> something that was affecting you got fixed. Also, I would be thinking of
> moving on to 4.x, which is quite stable and fast.
>
> Like anything with Java and concurrency, it will just get better (and
> faster) with bigger numbers as concurrency frameworks become more and
> more reliable, standard and stable.
>
> Regards,
>
> Guido.
>
> On 09/12/13 15:07, Patrick O'Lone wrote:
>> I have a new question about this issue - I create filter queries of
>> the form:
>>
>> fq=start_time:[* TO NOW/5MINUTE]
>>
>> This is used to restrict the set of documents to only items that have a
>> start time within the next 5 minutes. Most of my indexes have millions
>> of documents, with few documents that start sometime in the future.
>> Nearly all of my queries include this. Would this cause every other
>> search thread to block until the filter query is re-cached every 5
>> minutes, and if so, is there a better way to do it? Thanks for any
>> continued help with this issue!
>>
>>> We have a webapp running with a very high HEAP size (24GB) and we have
>>> no problems with it AFTER we enabled the new GC that is meant to
>>> replace the CMS GC sometime in the future, but you have to have Java 6
>>> update "some number I couldn't find, but the latest should cover it"
>>> to be able to use it:
>>>
>>> 1. Remove all GC options you have and...
>>> 2. Replace them with "-XX:+UseG1GC -XX:MaxGCPauseMillis=50"
>>>
>>> As a test, of course. More information you can read in the following
>>> (and interesting) article; we also have Solr running with these
>>> options - no more pauses or HEAP size hitting the sky.
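To illustrate Patrick's fq=start_time:[* TO NOW/5MINUTE] question: NOW/5MINUTE rounds down to the current 5-minute boundary, so the rendered filter string (which acts as the filter-cache key) is stable inside a window and changes at every boundary, at which point the filter must be recomputed for the whole index. A minimal sketch of that rounding in plain Java (not Solr code; the filterKey format is illustrative):

```java
import java.time.Instant;

public class FilterBucket {
    // Round an instant down to its 5-minute boundary, as Solr's
    // NOW/5MINUTE date-math rounding does on the server side.
    static Instant round5Min(Instant now) {
        long bucket = now.getEpochSecond() / 300; // 300 s = 5 minutes
        return Instant.ofEpochSecond(bucket * 300);
    }

    // The filter cache keys on the resolved filter; it only changes
    // when the rounded timestamp crosses a 5-minute boundary.
    static String filterKey(Instant now) {
        return "start_time:[* TO " + round5Min(now) + "]";
    }

    public static void main(String[] args) {
        Instant t1 = Instant.parse("2013-12-09T15:07:10Z");
        Instant t2 = Instant.parse("2013-12-09T15:09:59Z"); // same window
        Instant t3 = Instant.parse("2013-12-09T15:10:00Z"); // next window
        System.out.println(filterKey(t1).equals(filterKey(t2))); // cache hit
        System.out.println(filterKey(t1).equals(filterKey(t3))); // miss: rebuild
    }
}
```

Common workarounds discussed on this list are to keep the rounding coarse so rebuilds are rare, to warm the new filter before it is needed, or to mark such a clause non-caching (the `{!cache=false}` local param) so it stops churning the shared filter cache - whether that helps here depends on how expensive the range filter itself is.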
>>>
>>> Don't get bored reading the 1st (and small) introduction page of the
>>> article; pages 2 and 3 will make a lot of sense:
>>> http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061
>>>
>>> HTH,
>>>
>>> Guido.
>>>
>>> On 26/11/13 21:59, Patrick O'Lone wrote:
>>>> We do perform a lot of sorting - on multiple fields, in fact. We have
>>>> different kinds of Solr configurations - our news searches do little
>>>> with regard to faceting, but sort heavily. We provide classified ad
>>>> searches and those heavily use faceting. I might try reducing the JVM
>>>> memory some, and the amount of perm generation, as suggested earlier.
>>>> It feels like a GC issue, and loading the cache just happens to be
>>>> the victim of a stop-the-world event at the worst possible time.
>>>>
>>>>> My gut instinct is that your heap size is way too high. Try
>>>>> decreasing it to like 5-10G. I know you say it uses more than that,
>>>>> but that just seems bizarre unless you're doing something like
>>>>> faceting and/or sorting on every field.
>>>>>
>>>>> -Michael
>>>>>
>>>>> -----Original Message-----
>>>>> From: Patrick O'Lone [mailto:pol...@townnews.com]
>>>>> Sent: Tuesday, November 26, 2013 11:59 AM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache
>>>>>
>>>>> I've been tracking a problem in our Solr environment for a while,
>>>>> with periodic stalls of Solr 3.6.1. I'm running up against a wall on
>>>>> ideas to try and thought I might get some insight from others on
>>>>> this list.
>>>>>
>>>>> The load on the server is normally anywhere between 1-3. It's an
>>>>> 8-core machine with 40GB of RAM. I have about 25GB of index data
>>>>> that is replicated to this server every 5 minutes. It's taking about
>>>>> 200 connections per second, and roughly every 5-10 minutes it will
>>>>> stall for about 30 seconds to a minute. The stall causes the load to
>>>>> go as high as 90.
>>>>> It is all CPU-bound in user space - all cores go to 99% utilization
>>>>> (spinlock?). When doing a thread dump, the following line is blocked
>>>>> in all running Tomcat threads:
>>>>>
>>>>> org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:230)
>>>>>
>>>>> Looking at the source code in 3.6.1, that is a synchronized block
>>>>> which blocks all threads and causes the backlog. I've tried to
>>>>> correlate these events to the replication events - but even with
>>>>> replication disabled, this still happens. We run multiple data
>>>>> centers using Solr, and comparing garbage collection between them I
>>>>> noted that the old generation is collected very differently on this
>>>>> data center versus the others. Here the old generation is collected
>>>>> in one massive collection event (several gigabytes' worth) - the
>>>>> other data center is more sawtoothed and collects only 500MB-1GB at
>>>>> a time. Here are my parameters to java (the same in all
>>>>> environments):
>>>>>
>>>>> /usr/java/jre/bin/java \
>>>>> -verbose:gc \
>>>>> -XX:+PrintGCDetails \
>>>>> -server \
>>>>> -Dcom.sun.management.jmxremote \
>>>>> -XX:+UseConcMarkSweepGC \
>>>>> -XX:+UseParNewGC \
>>>>> -XX:+CMSIncrementalMode \
>>>>> -XX:+CMSParallelRemarkEnabled \
>>>>> -XX:+CMSIncrementalPacing \
>>>>> -XX:NewRatio=3 \
>>>>> -Xms30720M \
>>>>> -Xmx30720M \
>>>>> -Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \
>>>>> -classpath /usr/local/share/apache-tomcat/bin/bootstrap.jar \
>>>>> -Dcatalina.base=/usr/local/share/apache-tomcat \
>>>>> -Dcatalina.home=/usr/local/share/apache-tomcat \
>>>>> -Djava.io.tmpdir=/tmp \
>>>>> org.apache.catalina.startup.Bootstrap start
>>>>>
>>>>> I've tried a few GC option changes from this (we've been running
>>>>> this way for a couple of years now) - primarily removing CMS
>>>>> incremental mode, as we have 8 cores and remarks on the internet
>>>>> suggest that it is only for smaller SMP setups.
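The FieldCacheImpl$Cache.get bottleneck described above is the classic global-monitor cache pattern: one lock guards the whole cache, so while one thread populates an expensive entry (uninverting a field over millions of documents), every other search thread blocks on the same monitor, even for keys that are already cached. A simplified sketch (not Lucene's actual implementation) contrasting it with a per-key variant:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class CacheContention {
    // Global-lock cache: the shape of the 3.6.1 bottleneck (simplified).
    // Every reader serializes on one monitor, and the lock is held for
    // the full duration of a slow load.
    static class GlobalLockCache<K, V> {
        private final Map<K, V> map = new HashMap<>();
        synchronized V get(K key, Function<K, V> loader) {
            return map.computeIfAbsent(key, loader);
        }
    }

    // Per-key alternative: only threads asking for the same missing key
    // wait; hits on populated keys proceed without the global lock.
    static class PerKeyCache<K, V> {
        private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
        V get(K key, Function<K, V> loader) {
            return map.computeIfAbsent(key, loader); // locks one bin, not the whole map
        }
    }

    public static void main(String[] args) {
        PerKeyCache<String, Integer> cache = new PerKeyCache<>();
        System.out.println(cache.get("start_time", k -> 42)); // loaded once
        System.out.println(cache.get("start_time", k -> -1)); // cached: loader not run
    }
}
```

This is why the stall lines up with cache-entry invalidation (a new searcher after replication, or a new filter window): everyone piles up behind the one thread doing the rebuild. Later Lucene versions narrowed this locking, which is consistent with upgrading past 3.6.x helping.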
>>>>> Removing CMS incremental mode did not fix anything.
>>>>>
>>>>> I've considered that the heap is way too large (30GB of 40GB) and
>>>>> may not leave enough memory for mmap operations (mmap appears to be
>>>>> used in the field cache). Based on active memory utilization in
>>>>> Java, it seems like I might be able to reduce down to 22GB safely -
>>>>> but I'm not sure if that will help with the CPU issues.
>>>>>
>>>>> I think the field cache is used for sorting and faceting. I've
>>>>> started to investigate facet.method, but from what I can tell, this
>>>>> doesn't seem to influence sorting at all - only facet queries. I've
>>>>> tried setting useFilterForSortedQuery, and it seems to require less
>>>>> field cache but doesn't address the stalling issues.
>>>>>
>>>>> Is there something I am overlooking? Perhaps the system is becoming
>>>>> oversubscribed in terms of resources? Thanks for any help that is
>>>>> offered.
>>>>>
>>>>> --
>>>>> Patrick O'Lone
>>>>> Director of Software Development
>>>>> TownNews.com
>>>>>
>>>>> E-mail ... pol...@townnews.com
>>>>> Phone .... 309-743-0809
>>>>> Fax ...... 309-743-0830
>>>>>
>>>
>>
>
>
--
Patrick O'Lone
Director of Software Development
TownNews.com

E-mail ... pol...@townnews.com
Phone .... 309-743-0809
Fax ...... 309-743-0830
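On the field cache and sorting: the cache uninverts each sorted field into an array with one slot per document, so its footprint grows with index size times the number of distinct sort fields. A back-of-envelope sketch with assumed numbers - the thread only says "millions of documents" and "sorting on multiple fields", so the figures below are illustrative, not measurements:

```java
public class FieldCacheEstimate {
    // Rough per-field field-cache cost for sorting on a numeric field:
    // one fixed-width array entry per document in the index. String
    // fields cost more (ordinal arrays plus the term bytes themselves).
    static long bytesForNumericField(long numDocs) {
        return numDocs * Long.BYTES; // e.g. a long[] sized to maxDoc
    }

    public static void main(String[] args) {
        long numDocs = 10_000_000L; // assumed: "millions of documents"
        int sortedFields = 4;       // assumed: "sorting on multiple fields"
        long total = sortedFields * bytesForNumericField(numDocs);
        System.out.printf("~%d MB of field cache just for sorting%n",
                total / (1024 * 1024));
    }
}
```

Even under generous assumptions this lands in the hundreds of megabytes, not tens of gigabytes - which supports Michael's instinct above that a 30GB heap is oversized for this workload, and that shrinking it (leaving more RAM for the OS to mmap the 25GB index) is worth testing.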