Yeah, I tried G1, but it did not help - I don't think it is a garbage collection issue. I've made various changes to iCMS as well and the issue ALWAYS happens, no matter what I do. Under heavy traffic (200 requests per second) the world stops as soon as I hit a 5-minute mark; garbage collection would be less predictable than that. Nearly all of my queries use this 5-minute time window, which is why it is now my strongest suspect. If everything blocks on that filter, even for a couple of seconds, my backlog grows to 600-800 requests.
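
To make that concrete, the variant I'm thinking of trying next (untested on my side, and assuming the cache local param for fq - which I believe arrived around Solr 3.4 - behaves the same in 3.6.1) keeps the same 5-minute window but skips the filterCache entirely:

fq={!cache=false}start_time:[* TO NOW/5MINUTE]

I don't know yet whether that touches the FieldCache contention at all - it only changes whether the filter result is cached - but it should at least tell me whether the NOW/5MINUTE rollover of the cached filter is what lines up with the stalls.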
> Did you add the garbage collection JVM options I suggested?
>
> -XX:+UseG1GC -XX:MaxGCPauseMillis=50
>
> Guido.
>
> On 09/12/13 16:33, Patrick O'Lone wrote:
>> Unfortunately, in a test environment, this happens in version 4.4.0 of Solr as well.
>>
>>> I was trying to locate the release notes for 3.6.x but it is too old. If I were you I would update to 3.6.2 (from 3.6.1); it shouldn't affect you since it is a minor release. Locate the release notes and see if something that is affecting you got fixed. I would also think about moving on to 4.x, which is quite stable and fast.
>>>
>>> Like anything with Java and concurrency, it will just get better (and faster) with bigger numbers and concurrency frameworks becoming more and more reliable, standard and stable.
>>>
>>> Regards,
>>>
>>> Guido.
>>>
>>> On 09/12/13 15:07, Patrick O'Lone wrote:
>>>> I have a new question about this issue - I create filter queries of the form:
>>>>
>>>> fq=start_time:[* TO NOW/5MINUTE]
>>>>
>>>> This is used to restrict the set of documents to only items that have a start time within the next 5 minutes. Most of my indexes have millions of documents with few documents that start sometime in the future. Nearly all of my queries include this. Would this cause every other search thread to block until the filter query is re-cached every 5 minutes, and if so, is there a better way to do it? Thanks for any continued help with this issue!
>>>>
>>>>> We have a webapp running with a very high HEAP size (24GB) and we have no problems with it AFTER we enabled the new GC that is meant to replace the CMS GC sometime in the future, but you have to have Java 6 update "Some number I couldn't find but latest should cover" to be able to use it:
>>>>>
>>>>> 1. Remove all GC options you have and...
>>>>> 2. Replace them with "-XX:+UseG1GC -XX:MaxGCPauseMillis=50"
>>>>>
>>>>> As a test of course. You can read more in the following (and interesting) article; we also have Solr running with these options, no more pauses or HEAP size hitting the sky.
>>>>>
>>>>> Don't get bored reading the 1st (and small) introduction page of the article, pages 2 and 3 will make a lot of sense:
>>>>> http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061
>>>>>
>>>>> HTH,
>>>>>
>>>>> Guido.
>>>>>
>>>>> On 26/11/13 21:59, Patrick O'Lone wrote:
>>>>>> We do perform a lot of sorting - on multiple fields in fact. We have different kinds of Solr configurations - our news searches do little with regard to faceting, but sort heavily. We provide classified ad searches and those heavily use faceting. I might try reducing the JVM memory some and the amount of perm generation as suggested earlier. It feels like a GC issue and loading the cache just happens to be the victim of a stop-the-world event at the worst possible time.
>>>>>>
>>>>>>> My gut instinct is that your heap size is way too high. Try decreasing it to like 5-10G. I know you say it uses more than that, but that just seems bizarre unless you're doing something like faceting and/or sorting on every field.
>>>>>>>
>>>>>>> -Michael
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Patrick O'Lone [mailto:pol...@townnews.com]
>>>>>>> Sent: Tuesday, November 26, 2013 11:59 AM
>>>>>>> To: solr-user@lucene.apache.org
>>>>>>> Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache
>>>>>>>
>>>>>>> I've been tracking a problem in our Solr environment for a while with periodic stalls of Solr 3.6.1. I'm running up against a wall on ideas to try and thought I might get some insight from others on this list.
>>>>>>>
>>>>>>> The load on the server is normally anywhere between 1-3. It's an 8-core machine with 40GB of RAM. I have about 25GB of index data that is replicated to this server every 5 minutes. It's taking about 200 connections per second and roughly every 5-10 minutes it will stall for about 30 seconds to a minute. The stall causes the load to go as high as 90. It is all CPU bound in user space - all cores go to 99% utilization (spinlock?). When doing a thread dump, the following line is blocked in all running Tomcat threads:
>>>>>>>
>>>>>>> org.apache.lucene.search.FieldCacheImpl$Cache.get ( FieldCacheImpl.java:230 )
>>>>>>>
>>>>>>> Looking at the source code in 3.6.1, that call sits inside a synchronized block, which blocks all other threads and causes the backlog. I've tried to correlate these events to the replication events - but even with replication disabled, this still happens. We run multiple data centers using Solr and I was comparing garbage collection between them; the old generation is collected very differently on this data center versus the others. The old generation is collected as one massive collection event (several gigabytes worth); the other data center is more saw-toothed and collects only 500MB-1GB at a time. Here are my parameters to java (the same in all environments):
>>>>>>>
>>>>>>> /usr/java/jre/bin/java \
>>>>>>>   -verbose:gc \
>>>>>>>   -XX:+PrintGCDetails \
>>>>>>>   -server \
>>>>>>>   -Dcom.sun.management.jmxremote \
>>>>>>>   -XX:+UseConcMarkSweepGC \
>>>>>>>   -XX:+UseParNewGC \
>>>>>>>   -XX:+CMSIncrementalMode \
>>>>>>>   -XX:+CMSParallelRemarkEnabled \
>>>>>>>   -XX:+CMSIncrementalPacing \
>>>>>>>   -XX:NewRatio=3 \
>>>>>>>   -Xms30720M \
>>>>>>>   -Xmx30720M \
>>>>>>>   -Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \
>>>>>>>   -classpath /usr/local/share/apache-tomcat/bin/bootstrap.jar \
>>>>>>>   -Dcatalina.base=/usr/local/share/apache-tomcat \
>>>>>>>   -Dcatalina.home=/usr/local/share/apache-tomcat \
>>>>>>>   -Djava.io.tmpdir=/tmp \
>>>>>>>   org.apache.catalina.startup.Bootstrap start
>>>>>>>
>>>>>>> I've tried a few GC option changes from this (been running this way for a couple of years now) - primarily removing CMS incremental mode, as we have 8 cores and remarks on the internet suggest that it is only for smaller SMP setups. Removing it did not fix anything.
>>>>>>>
>>>>>>> I've considered that the heap is way too large (30GB out of 40GB) and may not leave enough memory for mmap operations (MMap appears to be used by the field cache). Based on active memory utilization in Java, it seems like I might be able to reduce it down to 22GB safely - but I'm not sure if that will help with the CPU issues.
>>>>>>>
>>>>>>> I think the field cache is used for sorting and faceting.
>>>>>>> I've started to investigate facet.method, but from what I can tell, this doesn't seem to influence sorting at all - only facet queries. I've tried setting useFilterForSortQuery, and it seems to require less field cache but doesn't address the stalling issues.
>>>>>>>
>>>>>>> Is there something I am overlooking? Perhaps the system is becoming oversubscribed in terms of resources? Thanks for any help that is offered.

--
Patrick O'Lone
Director of Software Development
TownNews.com

E-mail ... pol...@townnews.com
Phone .... 309-743-0809
Fax ...... 309-743-0830