I initially thought this was the case as well. These are slave nodes that receive updates every 5-10 minutes. However, this issue happens even if replication is turned off and no update handler is provided at all.
I have confirmed against my data that simply querying the fq for a start_time in a range takes 11-13 seconds to actually populate the cache. If I make the fq not cache at all, my QTime rises by about 100ms, but it does not have the stalling effect. A purely negative query also seems to have this effect, that is:

fq=-start_time:[NOW/MINUTE TO *]

But I'm not sure if that is because it actually caches the negative query or if it discards it entirely.

> Patrick,
>
> Are you getting these stalls following a commit? If so then the issue is
> most likely fieldCache warming pauses. To stop your users from seeing
> this pause you'll need to add static warming queries to your
> solrconfig.xml to warm the fieldCache before it's registered.
>
>
> On Mon, Dec 9, 2013 at 12:33 PM, Patrick O'Lone <pol...@townnews.com> wrote:
>
>     Well, I want to include everything that will start in the next 5 minute
>     interval and everything that came before. The query is more like:
>
>     fq=start_time:[* TO NOW+5MINUTE/5MINUTE]
>
>     so that it rounds to the nearest 5 minute interval on the right-hand
>     side. But, as soon as 1 second after that 5 minute window, everything
>     pauses waiting for filter cache (at least that's my working theory based
>     on observation). Is it possible to do something like:
>
>     fq=start_time:[* TO NOW+1DAY/DAY]&q=start_time:[* TO NOW/MINUTE]
>
>     where it would use the filter cache to narrow down by day resolution and
>     then filter as part of the standard query, or something like that?
>
>     My thought is that this would still gain a benefit from a query cache,
>     but somewhat slower since it must remove results for things appearing
>     later in the day.
>
>     > If you want a start time within the next 5 minutes, I think your filter
>     > is not the good one.
>     > * will be replaced by the first date in your field
>     >
>     > Try :
>     > fq=start_time:[NOW TO NOW+5MINUTE]
>     >
>     > Franck Brisbart
>     >
>     >
>     > On Monday, 9 December 2013 at 09:07 -0600, Patrick O'Lone wrote:
>     >> I have a new question about this issue - I create filter queries of
>     >> the form:
>     >>
>     >> fq=start_time:[* TO NOW/5MINUTE]
>     >>
>     >> This is used to restrict the set of documents to only items that have a
>     >> start time within the next 5 minutes. Most of my indexes have millions
>     >> of documents with few documents that start sometime in the future.
>     >> Nearly all of my queries include this, would this cause every other
>     >> search thread to block until the filter query is re-cached every 5
>     >> minutes and if so, is there a better way to do it? Thanks for any
>     >> continued help with this issue!
>     >>
>     >>> We have a webapp running with a very high HEAP size (24GB) and we have
>     >>> no problems with it AFTER we enabled the new GC that is meant to replace
>     >>> sometime in the future the CMS GC, but you have to have Java 6 update
>     >>> "Some number I couldn't find but latest should cover" to be able to use:
>     >>>
>     >>> 1. Remove all GC options you have and...
>     >>> 2. Replace them with "-XX:+UseG1GC -XX:MaxGCPauseMillis=50"
>     >>>
>     >>> As a test of course, more information you can read on the following (and
>     >>> interesting) article, we also have Solr running with these options, no
>     >>> more pauses or HEAP size hitting the sky.
>     >>>
>     >>> Don't get bored reading the 1st (and small) introduction page of the
>     >>> article, page 2 and 3 will make lot of sense:
>     >>>
>     >>> http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061
>     >>>
>     >>>
>     >>> HTH,
>     >>>
>     >>> Guido.
>     >>>
>     >>> On 26/11/13 21:59, Patrick O'Lone wrote:
>     >>>> We do perform a lot of sorting - on multiple fields in fact.
>     >>>> We have different kinds of Solr configurations - our news searches do
>     >>>> little with regards to faceting, but heavily sort. We provide
>     >>>> classified ad searches and that heavily uses faceting. I might try
>     >>>> reducing the JVM memory some and amount of perm generation as
>     >>>> suggested earlier. It feels like a GC issue and loading the cache just
>     >>>> happens to be the victim of a stop-the-world event at the worst
>     >>>> possible time.
>     >>>>
>     >>>>> My gut instinct is that your heap size is way too high. Try
>     >>>>> decreasing it to like 5-10G. I know you say it uses more than that,
>     >>>>> but that just seems bizarre unless you're doing something like
>     >>>>> faceting and/or sorting on every field.
>     >>>>>
>     >>>>> -Michael
>     >>>>>
>     >>>>> -----Original Message-----
>     >>>>> From: Patrick O'Lone [mailto:pol...@townnews.com]
>     >>>>> Sent: Tuesday, November 26, 2013 11:59 AM
>     >>>>> To: solr-user@lucene.apache.org
>     >>>>> Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache
>     >>>>>
>     >>>>> I've been tracking a problem in our Solr environment for awhile with
>     >>>>> periodic stalls of Solr 3.6.1. I'm running up to a wall on ideas to
>     >>>>> try and thought I might get some insight from some others on this list.
>     >>>>>
>     >>>>> The load on the server is normally anywhere between 1-3. It's an
>     >>>>> 8-core machine with 40GB of RAM. I have about 25GB of index data that
>     >>>>> is replicated to this server every 5 minutes. It's taking about 200
>     >>>>> connections per second and roughly every 5-10 minutes it will stall
>     >>>>> for about 30 seconds to a minute. The stall causes the load to go to
>     >>>>> as high as 90. It is all CPU bound in user space - all cores go to
>     >>>>> 99% utilization (spinlock?).
>     >>>>> When doing a thread dump, the following
>     >>>>> line is blocked in all running Tomcat threads:
>     >>>>>
>     >>>>> org.apache.lucene.search.FieldCacheImpl$Cache.get (
>     >>>>> FieldCacheImpl.java:230 )
>     >>>>>
>     >>>>> Looking at the source code in 3.6.1, that is a function call to
>     >>>>> synchronized() which blocks all threads and causes the backlog. I've
>     >>>>> tried to correlate these events to the replication events - but even
>     >>>>> with replication disabled - this still happens. We run multiple data
>     >>>>> centers using Solr and I was comparing garbage collection processes
>     >>>>> between them and noted that the old generation is collected very
>     >>>>> differently on this data center versus others. The old generation is
>     >>>>> collected as a massive collect event (several gigabytes worth) - the
>     >>>>> other data center is more saw toothed and collects only in 500MB-1GB
>     >>>>> at a time. Here's my parameters to java (the same in all environments):
>     >>>>>
>     >>>>> /usr/java/jre/bin/java \
>     >>>>> -verbose:gc \
>     >>>>> -XX:+PrintGCDetails \
>     >>>>> -server \
>     >>>>> -Dcom.sun.management.jmxremote \
>     >>>>> -XX:+UseConcMarkSweepGC \
>     >>>>> -XX:+UseParNewGC \
>     >>>>> -XX:+CMSIncrementalMode \
>     >>>>> -XX:+CMSParallelRemarkEnabled \
>     >>>>> -XX:+CMSIncrementalPacing \
>     >>>>> -XX:NewRatio=3 \
>     >>>>> -Xms30720M \
>     >>>>> -Xmx30720M \
>     >>>>> -Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \
>     >>>>> -classpath /usr/local/share/apache-tomcat/bin/bootstrap.jar \
>     >>>>> -Dcatalina.base=/usr/local/share/apache-tomcat \
>     >>>>> -Dcatalina.home=/usr/local/share/apache-tomcat \
>     >>>>> -Djava.io.tmpdir=/tmp \
>     >>>>> org.apache.catalina.startup.Bootstrap start
>     >>>>>
>     >>>>> I've tried a few GC option changes from this (been running this way
>     >>>>> for a couple of years now) - primarily removing CMS Incremental mode
>     >>>>> as we have 8 cores and remarks on the internet suggest that it is
>     >>>>> only for smaller SMP setups.
>     >>>>> Removing CMS did not fix anything.
>     >>>>>
>     >>>>> I've considered that the heap is way too large (30GB from 40GB) and
>     >>>>> may not leave enough memory for mmap operations (MMap appears to be
>     >>>>> used in the field cache). Based on active memory utilization in Java,
>     >>>>> seems like I might be able to reduce down to 22GB safely - but I'm
>     >>>>> not sure if that will help with the CPU issues.
>     >>>>>
>     >>>>> I think field cache is used for sorting and faceting. I've started to
>     >>>>> investigate facet.method, but from what I can tell, this doesn't seem
>     >>>>> to influence sorting at all - only facet queries. I've tried setting
>     >>>>> useFilterForSortQuery, and seems to require less field cache but
>     >>>>> doesn't address the stalling issues.
>     >>>>>
>     >>>>> Is there something I am overlooking? Perhaps the system is becoming
>     >>>>> oversubscribed in terms of resources? Thanks for any help that is
>     >>>>> offered.
>     >>>>>
>     >>>>> --
>     >>>>> Patrick O'Lone
>     >>>>> Director of Software Development
>     >>>>> TownNews.com
>     >>>>>
>     >>>>> E-mail ... pol...@townnews.com
>     >>>>> Phone .... 309-743-0809
>     >>>>> Fax ...... 309-743-0830
>     >>>>>
>     >>>>
>     >>>
>     >>
>     >
>
>     --
>     Patrick O'Lone
>     Director of Software Development
>     TownNews.com
>
>     E-mail ... pol...@townnews.com
>     Phone .... 309-743-0809
>     Fax ...... 309-743-0830
>
>
> --
> Joel Bernstein
> Search Engineer at Heliosearch

--
Patrick O'Lone
Director of Software Development
TownNews.com

E-mail ... pol...@townnews.com
Phone .... 309-743-0809
Fax ...... 309-743-0830
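
P.S. For anyone following the thread, here is a rough sketch of the working theory about the 5 minute window. It only illustrates the arithmetic: the rounding is done in plain Python rather than by Solr's date math parser, the helper names (rounded_upper_bound, filter_query) are made up for this example, and it assumes the cache is keyed on the filter once NOW has been resolved to a concrete timestamp. Under those assumptions, every request inside one window produces the same filter, and the first request after the boundary produces a new one that has to be computed from scratch.

```python
from datetime import datetime, timedelta, timezone


def rounded_upper_bound(now: datetime, minutes: int = 5) -> datetime:
    """Add `minutes` to `now`, then round down to a multiple of `minutes`.

    Mimics the intent of NOW+5MINUTE rounded to a 5 minute boundary.
    Assumes `minutes` divides 60 evenly.
    """
    shifted = now + timedelta(minutes=minutes)
    floored_minute = shifted.minute - (shifted.minute % minutes)
    return shifted.replace(minute=floored_minute, second=0, microsecond=0)


def filter_query(now: datetime) -> str:
    """Build the fq string with the upper bound already resolved (hypothetical helper)."""
    upper = rounded_upper_bound(now).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"start_time:[* TO {upper}]"


base = datetime(2013, 12, 9, 12, 0, 0, tzinfo=timezone.utc)

# Two requests inside the same 5 minute window resolve to the same filter.
same_window = filter_query(base) == filter_query(base + timedelta(minutes=4, seconds=59))

# The first request after the boundary resolves to a different filter.
next_window = filter_query(base) == filter_query(base + timedelta(minutes=5))

print(filter_query(base))
print("same filter within window:", same_window)
print("same filter after boundary:", next_window)
```

If that theory holds, the 11-13 second population cost would be paid once per window by whichever request arrives first after the boundary.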