Solr 3.6.1 stalling with high CPU and blocking on field cache
I've been tracking a problem in our Solr environment for a while with periodic stalls of Solr 3.6.1. I'm running up against a wall on ideas to try and thought I might get some insight from others on this list.

The load on the server is normally anywhere between 1-3. It's an 8-core machine with 40GB of RAM. I have about 25GB of index data that is replicated to this server every 5 minutes. It's taking about 200 connections per second, and roughly every 5-10 minutes it will stall for about 30 seconds to a minute. The stall causes the load to go as high as 90. It is all CPU bound in user space - all cores go to 99% utilization (spinlock?). When doing a thread dump, the following line is blocked in all running Tomcat threads:

org.apache.lucene.search.FieldCacheImpl$Cache.get ( FieldCacheImpl.java:230 )

Looking at the source code in 3.6.1, that line sits inside a synchronized() block, which blocks all other threads and causes the backlog. I've tried to correlate these events to the replication events - but even with replication disabled, this still happens. We run multiple data centers using Solr, and I was comparing garbage collection behavior between them and noted that the old generation is collected very differently in this data center versus the others. The old generation is collected in one massive collection event (several gigabytes' worth) - the other data center is more sawtoothed and collects only 500MB-1GB at a time. Here are my parameters to java (the same in all environments):

/usr/java/jre/bin/java \
  -verbose:gc \
  -XX:+PrintGCDetails \
  -server \
  -Dcom.sun.management.jmxremote \
  -XX:+UseConcMarkSweepGC \
  -XX:+UseParNewGC \
  -XX:+CMSIncrementalMode \
  -XX:+CMSParallelRemarkEnabled \
  -XX:+CMSIncrementalPacing \
  -XX:NewRatio=3 \
  -Xms30720M \
  -Xmx30720M \
  -Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \
  -classpath /usr/local/share/apache-tomcat/bin/bootstrap.jar \
  -Dcatalina.base=/usr/local/share/apache-tomcat \
  -Dcatalina.home=/usr/local/share/apache-tomcat \
  -Djava.io.tmpdir=/tmp \
  org.apache.catalina.startup.Bootstrap start

I've tried a few GC option changes from this (it has been running this way for a couple of years now) - primarily removing CMS incremental mode, since we have 8 cores and remarks on the internet suggest it is only for smaller SMP setups. Removing it did not fix anything.

I've considered that the heap is way too large (30GB out of the 40GB) and may not leave enough memory for mmap operations (MMap appears to be used in the field cache). Based on active memory utilization in Java, it seems like I might be able to reduce it to 22GB safely - but I'm not sure if that will help with the CPU issues.

I think the field cache is used for sorting and faceting. I've started to investigate facet.method, but from what I can tell, this doesn't influence sorting at all - only facet queries. I've also tried setting useFilterForSortedQuery, and it seems to require less field cache, but it doesn't address the stalling issues.

Is there something I am overlooking? Perhaps the system is becoming oversubscribed in terms of resources? Thanks for any help that is offered.

--
Patrick O'Lone
Director of Software Development
TownNews.com

E-mail ... pol...@townnews.com
Phone 309-743-0809
Fax .. 309-743-0830
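P.S. For reference, here is roughly what the relevant pieces of our solrconfig.xml look like - the cache sizes below are illustrative rather than our exact values, and as I understand it useFilterForSortedQuery only applies to sorted queries that don't need scores:

<query>
  <!-- filter cache backing fq clauses (sizes are illustrative) -->
  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="128"/>

  <!-- the option mentioned above; lets sorted, non-scored queries be
       satisfied via the filterCache instead of the field cache -->
  <useFilterForSortedQuery>true</useFilterForSortedQuery>
</query>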
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
We do perform a lot of sorting - on multiple fields, in fact. We have different kinds of Solr configurations - our news searches do little with regard to faceting, but sort heavily. We provide classified ad searches, and those use faceting heavily. I might try reducing the JVM memory some and the amount of perm generation as suggested earlier. It feels like a GC issue, and loading the cache just happens to be the victim of a stop-the-world event at the worst possible time.

> My gut instinct is that your heap size is way too high. Try decreasing it to
> like 5-10G. I know you say it uses more than that, but that just seems
> bizarre unless you're doing something like faceting and/or sorting on every
> field.
>
> -Michael
>
> -----Original Message-----
> From: Patrick O'Lone [mailto:pol...@townnews.com]
> Sent: Tuesday, November 26, 2013 11:59 AM
> To: solr-user@lucene.apache.org
> Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache
>
> [...]

--
Patrick O'Lone
Director of Software Development
TownNews.com

E-mail ... pol...@townnews.com
Phone 309-743-0809
Fax .. 309-743-0830
facet.method=fcs vs facet.method=fc on solr slaves
Is there any advantage, on a Solr slave, to receiving queries with facet.method=fcs instead of the default facet.method=fc? Most of the segment files are unchanged between replication events - but I wasn't sure if replication would cause the unchanged segments' field caches to be lost anyway.

--
Patrick O'Lone
Director of Software Development
TownNews.com

E-mail ... pol...@townnews.com
Phone 309-743-0809
Fax .. 309-743-0830
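P.S. Right now we don't set facet.method anywhere, so everything uses the default. If I understand the faceting parameters correctly, it can be set per request or overridden per field with the f.<fieldname>. prefix, something like this (the field name is just an example from a hypothetical schema):

  .../select?q=*:*&facet=true&facet.field=category&facet.method=fcs
  .../select?q=*:*&facet=true&facet.field=category&f.category.facet.method=fc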
Re: facet.method=fcs vs facet.method=fc on solr slaves
So does it make the most sense, then, to force facet.method=fcs by default on slave nodes that receive updates every 5 minutes but have large segments that don't change with every update? Right now, everything I have configured uses facet.method=fc, since we don't declare it at all. Randomly, after replication, I have several threads that will hang reading data from the field cache, and I'm trying to think of things I can do to mitigate that. Thanks for the info.

> Hello Patrick,
>
> Replication flushes the UnInvertedField cache, which impacts fc, but doesn't
> harm Lucene's FieldCache, which is what fcs uses. You can check how much time
> in millis is spent on UnInvertedField cache regeneration in INFO logs like
> "UnInverted multi-valued field ,time=### ..."
>
> On Thu, Dec 5, 2013 at 12:15 AM, Patrick O'Lone <pol...@townnews.com> wrote:
>
>> [...]
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
> http://www.griddynamics.com
> mkhlud...@griddynamics.com

--
Patrick O'Lone
Director of Software Development
TownNews.com

E-mail ... pol...@townnews.com
Phone 309-743-0809
Fax .. 309-743-0830
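P.S. If forcing it does make sense, my plan would be to set it as a default on the slaves' search handler in solrconfig.xml rather than changing every client - something along these lines (the handler name is just whatever your search handler is called):

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="facet.method">fcs</str>
  </lst>
</requestHandler>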
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
I have a new question about this issue - I create filter queries of the form:

fq=start_time:[* TO NOW/5MINUTE]

This is used to restrict the set of documents to only items that have a start time within the next 5 minutes. Most of my indexes have millions of documents, with few documents that start at some time in the future. Nearly all of my queries include this. Would this cause every other search thread to block until the filter query is re-cached every 5 minutes, and if so, is there a better way to do it? Thanks for any continued help with this issue!
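To spell out why I suspect the rounding: assuming NOW/5MINUTE rounds down to a 5 minute boundary (which matches the every-5-minutes pattern I'm seeing), the concrete date the fq resolves to - and therefore the filter cache entry it maps to - only changes when the clock crosses one of those boundaries. An illustrative timeline:

  12:04:59  start_time:[* TO NOW/5MINUTE]  ->  start_time:[* TO 2013-12-09T12:00:00Z]  (entry already cached)
  12:05:01  start_time:[* TO NOW/5MINUTE]  ->  start_time:[* TO 2013-12-09T12:05:00Z]  (new entry - must be built from scratch)

So every 5 minutes, the first requests to arrive after the boundary all need a filter that doesn't exist in the cache yet.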
> We have a webapp running with a very high HEAP size (24GB) and we have
> no problems with it AFTER we enabled the new GC that is meant to
> eventually replace the CMS GC, but you have to have Java 6 update
> "some number I couldn't find, but the latest should cover it" to be able to use:
>
> 1. Remove all GC options you have and...
> 2. Replace them with "-XX:+UseG1GC -XX:MaxGCPauseMillis=50"
>
> As a test of course; more information you can read in the following (and
> interesting) article - we also have Solr running with these options, no
> more pauses or HEAP size hitting the sky.
>
> Don't get bored reading the 1st (and small) introduction page of the
> article, pages 2 and 3 will make a lot of sense:
> http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061
>
> HTH,
>
> Guido.
>
> On 26/11/13 21:59, Patrick O'Lone wrote:
>> [...]
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
Unfortunately, in a test environment, this happens in version 4.4.0 of Solr as well.

> I was trying to locate the release notes for 3.6.x - it is too old. If I
> were you I would update to 3.6.2 (from 3.6.1); it shouldn't affect you
> since it is a minor release. Locate the release notes and see if
> something that is affecting you got fixed. Also, I would be thinking of
> moving on to 4.x, which is quite stable and fast.
>
> Like anything with Java and concurrency, it will just get better (and
> faster) with bigger numbers and concurrency frameworks becoming more and
> more reliable, standard and stable.
>
> Regards,
>
> Guido.
>
> On 09/12/13 15:07, Patrick O'Lone wrote:
>> [...]
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
Yeah, I tried G1, but it did not help - I don't think it is a garbage collection issue. I've made various changes to iCMS as well, and the issue ALWAYS happens, no matter what I do. If I'm taking heavy traffic (200 requests per second), then as soon as I hit a 5 minute mark the world stops - garbage collection would be less predictable than that. Nearly all of my requests have this 5 minute windowing behavior on time, which is why I have it as a strong suspect now. If it blocks on that - even for a couple of seconds - my traffic backlog will be 600-800 requests.

> Did you add the garbage collection JVM options I suggested?
>
> -XX:+UseG1GC -XX:MaxGCPauseMillis=50
>
> Guido.
>
> On 09/12/13 16:33, Patrick O'Lone wrote:
>> [...]
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
Well, I want to include everything that will start in the next 5 minute interval and everything that came before. The query is more like:

fq=start_time:[* TO NOW+5MINUTE/5MINUTE]

so that it rounds up to the next 5 minute boundary on the right-hand side. But as soon as we're 1 second past that 5 minute window, everything pauses waiting for the filter cache (at least that's my working theory based on observation). Is it possible to do something like:

fq=start_time:[* TO NOW+1DAY/DAY]&q=start_time:[* TO NOW/MINUTE]

where it would use the filter cache to narrow down by day resolution and then filter the rest as part of the standard query, or something like that? My thought is that this would still gain a benefit from a query cache, but be somewhat slower, since it must remove results for things appearing later in the day.
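Another variant I'm wondering about, if I understand the cache=false local param correctly (I believe it has been available for fq since around Solr 3.4, so treat this as a sketch), is to keep a coarse, cache-friendly filter and layer the fine-grained window on top as a non-cached filter:

  fq=start_time:[* TO NOW+1DAY/DAY]
  fq={!cache=false}start_time:[* TO NOW+5MINUTE/5MINUTE]

The first filter only changes (and has to be rebuilt) once a day, while the second is evaluated per request and never creates a filter-cache entry.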
> If you want a start time within the next 5 minutes, I think your filter
> is not the good one.
> * will be replaced by the first date in your field
>
> Try:
> fq=start_time:[NOW TO NOW+5MINUTE]
>
> Franck Brisbart
>
> On Monday, 09 December 2013 at 09:07 -0600, Patrick O'Lone wrote:
>> [...]
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
I initially thought this was the case as well. These are slave nodes that receive updates every 5-10 minutes. However, this issue happens even if replication is turned off and no update handler is provided at all.

I have confirmed against my data that simply issuing the fq for a start_time range takes 11-13 seconds to actually populate the cache. If I make the fq not cache at all, my QTime rises by about 100ms, but it does not have the stalling effect. A purely negative query also seems to have this effect, that is:

fq=-start_time:[NOW/MINUTE TO *]

but I'm not sure if that is because it actually caches the negative query or because it discards it entirely.
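Following up on the static warming suggestion below, here is the sort of thing I have in mind for solrconfig.xml - the fq just mirrors the one my queries use, and the sort is there to pre-load the field cache for whatever fields are sorted on (start_time here is just an example), so treat it as a sketch rather than a tested config:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="fq">start_time:[* TO NOW+5MINUTE/5MINUTE]</str>
      <str name="sort">start_time desc</str>
    </lst>
  </arr>
</listener>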
> Patrick,
>
> Are you getting these stalls following a commit? If so then the issue is
> most likely fieldCache warming pauses. To stop your users from seeing
> this pause you'll need to add static warming queries to your
> solrconfig.xml to warm the fieldCache before it's registered.
>
> On Mon, Dec 9, 2013 at 12:33 PM, Patrick O'Lone <pol...@townnews.com> wrote:
>
>> [...]
LFU cache and autowarming
If I were to use the LFU cache instead of FastLRU for the filter cache, and I enabled auto-warming on that cache type, would it warm the most frequently used fq entries in the filter cache? Thanks for any info!

--
Patrick O'Lone
Director of Software Development
TownNews.com

E-mail ... pol...@townnews.com
Phone 309-743-0809
Fax .. 309-743-0830
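P.S. For concreteness, this is the kind of change I have in mind in solrconfig.xml - the sizes are illustrative, not a recommendation:

<filterCache class="solr.LFUCache"
             size="512"
             initialSize="512"
             autowarmCount="128"/>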
Re: LFU cache and autowarming
Well, I haven't tested it - if it's not ready yet, I will probably avoid it for now.

> On 12/19/2013 1:46 PM, Patrick O'Lone wrote:
>> If I was to use the LFU cache instead of FastLRU on the filter cache, if
>> I enable auto-warming on that cache type - does it warm the most
>> frequently used fq on the filter cache? Thanks for any info!
>
> I wrote that cache. It's a really, really crappy implementation - I would
> only expect it to work well if the cache is very, very small.
>
> I do have a replacement implementation that's just about ready, but I've
> not been able to find 'round tuits to work on getting it polished and
> committed.
>
> https://issues.apache.org/jira/browse/SOLR-2906
> https://issues.apache.org/jira/browse/SOLR-3393
>
> Thanks,
> Shawn

--
Patrick O'Lone
Director of Software Development
TownNews.com

E-mail ... pol...@townnews.com
Phone 309-743-0809
Fax .. 309-743-0830