RE: Delayed/waiting requests
Hi Erick, Thank you for your detailed answer, I understand autowarming better now.

We have an autowarming time of ~10s for filterCache (queryResultCache is not used at all, ratio = 0.02). We increased the size of the filterCache from 6k to 12k (and the autowarming size was set to the same value) to have a better ratio, which is _only_ around 0.85/0.90.

The thing I don't understand is that I should see "Opening new searcher" in the logs every time a new searcher is opened and thus an autowarming happens, right? But I don't see "Opening new searcher" very often, and I don't see it being correlated with the response time peaks.

Also, I didn't mention it earlier, but we have other SolrCloud clusters with similar settings and load (~10s filterCache autowarming, 10k entries) and we don't observe the same behavior.

Regards,

De : Erick Erickson Envoyé : lundi 14 janvier 2019 17:44:38 À : solr-user Objet : Re: Delayed/waiting requests

Gael: bq. Nevertheless, our filterCache is set to autowarm 12k entries which is also the maxSize

That is far, far, far too many. Let's assume you actually have 12K entries in the filterCache. Every time you open a new searcher, 12K queries are executed _before_ the searcher accepts any new requests. While being able to re-use a filterCache entry is useful, one of the primary purposes is to pre-load index data from disk into memory which can be the event that takes the most time.

The queryResultCache has a similar function. I often find that this cache doesn't have a very high hit ratio, but again executing a _few_ of these queries warms the index from disk.

I think of both caches as a map, where the key is the "thing", (fq clause in the case of filterCache, the whole query in the case of the queryResultCache). Autowarming replays the most recently executed N of these entries, essentially just as though they were submitted by a user.

Hypothesis: You're massively over-warming, and when that kicks in you're seeing increased CPU and GC pressure leading to the anomalies you're seeing. Further, you have such excessive autowarming going on that it's hard to see the associated messages in the log.

Here's what I'd recommend: Set your autowarm counts to something on the order of 16. If the culprit is just excessive autowarming, I'd expect your spikes to be much less severe. It _might_ be that your users see some increased (very temporary) variance in response time. You can tell that the autowarming configurations are "more art than science", I can't give you any other recommendations than "start small and increase until you're happy" unfortunately.

I usually do this with some kind of load tester in a dev lab of course ;).

Finally, if you use the metrics data (see: https://lucene.apache.org/solr/guide/7_1/metrics-reporting.html) you can see the autowarm times. Don't get too lost in the page to start, just hit the "http://localhost:8983/solr/admin/metrics" endpoint and look for "warmupTime", then refine on how to get _only_ the warmup stats ;).

Best, Erick

On Mon, Jan 14, 2019 at 5:08 AM Gael Jourdan-Weil wrote: > > I had a look to GC logs this morning but I'm not sure how to interpret them. > > > Over a period of 54mn, there is: > > - Number of pauses: 2739 > > - Accumulated pauses: 93s => that is 2.86% of the time > > - Average pause duration: 0.03s > > - Average pause interval: 1.18s > > - Accumulated full GC: 0 > > I'm not sure if this is a lot or not. What do you think ? 
> > > Looking more closely to GC logs with GC Viewer, I can notice that the high > response time peaks happens at the same time where GC pauses takes 2x more > time (around 0.06s) than average. > > > Also we are indeed indexing at the same time but we have autowarming set. > > I don't see any Searcher opened at the time we experience slowness. > > Nevertheless, our filterCache is set to autowarm 12k entries which is also > the maxSize. > > Could this have any downside? > > > Thanks, > > Gaël > > > > De : Erick Erickson > Envoyé : vendredi 11 janvier 2019 17:21 > À : solr-user > Objet : Re: Delayed/waiting requests > > Jimi's comment is one of the very common culprits. > > Autowarming is another. Are you indexing at the same > time? If so it could well be you aren't autowarming and > the spikes are caused by using a new IndexSearcher > that has to read much of the index off disk when commits > happen. The "smoking gun" here would be if the spikes > correlate to your commits (soft or hard-with-opensearcher-true). > > Best, > Erick > > On Fri, Jan 11, 2019 at 1:23 AM Gael Jourdan-Weil > wrote: > > > > Interesting indeed, we did not see anything with VisualVM but having a look > > at the GC logs could gives us more info, especially on the pauses. > > > > I will collect data over the week-end and look at it. > > > > > > Thanks > > > > > > De : Hullegård, Jimi > > Envoyé : vendredi 11 janvier 2019 03:46:02 > > À :
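For context, the cache being discussed lives in solrconfig.xml. A minimal sketch of such an entry, with illustrative values only (solr.FastLRUCache is the usual filterCache class in the 7.x example configs; size="12000" and autowarmCount="16" simply mirror the cache size and the autowarm count suggested in this thread, not a recommendation):

  <filterCache class="solr.FastLRUCache"
               size="12000"
               initialSize="512"
               autowarmCount="16"/>

The warmup times Erick mentions can be pulled from the metrics API, along the lines of the following (the exact metric name may differ between versions):

  curl "http://localhost:8983/solr/admin/metrics?group=core&prefix=SEARCHER.searcher.warmupTime"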
join query and new searcher on joined collection
Solr 6.3 I have a query like this: q=*:*{!join score=none from=id fromIndex=hss_4 to=rpk_hdquotes v=$qq}*:* -- Vadim
RE: join query and new searcher on joined collection
Sorry, I sent an unfinished message. So, the query on collection1 is:

q=*:*{!join score=none from=id fromIndex=collection2 to=field1}*:*

The question is: what happens with autowarming and new searchers on collection1 when a new searcher starts on collection2? IMHO when a request with a join comes in, it's impossible to use the caches on collection1 and ... Does a new searcher start on collection1 as well?

> -Original Message- > From: Vadim Ivanov [mailto:vadim.iva...@spb.ntk-intourist.ru] > Sent: Tuesday, January 15, 2019 1:00 PM > To: solr-user@lucene.apache.org > Subject: join query and new searcher on joined collection > > Solr 6.3 > > > > I have a query like this: > > q=*:*{!join score=none from=id fromIndex=hss_4 to=rpk_hdquotes v=$qq}*:* > > > > -- > > Vadim > >
Re: "no servers hosting shard" when querying during shard creation
On 13/01/2019 19:43, Erick Erickson wrote: > Yeah, that seems wrong, I'd say open a JIRA. I've created a bug in Jira: SOLR-13136. Should I assign this to anyone? Unsure what the procedure is there. Incidentally, while doing so I noticed that 7.6 is still "unreleased" according to Jira. Thanks, - Bram
Re: DateRangeField requires month?
I did some testing by tweaking DateRangeFieldTest and witness that 2000-11T13 is parsed as 2000-11-13 see https://github.com/apache/lucene-solr/blob/f083473b891e596def2877b5429fcfa6db175464/lucene/spatial-extras/src/java/org/apache/lucene/spatial/prefix/tree/DateRangePrefixTree.java#L462 Don't know what to do with it... At least I'm going to update the doc. On Mon, Jan 14, 2019 at 4:42 PM Jeremy Smith wrote: > Hi Mikhail, thanks for the response. I'm probably missing something, but > what makes 2000-11T13 contiguous and 2000T13 not contiguous? They seem > pretty similar to me, but only the former is supported. > > > Thanks, > > Jeremy > > > From: Mikhail Khludnev > Sent: Sunday, January 13, 2019 12:59:31 AM > To: solr-user > Subject: Re: DateRangeField requires month? > > Hello, Jeremy. > > See below. > > On Mon, Jan 7, 2019 at 5:09 PM Jeremy Smith wrote: > > > Hello, > > > > I am trying to use the DateRangeField and ran into an interesting > > issue. According to the documentation ( > > https://lucene.apache.org/solr/guide/7_6/working-with-dates.html), these > > are both valid for the DateRangeField: 2000-11 and 2000-11T13. I can > > confirm this is working in 7.6. I would also expect to be able to use > > 2000T13, which would mean any time in the year 2000 between 1300 and > 1400. > > > Nope. This is not a range, but multiple ranges. DateRangeField supports > contiguous ranges only. > > > > However, I get an error when trying to insert this value: > > > > > > "error":{"metadata": > > > > > > > ["error-class","org.apache.solr.common.SolrException","root-error-class","java.lang.NumberFormatException"], > > > > "msg":"ERROR: Error adding field 'dtRange'='2000T13' msg=Couldn't > > parse date because: Improperly formatted date: 2000T13","code":400 > > > > } > > > > > > I am using 7.6 with a super simple schema containing only _version_ and a > > DateRangeField and there's nothing special in my solrconfig.xml. Is this > > behavior expected? Should I open a jira issue? > > > > > > Thanks, > > > > Jeremy > > > > > -- > Sincerely yours > Mikhail Khludnev > -- Sincerely yours Mikhail Khludnev
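For anyone who wants to reproduce this outside of DateRangeFieldTest, a minimal sketch against a scratch core (the core name, field name and ids below are made up for illustration):

  In the schema:
    <fieldType name="dateRange" class="solr.DateRangeField"/>
    <field name="dtRange" type="dateRange" indexed="true" stored="true"/>

  Indexing the two documented forms plus the surprising one:
    curl -H 'Content-Type: application/json' \
      'http://localhost:8983/solr/testcore/update?commit=true' \
      -d '[{"id":"1","dtRange":"2000-11"},{"id":"2","dtRange":"2000-11T13"}]'

As noted above, the second value is accepted but ends up parsed as 2000-11-13, while "2000T13" is rejected with "Improperly formatted date".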
Re: DateRangeField requires month?
Follow up https://issues.apache.org/jira/browse/SOLR-13139 On Tue, Jan 15, 2019 at 2:46 PM Mikhail Khludnev wrote: > I did some testing by tweaking DateRangeFieldTest and witness that > 2000-11T13 is parsed as 2000-11-13 see > > https://github.com/apache/lucene-solr/blob/f083473b891e596def2877b5429fcfa6db175464/lucene/spatial-extras/src/java/org/apache/lucene/spatial/prefix/tree/DateRangePrefixTree.java#L462 > Don't know what to do with it... At least I'm going to update the doc. > > On Mon, Jan 14, 2019 at 4:42 PM Jeremy Smith wrote: > >> Hi Mikhail, thanks for the response. I'm probably missing something, but >> what makes 2000-11T13 contiguous and 2000T13 not contiguous? They seem >> pretty similar to me, but only the former is supported. >> >> >> Thanks, >> >> Jeremy >> >> >> From: Mikhail Khludnev >> Sent: Sunday, January 13, 2019 12:59:31 AM >> To: solr-user >> Subject: Re: DateRangeField requires month? >> >> Hello, Jeremy. >> >> See below. >> >> On Mon, Jan 7, 2019 at 5:09 PM Jeremy Smith wrote: >> >> > Hello, >> > >> > I am trying to use the DateRangeField and ran into an interesting >> > issue. According to the documentation ( >> > https://lucene.apache.org/solr/guide/7_6/working-with-dates.html), >> these >> > are both valid for the DateRangeField: 2000-11 and 2000-11T13. I can >> > confirm this is working in 7.6. I would also expect to be able to use >> > 2000T13, which would mean any time in the year 2000 between 1300 and >> 1400. >> >> >> Nope. This is not a range, but multiple ranges. DateRangeField supports >> contiguous ranges only. >> >> >> > However, I get an error when trying to insert this value: >> > >> > >> > "error":{"metadata": >> > >> > >> > >> ["error-class","org.apache.solr.common.SolrException","root-error-class","java.lang.NumberFormatException"], >> > >> > "msg":"ERROR: Error adding field 'dtRange'='2000T13' msg=Couldn't >> > parse date because: Improperly formatted date: 2000T13","code":400 >> > >> > } >> > >> > >> > I am using 7.6 with a super simple schema containing only _version_ and >> a >> > DateRangeField and there's nothing special in my solrconfig.xml. Is >> this >> > behavior expected? Should I open a jira issue? >> > >> > >> > Thanks, >> > >> > Jeremy >> > >> >> >> -- >> Sincerely yours >> Mikhail Khludnev >> > > > -- > Sincerely yours > Mikhail Khludnev > -- Sincerely yours Mikhail Khludnev
Re: join query and new searcher on joined collection
collection1 has no idea about new searcher in collection2. On Tue, Jan 15, 2019 at 1:18 PM Vadim Ivanov < vadim.iva...@spb.ntk-intourist.ru> wrote: > Sory, I've sent unfinished message > So, query on collection1 > q=*:*{!join score=none from=id fromIndex=collection2 to=field1}*:* > > The question is what happened with autowarming and new searchers on > collection1 when new searcher starts on collection2? > IMHO when request with join comes it's impossible to use caches on > collection1 and ... > Does new searcher starts on collection1 as well? > > > > -Original Message- > > From: Vadim Ivanov [mailto:vadim.iva...@spb.ntk-intourist.ru] > > Sent: Tuesday, January 15, 2019 1:00 PM > > To: solr-user@lucene.apache.org > > Subject: join query and new searcher on joined collection > > > > Solr 6.3 > > > > > > > > I have a query like this: > > > > q=*:*{!join score=none from=id fromIndex=hss_4 to=rpk_hdquotes v=$qq}*:* > > > > > > > > -- > > > > Vadim > > > > > > > -- Sincerely yours Mikhail Khludnev
Re: DateRangeField requires month?
Thanks Mikhail, I think the change you proposed to the documentation will be helpful to avoid this confusion. From: Mikhail Khludnev Sent: Tuesday, January 15, 2019 8:47:17 AM To: solr-user Subject: Re: DateRangeField requires month? Follow up https://issues.apache.org/jira/browse/SOLR-13139 On Tue, Jan 15, 2019 at 2:46 PM Mikhail Khludnev wrote: > I did some testing by tweaking DateRangeFieldTest and witness that > 2000-11T13 is parsed as 2000-11-13 see > > https://github.com/apache/lucene-solr/blob/f083473b891e596def2877b5429fcfa6db175464/lucene/spatial-extras/src/java/org/apache/lucene/spatial/prefix/tree/DateRangePrefixTree.java#L462 > Don't know what to do with it... At least I'm going to update the doc. > > On Mon, Jan 14, 2019 at 4:42 PM Jeremy Smith wrote: > >> Hi Mikhail, thanks for the response. I'm probably missing something, but >> what makes 2000-11T13 contiguous and 2000T13 not contiguous? They seem >> pretty similar to me, but only the former is supported. >> >> >> Thanks, >> >> Jeremy >> >> >> From: Mikhail Khludnev >> Sent: Sunday, January 13, 2019 12:59:31 AM >> To: solr-user >> Subject: Re: DateRangeField requires month? >> >> Hello, Jeremy. >> >> See below. >> >> On Mon, Jan 7, 2019 at 5:09 PM Jeremy Smith wrote: >> >> > Hello, >> > >> > I am trying to use the DateRangeField and ran into an interesting >> > issue. According to the documentation ( >> > https://lucene.apache.org/solr/guide/7_6/working-with-dates.html), >> these >> > are both valid for the DateRangeField: 2000-11 and 2000-11T13. I can >> > confirm this is working in 7.6. I would also expect to be able to use >> > 2000T13, which would mean any time in the year 2000 between 1300 and >> 1400. >> >> >> Nope. This is not a range, but multiple ranges. DateRangeField supports >> contiguous ranges only. >> >> >> > However, I get an error when trying to insert this value: >> > >> > >> > "error":{"metadata": >> > >> > >> > >> ["error-class","org.apache.solr.common.SolrException","root-error-class","java.lang.NumberFormatException"], >> > >> > "msg":"ERROR: Error adding field 'dtRange'='2000T13' msg=Couldn't >> > parse date because: Improperly formatted date: 2000T13","code":400 >> > >> > } >> > >> > >> > I am using 7.6 with a super simple schema containing only _version_ and >> a >> > DateRangeField and there's nothing special in my solrconfig.xml. Is >> this >> > behavior expected? Should I open a jira issue? >> > >> > >> > Thanks, >> > >> > Jeremy >> > >> >> >> -- >> Sincerely yours >> Mikhail Khludnev >> > > > -- > Sincerely yours > Mikhail Khludnev > -- Sincerely yours Mikhail Khludnev
RE: join query and new searcher on joined collection
Thanks, Mikhail, for the reply.

> collection1 has no idea about new searcher in collection2.

I suspected it. :)

So, when a "join" query arrives, the searcher on collection1 has no chance to use the filter cache entries stored before. I suppose it invalidates the filter cache, am I right?

&fq={!join score=none from=id fromIndex=collection2 to=field1}*:*

> On Tue, Jan 15, 2019 at 1:18 PM Vadim Ivanov < > vadim.iva...@spb.ntk-intourist.ru> wrote: > > > Sory, I've sent unfinished message > > So, query on collection1 > > q=*:*{!join score=none from=id fromIndex=collection2 to=field1}*:* > > > > The question is what happened with autowarming and new searchers on > > collection1 when new searcher starts on collection2? > > IMHO when request with join comes it's impossible to use caches on > > collection1 and ... > > Does new searcher starts on collection1 as well? > > > > > > > -Original Message- > > > From: Vadim Ivanov [mailto:vadim.iva...@spb.ntk-intourist.ru] > > > Sent: Tuesday, January 15, 2019 1:00 PM > > > To: solr-user@lucene.apache.org > > > Subject: join query and new searcher on joined collection > > > > > > Solr 6.3 > > > > > > > > > > > > I have a query like this: > > > > > > q=*:*{!join score=none from=id fromIndex=hss_4 to=rpk_hdquotes > v=$qq}*:* > > > > > > > > > > > > -- > > > > > > Vadim > > > > > > > > > > > > > > -- > Sincerely yours > Mikhail Khludnev
Re: join query and new searcher on joined collection
It doesn't invalidate anything. The new join query just doesn't match the entry cached for the older collection2 searcher; see https://github.com/apache/lucene-solr/blob/b7f99fe55a6fb6e7b38828676750b3512d6899a1/solr/core/src/java/org/apache/solr/search/JoinQParserPlugin.java#L570

So, after a commit on collection2, the following join on collection1 just won't hit the filter cache; it will be cached as a new entry, and the old entry will eventually be evicted.

On Tue, Jan 15, 2019 at 5:30 PM Vadim Ivanov < vadim.iva...@spb.ntk-intourist.ru> wrote: > Thanx, Mikhail for reply > > collection1 has no idea about new searcher in collection2. > I suspected it. :) > > So, when "join" query arrives searcher on collection1 has no chance to use > filter cache, stored before. > I suppose it invalidates filter cache, am I right? > > &fq={!join score=none from=id fromIndex=collection2 to=field1}*:* > > > On Tue, Jan 15, 2019 at 1:18 PM Vadim Ivanov < > > vadim.iva...@spb.ntk-intourist.ru> wrote: > > > > > Sory, I've sent unfinished message > > > So, query on collection1 > > > q=*:*{!join score=none from=id fromIndex=collection2 to=field1}*:* > > > > > > The question is what happened with autowarming and new searchers on > > > collection1 when new searcher starts on collection2? > > > IMHO when request with join comes it's impossible to use caches on > > > collection1 and ... > > > Does new searcher starts on collection1 as well? > > > > > > > > > > -Original Message- > > > > From: Vadim Ivanov [mailto:vadim.iva...@spb.ntk-intourist.ru] > > > > Sent: Tuesday, January 15, 2019 1:00 PM > > > > To: solr-user@lucene.apache.org > > > > Subject: join query and new searcher on joined collection > > > > > > > > Solr 6.3 > > > > > > > > > > > > > > > > I have a query like this: > > > > > > > > q=*:*{!join score=none from=id fromIndex=hss_4 to=rpk_hdquotes > > v=$qq}*:* > > > > > > > > > > > > > > > > -- > > > > > > > > Vadim > > > > > > > > > > > > > > > > > > > > > -- > > Sincerely yours > > Mikhail Khludnev > > -- Sincerely yours Mikhail Khludnev
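A quick way to observe this from the outside, if anyone wants to check (the cache statistic names below come from the admin UI / MBeans stats and may differ slightly by version): send the same join filter twice against collection1, e.g.

  &fq={!join score=none from=id fromIndex=collection2 to=field1}*:*

and watch cumulative_lookups / cumulative_hits / cumulative_inserts for the filterCache on collection1. The second request should be a hit; after a commit opens a new searcher on collection2, the same fq shows up as a lookup plus a fresh insert rather than a hit, exactly as described above.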
Re: Delayed/waiting requests
Well, it was a nice theory anyway. "Other collections with the same settings" doesn't really mean much unless those other collections are very similar, especially in terms of numbers of docs. You should only see a new searcher opening when you do a hard-commit-with-opensearcher-true or soft commit. So what happens when you just try lowering the autowarm count? I'm assuming you're free to test in some non-prod system. Focusing on the hit ratio is something of a red herring. Remember that each entry in your filterCache is roughly maxDoc/8 + a little overhead, the increase in GC pressure has to be balanced against getting the hits from the cache. Now, all that said if there's no correlation, then you need to put a profiler on the system when you see this kind of thing and find out where the hotspots are, otherwise it's guesswork and I'm out of ideas. Best, Erick On Tue, Jan 15, 2019 at 12:06 AM Gael Jourdan-Weil wrote: > > Hi Erick, > > > Thank you for your detailed answer, I better understand autowarming. > > > We have an autowarming time of ~10s for filterCache (queryResultCache is not > used at all, ratio = 0.02). > > We increased the size of the filterCache from 6k to 12k (and autowarming size > set to same values) to have a better ratio which is _only_ around 0.85/0.90. > > > The thing I don't understand is I should see "Opening new searcher" in the > logs everytime a new searcher is opened and thus an autowarming happens, > right? > > But I don't see "Opening new searcher" very often, and I don't see it being > correlated with the response time peaks. > > > Also, I didn't mention it earlier but, we have other SolrCloud clusters with > similar settings and load (~10s filterCache autowarming, 10k entries) and we > don't observe the same behavior. > > > Regards, > > > De : Erick Erickson > Envoyé : lundi 14 janvier 2019 17:44:38 > À : solr-user > Objet : Re: Delayed/waiting requests > > Gael: > > bq. Nevertheless, our filterCache is set to autowarm 12k entries which > is also the maxSize > > That is far, far, far too many. Let's assume you actually have 12K > entries in the filterCache. > Every time you open a new searcher, 12K queries are executed _before_ > the searcher > accepts any new requests. While being able to re-use a filterCache > entry is useful, one of > the primary purposes is to pre-load index data from disk into memory > which can be > the event that takes the most time. > > The queryResultCache has a similar function. I often find that this > cache doesn't have a > very high hit ratio, but again executing a _few_ of these queries > warms the index from > disk. > > I think of both caches as a map, where the key is the "thing", (fq > clause in the case > of filterCache, the whole query in the case of the queryResultCache). > Autowarming > replays the most recently executed N of these entries, essentially > just as though > they were submitted by a user. > > Hypothesis: You're massively over-warming, and when that kicks in you're > seeing > increased CPU and GC pressure leading to the anomalies you're seeing. Further, > you have such excessive autowarming going on that it's hard to see the > associated messages in the log. > > Here's what I'd recommend: Set your autowarm counts to something on the order > of 16. If the culprit is just excessive autowarming, I'd expect your spikes to > be much less severe. It _might_ be that your users see some increased (very > temporary) variance in response time. 
You can tell that the autowarming > configurations are "more art than science", I can't give you any other > recommendations than "start small and increase until you're happy" > unfortunately. > > I usually do this with some kind of load tester in a dev lab of course ;). > > Finally, if you use the metrics data (see: > https://lucene.apache.org/solr/guide/7_1/metrics-reporting.html) > you can see the autowarm times. Don't get too lost in the page to > start, just hit the "http://localhost:8983/solr/admin/metrics"; endpoint > and look for "warmupTime", then refine on how to get _only_ > the warmup stats ;). > > Best, > Erick > > On Mon, Jan 14, 2019 at 5:08 AM Gael Jourdan-Weil > wrote: > > > > I had a look to GC logs this morning but I'm not sure how to interpret them. > > > > > > Over a period of 54mn, there is: > > > > - Number of pauses: 2739 > > > > - Accumulated pauses: 93s => that is 2.86% of the time > > > > - Average pause duration: 0.03s > > > > - Average pause interval: 1.18s > > > > - Accumulated full GC: 0 > > > > I'm not sure if this is a lot or not. What do you think ? > > > > > > Looking more closely to GC logs with GC Viewer, I can notice that the high > > response time peaks happens at the same time where GC pauses takes 2x more > > time (around 0.06s) than average. > > > > > > Also we are indeed indexing at the same time but we have autowarming set. > > > > I don't see any Searcher opened at the time we experience slowness. > >
Re: Re: Delayed/waiting requests
Hi Gael – Could you share this information? Size of the index Server memory available Server CPU count JVM memory settings You mentioned a cloud configuration of 3 replicas. Does that mean you have 1 shard with a replication factor of 3? Do the pauses occur on all 3 servers? Is the traffic evenly balanced across those servers? Jeremy Branham jb...@allstate.com On 1/15/19, 9:50 AM, "Erick Erickson" wrote: Well, it was a nice theory anyway. "Other collections with the same settings" doesn't really mean much unless those other collections are very similar, especially in terms of numbers of docs. You should only see a new searcher opening when you do a hard-commit-with-opensearcher-true or soft commit. So what happens when you just try lowering the autowarm count? I'm assuming you're free to test in some non-prod system. Focusing on the hit ratio is something of a red herring. Remember that each entry in your filterCache is roughly maxDoc/8 + a little overhead, the increase in GC pressure has to be balanced against getting the hits from the cache. Now, all that said if there's no correlation, then you need to put a profiler on the system when you see this kind of thing and find out where the hotspots are, otherwise it's guesswork and I'm out of ideas. Best, Erick On Tue, Jan 15, 2019 at 12:06 AM Gael Jourdan-Weil wrote: > > Hi Erick, > > > Thank you for your detailed answer, I better understand autowarming. > > > We have an autowarming time of ~10s for filterCache (queryResultCache is not used at all, ratio = 0.02). > > We increased the size of the filterCache from 6k to 12k (and autowarming size set to same values) to have a better ratio which is _only_ around 0.85/0.90. > > > The thing I don't understand is I should see "Opening new searcher" in the logs everytime a new searcher is opened and thus an autowarming happens, right? > > But I don't see "Opening new searcher" very often, and I don't see it being correlated with the response time peaks. > > > Also, I didn't mention it earlier but, we have other SolrCloud clusters with similar settings and load (~10s filterCache autowarming, 10k entries) and we don't observe the same behavior. > > > Regards, > > > De : Erick Erickson > Envoyé : lundi 14 janvier 2019 17:44:38 > À : solr-user > Objet : Re: Delayed/waiting requests > > Gael: > > bq. Nevertheless, our filterCache is set to autowarm 12k entries which > is also the maxSize > > That is far, far, far too many. Let's assume you actually have 12K > entries in the filterCache. > Every time you open a new searcher, 12K queries are executed _before_ > the searcher > accepts any new requests. While being able to re-use a filterCache > entry is useful, one of > the primary purposes is to pre-load index data from disk into memory > which can be > the event that takes the most time. > > The queryResultCache has a similar function. I often find that this > cache doesn't have a > very high hit ratio, but again executing a _few_ of these queries > warms the index from > disk. > > I think of both caches as a map, where the key is the "thing", (fq > clause in the case > of filterCache, the whole query in the case of the queryResultCache). > Autowarming > replays the most recently executed N of these entries, essentially > just as though > they were submitted by a user. > > Hypothesis: You're massively over-warming, and when that kicks in you're seeing > increased CPU and GC pressure leading to the anomalies you're seeing. 
Further, > you have such excessive autowarming going on that it's hard to see the > associated messages in the log. > > Here's what I'd recommend: Set your autowarm counts to something on the order > of 16. If the culprit is just excessive autowarming, I'd expect your spikes to > be much less severe. It _might_ be that your users see some increased (very > temporary) variance in response time. You can tell that the autowarming > configurations are "more art than science", I can't give you any other > recommendations than "start small and increase until you're happy" > unfortunately. > > I usually do this with some kind of load tester in a dev lab of course ;). > > Finally, if you use the metrics data (see: > https://urldefense.proofpoint.com/v2/url?u=https-3A__lucene.apache.org_solr_guide_7-5F1_metrics-2Dreporting.html&d=DwIFaQ&c=gtIjdLs6LnStUpy9cTOW9w&r=0SwsmPELGv6GC1_5JSQ9T7ZPMLljrIkbF_2jBCrKXI0&m=h6jTb9n4NnmdKzYWrvtmR4Hx9AKJvlxPH538vyXpE30&s=9BWTVr32mplsfAWQ3hnWuVx5V1cL_RgLNDDpg8S2mtk&e=) > you can see the autowarm times. Don't get
RE: Re: Delayed/waiting requests
@Erick: We will try to lower the autowarm and run some tests to compare. If I get your point, having a big cache might cause more troubles than help if the cache hit ratio is not high enough because the cache is constantly evicting/inserting entries? @Jeremy: Index size: ~20G and ~14M documents Server memory available: 256G from which ~30G used and ~100G system cache Server CPU count: 32, ~10% usage JVM memory settings: -Xms12G -Xmx12G We have 3 servers and 3 clusters of 3 Solr instances. That is each server hosts 1 Solr instance for each cluster. And, indeed, each cluster only has 1 shard with replication factor 3. Among all these Solr instances, the pauses are observed on only one single cluster but on every server at different times (sometimes on all servers at the same time but I would say it's very rare). We do observe the traffic is evenly balanced across the 3 servers, around 30-40 queries per second sent to each server. Regards, Gaël De : Branham, Jeremy (Experis) Envoyé : mardi 15 janvier 2019 17:59:56 À : solr-user@lucene.apache.org Objet : Re: Re: Delayed/waiting requests Hi Gael – Could you share this information? Size of the index Server memory available Server CPU count JVM memory settings You mentioned a cloud configuration of 3 replicas. Does that mean you have 1 shard with a replication factor of 3? Do the pauses occur on all 3 servers? Is the traffic evenly balanced across those servers? Jeremy Branham jb...@allstate.com On 1/15/19, 9:50 AM, "Erick Erickson" wrote: Well, it was a nice theory anyway. "Other collections with the same settings" doesn't really mean much unless those other collections are very similar, especially in terms of numbers of docs. You should only see a new searcher opening when you do a hard-commit-with-opensearcher-true or soft commit. So what happens when you just try lowering the autowarm count? I'm assuming you're free to test in some non-prod system. Focusing on the hit ratio is something of a red herring. Remember that each entry in your filterCache is roughly maxDoc/8 + a little overhead, the increase in GC pressure has to be balanced against getting the hits from the cache. Now, all that said if there's no correlation, then you need to put a profiler on the system when you see this kind of thing and find out where the hotspots are, otherwise it's guesswork and I'm out of ideas. Best, Erick On Tue, Jan 15, 2019 at 12:06 AM Gael Jourdan-Weil wrote: > > Hi Erick, > > > Thank you for your detailed answer, I better understand autowarming. > > > We have an autowarming time of ~10s for filterCache (queryResultCache is not used at all, ratio = 0.02). > > We increased the size of the filterCache from 6k to 12k (and autowarming size set to same values) to have a better ratio which is _only_ around 0.85/0.90. > > > The thing I don't understand is I should see "Opening new searcher" in the logs everytime a new searcher is opened and thus an autowarming happens, right? > > But I don't see "Opening new searcher" very often, and I don't see it being correlated with the response time peaks. > > > Also, I didn't mention it earlier but, we have other SolrCloud clusters with similar settings and load (~10s filterCache autowarming, 10k entries) and we don't observe the same behavior. > > > Regards, > > > De : Erick Erickson > Envoyé : lundi 14 janvier 2019 17:44:38 > À : solr-user > Objet : Re: Delayed/waiting requests > > Gael: > > bq. Nevertheless, our filterCache is set to autowarm 12k entries which > is also the maxSize > > That is far, far, far too many. 
Let's assume you actually have 12K > entries in the filterCache. > Every time you open a new searcher, 12K queries are executed _before_ > the searcher > accepts any new requests. While being able to re-use a filterCache > entry is useful, one of > the primary purposes is to pre-load index data from disk into memory > which can be > the event that takes the most time. > > The queryResultCache has a similar function. I often find that this > cache doesn't have a > very high hit ratio, but again executing a _few_ of these queries > warms the index from > disk. > > I think of both caches as a map, where the key is the "thing", (fq > clause in the case > of filterCache, the whole query in the case of the queryResultCache). > Autowarming > replays the most recently executed N of these entries, essentially > just as though > they were submitted by a user. > > Hypothesis: You're massively over-warming, and when that kicks in you're seeing > increased CPU and GC pressure leading to the anomalies
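As a rough back-of-the-envelope check of those numbers against the maxDoc/8 estimate quoted earlier in the thread (real usage will be lower, since small filter results are stored as sorted int arrays rather than full bitsets):

  ~14,000,000 docs / 8 ≈ 1.75 MB per full-bitset filterCache entry
  12,000 entries x 1.75 MB ≈ 21 GB in the worst case, well beyond the 12 GB heap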
Can Solr 4.10 work with JDK11
I probably already know the answer to this, but I was still wondering.
Re: Re: Delayed/waiting requests
bq. If I get your point, having a big cache might cause more troubles than help if the cache hit ratio is not high enough because the cache is constantly evicting/inserting entries? Pretty much. Although there are nuances. Right now, you have a 12K autowarm count. That means your cache will eventually always contain 12K entries whether or not you ever use the last 11K! I'm simplifying a bit, but it grows like this. Let's say I start Solr. Initially it has no cache entries. Now I start both querying and indexing. For simplicity, say I have 100 _new_ fq clauses come in between each commit. The first commit will autowarm 100. The next will autowarm 200, then 300.. etc. Eventually this will grow to 12K. So your performance will start to vary depending on how long Solr has been running. Worse. it's not clear that you _ever_ re-use those clauses. One example: fq=date_field:[* TO NOW] NOW is really a Unix timestamp. So issuing the same fq 1 millisecond from the first one will not re-use the entry. In the worst case almost all of your autwarming is useless. It neither loads relevant index data into RAM nor is reusable. Even if you use "date math" to round to, say, a minute, if you run Solr long enough you'll still fill up with useless fq clauses. Best, Erick On Tue, Jan 15, 2019 at 9:33 AM Gael Jourdan-Weil wrote: > > @Erick: > > > We will try to lower the autowarm and run some tests to compare. > > If I get your point, having a big cache might cause more troubles than help > if the cache hit ratio is not high enough because the cache is constantly > evicting/inserting entries? > > > > @Jeremy: > > > Index size: ~20G and ~14M documents > > Server memory available: 256G from which ~30G used and ~100G system cache > > Server CPU count: 32, ~10% usage > > JVM memory settings: -Xms12G -Xmx12G > > > We have 3 servers and 3 clusters of 3 Solr instances. > > That is each server hosts 1 Solr instance for each cluster. > > And, indeed, each cluster only has 1 shard with replication factor 3. > > > Among all these Solr instances, the pauses are observed on only one single > cluster but on every server at different times (sometimes on all servers at > the same time but I would say it's very rare). > > We do observe the traffic is evenly balanced across the 3 servers, around > 30-40 queries per second sent to each server. > > > > Regards, > > Gaël > > > > De : Branham, Jeremy (Experis) > Envoyé : mardi 15 janvier 2019 17:59:56 > À : solr-user@lucene.apache.org > Objet : Re: Re: Delayed/waiting requests > > Hi Gael – > > Could you share this information? > Size of the index > Server memory available > Server CPU count > JVM memory settings > > You mentioned a cloud configuration of 3 replicas. > Does that mean you have 1 shard with a replication factor of 3? > Do the pauses occur on all 3 servers? > Is the traffic evenly balanced across those servers? > > > Jeremy Branham > jb...@allstate.com > > > On 1/15/19, 9:50 AM, "Erick Erickson" wrote: > > Well, it was a nice theory anyway. > > "Other collections with the same settings" > doesn't really mean much unless those other collections are very similar, > especially in terms of numbers of docs. > > You should only see a new searcher opening when you do a > hard-commit-with-opensearcher-true or soft commit. > > So what happens when you just try lowering the autowarm > count? I'm assuming you're free to test in some non-prod > system. > > Focusing on the hit ratio is something of a red herring. 
Remember > that each entry in your filterCache is roughly maxDoc/8 + a little > overhead, the increase in GC pressure has to be balanced > against getting the hits from the cache. > > Now, all that said if there's no correlation, then you need to put > a profiler on the system when you see this kind of thing and > find out where the hotspots are, otherwise it's guesswork and > I'm out of ideas. > > Best, > Erick > > On Tue, Jan 15, 2019 at 12:06 AM Gael Jourdan-Weil > wrote: > > > > Hi Erick, > > > > > > Thank you for your detailed answer, I better understand autowarming. > > > > > > We have an autowarming time of ~10s for filterCache (queryResultCache > is not used at all, ratio = 0.02). > > > > We increased the size of the filterCache from 6k to 12k (and > autowarming size set to same values) to have a better ratio which is _only_ > around 0.85/0.90. > > > > > > The thing I don't understand is I should see "Opening new searcher" in > the logs everytime a new searcher is opened and thus an autowarming happens, > right? > > > > But I don't see "Opening new searcher" very often, and I don't see it > being correlated with the response time peaks. > > > > > > Also, I didn't mention it earlier but, we have other SolrCloud clusters > with similar settings and load (~10s filterCache
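To make the date-math point concrete, a few illustrative fq variants (the field name is made up):

  fq=date_field:[* TO NOW]                 NOW resolves to the current millisecond, so this entry is essentially never re-used
  fq=date_field:[* TO NOW/DAY]             rounds down to midnight, so every query issued that day re-uses a single cache entry
  fq={!cache=false}date_field:[* TO NOW]   opts the clause out of the filterCache entirely when re-use is hopeless anyway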
RE: join query and new searcher on joined collection
I see, thank you very much! > -Original Message- > From: Mikhail Khludnev [mailto:m...@apache.org] > Sent: Tuesday, January 15, 2019 6:45 PM > To: solr-user > Subject: Re: join query and new searcher on joined collection > > It doesn't invalidate anything. It just doesn't matches to the join query > from older collection2 see > https://github.com/apache/lucene- > solr/blob/b7f99fe55a6fb6e7b38828676750b3512d6899a1/solr/core/src/java/o > rg/apache/solr/search/JoinQParserPlugin.java#L570 > So, after commit collection2 following join at collection1 just won't hit > filter cache, and will be cached as new entry and lately the old entry will > be evicted. > > On Tue, Jan 15, 2019 at 5:30 PM Vadim Ivanov < > vadim.iva...@spb.ntk-intourist.ru> wrote: > > > Thanx, Mikhail for reply > > > collection1 has no idea about new searcher in collection2. > > I suspected it. :) > > > > So, when "join" query arrives searcher on collection1 has no chance to use > > filter cache, stored before. > > I suppose it invalidates filter cache, am I right? > > > > &fq={!join score=none from=id fromIndex=collection2 to=field1}*:* > > > > > On Tue, Jan 15, 2019 at 1:18 PM Vadim Ivanov < > > > vadim.iva...@spb.ntk-intourist.ru> wrote: > > > > > > > Sory, I've sent unfinished message > > > > So, query on collection1 > > > > q=*:*{!join score=none from=id fromIndex=collection2 to=field1}*:* > > > > > > > > The question is what happened with autowarming and new searchers on > > > > collection1 when new searcher starts on collection2? > > > > IMHO when request with join comes it's impossible to use caches on > > > > collection1 and ... > > > > Does new searcher starts on collection1 as well? > > > > > > > > > > > > > -Original Message- > > > > > From: Vadim Ivanov [mailto:vadim.iva...@spb.ntk-intourist.ru] > > > > > Sent: Tuesday, January 15, 2019 1:00 PM > > > > > To: solr-user@lucene.apache.org > > > > > Subject: join query and new searcher on joined collection > > > > > > > > > > Solr 6.3 > > > > > > > > > > > > > > > > > > > > I have a query like this: > > > > > > > > > > q=*:*{!join score=none from=id fromIndex=hss_4 to=rpk_hdquotes > > > v=$qq}*:* > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > Vadim > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > Sincerely yours > > > Mikhail Khludnev > > > > > > -- > Sincerely yours > Mikhail Khludnev
Re: Can Solr 4.10 work with JDK11
Or let me rephrase the question: what is the minimum Solr version that is JDK11-compatible? On Tue, Jan 15, 2019 at 10:27 AM Pushkar Raste wrote: > I probably already know the answer for this but was still wondering. >
Re: Delayed/waiting requests
On 1/15/2019 10:33 AM, Gael Jourdan-Weil wrote: Index size: ~20G and ~14M documents Server memory available: 256G from which ~30G used and ~100G system cache Server CPU count: 32, ~10% usage JVM memory settings: -Xms12G -Xmx12G Can you create a process listing screenshot as described at this URL? You'll need to use a file sharing website to provide us with a URL to access the file. When done properly, the screenshot provides a lot of useful information. https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue It would be best if the screenshot is gathered when you're experiencing the problem. Thanks, Shawn
Re: Logging fails when starting Solr in Windows using solr.cmd
I faced the same issue as Jakob with solr-7.6.0, eclipse-2018-12 (4.10.0), Java 1.8.0_191. Solution: in the Eclipse run configuration "run-solr", remove "file:" from the argument -Dlog4j.configurationFile="file:${workspace_loc:solr-7.6.0}/solr/server/resources/log4j2.xml"
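Spelled out, the fix is just dropping the "file:" prefix from that VM argument (paths as in the run configuration above):

  before: -Dlog4j.configurationFile="file:${workspace_loc:solr-7.6.0}/solr/server/resources/log4j2.xml"
  after:  -Dlog4j.configurationFile="${workspace_loc:solr-7.6.0}/solr/server/resources/log4j2.xml"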