Re: Solr Searcher 100% Latency Spike

Erick Erickson Wed, 29 Jan 2020 16:25:14 -0800

Autowarming is significantly misunderstood. One of its purposes in “the bad 
old days” was to rebuild very expensive on-heap structures for searching, 
sorting, grouping, and function queries.


These are exactly what docValues are designed to make much, much faster.

If you are still using spinning disks, the other benefit of warming queries is 
to read the index off disk and into MMapDirectory space. SSDs make this much 
faster too.


I often see two common mistakes:
1> no autowarming
2> excessive autowarming

I usually recommend people start with autowarm counts in the 10-20 range.
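
As a rough sketch of that starting point in solrconfig.xml: the size and 
initialSize values below are illustrative placeholders, not numbers from this 
thread, and the cache class simply mirrors the one already quoted further 
down.

```xml
<!-- Sketch only: modest autowarming on the filterCache.
     size and initialSize are assumed placeholder values; tune them
     against your own hit ratio and heap. -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="16"/>
```

From there, raise or lower autowarmCount only if the change shows up in your 
measured response times.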

One implication of what you’ve said so far is that the additional 9 seconds 
your old autowarming took didn’t get you any benefit either, so putting it back 
isn’t indicated. I’m not quite clear why you say your memory footprint is 
lower; it’s unrelated to autowarming unless you also decreased your size 
parameter. If you’re saying that your reduced cache size hasn’t changed your 
95th percentile, I’d keep reducing it until it _did_ have a measurable effect.

The hit ratio is only loosely related to autowarming, so concentrating on 
autowarming as a way to improve the hit ratio is probably the wrong approach.

So the first thing I’d do is make very, very sure that all the fields I used 
for grouping/sorting/faceting/function operations are docValues. Second, I’d 
add a static warming query that ensures those fields get exercised, rather 
than relying on autowarming of the queryResultCache to happen to exercise 
those functions. NOTE: you don’t have to do all those operations on every 
field, just sorting on each field would suffice. NOTE: as of Solr 7.6, you can 
add “uninvertible=false” to your field types so that Solr refuses to uninvert 
a field on the heap, which quickly exposes any field that is missing 
docValues, see: SOLR-12962
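
A rough sketch of both steps, assuming two hypothetical fields named price and 
category and the pfloat/string type names from the default configset 
(substitute your own). The first fragment would live in the schema, the second 
in solrconfig.xml. The uninvertible attribute needs Solr 7.6 or later, and 
"false" is the setting that stops Solr from building on-heap structures for a 
field.

```xml
<!-- Schema fragment (sketch). Field and type names are hypothetical.
     docValues="true" keeps sort/facet/group data off the heap;
     uninvertible="false" stops Solr from uninverting the field on the heap. -->
<field name="price"    type="pfloat" indexed="true" stored="true"
       docValues="true" uninvertible="false"/>
<field name="category" type="string" indexed="true" stored="true"
       docValues="true" uninvertible="false"/>

<!-- solrconfig.xml fragment (sketch). One static warming query per field,
     each sorting on that field so its docValues are read when a new
     searcher opens. rows is 1 rather than 0 so the sort is actually run. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="sort">price asc</str><str name="rows">1</str></lst>
    <lst><str name="q">*:*</str><str name="sort">category asc</str><str name="rows">1</str></lst>
  </arr>
</listener>
```

The same queries can also be registered under event="firstSearcher" so they 
run at startup as well.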

And then I’d ask how much effort is smoothing out that kind of spike worth? You 
certainly see it with monitoring tools, but do users notice at all? If not, I 
wouldn’t spend all that much effort pursuing it…

Best,
Erick


> On Jan 29, 2020, at 4:48 PM, Karl Stoney 
> <karl.sto...@autotrader.co.uk.INVALID> wrote:
> 
> So interestingly, by tweaking my filter cache I've got the warming time down 
> to 1s (from 10!) and also reduced my memory footprint due to the smaller 
> cache size.
> 
> However, I still get these latency spikes (these changes have made no 
> difference to them).
> 
> So the theory about them being due to the warming being too intensive is 
> wrong.
> 
> I know the images didn't load btw, so when I say spike I mean p95 response 
> time going from 50ms to 100-120ms momentarily.
> ________________________________
> From: Walter Underwood <wun...@wunderwood.org>
> Sent: 29 January 2020 21:30
> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
> Subject: Re: Solr Searcher 100% Latency Spike
> 
> Looking at the log, that takes one or two seconds after a complete batch 
> reload (master/slave). So that is loading a cold index, all new files. This 
> is not a big index, about a half million book titles.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Jan 29, 2020, at 1:21 PM, Karl Stoney 
>> <karl.sto...@autotrader.co.uk.INVALID> wrote:
>> 
>> Out of curiosity, could you define "fast"?
>> I'm wondering what sort of figures people target their searcher warm time at
>> ________________________________
>> From: Walter Underwood <wun...@wunderwood.org>
>> Sent: 29 January 2020 21:13
>> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
>> Subject: Re: Solr Searcher 100% Latency Spike
>> 
>> I use a static set of warming queries, about 20 of them. That is fast and 
>> gets a decent amount of the index into file buffers. Your top queries won’t 
>> change much unless you have a news site or a seasonal business.
>> 
>> Like this:
>> 
>>   <listener event="newSearcher" class="solr.QuerySenderListener">
>>     <arr name="queries">
>>       <!-- Top non-numeric query words from August 2011 rush.
>>            Each <lst> is one warming query, so each word gets its own. -->
>>       <lst><str name="q">introduction</str></lst>
>>       <lst><str name="q">intermediate</str></lst>
>>       <lst><str name="q">fundamentals</str></lst>
>>       <lst><str name="q">understanding</str></lst>
>>       <lst><str name="q">introductory</str></lst>
>>       <lst><str name="q">precalculus</str></lst>
>>       <lst><str name="q">foundations</str></lst>
>>       <lst><str name="q">microeconomics</str></lst>
>>       <lst><str name="q">microbiology</str></lst>
>>       <lst><str name="q">macroeconomics</str></lst>
>>       <lst><str name="q">discovering</str></lst>
>>       <lst><str name="q">international</str></lst>
>>       <lst><str name="q">mathematics</str></lst>
>>       <lst><str name="q">organizational</str></lst>
>>       <lst><str name="q">criminology</str></lst>
>>       <lst><str name="q">developmental</str></lst>
>>       <lst><str name="q">engineering</str></lst>
>>     </arr>
>>   </listener>
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Jan 29, 2020, at 1:01 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>>> 
>>> On 1/29/2020 12:44 PM, Karl Stoney wrote:
>>>> Looking for a bit of support here.  When we soft commit (every 10 
>>>> minutes), we get a latency spike that means response times for Solr 
>>>> roughly double, as you can see in this screenshot:
>>> 
>>> Attachments almost never make it to the list.  We cannot see any of your 
>>> screenshots.
>>> 
>>>> They do correlate to filterCache warmup, which seems to take between 10s 
>>>> and 30s:
>>>> We don't have any other caches enabled, due to the high level of 
>>>> cardinality of the queries.
>>>> The spikes are specifically on /select
>>>> We have the following autowarm configuration for the filterCache:
>>>>       <filterCache class="solr.FastLRUCache"
>>>>                    size="8192"
>>>>                    initialSize="8192"
>>>>                    cleanupThread="true"
>>>>                    autowarmCount="900"/>
>>> 
>>> Autowarm, especially on filterCache, can be an extremely lengthy process.  
>>> What Solr must do in order to warm the cache here is execute up to 900 
>>> queries, sequentially, on the new index.  That can take a lot of time and 
>>> use a lot of resources like CPU and I/O.
>>> 
>>> In order to reduce the impact of cache warming, I had to reduce my own 
>>> autowarmCount on the filterCache to 4.
>>> 
>>> Thanks,
>>> Shawn
>> 
>> This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office: 1 
>> Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England 
>> No. 9439967). This email and any files transmitted with it are confidential 
>> and may be legally privileged, and intended solely for the use of the 
>> individual or entity to whom they are addressed. If you have received this 
>> email in error please notify the sender. This email message has been swept 
>> for the presence of computer viruses.
