Re: Solr Searcher 100% Latency Spike

Erick Erickson Thu, 30 Jan 2020 04:48:46 -0800

I’ve seen those kinds of compounding problems. So, rather than guess as to what 
kinds of warming queries you need (and I think this is where you’ll get what 
you need) here’s how I’d approach it.


Look at your Solr logs for any query with QTime > 75. From what you’re saying, 
you should see a cluster of those at the 10 minute interval. That’ll give you a 
couple of clues as to what kinds of warming you need to do. Look particularly 
for anything that sorts, groups, facets or uses function queries. If you find 
any of those, cross-check in your schema whether docValues are enabled for the 
fields involved. While just warming will fix the immediate problem, there are 
a number of other gains if fields used for those operations are docValues=true. 
I’d still autowarm them to get them into memory, but building the expensive 
on-heap structures will be avoided. WARNING: if you change the docValues 
setting, you must re-index everything into an empty index, NOT just re-index 
all the docs into an existing index.
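
As a rough sketch of what that could look like (the field name "price", its 
"pfloat" type and the "*:*" query are placeholders I made up, not taken from 
your setup), a docValues field plus a static warming query that sorts on it 
might be:

```xml
<!-- Schema (managed-schema or schema.xml): hypothetical field used for sorting -->
<field name="price" type="pfloat" indexed="true" stored="true" docValues="true"/>

<!-- solrconfig.xml, inside the existing <query> section:
     static warming query that sorts on that field for every new searcher -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">price asc</str>
      <str name="rows">1</str>
    </lst>
  </arr>
</listener>
```

Add one <lst> per field you sort, group, facet or run functions on.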

The above requires that you be logging at INFO level. If you’re not, set 
<slowQueryThresholdMillis> to 75 or so. That will log all queries that take 
longer than 75 ms at WARN level. Then do the same analysis.
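
For reference, a minimal sketch of that setting, assuming you add it to the 
<query> section that is already in your solrconfig.xml rather than creating a 
second one:

```xml
<!-- solrconfig.xml -->
<query>
  <!-- Log any request that takes longer than 75 ms at WARN level -->
  <slowQueryThresholdMillis>75</slowQueryThresholdMillis>
</query>
```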

Good luck!
Erick

> On Jan 30, 2020, at 4:26 AM, Karl Stoney 
> <karl.sto...@autotrader.co.uk.INVALID> wrote:
> 
> Hey Erick,
> Firstly - thank you so much for your detailed response - it is really 
> appreciated!
> Unfortunately some of the context of my original message was lost because 
> the screenshots weren't there.
> The additional latency spike does absolutely result in a poor user experience 
> for us. Some of our legacy applications hit Solr quite a few times in order 
> to render the client experience, so the compound effect can take a search 
> result render from 500ms to 3-4 seconds for a chunk of our users every 10 
> minutes.
> 
> I know I'll never get this down to 0, I'm just striving to make what changes 
> are feasible without going down too much of a rabbit hole.  Please note I'm 
> relatively new to Solr and have inherited a legacy stack.
> 
> The memory footprint is lower because I also reduced the cache size, not just 
> the autowarm count.  The warmup time is now under 1 second, which I'm good 
> with.
> 
> I am working through the static warming queries today with one of the teams, 
> so hopefully that will also have an impact.
> 
> I will look at the docValues as well.
> 
> Thanks again
> Karl
> 
> 
> On 30/01/2020, 00:24, "Erick Erickson" <erickerick...@gmail.com> wrote:
> 
>    Autowarming is significantly misunderstood. One of its purposes in “the 
> bad old days” was to rebuild very expensive on-heap structures for 
> searching, sorting, grouping, and function queries.
> 
>    These are exactly what docValues are designed to make much, much faster.
> 
>    If you are still using spinning disks, the other benefit of warming 
> queries is to read the index off disk and into MMapDirectory space. SSDs make 
> this much faster too.
> 
> 
>    I often see two common mistakes:
>    1> no autowarming
>    2> excessive autowarming
> 
>    I usually recommend people start with autowarm counts in the 10-20 range.
> 
>    One implication of what you’ve said so far is that the additional 9 
> seconds your old autowarming took didn’t get you any benefit either, so 
> putting it back isn’t indicated. I’m not quite clear why you say your memory 
> footprint is lower, it’s unrelated to autowarming unless you also decreased 
> your size parameter. If you’re saying that your reduced cache size hasn’t 
> changed your 95th percentile, I’d keep reducing it until it _did_ have a 
> measurable effect.
> 
>    The hit ratio is only loosely related to autowarming. So focusing on 
> autowarming as a way to improve the hit ratio is probably the wrong focus.
> 
>    So the first thing I’d do is make very, very sure that all the fields I 
> used for grouping/sorting/faceting/function operations are docValues. Second, 
> a static warming query that ensured this, rather than relying on autowarming 
> of the queryResultCache to happen to exercise those functions, would be 
> another step. NOTE: you don’t have to do all those operations on every field, 
> just sorting on each field would suffice. NOTE: as of Solr 7.6, you can add 
> “uninvertible=false” to your field types to ensure that you have docValues 
> set (those operations then fail on a field without docValues instead of 
> quietly building the on-heap structures), see: SOLR-12962
> 
>    And then I’d ask how much effort is smoothing out that kind of spike 
> worth? You certainly see it with monitoring tools, but do users notice at 
> all? If not, I wouldn’t spend all that much effort pursuing it…
> 
>    Best,
>    Erick
> 
> 
>> On Jan 29, 2020, at 4:48 PM, Karl Stoney 
>> <karl.sto...@autotrader.co.uk.INVALID> wrote:
>> 
>> So interestingly tweaking my filter cache I've got the warming time down to 
>> 1s (from 10!) and also reduced my memory footprint due to the smaller cache 
>> size.
>> 
>> However, I still get these latency spikes (these changes have made no 
>> difference to them).
>> 
>> So the theory about them being due to the warming being too intensive is 
>> wrong.
>> 
>> I know the images didn't load btw so when I say spike I mean p95 response 
>> time going from 50ms to 100-120ms momentarily.
>> ________________________________
>> From: Walter Underwood <wun...@wunderwood.org>
>> Sent: 29 January 2020 21:30
>> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
>> Subject: Re: Solr Searcher 100% Latency Spike
>> 
>> Looking at the log, that takes one or two seconds after a complete batch 
>> reload (master/slave). So that is loading a cold index, all new files. This 
>> is not a big index, about a half million book titles.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Jan 29, 2020, at 1:21 PM, Karl Stoney 
>>> <karl.sto...@autotrader.co.uk.INVALID> wrote:
>>> 
>>> Out of curiosity, could you define "fast"?
>>> I'm wondering what sort of figures people target their searcher warm time at.
>>> ________________________________
>>> From: Walter Underwood <wun...@wunderwood.org>
>>> Sent: 29 January 2020 21:13
>>> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
>>> Subject: Re: Solr Searcher 100% Latency Spike
>>> 
>>> I use a static set of warming queries, about 20 of them. That is fast and 
>>> gets a decent amount of the index into file buffers. Your top queries won’t 
>>> change much unless you have a news site or a seasonal business.
>>> 
>>> Like this:
>>> 
>>>  <listener event="newSearcher" class="solr.QuerySenderListener">
>>>    <arr name="queries">
>>>      <!-- Top non-numeric query words from August 2011 rush -->
>>>      <!-- One <lst> per query: each <lst> is sent as its own request -->
>>>      <lst><str name="q">introduction</str></lst>
>>>      <lst><str name="q">intermediate</str></lst>
>>>      <lst><str name="q">fundamentals</str></lst>
>>>      <lst><str name="q">understanding</str></lst>
>>>      <lst><str name="q">introductory</str></lst>
>>>      <lst><str name="q">precalculus</str></lst>
>>>      <lst><str name="q">foundations</str></lst>
>>>      <lst><str name="q">microeconomics</str></lst>
>>>      <lst><str name="q">microbiology</str></lst>
>>>      <lst><str name="q">macroeconomics</str></lst>
>>>      <lst><str name="q">discovering</str></lst>
>>>      <lst><str name="q">international</str></lst>
>>>      <lst><str name="q">mathematics</str></lst>
>>>      <lst><str name="q">organizational</str></lst>
>>>      <lst><str name="q">criminology</str></lst>
>>>      <lst><str name="q">developmental</str></lst>
>>>      <lst><str name="q">engineering</str></lst>
>>>    </arr>
>>>  </listener>
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>>> On Jan 29, 2020, at 1:01 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>>>> 
>>>> On 1/29/2020 12:44 PM, Karl Stoney wrote:
>>>>> Looking for a bit of support here.  When we soft commit (every 10 
>>>>> minutes), we get a latency spike that means response times for Solr 
>>>>> roughly double, as you can see in this screenshot:
>>>> 
>>>> Attachments almost never make it to the list.  We cannot see any of your 
>>>> screenshots.
>>>> 
>>>>> They do correlate to filterCache warmup, which seems to take between 10s 
>>>>> and 30s:
>>>>> We don't have any other caches enabled, due to the high level of 
>>>>> cardinality of the queries.
>>>>> The spikes are specifically on /select
>>>>> We have the following autowarm configuration for the filterCache:
>>>>>      <filterCache class="solr.FastLRUCache"
>>>>>                   size="8192"
>>>>>                   initialSize="8192"
>>>>>                   cleanupThread="true"
>>>>>                   autowarmCount="900"/>
>>>> 
>>>> Autowarm, especially on filterCache, can be an extremely lengthy process.  
>>>> What Solr must do in order to warm the cache here is execute up to 900 
>>>> queries, sequentially, on the new index.  That can take a lot of time and 
>>>> use a lot of resources like CPU and I/O.
>>>> 
>>>> In order to reduce the impact of cache warming, I had to reduce my own 
>>>> autowarmCount on the filterCache to 4.
>>>> 
>>>> Thanks,
>>>> Shawn
>>> 
>>> This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office: 
>>> 1 Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England 
>>> No. 9439967). This email and any files transmitted with it are confidential 
>>> and may be legally privileged, and intended solely for the use of the 
>>> individual or entity to whom they are addressed. If you have received this 
>>> email in error please notify the sender. This email message has been swept 
>>> for the presence of computer viruses.
