Re: SOLR uses too much CPU and GC is also weird on Windows server

Emir Arnautović Tue, 27 Oct 2020 07:54:03 -0700

Hi Jaan,
You can also check the sizes of the field* caches in the caches section of the
admin console. That will tell you if some field needs docValues=true.
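
If you prefer to script that check, here is a minimal Python sketch. It assumes the core exposes the /admin/mbeans handler and that the JSON response carries a flat "solr-mbeans" list alternating category names and bean dictionaries; the base URL and core name are placeholders, and the helper names are made up for this example. Treat the response layout as an assumption and compare it with what your own server returns.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def cache_stats_url(base_url, core):
    """Build the mbeans URL that asks for cache statistics as JSON."""
    params = urlencode({"cat": "CACHE", "stats": "true", "wt": "json"})
    return f"{base_url}/{core}/admin/mbeans?{params}"


def field_cache_stats(mbeans_response):
    """Return {cache name: stats} for caches whose name starts with 'field'."""
    beans = mbeans_response.get("solr-mbeans", [])
    found = {}
    # Assumed layout: [category, {bean name: bean}, category, {...}, ...]
    for category, entries in zip(beans[0::2], beans[1::2]):
        if category != "CACHE":
            continue
        for name, bean in entries.items():
            if name.startswith("field"):
                found[name] = bean.get("stats", {})
    return found


def fetch_field_cache_stats(base_url, core):
    """Fetch the statistics from a running Solr (needs network access)."""
    with urlopen(cache_stats_url(base_url, core)) as resp:
        return field_cache_stats(json.load(resp))


# Example call with placeholder values:
# fetch_field_cache_stats("http://localhost:8983/solr", "mycore")
```

Entries showing up in fieldCache after the problem starts would point at a field that is being uninverted on the heap.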


Regards,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 27 Oct 2020, at 14:36, Jaan Arjasepp <[email protected]> wrote:
> 
> Hi Erick,
> 
> Thanks for this information, I will look into it.
> Main changes were regarding parsing the results JSON got from solr, not the 
> queries or updates.
> 
> Jaan
> 
> P.S. configuration change about requestParser was not it.
> 
> 
> -----Original Message-----
> From: Erick Erickson <[email protected] 
> <mailto:[email protected]>> 
> Sent: 27 October 2020 15:03
> To: [email protected] <mailto:[email protected]>
> Subject: Re: SOLR uses too much CPU and GC is also weird on Windows server
> 
> Jaan:
> 
> The basic search uses an “inverted index”, which is basically a list of terms 
> and the documents they appear in, e.g.
> my - 1, 4, 9, 12
> dog - 4, 8, 10
> 
> So the word “my” appears in docs 1, 4, 9 and 12, and “dog” appears in 4, 8, 
> 10. Makes it easy to search for my AND dog for instance, obviously both 
> appear in doc 4.
> 
> But that’s a lousy structure for faceting, where you have a list of documents 
> and are trying to find the terms it has to count them up. For that, you want 
> to “uninvert” the above structure,
> 1 - my
> 4 - my dog
> 8 - dog
> 9 - my
> 10 - dog
> 12 - my
> 
> From there, it’s easy to say “count the distinct terms for docs 1 and 4 and 
> put them in a bucket”, giving facet counts like 
> 
> my (2)
> dog (1)
> 
> If docValues=true, then the second structure is built at index time and 
> occupies memory at run time out in MMapDirectory space, i.e. _not_ on the 
> heap. 
> 
> If docValues=false, the second structure is built _on_ the heap when it’s 
> needed, adding to GC, memory pressure, CPU utilization etc.
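
As a rough illustration of the two structures described above, here is a toy Python sketch. It only mirrors the my/dog example; it is not how Lucene stores either structure, and the function names are made up for this example.

```python
from collections import Counter, defaultdict

# Inverted index from the example: term -> IDs of the documents containing it.
inverted = {
    "my": [1, 4, 9, 12],
    "dog": [4, 8, 10],
}


def uninvert(index):
    """Build doc ID -> terms, the structure that faceting needs."""
    forward = defaultdict(list)
    for term, doc_ids in index.items():
        for doc_id in doc_ids:
            forward[doc_id].append(term)
    return dict(forward)


def facet_counts(forward, matching_docs):
    """Count how many of the matching documents contain each term."""
    counts = Counter()
    for doc_id in matching_docs:
        counts.update(forward.get(doc_id, []))
    return counts


forward = uninvert(inverted)
print(forward[4])                     # ['my', 'dog']
print(facet_counts(forward, [1, 4]))  # Counter({'my': 2, 'dog': 1})
```

With docValues=false, the equivalent of uninvert() runs on the heap at query time; with docValues=true, that work has already been done at index time.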
> 
> So one theory is that when you upgraded your system (and you did completely 
> rebuild your corpus, right?) you inadvertently changed the docValues property 
> for one or more fields that you facet, group, sort, or use function queries 
> on and Solr is doing all the extra work of uninverting the field that it 
> didn’t have to before.
> 
> To answer that, you need to go through your schema and ensure that 
> docValues=true is set for any field you facet, group, sort, or use function 
> queries on. If you do change this value, you need to blow away your index so 
> there are no segments and index all your documents again.
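
One way to go through the schema is a small script. The Python sketch below uses a made-up schema fragment (the field and type names are hypothetical) and assumes that a docValues attribute on a field overrides the one on its fieldType, and that a missing attribute means false. Check that last assumption against your schema version.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment; replace it with the contents of your schema.
SAMPLE_SCHEMA = """
<schema name="example" version="1.6">
  <fieldType name="string" class="solr.StrField" docValues="true"/>
  <fieldType name="text_general" class="solr.TextField"/>
  <fieldType name="plong_legacy" class="solr.LongPointField"/>
  <field name="id" type="string" indexed="true" stored="true"/>
  <field name="category" type="string" docValues="false"/>
  <field name="price" type="plong_legacy" indexed="true"/>
  <field name="title" type="text_general" indexed="true"/>
</schema>
"""


def fields_without_docvalues(schema_xml, fields_in_use):
    """List the fields in use whose effective docValues setting is not true."""
    root = ET.fromstring(schema_xml.strip())
    type_dv = {ft.get("name"): ft.get("docValues") for ft in root.iter("fieldType")}
    missing = []
    for field in root.iter("field"):
        name = field.get("name")
        if name not in fields_in_use:
            continue
        # Field-level attribute wins, then the fieldType; assume false otherwise.
        value = field.get("docValues") or type_dv.get(field.get("type")) or "false"
        if value.lower() != "true":
            missing.append(name)
    return missing


# Fields you facet, group, sort, or use function queries on:
print(fields_without_docvalues(SAMPLE_SCHEMA, {"id", "category", "price"}))
# ['category', 'price']
```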
> 
> But that theory has problems:
> 1> why should Solr run for a while and then go crazy? It’d have to be that
>    the query that triggers uninversion is uncommon.
> 2> docValues defaults to true for simple types in recent schemas. Perhaps
>    you pulled over an old definition from your former schema?
> 
> 
> One other thing: you mention a bit of custom code you needed to change. I 
> always try to investigate that first. Is it possible to
> 1> reproduce the problem on a non-prod system
> 2> see what happens if you take the custom code out?
> 
> Best,
> Erick
> 
> 
>> On Oct 27, 2020, at 4:42 AM, Emir Arnautović <[email protected]> 
>> wrote:
>> 
>> Hi Jaan,
>> It can be several things:
>> - caches
>>   - fieldCache/fieldValueCache - it can be that you are missing doc values
>>     on some fields that are used for faceting/sorting/functions and that
>>     uninverted field structures are eating your memory.
>>   - filterCache - you’ve changed the setting for filter caches and set it
>>     to some large value
>> - heavy queries
>>   - return a lot of documents
>>   - facet on high cardinality fields
>>   - deep pagination
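
To get a feel for the filterCache point, here is a back-of-the-envelope Python sketch. It assumes the worst case where every cached filter is held as a bitset with one bit per document (small result sets are usually stored more compactly), uses the Max Doc value reported in the original message, and picks a cache size of 512 purely as an illustration.

```python
import math


def bitset_bytes(max_doc):
    """Bytes for a bitset with one bit per document, stored in 64-bit words."""
    return math.ceil(max_doc / 64) * 8


def filter_cache_upper_bound(max_doc, cache_size):
    """Worst-case heap use if every cache entry is a full bitset."""
    return bitset_bytes(max_doc) * cache_size


MAX_DOC = 4_981_019   # "Max Doc" from the stats in the original message
CACHE_SIZE = 512      # assumed filterCache size, for illustration only

total = filter_cache_upper_bound(MAX_DOC, CACHE_SIZE)
print(f"{bitset_bytes(MAX_DOC)} bytes per entry, "
      f"{total / 2**20:.0f} MiB for {CACHE_SIZE} entries")
# 622632 bytes per entry, 304 MiB for 512 entries
```

Against a 2G heap, a few hundred MiB of cached filters is already significant, so a much larger cache size could explain the memory pressure on its own.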
>> 
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection Solr & 
>> Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 27 Oct 2020, at 08:48, Jaan Arjasepp <[email protected]> wrote:
>>> 
>>> Hello,
>>> 
>>> We have been using SOLR for quite some time. We used 6.0, and now we did a 
>>> little upgrade to our system and servers and started to use 8.6.1.
>>> We use it on Windows Server 2019.
>>> Java version is 11.
>>> We are basically using the default settings, except for giving SOLR 2G of 
>>> heap. It used 512MB, but it ran out of memory and stopped responding. Not 
>>> sure if that was the issue. The older version managed fine with 512MB.
>>> SOLR is not in cloud mode but in solo mode, as we use it internally and 
>>> it does not get too many requests or much indexing, actually.
>>> Document sizes are not big, I guess. We only use one core.
>>> Document stats are here:
>>> Num Docs: 3627341
>>> Max Doc: 4981019
>>> Heap Memory Usage: 434400
>>> Deleted Docs: 1353678
>>> Version: 15999036
>>> Segment Count: 30
>>> 
>>> The size of index is 2.66GB
>>> 
>>> While upgrading we had to modify one field and a bit of code that uses 
>>> it. That’s basically it. It works.
>>> If more information about the background of the system is needed, I am 
>>> happy to help.
>>> 
>>> 
>>> But now to the issue I am having.
>>> After SOLR is started, it works just fine for the first 40-60 minutes. CPU 
>>> is not high and heap usage seems normal. All is good, but then suddenly the 
>>> heap usage goes crazy, going up and down, up and down, and CPU usage rises 
>>> to 50-60%. I also noticed over the weekend, when there are no writes, that 
>>> the CPU remains low and decent. I can try it again this weekend to see if 
>>> and how this works out.
>>> It also seems to me that after 4-5 days of working like this it stops 
>>> responding, but that needs to be confirmed with more heap as well.
>>> 
>>> Heap memory usage via JMX and jconsole -> 
>>> https://drive.google.com/file/d/1Zo3B_xFsrrt-WRaxW-0A0QMXDNscXYih/view?usp=sharing
>>> As you can see, it starts off normal, but then goes crazy and it has been 
>>> like this overnight.
>>> 
>>> These are the overall monitoring graphs; as you can see, CPU is working 
>>> hard or hardly working. -> 
>>> https://drive.google.com/file/d/1_Gtz-Bi7LUrj8UZvKfmNMr-8gF_lM2Ra/view?usp=sharing
>>> 
>>> VM summary can be found here -> 
>>> https://drive.google.com/file/d/1FvdCz0N5pFG1fmX_5OQ2855MVkaL048w/view?usp=sharing
>>> 
>>> And finally, for a better and quicker overview of the SOLR executing 
>>> parameters that I have -> 
>>> https://drive.google.com/file/d/10VCtYDxflJcvb1aOoxt0u3Nb5JzTjrAI/view?usp=sharing
>>> 
>>> If you can point me to what I have to do to make it work, I would 
>>> appreciate it a lot.
>>> 
>>> Thank you in advance.
>>> 
>>> Best regards,
>>> Jaan
