Hello Jan, thanks for your reply!
I'm not very experienced with cache settings in Solr; this is the first time
I'm setting it up myself.
These are the settings I was able to find in our solrconfig.xml:
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="0"/>
<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="0"/>
<documentCache class="solr.LRUCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>
<cache name="perSegFilter"
       class="solr.search.LRUCache"
       size="10"
       initialSize="0"
       autowarmCount="10"
       regenerator="solr.NoOpRegenerator"/>
In the meantime, I'll investigate caching. Thanks again!
MATIAS LAINO | DIRECTOR OF PASSARE REMOTE DEVELOPMENT
[email protected] | +54 11-6357-2143
-----Original Message-----
From: Jan Høydahl <[email protected]>
Sent: Thursday, December 1, 2022 10:11 PM
To: [email protected]
Subject: Re: Very High CPU when indexing
What are your cache settings? Are you using autoWarmCount or explicit cache
warming? It could be a source of long commit times.
Jan
> 1. des. 2022 kl. 22:35 skrev Matias Laino <[email protected]>:
>
>
> I've tried multiple different autoSoftCommit and autoCommit
> configurations, and it always takes 2:30 to 3 minutes for new records to
> become available in search. CPU has been pretty good since I upgraded, and
> memory should be plenty unless I'm mistaken. I'm lost at this point.
>
> Any help will be really appreciated
>
> MATIAS LAINO | DIRECTOR OF PASSARE REMOTE DEVELOPMENT
> [email protected] | +54 11-6357-2143
>
>
> -----Original Message-----
> From: Matias Laino <[email protected]>
> Sent: Thursday, December 1, 2022 1:11 PM
> To: [email protected]
> Subject: RE: Very High CPU when indexing
>
> Hi Shawn, thanks again for the reply.
>
> I've tried increasing the memory to 32 GB, with a 16 GB heap and 8 cores,
> and even though I still see peaks of 300% CPU on the Solr process, it can
> handle it (Solr doesn't go down).
> However, I've tried several different configurations for autoCommit and
> autoSoftCommit, and results always take a few minutes to show up in search,
> which is really unacceptable for us. I'm not sure how to proceed now.
>
> I've looked at the cores, and for the collection I'm testing against right
> now, for example, I see these values:
>
> Core 1:
> Num Docs: 4806841
> Max Doc: 4845793
> Heap Memory Usage: 387392
> Core 2:
> Num Docs: 4810159
> Max Doc: 4849229
> Heap Memory Usage: 450008
>
> Other collections look fairly similar, except for this one:
>
> Preview Core 1:
> Num Docs: 5774937
> Max Doc: 5832482
> Heap Memory Usage: 407424
>
> Preview Core 2:
> Num Docs: 5774937
> Max Doc: 5833942
> Heap Memory Usage: 463632
>
> Preview Core 3:
> Num Docs: 5778245
> Max Doc: 5790174
> Heap Memory Usage: 480672
>
> For some reason, the "Preview" collection has 3 shards instead of 2 like it
> had before... maybe that could be related? The collection overview says 2
> shards and a replication factor of 2.
>
> As additional info, ZooKeeper is running on its own server, and Solr is the
> only thing running on its server, aside from some system processes.
>
> Thanks again!
>
> MATIAS LAINO | DIRECTOR OF PASSARE REMOTE DEVELOPMENT
> [email protected] | +54 11-6357-2143
>
>
> -----Original Message-----
> From: Shawn Heisey <[email protected]>
> Sent: Thursday, December 1, 2022 1:07 AM
> To: [email protected]
> Subject: Re: Very High CPU when indexing
>
> On 11/30/22 08:57, Matias Laino wrote:
>> Q: What is the total document count?
>> A: Based on the dashboard, it's Total #docs: 68.6 million per node (I'm
>> replicating the same data on both)
>
> Each core has a count. And here you can see what I was talking about with
> max doc compared to num docs.
>
> https://www.dropbox.com/s/jdgddn4ve5mluhr/core_doc_counts.png?dl=0
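To illustrate with the Core 1 numbers quoted elsewhere in this thread: the gap
between max doc and num docs is the count of deleted documents that segment
merging has not yet purged.

```python
# maxDoc counts every document in the index segments, including deletes;
# numDocs counts only live (searchable) documents.
max_doc = 4845793   # Core 1 "Max Doc" from the thread
num_docs = 4806841  # Core 1 "Num Docs" from the thread
deleted = max_doc - num_docs
print(deleted)  # 38952 deleted-but-not-yet-merged documents
```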
>
>> Q: but it would be great to have an on-disk size and document count
>> (max docs, not num docs) for each collection
>> A: I'm not sure where to get that from metrics; based on the cloud
>> dashboard, it says the following per shard replica:
>> preview_s1r2: 1.9 GB
>> preview_s2r11: 1.9 GB
>> preview_s2r6: 1.9 GB
>> staging-d_s1r1: 1.8 GB
>> staging-d_s2r4: 1.8 GB
>> staging-a_s1r1: 1.7 GB
>> staging-a_s2r4: 1.7 GB
>> staging-c_s2r5: 1.6 GB
>> staging-c_s1r2: 1.6 GB
>> pre-prod_s1r1: 1.6 GB
>> pre-prod_s2r4: 1.6 GB
>> staging-b_s1r2: 1.5 GB
>> staging-b_s2r5: 1.5 GB
>> That is replicated on the other node.
>
> So you've got 22GB of data, and assuming Solr is the only thing running on
> the machine, only about 8GB of memory to cache it (total RAM of 16GB minus
> 8GB for the Solr heap). I would hope for at least 12GB of cache for that,
> and more is always better; 8GB may not be enough. If you have other software
> running on the machine, there will be even less. Does ZK live on the same
> instance? If so, how much heap are you giving to that?
>
> Performance of a system is often perfectly fine up until some threshold, and
> once you throw just a little bit more data into the mix and it goes over that
> threshold, performance drops drastically. That is how a small increase can
> bring a system to its knees.
>
> If you can upgrade the instance to one with more memory, that might also
> help, but I do think that the biggest problem is the autoSoftCommit setting.
> If you really can't make it at least two minutes, which is the value I would
> use, then set it as high as you can. 10 to 30 seconds, maybe.
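In solrconfig.xml terms, that suggestion would look something like the sketch
below (maxTime is in milliseconds; the exact values are examples to adjust,
not a prescription):

```xml
<!-- Example only: two-minute soft commits for search visibility, hard
     commits every 60s with openSearcher=false so they stay cheap. -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>120000</maxTime>
</autoSoftCommit>
```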
>
> Thanks,
> Shawn
>