Hello Jan, thanks for your reply!
I'm not very experienced with cache settings in Solr; this is the first time
I'm setting it up myself.
These are the settings I was able to find in our solrconfig.xml:
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="0"/>
<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="0"/>
<documentCache class="solr.LRUCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>
<cache name="perSegFilter"
       class="solr.search.LRUCache"
       size="10"
       initialSize="0"
       autowarmCount="10"
       regenerator="solr.NoOpRegenerator"/>
In the meantime, I'll investigate caching. Thanks again!
MATIAS LAINO | DIRECTOR OF PASSARE REMOTE DEVELOPMENT
[email protected] | +54 11-6357-2143
-----Original Message-----
From: Jan Høydahl <[email protected]>
Sent: Thursday, December 1, 2022 10:11 PM
To: [email protected]
Subject: Re: Very High CPU when indexing
What are your cache settings? Are you using autoWarmCount or explicit cache
warming? It could be a source of long commit times.
Jan
> 1. des. 2022 kl. 22:35 skrev Matias Laino <[email protected]>:
>
>
> I've tried multiple different autoSoftCommit and autoCommit
> configurations, and it always takes 2:30 to 3 minutes for new records to
> become available in search. CPU has been pretty good since I upgraded, and
> memory should be plenty unless I'm mistaken. I'm lost at this point.
>
> Any help will be really appreciated
>
> MATIAS LAINO | DIRECTOR OF PASSARE REMOTE DEVELOPMENT
> [email protected] | +54 11-6357-2143
>
>
> -----Original Message-----
> From: Matias Laino <[email protected]>
> Sent: Thursday, December 1, 2022 1:11 PM
> To: [email protected]
> Subject: RE: Very High CPU when indexing
>
> Hi Shawn, thanks again for the reply.
>
> I've tried increasing the memory to 32 GB, with a 16 GB heap and 8 cores,
> and even though I still see peaks of 300% CPU on the Solr process, it can
> handle it (Solr doesn't go down).
> However, I've tried several different configurations for autoCommit and
> autoSoftCommit, and results always take a few minutes to show up in search,
> which is really unacceptable for us. I'm not sure how to proceed now.
>
> I've looked at the cores, and for the collection I'm testing against right
> now, for example, I see these values:
>
> Core 1:
> Num Docs: 4806841
> Max Doc: 4845793
> Heap Memory Usage: 387392
> Core 2:
> Num Docs: 4810159
> Max Doc: 4849229
> Heap Memory Usage: 450008
>
> Other collections look fairly similar, except for this one:
>
> Preview Core 1:
> Num Docs: 5774937
> Max Doc: 5832482
> Heap Memory Usage: 407424
>
> Preview Core 2:
> Num Docs: 5774937
> Max Doc: 5833942
> Heap Memory Usage: 463632
>
> Preview Core 3:
> Num Docs: 5778245
> Max Doc: 5790174
> Heap Memory Usage: 480672
>
> For some reason, the "Preview" collection has 3 shards instead of 2 like it
> had before... maybe that could be related? The collection overview says 2
> shards and a replication factor of 2.
>
> As additional info, ZooKeeper is running on its own server, and Solr is the
> only thing running on its server, aside from some system processes.
>
> Thanks again!
>
> MATIAS LAINO | DIRECTOR OF PASSARE REMOTE DEVELOPMENT
> [email protected] | +54 11-6357-2143
>
>
> -----Original Message-----
> From: Shawn Heisey <[email protected]>
> Sent: Thursday, December 1, 2022 1:07 AM
> To: [email protected]
> Subject: Re: Very High CPU when indexing
>
> On 11/30/22 08:57, Matias Laino wrote:
>> Q: What is the total document count?
>> A: Based on the dashboard, it's Total #docs: 68.6 million per node (I'm
>> replicating the same data on both)
>
> Each core has a count. And here you can see what I was talking about with
> max doc compared to num docs.
>
> https://www.dropbox.com/s/jdgddn4ve5mluhr/core_doc_counts.png?dl=0
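To illustrate with the Core 1 numbers quoted elsewhere in this thread: the gap
between max doc and num docs is the count of deleted documents that segment
merging has not yet purged.

```python
# maxDoc counts every document in the index segments, including deletes;
# numDocs counts only live (searchable) documents.
max_doc = 4845793   # Core 1 "Max Doc" from the thread
num_docs = 4806841  # Core 1 "Num Docs" from the thread
deleted = max_doc - num_docs
print(deleted)  # 38952 deleted-but-not-yet-merged documents
```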
>
>> Q: but it would be great to have an on-disk size and document count
>> (max docs, not num docs) for each collection
>> A: I'm not sure where to get that from metrics; based on the cloud
>> dashboard, it says the following per shard replica:
>> preview_s1r2: 1.9 GB
>> preview_s2r11: 1.9 GB
>> preview_s2r6: 1.9 GB
>> staging-d_s1r1: 1.8 GB
>> staging-d_s2r4: 1.8 GB
>> staging-a_s1r1: 1.7 GB
>> staging-a_s2r4: 1.7 GB
>> staging-c_s2r5: 1.6 GB
>> staging-c_s1r2: 1.6 GB
>> pre-prod_s1r1: 1.6 GB
>> pre-prod_s2r4: 1.6 GB
>> staging-b_s1r2: 1.5 GB
>> staging-b_s2r5: 1.5 GB
>> That is replicated on the other node.
>
> So you've got 22GB of data, and assuming Solr is the only thing running on
> the machine, only about 8GB of memory to cache it (total RAM of 16GB minus
> 8GB for the Solr heap). I would hope for at least 12GB of cache for that,
> and more is always better; 8GB may not be enough. If you have other software
> running on the machine, there will be even less. Does ZK live on the same
> instance? If so, how much heap are you giving to that?
>
> Performance of a system is often perfectly fine up until some threshold, and
> once you throw just a little bit more data into the mix and it goes over that
> threshold, performance drops drastically. That is how a small increase can
> bring a system to its knees.
>
> If you can upgrade the instance to one with more memory, that might also
> help, but I do think that the biggest problem is the autoSoftCommit setting.
> If you really can't make it at least two minutes, which is the value I would
> use, then set it as high as you can. 10 to 30 seconds, maybe.
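In solrconfig.xml terms, that suggestion would look something like the sketch
below (maxTime is in milliseconds; the exact values are examples to adjust,
not a prescription):

```xml
<!-- Example only: two-minute soft commits for search visibility, hard
     commits every 60s with openSearcher=false so they stay cheap. -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>120000</maxTime>
</autoSoftCommit>
```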
>
> Thanks,
> Shawn
>