Re: Periodically 100% cpu and high load/IO

2020-06-11 Thread Marvin Bredal Lillehaug
This may have been a bunch of simultaneous queries, not index merging. Managed to get more thread dumps, and they all have several threads like > "qtp385739920-1050156" #1050156 prio=5 os_prio=0 cpu=38864.53ms > elapsed=7414.80s tid=0x7f62080eb000 nid=0x5f8a runnable > [0x7f611748] >
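The thread headers quoted above carry per-thread CPU time (`cpu=`), thread lifetime (`elapsed=`) and the native thread id (`nid=`). A minimal sketch, assuming Java 11 style thread-dump headers on a single line, of how two dumps taken a few seconds apart could be compared to see which threads burned CPU in between. The function names `parse_threads` and `busiest` are hypothetical helpers for illustration, not part of Solr or the JDK.

```python
import re

# Matches a Java 11 style thread header line, e.g.
# "qtp385739920-1050156" #1050156 prio=5 os_prio=0 cpu=38864.53ms elapsed=7414.80s tid=0x... nid=0x5f8a runnable [0x...]
HEADER = re.compile(
    r'^"(?P<name>[^"]+)".*?\bcpu=(?P<cpu_ms>[\d.]+)ms\s+elapsed=(?P<elapsed_s>[\d.]+)s'
    r'.*?\bnid=(?P<nid>0x[0-9a-fA-F]+)\s+(?P<state>[^\[\n]*)'
)

def parse_threads(dump_text):
    """Return {thread name: stats} for every header line found in a thread dump."""
    threads = {}
    for line in dump_text.splitlines():
        m = HEADER.match(line.strip())
        if m:
            threads[m.group("name")] = {
                "cpu_ms": float(m.group("cpu_ms")),
                "elapsed_s": float(m.group("elapsed_s")),
                # nid is hexadecimal; as a decimal it can be compared with
                # the thread ids shown by `top -H` on Linux.
                "nid": int(m.group("nid"), 16),
                "state": m.group("state").strip(),
            }
    return threads

def busiest(before, after, top=10):
    """Rank threads present in both dumps by CPU time consumed between them."""
    deltas = [
        (stats["cpu_ms"] - before[name]["cpu_ms"], name)
        for name, stats in after.items()
        if name in before
    ]
    return sorted(deltas, reverse=True)[:top]
```

Comparing two dumps matters because `cpu=` is cumulative: a single dump only shows the total since the thread started, not what is busy right now.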

Re: Periodically 100% cpu and high load/IO

2020-06-07 Thread Marvin Bredal Lillehaug
We have an upgrade to 8.5.2 on its way to production, so we'll see. We are running with the default merge config, and based on the description at https://lucene.apache.org/solr/guide/8_5/taking-solr-to-production.html#dynamic-defaults-for-concurrentmergescheduler I don't understand why all CPUs are maxed.
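For reference, the dynamic defaults described on that page can be written out as arithmetic. This is a sketch of my reading of the documented rule, not Lucene's actual code; the function name `cms_dynamic_defaults` and its parameters are made up for illustration.

```python
def cms_dynamic_defaults(cpu_cores, spinning_disk=False):
    """Approximate the documented ConcurrentMergeScheduler dynamic defaults.

    Assumption: spinning disks get a single merge thread; otherwise the
    thread count is half the cores, capped at 4 and never below 1.
    """
    if spinning_disk:
        max_thread_count = 1
    else:
        max_thread_count = max(1, min(4, cpu_cores // 2))
    # The merge backlog limit sits a fixed 5 above the thread count.
    max_merge_count = max_thread_count + 5
    return max_thread_count, max_merge_count
```

Under this reading, a single index would run at most four merge threads however many cores the machine has, which is why merging alone does not obviously explain every core being saturated.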

Re: Periodically 100% cpu and high load/IO

2020-06-07 Thread Phill Campbell
Can you switch to 8.5.2 and see if it still happens? In my testing of 8.5.1 I had one of my machines get really hot and bring the entire system to a crawl. What seemed to cause my issue was memory usage. I could give the JVM running Solr less heap and the problem wouldn’t manifest. I haven’t seen

Re: Periodically 100% cpu and high load/IO

2020-06-03 Thread Marvin Bredal Lillehaug
Yes, there is light/moderate indexing most of the time. The setup has NRT replicas, and the shards are around 45GB each. Index merging has been the hypothesis for some time, but we haven't dared to activate info stream logging. On Wed, Jun 3, 2020 at 2:34 PM Erick Erickson wrote: > One possibil

Re: Periodically 100% cpu and high load/IO

2020-06-03 Thread Erick Erickson
One possibility is merging index segments. When this happens, are you actively indexing? And are these NRT replicas or TLOG/PULL? If the latter, are your TLOG leaders on the affected machines? Best, Erick > On Jun 3, 2020, at 3:57 AM, Marvin Bredal Lillehaug > wrote: > > Hi, > We have a clus

Periodically 100% cpu and high load/IO

2020-06-03 Thread Marvin Bredal Lillehaug
Hi, We have a cluster with five Solr (8.5.1, Java 11) nodes, and sometimes one or two nodes have Solr running at 100% CPU on all cores, «load» over 400, and high IO. It usually lasts five to ten minutes, and the node is hardly responding. Does anyone have any experience with this type of behaviour?