Re: CPU hangs at LeapFrogScorer.advanceToNextDoc() under high load

Erick Erickson Sun, 10 Jul 2016 12:20:06 -0700

Not being able to reproduce this locally makes it tough. What I usually
do at that point is start looking at the environment.


> Are the JVMs identical?
> Are the memory settings comparable?
> Have you looked at GC activity? Sometimes what's really happening
   is that the method in question is triggering excessive time in
   GC. Shot in the dark....
> Did you pull down the identical index from prod locally? Or on a shard?
> Usually the first thing I'd do is take out my customizations, but on a
   prod system that's unlikely.
> Operating system comparable?
> GC settings comparable?
> When you say JMeter, I'm assuming you're using real user queries on
   data indexed as you do in prod. Personally, I'd just copy the
   index from one of the nodes that exhibits this problem.
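
To make the JVM, memory, OS and GC comparisons above concrete, here is a
minimal sketch that prints the relevant facts using only the standard
java.lang.management API. The class name EnvReport and its helper methods
are made up for illustration. Note that it reports on the JVM it runs in,
so the GC figures only describe Solr if the same MXBeans are read from the
Solr process (for example over JMX); run standalone with the same flags,
it is mainly useful for diffing the environment between prod and test.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

// Hypothetical helper, not part of Solr or Lucene.
public class EnvReport {

    // Total accumulated GC time in milliseconds across all collectors.
    // getCollectionTime() returns -1 when undefined, so those are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Share of JVM uptime spent in GC, as a percentage.
    static double gcPercent(long gcMillis, long uptimeMillis) {
        return uptimeMillis <= 0 ? 0.0 : 100.0 * gcMillis / uptimeMillis;
    }

    // One line per fact, so the output from two machines can be diffed.
    static String report() {
        RuntimeMXBean rt = ManagementFactory.getRuntimeMXBean();
        StringBuilder sb = new StringBuilder();
        String[] keys = {"java.version", "java.vm.name", "os.name", "os.version", "os.arch"};
        for (String key : keys) {
            sb.append(key).append(" = ").append(System.getProperty(key)).append('\n');
        }
        sb.append("cpus = ").append(Runtime.getRuntime().availableProcessors()).append('\n');
        sb.append("max heap MB = ")
          .append(Runtime.getRuntime().maxMemory() / (1024 * 1024)).append('\n');
        sb.append("jvm args = ").append(rt.getInputArguments()).append('\n');
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append("gc ").append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime()).append('\n');
        }
        sb.append(String.format("time in GC = %.2f%% of uptime%n",
                gcPercent(totalGcMillis(), rt.getUptime())));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Running this on a prod node and on the test box and diffing the two outputs
should show quickly whether the JVM, heap or collector settings differ. A
high "time in GC" percentage on the Solr JVM itself would support the GC
theory; a low one would point back at the query or the index.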

For the harsher tests (i.e. removing customizations) I've sometimes had
good results by mirroring the prod system (or a portion thereof) on any
kind of identical hardware I can lay my hands on and splitting the incoming
live traffic to my test system... where I can "just try stuff" without
impacting the prod traffic. Of course one _should_ be able to do that with
JMeter...

Good luck, these are the most frustrating types of problems.

Erick


On Sun, Jul 10, 2016 at 3:25 AM, Stefan Moises <moi...@shoptimax.de> wrote:

> Hi,
>
> We are experiencing problems on our live system. We use a single Solr
> server with 7 live cores, and as soon as there is some traffic on the
> website (Solr is used for filtering an e-commerce site with filters on
> category lists and of course for searching), all available CPUs (no matter
> how many we assign to the Solr node) go up to 100% and never go down again.
>
> I've stared at many thread dumps etc. over the last few days, and every
> time the most time-consuming thread (which seems to "hang" forever) is in
> Lucene's LeapFrogScorer.advanceToNextDoc() method. Here is a profiler
> snapshot taken when the CPU is at 100%:
>
> We are still on Solr 4.8 since we have some plugins extending the
> JoinQParser so that we can join child docs to parent docs to handle product
> variants in the shop. Therefore we also have our own DirectUpdateHandler
> plugin for indexing the documents, so that a parent doc and its
> variants/children are always added together in one block.
>
> Could that changed indexing cause the LeapFrogScorer to have a problem with
> calculating scores? Or does anybody have an idea what else might be causing
> this?
>
> Unfortunately it only happens on the live system. I can't reproduce it on
> my local test system, although I am emulating some example requests with a
> JMeter setup...
>
> Thanks for any hints!!
>
> Best regards,
>
> Stefan
>
> --
> ************************************
> Stefan Moises
> Manager Research & Development
> shoptimax GmbH
> Ulmenstraße 52 H
> 90443 Nürnberg
> Tel.: 0911/25566-0
> Fax: 0911/25566-29
> moises@shoptimax.de
> http://www.shoptimax.de
>
> Geschäftsführung: Friedrich Schreieck
> Ust.-IdNr.: DE 814340642
> Amtsgericht Nürnberg HRB 21703
>
> ************************************
>
>
