Re: 6.6 cloud starting to eat CPU after 8+ hours

Mikhail Khludnev Wed, 19 Jul 2017 05:18:55 -0700

>
> The real distinction between busy and calm nodes is that busy nodes all
> have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms() as
> second to fillBuffer(), what are they doing?



Can you show more of the stack, i.e. the deeper frames?
Could the busy nodes have started syncing shards for some reason?
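
To illustrate what I mean by a deeper stack, here is a minimal sketch that prints the complete stack of every thread with a frame in a given class. It only sees threads of the JVM it runs in, so it assumes the code is executed inside the Solr node; for an already running node, `jstack <pid>` from the JDK gives the same information. The class name `DeepStacks` and the default filter are my own illustration, not anything from Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Solr or Lucene.
final class DeepStacks {

    /** True if any frame's class name contains the given fragment. */
    static boolean touches(StackTraceElement[] stack, String classFragment) {
        for (StackTraceElement frame : stack) {
            if (frame.getClassName().contains(classFragment)) {
                return true;
            }
        }
        return false;
    }

    /** Formats one thread's complete stack, top frame first. */
    static String format(String threadName, StackTraceElement[] stack) {
        StringBuilder sb = new StringBuilder("\"" + threadName + "\"\n");
        for (StackTraceElement frame : stack) {
            sb.append("    at ").append(frame).append('\n');
        }
        return sb.toString();
    }

    /** Full stacks of all live threads in this JVM that touch classFragment. */
    static List<String> dump(String classFragment) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            if (touches(e.getValue(), classFragment)) {
                out.add(format(e.getKey().getName(), e.getValue()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Default filter is the class reported by the CPU sampler.
        String fragment = args.length > 0 ? args[0] : "PerFieldPostingsFormat";
        dump(fragment).forEach(System.out::println);
    }
}
```

The frames below `terms()` in such a dump are the interesting part: they show who is calling into the postings reader, for example a query, a faceting request or a replication/recovery thread.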

On Wed, Jul 19, 2017 at 12:35 PM, Markus Jelsma <markus.jel...@openindex.io>
wrote:

> Hello,
>
> Another peculiarity here, our six node (2 shards / 3 replicas) cluster is
> going crazy after a good part of the day has passed. It starts eating CPU
> for no good reason and its latency goes up. Grafana graphs show the problem
> really well.
>
> After restarting 2/6 nodes, there is also quite a distinction in the
> VisualVM monitor views, and the VisualVM CPU sampler reports (sorted on
> self time (CPU)). The busy nodes are deeply red in
> o.a.h.impl.io.AbstractSessionInputBuffer.fillBuffer (as usual), the
> restarted nodes are not.
>
> The real distinction between busy and calm nodes is that busy nodes all
> have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms() as
> second to fillBuffer(), what are they doing?! Why? The calm nodes don't
> show this at all. Busy nodes all have o.a.l.codec stuff on top, restarted
> nodes don't.
>
> So, actually, I don't have a clue! Any, any ideas?
>
> Thanks,
> Markus
>
> Each replica is underpowered but performing really well after restart (and
> JVM warmup), 4 CPUs, 900M heap, 8 GB RAM, maxDoc 2.8 million, index size
> 18 GB.
>



-- 
Sincerely yours
Mikhail Khludnev
