> The real distinction between busy and calm nodes is that busy nodes all
> have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms() as
> second to fillBuffer(), what are they doing?

Can you expose the stack deeper? Could they have started syncing shards for
some reason?

On Wed, Jul 19, 2017 at 12:35 PM, Markus Jelsma <markus.jel...@openindex.io>
wrote:

> Hello,
>
> Another peculiarity here: our six-node cluster (2 shards / 3 replicas) is
> going crazy after a good part of the day has passed. It starts eating CPU
> for no good reason and its latency goes up. Grafana graphs show the problem
> really well.
>
> After restarting 2/6 nodes, there is also quite a distinction in the
> VisualVM monitor views and in the VisualVM CPU sampler reports (sorted on
> self time (CPU)). The busy nodes are deeply red in
> o.a.h.impl.io.AbstractSessionInputBuffer.fillBuffer (as usual), the
> restarted nodes are not.
>
> The real distinction between busy and calm nodes is that busy nodes all
> have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms() as
> second to fillBuffer(), what are they doing?! Why? The calm nodes don't
> show this at all. Busy nodes all have o.a.l.codec stuff on top, restarted
> nodes don't.
>
> So, actually, I don't have a clue! Any, any ideas?
>
> Thanks,
> Markus
>
> Each replica is underpowered but performing really well after restart (and
> JVM warmup): 4 CPUs, 900M heap, 8 GB RAM, maxDoc 2.8 million, index size
> 18 GB.

--
Sincerely yours
Mikhail Khludnev
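
P.S. One possible way to get the deeper stack asked about above is to take a few thread dumps on a busy node with jstack and list which call chains lead into the terms() frame. The following is only a minimal sketch, assuming the usual HotSpot jstack text layout (a quoted thread name on the header line, then "at ..." frames, innermost first). The function name callers_of and the file name dump.txt are hypothetical, not part of any Solr or Lucene tooling.

```python
from collections import Counter


def callers_of(dump_text, needle, depth=5):
    """Count call chains (the matching frame plus up to `depth` callers)
    for every thread whose stack contains `needle`.

    dump_text: text of one or more jstack thread dumps.
    needle:    substring of the frame of interest, e.g. "FieldsReader.terms".
    """
    chains = Counter()

    def record(frames):
        # jstack prints the innermost frame first, so the callers of the
        # matching frame are the entries that follow it in the list.
        for i, frame in enumerate(frames):
            if needle in frame:
                chains[tuple(frames[i:i + 1 + depth])] += 1
                break

    frames = []
    for line in dump_text.splitlines():
        stripped = line.strip()
        if stripped.startswith('"'):
            # A quoted thread name starts a new thread: close the previous one.
            record(frames)
            frames = []
        elif stripped.startswith("at "):
            frames.append(stripped[3:])
        # Other lines (thread state, lock info, blanks) are ignored.
    record(frames)
    return chains
```

Usage would be along the lines of running `jstack <pid> > dump.txt` on a busy node and then calling `callers_of(open("dump.txt").read(), "FieldsReader.terms").most_common(3)`. Comparing the result with a freshly restarted node should show whether the extra terms() calls come from query handling or from something else, such as replication.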