Hello,

Another peculiarity here: our six-node cluster (2 shards / 3 replicas) goes crazy after a good part of the day has passed. It starts eating CPU for no good reason and its latency goes up. Grafana graphs show the problem really well.
After restarting 2 of the 6 nodes, there is also quite a difference in the VisualVM monitor views and in the VisualVM CPU sampler reports (sorted on self time (CPU)). The busy nodes are deeply red in o.a.h.impl.io.AbstractSessionInputBuffer.fillBuffer (as usual); the restarted nodes are not.

The real distinction between busy and calm nodes is that the busy nodes all have o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms() in second place, right after fillBuffer(). What are they doing?! Why? The calm nodes don't show this at all. Busy nodes all have o.a.l.codec stuff on top, restarted nodes don't.

So, actually, I don't have a clue! Any, any ideas?

Thanks,
Markus

Each replica is underpowered but performs really well after a restart (and JVM warmup): 4 CPUs, 900M heap, 8 GB RAM, maxDoc 2.8 million, index size 18 GB.