Hello,

Another peculiarity here: our six-node cluster (2 shards / 3 replicas) is 
going crazy after a good part of the day has passed. It starts eating CPU for 
no good reason and its latency goes up. Grafana graphs show the problem really 
well.

After restarting 2 of the 6 nodes, there is also quite a distinction in the 
VisualVM monitor views and in the VisualVM CPU sampler reports (sorted on self 
time (CPU)). The busy nodes are deeply red in 
o.a.h.impl.io.AbstractSessionInputBuffer.fillBuffer (as usual); the restarted 
nodes are not.

The real distinction between busy and calm nodes is that busy nodes all have 
o.a.l.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms() in second 
place, right behind fillBuffer(). What are they doing?! Why? The calm nodes 
don't show this at all. Busy nodes all have o.a.l.codecs stuff on top; 
restarted nodes don't.

So, actually, I don't have a clue! Any, any ideas?

Thanks,
Markus

Each replica is underpowered but performing really well after restart (and JVM 
warmup): 4 CPUs, 900M heap, 8 GB RAM, maxDoc 2.8 million, index size 18 GB.
