Production Issue: SOLR node becomes non-responsive, restart not helping at peak hours

Doss Thu, 05 Sep 2019 04:09:06 -0700

Hi,

We are using a 3-node SOLR (7.0.1) cloud setup with a 1-node ZooKeeper
ensemble. Each system has 16 CPUs, 90GB RAM (14GB heap), and 130 cores
(3 NRT replicas), with index sizes ranging from 700MB to 20GB.


autoCommit - once every 10 minutes
softCommit - once every 30 seconds

At peak time, if one shard goes into recovery mode, many other shards also
go into recovery mode within a few minutes. This creates a huge load (200+
load average) and SOLR becomes non-responsive. To fix this we restart the
node, but the leader then tries to correct the index by initiating
replication, which causes the load again, and the node goes back to a
non-responsive state.
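
For reference, a minimal sketch of how the recovery spread could be tracked per node. It assumes the JSON returned by the Collections API CLUSTERSTATUS action (/solr/admin/collections?action=CLUSTERSTATUS&wt=json) has already been fetched and parsed into a dict. The function name, collection names and host names below are hypothetical, made up only for illustration.

```python
from collections import Counter


def replicas_by_state(cluster_status):
    """Count replicas per (node_name, state) in a parsed CLUSTERSTATUS response.

    Assumes the usual layout:
    cluster -> collections -> <name> -> shards -> <shard> -> replicas -> <replica>
    """
    counts = Counter()
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for collection in collections.values():
        for shard in collection.get("shards", {}).values():
            for replica in shard.get("replicas", {}).values():
                key = (replica.get("node_name"), replica.get("state"))
                counts[key] += 1
    return counts


# Hypothetical sample: two single-shard collections, three replicas each.
sample = {
    "cluster": {
        "collections": {
            "coll1": {"shards": {"shard1": {"replicas": {
                "core_node1": {"node_name": "node1:8983_solr", "state": "active"},
                "core_node2": {"node_name": "node2:8983_solr", "state": "recovering"},
                "core_node3": {"node_name": "node3:8983_solr", "state": "active"},
            }}}},
            "coll2": {"shards": {"shard1": {"replicas": {
                "core_node1": {"node_name": "node1:8983_solr", "state": "active"},
                "core_node2": {"node_name": "node2:8983_solr", "state": "recovering"},
                "core_node3": {"node_name": "node3:8983_solr", "state": "down"},
            }}}},
        }
    }
}

if __name__ == "__main__":
    # Print one line per (node, state) pair, sorted for stable output.
    for (node, state), n in sorted(replicas_by_state(sample).items()):
        print(f"{node} {state} {n}")
```

Polling this every few seconds at peak time would show how quickly replicas on a node move from active to recovering. It only observes the state and does not throttle or serialize the recoveries.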

As soon as a node starts, the replication process is initiated for all 130
cores at once. Is there any way we can control it, for example by running
the recoveries one after the other?

Thanks,
Doss.
