single node causing cluster-wide outage

Avishai Ish-Shalom Wed, 12 Mar 2014 14:07:33 -0700

Hi all!

After upgrading to Solr 4.6.1, we encountered a cluster outage that was
traced to a single misbehaving node. After we restarted that node, the
cluster immediately returned to normal operation.
The bad node had ~420 threads locked on FastLRUCache, and most
httpShardExecutor threads were waiting on Apache Commons HTTP futures.


Has anyone encountered such a situation? What can we do to prevent a
misbehaving node from bringing down the entire cluster?

Cheers,
Avishai
