Hi,

We've had a strange mishap with a SolrCloud cluster (version 4.5.1) where
we observed high search latency. The problem appeared to develop over
several hours, until the entire cluster stopped responding
properly.

After investigating, we found that the number of threads (both Solr and
Jetty) rose gradually over several hours until it hit the maximum allowed,
at which point the cluster stopped responding properly. After we restarted
several nodes, the number of threads dropped and the cluster started
responding again.
We examined the nodes that were not restarted and found a high number of
CLOSE_WAIT sockets held by the Solr process. These sockets had a
random local port and remote port 8983, meaning they were outgoing
connections. A thread dump did not show a large number of Solr threads, and
we were unable to determine which thread(s) were holding these sockets.
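
For anyone who wants to check their own nodes for the same symptom, here is a
minimal sketch of how such sockets could be counted on Linux. It is not the
exact procedure we used. It assumes the kernel's /proc/net/tcp layout (remote
address in the third column, hex state in the fourth, CLOSE_WAIT encoded as
08) and the default port 8983. The function name count_close_wait is made up
for illustration.

```python
from collections import Counter

CLOSE_WAIT = 0x08  # TCP state code as printed in /proc/net/tcp on Linux


def count_close_wait(proc_net_tcp_text, remote_port=8983):
    """Count CLOSE_WAIT sockets per remote address for one remote port.

    Addresses are returned in the kernel's raw hex form, not dotted quads.
    """
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        remote_addr, remote_port_hex = fields[2].split(":")
        state = int(fields[3], 16)
        if state == CLOSE_WAIT and int(remote_port_hex, 16) == remote_port:
            counts[remote_addr] += 1
    return counts


if __name__ == "__main__":
    # Assumption: IPv4 sockets. A JVM may open IPv6 sockets instead, in which
    # case /proc/net/tcp6 (same column layout) is the file to read.
    with open("/proc/net/tcp") as f:
        for addr, n in count_close_wait(f.read()).most_common():
            print(addr, n)
```

Note that this counts sockets for the whole host, not only for the Solr
process, and it does not tell you which thread owns them.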

Has anyone else encountered such a situation?

Regards,
Avishai
