Hi, We've had a strange mishap with a SolrCloud cluster (version 4.5.1) in which we observed high search latency. The problem developed over several hours, until the entire cluster stopped responding properly.
After investigation we found that the number of threads (both Solr and Jetty) rose gradually over several hours until it hit the maximum allowed, at which point the cluster stopped responding properly. After we restarted several nodes, the number of threads dropped and the cluster started responding again.
We then examined the nodes that had not been restarted and found a high number of CLOSE_WAIT sockets held by the Solr process. These sockets had a random local port and remote port 8983, meaning they were outgoing connections. A thread dump did not show a large number of Solr threads, and we were unable to determine which thread(s) are holding these sockets.
Has anyone else encountered such a situation?
Regards,
Avishai