It sounds like the distributed update deadlock issue. It’s fixed in 4.6.1 and 4.7.
- Mark http://about.me/markrmiller On Mar 6, 2014, at 3:10 PM, Avishai Ish-Shalom <avis...@fewbytes.com> wrote: > Hi, > > We've had a strange mishap with a solr cloud cluster (version 4.5.1) where > we observed high search latency. The problem appears to develop over > several hours until such point where the entire cluster stopped responding > properly. > > After investigation we found that the number of threads (both solr and > jetty) gradually rose over several hours until it hit a the maximum allowed > at which point the cluster stopped responding properly. After restarting > several nodes the number of threads dropped and the cluster started > responding again. > We've examined nodes that were not restarted and found a high number of > CLOSE_WAIT sockets held by the solr process; these sockets were using a > random local port and 8983 remote port - meaning they were outgoing > connections. a thread dump did not show a large number of solr threads and > we were unable to determine which thread(s) is holding these sockets. > > has anyone else encountered such a situation? > > Regards, > Avishai