On 9/1/2015 12:53 AM, Arnon Yogev wrote:
> We have a Solr cloud (4.7) consisting of 5 servers.
> At some point we noticed that one of the servers had a very high CPU and
> was not responding. A few minutes later, the other 4 servers were
> responding very slowly. A restart was required.
> Looking at the Solr logs, we mainly saw symptoms, i.e. errors that happened
> a few minutes after the high CPU started (connection timeouts etc).
>
> When looking at the javacore of the problematic server, we found that one
> thread was waiting on a log4j method, and 538 threads (!) were waiting on
> the same lock.
> The thread's stack trace is:
<snip>

> Our logging is done to a local file.
> After searching the web, we found similar problems:
> https://bz.apache.org/bugzilla/show_bug.cgi?id=50213
> https://bz.apache.org/bugzilla/show_bug.cgi?id=51047
> https://dzone.com/articles/log4j-thread-deadlock-case
>
> However, seems like the fixes were made for log4j 2.X. And Solr uses log4j
> 1.2.X (even the new Solr 5.3.0, from what I've seen).
>
> Is this a known problem?
> Is it possible to upgrade Solr log4j version to 2.X?

We have an issue to upgrade log4j. I know because I'm the one that
opened it. I haven't had any time to work on it, and until I can
actually research it, I am fairly clueless about how to proceed.

https://issues.apache.org/jira/browse/SOLR-7887

What container are you running in? The stacktrace was not complete
enough for me to figure that out myself. What is that container's
maxThreads setting? The thread name including "http-bio-8443" makes me
think it's probably Tomcat, not the Jetty included in the example found
in the download, which makes the maxThreads parameter particularly
relevant.

I do not see any mention of locks in the information that you included,
either held or waiting. If a lot of threads are waiting on a single
lock, then you should be able to find which thread is holding that lock
... and I don't think it will be the thread that you mentioned.

Thanks,
Shawn
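
For reference, here is a minimal sketch of one way to find the thread that holds a contended lock, using the standard java.lang.management API. It is not something from the original report: the class name LockOwnerReport and the helper countWaitersByOwner are made up for illustration. As written it only inspects the JVM it runs in, so to look at Solr it would have to run inside the same JVM or be adapted to read the ThreadMXBean over a remote JMX connection.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, named for illustration only.
public class LockOwnerReport {

    /** Counts, per owning thread id, how many threads are blocked on a lock that thread holds. */
    static Map<Long, Integer> countWaitersByOwner(ThreadInfo[] infos) {
        Map<Long, Integer> waiters = new HashMap<>();
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // defensive: skip entries for threads that no longer exist
            }
            long ownerId = info.getLockOwnerId(); // -1 if not blocked on an owned lock
            if (ownerId != -1) {
                waiters.merge(ownerId, 1, Integer::sum);
            }
        }
        return waiters;
    }

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only request monitor/synchronizer details if this JVM supports them.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());

        Map<Long, Integer> waiters = countWaitersByOwner(infos);
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue;
            }
            Integer count = waiters.get(info.getThreadId());
            if (count != null) {
                // This thread holds a lock that other threads are waiting for.
                System.out.println(count + " thread(s) waiting on a lock held by \""
                        + info.getThreadName() + "\" (" + info.getThreadState() + ")");
                for (StackTraceElement frame : info.getStackTrace()) {
                    System.out.println("    at " + frame);
                }
            }
        }
    }
}
```

The stack trace printed for the owning thread is the interesting part: it shows what the lock holder was doing while everyone else queued up behind it.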