We have a SolrCloud (4.7) cluster of 5 servers. At some point we noticed that one of the servers had very high CPU usage and was not responding. A few minutes later, the other 4 servers were also responding very slowly, and a restart was required. In the Solr logs we mainly saw symptoms, i.e. errors that started a few minutes after the CPU spike (connection timeouts, etc.).
When we looked at the javacore of the problematic server, we found one thread waiting inside a log4j method (Category.callAppenders), and 538 threads (!) waiting on the same lock. That thread's stack trace is:

    3XMTHREADINFO      "http-bio-8443-exec-37460" J9VMThread:0x00007FED88044600, j9thread_t:0x00007FE73E4D04A0, java/lang/Thread:0x00007FF267995468, state:CW, prio=5
    3XMJAVALTHREAD     (java/lang/Thread getId:0xA1AC9, isDaemon:true)
    3XMTHREADINFO1     (native thread ID:0x17F8, native priority:0x5, native policy:UNKNOWN)
    3XMTHREADINFO2     (native stack address range from:0x00007FEA9487B000, to:0x00007FEA948BC000, size:0x41000)
    3XMCPUTIME         CPU usage total: 55.216798962 secs
    3XMHEAPALLOC       Heap bytes allocated since last GC cycle=3176200 (0x307708)
    3XMTHREADINFO3     Java callstack:
    4XESTACKTRACE        at org/apache/log4j/Category.callAppenders(Category.java:204)
    4XESTACKTRACE        at org/apache/log4j/Category.forcedLog(Category.java:391(Compiled Code))
    4XESTACKTRACE        at org/apache/log4j/Category.log(Category.java:856(Compiled Code))
    4XESTACKTRACE        at org/slf4j/impl/Log4jLoggerAdapter.error(Log4jLoggerAdapter.java:498)
    4XESTACKTRACE        at org/apache/solr/common/SolrException.log(SolrException.java:109)
    4XESTACKTRACE        at org/apache/solr/handler/RequestHandlerBase.handleRequest(RequestHandlerBase.java:153(Compiled Code))
    4XESTACKTRACE        at org/apache/solr/core/SolrCore.execute(SolrCore.java:1916(Compiled Code))
    4XESTACKTRACE        at org/apache/solr/servlet/SolrDispatchFilter.execute(SolrDispatchFilter.java:780(Compiled Code))
    4XESTACKTRACE        at org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427(Compiled Code))
    4XESTACKTRACE        at org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217(Compiled ...

Our logging goes to a local file.
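To illustrate how a single slow log write can stall hundreds of request threads, here is a minimal, self-contained sketch. It assumes, as the frame at Category.java:204 suggests, that log4j 1.2's Category.callAppenders holds a monitor while it calls the appenders. SlowAppender and SharedCategory are hypothetical stand-ins, not log4j classes, and the 50 ms write time is an arbitrary assumption used to simulate a slow disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class LockContentionDemo {

    /** Hypothetical stand-in for a file appender whose writes are slow (e.g. a stalled disk). */
    static final class SlowAppender {
        private final long writeMillis;
        private final AtomicInteger inside = new AtomicInteger();
        final AtomicInteger maxInside = new AtomicInteger(); // peak number of concurrent writers

        SlowAppender(long writeMillis) {
            this.writeMillis = writeMillis;
        }

        void append(String message) {
            int now = inside.incrementAndGet();
            maxInside.accumulateAndGet(now, Math::max);
            try {
                Thread.sleep(writeMillis); // simulate slow I/O while the caller holds the lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                inside.decrementAndGet();
            }
        }
    }

    /** Hypothetical category: every log call takes the same monitor before reaching the appender. */
    static final class SharedCategory {
        private final SlowAppender appender;

        SharedCategory(SlowAppender appender) {
            this.appender = appender;
        }

        void callAppenders(String message) {
            synchronized (this) { // single lock shared by all logging threads
                appender.append(message);
            }
        }
    }

    /** Starts `threads` request threads that each log one error; returns elapsed milliseconds. */
    static long run(SharedCategory category, int threads) {
        List<Thread> workers = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            final int id = i;
            Thread t = new Thread(() -> category.callAppenders("error from request " + id));
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        SlowAppender appender = new SlowAppender(50);
        long elapsed = run(new SharedCategory(appender), 10);
        System.out.println("10 threads x 50 ms writes took " + elapsed + " ms");
        System.out.println("max threads inside append at once: " + appender.maxInside.get());
    }
}
```

Because every caller funnels through the same monitor, the ten 50 ms writes run one after another (roughly 500 ms in total) and only one thread is ever inside append; all the others sit blocked on the lock, which resembles the picture in our javacore.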
After searching the web, we found similar problems:
https://bz.apache.org/bugzilla/show_bug.cgi?id=50213
https://bz.apache.org/bugzilla/show_bug.cgi?id=51047
https://dzone.com/articles/log4j-thread-deadlock-case

However, it seems the fixes were made for log4j 2.X, while Solr uses log4j 1.2.X (even the new Solr 5.3.0, from what I've seen).

Is this a known problem? Is it possible to upgrade Solr's log4j version to 2.X?

Thanks,
Arnon