We have a SolrCloud (4.7) cluster consisting of 5 servers.
At some point we noticed that one of the servers had very high CPU usage and
was not responding. A few minutes later, the other 4 servers were also
responding very slowly. A restart was required.
In the Solr logs we mainly saw symptoms, i.e. errors that occurred
a few minutes after the high CPU usage started (connection timeouts etc.).

When looking at the javacore of the problematic server, we found that one
thread was waiting in a log4j method (Category.callAppenders), and 538
threads (!) were waiting on the same lock.
The stack trace of that thread is:

3XMTHREADINFO      "http-bio-8443-exec-37460" J9VMThread:0x00007FED88044600, j9thread_t:0x00007FE73E4D04A0, java/lang/Thread:0x00007FF267995468, state:CW, prio=5
3XMJAVALTHREAD            (java/lang/Thread getId:0xA1AC9, isDaemon:true)
3XMTHREADINFO1            (native thread ID:0x17F8, native priority:0x5, native policy:UNKNOWN)
3XMTHREADINFO2            (native stack address range from:0x00007FEA9487B000, to:0x00007FEA948BC000, size:0x41000)
3XMCPUTIME               CPU usage total: 55.216798962 secs
3XMHEAPALLOC             Heap bytes allocated since last GC cycle=3176200 (0x307708)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at org/apache/log4j/Category.callAppenders(Category.java:204)
4XESTACKTRACE                at org/apache/log4j/Category.forcedLog(Category.java:391(Compiled Code))
4XESTACKTRACE                at org/apache/log4j/Category.log(Category.java:856(Compiled Code))
4XESTACKTRACE                at org/slf4j/impl/Log4jLoggerAdapter.error(Log4jLoggerAdapter.java:498)
4XESTACKTRACE                at org/apache/solr/common/SolrException.log(SolrException.java:109)
4XESTACKTRACE                at org/apache/solr/handler/RequestHandlerBase.handleRequest(RequestHandlerBase.java:153(Compiled Code))
4XESTACKTRACE                at org/apache/solr/core/SolrCore.execute(SolrCore.java:1916(Compiled Code))
4XESTACKTRACE                at org/apache/solr/servlet/SolrDispatchFilter.execute(SolrDispatchFilter.java:780(Compiled Code))
4XESTACKTRACE                at org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427(Compiled Code))
4XESTACKTRACE                at org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217(Compiled
...
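
To illustrate what we think is happening: as far as we understand the log4j 1.2 source, Category.callAppenders holds a monitor on the logger while its appenders run, so if one thread stalls inside an append, every other thread that logs through the same logger queues up behind it. The following is only a minimal sketch of that pattern, not log4j or Solr code. The class LoggerLockSketch, the LOGGER_MONITOR object and both methods are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class LoggerLockSketch {
    // Hypothetical stand-in for the monitor of a shared logger.
    private static final Object LOGGER_MONITOR = new Object();

    // Hypothetical stand-in for a logging call: the appender work
    // would happen while the shared monitor is held.
    static void logError(String message) {
        synchronized (LOGGER_MONITOR) {
            // a real appender would write 'message' to the log file here
        }
    }

    // Holds the monitor (simulating one thread stalled inside an append),
    // starts 'requestThreads' threads that all try to log, and returns
    // how many of them were observed in state BLOCKED.
    static int countBlockedWhileAppenderStalls(int requestThreads) {
        List<Thread> workers = new ArrayList<>();
        int blocked = 0;
        try {
            synchronized (LOGGER_MONITOR) {
                for (int i = 0; i < requestThreads; i++) {
                    Thread t = new Thread(() -> logError("request failed"), "exec-" + i);
                    t.setDaemon(true);
                    t.start();
                    workers.add(t);
                }
                // Poll for up to 5 seconds until every worker is blocked.
                long deadline = System.currentTimeMillis() + 5000;
                while (System.currentTimeMillis() < deadline) {
                    blocked = 0;
                    for (Thread t : workers) {
                        if (t.getState() == Thread.State.BLOCKED) {
                            blocked++;
                        }
                    }
                    if (blocked == requestThreads) {
                        break;
                    }
                    Thread.sleep(10);
                }
            }
            // The monitor is released here, so the workers can finish.
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return blocked;
    }

    public static void main(String[] args) {
        System.out.println("threads blocked on the logger monitor: "
                + countBlockedWhileAppenderStalls(8));
    }
}
```

In this sketch a single stalled holder of the monitor is enough to park every thread that tries to log, which would be consistent with the 538 waiting threads in our javacore if the request threads all log errors through the same logger.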

Our logging is done to a local file.
After searching the web, we found similar problems:
https://bz.apache.org/bugzilla/show_bug.cgi?id=50213
https://bz.apache.org/bugzilla/show_bug.cgi?id=51047
https://dzone.com/articles/log4j-thread-deadlock-case

However, it seems that the fixes were made for log4j 2.x, while Solr uses
log4j 1.2.x (even the new Solr 5.3.0, from what I've seen).

Is this a known problem?
Is it possible to upgrade the log4j version used by Solr to 2.x?

Thanks,
Arnon
