All,

It seems that ContainerBackgroundProcessor can die and end up silently
destabilizing things.

I got this in a development environment today:

Exception in thread "ContainerBackgroundProcessor[StandardEngine[Catalina]]" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
        at java.lang.StringCoding.encode(StringCoding.java:344)
        at java.lang.String.getBytes(String.java:916)
        at java.io.UnixFileSystem.getLastModifiedTime(Native Method)
        at java.io.File.lastModified(File.java:937)
        at org.apache.naming.resources.FileDirContext$FileResourceAttributes.getLastModified(FileDirContext.java:1008)
        at org.apache.naming.resources.FileDirContext$FileResourceAttributes.getCreation(FileDirContext.java:980)
        at org.apache.naming.resources.FileDirContext$FileResourceAttributes.<init>(FileDirContext.java:925)
        at org.apache.naming.resources.FileDirContext.doGetAttributes(FileDirContext.java:403)
        at org.apache.naming.resources.BaseDirContext.getAttributes(BaseDirContext.java:1137)
        at org.apache.naming.resources.BaseDirContext.getAttributes(BaseDirContext.java:1090)
        at org.apache.naming.resources.ProxyDirContext.getAttributes(ProxyDirContext.java:882)

Exception: java.lang.OutOfMemoryError thrown from the
UncaughtExceptionHandler in thread
"ContainerBackgroundProcessor[StandardEngine[Catalina]]"

Okay, this kind of thing sometimes happens in our development instance.
But what I noticed is that the ContainerBackgroundProcessor thread has
died as a result.

That means that even if the OOME was transient and the JVM could recover
from the failure, the background processor thread is dead and things
like sessions will pile up until memory is truly exhausted.

The problem is in the code for the thread's run() method. Somewhat
simplified, it's just a loop to do stuff:

while(!done) {
  try {
    sleep();
    processChildren();
  } catch (Throwable t) {
    ExceptionUtils.handleException(t);
    log("error", t);
  }
}

Although the stack trace doesn't show it, the above error clearly
occurred in processChildren().

ExceptionUtils.handleException checks for two things that I think we
might want to change:

1. If the exception is StackOverflowError, it silently ignores the error
and continues. I think we should at least log something, probably at
WARN level.

2. If the exception is any other VirtualMachineError, it gets re-thrown
with no log. This skips the "log" call in the above code, so the only
log will come from the VM's uncaught-exception handler, which may not go
where you expect it to go. The exception propagates, and the thread's
run() method exits because of the uncaught exception. After that,
regardless of whether the situation (OOME) was recoverable, the
background processor will not run again: no auto-reload applications
will auto-reload, no sessions will ever expire, etc.
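
To make the failure mode in point 2 concrete, here is a minimal,
self-contained sketch. It is not the Tomcat source: the class name
BackgroundLoopSketch, the runLoop() helper and the simulated errors are
all made up for illustration, and handleException() only mirrors the two
checks described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BackgroundLoopSketch {

    // Hypothetical stand-in that mirrors the two checks described above.
    static void handleException(Throwable t) {
        if (t instanceof StackOverflowError) {
            return; // swallowed by the handler
        }
        if (t instanceof VirtualMachineError) {
            throw (VirtualMachineError) t; // re-thrown without logging
        }
        // every other Throwable is swallowed
    }

    // Runs the simplified loop on its own thread, failing each pass with
    // the given Throwable, and returns the sequence of observed events.
    static List<String> runLoop(List<Throwable> failures) throws InterruptedException {
        List<String> events = new ArrayList<>();
        Thread worker = new Thread(() -> {
            for (int i = 0; i < failures.size(); i++) {
                try {
                    events.add("pass " + i);
                    throw failures.get(i); // stands in for processChildren() failing
                } catch (Throwable t) {
                    handleException(t);
                    events.add("logged " + t.getClass().getSimpleName());
                }
            }
        });
        // Record the thread's death instead of printing to stderr.
        worker.setUncaughtExceptionHandler(
                (thread, e) -> events.add("thread died: " + e.getClass().getSimpleName()));
        worker.start();
        worker.join();
        return events;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Throwable> failures = Arrays.asList(
                new IllegalStateException("simulated"),
                new OutOfMemoryError("simulated"),
                new IllegalStateException("simulated"));
        // prints [pass 0, logged IllegalStateException, pass 1, thread died: OutOfMemoryError]
        System.out.println(runLoop(failures));
    }
}
```

The third pass never runs: once the simulated OutOfMemoryError is
re-thrown from the catch block, the loop is gone for good, and the
"log" line for that error is skipped.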

If we think that StackOverflowError is recoverable, why not OutOfMemory?
What about other VirtualMachineErrors?
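
For discussion only, here is one way the logging part of points 1 and 2
could look. This is a sketch under assumptions, not a patch: the class
name LoggingHandlerSketch is hypothetical, java.util.logging stands in
for whatever logger the real code would use, and logging during a real
OOME may itself fail.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggingHandlerSketch {

    // Assumption: a plain java.util.logging logger, used only for illustration.
    static final Logger LOG = Logger.getLogger("LoggingHandlerSketch");

    static void handleException(Throwable t) {
        if (t instanceof StackOverflowError) {
            // Point 1: still treated as recoverable, but no longer silent.
            LOG.log(Level.WARNING, "Recovered from StackOverflowError", t);
            return;
        }
        if (t instanceof VirtualMachineError) {
            // Point 2: leave a trace in the application log before re-throwing.
            LOG.log(Level.SEVERE, "VirtualMachineError, re-throwing", t);
            throw (VirtualMachineError) t;
        }
        // every other Throwable is left to the caller's own logging
    }
}
```

This does not answer whether OOME should be treated as recoverable; it
only makes sure something lands in the expected log either way.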

-chris
