All,

It seems that ContainerBackgroundProcessor can die and end up silently destabilizing things.
I got this in a development environment today:

Exception in thread "ContainerBackgroundProcessor[StandardEngine[Catalina]]" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
    at java.lang.StringCoding.encode(StringCoding.java:344)
    at java.lang.String.getBytes(String.java:916)
    at java.io.UnixFileSystem.getLastModifiedTime(Native Method)
    at java.io.File.lastModified(File.java:937)
    at org.apache.naming.resources.FileDirContext$FileResourceAttributes.getLastModified(FileDirContext.java:1008)
    at org.apache.naming.resources.FileDirContext$FileResourceAttributes.getCreation(FileDirContext.java:980)
    at org.apache.naming.resources.FileDirContext$FileResourceAttributes.<init>(FileDirContext.java:925)
    at org.apache.naming.resources.FileDirContext.doGetAttributes(FileDirContext.java:403)
    at org.apache.naming.resources.BaseDirContext.getAttributes(BaseDirContext.java:1137)
    at org.apache.naming.resources.BaseDirContext.getAttributes(BaseDirContext.java:1090)
    at org.apache.naming.resources.ProxyDirContext.getAttributes(ProxyDirContext.java:882)
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "ContainerBackgroundProcessor[StandardEngine[Catalina]]"

Okay, this kind of thing sometimes happens in our development instance. But what I noticed is that the ContainerBackgroundProcessor thread has now died. That means that even if the OOME was transient and the JVM could recover from the failure, the background processor thread is dead and things like sessions will pile up until memory is truly exhausted.

The problem is in the code for the run() method. Somewhat simplified, it's just a loop to do stuff:

    while (!done) {
        try {
            sleep();
            processChildren();
        } catch (Throwable t) {
            ExceptionUtils.handleException(t);
            log("error", t);
        }
    }

Although the stack trace doesn't show it, the above error clearly occurred in processChildren().
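To make the failure mode concrete, here is a small standalone sketch of a loop with the same shape. It is not Tomcat code: the class name, runLoop(), and the handleThrowable() stand-in are all hypothetical, and the handler only mimics the behavior I describe in the next paragraph (swallow StackOverflowError, re-throw any other VirtualMachineError). The OOME is simulated by constructing one by hand.

```java
import java.util.concurrent.atomic.AtomicInteger;

class BackgroundLoopDemo {

    // Hypothetical stand-in for ExceptionUtils.handleException, modelled only
    // on the behaviour described in this mail; it is not the Tomcat source.
    static void handleThrowable(Throwable t) {
        if (t instanceof StackOverflowError) {
            return;                          // swallowed silently
        }
        if (t instanceof VirtualMachineError) {
            throw (VirtualMachineError) t;   // re-thrown to the caller
        }
    }

    // Runs the loop on its own thread. A simulated OOME is thrown on the
    // iteration numbered failOn (0 means never). Returns how many
    // iterations were started before the thread ended.
    static int runLoop(int failOn, int maxIterations) {
        AtomicInteger started = new AtomicInteger();
        Thread worker = new Thread(() -> {
            while (started.get() < maxIterations) {
                try {
                    if (started.incrementAndGet() == failOn) {
                        throw new OutOfMemoryError("simulated");
                    }
                } catch (Throwable t) {
                    handleThrowable(t);      // re-throws the OOME out of run()
                    System.err.println("error: " + t);  // skipped for the OOME
                }
            }
        }, "DemoBackgroundProcessor");
        // Keep the demo quiet; by default the JVM prints the stack trace.
        worker.setUncaughtExceptionHandler((thread, error) -> { });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return started.get();
    }

    public static void main(String[] args) {
        // The simulated OOME on iteration 3 ends the thread; 4..10 never run.
        System.out.println("iterations started: " + runLoop(3, 10));
    }
}
```

With failOn set to 3 the worker stops after its third iteration even though the "JVM" in this demo is perfectly healthy, which is the situation I think we are in after a transient OOME.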
ExceptionUtils.handleException checks for two things that I think we might want to change:

1. If the exception is StackOverflowError, it silently ignores the error and continues. I think we should at least log something, probably at WARN level.

2. If the exception is VirtualMachineError, it gets re-thrown with no log. This skips the "log" call in the above code and so the only log will come from the VM's "unhandled exception" logger which may not go where you expect it to go. The exception propagates, and the thread's run() method finishes (escapes due to uncaught exception). After that, regardless of the recoverability of the situation (OOME), the background processor will not run and therefore no auto-reload applications will auto-reload, no sessions will ever die, etc.

If we think that StackOverflowError is recoverable, why not OutOfMemory? What about other VirtualMachineErrors?

-chris
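P.S. A rough sketch of what I mean by the logging changes in (1) and (2), in case it helps the discussion. This is not a patch: the class and method names are hypothetical, and it uses java.util.logging rather than Tomcat's own logging so that it stands alone. It also leaves open the question of whether OOME should be re-thrown at all.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LoggingHandlerSketch {

    private static final Logger log =
            Logger.getLogger(LoggingHandlerSketch.class.getName());

    // Hypothetical variant of the handler: same control flow as described
    // above, but each special case leaves a log entry first.
    static void handle(Throwable t) {
        if (t instanceof StackOverflowError) {
            // (1) still ignored, but no longer silently
            log.log(Level.WARNING, "Ignoring StackOverflowError", t);
            return;
        }
        if (t instanceof VirtualMachineError) {
            // (2) logged before it propagates; note that logging can itself
            // fail if memory really is exhausted
            log.log(Level.SEVERE, "Re-throwing VirtualMachineError", t);
            throw (VirtualMachineError) t;
        }
        // Anything else is left to the caller's own logging.
    }

    public static void main(String[] args) {
        handle(new StackOverflowError("simulated"));      // logs, then returns
        try {
            handle(new OutOfMemoryError("simulated"));    // logs, then re-throws
        } catch (OutOfMemoryError expected) {
            System.out.println("re-thrown: " + expected.getMessage());
        }
    }
}
```

This only addresses the missing log output. Whether the background loop should survive an OOME is the separate question I'm asking above.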