[Bug 62626] Tomcat 9.0.10 APR/Native crashes

bugzilla Thu, 16 Aug 2018 08:06:39 -0700

https://bz.apache.org/bugzilla/show_bug.cgi?id=62626


--- Comment #6 from Christopher Schultz <ch...@christopherschultz.net> ---
(In reply to jan.pfeifer from comment #5)
> With NIO2 + OpenSSL no crash so far. No problems detected. Except it comes
> with new set of client abort IO exceptions:
> 
> "The specified network name is no longer available" and "An existing
> connection was forcibly closed by the remote host".

This is actually what I was hoping to see happen.

tcnative is ... the bare minimum code to get things working; it isn't very
robust, especially against all the weird failure modes that come with a
network connection.

The fact that NIO is also telling you that things are in a bad state means that
they really are: the bug in tcnative that causes this crash is a matter of bad
error-handling, not of tcnative being unable to push bytes around. That bad
error-handling can probably be fixed with enough analysis and maybe some
trial-and-error testing on your end (if you are willing to be a guinea pig).

But the fact remains: something "bad" is really happening with your clients
and/or your network and that is the true source of the problem.

> Are there any generic way to detect it? I was forced to enclose
> stream.write() to its own try/catch and swallow any IOException it produces.

That's kind of par for the course, isn't it? Any IO operation can fail for any
reason. If you don't catch it and handle it, Tomcat will (eventually) log it.
What else did you expect to happen?
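
As a rough illustration of "catch it and handle it" (a sketch only, not taken
from your webapp): the hypothetical ResponseCopier below treats a failed write
as "the client is probably gone", logs it quietly and stops, while a failed
read from the source still propagates as a real error. The class name, method
name and log level are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.logging.Level;
import java.util.logging.Logger;

final class ResponseCopier {
    private static final Logger LOG = Logger.getLogger(ResponseCopier.class.getName());

    private ResponseCopier() {}

    /**
     * Copies everything from in to out.
     * Returns true if the whole body was written, false if the write side
     * failed (typically because the client disconnected).
     * A failure while reading from in is NOT swallowed; it propagates.
     */
    static boolean copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            try {
                out.write(buf, 0, n);
            } catch (IOException e) {
                // Log at a low level instead of silently discarding the exception.
                LOG.log(Level.FINE, "Write failed, giving up on this response", e);
                return false;
            }
        }
        try {
            out.flush();
        } catch (IOException e) {
            LOG.log(Level.FINE, "Flush failed, giving up on this response", e);
            return false;
        }
        return true;
    }
}
```

In a servlet, out would be the response's output stream; the boolean lets the
caller skip any follow-up work once the client has gone away.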

> To our problem: I cant see any other way how to find culprit on my side.
> Some kind of stress test would probably trigger it, but with the same result
> we have now. I guess it is still related to image serving part as it is only
> "complex" part of webapp. Crawlers and "image thieves" spams it a lot. There
> can be let say 50 request per second, sometimes for same image, sometimes
> all ends with some kind of "client abort exception".

A "client abort" exception happens when the client makes a request and then
hangs up the phone before you complete the response. It's fairly common and I
wouldn't really expect tcnative to crash under those circumstances, but it's
certainly possible.

In general, ClientAbortExceptions can safely be completely ignored, so don't
worry too much about the fact that they are happening. On a busy site with
large responses (e.g. images) I'd expect lots of them.
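
If you want to keep them out of your error logs without swallowing every
IOException, one possible approach is to classify the exception first. The
sketch below is only an assumption about how that could look: ClientAborts and
isClientAbort are hypothetical names, and the match is done on the simple class
name so the example compiles without Tomcat on the classpath. In a real webapp
you could test against org.apache.catalina.connector.ClientAbortException
(which extends IOException) directly.

```java
import java.io.IOException;

final class ClientAborts {
    private ClientAborts() {}

    /**
     * Returns true if t, or anything in its cause chain, is an IOException
     * whose simple class name is "ClientAbortException".
     */
    static boolean isClientAbort(Throwable t) {
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 16; depth++, t = t.getCause()) {
            if (t instanceof IOException
                    && "ClientAbortException".equals(t.getClass().getSimpleName())) {
                return true;
            }
        }
        return false;
    }
}
```

A catch block can then log client aborts at debug level and everything else at
warning or error level.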

Let's keep this BZ issue open and continue to talk about the Java + native
stack traces you have and try to get this resolved. ERROR_ACCESS_VIOLATION is
the same as a segfault (basically a null-pointer dereference and/or a
use-after-free). In this case, it looks like the code is dereferencing a
pointer which probably isn't a pointer (its value is way too low to be valid).

Can you try one more thing for me? Downgrade to Java 1.8.0_whatever and revert
your configuration to use APR+tcnative again and let us know if the crashes
(actual segfaults that kill the JVM) continue.

But let's take your ClientAbortException stuff onto the user list to see if we
can find some more efficient ways to get your images generated. That
synchronized block looks suspicious to me, and creating new threads all the
time is likely to lead to instability.
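
To give an idea of the direction (a sketch under assumptions, since your
image code isn't reproduced in this comment): a fixed-size thread pool bounds
the number of worker threads, and a map of in-flight work lets concurrent
requests for the same image share one rendering job instead of queuing on a
lock. ImageRenderer, its constructor arguments and the render function are all
hypothetical names for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.function.Function;

final class ImageRenderer implements AutoCloseable {
    private final ExecutorService pool;
    private final Function<String, byte[]> render;
    private final ConcurrentMap<String, CompletableFuture<byte[]>> inFlight =
            new ConcurrentHashMap<>();

    ImageRenderer(int threads, Function<String, byte[]> render) {
        // Bounded pool: threads are reused rather than created per request.
        this.pool = Executors.newFixedThreadPool(threads);
        this.render = render;
    }

    /** Returns a future for the image; concurrent callers for one key share it. */
    CompletableFuture<byte[]> get(String key) {
        CompletableFuture<byte[]> fresh = new CompletableFuture<>();
        CompletableFuture<byte[]> existing = inFlight.putIfAbsent(key, fresh);
        if (existing != null) {
            return existing; // someone else is already rendering this image
        }
        try {
            pool.execute(() -> {
                try {
                    fresh.complete(render.apply(key));
                } catch (Throwable t) {
                    fresh.completeExceptionally(t);
                } finally {
                    inFlight.remove(key, fresh);
                }
            });
        } catch (RejectedExecutionException e) {
            // Pool already shut down: clean up and report through the future.
            inFlight.remove(key, fresh);
            fresh.completeExceptionally(e);
        }
        return fresh;
    }

    @Override
    public void close() {
        pool.shutdown();
    }
}
```

The pool size is a tuning knob; a small number close to the CPU count is a
reasonable starting point for CPU-bound image work.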

