Mark, On 6/22/16 9:06 AM, Mark Thomas wrote: > On 22/06/2016 14:01, Rainer Jung wrote: >> Hi Mark, >> >> Am 22.06.2016 um 13:20 schrieb Mark Thomas: >>> A while ago I observed unexpected APR_EGENERAL errors being returned >>> when performing SSL reads. I was unable to identify the root cause but I >>> did discover that if those errors were treated as EAGAIN, processing >>> continued normally. As a result, I committed [1]. >>> >>> A report on the users list [2] has highlighted that, in some >>> circumstances at least, [1] is the wrong thing to do. I have therefore >>> investigated the circumstances that led to [1] further. The relevant >>> code is [3]. >>> >>> With some local debug logging I have discovered that in the case [1] was >>> trying to address the result of the SSL_read call at [3] is as follows: >>> s == -1 >>> i == 5 (SSL_ERROR_SYSCALL) >>> rv = 730035 >>> >>> Subtracting the 720000 offset from rv gives the OS error as 10035 which >>> [4] gives as WSAEWOULDBLOCK which looks exactly like EAGAIN to me. >>> >>> Based on the above, my conclusion is that [2] was caused by some other >>> windows error which was incorrectly handled as EAGAIN. >>> >>> Therefore, I'd like to propose something along the following lines for >>> tc-native: >>> Index: src/sslnetwork.c >>> =================================================================== >>> --- src/sslnetwork.c (revision 1749592) >>> +++ src/sslnetwork.c (working copy) >>> @@ -427,7 +427,11 @@ >>> con->shutdown_type = SSL_SHUTDOWN_TYPE_STANDARD; >>> return APR_EOF; >>> } >>> -#if !defined(_WIN32) >>> +#if defined(_WIN32) >>> + else if (rv == 730035 && timeout == 0) { >>> + return APR_EAGAIN; >>> + } >>> +#else >>> else if (APR_STATUS_IS_EINTR(rv)) { >>> /* Interrupted by signal >>> */ >> >> ... and reverting [1] ? > > Yes. > >>> I'd appreciate some review of this change as I know C is not my strong >>> point. The hard-coded value for the test of rv looks wrong to me. Is >>> there a better way to do this? Any other review comments? >>> >>> Obviously some changes will be required on the Tomcat side as well. I'm >>> still looking at those as I think I have discovered another issue that >>> was masked by [1]. >> >> File apr_errno.h contains a macro >> >> APR_STATUS_IS_EAGAIN(s) >> >> which in the WIN32 case is defined as: >> >> ((s) == APR_EAGAIN \ >> || (s) == APR_OS_START_SYSERR + ERROR_NO_DATA \ >> || (s) == APR_OS_START_SYSERR + ERROR_NO_PROC_SLOTS \ >> || (s) == APR_OS_START_SYSERR + ERROR_NESTING_NOT_ALLOWED \ >> || (s) == APR_OS_START_SYSERR + ERROR_MAX_THRDS_REACHED \ >> || (s) == APR_OS_START_SYSERR + ERROR_LOCK_VIOLATION \ >> || (s) == APR_OS_START_SYSERR + WSAEWOULDBLOCK) >> >> so one could use "APR_STATUS_IS_EAGAIN(rv)" instead of "rv == 730035" as >> a condition. This would be broader than your suggestion. Whether it >> still separates the case the user observed from the one you want to >> handle with could maybe be tested by the user in [2]? > > That is much better. > >> Finally if using the macro, one could also likely just drop the "#if >> defined(_WIN32)" before the EAGAIN test, because the macro is also >> defined for other platforms. For Unix/Linux either as "((s) == >> APR_EAGAIN)" or "((s) == APR_EAGAIN || (s) == EWOULDBLOCK)". > > Makes sense. > > Thanks for the review. I'll adjust the patch accordingly and commit.
I was gong to make a very similar suggestion; glad Rainer spoke up. That magic number looked like a landmine ready to explode, or, worse yet, be one of those lines of code nobody ever wants to touch in the future because they are scared to break something... but nobody knows what it's there for :) Don't sell your C skills too short, Mark ;) -chris
signature.asc
Description: OpenPGP digital signature