[ https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997572#comment-16997572 ]
Dawid Weiss commented on SOLR-13778:
------------------------------------

I looked at the classes involved in this complex stack trace. It's quite a nightmare. :) There are various checks and exception-handling routines trying to figure out what went wrong and rethrowing exceptions or wrapping them. For example:
https://github.com/AdoptOpenJDK/openjdk-jdk11u/blob/381c817fa41d549420b1f3a173d9147aa7a679cd/src/java.base/share/classes/sun/security/ssl/TransportContext.java#L344-L360

{code}
        // send fatal alert
        //
        // If we haven't even started handshaking yet, or we are the recipient
        // of a fatal alert, no need to generate a fatal close alert.
        if (!recvFatalAlert && !isOutboundClosed() && !isBroken &&
                (isNegotiated || handshakeContext != null)) {
            try {
                outputRecord.encodeAlert(Alert.Level.FATAL.level, alert.id);
            } catch (IOException ioe) {
                if (SSLLogger.isOn && SSLLogger.isOn("ssl")) {
                    SSLLogger.warning(
                        "Fatal: failed to send fatal alert " + alert, ioe);
                }

                closeReason.addSuppressed(ioe);
            }
        }
{code}

So it looks like the SSL code is trying to send an alert message over a socket that has already been closed, and it fails miserably at both. It also looks like this is really sensitive to timing and to the operating system, since some of the "socket close" handlers run in object finalizers and are therefore naturally asynchronous to the main code.

I'll try to reproduce this with a smaller piece of code; then it'll be easier to tell why this behaved differently before. My guess is that some other refactoring in the JDK triggered this...

I am half-convinced SSLException should be retriable... so many things can go wrong once the SSL layer is closed that I think the client should be allowed to just re-establish the SSL connection from scratch. But I'll try to provide an example of this happening in a smaller piece of code. Maybe we'll get a better understanding of what interaction can lead to this.
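To make the excerpt from TransportContext.java easier to follow: when writing the fatal alert itself fails, that IOException is attached to the original close reason with addSuppressed instead of replacing it, so the reported exception carries both failures. The following is a minimal sketch of that pattern using plain JDK classes, not the JDK's own code. SuppressedAlertDemo, sendAlert and closeWithFailedAlert are hypothetical names; sendAlert only simulates the failing outputRecord.encodeAlert(...) call and never touches a real socket.

{code}
import java.io.IOException;
import java.net.SocketException;
import javax.net.ssl.SSLException;

// Hypothetical illustration (not JDK code): a failed attempt to send a fatal
// alert is recorded as a suppressed exception on the original close reason.
class SuppressedAlertDemo {

    // Stand-in for outputRecord.encodeAlert(...): simulates a write to a
    // socket that has already been closed underneath the SSL layer.
    static void sendAlert() throws IOException {
        throw new SocketException("simulated: socket already closed");
    }

    // Builds the close reason from the original cause, tries to send the
    // alert, and keeps the write failure as a suppressed exception.
    static SSLException closeWithFailedAlert(Throwable cause) {
        SSLException closeReason = new SSLException("simulated: recv failed", cause);
        try {
            sendAlert();
        } catch (IOException ioe) {
            closeReason.addSuppressed(ioe);
        }
        return closeReason;
    }

    public static void main(String[] args) {
        SSLException e = closeWithFailedAlert(
                new SocketException("simulated: recv failed"));
        // Both the original cause and the suppressed write failure survive.
        System.out.println("cause: " + e.getCause());
        for (Throwable s : e.getSuppressed()) {
            System.out.println("suppressed: " + s);
        }
    }
}
{code}

In a real stack trace the suppressed write failure shows up in a "Suppressed:" section under the main exception, which is one reason these traces get so long.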
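If SSLException were treated as retriable, the client would drop the broken connection and make each new attempt over a fresh connection with a new handshake. Here is a rough sketch of that idea using only JDK types. RetryOnSsl, withRetry and the maxAttempts parameter are hypothetical names, and this is not taken from Solr's HTTP client code. The Callable passed in is assumed to open its own new connection on every call.

{code}
import java.util.concurrent.Callable;
import javax.net.ssl.SSLException;

// Hypothetical retry helper: each attempt is assumed to open a brand-new
// connection, so a broken SSL session is never reused between attempts.
class RetryOnSsl {

    static <T> T withRetry(Callable<T> attempt, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SSLException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.call();
            } catch (SSLException e) {
                // SSL-layer failures are treated as retriable; any other
                // exception propagates to the caller immediately.
                last = e;
            }
        }
        // Every attempt failed with an SSLException; report the last one.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated connection attempt that fails twice at the SSL layer.
        String result = withRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new SSLException("simulated: Software caused connection abort: recv failed");
            }
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "ok after 3 attempts"
    }
}
{code}

Retrying this way is only safe for idempotent requests, because the server may already have processed an attempt whose response was lost when the connection broke.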
> Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed
> -----------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13778
>                 URL: https://issues.apache.org/jira/browse/SOLR-13778
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Chris M. Hostetter
>            Priority: Major
>         Attachments: dumps-LegacyCloud.zip, logs-2019-12-12-1.zip
>
>
> Now that Uwe's Jenkins build has been correctly reporting its build results for my [automated reports|http://fucit.org/solr-jenkins-reports/failure-report.html] to pick up, I've noticed a pattern of failures that indicates a definite problem with using SSL on Windows (even with Java 11.0.4).
> The symptomatic stack traces all contain...
> {noformat}
> ...
> [junit4] > Caused by: javax.net.ssl.SSLException: Software caused connection abort: recv failed
> [junit4] >   at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:127)
> ...
> [junit4] > Caused by: java.net.SocketException: Software caused connection abort: recv failed
> [junit4] >   at java.base/java.net.SocketInputStream.socketRead0(Native Method)
> ...
> {noformat}
> I suspect this may be related to [https://bugs.openjdk.java.net/browse/JDK-8209333] but I have no concrete evidence to back this up.
> I'll post some details of my analysis in comments...

--
This message was sent by Atlassian Jira
(v8.3.4#803005)