[ https://issues.apache.org/jira/browse/GEODE-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031450#comment-16031450 ]
Vahram Aharonyan commented on GEODE-2898:
-----------------------------------------

Hi Bruce,

We have recently run into problems with this piece of code as well, using Geode 1.0.0. The issues mainly appear when packets are dropped in the environment. During a burst of packet drops, cluster nodes cannot communicate successfully, the accept thread is blocked, and many sockets are left pending in states such as SYN_RECV, SYN_SENT and ESTABLISHED. Furthermore, the Geode cluster does not recover from this state even after the network returns to normal.

Our analysis shows that the problem is not caused by performing the SSL handshake in the accept thread as such. The root cause is that no timeout is set on the socket before the SSL handshake is performed. Experiments show that if s.setSoTimeout is invoked in org.apache.geode.internal.cache.tier.sockets.AcceptorImpl#accept just before configureServerSSLSocket, the issue goes away. We also tried moving configureServerSSLSocket to the handshake thread pool, ahead of s.setSoTimeout(this.acceptTimeout), and in that case the handshake threads became blocked after some time.

Based on the above, we agree that the SSL handshake should be moved to the handshake thread, after the timeout has been set. However, this fix is not complete: the same situation occurs in org.apache.geode.distributed.internal.tcpserver.TcpServer#processRequest, where Geode performs the SSL handshake without setting a socket timeout. In this case TcpServer threads could be blocked forever, so a socket timeout also needs to be set in the processRequest method.

Is this something that can be solved within the scope of this issue? If so, should we reopen it? Or would you prefer that we create another issue and submit a separate patch for it?

Thanks,
Vahram.
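
For illustration, here is a minimal sketch of the "set the socket timeout before the handshake" idea described in the comment above. It is not Geode code: the class HandshakeTimeoutSketch and its methods are hypothetical, and a plain blocking read stands in for the SSL handshake (which starts by waiting for data from the client), so that the example runs without a keystore. The only point it makes is that a read on a socket with SO_TIMEOUT set gives up with SocketTimeoutException instead of blocking forever on a silent peer.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class HandshakeTimeoutSketch {

    /**
     * Hypothetical stand-in for the first blocking read of a handshake.
     * Sets SO_TIMEOUT first, then waits for one byte from the peer.
     * Returns true if the peer sent data, false if the wait timed out.
     */
    static boolean awaitFirstByte(Socket s, int timeoutMillis) throws IOException {
        s.setSoTimeout(timeoutMillis); // without this call, read() could block forever
        try {
            return s.getInputStream().read() != -1;
        } catch (SocketTimeoutException e) {
            return false; // peer stayed silent for timeoutMillis
        }
    }

    /**
     * Connects a client that never sends anything and reports whether
     * the server side gave up on it after timeoutMillis.
     */
    static boolean silentClientIsRejected(int timeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket silentClient = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            return !awaitFirstByte(accepted, timeoutMillis);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("silent client rejected: " + silentClientIsRejected(200));
    }
}
```

In a real server the timed wait would run on a handshake worker thread rather than on the accept thread, so that a slow client costs one worker for at most the timeout period.
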
> A non-responsive SSL client can block a server's "acceptor" thread
> ------------------------------------------------------------------
>
>                 Key: GEODE-2898
>                 URL: https://issues.apache.org/jira/browse/GEODE-2898
>             Project: Geode
>          Issue Type: Bug
>          Components: client/server
>            Reporter: Bruce Schuchardt
>             Fix For: 1.2.0
>
>
> During the handoff to the handshake thread pool the accept thread can be
> blocked in the SSL handshake. The SSL handshake should be moved to the
> handshake thread pool. The goal is to allow the server to reject clients that
> haven't finished the handshake in the allotted time.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)