[ https://issues.apache.org/jira/browse/GEODE-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031450#comment-16031450 ]

Vahram Aharonyan commented on GEODE-2898:
-----------------------------------------

Hi Bruce,

Recently we have also faced problems with this piece of code while using GEODE 
1.0.0. The issues mainly show up when packets are dropped in the environment. 
During a burst of packet drops, cluster nodes cannot communicate successfully, 
the accept thread is blocked, and many sockets are left pending in states such 
as SYN_RECV, SYN_SENT and ESTABLISHED. Furthermore, the GEODE cluster does not 
recover from this state even after the network has returned to normal.

Our analysis shows that the problem is not caused by performing the SSL 
handshake in the accept thread itself. The root cause is that no timeout is set 
on the socket before the SSL handshake is performed. Experiments show that if 
s.setSoTimeout is invoked in 
org.apache.geode.internal.cache.tier.sockets.AcceptorImpl#accept just before 
configureServerSSLSocket, the issue goes away. We also tried moving 
configureServerSSLSocket into the handshake thread pool, before the 
s.setSoTimeout(this.acceptTimeout) call; in that case the handshake threads 
became blocked after some time.
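A minimal sketch of the idea, assuming plain `java.net` sockets rather than Geode's own classes: a single blocking `read()` stands in for the server side of the SSL handshake (which likewise starts by blocking until the client's first message arrives), and `TimeoutBeforeHandshake`, `readHello` and `HANDSHAKE_TIMEOUT_MS` are hypothetical names. Setting the timeout before the first blocking read means a client that connects but never sends anything produces a `SocketTimeoutException` instead of holding the thread indefinitely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class TimeoutBeforeHandshake {
    // Hypothetical value; a real server would take this from its configuration.
    public static final int HANDSHAKE_TIMEOUT_MS = 200;

    /**
     * Stand-in for the server side of an SSL handshake: block until the peer
     * sends its first byte. Returns true if a byte arrived in time, false if
     * the peer closed the connection or the socket timeout fired first.
     */
    public static boolean readHello(Socket s) throws IOException {
        // Set the timeout BEFORE any blocking read, as suggested above.
        s.setSoTimeout(HANDSHAKE_TIMEOUT_MS);
        InputStream in = s.getInputStream();
        try {
            return in.read() != -1;
        } catch (SocketTimeoutException e) {
            // Non-responsive peer: give up instead of hanging forever.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, lo);
             // A "silent" client: it connects but never writes anything.
             Socket silentClient = new Socket(lo, server.getLocalPort());
             Socket accepted = server.accept()) {
            long start = System.nanoTime();
            boolean ok = readHello(accepted);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("hello received: " + ok + ", waited ~" + elapsedMs + " ms");
        }
    }
}
```

Removing the `setSoTimeout` call makes `readHello` block for as long as the silent client keeps the connection open, which is the kind of hang described above.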

Based on the above, we agree that the SSL handshake should be moved to the 
handshake thread, after the timeout has been set. However, this fix is not 
complete: the same situation occurs in 
org.apache.geode.distributed.internal.tcpserver.TcpServer#processRequest, where 
GEODE performs the SSL handshake without setting a socket timeout. In this case 
TcpServer threads can be blocked forever. Hence a socket timeout also needs to 
be set in the processRequest method.
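The structure proposed here (the accept thread only accepts, and worker threads set a timeout before doing any blocking work) could look roughly like the following sketch. It is not Geode code: `HandoffServer`, `serve`, `handleRequest` and `READ_TIMEOUT_MS` are hypothetical, and one blocking `read()` again stands in for the SSL handshake plus request parsing. The per-socket timeout in `handleRequest` is the kind of setting that `TcpServer#processRequest` would also need.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class HandoffServer {
    // Hypothetical value, for illustration only.
    public static final int READ_TIMEOUT_MS = 200;

    /** Worker-side handling: every blocking read is bounded by SO_TIMEOUT. */
    public static String handleRequest(Socket s) {
        try (Socket sock = s) {
            sock.setSoTimeout(READ_TIMEOUT_MS); // set before the "handshake" read
            int b = sock.getInputStream().read();
            return b == -1 ? "closed" : "ok";
        } catch (SocketTimeoutException e) {
            return "timed out"; // peer did not finish in the allotted time
        } catch (IOException e) {
            return "error";
        }
    }

    /** Accepts n connections; the accept loop never blocks on a handshake. */
    public static List<String> serve(ServerSocket server, int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                Socket s = server.accept();                   // only accept here...
                pending.add(pool.submit(() -> handleRequest(s))); // ...then hand off
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, lo);
             Socket silent = new Socket(lo, server.getLocalPort());  // never writes
             Socket chatty = new Socket(lo, server.getLocalPort())) {
            chatty.getOutputStream().write(42);
            chatty.getOutputStream().flush();
            System.out.println(serve(server, 2));
        }
    }
}
```

With one silent and one responsive client, the accept loop is never stuck: both connections are accepted immediately, the responsive one is served, and the silent one is dropped after `READ_TIMEOUT_MS`.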

Can this be solved within the scope of this issue? If so, should we reopen it? 
Or would you prefer that we create another issue and submit a separate patch 
for this?

Thanks,
Vahram.

> A non-responsive SSL client can block a server's "acceptor" thread
> ------------------------------------------------------------------
>
>                 Key: GEODE-2898
>                 URL: https://issues.apache.org/jira/browse/GEODE-2898
>             Project: Geode
>          Issue Type: Bug
>          Components: client/server
>            Reporter: Bruce Schuchardt
>             Fix For: 1.2.0
>
>
> During the handoff to the handshake thread pool the accept thread can be 
> blocked in the SSL handshake. The SSL handshake should be moved to the 
> handshake thread pool. The goal is to allow the server to reject clients that 
> haven't finished the handshake in the allotted time.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
