[ https://issues.apache.org/jira/browse/GEODE-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315729#comment-17315729 ]
Kirk Lund commented on GEODE-9024:
----------------------------------

[~leonfin] Hi Leon, I recommend asking detailed questions about this on the geode dev-list: d...@geode.apache.org.

> Geode Cache Server stops accepting client connections
> -----------------------------------------------------
>
> Key: GEODE-9024
> URL: https://issues.apache.org/jira/browse/GEODE-9024
> Project: Geode
> Issue Type: Bug
> Components: core
> Affects Versions: 1.13.1
> Reporter: Leon Finker
> Priority: Critical
>
> We are encountering the following deadlock (pretty often) on 1.13.1:
> 1. Client (bridge) acceptor thread is locked up in this stack
> {noformat}
> "Handshaker 0.0.0.0/0.0.0.0:40011 Thread 2" #219 daemon prio=5 os_prio=0 tid=0x00007f755c007000 nid=0x44a2 runnable [0x00007f75847c7000]
> java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:170)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.net.SocketInputStream.read(SocketInputStream.java:223)
> at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.getCommunicationModeForNonSelector(AcceptorImpl.java:1559)
> at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.handleNewClientConnection(AcceptorImpl.java:1430)
> at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$handOffNewClientConnection$4(AcceptorImpl.java:1341)
> at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$$Lambda$407/2146094985.run(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> 2. The 4 Handshaker threads for that pool are stuck in this stack
> {noformat}
> "Handshaker 0.0.0.0/0.0.0.0:40011 Thread 2" #219 daemon prio=5 os_prio=0 tid=0x00007f755c007000 nid=0x44a2 runnable [0x00007f75847c7000]
> java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:170)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.net.SocketInputStream.read(SocketInputStream.java:223)
> at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.getCommunicationModeForNonSelector(AcceptorImpl.java:1559)
> at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.handleNewClientConnection(AcceptorImpl.java:1430)
> at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$handOffNewClientConnection$4(AcceptorImpl.java:1341)
> at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$$Lambda$407/2146094985.run(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Is there any reason there is no socket read timeout set here:
> private CommunicationMode getCommunicationModeForNonSelector(Socket socket) throws IOException {
>   socket.setSoTimeout(0);
>   socketCreator.forCluster().handshakeIfSocketIsSSL(socket, acceptTimeout);
>   byte communicationModeByte = (byte) socket.getInputStream().read();
> This blocks any new client connections to the server. Why not set read timeout? For some reason it's explicitly set to 0 (infinite)... This seems to have changed here:
> https://github.com/apache/geode/commit/e423cd8fa24329baf11fd6871a5ea6dc0f362b6c
> Before that change, the socket.setSoTimeout(0); was after the socket read.
> The cache server can be brought to a complete stop just by opening 4 telnet sessions to the cache server port. This is effectively a denial of service.
> This happens when using the default CacheServer.MaxThreads=0. Would a workaround be to use CacheServer.MaxThreads=N, since the code then seems to go into selector-based logic with a timeout?
> Thank you

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
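
For readers following the read-timeout question in the quoted report, here is a minimal standalone sketch of the general technique: bound the first read on a freshly accepted socket with `Socket.setSoTimeout`, then restore the infinite timeout once the byte has arrived. This is not Geode's code and not a proposed patch. The class name `HandshakeReadTimeoutSketch`, the methods `readFirstByte` and `demo`, and the `TIMED_OUT` sentinel are hypothetical names made up for illustration; only the `java.net` calls are real JDK API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class HandshakeReadTimeoutSketch {

  /** Hypothetical sentinel: the client sent nothing within the timeout. */
  static final int TIMED_OUT = -2;

  /**
   * Reads the first byte from an accepted socket, waiting at most
   * timeoutMillis. Returns the byte (0-255), -1 on end of stream,
   * or TIMED_OUT if the peer stayed silent.
   */
  static int readFirstByte(Socket socket, int timeoutMillis) throws IOException {
    socket.setSoTimeout(timeoutMillis); // bound the blocking read
    try {
      return socket.getInputStream().read();
    } catch (SocketTimeoutException e) {
      return TIMED_OUT; // caller can close the socket and free the thread
    } finally {
      if (!socket.isClosed()) {
        socket.setSoTimeout(0); // back to infinite, but only after the read
      }
    }
  }

  /** Runs one loopback connection; the client is either silent or sends byte 100. */
  static int demo(boolean clientSendsByte, int timeoutMillis) {
    try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
         Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
         Socket accepted = server.accept()) {
      if (clientSendsByte) {
        client.getOutputStream().write(100);
        client.getOutputStream().flush();
      }
      return readFirstByte(accepted, timeoutMillis);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // A silent client (like an idle telnet session) no longer pins the thread.
    System.out.println("silent client: " + demo(false, 200));
    System.out.println("client sent byte: " + demo(true, 2000));
  }
}
```

With a finite timeout, a connection that never sends its first byte costs a handshake thread only `timeoutMillis` instead of holding it indefinitely. Which timeout value would suit a real cache server is a separate question that this sketch does not try to answer.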