[ https://issues.apache.org/jira/browse/GEODE-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen Nichols updated GEODE-8817: -------------------------------- Labels: blocks-1.14.0 (was: ) > server hangs in cache close with ssl enabled due to active client connection; > client side (CacheClientUpdater.close()) is hung in > SSLSocketImpl$AppInputStream.deplete() > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > > Key: GEODE-8817 > URL: https://issues.apache.org/jira/browse/GEODE-8817 > Project: Geode > Issue Type: Bug > Components: client/server, security > Affects Versions: 1.14.0 > Reporter: Bill Burcham > Assignee: Bill Burcham > Priority: Major > Labels: blocks-1.14.0 > > A proprietary TLS/SSL-enabled application encountered a network partition. A > server hangs in cache close due to active client connection; client side > ({{CacheClientUpdater.close()}}) is hung in > {{SSLSocketImpl$AppInputStream.deplete()}} > The configuration is: > {noformat} > ========================================== > losingSide | survivingSide > ========================================== > 11110 | 10627 > 11115 | 10632 > ------------------------------------------ > 11139 | 10655 > | 10662 > ------------------------------------------ > {noformat} > The stuck threads were stuck in sun's SSL code. Geode's client/Server > framework uses old I/O and that was also part of where they were stuck. If > the clients had closed their connections to the server then it would not have > been stuck here. But the server shutdown shouldn't hang because of client > that refuses to disconnect. > The Geode client-side of the connection is hung here: > {code:java} > \[warn 2020/11/06 14:56:56.577 PST <ThreadsMonitor> tid=0x18] Thread <50> > (0x32) that was executed at <06 Nov 2020 14:55:43 PST> has been stuck for > <72.81 seconds> and number of thread monitor iteration <1> > Thread Name <poolTimer-brclient-7> state <BLOCKED> > Waiting on <sun.security.ssl.SSLSocketImpl$AppInputStream@3a9b72d3> > Owned By <Cache Client Updater Thread on > 10.32.108.224(bridgep2_host2_10627:10627)<v3>:41003 port 27636> with ID <43> > Executor Group <ScheduledThreadPoolExecutorWithKeepAlive> > Monitored metric <ResourceManagerStats.numThreadsStuck> > Thread stack: > sun.security.ssl.SSLSocketImpl$AppInputStream.deplete(SSLSocketImpl.java:1016) > sun.security.ssl.SSLSocketImpl$AppInputStream.access$100(SSLSocketImpl.java:816) > sun.security.ssl.SSLSocketImpl.bruteForceCloseInput(SSLSocketImpl.java:702) > sun.security.ssl.SSLSocketImpl.duplexCloseOutput(SSLSocketImpl.java:553) > sun.security.ssl.SSLSocketImpl.close(SSLSocketImpl.java:485) > org.apache.geode.internal.cache.tier.sockets.CacheClientUpdater.close(CacheClientUpdater.java:546) > org.apache.geode.cache.client.internal.QueueConnectionImpl.internalDestroy(QueueConnectionImpl.java:112) > org.apache.geode.cache.client.internal.QueueManagerImpl.endpointCrashed(QueueManagerImpl.java:379) > org.apache.geode.cache.client.internal.QueueManagerImpl.connectionCrashed(QueueManagerImpl.java:357) > org.apache.geode.cache.client.internal.QueueConnectionImpl.destroy(QueueConnectionImpl.java:88) > org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:645) > org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:504) > org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnServer(OpExecutorImpl.java:334) > org.apache.geode.cache.client.internal.OpExecutorImpl.executeOn(OpExecutorImpl.java:303) > org.apache.geode.cache.client.internal.PoolImpl.executeOn(PoolImpl.java:839) > org.apache.geode.cache.client.internal.PingOp.execute(PingOp.java:38) > org.apache.geode.cache.client.internal.LiveServerPinger$PingTask.run2(LiveServerPinger.java:90) > org.apache.geode.cache.client.internal.PoolImpl$PoolTask.run(PoolImpl.java:1329) > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > org.apache.geode.internal.ScheduledThreadPoolExecutorWithKeepAlive$DelegatingScheduledFuture.run(ScheduledThreadPoolExecutorWithKeepAlive.java:279) > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > java.lang.Thread.run(Thread.java:748) > Lock owner thread stack > java.net.SocketInputStream.socketRead0(Native Method) > java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > java.net.SocketInputStream.read(SocketInputStream.java:171) > java.net.SocketInputStream.read(SocketInputStream.java:141) > sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:475) > sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:469) > sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:69) > sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1228) > sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:75) > sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:915) > org.apache.geode.internal.cache.tier.sockets.Message.fetchHeader(Message.java:830) > org.apache.geode.internal.cache.tier.sockets.Message.readHeaderAndBody(Message.java:677) > org.apache.geode.internal.cache.tier.sockets.Message.receiveWithHeaderReadTimeout(Message.java:1140) > org.apache.geode.internal.cache.tier.sockets.CacheClientUpdater.processMessages(CacheClientUpdater.java:1591) > org.apache.geode.internal.cache.tier.sockets.CacheClientUpdater.run(CacheClientUpdater.java:483) > {code} > From the Geode log we can see the expected exceptions for messages to losing > side (Operation not permitted) after the network is dropped: > {code:java} > [warn 2020/11/06 14:50:20.865 PST <Geode Failure Detection thread 3> > tid=0x9b] Unable to send message to > 10.32.110.94(bridgep1_host1_11115:11115)<v4>:41001 > java.io.IOException: Operation not permitted (sendto failed) > at java.net.PlainDatagramSocketImpl.send(Native Method) > at java.net.DatagramSocket.send(DatagramSocket.java:693) > at org.jgroups.protocols.UDP._send(UDP.java:224) > at org.jgroups.protocols.UDP.sendUnicast(UDP.java:215) > at org.jgroups.protocols.TP.sendToSingleMember(TP.java:1985) > at org.jgroups.protocols.TP.doSend(TP.java:1962) > at > org.apache.geode.distributed.internal.membership.gms.messenger.Transport.doSend(Transport.java:85) > at org.jgroups.protocols.TP.send(TP.java:1948) > at > org.apache.geode.distributed.internal.membership.gms.messenger.Transport._send(Transport.java:57) > at org.jgroups.protocols.TP.down(TP.java:1515) > at org.jgroups.stack.Protocol.down(Protocol.java:439) > at > org.apache.geode.distributed.internal.membership.gms.messenger.StatRecorder.down(StatRecorder.java:87) > at org.jgroups.protocols.UNICAST3.down(UNICAST3.java:646) > at org.jgroups.protocols.FlowControl.down(FlowControl.java:347) > at org.jgroups.protocols.FRAG2.down(FRAG2.java:136) > at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1039) > at org.jgroups.JChannel.down(JChannel.java:790) > at org.jgroups.JChannel.send(JChannel.java:426) > at > org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger.send(JGroupsMessenger.java:819) > at > org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger.send(JGroupsMessenger.java:670) > at > org.apache.geode.distributed.internal.membership.gms.fd.GMSHealthMonitor.doCheckMember(GMSHealthMonitor.java:504) > at > org.apache.geode.distributed.internal.membership.gms.fd.GMSHealthMonitor.lambda$checkMember$1(GMSHealthMonitor.java:455) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)