[ 
https://issues.apache.org/jira/browse/GEODE-8817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen Nichols updated GEODE-8817:
--------------------------------
    Labels: blocks-1.14.0​  (was: )

> server hangs in cache close with ssl enabled due to active client connection; 
> client side (CacheClientUpdater.close()) is hung in 
> SSLSocketImpl$AppInputStream.deplete()
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-8817
>                 URL: https://issues.apache.org/jira/browse/GEODE-8817
>             Project: Geode
>          Issue Type: Bug
>          Components: client/server, security
>    Affects Versions: 1.14.0
>            Reporter: Bill Burcham
>            Assignee: Bill Burcham
>            Priority: Major
>              Labels: blocks-1.14.0​
>
> A proprietary TLS/SSL-enabled application encountered a network partition. A 
> server hangs in cache close due to active client connection; client side 
> ({{CacheClientUpdater.close()}}) is hung in 
> {{SSLSocketImpl$AppInputStream.deplete()}}
> The configuration is:
> {noformat}
> ==========================================
> losingSide              |    survivingSide
> ==========================================
> 11110                   |    10627
> 11115                   |    10632
> ------------------------------------------
> 11139                   |    10655
>                         |    10662
> ------------------------------------------
> {noformat}
> The stuck threads were stuck in sun's SSL code. Geode's client/Server 
> framework uses old I/O and that was also part of where they were stuck. If 
> the clients had closed their connections to the server then it would not have 
> been stuck here. But the server shutdown shouldn't hang because of client 
> that refuses to disconnect.
> The Geode client-side of the connection is hung here:
> {code:java}
> \[warn 2020/11/06 14:56:56.577 PST <ThreadsMonitor> tid=0x18] Thread <50> 
> (0x32) that was executed at <06 Nov 2020 14:55:43 PST> has been stuck for 
> <72.81 seconds> and number of thread monitor iteration <1>
> Thread Name <poolTimer-brclient-7> state <BLOCKED>
> Waiting on <sun.security.ssl.SSLSocketImpl$AppInputStream@3a9b72d3>
> Owned By <Cache Client Updater Thread  on 
> 10.32.108.224(bridgep2_host2_10627:10627)<v3>:41003 port 27636> with ID <43>
> Executor Group <ScheduledThreadPoolExecutorWithKeepAlive>
> Monitored metric <ResourceManagerStats.numThreadsStuck>
> Thread stack:
> sun.security.ssl.SSLSocketImpl$AppInputStream.deplete(SSLSocketImpl.java:1016)
> sun.security.ssl.SSLSocketImpl$AppInputStream.access$100(SSLSocketImpl.java:816)
> sun.security.ssl.SSLSocketImpl.bruteForceCloseInput(SSLSocketImpl.java:702)
> sun.security.ssl.SSLSocketImpl.duplexCloseOutput(SSLSocketImpl.java:553)
> sun.security.ssl.SSLSocketImpl.close(SSLSocketImpl.java:485)
> org.apache.geode.internal.cache.tier.sockets.CacheClientUpdater.close(CacheClientUpdater.java:546)
> org.apache.geode.cache.client.internal.QueueConnectionImpl.internalDestroy(QueueConnectionImpl.java:112)
> org.apache.geode.cache.client.internal.QueueManagerImpl.endpointCrashed(QueueManagerImpl.java:379)
> org.apache.geode.cache.client.internal.QueueManagerImpl.connectionCrashed(QueueManagerImpl.java:357)
> org.apache.geode.cache.client.internal.QueueConnectionImpl.destroy(QueueConnectionImpl.java:88)
> org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:645)
> org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:504)
> org.apache.geode.cache.client.internal.OpExecutorImpl.executeOnServer(OpExecutorImpl.java:334)
> org.apache.geode.cache.client.internal.OpExecutorImpl.executeOn(OpExecutorImpl.java:303)
> org.apache.geode.cache.client.internal.PoolImpl.executeOn(PoolImpl.java:839)
> org.apache.geode.cache.client.internal.PingOp.execute(PingOp.java:38)
> org.apache.geode.cache.client.internal.LiveServerPinger$PingTask.run2(LiveServerPinger.java:90)
> org.apache.geode.cache.client.internal.PoolImpl$PoolTask.run(PoolImpl.java:1329)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> org.apache.geode.internal.ScheduledThreadPoolExecutorWithKeepAlive$DelegatingScheduledFuture.run(ScheduledThreadPoolExecutorWithKeepAlive.java:279)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
> Lock owner thread stack
> java.net.SocketInputStream.socketRead0(Native Method)
> java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> java.net.SocketInputStream.read(SocketInputStream.java:171)
> java.net.SocketInputStream.read(SocketInputStream.java:141)
> sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:475)
> sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:469)
> sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:69)
> sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1228)
> sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:75)
> sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:915)
> org.apache.geode.internal.cache.tier.sockets.Message.fetchHeader(Message.java:830)
> org.apache.geode.internal.cache.tier.sockets.Message.readHeaderAndBody(Message.java:677)
> org.apache.geode.internal.cache.tier.sockets.Message.receiveWithHeaderReadTimeout(Message.java:1140)
> org.apache.geode.internal.cache.tier.sockets.CacheClientUpdater.processMessages(CacheClientUpdater.java:1591)
> org.apache.geode.internal.cache.tier.sockets.CacheClientUpdater.run(CacheClientUpdater.java:483)
> {code}
> From the Geode log we can see the expected exceptions for messages to losing 
> side (Operation not permitted) after the network is dropped:
> {code:java}
> [warn 2020/11/06 14:50:20.865 PST <Geode Failure Detection thread 3> 
> tid=0x9b] Unable to send message to 
> 10.32.110.94(bridgep1_host1_11115:11115)<v4>:41001
> java.io.IOException: Operation not permitted (sendto failed)
>         at java.net.PlainDatagramSocketImpl.send(Native Method)
>         at java.net.DatagramSocket.send(DatagramSocket.java:693)
>         at org.jgroups.protocols.UDP._send(UDP.java:224)
>         at org.jgroups.protocols.UDP.sendUnicast(UDP.java:215)
>         at org.jgroups.protocols.TP.sendToSingleMember(TP.java:1985)
>         at org.jgroups.protocols.TP.doSend(TP.java:1962)
>         at 
> org.apache.geode.distributed.internal.membership.gms.messenger.Transport.doSend(Transport.java:85)
>         at org.jgroups.protocols.TP.send(TP.java:1948)
>         at 
> org.apache.geode.distributed.internal.membership.gms.messenger.Transport._send(Transport.java:57)
>         at org.jgroups.protocols.TP.down(TP.java:1515)
>         at org.jgroups.stack.Protocol.down(Protocol.java:439)
>         at 
> org.apache.geode.distributed.internal.membership.gms.messenger.StatRecorder.down(StatRecorder.java:87)
>         at org.jgroups.protocols.UNICAST3.down(UNICAST3.java:646)
>         at org.jgroups.protocols.FlowControl.down(FlowControl.java:347)
>         at org.jgroups.protocols.FRAG2.down(FRAG2.java:136)
>         at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1039)
>         at org.jgroups.JChannel.down(JChannel.java:790)
>         at org.jgroups.JChannel.send(JChannel.java:426)
>         at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger.send(JGroupsMessenger.java:819)
>         at 
> org.apache.geode.distributed.internal.membership.gms.messenger.JGroupsMessenger.send(JGroupsMessenger.java:670)
>         at 
> org.apache.geode.distributed.internal.membership.gms.fd.GMSHealthMonitor.doCheckMember(GMSHealthMonitor.java:504)
>         at 
> org.apache.geode.distributed.internal.membership.gms.fd.GMSHealthMonitor.lambda$checkMember$1(GMSHealthMonitor.java:455)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to