[ https://issues.apache.org/jira/browse/GEODE-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17221691#comment-17221691 ]
ASF GitHub Bot commented on GEODE-8652:
---------------------------------------

Bill commented on a change in pull request #5666:
URL: https://github.com/apache/geode/pull/5666#discussion_r512967980

##########
File path: geode-core/src/main/java/org/apache/geode/internal/net/NioSslEngine.java
##########
@@ -315,50 +317,48 @@ public ByteBuffer ensureWrappedCapacity(int amount, ByteBuffer wrappedBuffer,
   }
 
   @Override
-  public ByteBuffer readAtLeast(SocketChannel channel, int bytes,
+  public ByteBufferSharing readAtLeast(SocketChannel channel, int bytes,
       ByteBuffer wrappedBuffer) throws IOException {
-    if (peerAppData.capacity() > bytes) {
-      // we already have a buffer that's big enough
-      if (peerAppData.capacity() - peerAppData.position() < bytes) {
-        peerAppData.compact();
-        peerAppData.flip();
-      }
-    }
+    try (final ByteBufferSharing inputSharing = shareInputBuffer()) {
 
-    while (peerAppData.remaining() < bytes) {
-      wrappedBuffer.limit(wrappedBuffer.capacity());
-      int amountRead = channel.read(wrappedBuffer);
-      if (amountRead < 0) {
-        throw new EOFException();
+      ByteBuffer peerAppData = inputSharing.getBuffer();
+
+      if (peerAppData.capacity() > bytes) {
+        // we already have a buffer that's big enough
+        if (peerAppData.capacity() - peerAppData.position() < bytes) {
+          peerAppData.compact();
+          peerAppData.flip();
+        }
       }
-      if (amountRead > 0) {
-        wrappedBuffer.flip();
-        // prep the decoded buffer for writing
-        peerAppData.compact();
-        peerAppData = unwrap(wrappedBuffer);
-        // done writing to the decoded buffer - prep it for reading again
-        peerAppData.flip();
+
+      while (peerAppData.remaining() < bytes) {
+        wrappedBuffer.limit(wrappedBuffer.capacity());
+        int amountRead = channel.read(wrappedBuffer);
+        if (amountRead < 0) {
+          throw new EOFException();
+        }
+        if (amountRead > 0) {
+          wrappedBuffer.flip();
+          // prep the decoded buffer for writing
+          peerAppData.compact();
+          try (final ByteBufferSharing inputSharing2 = unwrap(wrappedBuffer)) {
+            // done writing to the decoded buffer - prep it for reading again
+            final ByteBuffer peerAppDataNew = inputSharing2.getBuffer();
+            peerAppDataNew.flip();
+            peerAppData = peerAppDataNew; // loop needs new reference!
+          }
+        }
       }
+      return shareInputBuffer();
     }
-    return peerAppData;
   }
 
   @Override
-  public ByteBuffer getUnwrappedBuffer(ByteBuffer wrappedBuffer) {
-    return peerAppData;
-  }
-
-  /**
-   * ensures that the unwrapped buffer associated with the given wrapped buffer has
-   * sufficient capacity for the given amount of bytes. This may compact the
-   * buffer or it may return a new buffer.
-   */
-  public ByteBuffer ensureUnwrappedCapacity(int amount) {
-    // for TTLS the app-data buffers do not need to be tracked direct-buffers since we
-    // do not use them for I/O operations
-    peerAppData =
-        bufferPool.expandReadBufferIfNeeded(TRACKED_RECEIVER, peerAppData, amount);
-    return peerAppData;
+  public ByteBufferSharing getUnwrappedBuffer(ByteBuffer wrappedBuffer) {
+    /*
+     * TODO: it can't be right that we ignore the wrappedBuffer parameter here!

Review comment:
       Resolved and removed. We eliminated the parameter entirely.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> member hung in Connection.notifyHandshakeWaiter() during disconnect waiting
> for a lock held by another thread in Connection.readAck()
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-8652
>                 URL: https://issues.apache.org/jira/browse/GEODE-8652
>             Project: Geode
>          Issue Type: Bug
>          Components: membership, messaging
>    Affects Versions: 1.14.0
>            Reporter: Bill Burcham
>            Assignee: Bill Burcham
>            Priority: Major
>              Labels: pull-request-available
>
> An application encountered the following hang in a TLS-enabled cluster.
> Let's call the cluster members ds3 -> ds1.
> ds3 sends a {{PutAllPRMessage}} to ds1 and is stuck in
> {{SocketChannel.read()}} waiting for the acknowledgement.
> {{ClusterDistributionManager.uncleanShutdown()}} is invoked on ds3 to shut
> the member down. That thread blocks trying to acquire a lock on the
> {{NioSslEngine}} held by the first thread (the one waiting for the ack
> to the put-all.)
> Somehow the shutdown thread must be allowed to proceed.
> Here's the hung thread in ds3 (a.k.a. vm_3_thr_34_dataStore3_host1_8592)
> trying to shut down the member but it's stuck waiting for the monitor on the
> {{NioSslEngine}}:
> {noformat}
> "vm_3_thr_34_dataStore3_host1_8592" #795 daemon prio=5 os_prio=0 tid=0x00007fdb9c011000 nid=0x2fcc waiting for monitor entry [0x00007fdb6f4b7000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.geode.internal.tcp.Connection.notifyHandshakeWaiter(Connection.java:804)
>         - waiting to lock <0x00000000f2635b28> (a org.apache.geode.internal.net.NioSslEngine)
>         at org.apache.geode.internal.tcp.Connection.close(Connection.java:1350)
>         at org.apache.geode.internal.tcp.Connection.closePartialConnect(Connection.java:1278)
>         at org.apache.geode.internal.tcp.ConnectionTable.closeCon(ConnectionTable.java:612)
>         at org.apache.geode.internal.tcp.ConnectionTable.closeCon(ConnectionTable.java:604)
>         at org.apache.geode.internal.tcp.ConnectionTable.close(ConnectionTable.java:661)
>         - locked <0x00000000f2678cf8> (a java.util.ArrayList)
>         - locked <0x00000000f1187348> (a java.util.concurrent.ConcurrentHashMap)
>         at org.apache.geode.internal.tcp.TCPConduit.stop(TCPConduit.java:487)
>         at org.apache.geode.distributed.internal.direct.DirectChannel.disconnect(DirectChannel.java:644)
>         - locked <0x00000000f11867a8> (a org.apache.geode.distributed.internal.direct.DirectChannel)
>         at org.apache.geode.distributed.internal.DistributionImpl.disconnectDirectChannel(DistributionImpl.java:631)
>         at org.apache.geode.distributed.internal.DistributionImpl.access$200(DistributionImpl.java:82)
>         at org.apache.geode.distributed.internal.DistributionImpl$LifecycleListenerImpl.disconnect(DistributionImpl.java:904)
>         at org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.stop(GMSMembership.java:1908)
>         at org.apache.geode.distributed.internal.membership.gms.Services.stop(Services.java:302)
>         at org.apache.geode.distributed.internal.membership.gms.GMSMembership.shutdown(GMSMembership.java:1262)
>         at org.apache.geode.distributed.internal.membership.gms.GMSMembership.disconnect(GMSMembership.java:1847)
>         at org.apache.geode.distributed.internal.DistributionImpl.disconnect(DistributionImpl.java:501)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.uncleanShutdown(ClusterDistributionManager.java:1291)
> {noformat}
> That thread is waiting on a lock held by this thread (in ds3) which is
> waiting on an acknowledgement to a PutAllPRMessage sent to ds1.
> {noformat}
> "vm_3_thr_37_dataStore3_host1_8592" #857 daemon prio=5 os_prio=0 tid=0x00007fdb9c030800 nid=0x30d1 runnable [0x00007fdb732f0000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
>         - locked <0x00000000f2643380> (a java.lang.Object)
>         at org.apache.geode.internal.net.NioSslEngine.readAtLeast(NioSslEngine.java:330)
>         at org.apache.geode.internal.tcp.MsgReader.readAtLeast(MsgReader.java:129)
>         at org.apache.geode.internal.tcp.MsgReader.readHeader(MsgReader.java:58)
> ==>     - locked <0x00000000f2635b28> (a org.apache.geode.internal.net.NioSslEngine)
>         at org.apache.geode.internal.tcp.Connection.readAck(Connection.java:2652)
>         at org.apache.geode.distributed.internal.direct.DirectChannel.readAcks(DirectChannel.java:392)
>         at org.apache.geode.distributed.internal.direct.DirectChannel.sendToMany(DirectChannel.java:342)
>         at org.apache.geode.distributed.internal.direct.DirectChannel.sendToOne(DirectChannel.java:182)
>         at org.apache.geode.distributed.internal.direct.DirectChannel.send(DirectChannel.java:511)
>         at org.apache.geode.distributed.internal.DistributionImpl.directChannelSend(DistributionImpl.java:346)
>         at org.apache.geode.distributed.internal.DistributionImpl.send(DistributionImpl.java:291)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.sendViaMembershipManager(ClusterDistributionManager.java:2053)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.sendOutgoing(ClusterDistributionManager.java:1981)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.sendMessage(ClusterDistributionManager.java:2018)
>         at org.apache.geode.distributed.internal.ClusterDistributionManager.putOutgoing(ClusterDistributionManager.java:1083)
>         at org.apache.geode.internal.cache.partitioned.PutAllPRMessage.send(PutAllPRMessage.java:201)
>         at org.apache.geode.internal.cache.PartitionedRegion.tryToSendOnePutAllMessage(PartitionedRegion.java:2839)
>         at org.apache.geode.internal.cache.PartitionedRegion.sendMsgByBucket(PartitionedRegion.java:2621)
>         at org.apache.geode.internal.cache.PartitionedRegion.postPutAllSend(PartitionedRegion.java:2392)
>         at org.apache.geode.internal.cache.LocalRegionDataView.postPutAll(LocalRegionDataView.java:361)
>         at org.apache.geode.internal.cache.LocalRegion.basicPutAll(LocalRegion.java:9154)
>         at org.apache.geode.internal.cache.LocalRegion.putAll(LocalRegion.java:8903)
> {noformat}
> What we see is that the {{MsgReader}} in the second thread is not letting the
> first thread close the socket. Until the socket is closed, the second thread
> will be stuck in {{SocketChannel.read()}}.
> *But why is the second thread stuck in {{SocketChannelImpl.read}}? That may
> be due to GEODE-8651!*

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
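
Editorial illustration (not part of the original notification, and not Geode code): the hang described in the issue can be reproduced in miniature with a reader thread that holds a monitor while blocked in a channel read, and a second thread that needs the same monitor before it can proceed. In this sketch every name is hypothetical: `engineLock` stands in for the `NioSslEngine` monitor, and a `java.nio.channels.Pipe` that nobody writes to stands in for the socket on which the ack never arrives. It also shows, as an assumption about one possible way out rather than the fix adopted in the pull request, that closing the channel without first taking the monitor releases the blocked reader.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names come from Geode.
class MonitorHeldDuringReadSketch {

  /**
   * Returns true when (1) the closer could not enter the monitor while the
   * reader was blocked in read(), and (2) closing the channel without the
   * monitor let both threads finish.
   */
  static boolean closerBlockedUntilChannelClosed() {
    final Object engineLock = new Object(); // stands in for the NioSslEngine monitor
    final CountDownLatch readerHoldsLock = new CountDownLatch(1);
    final CountDownLatch readerDone = new CountDownLatch(1);
    final CountDownLatch closerEntered = new CountDownLatch(1);
    try {
      final Pipe pipe = Pipe.open(); // stands in for the socket; nothing is ever written

      Thread reader = new Thread(() -> {
        synchronized (engineLock) {
          readerHoldsLock.countDown();
          try {
            pipe.source().read(ByteBuffer.allocate(16)); // blocks: no data ever arrives
          } catch (IOException closedWhileReading) {
            // expected once another thread closes the channel
          }
        }
        readerDone.countDown();
      });
      reader.setDaemon(true);
      reader.start();
      readerHoldsLock.await();

      Thread closer = new Thread(() -> {
        synchronized (engineLock) { // plays the role of the thread waiting for the monitor
          closerEntered.countDown();
        }
      });
      closer.setDaemon(true);
      closer.start();

      // The reader never leaves read(), so the closer cannot get the monitor.
      boolean closerWasBlocked = !closerEntered.await(500, TimeUnit.MILLISECONDS);

      // Closing the channel does not require the monitor; it ends the blocked read().
      pipe.source().close();
      pipe.sink().close();

      boolean readerReleased = readerDone.await(5, TimeUnit.SECONDS);
      boolean closerProceeded = closerEntered.await(5, TimeUnit.SECONDS);
      return closerWasBlocked && readerReleased && closerProceeded;
    } catch (IOException | InterruptedException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("closer blocked until channel closed: "
        + closerBlockedUntilChannelClosed());
  }
}
```

The sketch only models the lock ordering visible in the two thread dumps; it says nothing about why the read in the real system never returned, which the issue attributes tentatively to GEODE-8651.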