[ https://issues.apache.org/jira/browse/GEODE-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16734686#comment-16734686 ]

Kirk Lund commented on GEODE-6232:
----------------------------------

The cause of this test intermittently hanging is a deadlock between the two threads below. {{RMI TCP Connection(1)-192.168.0.26@5968}} is synchronized on GemFireCacheImpl (and on the java.util.HashMap <0x180a> locked in GemFireCacheImpl.createVMRegion) while waiting to acquire the readLock in ManagementListener. {{Pooled High Priority Message Processor 2@5963}} is holding the writeLock in ManagementListener while waiting in GemFireCacheImpl.removeRoot to synchronize on that same HashMap. This happens in VM-0 because it has a DistributionMessageObserver test hook that invokes disconnectFromDS() from beforeProcessMessage for a bucket GII (RequestImageMessage).

{noformat}
"RMI TCP Connection(1)-192.168.0.26@5968" daemon prio=5 tid=0x12 nid=NA waiting
  java.lang.Thread.State: WAITING
   blocks Pooled High Priority Message Processor 2@5963
    at sun.misc.Unsafe.park(Unsafe.java:-1)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
    at org.apache.geode.management.internal.beans.ManagementListener.handleEvent(ManagementListener.java:110)
    at org.apache.geode.distributed.internal.InternalDistributedSystem.notifyResourceEventListeners(InternalDistributedSystem.java:2201)
    at org.apache.geode.distributed.internal.InternalDistributedSystem.handleResourceEvent(InternalDistributedSystem.java:606)
    at org.apache.geode.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:144)
    - locked <0x17bb> (a org.apache.geode.internal.cache.GemFireCacheImpl)
    at org.apache.geode.internal.cache.GemFireCacheImpl.getOrCreateDefaultDiskStore(GemFireCacheImpl.java:2566)
    at org.apache.geode.internal.cache.LocalRegion.findDiskStore(LocalRegion.java:7600)
    at org.apache.geode.internal.cache.PartitionedRegion.findDiskStore(PartitionedRegion.java:9002)
    at org.apache.geode.internal.cache.LocalRegion.<init>(LocalRegion.java:647)
    at org.apache.geode.internal.cache.PartitionedRegion.<init>(PartitionedRegion.java:730)
    at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3015)
    - locked <0x180a> (a java.util.HashMap)
    at org.apache.geode.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2956)
    at org.apache.geode.internal.cache.GemFireCacheImpl.createRegion(GemFireCacheImpl.java:2944)
    at org.apache.geode.cache.RegionFactory.create(RegionFactory.java:755)
    at org.apache.geode.internal.cache.partitioned.PersistentPartitionedRegionRegressionTest.createPartitionedRegion(PersistentPartitionedRegionRegressionTest.java:559)
    at org.apache.geode.internal.cache.partitioned.PersistentPartitionedRegionRegressionTest.lambda$doesNotWaitForPreviousInstanceOfOnlineServer$bb17a952$7(PersistentPartitionedRegionRegressionTest.java:393)
    at org.apache.geode.internal.cache.partitioned.PersistentPartitionedRegionRegressionTest$$Lambda$104.640621901.run(Unknown Source:-1)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123)
    at org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:69)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source:-1)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:346)
    at sun.rmi.transport.Transport$1.run(Transport.java:200)
    at sun.rmi.transport.Transport$1.run(Transport.java:197)
    at java.security.AccessController.doPrivileged(AccessController.java:-1)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$6.2031477857.run(Unknown Source:-1)
    at java.security.AccessController.doPrivileged(AccessController.java:-1)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

"Pooled High Priority Message Processor 2@5963" daemon prio=10 tid=0x6f8e nid=NA waiting for monitor entry
  java.lang.Thread.State: BLOCKED
   waiting for RMI TCP Connection(1)-192.168.0.26@5968 to release lock on <0x180a> (a java.util.HashMap)
    at org.apache.geode.internal.cache.GemFireCacheImpl.removeRoot(GemFireCacheImpl.java:3576)
    at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6333)
    at org.apache.geode.internal.cache.DistributedRegion.basicDestroyRegion(DistributedRegion.java:1755)
    at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6255)
    at org.apache.geode.internal.cache.LocalRegion.localDestroyRegion(LocalRegion.java:2242)
    at org.apache.geode.internal.cache.AbstractRegion.localDestroyRegion(AbstractRegion.java:430)
    at org.apache.geode.management.internal.ManagementResourceRepo.destroyLocalMonitoringRegion(ManagementResourceRepo.java:73)
    at org.apache.geode.management.internal.LocalManager.cleanUpResources(LocalManager.java:260)
    at org.apache.geode.management.internal.LocalManager.stopManager(LocalManager.java:388)
    at org.apache.geode.management.internal.SystemManagementService.close(SystemManagementService.java:239)
    - locked <0x1812> (a java.util.HashMap)
    at org.apache.geode.management.internal.beans.ManagementAdapter.handleCacheRemoval(ManagementAdapter.java:737)
    at org.apache.geode.management.internal.beans.ManagementListener.handleEvent(ManagementListener.java:119)
    at org.apache.geode.distributed.internal.InternalDistributedSystem.notifyResourceEventListeners(InternalDistributedSystem.java:2201)
    at org.apache.geode.distributed.internal.InternalDistributedSystem.handleResourceEvent(InternalDistributedSystem.java:606)
    at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2127)
    - locked <0xbd0> (a java.lang.Class)
    at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1353)
    at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1016)
    at org.apache.geode.test.dunit.Disconnect.disconnectFromDS(Disconnect.java:43)
    at org.apache.geode.internal.cache.partitioned.PersistentPartitionedRegionRegressionTest$2.beforeProcessMessage(PersistentPartitionedRegionRegressionTest.java:359)
    at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:365)
    at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:432)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:956)
    at org.apache.geode.distributed.internal.ClusterDistributionManager.doHighPriorityThread(ClusterDistributionManager.java:834)
    at org.apache.geode.distributed.internal.ClusterDistributionManager$$Lambda$36.853691000.invoke(Unknown Source:-1)
    at org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
    at org.apache.geode.internal.logging.LoggingThreadFactory$$Lambda$34.67404871.run(Unknown Source:-1)
    at java.lang.Thread.run(Thread.java:745)
{noformat}

> CI failure: PersistentPartitionedRegionRegressionTest.doesNotWaitForPreviousInstanceOfOnlineServer hangs
> --------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-6232
>                 URL: https://issues.apache.org/jira/browse/GEODE-6232
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Lynn Gallinat
>            Assignee: Kirk Lund
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I did not find any call stacks.
> The distributedTest-progress.txt file shows this test starts but does not complete:
> 2018-12-20 19:52:27.157 +0000 Starting test org.apache.geode.internal.cache.partitioned.PersistentPartitionedRegionRegressionTest doesNotWaitForPreviousInstanceOfOnlineServer
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.9.0-build.302/test-results/distributedTest/1545341717/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.9.0-build.302/test-artifacts/1545341717/distributedtestfiles-OpenJDK8-1.9.0-build.302.tgz
> This occurred in DistributedTestOpenJDK8 tab #262, but a similar failure is showing on tab #242

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
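
The deadlock in Kirk Lund's comment on GEODE-6232 is a lock-order inversion. One thread takes a monitor and then a ReentrantReadWriteLock read lock. The other thread takes the write lock and then the same monitor. The Java sketch below reproduces only that pattern in isolation. It is not Geode code: `cacheMonitor` and `listenerLock` are hypothetical stand-ins for the HashMap monitor locked in GemFireCacheImpl.createVMRegion and for the read/write lock in ManagementListener. The `closer` and `creator` thread names are also illustrative. Two latches force the unlucky interleaving every time. In the real test it presumably depended on timing, which would explain why the hang was intermittent.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.BooleanSupplier;

/** Hypothetical reproduction of a monitor / read-write-lock order inversion (not Geode code). */
public class LockOrderInversionDemo {

  public static void main(String[] args) {
    for (Thread t : startDeadlockedPair()) {
      System.out.println(t.getName() + " is " + t.getState());
    }
  }

  /** Starts two daemon threads that lock in opposite orders; returns once both are stuck. */
  static Thread[] startDeadlockedPair() {
    // Stand-in for the monitor held in GemFireCacheImpl.createVMRegion.
    final Object cacheMonitor = new Object();
    // Stand-in for the ReentrantReadWriteLock in ManagementListener.
    final ReentrantReadWriteLock listenerLock = new ReentrantReadWriteLock();
    final CountDownLatch closerHoldsWriteLock = new CountDownLatch(1);
    final CountDownLatch creatorHoldsMonitor = new CountDownLatch(1);

    // Plays the "Pooled High Priority Message Processor" role: write lock, then monitor.
    Thread closer = new Thread(() -> {
      listenerLock.writeLock().lock();
      try {
        closerHoldsWriteLock.countDown();
        awaitQuietly(creatorHoldsMonitor);
        synchronized (cacheMonitor) { // never entered: creator owns the monitor
          System.out.println("closer entered the monitor");
        }
      } finally {
        listenerLock.writeLock().unlock();
      }
    }, "closer");

    // Plays the "RMI TCP Connection" role: monitor, then read lock.
    Thread creator = new Thread(() -> {
      synchronized (cacheMonitor) {
        creatorHoldsMonitor.countDown();
        awaitQuietly(closerHoldsWriteLock);
        listenerLock.readLock().lock(); // parks forever: closer owns the write lock
        listenerLock.readLock().unlock();
      }
    }, "creator");

    // Daemon threads, so the stuck pair does not keep the JVM alive.
    closer.setDaemon(true);
    creator.setDaemon(true);
    closer.start();
    creator.start();

    // BLOCKED = waiting for monitor entry; WAITING = parked in the lock's wait queue.
    boolean stuck =
        pollUntil(() -> closer.getState() == Thread.State.BLOCKED, 5_000)
            && pollUntil(() -> listenerLock.hasQueuedThread(creator), 5_000)
            && pollUntil(() -> creator.getState() == Thread.State.WAITING, 5_000);
    if (!stuck) {
      throw new IllegalStateException("threads did not reach the expected deadlock");
    }
    return new Thread[] {closer, creator};
  }

  static void awaitQuietly(CountDownLatch latch) {
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Polls the condition every 10 ms until it holds or the timeout expires. */
  static boolean pollUntil(BooleanSupplier condition, long timeoutMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline > 0) {
        return false;
      }
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }
}
```

Running `main` should report `closer` as BLOCKED and `creator` as WAITING. These are the same two states shown for the Pooled High Priority Message Processor and RMI TCP Connection threads in the dump. Neither thread can make progress until one of them gives up its first lock.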