[ 
https://issues.apache.org/jira/browse/GEODE-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16734686#comment-16734686
 ] 

Kirk Lund commented on GEODE-6232:
----------------------------------

This test hangs intermittently because of a deadlock between the two threads 
below.

{{RMI TCP Connection(1)-192.168.0.26@5968}} holds two monitors taken in 
GemFireCacheImpl (the GemFireCacheImpl instance and a HashMap locked in 
createVMRegion) while waiting to acquire the readLock in ManagementListener.

{{Pooled High Priority Message Processor 2@5963}} is holding the writeLock in 
ManagementListener while waiting in GemFireCacheImpl.removeRoot for the HashMap 
monitor held by the first thread.

This happens in VM-0 because it has a DistributionMessageObserver test hook 
that is trying to invoke disconnectFromDS() during beforeProcessMessage for a 
bucket GII (RequestImageMessage).
{noformat}
"RMI TCP Connection(1)-192.168.0.26@5968" daemon prio=5 tid=0x12 nid=NA waiting
  java.lang.Thread.State: WAITING
         blocks Pooled High Priority Message Processor 2@5963
          at sun.misc.Unsafe.park(Unsafe.java:-1)
          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
          at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
          at org.apache.geode.management.internal.beans.ManagementListener.handleEvent(ManagementListener.java:110)
          at org.apache.geode.distributed.internal.InternalDistributedSystem.notifyResourceEventListeners(InternalDistributedSystem.java:2201)
          at org.apache.geode.distributed.internal.InternalDistributedSystem.handleResourceEvent(InternalDistributedSystem.java:606)
          at org.apache.geode.internal.cache.DiskStoreFactoryImpl.create(DiskStoreFactoryImpl.java:144)
          - locked <0x17bb> (a org.apache.geode.internal.cache.GemFireCacheImpl)
          at org.apache.geode.internal.cache.GemFireCacheImpl.getOrCreateDefaultDiskStore(GemFireCacheImpl.java:2566)
          at org.apache.geode.internal.cache.LocalRegion.findDiskStore(LocalRegion.java:7600)
          at org.apache.geode.internal.cache.PartitionedRegion.findDiskStore(PartitionedRegion.java:9002)
          at org.apache.geode.internal.cache.LocalRegion.<init>(LocalRegion.java:647)
          at org.apache.geode.internal.cache.PartitionedRegion.<init>(PartitionedRegion.java:730)
          at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3015)
          - locked <0x180a> (a java.util.HashMap)
          at org.apache.geode.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2956)
          at org.apache.geode.internal.cache.GemFireCacheImpl.createRegion(GemFireCacheImpl.java:2944)
          at org.apache.geode.cache.RegionFactory.create(RegionFactory.java:755)
          at org.apache.geode.internal.cache.partitioned.PersistentPartitionedRegionRegressionTest.createPartitionedRegion(PersistentPartitionedRegionRegressionTest.java:559)
          at org.apache.geode.internal.cache.partitioned.PersistentPartitionedRegionRegressionTest.lambda$doesNotWaitForPreviousInstanceOfOnlineServer$bb17a952$7(PersistentPartitionedRegionRegressionTest.java:393)
          at org.apache.geode.internal.cache.partitioned.PersistentPartitionedRegionRegressionTest$$Lambda$104.640621901.run(Unknown Source:-1)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123)
          at org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:69)
          at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source:-1)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:346)
          at sun.rmi.transport.Transport$1.run(Transport.java:200)
          at sun.rmi.transport.Transport$1.run(Transport.java:197)
          at java.security.AccessController.doPrivileged(AccessController.java:-1)
          at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
          at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
          at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
          at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
          at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$6.2031477857.run(Unknown Source:-1)
          at java.security.AccessController.doPrivileged(AccessController.java:-1)
          at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:745)

"Pooled High Priority Message Processor 2@5963" daemon prio=10 tid=0x6f8e 
nid=NA waiting for monitor entry
  java.lang.Thread.State: BLOCKED
         waiting for RMI TCP Connection(1)-192.168.0.26@5968 to release lock on <0x180a> (a java.util.HashMap)
          at org.apache.geode.internal.cache.GemFireCacheImpl.removeRoot(GemFireCacheImpl.java:3576)
          at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6333)
          at org.apache.geode.internal.cache.DistributedRegion.basicDestroyRegion(DistributedRegion.java:1755)
          at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6255)
          at org.apache.geode.internal.cache.LocalRegion.localDestroyRegion(LocalRegion.java:2242)
          at org.apache.geode.internal.cache.AbstractRegion.localDestroyRegion(AbstractRegion.java:430)
          at org.apache.geode.management.internal.ManagementResourceRepo.destroyLocalMonitoringRegion(ManagementResourceRepo.java:73)
          at org.apache.geode.management.internal.LocalManager.cleanUpResources(LocalManager.java:260)
          at org.apache.geode.management.internal.LocalManager.stopManager(LocalManager.java:388)
          at org.apache.geode.management.internal.SystemManagementService.close(SystemManagementService.java:239)
          - locked <0x1812> (a java.util.HashMap)
          at org.apache.geode.management.internal.beans.ManagementAdapter.handleCacheRemoval(ManagementAdapter.java:737)
          at org.apache.geode.management.internal.beans.ManagementListener.handleEvent(ManagementListener.java:119)
          at org.apache.geode.distributed.internal.InternalDistributedSystem.notifyResourceEventListeners(InternalDistributedSystem.java:2201)
          at org.apache.geode.distributed.internal.InternalDistributedSystem.handleResourceEvent(InternalDistributedSystem.java:606)
          at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2127)
          - locked <0xbd0> (a java.lang.Class)
          at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1353)
          at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1016)
          at org.apache.geode.test.dunit.Disconnect.disconnectFromDS(Disconnect.java:43)
          at org.apache.geode.internal.cache.partitioned.PersistentPartitionedRegionRegressionTest$2.beforeProcessMessage(PersistentPartitionedRegionRegressionTest.java:359)
          at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:365)
          at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:432)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:956)
          at org.apache.geode.distributed.internal.ClusterDistributionManager.doHighPriorityThread(ClusterDistributionManager.java:834)
          at org.apache.geode.distributed.internal.ClusterDistributionManager$$Lambda$36.853691000.invoke(Unknown Source:-1)
          at org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121)
          at org.apache.geode.internal.logging.LoggingThreadFactory$$Lambda$34.67404871.run(Unknown Source:-1)
          at java.lang.Thread.run(Thread.java:745)
{noformat}
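
The cycle in the two stacks above can be reproduced in isolation. The 
following is only a minimal sketch, not Geode code: {{cacheMonitor}} and 
{{listenerLock}} are hypothetical stand-ins for the monitor held in 
GemFireCacheImpl and the read/write lock in ManagementListener. The read lock 
is requested with a timed tryLock (an assumption, so that the sketch terminates 
instead of parking forever the way lock() does in the dump).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockOrderInversionSketch {

  // Hypothetical stand-in for the monitor held by the region-creating thread.
  private static final Object cacheMonitor = new Object();
  // Hypothetical stand-in for the read/write lock in the listener.
  private static final ReentrantReadWriteLock listenerLock = new ReentrantReadWriteLock();

  /** Returns true if the "creator" thread could not obtain the read lock. */
  static boolean readLockStarved() {
    CountDownLatch monitorHeld = new CountDownLatch(1);
    CountDownLatch writeLockHeld = new CountDownLatch(1);
    boolean[] readLockAcquired = {false};

    // Order 1: monitor first, then read lock (like the RMI thread).
    Thread creator = new Thread(() -> {
      synchronized (cacheMonitor) {
        monitorHeld.countDown();
        try {
          writeLockHeld.await();
          // Timed attempt so the sketch can finish; lock() would wait forever here.
          if (listenerLock.readLock().tryLock(200, TimeUnit.MILLISECONDS)) {
            readLockAcquired[0] = true;
            listenerLock.readLock().unlock();
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }, "creator");

    // Order 2: write lock first, then monitor (like the message processor thread).
    Thread closer = new Thread(() -> {
      listenerLock.writeLock().lock();
      try {
        writeLockHeld.countDown();
        monitorHeld.await();
        synchronized (cacheMonitor) {
          // Entered only after the creator gives up and leaves its synchronized block.
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        listenerLock.writeLock().unlock();
      }
    }, "closer");

    creator.start();
    closer.start();
    try {
      creator.join();
      closer.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for the sketch threads", e);
    }
    return !readLockAcquired[0];
  }

  public static void main(String[] args) {
    System.out.println("read lock starved: " + readLockStarved());
  }
}
```

Running main prints {{read lock starved: true}}. The write lock is released 
only after the closer enters the monitor, and the creator keeps that monitor 
until its timed attempt has returned, so the attempt cannot succeed. With an 
untimed lock(), neither thread would ever proceed.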

> CI failure: PersistentPartitionedRegionRegressionTest.doesNotWaitForPreviousInstanceOfOnlineServer hangs
> --------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-6232
>                 URL: https://issues.apache.org/jira/browse/GEODE-6232
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Lynn Gallinat
>            Assignee: Kirk Lund
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I did not find any call stacks.
> The distributedTest-progress.txt file shows this test starts but does not complete:
> 2018-12-20 19:52:27.157 +0000 Starting test org.apache.geode.internal.cache.partitioned.PersistentPartitionedRegionRegressionTest doesNotWaitForPreviousInstanceOfOnlineServer
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.9.0-build.302/test-results/distributedTest/1545341717/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.9.0-build.302/test-artifacts/1545341717/distributedtestfiles-OpenJDK8-1.9.0-build.302.tgz
> This occurred in DistributedTestOpenJDK8 tab #262, but a similar failure is 
> showing on tab #242



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
