Still trying to identify the cause of the unit test hangs.....stay tuned. For other people who might not know this, the reason the entire CI job fails rather than the hanging test is because we don't have any global test-level timeouts, so a hanging test will run up until the Concourse timeout is reached. I wrote a story to explore adding test level timeouts so we can get actual feedback in the CI about which test is hanging without waiting for the whole job to timeout, downloading artifacts, looking at the stack dumps, etc etc. I think we could have some reasonable global upper limit (30 minutes?) for any test. We could always tune that as needed. https://issues.apache.org/jira/projects/GEODE/issues/GEODE-7057?filter=allopenissues
Ryan On Wed, Aug 7, 2019 at 11:21 AM Bruce Schuchardt <bschucha...@pivotal.io> wrote: > Yeah, that test passed on my branch in unit tests and stress tests but > for some reason is hanging after the merge to develop. I've pushed an > @Ignore for the test that you should pick up. > > On 8/7/19 10:38 AM, Kirk Lund wrote: > > Yep, that's the same test I'm seeing in the callstacks of my dunit tgz > from > > concourse... > > > > Started @ 2019-08-07 07:25:18.494 +0000 > > 2019-08-07 07:51:25.252 +0000 > > org.apache.geode.cache30.DistributedMulticastRegionDUnitTest > > testMulticastAfterReconnect > > Ended @ 2019-08-07 08:28:16.591 +0000 > > > > Thanks for digging into it! > > > > On Wed, Aug 7, 2019 at 10:16 AM Ryan McMahon <rmcma...@pivotal.io> > wrote: > > > >> I think the reflection and PowerMock warnings here are probably a red > >> herring. We pulled down the artifacts and found that the DUnit job is > >> hanging due to stuck threads in a newer DUnit test. I am not sure why > it > >> isn't failing the test but rather failing the entire job. I believe > Bruce > >> Schuchart is going to dig into this a bit. Analysis from Lynn > >> Hughes-Godfrey: > >> > >> So, I downloaded > >> > >> > >> > https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/961 > >> > >> to > >> > >> > >> > /Users/lhughesgodfrey/Downloads/distributedtestfiles-OpenJDK8-1.11.0-SNAPSHOT.0010/geode-core/build/distributedTest > >> > >> In that callstacks directory, you'll see dunit-hangs.txt ... and this > >> test has been running for over an hour ... > >> > >> ``` > >> > >> Started @ 2019-08-07 07:54:10.272 +0000 > >> > >> 2019-08-07 08:22:25.440 +0000 > >> org.apache.geode.cache30.DistributedMulticastRegionDUnitTest > >> testMulticastAfterReconnect > >> > >> Ended @ 2019-08-07 08:58:50.243 +0000 > >> > >> ``` > >> > >> When I look at the callstacks (at the top level), I see this: > >> > >> ``` > >> > >> "Pooled Waiting Message Processor 1" #213 daemon prio=5 os_prio=0 > >> tid=0x00007fc738024000 nid=0x488 waiting for monitor entry > >> [0x00007fc7ec5f7000] > >> > >> java.lang.Thread.State: BLOCKED (on object monitor) > >> > >> at > >> > org.apache.geode.distributed.internal.membership.adapter.GMSMembershipManager.startEventProcessing(GMSMembershipManager.java:1179) > >> > >> - waiting to lock <0x00000000e03a75e0> (a java.lang.Object) > >> > >> at > >> > org.apache.geode.distributed.internal.ClusterDistributionManager.readyForMessages(ClusterDistributionManager.java:1152) > >> > >> at > >> > org.apache.geode.distributed.internal.ClusterDistributionManager.lambda$startThreads$10(ClusterDistributionManager.java:1129) > >> > >> at > >> > org.apache.geode.distributed.internal.ClusterDistributionManager$$Lambda$87/1408912936.run(Unknown > >> Source) > >> > >> at > >> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > >> > >> at > >> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > >> > >> at > >> > org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:961) > >> > >> at > >> > org.apache.geode.distributed.internal.ClusterDistributionManager.doWaitingThread(ClusterDistributionManager.java:851) > >> > >> at > >> > org.apache.geode.distributed.internal.ClusterDistributionManager$$Lambda$53/274751999.invoke(Unknown > >> Source) > >> > >> at > >> > org.apache.geode.internal.logging.LoggingThreadFactory.lambda$newThread$0(LoggingThreadFactory.java:121) > >> > >> at > >> > org.apache.geode.internal.logging.LoggingThreadFactory$$Lambda$49/1824093464.run(Unknown > >> Source) > >> > >> at java.lang.Thread.run(Thread.java:748) > >> > >> Locked ownable synchronizers: > >> > >> - <0x00000000e03a7660> (a > >> java.util.concurrent.ThreadPoolExecutor$Worker) > >> > >> ``` > >> > >> This thread holds the lock: > >> > >> ``` > >> > >> "ReconnectThread" #193 prio=5 os_prio=0 tid=0x00007fc7cc219000 > >> nid=0x46a waiting on condition [0x00007fc78bdfb000] > >> > >> java.lang.Thread.State: TIMED_WAITING (sleeping) > >> > >> at java.lang.Thread.sleep(Native Method) > >> > >> at > >> > org.apache.geode.internal.tcp.Connection.createSender(Connection.java:959) > >> > >> at > >> > org.apache.geode.internal.tcp.ConnectionTable.handleNewPendingConnection(ConnectionTable.java:297) > >> > >> at > >> > org.apache.geode.internal.tcp.ConnectionTable.getSharedConnection(ConnectionTable.java:408) > >> > >> at > >> > org.apache.geode.internal.tcp.ConnectionTable.get(ConnectionTable.java:593) > >> > >> at > >> > org.apache.geode.internal.tcp.TCPConduit.getConnection(TCPConduit.java:825) > >> > >> at > >> > org.apache.geode.distributed.internal.direct.DirectChannel.getConnections(DirectChannel.java:537) > >> > >> at > >> > org.apache.geode.distributed.internal.direct.DirectChannel.sendToMany(DirectChannel.java:326) > >> > >> at > >> > org.apache.geode.distributed.internal.direct.DirectChannel.sendToOne(DirectChannel.java:241) > >> > >> at > >> > org.apache.geode.distributed.internal.direct.DirectChannel.send(DirectChannel.java:596) > >> > >> at > >> > org.apache.geode.distributed.internal.membership.adapter.GMSMembershipManager.directChannelSend(GMSMembershipManager.java:1558) > >> > >> at > >> > org.apache.geode.distributed.internal.membership.adapter.GMSMembershipManager.send(GMSMembershipManager.java:1739) > >> > >> at > >> > org.apache.geode.distributed.internal.ClusterDistributionManager.sendViaMembershipManager(ClusterDistributionManager.java:2849) > >> > >> at > >> > org.apache.geode.distributed.internal.ClusterDistributionManager.sendOutgoing(ClusterDistributionManager.java:2776) > >> > >> at > >> > org.apache.geode.distributed.internal.ClusterDistributionManager.sendMessage(ClusterDistributionManager.java:2813) > >> > >> at > >> > org.apache.geode.distributed.internal.ClusterDistributionManager.putOutgoing(ClusterDistributionManager.java:1523) > >> > >> at > >> > org.apache.geode.distributed.internal.ReplyMessage.send(ReplyMessage.java:174) > >> > >> at > >> > org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.sendReply(DistributedCacheOperation.java:1269) > >> > >> at > >> > org.apache.geode.internal.cache.DistributedCacheOperation$CacheOperationMessage.process(DistributedCacheOperation.java:1132) > >> > >> at > >> > org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:370) > >> > >> at > >> > org.apache.geode.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:425) > >> > >> at > >> > org.apache.geode.distributed.internal.ClusterDistributionManager.scheduleIncomingMessage(ClusterDistributionManager.java:2862) > >> > >> at > >> > org.apache.geode.distributed.internal.ClusterDistributionManager.handleIncomingDMsg(ClusterDistributionManager.java:2549) > >> > >> at > >> > org.apache.geode.distributed.internal.ClusterDistributionManager.access$1700(ClusterDistributionManager.java:113) > >> > >> at > >> > org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.messageReceived(ClusterDistributionManager.java:3388) > >> > >> at > >> > org.apache.geode.distributed.internal.membership.adapter.GMSMembershipManager.dispatchMessage(GMSMembershipManager.java:1043) > >> > >> at > >> > org.apache.geode.distributed.internal.membership.adapter.GMSMembershipManager.processStartupEvent(GMSMembershipManager.java:1154) > >> > >> at > >> > org.apache.geode.distributed.internal.membership.adapter.GMSMembershipManager.startEventProcessing(GMSMembershipManager.java:1205) > >> > >> - locked <0x00000000e03a75e0> (a java.lang.Object) > >> > >> at > >> > org.apache.geode.distributed.internal.ClusterDistributionManager.<init>(ClusterDistributionManager.java:911) > >> > >> at > >> > org.apache.geode.distributed.internal.ClusterDistributionManager.create(ClusterDistributionManager.java:540) > >> > >> at > >> > org.apache.geode.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:756) > >> > >> at > >> > org.apache.geode.distributed.internal.InternalDistributedSystem.access$200(InternalDistributedSystem.java:135) > >> > >> at > >> > org.apache.geode.distributed.internal.InternalDistributedSystem$Builder.build(InternalDistributedSystem.java:2994) > >> > >> at > >> > org.apache.geode.distributed.internal.InternalDistributedSystem.connectInternal(InternalDistributedSystem.java:251) > >> > >> - locked <0x00000000e0397568> (a java.lang.Object) > >> > >> at > >> > org.apache.geode.distributed.DistributedSystem.connect(DistributedSystem.java:158) > >> > >> at > >> > org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2575) > >> > >> at > >> > org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2392) > >> > >> - locked <0x00000000e045f178> (a java.lang.Object) > >> > >> - locked <0x00000000e047a880> (a java.lang.Class for > >> org.apache.geode.internal.cache.GemFireCacheImpl) > >> > >> - locked <0x00000000e0512eb8> (a java.lang.Class for > >> org.apache.geode.internal.cache.InternalCacheBuilder) > >> > >> at > >> > org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1207) > >> > >> at > >> > org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:3383) > >> > >> at > >> > org.apache.geode.distributed.internal.membership.adapter.GMSMembershipManager.uncleanShutdown(GMSMembershipManager.java:1441) > >> > >> at > >> > org.apache.geode.distributed.internal.membership.adapter.GMSMembershipManager$ManagerImpl.lambda$forceDisconnect$0(GMSMembershipManager.java:2613) > >> > >> at > >> > org.apache.geode.distributed.internal.membership.adapter.GMSMembershipManager$ManagerImpl$$Lambda$198/1187156898.run(Unknown > >> Source) > >> > >> at java.lang.Thread.run(Thread.java:748) > >> > >> Locked ownable synchronizers: > >> > >> - None > >> > >> ``` > >> > >> Another blocked thread: > >> > >> ``` > >> > >> "RMI TCP Connection(1)-172.17.0.13" #32 daemon prio=5 os_prio=0 > >> tid=0x00007fc7d4001800 nid=0x27e waiting for monitor entry > >> [0x00007fc849054000] > >> > >> java.lang.Thread.State: BLOCKED (on object monitor) > >> > >> at java.lang.Object.wait(Native Method) > >> > >> - waiting on <0x00000000e045f178> (a java.lang.Object) > >> > >> at > >> > org.apache.geode.distributed.internal.InternalDistributedSystem.waitUntilReconnected(InternalDistributedSystem.java:2895) > >> > >> - locked <0x00000000e045f178> (a java.lang.Object) > >> > >> at > >> > org.apache.geode.internal.cache.GemFireCacheImpl.waitUntilReconnected(GemFireCacheImpl.java:2408) > >> > >> at > >> > org.apache.geode.cache30.DistributedMulticastRegionDUnitTest.lambda$testMulticastAfterReconnect$bb17a952$1(DistributedMulticastRegionDUnitTest.java:168) > >> > >> at > >> > org.apache.geode.cache30.DistributedMulticastRegionDUnitTest$$Lambda$200/626559185.run(Unknown > >> Source) > >> > >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > >> > >> at > >> > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > >> > >> at > >> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > >> > >> at java.lang.reflect.Method.invoke(Method.java:498) > >> > >> at > >> > org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123) > >> > >> at > >> > org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:69) > >> > >> at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) > >> > >> at > >> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > >> > >> at java.lang.reflect.Method.invoke(Method.java:498) > >> > >> at > >> sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357) > >> > >> at sun.rmi.transport.Transport$1.run(Transport.java:200) > >> > >> at sun.rmi.transport.Transport$1.run(Transport.java:197) > >> > >> at java.security.AccessController.doPrivileged(Native Method) > >> > >> at sun.rmi.transport.Transport.serviceCall(Transport.java:196) > >> > >> at > >> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573) > >> > >> at > >> > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834) > >> > >> at > >> > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688) > >> > >> at > >> > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$8/1801094145.run(Unknown > >> Source) > >> > >> at java.security.AccessController.doPrivileged(Native Method) > >> > >> at > >> > sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687) > >> > >> at > >> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > >> > >> at > >> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > >> > >> at java.lang.Thread.run(Thread.java:748) > >> > >> Locked ownable synchronizers: > >> > >> - <0x00000000e01b9da0> (a > >> java.util.concurrent.ThreadPoolExecutor$Worker) > >> > >> ``` > >> > >> > >> Thanks, > >> > >> Ryan > >> > >> > >> On Wed, Aug 7, 2019 at 10:05 AM Kirk Lund <kl...@apache.org> wrote: > >> > >>> We previously decided to stop using PowerMock in any Geode tests and > >> remove > >>> it. Ryan and I created > https://issues.apache.org/jira/browse/GEODE-6143 > >>> but > >>> it's languishing because it takes a lot of time to strip out PowerMock > >> from > >>> a test. > >>> > >>> I don't get any PowerMock warnings locally when I run unit tests. > >>> > >>> On Wed, Aug 7, 2019 at 9:50 AM Murtuza Boxwala <mboxw...@pivotal.io> > >>> wrote: > >>> > >>>> Yea. I think it might be a red herring, because I am seeing those > >> errors > >>>> in every run, passing ones two…just double checked on > >>>> > >> > https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/UnitTestOpenJDK11/builds/948 > >>>> < > >>>> > >> > https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/UnitTestOpenJDK11/builds/948 > >>>> If we don’t have any other ideas, though, it might be worth removing > >>> them. > >>>> It is odd that you don’t see them locally. Let me see if I can get > them > >>> to > >>>> show up…it might have something to do with the logging level. > >>>> > >>>>> On Aug 7, 2019, at 12:45 PM, Kirk Lund <kl...@apache.org> wrote: > >>>>> > >>>>> I don't know if the PowerMock warnings are related but that's the > >> only > >>>>> thing interesting in the output before "timeout exceeded". > >>>>> > >>>>> On Wed, Aug 7, 2019 at 9:43 AM Murtuza Boxwala <mboxw...@pivotal.io> > >>>> wrote: > >>>>>> How down know these warnings are related to the builds hanging. > >> When > >>> I > >>>>>> was on CIO duty a couple days ago, I remember seeing this warning > >> in a > >>>>>> failing build, but then I looked back at passing builds on saw this > >> as > >>>> well. > >>>>>>> On Aug 7, 2019, at 12:40 PM, Kirk Lund <kl...@apache.org> wrote: > >>>>>>> > >>>>>>> The build is broken in CI right now (for main CI and PRs). The > >>> UnitTest > >>>>>>> jobs are timing out so I assume there's a hang of some sort. > >>>>>>> > >>>>>>> The WARNINGs appear to be related to PowerMock and begins with "An > >>>>>> illegal > >>>>>>> reflective access operation" in geode-assembly:test. > >>>>>>> > >>>>>>> I'm running unit tests locally and haven't hit this yet, but maybe > >> we > >>>>>>> should consider disabling all PowerMock tests and increase priority > >>> on > >>>>>>> re-enabling them without PowerMock? > >>>>>>> > >>>>>>>> Task :geode-assembly:test > >>>>>>> WARNING: An illegal reflective access operation has occurred > >>>>>>> WARNING: Illegal reflective access by > >>>>>>> org.mockito.internal.util.reflection.AccessibilityChanger > >>>>>>> > >> > (file:/home/geode/.gradle/caches/modules-2/files-2.1/org.mockito/mockito-core/2.23.0/497ddb32fd5d01f9dbe99a2ec790aeb931dff1b1/mockito-core-2.23.0.jar) > >>>>>>> to field java.io.File.path > >>>>>>> WARNING: Please consider reporting this to the maintainers of > >>>>>>> org.mockito.internal.util.reflection.AccessibilityChanger > >>>>>>> WARNING: Use --illegal-access=warn to enable warnings of further > >>>> illegal > >>>>>>> reflective access operations > >>>>>>> WARNING: All illegal access operations will be denied in a future > >>>> release > >>>>>>> WARNING: An illegal reflective access operation has occurred > >>>>>>> WARNING: Illegal reflective access by > >>>>>>> org.mockito.internal.util.reflection.AccessibilityChanger > >>>>>>> > >> > (file:/home/geode/.gradle/caches/modules-2/files-2.1/org.mockito/mockito-core/2.23.0/497ddb32fd5d01f9dbe99a2ec790aeb931dff1b1/mockito-core-2.23.0.jar) > >>>>>>> to field java.io.File.path > >>>>>>> WARNING: Please consider reporting this to the maintainers of > >>>>>>> org.mockito.internal.util.reflection.AccessibilityChanger > >>>>>>> WARNING: Use --illegal-access=warn to enable warnings of further > >>>> illegal > >>>>>>> reflective access operations > >>>>>>> WARNING: All illegal access operations will be denied in a future > >>>> release > >>>>>>>> Task :geode-core:test > >>>>>>> WARNING: An illegal reflective access operation has occurred > >>>>>>> WARNING: Illegal reflective access by > >>>>>>> org.apache.geode.distributed.internal.deadlock.UnsafeThreadLocal > >>>>>>> (file:/home/geode/geode/geode-core/build/classes/java/main/) to > >>> method > >>>>>>> java.lang.ThreadLocal.getMap(java.lang.Thread) > >>>>>>> WARNING: Please consider reporting this to the maintainers of > >>>>>>> org.apache.geode.distributed.internal.deadlock.UnsafeThreadLocal > >>>>>>> WARNING: Use --illegal-access=warn to enable warnings of further > >>>> illegal > >>>>>>> reflective access operations > >>>>>>> WARNING: All illegal access operations will be denied in a future > >>>> release > >>>>>>> WARNING: An illegal reflective access operation has occurred > >>>>>>> WARNING: Illegal reflective access by > >>>>>>> org.mockito.internal.util.reflection.AccessibilityChanger > >>>>>>> > >> > (file:/home/geode/.gradle/caches/modules-2/files-2.1/org.mockito/mockito-core/2.23.0/497ddb32fd5d01f9dbe99a2ec790aeb931dff1b1/mockito-core-2.23.0.jar) > >>>>>>> to field java.util.Date.fastTime > >>>>>>> WARNING: Please consider reporting this to the maintainers of > >>>>>>> org.mockito.internal.util.reflection.AccessibilityChanger > >>>>>>> WARNING: Use --illegal-access=warn to enable warnings of further > >>>> illegal > >>>>>>> reflective access operations > >>>>>>> WARNING: All illegal access operations will be denied in a future > >>>> release > >>>>>>> WARNING: An illegal reflective access operation has occurred > >>>>>>> WARNING: Illegal reflective access by > >>>>>>> org.powermock.reflect.internal.WhiteboxImpl > >>>>>>> > >> > (file:/home/geode/.gradle/caches/modules-2/files-2.1/org.powermock/powermock-reflect/2.0.0-beta.5/4ea415348f15620783a1f26343d6732adfa86bc8/powermock-reflect-2.0.0-beta.5.jar) > >>>>>>> to method java.lang.Object.finalize() > >>>>>>> WARNING: Please consider reporting this to the maintainers of > >>>>>>> org.powermock.reflect.internal.WhiteboxImpl > >>>>>>> WARNING: Use --illegal-access=warn to enable warnings of further > >>>> illegal > >>>>>>> reflective access operations > >>>>>>> WARNING: All illegal access operations will be denied in a future > >>>> release > >>>>>>> WARNING: An illegal reflective access operation has occurred > >>>>>>> WARNING: Illegal reflective access by > >>>>>>> org.junit.contrib.java.lang.system.EnvironmentVariables > >>>>>>> > >> > (file:/home/geode/.gradle/caches/modules-2/files-2.1/com.github.stefanbirkner/system-rules/1.19.0/d541c9a1cff0dda32e2436c74562e2e4aa6c88cd/system-rules-1.19.0.jar) > >>>>>>> to field java.util.Collections$UnmodifiableMap.m > >>>>>>> WARNING: Please consider reporting this to the maintainers of > >>>>>>> org.junit.contrib.java.lang.system.EnvironmentVariables > >>>>>>> WARNING: Use --illegal-access=warn to enable warnings of further > >>>> illegal > >>>>>>> reflective access operations > >>>>>>> WARNING: All illegal access operations will be denied in a future > >>>> release > >>>>>>> timeout exceeded > >>>>>> > >>>> >