[
https://issues.apache.org/jira/browse/GEODE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176536#comment-17176536
]
Bill Burcham commented on GEODE-8267:
-------------------------------------
The test `serverRestartHangsWaitingForStartupMessageResponse` was added (by
[~echobravo] and [~upthewaterspout]) a year ago (in July 2019) as part of this
ticket: GEODE-6904 _Reconnecting locator has many hung threads, causing
members to startup without cluster configuration_
> serverRestartsAfterOneLocatorDies hangs
> ---------------------------------------
>
> Key: GEODE-8267
> URL: https://issues.apache.org/jira/browse/GEODE-8267
> Project: Geode
> Issue Type: Bug
> Components: configuration, locator, membership
> Reporter: Bill Burcham
> Priority: Major
>
> hang:
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/275#A]
>
> The test hung in serverRestartsAfterOneLocatorDies after another failure in
> the same test class.
> Here's the hung thread:
> {noformat}
> "Test worker" #27 prio=5 os_prio=0 cpu=5016.73ms elapsed=5638.52s tid=0x00007f01c8ad4800 nid=0x18 runnable [0x00007f019872c000]
>    java.lang.Thread.State: RUNNABLE
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
>         at java.net.SocketInputStream.read(SocketInputStream.java:168)
>         at java.net.SocketInputStream.read(SocketInputStream.java:140)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:271)
>         - locked <0x00000000d08fe7a0> (a java.io.BufferedInputStream)
>         at java.io.DataInputStream.readByte(DataInputStream.java:270)
>         at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:240)
>         at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:164)
>         at java.rmi.server.RemoteObjectInvocationHandler.invokeRemoteMethod(RemoteObjectInvocationHandler.java:217)
>         at java.rmi.server.RemoteObjectInvocationHandler.invoke(RemoteObjectInvocationHandler.java:162)
>         at com.sun.proxy.$Proxy53.executeMethodOnObject(Unknown Source)
>         at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:607)
>         at org.apache.geode.test.dunit.VM.invoke(VM.java:450)
>         at org.apache.geode.test.dunit.rules.ClusterStartupRule.startServerVM(ClusterStartupRule.java:268)
>         at org.apache.geode.test.dunit.rules.ClusterStartupRule.startServerVM(ClusterStartupRule.java:261)
>         at org.apache.geode.test.dunit.rules.ClusterStartupRule.startServerVM(ClusterStartupRule.java:256)
>         at org.apache.geode.management.internal.configuration.ClusterConfigLocatorRestartDUnitTest.serverRestartsAfterOneLocatorDies(ClusterConfigLocatorRestartDUnitTest.java:114)
>         at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> {noformat}
> Here's the previous test failure, which may have affected the test that hung:
> {code:java}
> org.apache.geode.management.internal.configuration.ClusterConfigLocatorRestartDUnitTest > serverRestartHangsWaitingForStartupMessageResponse FAILED
>     org.junit.runners.model.TestTimedOutException: test timed out after 300000 milliseconds
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
>         at java.net.SocketInputStream.read(SocketInputStream.java:168)
>         at java.net.SocketInputStream.read(SocketInputStream.java:140)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:271)
>         at java.io.DataInputStream.readByte(DataInputStream.java:270)
>         at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:240)
>         at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:164)
>         at java.rmi.server.RemoteObjectInvocationHandler.invokeRemoteMethod(RemoteObjectInvocationHandler.java:217)
>         at java.rmi.server.RemoteObjectInvocationHandler.invoke(RemoteObjectInvocationHandler.java:162)
>         at com.sun.proxy.$Proxy53.executeMethodOnObject(Unknown Source)
>         at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:607)
>         at org.apache.geode.test.dunit.VM.invoke(VM.java:437)
>         at org.apache.geode.test.junit.rules.VMProvider.invoke(VMProvider.java:94)
>         at org.apache.geode.management.internal.configuration.ClusterConfigLocatorRestartDUnitTest.serverRestartHangsWaitingForStartupMessageResponse(ClusterConfigLocatorRestartDUnitTest.java:176)
> {code}
> It seems like 300s should be long enough, so I fear there may be a real
> problem here.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)