[ https://issues.apache.org/jira/browse/GEODE-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166582#comment-17166582 ]
Bruce J Schuchardt commented on GEODE-8267:
-------------------------------------------
See also GEODE-8389, which has a suspicious auto-reconnect error from the same
run.
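Stack traces from the artifacts also show dangling auto-reconnect threads, which are presumably left over from a previous test and may be blocking the test that hung. In the dump below, ReconnectThread holds the monitor on the InternalCacheBuilder class (<0x00000000e0cd2348>) while it waits in InternalDistributedSystem.reconnect(), and the RMI thread that is starting a new server is BLOCKED in InternalCacheBuilder.create() waiting to lock that same monitor.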
{noformat}
"ReconnectThread" #97 prio=5 os_prio=0 cpu=6655.62ms elapsed=5595.75s tid=0x00007f6f2c4e5800 nid=0x2ea in Object.wait() [0x00007f6e6fbfc000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <no object reference available>
	at org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2569)
	at org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2424)
	- waiting to re-lock in wait() <0x00000000e063ad70> (a java.lang.Object)
	- locked <0x00000000e10b6060> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
	- locked <0x00000000e0cd2348> (a java.lang.Class for org.apache.geode.internal.cache.InternalCacheBuilder)
	at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1275)
	at org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:2315)
	at org.apache.geode.distributed.internal.membership.gms.GMSMembership.uncleanShutdown(GMSMembership.java:1287)
	at org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.lambda$forceDisconnect$0(GMSMembership.java:2030)
	at org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl$$Lambda$453/0x0000000840bbe440.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:834)

   Locked ownable synchronizers:
	- None

"RMI TCP Connection(5)-172.17.0.11" #310 daemon prio=5 os_prio=0 cpu=269.79ms elapsed=5330.58s tid=0x00007f6f30001800 nid=0x5b8 waiting for monitor entry [0x00007f6f359da000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:156)
	- waiting to lock <0x00000000e0cd2348> (a java.lang.Class for org.apache.geode.internal.cache.InternalCacheBuilder)
	at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:142)
	at org.apache.geode.test.junit.rules.ServerStarterRule.startServer(ServerStarterRule.java:199)
	at org.apache.geode.test.junit.rules.ServerStarterRule.before(ServerStarterRule.java:91)
	at org.apache.geode.test.dunit.rules.ClusterStartupRule.lambda$startServerVM$729766c4$1(ClusterStartupRule.java:277)
	at org.apache.geode.test.dunit.rules.ClusterStartupRule$$Lambda$139/0x0000000840a2b440.call(Unknown Source)
	at org.apache.geode.test.dunit.internal.IdentifiableCallable.call(IdentifiableCallable.java:41)
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123)
	at org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:78)
	at jdk.internal.reflect.GeneratedMethodAccessor250.invoke(Unknown Source)
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:566)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:359)
	at sun.rmi.transport.Transport$1.run(Transport.java:200)
	at sun.rmi.transport.Transport$1.run(Transport.java:197)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:562)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:796)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:677)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$134/0x0000000840a25840.run(Unknown Source)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:676)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(Thread.java:834)
{noformat}
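The key detail in these two dumps is that Object.wait() releases only the monitor of the object being waited on. ReconnectThread is "waiting to re-lock" <0x00000000e063ad70>, yet it still holds the Class monitors for GemFireCacheImpl and InternalCacheBuilder, so any thread that needs the InternalCacheBuilder Class monitor stays BLOCKED for as long as the reconnect wait lasts. The following is a minimal, self-contained Java sketch of that pattern, not Geode code: CacheBuilderLike and RECONNECT_LOCK are hypothetical stand-ins, and it assumes (as the "waiting to lock ... (a java.lang.Class for ...InternalCacheBuilder)" line suggests) that InternalCacheBuilder.create() synchronizes on the class.

{code:java}
import java.util.concurrent.CountDownLatch;

/**
 * Hypothetical demo (not Geode code): Object.wait() releases only the monitor it waits on,
 * so an outer monitor held by the waiting thread (here a Class monitor) stays locked.
 */
public class WaitKeepsOuterMonitorDemo {

  /** Stand-in for InternalCacheBuilder: a static synchronized method locks CacheBuilderLike.class. */
  static final class CacheBuilderLike {
    static synchronized String create() {
      return "created";
    }
  }

  /** Stand-in for the java.lang.Object that ReconnectThread waits on in reconnect(). */
  private static final Object RECONNECT_LOCK = new Object();

  /** Returns the state of the "server start" thread observed while the "reconnect" thread waits. */
  static Thread.State observeStarterState() throws InterruptedException {
    CountDownLatch outerMonitorHeld = new CountDownLatch(1);

    Thread reconnect = new Thread(() -> {
      synchronized (CacheBuilderLike.class) {   // like "- locked <...> (a java.lang.Class for ...)"
        synchronized (RECONNECT_LOCK) {
          outerMonitorHeld.countDown();
          try {
            RECONNECT_LOCK.wait(5_000);         // releases RECONNECT_LOCK only, not the Class monitor
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        }
      }
    }, "ReconnectThread");
    Thread starter = new Thread(CacheBuilderLike::create, "server-start");
    reconnect.setDaemon(true);
    starter.setDaemon(true);

    reconnect.start();
    outerMonitorHeld.await();
    starter.start();

    // Poll briefly: the starter should end up BLOCKED on CacheBuilderLike.class.
    long deadline = System.nanoTime() + 2_000_000_000L;
    Thread.State state = starter.getState();
    while (state != Thread.State.BLOCKED && starter.isAlive() && System.nanoTime() < deadline) {
      Thread.sleep(10);
      state = starter.getState();
    }

    reconnect.interrupt();                      // end the wait so both threads can finish
    reconnect.join();
    starter.join();
    return state;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("server-start state while ReconnectThread waits: " + observeStarterState());
  }
}
{code}

Running main is expected to report BLOCKED for the server-start thread. The waited-on object is free the whole time; what the second thread cannot get is the outer Class monitor, which mirrors the <0x00000000e0cd2348> relationship between the two real threads.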
> serverRestartsAfterOneLocatorDies hangs
> ---------------------------------------
>
> Key: GEODE-8267
> URL: https://issues.apache.org/jira/browse/GEODE-8267
> Project: Geode
> Issue Type: Bug
> Components: configuration, locator, membership
> Reporter: Bill Burcham
> Priority: Major
>
> hang:
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK11/builds/275#A]
>
> The test hung in serverRestartsAfterOneLocatorDies after another failure in
> the same test class.
> Here's the hung thread:
> {noformat}
> "Test worker" #27 prio=5 os_prio=0 cpu=5016.73ms elapsed=5638.52s tid=0x00007f01c8ad4800 nid=0x18 runnable [0x00007f019872c000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.net.SocketInputStream.socketRead0(Native Method)
> 	at java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
> 	at java.net.SocketInputStream.read(SocketInputStream.java:168)
> 	at java.net.SocketInputStream.read(SocketInputStream.java:140)
> 	at java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
> 	at java.io.BufferedInputStream.read(BufferedInputStream.java:271)
> 	- locked <0x00000000d08fe7a0> (a java.io.BufferedInputStream)
> 	at java.io.DataInputStream.readByte(DataInputStream.java:270)
> 	at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:240)
> 	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:164)
> 	at java.rmi.server.RemoteObjectInvocationHandler.invokeRemoteMethod(RemoteObjectInvocationHandler.java:217)
> 	at java.rmi.server.RemoteObjectInvocationHandler.invoke(RemoteObjectInvocationHandler.java:162)
> 	at com.sun.proxy.$Proxy53.executeMethodOnObject(Unknown Source)
> 	at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:607)
> 	at org.apache.geode.test.dunit.VM.invoke(VM.java:450)
> 	at org.apache.geode.test.dunit.rules.ClusterStartupRule.startServerVM(ClusterStartupRule.java:268)
> 	at org.apache.geode.test.dunit.rules.ClusterStartupRule.startServerVM(ClusterStartupRule.java:261)
> 	at org.apache.geode.test.dunit.rules.ClusterStartupRule.startServerVM(ClusterStartupRule.java:256)
> 	at org.apache.geode.management.internal.configuration.ClusterConfigLocatorRestartDUnitTest.serverRestartsAfterOneLocatorDies(ClusterConfigLocatorRestartDUnitTest.java:114)
> 	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> {noformat}
> Here's the previous test failure, which may have affected the test that hung:
> {code:java}
> org.apache.geode.management.internal.configuration.ClusterConfigLocatorRestartDUnitTest > serverRestartHangsWaitingForStartupMessageResponse FAILED
>     org.junit.runners.model.TestTimedOutException: test timed out after 300000 milliseconds
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
>         at java.net.SocketInputStream.read(SocketInputStream.java:168)
>         at java.net.SocketInputStream.read(SocketInputStream.java:140)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:271)
>         at java.io.DataInputStream.readByte(DataInputStream.java:270)
>         at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:240)
>         at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:164)
>         at java.rmi.server.RemoteObjectInvocationHandler.invokeRemoteMethod(RemoteObjectInvocationHandler.java:217)
>         at java.rmi.server.RemoteObjectInvocationHandler.invoke(RemoteObjectInvocationHandler.java:162)
>         at com.sun.proxy.$Proxy53.executeMethodOnObject(Unknown Source)
>         at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:607)
>         at org.apache.geode.test.dunit.VM.invoke(VM.java:437)
>         at org.apache.geode.test.junit.rules.VMProvider.invoke(VMProvider.java:94)
>         at org.apache.geode.management.internal.configuration.ClusterConfigLocatorRestartDUnitTest.serverRestartHangsWaitingForStartupMessageResponse(ClusterConfigLocatorRestartDUnitTest.java:176)
> {code}
> Seems like 300s should be long enough, so I fear there may be a real problem here.
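Since the hang followed a timed-out test in the same class, and the artifacts show a leftover ReconnectThread holding a monitor that a new server start needs, a test could check for this relationship directly instead of waiting for a timeout. The sketch below is a hypothetical helper (LingeringThreadReport, threadsBlockedBy and the demo thread names are illustrative, not Geode or DUnit APIs) built only on java.lang.management.ThreadMXBean: it lists BLOCKED threads whose awaited monitor is owned by a thread whose name starts with a given prefix. ThreadMXBean only sees threads in its own JVM, so in a DUnit test it would have to run inside the affected VM.

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Hypothetical test-hygiene helper (not a Geode or DUnit API). */
public final class LingeringThreadReport {

  /** Describes each BLOCKED thread whose awaited monitor is owned by a thread named ownerPrefix*. */
  static List<String> threadsBlockedBy(String ownerPrefix) {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    List<String> blocked = new ArrayList<>();
    // true/true also collects locked monitors and ownable synchronizers, like "jstack -l".
    for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
      String owner = info.getLockOwnerName(); // null unless blocked on a lock owned by some thread
      if (info.getThreadState() == Thread.State.BLOCKED
          && owner != null
          && owner.startsWith(ownerPrefix)) {
        blocked.add(info.getThreadName() + " waits for " + info.getLockName() + " held by " + owner);
      }
    }
    return blocked;
  }

  /** Demo: a thread named "ReconnectThread" holds a monitor while another thread blocks on it. */
  static List<String> demo() throws InterruptedException {
    Object monitor = new Object();
    CountDownLatch held = new CountDownLatch(1);
    Thread holder = new Thread(() -> {
      synchronized (monitor) {
        held.countDown();
        try {
          Thread.sleep(5_000);                  // sleep() keeps the monitor, unlike wait()
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }, "ReconnectThread");
    Thread waiter = new Thread(() -> {
      synchronized (monitor) {
        // Nothing to do: entering the monitor is the point.
      }
    }, "server-start");
    holder.setDaemon(true);
    waiter.setDaemon(true);

    holder.start();
    held.await();
    waiter.start();
    while (waiter.isAlive() && waiter.getState() != Thread.State.BLOCKED) {
      Thread.sleep(10);
    }

    List<String> report = threadsBlockedBy("ReconnectThread");
    holder.interrupt();                         // release the monitor so both threads finish
    holder.join();
    waiter.join();
    return report;
  }

  public static void main(String[] args) throws InterruptedException {
    demo().forEach(System.out::println);
  }
}
{code}

In the demo the report has a single entry of the form "server-start waits for java.lang.Object@<hash> held by ReconnectThread", which is the same shape as the RMI thread / ReconnectThread pair in the dumps above.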
--
This message was sent by Atlassian Jira
(v8.3.4#803005)