[
https://issues.apache.org/jira/browse/GEODE-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bruce J Schuchardt updated GEODE-9011:
--------------------------------------
Description: submitted to the wrong JIRA - sorry (was: A test was reported
to hang while shutting down. One server logged this warning:
{noformat}
[warn 2021/03/06 09:45:18.783 PST bridgegemfire_1_1_host1_6920
<vm_0_thr_0_bridge_1_1_host1_6920> tid=0x90] 15 seconds have elapsed while
waiting for replies: <DistributedCacheOperation$CacheOperationReplyProcessor 66
waiting for 2 replies from
[rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_3_host1_582:582)<ec><v95>:41006,
rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_4_host1_31258:31258)<ec><v88>:41005]>
on
rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_1_host1_6920:6920)<ec><v100>:41007
whose current membership list is:
[[rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_2_host1_7658:7658)<ec><v102>:41004,
rs-FullRegression58615648a0i3large-hydra-client-18(locatorgemfire_1_2_host1_13486:13486:locator)<ec><v5>:41003,
rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_3_host1_582:582)<ec><v95>:41006,
rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_4_host1_31258:31258)<ec><v88>:41005,
rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_1_host1_6920:6920)<ec><v100>:41007,
rs-FullRegression58615648a0i3large-hydra-client-18(locatorgemfire_1_1_host1_13950:13950:locator)<ec><v15>:41000]]
{noformat}
and thread dumps showed the same thread stuck waiting for those replies:
{noformat}
"vm_0_thr_0_bridge_1_1_host1_6920" #144 daemon prio=5 os_prio=0 tid=0x00007fec70058800 nid=0x1d28 waiting on condition [0x00007fec62063000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000000f4f654f8> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
    at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
    at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:723)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:794)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:771)
    at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:857)
    at org.apache.geode.internal.cache.DistributedCacheOperation.waitForAckIfNeeded(DistributedCacheOperation.java:779)
    at org.apache.geode.internal.cache.DistributedCacheOperation._distribute(DistributedCacheOperation.java:676)
    at org.apache.geode.internal.cache.DistributedCacheOperation.startOperation(DistributedCacheOperation.java:277)
    at org.apache.geode.internal.cache.DistributedCacheOperation.distribute(DistributedCacheOperation.java:318)
    at org.apache.geode.internal.cache.DistributedRegion.distributeDestroyRegion(DistributedRegion.java:1865)
    at org.apache.geode.internal.cache.DistributedRegion.basicDestroyRegion(DistributedRegion.java:1844)
    at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6180)
    at org.apache.geode.internal.cache.HARegion.destroyRegion(HARegion.java:331)
    at org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:476)
    at org.apache.geode.internal.cache.ha.HARegionQueue.destroy(HARegionQueue.java:3438)
    at org.apache.geode.internal.cache.ha.HARegionQueue$BlockingHARegionQueue.destroy(HARegionQueue.java:2272)
    at org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.destroyRQ(CacheClientProxy.java:1031)
    at org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.terminateDispatching(CacheClientProxy.java:939)
    at org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.shutdown(CacheClientNotifier.java:1306)
    - locked <0x00000000f8022800> (a org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier)
    at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.close(AcceptorImpl.java:1630)
    - locked <0x00000000f5f7b888> (a java.lang.Object)
    at org.apache.geode.internal.cache.CacheServerImpl.stop(CacheServerImpl.java:491)
    - locked <0x00000000f7ef2980> (a org.apache.geode.internal.cache.CacheServerImpl)
    at org.apache.geode.internal.cache.GemFireCacheImpl.stopServers(GemFireCacheImpl.java:2672)
    at org.apache.geode.internal.cache.GemFireCacheImpl.doClose(GemFireCacheImpl.java:2263)
    - locked <0x00000000f5a21a08> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
    at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2151)
    at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1559)
    - locked <0x00000000f5a21a08> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
    at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1257)
    at hydra.RemoteTestModule$2.run(RemoteTestModule.java:388)
{noformat}
Server 31258 reported this deserialization problem in its log:
{noformat}
[fatal 2021/03/06 09:45:02.772 PST bridgegemfire_1_4_host1_31258 <P2P message reader for rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_1_host1_6920:6920)<ec><v100>:41007 unshared ordered sender uid=43 dom #1 local port=57557 remote port=43412> tid=0x152] Error deserializing message
java.lang.IllegalStateException: unexpected byte: HASH_TABLE while reading dsfid
    at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2397)
    at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2403)
    at org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:2979)
    at org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2797)
    at org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1651)
    at org.apache.geode.internal.tcp.Connection.run(Connection.java:1482)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{noformat}
The other server, 582, reported the same thing:
{noformat}
[fatal 2021/03/06 09:45:02.796 PST bridgegemfire_1_3_host1_582 <P2P message reader for rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_1_host1_6920:6920)<ec><v100>:41007 unshared ordered sender uid=42 dom #1 local port=58695 remote port=52758> tid=0xcd] Error deserializing message
java.lang.IllegalStateException: unexpected byte: HASH_TABLE while reading dsfid
    at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2397)
    at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2403)
    at org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:2979)
    at org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2797)
    at org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1651)
    at org.apache.geode.internal.tcp.Connection.run(Connection.java:1482)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{noformat}
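Read together, the logs above suggest (the report does not state this outright) that server 6920 was waiting for replies that servers 582 and 31258 never sent, because both failed while deserializing its message. Below is a minimal Java sketch of that pattern, assuming a plain CountDownLatch as the reply tracker. The class name, method names, and header byte are hypothetical; this is not Geode's ReplyProcessor21 code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ReplyWaitSketch {
    // Hypothetical header byte that a receiver expects at the start of a message.
    static final byte EXPECTED_HEADER = 1;

    /** Hypothetical receiver: replies (counts down) only if the message deserializes. */
    static boolean deliver(byte[] message, CountDownLatch replies) {
        try {
            if (message.length == 0 || message[0] != EXPECTED_HEADER) {
                throw new IllegalStateException("unexpected byte while reading header");
            }
            replies.countDown(); // the reply is sent only on the success path
            return true;
        } catch (IllegalStateException e) {
            // The receiver reports the error locally but sends no reply to the sender.
            return false;
        }
    }

    /** Sends to all receivers and waits for every reply, up to the given timeout. */
    static boolean sendAndWait(byte[] message, int receivers, long timeoutMillis) {
        CountDownLatch replies = new CountDownLatch(receivers);
        for (int i = 0; i < receivers; i++) {
            deliver(message, replies);
        }
        try {
            // Returns false if the latch did not reach zero before the timeout.
            return replies.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(sendAndWait(new byte[] {1}, 2, 100)); // well-formed message
        System.out.println(sendAndWait(new byte[] {9}, 2, 100)); // receivers fail to deserialize
    }
}
```

The sketch gives up after one timeout. The thread in the dump above is in waitForRepliesUninterruptibly and was still waiting when the test timed out, which would match the reported hang.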
The test run is here:
http://hydradb.gemfire.pivotal.io/hdb/testresult/9653486
bugReportTemplate.txt:
{noformat}
Host name: rs-FullRegression58615648a0i3large-hydra-client-18
OS name: Linux
Architecture: amd64
OS version: 3.10.0-1160.15.2.el7.x86_64
Java version: 1.8.0_282
Java vm name: OpenJDK 64-Bit Server VM
Java vendor: BellSoft
Java home: /usr/local/regr/jdk/jdk-liberica-jdk8u282/jre
#####################################################
Product
Product-Name: Apache Geode
Product-Version: 1.15.0-build.36
Build
Build-Id: geode 36
Build-Java-Vendor: BellSoft
Build-Java-Version: 1.8.0_282
Build-Platform: Linux 5.4.0-1037-gcp amd64
Open
Source-Date: 2021-03-05 21:41:55 +0000
Source-Repository: develop
Source-Revision: dc5541665b132dcf2f87a667ee34ee34ca223923
Closed
GemFire-Source-Date=2021-03-04 16:36:54 -0800
GemFire-Source-Repository=develop
GemFire-Source-Revision=b346c4572ef28cc9157a58d5021e948977d96966
GemFire-Source-Status-Clean=true
Running on: /10.32.110.103, 2 cpu(s), amd64 Linux
3.10.0-1160.15.2.el7.x86_64
#####################################################
Test was run from rollingupgrade/rollingUpgradeWan.bt
Test:
rollingupgrade/newWan/hctKill.conf
bridgeHostsPerSite=4
bridgeThreadsPerVM=2
bridgeVMsPerHost=1
clientMem=128m
edgeHostsPerSite=3
edgeThreadsPerVM=5
edgeVMsPerHost=1
locatorHostsPerSite=2
locatorThreadsPerVM=1
locatorVMsPerHost=1
maxOps=50000
oldVersion=924
resultWaitSec=600
serverMem=256m
wanSites=2
Run with local.conf:
hydra.JDKVersionPrms-javaHomes = default,
/usr/local/regr/jdk/jdk-liberica-jdk8u242/jre;
hydra.JDKVersionPrms-javaVendors = default, BellSoft;
hydra.JDKVersionPrms-javaVersions = default, 1.8.0_242;
hydra.JDKVersionPrms-javaVmNames = default, OpenJDK;
//randomSeed extracted from test:
hydra.Prms-randomSeed=1615051005312;
*** Test failed with this error:
THREAD vm_8_thr_16_edge_1_1_host1_9768 Subthread Dynamic Client VM Stopper
HANG Timeout stopping clients for DynamicCloseTasks
hydra.HydraTimeoutException: Failed to stop client vms within 300 seconds, starting with vm_0
    at hydra.ClientMgr.waitForClientsToDie(ClientMgr.java:395)
    at hydra.BaseTaskScheduler.stopClients(BaseTaskScheduler.java:128)
    at hydra.ClientMgr.doDynamicCloseTasks(ClientMgr.java:980)
    at hydra.ClientMgr.exitClientVm(ClientMgr.java:919)
    at hydra.ClientMgr.stopClientVm(ClientMgr.java:760)
    at hydra.ClientMgr._stopClientVm(ClientMgr.java:698)
    at hydra.ClientMgr$2.run(ClientMgr.java:657)
    at java.lang.Thread.run(Thread.java:748)
{noformat}
)
> (deleted)
> ---------
>
> Key: GEODE-9011
> URL: https://issues.apache.org/jira/browse/GEODE-9011
> Project: Geode
> Issue Type: Test
> Components: membership, messaging
> Reporter: Bruce J Schuchardt
> Priority: Major
>
> submitted to the wrong JIRA - sorry