[ https://issues.apache.org/jira/browse/GEODE-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bruce J Schuchardt resolved GEODE-9011.
---------------------------------------
Resolution: Invalid
> hctKill.conf Error deserializing message causes hang
> ----------------------------------------------------
>
> Key: GEODE-9011
> URL: https://issues.apache.org/jira/browse/GEODE-9011
> Project: Geode
> Issue Type: Test
> Components: membership, messaging
> Reporter: Bruce J Schuchardt
> Priority: Major
>
> A test hung while shutting down. One server
> (bridgegemfire_1_1_host1_6920) logged this warning:
> {noformat}
> [warn 2021/03/06 09:45:18.783 PST bridgegemfire_1_1_host1_6920 <vm_0_thr_0_bridge_1_1_host1_6920> tid=0x90] 15 seconds have elapsed while waiting for replies: <DistributedCacheOperation$CacheOperationReplyProcessor 66 waiting for 2 replies from
> [rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_3_host1_582:582)<ec><v95>:41006,
> rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_4_host1_31258:31258)<ec><v88>:41005]>
> on rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_1_host1_6920:6920)<ec><v100>:41007
> whose current membership list is:
> [[rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_2_host1_7658:7658)<ec><v102>:41004,
> rs-FullRegression58615648a0i3large-hydra-client-18(locatorgemfire_1_2_host1_13486:13486:locator)<ec><v5>:41003,
> rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_3_host1_582:582)<ec><v95>:41006,
> rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_4_host1_31258:31258)<ec><v88>:41005,
> rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_1_host1_6920:6920)<ec><v100>:41007,
> rs-FullRegression58615648a0i3large-hydra-client-18(locatorgemfire_1_1_host1_13950:13950:locator)<ec><v15>:41000]]
> {noformat}
> Thread dumps show the same server stuck waiting for those replies:
> {noformat}
> "vm_0_thr_0_bridge_1_1_host1_6920" #144 daemon prio=5 os_prio=0 tid=0x00007fec70058800 nid=0x1d28 waiting on condition [0x00007fec62063000]
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000000f4f654f8> (a java.util.concurrent.CountDownLatch$Sync)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
> at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
> at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:723)
> at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:794)
> at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:771)
> at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:857)
> at org.apache.geode.internal.cache.DistributedCacheOperation.waitForAckIfNeeded(DistributedCacheOperation.java:779)
> at org.apache.geode.internal.cache.DistributedCacheOperation._distribute(DistributedCacheOperation.java:676)
> at org.apache.geode.internal.cache.DistributedCacheOperation.startOperation(DistributedCacheOperation.java:277)
> at org.apache.geode.internal.cache.DistributedCacheOperation.distribute(DistributedCacheOperation.java:318)
> at org.apache.geode.internal.cache.DistributedRegion.distributeDestroyRegion(DistributedRegion.java:1865)
> at org.apache.geode.internal.cache.DistributedRegion.basicDestroyRegion(DistributedRegion.java:1844)
> at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6180)
> at org.apache.geode.internal.cache.HARegion.destroyRegion(HARegion.java:331)
> at org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:476)
> at org.apache.geode.internal.cache.ha.HARegionQueue.destroy(HARegionQueue.java:3438)
> at org.apache.geode.internal.cache.ha.HARegionQueue$BlockingHARegionQueue.destroy(HARegionQueue.java:2272)
> at org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.destroyRQ(CacheClientProxy.java:1031)
> at org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.terminateDispatching(CacheClientProxy.java:939)
> at org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.shutdown(CacheClientNotifier.java:1306)
> - locked <0x00000000f8022800> (a org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier)
> at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.close(AcceptorImpl.java:1630)
> - locked <0x00000000f5f7b888> (a java.lang.Object)
> at org.apache.geode.internal.cache.CacheServerImpl.stop(CacheServerImpl.java:491)
> - locked <0x00000000f7ef2980> (a org.apache.geode.internal.cache.CacheServerImpl)
> at org.apache.geode.internal.cache.GemFireCacheImpl.stopServers(GemFireCacheImpl.java:2672)
> at org.apache.geode.internal.cache.GemFireCacheImpl.doClose(GemFireCacheImpl.java:2263)
> - locked <0x00000000f5a21a08> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
> at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2151)
> at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1559)
> - locked <0x00000000f5a21a08> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
> at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1257)
> at hydra.RemoteTestModule$2.run(RemoteTestModule.java:388)
> {noformat}
> Server 31258 reported this deserialization problem in its log:
> {noformat}
> [fatal 2021/03/06 09:45:02.772 PST bridgegemfire_1_4_host1_31258 <P2P message reader for rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_1_host1_6920:6920)<ec><v100>:41007 unshared ordered sender uid=43 dom #1 local port=57557 remote port=43412> tid=0x152] Error deserializing message
> java.lang.IllegalStateException: unexpected byte: HASH_TABLE while reading dsfid
> at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2397)
> at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2403)
> at org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:2979)
> at org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2797)
> at org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1651)
> at org.apache.geode.internal.tcp.Connection.run(Connection.java:1482)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> The other server, 582, reported the same thing:
> {noformat}
> [fatal 2021/03/06 09:45:02.796 PST bridgegemfire_1_3_host1_582 <P2P message reader for rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_1_host1_6920:6920)<ec><v100>:41007 unshared ordered sender uid=42 dom #1 local port=58695 remote port=52758> tid=0xcd] Error deserializing message
> java.lang.IllegalStateException: unexpected byte: HASH_TABLE while reading dsfid
> at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2397)
> at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2403)
> at org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:2979)
> at org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2797)
> at org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1651)
> at org.apache.geode.internal.tcp.Connection.run(Connection.java:1482)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> The test run is here:
> http://hydradb.gemfire.pivotal.io/hdb/testresult/9653486
> bugReportTemplate.txt:
> {noformat}
> Host name: rs-FullRegression58615648a0i3large-hydra-client-18
> OS name: Linux
> Architecture: amd64
> OS version: 3.10.0-1160.15.2.el7.x86_64
> Java version: 1.8.0_282
> Java vm name: OpenJDK 64-Bit Server VM
> Java vendor: BellSoft
> Java home: /usr/local/regr/jdk/jdk-liberica-jdk8u282/jre
> #####################################################
>
> Product
> Product-Name: Apache Geode
> Product-Version: 1.15.0-build.36
> Build
> Build-Id: geode 36
> Build-Java-Vendor: BellSoft
> Build-Java-Version: 1.8.0_282
> Build-Platform: Linux 5.4.0-1037-gcp amd64
> Open
> Source-Date: 2021-03-05 21:41:55 +0000
> Source-Repository: develop
> Source-Revision: dc5541665b132dcf2f87a667ee34ee34ca223923
> Closed
> GemFire-Source-Date=2021-03-04 16:36:54 -0800
> GemFire-Source-Repository=develop
> GemFire-Source-Revision=b346c4572ef28cc9157a58d5021e948977d96966
> GemFire-Source-Status-Clean=true
>
> Running on: /10.32.110.103, 2 cpu(s), amd64 Linux 3.10.0-1160.15.2.el7.x86_64
> #####################################################
> Test was run from rollingupgrade/rollingUpgradeWan.bt
> Test:
> rollingupgrade/newWan/hctKill.conf
> bridgeHostsPerSite=4
> bridgeThreadsPerVM=2
> bridgeVMsPerHost=1
> clientMem=128m
> edgeHostsPerSite=3
> edgeThreadsPerVM=5
> edgeVMsPerHost=1
> locatorHostsPerSite=2
> locatorThreadsPerVM=1
> locatorVMsPerHost=1
> maxOps=50000
> oldVersion=924
> resultWaitSec=600
> serverMem=256m
> wanSites=2
> Run with local.conf:
> hydra.JDKVersionPrms-javaHomes = default, /usr/local/regr/jdk/jdk-liberica-jdk8u242/jre;
> hydra.JDKVersionPrms-javaVendors = default, BellSoft;
> hydra.JDKVersionPrms-javaVersions = default, 1.8.0_242;
> hydra.JDKVersionPrms-javaVmNames = default, OpenJDK;
> //randomSeed extracted from test:
> hydra.Prms-randomSeed=1615051005312;
> *** Test failed with this error:
> THREAD vm_8_thr_16_edge_1_1_host1_9768 Subthread Dynamic Client VM Stopper
> HANG Timeout stopping clients for DynamicCloseTasks
> hydra.HydraTimeoutException: Failed to stop client vms within 300 seconds, starting with vm_0
> at hydra.ClientMgr.waitForClientsToDie(ClientMgr.java:395)
> at hydra.BaseTaskScheduler.stopClients(BaseTaskScheduler.java:128)
> at hydra.ClientMgr.doDynamicCloseTasks(ClientMgr.java:980)
> at hydra.ClientMgr.exitClientVm(ClientMgr.java:919)
> at hydra.ClientMgr.stopClientVm(ClientMgr.java:760)
> at hydra.ClientMgr._stopClientVm(ClientMgr.java:698)
> at hydra.ClientMgr$2.run(ClientMgr.java:657)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
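
The symptoms quoted above fit a general pattern: a sender blocks on a latch until every peer acknowledges a message, and a peer that fails to read the message never acknowledges it. The following is a minimal Java sketch of that pattern only. It is not Geode's implementation and makes no claim about the root cause of this ticket (which was resolved as Invalid). The class name ReplyWaitSketch, the one-byte headers, and the timeout values are all hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ReplyWaitSketch {
    // Hypothetical one-byte headers; real Geode DSFID codes are not reproduced here.
    static final byte KNOWN_MESSAGE = 1;
    static final byte UNKNOWN_MESSAGE = 99;

    /** Simulated receiver: acknowledges only messages whose header it understands. */
    static void receive(byte header, CountDownLatch replies) {
        if (header != KNOWN_MESSAGE) {
            // The reader gives up on the message, so no reply is ever sent.
            throw new IllegalStateException(
                "unexpected byte: " + header + " while reading header");
        }
        replies.countDown(); // acknowledge
    }

    /**
     * Delivers one message to `receivers` simulated peers and waits up to
     * `timeoutMillis` for all acknowledgements.
     * Returns the number of replies still missing when the wait ends.
     */
    static long sendAndWait(byte header, int receivers, long timeoutMillis) {
        CountDownLatch replies = new CountDownLatch(receivers);
        ExecutorService pool = Executors.newFixedThreadPool(receivers);
        try {
            for (int i = 0; i < receivers; i++) {
                pool.submit(() -> {
                    try {
                        receive(header, replies);
                    } catch (IllegalStateException e) {
                        System.err.println("Error deserializing message: " + e.getMessage());
                    }
                });
            }
            // Bounded wait: returns early once every peer has acknowledged.
            replies.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        } finally {
            pool.shutdownNow();
        }
        return replies.getCount();
    }

    public static void main(String[] args) {
        System.out.println("missing replies, readable message: "
            + sendAndWait(KNOWN_MESSAGE, 2, 500));
        System.out.println("missing replies, unreadable message: "
            + sendAndWait(UNKNOWN_MESSAGE, 2, 500));
    }
}
```

In this sketch the wait is bounded, so the sender can report how many replies are missing and move on. A sender that instead keeps re-waiting until the count reaches zero would stay parked on the latch for as long as the peers stay silent, which is the shape of the TIMED_WAITING frame in the thread dump.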