[ 
https://issues.apache.org/jira/browse/GEODE-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruce J Schuchardt resolved GEODE-9011.
---------------------------------------
    Resolution: Invalid

> hctKill.conf Error deserializing message causes hang
> ----------------------------------------------------
>
>                 Key: GEODE-9011
>                 URL: https://issues.apache.org/jira/browse/GEODE-9011
>             Project: Geode
>          Issue Type: Test
>          Components: membership, messaging
>            Reporter: Bruce J Schuchardt
>            Priority: Major
>
> A test was reported hung when it tried to shut down.  One server reported 
> this:
> {noformat}
> [warn 2021/03/06 09:45:18.783 PST bridgegemfire_1_1_host1_6920 
> <vm_0_thr_0_bridge_1_1_host1_6920> tid=0x90] 15 seconds have elapsed while 
> waiting for replies: <DistributedCacheOperation$CacheOperationReplyProcessor 
> 66 waiting for 2 replies from 
> [rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_3_host1_582:582)<ec><v95>:41006,
>  
> rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_4_host1_31258:31258)<ec><v88>:41005]>
>  on 
> rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_1_host1_6920:6920)<ec><v100>:41007
>  whose current membership list is: 
> [[rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_2_host1_7658:7658)<ec><v102>:41004,
>  
> rs-FullRegression58615648a0i3large-hydra-client-18(locatorgemfire_1_2_host1_13486:13486:locator)<ec><v5>:41003,
>  
> rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_3_host1_582:582)<ec><v95>:41006,
>  
> rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_4_host1_31258:31258)<ec><v88>:41005,
>  
> rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_1_host1_6920:6920)<ec><v100>:41007,
>  
> rs-FullRegression58615648a0i3large-hydra-client-18(locatorgemfire_1_1_host1_13950:13950:locator)<ec><v15>:41000]]
> {noformat}
> Thread dumps showed its shutdown thread stuck waiting for those replies:
> {noformat}
> "vm_0_thr_0_bridge_1_1_host1_6920" #144 daemon prio=5 os_prio=0 
> tid=0x00007fec70058800 nid=0x1d28 waiting on condition [0x00007fec62063000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x00000000f4f654f8> (a 
> java.util.concurrent.CountDownLatch$Sync)
>       at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>       at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>       at 
> org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
>       at 
> org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:723)
>       at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:794)
>       at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:771)
>       at 
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:857)
>       at 
> org.apache.geode.internal.cache.DistributedCacheOperation.waitForAckIfNeeded(DistributedCacheOperation.java:779)
>       at 
> org.apache.geode.internal.cache.DistributedCacheOperation._distribute(DistributedCacheOperation.java:676)
>       at 
> org.apache.geode.internal.cache.DistributedCacheOperation.startOperation(DistributedCacheOperation.java:277)
>       at 
> org.apache.geode.internal.cache.DistributedCacheOperation.distribute(DistributedCacheOperation.java:318)
>       at 
> org.apache.geode.internal.cache.DistributedRegion.distributeDestroyRegion(DistributedRegion.java:1865)
>       at 
> org.apache.geode.internal.cache.DistributedRegion.basicDestroyRegion(DistributedRegion.java:1844)
>       at 
> org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6180)
>       at 
> org.apache.geode.internal.cache.HARegion.destroyRegion(HARegion.java:331)
>       at 
> org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:476)
>       at 
> org.apache.geode.internal.cache.ha.HARegionQueue.destroy(HARegionQueue.java:3438)
>       at 
> org.apache.geode.internal.cache.ha.HARegionQueue$BlockingHARegionQueue.destroy(HARegionQueue.java:2272)
>       at 
> org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.destroyRQ(CacheClientProxy.java:1031)
>       at 
> org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.terminateDispatching(CacheClientProxy.java:939)
>       at 
> org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.shutdown(CacheClientNotifier.java:1306)
>       - locked <0x00000000f8022800> (a 
> org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier)
>       at 
> org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.close(AcceptorImpl.java:1630)
>       - locked <0x00000000f5f7b888> (a java.lang.Object)
>       at 
> org.apache.geode.internal.cache.CacheServerImpl.stop(CacheServerImpl.java:491)
>       - locked <0x00000000f7ef2980> (a 
> org.apache.geode.internal.cache.CacheServerImpl)
>       at 
> org.apache.geode.internal.cache.GemFireCacheImpl.stopServers(GemFireCacheImpl.java:2672)
>       at 
> org.apache.geode.internal.cache.GemFireCacheImpl.doClose(GemFireCacheImpl.java:2263)
>       - locked <0x00000000f5a21a08> (a java.lang.Class for 
> org.apache.geode.internal.cache.GemFireCacheImpl)
>       at 
> org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2151)
>       at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1559)
>       - locked <0x00000000f5a21a08> (a java.lang.Class for 
> org.apache.geode.internal.cache.GemFireCacheImpl)
>       at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1257)
>       at hydra.RemoteTestModule$2.run(RemoteTestModule.java:388)
> {noformat}
> Server 31258 reported this deserialization problem in its log:
> {noformat}
> [fatal 2021/03/06 09:45:02.772 PST bridgegemfire_1_4_host1_31258 <P2P message 
> reader for 
> rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_1_host1_6920:6920)<ec><v100>:41007
>  unshared ordered sender uid=43 dom #1 local port=57557 remote port=43412> 
> tid=0x152] Error deserializing message
> java.lang.IllegalStateException: unexpected byte: HASH_TABLE while reading 
> dsfid
>       at 
> org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2397)
>       at 
> org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2403)
>       at 
> org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:2979)
>       at 
> org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2797)
>       at 
> org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1651)
>       at org.apache.geode.internal.tcp.Connection.run(Connection.java:1482)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> {noformat}
> The other server, 582, reported the same thing:
> {noformat}
> [fatal 2021/03/06 09:45:02.796 PST bridgegemfire_1_3_host1_582 <P2P message 
> reader for 
> rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_1_host1_6920:6920)<ec><v100>:41007
>  unshared ordered sender uid=42 dom #1 local port=58695 remote port=52758> 
> tid=0xcd] Error deserializing message
> java.lang.IllegalStateException: unexpected byte: HASH_TABLE while reading 
> dsfid
>       at 
> org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2397)
>       at 
> org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2403)
>       at 
> org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:2979)
>       at 
> org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2797)
>       at 
> org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1651)
>       at org.apache.geode.internal.tcp.Connection.run(Connection.java:1482)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> {noformat}
> The test run is here:
> http://hydradb.gemfire.pivotal.io/hdb/testresult/9653486
> bugReportTemplate.txt:
> {noformat}
> Host name: rs-FullRegression58615648a0i3large-hydra-client-18
> OS name: Linux
> Architecture: amd64
> OS version: 3.10.0-1160.15.2.el7.x86_64
> Java version: 1.8.0_282
> Java vm name: OpenJDK 64-Bit Server VM
> Java vendor: BellSoft
> Java home: /usr/local/regr/jdk/jdk-liberica-jdk8u282/jre
>   #####################################################
>   
>   Product
>     Product-Name: Apache Geode
>     Product-Version: 1.15.0-build.36
>   Build
>     Build-Id: geode 36
>     Build-Java-Vendor: BellSoft
>     Build-Java-Version: 1.8.0_282
>     Build-Platform: Linux 5.4.0-1037-gcp amd64
>   Open
>     Source-Date: 2021-03-05 21:41:55 +0000
>     Source-Repository: develop
>     Source-Revision: dc5541665b132dcf2f87a667ee34ee34ca223923
>   Closed
>     GemFire-Source-Date=2021-03-04 16:36:54 -0800
>     GemFire-Source-Repository=develop
>     GemFire-Source-Revision=b346c4572ef28cc9157a58d5021e948977d96966
>     GemFire-Source-Status-Clean=true
>   
>     Running on: /10.32.110.103, 2 cpu(s), amd64 Linux 
> 3.10.0-1160.15.2.el7.x86_64 
>   #####################################################
> Test was run from rollingupgrade/rollingUpgradeWan.bt
> Test:
> rollingupgrade/newWan/hctKill.conf
>    bridgeHostsPerSite=4
>    bridgeThreadsPerVM=2
>    bridgeVMsPerHost=1
>    clientMem=128m
>    edgeHostsPerSite=3
>    edgeThreadsPerVM=5
>    edgeVMsPerHost=1
>    locatorHostsPerSite=2
>    locatorThreadsPerVM=1
>    locatorVMsPerHost=1
>    maxOps=50000
>    oldVersion=924
>    resultWaitSec=600
>    serverMem=256m
>    wanSites=2
> Run with local.conf:
> hydra.JDKVersionPrms-javaHomes    = default, 
> /usr/local/regr/jdk/jdk-liberica-jdk8u242/jre;
> hydra.JDKVersionPrms-javaVendors  = default, BellSoft;
> hydra.JDKVersionPrms-javaVersions = default, 1.8.0_242;
> hydra.JDKVersionPrms-javaVmNames  = default, OpenJDK;
> //randomSeed extracted from test:
> hydra.Prms-randomSeed=1615051005312;
> *** Test failed with this error:
> THREAD vm_8_thr_16_edge_1_1_host1_9768 Subthread Dynamic Client VM Stopper
> HANG Timeout stopping clients for DynamicCloseTasks
> hydra.HydraTimeoutException: Failed to stop client vms within 300 seconds, 
> starting with vm_0
>       at hydra.ClientMgr.waitForClientsToDie(ClientMgr.java:395)
>       at hydra.BaseTaskScheduler.stopClients(BaseTaskScheduler.java:128)
>       at hydra.ClientMgr.doDynamicCloseTasks(ClientMgr.java:980)
>       at hydra.ClientMgr.exitClientVm(ClientMgr.java:919)
>       at hydra.ClientMgr.stopClientVm(ClientMgr.java:760)
>       at hydra.ClientMgr._stopClientVm(ClientMgr.java:698)
>       at hydra.ClientMgr$2.run(ClientMgr.java:657)
>       at java.lang.Thread.run(Thread.java:748)
> {noformat}
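
The pattern described in the report (a sender parked on a CountDownLatch while both receivers log "Error deserializing message") can be sketched in a few lines of plain Java. This is only an illustration of that pattern, not Geode code: the class ReplyWaitSketch, the EXPECTED_HEADER constant, and the short wait slices are all hypothetical, and the sketch assumes a receiver that fails to deserialize sends no reply at all. The ticket was resolved as Invalid, so this should not be read as a confirmed root cause.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Geode's ReplyProcessor21.
final class ReplyWaitSketch {
    // Assumed header byte that a well-formed message starts with.
    static final byte EXPECTED_HEADER = 0x01;

    // Receiver side: counts the latch down only if the header can be read.
    static void receive(byte header, CountDownLatch pendingReplies) {
        try {
            if (header != EXPECTED_HEADER) {
                throw new IllegalStateException(
                    "unexpected byte: " + header + " while reading header");
            }
            pendingReplies.countDown(); // stands in for sending a reply
        } catch (IllegalStateException e) {
            // The error is logged, but no reply goes back to the sender.
            System.out.println("Error deserializing message: " + e.getMessage());
        }
    }

    // Sender side: waits in slices and warns after each one, returning
    // true only if every expected reply arrived before the slices ran out.
    static boolean waitForReplies(CountDownLatch pendingReplies,
                                  long sliceMillis, int maxSlices) {
        try {
            for (int i = 1; i <= maxSlices; i++) {
                if (pendingReplies.await(sliceMillis, TimeUnit.MILLISECONDS)) {
                    return true;
                }
                System.out.println((i * sliceMillis)
                    + " ms have elapsed while waiting for "
                    + pendingReplies.getCount() + " replies");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
        return false;
    }

    public static void main(String[] args) {
        CountDownLatch pendingReplies = new CountDownLatch(2);
        receive((byte) 0x7f, pendingReplies); // first receiver fails to read
        receive((byte) 0x7f, pendingReplies); // second receiver fails too
        boolean done = waitForReplies(pendingReplies, 50, 3);
        System.out.println("all replies received: " + done);
    }
}
```

In main, neither receiver counts the latch down, so waitForReplies returns false once its slices are used up. A sender that waits without an upper bound, as the waitForRepliesUninterruptibly frames in the thread dump suggest, would stay parked instead.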



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
