[ https://issues.apache.org/jira/browse/GEODE-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bruce J Schuchardt resolved GEODE-9011.
---------------------------------------
    Resolution: Invalid

> hctKill.conf Error deserializing message causes hang
> ----------------------------------------------------
>
>                 Key: GEODE-9011
>                 URL: https://issues.apache.org/jira/browse/GEODE-9011
>             Project: Geode
>          Issue Type: Test
>          Components: membership, messaging
>            Reporter: Bruce J Schuchardt
>            Priority: Major
>
> A test was reported hung when it tried to shut down. One server reported this:
> {noformat}
> [warn 2021/03/06 09:45:18.783 PST bridgegemfire_1_1_host1_6920 <vm_0_thr_0_bridge_1_1_host1_6920> tid=0x90] 15 seconds have elapsed while waiting for replies: <DistributedCacheOperation$CacheOperationReplyProcessor 66 waiting for 2 replies from
> [rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_3_host1_582:582)<ec><v95>:41006,
> rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_4_host1_31258:31258)<ec><v88>:41005]>
> on rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_1_host1_6920:6920)<ec><v100>:41007
> whose current membership list is:
> [[rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_2_host1_7658:7658)<ec><v102>:41004,
> rs-FullRegression58615648a0i3large-hydra-client-18(locatorgemfire_1_2_host1_13486:13486:locator)<ec><v5>:41003,
> rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_3_host1_582:582)<ec><v95>:41006,
> rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_4_host1_31258:31258)<ec><v88>:41005,
> rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_1_host1_6920:6920)<ec><v100>:41007,
> rs-FullRegression58615648a0i3large-hydra-client-18(locatorgemfire_1_1_host1_13950:13950:locator)<ec><v15>:41000]]
> {noformat}
> and was stuck waiting for a reply in thread dumps
> {noformat}
> "vm_0_thr_0_bridge_1_1_host1_6920" #144 daemon prio=5 os_prio=0 tid=0x00007fec70058800 nid=0x1d28 waiting on condition [0x00007fec62063000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00000000f4f654f8> (a java.util.concurrent.CountDownLatch$Sync)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>         at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:723)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:794)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:771)
>         at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:857)
>         at org.apache.geode.internal.cache.DistributedCacheOperation.waitForAckIfNeeded(DistributedCacheOperation.java:779)
>         at org.apache.geode.internal.cache.DistributedCacheOperation._distribute(DistributedCacheOperation.java:676)
>         at org.apache.geode.internal.cache.DistributedCacheOperation.startOperation(DistributedCacheOperation.java:277)
>         at org.apache.geode.internal.cache.DistributedCacheOperation.distribute(DistributedCacheOperation.java:318)
>         at org.apache.geode.internal.cache.DistributedRegion.distributeDestroyRegion(DistributedRegion.java:1865)
>         at org.apache.geode.internal.cache.DistributedRegion.basicDestroyRegion(DistributedRegion.java:1844)
>         at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6180)
>         at org.apache.geode.internal.cache.HARegion.destroyRegion(HARegion.java:331)
>         at org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:476)
>         at org.apache.geode.internal.cache.ha.HARegionQueue.destroy(HARegionQueue.java:3438)
>         at org.apache.geode.internal.cache.ha.HARegionQueue$BlockingHARegionQueue.destroy(HARegionQueue.java:2272)
>         at org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.destroyRQ(CacheClientProxy.java:1031)
>         at org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.terminateDispatching(CacheClientProxy.java:939)
>         at org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.shutdown(CacheClientNotifier.java:1306)
>         - locked <0x00000000f8022800> (a org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier)
>         at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.close(AcceptorImpl.java:1630)
>         - locked <0x00000000f5f7b888> (a java.lang.Object)
>         at org.apache.geode.internal.cache.CacheServerImpl.stop(CacheServerImpl.java:491)
>         - locked <0x00000000f7ef2980> (a org.apache.geode.internal.cache.CacheServerImpl)
>         at org.apache.geode.internal.cache.GemFireCacheImpl.stopServers(GemFireCacheImpl.java:2672)
>         at org.apache.geode.internal.cache.GemFireCacheImpl.doClose(GemFireCacheImpl.java:2263)
>         - locked <0x00000000f5a21a08> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
>         at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2151)
>         at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1559)
>         - locked <0x00000000f5a21a08> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
>         at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1257)
>         at hydra.RemoteTestModule$2.run(RemoteTestModule.java:388)
> {noformat}
> Server 31258 reported this deserialization problem in its log:
> {noformat}
> [fatal 2021/03/06 09:45:02.772 PST bridgegemfire_1_4_host1_31258 <P2P message reader for rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_1_host1_6920:6920)<ec><v100>:41007 unshared ordered sender uid=43 dom #1 local port=57557 remote port=43412> tid=0x152] Error deserializing message
> java.lang.IllegalStateException: unexpected byte: HASH_TABLE while reading dsfid
>         at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2397)
>         at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2403)
>         at org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:2979)
>         at org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2797)
>         at org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1651)
>         at org.apache.geode.internal.tcp.Connection.run(Connection.java:1482)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}
> The other server, 582, reported the same thing:
> {noformat}
> [fatal 2021/03/06 09:45:02.796 PST bridgegemfire_1_3_host1_582 <P2P message reader for rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_1_host1_6920:6920)<ec><v100>:41007 unshared ordered sender uid=42 dom #1 local port=58695 remote port=52758> tid=0xcd] Error deserializing message
> java.lang.IllegalStateException: unexpected byte: HASH_TABLE while reading dsfid
>         at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2397)
>         at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2403)
>         at org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:2979)
>         at org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2797)
>         at org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1651)
>         at org.apache.geode.internal.tcp.Connection.run(Connection.java:1482)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}
> The test run is here:
> http://hydradb.gemfire.pivotal.io/hdb/testresult/9653486
> bugReportTemplate.txt:
> {noformat}
> Host name: rs-FullRegression58615648a0i3large-hydra-client-18
> OS name: Linux
> Architecture: amd64
> OS version: 3.10.0-1160.15.2.el7.x86_64
> Java version: 1.8.0_282
> Java vm name: OpenJDK 64-Bit Server VM
> Java vendor: BellSoft
> Java home: /usr/local/regr/jdk/jdk-liberica-jdk8u282/jre
> #####################################################
>
> Product
> Product-Name: Apache Geode
> Product-Version: 1.15.0-build.36
> Build
> Build-Id: geode 36
> Build-Java-Vendor: BellSoft
> Build-Java-Version: 1.8.0_282
> Build-Platform: Linux 5.4.0-1037-gcp amd64
> Open
> Source-Date: 2021-03-05 21:41:55 +0000
> Source-Repository: develop
> Source-Revision: dc5541665b132dcf2f87a667ee34ee34ca223923
> Closed
> GemFire-Source-Date=2021-03-04 16:36:54 -0800
> GemFire-Source-Repository=develop
> GemFire-Source-Revision=b346c4572ef28cc9157a58d5021e948977d96966
> GemFire-Source-Status-Clean=true
>
> Running on: /10.32.110.103, 2 cpu(s), amd64 Linux 3.10.0-1160.15.2.el7.x86_64
> #####################################################
> Test was run from rollingupgrade/rollingUpgradeWan.bt
> Test:
> rollingupgrade/newWan/hctKill.conf
> bridgeHostsPerSite=4
> bridgeThreadsPerVM=2
> bridgeVMsPerHost=1
> clientMem=128m
> edgeHostsPerSite=3
> edgeThreadsPerVM=5
> edgeVMsPerHost=1
> locatorHostsPerSite=2
> locatorThreadsPerVM=1
> locatorVMsPerHost=1
> maxOps=50000
> oldVersion=924
> resultWaitSec=600
> serverMem=256m
> wanSites=2
> Run with local.conf:
> hydra.JDKVersionPrms-javaHomes = default, /usr/local/regr/jdk/jdk-liberica-jdk8u242/jre;
> hydra.JDKVersionPrms-javaVendors = default, BellSoft;
> hydra.JDKVersionPrms-javaVersions = default, 1.8.0_242;
> hydra.JDKVersionPrms-javaVmNames = default, OpenJDK;
> //randomSeed extracted from test:
> hydra.Prms-randomSeed=1615051005312;
> *** Test failed with this error:
> THREAD vm_8_thr_16_edge_1_1_host1_9768 Subthread Dynamic Client VM Stopper
> HANG Timeout stopping clients for DynamicCloseTasks
> hydra.HydraTimeoutException: Failed to stop client vms within 300 seconds, starting with vm_0
>         at hydra.ClientMgr.waitForClientsToDie(ClientMgr.java:395)
>         at hydra.BaseTaskScheduler.stopClients(BaseTaskScheduler.java:128)
>         at hydra.ClientMgr.doDynamicCloseTasks(ClientMgr.java:980)
>         at hydra.ClientMgr.exitClientVm(ClientMgr.java:919)
>         at hydra.ClientMgr.stopClientVm(ClientMgr.java:760)
>         at hydra.ClientMgr._stopClientVm(ClientMgr.java:698)
>         at hydra.ClientMgr$2.run(ClientMgr.java:657)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}

--
This message was sent by Atlassian Jira (v8.3.4#803005)