[ https://issues.apache.org/jira/browse/GEODE-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bruce J Schuchardt updated GEODE-9011:
--------------------------------------
    Description: submitted to the wrong JIRA - sorry  (was: A test was reported 
hung when it tried to shut down.  One server reported this:

{noformat}
[warn 2021/03/06 09:45:18.783 PST bridgegemfire_1_1_host1_6920 <vm_0_thr_0_bridge_1_1_host1_6920> tid=0x90] 15 seconds have elapsed while waiting for replies: <DistributedCacheOperation$CacheOperationReplyProcessor 66 waiting for 2 replies from [rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_3_host1_582:582)<ec><v95>:41006, rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_4_host1_31258:31258)<ec><v88>:41005]> on rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_1_host1_6920:6920)<ec><v100>:41007 whose current membership list is: [[rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_2_host1_7658:7658)<ec><v102>:41004, rs-FullRegression58615648a0i3large-hydra-client-18(locatorgemfire_1_2_host1_13486:13486:locator)<ec><v5>:41003, rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_3_host1_582:582)<ec><v95>:41006, rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_4_host1_31258:31258)<ec><v88>:41005, rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_1_host1_6920:6920)<ec><v100>:41007, rs-FullRegression58615648a0i3large-hydra-client-18(locatorgemfire_1_1_host1_13950:13950:locator)<ec><v15>:41000]]
{noformat}

and its thread dumps show it stuck waiting for a reply:

{noformat}
"vm_0_thr_0_bridge_1_1_host1_6920" #144 daemon prio=5 os_prio=0 tid=0x00007fec70058800 nid=0x1d28 waiting on condition [0x00007fec62063000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000f4f654f8> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
        at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
        at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:723)
        at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:794)
        at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:771)
        at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:857)
        at org.apache.geode.internal.cache.DistributedCacheOperation.waitForAckIfNeeded(DistributedCacheOperation.java:779)
        at org.apache.geode.internal.cache.DistributedCacheOperation._distribute(DistributedCacheOperation.java:676)
        at org.apache.geode.internal.cache.DistributedCacheOperation.startOperation(DistributedCacheOperation.java:277)
        at org.apache.geode.internal.cache.DistributedCacheOperation.distribute(DistributedCacheOperation.java:318)
        at org.apache.geode.internal.cache.DistributedRegion.distributeDestroyRegion(DistributedRegion.java:1865)
        at org.apache.geode.internal.cache.DistributedRegion.basicDestroyRegion(DistributedRegion.java:1844)
        at org.apache.geode.internal.cache.LocalRegion.basicDestroyRegion(LocalRegion.java:6180)
        at org.apache.geode.internal.cache.HARegion.destroyRegion(HARegion.java:331)
        at org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:476)
        at org.apache.geode.internal.cache.ha.HARegionQueue.destroy(HARegionQueue.java:3438)
        at org.apache.geode.internal.cache.ha.HARegionQueue$BlockingHARegionQueue.destroy(HARegionQueue.java:2272)
        at org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.destroyRQ(CacheClientProxy.java:1031)
        at org.apache.geode.internal.cache.tier.sockets.CacheClientProxy.terminateDispatching(CacheClientProxy.java:939)
        at org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier.shutdown(CacheClientNotifier.java:1306)
        - locked <0x00000000f8022800> (a org.apache.geode.internal.cache.tier.sockets.CacheClientNotifier)
        at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.close(AcceptorImpl.java:1630)
        - locked <0x00000000f5f7b888> (a java.lang.Object)
        at org.apache.geode.internal.cache.CacheServerImpl.stop(CacheServerImpl.java:491)
        - locked <0x00000000f7ef2980> (a org.apache.geode.internal.cache.CacheServerImpl)
        at org.apache.geode.internal.cache.GemFireCacheImpl.stopServers(GemFireCacheImpl.java:2672)
        at org.apache.geode.internal.cache.GemFireCacheImpl.doClose(GemFireCacheImpl.java:2263)
        - locked <0x00000000f5a21a08> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2151)
        at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1559)
        - locked <0x00000000f5a21a08> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
        at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1257)
        at hydra.RemoteTestModule$2.run(RemoteTestModule.java:388)
{noformat}

Server 31258 reported this deserialization problem in its log:
{noformat}
[fatal 2021/03/06 09:45:02.772 PST bridgegemfire_1_4_host1_31258 <P2P message reader for rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_1_host1_6920:6920)<ec><v100>:41007 unshared ordered sender uid=43 dom #1 local port=57557 remote port=43412> tid=0x152] Error deserializing message
java.lang.IllegalStateException: unexpected byte: HASH_TABLE while reading dsfid
        at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2397)
        at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2403)
        at org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:2979)
        at org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2797)
        at org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1651)
        at org.apache.geode.internal.tcp.Connection.run(Connection.java:1482)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{noformat}

The other server, 582, reported the same thing:
{noformat}
[fatal 2021/03/06 09:45:02.796 PST bridgegemfire_1_3_host1_582 <P2P message reader for rs-FullRegression58615648a0i3large-hydra-client-18(bridgegemfire_1_1_host1_6920:6920)<ec><v100>:41007 unshared ordered sender uid=42 dom #1 local port=58695 remote port=52758> tid=0xcd] Error deserializing message
java.lang.IllegalStateException: unexpected byte: HASH_TABLE while reading dsfid
        at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2397)
        at org.apache.geode.internal.InternalDataSerializer.readDSFID(InternalDataSerializer.java:2403)
        at org.apache.geode.internal.tcp.Connection.readMessage(Connection.java:2979)
        at org.apache.geode.internal.tcp.Connection.processInputBuffer(Connection.java:2797)
        at org.apache.geode.internal.tcp.Connection.readMessages(Connection.java:1651)
        at org.apache.geode.internal.tcp.Connection.run(Connection.java:1482)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{noformat}


The test run is here:
http://hydradb.gemfire.pivotal.io/hdb/testresult/9653486

bugReportTemplate.txt:
{noformat}
Host name: rs-FullRegression58615648a0i3large-hydra-client-18
OS name: Linux
Architecture: amd64
OS version: 3.10.0-1160.15.2.el7.x86_64
Java version: 1.8.0_282
Java vm name: OpenJDK 64-Bit Server VM
Java vendor: BellSoft
Java home: /usr/local/regr/jdk/jdk-liberica-jdk8u282/jre

  #####################################################
  
  Product
    Product-Name: Apache Geode
    Product-Version: 1.15.0-build.36
  Build
    Build-Id: geode 36
    Build-Java-Vendor: BellSoft
    Build-Java-Version: 1.8.0_282
    Build-Platform: Linux 5.4.0-1037-gcp amd64
  Open
    Source-Date: 2021-03-05 21:41:55 +0000
    Source-Repository: develop
    Source-Revision: dc5541665b132dcf2f87a667ee34ee34ca223923
  Closed
    GemFire-Source-Date=2021-03-04 16:36:54 -0800
    GemFire-Source-Repository=develop
    GemFire-Source-Revision=b346c4572ef28cc9157a58d5021e948977d96966
    GemFire-Source-Status-Clean=true
  
    Running on: /10.32.110.103, 2 cpu(s), amd64 Linux 3.10.0-1160.15.2.el7.x86_64
  #####################################################


Test was run from rollingupgrade/rollingUpgradeWan.bt

Test:
rollingupgrade/newWan/hctKill.conf
   bridgeHostsPerSite=4
   bridgeThreadsPerVM=2
   bridgeVMsPerHost=1
   clientMem=128m
   edgeHostsPerSite=3
   edgeThreadsPerVM=5
   edgeVMsPerHost=1
   locatorHostsPerSite=2
   locatorThreadsPerVM=1
   locatorVMsPerHost=1
   maxOps=50000
   oldVersion=924
   resultWaitSec=600
   serverMem=256m
   wanSites=2

Run with local.conf:
hydra.JDKVersionPrms-javaHomes    = default, /usr/local/regr/jdk/jdk-liberica-jdk8u242/jre;
hydra.JDKVersionPrms-javaVendors  = default, BellSoft;
hydra.JDKVersionPrms-javaVersions = default, 1.8.0_242;
hydra.JDKVersionPrms-javaVmNames  = default, OpenJDK;


//randomSeed extracted from test:
hydra.Prms-randomSeed=1615051005312;

*** Test failed with this error:
THREAD vm_8_thr_16_edge_1_1_host1_9768 Subthread Dynamic Client VM Stopper
HANG Timeout stopping clients for DynamicCloseTasks
hydra.HydraTimeoutException: Failed to stop client vms within 300 seconds, starting with vm_0
        at hydra.ClientMgr.waitForClientsToDie(ClientMgr.java:395)
        at hydra.BaseTaskScheduler.stopClients(BaseTaskScheduler.java:128)
        at hydra.ClientMgr.doDynamicCloseTasks(ClientMgr.java:980)
        at hydra.ClientMgr.exitClientVm(ClientMgr.java:919)
        at hydra.ClientMgr.stopClientVm(ClientMgr.java:760)
        at hydra.ClientMgr._stopClientVm(ClientMgr.java:698)
        at hydra.ClientMgr$2.run(ClientMgr.java:657)
        at java.lang.Thread.run(Thread.java:748)
{noformat}
)

> (deleted)
> ---------
>
>                 Key: GEODE-9011
>                 URL: https://issues.apache.org/jira/browse/GEODE-9011
>             Project: Geode
>          Issue Type: Test
>          Components: membership, messaging
>            Reporter: Bruce J Schuchardt
>            Priority: Major
>
> submitted to the wrong JIRA - sorry



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
