[
https://issues.apache.org/jira/browse/GEODE-10330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541172#comment-17541172
]
Geode Integration commented on GEODE-10330:
-------------------------------------------
Seen in [distributed-test-openjdk8 #2435|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/distributed-test-openjdk8/builds/2435]
... see [test results|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0237/test-results/distributedTest/1653121873/]
or download [artifacts|http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.16.0-build.0237/test-artifacts/1653121873/distributedtestfiles-openjdk8-1.16.0-build.0237.tgz].
> Resource issues lead to "MemberDisconnectedException: Member isn't responding
> to heartbeat requests"
> ----------------------------------------------------------------------------------------------------
>
> Key: GEODE-10330
> URL: https://issues.apache.org/jira/browse/GEODE-10330
> Project: Geode
> Issue Type: Bug
> Affects Versions: 1.16.0
> Reporter: Donal Evans
> Priority: Major
> Labels: needsTriage
>
> A failure was observed in
> DistributedMulticastRegionWithUDPSecurityDUnitTest > testMulticastAfterReconnect,
> caused by a suspect string from fatal-level logging of "Membership service
> failure: Member isn't responding to heartbeat requests".
> Investigating the logs showed that all members reported long statistics
> sampling wakeup delays, indicating resource issues:
>
> {code:java}
> [vm3] [warn 2022/05/21 07:28:16.251 UTC LocatorWithMcast <StatSampler>
> tid=0xb8] Statistics sampling thread detected a wakeup delay of 4760 ms,
> indicating a possible resource issue. Check the GC, memory, and CPU
> statistics.
> ...
> [locator] [warn 2022/05/21 07:28:20.288 UTC <StatSampler> tid=0x3b]
> Statistics sampling thread detected a wakeup delay of 12400 ms, indicating a
> possible resource issue. Check the GC, memory, and CPU statistics.
> ...
> [vm1] [warn 2022/05/21 07:28:20.969 UTC vm1 <StatSampler> tid=0xda]
> Statistics sampling thread detected a wakeup delay of 13738 ms, indicating a
> possible resource issue. Check the GC, memory, and CPU statistics.
> ...
> [vm0] [warn 2022/05/21 07:28:22.226 UTC vm0 <StatSampler> tid=0xa9]
> Statistics sampling thread detected a wakeup delay of 15110 ms, indicating a
> possible resource issue. Check the GC, memory, and CPU statistics. {code}
> Using the progress tool from the dev-tools directory in the Geode repository,
> the following tests were found to be running during the resource issues,
> possibly indicating that one or more of them are particularly
> resource-intensive:
> {noformat}
> $> progress -r '2022-05-21 07:28:16.251 -0000' | grep org | sort{noformat}
> {code:java}
> org.apache.geode.cache.PRCacheListenerWithInterestPolicyAllDistributedTest.afterUpdateIsInvokedInEveryMember[0: redundancy=0]
> org.apache.geode.cache.lucene.LuceneQueriesReindexDUnitTest.recreateIndexWithDifferentFieldsShouldFail(PARTITION_OVERFLOW_TO_DISK) [2]
> org.apache.geode.cache.query.cq.dunit.CqDataUsingPoolOptimizedExecuteDUnitTest.testCQHAWithState
> org.apache.geode.cache.query.cq.dunit.PartitionedRegionCqQueryDUnitTest.testPartitionedCqOnAccessorBridgeServer
> org.apache.geode.cache30.CallbackArgDUnitTest.testForCA
> org.apache.geode.cache30.DistributedMulticastRegionWithUDPSecurityDUnitTest.testMulticastAfterReconnect
> org.apache.geode.cache30.DistributedNoAckRegionCCEOffHeapDUnitTest.testDistributedInvalidate
> org.apache.geode.cache30.GlobalRegionOffHeapDUnitTest.testOrderedUpdates
> org.apache.geode.cache30.ReconnectWithClusterConfigurationDUnitTest.testReconnectAfterMeltdown
> org.apache.geode.distributed.internal.P2PMessagingConcurrencyDUnitTest.testP2PMessaging(true, false, 32768, 65536) [6]
> org.apache.geode.disttx.PRDistTXDUnitTest.testSimulaneousChildRegionCreation
> org.apache.geode.internal.cache.ClientServerTransactionCCEDUnitTest.testClientCommitFunctionWithFailure
> org.apache.geode.internal.cache.eviction.OffHeapEvictionStatsDUnitTest.testHeapLruCounter
> org.apache.geode.internal.cache.wan.concurrent.ConcurrentParallelGatewaySenderOperation_1_DUnitTest.testParallelPropagationSenderStartAfterStopOnAccessorNode
> org.apache.geode.internal.cache.wan.offheap.ParallelGatewaySenderOperationsOffHeapDistributedTest.testParallelGatewaySenderStartOnAccessorNode
> org.apache.geode.internal.cache.wan.serial.SerialWANPropagation_PartitionedRegionDUnitTest.testPartitionedSerialPropagationHA
> org.apache.geode.internal.tcp.TCPConduitDUnitTest.basicAcceptConnection[0]
> org.apache.geode.management.internal.configuration.ClusterConfigImportDUnitTest.importFailWithExistingRegion
> org.apache.geode.rest.internal.web.controllers.RestAPIsOnGroupsFunctionExecutionDUnitTest.testBasicP2PFunctionSelectedGroup[1]
> org.apache.geode.session.tests.Jetty9CachingClientServerTest.failureShouldStillAllowOtherContainersDataAccess
> org.apache.geode.session.tests.Tomcat8ClientServerCustomCacheXmlTest.containersShouldExpireInSetTimeframe
> org.apache.geode.session.tests.Tomcat8Test.containersShouldReplicateCookies
> org.apache.geode.session.tests.Tomcat9ClientServerTest.invalidationShouldRemoveValueAccessForAllContainers
> {code}
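Finding the tests that were running at a given moment comes down to an interval check. A test counts if it started at or before that instant and had not yet finished. The following is a minimal Java sketch of that check, assuming each test's start and end times are already known. The TestRun class and the example test names are hypothetical, and this is not how the progress tool is implemented.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: which tests were running at a given instant. */
public class RunningTestsAt {

  /** One test's execution window; parsing it from test output is out of scope. */
  static final class TestRun {
    final String name;
    final Instant start;
    final Instant end;

    TestRun(String name, Instant start, Instant end) {
      this.name = name;
      this.start = start;
      this.end = end;
    }
  }

  /** Sorted names of tests whose [start, end] window contains the instant t. */
  static List<String> runningAt(List<TestRun> runs, Instant t) {
    return runs.stream()
        .filter(r -> !r.start.isAfter(t) && !r.end.isBefore(t))
        .map(r -> r.name)
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Instant t0 = Instant.parse("2022-05-21T07:28:00Z");
    List<TestRun> runs = Arrays.asList(
        new TestRun("org.example.SlowDUnitTest.testA", t0, t0.plusSeconds(60)),
        new TestRun("org.example.FastDUnitTest.testB", t0.plusSeconds(120), t0.plusSeconds(130)));
    // Timestamp of the first wakeup-delay warning in the logs above.
    System.out.println(runningAt(runs, Instant.parse("2022-05-21T07:28:16.251Z")));
  }
}
```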
> Future failures caused by this kind of resource issue should also list the
> concurrently running tests, so that tests that appear repeatedly can be
> identified as the likely culprits.
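One simple way to cross-reference several such lists is to count, for each test, how many failures it was running during, and then sort by that count. The following is a minimal Java sketch of that approach. CulpritRanking and rank are hypothetical names, and each failure is assumed to be represented by the set of test names running at the time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: rank tests by how many resource-issue failures they overlapped. */
public class CulpritRanking {

  /** Returns (test, count) pairs, highest count first, ties broken by name. */
  static List<Map.Entry<String, Integer>> rank(List<? extends Set<String>> concurrentTestsPerFailure) {
    Map<String, Integer> counts = new HashMap<>();
    for (Set<String> tests : concurrentTestsPerFailure) {
      for (String test : tests) {
        counts.merge(test, 1, Integer::sum); // a Set means at most one count per failure
      }
    }
    List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
    ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
        .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
    return ranked;
  }

  public static void main(String[] args) {
    List<Set<String>> failures = Arrays.asList(
        new HashSet<>(Arrays.asList("TestA", "TestB")),
        new HashSet<>(Arrays.asList("TestA", "TestC")),
        new HashSet<>(Arrays.asList("TestA", "TestB")));
    for (Map.Entry<String, Integer> e : rank(failures)) {
      System.out.println(e.getKey() + " appeared in " + e.getValue() + " failure(s)");
    }
  }
}
```

Treat a high count as a lead, not proof. Long-running or frequently scheduled tests will also show up often, so any candidate still needs to be checked against its own resource usage.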
>