Hi, after adding additional checks to the failing tests, I can now see that they fail because some batches are still distributed while the GW sender is being stopped. Because of that, I suspect that this problem existed prior to this PR, but this PR is the first to introduce a test that checks for it.
I will continue to investigate this fault, but I cannot reproduce it locally, which is slowing down troubleshooting.

BR,
Mario

________________________________
From: Alexander Murmann <amurm...@apache.org>
Sent: 14 July 2020 1:11
To: Alexander Murmann <amurm...@apache.org>
Cc: dev@geode.apache.org <dev@geode.apache.org>; Mario Ivanac <mario.iva...@est.tech>
Subject: Re: [INFO] Latest test run of 200 DistributedTestOpenJDK8 passes

We continue to see these WAN tests adding a fail rate of just below 30% in our mass test runs <https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/create-mass-test-run-report/builds/10>. That's a very significant fail rate that impacts our ability to get our code committed with confidence. Can we resolve this issue? Otherwise, I think we need to consider reverting GEODE-7458.

On Fri, Jun 19, 2020 at 3:28 PM Alexander Murmann <amurm...@apache.org> wrote:

> Looking more into this, it looks like this was introduced by the changes
> for GEODE-7458 - "Adding additional option in gfsh command "start gateway
> sender" to control clearing of existing queues".
>
> That happened about a month ago, but it's inherent to those flaky tests
> that we discover them only after a while. Nonetheless, they become paper
> cuts that ultimately slow us down substantially if they persist.
>
> @Mario Ivanac If I am correct and GEODE-7458 introduced this you were the
> one making that change. Might you be able to take a look at making that
> test more reliable or reverting the change?
>
> Thank you!
>
> On Fri, Jun 19, 2020 at 7:57 AM Alexander Murmann <amurm...@apache.org>
> wrote:
>
>> Thank you so much for sharing this, Mark!
>>
>> It looks like there is a big cluster around WAN Gateway. Is anyone
>> already looking into the WAN issues?
>>
>> On Thu, Jun 18, 2020 at 10:06 PM Mark Hanson <hans...@vmware.com> wrote:
>>
>>> FYI, the build success rate was around 90% or so about two months ago.
>>>
>>> Here are the DUnit tests that are currently failing in our tests, most
>>> likely in CI, and PR pipelines.
>>>
>>> Please let me know if you have any questions.
>>>
>>> Thanks,
>>> Mark
>>>
>>> ***********************************************************************************
>>>
>>> Overall build success rate: 78.00000% (156 of 200)
>>>
>>> ***********************************************************************************
>>>
>>> The following test methods see failures in more than one class. There
>>> may be a failing *TestBase class
>>>
>>> *.testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived:
>>> 12 failures :
>>>
>>> SerialWANPersistenceEnabledGatewaySenderDUnitTest: 8 failures
>>> (96.000% success rate)
>>>
>>> SerialWANPersistenceEnabledGatewaySenderOffHeapDUnitTest: 4 failures
>>> (98.000% success rate)
>>>
>>> *.testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived:
>>> 12 failures :
>>>
>>> ParallelWANPersistenceEnabledGatewaySenderOffHeapDUnitTest: 5
>>> failures (97.500% success rate)
>>>
>>> ParallelWANPersistenceEnabledGatewaySenderDUnitTest: 7 failures
>>> (96.500% success rate)
>>>
>>> *.testPingWrongServer: 4 failures :
>>>
>>> ClientServerMiscSelectorDUnitTest: 3 failures (98.500% success rate)
>>>
>>> ClientServerMiscDUnitTest: 1 failures (99.500% success rate)
>>>
>>> ***********************************************************************************
>>>
>>> org.apache.geode.internal.cache.wan.serial.SerialWANPersistenceEnabledGatewaySenderDUnitTest:
>>> 8 failures (96.000% success rate)
>>>
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3539
>>>
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3526
>>>
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3505
>>>
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3435
>>>
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3414
>>>
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3391
>>>
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3363
>>>
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3359
>>>
>>> org.apache.geode.management.MemberMXBeanDistributedTest: 2 failures
>>> (99.000% success rate)
>>>
>>> testBucketCount
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3463
>>>
>>> testBucketCount
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3411
>>>
>>> org.apache.geode.internal.cache.tier.sockets.RedundancyLevelPart3DUnitTest:
>>> 1 failures (99.500% success rate)
>>>
>>> testRegisterInterestAndMakePrimaryWithFullRedundancy
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3381
>>>
>>> org.apache.geode.management.internal.cli.commands.QueryCommandOverHttpDUnitTest:
>>> 1 failures (99.500% success rate)
>>>
>>> testSimpleQueryOnLocator
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3516
>>>
>>> org.apache.geode.internal.cache.tier.sockets.ClientServerMiscDUnitTest:
>>> 1 failures (99.500% success rate)
>>>
>>> testPingWrongServer
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3400
>>>
>>> org.apache.geode.internal.cache.wan.serial.SerialWANStatsDUnitTest: 3
>>> failures (98.500% success rate)
>>>
>>> testReplicatedSerialPropagationWithGroupTransactionEventsSendsBatchesWithCompleteTransactions
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3495
>>>
>>> testReplicatedSerialPropagationWithGroupTransactionEventsSendsBatchesWithCompleteTransactions
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3470
>>>
>>> testReplicatedSerialPropagationHAWithGroupTransactionEvents
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3479
>>>
>>> org.apache.geode.distributed.internal.deadlock.GemFireDeadlockDetectorDUnitTest:
>>> 1 failures (99.500%
>>> success rate)
>>>
>>> testDistributedDeadlockWithDLock
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3421
>>>
>>> org.apache.geode.distributed.LocatorDUnitTest: 1 failures (99.500%
>>> success rate)
>>>
>>> testStartTwoLocatorsWithMultiKeystoreSSL
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3398
>>>
>>> org.apache.geode.internal.cache.wan.offheap.ParallelWANPersistenceEnabledGatewaySenderOffHeapDUnitTest:
>>> 5 failures (97.500% success rate)
>>>
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3531
>>>
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3522
>>>
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3456
>>>
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3439
>>>
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3405
>>>
>>> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest:
>>> 1 failures (99.500% success rate)
>>>
>>> clientTransactionOperationsAreNotLostIfTransactionIsOnRolledServer
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3433
>>>
>>> org.apache.geode.internal.cache.wan.parallel.ParallelWANPersistenceEnabledGatewaySenderDUnitTest:
>>> 7 failures (96.500% success rate)
>>>
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3478
>>>
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3463
>>>
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3439
>>>
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3436
>>>
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3405
>>>
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3370
>>>
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3351
>>>
>>> org.apache.geode.internal.cache.partitioned.PersistentColocatedPartitionedRegionDistributedTest:
>>> 4 failures (98.000% success rate)
>>>
>>> testMultipleColocatedChildPRsMissingWithSequencedStart
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3493
>>>
>>> testMissingColocatedChildPRDueToDelayedStart
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3503
>>>
>>> testHierarchyOfColocatedChildPRsMissingGrandchild
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3537
>>>
>>> testHierarchyOfColocatedChildPRsMissingGrandchild
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3424
>>>
>>> org.apache.geode.distributed.DistributedMemberDUnitTest: 2 failures
>>> (99.000% success rate)
>>>
>>> testGroupsInAllVMs
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3423
>>>
>>> testGroupsInAllVMs
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3374
>>>
>>> org.apache.geode.management.JMXMBeanReconnectDUnitTest: 5 failures
>>> (97.500% success rate)
>>>
>>> serverMXBeansOnServerAreUnaffectedByLocatorCrash
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3503
>>>
>>> serverMXBeansAreRestoredOnBothLocatorsAfterCrashedServerReturns
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3436
>>>
>>> serverMXBeansAreRestoredOnBothLocatorsAfterCrashedServerReturns
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3406
>>>
>>> locatorHasMemberTypeMXBeansForBothLocators
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3352
>>>
>>> serverMXBeansOnLocatorAreRestoredAfterCrashedServerReturns
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3457
>>>
>>> org.apache.geode.internal.cache.wan.offheap.SerialWANPersistenceEnabledGatewaySenderOffHeapDUnitTest:
>>> 4 failures (98.000% success rate)
>>>
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3494
>>>
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3465
>>>
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3464
>>>
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3379
>>>
>>> org.apache.geode.internal.cache.tier.sockets.ClientServerMiscSelectorDUnitTest:
>>> 3 failures (98.500% success rate)
>>>
>>> testPingWrongServer
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3402
>>>
>>> testPingWrongServer
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3380
>>>
>>> testPingWrongServer
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3357
>>>