Thanks for looking into this, Mario!

You are probably right that the underlying issue might have been
pre-existing and that the test is only surfacing it. I am glad, though, that
you are investigating, because a fail rate close to 30% is a problem. Something
like this happens every once in a while, and then someone has to do some
work to resolve historical problems that they hadn't planned to address.
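Mario's diagnosis in the quoted message below (some batches are still distributed while the gateway sender is stopping) describes a stop/dispatch race. As an illustration only, here is a minimal Java sketch of one common way to close such a race. The dispatcher checks the running flag, takes a batch and "delivers" it while holding the same lock that `stop()` takes, so no batch can arrive after `stop()` returns. `StopSafeSender` and all of its methods are hypothetical; this is not Geode's `GatewaySender` implementation and is not claimed to be the actual fix. The `main` method loosely mirrors the "restart sender with clean queues, expect no events received" scenario.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of a gateway sender (NOT Geode's GatewaySender API).
// Events are queued, a dispatcher delivers them in batches, and stop() can
// optionally clear the queue.
public class StopSafeSender {
  private final Deque<Integer> queue = new ArrayDeque<>();
  private final int batchSize;
  private int deliveredEvents = 0; // stands in for events received by the remote site
  private boolean running = true;

  public StopSafeSender(int batchSize) {
    this.batchSize = batchSize;
  }

  public synchronized void enqueue(int event) {
    queue.addLast(event);
  }

  // Checks the running flag, takes one batch and "delivers" it, all under the
  // same monitor that stop() uses. If the batch were taken under the lock but
  // delivered after releasing it, a batch could still arrive after stop()
  // returned, which is the kind of race described in the thread.
  public synchronized int dispatchOnce() {
    if (!running) {
      return 0;
    }
    int sent = 0;
    while (sent < batchSize && !queue.isEmpty()) {
      queue.pollFirst();
      sent++;
    }
    deliveredEvents += sent;
    return sent;
  }

  // Blocks until any in-flight dispatchOnce() releases the monitor, then stops.
  public synchronized void stop(boolean cleanQueues) {
    running = false;
    if (cleanQueues) {
      queue.clear();
    }
  }

  public synchronized void start() {
    running = true;
  }

  public synchronized int deliveredEventCount() {
    return deliveredEvents;
  }

  public synchronized int queueSize() {
    return queue.size();
  }

  public static void main(String[] args) throws InterruptedException {
    StopSafeSender sender = new StopSafeSender(10);
    for (int i = 0; i < 100_000; i++) {
      sender.enqueue(i);
    }

    // Dispatch concurrently with stop(): stop() may land before, between or
    // after any number of batches, but never in the middle of one.
    Thread dispatcher = new Thread(() -> {
      for (int i = 0; i < 20_000; i++) {
        sender.dispatchOnce();
      }
    });
    dispatcher.start();
    sender.stop(true);
    int deliveredAtStop = sender.deliveredEventCount();
    dispatcher.join();

    // Restart with the cleaned queue and keep dispatching.
    sender.start();
    sender.dispatchOnce();

    System.out.println("no events after stop/restart: "
        + (sender.deliveredEventCount() == deliveredAtStop)); // true
    System.out.println("queue size: " + sender.queueSize()); // 0
  }
}
```

Holding the monitor across the (simulated) send is the simplest way to guarantee that nothing is delivered after `stop()` returns, at the cost of making `stop()` wait for a batch that is already in flight. Whether that trade-off suits the real sender is a question for the investigation, not something this sketch settles.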

Thanks!

On Tue, Jul 14, 2020 at 1:06 AM Mario Ivanac <mario.iva...@est.tech> wrote:

> Hi,
>
> after adding additional checks to the failing test, I can now see that the
> tests are failing because some batches are still being distributed while the
> GW sender is stopping.
> Because of that, I suspect that this problem existed prior to this PR, but
> this PR is the first to introduce a test that checks for it.
>
> I will continue to investigate this fault, but I cannot reproduce it
> locally, which is slowing down troubleshooting.
>
> BR,
> Mario
> ________________________________
> From: Alexander Murmann <amurm...@apache.org>
> Sent: 14 July 2020 1:11
> To: Alexander Murmann <amurm...@apache.org>
> Cc: dev@geode.apache.org <dev@geode.apache.org>; Mario Ivanac
> <mario.iva...@est.tech>
> Subject: Re: [INFO] Latest test run of 200 DistributedTestOpenJDK8 passes
>
> We continue to see these WAN tests adding a fail rate of just below 30% in
> our mass test runs
> <https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/create-mass-test-run-report/builds/10>.
>
> That's a very significant fail rate that impacts our ability to get our
> code committed with confidence.
>
> Can we resolve this issue? Otherwise, I think we need to consider reverting
> GEODE-7458.
>
> On Fri, Jun 19, 2020 at 3:28 PM Alexander Murmann <amurm...@apache.org>
> wrote:
>
> > Looking more into this, it looks like this was introduced by the changes
> > for GEODE-7458 - "Adding additional option in gfsh command "start gateway
> > sender" to control clearing of existing queues".
> >
> > That happened about a month ago, but it is in the nature of flaky tests
> > that we only discover them after a while. Nonetheless, they become paper
> > cuts that ultimately slow us down substantially if they persist.
> >
> > @Mario Ivanac If I am correct that GEODE-7458 introduced this, you were
> > the one who made that change. Might you be able to take a look at making
> > that test more reliable, or at reverting the change?
> >
> > Thank you!
> >
> > On Fri, Jun 19, 2020 at 7:57 AM Alexander Murmann <amurm...@apache.org>
> > wrote:
> >
> >> Thank you so much for sharing this, Mark!
> >>
> >> It looks like there is a big cluster around WAN Gateway. Is anyone
> >> already looking into the WAN issues?
> >>
> >> On Thu, Jun 18, 2020 at 10:06 PM Mark Hanson <hans...@vmware.com>
> >> wrote:
> >>
> >>> FYI, the build success rate was around 90% about two months ago.
> >>>
> >>> Here are the DUnit tests that are currently failing in our mass test
> >>> runs, and most likely in the CI and PR pipelines as well.
> >>>
> >>> Please let me know if you have any questions.
> >>>
> >>> Thanks,
> >>> Mark
> >>>
> >>> ***********************************************************************************
> >>>
> >>>  Overall build success rate: 78.00000% (156 of 200)
> >>>
> >>> ***********************************************************************************
> >>>
> >>> The following test methods see failures in more than one class. There
> >>> may be a failing *TestBase class.
> >>>
> >>> *.testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived:
> >>> 12 failures :
> >>>
> >>>   SerialWANPersistenceEnabledGatewaySenderDUnitTest:  8 failures (96.000% success rate)
> >>>
> >>>   SerialWANPersistenceEnabledGatewaySenderOffHeapDUnitTest:  4 failures (98.000% success rate)
> >>>
> >>> *.testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived:
> >>> 12 failures :
> >>>
> >>>   ParallelWANPersistenceEnabledGatewaySenderOffHeapDUnitTest:  5 failures (97.500% success rate)
> >>>
> >>>   ParallelWANPersistenceEnabledGatewaySenderDUnitTest:  7 failures (96.500% success rate)
> >>>
> >>> *.testPingWrongServer:  4 failures :
> >>>
> >>>   ClientServerMiscSelectorDUnitTest:  3 failures (98.500% success rate)
> >>>
> >>>   ClientServerMiscDUnitTest:  1 failures (99.500% success rate)
> >>>
> >>> ***********************************************************************************
> >>>
> >>> org.apache.geode.internal.cache.wan.serial.SerialWANPersistenceEnabledGatewaySenderDUnitTest:
> >>> 8 failures (96.000% success rate)
> >>>
> >>>          testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3539
> >>>
> >>>          testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3526
> >>>
> >>>          testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3505
> >>>
> >>>          testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3435
> >>>
> >>>          testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3414
> >>>
> >>>          testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3391
> >>>
> >>>          testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3363
> >>>
> >>>          testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3359
> >>>
> >>> org.apache.geode.management.MemberMXBeanDistributedTest:
> >>> 2 failures (99.000% success rate)
> >>>
> >>>          testBucketCount
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3463
> >>>
> >>>          testBucketCount
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3411
> >>>
> >>> org.apache.geode.internal.cache.tier.sockets.RedundancyLevelPart3DUnitTest:
> >>> 1 failures (99.500% success rate)
> >>>
> >>>          testRegisterInterestAndMakePrimaryWithFullRedundancy
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3381
> >>>
> >>> org.apache.geode.management.internal.cli.commands.QueryCommandOverHttpDUnitTest:
> >>> 1 failures (99.500% success rate)
> >>>
> >>>          testSimpleQueryOnLocator
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3516
> >>>
> >>> org.apache.geode.internal.cache.tier.sockets.ClientServerMiscDUnitTest:
> >>> 1 failures (99.500% success rate)
> >>>
> >>>          testPingWrongServer
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3400
> >>>
> >>> org.apache.geode.internal.cache.wan.serial.SerialWANStatsDUnitTest:
> >>> 3 failures (98.500% success rate)
> >>>
> >>>          testReplicatedSerialPropagationWithGroupTransactionEventsSendsBatchesWithCompleteTransactions
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3495
> >>>
> >>>          testReplicatedSerialPropagationWithGroupTransactionEventsSendsBatchesWithCompleteTransactions
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3470
> >>>
> >>>          testReplicatedSerialPropagationHAWithGroupTransactionEvents
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3479
> >>>
> >>> org.apache.geode.distributed.internal.deadlock.GemFireDeadlockDetectorDUnitTest:
> >>> 1 failures (99.500% success rate)
> >>>
> >>>          testDistributedDeadlockWithDLock
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3421
> >>>
> >>> org.apache.geode.distributed.LocatorDUnitTest:
> >>> 1 failures (99.500% success rate)
> >>>
> >>>          testStartTwoLocatorsWithMultiKeystoreSSL
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3398
> >>>
> >>> org.apache.geode.internal.cache.wan.offheap.ParallelWANPersistenceEnabledGatewaySenderOffHeapDUnitTest:
> >>> 5 failures (97.500% success rate)
> >>>
> >>>          testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3531
> >>>
> >>>          testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3522
> >>>
> >>>          testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3456
> >>>
> >>>          testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3439
> >>>
> >>>          testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3405
> >>>
> >>> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest:
> >>> 1 failures (99.500% success rate)
> >>>
> >>>          clientTransactionOperationsAreNotLostIfTransactionIsOnRolledServer
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3433
> >>>
> >>> org.apache.geode.internal.cache.wan.parallel.ParallelWANPersistenceEnabledGatewaySenderDUnitTest:
> >>> 7 failures (96.500% success rate)
> >>>
> >>>          testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3478
> >>>
> >>>          testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3463
> >>>
> >>>          testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3439
> >>>
> >>>          testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3436
> >>>
> >>>          testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3405
> >>>
> >>>          testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3370
> >>>
> >>>          testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3351
> >>>
> >>> org.apache.geode.internal.cache.partitioned.PersistentColocatedPartitionedRegionDistributedTest:
> >>> 4 failures (98.000% success rate)
> >>>
> >>>          testMultipleColocatedChildPRsMissingWithSequencedStart
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3493
> >>>
> >>>          testMissingColocatedChildPRDueToDelayedStart
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3503
> >>>
> >>>          testHierarchyOfColocatedChildPRsMissingGrandchild
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3537
> >>>
> >>>          testHierarchyOfColocatedChildPRsMissingGrandchild
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3424
> >>>
> >>> org.apache.geode.distributed.DistributedMemberDUnitTest:
> >>> 2 failures (99.000% success rate)
> >>>
> >>>          testGroupsInAllVMs
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3423
> >>>
> >>>          testGroupsInAllVMs
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3374
> >>>
> >>> org.apache.geode.management.JMXMBeanReconnectDUnitTest:
> >>> 5 failures (97.500% success rate)
> >>>
> >>>          serverMXBeansOnServerAreUnaffectedByLocatorCrash
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3503
> >>>
> >>>          serverMXBeansAreRestoredOnBothLocatorsAfterCrashedServerReturns
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3436
> >>>
> >>>          serverMXBeansAreRestoredOnBothLocatorsAfterCrashedServerReturns
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3406
> >>>
> >>>          locatorHasMemberTypeMXBeansForBothLocators
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3352
> >>>
> >>>          serverMXBeansOnLocatorAreRestoredAfterCrashedServerReturns
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3457
> >>>
> >>> org.apache.geode.internal.cache.wan.offheap.SerialWANPersistenceEnabledGatewaySenderOffHeapDUnitTest:
> >>> 4 failures (98.000% success rate)
> >>>
> >>>          testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3494
> >>>
> >>>          testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3465
> >>>
> >>>          testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3464
> >>>
> >>>          testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3379
> >>>
> >>> org.apache.geode.internal.cache.tier.sockets.ClientServerMiscSelectorDUnitTest:
> >>> 3 failures (98.500% success rate)
> >>>
> >>>          testPingWrongServer
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3402
> >>>
> >>>          testPingWrongServer
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3380
> >>>
> >>>          testPingWrongServer
> >>>          https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3357
> >>>
>
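The percentages in Mark's report above are consistent with passed runs divided by total runs, with 200 runs per class. This short Java sketch reproduces a few of them from the counts the report gives (156 of 200 builds passing overall; 8 and 7 failures for the serial and parallel WAN persistence test classes). `SuccessRate` and `successRatePercent` are illustrative names only, not part of the tooling that generated the report, and the sketch assumes each listed failure is one failed run.

```java
import java.util.Locale;

// Illustrative only: reproduces the success-rate arithmetic from the report,
// assuming each listed failure corresponds to one failed run out of totalRuns.
public class SuccessRate {
  public static double successRatePercent(int failures, int totalRuns) {
    if (totalRuns <= 0 || failures < 0 || failures > totalRuns) {
      throw new IllegalArgumentException("need totalRuns > 0 and 0 <= failures <= totalRuns");
    }
    return 100.0 * (totalRuns - failures) / totalRuns;
  }

  public static void main(String[] args) {
    int runs = 200;
    // Overall: 156 of 200 builds passed, so 44 builds failed.
    System.out.printf(Locale.ROOT, "overall: %.5f%%%n", successRatePercent(runs - 156, runs)); // overall: 78.00000%
    // SerialWANPersistenceEnabledGatewaySenderDUnitTest: 8 failures.
    System.out.printf(Locale.ROOT, "serial WAN: %.3f%%%n", successRatePercent(8, runs)); // serial WAN: 96.000%
    // ParallelWANPersistenceEnabledGatewaySenderDUnitTest: 7 failures.
    System.out.printf(Locale.ROOT, "parallel WAN: %.3f%%%n", successRatePercent(7, runs)); // parallel WAN: 96.500%
  }
}
```

`Locale.ROOT` is passed so the decimal separator is always a period, regardless of the machine's default locale.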
