Hi,

After adding additional checks to the failing test, I can now see that the
tests are failing because some batches are distributed while the GW sender is
stopping. Because of that, I suspect that this problem existed prior to this
PR, but this PR is the first to introduce a test that checks for it.

I will continue to investigate this fault, but I cannot reproduce it locally,
which is slowing down troubleshooting.

BR,
Mario
________________________________
From: Alexander Murmann <amurm...@apache.org>
Sent: July 14, 2020 1:11
To: Alexander Murmann <amurm...@apache.org>
Cc: dev@geode.apache.org <dev@geode.apache.org>; Mario Ivanac 
<mario.iva...@est.tech>
Subject: Re: [INFO] Latest test run of 200 DistributedTestOpenJDK8 passes

We continue to see these WAN tests adding a fail rate of just below 30% in
our mass test runs
<https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-mass-test-run/jobs/create-mass-test-run-report/builds/10>
.

That's a very significant fail rate that impacts our ability to get our
code committed with confidence.

Can we resolve this issue? Otherwise, I think we need to consider reverting
GEODE-7458.

On Fri, Jun 19, 2020 at 3:28 PM Alexander Murmann <amurm...@apache.org>
wrote:

> Looking more into this, it looks like this was introduced by the changes
> for GEODE-7458 - "Adding additional option in gfsh command "start gateway
> sender" to control clearing of existing queues".
>
> That happened about a month ago, but it's inherent to those flaky tests
> that we discover them only after a while. Nonetheless, they become paper
> cuts that ultimately slow us down substantially if they persist.
>
> @Mario Ivanac, if I am correct and GEODE-7458 introduced this, you were the
> one who made that change. Might you be able to take a look at making that
> test more reliable or reverting the change?
>
> Thank you!
>
> On Fri, Jun 19, 2020 at 7:57 AM Alexander Murmann <amurm...@apache.org>
> wrote:
>
>> Thank you so much for sharing this, Mark!
>>
>> It looks like there is a big cluster around WAN Gateway. Is anyone
>> already looking into the WAN issues?
>>
>> On Thu, Jun 18, 2020 at 10:06 PM Mark Hanson <hans...@vmware.com> wrote:
>>
>>> FYI, the build success rate was around 90% or so about two months ago.
>>>
>>> Here are the DUnit tests that are currently failing in our test runs, and
>>> most likely in the CI and PR pipelines as well.
>>>
>>> Please let me know if you have any questions.
>>>
>>> Thanks,
>>> Mark
>>>
>>>
>>>
>>> ***********************************************************************************
>>>
>>>  Overall build success rate: 78.00000% (156 of 200)
>>>
>>>
>>> ***********************************************************************************
>>>
>>>
>>>
>>> The following test methods see failures in more than one class. There
>>> may be a failing *TestBase class.
>>>
>>>
>>>
>>> *.testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived:
>>> 12 failures :
>>>
>>>   SerialWANPersistenceEnabledGatewaySenderDUnitTest:  8 failures
>>> (96.000% success rate)
>>>
>>>   SerialWANPersistenceEnabledGatewaySenderOffHeapDUnitTest:  4 failures
>>> (98.000% success rate)
>>>
>>>
>>>
>>> *.testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived:
>>> 12 failures :
>>>
>>>   ParallelWANPersistenceEnabledGatewaySenderOffHeapDUnitTest:  5
>>> failures (97.500% success rate)
>>>
>>>   ParallelWANPersistenceEnabledGatewaySenderDUnitTest:  7 failures
>>> (96.500% success rate)
>>>
>>>
>>>
>>> *.testPingWrongServer:  4 failures :
>>>
>>>   ClientServerMiscSelectorDUnitTest:  3 failures (98.500% success rate)
>>>
>>>   ClientServerMiscDUnitTest:  1 failures (99.500% success rate)
>>>
>>>
>>>
>>>
>>> ***********************************************************************************
>>>
>>>
>>>
>>>
>>>
>>> org.apache.geode.internal.cache.wan.serial.SerialWANPersistenceEnabledGatewaySenderDUnitTest:
>>> 8 failures (96.000% success rate)
>>>
>>>
>>>
>>>
>>>  
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3539
>>>
>>>
>>>  
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3526
>>>
>>>
>>>  
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3505
>>>
>>>
>>>  
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3435
>>>
>>>
>>>  
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3414
>>>
>>>
>>>  
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3391
>>>
>>>
>>>  
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3363
>>>
>>>
>>>  
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3359
>>>
>>>
>>>
>>> org.apache.geode.management.MemberMXBeanDistributedTest:  2 failures
>>> (99.000% success rate)
>>>
>>>
>>>
>>>          testBucketCount
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3463
>>>
>>>          testBucketCount
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3411
>>>
>>>
>>>
>>> org.apache.geode.internal.cache.tier.sockets.RedundancyLevelPart3DUnitTest:
>>> 1 failures (99.500% success rate)
>>>
>>>
>>>
>>>          testRegisterInterestAndMakePrimaryWithFullRedundancy
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3381
>>>
>>>
>>>
>>> org.apache.geode.management.internal.cli.commands.QueryCommandOverHttpDUnitTest:
>>> 1 failures (99.500% success rate)
>>>
>>>
>>>
>>>          testSimpleQueryOnLocator
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3516
>>>
>>>
>>>
>>> org.apache.geode.internal.cache.tier.sockets.ClientServerMiscDUnitTest:
>>> 1 failures (99.500% success rate)
>>>
>>>
>>>
>>>          testPingWrongServer
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3400
>>>
>>>
>>>
>>> org.apache.geode.internal.cache.wan.serial.SerialWANStatsDUnitTest:  3
>>> failures (98.500% success rate)
>>>
>>>
>>>
>>>
>>>  
>>> testReplicatedSerialPropagationWithGroupTransactionEventsSendsBatchesWithCompleteTransactions
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3495
>>>
>>>
>>>  
>>> testReplicatedSerialPropagationWithGroupTransactionEventsSendsBatchesWithCompleteTransactions
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3470
>>>
>>>          testReplicatedSerialPropagationHAWithGroupTransactionEvents
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3479
>>>
>>>
>>>
>>> org.apache.geode.distributed.internal.deadlock.GemFireDeadlockDetectorDUnitTest:
>>> 1 failures (99.500% success rate)
>>>
>>>
>>>
>>>          testDistributedDeadlockWithDLock
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3421
>>>
>>>
>>>
>>> org.apache.geode.distributed.LocatorDUnitTest:  1 failures (99.500%
>>> success rate)
>>>
>>>
>>>
>>>          testStartTwoLocatorsWithMultiKeystoreSSL
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3398
>>>
>>>
>>>
>>> org.apache.geode.internal.cache.wan.offheap.ParallelWANPersistenceEnabledGatewaySenderOffHeapDUnitTest:
>>> 5 failures (97.500% success rate)
>>>
>>>
>>>
>>>
>>>  
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3531
>>>
>>>
>>>  
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3522
>>>
>>>
>>>  
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3456
>>>
>>>
>>>  
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3439
>>>
>>>
>>>  
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3405
>>>
>>>
>>>
>>> org.apache.geode.internal.cache.ClientServerTransactionFailoverWithMixedVersionServersDistributedTest:
>>> 1 failures (99.500% success rate)
>>>
>>>
>>>
>>>
>>>  clientTransactionOperationsAreNotLostIfTransactionIsOnRolledServer
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3433
>>>
>>>
>>>
>>> org.apache.geode.internal.cache.wan.parallel.ParallelWANPersistenceEnabledGatewaySenderDUnitTest:
>>> 7 failures (96.500% success rate)
>>>
>>>
>>>
>>>
>>>  
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3478
>>>
>>>
>>>  
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3463
>>>
>>>
>>>  
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3439
>>>
>>>
>>>  
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3436
>>>
>>>
>>>  
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3405
>>>
>>>
>>>  
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3370
>>>
>>>
>>>  
>>> testpersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3351
>>>
>>>
>>>
>>> org.apache.geode.internal.cache.partitioned.PersistentColocatedPartitionedRegionDistributedTest:
>>> 4 failures (98.000% success rate)
>>>
>>>
>>>
>>>          testMultipleColocatedChildPRsMissingWithSequencedStart
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3493
>>>
>>>          testMissingColocatedChildPRDueToDelayedStart
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3503
>>>
>>>          testHierarchyOfColocatedChildPRsMissingGrandchild
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3537
>>>
>>>          testHierarchyOfColocatedChildPRsMissingGrandchild
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3424
>>>
>>>
>>>
>>> org.apache.geode.distributed.DistributedMemberDUnitTest:  2 failures
>>> (99.000% success rate)
>>>
>>>
>>>
>>>          testGroupsInAllVMs
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3423
>>>
>>>          testGroupsInAllVMs
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3374
>>>
>>>
>>>
>>> org.apache.geode.management.JMXMBeanReconnectDUnitTest:  5 failures
>>> (97.500% success rate)
>>>
>>>
>>>
>>>          serverMXBeansOnServerAreUnaffectedByLocatorCrash
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3503
>>>
>>>
>>>  serverMXBeansAreRestoredOnBothLocatorsAfterCrashedServerReturns
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3436
>>>
>>>
>>>  serverMXBeansAreRestoredOnBothLocatorsAfterCrashedServerReturns
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3406
>>>
>>>          locatorHasMemberTypeMXBeansForBothLocators
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3352
>>>
>>>          serverMXBeansOnLocatorAreRestoredAfterCrashedServerReturns
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3457
>>>
>>>
>>>
>>> org.apache.geode.internal.cache.wan.offheap.SerialWANPersistenceEnabledGatewaySenderOffHeapDUnitTest:
>>> 4 failures (98.000% success rate)
>>>
>>>
>>>
>>>
>>>  
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3494
>>>
>>>
>>>  
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3465
>>>
>>>
>>>  
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3464
>>>
>>>
>>>  
>>> testReplicatedRegionPersistentWanGateway_restartSenderWithCleanQueues_expectNoEventsReceived
>>>
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3379
>>>
>>>
>>>
>>> org.apache.geode.internal.cache.tier.sockets.ClientServerMiscSelectorDUnitTest:
>>> 3 failures (98.500% success rate)
>>>
>>>
>>>
>>>          testPingWrongServer
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3402
>>>
>>>          testPingWrongServer
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3380
>>>
>>>          testPingWrongServer
>>> https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-mass-test-run-main/jobs/DistributedTestOpenJDK8/builds/3357
>>>
>>>
