I understand the challenge, but I disagree. It is only through this
requirement that we keep new flaky tests out. While I don't think anyone
should have to fix all the flaky tests to get an unrelated change in, I think
the requirement serves a purpose.

IMHO, the problems that you are seeing are indications that we are not keeping
on top of fixing flaky tests, as recently indicated by our 15% pass rate on
mass test runs. (I have not seen the results recently; I assume they are
better, but the fact that it got there is a sign.) It seems we are not
currently paying the full cost of changes, as indicated by decreasing pass
rates.

Thanks,
Mark

On 6/8/21, 9:33 AM, "Kirk Lund" <kl...@apache.org> wrote:

    Our requirement for stress-new-test-openjdk11 to pass before allowing merge
    doesn't really work as intended for fixing distributed tests that contain
    multiple flaky test methods. In fact, I think it causes contributors to
    avoid tackling flaky tests.

    I've been working on GEODE-9103: CI Failure:
    PutAllClientServerDistributedTest.testPutAllReturnsExceptions FAILED
    <https://issues.apache.org/jira/browse/GEODE-9103>
    and was able to fix it.

    However, stress-new-test-openjdk11 then continued to fail for other flaky
    tests in PutAllClientServerDistributedTest. I managed to fix GEODE-9296 and
    GEODE-8528 as well. I also tried but have not been able to fix GEODE-9242
    which remains flaky.

    Unfortunately, I cannot merge any of my fixes for
    PutAllClientServerDistributedTest unless every single flaky test in it is
    fixed by my PR. I think this is undesirable because it would be better to
    merge the fix for 3 flaky test methods than none.

    UPDATE: After running my precheckin more times, I did get
    stress-new-test-openjdk11 to pass once so I can merge, but that's more of a
    loophole than anything because I didn't manage to fix GEODE-9242.

    Despite having PR #6542 eventually pass, I would like to discuss removing
    or relaxing our requirement that stress-new-test-openjdk11 must pass in
    order to merge a PR...

    PR #6542: GEODE-9103: Fix ServerConnectivityExceptions in
    PutAllClientServerDistributedTest
    <https://github.com/apache/geode/pull/6542>

    Fixed in PR #6542:
    * GEODE-9296: CI Failure: PutAllClientServerDistributedTest >
    testPartialKeyInPRSingleHopWithRedundancy
    <https://issues.apache.org/jira/browse/GEODE-9296>
    * GEODE-9103: CI Failure:
    PutAllClientServerDistributedTest.testPutAllReturnsExceptions FAILED
    <https://issues.apache.org/jira/browse/GEODE-9103>
    * GEODE-8528: PutAllClientServerDistributedTest.testPartialKeyInPRSingleHop
    fails due to missing disk store after server restart
    <https://issues.apache.org/jira/browse/GEODE-8528>

    Still flaky:
    * GEODE-9242: CI failure in PutAllClientServerDistributedTest >
    testEventIdOutOfOrderInPartitionRegionSingleHop
    <https://issues.apache.org/jira/browse/GEODE-9242>

    Thanks,
    Kirk
