I understand the challenge, but I disagree. It is only by requiring this check that we keep new flaky tests out. While I don't think anyone should have to fix every flaky test to get an unrelated change in, I think the requirement serves a purpose.
IMHO, the problems you are seeing indicate that we are not keeping on top of fixing flaky tests, as recently shown by our 15% pass rate on mass test runs. (I have not seen the results recently; I assume they are better, but the fact that it got to that point is a sign.) It seems we are not currently paying the full cost of changes, as indicated by decreasing pass rates.

Thanks,
Mark

On 6/8/21, 9:33 AM, "Kirk Lund" <kl...@apache.org> wrote:

Our requirement for stress-new-test-openjdk11 to pass before allowing merge doesn't really work as intended for fixing distributed tests that contain multiple flaky test methods. In fact, I think it causes contributors to avoid tackling flaky tests.

I've been working on GEODE-9103: CI Failure: PutAllClientServerDistributedTest.testPutAllReturnsExceptions FAILED <https://issues.apache.org/jira/browse/GEODE-9103> and was able to fix it.

However, stress-new-test-openjdk11 then continued to fail for other flaky tests in PutAllClientServerDistributedTest. I managed to fix GEODE-9296 and GEODE-8528 as well. I also tried to fix GEODE-9242 but have not been able to, and it remains flaky.

Unfortunately, I cannot merge any of my fixes for PutAllClientServerDistributedTest unless every single flaky test in it is fixed by my PR. I think this is undesirable because it would be better to merge the fix for 3 flaky test methods than none.

UPDATE: After running my precheckin more times, I did get stress-new-test-openjdk11 to pass once so I can merge, but that's more of a loophole than anything because I didn't manage to fix GEODE-9242.
Despite having PR #6542 eventually pass, I would like to discuss removing or relaxing our requirement that stress-new-test-openjdk11 must pass in order to merge a PR...

PR #6542: GEODE-9103: Fix ServerConnectivityExceptions in PutAllClientServerDistributedTest
<https://github.com/apache/geode/pull/6542>

Fixed in PR #6542:
* GEODE-9296: CI Failure: PutAllClientServerDistributedTest > testPartialKeyInPRSingleHopWithRedundancy
  <https://issues.apache.org/jira/browse/GEODE-9296>
* GEODE-9103: CI Failure: PutAllClientServerDistributedTest.testPutAllReturnsExceptions FAILED
  <https://issues.apache.org/jira/browse/GEODE-9103>
* GEODE-8528: PutAllClientServerDistributedTest.testPartialKeyInPRSingleHop fails due to missing disk store after server restart
  <https://issues.apache.org/jira/browse/GEODE-8528>

Still flaky:
* GEODE-9242: CI failure in PutAllClientServerDistributedTest > testEventIdOutOfOrderInPartitionRegionSingleHop
  <https://issues.apache.org/jira/browse/GEODE-9242>

Thanks,
Kirk