Thanks Kirk for tackling some of our flakiest tests! I agree, we don't want to discourage anyone interested in paying down tech debt.
The Geode community has spoken clearly against bypassing or weakening required PR checks, so relaxing requirements in general might be a tough sell, but I'm curious what you had in mind. It might be easier to try to get consensus on the dev list to approve a one-time exception, when a solid argument can be made that all reasonable alternatives have been exhausted (e.g. splitting up the tests or PR, re-triggering several times, running with increased timeout, etc) and the risk is low (e.g. test-only fixes or reverts). On 6/8/21, 9:33 AM, "Kirk Lund" <kl...@apache.org> wrote: Our requirement for stress-new-test-openjdk11 to pass before allowing merge doesn't really work as intended for fixing distributed tests that contain multiple flaky test methods. In fact, I think it causes contributors to avoid tackling flaky tests. I've been working on GEODE-9103: CI Failure: PutAllClientServerDistributedTest.testPutAllReturnsExceptions FAILED <https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FGEODE-9103&data=04%7C01%7Conichols%40vmware.com%7Cb59c0fd23cc14802ff6708d92a9b29d4%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637587668177121955%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=wxvXQmgGu%2FVL%2Bn84QzumhDCVJwVHIeXfH9CnuJTOoS0%3D&reserved=0> and was able to fix it. However, stress-new-test-openjdk11 then continued to fail for other flaky tests in PutAllClientServerDistributedTest. I managed to fix GEODE-9296 and GEODE-8528 as well. I also tried but have not been able to fix GEODE-9242 which remains flaky. Unfortunately, I cannot merge any of my fixes for PutAllClientServerDistributedTest unless every single flaky test in it is fixed by my PR. I think this is undesirable because it would be better to merge the fix for 3 flaky test methods than none. UPDATE: After running my precheckin more times, I did get stress-new-test-openjdk11 to pass once so I can merge, but that's more of a loophole than anything because I didn't manage to fix GEODE-9242. Despite having PR #6542 eventually pass, I would like to discuss removing or relaxing our requirement that stress-new-test-openjdk11 must pass in order to merge a PR... PR #6542: GEODE-9103: Fix ServerConnectivityExceptions in PutAllClientServerDistributedTest <https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fgeode%2Fpull%2F6542&data=04%7C01%7Conichols%40vmware.com%7Cb59c0fd23cc14802ff6708d92a9b29d4%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637587668177121955%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=PnThYY0f8z7kX3nGIT9x4gqlIU%2B3NIRMlh%2B3XNphw5c%3D&reserved=0> Fixed in PR #6542: * GEODE-9296: CI Failure: PutAllClientServerDistributedTest > testPartialKeyInPRSingleHopWithRedundancy <https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FGEODE-9296&data=04%7C01%7Conichols%40vmware.com%7Cb59c0fd23cc14802ff6708d92a9b29d4%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637587668177121955%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=7bFHlYbkJ%2B1NIVnasz5%2B9ykmX8f766S13s6or82tPNs%3D&reserved=0> * GEODE-9103: CI Failure: PutAllClientServerDistributedTest.testPutAllReturnsExceptions FAILED <https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FGEODE-9103&data=04%7C01%7Conichols%40vmware.com%7Cb59c0fd23cc14802ff6708d92a9b29d4%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637587668177121955%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=wxvXQmgGu%2FVL%2Bn84QzumhDCVJwVHIeXfH9CnuJTOoS0%3D&reserved=0> * GEODE-8528: PutAllClientServerDistributedTest.testPartialKeyInPRSingleHop fails due to missing disk store after server restart <https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FGEODE-8528&data=04%7C01%7Conichols%40vmware.com%7Cb59c0fd23cc14802ff6708d92a9b29d4%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637587668177121955%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=31oBR7OYsFvJqSbO7mTFfGC7pA9fptCqBOcN8AoA9a0%3D&reserved=0> Still flaky: * GEODE-9242: CI failure in PutAllClientServerDistributedTest > testEventIdOutOfOrderInPartitionRegionSingleHop <https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FGEODE-9242&data=04%7C01%7Conichols%40vmware.com%7Cb59c0fd23cc14802ff6708d92a9b29d4%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C1%7C637587668177121955%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=GovuvljP%2FX%2BeR6GohGIhfwFBeAbfwUdjqeHtMKFOzes%3D&reserved=0> Thanks, Kirk