(Taking #cassandra-dev slack chat to here) For context, we have a long history of flaky test failures ebbing and flowing: they build up, we burn them down, but we don't really have a workflow or the discipline to keep a clean snapshot of where we are or to hold some kind of steady state. With thousands of tests executing in a wide variety of environments, some of this is to be expected, but I argue it needs to be actively managed so we don't get into the kind of situation we did with 4.0 again.
I threw together a couple of JIRA queries that paint a pretty navigable picture IMO:

Total open JIRAs for test failures (sorry for the URL; decoded JQL in the P.S. below): https://issues.apache.org/jira/issues/?filter=12350869&jql=project%20%3D%20Cassandra%20AND%20resolution%20%3D%20unresolved%20AND%20(summary%20~%20flaky%20OR%20summary%20~%20test%20OR%20component%20%3D%20%22Test%2Funit%22)%20AND%20type%20%3D%20bug%20AND%20issuekey%20not%20in%20(CASSANDRA-16010%2C%20CASSANDRA-16024%2C%20CASSANDRA-16022%2C%20CASSANDRA-16021%2C%20CASSANDRA-16025%2C%20CASSANDRA-16023)%20AND%20summary%20!~%20hardening%20ORDER%20BY%20cf%5B12313825%5D%20ASC - 112 failures

Failures more recent than 6 months: https://issues.apache.org/jira/issues/?filter=12350869 - 10 failures

In the interest of tidying this up and staying on top of it going forward, I propose the following:

1. We close as Won't Fix all test failure tickets created >= 6 months ago (we had a big push for 4.0 and a lot of this JIRA content is stale).
2. We switch the "Bug Category" on the 10 more recent tickets to "Correctness - Test Failure".
3. We document a "canonical" workflow around test failures that links to a saved JIRA filter query.
4. The workflow in a nutshell: when you're working on something and you see a test failure that isn't related to your patch, check that filter, see if the test name is there, and if not create a new ticket with that Bug Category.

In theory this should give us a single source of truth for documented test failures as well as an entry point for new contributors.

Thoughts?
~Josh
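
P.S. For anyone who doesn't want to pick apart that first URL by hand, the JQL it carries is just the following (nothing new here, it's the same query URL-decoded; the ORDER BY references a JIRA custom field by id):

    project = Cassandra AND resolution = unresolved
      AND (summary ~ flaky OR summary ~ test OR component = "Test/unit")
      AND type = bug
      AND issuekey not in (CASSANDRA-16010, CASSANDRA-16024, CASSANDRA-16022,
                           CASSANDRA-16021, CASSANDRA-16025, CASSANDRA-16023)
      AND summary !~ hardening
    ORDER BY cf[12313825] ASC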