> > Good point Jordan re: flaky tests either implying API instability or
> > being a blocker to the ability to beta test.
+1

> While I agree that we should use the Apache infrastructure as the
> canonical infrastructure, failures in both (or any) environment matter
> when it comes to flaky tests.

Totally agree. What I meant is that I personally feel we need a common view
and understanding of what the current status is and what is a blocker/not a
blocker.

> I don't know how hard it would be to do, but I think it would be great to
> have a CI job to repeatedly run a specific test, so we can check if it's
> flaky. We could systematically run it for new tests, or for existing tests
> related to what we are modifying. Using this CI job to run a suspicious
> test a few dozen times is probably going to be quicker, safer and cheaper
> than re-running the entire suite. That's not going to absolutely prove
> that a test is not flaky in every environment, but probably it would
> reduce the risk of introducing new flaky tests.

I like this.
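To illustrate the kind of repeat-run job I imagine (a rough sketch only; the
test command, run count and script below are hypothetical placeholders, not
our actual CI configuration), even a small wrapper that reruns one test and
stops at the first failure would already help:

    #!/usr/bin/env python3
    # Illustrative sketch: rerun a single, possibly flaky test and stop at
    # the first failure. TEST_CMD and RUNS are placeholders that a CI job
    # could expose as parameters.
    import subprocess
    import sys

    TEST_CMD = ["echo", "replace-with-the-real-test-invocation"]
    RUNS = 50

    for i in range(1, RUNS + 1):
        print(f"=== run {i} of {RUNS} ===", flush=True)
        if subprocess.run(TEST_CMD).returncode != 0:
            print(f"Test failed on run {i}", file=sys.stderr)
            sys.exit(1)

    print(f"Test passed {RUNS} consecutive runs")

In CI this could simply be parameterised with the test name and the number
of runs, and triggered on demand for new or suspicious tests.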
> I agree that Jenkins must be a source of input, but don't think it should
> be the only one at this moment; currently CircleCI produces more builds of
> Cassandra than Jenkins, so ignoring test failures there causes a more
> unstable environment for development and hides the fact that Jenkins will
> also see the issue. There are also gaps with Jenkins coverage which hide
> things such as lack of Java 11 support and that tests fail more often on
> Java 11.

+1, I totally don't advocate for skipping failures or not checking them.

> I am not aware of anyone opening JIRAs based off this, only using this
> method to reproduce issues found in CI. I started using this method to
> help quickly reproduce race condition bugs found in CI such as nodetool
> reporting repairs as success when they were actually failed, and one case
> you are working on where preview repair conflicts with a non-committed IR
> participant even though we reported commit to users (both cases are valid
> bugs found in CI).

Any similarities with the mentioned Jira tickets were not intentional. :-)
But if I see a failure while running tests on my computer and it looks
suspicious, I would probably raise a ticket even if the failure was not
seen in CI.

> What do you mean by "leave it for Beta"? Right now we label alpha but
> don't block alpha releases on flaky tests; given this I don't follow this
> statement, could you explain more?

True, we don't block alpha, but we do block beta, and my point is whether
all of the failures/flakiness we see are really blockers or whether some of
them could be worked on even in beta.

> One trend I have noticed in Cassandra is a lack of trust in tests caused
> by the fact that unrelated failing builds are common; what then happens is
> the author/reviewer ignores the new failing test, writes it off as a flaky
> test, commits, and causes more tests to fail. Since testing can be skipped
> pre-commit, and failing tests can be ignored, it puts us in a state where
> new regressions pop up after commit; by having the flaky tests as a guard
> against release it creates a forcing function to stay stable as long as
> possible.

I agree here that failures are sometimes ignored based on the assumption
that they are not related to the patch being tested. My view is that no one
expects a person to fix regressions from other patches, but it's great that
many people raise new tickets for those failures to raise the flag. At
least I see that as the right direction to go. Sure, someone might miss
something while testing a patch because it wasn't failing consistently at
that point for some reason. Those things are not always easy.

> Can you explain what you mean by this? Currently we don't block alpha
> releases on flaky tests even though they are marked alpha; are you
> proposing we don't block beta releases on flaky tests, or are you
> suggesting we label them beta to better match the doc and keep them beta
> release blockers?

I think what was meant here was to be careful about what kinds of failures
we block beta on. Not skipping any failures/flakiness, but filtering what
really is a blocker and what is not.

So the main idea behind starting this thread: do we have a full, clear
vision of what is left until beta in terms of flaky tests? What is the
common understanding and approach?

Ekaterina Dimitrova
e. ekaterina.dimitr...@datastax.com
w. www.datastax.com