I don't know how hard it would be to do, but I think it would be great to have a CI job that repeatedly runs a specific test, so we can check whether it's flaky. We could run it systematically for new tests, or for existing tests related to what we are modifying.
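
To make the idea concrete, here is a rough sketch of the "repeat one test many times" part in plain JUnit 4. Nothing like this exists in the tree today; the class and rule names below are made up for illustration, and an actual CI job would more likely just invoke the single-test build target in a loop rather than use a rule like this:

    import org.junit.Rule;
    import org.junit.Test;
    import org.junit.rules.TestRule;
    import org.junit.runner.Description;
    import org.junit.runners.model.Statement;

    public class FlakyCheckExample
    {
        /** Hypothetical rule: re-runs each annotated test N times and fails on the first failure. */
        public static class RepeatRule implements TestRule
        {
            private final int times;

            public RepeatRule(int times)
            {
                this.times = times;
            }

            @Override
            public Statement apply(Statement base, Description description)
            {
                return new Statement()
                {
                    @Override
                    public void evaluate() throws Throwable
                    {
                        // Run the wrapped test body repeatedly; any flaky failure surfaces here.
                        for (int i = 0; i < times; i++)
                            base.evaluate();
                    }
                };
            }
        }

        @Rule
        public final RepeatRule repeat = new RepeatRule(50); // "a few dozen" iterations

        @Test
        public void suspiciousTest()
        {
            // the test body under suspicion goes here
        }
    }

A CI job could then point this kind of loop at just the new or suspicious test instead of the whole suite.
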
Using this CI job to run a suspicious test a few dozen times is probably going to be quicker, safer and cheaper than re-running the entire suite. It won't absolutely prove that a test is not flaky in every environment, but it would probably reduce the risk of introducing new flaky tests.

On Thu, 28 May 2020 at 22:26, David Capwell <dcapw...@apple.com.invalid> wrote:
>
> > - No flaky tests according to Jenkins or CircleCI? Also, some people run the free tier, others take advantage of premium CircleCI. What should be the framework?
>
> It would be good to have a common understanding of this; my current mental model is
>
> 1) Jenkins
> 2) Circle CI free tier unit tests (including in-jvm dtests)
> 3) Circle CI paid tier python dtest
>
> > - "ignored in exceptional cases" - examples?
>
> I personally don’t classify a test as flaky if the CI environment is at fault; a simple example could be a bad disk causing tests to fail. In that case, actions should be taken to fix the CI environment, but if the tests pass in another environment I am fine moving on and not blocking a release.
>
> > I got the impression that canonical suite (in this case Jenkins) might be the right direction to follow.
>
> I agree that Jenkins must be a source of input, but I don’t think it should be the only one at this moment; currently Circle CI produces more builds of Cassandra than Jenkins, so ignoring test failures there creates a more unstable environment for development and hides the fact that Jenkins will also see the issue. There are also gaps in Jenkins coverage which hide things such as the lack of Java 11 support and the fact that tests fail more often on Java 11.
>
> > But also, sometimes I feel in many cases CircleCI could provide input worth tracking but less likely to be product flakes
>
> Since Circle CI runs more builds than Jenkins, we are more likely to see flaky tests there than on Jenkins.
>
> > Not to mention flaky tests on Mac running with two cores... Yes, this is sometimes the only way to reproduce some of the reported tests' issues...
>
> I am not aware of anyone opening JIRAs based off this, only using this method to reproduce issues found in CI. I started using this method to help quickly reproduce race condition bugs found in CI, such as nodetool reporting repairs as successful when they had actually failed, and one case you are working on where a preview repair conflicts with a non-committed IR participant even though we reported commit to users (both are valid bugs found in CI).
>
> > So my idea was to suggest to start tracking an exact Jenkins report maybe
>
> Better visibility is great! Mick has been setting up Slack/Email notifications, but maybe a summary in the 4.0 report would be great to enhance visibility for all?
>
> > checked but potentially to be able to leave it for Beta in case we don't feel it shows a product defect
>
> Based off https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle, flaky tests block beta releases, so this needs to happen before then. What do you mean by “leave it for Beta”? Right now we label alpha but don’t block alpha releases on flaky tests; given this I don’t follow this statement, could you explain more?
> >> At least for me, what I learned in the past is we'd drive to a green test board and immediately transition it as a milestone, so flaky tests would reappear like a disappointing game of whack-a-mole. They seem frustratingly ever-present.
>
> How I see the document, all definitions/expectations from previous phases hold true for later stages. Right now the document says we cannot cut beta1 until flaky tests are resolved, but this would also apply to beta2+, rc+, etc; how I internalize this is that from pre-beta1 onwards, flaky tests are not allowed, so we don’t immediately transition away from this.
>
> One trend I have noticed in Cassandra is a lack of trust in tests, caused by the fact that unrelated failing builds are common; what then happens is the author/reviewer ignores the new failing test, writes it off as a flaky test, commits, and causes more tests to fail. Since testing can be skipped pre-commit, and failing tests can be ignored, this puts us in a state where new regressions pop up after commit; having flaky tests as a guard against release acts as a forcing function to stay stable as long as possible.
>
> >> Default posture to label fix version as beta
>
> Can you explain what you mean by this? Currently we don’t block alpha releases on flaky tests even though they are marked alpha; are you proposing we don’t block beta releases on flaky tests, or are you suggesting we label them beta to better match the doc and keep them as beta release blockers?
>
> >>> Also, I agree with Mick that it’s good to have a plan and open Jira tickets earlier rather than later.
>
> +1
>
> > On May 28, 2020, at 10:02 AM, Joshua McKenzie <jmcken...@apache.org> wrote:
> >
> > Good point Jordan re: flaky tests either implying API instability or blocking the ability to beta test.
> >
> > On Thu, May 28, 2020 at 12:56 PM Jordan West <jw...@apache.org> wrote:
> >
> >>> On Wed, May 27, 2020 at 5:13 PM Ekaterina Dimitrova <ekaterina.dimitr...@datastax.com> wrote:
> >>
> >>> - No flaky tests according to Jenkins or CircleCI? Also, some people run the free tier, others take advantage of premium CircleCI. What should be the framework?
> >>
> >> While I agree that we should use the Apache infrastructure as the canonical infrastructure, failures in both (or any) environment matter when it comes to flaky tests.
> >>
> >> On Wed, May 27, 2020 at 5:23 PM Joshua McKenzie <jmcken...@apache.org> wrote:
> >>
> >>> At least for me, what I learned in the past is we'd drive to a green test board and immediately transition it as a milestone, so flaky tests would reappear like a disappointing game of whack-a-mole. They seem frustratingly ever-present.
> >>
> >> Agreed. Having multiple successive green runs would be a better bar than one on a single platform imo.
> >>
> >>> I'd personally advocate for us taking the following stance on flaky tests from this point in the cycle forward:
> >>>
> >>> - Default posture to label fix version as beta
> >>> - *excepting* on case-by-case basis, if flake could imply product defect that would greatly impair beta testing we leave alpha
> >>
> >> I would be in favor of tightening this further to flakes that imply interface changes or major defects (e.g. corruption, data loss, etc). To do so would require evaluation of the flaky test, however, which I think is in sync with our "start in alpha and make exceptions to move to beta". The difference would be that we better define and widen what flaky tests can be punted to beta, and my guess is we could already evaluate all outstanding flaky test tickets by that bar.
> >>
> >> Jordan