+1 go ahead and delete the FlakyTest category, but please NEVER go back to marking flaky tests with @Ignore or renaming JUnit 3 test methods -- in my opinion that was infinitely worse. The category was meant to be a quarantine for FIXING, not for IGNORING.
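For anyone reading the archive without the Geode sources handy, the quarantine mechanism in question is a JUnit 4 category. Below is a minimal sketch, assuming an illustrative marker interface and test class (the names are made up, not Geode's actual classes); the point is that a categorized test keeps compiling and running and can be filtered by the build, whereas @Ignore silently disables it:

    import org.junit.Test;
    import org.junit.experimental.categories.Category;

    // Illustrative marker interface; Geode defines its own FlakyTest category type.
    interface FlakyTest {
    }

    public class SuspectReplicationDUnitTest { // hypothetical test class

      // Still compiled and executed; a Categories runner or a build-level
      // category filter can include or exclude it, unlike @Ignore.
      @Category(FlakyTest.class)
      @Test
      public void entryEventuallyReplicates() {
        // ... the behavior under suspicion ...
      }
    }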
On Fri, Jul 6, 2018 at 7:26 AM, Anthony Baker <aba...@pivotal.io> wrote:

> Check [1] to see what 'flaky' tests have failed recently.
>
> Anthony
>
> [1] https://concourse.apachegeode-ci.info/teams/main/pipelines/develop-metrics/jobs/GeodeFlakyTestMetrics/builds/51
>
>> On Jul 6, 2018, at 6:56 AM, Jinmei Liao <jil...@pivotal.io> wrote:
>>
>> +1 for removing flaky category and fix as failure occurs.
>>
>> On Thu, Jul 5, 2018 at 8:21 PM Dan Smith <dsm...@pivotal.io> wrote:
>>
>>> Honestly I've never liked the flaky category. What it means is that at
>>> some point in the past, we decided to put off tracking down and fixing a
>>> failure, and now we're left with a bug number and a description and that's it.
>>>
>>> I think we will be better off if we just get rid of the flaky category
>>> entirely. That way no one can label anything else as flaky and push it off
>>> for later, and if flaky tests fail again we will actually prioritize and
>>> fix them instead of ignoring them.
>>>
>>> I think Patrick was looking at rerunning the flaky tests to see what is
>>> still failing. How about we just run the whole flaky suite some number of
>>> times (100?), fix whatever is still failing, and close out and remove the
>>> category from the rest?
>>>
>>> I think we will get more benefit from shaking out and fixing the issues we
>>> have in the current codebase than we will from carefully explaining the
>>> flaky failures from the past.
>>>
>>> -Dan
>>>
>>> On Thu, Jul 5, 2018 at 7:03 PM, Dale Emery <dem...@pivotal.io> wrote:
>>>
>>>> Hi Alexander and all,
>>>>
>>>>> On Jul 5, 2018, at 5:11 PM, Alexander Murmann <amurm...@pivotal.io> wrote:
>>>>>
>>>>> Hi everyone!
>>>>>
>>>>> Dan Smith started a discussion about shaking out more flaky DUnit tests.
>>>>> That's a great effort and I am happy it's happening.
>>>>>
>>>>> As a corollary to that conversation, I wonder what the criteria should be
>>>>> for a test to not be considered flaky any longer and have the category
>>>>> removed.
>>>>>
>>>>> In general the bar should be fairly high. Even if a test only fails ~1 in
>>>>> 500 runs, that's still a problem given how many tests we have.
>>>>>
>>>>> I see two ends of the spectrum:
>>>>> 1. We have a good understanding of why the test was flaky and think we
>>>>>    fixed it.
>>>>> 2. We have a hard time reproducing the flaky behavior and have no good
>>>>>    theory as to why the test might have shown flaky behavior.
>>>>>
>>>>> In the first case I'd suggest running the test ~100 times to get a little
>>>>> more confidence that we fixed the flaky behavior and then removing the
>>>>> category.
>>>>
>>>> Here's a test for case 1:
>>>>
>>>> If we really understand why it was flaky, we will be able to:
>>>> - Identify the "faults" -- the broken places in the code (whether system
>>>>   code or test code).
>>>> - Identify the exact conditions under which those faults led to the
>>>>   failures we observed.
>>>> - Explain how those faults, under those conditions, led to those failures.
>>>> - Run unit tests that exercise the code under those same conditions, and
>>>>   demonstrate that the formerly broken code now does the right thing.
>>>>
>>>> If we're lacking any of these things, I'd say we're dealing with case 2.
>>>>
>>>>> The second case is a lot more problematic. How often do we want to run a
>>>>> test like that before we decide that it might have been fixed since we
>>>>> last saw it happen? Anything else we could/should do to verify the test
>>>>> deserves our trust again?
>>>>
>>>> I would want a clear, compelling explanation of the failures we observed.
>>>>
>>>> Clear and compelling are subjective, of course. For me, clear and
>>>> compelling would include descriptions of:
>>>> - The faults in the code. What, specifically, was broken.
>>>> - The specific conditions under which the code did the wrong thing.
>>>> - How those faults, under those conditions, led to those failures.
>>>> - How the fix either prevents those conditions, or causes the formerly
>>>>   broken code to now do the right thing.
>>>>
>>>> Even if we don't have all of these elements, we may have some of them.
>>>> That can help us calibrate our confidence. But the elements work together.
>>>> If we're lacking one, the others are shaky, to some extent.
>>>>
>>>> The more elements are missing in our explanation, the more times I'd want
>>>> to run the test before trusting it.
>>>>
>>>> Cheers,
>>>> Dale
>>>>
>>>> --
>>>> Dale Emery
>>>> dem...@pivotal.io
>>
>> --
>> Cheers
>>
>> Jinmei
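As a rough sketch of the "run it ~100 times" idea discussed above, a plain JUnit 4 rule can repeat each test body in a loop; the class and rule names here are illustrative only (this is not how the Geode pipeline reruns suites), and any single failing iteration fails the test:

    import org.junit.Rule;
    import org.junit.Test;
    import org.junit.rules.TestRule;
    import org.junit.runner.Description;
    import org.junit.runners.model.Statement;

    public class FormerlyFlakyTest { // hypothetical test class

      // Repeats every @Test in this class 100 times.
      @Rule
      public TestRule repeat = new RepeatRule(100);

      @Test
      public void behaviorThatUsedToFlake() {
        // ... the scenario that used to fail intermittently ...
      }

      static class RepeatRule implements TestRule {
        private final int times;

        RepeatRule(int times) {
          this.times = times;
        }

        @Override
        public Statement apply(Statement base, Description description) {
          return new Statement() {
            @Override
            public void evaluate() throws Throwable {
              for (int i = 0; i < times; i++) {
                base.evaluate(); // a single failing iteration fails the whole test
              }
            }
          };
        }
      }
    }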