Re: [DISCUSS] When is a test not flaky anymore?

Dale Emery Thu, 05 Jul 2018 19:03:49 -0700

Hi Alexander and all,

> On Jul 5, 2018, at 5:11 PM, Alexander Murmann <[email protected]> wrote:
> 
> Hi everyone!
> 
> Dan Smith started a discussion about shaking out more flaky DUnit tests.
> That's a great effort and I am happy it's happening.
> 
> As a corollary to that conversation I wonder what the criteria should be
> for a test to not be considered flaky any longer and have the category
> removed.
> 
> In general the bar should be fairly high. Even if a test only fails ~1 in
> 500 runs that's still a problem given how many tests we have.
> 
> I see two ends of the spectrum:
> 1. We have a good understanding why the test was flaky and think we fixed
> it.
> 2. We have a hard time reproducing the flaky behavior and have no good
> theory as to why the test might have shown flaky behavior.
> 
> In the first case I'd suggest to run the test ~100 times to get a little
> more confidence that we fixed the flaky behavior and then remove the
> category.


Here’s a test for case 1:

If we really understand why it was flaky, we will be able to:
    - Identify the “faults”: the broken places in the code (whether system code
      or test code).
    - Identify the exact conditions under which those faults led to the
      failures we observed.
    - Explain how those faults, under those conditions, led to those failures.
    - Run unit tests that exercise the code under those same conditions, and
      demonstrate that the formerly broken code now does the right thing.

If we’re lacking any of these things, I’d say we’re dealing with case 2.

> The second case is a lot more problematic. How often do we want to run a
> test like that before we decide that it might have been fixed since we last
> saw it happen? Anything else we could/should do to verify the test deserves
> our trust again?


I would want a clear, compelling explanation of the failures we observed.

Clear and compelling are subjective, of course. For me, clear and compelling
would include descriptions of:
   - The faults in the code. What, specifically, was broken.
   - The specific conditions under which the code did the wrong thing.
   - How those faults, under those conditions, led to those failures.
   - How the fix either prevents those conditions, or causes the formerly
     broken code to now do the right thing.

Even if we don’t have all of these elements, we may have some of them. That can
help us calibrate our confidence. But the elements work together. If we’re
lacking one, the others are shaky, to some extent.

The more elements are missing in our explanation, the more times I’d want to
run the test before trusting it.
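
To put rough numbers on "how many times", here is a minimal sketch. It assumes
each run fails independently with a fixed probability, which is an
idealization: real flaky failures often depend on load, timing, or test order.
The function names are hypothetical, and the inputs are the ~1 in 500 failure
rate and the ~100 runs mentioned in the quoted message.

```python
import math


def prob_all_pass(fail_rate: float, runs: int) -> float:
    """Chance that `runs` independent runs all pass although the test
    still fails at `fail_rate` per run (assumed constant)."""
    return (1.0 - fail_rate) ** runs


def runs_needed(fail_rate: float, confidence: float) -> int:
    """Smallest n such that a test failing at `fail_rate` would show at
    least one failure in n runs with probability >= `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fail_rate))


if __name__ == "__main__":
    p = 1 / 500
    # A test that is still flaky at 1 in 500 passes 100 runs in a row
    # about 82% of the time, so 100 green runs say little on their own.
    print(f"{prob_all_pass(p, 100):.3f}")  # 0.819
    # Runs needed to catch such a test with 95% probability.
    print(runs_needed(p, 0.95))            # 1497
```

Under these assumptions, a clean streak only rules out failure rates that are
large relative to the number of runs, which is why a missing explanation has to
be paid for with many more runs.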

Cheers,
Dale

—
Dale Emery
[email protected]
