On 2014-04-07, 11:49 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek <t...@mielczarek.org> wrote:
If a bug is causing a test to fail intermittently, then that test loses
value. It still has some value in that it can catch regressions that
cause it to fail permanently, but we would not be able to catch a
regression that causes it to fail intermittently.

To some degree, yes, marking a test as expected intermittent causes it
to lose value.  If the developers who work on the relevant component
think the lost value is important enough to track down the cause of
the intermittent failure, they can do so.  That should be their
decision, not something forced on them by infrastructure issues
("everyone else will suffer if you don't find the cause for this
failure in your test").  Making known intermittent failures not turn
the tree orange doesn't stop anyone from fixing intermittent failures,
it just removes pressure from them if they decide they don't want to.
If most developers think they have more important bugs to fix, then I
don't see a problem with that.

What you're saying above is true *if* someone investigates the intermittent test failure and determines that the bug is not important. But in my experience, that's not what happens at all. I think many people treat intermittent test failures as a category of unimportant problems, and therefore some bugs are never investigated. The fact of the matter is that most of these bugs are bugs in our tests, which of course will not impact our users directly, but I have occasionally come across bugs in our code which are exposed as intermittent failures. The real issue is that identifying the root of the problem is often the majority of the work needed to fix the intermittent test failure, so unless someone is willing to investigate the bug, we cannot say whether or not it impacts our users.

The thing that really makes me care about these intermittent failures a lot is that ultimately they force us to choose between disabling a whole bunch of tests and being unable to manage our tree. As more and more tests get disabled, we lose more and more test coverage, and that can have a much more severe impact on the health of our products than any individual intermittent test failure.

Cheers,
Ehsan

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
