On 08/04/14 15:06, Ehsan Akhgari wrote:
On 2014-04-08, 9:51 AM, James Graham wrote:
On 08/04/14 14:43, Andrew Halberstadt wrote:
On 07/04/14 11:49 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek <t...@mielczarek.org>
wrote:
If a bug is causing a test to fail intermittently, then that test
loses
value. It still has some value in that it can catch regressions that
cause it to fail permanently, but we would not be able to catch a
regression that causes it to fail intermittently.

To some degree, yes, marking a test as expected intermittent causes it
to lose value.  If the developers who work on the relevant component
think the lost value is important enough to track down the cause of
the intermittent failure, they can do so.  That should be their
decision, not something forced on them by infrastructure issues
("everyone else will suffer if you don't find the cause for this
failure in your test").  Making known intermittent failures not turn
the tree orange doesn't stop anyone from fixing intermittent failures,
it just removes pressure from them if they decide they don't want to.
If most developers think they have more important bugs to fix, then I
don't see a problem with that.

I think this proposal would make more sense if our infrastructure and
tooling were able to handle it properly. Right now,
automatically marking known intermittents would cause the test to lose
*all* value. It's sad, but the only data we have about intermittents
comes from the sheriffs manually starring them. There is also currently
no way to mark a test KNOWN-RANDOM and automatically detect if it starts
failing permanently. This means the failures can't be starred and become
nearly impossible to discover, let alone diagnose.
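To make that missing piece concrete, the kind of check we don't have today would be something along these lines (purely illustrative; the function, the outcome labels and the window size are invented, nothing like this exists in the harnesses):

def check_known_random(history, window=20):
    # history: most recent "PASS"/"FAIL" outcomes for a test annotated
    # KNOWN-RANDOM.  If it has failed on every run in the recent window,
    # it has very likely gone from intermittent to permanently failing
    # and should be surfaced to the sheriffs rather than silently ignored.
    recent = history[-window:]
    if recent and all(outcome == "FAIL" for outcome in recent):
        return "PERMANENT-FAIL"
    return "KNOWN-RANDOM"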

So, what's the minimum level of infrastructure that you think would be
needed to go ahead with this plan? To me it seems like the current
system already isn't working very well, so the bar for moving forward
with a plan that would increase the amount of data we have available to
diagnose problems with intermittents, and reduce the amount of manual
labour needed in marking them, should be quite low.

dbaron raised the point that there are tests which are supposed to fail
intermittently if they detect a bug.  With that in mind, such tests
cannot be marked as intermittently failing by the sheriffs, let alone in
an automated way (see the discussion in bug 918921).

Such tests are problematic indeed, but it seems like they're problematic in the current infrastructure too. For example, if a test goes from always passing to failing 1 time in 10 when it regresses, the first time we see the regression is likely to be around 10 testruns after the problem is introduced. That presumably makes it rather hard to track down when things went wrong. Or are we running such tests N times, where N is some high enough number that we are confident that the test has a 95% (or whatever) chance of failing if there is actually a regression? If not, maybe we should be. Or perhaps the idea of independent testruns isn't useful in the face of all the state we have.
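As a rough illustration of that arithmetic (a sketch only, assuming runs really are independent, which the end of the previous paragraph questions):

import math

def runs_needed(per_run_failure_rate, detection_target=0.95):
    # Probability of seeing at least one failure in N independent runs is
    # 1 - (1 - p)^N, so we need N >= log(1 - target) / log(1 - p).
    p = per_run_failure_rate
    return math.ceil(math.log(1 - detection_target) / math.log(1 - p))

print(runs_needed(0.1))  # 29 runs for a test that fails 1 time in 10
print(runs_needed(0.5))  # 5 runs for a test that fails every other run

So for the 1-in-10 example we'd need somewhere around 29 runs before we could claim a 95% chance of having seen the regression at least once.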

In any case this kind of test could be explicitly excluded from the reruns, which would make the situation the same as it is today.

But to answer your question, I think this is something which can be done
in the test harness itself so we don't need any special infra support
for it.  Note that I don't think that automatically marking such tests
is a good idea either way.

The infra support I had in mind was something like "automatically (doing something like) starring tests that only passed after being rerun" or "listing all tests that needed a rerun" or "having a tool to find the first build in which the test became intermittent". The goal of this extra infrastructure would be to get the new information about reruns out of the testharness and address the concern that doing automated reruns would mean people paying even less attention to intermittents than they do today.
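To sketch what the harness side of that might look like (entirely illustrative; run_test and the retry count aren't real harness APIs, just placeholders):

def run_with_retries(test, run_test, max_runs=3):
    # Rerun a failing test up to max_runs times and record every outcome,
    # so downstream tooling can tell "passed first time" apart from
    # "passed only after a rerun" and "failed every run".
    results = []
    for _ in range(max_runs):
        outcome = run_test(test)   # hypothetical single-run hook
        results.append(outcome)
        if outcome == "PASS":
            break
    return {
        "test": test,
        "runs": results,
        "needed_rerun": len(results) > 1 and results[-1] == "PASS",
        "permanent_failure": all(r != "PASS" for r in results),
    }

Anything with needed_rerun set could then be collected per push, which gives you the "list of tests that needed a rerun" and something to star (or file bugs for) automatically.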

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
