On Tuesday 2014-04-08 14:51 +0100, James Graham wrote:
> So, what's the minimum level of infrastructure that you think would
> be needed to go ahead with this plan? To me it seems like the
> current system already isn't working very well, so the bar for
> moving forward with a plan that would increase the amount of data we
> had available to diagnose problems with intermittents, and reduce
> the amount of manual labour needed in marking them, should be quite
> low.

Not sure what plan you're talking about, but:

The first step I'd like to see is better tools for finding the
changeset where a known intermittent failure started.  In
particular, we should have:
 * the ability to retrigger a partial test run (not the whole
   suite) on our testing infrastructure.  This doesn't always help,
   since some failures will happen only in the context of the whole
   suite, but I think it's likely to help most of the time.
 * auto-bisection tools for intermittent failures that use the above
   ability when they can (a sketch of how such a bisection step
   might decide pass/fail for a flaky test follows this list)
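
To make the second point concrete, here's a minimal sketch in
Python.  None of this is our real infrastructure: run_test, the
revision list, and the failure rate are placeholders for
illustration.  The idea is to retrigger just the affected chunk
enough times per revision that a "pass" is statistically
meaningful:

    import math

    def runs_needed(failure_rate, confidence=0.95):
        """How many retriggers we need so that a test which is
        intermittent at `failure_rate` fails at least once with
        probability `confidence`."""
        return math.ceil(math.log(1 - confidence) /
                         math.log(1 - failure_rate))

    def revision_is_bad(run_test, revision, failure_rate,
                        confidence=0.95):
        """Retrigger the partial test run on `revision` enough
        times to classify it.  `run_test(revision)` stands in for
        a real retrigger of just the affected chunk; it returns
        True on a green run."""
        for _ in range(runs_needed(failure_rate, confidence)):
            if not run_test(revision):
                return True      # saw the intermittent failure
        return False             # probably predates the regression

    def bisect(run_test, revisions, failure_rate):
        """Ordinary bisection over an ordered revision range,
        using the statistical pass/fail decision above at each
        step."""
        lo, hi = 0, len(revisions) - 1   # revisions[hi] is known bad
        while lo < hi:
            mid = (lo + hi) // 2
            if revision_is_bad(run_test, revisions[mid],
                               failure_rate):
                hi = mid
            else:
                lo = mid + 1
        return revisions[lo]             # first (probably) bad revision

For a 5% intermittent, runs_needed gives 59 retriggers per bisection
step, which is exactly why retriggering a partial run rather than
the whole suite matters.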

I think we're pretty good about backing out changesets that cause
new intermittent failures at failure rates of ~20% or more.  We
need to get better about backing out for new intermittent failures
at lower rates, and doing that well requires better tools.
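
Some rough arithmetic on why the lower-rate failures are hard to
back out without better tools (the 2% rate and one run per push are
my assumptions, purely for illustration): the first observed
failure typically lands many pushes after the regressing change, so
blame isn't obvious without retriggers.

    def pushes_until_first_failure(failure_rate, runs_per_push=1):
        """Expected number of pushes before an intermittent at
        `failure_rate` is first seen, with `runs_per_push`
        independent runs per push (geometric distribution)."""
        p_seen = 1 - (1 - failure_rate) ** runs_per_push
        return 1 / p_seen

    print(pushes_until_first_failure(0.20))  # -> 5.0 pushes
    print(pushes_until_first_failure(0.02))  # -> 50.0 pushes

A 20% intermittent is usually noticed within about five pushes of
the regressing change; a 2% one takes about fifty, by which point
picking the right changeset to back out by inspection is much
harder.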


(One piece of context I'm coming from:  there have been multiple
times that the tests I consider necessary to keep enabled, so that
people can add new CSS properties or values, have failed
intermittently at a reasonably high rate for a few months.  In the
cases where we tracked it down, both the start and the end of these
periods of failure correlated with major or minor changes to the
JavaScript JIT.  I think those JIT bugs, if they shipped, were
likely causing real problems for users, and we should be detecting
those bugs rather than disabling our CSS test coverage and putting
ourselves in a state where we can't add new CSS features.)


I also don't think that moving the failure threshold is a long-term
solution.  There will always be tests that hover on the edge of
whatever the failure threshold is and give us trouble as a result;
I think moving the threshold will only give temporary relief,
because the existing tests were written against the stricter
standard.  For example, if we retry intermittent failures up to 10
times to see if they pass, we'll end up with tests that fail 75% of
the time on a single run and thus fail all 10 retries, i.e. still
show up as intermittent failures, about 5.6% of the time
(0.75^10 ≈ 0.056).
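
For reference, the arithmetic behind that example, generalized to
any per-run failure rate and retry count (the 75% rate and 10
retries are just the numbers from the paragraph above):

    def still_fails_after_retries(per_run_failure_rate, retries):
        """Probability that a test failing independently at
        `per_run_failure_rate` fails every one of `retries`
        attempts, i.e. is still reported as a failure."""
        return per_run_failure_rate ** retries

    print(still_fails_after_retries(0.75, 10))  # -> 0.0563 (~5.6%)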

-David

-- 
𝄞   L. David Baron                         http://dbaron.org/   𝄂
𝄢   Mozilla                          https://www.mozilla.org/   𝄂
             Before I built a wall I'd ask to know
             What I was walling in or walling out,
             And to whom I was like to give offense.
               - Robert Frost, Mending Wall (1914)
