On Tuesday 2014-04-08 14:51 +0100, James Graham wrote:
> So, what's the minimum level of infrastructure that you think would
> be needed to go ahead with this plan? To me it seems like the
> current system already isn't working very well, so the bar for
> moving forward with a plan that would increase the amount of data we
> had available to diagnose problems with intermittents, and reduce
> the amount of manual labour needed in marking them, should be quite
> low.

Not sure what plan you're talking about, but:

The first step I'd like to see is having better tools for finding where known intermittent failures regressed. In particular, we should have:

 * the ability to retrigger a partial test run (not the whole suite) on our testing infrastructure. This doesn't always help, since some failures will happen only in the context of the whole suite, but I think it's likely to help most of the time.

 * auto-bisection tools for intermittent failures that use the above ability when they can

I think we're pretty good about backing out changesets that cause new intermittent failures that happen at ~20% or more failure rates. We need to get better about backing out for new intermittent failures that are intermittent at lower rates, and that is best done with better tools.

(One piece of context I'm coming from: there have been multiple times that the tests that I consider necessary to have enabled to allow people to add new CSS properties or values have failed intermittently at a reasonably high rate for a few months; I think both the start and end of these periods of failures have, in the cases where we found it, correlated with major or minor changes to the JavaScript JIT. I think those JIT bugs, if they shipped, were likely causing real problems for users, and we should be detecting those bugs rather than disabling our CSS testing coverage and putting us in a state where we can't add new CSS features.)

I also don't think that moving the failure threshold is a long-term solution. There will always be tests that hover on the edge of whatever the failure threshold is and give us trouble as a result; I think moving the threshold will only give temporary relief due to the history of writing tests to a stricter standard. For example, if we retry intermittent failures up to 10 times to see if they pass, we'll end up with tests that fail 75% of the time and thus fail all 10 retries intermittently (5.6% of the time).
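
To make the arithmetic above concrete, here is a minimal Python sketch. It assumes each run of a test fails independently with a fixed probability, which is a simplification (real intermittents often depend on load or on what ran earlier in the suite). The function names and the 5% confidence cutoff are my own choices for illustration, not anything in our infrastructure.

```python
import math


def prob_all_attempts_fail(failure_rate: float, attempts: int) -> float:
    """Probability that a test fails on every one of `attempts` runs,
    assuming independent runs with a fixed per-run failure rate."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    return failure_rate ** attempts


def passes_needed_to_clear(failure_rate: float, max_miss_prob: float = 0.05) -> int:
    """Smallest number n of consecutive passing runs such that a changeset
    that really has the failure would pass all n runs with probability at
    most `max_miss_prob`, i.e. (1 - failure_rate) ** n <= max_miss_prob.
    The 0.05 default is an assumed cutoff, chosen only for illustration."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    return math.ceil(math.log(max_miss_prob) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # A test that fails 75% of the time still fails all 10 attempts
    # about 5.6% of the time.
    print(f"{prob_all_attempts_fail(0.75, 10):.3f}")
    # Runs needed per changeset when bisecting a 20% vs. a 2% intermittent.
    print(passes_needed_to_clear(0.20), passes_needed_to_clear(0.02))
```

Under these assumptions, the second function also shows why lower-rate intermittents need tooling: clearing one changeset of a 20% intermittent takes 14 clean runs, while a 2% intermittent takes 149, which is only practical if we can retrigger a partial test run rather than the whole suite.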

-David

-- 
𝄞   L. David Baron                         http://dbaron.org/   𝄂
𝄢   Mozilla                          https://www.mozilla.org/   𝄂
             Before I built a wall I'd ask to know
             What I was walling in or walling out,
             And to whom I was like to give offense.
               - Robert Frost, Mending Wall (1914)
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform