Re: Policy for disabling tests which run on TBPL

Gavin Sharp Tue, 08 Apr 2014 11:48:36 -0700

I see only two real goals for the proposed policy:
- ensure that module owners/peers have the opportunity to object to
any "disable test" decisions before they take effect
- set an expectation that intermittent orange failures are dealt with
promptly ("dealt with" first involves investigation, usually by a
developer familiar with the code, and can then lead to them being
fixed, disabled, or ignored)


Neither of those happens reliably today. Sheriffs are failing to get
the help they need to investigate failures, which leads to loss of
(sometimes quite important) test coverage when they decide to
unilaterally disable the relevant tests. Sheriffs should not be
disabling tests unilaterally; developers should not be ignoring
sheriff requests to investigate failures.

The policy is not intended to suggest that any particular outcome
(i.e. test disabling) is required.

Separately from all of that, we could definitely invest in better
tools for dealing with intermittent failures in general. Anecdotally,
I know Chromium has some nice ways of dealing with them, for example.
But I see that as a separate discussion, not really related to the
goals above.

Gavin

On Tue, Apr 8, 2014 at 10:20 AM, L. David Baron <dba...@dbaron.org> wrote:
> On Tuesday 2014-04-08 14:51 +0100, James Graham wrote:
>> So, what's the minimum level of infrastructure that you think would
>> be needed to go ahead with this plan? To me it seems like the
>> current system already isn't working very well, so the bar for
>> moving forward with a plan that would increase the amount of data we
>> had available to diagnose problems with intermittents, and reduce
>> the amount of manual labour needed in marking them, should be quite
>> low.
>
> Not sure what plan you're talking about, but:
>
> The first step I'd like to see is having better tools for finding
> where known intermittent failures regressed.  In particular, we
> should have:
>  * the ability to retrigger a partial test run (not the whole
>    suite) on our testing infrastructure.  This doesn't always help,
>    since some failures will happen only in the context of the whole
>    suite, but I think it's likely to help most of the time.
>  * auto-bisection tools for intermittent failures that use the above
>    ability when they can
>
> I think we're pretty good about backing out changesets that cause
> new intermittent failures that happen at ~20% or more failure rates.
> We need to get better about backing out for new intermittent
> failures that are intermittent at lower rates, and being able to do
> that is best done with better tools.
>
>
> (One piece of context I'm coming from:  there have been multiple
> times that the tests that I consider necessary to have enabled to
> allow people to add new CSS properties or values have failed
> intermittently at a reasonably high rate for a few months; I think
> both the start and end of these periods of failures have, in the
> cases where we found it, correlated with major or minor changes to
> the JavaScript JIT.  I think those JIT bugs, if they shipped, were
> likely causing real problems for users, and we should be detecting
> those bugs rather than disabling our CSS testing coverage and
> putting us in a state where we can't add new CSS features.)
>
>
> I also don't think that moving the failure threshold is a long-term
> solution.  There will always be tests that hover on the edge of
> whatever the failure threshold is and give us trouble as a result; I
> think moving the threshold will only give temporary relief due to
> the history of writing tests to a stricter standard.  For example,
> if we retry intermittent failures up to 10 times to see if they
> pass, we'll end up with tests that fail 75% of the time and thus
> fail all 10 retries intermittently (5.6% of the time).
>
> -David
>
> --
> 𝄞   L. David Baron                         http://dbaron.org/   𝄂
> 𝄢   Mozilla                          https://www.mozilla.org/   𝄂
>              Before I built a wall I'd ask to know
>              What I was walling in or walling out,
>              And to whom I was like to give offense.
>                - Robert Frost, Mending Wall (1914)
>
> _______________________________________________
> dev-platform mailing list
> dev-platform@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform
>
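
The retry arithmetic in the quoted message (a test that fails 75% of
the time failing all 10 attempts about 5.6% of the time) can be checked
with a minimal sketch. It assumes each attempt fails independently with
the same probability, which real intermittent failures may not do. The
function name prob_all_attempts_fail is hypothetical and used only for
illustration.

```python
def prob_all_attempts_fail(fail_rate: float, attempts: int) -> float:
    """Probability that every one of `attempts` runs fails.

    Assumption (not from the thread): runs are independent and share
    the same per-run failure rate.
    """
    if not 0.0 <= fail_rate <= 1.0:
        raise ValueError("fail_rate must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return fail_rate ** attempts


if __name__ == "__main__":
    # The 75% figure is the example from the quoted message; the other
    # rates are illustrative only.
    for rate in (0.20, 0.50, 0.75):
        p = prob_all_attempts_fail(rate, 10)
        print(f"per-run failure rate {rate:.0%}: "
              f"all 10 attempts fail {p:.4%} of the time")
```

Under the independence assumption, a retry limit moves the point at
which a test becomes visibly intermittent without removing it, which is
the concern raised in the quoted message.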
