Just ran across this thread. I'm not quite sure it's what you're thinking of,
but this may be of interest:
https://github.com/Ealdwulf/bbchop
It's a tool for bisection of intermittent bugs, based on Bayesian search
theory. That is, it is supposed to find the intermittent bug, as opposed to
fin
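The BBChop idea, Bayesian search over which commit introduced an intermittent failure, can be sketched as a toy. This is not BBChop's actual algorithm: the `bayes_bisect` name, the no-false-positive assumption, and the fixed per-run failure probability `p_fail` are all simplifications for illustration.

```python
def bayes_bisect(n, test, p_fail=0.3, iters=200, threshold=0.95):
    """Toy Bayesian bisection over commits 0..n-1 for an intermittent bug.

    Assumes the bug, once introduced at the culprit commit, makes test()
    fail with probability p_fail there and at every later commit, and
    never fails before it (no false positives).
    """
    belief = [1.0 / n] * n  # uniform prior over the culprit commit
    for _ in range(iters):
        # Probe the commit that splits the remaining probability mass.
        acc, pick = 0.0, 0
        for i, b in enumerate(belief):
            acc += b
            if acc >= 0.5:
                pick = i
                break
        failed = test(pick)
        # Bayes update: the bug is present at `pick` iff culprit <= pick.
        for c in range(n):
            if c <= pick:
                belief[c] *= p_fail if failed else 1.0 - p_fail
            elif failed:
                belief[c] = 0.0  # a failure proves the culprit is <= pick
        total = sum(belief)
        belief = [b / total for b in belief]
        best = max(range(n), key=lambda c: belief[c])
        if belief[best] >= threshold:
            return best
    return max(range(n), key=lambda c: belief[c])
```

With `p_fail=1.0` and a deterministic test this degenerates into ordinary bisection; the interesting case is that passes only weaken, rather than rule out, earlier commits.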
I have made small edits thanks to Kyle and Karl; the official policy is posted
on the Sheriffing wiki page:
https://wiki.mozilla.org/Sheriffing/Test_Disabling_Policy
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/l
Thank you for putting this together. It is important.
jmaher writes:
> This policy will define an escalation path for when a single test case is
> identified to be leaking or failing and is causing enough disruption on the
> trees.
> Exceptions:
> 1) If this test has landed (or been modified) i
On Tuesday, April 15, 2014 9:42:25 AM UTC-4, Kyle Huey wrote:
> On Tue, Apr 15, 2014 at 6:21 AM, jmaher wrote:
>
> > This policy will define an escalation path for when a single test case is
> > identified to be leaking or failing and is causing enough disruption on the
> > trees. Disruption is
On Tue, Apr 15, 2014 at 6:21 AM, jmaher wrote:
> This policy will define an escalation path for when a single test case is
> identified to be leaking or failing and is causing enough disruption on the
> trees. Disruption is defined as:
> 1) Test case is on the list of top 20 intermittent failure
I want to express my thanks to everyone who contributed to this thread. We
have a lot of passionate and smart people who care about this topic; thanks
again for weighing in so far.
Below is a slightly updated policy from the original, and following that is an
attempt to summarize the thread an
On 2014-04-09, 6:46 PM, Chris Peterson wrote:
On 4/9/14, 11:48 AM, Gregory Szorc wrote:
I feel a lot of people just shrug shoulders and allow the test to be
disabled (I'm guilty of it as much as anyone). From my perspective, it's
difficult to convince the powers that be that fixing intermittent fa
On 4/9/14, 11:48 AM, Gregory Szorc wrote:
I feel a lot of people just shrug shoulders and allow the test to be
disabled (I'm guilty of it as much as anyone). From my perspective, it's
difficult to convince the powers that be that fixing intermittent failures
(that have been successfully swept under
On 4/9/14, 2:07 PM, Karl Tomlinson wrote:
Gregory Szorc writes:
2) Run marked intermittent tests multiple times. If it works all
25 times, fail the test run for inconsistent metadata.
We need to consider intermittently failing tests as failed, and we
need to only test things that always pass.
Gregory Szorc writes:
> 2) Run marked intermittent tests multiple times. If it works all
> 25 times, fail the test run for inconsistent metadata.
We need to consider intermittently failing tests as failed, and we
need to only test things that always pass.
We can't rely on statistics to tell us a
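Gregory's proposal above (rerun tests marked intermittent, and treat 25 straight passes as stale metadata) could look roughly like this in a harness. The hook name and return values below are made up for the sketch:

```python
def check_intermittent(test, runs=25):
    """Hypothetical harness hook: rerun a test annotated as intermittent.

    A test marked intermittent should actually fail sometimes; if it
    passes all `runs` attempts, the annotation looks stale and the run
    is flagged so the metadata gets fixed.
    """
    # test() returns True on pass, False on failure.
    failures = sum(0 if test() else 1 for _ in range(runs))
    if failures == 0:
        return "inconsistent-metadata"  # 25/25 green: annotation is stale
    return "known-intermittent"         # failed at least once, as annotated
```

Karl's caveat still applies: a clean streak only shows the failure rate is low, not zero, so this catches stale annotations but cannot prove a test is healthy.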
On 4/9/14, 11:29 AM, L. David Baron wrote:
On Wednesday 2014-04-09 11:00 -0700, Gregory Szorc wrote:
The simple solution is to have a separate in-tree manifest
annotation for intermittents. Put another way, we can describe
exactly why we are not running a test. This is kinda/sorta the realm
of b
On Wednesday 2014-04-09 11:00 -0700, Gregory Szorc wrote:
> The simple solution is to have a separate in-tree manifest
> annotation for intermittents. Put another way, we can describe
> exactly why we are not running a test. This is kinda/sorta the realm
> of bug 922581.
>
> The harder solution is
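Gregory's "separate in-tree manifest annotation" could look something like the sketch below, in the style of the existing mochitest ini manifests. The `intermittent` key, its value syntax, and the test filename are hypothetical, not an existing manifestparser feature (the real keys are things like `skip-if` and `disabled`):

```ini
# Hypothetical annotation (key and test name are illustrative): keep
# running the test but track it as a known intermittent, recording the
# bug and observed failure rate instead of silently disabling it.
[test_widget_resize.html]
intermittent = bug 922581, ~10% on linux-debug
```
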
On 4/8/14, 6:51 AM, James Graham wrote:
On 08/04/14 14:43, Andrew Halberstadt wrote:
On 07/04/14 11:49 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek
wrote:
If a bug is causing a test to fail intermittently, then that test loses
value. It still has some value in that i
On 2014-04-08, 6:10 PM, Karl Tomlinson wrote:
I wonder whether the real problem here is that we have too many
bad tests that report false negatives, and these bad tests are
reducing the value of our testsuite in general. Tests also need
to be well documented so that people can understand what a
Aryeh Gregor writes:
> On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari wrote:
>> What you're saying above is true *if* someone investigates the
>> intermittent test failure and determines that the bug is not
>> important. But in my experience, that's not what happens at
>> all. I think many peopl
On 2014-04-08, 3:15 PM, Chris Peterson wrote:
On 4/8/14, 11:41 AM, Gavin Sharp wrote:
Separately from all of that, we could definitely invest in better
tools for dealing with intermittent failures in general. Anecdotally,
I know chromium has some nice ways of dealing with them, for example.
But
On 4/8/14, 11:41 AM, Gavin Sharp wrote:
Separately from all of that, we could definitely invest in better
tools for dealing with intermittent failures in general. Anecdotally,
I know chromium has some nice ways of dealing with them, for example.
But I see that as a separate discussion not really rel
On Tuesday 2014-04-08 11:41 -0700, Gavin Sharp wrote:
> I see only two real goals for the proposed policy:
> - ensure that module owners/peers have the opportunity to object to
> any "disable test" decisions before they take effect
> - set an expectation that intermittent orange failures are dealt
I see only two real goals for the proposed policy:
- ensure that module owners/peers have the opportunity to object to
any "disable test" decisions before they take effect
- set an expectation that intermittent orange failures are dealt with
promptly ("dealt with" first involves investigation, usua
On Tuesday 2014-04-08 14:51 +0100, James Graham wrote:
> So, what's the minimum level of infrastructure that you think would
> be needed to go ahead with this plan? To me it seems like the
> current system already isn't working very well, so the bar for
> moving forward with a plan that would incre
On 08/04/14 15:06, Ehsan Akhgari wrote:
On 2014-04-08, 9:51 AM, James Graham wrote:
On 08/04/14 14:43, Andrew Halberstadt wrote:
On 07/04/14 11:49 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek
wrote:
If a bug is causing a test to fail intermittently, then that test
l
On 2014-04-08, 8:15 AM, Aryeh Gregor wrote:
On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari wrote:
What you're saying above is true *if* someone investigates the intermittent
test failure and determines that the bug is not important. But in my
experience, that's not what happens at all. I think
On 2014-04-08, 9:51 AM, James Graham wrote:
On 08/04/14 14:43, Andrew Halberstadt wrote:
On 07/04/14 11:49 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek
wrote:
If a bug is causing a test to fail intermittently, then that test loses
value. It still has some value in th
On 08/04/14 14:43, Andrew Halberstadt wrote:
On 07/04/14 11:49 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek
wrote:
If a bug is causing a test to fail intermittently, then that test loses
value. It still has some value in that it can catch regressions that
cause it to
On 07/04/14 11:49 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek wrote:
If a bug is causing a test to fail intermittently, then that test loses
value. It still has some value in that it can catch regressions that
cause it to fail permanently, but we would not be able to
On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari wrote:
> What you're saying above is true *if* someone investigates the intermittent
> test failure and determines that the bug is not important. But in my
> experience, that's not what happens at all. I think many people treat
> intermittent test fa
On 2014-04-07, 11:49 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek wrote:
If a bug is causing a test to fail intermittently, then that test loses
value. It still has some value in that it can catch regressions that
cause it to fail permanently, but we would not be able
On 2014-04-07, 11:12 AM, Ted Mielczarek wrote:
It's difficult to say whether bugs we find via tests are more or less
important than bugs we find via users. It's entirely possible that
lots of the bugs that cause intermittent test failures cause
intermittent weird behavior for our users, we simp
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek wrote:
> If a bug is causing a test to fail intermittently, then that test loses
> value. It still has some value in that it can catch regressions that
> cause it to fail permanently, but we would not be able to catch a
> regression that causes it to
On 4/7/2014 9:02 AM, Aryeh Gregor wrote:
> On Mon, Apr 7, 2014 at 3:20 PM, Andrew Halberstadt
> wrote:
>> I would guess the former is true in most cases. But at least there we have a
> *chance* at tracking down and fixing the failure, even if it takes a while
>> before it becomes annoying enough t
On Mon, Apr 7, 2014 at 3:20 PM, Andrew Halberstadt
wrote:
> I would guess the former is true in most cases. But at least there we have a
> *chance* at tracking down and fixing the failure, even if it takes a while
> before it becomes annoying enough to prioritize. If we made it so
> intermittents n
On 07/04/14 05:10 AM, James Graham wrote:
On 07/04/14 04:33, Andrew Halberstadt wrote:
On 06/04/14 08:59 AM, Aryeh Gregor wrote:
Is there any reason in principle that we couldn't have the test runner
automatically rerun tests with known intermittent failures a few
times, and let the test pass i
On Mon, Apr 7, 2014 at 6:33 AM, Andrew Halberstadt
wrote:
> Many of our test runners have that ability. But doing this implies that
> intermittents are always the fault of the test. We'd be missing whole
> classes of regressions (notably race conditions).
We already are, because we already will s
On 07/04/14 04:33, Andrew Halberstadt wrote:
On 06/04/14 08:59 AM, Aryeh Gregor wrote:
On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari
wrote:
Note that this is only accurate to a certain point. There are other
things which
we can do to guesswork our way out of the situation for Autoland, but of
cou
On 04/04/14 03:44 PM, Ehsan Akhgari wrote:
On 2014-04-04, 3:12 PM, L. David Baron wrote:
Are you talking about newly-added tests, or tests that have been
passing for a long time and recently started failing?
In the latter case, the burden should fall on the regressing patch,
and the regressing
On 06/04/14 08:59 AM, Aryeh Gregor wrote:
On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari wrote:
Note that this is only accurate to a certain point. There are other things which
we can do to guesswork our way out of the situation for Autoland, but of
course they're resource/time intensive (basically
On Fri, 4 Apr 2014 11:58:28 -0700 (PDT), jmaher wrote:
> Two exceptions:
> 2) When we are bringing a new platform online (Android 2.3, b2g, etc.) many
> tests will need to be disabled prior to getting the tests on tbpl.
It makes sense to disable some tests so that others can run.
I assume bugs
On Fri, 4 Apr 2014 12:49:45 -0700 (PDT), jmaher wrote:
>> overburdened in other ways (e.g., reviews). the burden
>> needs to be placed on the regressing change rather than the original
>> author of the test.
>
> I am open to ideas to help figure out the offending changes. My
> understanding is m
On 06 April 2014 14:58:24, Ehsan Akhgari wrote:
On 2014-04-06, 8:59 AM, Aryeh Gregor wrote:
Is there any reason in principle that we couldn't have the test runner
automatically rerun tests with known intermittent failures a few
times, and let the test pass if it passes a few times in a row after
On 2014-04-06, 8:59 AM, Aryeh Gregor wrote:
On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari wrote:
Note that this is only accurate to a certain point. There are other things which
we can do to guesswork our way out of the situation for Autoland, but of
course they're resource/time intensive (basical
On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari wrote:
> Note that this is only accurate to a certain point. There are other things which
> we can do to guesswork our way out of the situation for Autoland, but of
> course they're resource/time intensive (basically running orange tests over
> and over a
On Friday 2014-04-04 12:49 -0700, jmaher wrote:
> > If this plan is applied to existing tests, then it will lead to
> > style system mochitests being turned off due to other regressions
> > because I'm the person who wrote them and the module owner, and I
> > don't always have time to deal with reg
On 4/4/14, 2:21 PM, Martin Thomson wrote:
On 2014-04-04, at 14:02, Ehsan Akhgari wrote:
That's not true, we were in that state once, before I stopped working on this
issue. We can get there again if we wanted to. It's just a lot of hard work
which won't scale if we only have one person doi
On 2014-04-04, at 14:02, Ehsan Akhgari wrote:
> That's not true, we were in that state once, before I stopped working on this
> issue. We can get there again if we wanted to. It's just a lot of hard work
> which won't scale if we only have one person doing it.
It’s self-correcting too. Turn
On 2014-04-04, 4:58 PM, Jonathan Griffin wrote:
With respect to Autoland, I think we'll need to figure out how to make
it take intermittents into account. I don't think we'll ever be in a state
with 0 intermittents.
That's not true, we were in that state once, before I stopped working on
this is
On 2014-04-04, 4:30 PM, Chris Peterson wrote:
On 4/4/14, 1:19 PM, Gavin Sharp wrote:
The majority of the time identifying the regressing patch is
difficult
Identifying the regressing patch is only difficult because we have so
many intermittently failing tests.
Intermittent oranges are one of
With respect to Autoland, I think we'll need to figure out how to make
it take intermittents into account. I don't think we'll ever be in a state
with 0 intermittents.
Jonathan
On 4/4/2014 1:30 PM, Chris Peterson wrote:
On 4/4/14, 1:19 PM, Gavin Sharp wrote:
The majority of the time identifyin
On 4/4/14, 1:19 PM, Gavin Sharp wrote:
The majority of the time identifying the regressing patch is
difficult
Identifying the regressing patch is only difficult because we have so
many intermittently failing tests.
Intermittent oranges are one of the major blockers for Autoland. If TBPL
nev
On Fri, Apr 4, 2014 at 12:12 PM, L. David Baron wrote:
>> Escalation path:
>> 1) Ensure we have a bug on file, with the test author, reviewer, module
>> owner, and any other interested parties, links to logs, etc.
>> 2) We need to needinfo? and expect a response within 2 business days, this
>> s
>> 4) In the case we go another 2 days with no response from a module owner,
>> we will disable the test.
>
> Are you talking about newly-added tests, or tests that have been
> passing for a long time and recently started failing?
>
> In the latter case, the burden should fal
On 2014-04-04, 3:12 PM, L. David Baron wrote:
On Friday 2014-04-04 11:58 -0700, jmaher wrote:
As the sheriffs know it is frustrating to deal with hundreds of tests that
fail on a daily basis, but are intermittent.
When a single test case is identified to be leaking or failing at least 10% of
On Friday 2014-04-04 11:58 -0700, jmaher wrote:
> As the sheriffs know it is frustrating to deal with hundreds of tests that
> fail on a daily basis, but are intermittent.
>
> When a single test case is identified to be leaking or failing at least 10%
> of the time, it is time to escalate.
>
>
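As a back-of-envelope illustration of the statistics point raised upthread: at the policy's 10% failure threshold, a run of green retries is weaker evidence than it looks. The helper below is illustrative arithmetic, not part of any harness:

```python
def all_pass_probability(p, k):
    """Probability that k independent runs of a test all pass,
    given the test fails with probability p per run."""
    return (1 - p) ** k

# A test failing 10% of the time still goes 25-for-25 green
# about 7% of the time, so even 25 clean reruns are inconclusive.
print(round(all_pass_probability(0.10, 25), 3))  # → 0.072
```

This is why a retry-based policy can flag a stale annotation, but a bounded number of reruns can never certify that a test is deterministic.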