Because I've been working on a few of them, here's what I think would make them a lot easier to fix, and therefore improve our test coverage and make the sheriffs much happier:

1) make it easier to figure out from bugzilla/treeherder when and where the failure first occurred - I don't just want the first instance that got reported to bmo; IME that is not always the first time it happened, just the first time it got filed.

In other words, can I query treeherder in some way (we have structured logs now, right, and all this stuff is in a DB somewhere?) with a test name and a regex, and have it tell me where the test first failed with a message matching that regex?
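To make this concrete, here's roughly the query I'd like to be able to write, as a Python sketch. The /api/failures/ endpoint, its parameters and the response fields are all made up - I don't know what the structured-log storage actually exposes - and the test name and regex are just examples:

import re
import requests

# Hypothetical query: "where did this test first fail with a message
# matching this regex?" - the endpoint and field names are invented.
TREEHERDER = "https://treeherder.mozilla.org"
resp = requests.get(TREEHERDER + "/api/failures/", params={
    "test": "browser/components/sessionstore/test/browser_backup_recovery.js",
    "repo": "mozilla-inbound",
})
resp.raise_for_status()

message_re = re.compile(r"Test timed out .* leaked \d+ window\(s\)")
matching = [f for f in resp.json() if message_re.search(f.get("message", ""))]

# Sort by push date to find the earliest occurrence, not the earliest
# one that happened to get filed in bmo.
matching.sort(key=lambda f: f["push_timestamp"])
if matching:
    first = matching[0]
    print("first failed on", first["revision"], first["platform"],
          first["push_timestamp"])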

2) make it easier to figure out from bugzilla/treeherder when and where the failure happens

Linux only? Debug only? (non-)e10s only?

These questions are reasonably OK to answer right now by expanding all the TBPL comments and using 'find in page'.

Harder questions to figure out are:

How often does this happen on which platform? That is, is it more likely to happen on debug, linux, asan, ...? This helps with figuring out optimal strategies for testing fixes and/or regression hunting.

I'm thinking a table with OS vs. debug/opt/asan/pgo vs. e10s/non-e10s and numbers in the cells would already go a long way.
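Even a quick script over whatever failure records we can pull out would get us that table. A sketch, with invented field names standing in for wherever the starred-failure data actually lives:

from collections import Counter

# Cross-tabulate starred failures by platform, build flavour and e10s-ness.
# `failures` stands in for data pulled from Treeherder/OrangeFactor; the
# field names are made up.
failures = [
    {"platform": "linux64", "build": "debug", "e10s": True},
    {"platform": "linux64", "build": "debug", "e10s": False},
    {"platform": "win7", "build": "pgo", "e10s": True},
    # ... hundreds more ...
]

counts = Counter((f["platform"], f["build"], f["e10s"]) for f in failures)

platforms = sorted({f["platform"] for f in failures})
columns = [(b, e) for b in ("opt", "debug", "asan", "pgo")
           for e in (True, False)]

# Print a simple OS x build-type/e10s table.
header = ["platform"] + ["%s/%s" % (b, "e10s" if e else "non-e10s")
                         for b, e in columns]
print("\t".join(header))
for p in platforms:
    row = [p] + [str(counts[(p, b, e)]) for b, e in columns]
    print("\t".join(row))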

3) numbers on how frequently a test fails

"But we have this in orange-factor" I hear you say. Sure, but that tells me how often it got starred, not a percentage ("failed 1% of the time on Linux debug, 2% of the time on Windows 7 pgo, ..."), and so I can't know how often to retrigger until I try. It also makes it hard to estimate when the intermittent started being intermittent because it's rarely the cset from (1) - given failure in 1 out of N runs, the likely regression range is correlated with N (can't be bothered doing the exact probability math right now).

This is an increasing problem: we run more and more jobs every month, so the failure rate at which an intermittent starts annoying the sheriffs keeps getting lower and lower.
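For the record, the back-of-the-envelope version of that math in Python - the failure rate p here is pulled out of thin air, the point is just how the numbers scale:

import math

# If a test fails roughly 1 in N runs, how far back might the real
# regression be, and how many retriggers does a push need before I can
# call it "good"? p = 1/N is a guess in itself.
p = 1 / 20.0          # e.g. fails ~1 in 20 runs on the affected platform
confidence = 0.95

# Runs between the regressing push and the first *observed* failure are
# roughly geometric with mean 1/p, so the likely regression range grows
# with N.
print("expected gap to first observed failure: ~%.0f runs" % (1 / p))

# Same math for bisection: a push only counts as "good" once enough green
# retriggers have piled up that a real failure would almost certainly
# have shown up, i.e. (1 - p)**k <= 1 - confidence.
k = math.ceil(math.log(1 - confidence) / math.log(1 - p))
print("retriggers needed for %.0f%% confidence: %d" % (confidence * 100, k))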

4) automate regression hunting (aka mozregression for intermittent infra-only failures)

See https://bugzilla.mozilla.org/show_bug.cgi?id=1099095 for an example of how this works manually. We have APIs for retriggering now, right? We have APIs for distinguishing relevant failures in logs from unrelated orange, too. With the above, it should even be possible to narrow down which platforms to retrigger on (I ended up just using winxp/win7/linux debug because they seemed most prominent, but I was too lazy to build the table from (2) by hand), and how often to retrigger to get reasonable confidence in the resulting ranges (see (3)).

Right now, doing this manually costs me probably a full day or two of actually poring over results and such, with obviously a lot more wall-clock time spent waiting on the retriggers themselves. Automating this could reduce it to 10 minutes of putting the data together and setting off the requisite automation, plus it could theoretically strategize by running retriggers at non-peak times.
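For what it's worth, the loop I do by hand boils down to something like this Python sketch; the retrigger-and-check-the-logs step is a callable parameter because I don't know what shape the real retrigger/log-parsing APIs have, so everything here is illustrative:

import math

def find_regressing_push(pushes, retrigger_and_check, p, confidence=0.95):
    """pushes: oldest-to-newest, pushes[0] known good, pushes[-1] known bad.

    retrigger_and_check(push, times) should retrigger the relevant job
    `times` times on that push and return True if any run hit the failure
    (both of those steps are hand-waving over real APIs).
    """
    # Retriggers needed before a green push can be trusted as "good":
    # (1 - p)**k <= 1 - confidence, i.e. roughly 3/p for 95% confidence.
    k = math.ceil(math.log(1 - confidence) / math.log(1 - p))
    lo, hi = 0, len(pushes) - 1      # invariant: pushes[lo] good, pushes[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if retrigger_and_check(pushes[mid], k):
            hi = mid                 # reproduced: the regressor is at or before mid
        else:
            lo = mid                 # k green runs: treat mid as good
    return pushes[hi]                # first push where the failure reproduces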

5) rr or similar recording of failing test runs

We've talked about this before on this newsgroup, but it's been a long time. Is this feasible and/or currently in the pipeline?


Do we have projects for any of this, and if not, can we start some? Do other people have other ideas on how to make this stuff easier, especially considering my note under (3) about the (implicit) threshold percentage going lower and lower, which makes this harder and harder (i.e. understanding and fixing bugs that show up in <1% of runs)?

~ Gijs