We had a session on intermittents in PDX. Additionally, we (the ateam)
had several brainstorming sessions prior to the work week. I'll try to
summarize what we talked about and answer your questions inline at the
same time.
On 08/12/14 03:52 PM, Gijs Kruitbosch wrote:
> 1) make it easier to figure out from bugzilla/treeherder when and where
> the failure first occurred
> - I don't want to know the first thing that got reported to bmo - IME,
> that is not always the first time it happened, just the first time it
> got filed.
> In other words, can I query treeherder in some way (we have structured
> logs now right, and all this stuff is in a DB somewhere?) with a test
> name and a regex, to have it tell me where the test first failed with a
> message matching that regex?
Structured logs have been around for a few months now, but only recently
has mozharness started using them for determining failure status (and
even now only for a few suites).
The next step is absolutely storing this stuff in a DB. Starting now
and into Q1 we'll be creating a prototype to figure out things like
schemas, costs and logistics. Unlike logs, we want to keep this data
forever, so we need to make sure we get it right.
As part of the prototype phase, we plan to answer some simple questions
that don't require lots of historical data. Can we identify new flaky
tests? Can we normalize chunks based on runtime instead of number of tests?
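The chunk-normalization question is concrete enough to sketch. Assuming per-test runtimes are available from a results DB, a greedy longest-first assignment already produces chunks of roughly equal duration. The test names and runtimes below are made up for illustration:

```python
# Sketch: normalize chunks by total runtime instead of test count.
# Test names and durations here are hypothetical; real numbers would
# come from the structured-log database described above.

def chunk_by_runtime(tests, num_chunks):
    """Greedily assign (name, seconds) pairs to the currently lightest
    chunk, so total runtime per chunk comes out roughly equal."""
    chunks = [{"tests": [], "runtime": 0.0} for _ in range(num_chunks)]
    # Placing the longest tests first makes the greedy pass far more balanced.
    for name, seconds in sorted(tests, key=lambda t: t[1], reverse=True):
        lightest = min(chunks, key=lambda c: c["runtime"])
        lightest["tests"].append(name)
        lightest["runtime"] += seconds
    return chunks

tests = [("test_a", 120), ("test_b", 30), ("test_c", 90),
         ("test_d", 60), ("test_e", 45), ("test_f", 15)]
chunks = chunk_by_runtime(tests, 2)
```

For this toy input the two chunks come out at 180 seconds each; with real data the balance won't be perfect, but it beats splitting by test count when runtimes vary wildly.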
> 2) make it easier to figure out from bugzilla/treeherder when and where
> the failure happens
> 3) numbers on how frequently a test fails
I think these both tie into number 1. We aren't sure exactly what the
schema will look like, but tying metadata about the test run into the
results is obviously something we need to do. These questions would
become easy to answer.
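To make that concrete, here is a toy sqlite3 sketch of the kind of query such a DB would enable. The schema and rows are entirely hypothetical, not the real design we'll prototype:

```python
# Toy sketch only: a hypothetical per-run results table, to show how
# "how often does it fail?" and "where did it first fail with this
# message?" become one-liners once results live in a DB.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE test_results (
    test_name TEXT, push_id INTEGER, status TEXT, message TEXT)""")
rows = [
    ("test_foo", 1, "PASS", ""),
    ("test_foo", 2, "FAIL", "TimeoutException: waiting for panel"),
    ("test_foo", 3, "PASS", ""),
    ("test_foo", 4, "FAIL", "TimeoutException: waiting for panel"),
    ("test_bar", 1, "PASS", ""),
]
conn.executemany("INSERT INTO test_results VALUES (?, ?, ?, ?)", rows)

# Failure frequency for one test:
(failures, total), = conn.execute("""
    SELECT SUM(status = 'FAIL'), COUNT(*)
    FROM test_results WHERE test_name = 'test_foo'""")
failure_rate = failures / total

# Earliest push where it failed with a matching message (LIKE here;
# real regex matching would need more than stock SQLite):
(first_push,), = conn.execute("""
    SELECT MIN(push_id) FROM test_results
    WHERE test_name = 'test_foo' AND status = 'FAIL'
      AND message LIKE '%TimeoutException%'""")
```

For the toy data, test_foo fails half the time and first fails with that message at push 2.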
We also want to look into cross correlating data from other systems (e.g
bugzilla, orangefactor, ...) into test results. This will likely be
further out though.
> 4) automate regression hunting (aka mozregression for intermittent
> infra-only failures)
Yes, this is explicitly one of the first things we'll be tackling.
Sheriffs often don't have time to go and retrigger backfills, and they
shouldn't have to. This loosely depends on the DB project outlined
above, but doesn't strictly require it.
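The core loop is simple enough to sketch. The tricky part with intermittents is that one green run proves little, so each candidate push needs several retriggers before a "pass" verdict can be trusted. Everything below is a simulation (run_test stands in for scheduling real CI jobs), not actual automation:

```python
# Sketch of regression hunting for an intermittent failure: binary
# search over pushes, with retriggers so a flaky test can't hide.
# run_test() is a simulation, not a real CI hook.
import random

FIRST_BAD_PUSH = 57   # the answer bisection should recover (simulated)
FAILURE_RATE = 0.3    # the test only fails ~30% of runs once broken

def run_test(push_id, rng):
    """Simulated single run: flaky failures from the bad push onward."""
    if push_id >= FIRST_BAD_PUSH:
        return rng.random() >= FAILURE_RATE  # True means the run passed
    return True

def is_push_bad(push_id, rng, retriggers=40):
    """Retrigger enough times that the intermittent is very unlikely to
    hide: 40 green runs in a row at a 30% per-run failure rate has
    probability 0.7**40, roughly 6e-7."""
    return any(not run_test(push_id, rng) for _ in range(retriggers))

def bisect(first, last, rng, retriggers=40):
    """Ordinary binary search for the first bad push."""
    while first < last:
        mid = (first + last) // 2
        if is_push_bad(mid, rng, retriggers):
            last = mid
        else:
            first = mid + 1
    return first
```

The retrigger count is the knob here: it trades machine time for confidence, which is exactly why having sheriffs do this by hand doesn't scale.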
> 5) rr or similar recording of failing test runs
> We've talked about this before on this newsgroup, but it's been a long
> time. Is this feasible and/or currently in the pipeline?
We're aware of rr, but it's not something that has been called out as
something we should do in the short term. My understanding is that there
are still a lot of unknowns, and getting something stood up in
production infrastructure will likely be a large multi-quarter project.
Maybe :roc can clarify here.
I'm not saying we won't do it (it would be awesome), but it seems like
there are easier wins we can make in the meantime.
> ~ Gijs
Other things that we talked about that might make dealing with
intermittents better:
* dynamic (maybe also static) analysis of new tests to determine common
bad patterns (ehsan has ideas), to be integrated into autoland, a
post-commit hook, or some kind of quarantine.
* in-tree chunking/more dynamic test scheduling (ability to schedule
only certain tests). One of the end goals here is for the term
"chunking" to disappear from the point of view of developers.
* c++ code coverage tied into the build system with automatically
updated reports (I'm working on the build integration pieces on the side).
* automatic filing of intermittents (this is currently what the sheriffs
spend the most time on; automating it frees them up to better monitor
the tree).
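As a flavor of what the analysis idea could look like (the patterns below are my own guesses at "common bad patterns", not ehsan's actual list), even a crude line-based scan can flag classic offenders like arbitrary timeouts:

```python
# Sketch of a static check that could run on new tests at autoland
# time. The patterns are illustrative guesses, not a vetted list.
import re

BAD_PATTERNS = [
    (re.compile(r"setTimeout\s*\(.*,\s*\d{3,}\s*\)"),
     "arbitrary timeout; wait for an event or condition instead"),
    (re.compile(r"\bsleep\s*\("),
     "sleeping in a test is a race condition waiting to happen"),
]

def lint_test(source):
    """Return (line_number, advice) pairs for suspicious lines."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, advice in BAD_PATTERNS:
            if pattern.search(line):
                findings.append((lineno, advice))
    return findings

example = """add_task(function* () {
  yield new Promise(r => setTimeout(r, 5000));
  ok(panel.hidden, "panel should be hidden by now");
});"""
```

Running lint_test on the example flags line 2's five-second timeout. A real version would want an actual JS parser rather than regexes, but the quarantine/post-commit-hook plumbing is the same either way.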
Thanks for caring about the state of intermittents; they've been
neglected for too long. I'm hopeful that 2015 will bring many
improvements in this area. And of course, please let us know if you have
any other ideas or would like to help out.
-Andrew
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform