We had a session on intermittents in PDX. Additionally, we (the ateam)
had several brainstorming sessions prior to the work week. I'll try to
summarize what we talked about and answer your questions inline at the
same time.
On 08/12/14 03:52 PM, Gijs Kruitbosch wrote:
> 1) make it easier to figure out from bugzilla/treeherder when and where
> the failure first occurred
> - I don't want to know the first thing that got reported to bmo - IME,
> that is not always the first time it happened, just the first time it
> got filed.
> In other words, can I query treeherder in some way (we have structured
> logs now right, and all this stuff is in a DB somewhere?) with a test
> name and a regex, to have it tell me where the test first failed with a
> message matching that regex?
Structured logs have been around for a few months now, but only recently
has mozharness started using them for determining failure status (and
even now only for a few suites).
The next step is absolutely storing this stuff in a DB. Starting now
and into Q1 we'll be creating a prototype to figure out things like
schemas, costs and logistics. Unlike logs, we want to keep this data
forever, so we need to make sure we get it right.
As part of the prototype phase, we plan to answer some simple questions
that don't require lots of historical data. Can we identify new flaky
tests? Can we normalize chunks based on runtime instead of number of tests?
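The chunk-normalization question is concrete enough to sketch. Assuming per-test runtimes are available from a results DB, a greedy longest-first assignment already produces chunks of roughly equal duration. The test names and runtimes below are made up for illustration:

```python
# Sketch: normalize chunks by total runtime instead of test count.
# Test names and durations here are hypothetical; real numbers would
# come from the structured-log database described above.

def chunk_by_runtime(tests, num_chunks):
    """Greedily assign (name, seconds) pairs to the currently lightest
    chunk, so total runtime per chunk comes out roughly equal."""
    chunks = [{"tests": [], "runtime": 0.0} for _ in range(num_chunks)]
    # Placing the longest tests first makes the greedy pass far more balanced.
    for name, seconds in sorted(tests, key=lambda t: t[1], reverse=True):
        lightest = min(chunks, key=lambda c: c["runtime"])
        lightest["tests"].append(name)
        lightest["runtime"] += seconds
    return chunks

tests = [("test_a", 120), ("test_b", 30), ("test_c", 90),
         ("test_d", 60), ("test_e", 45), ("test_f", 15)]
chunks = chunk_by_runtime(tests, 2)
```

For this toy input the two chunks come out at 180 seconds each; with real data the balance won't be perfect, but it beats splitting by test count when runtimes vary wildly.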
> 2) make it easier to figure out from bugzilla/treeherder when and where
> the failure happens
> 3) numbers on how frequently a test fails
I think these both tie into number 1. We aren't sure exactly what the
schema will look like, but tying metadata about the test run into the
results is obviously something we need to do. These questions would
become easy to answer.
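To make that concrete, here is a toy sqlite3 sketch of the kind of query such a DB would enable. The schema and rows are entirely hypothetical, not the real design we'll prototype:

```python
# Toy sketch only: a hypothetical per-run results table, to show how
# "how often does it fail?" and "where did it first fail with this
# message?" become one-liners once results live in a DB.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE test_results (
    test_name TEXT, push_id INTEGER, status TEXT, message TEXT)""")
rows = [
    ("test_foo", 1, "PASS", ""),
    ("test_foo", 2, "FAIL", "TimeoutException: waiting for panel"),
    ("test_foo", 3, "PASS", ""),
    ("test_foo", 4, "FAIL", "TimeoutException: waiting for panel"),
    ("test_bar", 1, "PASS", ""),
]
conn.executemany("INSERT INTO test_results VALUES (?, ?, ?, ?)", rows)

# Failure frequency for one test:
(failures, total), = conn.execute("""
    SELECT SUM(status = 'FAIL'), COUNT(*)
    FROM test_results WHERE test_name = 'test_foo'""")
failure_rate = failures / total

# Earliest push where it failed with a matching message (LIKE here;
# real regex matching would need more than stock SQLite):
(first_push,), = conn.execute("""
    SELECT MIN(push_id) FROM test_results
    WHERE test_name = 'test_foo' AND status = 'FAIL'
      AND message LIKE '%TimeoutException%'""")
```

For the toy data, test_foo fails half the time and first fails with that message at push 2.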
We also want to look into cross correlating data from other systems (e.g
bugzilla, orangefactor, ...) into test results. This will likely be
further out though.
> 4) automate regression hunting (aka mozregression for intermittent
> infra-only failures)
Yes, this is explicitly one of the first things we'll be tackling.
Sheriffs often don't have time to go and retrigger backfills, and they
shouldn't have to. This loosely depends on the DB project outlined
above, but doesn't strictly require it.
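The core loop is simple enough to sketch. The tricky part with intermittents is that one green run proves little, so each candidate push needs several retriggers before a "pass" verdict can be trusted. Everything below is a simulation (run_test stands in for scheduling real CI jobs), not actual automation:

```python
# Sketch of regression hunting for an intermittent failure: binary
# search over pushes, with retriggers so a flaky test can't hide.
# run_test() is a simulation, not a real CI hook.
import random

FIRST_BAD_PUSH = 57   # the answer bisection should recover (simulated)
FAILURE_RATE = 0.3    # the test only fails ~30% of runs once broken

def run_test(push_id, rng):
    """Simulated single run: flaky failures from the bad push onward."""
    if push_id >= FIRST_BAD_PUSH:
        return rng.random() >= FAILURE_RATE  # True means the run passed
    return True

def is_push_bad(push_id, rng, retriggers=40):
    """Retrigger enough times that the intermittent is very unlikely to
    hide: 40 green runs in a row at a 30% per-run failure rate has
    probability 0.7**40, roughly 6e-7."""
    return any(not run_test(push_id, rng) for _ in range(retriggers))

def bisect(first, last, rng, retriggers=40):
    """Ordinary binary search for the first bad push."""
    while first < last:
        mid = (first + last) // 2
        if is_push_bad(mid, rng, retriggers):
            last = mid
        else:
            first = mid + 1
    return first
```

The retrigger count is the knob here: it trades machine time for confidence, which is exactly why having sheriffs do this by hand doesn't scale.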
> 5) rr or similar recording of failing test runs
> We've talked about this before on this newsgroup, but it's been a long
> time. Is this feasible and/or currently in the pipeline?
We're aware of rr, but it's not something that has been called out as
something we should do in the short term. My understanding is that there
are still a lot of unknowns, and getting something stood up in
production infrastructure will likely be a large multi-quarter project.
Maybe :roc can clarify here.
I'm not saying we won't do it (it would be awesome), but it seems like
there are easier wins we can make in the meantime.
> ~ Gijs
Other things that we talked about that might make dealing with
intermittents better:
* dynamic (maybe also static) analysis of new tests to determine common
bad patterns (ehsan has ideas), to be integrated into autoland, a
post-commit hook, or some kind of quarantine.
* in-tree chunking/more dynamic test scheduling (ability to schedule
only certain tests). One of the end goals here is for the term
"chunking" to disappear from the point of view of developers.
* c++ code coverage tied into the build system with automatically
updated reports (I'm working on the build integration pieces on the side).
* automatic filing of intermittents (this is currently what the sheriffs
spend the most time on; automating it frees them up to better monitor
the tree).
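As a flavor of what the analysis idea could look like (the patterns below are my own guesses at "common bad patterns", not ehsan's actual list), even a crude line-based scan can flag classic offenders like arbitrary timeouts:

```python
# Sketch of a static check that could run on new tests at autoland
# time. The patterns are illustrative guesses, not a vetted list.
import re

BAD_PATTERNS = [
    (re.compile(r"setTimeout\s*\(.*,\s*\d{3,}\s*\)"),
     "arbitrary timeout; wait for an event or condition instead"),
    (re.compile(r"\bsleep\s*\("),
     "sleeping in a test is a race condition waiting to happen"),
]

def lint_test(source):
    """Return (line_number, advice) pairs for suspicious lines."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, advice in BAD_PATTERNS:
            if pattern.search(line):
                findings.append((lineno, advice))
    return findings

example = """add_task(function* () {
  yield new Promise(r => setTimeout(r, 5000));
  ok(panel.hidden, "panel should be hidden by now");
});"""
```

Running lint_test on the example flags line 2's five-second timeout. A real version would want an actual JS parser rather than regexes, but the quarantine/post-commit-hook plumbing is the same either way.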
Thanks for caring about the state of intermittents; they've been
neglected for too long. I'm hopeful that 2015 will bring many
improvements in this area. And of course, please let us know if you have
any other ideas or would like to help out.
-Andrew
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform