A lot of great discussion here, thanks everyone for taking some time out of 
your day to weigh in on this subject.  There is a slight difference between a 
bug being filed and us actively working on the bug once it crosses our 
threshold of 30 failures/week; here I want to discuss the point at which we 
have looked at the bug and tried to add context/value, including a ni? request.

Let me comment on a few items here:
1) BUG_COMPONENT: I have been working this quarter to get this completed 
in-tree (https://bugzilla.mozilla.org/show_bug.cgi?id=1328351).  Ideally the 
sheriffs and the Bug Filer tool will use this; we can work to fix that.  Part 
of this is ensuring there is an active triager responsible for each of those 
components, which is mostly done: 
https://bugzilla.mozilla.org/page.cgi?id=triage_owners.html.
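
For reference, a minimal sketch of what the in-tree metadata looks like in a 
moz.build file (the product/component names below are illustrative only):

  # Map all files in this directory to a Bugzilla product/component so the
  # sheriffing tools and the Bug Filer can route new failure bugs there.
  with Files("**"):
      BUG_COMPONENT = ("Core", "DOM")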

2) How do we get the right people to see the bugs?  We will always ni? the 
triage owner unless we know a better person to send a ni? request to.  In the 
many cases where we determine that a specific patch caused the regression, we 
will also ni? the patch author and cc the reviewer on the bug.  Please watch 
your components in Bugzilla and keep your Bugzilla handle updated when you 
are on PTO.

3) To the point of not clearing the ni? on a bug where we disable the test 
case: that is easy to do, so let's assume it is standard protocol whenever we 
are disabling a test (or hacking up the test case).

4) More granular whiteboard tags, and ones that don't use stockwell: we will 
figure out the right naming.  Right now it will most likely be extra tags to 
track when we fix a previously disabled test, as well as to differentiate 
between test fixes and product fixes.

5) When we triage a bug (the initial investigation after it crosses 30 
failures/week), we will include a brief report of the most-affected 
configuration along with the number of failures, the number of runs, and the 
failure rate.  This will be retrieved using |mach test-info <path>| (see bug 
1345572 for more info) and will look similar to this:
Total: 307 failures in 4313 runs or 0.071 failures/run
Worst rate on linux32/debug-e10s: 73 failures in 119 runs or 0.613 failures/run
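
If it helps, here is a rough sketch of the arithmetic behind that summary 
(this is not the actual |mach test-info| code, and the per-configuration 
counts below are made up):

  # Illustrative failure counts per configuration: (failures, runs).
  counts = {
      "linux32/debug-e10s": (73, 119),
      "linux64/opt-e10s": (12, 500),
  }

  total_failures = sum(f for f, _ in counts.values())
  total_runs = sum(r for _, r in counts.values())
  print("Total: %d failures in %d runs or %.3f failures/run"
        % (total_failures, total_runs, total_failures / total_runs))

  # The "worst" configuration is simply the one with the highest failures/run.
  worst = max(counts, key=lambda c: counts[c][0] / counts[c][1])
  f, r = counts[worst]
  print("Worst rate on %s: %d failures in %d runs or %.3f failures/run"
        % (worst, f, r, f / r))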

6) Using a different metric/threshold for investigating a bug: we looked at 6 
months of data from 2016 to come up with this number.  Even if we fixed all of 
the high-frequency bugs, the Orange Factor would still be 4.78 (as of Monday), 
which is still unacceptable.  We are only interested in investigating tests 
that have the highest chance of getting fixed or that cause the most pain, not 
just whatever is in the top 10 or is relatively high.  My goal is to adjust 
the threshold down to 20 in the future, though that might not be as realistic 
as I would hope in the short term.

Keep in mind that sheriffs are human; they make mistakes (filing bugs 
incorrectly, ni?ing the wrong person, etc.), but they are also flexible and 
will work with you to get more information, help manage a larger volume of 
failures, and allow extra time if you are actively debugging the problem.

Thanks for the many encouraging comments in this thread and suggestions of how 
to work out the quirks with this new process.