A lot of great discussion here; thanks, everyone, for taking some time out of your day to weigh in on this subject. There is a difference between a bug merely being filed and us actively working on it once it crosses our threshold of 30 failures/week. Here I want to discuss the latter case, where we have looked at the bug and tried to add context and value, including a ni? request.
Let me comment on a few items here:

1) BUG_COMPONENT: I have been working this quarter to get this completed in-tree (https://bugzilla.mozilla.org/show_bug.cgi?id=1328351). Ideally the sheriffs and the Bug Filer tools will use this; we can work to fix that. Part of this is ensuring there is an active triager responsible for each component, which is mostly done: https://bugzilla.mozilla.org/page.cgi?id=triage_owners.html

2) How do we get the right people to see the bugs? We will always ni? the triage owner unless we know a better person to send the ni? request to. In many cases we determine that a specific patch caused the regression; in those cases we will also ni? the patch author and cc the reviewer on the bug. Please watch your components in Bugzilla and keep your Bugzilla handle updated when you are on PTO.

3) On not clearing the ni? on a bug where we disable the test case: that is easy to do. Let's assume that is standard protocol whenever we are disabling a test (or hacking up the test case).

4) More granular whiteboard tags, including ones that don't use "stockwell": we will figure out the right naming. Right now it will most likely be extra tags to track when we fix a previously disabled test, as well as to differentiate between test fixes and product fixes.

5) When we triage a bug (the initial investigation after it crosses 30 failures/week), we will include a brief report of the most affected configuration along with the number of failures, the number of runs, and the failure rate. This will be retrieved using |mach test-info <path>| (see bug 1345572 for more info) and will look similar to this (a short sketch at the end of this message shows how these numbers are derived):

    Total: 307 failures in 4313 runs or 0.071 failures/run
    Worst rate on linux32/debug-e10s: 73 failures in 119 runs or 0.613 failures/run

6) Using a different metric/threshold for investigating a bug: we looked at six months of data from 2016 to come up with this number. Even assuming we fixed all of the high-frequency bugs, Orange Factor would still be 4.78 (as of Monday), which is still unacceptable. We are only interested in investigating tests that have the highest chance of getting fixed or that cause the most pain, not just whatever is in the top 10 or relatively high. My goal is to adjust the threshold down to 20 in the future, though that might not be as realistic as I would hope in the short term.

Keep in mind that sheriffs are human; they make mistakes (filing bugs wrong, ni?ing the wrong person, etc.), but they are also flexible and will work with you to get more information, manage a larger volume of failures, and allow extra time if you are actively debugging the problem.

Thanks for the many encouraging comments in this thread and the suggestions for working out the quirks of this new process.
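For anyone who wants to reproduce the kind of summary shown in item 5 from raw counts, here is a minimal Python sketch. The per-configuration numbers and every configuration name other than linux32/debug-e10s are made up for illustration; in practice the real data comes from |mach test-info <path>| (bug 1345572), and the threshold check at the end is just the 30 failures/week rule from item 6 written out:

    # Hypothetical (failures, runs) per configuration; illustrative only.
    # Real data comes from |mach test-info <path>| (bug 1345572).
    failures_by_config = {
        "linux32/debug-e10s": (73, 119),
        "windows7-32/opt": (200, 3000),
        "osx-10-10/debug": (34, 1194),
    }

    # Overall failure rate across all configurations.
    total_failures = sum(f for f, _ in failures_by_config.values())
    total_runs = sum(r for _, r in failures_by_config.values())
    print("Total: %d failures in %d runs or %.3f failures/run"
          % (total_failures, total_runs, total_failures / float(total_runs)))

    # The "worst" configuration is the one with the highest failures/run.
    config, (f, r) = max(failures_by_config.items(),
                         key=lambda kv: kv[1][0] / float(kv[1][1]))
    print("Worst rate on %s: %d failures in %d runs or %.3f failures/run"
          % (config, f, r, f / float(r)))

    # The triage trigger in item 6 is a simple weekly threshold check.
    THRESHOLD = 30          # failures/week before active investigation
    failures_this_week = 47  # hypothetical count for a single bug
    if failures_this_week >= THRESHOLD:
        print("crossed %d failures/week: triage and ni? the owner" % THRESHOLD)

Run as-is, this prints the same two summary lines quoted in item 5 (307 failures in 4313 runs, worst rate 0.613 on linux32/debug-e10s), since the made-up per-configuration counts were chosen to add up to those totals.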