Over the last 9 months a few of us have watched intermittent test failures 
almost daily, pestered people about them, and fixed many ourselves.  While 
over 420 bugs have been fixed since the beginning of the year, about half that 
many (211+) have had their tests disabled in some form (including turning off 
the jobs).

We don't like to disable tests and have been in no hurry to recommend 
disabling one.  Overall we have tried to adhere to a policy of:
* >= 30 failures/week: ask the owner to look at the failure and fix it; if 
this persists for a few weeks with no real traction, we would go ahead and 
recommend disabling it.
* >= 75 failures/week: ask people to fix it in a shorter time frame, and 
recommend disabling the test in a week or so.
* >= 150 failures/week: often just disable the test.
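
As a rough sketch of how the tiers above stack up (the function name and the 
returned strings are made up for illustration and are not part of any triage 
tooling):

```python
def old_policy_action(failures_per_week: int) -> str:
    """Map a weekly failure count to the action suggested by the tiered policy.

    Hypothetical helper: the thresholds come from the list above, everything
    else is an assumption made for the sake of the example.
    """
    # Check the highest tier first so each count lands in exactly one bucket.
    if failures_per_week >= 150:
        return "disable"  # "often", not always, per the policy text
    if failures_per_week >= 75:
        return "fix soon, recommend disabling in about a week"
    if failures_per_week >= 30:
        return "ask owner to fix, recommend disabling after a few weeks"
    return "no action"
```

Three thresholds with three different timelines is a lot to track by hand for 
every bug.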

This is confusing and hard to manage.  More recently we have started adjusting 
our triage queries, and some teams are doing their own triage; we leave those 
bugs alone as long as they are getting prioritized properly.

What we are looking to start doing this month is adopting a simpler policy:
* any bug that has >= 200 failure instances in the last 30 days will have its 
test disabled
** this will be a manual process, so it will happen a couple of times per week
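
A minimal sketch of the new rule, assuming we have the date of each failure 
instance for a bug (the names here are hypothetical, and the real process is 
manual rather than scripted):

```python
from datetime import date, timedelta

THRESHOLD = 200    # failure instances, from the policy above
WINDOW_DAYS = 30   # trailing window, from the policy above

def should_disable(failure_dates, today):
    """Return True if failures in the trailing 30 days reach the threshold.

    failure_dates: iterable of datetime.date, one entry per failure instance.
    today: the datetime.date on which the check is run.
    """
    cutoff = today - timedelta(days=WINDOW_DAYS)
    # Count only instances that fall inside the 30-day window ending today.
    recent = sum(1 for d in failure_dates if cutoff < d <= today)
    return recent >= THRESHOLD
```

For comparison, 200 instances in 30 days works out to roughly 47 per week, 
which sits between the old 30 and 75 tiers.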

We expect this to result in a similar amount of disabling, just with an easier 
method for deciding.  It is quite possible we will recommend disabling a test 
before it hits the threshold.  Keep in mind that a disabled test is easy to 
re-enable, so feel free to disable it for the one affected platform until you 
have time to look at fixing it.

To be clear, we (and some component owners) will continue triaging bugs and 
trying to get fixes in place as often as possible.  We prefer a fix, not a 
disabled test!

Please raise any concerns; otherwise we will move forward with this in the 
coming weeks.
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
