Re: Bug Program Next Steps

Boris Zbarsky Tue, 02 Feb 2016 09:56:14 -0800

On 2/2/16 12:04 PM, Richard Newman wrote:

> Once or twice a week

Once a week is not nearly often enough.

As far as I can tell, we have effectively 4.5 weeks or so of beta before things are locked down for ship. Lopping a week off that (or, more precisely, off whatever time is left after the bug is reported, which is probably not on day 1 of the beta cycle) seriously impacts getting the bug fixed before ship.

But it gets worse. The typical lifetime of a bug goes like this. It gets reported to somewhere like Firefox:Untriaged (I think we have a guided form that automatically dumps things there or something?). Someone goes through and triages it, moving it from there to some component. In a large fraction of cases, it's the wrong component, because making sense of our components is rocket science. But chances are, it's a more-correct component than "Firefox:Untriaged". Now the best-case scenario is that someone is triaging that component and notices the bug. Either it's in the right place and they try to deal with it (e.g. evaluate its impact), or they move it to a hopefully-more-correct component (the chances of getting it right are higher now, but still not perfect). Now someone needs to triage _this_ component to notice the bug.

Notice that in the common case, just to get to an evaluation like "hey, should this bug be nominated for tracking?", it needs to go through at least two, more likely three, triage cycles. If each of those is a week long, you will totally fail to sanely handle any bugs reported against beta.

For reference, I've spent a number of years now doing daily triage (typically morning and evening in my timezone, actually) of various core components, and in many cases it was still a struggle to get bugs noticed, nominated, approved, assigned, patched, reviewed, and landed before the "oh, we've locked down the release" cutoff. And I'm talking about clear "we regressed web compat" bugs here, not something fuzzy where it wasn't obvious that it really shouldn't ship.

This needs to be a daily process, in my opinion. It certainly needs to be a daily process in the typical dumping-ground components. Those include *:Untriaged, Core:General, and anywhere people might be tempted to move bugs they don't really understand. For Core that's probably Layout, DOM, XML, File Handling, Document Navigation, and maybe a few others. I won't presume to tell you what they are for Toolkit or Firefox.

> - Doesn't necessarily require extra meetings.

Fwiw, I don't think we need meetings here, at least for the parts I care about. I think the gfx triage rotation works reasonably well, with no meetings involved. I'm not sure whether it happens daily, but it's rare for gfx to be a dumping-ground component.

> - Isn't a daily obligation.

As I said above, that's a serious drawback in many cases. But I can see how, for some particular components, maybe the extra lag is OK. Though if your bug volume is low enough that you're sure there's nothing critical in it, then I'm not sure why aiming to triage it daily is a problem. Then if some days you fail and miss it... no terrible harm done.

> - Allows teams to manage their own costs

Ah, but there are externalities here. Here's an example; I'm not picking on particular components, just using some real component names to illustrate how this would play out in practice. Say a bug gets filed in Firefox:Untriaged and is moved to Core:DOM by the initial triage pass. The DOM folks take their sweet time looking at it because they don't want to do daily triage, then determine that it's a layout bug and move it to Core:Layout. Suddenly the _benefit_ (not doing frequent triage) has accrued to the DOM team, but the _cost_ (having to scramble to fix a bug with a lot less schedule room for it) is shouldered by the layout team.

> (e.g., the value versus the cost of responding to every bug report individually).

Responding to every bug is a higher bar than just getting it into the right place so it can be evaluated. I see no problem with silently moving a misfiled bug into a more appropriate component so it can be triaged there.

> If, say, Aaron files a critical bug, I trust him to set the right flags to
> go through our current triage processes, or work directly with engineering
> managers to find an assignee… and that can be quicker than even a daily
> triage, and avoids the need to process that bug. And if I file a NEW
> work-item bug, I don't want it to redundantly end up in a triage list the
> next day.

I can point to explicit instances of engineers filing critical bugs and then not setting the right flags. I've done it, certainly. It happens, whether through inexperience or forgetfulness or just the pain of setting those flags in the Bugzilla UI combined with a distraction at the wrong moment. Ideally, we would catch it when it happens, set the flags, and if needed (the inexperience case) point out to the filing engineer that this is something they need to do.

I can also point to instances of engineers filing bugs they just didn't realize were critical, especially if they're not filing in their own area of expertise.

That said, I can certainly see an argument for not bothering to triage bugs filed by whoever is in the triage rotation to start with, since presumably you can trust them to get it right in most cases. Yes, I know I argued against it above; I don't feel as strongly about this part of things as I do about triage frequency.

> When considering the long-term success of a project, is it more important
> to triage non-nominated incoming bugs daily, or spend some time going
> through that list of 391 bugs to see what's slipping through the cracks, or
> re-triage existing product priorities? I think that's a fairly deep
> philosophical question.

Sure.  In practice, ideally we would do both.

> I think Marco has a legit point here: even if triage is fast and painless,
> if nobody owns the component, then there's nobody to look at the bugs. And
> yes, that implies that, with this definition, there are components in
> which we don't have a focus on quality!

Nobody "owns" Core:General or Core:Untriaged or Firefox:Untriaged... but they need to be owned for triage purposes.

But more to the point, yes, we do have such components. We need to split up the job of triaging them somehow, because bugs that don't actually belong in those components can end up in them; see above.


-Boris

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
