On 2/2/16 12:04 PM, Richard Newman wrote:
> Once or twice a week

Once a week is not nearly often enough.

As far as I can tell, we have effectively 4.5 weeks or so of beta before things are locked down for ship. Lopping a week off that (or more precisely off whatever time is left after the bug is reported, which is probably not on day 1 of the beta cycle) seriously impacts getting the bug fixed before ship.

But it gets worse. The typical lifetime of a bug goes like this. It gets reported to somewhere like Firefox:Untriaged (I think we have a guided form that automatically dumps things there or something?). Someone goes through and triages it, moving it from there to some component. In a large fraction of cases, it's the wrong component, because making sense of our components is rocket science. But chances are, it's a more-correct component than "Firefox:Untriaged". Now the best-case scenario is that someone is triaging that component and notices the bug. Either it's in the right place and they try to deal with it (e.g. evaluate its impact) or they move it to a hopefully-more-correct component (chances are higher of getting it right now, but still not perfect). Now someone needs to triage _this_ component to notice the bug.

Notice that in the common case, to just get to an evaluation like "hey, should this bug be nominated for tracking?" it needs to go through at least two, more likely three, triage cycles. If each of those is a week long, you will totally fail to sanely handle any bugs reported against beta.

For reference, I've spent a number of years now doing daily triage (typically morning and evening in my timezone, actually) of various core components, and in many cases it was still a struggle to get bugs noticed, nominated, approved, assigned, patched, reviewed, and landed before the "oh, we've locked down the release" cutoff. And I'm talking about clear "we regressed web compat" bugs here, not something fuzzy where it wasn't clear that it really shouldn't ship.

This needs to be a daily process, in my opinion. It certainly needs to be a daily process in the typical dumping ground components. Those include *:Untriaged, Core:General, and anywhere people might be tempted to move bugs they don't really understand. For Core that's probably Layout, DOM, XML, File Handling, Document Navigation, and maybe a few others. I won't presume to tell you what they are for Toolkit or Firefox.

> - Doesn't necessarily require extra meetings.

FWIW, I don't think we need meetings here, at least for the parts I care about. I think the gfx triage rotation works reasonably well, with no meetings involved. I'm not sure whether it happens daily, but gfx is rarely a dumping-ground component.

> - Isn't a daily obligation.

As I said above, that's a serious drawback in many cases. But I can see how, for some particular components, the extra lag may be OK. Though if your bug volume is low enough that you're sure there's nothing critical in it, then daily triage should be cheap, so I'm not sure why aiming for it is a problem. Then if some days you fail and miss it... no terrible harm done.

> - Allows teams to manage their own costs

Ah, but there are externalities here. Here's an example; I'm not picking on particular components, just using some real component names to illustrate how this would play out in practice. Say a bug gets filed in Firefox:Untriaged and is moved to Core:DOM by the initial triage pass. The DOM folks take their sweet time looking at it, because they don't want to do daily triage, then determine that it's a layout bug and move it to Core:Layout. What has happened is that the _benefit_ (not doing frequent triage) accrued to the DOM team, but the _cost_ (having to scramble to fix a bug with a lot less schedule room for it) is shouldered by the layout team.

> (e.g., the value versus the
> cost of responding to every bug report individually).

Responding to every bug is a higher bar than just getting it into the right place so it can be evaluated. I see no problem with silently moving a misfiled bug into a more appropriate component so it can be triaged there.

> If, say, Aaron files a critical bug, I trust him to set the right flags to
> go through our current triage processes, or work directly with engineering
> managers to find an assignee… and that can be quicker than even a daily
> triage, and avoids the need to process that bug. And if I file a NEW
> work-item bug, I don't want it to redundantly end up in a triage list the
> next day.

I can point to explicit instances of engineers filing critical bugs and then not setting the right flags. I've done it, certainly. It happens, whether through inexperience or forgetfulness or just the pain of setting those flags in the Bugzilla UI and a distraction happening at the wrong moment. Ideally, we would catch it when it happens, set the flags, and if needed (the inexperience case) point out to the filing engineer that they need to do this themselves.

I can also point to instances of engineers filing bugs they just didn't realize were critical, especially if they're not filing in their own area of expertise.

That said, I can certainly see an argument for not bothering to triage bugs filed by whoever is in the triage rotation to start with, since presumably you can trust them to get it right in most cases. Yes, I know I argued against it above; I don't feel as strongly about this part of things as I do about triage frequency.

> When considering the long-term success of a project, is it more important
> to triage non-nominated incoming bugs daily, or spend some time going
> through that list of 391 bugs to see what's slipping through the cracks, or
> re-triage existing product priorities? I think that's a fairly deep
> philosophical question.

Sure. In practice, ideally we would do both.

> I think Marco has a legit point here: even if triage is fast and painless,
> if nobody owns the component, then there's nobody to look at the bugs. And
> yes, that implies that — with this definition — there are components in
> which we don't have a focus on quality!

Nobody "owns" Core:General or Core:Untriaged or Firefox:Untriaged... but they need to be owned for triage purposes.

But more to the point, yes, we do have such components. We need to split up the job of triaging them somehow, because bugs that are not actually in those components can end up in them; see above.

-Boris

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform