> If Try is hogging resources needed by Inbound, we should lower the priority
> of Try.
> Inbound is not for catching pesky WinXP-only failures. Try is.
> I'd even go as far to suggest that we should *require* a green Try run
> before allowing people to land, for everything except "simple" changes.

We already had this debate, extensively. As part of this debate, I calculated that for patches with a reasonable chance of success, pushing to m-i and failing actually saves resources. So in the current state of affairs, where we do not have as many resources as we'd like, I showed how it's entirely responsible for developers to push not-entirely-tested code to m-i.

I also would love to have more infrastructure capacity, eliminate coalescing on m-i, and have sufficient capacity to require a green try run for all non-trivial changes. (I don't want to require try runs, but I do want to have enough capacity so we /could/.) But the facts on the ground are that we don't have sufficient capacity to do any of these things, and releng/it is already well aware of our pain in this respect.

> One proposal that's been made elsewhere
> (https://bugzilla.mozilla.org/show_bug.cgi?id=791385) is to have a soft limit
> of one active push per developer on try. If you try and push a 2nd time before
> your previous jobs are all finished, you will be asked to cancel your
> previous jobs. There would be some kind of manual override that would allow
> you to push additional patches.

I think this would likely be much less impactful than bholley's proposed -p any. In the common circumstance where I push to try, notice it's going permaorange on all platforms, and then want to cancel all remaining builds/tests, I've already wasted a lot of resources which would have been saved by -p any.

That's not to say it's not an interesting idea; I just hope it gets prioritized appropriately. Also, I hope this manual override is not a pain to use. Pretty please? :)

> Surface [the leaderboard of try abusers] on tbpl, clearly visible on the
> inbound pushes. Public shaming ftw.
If we're going to hold anyone publicly accountable, I think it should be the teams which are responsible for ensuring we have enough resources to run builds and tests.

We should have a public dashboard showing end-to-end tryserver times -- starting with a push, how long did it take for all the requested tests to complete? And we should surface not only the mean, but quantiles -- for example, the 90th-percentile wait time, which tells us how long the slowest 10% of pushes had to wait. I understand an intern worked on an approximation of this, but didn't entirely get there, so his tool hasn't been publicly released.

If the expectation is that developers should be accountable for the resources they use, I think it's only fair that releng/it be accountable for the resources they provide.

We've seen that where we don't have tracking -- e.g. for how long it takes to push to try [1], or basically for anything else at Mozilla -- we often regress the metric we're interested in. You make what you measure. If we want consistently fast try pushes, it's hard to imagine how we'd get there without public data monitoring exactly the thing we're interested in.

-Justin

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=691459
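
As a rough illustration of the mean-versus-quantile point above, here is a minimal sketch of what such a summary might compute. Everything in it is an assumption for the sake of example: the wait times are made-up numbers rather than real tryserver data, `summarize_wait_times` and `percentile` are hypothetical helpers, and the nearest-rank percentile method is just one reasonable choice.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value such that at least
    pct percent of the data is at or below it."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_wait_times(minutes):
    """Summarize end-to-end times (push until the last requested test finishes)."""
    return {
        "mean": sum(minutes) / len(minutes),
        "p50": percentile(minutes, 50),
        "p90": percentile(minutes, 90),
    }


# Hypothetical end-to-end times in minutes; NOT real tryserver data.
sample = [60, 65, 70, 75, 80, 85, 90, 95, 240, 600]
summary = summarize_wait_times(sample)
print(summary)
```

With these made-up numbers the mean is 146 minutes while the median is 80 and the 90th percentile is 240, so the mean alone hides the fact that most pushes finish fairly quickly and a few wait a very long time.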