> If Try is hogging resources needed by Inbound, we should lower the priority
> of Try.

> Inbound is not for catching pesky WinXP-only failures. Try is.

> I'd even go as far to suggest that we should *require* a green Try run
> before allowing people to land, for everything except "simple" changes.

We already had this debate, extensively.

As part of this debate, I calculated that for patches with a
reasonable chance of success, pushing to m-i and failing actually
saves resources.

So in the current state of affairs, where we do not have as many
resources as we'd like, I showed how it's entirely responsible for
developers to push not-entirely-tested code to m-i.
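
To make the shape of that argument concrete, here is a toy model. It is not
the calculation from the earlier thread, only a sketch under made-up
assumptions: a full try run and a full m-i run cost the same (run_cost), a
failed m-i push costs one extra run for the backout, and only a single
attempt is considered. The function names and parameters are hypothetical.

```python
def expected_cost_try_first(p_green, run_cost=1.0):
    """Expected cost of pushing to try first, then to m-i only if try is green.

    Assumption: the try run always happens; the m-i run happens with
    probability p_green.
    """
    return run_cost + p_green * run_cost


def expected_cost_direct(p_green, run_cost=1.0):
    """Expected cost of pushing straight to m-i.

    Assumption: the m-i run always happens; with probability (1 - p_green)
    the patch is backed out, and the backout push costs one more run.
    """
    return run_cost + (1.0 - p_green) * run_cost


if __name__ == "__main__":
    # Illustrative probabilities only, not measured data.
    for p in (0.3, 0.5, 0.9):
        print(p, expected_cost_try_first(p), expected_cost_direct(p))
```

Under these assumptions the direct push is cheaper whenever the patch is more
likely than not to be green. A real analysis would also have to account for
coalescing, retries, and the cost of a closed tree.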

I also would love to have more infrastructure capacity, eliminate
coalescing on m-i, and have sufficient capacity to require a green try
run for all non-trivial changes.  (I don't want to require try runs,
but I do want to have enough capacity so we /could/.)

But the facts on the ground are that we don't have sufficient capacity
to do any of these things, and releng/it is already well aware of our
pain in this respect.

> One proposal that's been made elsewhere 
> (https://bugzilla.mozilla.org/show_bug.cgi?id=791385) is to have a soft limit 
> of one active push per developer on try. If you try and push a 2nd time before
> your previous jobs are all finished, you will be asked to cancel your 
> previous jobs. There would be some kind of manual override that would allow 
> you to push additional patches.

I think this would likely be much less impactful than bholley's
proposed -p any, since in the common circumstance where I push to try,
notice it's going permaorange on all platforms, and then want to
cancel all remaining builds/tests, I've already wasted a lot of
resources which would have been saved by -p any.

That's not to say it's not an interesting idea; I just hope it gets
prioritized appropriately.

Also, I hope this manual override is not a pain to use.  Pretty please?  :)

> Surface [the leaderboard of try abusers] on tbpl, clearly visible on the 
> inbound pushes. Public shaming ftw.

If we're going to hold anyone publicly accountable, I think it should
be the teams which are responsible for ensuring we have enough
resources to run builds and tests.

We should have a public dashboard showing end-to-end tryserver times
-- starting with a push, how long did it take for all the requested
tests to complete?  And we should surface not only the mean, but
quantiles -- that is, how long was the wait at the 90th percentile,
where only the slowest 10% of pushes waited longer?
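
As a rough illustration of what such a dashboard might compute, the sketch
below summarizes a list of end-to-end times with the mean, the median, and the
90th percentile. The function name, the choice of minutes as the unit, and the
sample numbers are all hypothetical; no real tryserver data is used.

```python
import statistics


def summarize_wait_times(minutes):
    """Return the mean, median, and 90th percentile of end-to-end wait times.

    `minutes` is a list of push-to-last-test-finished durations, one per push.
    At least two values are needed for the quantile calculation.
    """
    ordered = sorted(minutes)
    # With n=10, statistics.quantiles returns nine cut points (deciles);
    # the last one is the 90th percentile.
    deciles = statistics.quantiles(ordered, n=10, method="inclusive")
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p90": deciles[-1],
    }


if __name__ == "__main__":
    # Made-up sample: most pushes finish in about an hour, a few take far longer.
    sample = [35, 40, 42, 45, 50, 55, 60, 75, 120, 300]
    print(summarize_wait_times(sample))
```

With a long-tailed sample like this one, the mean sits well above the median
and the 90th percentile is higher still, which is why the mean alone hides how
bad the slow pushes are.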

I understand an intern worked on an approximation of this, but didn't
entirely get there, so his tool hasn't been publicly released.

If the expectation is that developers should be accountable for the
resources they use, I think it's only fair that releng/it be
accountable for the resources they provide.

We've seen that where we don't have tracking -- e.g. for how long it
takes to push to try [1], or basically for anything else at Mozilla --
we often regress the metric we're interested in.  You make what you
measure.  If we want consistently fast try pushes, it's hard to
imagine how we'd get there without public data monitoring exactly the
thing we're interested in.

-Justin

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=691459
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
