>> Also, I hope this manual override is not a pain to use.  Pretty please?
>> :)
>
> The hook attached to the bug requires that you include a short string token
> in your commit message. The token is generated as a function of time, your
> ldap name, and a local secret. Without specifying the token the hook will
> reject your 2nd push, remind you that you can cancel your previous jobs, and
> give you the token as well as the time at which the token expires.
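
(Sketching this out just to make sure I understand it -- I'm guessing the
token is roughly a time-bucketed HMAC over the ldap name.  Everything below
is made up; I haven't read the hook's source, so treat it as a guess rather
than a description of the real thing:

    import hashlib
    import hmac
    import time

    SECRET = b"local-secret-known-only-to-the-hook"  # hypothetical
    BUCKET_SECONDS = 3600  # guessed validity window

    def override_token(ldap_name, now=None):
        """Return (token, expiry time) for this ldap name."""
        now = time.time() if now is None else now
        bucket = int(now // BUCKET_SECONDS)
        expires = (bucket + 1) * BUCKET_SECONDS
        msg = ("%s:%d" % (ldap_name, bucket)).encode()
        token = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:8]
        return token, expires

If it's roughly that, pasting a short token into a commit message seems
cheap enough.)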

That sounds reasonable to me.  If it's really a pain in practice, one
can always script around it, which I think is a fair trade-off.  (If I
were to work around it in my git-push-to-try script, I'd require the
user to confirm that they wanted the second push; I wouldn't make it
automatic.)
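
Concretely, the confirmation step I have in mind is something like this.
It's only a sketch; the in-flight count and the "try" remote name are
placeholders for whatever the script can actually figure out:

    import subprocess
    import sys

    def push_to_try(pushes_in_flight):
        if pushes_in_flight > 0:
            answer = input(
                "You already have %d try push(es) in flight for this bug. "
                "Push again anyway? [y/N] " % pushes_in_flight)
            if answer.strip().lower() != "y":
                print("Aborting; you may want to cancel your earlier jobs instead.")
                sys.exit(1)
        subprocess.check_call(["git", "push", "try"])

That keeps a second push deliberate without blocking anyone who really
wants it.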

Thanks.

>>> Surface [the leaderboard of try abusers] on tbpl, clearly visible on the
>>> inbound pushes. Public shaming ftw.
>
> The intent here is definitely not public shaming. More like public
> awareness. I'm in no position to judge if all those pushes are using try
> effectively.

Yeah, that's the problem with public shaming or per-developer quotas, agreed.

> We're all trying to build the best system we can here. We've been publishing
> as much raw data as we can, as well as reports like wait time data for ages.
> We're not trying to hide this stuff away.

I understand.  My point is just that the data we currently have isn't
what we actually want to measure.  Wait times for individual parts of
a try push don't tell the whole story.  If Linux-64 wait times go
down, what fraction of people get their full try results faster?
(That is, how often is Linux-64 on the critical path for a try push?)
I honestly don't know.
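
It's an answerable question, though, if we log per-job completion times per
push.  Something like the following is all I mean by "critical path"; the
data layout here is invented, but the calculation is trivial once the data
exists:

    def critical_path_fraction(pushes, platform="linux64"):
        """pushes: one dict per try push, mapping platform -> job end time.

        Returns the fraction of pushes whose last-finishing job ran on the
        given platform, i.e. how often that platform gated the end-to-end
        time of the push."""
        gated = sum(1 for jobs in pushes
                    if jobs and max(jobs, key=jobs.get) == platform)
        return float(gated) / len(pushes) if pushes else 0.0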

>> If we're going to hold anyone publicly accountable, I think it should
>> be the teams which are responsible for ensuring we have enough
>> resources to run builds and tests.
>
> At the same time, it's impossible
> to give any kind of SLA when the build/test load is unbounded.

I feel like this is over-simplifying the problem to the point of
making it impossible to solve; that is, it's a straw man.

The requirement for releng/IT is not "build a system which scales to
arbitrary load."  It's "build a system which scales to current load
today and to expected load in the future, adjusting our expectations
as time goes on."

I hope we all agree that by this metric, we're currently failing.  The
current infrastructure does not meet demand.  (Indeed, demand is
actually higher than the jobs we're currently running, because we
would very much like to disable coalescing on m-i, but we can't do
that for lack of capacity.)  All I'm saying is that we currently don't
have the right public data to determine, after X amount of time has
passed, whether we've made any progress in this respect.

(This data would be critical even if we /were/ meeting current demand,
because it would alert us to future increases in demand above
capacity, which would help us avoid getting into the situation
we're currently in, where we don't start working on a problem until it
starts preventing many people from doing work.)

Whether we make our infra meet demand by decreasing demand (e.g. -p
any, which it seems like we all agree is a good idea, or the hook
asking for confirmation before pushing multiple try builds) or by
increasing capacity is an entirely different question.  I'm even open
to debating who should ultimately be responsible for making our infra
meet demand (although I do think that someone needs to own this).  But
based on my experience, step zero is measuring.

>> We should have a public dashboard showing end-to-end tryserver times
>> -- starting with a push, how long did it take for all the requested
>> tests to complete?  And we should surface not only the mean, but
>> quantiles -- that is, how long were wait times for the 90th percentile
>> of longest wait times?
>
> I can take another stab at this. However, I'm not sure that try is the best
> branch to do this on, since the best-case end-to-end time varies drastically
> depending on which platforms/tests were selected, and if the user opted to
> cancel or rebuild jobs later.

That could be.  If the raw data consists of tuples of the form
(trychooser params, end-to-end time) or (number of jobs requested,
end-to-end time), we can play with it and figure out what's the best
way to present the data.
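
For example, bucketing by the number of jobs requested and reporting a
couple of quantiles per bucket would be a reasonable first cut.  A sketch,
assuming we can export those tuples somewhere:

    from collections import defaultdict

    def summarize(samples):
        """samples: iterable of (num_jobs_requested, end_to_end_hours)."""
        by_size = defaultdict(list)
        for num_jobs, hours in samples:
            by_size[num_jobs].append(hours)
        for num_jobs in sorted(by_size):
            times = sorted(by_size[num_jobs])
            mean = sum(times) / len(times)
            p50 = times[len(times) // 2]
            p90 = times[min(len(times) - 1, int(0.9 * len(times)))]
            print("%3d jobs: mean=%.1fh p50=%.1fh p90=%.1fh"
                  % (num_jobs, mean, p50, p90))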

>> We've seen that where we don't have tracking -- e.g. for how long it
>> takes to push to try [1], or basically for anything else at Mozilla --
>> we often regress the metric we're interested in.  You make what you
>> measure.  If we want consistently fast try pushes, it's hard to
>> imagine how we'd get there without public data monitoring exactly the
>> thing we're interested in.
>
> I'm sure IT would love some help in figuring out how to measure this.

There were lots of suggestions in the bug.  For example, one could
periodically push a nop patch queue to try.  You could modify the
trychooser syntax to support this, but one could also just include
invalid trychooser syntax, since at the moment that results in no
builds.  (That's another one of my pet peeves, but this would let you
call it a feature!  :)
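
The probe itself would be tiny -- something like the following, run from
cron.  The remote name, commit message, and log path are placeholders, and
it assumes the current behavior that invalid trychooser syntax schedules
no builds:

    import subprocess
    import time

    def probe_push_to_try():
        subprocess.check_call(
            ["git", "commit", "--allow-empty", "-m",
             "try: nop latency probe (invalid syntax, schedules nothing)"])
        start = time.monotonic()
        subprocess.check_call(["git", "push", "try", "HEAD"])
        elapsed = time.monotonic() - start
        with open("/var/log/try-push-probe.log", "a") as log:
            log.write("%d %.1f\n" % (time.time(), elapsed))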

I got the impression that bug was stalled because it's not a priority,
not because nobody could figure it out.  But perhaps I'm mistaken.

-Justin

>> [1] https://bugzilla.mozilla.org/show_bug.cgi?id=691459