Re: Optimizing what runs on which push

Gregory Szorc Wed, 31 May 2017 10:32:45 -0700

On Wed, May 31, 2017 at 6:26 AM, Benjamin Smedberg <benja...@smedbergs.us>
wrote:

> I don't know if I'm the typical use-case, but the big problem for me is
> that when I change something such as plugins, the jobs as currently
> bucketed don't help much. There are reftests, crashtests, mochitest-plain,
> and mochitest-browser-chrome which test plugin code paths, and the
> splitting means that I pretty much need to run all of those suites on try
> to get adequate coverage.
>

Exactly.

FWIW, my ideal end state is that try syntax is eliminated or relegated to a
<5% use case because the tools figure out the optimal set of jobs to run
based on what changed.
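
To make "figure out what to run based on what changed" concrete, here is a
minimal sketch. Everything in it is an assumption for illustration: the
path patterns, the suite groupings, and the function name are hypothetical
and do not describe any existing in-tree tool. The one property worth
noting is the fallback: a file with no known mapping selects everything,
so an incomplete rule set wastes machine time instead of hiding a failure.

```python
from fnmatch import fnmatch

# Hypothetical mapping from source path patterns to the suites that
# exercise them. These rules are illustrative, not real in-tree data.
IMPACT_RULES = {
    "dom/plugins/*": {"reftest", "crashtest", "mochitest-plain",
                      "mochitest-browser-chrome"},
    "layout/*": {"reftest", "crashtest"},
    "browser/*": {"mochitest-browser-chrome"},
}

# Every suite we know about, including ones no rule mentions.
ALL_SUITES = set().union(*IMPACT_RULES.values()) | {"xpcshell"}


def suites_for_push(changed_files):
    """Return the set of suites to schedule for a list of changed paths."""
    selected = set()
    for path in changed_files:
        matched = [suites for pattern, suites in IMPACT_RULES.items()
                   if fnmatch(path, pattern)]
        if not matched:
            # Unknown impact: be conservative and run everything.
            return set(ALL_SUITES)
        for suites in matched:
            selected |= suites
    return selected
```

With these made-up rules, a push touching only layout/ would schedule
reftest and crashtest, while a push touching an unmapped directory would
still schedule every suite.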

>
> There is a facility in the tree for tagging tests (at least some suites)
> and running only tests with certain tags. However, that doesn't help on try
> very much because you can't tell try to run only a certain set of tags.
>

Not supporting this is a bug IMO.

Of course, you can argue that the tagging system is a subset or one-off of
a properly designed "change impacts" system. The difference is the tagging
system exists today, so it provides end-user benefit today.
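
As a rough illustration of what "run only a certain set of tags" could
mean on try, here is a sketch. The test paths, the tag names, and the
select_tests helper are all hypothetical; this is not the in-tree
manifest format or any existing try interface.

```python
# Hypothetical test metadata: test path -> set of tags. In the real tree
# this information lives in test manifests; the entries here are made up.
TEST_TAGS = {
    "dom/plugins/test/mochitest/test_example_plugin.html": {"plugins"},
    "layout/reftests/example/plugin-reftest.html": {"plugins", "layout"},
    "browser/base/content/test/example/browser_example.js": {"frontend"},
}


def select_tests(test_tags, wanted):
    """Return, in sorted order, tests carrying at least one wanted tag."""
    wanted = set(wanted)
    return sorted(path for path, tags in test_tags.items() if tags & wanted)
```

Asking for the "plugins" tag would pick up the first two tests regardless
of which suite they belong to, which is the cross-suite coverage Benjamin
describes above.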

>
> Your ultimate goal is to save $$, and my ultimate goal is to reduce the
> time to results to make the developer cycle faster (let's measure
> round-trip times in minutes instead of hours). I don't believe that running
> subsets of the current jobs will solve either of those goals. Maybe it's a
> step along the path, but I don't see how that fits together yet.
>

I agree we should largely leave money out of the discussion. While the cost
to run the CI is significant, it is still relatively cheap compared to
people time (developers cost ~1000x more than many EC2 instances). The
people time that will be saved from an efficient development cycle will
dwarf the money savings from reduced platform consumption. And deploying a
more efficient CI pipeline will naturally reduce operational costs. So we
should keep focused on the people impact.

That being said, we also need to take care to not drastically increase our
cost to run CI because it is a non-negligible expense. What's happening now
is groups like Stylo and Quantum want one-off build configurations. We also
have "Go Faster" efforts to decouple development of some features from core
Firefox, leading to more one-off build configurations. Running most of the
jobs most of the time with N+1 build configurations quickly increases our
CI operational costs. And more intelligently running things based on what
changed can keep costs in check, avoiding most discussions about budget,
value, etc.

>
> Related to this, you need to remember one of the primary functions of try
> is to validate changes before landing on inbound/autoland, and so reduce
> the backout rate from sheriffs. Running subsets of tests will increase the
> backout rate. I think that's probably ok, but we need to be aware of this
> social/workflow impact as it's not just a technical decision.
>

Agreed. Anecdotally, I find it more frustrating to be backed out the
farther down the release pipeline the changeset is. My threshold for
getting more than inconvenienced (read: mildly frustrated) is when things
run OK on autoland/inbound then fail on central.

I also agree we should be concerned about sheriff impact. FWIW, we would
like most backouts to be automated. But this requires a way to identify
when a changeset is good. This is actually a hard problem. We were planning
to implement an API on Treeherder to determine this. However, this project
seems to have gotten lost as part of recent reorgs. My guess is it will
surface again sometime in the next year as part of overall {sheriff
happiness, development cycle, autoland} work.

>
>
> On Wed, May 31, 2017 at 8:47 AM, Dustin Mitchell <dmitch...@mozilla.com>
> wrote:
>
>> I think this topic is big enough already without broadening it into
>> "how can we make automation better".  But getting some data from the
>> survey sounds great! Maybe it makes sense to get down to the core
>> question we have here:
>>
>> When you push to try, how often do you want:
>>  * to run every job relevant to the changes you have made
>>    [ ] never [ ] rarely [ ] sometimes [ ] often [ ] always
>>  * to run a specific job or set of jobs
>>    [ ] never [ ] rarely [ ] sometimes [ ] often [ ] always
>>  * to run all jobs for one or more platforms
>>    [ ] never [ ] rarely [ ] sometimes [ ] often [ ] always
>>
>> Or something like that?
>>
>> 2017-05-30 21:21 GMT-04:00 Mike Hommey <m...@glandium.org>:
>> > On Tue, May 30, 2017 at 05:25:20PM -0700, Gregory Szorc wrote:
>> >> On Thu, May 11, 2017 at 10:05 AM, <dmitch...@mozilla.com> wrote:
>> >>
>> >> > Background:
>> >> >  https://bugzilla.mozilla.org/show_bug.cgi?id=1359942
>> >> >
>> >> > As jobs move to taskcluster, we have an improved opportunity to do some
>> >> > smarter scheduling of what jobs to run on what sort of push.  Of course,
>> >> > it's a thorny subject: optimizing away a task that should run may let a bad
>> >> > push show green, while a subsequent push bears responsibility for the
>> >> > orange it introduces.
>> >> >
>> >> > One of the more common expectations is that pushes that only change a
>> >> > directory affecting one platform should not cause other platforms' tasks to
>> >> > run.
>> >> >
>> >> > In the bug above, I have proposed a method of identifying pushes
>> >> > "affecting" a particular platform, and Greg has raised some concerns about
>> >> > the generality of my solution.  I'm happy to generalize, but I would like
>> >> > to keep the process in motion rather than let the perfect be the enemy of
>> >> > the good.
>> >> >
>> >> > To that end, I'd like some further feedback on implementing this sort of
>> >> > optimization support.
>> >> >
>> >> > If there's sufficient interest, then this is probably something we could
>> >> > set up a time to talk about in SFO in June.
>> >> >
>> >>
>> >> I still owe a proper reply to everything in this thread. But as I'm
>> >> preparing to send out another Firefox developer survey, I'm looking at the
>> >> old one we conducted and there are some results that seemingly justify
>> >> doing work to intelligently run things based on what changed.
>> >>
>> >> One of the questions on the last survey was "Thinking of running automated
>> >> tests, rank the following potential improvements in terms of their impact
>> >> on your productivity." "Determine and run relevant tests based on what
>> >> source files have been modified" was one of the most wanted improvements -
>> >> right up there with "make try runs really fast so I can effectively iterate
>> >> on automated tests using try instead."
>> >
>> > FWIW, I recently added a unit test for Firefox. On try, I essentially
>> > had to run the whole corresponding test suite (browser-chrome), instead
>> > of just the block that contains the test, because it's almost impossible
>> > to figure out which one it's going to run in.
>> >
>> > Making /that/ less painful would go a long way.
>> >
>> > Mike
>> _______________________________________________
>> dev-builds mailing list
>> dev-builds@lists.mozilla.org
>> https://lists.mozilla.org/listinfo/dev-builds
>>
>
>
