Re: Optimizing what runs on which push

Andrew Halberstadt Wed, 31 May 2017 13:15:07 -0700

On Wed, May 31, 2017 at 1:31 PM, Gregory Szorc <g...@mozilla.com> wrote:


>
>> There is a facility in the tree for tagging tests (at least some suites)
>> and running only tests with certain tags. However, that doesn't help on try
>> very much because you can't tell try to run only a certain set of tags.
>>
>
> Not supporting this is a bug IMO.
>
> Of course, you can argue that the tagging system is a subset or one-off of
> a properly designed "change impacts" system. The difference is the tagging
> system exists today, so it provides end-user benefit today.
>


Actually, you can run tags on try with:

./mach try <syntax> --and --tag <tag1> --tag <tag2>

You can also specify test paths:

./mach try <syntax> --and <test path>

The --and flag takes the intersection of <syntax> and <tag> rather than
the union. I agree that it isn't as intuitive as it could be, and there are
probably many edge cases where it falls apart. The |mach try|
command in general could use a lot of TLC.
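To make the union/intersection distinction concrete, here is a toy model in Python (the language mach is written in). It is only a sketch, not how mach implements --and: the Test class, the suite names and the test paths are all hypothetical, and selection is modelled per individual test.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Test:
    """A test file, the suite that runs it, and its tags (all hypothetical)."""
    path: str
    suite: str
    tags: frozenset = field(default_factory=frozenset)


def select_union(tests, suites, tags):
    """Run a test if its suite was requested OR it has a requested tag."""
    return {t.path for t in tests if t.suite in suites or t.tags & tags}


def select_intersection(tests, suites, tags):
    """Run a test only if its suite was requested AND it has a requested tag."""
    return {t.path for t in tests if t.suite in suites and t.tags & tags}


# Hypothetical test inventory.
TESTS = [
    Test("dom/plugins/test/test_plugin_a.html", "mochitest-plain", frozenset({"plugins"})),
    Test("dom/base/test/test_unrelated.html", "mochitest-plain"),
    Test("layout/reftests/plugin/plugin_b.html", "reftest", frozenset({"plugins"})),
]

if __name__ == "__main__":
    # Roughly: "<syntax> selects mochitest-plain" combined with "--tag plugins".
    print(sorted(select_union(TESTS, {"mochitest-plain"}, {"plugins"})))
    print(sorted(select_intersection(TESTS, {"mochitest-plain"}, {"plugins"})))
```

With the union, the untagged mochitest and the tagged reftest both come along; with the intersection, only the tagged mochitest runs, which is the narrowing described above.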

On Wed, May 31, 2017 at 1:31 PM, Gregory Szorc <g...@mozilla.com> wrote:

> On Wed, May 31, 2017 at 6:26 AM, Benjamin Smedberg <benja...@smedbergs.us>
> wrote:
>
>> I don't know if I'm the typical use-case, but the big problem for me is
>> that when I change something such as plugins, the jobs as currently
>> bucketed don't help much. There are reftests, crashtests, mochitest-plain,
>> and mochitest-browser-chrome which test plugin code paths, and the
>> splitting means that I pretty much need to run all of those suites on try
>> to get adequate coverage.
>>
>
> Exactly.
>
> FWIW, my ideal end state is that try syntax is eliminated or relegated to a
> <5% use case because the tools figure out the optimal set of what to run
> based on what changed.
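One simple way to picture tools that "figure out the optimal set of what to run based on what changed" is a longest-prefix lookup from changed paths to the suites that cover them. This is only an illustration: IMPACT_MAP and suites_for_changes are hypothetical, the mapping is invented (loosely echoing the plugin example), and a real change-impact system would need much richer data. A file with no known mapping makes the sketch return None, meaning "run everything", so unknown changes are not silently under-tested.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from source directories to the suites that exercise them.
IMPACT_MAP = {
    "dom/plugins": {"mochitest-plain", "mochitest-browser-chrome", "reftest", "crashtest"},
    "browser/base": {"mochitest-browser-chrome"},
    "layout": {"reftest", "crashtest"},
}


def suites_for_changes(changed_files, impact_map=IMPACT_MAP):
    """Return the set of suites to run, or None to mean "run everything"."""
    selected = set()
    for changed in changed_files:
        parts = PurePosixPath(changed).parts
        # Walk from the most specific directory prefix to the least specific.
        for n in range(len(parts), 0, -1):
            prefix = "/".join(parts[:n])
            if prefix in impact_map:
                selected |= impact_map[prefix]
                break
        else:
            # No known owner for this file: be conservative.
            return None
    return selected


if __name__ == "__main__":
    print(suites_for_changes(["dom/plugins/base/example.cpp"]))
    print(suites_for_changes(["toolkit/unknown/file.cpp"]))
```

Falling back to "run everything" trades cost for safety; a finer-grained map shrinks the fallback case over time.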
>
>
>>
>> There is a facility in the tree for tagging tests (at least some suites)
>> and running only tests with certain tags. However, that doesn't help on try
>> very much because you can't tell try to run only a certain set of tags.
>>
>
> Not supporting this is a bug IMO.
>
> Of course, you can argue that the tagging system is a subset or one-off of
> a properly designed "change impacts" system. The difference is the tagging
> system exists today, so it provides end-user benefit today.
>
>
>>
>> Your ultimate goal is to save $$, and my ultimate goal is to reduce the
>> time to results to make the developer cycle faster (let's measure
>> round-trip times in minutes instead of hours). I don't believe that running
>> subsets of the current jobs will solve either of those goals. Maybe it's a
>> step along the path, but I don't see how that fits together yet.
>>
>
> I agree we should largely leave money out of the discussion. While the
> cost to run the CI is significant, it is still relatively cheap compared to
> people time (developers cost ~1000x more than many EC2 instances). The
> people time that will be saved from an efficient development cycle will
> dwarf the money savings from reduced platform consumption. And deploying a
> more efficient CI pipeline will naturally reduce operational costs. So we
> should keep focused on the people impact.
>
> That being said, we also need to take care not to drastically increase our
> cost to run CI, because it is a non-negligible expense. What's happening now
> is that groups like Stylo and Quantum want one-off build configurations. We
> also have "Go Faster" efforts to decouple development of some features from
> core Firefox, leading to more one-off build configurations. Running most of
> the jobs most of the time with N+1 build configurations quickly increases our
> CI operational costs. Running things more intelligently based on what
> changed can keep costs in check, avoiding most discussions about budget,
> value, etc.
>
>
>>
>> Related to this, you need to remember one of the primary functions of try
>> is to validate changes before landing on inbound/autoland, and so reduce
>> the backout rate from sheriffs. Running subsets of tests will increase the
>> backout rate. I think that's probably ok, but we need to be aware of this
>> social/workflow impact as it's not just a technical decision.
>>
>
> Agreed. Anecdotally, I find it more frustrating to be backed out the
> farther down the release pipeline the changeset is. My threshold for
> getting more than inconvenienced (read: mildly frustrated) is when things
> run OK on autoland/inbound then fail on central.
>
> I also agree we should be concerned about sheriff impact. FWIW, we would
> like most backouts to be automated. But this requires a way to identify
> when a changeset is good. This is actually a hard problem. We were planning
> to implement an API on Treeherder to determine this. However, this project
> seems to have gotten lost in recent reorgs. My guess is that it will
> surface again sometime in the next year as part of overall {sheriff
> happiness, development cycle, autoland} work.
>
>
>>
>>
>> On Wed, May 31, 2017 at 8:47 AM, Dustin Mitchell <dmitch...@mozilla.com>
>> wrote:
>>
>>> I think this topic is big enough already without broadening it into
>>> "how can we make automation better".  But getting some data from the
>>> survey sounds great! Maybe it makes sense to get down to the core
>>> question we have here:
>>>
>>> When you push to try, how often do you want:
>>>  * to run every job relevant to the changes you have made
>>>    [ ] never [ ] rarely [ ] sometimes [ ] often [ ] always
>>>  * to run a specific job or set of jobs
>>>    [ ] never [ ] rarely [ ] sometimes [ ] often [ ] always
>>>  * to run all jobs for one or more platforms
>>>    [ ] never [ ] rarely [ ] sometimes [ ] often [ ] always
>>>
>>> Or something like that?
>>>
>>> 2017-05-30 21:21 GMT-04:00 Mike Hommey <m...@glandium.org>:
>>> > On Tue, May 30, 2017 at 05:25:20PM -0700, Gregory Szorc wrote:
>>> >> On Thu, May 11, 2017 at 10:05 AM, <dmitch...@mozilla.com> wrote:
>>> >>
>>> >> > Background:
>>> >> >  https://bugzilla.mozilla.org/show_bug.cgi?id=1359942
>>> >> >
>>> >> > As jobs move to taskcluster, we have an improved opportunity to do some
>>> >> > smarter scheduling of what jobs to run on what sort of push.  Of course,
>>> >> > it's a thorny subject: optimizing away a task that should run may let a
>>> >> > bad push show green, while a subsequent push bears responsibility for the
>>> >> > orange it introduces.
>>> >> >
>>> >> > One of the more common expectations is that pushes that only change a
>>> >> > directory affecting one platform should not cause other platforms' tasks
>>> >> > to run.
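A minimal sketch of the "only affects one platform" idea, assuming a hand-maintained table of platform-specific directories. PLATFORM_DIRS, affected_platforms and should_run are hypothetical names, and the table is illustrative rather than the method proposed in the bug. Any changed file outside the table is treated as affecting every platform, which keeps the optimization on the safe side of the "bad push shows green" risk.

```python
# Hypothetical table: directory prefix -> platforms whose tasks it can affect.
PLATFORM_DIRS = {
    "widget/cocoa/": {"macosx"},
    "widget/windows/": {"windows"},
    "widget/gtk/": {"linux"},
    "mobile/android/": {"android"},
}

ALL_PLATFORMS = frozenset({"linux", "macosx", "windows", "android"})


def affected_platforms(changed_files):
    """Return the set of platforms a push could affect."""
    affected = set()
    for path in changed_files:
        for prefix, platforms in PLATFORM_DIRS.items():
            if path.startswith(prefix):
                affected |= platforms
                break
        else:
            # Not platform-specific as far as the table knows: affects everything.
            return set(ALL_PLATFORMS)
    return affected


def should_run(task_platform, changed_files):
    """Schedule a platform's tasks only if the push may affect that platform."""
    return task_platform in affected_platforms(changed_files)


if __name__ == "__main__":
    push = ["widget/cocoa/example.mm"]
    for platform in sorted(ALL_PLATFORMS):
        print(platform, should_run(platform, push))
```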
>>> >> >
>>> >> > In the bug above, I have proposed a method of identifying pushes
>>> >> > "affecting" a particular platform, and Greg has raised some concerns
>>> >> > about the generality of my solution.  I'm happy to generalize, but I
>>> >> > would like to keep the process in motion rather than let the perfect be
>>> >> > the enemy of the good.
>>> >> >
>>> >> > To that end, I'd like some further feedback on implementing this sort of
>>> >> > optimization support.
>>> >> >
>>> >> > If there's sufficient interest, then this is probably something we could
>>> >> > set up a time to talk about in SFO in June.
>>> >> >
>>> >>
>>> >> I still owe a proper reply to everything in this thread. But as I'm
>>> >> preparing to send out another Firefox developer survey, I'm looking at the
>>> >> old one we conducted and there are some results that seemingly justify
>>> >> doing work to intelligently run things based on what changed.
>>> >>
>>> >> One of the questions on the last survey was "Thinking of running automated
>>> >> tests, rank the following potential improvements in terms of their impact
>>> >> on your productivity." "Determine and run relevant tests based on what
>>> >> source files have been modified" was one of the most wanted improvements,
>>> >> right up there with "make try runs really fast so I can effectively iterate
>>> >> on automated tests using try instead."
>>> >
>>> > FWIW, I recently added a unit test for Firefox. On try, I essentially
>>> > had to run the whole corresponding test suite (browser-chrome), instead
>>> > of just the block that contains the test, because it's almost impossible
>>> > to figure out which one it's going to run in.
>>> >
>>> > Making /that/ less painful would go a long way.
>>> >
>>> > Mike
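To see why the block (chunk) a test lands in is hard to predict, consider a hypothetical scheme that sorts all test paths and deals them into N contiguous chunks of near-equal size. This is an assumption for illustration, not the test harness's actual chunking algorithm; chunk_tests and find_chunk are made-up helpers. Under any scheme like this, a test's chunk number depends on the entire test list, so adding or removing unrelated tests can shift it.

```python
def chunk_tests(tests, total_chunks):
    """Split sorted test paths into total_chunks contiguous, near-equal chunks."""
    ordered = sorted(tests)
    base, extra = divmod(len(ordered), total_chunks)
    chunks, start = [], 0
    for i in range(total_chunks):
        # The first `extra` chunks each get one additional test.
        size = base + (1 if i < extra else 0)
        chunks.append(ordered[start:start + size])
        start += size
    return chunks


def find_chunk(test, tests, total_chunks):
    """Return the 1-based chunk number that would run `test`."""
    for number, chunk in enumerate(chunk_tests(tests, total_chunks), start=1):
        if test in chunk:
            return number
    raise ValueError(f"{test} is not in the test list")


if __name__ == "__main__":
    tests = ["a/browser_1.js", "a/browser_2.js", "b/browser_3.js", "c/browser_4.js"]
    print(find_chunk("b/browser_3.js", tests, 2))
    print(find_chunk("b/browser_3.js", tests + ["d/browser_5.js"], 2))
```

In this toy model, adding d/browser_5.js (which sorts after every other test) moves b/browser_3.js from chunk 2 to chunk 1, even though neither file changed.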
>>> _______________________________________________
>>> dev-builds mailing list
>>> dev-builds@lists.mozilla.org
>>> https://lists.mozilla.org/listinfo/dev-builds
>>>
>>
>>
>
>
