On Wed, May 31, 2017 at 1:31 PM, Gregory Szorc <g...@mozilla.com> wrote:

>> There is a facility in the tree for tagging tests (at least some suites)
>> and running only tests with certain tags. However, that doesn't help on try
>> very much because you can't tell try to run only a certain set of tags.
>
> Not supporting this is a bug IMO.
>
> Of course, you can argue that the tagging system is a subset or one-off of
> a properly designed "change impacts" system. The difference is the tagging
> system exists today, so it provides end-user benefit today.

Actually, you can run tags on try with:

./mach try <syntax> --and --tag <tag1> --tag <tag2>

You can also do test paths:

./mach try <syntax> --and <test path>

The --and takes the intersection of <syntax> and <tag> rather than the union.
I agree that it isn't as intuitive as it could be, and there are probably many
edge cases for which it falls apart. The |mach try| command in general could
use a lot of TLC.

On Wed, May 31, 2017 at 1:31 PM, Gregory Szorc <g...@mozilla.com> wrote:

> On Wed, May 31, 2017 at 6:26 AM, Benjamin Smedberg <benja...@smedbergs.us>
> wrote:
>
>> I don't know if I'm the typical use-case, but the big problem for me is
>> that when I change something such as plugins, the jobs as currently
>> bucketed don't help much. There are reftests, crashtests, mochitest-plain,
>> and mochitest-browser-chrome which test plugin code paths, and the
>> splitting means that I pretty much need to run all of those suites on try
>> to get adequate coverage.
>
> Exactly.
>
> FWIW my ideal end state is try syntax is eliminated or relegated to a <5%
> use case because the tools figure out the optimal set of what to run based
> on what changed.
>
>> There is a facility in the tree for tagging tests (at least some suites)
>> and running only tests with certain tags. However, that doesn't help on try
>> very much because you can't tell try to run only a certain set of tags.
>
> Not supporting this is a bug IMO.
>
> Of course, you can argue that the tagging system is a subset or one-off of
> a properly designed "change impacts" system. The difference is the tagging
> system exists today, so it provides end-user benefit today.
>
>> Your ultimate goal is to save $$, and my ultimate goal is to reduce the
>> time to results to make the developer cycle faster (let's measure
>> round-trip times in minutes instead of hours). I don't believe that running
>> subsets of the current jobs will solve either of those goals. Maybe it's a
>> step along the path, but I don't see how that fits together yet.
>
> I agree we should largely leave money out of the discussion. While the
> cost to run the CI is significant, it is still relatively cheap compared to
> people time (developers cost ~1000x more than many EC2 instances). The
> people time that will be saved from an efficient development cycle will
> dwarf the money savings from reduced platform consumption. And deploying a
> more efficient CI pipeline will naturally reduce operational costs. So we
> should keep focused on the people impact.
>
> That being said, we also need to take care to not drastically increase our
> cost to run CI because it is a non-negligible expense. What's happening now
> is groups like Stylo and Quantum want one-off build configurations. We also
> have "Go Faster" efforts to decouple development of some features from core
> Firefox, leading to more one-off build configurations. Running most of the
> jobs most of the time with N+1 build configurations quickly increases our
> CI operational costs. And more intelligently running things based on what
> changed can keep costs in check, avoiding most discussions about budget,
> value, etc.
>
>> Related to this, you need to remember one of the primary functions of try
>> is to validate changes before landing on inbound/autoland, and so reduce
>> the backout rate from sheriffs. Running subsets of tests will increase the
>> backout rate.
>> I think that's probably ok, but we need to be aware of this
>> social/workflow impact as it's not just a technical decision.
>
> Agreed. Anecdotally, I find it more frustrating to be backed out the
> farther down the release pipeline the changeset is. My threshold for
> getting more than inconvenienced (read: mildly frustrated) is when things
> run OK on autoland/inbound then fail on central.
>
> I also agree we should be concerned about sheriff impact. FWIW, we would
> like most backouts to be automated. But this requires a way to identify
> when a changeset is good. This is actually a hard problem. We were planning
> to implement an API on Treeherder to determine this. However, this project
> seemed to have gotten lost as part of recent reorgs. My guess is it will
> surface again sometime in the next year as part of overall {sheriff
> happiness, development cycle, autoland} work.
>
>> On Wed, May 31, 2017 at 8:47 AM, Dustin Mitchell <dmitch...@mozilla.com>
>> wrote:
>>
>>> I think this topic is big enough already without broadening it into
>>> "how can we make automation better". But getting some data from the
>>> survey sounds great! Maybe it makes sense to get down to the core
>>> question we have here:
>>>
>>> When you push to try, how often do you want:
>>> * to run every job relevant to the changes you have made
>>>   [ ] never [ ] rarely [ ] sometimes [ ] often [ ] always
>>> * to run a specific job or set of jobs
>>>   [ ] never [ ] rarely [ ] sometimes [ ] often [ ] always
>>> * to run all jobs for one or more platforms
>>>   [ ] never [ ] rarely [ ] sometimes [ ] often [ ] always
>>>
>>> Or something like that?
>>>
>>> 2017-05-30 21:21 GMT-04:00 Mike Hommey <m...@glandium.org>:
>>> > On Tue, May 30, 2017 at 05:25:20PM -0700, Gregory Szorc wrote:
>>> >> On Thu, May 11, 2017 at 10:05 AM, <dmitch...@mozilla.com> wrote:
>>> >>
>>> >> > Background:
>>> >> > https://bugzilla.mozilla.org/show_bug.cgi?id=1359942
>>> >> >
>>> >> > As jobs move to taskcluster, we have an improved opportunity to do some
>>> >> > smarter scheduling of what jobs to run on what sort of push. Of course,
>>> >> > it's a thorny subject: optimizing away a task that should run may let a bad
>>> >> > push show green, while a subsequent push bears responsibility for the
>>> >> > orange it introduces.
>>> >> >
>>> >> > One of the more common expectations is that pushes that only change a
>>> >> > directory affecting one platform should not cause other platforms' tasks to
>>> >> > run.
>>> >> >
>>> >> > In the bug above, I have proposed a method of identifying pushes
>>> >> > "affecting" a particular platform, and Greg has raised some concerns about
>>> >> > the generality of my solution. I'm happy to generalize, but I would like
>>> >> > to keep the process in motion rather than let the perfect be the enemy of
>>> >> > the good.
>>> >> >
>>> >> > To that end, I'd like some further feedback on implementing this sort of
>>> >> > optimization support.
>>> >> >
>>> >> > If there's sufficient interest, then this is probably something we could
>>> >> > set up a time to talk about in SFO in June.
>>> >> >
>>> >>
>>> >> I still owe a proper reply to everything in this thread. But as I'm
>>> >> preparing to send out another Firefox developer survey, I'm looking at the
>>> >> old one we conducted and there are some results that seemingly justify
>>> >> doing work to intelligently run things based on what changed.
>>> >>
>>> >> One of the questions on the last survey was "Thinking of running automated
>>> >> tests, rank the following potential improvements in terms of their impact
>>> >> on your productivity." "Determine and run relevant tests based on what
>>> >> source files have been modified" was one of the most wanted improvements -
>>> >> right up there with "make try runs really fast so I can effectively iterate
>>> >> on automated tests using try instead."
>>> >
>>> > FWIW, I recently added a unit test for Firefox. On try, I essentially
>>> > had to run the whole corresponding test suite (browser-chrome), instead
>>> > of just the block that contains the test, because it's almost impossible
>>> > to figure out which one it's going to run in.
>>> >
>>> > Making /that/ less painful would go a long way.
>>> >
>>> > Mike
>>> _______________________________________________
>>> dev-builds mailing list
>>> dev-builds@lists.mozilla.org
>>> https://lists.mozilla.org/listinfo/dev-builds
>>>
>>
>
> _______________________________________________
> dev-builds mailing list
> dev-builds@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-builds
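
As a rough illustration of the --and behaviour mentioned at the top of this message (intersection of the try syntax selection and the tag/path selection, rather than the union), here is a minimal Python sketch. It is only a model of the set logic and is not how |mach try| is implemented; select_tests and the test names are hypothetical.

```python
# Hypothetical model of the selection logic only; not mach try's real code.
def select_tests(syntax_tests, tagged_tests, use_and):
    """Combine two test selections.

    syntax_tests: tests selected by the try syntax (assumed input)
    tagged_tests: tests matching the --tag arguments or test paths (assumed input)
    use_and:      True models --and (intersection); False models a union
    """
    syntax_tests = set(syntax_tests)
    tagged_tests = set(tagged_tests)
    if use_and:
        # Only tests present in BOTH selections are kept.
        return syntax_tests & tagged_tests
    # Tests present in EITHER selection are kept.
    return syntax_tests | tagged_tests


# Made-up test names for the example.
from_syntax = {"test_a.js", "test_b.js", "test_c.js"}
from_tags = {"test_b.js", "test_c.js", "test_d.js"}

print(sorted(select_tests(from_syntax, from_tags, use_and=True)))
print(sorted(select_tests(from_syntax, from_tags, use_and=False)))
```

With the intersection, a test has to be picked by the syntax and also carry one of the tags, which is why a tag that matches nothing in the selected suites would leave nothing to run in this model.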