Thanks for asking, Jan. I think 16% is the maximum we can save.

In talking with a few more people, I think a middle-of-the-road proposal would be to:

Turn off linux64/windows7/windows10 opt builds+tests on autoland and mozilla-inbound.
Leave them on for mozilla-central and try.
This allows try to be faster as needed, continues to offer peace of mind by running the tests on m-c (sheriffs can backfill if needed), and removes confusion about building/testing locally vs. try. This would be similar to what we already see, where many people only test opt on try and then land; if a PGO test regresses, we would need to back out.

Are there any concerns with this latest proposal?

On Thu, Jan 17, 2019 at 12:52 PM Jan de Mooij <jdemo...@mozilla.com> wrote:

> Hi Joel,
>
> Can you say more about this point in your original email: "3) This will
> reduce the jobs (about 16%) we run which in turn reduces, cpu time, money
> spent, turnaround time, intermittents, complexity of the taskgraph." It
> seems to me that if we remove non-PGO opt builds even on Try, we might use
> more cpu time because there are so many Try pushes requesting opt builds.
> Do we have data on this?
>
> Thanks,
> Jan
>
> On Thu, Jan 17, 2019 at 5:45 PM jmaher <joel.ma...@gmail.com> wrote:
>
>> Following up on this, thanks to Chris we have fast artifact builds for
>> PGO, so the time to develop and use try server is in parity with current
>> opt solutions for many cases (front end development, most bisection cases).
>>
>> I have also looked in depth at what the impact on the integration
>> branches would be. In the data set from July-December (H2 2018) there were
>> 11 instances of tests that we originally only scheduled in the OPT config
>> and we didn't have PGO or Debug test jobs to point out the regression (this
>> is due to scheduling choices). Worst-case scenario is finding the
>> regression on PGO up to 1 hour later 11 times or roughly 2x/month.
>> Backfilling to find the offending patch as we do now 24% of the time would
>> be similar time. In fact running the OPT jobs on Debug instead would
>> result in same time for all 11 instances (due to more chunks on debug and
>> similar runtimes). In short, little to no impact.
>>
>> Lastly there was a pending question about talos. There is an edge case
>> where we can see a regression on talos that is PGO, but it is unrelated to
>> the code and just a side effect of how PGO works. I looked into that in
>> https://bugzilla.mozilla.org/show_bug.cgi?id=1514829. I found that if
>> we didn't get opt alerts that we would not have missed any regressions.
>> Furthermore, for the regressions, for the ones that were pgo only
>> regressions (very rare) there were many other regressions at the same time
>> (say a build change, or test change, etc.) and usually these were accepted
>> changes, backed out, or investigated on a different test or platform. In
>> the past when we have determined a regression is a PGO artifact we have
>> resolved it as WONTFIX and moved on.
>>
>> Given this summary, I feel that most concerns around removing testing for
>> OPT are addressed. I would also like to extend the proposal to remove the
>> OPT builds since no unit or perf tests would run on there.
>>
>> As my original timeline is not realistic, I would like to see if there
>> are comments until next Wednesday- January 23rd, then I can follow up on
>> remaining issues or work towards ensuring we start the process of making
>> this happen and what the right timeline is.
>> _______________________________________________
>> dev-platform mailing list
>> dev-platform@lists.mozilla.org
>> https://lists.mozilla.org/listinfo/dev-platform
>>
>