Thanks for asking, Jan.  I think 16% is the maximum we can save.  After talking
with a few more people, I think a middle-of-the-road proposal would be to
turn off linux64/windows7/windows10 opt builds+tests on autoland and
mozilla-inbound, and leave them on for mozilla-central and try.

This allows try to be faster as needed, continues to offer peace of mind by
running the tests on m-c (and sheriffs can backfill if needed), and removes
confusion about building/testing locally vs. try.  This would be similar to
what we already see, where many people only test opt on try and then land; if
a PGO test regresses, we would need to back out.

Are there any concerns with this latest proposal?


On Thu, Jan 17, 2019 at 12:52 PM Jan de Mooij <jdemo...@mozilla.com> wrote:

> Hi Joel,
>
> Can you say more about this point in your original email: "3) This will
> reduce the jobs (about 16%) we run which in turn reduces, cpu time, money
> spent, turnaround time, intermittents, complexity of the taskgraph." It
> seems to me that if we remove non-PGO opt builds even on Try, we might use
> more cpu time because there are so many Try pushes requesting opt builds.
> Do we have data on this?
>
> Thanks,
> Jan
>
> On Thu, Jan 17, 2019 at 5:45 PM jmaher <joel.ma...@gmail.com> wrote:
>
>> Following up on this, thanks to Chris we have fast artifact builds for
>> PGO, so the time to develop and use try server is at parity with current
>> opt solutions for many cases (front-end development, most bisection cases).
>>
>> I have also looked in depth at what the impact on the integration
>> branches would be.  In the data set from July-December (H2 2018) there were
>> 11 instances of tests that we originally only scheduled in the OPT config
>> and we didn't have PGO or Debug test jobs to point out the regression (this
>> is due to scheduling choices).  The worst-case scenario is finding the
>> regression on PGO up to 1 hour later 11 times, or roughly 2x/month.
>> Backfilling to find the offending patch, as we do now 24% of the time, would
>> take a similar amount of time.  In fact, running the OPT jobs on Debug
>> instead would result in the same time for all 11 instances (due to more
>> chunks on debug and similar runtimes).  In short, little to no impact.
>>
>> Lastly, there was a pending question about talos.  There is an edge case
>> where we can see a talos regression on PGO only that is unrelated to
>> the code and is just a side effect of how PGO works.  I looked into that in
>> https://bugzilla.mozilla.org/show_bug.cgi?id=1514829.  I found that if
>> we didn't get opt alerts, we would not have missed any regressions.
>> Furthermore, for the regressions that were PGO-only (very rare), there
>> were many other regressions at the same time (say a build change, or test
>> change, etc.), and usually these were accepted changes, backed out, or
>> investigated on a different test or platform.  In
>> the past when we have determined a regression is a PGO artifact we have
>> resolved it as WONTFIX and moved on.
>>
>> Given this summary, I feel that most concerns around removing testing for
>> OPT are addressed.  I would also like to extend the proposal to remove the
>> OPT builds, since no unit or perf tests would run on them.
>>
>> As my original timeline is not realistic, I would like to leave this open
>> for comments until next Wednesday, January 23rd.  Then I can follow up on
>> remaining issues, or work towards starting the process of making this
>> happen and determining the right timeline.
>> _______________________________________________
>> dev-platform mailing list
>> dev-platform@lists.mozilla.org
>> https://lists.mozilla.org/listinfo/dev-platform
>>
>