Re: Increase in mozilla-inbound bustage due to people not using Try

Mike Hommey Wed, 15 Aug 2012 23:10:15 -0700

On Wed, Aug 15, 2012 at 09:16:04PM -0400, Ehsan Akhgari wrote:
> On 12-08-15 6:17 PM, William Lachance wrote:
> >On 08/14/2012 03:47 PM, Gregory Szorc wrote:
> >>On 8/14/12 12:14 PM, Ed Morley wrote:
> >>>On Thursday, 9 August 2012 15:35:28 UTC+1, Justin Lebar  wrote:
> >>>>Is there a plan to mitigate the coalescing on m-i?  It seems like that
> >>>>is a big part of the problem.
> >>>
> >>>Reducing the amount of coalescing permitted would just mean we end up
> >>>with a backlog of pending tests on the repo tip - which would result
> >>>in tree closures regardless. So other than bug 690672 making sheriffs'
> >>>lives easier, we just need more machines in the test pool - since it's
> >>>simply a case of demand exceeding capacity.
> >>>
> >>>The situation is made worse now that we're adding new platforms (OS X
> >>>10.7, B2G GB, B2G ICS, Android Armv6, soon OS X 10.8, Win8 desktop,
> >>>Win8 metro) faster than we're EOLing them - and we're pushing more
> >>>changes per day than ever before [1]. From what I understand, Apple's
> >>>aggressive hardware cycle is also making it difficult to expand the
> >>>test pool [2].
> >>
> >>Is there a tracking bug for areas where we could gain efficiency? We all
> >>know the build phase is full of clownshoes. But, I believe we also do
> >>silly things like execute some tests serially, only taking advantage of
> >>1/N CPU cores in the process. This is just wasting resources. See [1]
> >>for a concrete example.
> >
> >Last year we had a buildfaster project to try and improve our end-to-end
> >build/test times:
> >
> >https://wiki.mozilla.org/ReleaseEngineering/BuildFaster
> >
> >I think it's been recently reactivated, I believe mostly with the
> >intention of working on build times (which is important, but only one
> >small part of the overall picture):
> >
> >http://coop.deadsquid.com/2012/07/reviving-buildfaster-fixing-makefiles/
> >
> >In general I would be very careful before tackling any particular bug
> >for the sake of improving our build/test times. If something is slow,
> >but not on the critical path as far as build/test is concerned, fixing
> >it will not result in any tangible improvement.
> >
> >When I was working on this project last year, I designed a build charts
> >view to help visualize which parts were taking the longest (you can see
> >implicit dependencies between build/test tasks by seeing when certain
> >jobs run), which proved very helpful to determine which areas we needed
> >to optimize:
> >
> >http://brasstacks.mozilla.com/gofaster/#/buildcharts
> >
> >I'm not sure if the data feeding into that is still valid (some things
> >like look suspiciously low, and at the very least it doesn't seem
> >completely up to date). Anyway, if I were going to look into this again
> >(don't have time right now unfortunately), I would first spend a lot of
> >time staring at data. :)
> 
> This looks great, William.  But looking at how our load has been for
> the past few weeks, I think we're not going to benefit a lot by
> incremental improvements to end-to-end times.
> 
> Honestly, the only big thing that we can probably fix to improve our
> end-to-end times is to enable using pymake on our Windows builders
> to do parallel builds.  Developers on Windows have been using pymake
> to get parallel builds for quite a while now, and somebody needs to
> figure out what's happening on our build machines which causes us
> not to be able to use pymake there, and fix it.  That should
> significantly decrease our Windows build times depending on the
> number of cores available on our Windows builders.
> 
> Any other low hanging fruits that I can think of are all going to be
> small incremental improvements which, although being very nice,
> stand no chance against the rate at which our load is increasing.
> So unfortunately I don't see any way to address the problem that
> we're facing in the short term except for adding hardware.


Something I noticed recently is that we spend more than 5 minutes (!)
during Windows clobber builds just doing the clobber (rm -rf). All try
builds are clobbers. A lot of time is wasted on Mercurial cloning, too.

What is interesting is that the corresponding times are on the order of
seconds on Linux and OS X. We're just hitting the fact that Windows
sucks at I/O.

But maybe we can work around this. At least for rm -rf, instead of
rm -rf'ing before the build, we could move the objdir away so that a
fresh new one is created. The older one could be removed much later.
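
To make the idea concrete, here is a rough sketch of move-aside-then-
delete-later. It is not what the build machines run today; the function
names and the ".stale-objdir-" prefix are made up for illustration, and
it assumes nothing still holds files open inside the objdir (a rename
can fail on Windows if something does).

```python
import os
import shutil
import uuid


def clobber_by_rename(objdir):
    """Move objdir aside and recreate it empty.

    Returns the path of the moved-aside directory, or None if there
    was nothing to clobber. (Hypothetical helper, not buildbot code.)
    """
    stale = None
    if os.path.isdir(objdir):
        parent = os.path.dirname(os.path.abspath(objdir))
        # Keep the stale copy next to the original so the rename stays
        # on one volume and does not turn into a copy.
        stale = os.path.join(parent, ".stale-objdir-" + uuid.uuid4().hex)
        os.rename(objdir, stale)
    os.makedirs(objdir)
    return stale


def purge_stale(stale):
    """Do the slow recursive delete later, off the build's critical path."""
    if stale is not None:
        shutil.rmtree(stale, ignore_errors=True)
```

The build would call clobber_by_rename() up front and start compiling
right away; purge_stale() could run after the build finishes or from
some idle-time job. The stale copy still takes up disk space until it
is purged.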

Mike
_______________________________________________
dev-platform mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-platform
