On 2013-04-13 4:28 AM, Mike Hommey wrote:
Hi,
For almost three months now, we've had graphs following the amount of
memory used by the linker on Windows builders during PGO builds. The
result can be seen here:
http://graphs.mozilla.org/graph.html#tests=[[205,63,8]]&sel=none&displayrange=90&datatype=running
/me shivers
The first thing to notice in here is the 13 spikes down. The last one is
bug 860371. I wasn't aware of any of the 12 others. It might be worth
looking into them to understand why they happen. Interestingly, my
dev-tree-management archive doesn't show any notification for these
(except for the last one), nor for any of the progressive "regressions".
This made me curious. I went ahead and looked at 4 of those spikes.
They included the following code changes:
<http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=0531bbbb0ee1&tochange=459afca0e391>
<http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=d75e34da1a9f&tochange=336b6586074e>
<http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=7ac3f76249e7&tochange=d764382ed4cf>
<http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=30b977b2b911&tochange=4d7684259549>
None of them contained any obvious reason why there should be such a
significant memory usage difference when linking PGO builds. *But*
all four had something in common: they _all_ included changes to the
build system which would cause most of the tree to be rebuilt,
respectively:
configure.in change:
<http://hg.mozilla.org/integration/mozilla-inbound/rev/2a450024df4e>
removal of empty Makefile.in's:
<http://hg.mozilla.org/integration/mozilla-inbound/rev/336b6586074e>
configure.in change:
<http://hg.mozilla.org/integration/mozilla-inbound/rev/d764382ed4cf>
configure.in change:
<http://hg.mozilla.org/integration/mozilla-inbound/rev/cf75954e488f>
Is this merely a correlation?
The second thing to notice is the graph starts a little over 3.2GB and
ends a little below 3.6GB, for a 360MB growth in less than three months.
At this pace, we'll run out of address space around June or July.
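As a rough sanity check on that estimate, here is a minimal sketch of the linear extrapolation. The inputs are assumptions read off the description above, not measured values: 3.56GB for "a little below 3.6GB", 85 days for "less than three months", and a 4GB ceiling for the linker's address space, which the message does not state explicitly.

```python
from datetime import date, timedelta


def days_until_limit(start_gb, end_gb, elapsed_days, limit_gb):
    """Linearly extrapolate how many days remain until usage hits limit_gb.

    Returns None if usage is flat or shrinking, since it never hits the limit.
    """
    rate_gb_per_day = (end_gb - start_gb) / elapsed_days
    if rate_gb_per_day <= 0:
        return None
    return (limit_gb - end_gb) / rate_gb_per_day


# All four numbers below are assumptions for illustration only.
START_GB = 3.2       # "a little over 3.2GB"
END_GB = 3.56        # "a little below 3.6GB", i.e. about 360MB of growth
ELAPSED_DAYS = 85    # "less than three months"
LIMIT_GB = 4.0       # assumed address space ceiling for a 32-bit linker

days = days_until_limit(START_GB, END_GB, ELAPSED_DAYS, LIMIT_GB)
projected = date(2013, 4, 13) + timedelta(days=round(days))
print(f"about {days:.0f} days of headroom, projected date {projected}")
```

With these assumed inputs the projection lands in late July. A straight-line fit ignores the step changes visible in the graph, so it is only an order-of-magnitude guide.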
So, it's this time of year again. But for once, we can get things in order
before they blow up, not when it's too late and we have to rush things.
And as bug 860371 reminded us (look for recent massive regressions across
the board on dev-tree-management), PGO is a big deal. Note bug 860371
only removed the data used by the compiler during PGO builds, so link
time code generation was still happening. But we already knew LTCG alone
wasn't much of a win.
We need to look back (and looking around the times where the graph jumps
up might be good starting points) and see what accounts for this
growth. I suspect part of it is due to newly imported code. Possibly, this
new code might not need to be PGOed. There may be other areas that can
be unPGOed without much of an impact, like we did last time.
Are there any volunteers?
If nobody volunteers, I guess we're going to have to volunteer somebody.
;-) We can't go ahead and ignore this...
I think we need to start thinking about how to make PGO opt-in instead of
opt-out, while keeping performance where it is now.
Doing that would make sense to me.
We also need to ensure we do get regression notifications on
dev-tree-management. If I hadn't looked at the graph after bug 860371
blew things up, I wouldn't have noticed we were getting into the dangerous
zone again.
Good idea, is there a bug on file for that?
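To illustrate what such a notification check could look like, here is a minimal sketch that flags any sample deviating from the median of the preceding few samples by more than a fractional threshold. This is not how the dev-tree-management notifications actually work; the function name, window size, threshold, and sample data are all hypothetical.

```python
from statistics import median


def find_shifts(samples, window=5, threshold=0.02):
    """Return indices of samples that differ from the median of the
    preceding `window` samples by more than `threshold` (a fraction).

    The median is used so that a single earlier spike does not skew
    the baseline for the samples that follow it.
    """
    flagged = []
    for i in range(window, len(samples)):
        baseline = median(samples[i - window:i])
        if abs(samples[i] - baseline) / baseline > threshold:
            flagged.append(i)
    return flagged


# Made-up linker memory readings in GB: one downward spike, one jump up.
usage_gb = [3.20, 3.21, 3.21, 3.22, 3.22, 3.23, 2.60, 3.23, 3.24, 3.35]
print(find_shifts(usage_gb))
```

A check like this catches sudden spikes and jumps, but not slow drift that stays under the threshold at every step. Catching the progressive growth would need a comparison against a much older baseline as well.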
Cheers,
Ehsan
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform