Re: Function alignment and benchmark results

2012-08-23 Thread Matthew Gretton-Dann
Michael,

On 23 August 2012 03:09, Michael Hope  wrote:
> Zhenqiang's been working on the later split 2 patch which causes more
> constants to be built using a movw/movt instead of a constant pool
> load.  There was an unexpected ~10 % regression in one benchmark which
> seems to be due to function alignment.  I think we've tracked down the
> reason but not the action.
>
> Compared to the baseline, the split2 branch took 113 % of the time to
> run, i.e. 13 % longer.  Adding an explicit 16-byte alignment to the
> function changed this to 97 % of the time, i.e. 3 % faster.  The
> reason Zhenqiang and I got different results was the build ID.  He
> used the binary build scripts to make the cross compiler; these turn
> on the build ID, which added an extra 20 bytes ahead of .text and
> happened to align the function to 16 bytes.  cbuild doesn't use the
> build ID (although it should), which happened to leave the function
> on an 8-byte boundary.
>
> The disassembly is identical so I assume the regression is cache or
> fast loop related.  I'm not sure what to do, so let's talk about this
> at the next performance call.

I've made a note in the agenda for the performance call, but here are
some quick notes/questions that come to my mind:

My guesses would include cache alignment and wide Thumb-2 instructions
straddling cache-line boundaries changing core performance.

My thoughts on further investigation: is it the function itself that
needs to be aligned, or a hot loop within it?  Can we manually alter
the code to choose different instructions so that none straddle a
cache-line boundary, and if so, what happens as we change the
alignment?

If it is code alignment (of either the function or the loop) rather
than instruction sizes that is the issue, then we can probably do
something about it.  If it is instruction sizes then we need to work
out a way to mitigate the effects, as GCC doesn't have precise
knowledge of instruction sizes.

Thanks,

Matt

-- 
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-d...@linaro.org

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain


Testing builds via cbuild

2012-08-23 Thread Michael Hope
(cc'd to linaro-toolchain to archive)

Hi Matt.  I've had a look at the manual builds you tried to spawn.
Here's what I did to run cbuild locally to test it:
 * cd linaro
 * bzr branch lp:cbuild
 * cd cbuild/slaves
 * cp -a example `hostname`
 * cd `hostname`
 * make -f ../../lib/build.mk final/gcc-4.8+svn190558.stamp

I'm not proud of the whole 'slaves/$hostname' setup, but it is what it is.

Results are in $version, such as gcc-4.8+svn190558.  The build tree is
in $version/gcc/default/build.  To clean up, rm -rf the final, results
and $version directories.  See lib/common.mk, and override the email
address etc. in local.mk to stop build results going out.  You should
put a proxy in $http_proxy or ~/.wgetrc to reduce the download cost.
Please add this to the cbuild README.

A nit: we use the Debian versioning scheme, so the version should be
4.8~svn190558, i.e. an SVN checkout leading up to 4.8.  Compare with
4.7+svn1234, which is an SVN checkout of the 4.7 release branch.
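The ordering can be sanity-checked with GNU sort's version sort
(assuming GNU coreutils), which follows the same Debian rule that '~'
sorts before anything, even the end of the string:

```shell
# Debian-style ordering: '~' marks a pre-release snapshot leading up to
# 4.8, while '+' marks a checkout made after the release point.
printf '4.8+svn1234\n4.8~svn190558\n4.8\n' | sort -V
# → 4.8~svn190558
# → 4.8
# → 4.8+svn1234
```

So a tarball named 4.8+svn190558 claims to be newer than the 4.8
release, while 4.8~svn190558 correctly sorts before it.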

The tarballs look generally good.  I've spawned them into the a9hf
queue as that's what we benchmark on.  Note that they're low priority
due to not having 'linaro' in the name.

-- Michael
