On 01/04/11 14:48, H.J. Lu wrote:
> On Tue, Jan 4, 2011 at 4:43 AM, Martin Reinecke
> <mar...@mpa-garching.mpg.de> wrote:
>> Hi,
>>
>> while benchmarking a numerical C library making heavy use of SSE2
>> intrinsics, I have noticed a significant (around 10 percent) slowdown
>> in the code generated by the current gcc trunk, compared to the one
>> produced by the 4.5.1 release.
>> It's quite hard to reduce the code to a small test case, but I can easily
>> point out the hot code regions where most of the CPU time is spent.
>>
>> Do you think I should open a PR for this, or is this kind of performance
>> fluctuation to be expected?
>
> What compiler flags are you using? On which processors do you
> run the library?

The CPU is a Core2 Duo E8500; the optimization flags are
"-O2 -ffast-math -fomit-frame-pointer".
This is on a 64-bit OS, so SSE2 is supported without additional
flags.
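
For reference, the point about SSE2 being available without extra flags
can be checked with a small sketch like the one below. It is not taken
from the library; the names sse2_enabled and add_pairs are made up for
illustration. It relies on gcc predefining __SSE2__ when targeting
x86-64, and it falls back to scalar code if the macro is absent.

```c
/* Illustrative sketch only: not code from the library under discussion. */
#include <stddef.h>

#ifdef __SSE2__
#include <emmintrin.h>   /* SSE2 intrinsics (__m128d, _mm_add_pd, ...) */
#endif

/* Returns 1 if the compiler predefines __SSE2__ for the flags in use,
   0 otherwise. */
int sse2_enabled(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical kernel: out[i] = a[i] + b[i] for i in [0, n). */
void add_pairs(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;

    /* Process two doubles per iteration. */
    for (; i + 1 < n; i += 2) {
#ifdef __SSE2__
        __m128d va = _mm_loadu_pd(a + i);   /* unaligned loads */
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
#else
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
#endif
    }

    /* Scalar tail for odd n. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Compiled for x86-64 with the flags quoted above, sse2_enabled() should
return 1 without -msse2 or -march being given. On a target where SSE2 is
not enabled by default it would return 0 and add_pairs would take the
scalar path.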
Using "-march=native" in addition to the flags above makes the timings
worse for gcc 4.5.1 and slightly better for gcc 4.6, but still the
code generated by 4.5.1 is quite a bit faster.
The trunk version was compiled from yesterday's sources.
Cheers,
Martin