http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29874

--- Comment #3 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-03-08 10:03:22 UTC ---
I raised the number of FFTs to 10000000 and get these timings:

GCC    -O2   -O3   -O3 -ffast-math   -O3 -ffast-math -funroll-loops
3.3-H  7.32  7.47  7.48              7.39
4.1    7.21  7.22  7.18              7.21
4.3    7.21  7.20  7.20              7.34
4.5    7.27  7.27  7.21              7.34
4.6    7.09  7.06  7.01              7.16

I don't have a 64-bit 3.4 compiler handy, but 3.3-H is the hammer branch, so
it should be close to 3.4.

So I can't reproduce the slowdown (though I don't have a real 3.4), and 4.6
looks promising here.  The generated code looks quite good, although we still
have some stack spills left (I'm not sure whether they are due to required
temporaries).
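To make the comparison behind "can't reproduce the slowdown" explicit, here is a
short Python sketch that transcribes the table and computes each version's
percent change in run time relative to 3.3-H (used as the stand-in for 3.4, as
noted above).  The numbers are copied from the table; the names TIMES, FLAGS and
relative_change are purely illustrative, and the table does not state units.

```python
# Timings transcribed from the table above (lower is better).
# Units are not stated in the comment, so they are treated as plain run times.
FLAGS = ["-O2", "-O3", "-O3 -ffast-math", "-O3 -ffast-math -funroll-loops"]
TIMES = {
    "3.3-H": [7.32, 7.47, 7.48, 7.39],
    "4.1":   [7.21, 7.22, 7.18, 7.21],
    "4.3":   [7.21, 7.20, 7.20, 7.34],
    "4.5":   [7.27, 7.27, 7.21, 7.34],
    "4.6":   [7.09, 7.06, 7.01, 7.16],
}


def relative_change(baseline: str = "3.3-H") -> dict[str, list[float]]:
    """Percent change in run time vs. the baseline, per flag set (negative = faster)."""
    base = TIMES[baseline]
    return {
        ver: [100.0 * (t - b) / b for t, b in zip(times, base)]
        for ver, times in TIMES.items()
        if ver != baseline
    }


if __name__ == "__main__":
    print("version  " + " | ".join(FLAGS))
    for ver, deltas in relative_change().items():
        print(f"{ver:7}  " + "  ".join(f"{d:+6.2f}%" for d in deltas))
```

Every delta comes out negative, i.e. no later version is slower than 3.3-H in
any column, and 4.6 has the lowest time in all four columns.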

ICC 12.0 does not come close to the above performance; the best I found
was -fast -xHOST, with which the benchmark takes 7.30.
