[Bug rtl-optimization/29874] gcc-4.1.1 generates consistently worse performing SSE code than gcc-3.4.6

rguenth at gcc dot gnu.org Tue, 08 Mar 2011 02:03:33 -0800

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29874


--- Comment #3 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-03-08 10:03:22 UTC ---
I raised the number of FFTs to 10000000 and get

       -O2   -O3   -O3 -ffast-math   -O3 -ffast-math -funroll-loops
3.3-H  7.32  7.47  7.48              7.39
4.1    7.21  7.22  7.18              7.21
4.3    7.21  7.20  7.20              7.34
4.5    7.27  7.27  7.21              7.34
4.6    7.09  7.06  7.01              7.16

I don't have a 64-bit 3.4 compiler handy, but 3.3-H is the hammer branch, so
it should be close to 3.4.

Thus I can't reproduce the slowdown (though I don't have a real 3.4), and 4.6
looks promising here.  The generated code looks quite good, though we still
have some stack spills left (I'm not sure whether they are due to required
temporaries).
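
To make the comparison in the table above easier to read, here is a small
sketch that expresses each timing as a percent change relative to 3.3-H.
The numbers are copied verbatim from the table; the names FLAGS, TIMINGS,
percent_change and fastest_per_flag are made up for this illustration, and
it assumes the figures are run times (lower is better), since the comment
does not state units.

```python
# Timings copied from the table above; units are not stated in the
# comment, so they are assumed to be run times (lower is better).
FLAGS = ["-O2", "-O3", "-O3 -ffast-math", "-O3 -ffast-math -funroll-loops"]
TIMINGS = {
    "3.3-H": [7.32, 7.47, 7.48, 7.39],
    "4.1": [7.21, 7.22, 7.18, 7.21],
    "4.3": [7.21, 7.20, 7.20, 7.34],
    "4.5": [7.27, 7.27, 7.21, 7.34],
    "4.6": [7.09, 7.06, 7.01, 7.16],
}


def percent_change(baseline, other):
    """Percent change of each entry in `other` relative to `baseline`.

    A negative value means `other` took less time than the baseline.
    """
    return [100.0 * (o - b) / b for b, o in zip(baseline, other)]


def fastest_per_flag(timings):
    """Return, for each flag column, the compiler with the lowest timing."""
    return [
        min(timings, key=lambda version: timings[version][i])
        for i in range(len(FLAGS))
    ]


if __name__ == "__main__":
    base = TIMINGS["3.3-H"]
    for version, times in TIMINGS.items():
        deltas = percent_change(base, times)
        print(version.ljust(6), "  ".join(f"{d:+.1f}%" for d in deltas))
    print("fastest per flag set:", fastest_per_flag(TIMINGS))
```

Under that assumption, every 4.x row is below the 3.3-H row in every
column, which is what "can't reproduce the slowdown" refers to.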

ICC 12.0 does not manage to come close to the above performance; the
best I found was -fast -xHOST, which makes the benchmark take 7.30.
