------- Comment #5 from rguenth at gcc dot gnu dot org 2009-05-16 11:20 -------
With that patch and

-O3 -ffast-math -funroll-loops -mfpmath=sse -msse2 --param
max-completely-peel-times=27 --param max-completely-peeled-insns=1500

(the --params are there to allow unrolling of all innermost loops) I get

./test
 Sparse: time[s] 0.68804300
 New: time[s] 0.40802497
 speedup 1.6862767
 Glfops 1.5881380
 Error: 1.11022302462515654E-016

which isn't too bad. The rest of the difference might be attributed to
unfortunate scheduling or to that multiplication thing (PRE skips a lot of
multiplication hoisting opportunities because they look like induction
variables, though LIM later hoists them).

With the above flags and -fno-ivopts (looking at the dumps again to see what
it does...) I get

./test
 Sparse: time[s] 0.58003598
 New: time[s] 0.62803900
 speedup 0.92356682
 Glfops 1.0317831
 Error: 1.11022302462515654E-016

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40168
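To illustrate the "multiplication thing" mentioned above, here is a minimal C sketch. It is not taken from the testcase in this PR: the function names sum_naive and sum_hoisted and the row-major layout are assumptions made up for illustration. In the first function, row * ncols is invariant in the inner loop, but it is also a linear function of the outer loop counter, so it has the shape of an induction-variable expression. The second function shows the same loop nest with the multiplication hoisted by hand, which is roughly the form one would hope to see after loop-invariant motion.

```c
#include <stddef.h>

/* Hypothetical kernel: row * ncols does not change inside the inner loop,
   but it depends linearly on the outer counter, so it resembles an
   induction-variable expression. */
double sum_naive(const double *a, size_t nrows, size_t ncols)
{
    double s = 0.0;
    for (size_t row = 0; row < nrows; row++)
        for (size_t col = 0; col < ncols; col++)
            s += a[row * ncols + col];   /* multiply written in the inner loop */
    return s;
}

/* Same computation with the multiplication hoisted out of the inner loop
   by hand; the summation order is unchanged. */
double sum_hoisted(const double *a, size_t nrows, size_t ncols)
{
    double s = 0.0;
    for (size_t row = 0; row < nrows; row++) {
        const size_t base = row * ncols; /* computed once per row */
        for (size_t col = 0; col < ncols; col++)
            s += a[base + col];
    }
    return s;
}
```

Both functions add the elements in the same order, so they return identical results. Whether a given GCC version performs this hoisting in PRE, in LIM, or not at all is best checked in the tree dumps rather than assumed from this sketch.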