------- Comment #2 from jv244 at cam dot ac dot uk 2006-01-01 18:14 -------
(In reply to comment #1)
> What happens if you use -funroll-loops? It should get about the same
> improvement.
I have the following timings (for N=1024, calling these subroutines a number of times, plus some external initialisation):

                                  S31            S32
-O2 -ffast-math -funroll-loops    0.0229959786   0.0119980276
-O2 -ffast-math                   0.0229960084   0.0119979978

I think the issue is not pure unrolling but the fact that you have two independent sums in the loop.

In fact, I now find that

-O2 -ffast-math -funroll-loops -ftree-loop-ivcanon -fivopts -fvariable-expansion-in-unroller

yields much improved code:

                                  S31            S32
                                  0.0119979978   0.0079990029

The last option indeed seems to do what I did by hand; still, the routine S32 seems about 30% faster.

> Also your two loops are not equal if N is odd.

I've added at least the comment ;-)

! assume N is even

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25621
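
To illustrate the "two independent sums" point made above, here is a minimal sketch in C. It is not the S31/S32 code from the report (that is Fortran and is not reproduced in this comment); the function names sum_single and sum_split are hypothetical. The first loop carries a single accumulator, so each addition has to wait for the previous one. The second splits the sum by hand into two accumulators that do not depend on each other, which is roughly the kind of transformation -fvariable-expansion-in-unroller is described as doing. Because this reorders floating-point additions, a compiler would presumably only do it on its own when something like -ffast-math permits it.

```c
#include <assert.h>
#include <stddef.h>

/* Single accumulator: every addition depends on the result of the
 * previous one, so the loop forms one long dependency chain. */
double sum_single(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Two independent accumulators, combined once at the end.
 * The two chains can overlap in the pipeline.
 * Assumes n is even, mirroring the "! assume N is even" comment. */
double sum_split(const double *a, size_t n)
{
    assert(n % 2 == 0);
    double s0 = 0.0, s1 = 0.0;
    for (size_t i = 0; i < n; i += 2) {
        s0 += a[i];      /* even-indexed elements */
        s1 += a[i + 1];  /* odd-indexed elements */
    }
    return s0 + s1;
}
```

Both functions return the same value when the additions are exact, for example on small integers stored as doubles. In general the results may differ in the last bits because the summation order is different.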