http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51499
--- Comment #7 from fb.programming at gmail dot com 2011-12-11 14:55:13 UTC ---
(In reply to comment #5)
> > (3) If I change all double's into float's in the code above it seems to
> I think you are looking at the scalar epilogue. The number of iterations is
> unknown, so we need an epilogue loop for the case that number of iterations is
> not a multiple of 4.

Yes, you're right. Sorry about that, my mistake.

> > (1) In this case it should work without -funsafe-math-optimizations but
> > it doesn't. gcc 4.7 requires -fno-signed-zeros -fno-trapping-math
> > -fassociative-math to make it work.
> >
>
> It's reduction, when we vectorize we change the order of computation. In order
> to be able to do that for floating point we need flag_associative_math.

In some cases it might be necessary, but not here:

  sum1+=a;
  sum2+=a;

gives exactly the same result as

  (sum1, sum2) += (a, a);

Let's take a more applied example, say calculating the sum of 1/i:

  double harmon(int n) {
    double sum=0.0;
    for(int i=1; i<n; i++){
      sum += 1.0/i;
    }
    return sum;
  }

Vectorizing this requires reordering the sum, so in this case I agree that we
need -funsafe-math-optimizations. However, one could split the sum manually:

  double harmon(int n) {
    assert(n%2==0);
    double sum1=0.0, sum2=0.0;
    for(int i=1; i<n; i+=2){
      sum1 += 1.0/i;
      sum2 += 1.0/(i+1);
    }
    return sum1+sum2;
  }

Now I'd expect the compiler to vectorize this without
-funsafe-math-optimizations, because vectorizing it doesn't change any
computational results:

  (sum1, sum2) += (1.0/i, 1.0/(i+1));

I can attach a test case with that example if that would be useful.
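
A minimal sketch of the claim above, not the test case offered in the comment. The function names harmon_serial, harmon_split and harmon_lanes are made up for illustration. harmon_lanes emulates a two-lane vector accumulator with a plain array, so it shows the order of operations a vectorizer would use rather than actual SIMD code. The loops use `i <= n` so that all three functions sum the same terms; as posted, the split version with even n also includes 1/n, which the single-sum loop stops short of. The sketch assumes strict IEEE double arithmetic (no -ffast-math, no x87 excess precision).

```c
#include <assert.h>

/* One accumulator: 1/1 + 1/2 + ... + 1/n, added strictly in order.
   Vectorizing this loop changes the order of the additions. */
double harmon_serial(int n) {
    double sum = 0.0;
    for (int i = 1; i <= n; i++) {
        sum += 1.0 / i;
    }
    return sum;
}

/* Two accumulators written out by hand (n must be even):
   odd terms go to sum1, even terms go to sum2. */
double harmon_split(int n) {
    assert(n % 2 == 0);
    double sum1 = 0.0, sum2 = 0.0;
    for (int i = 1; i <= n; i += 2) {
        sum1 += 1.0 / i;
        sum2 += 1.0 / (i + 1);
    }
    return sum1 + sum2;
}

/* Emulated two-lane vector form: (acc[0], acc[1]) += (1/i, 1/(i+1)).
   Each lane performs the same additions, in the same order, as
   sum1 and sum2 in harmon_split. */
double harmon_lanes(int n) {
    assert(n % 2 == 0);
    double acc[2] = {0.0, 0.0};
    for (int i = 1; i <= n; i += 2) {
        double v[2] = {1.0 / i, 1.0 / (i + 1)};
        for (int k = 0; k < 2; k++) {
            acc[k] += v[k];
        }
    }
    return acc[0] + acc[1];
}
```

Under those assumptions harmon_split and harmon_lanes should return bit-identical results for any even n, since every lane sees the same operands in the same order. harmon_serial is only guaranteed to agree with them up to rounding error, because it associates the additions differently.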