http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
--- Comment #16 from arturomdn at gmail dot com 2013-02-14 17:42:55 UTC ---
With -ftree-vectorize -fno-tree-loop-if-convert flags it generated this for the
loop in question:

.L39:
    movq    %rdi, %rdx
    addq    (%rsi,%rax,8), %rcx
    imulq   (%r9,%rax,8), %rdx
    addq    %rcx, %rdx
    xorl    %ecx, %ecx
    cmpq    %r10, %rdx
    jbe     .L38
    movq    %rdx, %rcx
    andl    $4294967295, %edx
    shrq    $32, %rcx
.L38:
    addq    $1, %rax
    cmpq    %r8, %rax
    movq    %rdx, -8(%rsi,%rax,8)
    jne     .L39

And it executed fast:

./by-val-O3-flags
Took 6.74 seconds total.
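For readers who do not want to trace the registers by hand, here is a rough C reading of what that loop appears to compute. It is reconstructed from the assembly only, not taken from the test case: the function name, the parameter names, and the returned carry are all made up for illustration, and the bound held in %r10 is not visible in the listing, so it is passed in as `limit` (the mask 4294967295 suggests it is probably 0xFFFFFFFF, but that is a guess).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reconstruction of the .L39 loop; all names are illustrative.
 *   dst   ~ %rsi   src ~ %r9   v ~ %rdi   n ~ %r8   limit ~ %r10
 *   carry ~ %rcx   t   ~ %rdx  i ~ %rax
 * The initial carry is set before the loop and is not shown in the listing;
 * zero is assumed here. The compiled loop is a do/while, so it presumably
 * runs only when n > 0. */
uint64_t mul_add_limbs(uint64_t *dst, const uint64_t *src,
                       uint64_t v, size_t n, uint64_t limit)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* addq + imulq + addq: t = v * src[i] + dst[i] + carry */
        uint64_t t = v * src[i] + dst[i] + carry;
        carry = 0;                      /* xorl %ecx, %ecx */
        if (t > limit) {                /* cmpq / jbe: branch kept, no cmov */
            carry = t >> 32;            /* shrq $32, %rcx */
            t &= 0xFFFFFFFFu;           /* andl $4294967295, %edx */
        }
        dst[i] = t;                     /* movq %rdx, -8(%rsi,%rax,8) */
    }
    return carry;                       /* what happens to it is not shown */
}
```

The point of interest is the `if`: with -fno-tree-loop-if-convert the compiler keeps it as a compare and branch (jbe .L38), which is the version that ran fast here.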