https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103903
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
If you fix the loop to do

  for (i=0;i<100000;i++)
  {
    dest[i].r/=src[i].r;
    dest[i].g/=src[i].g;
    dest[i].b/=src[i].b;
  }

it's vectorized just fine (with larger than necessary VF):

.L2:
        movaps  dest+16(%rax), %xmm1
        movaps  dest+32(%rax), %xmm0
        addq    $48, %rax
        divps   src-32(%rax), %xmm1
        movaps  dest-48(%rax), %xmm2
        divps   src-16(%rax), %xmm0
        divps   src-48(%rax), %xmm2
        movaps  %xmm1, dest-32(%rax)
        movaps  %xmm2, dest-48(%rax)
        movaps  %xmm0, dest-16(%rax)
        cmpq    $1200000, %rax
        jne     .L2

so not sure what you are asking for?  Is the unrolling harmful?  It should be
doable to do the "re-rolling" on the fly in some cases but it might be some
work to tie that in.
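
For readers following along, here is a minimal sketch of what "larger than necessary VF" and "re-rolling" seem to refer to. It assumes each channel is divided by its own channel, as the assembly indicates. The assembly advances by 48 bytes per iteration, i.e. 12 floats or four structs, and issues three divps. Because every float is divided by the float at the same offset, the loop is equivalent to a flat element-wise division, which would need only one vector per iteration. The names struct rgb, union image, DEMO_N, div_per_struct and div_per_float are hypothetical and not part of the bug report; the array is kept small for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct rgb { float r, g, b; };

/* Assumption: no padding, so n structs occupy 3*n contiguous floats. */
static_assert(sizeof(struct rgb) == 3 * sizeof(float),
              "struct rgb must be three packed floats");

/* Hypothetical small size for illustration (the report uses 100000). */
#define DEMO_N 8

/* The same storage viewed as pixels or as a flat float array.
   Reading a union through a different member is permitted in C. */
union image {
    struct rgb px[DEMO_N];
    float flat[3 * DEMO_N];
};

/* Loop shape from the comment: three floats per scalar iteration. */
void div_per_struct(struct rgb *dest, const struct rgb *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dest[i].r /= src[i].r;
        dest[i].g /= src[i].g;
        dest[i].b /= src[i].b;
    }
}

/* "Re-rolled" shape: one float per scalar iteration, 3*n iterations.
   A vectorizer can cover this with one 4-float vector per iteration. */
void div_per_float(float *dest, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dest[i] /= src[i];
}
```

Calling div_per_struct(a.px, s.px, DEMO_N) and div_per_float(b.flat, s.flat, 3 * DEMO_N) on identically initialized union image objects should leave a and b with the same contents. Comparing the output of gcc -O3 -S for both functions shows whether a given GCC picks a different VF for each; the result depends on compiler version and target.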