https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57534
--- Comment #34 from bin cheng <amker at gcc dot gnu.org> ---
So we could have three different addressing modes here.

1. What we have now:

        leaq    0(,%rbp,8), %rax
        movsd   8(%rbx,%rax), %xmm0
        addsd   (%rbx,%rbp,8), %xmm0
        addq    $8, %rbp
        addsd   16(%rbx,%rax), %xmm0
        addsd   24(%rbx,%rax), %xmm0
        addsd   %xmm0, %xmm1
        movsd   32(%rbx,%rax), %xmm0
        addsd   40(%rbx,%rax), %xmm0
        addsd   48(%rbx,%rax), %xmm0
        addsd   56(%rbx,%rax), %xmm0
        addsd   %xmm0, %xmm2
        cmpq    %rsi, %rbp

2. GCC-4.7:

        fldl    (%esi,%ebx,8)
        lea     0x8(%ebx),%eax
        faddl   0x8(%esi,%ebx,8)
        cmp     %eax,%edi
        faddl   0x10(%esi,%ebx,8)
        faddl   0x18(%esi,%ebx,8)
        faddp   %st,%st(2)
        fldl    0x20(%esi,%ebx,8)
        faddl   0x28(%esi,%ebx,8)
        faddl   0x30(%esi,%ebx,8)
        faddl   0x38(%esi,%ebx,8)
        faddp   %st,%st(1)

3. With slsr change:

        leaq    0(%rbp,%rbx,8), %rax
        addq    $8, %rbx
        movsd   (%rax), %xmm0
        addsd   8(%rax), %xmm0
        addsd   16(%rax), %xmm0
        addsd   24(%rax), %xmm0
        addsd   %xmm0, %xmm1
        movsd   32(%rax), %xmm0
        addsd   40(%rax), %xmm0
        addsd   48(%rax), %xmm0
        addsd   56(%rax), %xmm0
        addsd   %xmm0, %xmm2
        cmpq    %rsi, %rbx

It was reported that 2. is better than 1., and Jeff recommended 3.
What I don't understand is:

A) Why is 2. better than 1.? Every memory operand in 2. uses the full
   base + index*8 + displacement form, so it seems to do more
   computation in its addresses.
B) Is 3. the best one? It has the simplest addressing mode (base +
   displacement only), but it does require one additional lea per
   iteration because of strength reduction.

Thanks.
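
For reference, here is a rough C-level sketch of how the three sequences form their addresses. It assumes the loop is an 8-way unrolled sum over an array of doubles with two accumulators, which is only my reading of the assembly above and not the actual testcase. The function names are made up for illustration, and the compiler is of course free to choose any addressing mode for any of the three, so this shows the shape of the address arithmetic rather than a way to force a particular code sequence.

```c
#include <stddef.h>

/* Helper: load a double from a byte address (hypothetical, for illustration). */
static double load_at(const char *p)
{
    return *(const double *)p;
}

/* Form 1: scale the index once (off = i*8), then use base + off + disp. */
double sum_form1(const double *a, size_t n)
{
    const char *base = (const char *)a;
    double s1 = 0.0, s2 = 0.0;
    for (size_t i = 0; i + 8 <= n; i += 8) {
        size_t off = i * sizeof(double);
        s1 += load_at(base + off) + load_at(base + off + 8)
            + load_at(base + off + 16) + load_at(base + off + 24);
        s2 += load_at(base + off + 32) + load_at(base + off + 40)
            + load_at(base + off + 48) + load_at(base + off + 56);
    }
    return s1 + s2;
}

/* Form 2: every access is base + i*8 + disp, with no separate offset. */
double sum_form2(const double *a, size_t n)
{
    double s1 = 0.0, s2 = 0.0;
    for (size_t i = 0; i + 8 <= n; i += 8) {
        s1 += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        s2 += a[i + 4] + a[i + 5] + a[i + 6] + a[i + 7];
    }
    return s1 + s2;
}

/* Form 3: compute p = base + i*8 once, then every access is p + disp. */
double sum_form3(const double *a, size_t n)
{
    double s1 = 0.0, s2 = 0.0;
    for (size_t i = 0; i + 8 <= n; i += 8) {
        const double *p = a + i;
        s1 += p[0] + p[1] + p[2] + p[3];
        s2 += p[4] + p[5] + p[6] + p[7];
    }
    return s1 + s2;
}
```

All three return the same value for the same input; they differ only in where the index scaling and the base addition happen, which is the question in A) and B).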