http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
--- Comment #12 from Steven Bosscher <steven at gcc dot gnu.org> 2013-02-14 17:10:02 UTC ---

It is a bit clearer with insn 195 added:

  195: flags:CC=cmp(r124:DI,r235:DI)
  197: r116:DI={(gtu(flags:CC,0))?r125:DI:r233:DI}
  199: {r110:DI=r110:DI+0x1;clobber flags:CC;}
  201: flags:CC=cmp(r124:DI,r235:DI)
  202: r221:DI={(gtu(flags:CC,0))?r126:DI:r124:DI}

Insns 195 and 201 compute the same condition, but GCC can't eliminate the comparison in insn 201 because insn 199 clobbers the flags (i.e. destroys the result of insn 195).

As for speed, of course I can measure that myself:

--- by-val-O3.s.orig 2013-02-14 18:06:56.000000000 +0100
+++ by-val-O3.s 2013-02-14 18:07:23.000000000 +0100
@@ -357,9 +357,8 @@
 	shrq	$32, %rdi
 	cmpq	%r8, %rdx
 	cmovbe	%r11, %rdi
-	addq	$1, %rax
-	cmpq	%r8, %rdx
 	cmovbe	%rdx, %rcx
+	addq	$1, %rax
 	cmpq	%rbp, %rax
 	movq	%rcx, -8(%rsi,%rax,8)
 	jne	.L50

unmodified: Took 14.31 seconds total.
modified:   Took 13.04 seconds total.

So regarding comment #9: it's not the problem, but it would be a small improvement (roughly 9% less time in this run).
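The argument about insns 195, 199 and 201 can be illustrated with a toy model. This is a hypothetical sketch, not GCC's compare-elimination code: the `Insn` class and `eliminate_redundant_compares` function are invented for illustration. It assumes a compare is redundant only when the flags still hold the same comparison, with no intervening flags clobber and no write to one of the compared registers. Moving the increment (insn 199) past insn 202, as the hand-edited assembly in the diff does, lets the second compare disappear.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Insn:
    # Hypothetical toy model of an RTL insn, not GCC's internal data structure.
    uid: int
    kind: str                       # "cmp" sets flags, "clobber" destroys them, "other" leaves them
    operands: Tuple[str, ...] = ()  # registers compared (only meaningful for "cmp")
    writes: Tuple[str, ...] = ()    # registers this insn assigns


def eliminate_redundant_compares(insns: List[Insn]) -> List[Insn]:
    """Drop a compare whose result is still live in the flags register."""
    out: List[Insn] = []
    live: Optional[Tuple[str, ...]] = None  # operands of the compare currently held in flags
    for insn in insns:
        if insn.kind == "cmp":
            if insn.operands == live:
                continue  # same condition already in flags: delete this compare
            live = insn.operands
        elif insn.kind == "clobber":
            live = None  # e.g. an add that clobbers flags:CC
        if live is not None and set(insn.writes) & set(live):
            live = None  # a compared register changed, so the old result is stale
        out.append(insn)
    return out


i195 = Insn(195, "cmp", ("r124", "r235"))
i197 = Insn(197, "other", writes=("r116",))    # conditional move reading gtu(flags)
i199 = Insn(199, "clobber", writes=("r110",))  # r110 += 1, clobbers flags
i201 = Insn(201, "cmp", ("r124", "r235"))
i202 = Insn(202, "other", writes=("r221",))    # conditional move reading gtu(flags)

as_emitted = [i195, i197, i199, i201, i202]
reordered = [i195, i197, i201, i202, i199]     # increment moved past the second cmov

print([i.uid for i in eliminate_redundant_compares(as_emitted)])  # [195, 197, 199, 201, 202]
print([i.uid for i in eliminate_redundant_compares(reordered)])   # [195, 197, 202, 199]
```

The reordering is only valid because the increment of r110 neither reads nor feeds either conditional move; the toy model does not check that dependence, it only tracks what the flags register holds.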