http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
--- Comment #8 from arturomdn at gmail dot com 2013-02-14 15:53:15 UTC ---
It is possible (just a guess) that the extra compare is causing an interlock in
the processor since the first cmov is issued speculatively and the condition
won't be confirmed until the first compare has executed.

Someone from Intel could tell us exactly why the original sequence is so
disastrous and suggest an alternative that still uses cmov and is better than
jmp.

I wonder if instead of emitting this sequence

    shr    $0x20,%rdi
    and    $0xffffffff,%ecx
    cmp    %r8,%rdx
    cmovbe %r11,%rdi
    add    $0x1,%rax
    cmp    %r8,%rdx
    cmovbe %rdx,%rcx

it would do this instead

    shr    $0x20,%rdi
    and    $0xffffffff,%ecx
    add    $0x1,%rax
    cmp    %r8,%rdx
    cmovbe %r11,%rdi
    cmovbe %rdx,%rcx
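
For what it's worth, here is a rough C model of the two sequences, only to check that the reordering leaves the same values in the registers. The second cmp in the original looks necessary only because add overwrites the flags between the two cmovs. Moving the add ahead of the cmp lets both cmovs read the flags from a single compare, since cmov itself does not modify flags. The struct and function names are made up for illustration. The model says nothing about timing, so it cannot show whether the reordered sequence is actually faster.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register snapshot; field names mirror the registers above. */
typedef struct {
    uint64_t rax, rcx, rdx, rdi, r8, r11;
} regs_t;

/* Model of the sequence as currently emitted (two compares). */
regs_t original_sequence(regs_t s) {
    s.rdi >>= 32;                        /* shr $0x20,%rdi */
    s.rcx &= 0xffffffffu;                /* and $0xffffffff,%ecx (zero-extends) */
    if (s.rdx <= s.r8) s.rdi = s.r11;    /* cmp %r8,%rdx ; cmovbe %r11,%rdi */
    s.rax += 1;                          /* add $0x1,%rax (overwrites flags) */
    if (s.rdx <= s.r8) s.rcx = s.rdx;    /* cmp %r8,%rdx ; cmovbe %rdx,%rcx */
    return s;
}

/* Model of the suggested sequence (add hoisted, one compare). */
regs_t proposed_sequence(regs_t s) {
    s.rdi >>= 32;                        /* shr $0x20,%rdi */
    s.rcx &= 0xffffffffu;                /* and $0xffffffff,%ecx */
    s.rax += 1;                          /* add $0x1,%rax */
    int below_or_equal = s.rdx <= s.r8;  /* cmp %r8,%rdx (unsigned compare) */
    if (below_or_equal) s.rdi = s.r11;   /* cmovbe %r11,%rdi */
    if (below_or_equal) s.rcx = s.rdx;   /* cmovbe %rdx,%rcx */
    return s;
}

/* Asserts that both models leave identical register contents. */
void check_equivalent(regs_t s) {
    regs_t o = original_sequence(s);
    regs_t p = proposed_sequence(s);
    assert(o.rax == p.rax && o.rcx == p.rcx && o.rdi == p.rdi);
    assert(o.rdx == p.rdx && o.r8 == p.r8 && o.r11 == p.r11);
}
```

Calling check_equivalent with one input where %rdx <= %r8 and one where it is not exercises both the taken and not-taken cases of the cmovbe pair.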