http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309



--- Comment #8 from arturomdn at gmail dot com 2013-02-14 15:53:15 UTC ---

It is possible (just a guess) that the extra compare is causing an interlock in
the processor since the first cmov is issued speculatively and the condition
won't be confirmed until the first compare has executed.  Someone from Intel
could tell us exactly why the original sequence is so disastrous and suggest an
alternative that still uses cmov and is better than jmp.



I wonder whether, instead of emitting this sequence



   shr    $0x20,%rdi
   and    $0xffffffff,%ecx
   cmp    %r8,%rdx
   cmovbe %r11,%rdi
   add    $0x1,%rax
   cmp    %r8,%rdx
   cmovbe %rdx,%rcx



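it could emit this one instead, hoisting the add (which clobbers the flags)
above the compare so that a single cmp feeds both cmovs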



   shr    $0x20,%rdi
   and    $0xffffffff,%ecx
   add    $0x1,%rax
   cmp    %r8,%rdx
   cmovbe %r11,%rdi
   cmovbe %rdx,%rcx
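
For reference, here is a rough C sketch of the shape of source that could lead
to a pair of cmovs on one condition.  It is not the code from this bug report:
the names select_both and pair_t and all the parameter names are made up for
illustration.  The point is only that both selects depend on the same unsigned
test (a <= b), so one compare is enough as long as nothing that writes the
flags is scheduled between the compare and the second select.  Whether a given
compiler really emits cmp plus two cmovbe for this is not guaranteed.

```c
#include <stdint.h>

/* Hypothetical result type: the two values being selected. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} pair_t;

/* Hypothetical helper: two selects sharing one unsigned condition.
   In terms of the sequences above, a plays the role of %rdx and
   b the role of %r8. */
static pair_t select_both(uint64_t a, uint64_t b,
                          uint64_t hi, uint64_t hi_alt,
                          uint64_t lo, uint64_t lo_alt)
{
    int take = (a <= b);          /* evaluated once: a single compare */
    pair_t r;
    r.hi = take ? hi_alt : hi;    /* candidate for the first cmovbe   */
    r.lo = take ? lo_alt : lo;    /* candidate for the second cmovbe  */
    return r;
}
```

Compiling something like this with -O2 -S and looking at the output is a quick
way to check whether the second compare shows up.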
