http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309



Steven Bosscher <steven at gcc dot gnu.org> changed:



           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |steven at gcc dot gnu.org



--- Comment #11 from Steven Bosscher <steven at gcc dot gnu.org> 2013-02-14 16:59:04 UTC ---

(In reply to comment #8)

> I wonder if instead of emitting this sequence
> 
>    shr    $0x20,%rdi
>    and    $0xffffffff,%ecx
>    cmp    %r8,%rdx
>    cmovbe %r11,%rdi
>    add    $0x1,%rax
>    cmp    %r8,%rdx
>    cmovbe %rdx,%rcx
> 
> it would do this instead
> 
>    shr    $0x20,%rdi
>    and    $0xffffffff,%ecx
>    add    $0x1,%rax
>    cmp    %r8,%rdx
>    cmovbe %r11,%rdi
>    cmovbe %rdx,%rcx



GCC fails to do so because the flags are clobbered between the two
cmovs (by the add, insn 199 below), so the compare is redone (insn 201)
and code motion cannot group the two cmovs:



  197: r116:DI={(gtu(flags:CC,0))?r125:DI:r233:DI}
  199: {r110:DI=r110:DI+0x1;clobber flags:CC;}
  201: flags:CC=cmp(r124:DI,r235:DI)
  202: r221:DI={(gtu(flags:CC,0))?r126:DI:r124:DI}



If you make this change manually in your code (compile with -S, "fix"
the .s file and assemble it), does that speed up your code?
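
For reference, a minimal C model of the two instruction orders quoted
from comment #8 may make it easier to see that the hand edit is safe.
It is only a sketch: the struct `regs` and the functions `as_emitted`
and `as_suggested` are hypothetical names, the fields are named after
the registers in the quoted assembly, and nothing here is taken from
the reporter's source. It models only the last few instructions (the
shr/and are identical in both sequences and are left out).

```c
#include <stdint.h>

/* Hypothetical model: one field per register written by the sequence. */
typedef struct {
    uint64_t rdi;
    uint64_t rcx;
    uint64_t rax;
} regs;

/* Order GCC emits: cmov, add (clobbers flags), second cmp, cmov. */
regs as_emitted(regs r, uint64_t rdx, uint64_t r8, uint64_t r11)
{
    if (rdx <= r8)      /* cmp %r8,%rdx ; cmovbe %r11,%rdi */
        r.rdi = r11;
    r.rax += 1;         /* add $0x1,%rax */
    if (rdx <= r8)      /* cmp %r8,%rdx again ; cmovbe %rdx,%rcx */
        r.rcx = rdx;
    return r;
}

/* Order suggested in comment #8: add first, one cmp, both cmovs. */
regs as_suggested(regs r, uint64_t rdx, uint64_t r8, uint64_t r11)
{
    r.rax += 1;                     /* add $0x1,%rax */
    int below_or_equal = rdx <= r8; /* single cmp %r8,%rdx */
    if (below_or_equal)             /* cmovbe %r11,%rdi */
        r.rdi = r11;
    if (below_or_equal)             /* cmovbe %rdx,%rcx */
        r.rcx = rdx;
    return r;
}
```

Both functions return the same values for any inputs, because the add
touches neither operand of the compare; only the instruction order
differs, which is what the timing experiment would measure.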
