http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309
Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |steven at gcc dot gnu.org

--- Comment #11 from Steven Bosscher <steven at gcc dot gnu.org> 2013-02-14 16:59:04 UTC ---
(In reply to comment #8)
> I wonder if instead of emitting this sequence
>
>   shr    $0x20,%rdi
>   and    $0xffffffff,%ecx
>   cmp    %r8,%rdx
>   cmovbe %r11,%rdi
>   add    $0x1,%rax
>   cmp    %r8,%rdx
>   cmovbe %rdx,%rcx
>
> it would do this instead
>
>   shr    $0x20,%rdi
>   and    $0xffffffff,%ecx
>   add    $0x1,%rax
>   cmp    %r8,%rdx
>   cmovbe %r11,%rdi
>   cmovbe %rdx,%rcx

GCC fails to do so because the flags are clobbered between the two cmovs,
preventing code motion to group the two cmovs:

  197: r116:DI={(gtu(flags:CC,0))?r125:DI:r233:DI}
  199: {r110:DI=r110:DI+0x1;clobber flags:CC;}
  201: flags:CC=cmp(r124:DI,r235:DI)
  202: r221:DI={(gtu(flags:CC,0))?r126:DI:r124:DI}

If you do this change manually in your code (compile with -S, "fix" the .s file
and assemble it), does that speed up your code?
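
For anyone wanting to try the experiment, here is a minimal C sketch of the kind
of source pattern under discussion: two selects on the same unsigned comparison
with an increment between them. The function name select_pair, the struct, and
all parameter names are hypothetical and are not taken from the reporter's code.
Whether GCC turns the selects into cmovs at all, and whether it duplicates the
cmp, depends on the GCC version and options used; this is an assumption to
check, not a confirmed reduction of the bug.

```c
#include <stdint.h>

/* Hypothetical reduction, not the reporter's code.  */
struct pair { uint64_t hi, lo; };

static struct pair
select_pair (uint64_t a, uint64_t b, uint64_t x, uint64_t y,
             uint64_t hi, uint64_t lo, uint64_t *count)
{
  struct pair r;

  r.hi = (a <= b) ? x : hi;   /* first select: candidate for a cmov */
  *count += 1;                /* on x86-64, add writes the flags register */
  r.lo = (a <= b) ? y : lo;   /* second select on the same condition */
  return r;
}
```

Compiling a file containing this with -O2 -S and reading the resulting .s file
should show whether the comparison is emitted once or twice around the add.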