On 05/07/2015 06:15 PM, H. Peter Anvin wrote:
> /* This case really should produce good code in both cases */
> 
> void good1(int x, int y)
> {
>   _Bool pf;
> 
>   asm("cmpl %2,%1"
>       : "=@ccp" (pf)
>       : "r" (x), "g" (y));
> 
>   if (pf)
>     beta();
> }
> 
> void bad1(int x, int y)
> {
>   _Bool le, pf;
> 
>   asm("cmpl %3,%2"
>       : "=@ccle" (le), "=@ccp" (pf)
>       : "r" (x), "g" (y));
> 
>   if (le)
>     alpha();
>   else if (pf)
>     beta();
> }

I have a feeling I know why these setcc insns didn't get merged with
their branches.

The global optimizers aren't allowed to operate on hard registers lest they
extend the lifetime of the hard register such that it creates an impossible
situation for the register allocator.  Think what would happen if EAX were
suddenly live across the entire function.

Similarly, combine is only allowed to merge insns involving hard registers
if the insns are sequential.  But if the insns aren't sequential, merging
them lengthens the lifetime of the hard register.  Now, I thought this
restriction didn't apply to fixed registers like esp or the flags, but
perhaps it does.
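
To make the lifetime hazard concrete, here's a contrived sketch of mine
(not from your testcases):

  /* Contrived: ask for a value to live in %eax across the function.
     Anything in between that also needs %eax -- the divide below,
     since idiv writes %eax -- now has to be allocated around it.
     The global optimizers avoid *creating* long hard-register
     lifetimes like this precisely because they can become impossible
     to allocate.  */
  int pinned(int a, int b)
  {
    register int acc asm("eax") = a;

    asm("" : "+r" (acc));     /* keep acc live in %eax here */

    return acc + a / b;       /* the divide also wants %eax */
  }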

Note what happens if you swap the order of le and pf in the asm:

  asm("cmpl %3,%2" : "=@ccp" (pf), "=@ccle" (le) : "r" (x), "g" (y));

the order of the two setcc insns is reversed, and then the setle is in fact
merged with the branch.
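
For completeness, the swapped variant as a standalone unit (alpha and
beta assumed extern, as in your snippets):

  extern void alpha(void);
  extern void beta(void);

  void bad1_swapped(int x, int y)
  {
    _Bool le, pf;

    /* Same compare as bad1; only the output order differs.  */
    asm("cmpl %3,%2"
        : "=@ccp" (pf), "=@ccle" (le)
        : "r" (x), "g" (y));

    if (le)
      alpha();
    else if (pf)
      beta();
  }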

Anyway, I'll look into whether the branch around alpha can be optimized,
but I'd be shocked if I were able to do anything about the branch around
beta.  True, there's nothing in between that clobbers the flags, so it
would be an excellent improvement, but combine doesn't work across basic
blocks, and changing that would be a major task.
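
In the meantime, if re-executing the compare is acceptable, one
workaround sketch (my suggestion, not something the compiler does for
you) is to split the asm so that each flag output is produced right
next to its consumer, where combine can reach it:

  extern void alpha(void);
  extern void beta(void);

  /* The compare runs twice, so this only wins when that's cheaper
     than the setcc/test/jne sequence it replaces.  */
  void bad1_split(int x, int y)
  {
    _Bool le, pf;

    asm("cmpl %2,%1" : "=@ccle" (le) : "r" (x), "g" (y));
    if (le) {
      alpha();
      return;
    }

    asm("cmpl %2,%1" : "=@ccp" (pf) : "r" (x), "g" (y));
    if (pf)
      beta();
  }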


> /* This case really is too much to ask... */
> 
> _Bool good2(int x, int y)
> {
>   _Bool le;
> 
>   asm("cmpl %2,%1"
>       : "=@ccle" (le)
>       : "r" (x), "g" (y));
> 
>   return le;
> }
> 
> _Bool bad2(int x, int y)
> {
>   _Bool zf, of, sf;
> 
>   asm("cmpl %4,%3"
>       : "=@ccz" (zf), "=@cco" (of), "=@ccs" (sf)
>       : "r" (x), "g" (y));
> 
>   return zf | (sf ^ of);
> }

Haha, yes.
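
For anyone following along, zf | (sf ^ of) is exactly how the x86
"less or equal" condition is defined (ZF set, or SF != OF), which a
quick harness of mine (not from the patch) can confirm:

  #include <assert.h>

  static _Bool le_direct(int x, int y)
  {
    _Bool le;
    asm("cmpl %2,%1" : "=@ccle" (le) : "r" (x), "g" (y));
    return le;
  }

  static _Bool le_rebuilt(int x, int y)
  {
    _Bool zf, of, sf;
    asm("cmpl %4,%3"
        : "=@ccz" (zf), "=@cco" (of), "=@ccs" (sf)
        : "r" (x), "g" (y));
    return zf | (sf ^ of);
  }

  int main(void)
  {
    /* Include the boundary values so the overflow cases are hit.  */
    int v[] = { -2, -1, 0, 1, 2, 0x7fffffff, -0x7fffffff - 1 };
    for (int i = 0; i < 7; i++)
      for (int j = 0; j < 7; j++)
        assert(le_direct(v[i], v[j]) == le_rebuilt(v[i], v[j]));
    return 0;
  }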

> /* One should expect this shouldn't produce *worse* code than the above... */
> 
> int good3(int x, int y, int a, int b)
> {
>   _Bool le;
> 
>   asm("cmpl %2,%1"
>       : "=@ccle" (le)
>       : "r" (x), "g" (y));
> 
>   return le ? b : a;
> }
> 
> int bad3(int x, int y, int a, int b)
> {
>   _Bool zf, of, sf;
> 
>   asm("cmpl %4,%3"
>       : "=@ccz" (zf), "=@cco" (of), "=@ccs" (sf)
>       : "r" (x), "g" (y));
> 
>   return zf | (sf ^ of) ? b : a;
> }

This is a case of the optimizers thinking they're helping you by not folding
too much computation into a condition.

If you use -mbranch-cost=4, you'll get the cmovne that you expect.
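
For instance (file name mine):

  /* bad3.c -- try: gcc -O2 -mbranch-cost=4 -S bad3.c
     With the branch cost raised, the combined condition is cheap
     enough relative to a branch that if-conversion selects it with
     cmovne instead.  */
  int bad3(int x, int y, int a, int b)
  {
    _Bool zf, of, sf;

    asm("cmpl %4,%3"
        : "=@ccz" (zf), "=@cco" (of), "=@ccs" (sf)
        : "r" (x), "g" (y));

    return zf | (sf ^ of) ? b : a;
  }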


r~
