http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48297

           Summary: Suboptimal optimization of boolean expression addition
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: darkshik...@gmail.com


gcc does not seem to recognize that boolean expressions, such as (a==b), are at
most 1.  At least not when performing math on them.

Take the following function:

int foo( int a, int b, int c, int x )
{
    return (a==x)+(b==x)+(c==x);
}

With -O3 -fomit-frame-pointer -march=core2 on x86_64, gcc compiles this to:

   0:   39 cf                   cmp    edi,ecx
   2:   40 0f 94 c7             sete   dil
   6:   31 c0                   xor    eax,eax
   8:   39 ce                   cmp    esi,ecx
   a:   40 0f b6 ff             movzx  edi,dil
   e:   0f 94 c0                sete   al
  11:   01 f8                   add    eax,edi
  13:   39 ca                   cmp    edx,ecx
  15:   0f 94 c2                sete   dl
  18:   0f b6 d2                movzx  edx,dl
  1b:   01 d0                   add    eax,edx
  1d:   c3                      ret

As can be seen, gcc extends the inputs to 32-bit (with xor or movzx) and uses
32-bit additions, even though it could avoid this by using 8-bit additions and
a single movzx at the end.

Let's try to force it:

int bar( int a, int b, int c, int x )
{
    return
(uint8_t)((uint8_t)((uint8_t)(a==x)+(uint8_t)(b==x))+(uint8_t)(c==x));
}

0000000000000020 <bar>:
  20:   39 ce                   cmp    esi,ecx
  22:   40 0f 94 c6             sete   sil
  26:   39 cf                   cmp    edi,ecx
  28:   0f 94 c0                sete   al
  2b:   01 f0                   add    eax,esi
  2d:   39 ca                   cmp    edx,ecx
  2f:   0f 94 c2                sete   dl
  32:   01 d0                   add    eax,edx
  34:   0f b6 c0                movzx  eax,al
  37:   c3                      ret

Closer -- we saved two instructions -- but this is actually worse: the
additions will have false dependencies on the previous contents of those
registers.  What we WANT is this:

cmp    esi,ecx
sete   sil
cmp    edi,ecx
sete   al
add    al,sil
cmp    edx,ecx
sete   dl
add    al,dl
movzx  eax,al
ret

But gcc won't generate it no matter how much I attempt to massage it into doing
so.

Is this a missing optimization or an optimization bug?

Reply via email to