https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97743
--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
What about:
        movzbl  %dil, %eax
        negl    %eax
        andl    $743, %eax
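As a C sketch, that corresponds to something along these lines (assuming the
incoming argument is already a 0/1 value; the name fbool is just for
illustration):

int fbool(_Bool b)
{
  int t = -(int)b;   /* 0 stays 0, 1 becomes all-ones */
  return t & 743;    /* the mask then selects 0 or 743 */
}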
That movzbl/neg/and sequence would be 3 cycles: definitely better than the cmov
case, and maybe one cycle better than the imul case. It all depends on where the
argument is coming from, really. If it comes from a comparison, then we have
these three choices:
int fm(int a, int d)
{
  int b = a == d;
  return b * 743;
}
int fc(int a, int d)
{
  return a == d ? 0 : 743;
}
int fand(int a, int d)
{
  int b = a == d;
  int t = -(b & 1);
  return t & 743;
}
Producing:
fm:
        xorl    %eax, %eax
        cmpl    %esi, %edi
        sete    %al
        imull   $743, %eax, %eax
        ret
fc:
        xorl    %eax, %eax
        movl    $743, %edx
        cmpl    %esi, %edi
        cmovne  %edx, %eax
        ret
fand:
        xorl    %eax, %eax
        cmpl    %esi, %edi
        sete    %al
        negl    %eax
        andl    $743, %eax
        ret
For aarch64 we get:
fm:
        cmp     w0, w1
        mov     w0, 743
        cset    w1, eq
        mul     w0, w1, w0
        ret
fc:
        cmp     w0, w1
        mov     w0, 743
        csel    w0, w0, wzr, ne
        ret
fand:
        cmp     w0, w1
        mov     w0, 743
        csetm   w1, eq
        and     w0, w1, w0
        ret
For aarch64, the csel version is the faster one, as cset is just a csel with an
increment.
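For reference (my reading of the standard aarch64 assembler aliases; worth
double-checking against the architecture manual), cset and csetm expand like
this:

        cset    w1, eq    // alias of: csinc w1, wzr, wzr, ne
        csetm   w1, eq    // alias of: csinv w1, wzr, wzr, ne

So fm and fand each pay for one extra dependent instruction (the mul or the
and) on top of the conditional select, while fc needs only the csel itself.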
For x86, it all depends on the cmov performance; I do think the neg/and case is
still faster than the imull case.