"Andrew Pinski" <pins...@gmail.com> wrote:

> On Sat, May 27, 2023 at 3:54 PM Stefan Kanthak <stefan.kant...@nexgo.de> wrote:

>> Nevertheless GCC fails to optimise code properly:
>>
>> --- .c ---
>> int ispowerof2(unsigned long long argument) {
>>     return __builtin_popcountll(argument) == 1;
>> }
>> --- EOF ---
>>
>> GCC 13.3    gcc -m32 -mpopcnt -O3
>>
>> https://godbolt.org/z/fT7a7jP4e
>> ispowerof2(unsigned long long):
>>         xor     eax, eax
>>         xor     edx, edx
>>         popcnt  eax, [esp+4]
>>         popcnt  edx, [esp+8]
>>         add     eax, edx                 # eax is at most 64!
>>         cmp     eax, 1    ->    dec eax  # 2 bytes shorter
>>         sete    al
>>         movzx   eax, al
>>         ret
>
> dec eax is done for -Os already.
> -O2 means optimise for performance; it does not mean reduce size.

But -Os is supposed to optimise for size? REALITY CHECK:

GCC 13.3    gcc -m32 -mpopcnt -Os

https://godbolt.org/z/41Ed6rr6r
ispowerof2(unsigned long long):
        push    ebp
        mov     ebp, esp
        sub     esp, 16
        push    [ebp+12]
        push    [ebp+8]
        call    __popcountdi2
        add     esp, 16        # superfluous!
        leave
        dec     eax
        sete    al
        movzx   eax, al
        ret

While -O3 generates 9 instructions in a total of 24 bytes,
-Os[lowmotion] generates 12 instructions in 29 bytes,
PLUS the uncounted instructions/bytes of __popcountdi2()!
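For readers who don't speak x86, here is a minimal C sketch of what the -m32 -mpopcnt -O3 listing computes. It assumes only C99 plus GCC's __builtin_popcountll. The names bitcount32, ispowerof2_split and selfcheck are hypothetical and exist only for illustration. bitcount32 is a portable stand-in for the popcnt instruction, and for the work __popcountdi2 has to do in software. It is not GCC's or libgcc's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: portable 32-bit bit count, standing in for
   the popcnt instruction (not libgcc's __popcountdi2 itself). */
static unsigned bitcount32(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Mirrors the -m32 -O3 code: count the low and high 32-bit halves
   separately, then add them. The sum is always in [0, 64]. */
static int ispowerof2_split(unsigned long long argument)
{
    unsigned sum = bitcount32((uint32_t)argument)
                 + bitcount32((uint32_t)(argument >> 32));
    /* sum - 1 == 0 exactly when sum == 1; this is why "dec eax"
       can set ZF in place of "cmp eax, 1". For sum == 0 the
       unsigned subtraction wraps to UINT_MAX, which is nonzero. */
    return sum - 1 == 0;
}

/* Cross-check against the original one-liner for a few inputs. */
static void selfcheck(void)
{
    static const unsigned long long samples[] = {
        0, 1, 2, 3, 0x80000000ULL, 0x100000000ULL,
        0x100000001ULL, 0x8000000000000000ULL, ~0ULL
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(ispowerof2_split(samples[i]) ==
               (__builtin_popcountll(samples[i]) == 1));
}
```

Splitting into two 32-bit halves is simply what a 32-bit target has to do with a 64-bit operand. The single compare at the end works because the sum of both counts can never exceed 64.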

This is what I call an EPIC FAILURE!

Stefan
