"Andrew Pinski" <pins...@gmail.com> wrote:
> On Sat, May 27, 2023 at 3:54 PM Stefan Kanthak <stefan.kant...@nexgo.de>
> wrote:
>> Nevertheless GCC fails to optimise code properly:
>>
>> --- .c ---
>> int ispowerof2(unsigned long long argument) {
>>     return __builtin_popcountll(argument) == 1;
>> }
>> --- EOF ---
>>
>> GCC 13.3    gcc -m32 -mpopcnt -O3
>>
>> https://godbolt.org/z/fT7a7jP4e
>> ispowerof2(unsigned long long):
>>         xor     eax, eax
>>         xor     edx, edx
>>         popcnt  eax, [esp+4]
>>         popcnt  edx, [esp+8]
>>         add     eax, edx                 # eax is less than 64!
>>         cmp     eax, 1    ->    dec eax  # 2 bytes shorter
>>         sete    al
>>         movzx   eax, al
>>         ret
>
> dec eax is done for -Os already.
> -O2 means performance, it does not mean decrease size.

But -Os is supposed to optimise for size? REALITY CHECK:

GCC 13.3    gcc -m32 -mpopcnt -Os

https://godbolt.org/z/41Ed6rr6r
ispowerof2(unsigned long long):
        push    ebp
        mov     ebp, esp
        sub     esp, 16
        push    [ebp+12]
        push    [ebp+8]
        call    __popcountdi2
        add     esp, 16                 # superfluous!
        leave
        dec     eax
        sete    al
        movzx   eax, al
        ret

While -O3 generates 9 instructions in a total of 28 bytes,
-Os[lowmotion] generates 12 instructions in 29 bytes,
PLUS the uncounted instructions/bytes of __popcountdi2()!

This is what I call an EPIC FAILURE!

Stefan
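
For anyone who wants to reproduce the comparison, here is a minimal sketch that is not part of the original message. It repeats ispowerof2() as posted and adds two hypothetical helpers, ispowerof2_halves() and ispowerof2_bitmask(), whose names are made up for illustration. The first spells out in C what the -O3 listing does (one population count per 32-bit half, then an add and a compare against 1). The second is the well-known x & (x - 1) test, which needs no population count at all. The sketch assumes GCC or a compatible compiler that provides __builtin_popcount() and __builtin_popcountll().

```c
#include <assert.h>
#include <stdint.h>

/* The function from the message, unchanged. */
int ispowerof2(unsigned long long argument) {
    return __builtin_popcountll(argument) == 1;
}

/* Hypothetical helper: counts the set bits of each 32-bit half
 * separately and adds them, as the two popcnt instructions and
 * the add in the -O3 listing do. */
int ispowerof2_halves(unsigned long long argument) {
    uint32_t low  = (uint32_t)argument;
    uint32_t high = (uint32_t)(argument >> 32);
    return __builtin_popcount(low) + __builtin_popcount(high) == 1;
}

/* Hypothetical helper: a power of two has exactly one set bit, so
 * clearing the lowest set bit of a nonzero value must yield zero. */
int ispowerof2_bitmask(unsigned long long argument) {
    return argument != 0 && (argument & (argument - 1)) == 0;
}

/* Checks that all three variants agree for one input value. */
void check_agreement(unsigned long long argument) {
    int expected = ispowerof2(argument);
    assert(ispowerof2_halves(argument) == expected);
    assert(ispowerof2_bitmask(argument) == expected);
}
```

Compiling each variant with gcc -m32 -mpopcnt -Os and with -O3 and comparing the listings should show whether the call to __popcountdi2 appears only for the 64-bit builtin. I have not verified what GCC emits for the two helpers.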