--- .c ---
int ispowerof2(unsigned long long argument) {
    return __builtin_popcountll(argument) == 1;
}
--- EOF ---
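
For reference, a minimal sketch of what the 32-bit listings compute: with
-m32 the 64-bit population count is taken as the sum of the population
counts of the two 32-bit halves. The names ispowerof2_split,
ispowerof2_bittrick and check_equivalence are hypothetical helpers added
for illustration only; they are not part of the report, and the sketch
assumes a compiler that provides the GCC builtins __builtin_popcount and
__builtin_popcountll.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The function from the report, unchanged in behaviour. */
int ispowerof2(unsigned long long argument) {
    return __builtin_popcountll(argument) == 1;
}

/* Hypothetical helper: popcount of each 32-bit half, then add and
   compare with 1, mirroring the two popcnt instructions plus add. */
int ispowerof2_split(unsigned long long argument) {
    uint32_t low  = (uint32_t)argument;
    uint32_t high = (uint32_t)(argument >> 32);
    return __builtin_popcount(low) + __builtin_popcount(high) == 1;
}

/* Hypothetical helper: the classic test without popcount, used here
   only as an independent cross-check. */
int ispowerof2_bittrick(unsigned long long argument) {
    return argument != 0 && (argument & (argument - 1)) == 0;
}

/* Hypothetical helper: all three variants must agree on every sample. */
void check_equivalence(void) {
    static const unsigned long long samples[] = {
        0ULL, 1ULL, 2ULL, 3ULL, 1ULL << 31, 1ULL << 32,
        (1ULL << 32) | 1ULL, 1ULL << 63, ~0ULL
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        int expected = ispowerof2(samples[i]);
        assert(ispowerof2_split(samples[i]) == expected);
        assert(ispowerof2_bittrick(samples[i]) == expected);
    }
}
```

Calling check_equivalence() from a test driver is enough to confirm that
the split form matches the builtin on the sample values, including the
ones that straddle the 32-bit boundary.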

GCC 13.3    gcc -m32 -march=alderlake -O3
            gcc -m32 -march=sapphirerapids -O3
            gcc -m32 -mpopcnt -mtune=sapphirerapids -O3

https://gcc.godbolt.org/z/cToYrrYPq
ispowerof2(unsigned long long):
        xor     eax, eax        # superfluous
        xor     edx, edx        # superfluous
        popcnt  eax, [esp+4]
        popcnt  edx, [esp+8]
        add     eax, edx
        cmp     eax, 1          ->  dec     eax
        sete    al
        movzx   eax, al         # superfluous
        ret

9 instructions in 28 bytes      # 6 instructions in 20 bytes

OUCH: popcnt writes the WHOLE result register, there is ABSOLUTELY
      no need to clear it beforehand nor to clear the higher 24 bits
      afterwards!

JFTR: before GCC zealots write nonsense: see -march= or -mtune=

GCC 13.3    gcc -mpopcnt -mtune=barcelona -O3

https://gcc.godbolt.org/z/3Ks8vh7a6
ispowerof2(unsigned long long):
        popcnt  rdi, rdi        ->  popcnt  rax, rdi
        xor     eax, eax        # superfluous!
        dec     edi             ->  dec     eax
        sete    al              ->  setz    al
        ret

GCC 13.3    gcc -m32 -mpopcnt -mtune=barcelona -O3

https://gcc.godbolt.org/z/s5s5KTGnv
ispowerof2(unsigned long long):
        popcnt  eax, [esp+4]
        popcnt  edx, [esp+8]
        add     eax, edx
        dec     eax
        sete    al
        movzx   eax, al         # superfluous!
        ret

Will GCC eventually generate properly optimised code instead of bloat?

Stefan