"Andrew Pinski" <pins...@gmail.com> wrote: > On Sat, May 27, 2023 at 2:25 PM Stefan Kanthak <stefan.kant...@nexgo.de> > wrote: >> >> Just to show how SLOPPY, INCONSEQUENTIAL and INCOMPETENT GCC's developers >> are: >> >> --- dontcare.c --- >> int ispowerof2(unsigned __int128 argument) { >> return __builtin_popcountll(argument) + __builtin_popcountll(argument >> >> 64) == 1; >> } >> --- EOF --- >> >> GCC 13.3 gcc -march=haswell -O3 >> >> https://gcc.godbolt.org/z/PPzYsPzMc >> ispowerof2(unsigned __int128): >> popcnt rdi, rdi >> popcnt rsi, rsi >> add esi, edi >> xor eax, eax >> cmp esi, 1 >> sete al >> ret >> >> OOPS: what about Intel's CPU errata regarding the false dependency on >> POPCNTs output? > > Because the popcount is going to the same register, there is no false > dependency .... > The false dependency errata only applies if the result of the popcnt > is going to a different register, the processor thinks it depends on > the result in that register from a previous instruction but it does > not (which is why it is called a false dependency). In this case it > actually does depend on the previous result since the input is the > same as the input.

OUCH, my fault; sorry for the confusion and the wrong accusation.

Nevertheless GCC fails to optimise code properly:

--- .c ---
int ispowerof2(unsigned long long argument) {
    return __builtin_popcountll(argument) == 1;
}
--- EOF ---

GCC 13.3    gcc -m32 -mpopcnt -O3

https://godbolt.org/z/fT7a7jP4e
ispowerof2(unsigned long long):
        xor     eax, eax
        xor     edx, edx
        popcnt  eax, [esp+4]
        popcnt  edx, [esp+8]
        add     eax, edx                 # eax is at most 64!
        cmp     eax, 1    ->    dec eax  # 2 bytes shorter
        sete    al
        movzx   eax, al                  # superfluous
        ret

5 bytes and 1 instruction saved; 5 bytes here and there accumulate to
kilo- or even megabytes, and they can extend code to cross a cache line
or a 16-byte alignment boundary.

JFTR: same for "__builtin_popcount(argument) == 1;" and 32-bit argument

JFTR: GCC is notorious for generating superfluous MOVZX instructions
      where its optimiser SHOULD be able to see that the value is already
      less than 256!

Stefan
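
A minimal C sketch of the range argument behind the -m32 listing above,
assuming a compiler that provides the GCC-style __builtin_popcount (GCC or
Clang). The helper names popcount64_by_halves and ispowerof2_by_halves are
made up for illustration and do not appear in the post. The code mirrors
the two 32-bit POPCNT instructions and shows why their sum always fits in
the low byte of a register:

```c
#include <stdint.h>

/* Population count of a 64-bit value computed from its two 32-bit halves,
   mirroring the two POPCNT instructions in the -m32 listing.
   Hypothetical helper, not part of the original post. */
static unsigned popcount64_by_halves(uint64_t x)
{
    unsigned lo = (unsigned)__builtin_popcount((uint32_t)x);
    unsigned hi = (unsigned)__builtin_popcount((uint32_t)(x >> 32));
    /* Each half contributes 0..32 set bits, so the sum lies in 0..64
       and therefore fits in 8 bits with all upper bits zero. */
    return lo + hi;
}

/* Same predicate as ispowerof2() in the post: exactly one bit set.
   Hypothetical helper name. */
static int ispowerof2_by_halves(uint64_t x)
{
    return popcount64_by_halves(x) == 1;
}
```

Calling ispowerof2_by_halves(1ULL << 40) yields 1, while 0 and 3 both
yield 0; popcount64_by_halves never returns more than 64.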