https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119468
--- Comment #4 from Jens Seifert ---
clang is emitting extended mnemonics.
On gcc, I can only enforce this by using inline assembly:
unsigned long long parityfast(unsigned long long in)
{
    /* "+r" makes `in` both input and output, so only operand %0 exists. */
    __asm__("popcntd %0,%0" : "+r"(in));
    return in & 1;
}
--- Comment #3 from Segher Boessenkool ---
prtyd and popcntb are executed similarly on all hardware: same execution pipes.
The extsw we currently generate is not needed at all, a very common and
well-known issue, generic as well (not really rs60
--- Comment #2 from Jens Seifert ---
popcnt + parity is slower than just doing a 64-bit popcount and
extracting the last bit.
The "missed-optimization" opportunity applies to big endian as well.
Optimal code:
popcntd 3, 3
clrldi 3, 3, 63
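
For reference, the two-instruction sequence above corresponds to the following C, given here as a minimal sketch. The function names parity_via_popcount and parity_builtin are hypothetical and used only for illustration; __builtin_popcountll and __builtin_parityll are the existing GCC/clang builtins. Whether a given compiler version actually emits popcntd + clrldi for either function depends on the target flags (e.g. -mcpu) and is not claimed here.

```c
#include <stdint.h>

/* Parity as the low bit of a 64-bit population count.
   On Power this is the shape that maps onto popcntd + clrldi. */
static inline unsigned parity_via_popcount(uint64_t x)
{
    return (unsigned)__builtin_popcountll(x) & 1u;
}

/* The builtin the compiler is free to expand however it likes;
   it should always agree with the version above. */
static inline unsigned parity_builtin(uint64_t x)
{
    return (unsigned)__builtin_parityll(x);
}
```

Comparing the assembly of both functions (for example with -O2 -S) is a quick way to see whether the missed optimization is still present.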
Segher Boessenkool changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org
--- Comm