[Bug target/119468] PPCLE: Inefficient implementation of __builtin_parityll

2025-04-09 Thread jens.seifert at de dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119468 --- Comment #4 from Jens Seifert --- clang is emitting extended mnemonics. On gcc, I can only enforce this by using inline assembly: unsigned long long parityfast(unsigned long long in) { __asm__("popcntd %0,%1":"+r"(in)); return in & 1
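
For readers following along, here is a minimal sketch of the inline-assembly idea from the comment above. It is not the code from the bug report: the name parity_popcntd is hypothetical, the asm uses separate output and input operands, and it assumes a 64-bit PowerPC target that supports popcntd. On any other target the sketch simply falls back to __builtin_parityll.

```c
#include <stdint.h>

/* Hypothetical helper, loosely modelled on the parityfast() idea quoted
   in the comment. Assumption: when __powerpc64__ is defined, the target
   CPU and assembler accept the popcntd instruction. */
static inline uint64_t parity_popcntd(uint64_t in)
{
#if defined(__powerpc64__)
    uint64_t count;
    /* popcntd counts the set bits of the 64-bit source register. */
    __asm__("popcntd %0,%1" : "=r"(count) : "r"(in));
    /* The parity is the lowest bit of the population count. */
    return count & 1;
#else
    /* Not 64-bit PowerPC: let the compiler pick its own sequence. */
    return (uint64_t)__builtin_parityll(in);
#endif
}
```

Keeping the input and the result in separate operands leaves register allocation to the compiler. Whether this is actually faster than the builtin on a given core would still need measuring.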

[Bug target/119468] PPCLE: Inefficient implementation of __builtin_parityll

2025-04-09 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119468 --- Comment #3 from Segher Boessenkool --- prtyd and popcntb are executed similarly on all hardware: same execution pipes. The extsw we currently generate is not needed at all, a very common and well-known issue, generic as well (not really rs60

[Bug target/119468] PPCLE: Inefficient implementation of __builtin_parityll

2025-04-09 Thread jens.seifert at de dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119468 --- Comment #2 from Jens Seifert --- popcnt + parity is slower than just a 64-bit popcount followed by extracting the last bit. The "missed-optimization" opportunity applies to big endian as well. Optimal code: popcntd 3, 3; clrldi 3, 3, 63
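
The equivalence the comment relies on (parity is the lowest bit of the 64-bit population count) can be sketched in portable C with the GCC/clang builtins. The function names below are illustrative only and do not appear in the bug report.

```c
#include <stdint.h>

/* Parity computed as the low bit of the 64-bit population count.
   This is the C-level shape of "popcntd; clrldi ...,63": count the
   set bits, then keep only bit 0. */
static inline int parity_via_popcount(uint64_t x)
{
    return __builtin_popcountll(x) & 1;
}

/* Reference result from the builtin the bug report is about. */
static inline int parity_via_builtin(uint64_t x)
{
    return __builtin_parityll(x);
}
```

Both functions return 1 when an odd number of bits is set and 0 otherwise, so a compiler is free to emit the shorter sequence for either one.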

[Bug target/119468] PPCLE: Inefficient implementation of __builtin_parityll

2025-04-09 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119468 Segher Boessenkool changed: What|Removed |Added CC||segher at gcc dot gnu.org --- Comm