https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119468
--- Comment #2 from Jens Seifert <jens.seifert at de dot ibm.com> ---
popcnt + parity is slower than just 64-bit popcount and extracting last bit.

"missed-optimization" opportunity applies as well to big endian.

Optimal code:
        popcntd 3, 3
        clrldi 3, 3, 63
        blr

current code:
        popcntb 3,3
        prtyd 3,3
        extsw 3,3
        blr

prtyd has longer latency than clrldi.
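
For readers who want the same idea at the C level, here is a minimal sketch. It assumes the report concerns a 64-bit parity computation such as GCC's __builtin_parityll; the comment itself does not show the C source. The helper names (parity_builtin, parity_via_popcount, check_parity_forms) are hypothetical and only serve the illustration. The second function spells parity as "64-bit popcount, then keep the last bit", which is the shape of the popcntd + clrldi sequence proposed above. Which instructions a given compiler version actually emits for either form is not claimed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parity via the GCC builtin: returns 1 if an odd number of bits are set. */
static int parity_builtin(uint64_t x)
{
    return __builtin_parityll(x);
}

/* Parity written as the low bit of the 64-bit population count.
 * This mirrors the "popcount, then extract last bit" form from the comment. */
static int parity_via_popcount(uint64_t x)
{
    return __builtin_popcountll(x) & 1;
}

/* Hypothetical self-check: both forms must agree on a few sample values. */
static void check_parity_forms(void)
{
    const uint64_t samples[] = {
        0u, 1u, 3u, 0xFFu, 0x8000000000000000ull, UINT64_MAX
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(parity_builtin(samples[i]) == parity_via_popcount(samples[i]));
}
```

Both functions compute the same value, so comparing their generated assembly (for example with -O2 -S on a powerpc64 target) is one way to see whether the shorter sequence is produced.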