https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119912
            Bug ID: 119912
           Summary: PPC: Inefficient vector immediate shifts
           Product: gcc
           Version: 14.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jens.seifert at de dot ibm.com
  Target Milestone: ---

Shifts by <element bit width>-1 should be performed with a 0xFF..FF constant, as
PPC vector shifts take the shift amount modulo the element width and generating
0xFF..FF requires just 1 instruction.

On Power9, always use a byte mask for the shift amount so that xxspltib can be
used.
On Power8, use vspltisb for
- value ranges 0..15 and 16..31 for int
- value range 0..15 for short
- value range 0..7 for byte
- value ranges 0..15 and 48..63 for long long.

1 byte shift left as add is already done by gcc.

Sample:

#include <altivec.h>

vector unsigned int shl31(vector unsigned int in)
{
   return vec_sl(in, (vector unsigned int)vec_splats((unsigned char)31));
}

Today on Power8/9:

shl31(unsigned int __vector(4)):
.LCF0:
0:      addis 2,12,.TOC.-.LCF0@ha
        addi 2,2,.TOC.-.LCF0@l
        addis 9,2,.LC0@toc@ha
        addi 9,9,.LC0@toc@l
        lxv 32,0(9)
        vslw 2,2,0
        blr

Should be done by:

Power8:
        vspltisw 0,-1
        vslw 2,2,0

Power9:
        xxspltib 34,31
        vslw 2,2,0
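
The value ranges quoted for vspltisb follow from the modulo behaviour of the shifts. Below is a minimal scalar sketch of that reasoning in portable C, not PowerPC code. The helper names (vslw_lane, vspltisb_byte, effective_shift) are made up for illustration and are not GCC or AltiVec APIs. It assumes the splat immediate is a 5-bit signed value (-16..15) and that each element uses only the low log2(width) bits of its shift count.

```c
#include <stdint.h>

/* Scalar model of one 32-bit lane of vslw (illustrative, not an intrinsic):
   only the low 5 bits of the count take part in the shift. */
static uint32_t vslw_lane(uint32_t value, uint32_t count)
{
    return value << (count & 31u);
}

/* Byte that a splat of a 5-bit signed immediate (-16..15) would place in
   every byte of the register: 0x00..0x0F or 0xF0..0xFF. */
static uint8_t vspltisb_byte(int simm5)
{
    return (uint8_t)simm5;
}

/* Effective shift amount for an element of the given bit width
   (8, 16, 32 or 64) when the count is built from count_byte. */
static unsigned effective_shift(uint8_t count_byte, unsigned element_bits)
{
    return count_byte & (element_bits - 1u);
}
```

With this model, an all-ones count shifts a 32-bit lane by 31, the same as a count of 31. Immediates -16..-1 give effective shifts 16..31 for int and 48..63 for long long, while short and byte elements only ever see 0..15 and 0..7.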