https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117007
--- Comment #6 from Steven Munroe <munroesj at gcc dot gnu.org> ---

I am starting to see a pattern, and wonder if the compiler is confused by assuming the shift count must match the width/type of the shift/rotate target.

This is implied all the way back to the Altivec-PIM, the current Intrinsic Reference, and the GCC documentation. The intrinsics vec_rl(), vec_sl(), vec_sr(), and vec_sra() all require that the shift count be the same (unsigned) type (element size) as the shifted/rotated value.

This might confuse the compiler into thinking it MUST properly (zero/sign) extend any shift count. But that is wrong.

The PowerISA only requires that the shift count be in the low-order bits (3 to 7 bits, depending on element size) of each element. Any high-order element bits are don't-care.

So the shift count (operand b) could easily be a vector unsigned char (byte elements). In fact vec_sll(), vec_slo(), vec_srl(), and vec_sro() allow this.

So the compiler can correctly use vspltisb, vspltish, vspltisw, or xxspltib for any vector shift/rotate where the shift count is a compile-time constant. This is always less and faster code than loading vector constants from .rodata.
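To illustrate the point, here is a minimal scalar sketch in portable C. It is only a model of one 32-bit lane, not real intrinsic code and not what GCC generates. The helper names (model_vslw_lane, model_byte_splat_lane, splat_count_matches) are hypothetical, and it assumes a word shift that uses the low-order 5 bits of the count and a byte splat of a 5-bit signed immediate in the range -16..15.

```c
#include <stdint.h>

/* Scalar model of one 32-bit lane of a word shift-left (vslw-style):
   only the low-order 5 bits of the count element are used. */
static uint32_t model_vslw_lane(uint32_t a, uint32_t count)
{
    return a << (count & 31u);
}

/* Hypothetical helper: the 32-bit image of a lane after a byte splat of a
   5-bit signed immediate (vspltisb-style, range -16..15). Every byte of
   the lane holds the sign-extended immediate. */
static uint32_t model_byte_splat_lane(int simm5)
{
    uint32_t byte = (uint32_t)simm5 & 0xFFu;
    return byte * 0x01010101u;
}

/* Returns 1 if a byte-splatted count shifts the same as a count that was
   zero-extended to the full word, 0 otherwise. */
static int splat_count_matches(uint32_t a, unsigned shift)
{
    /* Map a shift of 0..31 onto the signed 5-bit immediate range. */
    int simm5 = (shift < 16u) ? (int)shift : (int)shift - 32;
    uint32_t splat = model_byte_splat_lane(simm5);
    return model_vslw_lane(a, splat) == model_vslw_lane(a, shift);
}
```

For example, a shift count of 31 maps to the immediate -1, which puts 0xFF in every byte of the lane. The high-order bits are garbage from the point of view of a "properly extended" count, but the low-order 5 bits are still 31, so in this model the shift result is the same.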