https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118480
--- Comment #1 from Steven Munroe <munroesj at gcc dot gnu.org> ---

Strangely, the tricks that seem to work for positive immediate values (see test_slqi_char_18_V3 above) fail (generate a .rodata load) for negative values.

For example, the shift count for 110 (110-128 = -18):

vui8_t
test_splat1_char_110_V2 ()
{
  return vec_splats ((unsigned char)110);
}

test_splat1_char_110_V2:
        xxspltib 34,110
        blr

But it fails when the vec_splats result is passed to vec_slo/vec_sll:

vui128_t
test_slqi_char_110_V3 (vui128_t vra)
{
  vui8_t result;
  vui8_t tmp = vec_splats((unsigned char)110);
  result = vec_vslo ((vui8_t) vra, tmp);
  return (vui128_t) vec_vsl (result, tmp);
}

test_slqi_char_110_V3:
        addis 9,2,.LC9@toc@ha
        addi 9,9,.LC9@toc@l
        lxv 32,0(9)
        vslo 2,2,0
        vsl 2,2,0
        blr

Strangely, GCC plays along with the even (but negative) numbers trick. For example:

vui8_t
test_splat7_char_110_V0 ()
{
  // 110-128 = -18
  // (-18 / 2) + (-18 / 2)
  // (-9) + (-9)
  vui8_t tmp = vec_splat_u8(-9);
  return vec_add (tmp, tmp);
}

test_splat7_char_110_V0:
        xxspltib 34,247
        vaddubm 2,2,2
        blr

But it fails when this value is passed to vec_slo/vec_sll:

vui128_t
test_slqi_char_110_V2 (vui128_t vra)
{
  vui8_t result;
  vui8_t tmp = vec_splat_u8(-9);
  tmp = vec_vaddubm (tmp, tmp);
  result = vec_vslo ((vui8_t) vra, tmp);
  return (vui128_t) vec_vsl (result, tmp);
}

test_slqi_char_110_V2:
        addis 9,2,.LC11@toc@ha
        addi 9,9,.LC11@toc@l
        lxv 32,0(9)
        vslo 2,2,0
        vsl 2,2,0
        blr

Stranger yet, replacing the vaddubm with a shift left by 1:

vui8_t
test_splat7_char__110_V4 ()
{
  // 110 - 128 = -18
  // -18 = (-9 * 2) = (-9 << 1)
  vui8_t v1 = vec_splat_u8(1);
  vui8_t tmp = vec_splat_u8(-9);
  return vec_sl (tmp, v1);
}

test_splat7_char__110_V4:
.LFB34:
        .cfi_startproc
        xxspltib 34,247
        vaddubm 2,2,2
        blr

When this is passed to vec_slo/vec_sll, GCC avoids the conversion to .rodata, but converts the shift back to xxspltib/vaddubm.
This is slightly better but generates an extra (and unnecessary) instruction:

vui8_t
test_slqi_char_110_V4 (vui8_t vra)
{
  vui8_t result;
  // 110 - 128 = -18 = (-9 * 2) = (-9 << 1)
  vui8_t v1 = vec_splat_u8(1);
  vui8_t tmp = vec_splat_u8(-9);
  tmp = vec_sl (tmp, v1);
  result = vec_slo (vra, tmp);
  return vec_sll (result, tmp);
}

test_slqi_char_110_V4:
.LFB41:
        .cfi_startproc
        xxspltib 32,247
        vaddubm 0,0,0
        vslo 2,2,0
        vsl 2,2,0
        blr

Perhaps we are on to something!
- Avoid negative values
- Use an explicit shift instead of an add

So one last example, generating the 7-bit shift count as an octet count (times 8) plus a bit shift, and using only positive values:

vui8_t
test_splat7_char_110_V1 ()
{
  // 110 = (13 * 8) + 6
  vui8_t v3 = vec_splat_u8(3);
  vui8_t tmp = vec_splat_u8(13);
  vui8_t tmp2 = vec_splat_u8(6);
  tmp = vec_sl (tmp, v3);
  return vec_add (tmp, tmp2);
}

test_splat7_char_110_V1:
        xxspltib 34,110
        blr

And:

vui8_t
test_slqi_char_110_V5 (vui8_t vra)
{
  vui8_t result;
  // 110 = (13 * 8) + 6
  vui8_t v3 = vec_splat_u8(3);
  vui8_t tmp = vec_splat_u8(13);
  vui8_t tmp2 = vec_splat_u8(6);
  tmp = vec_sl (tmp, v3);
  tmp = vec_add (tmp, tmp2);
  result = vec_slo (vra, tmp);
  return vec_sll (result, tmp);
}

test_slqi_char_110_V5:
        xxspltib 32,110
        vslo 2,2,0
        vsl 2,2,0
        blr

Finally we have a reasonable result, one that should have been possible with a simple vec_splats((unsigned char)110)!

Note: this looks like a possible workaround for generating vectors splatted with positive constants. It still looks like a problem with negative constants persists.
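
For reference, the byte arithmetic behind these examples can be checked in portable C, without any vector intrinsics. This is only a sketch: it assumes the ISA description that vslo takes its octet count from bits 121:124 of the shift operand and vsl takes its bit count from bits 125:127, so for a splatted byte only the low 7 bits matter. The helper names effective_shift and check_shift_count_110 are hypothetical and do not come from the test case.

```c
#include <assert.h>

/* Hypothetical helper: effective quadword shift applied by a vslo + vsl
   pair when every byte of the shift operand holds the value b.
   Assumes vslo uses (b >> 3) & 0xF as an octet count and vsl uses
   b & 0x7 as a bit count. */
static unsigned effective_shift(unsigned char b)
{
    unsigned octets = (b >> 3) & 0xFu; /* consumed by vslo */
    unsigned bits   = b & 0x7u;        /* consumed by vsl  */
    return octets * 8u + bits;
}

/* Hypothetical helper: checks that the negative and positive
   constructions from the examples yield the same shift count. */
static unsigned check_shift_count_110(void)
{
    unsigned char neg9 = (unsigned char)-9;              /* 247, as in xxspltib 34,247 */
    unsigned char sum  = (unsigned char)(neg9 + neg9);   /* (-9) + (-9), modulo 256 */
    unsigned char shl  = (unsigned char)(neg9 << 1);     /* (-9) << 1, modulo 256 */
    unsigned char pos  = (unsigned char)((13 << 3) + 6); /* (13 * 8) + 6 */

    assert(neg9 == 247);
    assert(sum == 238 && shl == 238);
    assert(pos == 110);
    /* 238 and 110 differ only in the top bit, which neither vslo nor
       vsl reads, so both encode a shift of 110 bits. */
    assert(effective_shift(sum) == effective_shift(pos));
    return effective_shift(pos);
}
```

Calling check_shift_count_110() returns 110, which is why the xxspltib 34,247 / vaddubm sequence and a plain xxspltib 32,110 are interchangeable as shift counts for vslo/vsl.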