https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117007

--- Comment #15 from Steven Munroe <munroesj at gcc dot gnu.org> ---
Found where vec_splat_u32() constant shift counts are handled differently
across the various shift/rotate intrinsics.

Even for the 5-bit shift counts (the easy case) the behavior of the various
shift/rotate intrinsics is inconsistent. The compiler pays way too much
attention to how the shift count is generated, and does so differently for
shift left word vs. shift right word, and differently again for rotate left
word.

Any reasonable person would assume that using vec_splat_u32() for any shift
value 1 to 31 (splat immediates -16 to 15) will generate efficient code. And
it does for vec_vslw(), which generates two instructions (for example,
vspltisw v0,-16; vslw v2,v2,v0).
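
To spell out why a splat immediate of -16 serves as a shift count of 16: the
word shift/rotate instructions use only the low-order 5 bits of each word
element. Here is a minimal sketch in portable C that models just that
arithmetic. It does not use the PowerPC intrinsics, and the helper names
splat_imm_for_shift() and effective_shift() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the 5-bit signed immediate (-16..15) whose low
   five bits equal the requested shift count (0..31). */
static inline int splat_imm_for_shift(unsigned count)
{
    assert(count < 32);
    return count < 16 ? (int)count : (int)count - 32;
}

/* Hypothetical helper: the count a word shift/rotate actually applies,
   modeled as the low-order 5 bits of the 32-bit element. */
static inline unsigned effective_shift(int32_t element)
{
    return (uint32_t)element & 31u;
}
```

With these, splat_imm_for_shift(16) gives -16 and effective_shift(-16) gives
16, so every count 0 to 31 is reachable from a single splat immediate.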

But the compiler behaves differently for vec_vsrw() and vec_vsraw():
 - for values 1-15 it generates:
   - vspltisw v0,15; vsrw    v2,v2,v0
 - for even values between 16 - 30 it generates:
   - vspltisw v0,8; vadduwm v0,v0,v0; vsrw    v2,v2,v0
 - for odd values between 17 - 31 it generates a load from .rodata
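
The three cases above can be summarized as a small model in portable C. This
is only a sketch of the observed code generation, not of GCC's internals; the
enum and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the three code sequences listed above. */
enum srw_sequence { SPLAT_ONLY, SPLAT_AND_DOUBLE, RODATA_LOAD };

/* Which sequence was observed for a given constant shift count. */
static inline enum srw_sequence observed_srw_sequence(unsigned count)
{
    assert(count >= 1 && count <= 31);
    if (count <= 15)
        return SPLAT_ONLY;        /* vspltisw; vsrw */
    if (count % 2 == 0)
        return SPLAT_AND_DOUBLE;  /* vspltisw; vadduwm; vsrw */
    return RODATA_LOAD;           /* constant loaded from .rodata */
}

/* Per-word value after "vspltisw v0,imm; vadduwm v0,v0,v0". */
static inline uint32_t splat_then_double(int imm)
{
    uint32_t v = (uint32_t)(int32_t)imm;
    return v + v;
}

/* Low 5 bits of the single splat immediate (count - 32), which is the
   form the vec_vslw() case uses for counts 16..31. */
static inline uint32_t single_splat_low5(unsigned count)
{
    assert(count >= 16 && count <= 31);
    return (uint32_t)((int32_t)count - 32) & 31u;
}
```

For example, splat_then_double(8) is 16, matching the even case, while
single_splat_low5(17) is already 17 without any add or load.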

And it is positively strange for vec_vrlw():
 - for values 1-15 it generates:
   - vspltisw v0,15; vrlw    v2,v2,v0
 - but for any value between 16 - 31 it gets strange:
0000000000001200 <test_rlwi_16>:
    1200:       30 00 20 39     li      r9,48
    1204:       8c 03 00 10     vspltisw v0,0
    1208:       67 01 29 7c     mtvrd   v1,r9
    120c:       93 0a 21 f0     xxspltw vs33,vs33,1
    1210:       80 0c 00 10     vsubuwm v0,v0,v1
    1214:       84 00 42 10     vrlw    v2,v2,v0
    1218:       20 00 80 4e     blr
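
To check what count this sequence actually ends up with, here is a rough
sketch in portable C that replays the per-word arithmetic of the listing. It
assumes only that vrlw uses the low-order 5 bits of each word element; the
function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left using only the low 5 bits of the count, as vrlw does
   for each word element. */
static inline uint32_t rotl32(uint32_t x, uint32_t count)
{
    count &= 31u;
    return count ? (x << count) | (x >> (32u - count)) : x;
}

/* Per-word value left in v0 by the test_rlwi_16 listing. */
static inline uint32_t listing_rotate_count(void)
{
    uint32_t v0 = 0;   /* vspltisw v0,0 */
    uint32_t v1 = 48;  /* li r9,48; mtvrd v1,r9; xxspltw vs33,vs33,1 */
    return v0 - v1;    /* vsubuwm v0,v0,v1 gives 0xFFFFFFD0 */
}
```

The low 5 bits of 0xFFFFFFD0 are 16, so the six-instruction sequence computes
the same rotate count that a single vspltisw v0,-16 would provide.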
