https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117818

--- Comment #5 from Steven Munroe <munroesj at gcc dot gnu.org> ---
I expected compiling for -mcpu=power9 to do a better job generating splats for
small constants.

Given the new instructions like VSX Vector Splat Immediate Byte (xxspltib) and
Vector Extend Sign Byte To Word/Doubleword the compiler should have an easier
time generating vec_splats(). It would seem that Vector Splat Immediate Byte
would be the perfect way to generate a constant shift quadword left/right.
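
To illustrate why one splatted byte is enough for a constant quadword shift: as
I read the Power ISA, vslo takes its octet count from bits 121:124 of the shift
operand and vsl takes its bit count from the low 3 bits, so both read the same
count byte. A minimal scalar sketch in portable C follows. The helper names are
hypothetical and only model the count decoding, not the shift itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: model how a single splatted count byte is decoded
   by the two shift instructions. Not part of any library. */

/* Octet count as vslo reads it: bits 3..6 of the count byte. */
static unsigned slo_octets(uint8_t count) { return (count >> 3) & 0xFu; }

/* Bit count as vsl reads it: the low 3 bits of the count byte. */
static unsigned sl_bits(uint8_t count) { return count & 0x7u; }

/* Total left shift in bits produced by vslo followed by vsl. */
static unsigned total_shift(uint8_t count)
{
    unsigned total = slo_octets(count) * 8u + sl_bits(count);
    /* Only the low 7 bits of the count byte take part. */
    assert(total == (count & 0x7Fu));
    return total;
}
```

For a count of 18 this gives 2 octets plus 2 bits, and for 31 it gives 3 octets
plus 7 bits.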

But that is not what I am seeing. First note there is no direct intrinsic for
xxspltib. It is sometimes generated for vec_splat_u8(0-15) and
vec_splats((vector unsigned char) x). But sometimes it gets weird.

For example:

vui128_t
test_slqi_char_18_V3 (vui128_t vra)
{
  vui8_t result;
  vui8_t tmp = vec_splats((unsigned char)18);
  result = vec_vslo ((vui8_t) vra, tmp);
  return (vui128_t) vec_vsl (result, tmp);
}

Which I would expect to generate:

        xxspltib 32,18
        vslo 2,2,0
        vsl 2,2,0

But generates:

        vspltisb 0,9
        vadduwm 0,0,0
        vslo 2,2,0
        vsl 2,2,0

It recognizes that it can't generate 18 with vspltisb and uses the 18 = 9 * 2
pattern. It also erroneously generates vector add word. It seems like GCC is
reusing the old pattern and ignoring the new instructions.
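
A side note on the add: for this particular constant the word add happens to
give the same bytes as a byte add, because 9 + 9 does not carry out of any
byte. The portable C sketch below models one 32-bit lane. The function names
are hypothetical, and this is only an illustration of the arithmetic, not a
claim about why GCC picks vadduwm.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one 32-bit lane holding four copies of a byte. */
static uint32_t splat_word(uint8_t b) { return 0x01010101u * b; }

/* Word-wide modulo add, as vadduwm does per 32-bit element. */
static uint32_t add_word(uint32_t a, uint32_t b) { return a + b; }

/* Byte-wide modulo add, as vaddubm does per 8-bit element. */
static uint32_t add_bytes(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        /* Add the two byte lanes and discard any carry out of the byte. */
        uint8_t s = (uint8_t)((a >> (8 * i)) + (b >> (8 * i)));
        r |= (uint32_t)s << (8 * i);
    }
    return r;
}
```

With splat_word(9) both adds return 0x12121212, a splat of 18. With a byte such
as 200 the results differ, since the word add lets carries cross into the
neighbouring byte.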

This is weird because:

vui8_t
test_splat6_char_18 ()
{
  vui8_t tmp = vec_splat_u8(9);
  return vec_add (tmp, tmp);
}

Generates:

        xxspltib 34,9
        vaddubm 2,2,2

But:

vui8_t
test_splat6_char_31 ()
{
  // 31 = (16+15) = (15 - (-16))
  vui8_t v16 = vec_splat_u8(-16);
  vui8_t tmp = vec_splat_u8(15);
  return vec_sub (tmp, v16);
}

Generates:

        xxspltib 34,31

Which seems like a miracle. Is this constant propagation?
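
For reference, the folded value is easy to check by hand: -16 splatted into an
unsigned byte is 0xF0, and 15 - 0xF0 modulo 256 is 31. A scalar sketch of one
byte lane in portable C (hypothetical helper names, assuming the usual modulo
semantics of vec_sub on unsigned char):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one unsigned byte lane. */

/* Value of a lane after splatting a 5-bit signed immediate (-16..15),
   viewed as unsigned char. */
static uint8_t splat_imm(int imm)
{
    assert(imm >= -16 && imm <= 15);
    return (uint8_t)imm;
}

/* Modulo-256 subtract of one lane, as vec_sub does on unsigned char. */
static uint8_t sub_byte(uint8_t a, uint8_t b) { return (uint8_t)(a - b); }
```

Here sub_byte(splat_imm(15), splat_imm(-16)) evaluates to 31, which matches the
single xxspltib 34,31 above.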

But:

vui8_t
test_slqi_char_31_V0 (vui8_t vra)
{
  vui8_t result;
  // 31 = (16+15) = (15 - (-16))
  vui8_t v16 = vec_splat_u8(-16);
  vui8_t tmp = vec_splat_u8(15);
  tmp = vec_sub (tmp, v16);
  result = vec_slo (vra, tmp);
  return vec_sll (result, tmp);
}

Generates:

        addis 9,2,.LC0@toc@ha
        addi 9,9,.LC0@toc@l
        lxv 32,0(9)
        vslo 2,2,0
        vsl 2,2,0

OK, I thought I could fix this with:

vui8_t
test_slqi_char_31_V3 (vui8_t vra)
{
  vui8_t result;
  vui8_t tmp = vec_splats((unsigned char)31);
  result = vec_slo (vra, tmp);
  return vec_sll (result, tmp);
}

But no, it still generates:

        addis 9,2,.LC0@toc@ha
        addi 9,9,.LC0@toc@l
        lxv 32,0(9)
        vslo 2,2,0
        vsl 2,2,0

Which is all very confusing.
