[Bug middle-end/92080] Missed CSE of _mm512_set1_epi8(c) with _mm256_set1_epi8(c)

rguenth at gcc dot gnu.org Mon, 14 Oct 2019 02:12:27 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92080


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-10-14
                 CC|                            |rguenth at gcc dot gnu.org
          Component|tree-optimization           |middle-end
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Interestingly enough with just -mavx512f we get

        vmovd   %edi, %xmm0
        vpbroadcastb    %xmm0, %ymm0
        vinserti64x4    $0x1, %ymm0, %zmm0, %zmm1
        vmovdqa %ymm0, sinky(%rip)
        vmovdqa64       %zmm1, sinkz(%rip)

the GIMPLE we expand from is

  _7 = {c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D),
c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D),
c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D),
c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D),
c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D),
c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D),
c_1(D), c_1(D), c_1(D), c_1(D), c_1(D)};
  _8 = VIEW_CONVERT_EXPR<__m512i>(_7);
  sinkz = _8;
  _3 = {c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D),
c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D),
c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D), c_1(D),
c_1(D), c_1(D), c_1(D)};
  _6 = VIEW_CONVERT_EXPR<__m256i>(_3);
  sinky = _6;

where we could replace _6 with a BIT_FIELD_REF, but that would be quite
a costly thing to do in general.  Our representation for the splats isn't
too nice either...
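
To make the BIT_FIELD_REF idea concrete, here is a minimal scalar sketch in
plain C.  It is only a model of the equivalence, not what GCC does
internally: the struct types and function names are made up for
illustration, and byte arrays stand in for the V64QI and V32QI vectors.
It checks that the low 256 bits of the 64-lane splat hold the same bytes
as a separately built 32-lane splat, which is what something like
BIT_FIELD_REF <_7, 256, 0> would rely on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-ins for the V64QI and V32QI vector types. */
typedef struct { int8_t b[64]; } v64qi_model;
typedef struct { int8_t b[32]; } v32qi_model;

/* Model of _7: all 64 byte lanes set to c. */
v64qi_model splat64(int8_t c)
{
    v64qi_model v;
    memset(v.b, c, sizeof v.b);
    return v;
}

/* Model of _3: all 32 byte lanes set to c. */
v32qi_model splat32(int8_t c)
{
    v32qi_model v;
    memset(v.b, c, sizeof v.b);
    return v;
}

/* Model of taking the low 256 bits (32 bytes) of the wide vector. */
v32qi_model low_half(const v64qi_model *wide)
{
    v32qi_model lo;
    memcpy(lo.b, wide->b, sizeof lo.b);
    return lo;
}

/* Returns 1 when reusing the wide splat yields the same bytes as a
   fresh narrow splat, 0 otherwise. */
int low_half_matches_splat(int8_t c)
{
    v64qi_model wide = splat64(c);
    v32qi_model reused = low_half(&wide);
    v32qi_model fresh = splat32(c);
    return memcmp(reused.b, fresh.b, sizeof fresh.b) == 0;
}
```

The equivalence itself is trivial; the cost mentioned above is in finding
such pairs of constructors in general, which this sketch does not model.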

So without avx512bw we seem to miss the splat on V64QI and instead do a
V32QI splat plus a concat.  On the RTL side, optimizing this isn't any
less awkward than on GIMPLE, I guess.
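
The "V32QI splat plus a concat" shape of the -mavx512f code above
(vpbroadcastb into %ymm0, then vinserti64x4 placing %ymm0 into the upper
half) can be modelled the same way.  Again this is a scalar sketch with
hypothetical names, not the RTL expander: it builds one 32-lane splat,
concatenates it with itself, and checks that the result matches a direct
64-lane splat.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-ins for the V32QI and V64QI vector types. */
typedef struct { int8_t b[32]; } v32_bytes;
typedef struct { int8_t b[64]; } v64_bytes;

/* Model of the vpbroadcastb step: all 32 byte lanes set to c. */
v32_bytes broadcast32(int8_t c)
{
    v32_bytes v;
    memset(v.b, c, sizeof v.b);
    return v;
}

/* Model of the concat: lo fills bytes 0..31, hi fills bytes 32..63. */
v64_bytes concat_halves(const v32_bytes *lo, const v32_bytes *hi)
{
    v64_bytes v;
    memcpy(v.b, lo->b, sizeof lo->b);
    memcpy(v.b + sizeof lo->b, hi->b, sizeof hi->b);
    return v;
}

/* Returns 1 when splat-plus-concat equals a direct 64-lane splat. */
int concat_matches_splat(int8_t c)
{
    v32_bytes half = broadcast32(c);
    v64_bytes built = concat_halves(&half, &half);
    v64_bytes direct;
    memset(direct.b, c, sizeof direct.b);
    return memcmp(built.b, direct.b, sizeof direct.b) == 0;
}
```

In this form the narrow splat is computed once and shared by both
stores, which matches the sharing visible in the assembly above.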
