https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115252

            Bug ID: 115252
           Summary: The SLP vectorizer failed to perform automatic
                    vectorization on pixel_sub_wxh of x264
           Product: gcc
           Version: 14.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hkzhang455 at gmail dot com
  Target Milestone: ---

Test case: (from https://github.com/mirror/x264/blob/master/common/dct.c) 

#include <stdint.h>

void pixel_sub_wxh(int16_t *diff, uint8_t *pix1, uint8_t *pix2) {
  for (int y = 0; y < 4; y++) {
    for (int x = 0; x < 4; x++)
      diff[x + y * 4] = pix1[x] - pix2[x];
    pix1 += 16;
    pix2 += 32;
  }
}

This is a simplified version: in the original code the function is inlined
and some of its parameters are constants.
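
For context, here is a minimal sketch of what "inlined with constant
parameters" could look like. It is not copied from x264: the name
`pixel_sub_wxh_generic`, its parameter names, and the caller `sub4x4_sketch`
are assumptions made for illustration. Only the block size 4 and the strides
16 and 32 come from the test case above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified test case from this report: 4x4 block, strides 16 and 32. */
void pixel_sub_wxh(int16_t *diff, uint8_t *pix1, uint8_t *pix2) {
  for (int y = 0; y < 4; y++) {
    for (int x = 0; x < 4; x++)
      diff[x + y * 4] = pix1[x] - pix2[x];
    pix1 += 16;
    pix2 += 32;
  }
}

/* Assumed general shape: block size and strides are run-time parameters. */
static inline void pixel_sub_wxh_generic(int16_t *diff, int size,
                                         uint8_t *pix1, int stride1,
                                         uint8_t *pix2, int stride2) {
  for (int y = 0; y < size; y++) {
    for (int x = 0; x < size; x++)
      diff[x + y * size] = pix1[x] - pix2[x];
    pix1 += stride1;
    pix2 += stride2;
  }
}

/* Hypothetical caller: once the helper is inlined here, size and both
   strides are compile-time constants, which gives the simplified form. */
void sub4x4_sketch(int16_t *diff, uint8_t *pix1, uint8_t *pix2) {
  pixel_sub_wxh_generic(diff, 4, pix1, 16, pix2, 32);
}

/* Returns 1 if both versions produce identical output on sample data. */
int check_generic_matches_simplified(void) {
  uint8_t a[4 * 16], b[4 * 32];
  int16_t d1[16], d2[16];
  for (int i = 0; i < 4 * 16; i++) a[i] = (uint8_t)(i * 7 + 3);
  for (int i = 0; i < 4 * 32; i++) b[i] = (uint8_t)(i * 13 + 250);
  pixel_sub_wxh(d1, a, b);
  sub4x4_sketch(d2, a, b);
  assert(d1[0] == a[0] - b[0]); /* differences may be negative */
  return memcmp(d1, d2, sizeof d1) == 0;
}
```

Calling `check_generic_matches_simplified()` from a small `main` is enough to
confirm that the simplified test case computes the same result as the
parameterized sketch for these constants.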

When I compile the function with `-O3 -mavx2` or `-O3 -msse4.2`, the inner
loop is unrolled, but the SLP vectorizer then fails to vectorize the
resulting straight-line code, which I would expect to be vectorizable.
Adding `-fopt-info-vec-all` gives the following messages:

<source>:6:21: optimized: loop vectorized using 8 byte vectors
<source>:6:21: optimized:  loop versioned for vectorization because of possible aliasing
<source>:5:6: note: vectorized 1 loops in function.
<source>:5:6: note: ***** Analysis failed with vector mode V8SI
<source>:5:6: note: ***** The result for vector mode V32QI would be the same
<source>:5:6: note: ***** Re-trying analysis with vector mode V16QI
<source>:5:6: note: ***** Analysis failed with vector mode V16QI
<source>:5:6: note: ***** Re-trying analysis with vector mode V8QI
<source>:5:6: note: ***** Analysis failed with vector mode V8QI
<source>:5:6: note: ***** Re-trying analysis with vector mode V4QI
<source>:5:6: note: ***** Analysis failed with vector mode V4QI

If I rewrite the code by hand using the vector types provided by
`immintrin.h`, I get the following, which is what I hope the SLP
vectorizer could produce on its own:

#include <immintrin.h>

void pixel_sub_wxh_vec(int16_t *diff, uint8_t *pix1, uint8_t *pix2) {
  for (int y = 0; y < 4; y++) {
    __v4hi pix1_v = {pix1[0], pix1[1], pix1[2], pix1[3]};
    __v4hi pix2_v = {pix2[0], pix2[1], pix2[2], pix2[3]};
    __v4hi diff_v = pix1_v - pix2_v;
    *(long long *)(diff + y * 4) = (long long)diff_v;
    pix1 += 16;
    pix2 += 32;
  }
}
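
As a cross-check, the same idea can be sketched with GCC's generic vector
extension instead of the `immintrin.h` types. This is only an illustration
under assumptions: the names `v4hi_t`, `pixel_sub_wxh_vecext` and
`check_vecext_matches_scalar` are made up here, the code requires GCC or a
compatible compiler, and the store goes through `memcpy` rather than a
`long long` pointer cast.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Four 16-bit lanes in an 8-byte vector (GCC vector extension). */
typedef int16_t v4hi_t __attribute__((vector_size(8)));

/* Scalar reference: the test case from this report. */
void pixel_sub_wxh(int16_t *diff, uint8_t *pix1, uint8_t *pix2) {
  for (int y = 0; y < 4; y++) {
    for (int x = 0; x < 4; x++)
      diff[x + y * 4] = pix1[x] - pix2[x];
    pix1 += 16;
    pix2 += 32;
  }
}

/* Hand-vectorized sketch: one 4-lane subtraction and one 8-byte store
   per row. */
void pixel_sub_wxh_vecext(int16_t *diff, const uint8_t *pix1,
                          const uint8_t *pix2) {
  for (int y = 0; y < 4; y++) {
    /* Widen each uint8_t to a 16-bit lane while building the vectors. */
    v4hi_t a = {pix1[0], pix1[1], pix1[2], pix1[3]};
    v4hi_t b = {pix2[0], pix2[1], pix2[2], pix2[3]};
    v4hi_t d = a - b;
    memcpy(diff + y * 4, &d, sizeof d); /* store one row of results */
    pix1 += 16;
    pix2 += 32;
  }
}

/* Returns 1 if the vector sketch matches the scalar loop on sample data. */
int check_vecext_matches_scalar(void) {
  uint8_t a[4 * 16], b[4 * 32];
  int16_t ref[16], out[16];
  for (int i = 0; i < 4 * 16; i++) a[i] = (uint8_t)(i * 5 + 1);
  for (int i = 0; i < 4 * 32; i++) b[i] = (uint8_t)(i * 11 + 200);
  pixel_sub_wxh(ref, a, b);
  pixel_sub_wxh_vecext(out, a, b);
  assert(ref[0] == a[0] - b[0]); /* sanity check on the reference */
  return memcmp(ref, out, sizeof ref) == 0;
}
```

The differences always lie between -255 and 255, so the 16-bit lanes cannot
overflow and the vector form should agree with the scalar loop for any input.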


I have already raised this issue on the GCC mailing list, and Biener gave
some analysis: pix1 and pix2 are both of type uint8_t and the iteration over
them is scalar, which is why this issue exists. I am still filing a bug here
in the hope that it can be followed up.
