https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111023

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
So for gcc.dg/vect/pr65947-7.c the main difference is that aarch64 succeeds
with

t.c:12:21: note:  ***** Re-trying analysis with vector mode V4HI
t.c:12:21: note:   === vect_analyze_data_refs ===
t.c:12:21: note:   got vectype for stmt: aval_13 = *_3;
vector(4) short int
t.c:12:21: note:   got vectype for stmt: _7 = *_6;
vector(4) int

while x86_64 fails with both the default (V8HI) and V8QI

t.c:12:21: note:   === vect_analyze_data_refs ===
t.c:12:21: note:   got vectype for stmt: aval_13 = *_3;
vector(8) short int
t.c:12:21: note:   got vectype for stmt: _7 = *_6;
vector(4) int
...
t.c:12:21: note:  ***** Re-trying analysis with vector mode V8QI
t.c:12:21: note:   === vect_analyze_data_refs ===
t.c:12:21: note:   got vectype for stmt: aval_13 = *_3;
vector(4) short int
t.c:12:21: note:   got vectype for stmt: _7 = *_6;
vector(2) int

That is, aarch64 is special here in that it somehow tries V4HI, which ends
up behaving differently from V8QI.  aarch64 also tries V2SI for the
epilogue, which yields

t.c:12:21: note:   === vect_analyze_data_refs ===
t.c:12:21: note:   got vectype for stmt: aval_13 = *_3;
vector(4) short int 
t.c:12:21: note:   got vectype for stmt: _7 = *_6;
vector(2) int

aarch64 also fails for V8HI (same default).

The order for aarch64 is V8HI, V4HI ...; x86_64 tries V8HI, V8QI, V4QI.

That V4HI yields V4SI as related mode is a "fluke"(?) of
aarch64_vectorize_related_mode which has

  /* Prefer to use 1 128-bit vector instead of 2 64-bit vectors.  */
  if (TARGET_SIMD
      && (vec_flags & VEC_ADVSIMD)
      && known_eq (nunits, 0U)
      && known_eq (GET_MODE_BITSIZE (vector_mode), 64U)
      && maybe_ge (GET_MODE_BITSIZE (element_mode)
                   * GET_MODE_NUNITS (vector_mode), 128U))
    {
      machine_mode res = aarch64_simd_container_mode (element_mode, 128);
      if (VECTOR_MODE_P (res))
        return res;
    }

which essentially "violates" the one-vector-size design of the loop
vectorizer in these kinds of special cases.
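
To make the lane arithmetic behind the dumps above concrete, here is a
minimal sketch in C.  It is a simplified model and not GCC code: the
struct vec_mode type and the functions related_same_size and
related_prefer_128 are hypothetical names for illustration, a mode is
reduced to its total bit size and lane count, and only the nunits == 0
case of the quoted hook is modelled.

```c
#include <assert.h>

/* Hypothetical simplified model, not GCC code: a vector mode is described
   only by its total size in bits and its number of lanes.  */
struct vec_mode { unsigned bits; unsigned nunits; };

/* One-vector-size policy: keep the vector size and derive the lane
   count from the element size.  */
static struct vec_mode
related_same_size (struct vec_mode vm, unsigned elem_bits)
{
  assert (elem_bits != 0 && vm.bits % elem_bits == 0);
  struct vec_mode res = { vm.bits, vm.bits / elem_bits };
  return res;
}

/* Models the quoted aarch64 special case: for a 64-bit vector mode,
   if keeping the lane count would need at least 128 bits, use a single
   128-bit vector instead of two 64-bit ones.  */
static struct vec_mode
related_prefer_128 (struct vec_mode vm, unsigned elem_bits)
{
  if (vm.bits == 64 && elem_bits * vm.nunits >= 128)
    {
      struct vec_mode res = { 128, 128 / elem_bits };
      return res;
    }
  return related_same_size (vm, elem_bits);
}
```

In this model, starting from V4HI ({64, 4}) the special case gives 4
lanes for both short and int, whereas the same-size policy gives 4 lanes
for short but only 2 for int on any 64-bit mode, and 8 versus 4 for
V8HI.  That matches the vectypes in the dumps above.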

So indeed x86 isn't going to vectorize this because of the inherent
limitation of the vectorizer which chooses vector types too early.
