https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111023
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
So for gcc.dg/vect/pr65947-7.c the main difference is that aarch64 succeeds with

t.c:12:21: note: ***** Re-trying analysis with vector mode V4HI
t.c:12:21: note: === vect_analyze_data_refs ===
t.c:12:21: note: got vectype for stmt: aval_13 = *_3;
vector(4) short int
t.c:12:21: note: got vectype for stmt: _7 = *_6;
vector(4) int

while x86_64 fails with both the default (V8HI) and V8QI:

t.c:12:21: note: === vect_analyze_data_refs ===
t.c:12:21: note: got vectype for stmt: aval_13 = *_3;
vector(8) short int
t.c:12:21: note: got vectype for stmt: _7 = *_6;
vector(4) int
...
t.c:12:21: note: ***** Re-trying analysis with vector mode V8QI
t.c:12:21: note: === vect_analyze_data_refs ===
t.c:12:21: note: got vectype for stmt: aval_13 = *_3;
vector(4) short int
t.c:12:21: note: got vectype for stmt: _7 = *_6;
vector(2) int

That is, aarch64 is special here in that it somehow tries V4HI, which ends up behaving differently than V8QI.

aarch64 also tries V2SI for the epilogue, which yields

t.c:12:21: note: === vect_analyze_data_refs ===
t.c:12:21: note: got vectype for stmt: aval_13 = *_3;
vector(4) short int
t.c:12:21: note: got vectype for stmt: _7 = *_6;
vector(2) int

aarch64 also fails for V8HI (same default). The order for aarch64 is V8HI, V4HI, ...; x86_64 tries V8HI, V8QI, V4QI.

That V4HI yields V4SI as related mode is a "fluke"(?) of aarch64_vectorize_related_mode, which has

  /* Prefer to use 1 128-bit vector instead of 2 64-bit vectors.  */
  if (TARGET_SIMD
      && (vec_flags & VEC_ADVSIMD)
      && known_eq (nunits, 0U)
      && known_eq (GET_MODE_BITSIZE (vector_mode), 64U)
      && maybe_ge (GET_MODE_BITSIZE (element_mode)
                   * GET_MODE_NUNITS (vector_mode), 128U))
    {
      machine_mode res = aarch64_simd_container_mode (element_mode, 128);
      if (VECTOR_MODE_P (res))
        return res;
    }

which essentially "violates" the one-vector-size design of the loop vectorizer in these kinds of special cases.
So indeed x86 isn't going to vectorize this, because of an inherent limitation of the vectorizer: it chooses vector types too early.