https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81410
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---

t.ii:25:19: note: === vect_analyze_data_ref_accesses ===
t.ii:25:19: note: Detected interleaving store _10->x and _10->y
t.ii:25:19: note: Detected interleaving load MEM[(const struct Foo &)_8].x and MEM[(const struct Foo &)_8].y
t.ii:25:19: note: Detected interleaving store of size 2 starting with _10->x = _37;
t.ii:25:19: note: Detected interleaving load of size 3 starting with _37 = MEM[(const struct Foo &)_8].x;
t.ii:25:19: note: There is a gap of 1 elements after the group
...
t.ii:25:19: note: Final SLP tree for instance:
t.ii:25:19: note: node
t.ii:25:19: note:   stmt 0 _10->x = _37;
t.ii:25:19: note:   stmt 1 _10->y = _38;
t.ii:25:19: note: node
t.ii:25:19: note:   stmt 0 _37 = MEM[(const struct Foo &)_8].x;
t.ii:25:19: note:   stmt 1 _38 = MEM[(const struct Foo &)_8].y;

(note no load permutation)

t.ii:25:19: note: Loop contains SLP and non-SLP stmts
t.ii:25:19: note: Updating vectorization factor to 4
t.ii:25:19: note: vectorization_factor = 4, niters = 5

  _37 = MEM[(const struct Foo &)_8].x;
  vect__37.14_78 = MEM[(long int *)vectp.12_80];
  vectp.12_73 = vectp.12_80 + 16;
  vect__37.15_72 = MEM[(long int *)vectp.12_73];
  vectp.12_71 = vectp.12_73 + 16;
  vect__37.16_70 = MEM[(long int *)vectp.12_71];
  vectp.12_69 = vectp.12_71 + 16;
  vect__37.17_68 = MEM[(long int *)vectp.12_69];
  vectp.12_67 = vectp.12_69 + 32;
  _38 = MEM[(const struct Foo &)_8].y;

so the gap is accounted for in the wrong place, once instead of twice as required.

C testcase:

typedef __UINT64_TYPE__ uint64_t;
uint64_t x[24];
uint64_t y[16];
uint64_t z[8];

void __attribute__((noinline))
foo()
{
  for (int i = 0; i < 8; ++i)
    {
      y[2*i] = x[3*i];
      y[2*i + 1] = x[3*i + 1];
      z[i] = 1;
    }
}

int main()
{
  for (int i = 0; i < 24; ++i)
    x[i] = i;
  foo ();
  for (int i = 0; i < 8; ++i)
    if (y[2*i] != 3*i || y[2*i+1] != 3*i + 1)
      __builtin_abort ();
  return 0;
}