https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66280
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---

Related to PR66251 (a partial backport fixes the GCC 5 branch).

On trunk the issue is that we have an SLP node

node
  stmt 0 _9->re = _17;
  stmt 1 _9->im = _23;
node
  stmt 0 patt_56 = _13 w* pretmp_51;
  stmt 1 patt_27 = _21 w* pretmp_51;
node
  stmt 0 _13 = _12->re;
  stmt 1 _21 = _12->im;

and non-SLP stores:

  # a.0_31 = PHI <a.0_29(22), a.5_25(7)>
  _5 = (long unsigned int) a.0_31;
  _6 = _5 * 8;
  _7 = pretmp_45 + _6;
  _9 = pretmp_47 + _6;
  _11 = _5 * 4;
  _12 = pretmp_49 + _11;
  _13 = _12->re;
  _14 = (int) _13;
  _17 = _14 * pretmp_53;
  _9->re = _17;
  _7->im = _17;
  _7->re = _17;
  _21 = _12->im;
  _22 = (int) _21;
  _23 = _22 * pretmp_53;
  _9->im = _23;
  a.5_25 = a.0_31 + 1;
  if (a.5_25 == 0)

where the stored values, and thus the vector stmts, are shared. For the SLP scheduling we insert stmts before the last scalar use (in this case _9->im = _23), and this includes the computation of _17, because the vector result is just {_17, _23}. But then regular vectorization comes along and inserts the required permutes at the place of the _7->{re,im} stores. Those stores come before _9->im = _23, so the permutes use the vector before it is defined.

I can't see how the old placement method avoided all the issues in similar situations. Well, it inserted stmts before the first stmt in a group (apart from stores, where it chooses the last stmt, and loads, where it chooses the first stmt). So with

  _9->re = _23;

and

  _9->im = _17;

it would have been broken there as well. Ah, no: on the GCC 5 branch we then simply fail to detect the SLP opportunity...

typedef struct { short re; short im; } cint16_T;
typedef struct { int re; int im; } cint32_T;
int a;
short b;
cint16_T *c;
cint32_T *d, *e;
void fn1 ()
{
  for (; a; a++)
    {
      d[a].re = d[a].im = e[a].im = c[a].im * b;
      e[a].re = c[a].re * b;
    }
}

This testcase, even with the backport applied (and on trunk), ICEs in vect_get_vec_def_for_operand.
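The toy model below illustrates the two placement rules described in the comment. It is not GCC code. Statement positions are the 0-based indices of the loop-body dump (the PHI is 0, _9->im = _23 is 16). All function names are hypothetical. The positions used for the swapped-store variant are assumed to stay the same. The model only checks one thing: whether the vector def, inserted "before position p", comes no later than the permutes that regular vectorization places at the _7 stores (positions 11 and 12).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only (hypothetical, not GCC internals): a vector stmt
   "inserted before position p" is visible to any scalar position >= p. */

enum node_kind { NODE_LOAD, NODE_STORE, NODE_OTHER };

static int max_pos(const int *pos, int n)
{
    assert(n > 0);
    int m = pos[0];
    for (int i = 1; i < n; i++)
        if (pos[i] > m)
            m = pos[i];
    return m;
}

static int min_pos(const int *pos, int n)
{
    assert(n > 0);
    int m = pos[0];
    for (int i = 1; i < n; i++)
        if (pos[i] < m)
            m = pos[i];
    return m;
}

/* Trunk rule as described: emit before the last scalar stmt of the
   instance (here the later of the two _9 stores). */
static int trunk_insertion_point(const int *instance_pos, int n)
{
    return max_pos(instance_pos, n);
}

/* Old (GCC 5) rule as described: before stmt 0 of the node's group,
   except stores (last stmt) and loads (first stmt). */
static int old_insertion_point(enum node_kind kind, const int *node_pos, int n)
{
    switch (kind) {
    case NODE_STORE:
        return max_pos(node_pos, n);
    case NODE_LOAD:
        return min_pos(node_pos, n);
    default:
        return node_pos[0];          /* stmt 0 in node order */
    }
}

/* True if a def inserted before def_pos precedes every use position. */
static bool def_reaches_uses(int def_pos, const int *use_pos, int n)
{
    for (int i = 0; i < n; i++)
        if (use_pos[i] < def_pos)
            return false;
    return true;
}

/* _7->im = _17 and _7->re = _17: where the permutes are placed. */
static const int permute_uses[] = { 11, 12 };

bool trunk_case_ok(void)
{
    const int slp_stores[] = { 10, 16 };        /* _9->re, _9->im */
    int def = trunk_insertion_point(slp_stores, 2);
    return def_reaches_uses(def, permute_uses, 2);
}

bool old_rule_case_ok(bool swapped)
{
    /* Multiply node in node order: {_17 def, _23 def} = {9, 15}.
       With _9->re = _23; _9->im = _17 the node order becomes {15, 9}
       (assumption: positions otherwise unchanged). */
    const int mult[] = { 9, 15 };
    const int mult_swapped[] = { 15, 9 };
    int def = old_insertion_point(NODE_OTHER, swapped ? mult_swapped : mult, 2);
    return def_reaches_uses(def, permute_uses, 2);
}
```

Under these assumptions, trunk_case_ok () and old_rule_case_ok (true) return false, and old_rule_case_ok (false) returns true. This matches the reasoning in the comment. The trunk placement is broken for this dump. The old placement would only break for the swapped stores, and GCC 5 does not SLP those in the first place.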
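The loop in the testcase runs while a is nonzero, so running it needs a start value that reaches 0 without signed overflow. The driver below shows one way to do that. It starts a at -N and points c, d and e one past the end of the buffers, so c[a] .. c[-1] cover the whole buffers. It then compares the results against the scalar semantics of the loop body. The driver, its buffer sizes and its input values are hypothetical. It is only meaningful on a compiler that no longer ICEs when vectorizing the loop (for example at -O3, which enables the loop and SLP vectorizers).

```c
#include <assert.h>

/* Testcase from the comment, unchanged. */
typedef struct { short re; short im; } cint16_T;
typedef struct { int re; int im; } cint32_T;
int a;
short b;
cint16_T *c;
cint32_T *d, *e;
void fn1 ()
{
  for (; a; a++)
    {
      d[a].re = d[a].im = e[a].im = c[a].im * b;
      e[a].re = c[a].re * b;
    }
}

#define N 8

static cint16_T cbuf[N];
static cint32_T dbuf[N], ebuf[N];

/* Hypothetical driver: returns 1 if fn1 matches the scalar semantics. */
int run_fn1 (short scale)
{
  assert (N > 0);
  for (int i = 0; i < N; i++)
    {
      cbuf[i].re = (short) (i + 1);
      cbuf[i].im = (short) (10 * (i + 1));
    }
  b = scale;
  c = cbuf + N;          /* one past the end; c[-N] is cbuf[0] */
  d = dbuf + N;
  e = ebuf + N;
  a = -N;                /* loop stops when a reaches 0 */
  fn1 ();
  for (int i = 0; i < N; i++)
    {
      int im = cbuf[i].im * scale;
      if (dbuf[i].re != im || dbuf[i].im != im || ebuf[i].im != im
          || ebuf[i].re != cbuf[i].re * scale)
        return 0;
    }
  return 1;
}
```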