https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68379
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
The PR68614 testcase

int a, b[3], c[3][5];

void
fn1 ()
{
  int e;
  for (a = 2; a >= 0; a--)
    for (e = 0; e < 4; e++)
      c[a][e] = b[a];
}

exposes the same issue in basic-block vectorization.  We fully unroll the
nest leaving a SLP group of size 3 for the b loads where each individual
load is used in a different SLP target (so the load isn't really grouped,
which shows an opportunity for a cheap fix improving code-gen as well).
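
For illustration only, here is a hand-written sketch of roughly what the nest looks like once fully unrolled.  It is not actual GCC dump output; the name fn1_unrolled and the temporaries b2, b1, b0 are hypothetical.  It is meant to show the three b loads, each feeding its own run of four stores into a different row of c:

```c
int a, b[3], c[3][5];

/* Hand-unrolled equivalent of fn1 from the testcase (hypothetical name,
   not compiler output).  Hoisting the loads is safe here because b, c
   and a are distinct objects.  */
void
fn1_unrolled (void)
{
  /* The three adjacent loads from b.  */
  int b2 = b[2];
  int b1 = b[1];
  int b0 = b[0];

  /* Each load is used by a different group of four stores.  */
  c[2][0] = b2; c[2][1] = b2; c[2][2] = b2; c[2][3] = b2;
  c[1][0] = b1; c[1][1] = b1; c[1][2] = b1; c[1][3] = b1;
  c[0][0] = b0; c[0][1] = b0; c[0][2] = b0; c[0][3] = b0;

  /* Value of the global induction variable after the loop exits.  */
  a = -1;
}
```

Written this way, the loads are adjacent in memory but no single store group consumes more than one of them.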