https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---

The issue is that when we treat this as a group, the same group in the next iteration will overlap - this isn't something we support (we'd have to alter dependence analysis to treat overlap with gaps as no overlap). It's a really hard problem, and the code is much easier to BB vectorize once unrolled.

That said - if DR analysis could, say, "force" a particular VF where it knows that the gaps are closed, we might "virtually" unroll this and thus detect it as a group of 16 contiguous stores. We'd then need to do the same virtual unrolling for all other stmts, of course. I think it would be easier if we somehow detected this situation beforehand and actually performed the unrolling - though we might want to do it with an if (.LOOP_VECTORIZED (...)) versioning scheme. I do wonder how common such loops are.

It might also be possible to override the cost considerations of early unrolling with -O3 (aka when vectorization is enabled) when the number of iterations matches the gap of the related DRs (but as said, it looks like a very special thing to do).

That said - I do plan to change the vectorizer from iterating over modes to iterating over VFs, which means we could perform the unrolling implied by the VF on the vectorizer IL (SLP) and (re-)perform group discovery afterwards. For a more general loop we'd essentially apply blocking with the desired VF, unroll that blocking loop and apply BB vectorization.

So to make the point - I don't think handling this special case within the current vectorizer framework pays off given the cost it will have (I'm not even sure it's feasible to add). Instead, this looks to me like it needs a vectorization-enablement pre-transform.
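To illustrate the shape of loop discussed above, here is a minimal sketch in C. It is not the testcase from this PR; the function names, the element type, and the trip count N = 4 are assumptions chosen so that four strided stores per iteration add up to 16 contiguous stores overall. In the rolled form, each iteration's store group spans a range that overlaps the group of the next iteration; in the unrolled form, the gaps are closed and the stores cover out[0..15] in a single basic block.

```c
#include <string.h>

enum { N = 4 };  /* hypothetical trip count, equal to the store stride */

/* Rolled form: each iteration stores to four locations N elements apart.
   The group {out[i], out[i+N], out[i+2N], out[i+3N]} of iteration i
   overlaps (with gaps) the group of iteration i + 1. */
void store_rolled(int *restrict out, const int *restrict in)
{
    for (int i = 0; i < N; i++) {
        out[0 * N + i] = in[i] + 0;
        out[1 * N + i] = in[i] + 1;
        out[2 * N + i] = in[i] + 2;
        out[3 * N + i] = in[i] + 3;
    }
}

/* Fully unrolled form: the same 16 stores in one basic block.  Together
   they cover out[0..15] without gaps, which is the shape a basic-block
   vectorizer can treat as one contiguous store group. */
void store_unrolled(int *restrict out, const int *restrict in)
{
    /* i = 0 */
    out[0]  = in[0] + 0; out[4]  = in[0] + 1;
    out[8]  = in[0] + 2; out[12] = in[0] + 3;
    /* i = 1 */
    out[1]  = in[1] + 0; out[5]  = in[1] + 1;
    out[9]  = in[1] + 2; out[13] = in[1] + 3;
    /* i = 2 */
    out[2]  = in[2] + 0; out[6]  = in[2] + 1;
    out[10] = in[2] + 2; out[14] = in[2] + 3;
    /* i = 3 */
    out[3]  = in[3] + 0; out[7]  = in[3] + 1;
    out[11] = in[3] + 2; out[15] = in[3] + 3;
}

/* Returns 1 if both forms write identical results. */
int forms_agree(void)
{
    const int in[N] = { 10, 20, 30, 40 };
    int a[N * N], b[N * N];

    memset(a, 0, sizeof a);
    memset(b, 0xff, sizeof b);
    store_rolled(a, in);
    store_unrolled(b, in);
    return memcmp(a, b, sizeof a) == 0;
}
```

Whether a given GCC version vectorizes either form depends on the target and options, so the sketch only shows the access pattern, not a claim about generated code.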