https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
--- Comment #2 from Robin Dapp <rdapp at gcc dot gnu.org> ---
> The stores are not considered "grouped" because they have gaps.
> To do better we'd have to improve the store dataref analysis to see
> that a vectorization factor of four would "close" the gaps, or more
> generally support store groups with gaps.  Stores with gaps can be
> handled by masking for example.

I have been stepping through the code and experimenting a bit, starting
with the store side.

  for( int i = 0; i < 4; i++ )
  {
      out[i + 0] = tmp[0][i] + 1;

Those stores are not considered grouped because the step is constant and
the access for the data-ref itself is contiguous.  We discover four of
those (as you mentioned before).

With some dirty hacks (i.e. continuing the group discovery even for the
contiguous case and annotating the statements/refs with a special flag)
it is possible to discover the full group of four and mark the stores as
related.  Then (again as you said) the lack of store-with-gaps support is
still in the way, but for the case here we could just ignore the gap at
the early discovery phase.  We'd just need to make sure the current
behavior is preserved for all other cases.

Is that a way forward?  I was thinking of adding another memory access
type like VMAT_GAP_CLOSING (or whatever name fits) for such cases.  In
the analysis part we'd need to verify that the vectorization factor
matches the group gap, that a sufficiently large vector type is
supported, etc.  If everything succeeded we could emit one large store
instead of the four individual ones.

Or is that too specific?  If we had full store-with-gaps support, let's
say using masking, we'd still need dedicated handling for cases where the
gaps vanish, I suppose.
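To make the "gaps vanish" case concrete, here is a minimal C sketch. The loop in the comment is truncated after its first store, so the three further stores at offsets 4, 8 and 12 are an assumption chosen only to match the "group of four" with a gap that a vectorization factor of four would close. The function names, ROWS/COLS and self_check are hypothetical as well. The second function is a hand-written illustration of the effect of a single large store, not what GCC emits and not an implementation of the proposed VMAT_GAP_CLOSING.

```c
#include <assert.h>
#include <string.h>

enum { ROWS = 4, COLS = 4 };

/* Assumed loop shape: every store advances by one element per iteration
   (contiguous on its own), and the four stores sit COLS elements apart.
   Only the first store appears in the comment; the rest are assumptions. */
void store_scalar(int *restrict out, int tmp[ROWS][COLS])
{
    for (int i = 0; i < COLS; i++) {
        out[i + 0]  = tmp[0][i] + 1;
        out[i + 4]  = tmp[1][i] + 1;
        out[i + 8]  = tmp[2][i] + 1;
        out[i + 12] = tmp[3][i] + 1;
    }
}

/* With four iterations taken together, each store covers four adjacent
   elements and the four stores tile out[0..15] without holes.  The whole
   loop can therefore be expressed as one 16-element store. */
void store_gap_closed(int *restrict out, int tmp[ROWS][COLS])
{
    int wide[ROWS * COLS];

    for (int r = 0; r < ROWS; r++)
        for (int i = 0; i < COLS; i++)
            wide[r * COLS + i] = tmp[r][i] + 1;

    memcpy(out, wide, sizeof wide);   /* stands in for the single wide store */
}

/* Checks that both variants write exactly the same 16 values. */
void self_check(void)
{
    int tmp[ROWS][COLS];
    int a[ROWS * COLS] = { 0 };
    int b[ROWS * COLS] = { 0 };

    for (int r = 0; r < ROWS; r++)
        for (int i = 0; i < COLS; i++)
            tmp[r][i] = 10 * r + i;

    store_scalar(a, tmp);
    store_gap_closed(b, tmp);

    for (int k = 0; k < ROWS * COLS; k++)
        assert(a[k] == b[k]);
}
```

Calling self_check() from a small main is enough to confirm the equivalence. The sketch only works because the trip count equals the distance between the stores; with any other distance the gaps would remain, and that is the condition the analysis would have to verify.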