https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18438
--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Maxim Kuvyrkov from comment #9)
> I've looked into another case where inability to handle stores with gaps
> generates sub-optimal code. I'm interested in spending some time on fixing
> this, provided some guidance in the vectorizer.
>
> Is it substantially more difficult to handle stores with gaps compared to
> loads with gaps?

It has the complication that we can't actually store to the gaps, because that creates store data races (and we'd need a load-modify-write cycle). So we have to emit scalar stores (which is what we currently do), masked stores (not implemented yet), or something like what you suggest (I suppose that's a store-lanes kind?).

A slight complication is that we have to avoid detecting the store group if we want to end up with scalar stores (well, that's a vectorizer implementation limit). This is why we simply split all groups at gap boundaries. Cost-based selection of the kind of store (or even load) implementation is not implemented.
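To make the trade-off concrete, here is a minimal C sketch. It assumes a hypothetical 4-lane vector width and a loop whose store group writes elements 0, 1 and 2 of every 4-element chunk, leaving element 3 as a gap. Vector operations are emulated with plain per-lane loops rather than real intrinsics, and every function name is hypothetical, not a GCC internal. It contrasts an emulated masked store, which never touches the gap, with a load-modify-write that does, and adds a toy illustration of splitting a group at gap boundaries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vector width of 4 lanes; real targets vary. */
#define VF 4

/* Scalar loop: each 4-int chunk gets 3 stores; a[4*i+3] is a gap
   that this loop never writes. */
void scalar_loop(int *a, const int *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[4 * i + 0] = b[i] + 1;
        a[4 * i + 1] = b[i] + 2;
        a[4 * i + 2] = b[i] + 3;
        /* a[4 * i + 3] untouched: the gap */
    }
}

/* Emulated masked store: only lanes whose mask bit is set are written,
   so the gap lane is never touched and no store is introduced. */
static void masked_store(int *dst, const int v[VF], const bool mask[VF])
{
    for (int l = 0; l < VF; l++)
        if (mask[l])
            dst[l] = v[l];
}

void masked_version(int *a, const int *b, size_t n)
{
    static const bool mask[VF] = { true, true, true, false };
    for (size_t i = 0; i < n; i++) {
        int v[VF] = { b[i] + 1, b[i] + 2, b[i] + 3, 0 };
        masked_store(&a[4 * i], v, mask);
    }
}

/* Load-modify-write (not a valid transformation in general): reads the
   gap element and writes the old value back as part of a full-width store.
   Single-threaded results match, but a[4*i+3] now receives a store the
   source never performed, so a concurrent writer of that element can have
   its update lost: a data race. */
void rmw_version(int *a, const int *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int v[VF] = { b[i] + 1, b[i] + 2, b[i] + 3, a[4 * i + 3] };
        for (int l = 0; l < VF; l++) /* stands in for one full vector store */
            a[4 * i + l] = v[l];
    }
}

/* Toy illustration of splitting a group at gap boundaries: given the
   sorted element offsets of one store group, record the index where each
   gap-free run starts and return the number of runs. Hypothetical helper. */
size_t split_at_gaps(const int *off, size_t n, size_t *run_start)
{
    size_t runs = 0;
    for (size_t k = 0; k < n; k++)
        if (k == 0 || off[k] != off[k - 1] + 1)
            run_start[runs++] = k;
    return runs;
}
```

In a single thread all three loops leave the array in the same state, which is exactly why the load-modify-write variant is tempting; the problem only appears when another thread owns `a[4*i+3]`. For example, offsets {0, 1, 2, 4, 5} split into the runs {0, 1, 2} and {4, 5}.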