https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18438

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Maxim Kuvyrkov from comment #9)
> I've looked into another case where inability to handle stores with gaps
> generates sub-optimal code.  I'm interested in spending some time on fixing
> this, provided some guidance in the vectorizer.
> 
> Is it substantially more difficult to handle stores with gaps compared to
> loads with gaps?

The complication is that we can't actually store to the gaps: doing so
would introduce store data races, and we'd need a read-modify-write cycle
to preserve the gap contents.
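
To illustrate the point, here is a minimal sketch in C, not taken from the
bug report. The function names and the 3-of-4 access pattern are assumptions
made up for the example. The first loop stores to lanes 0..2 of every group
of four and leaves lane 3 as a gap. The second shows what a naive full-width
store amounts to: it reads the gap and writes it back, which is a store the
source program never performed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical loop with a store gap: lane 3 of each group of 4 is
   never written by the source program.  */
void store_with_gap(int *out, const int *in, size_t n_groups)
{
    assert(n_groups == 0 || (out && in));
    for (size_t i = 0; i < n_groups; i++) {
        out[4 * i + 0] = in[i];
        out[4 * i + 1] = in[i] + 1;
        out[4 * i + 2] = in[i] + 2;
        /* out[4 * i + 3] is the gap.  */
    }
}

/* What a full-width store would have to do: load the old gap value,
   merge it into the vector, and write all four lanes.  The write to
   lane 3 is an introduced store.  */
void store_with_gap_rmw(int *out, const int *in, size_t n_groups)
{
    assert(n_groups == 0 || (out && in));
    for (size_t i = 0; i < n_groups; i++) {
        int v[4];
        v[0] = in[i];
        v[1] = in[i] + 1;
        v[2] = in[i] + 2;
        v[3] = out[4 * i + 3];            /* load the gap */
        memcpy(&out[4 * i], v, sizeof v); /* store it back unchanged */
    }
}
```

In a single thread both functions leave the same memory contents. If another
thread updates out[4 * i + 3] between the load and the full-width store, the
second version overwrites that update with the stale value, which is the data
race referred to above.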

So we have to either emit scalar stores (which is what we currently do),
emit masked stores (not implemented yet), or do something like what you
suggest (I suppose that's a store-lanes kind of approach?).

A slight complication is that we have to avoid detecting the store group
if we want to end up with scalar stores (well, that's a vectorizer
implementation limit).  This is why we simply split all groups at gap
boundaries.  Cost-based selection among the possible store (or even load)
implementations does not exist yet.
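
As a rough illustration of "splitting groups at gap boundaries", the sketch
below partitions the sorted element offsets of one store group into
contiguous subgroups. This is not the vectorizer's code: split_at_gaps, its
parameters, and the representation of a group as an array of offsets are all
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper.  offsets[] holds the element offsets stored to in
   one iteration, sorted in strictly ascending order.  For each maximal
   run of consecutive offsets, the index where the run begins is written
   to starts[].  Returns the number of runs (subgroups).  */
size_t split_at_gaps(const int *offsets, size_t n, size_t *starts)
{
    assert(n == 0 || (offsets && starts));
    size_t n_groups = 0;
    for (size_t i = 0; i < n; i++) {
        /* A new subgroup begins at the first element and after any gap.  */
        if (i == 0 || offsets[i] != offsets[i - 1] + 1)
            starts[n_groups++] = i;
    }
    return n_groups;
}
```

For stores to offsets 0, 1 and 3 this yields two subgroups, {0, 1} and {3},
with no gap left inside either of them.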
