https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-06-04
             Blocks|                            |53947
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is that the data references (DRs) for the loads tmp[0][i] and
tmp[1][i] are not related: they are off different base pointers.  At the
moment we do not merge unrelated "groups" (even though the loads are not
marked as grouped) into one SLP node.

The stores are not considered "grouped" because they have gaps.
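
For illustration only, a loop of roughly this shape shows both properties
described above.  This is a hypothetical sketch, not the testcase from the
report: the function name butterfly4, the row-pointer parameter and the
arithmetic are all assumptions.

```c
/* Hypothetical 4x4 butterfly; names and arithmetic are illustrative only
   and are not taken from the bug report. */
void butterfly4(int *restrict out, int *const tmp[4])
{
    for (int i = 0; i < 4; i++) {
        /* tmp[0][i] and tmp[1][i] are loaded through two different row
           pointers, so their data references have different bases. */
        int s01 = tmp[0][i] + tmp[1][i];
        int d01 = tmp[0][i] - tmp[1][i];
        int s23 = tmp[2][i] + tmp[3][i];
        int d23 = tmp[2][i] - tmp[3][i];

        /* Within one iteration the four stores are 4 elements apart,
           leaving a gap of three elements between neighbours. */
        out[0 * 4 + i] = s01 + s23;
        out[1 * 4 + i] = d01 + d23;
        out[2 * 4 + i] = s01 - s23;
        out[3 * 4 + i] = d01 - d23;
    }
}
```

Whether GCC treats this exact loop the same way as the reported testcase
has not been checked; it is only meant to make the access pattern concrete.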

With SLP-ification you'd get four instances and the same code-gen as now.

To do better we'd have to improve the store dataref analysis to see
that a vectorization factor of four would "close" the gaps, or more
generally support store groups with gaps.  Stores with gaps can be
handled by masking for example.
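
The arithmetic behind "closing" the gaps can be sketched as follows,
assuming (hypothetically) that each iteration i stores to out[4*k + i] for
k = 0..3.  The helper store_coverage is made up for this illustration and
is not part of GCC.

```c
/* Bit n of the result is set when element n of a 16-element output is
   written by iterations 0 .. iters-1 of a loop that stores to
   out[4*k + i] for k = 0..3 (an assumed access pattern; iters <= 4). */
unsigned store_coverage(int iters)
{
    unsigned mask = 0;
    for (int i = 0; i < iters; i++)
        for (int k = 0; k < 4; k++)
            mask |= 1u << (4 * k + i);
    return mask;
}
```

Under that assumption store_coverage(1) is 0x1111, so a single iteration
writes every fourth element, while store_coverage(4) is 0xFFFF: four
iterations together write all 16 elements with no gaps left.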

You get the store side handled when using -fno-tree-loop-vectorize to
get basic-block vectorization after unrolling the loop.  But you
still run into the issue that we do not combine from different load
groups during SLP discovery.  That's another angle you can attack:
during greedy discovery we also do not consider splitting the store
but instead build the loads from scalars, which is of course less than
optimal, also because we do not re-process the built vector CTORs for
further basic-block vectorization opportunities.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
