For single-element interleaving with VMAT_CONTIGUOUS_REVERSE we fail
to demote the access to VMAT_ELEMENTWISE and thus run into the
three-vector permutation limit (and never consider using strided
loads or gathers).
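
As an illustration only, a loop of the following shape is the kind of
access this is about: one element is read out of each group of eight,
and the loop walks memory backwards.  This is a hypothetical reduction,
not the testcase from the PR; the function name and GROUP_SIZE are
made up for the sketch, and whether the vectorizer actually classifies
the load as VMAT_CONTIGUOUS_REVERSE depends on target and flags.

```c
/* Hypothetical example, not taken from PR117605.  GROUP_SIZE is chosen
   to exceed the lane count of a 128-bit vector of int (4 lanes), which
   is the situation the group size limit is meant to catch.  */
#define GROUP_SIZE 8

void
copy_strided_reverse (int *restrict dst, const int *restrict src, int n)
{
  /* Negative step: i runs from n - 1 down to 0.  Only element 0 of each
     group of GROUP_SIZE ints in src is read (single-element
     interleaving with a gap of GROUP_SIZE - 1).  */
  for (int i = n - 1; i >= 0; --i)
    dst[i] = src[i * GROUP_SIZE];
}
```

Compiling such a loop with -O3 -fdump-tree-vect-details should show in
the dump which access type was chosen for the load.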

This resolves another bunch of SVE regressions with --param
vect-force-slp=1.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

        PR tree-optimization/117605
        * tree-vect-stmts.cc (get_group_load_store_type): Also
        apply group size limit for single-element interleaving
        to VMAT_CONTIGUOUS_REVERSE.
---
 gcc/tree-vect-stmts.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index f4a4d5a554c..ab5ea038d1d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2086,8 +2086,9 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
             at least create very sub-optimal code in that case (and
             blow up memory, see PR65518).  */
          if (loop_vinfo
-             && *memory_access_type == VMAT_CONTIGUOUS
              && single_element_p
+             && (*memory_access_type == VMAT_CONTIGUOUS
+                 || *memory_access_type == VMAT_CONTIGUOUS_REVERSE)
              && maybe_gt (group_size, TYPE_VECTOR_SUBPARTS (vectype)))
            {
              if (SLP_TREE_LANES (slp_node) == 1)
-- 
2.43.0
