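For single-element interleaving with VMAT_CONTIGUOUS_REVERSE we fail to demote the access to VMAT_ELEMENTWISE when the group size exceeds the vector size, and thus run into the three vector permutation limit (and would not consider using strided loads or gathers).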
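
For illustration only, a loop of roughly the following shape should exercise this path. This is a hypothetical sketch, not the testcase from the PR: the function name, the stride of 8 and the element type are assumptions. It reads one element out of every group of 8 while iterating downwards, so the load is a single-element interleaving group with a negative step, and with vectors of fewer than 8 ints the group size exceeds the number of vector lanes.

```cpp
// Hypothetical kernel, not taken from PR117605.
// Reads every 8th element of `src` while walking backwards, i.e. a
// single-element interleaving group of size 8 accessed with a negative
// step.  `src` must hold at least (n - 1) * 8 + 1 elements.
void gather_reverse(int *dst, const int *src, int n)
{
  for (int i = n - 1; i >= 0; --i)
    dst[i] = src[i * 8];
}
```

Whether the vectorizer actually classifies this load as VMAT_CONTIGUOUS_REVERSE depends on the target and options; the sketch only shows the access pattern the description is about.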

This resolves another bunch of SVE regressions with
--param vect-force-slp=1

Bootstrapped and tested on x86_64-unknown-linux-gnu.

	PR tree-optimization/117605
	* tree-vect-stmts.cc (get_group_load_store_type): Also
	apply group size limit for single-element interleaving
	to VMAT_CONTIGUOUS_REVERSE.
---
 gcc/tree-vect-stmts.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index f4a4d5a554c..ab5ea038d1d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2086,8 +2086,9 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
 	 at least create very sub-optimal code in that case (and
 	 blow up memory, see PR65518).  */
       if (loop_vinfo
-	  && *memory_access_type == VMAT_CONTIGUOUS
 	  && single_element_p
+	  && (*memory_access_type == VMAT_CONTIGUOUS
+	      || *memory_access_type == VMAT_CONTIGUOUS_REVERSE)
 	  && maybe_gt (group_size, TYPE_VECTOR_SUBPARTS (vectype)))
 	{
 	  if (SLP_TREE_LANES (slp_node) == 1)
-- 
2.43.0