The following makes sure we do not lower single-element interleaving
schemes in a way that defeats load vectorization later, so that the
VMAT_ELEMENTWISE fallback can still be used.
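
As a rough illustration of the check, here is a minimal standalone
sketch. It is not GCC code: the LoadNode struct and skip_lowering are
hypothetical names, and lane counts are modelled as plain compile-time
integers, whereas the patch compares a poly_int with maybe_gt.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the state the patch inspects.
// None of these names exist in GCC; they only mirror the condition.
struct LoadNode
{
  unsigned ld_lanes_lanes;   // 0 models "load-lanes is not in use"
  unsigned slp_lanes;        // models SLP_TREE_LANES (load)
  bool has_next_element;     // models DR_GROUP_NEXT_ELEMENT (first) != NULL
  unsigned group_lanes;      // lanes spanned by the interleaving group
  unsigned vector_subparts;  // models TYPE_VECTOR_SUBPARTS of the vectype
};

// Returns true when lowering would be skipped under the patch's
// condition: a single-lane load from a single-element group whose
// span exceeds one vector, with load-lanes not in use.
// Assumption: a plain '>' replaces maybe_gt, which is only equivalent
// for fixed-length vectors.
bool
skip_lowering (const LoadNode &n)
{
  assert (n.vector_subparts > 0);
  return n.ld_lanes_lanes == 0
	 && n.slp_lanes == 1
	 && !n.has_next_element
	 && n.group_lanes > n.vector_subparts;
}
```

For example, with 4-lane vectors a group spanning 8 lanes meets the
condition, while a group spanning exactly 4 lanes does not.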

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

        PR tree-optimization/120457
        * tree-vect-slp.cc (vect_lower_load_permutations): Implement
        the same heuristics as load vectorization for single-element
        interleaving that spans multiple vectors.
---
 gcc/tree-vect-slp.cc | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index fb2262a6137..dc89da3bf17 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4557,6 +4557,15 @@ vect_lower_load_permutations (loop_vec_info loop_vinfo,
       if (!SLP_TREE_CHILDREN (load).is_empty ())
        continue;
 
+      /* For single-element interleaving spanning multiple vectors avoid
+        lowering, we want to use VMAT_ELEMENTWISE later.  */
+      if (ld_lanes_lanes == 0
+         && SLP_TREE_LANES (load) == 1
+         && !DR_GROUP_NEXT_ELEMENT (first)
+         && maybe_gt (group_lanes,
+                      TYPE_VECTOR_SUBPARTS (SLP_TREE_VECTYPE (load))))
+       return;
+
       /* We want to pattern-match special cases here and keep those
         alone.  Candidates are splats and load-lane.  */
 
-- 
2.43.0
