The following makes sure we do not lower single-element interleaving schemes in a way that defeats load vectorization later; leaving them alone allows the VMAT_ELEMENTWISE fallback to be used.
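
For illustration only, the new early-out can be modelled outside of GCC roughly as below. This is a sketch under stated assumptions, not the vectorizer code: `LoadNode` and `skip_lowering` are hypothetical names, the fields merely stand in for the GCC accessors named in the comments, and lane counts are plain `unsigned` rather than `poly_uint64`, so `maybe_gt` degenerates to `>`.

```cpp
// Hypothetical, simplified stand-in for the state the check inspects.
// None of these names exist in GCC; they only mirror the accessors
// mentioned in the comments.
struct LoadNode {
  unsigned ld_lanes_lanes;   // 0 when load-lanes is not being used
  unsigned slp_lanes;        // models SLP_TREE_LANES (load)
  bool has_next_element;     // models DR_GROUP_NEXT_ELEMENT (first) != NULL
  unsigned group_lanes;      // size of the interleaving group in lanes
  unsigned vector_subparts;  // models TYPE_VECTOR_SUBPARTS of the vectype
};

// True when lowering should be skipped: a single-lane load from a
// single-element group whose group size exceeds one vector, with no
// load-lanes in play.  With constant lane counts maybe_gt is just '>'.
constexpr bool skip_lowering(const LoadNode &n) {
  return n.ld_lanes_lanes == 0
         && n.slp_lanes == 1
         && !n.has_next_element
         && n.group_lanes > n.vector_subparts;
}

// Group of 8 lanes, 4 lanes per vector: the group spans multiple vectors.
static_assert(skip_lowering(LoadNode{0, 1, false, 8, 4}),
              "single-element interleaving spanning vectors is left alone");
// Group fits in one vector: the check does not trigger.
static_assert(!skip_lowering(LoadNode{0, 1, false, 4, 4}),
              "groups that fit a single vector are not affected");
```

Note that in the patch the check ends with `return`, not `continue`, so once it triggers the function gives up on lowering altogether; the sketch only models the predicate.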

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

	PR tree-optimization/120457
	* tree-vect-slp.cc (vect_lower_load_permutations): Implement
	the same heuristics as load vectorization for single-element
	interleaving that spans multiple vectors.
---
 gcc/tree-vect-slp.cc | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index fb2262a6137..dc89da3bf17 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4557,6 +4557,15 @@ vect_lower_load_permutations (loop_vec_info loop_vinfo,
       if (!SLP_TREE_CHILDREN (load).is_empty ())
 	continue;
 
+      /* For single-element interleaving spanning multiple vectors avoid
+	 lowering, we want to use VMAT_ELEMENTWISE later.  */
+      if (ld_lanes_lanes == 0
+	  && SLP_TREE_LANES (load) == 1
+	  && !DR_GROUP_NEXT_ELEMENT (first)
+	  && maybe_gt (group_lanes,
+		       TYPE_VECTOR_SUBPARTS (SLP_TREE_VECTYPE (load))))
+	return;
+
       /* We want to pattern-match special cases here and keep those
 	 alone.  Candidates are splats and load-lane.  */
 
-- 
2.43.0