https://gcc.gnu.org/g:f594008dcced0ebb86908f3d7602fcf943e05bc7

commit r15-3820-gf594008dcced0ebb86908f3d7602fcf943e05bc7
Author: Richard Biener <rguent...@suse.de>
Date:   Fri Sep 20 15:07:24 2024 +0200

    tree-optimization/115372 - failed store-lanes in some cases
    
    The gcc.target/riscv/rvv/autovec/struct/struct_vect-4.c testcase shows
    that we sometimes fail to use store-lanes even though it should be
    profitable.  We're currently relying on vect_slp_prefer_store_lanes_p
    at the point we run into the first SLP discovery mismatch with obviously
    limited information.  For the case at hand we have 3, 5 or 7 lanes
    of VnDImode [2, 2] vectors with the first mismatch at lane 2 so the
    new group size is 1.  The heuristic says that might be an OK split
    given the rest is a multiple of the vector lanes.  Now we continue
    discovery but in the end mismatches result in uniformly single-lane
    SLP instances which we can handle via interleaving but of course are
    prime candidates for store-lanes.  The following patch re-assesses
    with the extra knowledge, now relying only on whether the
    target supports store-lanes for the given group size.
    
            PR tree-optimization/115372
            * tree-vect-slp.cc (vect_build_slp_instance): Compute the
            uniform, if any, number of lanes of the RHS sub-graphs feeding
            the store and if uniformly one, use store-lanes if the target
            supports that.
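
The re-assessment described above can be modelled outside of GCC as a
minimal sketch.  Everything named here (common_lane_count,
StoreGroupFacts, reassess_store_lanes) is hypothetical and only stands
in for rhs_common_nlanes, the stmt_info predicates and the
vect_store_lanes_supported query used by the patch; it is not GCC code.

```cpp
#include <cstddef>
#include <vector>

// Illustrative model only; none of these names exist in GCC.
// Returns the lane count shared by all RHS nodes, or 0 when the
// nodes disagree (or there are none), mirroring how the patch
// maintains rhs_common_nlanes while nodes are pushed.
unsigned common_lane_count(const std::vector<unsigned>& lanes_per_node)
{
    unsigned common = 0;
    for (std::size_t i = 0; i < lanes_per_node.size(); ++i) {
        if (i == 0)
            common = lanes_per_node[i];       // first node sets the candidate
        else if (common != lanes_per_node[i])
            common = 0;                       // any mismatch sticks at 0
    }
    return common;
}

// Assumed summary of the facts the patch reads from stmt_info and
// from the target; field names are made up for this sketch.
struct StoreGroupFacts {
    bool gather_scatter;              // stands in for STMT_VINFO_GATHER_SCATTER_P
    bool strided;                     // stands in for STMT_VINFO_STRIDED_P
    int step_sign;                    // stands in for compare_step_with_zero
    bool target_supports_store_lanes; // stands in for the != IFN_LAST check
};

// True when the late check would switch the group to store-lanes.
bool reassess_store_lanes(unsigned common, const StoreGroupFacts& f)
{
    return common == 1
           && !f.gather_scatter
           && !f.strided
           && f.step_sign > 0
           && f.target_supports_store_lanes;
}
```

In the patch itself a positive result only sets want_store_lanes to
true; a negative result leaves the earlier decision untouched.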

Diff:
---
 gcc/tree-vect-slp.cc | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index ab49bb0e7ee1..f5b47e430e31 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3957,6 +3957,7 @@ vect_build_slp_instance (vec_info *vinfo,
          /* Calculate the unrolling factor based on the smallest type.  */
          poly_uint64 unrolling_factor = 1;
 
+         unsigned int rhs_common_nlanes = 0;
          unsigned int start = 0, end = i;
          while (start < group_size)
            {
@@ -3978,6 +3979,10 @@ vect_build_slp_instance (vec_info *vinfo,
                                             calculate_unrolling_factor
                                               (max_nunits, end - start));
                  rhs_nodes.safe_push (node);
+                 if (start == 0)
+                   rhs_common_nlanes = SLP_TREE_LANES (node);
+                 else if (rhs_common_nlanes != SLP_TREE_LANES (node))
+                   rhs_common_nlanes = 0;
                  start = end;
                  if (want_store_lanes || force_single_lane)
                    end = start + 1;
@@ -4015,6 +4020,19 @@ vect_build_slp_instance (vec_info *vinfo,
                }
            }
 
+         /* Now re-assess whether we want store lanes in case the
+            discovery ended up producing all single-lane RHSs.  */
+         if (rhs_common_nlanes == 1
+             && ! STMT_VINFO_GATHER_SCATTER_P (stmt_info)
+             && ! STMT_VINFO_STRIDED_P (stmt_info)
+             && compare_step_with_zero (vinfo, stmt_info) > 0
+             && (vect_store_lanes_supported (SLP_TREE_VECTYPE (rhs_nodes[0]),
+                                             group_size,
+                                             SLP_TREE_CHILDREN
+                                               (rhs_nodes[0]).length () != 1)
+                 != IFN_LAST))
+           want_store_lanes = true;
+
          /* Now we assume we can build the root SLP node from all stores.  */
          if (want_store_lanes)
            {
