This removes --param vect-inner-loop-cost-factor.  Instead, the cost
factor is derived from the profile counts of the outer and inner loop
header blocks when they are comparable.  Otherwise we assume a single
inner iteration, which is conservative on the side of not vectorizing.
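
For illustration, here is a minimal stand-alone sketch of that scaling
rule.  It is not the GCC code: the helper name and the plain integer
counts are assumptions made for the example, whereas the patch compares
profile_count values and uses the CEIL macro.  The explicit zero check
on the outer count is also an addition of this sketch.

```c
#include <stdint.h>

/* Hypothetical helper, not part of GCC.  Returns the factor by which
   inner-loop statement costs would be scaled: the rounded-up ratio of
   the inner to the outer loop header count when the inner header runs
   more often, and 1 (a single inner iteration) otherwise.  */
static uint64_t
inner_loop_cost_factor (uint64_t inner_header_count,
                        uint64_t outer_header_count)
{
  /* Guarding against a zero outer count is an assumption of this
     sketch, made to avoid dividing by zero.  */
  if (outer_header_count == 0 || inner_header_count <= outer_header_count)
    return 1;

  /* Ceiling division written so the intermediate sum cannot overflow.  */
  return inner_header_count / outer_header_count
         + (inner_header_count % outer_header_count != 0);
}
```

With an inner header count of 1001 and an outer header count of 10 the
sketch yields 101.  With 5 and 10 it falls back to 1.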

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Does this look sane?  Next I will look into computing the scalar
loop cost on the original, not if-converted, body, where I'd have
to do something similar for the different branches.  From there I
can look at fixing the issue with BB vectorization of if-converted
loop bodies.

Thanks,
Richard.

2021-08-23  Richard Biener  <rguent...@suse.de>

        * doc/invoke.texi (vect-inner-loop-cost-factor): Remove
        documentation.
        * params.opt (--param vect-inner-loop-cost-factor): Remove.
        * tree-vect-loop.c (_loop_vec_info::_loop_vec_info):
        Initialize inner_loop_cost_factor to 1.
        (vect_analyze_loop_form): Initialize inner_loop_cost_factor
        from inner and outer loop header counts.
---
 gcc/doc/invoke.texi  | 5 -----
 gcc/params.opt       | 4 ----
 gcc/tree-vect-loop.c | 9 ++++++++-
 3 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d8a6b0b60c9..c7819917dce 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14385,11 +14385,6 @@ code to iterate.  2 allows partial vector loads and stores in all loops.
 The parameter only has an effect on targets that support partial
 vector loads and stores.
 
-@item vect-inner-loop-cost-factor
-The factor which the loop vectorizer applies to the cost of statements
-in an inner loop relative to the loop being vectorized.  The default
-value is 50.
-
 @item avoid-fma-max-bits
 Maximum number of bits for which we avoid creating FMAs.
 
diff --git a/gcc/params.opt b/gcc/params.opt
index f9264887b40..f7b19fa430d 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1113,8 +1113,4 @@ Bound on number of runtime checks inserted by the vectorizer's loop versioning f
 Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) IntegerRange(0, 2) Param Optimization
 Controls how loop vectorizer uses partial vectors.  0 means never, 1 means only for loops whose need to iterate can be removed, 2 means for all loops.  The default value is 2.
 
--param=vect-inner-loop-cost-factor=
-Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 999999) Param Optimization
-The factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized.
-
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index c521b43a47c..86cd5d8f730 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -841,7 +841,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     single_scalar_iteration_cost (0),
     vec_outside_cost (0),
     vec_inside_cost (0),
-    inner_loop_cost_factor (param_vect_inner_loop_cost_factor),
+    inner_loop_cost_factor (1),
     vectorizable (false),
     can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
     using_partial_vectors_p (false),
@@ -1519,6 +1519,13 @@ vect_analyze_loop_form (class loop *loop, vec_info_shared *shared)
       stmt_vec_info inner_loop_cond_info
        = loop_vinfo->lookup_stmt (inner_loop_cond);
       STMT_VINFO_TYPE (inner_loop_cond_info) = loop_exit_ctrl_vec_info_type;
+      /* If we can compare profile counts of the outer and inner loop header
+        compute the scale based on that, otherwise conservatively assume
+        a single inner iteration.  */
+      if (loop->inner->header->count > loop->header->count)
+       LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo)
+         = CEIL (loop->inner->header->count.value (),
+                 loop->header->count.value ());
     }
 
   gcc_assert (!loop->aux);
-- 
2.31.1
