[Bug tree-optimization/79460] gcc fails to optimise out a trivial additive loop for seemingly arbitrary numbers of iterations

rguenther at suse dot de Tue, 14 Feb 2017 01:31:16 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79460


--- Comment #6 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 13 Feb 2017, amker at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79460
> 
> --- Comment #5 from amker at gcc dot gnu.org ---
> (In reply to Jakub Jelinek from comment #4)
> > (In reply to Richard Biener from comment #3)
> > > In this case it is complete unrolling that can estimate the non-vector
> > > code to constant fold but not the vectorized code.  OTOH it's quite
> > > excessive work done by the unroller when doing this for large N...
> > > 
> > > And yes, SCEV final value replacement doesn't know how to handle float
> > > reductions (we have a different PR for that).
> > 
> > Doesn't handle float reductions nor vector (integer or vector) reductions.
> > Even the vector ones would be useful, if e.g. to a vector every iteration
> > adds a VECTOR_CST or similar, then it could be still nicely optimized.
> The integer version should already be supported now.
> 
> > 
> > For the 202 case, it seems we are generating a scalar loop epilogue (not
> > needed for 200) and somehow it seems something in the vector is actually
> > able to figure out the floating point final value, because we get:
> >   # p_2 = PHI <2.01e+2(5), p_12(7)>
> >   # i_3 = PHI <200(5), i_13(7)>
> > on the scalar loop epilogue.  So if something in the vectorizer is able to
> > figure it out, why can't it just use that even in the case where no epilogue
> > loop is needed?
> IIUC, scev-ccp should be made a query-based interface so that it can be
> called for each loop-closed PHI at different compilation stages.  It also
> needs to be extended to cover basic floating-point cases like this.
> Effectively, it needs to do the same transformation as the vectorizer does
> now; I just thought it might be a better place to do that.

Yeah, the vectorizer does this in vect_update_ivs_after_vectorizer
by accident, I think: it sees the float "IV" and replaces the epilogue
loop init with init + niter * step, which is on the border of invalid
(without -ffp-contract=on/fast).  At least if the vectorizer can do this,
then final value replacement can do so as well with

Index: gcc/tree-scalar-evolution.c
===================================================================
--- gcc/tree-scalar-evolution.c (revision 245417)
+++ gcc/tree-scalar-evolution.c (working copy)
@@ -3718,13 +3718,6 @@ final_value_replacement_loop (struct loo
          continue;
        }

-      if (!POINTER_TYPE_P (TREE_TYPE (def))
-         && !INTEGRAL_TYPE_P (TREE_TYPE (def)))
-       {
-         gsi_next (&psi);
-         continue;
-       }
-
       bool folded_casts;
       def = analyze_scalar_evolution_in_loop (ex_loop, loop, def,
                                              &folded_casts);

(rather than removing the condition, it could be replaced with a validity
check, perhaps on FP contraction, etc.).
But ideally SCEV itself would contain those checks (or compute exact
results including rounding effects).

Like maybe simply

Index: gcc/tree-scalar-evolution.c
===================================================================
--- gcc/tree-scalar-evolution.c (revision 245417)
+++ gcc/tree-scalar-evolution.c (working copy)
@@ -3718,8 +3718,10 @@ final_value_replacement_loop (struct loo
          continue;
        }

-      if (!POINTER_TYPE_P (TREE_TYPE (def))
-         && !INTEGRAL_TYPE_P (TREE_TYPE (def)))
+      if (! (POINTER_TYPE_P (TREE_TYPE (def))
+            || INTEGRAL_TYPE_P (TREE_TYPE (def))
+            || (FLOAT_TYPE_P (TREE_TYPE (def))
+                && flag_fp_contract_mode == FP_CONTRACT_FAST)))
        {
          gsi_next (&psi);
          continue;
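
To illustrate why rewriting a float "IV" as init + niter * step is only
borderline valid, here is a minimal sketch in C.  The function names
sum_by_loop and sum_closed_form are hypothetical and not part of GCC; the
code only compares the iterated sum with the closed form and does not model
what SCEV or the vectorizer actually emit.

```c
/* Hypothetical helpers for illustration only; not GCC code.  */

/* Add `step` to `init` one iteration at a time.  The volatile store forces
   the running sum to be rounded to float after every addition, even on
   targets that keep intermediates in wider registers.  */
float sum_by_loop(float init, float step, int niter)
{
    volatile float p = init;
    for (int i = 0; i < niter; i++)
        p = p + step;
    return p;
}

/* Closed form of the same reduction: one multiply and one add, so the
   result is rounded at most twice instead of once per iteration.  */
float sum_closed_form(float init, float step, int niter)
{
    return init + (float)niter * step;
}
```

With init = 1.0f, step = 1.0f and niter = 200 (the values visible in the PHI
quoted above) both functions return 201.0f, because every intermediate sum
is exactly representable.  With init = 0.0f, step = 1.0f and
niter = 20000000 the loop stops growing at 16777216.0f (2^24), while the
closed form returns 20000000.0f, so the replacement changes the result
whenever the per-iteration rounding matters.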

Richard.
