https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79460
--- Comment #6 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 13 Feb 2017, amker at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79460
>
> --- Comment #5 from amker at gcc dot gnu.org ---
> (In reply to Jakub Jelinek from comment #4)
> > (In reply to Richard Biener from comment #3)
> > > In this case it is complete unrolling that can estimate the non-vector
> > > code to constant fold but not the vectorized code.  OTOH it's quite
> > > excessive work done by the unroller when doing this for large N...
> > >
> > > And yes, SCEV final value replacement doesn't know how to handle float
> > > reductions (we have a different PR for that).
> >
> > Doesn't handle float reductions nor vector (integer or vector) reductions.
> > Even the vector ones would be useful, if e.g. to a vector every iteration
> > adds a VECTOR_CST or similar, then it could be still nicely optimized.
> Integer version should have already been supported now.
>
> >
> > For the 202 case, it seems we are generating a scalar loop epilogue (not
> > needed for 200) and somehow it seems something in the vector is actually
> > able to figure out the floating point final value, because we get:
> >   # p_2 = PHI <2.01e+2(5), p_12(7)>
> >   # i_3 = PHI <200(5), i_13(7)>
> > on the scalar loop epilogue.  So if something in the vectorizer is able to
> > figure it out, why can't it just use that even in the case where no epilogue
> > loop is needed?
> IIUC, scev-ccp should be made query based interface so that it can be called
> for each loop closed phi at different compilation stage.  It also needs to be
> extended to cover basic floating point case like this.  Effectively, it need
> to do the same transformation as vectorizer does now, but just thought it
> might be a better place to do that.

Yeah, the vectorizer does this in vect_update_ivs_after_vectorizer by
accident I think - it sees the float "IV" and replaces the prologue loop
init by init + niter * step which is on the border of invalid (without
-ffp-contract=on/fast).

At least if the vectorizer can do this then final value replacement
can do so as well with

Index: gcc/tree-scalar-evolution.c
===================================================================
--- gcc/tree-scalar-evolution.c (revision 245417)
+++ gcc/tree-scalar-evolution.c (working copy)
@@ -3718,13 +3718,6 @@ final_value_replacement_loop (struct loo
          continue;
        }
 
-      if (!POINTER_TYPE_P (TREE_TYPE (def))
-         && !INTEGRAL_TYPE_P (TREE_TYPE (def)))
-       {
-         gsi_next (&psi);
-         continue;
-       }
-
       bool folded_casts;
       def = analyze_scalar_evolution_in_loop (ex_loop, loop, def,
                                               &folded_casts);

(rather than removing the condition replace it with a validity
check - like FP contraction? etc...).  But ideally SCEV itself
would contain those (or compute exact results with rounding effects).

Like maybe simply

Index: gcc/tree-scalar-evolution.c
===================================================================
--- gcc/tree-scalar-evolution.c (revision 245417)
+++ gcc/tree-scalar-evolution.c (working copy)
@@ -3718,8 +3718,10 @@ final_value_replacement_loop (struct loo
          continue;
        }
 
-      if (!POINTER_TYPE_P (TREE_TYPE (def))
-         && !INTEGRAL_TYPE_P (TREE_TYPE (def)))
+      if (! (POINTER_TYPE_P (TREE_TYPE (def))
+            || INTEGRAL_TYPE_P (TREE_TYPE (def))
+            || (FLOAT_TYPE_P (TREE_TYPE (def))
+                && flag_fp_contract_mode == FP_CONTRACT_FAST)))
        {
          gsi_next (&psi);
          continue;

Richard.
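
For illustration only, here is a small standalone C sketch (not GCC code) of why replacing a float reduction by init + niter * step needs a validity check. The function names final_by_loop and final_by_closed_form are hypothetical, and the inputs are chosen for the example: the first pair mirrors the 2.01e+2 / 200 values in the quoted PHI nodes, where both forms agree, and the second uses a step of 0.1f, where they do not.

```c
#include <assert.h>

/* Hypothetical helper: the value the accumulator really holds after the
   loop has run, with one rounding to float per addition.  */
static float final_by_loop(float init, float step, int niter)
{
    assert(niter >= 0);
    float p = init;
    for (int i = 0; i < niter; i++)
        p += step;
    return p;
}

/* Hypothetical helper: the closed form a final value replacement would
   substitute for the loop, init + niter * step.  */
static float final_by_closed_form(float init, float step, int niter)
{
    assert(niter >= 0);
    return init + (float) niter * step;
}
```

With init = 1.0f, step = 1.0f and niter = 200 every intermediate sum is exactly representable, so both functions return 201.0f. With init = 0.0f, step = 0.1f and niter = 10 the loop accumulates a rounding error on several additions while the closed form rounds only once, so the two results differ in the last bit when compiled without value-changing options such as -ffast-math.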