On Mon, Dec 4, 2017 at 1:11 PM, Richard Biener <rguent...@suse.de> wrote: > > I've noticed we perform FP reduction association without the required > checks for associative math. I've added > gcc.dg/tree-ssa/loop-interchange-1b.c to cover this. > > I also noticed we happily interchange a loop with a reduction like > > sum = a[i] - sum; > > where a change in order of elements isn't ok. Unfortunately bwaves > is exactly a case where single_use != next_def (tried to simply remove > that case for now), because reassoc didn't have a chance to fix the > operand order. Thus this patch exports the relevant handling from > the vectorizer (for stage1 having a separate infrastructure gathering / > analyzing of reduction/induction infrastructure would be nice...) > and uses it from interchange. We then don't handle > gcc.dg/tree-ssa/loop-interchange-4.c anymore (similar vectorizer > missed-opt is PR65930). I didn't bother to split up the vectorizer > code further to implement relaxed validity checking but simply XFAILed > this testcase. > > Earlier I simplified allocation stuff in the main loop which is why > this part is included in this patch. > > Bootstrap running on x86_64-unknown-linux-gnu. > > I'll see to craft a testcase with the sum = a[i] - sum; mis-handling. > > Ok? > > Thanks, > Richard. > > 2017-12-04 Richard Biener <rguent...@suse.de> > > * tree-vectorizer.h (check_reduction_path): Declare. > * tree-vect-loop.c (check_reduction_path): New function, split out > from ... > (vect_is_simple_reduction): ... here. > * gimple-loop-interchange.cc: Include tree-vectorizer.h. > (loop_cand::analyze_iloop_reduction_var): Use single_imm_use. > Properly check for a supported reduction operation and a > valid expression if the reduction covers multiple stmts. > (prepare_perfect_loop_nest): Simpify allocation. > (pass_linterchange::execute): Likewise. > > * gcc.dg/tree-ssa/loop-interchange-1.c: Add fast-math flags. > * gcc.dg/tree-ssa/loop-interchange-1b.c: New test variant. > * gcc.dg/tree-ssa/loop-interchange-4.c: XFAIL. > > > Index: gcc/gimple-loop-interchange.cc > =================================================================== > --- gcc/gimple-loop-interchange.cc (revision 255375) > +++ gcc/gimple-loop-interchange.cc (working copy) > @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3. > #include "tree-ssa-loop-ivopts.h" > #include "tree-ssa-dce.h" > #include "tree-data-ref.h" > +#include "tree-vectorizer.h" > > /* This pass performs loop interchange: for example, the loop nest > > @@ -551,23 +552,29 @@ loop_cand::analyze_iloop_reduction_var ( > in a way that reduction operation is seen as black box. In general, > we can ignore reassociation of reduction operator; we can handle fake > reductions in which VAR is not even used to compute NEXT. */ > - FOR_EACH_IMM_USE_FAST (use_p, iterator, var) > - { > - stmt = USE_STMT (use_p); > - if (is_gimple_debug (stmt)) > - continue; > - > - if (!flow_bb_inside_loop_p (m_loop, gimple_bb (stmt))) > - return false; > - > - if (single_use != NULL) > - return false; > + if (! single_imm_use (var, &use_p, &single_use) > + || ! flow_bb_inside_loop_p (m_loop, gimple_bb (single_use))) > + return false; > > - single_use = stmt; > - } > + /* Check the reduction operation. We require a commutative or > + left-associative operation. For FP math we also need to be allowed > + to associate operations. */ > + if (! is_gimple_assign (single_use) > + || ! (commutative_tree_code (gimple_assign_rhs_code (single_use)) > + || (commutative_ternary_tree_code > + (gimple_assign_rhs_code (single_use)) > + && (use_p->use == gimple_assign_rhs1_ptr (single_use) > + || use_p->use == gimple_assign_rhs2_ptr (single_use))) > + || (gimple_assign_rhs_code (single_use) == MINUS_EXPR > + && use_p->use == gimple_assign_rhs1_ptr (single_use))) > + || (FLOAT_TYPE_P (TREE_TYPE (var)) > + && ! flag_associative_math)) > + return false; > > + /* Handle and verify a series of stmts feeding the reduction op. */ > if (single_use != next_def > - && !stmt_dominates_stmt_p (single_use, next_def)) > + && !check_reduction_path (UNKNOWN_LOCATION, m_loop, phi, next, > + gimple_assign_rhs_code (single_use))) > return false; > > /* Only support cases in which INIT is used in inner loop. */ > @@ -1964,7 +1971,7 @@ prepare_perfect_loop_nest (struct loop * > vec<data_reference_p> *datarefs, vec<ddr_p> *ddrs) > { > struct loop *start_loop = NULL, *innermost = loop; > - struct loop *outermost = superloop_at_depth (loop, 0); > + struct loop *outermost = loops_for_fn (cfun)->tree_root; > > /* Find loop nest from the innermost loop. The outermost is the innermost > outer*/ > @@ -1985,21 +1992,26 @@ prepare_perfect_loop_nest (struct loop * > if (!start_loop || !start_loop->inner) > return false; > > + /* Prepare the data reference vector for the loop nest, pruning outer > + loops we cannot handle. */ > datarefs->create (20); > - if ((start_loop = prepare_data_references (start_loop, datarefs)) == NULL > + start_loop = prepare_data_references (start_loop, datarefs); > + if (!start_loop > /* Check if there is no data reference. */ > || datarefs->is_empty () > /* Check if there are too many of data references. */ > - || (int) datarefs->length () > MAX_DATAREFS > - /* Compute access strides for all data references. */ > - || ((start_loop = compute_access_strides (start_loop, innermost, > - *datarefs)) == NULL) > - /* Check if loop nest should be interchanged. */ > - || !should_interchange_loop_nest (start_loop, innermost, *datarefs)) > - { > - free_data_refs_with_aux (*datarefs); > - return false; > - } > + || (int) datarefs->length () > MAX_DATAREFS) > + return false; > + > + /* Compute access strides for all data references, pruning outer > + loops we cannot analyze refs in. */ > + start_loop = compute_access_strides (start_loop, innermost, *datarefs); > + if (!start_loop) > + return false; > + > + /* Check if any interchange is profitable in the loop nest. */ > + if (!should_interchange_loop_nest (start_loop, innermost, *datarefs)) > + return false; > > /* Check if data dependences can be computed for loop nest starting from > start_loop. */ > @@ -2029,14 +2041,15 @@ prepare_perfect_loop_nest (struct loop * > return true; > } > > + /* ??? We should be able to adjust DDRs we already computed by > + truncating distance vectors. */ Note it could be hard enough to adjust DDRs for stripped outer loop. Dependence can become non-dep, even worse, we may result in wrong "invalid dep" in case of: DDR<a,b>: (<, <, =) (<, >, =) because after simply stripping, it becomes: DDR<a,b>: (<, =) (>, =) While the correct result is "no-dep".
Thanks, bin