On Fri, Dec 06, 2013 at 01:49:50PM +0100, Richard Biener wrote:
> Comments inline (scary large this patch for this stage ...)

Thanks.

> > +(define_expand "maskload<mode>"
> > +  [(set (match_operand:V48_AVX2 0 "register_operand")
> > +        (unspec:V48_AVX2
> > +          [(match_operand:<sseintvecmode> 2 "register_operand")
> > +           (match_operand:V48_AVX2 1 "memory_operand")]
> > +          UNSPEC_MASKMOV))]
> > +  "TARGET_AVX")
> > +
> > +(define_expand "maskstore<mode>"
> > +  [(set (match_operand:V48_AVX2 0 "memory_operand")
> > +        (unspec:V48_AVX2
> > +          [(match_operand:<sseintvecmode> 2 "register_operand")
> > +           (match_operand:V48_AVX2 1 "register_operand")
> > +           (match_dup 0)]
> > +          UNSPEC_MASKMOV))]
> > +  "TARGET_AVX")
> > +
> >  (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
> >    [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
> >        (unspec:AVX256MODE2P
>
> x86 maintainers should comment here (ick - unspecs)

Well, the unspecs are preexisting (right now used by intrinsics only),
I'm just adding expanders that will expand to those instructions.
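Just to illustrate the semantics the expanders provide (this is only an
illustration using the existing intrinsics, not part of the patch):
inactive lanes are never accessed in memory, a masked load reads them as
zero and a masked store leaves them untouched, e.g.:

  #include <immintrin.h>

  /* Illustration only: masked load/store through the existing AVX
     intrinsics, which use the same vmaskmovps patterns.  */
  void
  foo (float *dst, const float *src)
  {
    /* Lanes whose mask element has the MSB set are active
       (here lanes 0, 2, 4 and 6).  */
    __m256i mask = _mm256_setr_epi32 (-1, 0, -1, 0, -1, 0, -1, 0);
    __m256 x = _mm256_maskload_ps (src, mask);  /* inactive lanes read as 0 */
    _mm256_maskstore_ps (dst, mask, x);         /* inactive lanes not written */
  }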
> > @@ -4386,16 +4396,35 @@ get_references_in_stmt (gimple stmt, vec
> >      {
> >        unsigned i, n;
> >
> > -      op0 = gimple_call_lhs_ptr (stmt);
> > +      ref.is_read = false;
> > +      if (gimple_call_internal_p (stmt))
> > +        switch (gimple_call_internal_fn (stmt))
> > +          {
> > +          case IFN_MASK_LOAD:
> > +            ref.is_read = true;
> > +          case IFN_MASK_STORE:
> > +            ref.ref = build2 (MEM_REF,
> > +                              ref.is_read
> > +                              ? TREE_TYPE (gimple_call_lhs (stmt))
> > +                              : TREE_TYPE (gimple_call_arg (stmt, 3)),
> > +                              gimple_call_arg (stmt, 0),
> > +                              gimple_call_arg (stmt, 1));
> > +            references->safe_push (ref);
>
> This may not be a canonical MEM_REF AFAIK, so you should
> use fold_build2 here (if the address is &a.b the .b needs folding
> into the offset).

Ok, will try that.

> I assume the 2nd arg is always constant and
> thus doesn't change pointer-type during propagations?

Yes, it is, always created by
  ptr = build_int_cst (reference_alias_ptr_type (ref), 0);
(and for vectorized IFN_MASK_* copied over from the non-vectorized
IFN_MASK_* call).
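I.e., if I understand the canonicalization issue right, the hunk above
would simply use fold_build2 instead, something like this untested
sketch:

          case IFN_MASK_LOAD:
            ref.is_read = true;
            /* FALLTHRU */
          case IFN_MASK_STORE:
            /* fold_build2 folds an &a.b address argument into the
               constant offset, so the MEM_REF stays canonical.  */
            ref.ref = fold_build2 (MEM_REF,
                                   ref.is_read
                                   ? TREE_TYPE (gimple_call_lhs (stmt))
                                   : TREE_TYPE (gimple_call_arg (stmt, 3)),
                                   gimple_call_arg (stmt, 0),
                                   gimple_call_arg (stmt, 1));
            references->safe_push (ref);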
> > @@ -4464,7 +4493,7 @@ graphite_find_data_references_in_stmt (l
> >
> >    FOR_EACH_VEC_ELT (references, i, ref)
> >      {
> > -      dr = create_data_ref (nest, loop, *ref->pos, stmt, ref->is_read);
> > +      dr = create_data_ref (nest, loop, ref->ref, stmt, ref->is_read);
> >        gcc_assert (dr != NULL);
> >        datarefs->safe_push (dr);
> >      }
>
> Interesting that you succeeded in removing the indirection
> on ref.pos ... I remember trying that twice at least and
> failing ;)
>
> You can install that as cleanup now if you split it out (so hopefully
> no users creep back that make removing it impossible).

Ok, will do.

> > +  /* Check whether this is a load or store.  */
> > +  lhs = gimple_assign_lhs (stmt);
> > +  if (TREE_CODE (lhs) != SSA_NAME)
> > +    {
>
> gimple_store_p ()?

Likely.

> > +      if (!is_gimple_val (gimple_assign_rhs1 (stmt)))
> > +        return false;
> > +      op = maskstore_optab;
> > +      ref = lhs;
> > +    }
> > +  else if (gimple_assign_load_p (stmt))
> > +    {
> > +      op = maskload_optab;
> > +      ref = gimple_assign_rhs1 (stmt);
> > +    }
> > +  else
> > +    return false;
> > +
> > +  /* And whether REF isn't a MEM_REF with non-addressable decl.  */
> > +  if (TREE_CODE (ref) == MEM_REF
> > +      && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR
> > +      && DECL_P (TREE_OPERAND (TREE_OPERAND (ref, 0), 0))
> > +      && !TREE_ADDRESSABLE (TREE_OPERAND (TREE_OPERAND (ref, 0), 0)))
> > +    return false;
>
> I think that's overly conservative and not conservative enough.  Just
> use may_be_nonaddressable_p () (even though the implementation can
> need some TLC) and make sure to set TREE_ADDRESSABLE when you
> end up taking its address.

Will try.

> Please factor out the target bits into a predicate in optabs.c
> so you can reduce the amount of includes here.  You can eventually
> re-use that from the vectorization parts.

Okay.

> > @@ -1404,7 +1530,8 @@ insert_gimplified_predicates (loop_p loo
> >        basic_block bb = ifc_bbs[i];
> >        gimple_seq stmts;
> >
> > -      if (!is_predicated (bb))
> > +      if (!is_predicated (bb)
> > +          || dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>
> isn't that redundant now?

Will try (and read the corresponding threads and IRC about that).

> >        for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > -        if ((stmt = gsi_stmt (gsi))
> > -            && gimple_assign_single_p (stmt)
> > -            && gimple_vdef (stmt))
> > +        if ((stmt = gsi_stmt (gsi)) == NULL
>
> I don't think gsi_stmt can be NULL

It can be if gsi_end_p, but that is already checked in the loop
condition.  It was preexisting code anyway, but I will change it.

> > +   will be then if-converted, the new copy of the loop will not,
> > +   and the LOOP_VECTORIZED internal call will be guarding which
> > +   loop to execute.  The vectorizer pass will fold this
> > +   internal call into either true or false.  */
> >
> >  static bool
> > +version_loop_for_if_conversion (struct loop *loop, bool *do_outer)
> > +{
>
> What's the do_outer parameter?

That is whether it should version also the outer loop.  Though, perhaps
if we solely use the loop versioning + IFN_LOOP_VECTORIZED for
MASK_{LOAD,STORE} and perhaps MASK_CALL, we could avoid that and simply
live with not vectorizing those in nested loops; perhaps that occurs
rarely enough that it will not be a vectorization regression.  The
reason I added it was that when the versioning was used always for
if-conversion, we regressed quite a few tests without it.  Perhaps the
outer loop versioning is too costly for such an unlikely case.  So, do
you prefer to drop that, or keep it?

> This needs a comment explaining what code you create.

Ok.

> > -  return changed;
> > +  if (todo && version_outer_loop)
> > +    {
> > +      if (todo & TODO_update_ssa_only_virtuals)
> > +        {
> > +          update_ssa (TODO_update_ssa_only_virtuals);
> > +          todo &= ~TODO_update_ssa_only_virtuals;
> > +        }
>
> Btw I hate that we do update_ssa multiple times per pass per
> function.  That makes us possibly worse than O(N^2) as update_ssa computes
> the IDF of the whole function.
>
> This is something your patch introduces (it's only rewriting the
> virtuals, not the incremental SSA update by BB copying).

See above.  If you don't like the 2x loop versioning, it can be dropped.
That said, every loop versioning has one update_ssa anyway, so this
isn't making the complexity any worse, other than by a constant factor.

> > +      maskt = gimple_call_arg (stmt, 2);
> > +      lhs = gimple_call_lhs (stmt);
> > +      type = TREE_TYPE (lhs);
> > +      rhs = build2 (MEM_REF, type, gimple_call_arg (stmt, 0),
> > +                    gimple_call_arg (stmt, 1));
>
> That's possibly not canonical again, use fold_build2.

Ok, will try.

> > +      gimple g = loop_vectorized_call;
> > +      tree lhs = gimple_call_lhs (g);
> > +      gimple_stmt_iterator gsi = gsi_for_stmt (g);
> > +      gimplify_and_update_call_from_tree (&gsi, boolean_true_node);
>
> plain update_call_from_tree should also work here, boolean_true_node
> is already gimple.

Will test.

> > +      gsi_next (&gsi);
> > +      if (!gsi_end_p (gsi))
> > +        {
> > +          g = gsi_stmt (gsi);
> > +          if (gimple_code (g) == GIMPLE_COND
> > +              && gimple_cond_lhs (g) == lhs)
> > +            {
> > +              gimple_cond_set_lhs (g, boolean_true_node);
>
> or simply replace all immediate uses of 'lhs' by boolean_true_node
> and remove the loop_vectorized call?

That would work, though e.g. removing the call is something any DCE
probably handles well, and the vectorizer relies on tons of DCE anyway.
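If you prefer the immediate-use variant, I suppose an untested sketch
would look something along these lines (the call then becomes trivially
dead):

  gimple use_stmt;
  imm_use_iterator iter;
  use_operand_p use_p;
  tree lhs = gimple_call_lhs (loop_vectorized_call);

  /* Make every use of the IFN_LOOP_VECTORIZED result true; the call
     itself is then trivially dead and the next DCE will remove it.  */
  FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
    {
      FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
        SET_USE (use_p, boolean_true_node);
      update_stmt (use_stmt);
    }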
> See above.  And factor this out into a function.  Also move this
> to the cleanup loop below.

Ok.

        Jakub