On Fri, 22 Nov 2013, Jakub Jelinek wrote:

> On Fri, Nov 22, 2013 at 11:08:41AM +0100, Richard Biener wrote:
> > > @@ -284,6 +382,12 @@ public:
> > >    /* Declaration node used to be clone of. */
> > >    tree former_clone_of;
> > >
> > > +  /* If this is a SIMD clone, this points to the SIMD specific
> > > +     information for it.  */
> > > +  struct cgraph_simd_clone *simdclone;
> > > +  /* If this function has SIMD clones, this points to the first clone.  */
> > > +  struct cgraph_node *simd_clones;
> > > +
> >
> > I wonder how you run all of this through LTO (I'll see below I guess ;))
>
> It doesn't work, as in, all the added testcases work just fine without -flto
> and all of them ICE with -flto, but there are multiple known issues with LTO
> before that (internal fns, etc.).  More below.
>
> > The expr.c hunk is also ok independently of the patch.
>
> Ok, thanks (though without the rest of the patch probably nothing emits it).
>
> > > @@ -3758,6 +3772,124 @@ ipa_modify_call_arguments (struct cgraph
> > >      free_dominance_info (CDI_DOMINATORS);
> > >  }
> >
> > You've run the above through Martin IIRC, but ...
>
> Aldy did.
>
> > > +/* If the expression *EXPR should be replaced by a reduction of a parameter, do
> > > +   so.  ADJUSTMENTS is a pointer to a vector of adjustments.  CONVERT
> > > +   specifies whether the function should care about type incompatibility of the
> > > +   current and new expressions.  If it is false, the function will leave
> > > +   incompatibility issues to the caller.  Return true iff the expression
> > > +   was modified.  */
> > > +
> > > +bool
> > > +ipa_modify_expr (tree *expr, bool convert,
> > > +                 ipa_parm_adjustment_vec adjustments)
> > > +{
> > > +  struct ipa_parm_adjustment *cand
> > > +    = ipa_get_adjustment_candidate (&expr, &convert, adjustments, false);
> > > +  if (!cand)
> > > +    return false;
> > > +
> > > +  tree src;
> > > +  if (cand->by_ref)
> > > +    src = build_simple_mem_ref (cand->new_decl);
> >
> > is this function mostly copied from elsewhere?  Because
> > using build_simple_mem_ref always smells like possible TBAA problems.
>
> Perhaps, but this is just code reorg, the same
>
> -  if (cand->by_ref)
> -    src = build_simple_mem_ref (cand->reduction);
> -  else
> -    src = cand->reduction;
>
> used to sit in sra_ipa_modify_expr before.
>
> > > +  else
> > > +    src = cand->new_decl;
> > > +
> > > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > > +    {
> > > +      fprintf (dump_file, "About to replace expr ");
> > > +      print_generic_expr (dump_file, *expr, 0);
> > > +      fprintf (dump_file, " with ");
> > > +      print_generic_expr (dump_file, src, 0);
> > > +      fprintf (dump_file, "\n");
> > > +    }
> > > +
> > > +  if (convert && !useless_type_conversion_p (TREE_TYPE (*expr), cand->type))
> > > +    {
> > > +      tree vce = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (*expr), src);
> > > +      *expr = vce;
> >
> > Why build1 and not fold it?  I assume from above you either have a plain
> > decl (cand->new_decl) or a MEM_REF.  For both cases simply folding
> > the VCE into a MEM_REF works.
>
> Again, preexisting code from sra_ipa_modify_expr.  Can it be changed
> incrementally/independently of this?
>
> > > +    }
> > > +  else
> > > +    *expr = src;
> > > +  return true;
> > > +}
> > > +
> > > +/* If T is an SSA_NAME, return NULL if it is not a default def or
> > > +   return its base variable if it is.  If IGNORE_DEFAULT_DEF is true,
> > > +   the base variable is always returned, regardless if it is a default
> > > +   def.  Return T if it is not an SSA_NAME.  */
> > > +
> > > +static tree
> > > +get_ssa_base_param (tree t, bool ignore_default_def)
> > > +{
> > > +  if (TREE_CODE (t) == SSA_NAME)
> > > +    {
> > > +      if (ignore_default_def || SSA_NAME_IS_DEFAULT_DEF (t))
> > > +        return SSA_NAME_VAR (t);
> > > +      else
> > > +        return NULL_TREE;
> > > +    }
> > > +  return t;
> > > +}
> >
> > This function will return non-NULL for non-PARMs - is that intended?
>
> Again, seems to be preexisting code from tree-sra.c.  Aldy/Martin?
>
> > > +  /* Ignore
> > > +       #pragma omp declare simd
> > > +       extern int foo ();
> > > +     in C, there we don't know the argument types at all.  */
> > > +  if (!node->definition
> > > +      && TYPE_ARG_TYPES (TREE_TYPE (node->decl)) == NULL_TREE)
> > > +    return;
> >
> > I wonder if you want to diagnose this case (but where? best during
> > parsing if that is allowed).
>
> It isn't invalid per the standard, though of course if you have
> #pragma omp declare simd
> int foo ();
> you can't supply any clauses that refer to parameters (thus, all are assumed
> to be vector arguments).  If the function is defined locally and supplies
> arguments there, it will have DECL_ARGUMENTS and can be handled easily;
> otherwise I just chose to punt, it is too hard for too little gain.
> Perhaps we could warn with -Wopenmp-simd about it.  I mean to also guard
> the other warnings about the inability to emit simd clones with -Wopenmp-simd.
>
> > > +  if (count == 0)
> > > +    continue;
> > > +
> > > +  for (int i = 0; i < count * 2; i++)
> >
> > Here (and also elsewhere) the patch could do with a few extra
> > comments about what is happening.
>
> Ok.
>
> > > --- gcc/passes.def      (.../trunk)     (revision 205223)
> > > +++ gcc/passes.def      (.../branches/gomp-4_0-branch)  (revision 205231)
> > > @@ -97,6 +97,7 @@ along with GCC; see the file COPYING3.
> > >        NEXT_PASS (pass_feedback_split_functions);
> > >    POP_INSERT_PASSES ()
> > >    NEXT_PASS (pass_ipa_increase_alignment);
> > > +  NEXT_PASS (pass_omp_simd_clone);
> > >    NEXT_PASS (pass_ipa_tm);
> > >    NEXT_PASS (pass_ipa_lower_emutls);
> > >    TERMINATE_PASS_LIST ()
> >
> > So clones are created before streaming LTO.  You do have vect.exp
> > testcases that are also run through -flto but does it actually
> > "work" there?  I remember seeing changes to cgraph unreachable
> > node removal based on some flag that isn't streamed, no?
>
> Aldy has done the pass placement; I also wonder whether it wouldn't be
> best to put the OpenMP cloning as the very last IPA pass, where all the
> other cloning etc. is already done.
> Right now we want to punt on IPA-CP/IPA-SRA etc. cloning of
> #pragma omp declare simd functions, because if the simd clones are created
> first, then cloning the origins and adjusting calls to them would lead to
> the simd clones not actually being used, and if simd clones are created
> late, on the other hand the code isn't able to adjust the "omp declare simd"
> attribute (hopefully it could be taught at least e.g. about removing
> arguments, either because they are unused or because they can be assumed
> to be constant; we perhaps could punt only if IPA cloning wants to replace
> an argument with something else).
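(For reference, a minimal C sketch of the punted case discussed above: foo is
the unprototyped declaration from the thread, while bar, its parameter names
and its uniform clause are made up for illustration only.)

  /* Unprototyped: the argument types are unknown at the declaration,
     clauses cannot refer to parameters, and clone creation is punted
     (possibly with a -Wopenmp-simd warning, as suggested above).  */
  #pragma omp declare simd
  extern int foo ();

  /* Prototyped: the argument types are known, clauses may name the
     parameters, and simd clones can be emitted for the declaration.  */
  #pragma omp declare simd uniform (scale)
  extern int bar (int x, int scale);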
If you don't need gimple bodies then doing a real IPA pass is possible, but
I don't see any advantages, as all clones will not yet be referenced, so they
are not interesting to any other IPA pass or to partitioning.  Doing a late
simple IPA pass (the "IPA" passes that LTRANS executes) would be the easiest
IMHO and should side-step all LTO issues nicely.

> > > +              tree fndecl = gimple_call_fndecl (stmt), op;
> > > +              if (fndecl != NULL_TREE)
> > > +                {
> > > +                  struct cgraph_node *node = cgraph_get_node (fndecl);
> > > +                  if (node != NULL && node->simd_clones != NULL)
> >
> > So you use node->simd_clones which also need LTO streaming.
> >
> > What's the reason you cannot defer SIMD cloning to LTRANS stage
> > as simple IPA pass next to IPA-PTA?
>
> Yeah, see above.
>
> > > +                    {
> > > +                      unsigned int j, n = gimple_call_num_args (stmt);
> > > +                      for (j = 0; j < n; j++)
> > > +                        {
> > > +                          op = gimple_call_arg (stmt, j);
> > > +                          if (DECL_P (op)
> > > +                              || (REFERENCE_CLASS_P (op)
> > > +                                  && get_base_address (op)))
> > > +                            break;
> > > +                        }
> > > +                      op = gimple_call_lhs (stmt);
> > > +                      /* Ignore #pragma omp declare simd functions
> > > +                         if they don't have data references in the
> > > +                         call stmt itself.  */
> > > +                      if (j == n
> > > +                          && !(op
> > > +                               && (DECL_P (op)
> > > +                                   || (REFERENCE_CLASS_P (op)
> > > +                                       && get_base_address (op)))))
> > > +                        continue;
> >
> > Hmm.  I guess I have an idea now how to "better" support calls in
> > data-ref/dependence analysis.  The above is fine for now - you
> > might want to dump sth here if you fail because of datarefs in a declare
> > simd fn call.
>
> Okay.
>
> > > +      if (is_gimple_call (stmt))
> > > +        {
> > > +          /* Ignore calls with no lhs.  These must be calls to
> > > +             #pragma omp simd functions, and what vectorization factor
> > > +             it really needs can't be determined until
> > > +             vectorizable_simd_clone_call.  */
> >
> > Ick - that's bad.  Well, or rather it doesn't participate in
> > vectorization factor determining then, resulting in missed
> > vectorizations eventually.  You basically say "any vect factor is ok"
> > here?
>
> Right.  The thing is, if there is no lhs, I really don't know how it will
> participate in the vectorization factor decision, and won't know it until
> the vectorizable_simd_clone_call call, because whether a particular
> clone is usable depends on which of the arguments are uniform, linear (with
> what linear step) and tons of other things.
> Perhaps if there is just one simd clone, or all simd clones have some
> non-empty set of arguments all without uniform/linear clauses, then we could
> pick the smallest of those surely-vector args as the one for determining the
> vectorization factor.  If those arguments have an internal def, then the type
> will already be used somewhere else in the loop to determine the vf, so it is
> only about parameters that are passed constant/external def values, but are
> required to be in vector parameters.  But I believe
> vectorizable_simd_clone_call can handle those just fine; say all types in
> the loop are long and thus vf decisions are only made for long, so for AVX2
> say vf = 4, then if you have
> #pragma omp declare simd uniform (a) aligned (a : 32) linear (b)
> void foo (long *a, long b, int c);
> and pass constant 23 to it, then if there is a simdlen(4) clone (there will
> be on i?86/x86_64), the last argument is passed in a V4SImode parameter
> and the code should handle it fine.  Similarly, if all types are int
> and there is a vector long argument passed a constant (or external def),
> it will be passed in two parameters, each one containing half, and the
> function should handle that too.
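(To make the scenario above concrete, a small sketch of what Jakub describes;
the declaration and the constant 23 come from the thread, while the body of
foo, the caller bar and the flags are made-up illustrations.  Compiled with
-fopenmp or -fopenmp-simd and -mavx2, the idea is that a is uniform, b is
linear with step 1, and the loop-invariant 23 would be broadcast into the
V4SImode parameter the simdlen(4) clone expects for c.)

  #pragma omp declare simd uniform (a) aligned (a : 32) linear (b)
  void
  foo (long *a, long b, int c)
  {
    a[b] += c;
  }

  void
  bar (long *a, int n)
  {
    /* a is assumed 32-byte aligned, matching the aligned clause above;
       the call has no lhs, which is the case discussed above.  */
    #pragma omp simd
    for (int i = 0; i < n; i++)
      foo (a, i, 23);
  }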
> > > +          if (STMT_VINFO_VECTYPE (stmt_info) == NULL_TREE)
> > > +            {
> > > +              unsigned int j, n = gimple_call_num_args (stmt);
> > > +              for (j = 0; j < n; j++)
> > > +                {
> > > +                  scalar_type = TREE_TYPE (gimple_call_arg (stmt, j));
> > > +                  vectype = get_vectype_for_scalar_type (scalar_type);
> > > +                  if (vectype)
> > > +                    {
> > > +                      STMT_VINFO_VECTYPE (stmt_info) = vectype;
> > > +                      break;
> > > +                    }
> > > +                }
> > > +            }
> > > +          if (STMT_VINFO_VECTYPE (stmt_info) != NULL_TREE)
> > > +            {
> > > +              if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
> > > +                {
> > > +                  pattern_def_seq = NULL;
> > > +                  gsi_next (&si);
> > > +                }
> > > +              continue;
> > > +            }
> >
> > Both cases above need comments - why do you choose the first param
> > for determining STMT_VINFO_VECTYPE?  Isn't STMT_VINFO_VECTYPE
> > completely irrelevant for calls w/o LHS?  Answer: yes it is!
>
> It is completely irrelevant, yes.
>
> > I'd have expected an unconditional continue here (and leave
> > STMT_VINFO_VECTYPE == NULL - fact is that the vector type of
> > the argument is determined by its definition and thus may
> > be different from what you record here anyway).
>
> Unfortunately it doesn't work (tried that).  The way all the
> vectorizable_* functions are called in sequence, most of them
> actually look at STMT_VINFO_VECTYPE before bailing out because
> they are for stmts that aren't simd clone calls and thus ICE/segfault.
> It was much easier to pass some non-NULL value than to change all of them.

Move vectorizable_simd_function first ;)  Or assign a random type (but
remove the odd code looking at some random parameters...)

> > > +  if (stmt_can_throw_internal (stmt))
> > > +    return false;
> >
> > Can't happen (loop form checks).
>
> But vectorizable_call has the same call.  So shall both be removed?

Yeah, should probably be moved to a generic place for safety.

> > > +  vectype = STMT_VINFO_VECTYPE (stmt_info);
> >
> > See above - questionable if this doesn't result from looking at
> > the LHS.
>
> This particular function just loads it into a variable and uses it
> only if it has a lhs.

Yeah, seen that later.

> > > +      if (thisarginfo.vectype != NULL_TREE
> > > +          && loop_vinfo
> > > +          && TREE_CODE (op) == SSA_NAME
> > > +          && simple_iv (loop, loop_containing_stmt (stmt), op, &iv, false)
> > > +          && tree_fits_shwi_p (iv.step))
> > > +        {
> > > +          thisarginfo.linear_step = tree_to_shwi (iv.step);
> >
> > Hmm, you should check thisarginfo.dt instead (I assume this case
> > is for induction/reduction defs)?  In this case you also should
> > use STMT_VINFO_LOOP_PHI_EVOLUTION_PART and not re-analyze via simple_iv.
>
> I can try that.
>
> > > +          thisarginfo.op = iv.base;
> > > +        }
> > > +      else if (thisarginfo.vectype == NULL_TREE
> > > +               && POINTER_TYPE_P (TREE_TYPE (op)))
> > > +        thisarginfo.align = get_pointer_alignment (op) / BITS_PER_UNIT;
> >
> > So this is for dt_external defs?
>
> I guess even both vect_constant_def and vect_external_def, simply something
> that is uniform.
>
> > Please switch on thisarginfo.dt here - that more naturally explains
> > what you are doing (otherwise this definitely misses a comment).
> >
> > > +          this_badness += target_badness * 512;
> > > +          /* FORNOW: Have to add code to add the mask argument.  */
> > > +          if (n->simdclone->inbranch)
> > > +            continue;
> >
> > We don't support if-converting calls anyway, no?
> Not yet.  Supporting them I guess depends on the
> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01268.html
> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01437.html
> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01550.html
> series.  With that infrastructure, I think we could e.g. represent
> the conditional calls as a MASK_CALL internal call that would have
> a mask argument (like MASK_LOAD/STORE), then an ADDR_EXPR of the
> function decl that has simd clones, plus the original arguments,
> or something similar; then we'd just extract the function decl
> from it in this function and just vectorize the mask argument
> too and pass it through as the last argument (or set of arguments)
> to the inbranch simd clone.
>
> > > +  for (i = 0; i < nargs; i++)
> > > +    {
> > > +      switch (n->simdclone->args[i].arg_type)
> > > +        {
> > > +        case SIMD_CLONE_ARG_TYPE_VECTOR:
> > > +          if (!useless_type_conversion_p
> > > +                 (n->simdclone->args[i].orig_type,
> > > +                  TREE_TYPE (gimple_call_arg (stmt, i))))
> > > +            i = -1;
> >
> > But you don't verify the vectype against the clone vectype?
>
> The code can handle vector narrowing or widening, splitting
> into multiple arguments etc.  If the clone exists, we know the
> corresponding vector type exists, and so does the arginfo[i].vectype
> that the vectorizer gives us the argument in.
> The above only handles the case where arguments are promoted
> from the types in TYPE_ARG_TYPES of the call/DECL_ARGUMENTS
> to something wider in the GIMPLE_CALL (happens for short/char
> arguments apparently).  The above code just punts on it; I don't
> want to have yet another full copy of the narrowing/widening
> conversions in that function.  The plan was (so far unimplemented) to
> handle this in tree-vect-patterns.c: if we have say a char argument and
> pass an int to it, if the argument is constant, we'd just fold_convert it
> to the right type; if there is a widening right before it, we'd use
> the unwidened SSA_NAME instead; otherwise narrow.  Then vf
> determination etc. would handle it right.  Does that look reasonable to you?

The above tests scalar types, not arginfo[].vectype.  I'm concerned about
mismatches there (and we miss such a check).  There are surely cases where
(with multiple arguments) you cannot create a match.  We can of course add
checking if we discover a testcase ;)

> > > +          else if (arginfo[i].vectype == NULL_TREE
> >
> > I'd like to see checks based on the def type, not vectype.
>
> Ok.
>
> > > +                   || arginfo[i].linear_step)
> > > +            this_badness += 64;
> > > +          break;
> > > +        case SIMD_CLONE_ARG_TYPE_UNIFORM:
> > > +          if (arginfo[i].vectype != NULL_TREE)
> >
> > Likewise (and below, too).
> >
> > > +  if (!vec_stmt) /* transformation not required.  */
> > > +    {
> > > +      STMT_VINFO_TYPE (stmt_info) = call_simd_clone_vec_info_type;
> > > +      if (dump_enabled_p ())
> > > +        dump_printf_loc (MSG_NOTE, vect_location,
> > > +                         "=== vectorizable_simd_clone_call ===\n");
> > > +/*    vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL); */
> > > +      arginfo.release ();
> >
> > Please save the result from the analysis (selecting the simd clone)
> > in the stmt_vinfo and skip the analysis during transform phase.
>
> Just stick there the selected cgraph_node?

Works for me.

> As for the cost computation commented out above, it is hard to predict it
> right; probably we should at least add the cost of the scalar call, so
> the vectorizable function isn't considered cheaper.  But more than that?
No idea - this is the wrong function to do a cost model in (other than
selecting between different applicable simd clones).

> > > +                  vec_oprnd0
> > > +                    = build3 (BIT_FIELD_REF, atype, vec_oprnd0,
> > > +                              build_int_cst (integer_type_node, prec),
> > > +                              build_int_cst (integer_type_node,
> > > +                                             (m & (k - 1)) * prec));
> >
> > Some helpers to build the tree to select a sub-vector would be nice
> > (I remember seeing this kind of pattern elsewhere).
>
> Ok, I'll try something.
>
> > > +              new_stmt
> > > +                = gimple_build_assign_with_ops (TREE_CODE (t),
> > > +                                                make_ssa_name (vectype,
> > > +                                                               NULL),
> > > +                                                t, NULL_TREE);
> >
> > For SINGLE_RHS assigns I prefer gimple_build_assign.
>
> Okay.
>
> > > +
> > > +  /* Update the exception handling table with the vector stmt if
> > > +     necessary.  */
> > > +  if (maybe_clean_or_replace_eh_stmt (stmt, *vec_stmt))
> > > +    gimple_purge_dead_eh_edges (gimple_bb (stmt));
> >
> > But you've early-outed on throwing stmts?  Generally this shouldn't
> > happen.
>
> This is again a copy from vectorizable_call.  So, do you think it can
> be dropped there too?

Yes.

> > Overall it looks good - it would be nice to split out and commit
> > separately the IPA cloning infrastructure re-org (and the expr.c hunk).
> >
> > The LTO issue needs to be addressed - the simplest thing to me looks
> > to be to defer cloning to LTRANS stage.
>
> Yeah, but the start should be to handle the internal calls that are used
> everywhere now by #pragma omp simd too, and ubsan etc.

Correct - there is a bug report about it.  The solution is to completely
ignore them when building the cgraph (and fix the fallout - heh).  I can
give it a try again.

Richard.