On Fri, 22 Nov 2013, Jakub Jelinek wrote:

> On Fri, Nov 22, 2013 at 11:08:41AM +0100, Richard Biener wrote:
> > > @@ -284,6 +382,12 @@ public:
> > >    /* Declaration node used to be clone of. */
> > >    tree former_clone_of;
> > >  
> > > +  /* If this is a SIMD clone, this points to the SIMD specific
> > > +     information for it.  */
> > > +  struct cgraph_simd_clone *simdclone;
> > > +  /* If this function has SIMD clones, this points to the first clone.  
> > > */
> > > +  struct cgraph_node *simd_clones;
> > > +
> > 
> > I wonder how you run all of this through LTO (I'll see below I guess ;))
> 
> It doesn't work, as in, all the added testcases work just fine without -flto
> and all of them ICE with -flto, but there are multiple known issues with LTO
> before that (internal fns, etc.).  More below.
> 
> > The expr.c hunk is also ok independently of the patch.
> 
> Ok, thanks (though without the rest of the patch probably nothing emits it).
> 
> > > @@ -3758,6 +3772,124 @@ ipa_modify_call_arguments (struct cgraph
> > >    free_dominance_info (CDI_DOMINATORS);
> > >  }
> > 
> > You've run the above through Martin IIRC, but ...
> 
> Aldy did.
> 
> > > +/* If the expression *EXPR should be replaced by a reduction of a 
> > > parameter, do
> > > +   so.  ADJUSTMENTS is a pointer to a vector of adjustments.  CONVERT
> > > +   specifies whether the function should care about type incompatibility 
> > > the
> > > +   current and new expressions.  If it is false, the function will leave
> > > +   incompatibility issues to the caller.  Return true iff the expression
> > > +   was modified. */
> > > +
> > > +bool
> > > +ipa_modify_expr (tree *expr, bool convert,
> > > +          ipa_parm_adjustment_vec adjustments)
> > > +{
> > > +  struct ipa_parm_adjustment *cand
> > > +    = ipa_get_adjustment_candidate (&expr, &convert, adjustments, false);
> > > +  if (!cand)
> > > +    return false;
> > > +
> > > +  tree src;
> > > +  if (cand->by_ref)
> > > +    src = build_simple_mem_ref (cand->new_decl);
> > 
> > is this function mostly copied from elsewhere?  Because
> > using build_simple_mem_ref always smells like possible TBAA problems.
> 
> Perhaps, but this is just code reorg, the same
> 
> -  if (cand->by_ref)
> -    src = build_simple_mem_ref (cand->reduction);
> -  else
> -    src = cand->reduction;
> 
> used to sit in sra_ipa_modify_expr before.
> 
> > 
> > > +  else
> > > +    src = cand->new_decl;
> > > +
> > > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > > +    {
> > > +      fprintf (dump_file, "About to replace expr ");
> > > +      print_generic_expr (dump_file, *expr, 0);
> > > +      fprintf (dump_file, " with ");
> > > +      print_generic_expr (dump_file, src, 0);
> > > +      fprintf (dump_file, "\n");
> > > +    }
> > > +
> > > +  if (convert && !useless_type_conversion_p (TREE_TYPE (*expr), 
> > > cand->type))
> > > +    {
> > > +      tree vce = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (*expr), src);
> > > +      *expr = vce;
> > 
> > Why build1 and not fold it?  I assume from above you either have a plain 
> > decl (cand->new_decl) or a MEM_REF.  For both cases simply folding
> > the VCE into a MEM_REF works.
> 
> Again, preexisting code from sra_ipa_modify_expr.  Can it be changed
> incrementally/independently of this?
> 
> > > +    }
> > > +  else
> > > +    *expr = src;
> > > +  return true;
> > > +}
> > > +
> > > +/* If T is an SSA_NAME, return NULL if it is not a default def or
> > > +   return its base variable if it is.  If IGNORE_DEFAULT_DEF is true,
> > > +   the base variable is always returned, regardless if it is a default
> > > +   def.  Return T if it is not an SSA_NAME.  */
> > > +
> > > +static tree
> > > +get_ssa_base_param (tree t, bool ignore_default_def)
> > > +{
> > > +  if (TREE_CODE (t) == SSA_NAME)
> > > +    {
> > > +      if (ignore_default_def || SSA_NAME_IS_DEFAULT_DEF (t))
> > > + return SSA_NAME_VAR (t);
> > > +      else
> > > + return NULL_TREE;
> > > +    }
> > > +  return t;
> > > +}
> > 
> > This function will return non-NULL for non-PARMs - is that intended?
> 
> Again, seems to be preexisting code from tree-sra.c.  Aldy/Martin?
> 
> > > +  /* Ignore
> > > +     #pragma omp declare simd
> > > +     extern int foo ();
> > > +     in C, there we don't know the argument types at all.  */
> > > +  if (!node->definition
> > > +      && TYPE_ARG_TYPES (TREE_TYPE (node->decl)) == NULL_TREE)
> > > +    return;
> > 
> > I wonder if you want to diagnose this case (but where?  best during
> > parsing if that is allowed).
> 
> It isn't invalid per the standard, though of course if you have
> #pragma omp declare simd
> int foo ();
> you can't supply any clauses that refer to parameters (thus, all are assumed
> to be vector arguments.  If the function is defined locally and supplies
> arguments there, it will have DECL_ARGUMENTS and can be handled easily,
> otherwise I just chose to punt, it is too hard for too little gain.
> Perhaps could warn with -Wopenmp-simd about it.  I mean to guard also
> the other warnings about inability to emit simd clones with -Wopenmp-simd.
> 
> > > +      if (count == 0)
> > > + continue;
> > > +
> > > +      for (int i = 0; i < count * 2; i++)
> > 
> > Here (and also elsewhere) the patch could do with a few extra
> > comments what is happening.
> 
> Ok.
> 
> > > --- gcc/passes.def        (.../trunk)     (revision 205223)
> > > +++ gcc/passes.def        (.../branches/gomp-4_0-branch)  (revision 
> > > 205231)
> > > @@ -97,6 +97,7 @@ along with GCC; see the file COPYING3.
> > >        NEXT_PASS (pass_feedback_split_functions);
> > >    POP_INSERT_PASSES ()
> > >    NEXT_PASS (pass_ipa_increase_alignment);
> > > +  NEXT_PASS (pass_omp_simd_clone);
> > >    NEXT_PASS (pass_ipa_tm);
> > >    NEXT_PASS (pass_ipa_lower_emutls);
> > >    TERMINATE_PASS_LIST ()
> > 
> > So clones are created before streaming LTO.  You do have vect.exp
> > testcases that are also run through -flto but does it actually
> > "work" there?  I remember seeing changes to cgraph unreachable
> > node removal based on some flag that isn't streamed, no?
> 
> Aldy has done the pass placement, I wonder also whether it wouldn't be
> best to put the OpenMP cloning as the very last IPA pass where all the other
> cloning etc. is already done.
> Right now we want to punt on IPA-CP/IPA-SRA etc. cloning of
> #pragma omp declare simd functions, because if the simd clones are created
> first, then cloning the origins and adjusting calls to them would lead to
> the simd clones not actually being used, and if simd clones are created
> late, on the other side the code isn't able to adjust "omp declare simd"
> attribute (hopefully it could be taught at least e.g. about removing
> arguments, either because they are unused or because they can be assumed
> to be constant, we perhaps could punt only if IPA cloning wants to replace
> an argument with something else).

If you don't need gimple bodies then doing a real IPA pass is possible
but I don't see any advantages as all clones will not yet be referenced
so they are not interesting to any other IPA pass or partitioning.

Doing a late simple IPA pass (the "IPA" passes that LTRANS executes)
would be the easiest IMHO and should side-step all LTO issues nicely.

> > > +               tree fndecl = gimple_call_fndecl (stmt), op;
> > > +               if (fndecl != NULL_TREE)
> > > +                 {
> > > +                   struct cgraph_node *node = cgraph_get_node (fndecl);
> > > +                   if (node != NULL && node->simd_clones != NULL)
> > 
> > So you use node->simd_clones which also need LTO streaming.
> > 
> > What's the reason you cannot defer SIMD cloning to LTRANS stage
> > as simple IPA pass next to IPA-PTA?
> 
> Yeah, see above.
> > 
> > > +                     {
> > > +                       unsigned int j, n = gimple_call_num_args (stmt);
> > > +                       for (j = 0; j < n; j++)
> > > +                         {
> > > +                           op = gimple_call_arg (stmt, j);
> > > +                           if (DECL_P (op)
> > > +                               || (REFERENCE_CLASS_P (op)
> > > +                                   && get_base_address (op)))
> > > +                             break;
> > > +                         }
> > > +                       op = gimple_call_lhs (stmt);
> > > +                       /* Ignore #pragma omp declare simd functions
> > > +                          if they don't have data references in the
> > > +                          call stmt itself.  */
> > > +                       if (j == n
> > > +                           && !(op
> > > +                                && (DECL_P (op)
> > > +                                    || (REFERENCE_CLASS_P (op)
> > > +                                        && get_base_address (op)))))
> > > +                         continue;
> > 
> > Hmm.  I guess I have an idea now how to "better" support calls in
> > data-ref/dependence analysis.  The above is fine for now - you
> > might want to dump sth here if you fail because datarefs in a declare
> > simd fn call.
> 
> Okay.
> 
> > > +       if (is_gimple_call (stmt))
> > > +         {
> > > +           /* Ignore calls with no lhs.  These must be calls to
> > > +              #pragma omp simd functions, and what vectorization factor
> > > +              it really needs can't be determined until
> > > +              vectorizable_simd_clone_call.  */
> > 
> > Ick - that's bad.  Well, or rather it doesn't participate in
> > vectorization factor determining then, resulting in missed
> > vectorizations eventually.  You basically say "any vect factor is ok"
> > here?
> 
> Right.  The thing is, if there is no lhs, I really don't know how it will
> participate in the vectorization factor decision, and won't know it until
> the vectorizable_simd_clone_call call, because whether a particular
> clone is usable depends on which of the arguments are uniform, linear (with
> what linear step) and tons of other things.
> Perhaps if there is just one simd clone or all simd clones have some
> non-empty set of arguments all without uniform/linear clauses, then we could
> pick the smallest of those surely vector args as the one for determining
> vectorization factor.  If those arguments have internal def, then the type
> will be used already somewhere else in the loop to determine vf, so it is
> only about parameters that are passed constant/external def values, but are
> required to be in vector parameters.  But I believe
> vectorizable_simd_clone_call can handle those just fine, say if you have
> all types in the loop long and thus vf decisions are only for long,
> so for AVX2 say vf = 4, then if you have
> #pragma omp declare simd uniform (a) aligned (a : 32) linear (b)
> void foo (long *a, long b, int c);
> and pass constant 23 to it, then if there is a simdlen(4) clone (will be
> on i?86/x86_64), then the last argument is passed in V4SImode parameter
> and the code should handle it fine.  Similarly if all types are int
> and there is a vector long argument passed a constant (or external def),
> it will be passed in two parameters, each one containing half, and the
> function should handle that too.
> > 
> > > +           if (STMT_VINFO_VECTYPE (stmt_info) == NULL_TREE)
> > > +             {
> > > +               unsigned int j, n = gimple_call_num_args (stmt);
> > > +               for (j = 0; j < n; j++)
> > > +                 {
> > > +                   scalar_type = TREE_TYPE (gimple_call_arg (stmt, j));
> > > +                   vectype = get_vectype_for_scalar_type (scalar_type);
> > > +                   if (vectype)
> > > +                     {
> > > +                       STMT_VINFO_VECTYPE (stmt_info) = vectype;
> > > +                       break;
> > > +                     }
> > > +                 }
> > > +             }
> > > +           if (STMT_VINFO_VECTYPE (stmt_info) != NULL_TREE)
> > > +             {
> > > +               if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
> > > +                 {
> > > +                   pattern_def_seq = NULL;
> > > +                   gsi_next (&si);
> > > +                 }
> > > +               continue;
> > > +             }
> > 
> > Both cases above need comments - why do you chose the first param
> > for determining STMT_VINFO_VECTYPE?  Isn't STMT_VINFO_VECTYPE
> > completely irrelevant for calls w/o LHS?  Answer: yes it is!
> 
> It is completely irrelevant, yes.
> 
> > I'd have expected an unconditional continue here (and leave
> > STMT_VINFO_VECTYPE == NULL - fact is that the vector type of
> > the argument is determined by its definition and thus may
> > be different from what you record here anyway).
> 
> Unfortunately it doesn't work (tried that).  The way all the
> vectorizable_* functions are called in sequence, most of them
> actually look at STMT_VINFO_VECTYPE before bailing out because
> they are for stmts that aren't simd clone calls and thus ICE/segfault.
> It was much easier to pass some non-NULL value than to change all of them.

Move vectorizable_simd_function first ;)

Or assign a random type (but remove the odd code looking at some
random parameters...)

> > > +  if (stmt_can_throw_internal (stmt))
> > > +    return false;
> > 
> > Can't happen (loop form checks).
> 
> But vectorizable_call has the same call.  So shall both be removed?

Yeah, should probably be moved to a generic place for safety.

> > > +  vectype = STMT_VINFO_VECTYPE (stmt_info);
> > 
> > See above - questionable if this doesn't result from looking at
> > the LHS.
> 
> This particular function just loads it into a variable and uses
> only if it has lhs.

yeah, seen that later

> > > +      if (thisarginfo.vectype != NULL_TREE
> > > +   && loop_vinfo
> > > +   && TREE_CODE (op) == SSA_NAME
> > > +   && simple_iv (loop, loop_containing_stmt (stmt), op, &iv, false)
> > > +   && tree_fits_shwi_p (iv.step))
> > > + {
> > > +   thisarginfo.linear_step = tree_to_shwi (iv.step);
> > 
> > Hmm, you should check thisarginfo.dt instead (I assume this case
> > is for induction/reduction defs)?  In this case you also should
> > use STMT_VINFO_LOOP_PHI_EVOLUTION_PART and not re-analyze via simple_iv.
> 
> I can try that.
> > 
> > > +   thisarginfo.op = iv.base;
> > > + }
> > > +      else if (thisarginfo.vectype == NULL_TREE
> > > +        && POINTER_TYPE_P (TREE_TYPE (op)))
> > > + thisarginfo.align = get_pointer_alignment (op) / BITS_PER_UNIT;
> > 
> > So this is for dt_external defs?
> 
> I guess even both vect_constant_def and vect_external_def, simply something
> that is uniform.
> 
> > Please switch on thisarginfo.dt here - that more naturally explains
> > what you are doing (otherwise this definitely misses a comment).
> 
> > > +      this_badness += target_badness * 512;
> > > +      /* FORNOW: Have to add code to add the mask argument.  */
> > > +      if (n->simdclone->inbranch)
> > > + continue;
> > 
> > We don't support if-converting calls anyway, no?
> 
> Not yet.  Supporting them I guess depends on the
> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01268.html
> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01437.html
> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01550.html
> series.  With that infrastructure, I think we could e.g. represent
> the conditional calls as MASK_CALL internal call that would have
> a mask argument (like MASK_LOAD/STORE), then ADDR_EXPR of the
> function decl that has simd clones, plus the original arguments,
> or something similar, then we'd just extract the function decl
> from it in this function and just vectorize the mask argument
> too and pass it through as the last argument (or set of arguments)
> to the inbranch simd clone.
> 
> > > +      for (i = 0; i < nargs; i++)
> > > + {
> > > +   switch (n->simdclone->args[i].arg_type)
> > > +     {
> > > +     case SIMD_CLONE_ARG_TYPE_VECTOR:
> > > +       if (!useless_type_conversion_p
> > > +              (n->simdclone->args[i].orig_type,
> > > +               TREE_TYPE (gimple_call_arg (stmt, i))))
> > > +         i = -1;
> > 
> > But you don't verify the vectype against the clone vectype?
> 
> The code can handle vector narrowing or widening, splitting
> into multiple arguments etc.  If the clone exist, we know the
> corresponding vector type exists, so does the arginfo[i].vectype
> that the vectorizer gives us the argument in.
> The above only handles the case where arguments are promoted
> from the types in TYPE_ARG_TYPES of the call/DECL_ARGUMENTS
> to something wider in the GIMPLE_CALL (happens for short/char
> arguments apparently).  The above code just punts on it, I don't
> want to have in that function yet another full copy of narrowing/widening
> conversions.  The plan was (so far unimplemented) to handle this
> in tree-vect-patterns.c, if we have say char argument and pass an
> int to it, if the argument is constant, we'd just fold_convert it
> to the right type, if there is widening right before it, we'd use
> the unwidened SSA_NAME instead, otherwise narrow.  Then vf
> determination etc. would handle it right.  Does that look reasonable to you?

The above tests scalar types, not arginfo[].vectype.  I'm concerned
about mismatches there (and miss such check).  There are surely
cases where (with multiple arguments) you cannot create a match.

We can of course add checking if we discover a testcase ;)

> > > +       else if (arginfo[i].vectype == NULL_TREE
> > 
> > I'd like to see checks based on the def type, not vectype.
> 
> Ok.
> > 
> > > +                || arginfo[i].linear_step)
> > > +         this_badness += 64;
> > > +       break;
> > > +     case SIMD_CLONE_ARG_TYPE_UNIFORM:
> > > +       if (arginfo[i].vectype != NULL_TREE)
> > 
> > Likewise (and below, too).
> 
> > > +  if (!vec_stmt) /* transformation not required.  */
> > > +    {
> > > +      STMT_VINFO_TYPE (stmt_info) = call_simd_clone_vec_info_type;
> > > +      if (dump_enabled_p ())
> > > + dump_printf_loc (MSG_NOTE, vect_location,
> > > +                  "=== vectorizable_simd_clone_call ===\n");
> > > +/*      vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL); */
> > > +      arginfo.release ();
> > 
> > Please save the result from the analysis (selecting the simd clone)
> > in the stmt_vinfo and skip the analysis during transform phase.
> 
> Just stick there the selected cgraph_node?

Works for me.

> As for the cost computation commented out above, it is hard to predict it
> right, probably we should at least add the cost of the scalar call, so
> the vectorizable function isn't considered cheaper.  But more than that?

No idea - this is the wrong function to do a cost model (other than
selecting between different applicable simd clones).

> > > +               vec_oprnd0
> > > +                 = build3 (BIT_FIELD_REF, atype, vec_oprnd0,
> > > +                           build_int_cst (integer_type_node, prec),
> > > +                           build_int_cst (integer_type_node,
> > > +                                          (m & (k - 1)) * prec));
> > 
> > Some helpers to build the tree to select a sub-vector would be nice
> > (I remember seeing this kind of pattern elsewhere).
> 
> Ok, I'll try something.
> 
> > > +           new_stmt
> > > +             = gimple_build_assign_with_ops (TREE_CODE (t),
> > > +                                             make_ssa_name (vectype,
> > > +                                                            NULL),
> > > +                                             t, NULL_TREE);
> > 
> > For SINGLE_RHS assigns I prefer gimple_build_assign.
> 
> Okay.
> 
> > > +
> > > +  /* Update the exception handling table with the vector stmt if 
> > > necessary.  */
> > > +  if (maybe_clean_or_replace_eh_stmt (stmt, *vec_stmt))
> > > +    gimple_purge_dead_eh_edges (gimple_bb (stmt));
> > 
> > But you've early-outed on throwing stmts?  Generally this shouldn't 
> > happen.
> 
> This is again a copy from vectorizable_call.  So, do you think it can
> be dropped there too?

Yes.

> > Overall it looks good - it would be nice to split out and commit
> > separately the IPA cloning infrastructure re-org (and the expr.c hunk).
> > 
> > The LTO issue needs to be addressed - the simplest thing to me looks
> > to defer cloning to LTRANS stage.
> 
> Yeah, but the start should be to handle the internal calls that are used
> everywhere now by #pragma omp simd too, and ubsan etc.

Correct - there is a bugreport about it.  The solution is to completely
ignore them when building the cgraph (and fix the fallout - heh).
I can give it a try again.

Richard.

Reply via email to