On Wed, Aug 28, 2019 at 11:02 AM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> > On Tue, 27 Aug 2019 at 21:14, Richard Sandiford
> > <richard.sandif...@arm.com> wrote:
> >>
> >> Richard should have the final say, but some comments...
> >>
> >> Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> >> > index 1e2dfe5d22d..862206b3256 100644
> >> > --- a/gcc/tree-vect-stmts.c
> >> > +++ b/gcc/tree-vect-stmts.c
> >> > @@ -1989,17 +1989,31 @@ check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
> >> >
> >> >  static tree
> >> >  prepare_load_store_mask (tree mask_type, tree loop_mask, tree vec_mask,
> >> > -                      gimple_stmt_iterator *gsi)
> >> > +                      gimple_stmt_iterator *gsi, tree mask,
> >> > +                      cond_vmask_map_type *cond_to_vec_mask)
> >>
> >> "scalar_mask" might be a better name.  But maybe we should key off the
> >> vector mask after all, now that we're relying on the code having no
> >> redundancies.
> >>
> >> Passing the vinfo would be better than passing the cond_vmask_map_type
> >> directly.
> >>
> >> >  {
> >> >    gcc_assert (useless_type_conversion_p (mask_type, TREE_TYPE (vec_mask)));
> >> >    if (!loop_mask)
> >> >      return vec_mask;
> >> >
> >> >    gcc_assert (TREE_TYPE (loop_mask) == mask_type);
> >> > +
> >> > +  tree *slot = 0;
> >> > +  if (cond_to_vec_mask)
> >>
> >> The pointer should never be null in this context.
> > Disabling the check for NULL results in a segfault with cond_arith_4.c,
> > because we reach prepare_load_store_mask via vect_schedule_slp, called
> > from here in vect_transform_loop:
> >  /* Schedule the SLP instances first, then handle loop vectorization
> >      below.  */
> >   if (!loop_vinfo->slp_instances.is_empty ())
> >     {
> >       DUMP_VECT_SCOPE ("scheduling SLP instances");
> >       vect_schedule_slp (loop_vinfo);
> >     }
> >
> > which runs before the bb processing loop.
>
> We want this optimisation to be applied to SLP too, though, especially
> since non-SLP will be going away at some point.
>
> But as Richard says, the problem with SLP is that the statements aren't
> traversed in block order, so I guess we can't do the on-the-fly
> redundancy elimination there...

And the current patch AFAICS can generate wrong SSA for this reason.

> Maybe an alternative would be to record during the analysis phase which
> scalar conditions need which loop masks.  Statements that need a loop
> mask currently do:
>
>       vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype);
>
> If we also pass the scalar condition, we can maintain a hash_set of
> <condition, ncopies> pairs, representing the conditions that have
> loop masks applied at some point in the vectorised code.  The COND_EXPR
> code can use that set to decide whether to apply the loop mask or not.

Yeah, that sounds better.
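
Something along these lines, perhaps (just a sketch: apart from
vect_record_loop_mask and prepare_load_store_mask none of the names
below exist today, and the set would need hash traits that hash and
compare the condition via iterative_hash_expr/operand_equal_p):

  /* <scalar condition, ncopies> key for the set of conditions that
     have a loop mask applied somewhere in the vectorised code.  */
  struct scalar_cond_masked_key
  {
    scalar_cond_masked_key (tree cond_, unsigned ncopies_)
      : cond (cond_), ncopies (ncopies_) {}
    tree cond;
    unsigned ncopies;
  };

  /* Analysis phase, next to the existing call:  */
  vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype);
  loop_vinfo->scalar_cond_masked_set.add
    (scalar_cond_masked_key (scalar_cond, ncopies));

  /* Transform phase, when vectorising the COND_EXPR: apply the loop
     mask only if the scalar condition is known to be masked.  */
  if (loop_vinfo->scalar_cond_masked_set.contains
      (scalar_cond_masked_key (cond, ncopies)))
    vec_compare = prepare_load_store_mask (mask_type, loop_mask,
                                           vec_compare, gsi);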

Note that I don't like the extra "helpers" in fold-const.c/h; they don't
look generally useful, so please put them into vectorizer-private code.
The decomposition also doesn't look too nice; instead,
prepare_load_store_mask could take such a decomposed representation,
which would probably be quite natural with Richard's suggestion above.
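
I.e. something like the following signature, where the caller passes
the pieces it already has instead of a tree for the helper to take
apart (again just a sketch with made-up parameter names):

  static tree
  prepare_load_store_mask (tree mask_type, tree loop_mask, tree vec_mask,
                           gimple_stmt_iterator *gsi,
                           tree_code scalar_cond_code,
                           tree scalar_op0, tree scalar_op1)

That would also give a natural key for the set above:
<code, op0, op1, ncopies> rather than a full tree.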

Richard.

> Trying to avoid duplicate ANDs with the loop mask would then become a
> separate follow-on change.  Not sure whether it's worth it on its own.
>
> Thanks,
> Richard
