Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> On Tue, 27 Aug 2019 at 21:14, Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>>
>> Richard should have the final say, but some comments...
>>
>> Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
>> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> > index 1e2dfe5d22d..862206b3256 100644
>> > --- a/gcc/tree-vect-stmts.c
>> > +++ b/gcc/tree-vect-stmts.c
>> > @@ -1989,17 +1989,31 @@ check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
>> >
>> >  static tree
>> >  prepare_load_store_mask (tree mask_type, tree loop_mask, tree vec_mask,
>> > -                      gimple_stmt_iterator *gsi)
>> > +                      gimple_stmt_iterator *gsi, tree mask,
>> > +                      cond_vmask_map_type *cond_to_vec_mask)
>>
>> "scalar_mask" might be a better name.  But maybe we should key off the
>> vector mask after all, now that we're relying on the code having no
>> redundancies.
>>
>> Passing the vinfo would be better than passing the cond_vmask_map_type
>> directly.
>>
>> >  {
>> >    gcc_assert (useless_type_conversion_p (mask_type, TREE_TYPE (vec_mask)));
>> >    if (!loop_mask)
>> >      return vec_mask;
>> >
>> >    gcc_assert (TREE_TYPE (loop_mask) == mask_type);
>> > +
>> > +  tree *slot = 0;
>> > +  if (cond_to_vec_mask)
>>
>> The pointer should never be null in this context.
> Disabling the check for NULL results in a segfault with cond_arith_4.c,
> because we reach prepare_load_store_mask via vect_schedule_slp, which is
> called from here in vect_transform_loop:
>  /* Schedule the SLP instances first, then handle loop vectorization
>      below.  */
>   if (!loop_vinfo->slp_instances.is_empty ())
>     {
>       DUMP_VECT_SCOPE ("scheduling SLP instances");
>       vect_schedule_slp (loop_vinfo);
>     }
>
> which runs before the bb processing loop.

We want this optimisation to be applied to SLP too though, especially
since non-SLP will be going away at some point.

But as Richard says, the problem with SLP is that the statements aren't
traversed in block order, so I guess we can't do the on-the-fly
redundancy elimination there...

Maybe an alternative would be to record during the analysis phase which
scalar conditions need which loop masks.  Statements that need a loop
mask currently do:

      vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype);

If we also pass the scalar condition, we can maintain a hash_set of
<condition, ncopies> pairs, representing the conditions that have
loop masks applied at some point in the vectorised code.  The COND_EXPR
code can use that set to decide whether to apply the loop mask or not.
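Very roughly, the bookkeeping could look something like the standalone
sketch below.  This is only an illustration of the idea, not vectoriser
code: the names (scalar_cond, cond_masked_key, record_cond_masked,
cond_masked_p) are made up for the example, and the scalar condition is
modelled as a string where the real thing would be a tree compared with
operand_equal_p.

#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

/* Stand-in for a scalar condition.  In the vectoriser this would be a
   tree, with equality tested by operand_equal_p and hashing based on
   the operands.  */
typedef std::string scalar_cond;

/* Key recorded during analysis: a scalar condition together with the
   number of copies it is vectorised into.  */
struct cond_masked_key
{
  scalar_cond cond;
  unsigned int ncopies;

  bool operator== (const cond_masked_key &other) const
  {
    return ncopies == other.ncopies && cond == other.cond;
  }
};

struct cond_masked_key_hash
{
  std::size_t operator() (const cond_masked_key &key) const
  {
    return std::hash<scalar_cond> () (key.cond) ^ (key.ncopies * 0x9e3779b9u);
  }
};

typedef std::unordered_set<cond_masked_key, cond_masked_key_hash>
  cond_masked_set_type;

/* Analysis phase: alongside the vect_record_loop_mask call, remember
   that COND has a loop mask applied for this number of copies.  */
static void
record_cond_masked (cond_masked_set_type &set, const scalar_cond &cond,
                    unsigned int ncopies)
{
  set.insert ({cond, ncopies});
}

/* Transform phase: the COND_EXPR code asks whether the same condition
   already has a loop mask applied somewhere, and if so applies the mask
   to the COND_EXPR too, so that masked and unmasked forms of the same
   condition don't coexist.  */
static bool
cond_masked_p (const cond_masked_set_type &set, const scalar_cond &cond,
               unsigned int ncopies)
{
  return set.count ({cond, ncopies}) != 0;
}

The point is just that the set is filled during the analysis phase and
only queried during the transform phase, so the decision doesn't depend
on the order in which SLP visits the statements.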

Trying to avoid duplicate ANDs with the loop mask would then become a
separate follow-on change.  Not sure whether it's worth it on its own.

Thanks,
Richard
