Prathamesh Kulkarni <[email protected]> writes:
> On Tue, 27 Aug 2019 at 21:14, Richard Sandiford
> <[email protected]> wrote:
>>
>> Richard should have the final say, but some comments...
>>
>> Prathamesh Kulkarni <[email protected]> writes:
>> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> > index 1e2dfe5d22d..862206b3256 100644
>> > --- a/gcc/tree-vect-stmts.c
>> > +++ b/gcc/tree-vect-stmts.c
>> > @@ -1989,17 +1989,31 @@ check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
>> >
>> > static tree
>> > prepare_load_store_mask (tree mask_type, tree loop_mask, tree vec_mask,
>> > - gimple_stmt_iterator *gsi)
>> > + gimple_stmt_iterator *gsi, tree mask,
>> > + cond_vmask_map_type *cond_to_vec_mask)
>>
>> "scalar_mask" might be a better name. But maybe we should key off the
>> vector mask after all, now that we're relying on the code having no
>> redundancies.
>>
>> Passing the vinfo would be better than passing the cond_vmask_map_type
>> directly.
>>
>> > {
>> >   gcc_assert (useless_type_conversion_p (mask_type, TREE_TYPE (vec_mask)));
>> > if (!loop_mask)
>> > return vec_mask;
>> >
>> > gcc_assert (TREE_TYPE (loop_mask) == mask_type);
>> > +
>> > + tree *slot = 0;
>> > + if (cond_to_vec_mask)
>>
>> The pointer should never be null in this context.
> Disabling the check for NULL results in a segfault with cond_arith_4.c,
> because we reach prepare_load_store_mask via vect_schedule_slp, called
> from here in vect_transform_loop:
>
>   /* Schedule the SLP instances first, then handle loop vectorization
>      below.  */
>   if (!loop_vinfo->slp_instances.is_empty ())
>     {
>       DUMP_VECT_SCOPE ("scheduling SLP instances");
>       vect_schedule_slp (loop_vinfo);
>     }
>
> which is before the bb processing loop.

We want this optimisation to be applied to SLP too though, especially
since non-SLP will be going away at some point.  But as Richard says,
the problem with SLP is that the statements aren't traversed in block
order, so I guess we can't do the on-the-fly redundancy elimination
there...

Maybe an alternative would be to record during the analysis phase which
scalar conditions need which loop masks.  Statements that need a loop
mask currently do:

    vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype);

If we also pass the scalar condition, we can maintain a hash_set of
<condition, ncopies> pairs, representing the conditions that have
loop masks applied at some point in the vectorised code.  The COND_EXPR
code can use that set to decide whether to apply the loop mask or not.

Trying to avoid duplicate ANDs with the loop mask would then become a
separate follow-on change.  Not sure whether it's worth it on its own.
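
FWIW, a rough standalone sketch of the record/query idea (C++, outside
the vectoriser; "cond_t" just stands in for the scalar condition tree,
and none of these names exist in GCC):

    #include <unordered_set>
    #include <cstdint>
    #include <cstddef>
    #include <functional>

    /* Stand-in for the scalar condition (a tree in the real code).  */
    typedef uintptr_t cond_t;

    /* Key recorded during analysis: a scalar condition together with
       the number of vector copies it is used for.  */
    struct cond_key
    {
      cond_t cond;
      unsigned ncopies;
      bool operator== (const cond_key &other) const
      { return cond == other.cond && ncopies == other.ncopies; }
    };

    struct cond_key_hash
    {
      std::size_t operator() (const cond_key &k) const
      { return std::hash<cond_t> () (k.cond) + k.ncopies * 0x9e3779b9u; }
    };

    typedef std::unordered_set<cond_key, cond_key_hash> masked_cond_set;

    /* Analysis phase: called alongside vect_record_loop_mask when a
       statement whose mask is COND asks for a loop mask.  */
    static void
    record_masked_cond (masked_cond_set &conds, cond_t cond,
                        unsigned ncopies)
    {
      conds.insert ({ cond, ncopies });
    }

    /* Transform phase: the COND_EXPR code asks whether COND has a loop
       mask applied somewhere, and so whether it should apply the loop
       mask itself.  */
    static bool
    cond_has_loop_mask_p (const masked_cond_set &conds, cond_t cond,
                          unsigned ncopies)
    {
      return conds.count ({ cond, ncopies }) != 0;
    }

In the vectoriser itself the key would presumably be the condition tree
plus ncopies, with the set living in the loop_vec_info, but the shape
of the record and query steps would be the same.
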
Thanks,
Richard