Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> On Tue, 27 Aug 2019 at 21:14, Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>>
>> Richard should have the final say, but some comments...
>>
>> Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
>> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> > index 1e2dfe5d22d..862206b3256 100644
>> > --- a/gcc/tree-vect-stmts.c
>> > +++ b/gcc/tree-vect-stmts.c
>> > @@ -1989,17 +1989,31 @@ check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
>> >
>> >  static tree
>> >  prepare_load_store_mask (tree mask_type, tree loop_mask, tree vec_mask,
>> > -                          gimple_stmt_iterator *gsi)
>> > +                          gimple_stmt_iterator *gsi, tree mask,
>> > +                          cond_vmask_map_type *cond_to_vec_mask)
>>
>> "scalar_mask" might be a better name.  But maybe we should key off the
>> vector mask after all, now that we're relying on the code having no
>> redundancies.
>>
>> Passing the vinfo would be better than passing the cond_vmask_map_type
>> directly.
>>
>> >  {
>> >    gcc_assert (useless_type_conversion_p (mask_type, TREE_TYPE (vec_mask)));
>> >    if (!loop_mask)
>> >      return vec_mask;
>> >
>> >    gcc_assert (TREE_TYPE (loop_mask) == mask_type);
>> > +
>> > +  tree *slot = 0;
>> > +  if (cond_to_vec_mask)
>>
>> The pointer should never be null in this context.
> Disabling check for NULL results in segfault with cond_arith_4.c because we
> reach prepare_load_store_mask via vect_schedule_slp, called from
> here in vect_transform_loop:
>   /* Schedule the SLP instances first, then handle loop vectorization
>      below.  */
>   if (!loop_vinfo->slp_instances.is_empty ())
>     {
>       DUMP_VECT_SCOPE ("scheduling SLP instances");
>       vect_schedule_slp (loop_vinfo);
>     }
>
> which is before bb processing loop.
We want this optimisation to be applied to SLP too though.  Especially
since non-SLP will be going away at some point.

But as Richard says, the problem with SLP is that the statements aren't
traversed in block order, so I guess we can't do the on-the-fly
redundancy elimination there...

Maybe an alternative would be to record during the analysis phase which
scalar conditions need which loop masks.  Statements that need a loop
mask currently do:

  vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype);

If we also pass the scalar condition, we can maintain a hash_set of
<condition, ncopies> pairs, representing the conditions that have loop
masks applied at some point in the vectorised code.  The COND_EXPR code
can use that set to decide whether to apply the loop mask or not.

Trying to avoid duplicate ANDs with the loop mask would then become
a separate follow-on change.  Not sure whether it's worth it on its own.

Thanks,
Richard
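
P.S. To make the <condition, ncopies> set a bit more concrete, below is
a minimal standalone sketch.  It models the idea with std:: containers
and uses a string as a stand-in for the scalar condition tree, so all
of the names and types in it are illustrative rather than a proposed
interface:

  /* Standalone sketch of the <condition, ncopies> set.  Nothing here
     is GCC's actual interface; it just shows the shape of the data
     structure and the two places it would be used.  */

  #include <functional>
  #include <string>
  #include <unordered_set>

  /* Key: a scalar condition plus the number of vector copies.  */
  struct cond_ncopies
  {
    std::string cond;   /* stand-in for the scalar condition tree */
    unsigned ncopies;

    bool operator== (const cond_ncopies &other) const
    {
      return cond == other.cond && ncopies == other.ncopies;
    }
  };

  struct cond_ncopies_hash
  {
    std::size_t operator() (const cond_ncopies &key) const
    {
      return (std::hash<std::string> () (key.cond)
              ^ (std::hash<unsigned> () (key.ncopies) * 31));
    }
  };

  /* The set of conditions that have a loop mask applied somewhere in
     the vectorised code; this would live in the loop_vec_info.  */
  typedef std::unordered_set<cond_ncopies, cond_ncopies_hash>
    scalar_cond_masked_set;

  int
  main ()
  {
    scalar_cond_masked_set masked_conds;

    /* Analysis phase: a masked load/store records its scalar
       condition, mimicking an extra argument to
       vect_record_loop_mask.  */
    masked_conds.insert ({ "a[i] < b[i]", 2 });

    /* COND_EXPR code: apply the loop mask only if the same condition
       already has a loop mask applied elsewhere, so the AND doesn't
       introduce a new redundancy.  */
    bool apply_loop_mask = masked_conds.count ({ "a[i] < b[i]", 2 }) != 0;

    return apply_loop_mask ? 0 : 1;
  }

In GCC itself the key would presumably hold the condition as a tree
(compared via operand_equal_p or some canonicalised form) and the set
would use GCC's own hash_set, but the lookup at the COND_EXPR site
would have the same shape.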