On Wed, Aug 28, 2019 at 11:02 AM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> > On Tue, 27 Aug 2019 at 21:14, Richard Sandiford
> > <richard.sandif...@arm.com> wrote:
> >>
> >> Richard should have the final say, but some comments...
> >>
> >> Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> >> > index 1e2dfe5d22d..862206b3256 100644
> >> > --- a/gcc/tree-vect-stmts.c
> >> > +++ b/gcc/tree-vect-stmts.c
> >> > @@ -1989,17 +1989,31 @@ check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
> >> >
> >> >  static tree
> >> >  prepare_load_store_mask (tree mask_type, tree loop_mask, tree vec_mask,
> >> > -                         gimple_stmt_iterator *gsi)
> >> > +                         gimple_stmt_iterator *gsi, tree mask,
> >> > +                         cond_vmask_map_type *cond_to_vec_mask)
> >>
> >> "scalar_mask" might be a better name.  But maybe we should key off the
> >> vector mask after all, now that we're relying on the code having no
> >> redundancies.
> >>
> >> Passing the vinfo would be better than passing the cond_vmask_map_type
> >> directly.
> >>
> >> >  {
> >> >    gcc_assert (useless_type_conversion_p (mask_type, TREE_TYPE (vec_mask)));
> >> >    if (!loop_mask)
> >> >      return vec_mask;
> >> >
> >> >    gcc_assert (TREE_TYPE (loop_mask) == mask_type);
> >> > +
> >> > +  tree *slot = 0;
> >> > +  if (cond_to_vec_mask)
> >>
> >> The pointer should never be null in this context.
> > Disabling the check for NULL results in a segfault with cond_arith_4.c
> > because we reach prepare_load_store_mask via vect_schedule_slp, called
> > from here in vect_transform_loop:
> >
> >   /* Schedule the SLP instances first, then handle loop vectorization
> >      below.  */
> >   if (!loop_vinfo->slp_instances.is_empty ())
> >     {
> >       DUMP_VECT_SCOPE ("scheduling SLP instances");
> >       vect_schedule_slp (loop_vinfo);
> >     }
> >
> > which is before the bb processing loop.
>
> We want this optimisation to be applied to SLP too though.  Especially
> since non-SLP will be going away at some point.
>
> But as Richard says, the problem with SLP is that the statements aren't
> traversed in block order, so I guess we can't do the on-the-fly
> redundancy elimination there...

And the current patch AFAICS can generate wrong SSA for this reason.
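(For illustration, here is a minimal standalone model of the on-the-fly
caching scheme the patch implements.  The types and names below are
simplified stand-ins invented for this example - the real code operates
on GCC's tree/gimple IR, not on strings.)

/* Standalone model of the cond_to_vec_mask caching idea: reuse a
   previously emitted "loop_mask & vec_mask" for the same scalar
   condition instead of emitting a duplicate AND.  */
#include <cstdio>
#include <map>
#include <string>

/* Stand-in for an SSA name holding a vector mask.  */
typedef std::string mask_name;

/* Key: the scalar condition; value: the name of loop_mask & vec_mask.  */
static std::map<std::string, mask_name> cond_to_vec_mask;
static int tmp_counter;

static mask_name
prepare_load_store_mask (const std::string &scalar_cond,
                         const mask_name &loop_mask,
                         const mask_name &vec_mask)
{
  std::map<std::string, mask_name>::iterator it
    = cond_to_vec_mask.find (scalar_cond);
  if (it != cond_to_vec_mask.end ())
    /* Cache hit: no duplicate AND is emitted.  */
    return it->second;

  mask_name result = "and_tmp" + std::to_string (tmp_counter++);
  std::printf ("  %s = %s & %s\n", result.c_str (),
               loop_mask.c_str (), vec_mask.c_str ());
  cond_to_vec_mask[scalar_cond] = result;
  return result;
}

int
main ()
{
  /* Two statements guarded by the same condition: the second call
     reuses the AND emitted by the first.  */
  prepare_load_store_mask ("a < b", "loop_mask_1", "vec_mask_1");
  prepare_load_store_mask ("a < b", "loop_mask_1", "vec_mask_1");
  /* This reuse is only safe if statements are emitted in block order;
     with SLP they aren't, so the cached name may not dominate its new
     use - the wrong-SSA problem mentioned above.  */
  return 0;
}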
> Maybe an alternative would be to record during the analysis phase which
> scalar conditions need which loop masks.  Statements that need a loop
> mask currently do:
>
>     vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype);
>
> If we also pass the scalar condition, we can maintain a hash_set of
> <condition, ncopies> pairs, representing the conditions that have
> loop masks applied at some point in the vectorised code.  The COND_EXPR
> code can use that set to decide whether to apply the loop mask or not.

Yeah, that sounds better (a standalone sketch of the idea follows at the
end of this mail).

Note that I don't like the extra "helpers" in fold-const.c/h; they do not
look useful in general, so please put them into vectorizer-private code.

The decomposing also doesn't look too nice; instead,
prepare_load_store_mask could get such a decomposed representation -
possibly quite natural with the suggestion from Richard above.

Richard.

> Trying to avoid duplicate ANDs with the loop mask would then become a
> separate follow-on change.  Not sure whether it's worth it on its own.
>
> Thanks,
> Richard
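(For concreteness, a minimal standalone sketch of the <condition, ncopies>
hash_set approach suggested above.  The types and the query function are
invented stand-ins for this example, not the actual GCC interfaces.)

/* Standalone model: record during analysis which scalar conditions get
   a loop mask, then query that set when vectorising a COND_EXPR.  */
#include <set>
#include <string>
#include <utility>

/* Stand-in for the scalar condition (a tree in the real code).  */
typedef std::string scalar_cond;

/* <condition, ncopies> pairs recorded during the analysis phase:
   conditions that have a loop mask applied somewhere in the
   vectorised code.  */
static std::set<std::pair<scalar_cond, unsigned> > masked_conds;

/* Model of vect_record_loop_mask extended to also take the scalar
   condition, as suggested above.  */
static void
vect_record_loop_mask (const scalar_cond &cond, unsigned ncopies)
{
  masked_conds.insert (std::make_pair (cond, ncopies));
}

/* Query used by the COND_EXPR code: apply the loop mask only if some
   masked load/store with the same condition already does, so the AND
   is shared rather than duplicated.  */
static bool
cond_uses_loop_mask_p (const scalar_cond &cond, unsigned ncopies)
{
  return masked_conds.count (std::make_pair (cond, ncopies)) != 0;
}

int
main ()
{
  /* Analysis phase: a masked store guarded by "a < b" records itself.  */
  vect_record_loop_mask ("a < b", 2);

  /* Transform phase: the COND_EXPR with the same condition sees the
     record and applies the (already computed) loop mask too.  */
  return cond_uses_loop_mask_p ("a < b", 2) ? 0 : 1;
}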