Sorry for the slow reply.
Prathamesh Kulkarni <[email protected]> writes:
> On Fri, 30 Aug 2019 at 16:15, Richard Biener <[email protected]>
> wrote:
>>
>> On Wed, Aug 28, 2019 at 11:02 AM Richard Sandiford
>> <[email protected]> wrote:
>> >
>> > Prathamesh Kulkarni <[email protected]> writes:
>> > > On Tue, 27 Aug 2019 at 21:14, Richard Sandiford
>> > > <[email protected]> wrote:
>> > >>
>> > >> Richard should have the final say, but some comments...
>> > >>
>> > >> Prathamesh Kulkarni <[email protected]> writes:
>> > >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> > >> > index 1e2dfe5d22d..862206b3256 100644
>> > >> > --- a/gcc/tree-vect-stmts.c
>> > >> > +++ b/gcc/tree-vect-stmts.c
>> > >> > @@ -1989,17 +1989,31 @@ check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
>> > >> >
>> > >> > static tree
>> > >> > prepare_load_store_mask (tree mask_type, tree loop_mask, tree vec_mask,
>> > >> > - gimple_stmt_iterator *gsi)
>> > >> > + gimple_stmt_iterator *gsi, tree mask,
>> > >> > + cond_vmask_map_type *cond_to_vec_mask)
>> > >>
>> > >> "scalar_mask" might be a better name. But maybe we should key off the
>> > >> vector mask after all, now that we're relying on the code having no
>> > >> redundancies.
>> > >>
>> > >> Passing the vinfo would be better than passing the cond_vmask_map_type
>> > >> directly.
>> > >>
>> > >> > {
>> > >> > gcc_assert (useless_type_conversion_p (mask_type, TREE_TYPE (vec_mask)));
>> > >> > if (!loop_mask)
>> > >> > return vec_mask;
>> > >> >
>> > >> > gcc_assert (TREE_TYPE (loop_mask) == mask_type);
>> > >> > +
>> > >> > + tree *slot = 0;
>> > >> > + if (cond_to_vec_mask)
>> > >>
>> > >> The pointer should never be null in this context.
>> > > Disabling the check for NULL results in a segfault with cond_arith_4.c,
>> > > because we reach prepare_load_store_mask via vect_schedule_slp, called
>> > > from here in vect_transform_loop:
>> > > /* Schedule the SLP instances first, then handle loop vectorization
>> > > below. */
>> > > if (!loop_vinfo->slp_instances.is_empty ())
>> > > {
>> > > DUMP_VECT_SCOPE ("scheduling SLP instances");
>> > > vect_schedule_slp (loop_vinfo);
>> > > }
>> > >
>> > > which is before bb processing loop.
>> >
>> > We want this optimisation to be applied to SLP too though. Especially
>> > since non-SLP will be going away at some point.
>> >
>> > But as Richard says, the problem with SLP is that the statements aren't
>> > traversed in block order, so I guess we can't do the on-the-fly
>> > redundancy elimination there...
>>
>> And the current patch AFAICS can generate wrong SSA for this reason.
>>
>> > Maybe an alternative would be to record during the analysis phase which
>> > scalar conditions need which loop masks. Statements that need a loop
>> > mask currently do:
>> >
>> > vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype);
>> >
>> > If we also pass the scalar condition, we can maintain a hash_set of
>> > <condition, ncopies> pairs, representing the conditions that have
>> > loop masks applied at some point in the vectorised code. The COND_EXPR
>> > code can use that set to decide whether to apply the loop mask or not.
>>
>> Yeah, that sounds better.
>>
>> Note that I don't like the extra "helpers" in fold-const.c/h; they do not
>> look generally useful, so please put them into vectorizer-private code.
>> The decomposing also doesn't look too nice; instead, prepare_load_store_mask
>> could get such a decomposed representation -- possibly quite natural with
>> the suggestion from Richard above.
> Hi,
> Thanks for the suggestions.  I have attached an updated patch that
> tries to address the above suggestions.
> With the patch, we manage to use the same predicate for both tests in
> the PR, and the redundant AND ops are eliminated by fre4.
>
> I have a few doubts:
> 1] I moved tree_cond_ops into tree-vectorizer.[ch]; I will get rid of
> it in a follow-up patch.
> I am not sure what to pass as the def of the scalar condition
> (scalar_mask) to vect_record_loop_mask from vectorizable_store,
> vectorizable_reduction and vectorizable_live_operation.  In the patch,
> I just passed NULL.
For vectorizable_store this is just "mask", like for vectorizable_load.
Passing NULL looks right for the other two.  (Nit: GCC style is to use
NULL rather than 0.)
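
E.g. for vectorizable_store the check_load_store_masking call would just
pass the mask through -- untested sketch, assuming the surrounding locals
keep their current names:

  check_load_store_masking (loop_vinfo, vectype, vls_type, group_size,
                            memory_access_type, &gs_info, mask);

and vectorizable_reduction/vectorizable_live_operation can simply pass
NULL_TREE for the new vect_record_loop_mask argument.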
> 2] Do the changes to vectorizable_condition and
> vectorizable_condition_apply_loop_mask look OK?
Some comments below.
> 3] The patch additionally regresses the following tests (apart from fmla_2.c):
> FAIL: gcc.target/aarch64/sve/cond_convert_1.c -march=armv8.2-a+sve
> scan-assembler-not \\tsel\\t
> FAIL: gcc.target/aarch64/sve/cond_convert_4.c -march=armv8.2-a+sve
> scan-assembler-not \\tsel\\t
> FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve
> scan-assembler-not \\tsel\\t
> FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve
> scan-assembler-times \\tmovprfx\\t
> [...]
For cond_convert_1.c, I think it would be OK to change the test to:
for (int i = 0; i < n; ++i) \
{ \
FLOAT_TYPE bi = b[i]; \
r[i] = pred[i] ? (FLOAT_TYPE) a[i] : bi; \
} \
so that only the a[i] load is conditional. Same for the other two.
I think originally I had to write it this way precisely because
we didn't have the optimisation you're adding, so this is actually
a good sign :-)
> @@ -8313,7 +8313,7 @@ vect_double_mask_nunits (tree type)
>
> void
> vect_record_loop_mask (loop_vec_info loop_vinfo, vec_loop_masks *masks,
> - unsigned int nvectors, tree vectype)
> + unsigned int nvectors, tree vectype, tree scalar_mask)
> {
> gcc_assert (nvectors != 0);
> if (masks->length () < nvectors)
New parameter needs documentation.
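E.g. something like this appended to the existing function comment
(wording only a suggestion):

   SCALAR_MASK, if nonnull, is the scalar condition under which the
   operation is performed; it is used only to record which scalar
   conditions end up having a loop mask applied to them.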
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index dd9d45a9547..49ea86a0680 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1888,7 +1888,7 @@ static void
> check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
> vec_load_store_type vls_type, int group_size,
> vect_memory_access_type memory_access_type,
> - gather_scatter_info *gs_info)
> + gather_scatter_info *gs_info, tree scalar_mask)
> {
> /* Invariant loads need no special support. */
> if (memory_access_type == VMAT_INVARIANT)
Same here.
> @@ -9763,6 +9765,29 @@ vect_is_simple_cond (tree cond, vec_info *vinfo,
> return true;
> }
>
> +static void
> +vectorizable_condition_apply_loop_mask (tree &vec_compare,
> + gimple_stmt_iterator *&gsi,
> + stmt_vec_info &stmt_info,
> + tree loop_mask,
> + tree vec_cmp_type)
Function needs a comment.
I think it'd be better to return the new mask and not make vec_compare
a reference. stmt_info shouldn't need to be a reference either (it's
just a pointer type).
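I.e. something like this (untested):

  /* AND LOOP_MASK with the (possibly embedded) comparison VEC_COMPARE
     and return the resulting mask, emitting any new statements before
     GSI on behalf of STMT_INFO.  VEC_CMP_TYPE is the type of the
     vector comparison.  */

  static tree
  vectorizable_condition_apply_loop_mask (tree vec_compare,
                                          gimple_stmt_iterator *gsi,
                                          stmt_vec_info stmt_info,
                                          tree loop_mask,
                                          tree vec_cmp_type)
  {
    if (COMPARISON_CLASS_P (vec_compare))
      {
        tree tmp = make_ssa_name (vec_cmp_type);
        gassign *g = gimple_build_assign (tmp, TREE_CODE (vec_compare),
                                          TREE_OPERAND (vec_compare, 0),
                                          TREE_OPERAND (vec_compare, 1));
        vect_finish_stmt_generation (stmt_info, g, gsi);
        vec_compare = tmp;
      }

    tree masked = make_ssa_name (vec_cmp_type);
    gassign *g = gimple_build_assign (masked, BIT_AND_EXPR, vec_compare,
                                      loop_mask);
    vect_finish_stmt_generation (stmt_info, g, gsi);
    return masked;
  }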
> +{
> + if (COMPARISON_CLASS_P (vec_compare))
> + {
> + tree tmp = make_ssa_name (vec_cmp_type);
> + gassign *g = gimple_build_assign (tmp, TREE_CODE (vec_compare),
> + TREE_OPERAND (vec_compare, 0),
> + TREE_OPERAND (vec_compare, 1));
> + vect_finish_stmt_generation (stmt_info, g, gsi);
> + vec_compare = tmp;
> + }
> +
> + tree tmp2 = make_ssa_name (vec_cmp_type);
> + gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR, vec_compare, loop_mask);
> + vect_finish_stmt_generation (stmt_info, g, gsi);
> + vec_compare = tmp2;
> +}
> +
> /* vectorizable_condition.
>
> Check if STMT_INFO is conditional modify expression that can be
> vectorized.
> @@ -9975,6 +10000,36 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> /* Handle cond expr. */
> for (j = 0; j < ncopies; j++)
> {
> + tree loop_mask = NULL_TREE;
> +
> + if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> + {
> + scalar_cond_masked_key cond (cond_expr, ncopies);
> + if (loop_vinfo->scalar_cond_masked_set->contains (cond))
Nit: untabified line.
> + {
> + scalar_cond_masked_key cond (cond_expr, ncopies);
> + if (loop_vinfo->scalar_cond_masked_set->contains (cond))
This "if" looks redundant -- isn't the condition the same as above?
> + {
> + vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> + loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
> + }
> + }
> + else
> + {
> + cond.cond_ops.code
> + = invert_tree_comparison (cond.cond_ops.code, true);
Would be better to pass an HONOR_NANS value instead of "true".
> + if (loop_vinfo->scalar_cond_masked_set->contains (cond))
> + {
> + vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> + loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
> + std::swap (then_clause, else_clause);
> + cond_code = cond.cond_ops.code;
> + cond_expr = build2 (cond_code, TREE_TYPE (cond_expr),
> + then_clause, else_clause);
Rather than do the swap here and build a new tree, I think it would be
better to set a boolean that indicates that the then and else are swapped.
Then we can conditionally swap them after:
vec_then_clause = vec_oprnds2[i];
vec_else_clause = vec_oprnds3[i];
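
FWIW, with those changes (and the redundant "if" removed) the whole block
might end up looking something like this -- completely untested, and
"op0" below is just a placeholder for however tree_cond_ops exposes the
first comparison operand:

      tree loop_mask = NULL_TREE;
      bool swap_cond_operands = false;

      if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
        {
          scalar_cond_masked_key cond (cond_expr, ncopies);
          if (loop_vinfo->scalar_cond_masked_set->contains (cond))
            {
              vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
              loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
            }
          else
            {
              /* Try the inverted form of the scalar condition too;
                 "op0" is an assumed field name.  */
              bool honor_nans = HONOR_NANS (TREE_TYPE (cond.cond_ops.op0));
              cond.cond_ops.code
                = invert_tree_comparison (cond.cond_ops.code, honor_nans);
              if (loop_vinfo->scalar_cond_masked_set->contains (cond))
                {
                  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
                  loop_mask = vect_get_loop_mask (gsi, masks, ncopies,
                                                  vectype, j);
                  cond_code = cond.cond_ops.code;
                  swap_cond_operands = true;
                }
            }
        }

with the swap itself deferred to:

          vec_then_clause = vec_oprnds2[i];
          vec_else_clause = vec_oprnds3[i];
          if (swap_cond_operands)
            std::swap (vec_then_clause, vec_else_clause);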
> [...]
> diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
> index dc181524744..794e65f0007 100644
> --- a/gcc/tree-vectorizer.c
> +++ b/gcc/tree-vectorizer.c
> @@ -464,6 +464,7 @@ vec_info::vec_info (vec_info::vec_kind kind_in, void *target_cost_data_in,
> target_cost_data (target_cost_data_in)
> {
> stmt_vec_infos.create (50);
> + scalar_cond_masked_set = new scalar_cond_masked_set_type ();
> }
>
> vec_info::~vec_info ()
> @@ -476,6 +477,8 @@ vec_info::~vec_info ()
>
> destroy_cost_data (target_cost_data);
> free_stmt_vec_infos ();
> + delete scalar_cond_masked_set;
> + scalar_cond_masked_set = 0;
> }
>
> vec_info_shared::vec_info_shared ()
No need to assign null here, since we're at the end of the destructor.
But maybe scalar_cond_masked_set should be "scalar_cond_masked_set_type"
rather than "scalar_cond_masked_set_type *", if the object is going to
have the same lifetime as the vec_info anyway.
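I.e. (untested) the field would simply become:

  /* Set of scalar conditions that have a loop mask applied.  */
  scalar_cond_masked_set_type scalar_cond_masked_set;

and the lookups would then be:

  if (loop_vinfo->scalar_cond_masked_set.contains (cond))

with the new/delete in the constructor and destructor going away entirely.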
Looks good otherwise. I skipped over the tree_cond_ops bit given
your comment above that this was temporary.
Thanks,
Richard