Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> On Mon, 9 Sep 2019 at 22:06, Prathamesh Kulkarni
> <prathamesh.kulka...@linaro.org> wrote:
>>
>> On Mon, 9 Sep 2019 at 16:45, Richard Sandiford
>> <richard.sandif...@arm.com> wrote:
>> >
>> > Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
>> > > With patch, the only following FAIL remains for aarch64-sve.exp:
>> > > FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve
>> > > scan-assembler-times \\tmovprfx\\t 6
>> > > which now contains 14.
>> > > Should I adjust the test, assuming the change isn't a regression ?
>> >
>> > Well, it is kind-of a regression, but it really just means that the
>> > integer code is now consistent with the floating-point code in having
>> > an unnecessary MOVPRFX. So I think adjusting the count is fine.
>> > Presumably any future fix for the existing redundant MOVPRFXs will
>> > apply to the new ones as well.
>> >
>> > The patch looks good to me, just some very minor nits:
>> >
>> > > @@ -8309,11 +8309,12 @@ vect_double_mask_nunits (tree type)
>> > >
>> > > /* Record that a fully-masked version of LOOP_VINFO would need MASKS to
>> > > contain a sequence of NVECTORS masks that each control a vector of
>> > > type
>> > > - VECTYPE. */
>> > > + VECTYPE. SCALAR_MASK if non-null, represents the mask used for
>> > > corresponding
>> > > + load/store stmt. */
>> >
>> > Should be two spaces between sentences. Maybe:
>> >
>> > VECTYPE. If SCALAR_MASK is nonnull, the fully-masked loop would AND
>> > these vector masks with the vector version of SCALAR_MASK. */
>> >
>> > since the mask isn't necessarily for a load or store statement.
>> >
>> > > [...]
>> > > @@ -1879,7 +1879,8 @@ static tree permute_vec_elements (tree, tree,
>> > > tree, stmt_vec_info,
>> > > says how the load or store is going to be implemented and GROUP_SIZE
>> > > is the number of load or store statements in the containing group.
>> > > If the access is a gather load or scatter store, GS_INFO describes
>> > > - its arguments.
>> > > + its arguments. SCALAR_MASK is the scalar mask used for corresponding
>> > > + load or store stmt.
>> >
>> > Maybe:
>> >
>> > its arguments. If the load or store is conditional, SCALAR_MASK is the
>> > condition under which it occurs.
>> >
>> > since SCALAR_MASK can be null here too.
>> >
>> > > [...]
>> > > @@ -9975,6 +9978,31 @@ vectorizable_condition (stmt_vec_info stmt_info,
>> > > gimple_stmt_iterator *gsi,
>> > > /* Handle cond expr. */
>> > > for (j = 0; j < ncopies; j++)
>> > > {
>> > > + tree loop_mask = NULL_TREE;
>> > > + bool swap_cond_operands = false;
>> > > +
>> > > + if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>> > > + {
>> > > + scalar_cond_masked_key cond (cond_expr, ncopies);
>> > > + if (loop_vinfo->scalar_cond_masked_set.contains (cond))
>> > > + {
>> > > + vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
>> > > + loop_mask = vect_get_loop_mask (gsi, masks, ncopies,
>> > > vectype, j);
>> > > + }
>> > > + else
>> > > + {
>> > > + cond.code = invert_tree_comparison (cond.code,
>> > > + HONOR_NANS (TREE_TYPE
>> > > (cond.op0)));
>> >
>> > Long line.
>> > Maybe just split it out into a separate assignment:
>> >
>> > bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
>> > cond.code = invert_tree_comparison (cond.code, honor_nans);
>> >
>> > > + if (loop_vinfo->scalar_cond_masked_set.contains (cond))
>> > > + {
>> > > + vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
>> > > + loop_mask = vect_get_loop_mask (gsi, masks, ncopies,
>> > > vectype, j);
>> >
>> > Long line here too.
>> >
>> > > [...]
>> > > @@ -10090,6 +10121,26 @@ vectorizable_condition (stmt_vec_info
>> > > stmt_info, gimple_stmt_iterator *gsi,
>> > > }
>> > > }
>> > > }
>> > > +
>> > > + if (loop_mask)
>> > > + {
>> > > + if (COMPARISON_CLASS_P (vec_compare))
>> > > + {
>> > > + tree tmp = make_ssa_name (vec_cmp_type);
>> > > + gassign *g = gimple_build_assign (tmp,
>> > > + TREE_CODE
>> > > (vec_compare),
>> > > + TREE_OPERAND
>> > > (vec_compare, 0),
>> > > + TREE_OPERAND
>> > > (vec_compare, 1));
>> >
>> > Two long lines.
>> >
>> > > + vect_finish_stmt_generation (stmt_info, g, gsi);
>> > > + vec_compare = tmp;
>> > > + }
>> > > +
>> > > + tree tmp2 = make_ssa_name (vec_cmp_type);
>> > > + gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR,
>> > > vec_compare, loop_mask);
>> >
>> > Long line here too.
>> >
>> > > [...]
>> > > diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
>> > > index dc181524744..c4b2d8e8647 100644
>> > > --- a/gcc/tree-vectorizer.c
>> > > +++ b/gcc/tree-vectorizer.c
>> > > @@ -1513,3 +1513,39 @@ make_pass_ipa_increase_alignment (gcc::context
>> > > *ctxt)
>> > > {
>> > > return new pass_ipa_increase_alignment (ctxt);
>> > > }
>> > > +
>> > > +/* If code(T) is comparison op or def of comparison stmt,
>> > > + extract it's operands.
>> > > + Else return <NE_EXPR, T, 0>. */
>> > > +
>> > > +void
>> > > +scalar_cond_masked_key::get_cond_ops_from_tree (tree t)
>> > > +{
>> > > + if (TREE_CODE_CLASS (TREE_CODE (t)) == tcc_comparison)
>> > > + {
>> > > + this->code = TREE_CODE (t);
>> > > + this->op0 = TREE_OPERAND (t, 0);
>> > > + this->op1 = TREE_OPERAND (t, 1);
>> > > + return;
>> > > + }
>> > > +
>> > > + if (TREE_CODE (t) == SSA_NAME)
>> > > + {
>> > > + gassign *stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (t));
>> > > + if (stmt)
>> > > + {
>> >
>> > Might as well do this as:
>> >
>> > if (TREE_CODE (t) == SSA_NAME)
>> > if (gassign *stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (t)))
>> > {
>> >
>> > The patch (as hoped) introduces some XPASSes:
>> >
>> > XPASS: gcc.target/aarch64/sve/cond_cnot_2.c scan-assembler-not \\tsel\\t
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmgt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmgt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0\\n 21
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmlt\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 42
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0\\n 15
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmlt\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 30
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmuo\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d\\n 252
>> > XPASS: gcc.target/aarch64/sve/vcond_4.c scan-assembler-times
>> > \\tfcmuo\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s\\n 180
>> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times
>> > \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0 21
>> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times
>> > \\tfcmge\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d 42
>> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times
>> > \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0 15
>> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times
>> > \\tfcmge\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s 30
>> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times
>> > \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, #0\\.0 21
>> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times
>> > \\tfcmle\\tp[0-9]+\\.d, p[0-7]/z, z[0-9]+\\.d, z[0-9]+\\.d 42
>> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times
>> > \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, #0\\.0 15
>> > XPASS: gcc.target/aarch64/sve/vcond_5.c scan-assembler-times
>> > \\tfcmle\\tp[0-9]+\\.s, p[0-7]/z, z[0-9]+\\.s, z[0-9]+\\.s 30
>> >
>> > Could you remove the associated xfails (and comments above them where
>> > appropriate)?
>> >
>> > OK with those changes from my POV, but please give Richi a day or so
>> > to object.
>> >
>> > Thanks for doing this.
>> Thanks for the suggestions, I have updated the patch accordingly.
>> Bootstrap+test in progress on x86_64-unknown-linux-gnu and aarch64-linux-gnu.
>> Richi, does the patch look OK to you ?
> Hi,
> Bootstrap+test passes for x86_64-unknown-linux-gnu and aarch64-linux-gnu.
> On x86_64, there's a "strange" failure of c-c++-common/builtins.c, log shows:
>
> /home/prathamesh.kulkarni/gnu-toolchain/gcc/pr86753-v2-3/gcc/gcc/test
> FAIL: c-c++-common/builtins.c -Wc++-compat (test for excess errors)
> Excess errors:
> /home/prathamesh.kulkarni/gnu-toolchain/gcc/pr86753-v2-3/gcc/gcc/test
>
> Which shouldn't really happen since the test doesn't seem relevant to the patch,
> and only passes -O2 which shouldn't enable the vectorizer ? Manually
> testing it results in PASS with:
> make check-gcc RUNTESTFLAGS="dg.exp=builtins.c"
> Would it be OK to ignore the FAIL during reg-test ?
Looks like the lines got truncated by the cut-&-paste.  What was the excess error?

Richard
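
An illustrative, self-contained C++ sketch of the idea behind the quoted
vectorizable_condition hunks, for readers skimming the thread: record which
scalar conditions already mask a conditional load or store, and when the
condition of a COND_EXPR (or its inverse) matches one of them, reuse the loop
mask by ANDing it into the vector comparison, swapping the THEN/ELSE values in
the inverted case.  This is not GCC code: CmpCode, ScalarCond and masked_set
are invented stand-ins, the comparison codes are reduced to a handful, the NaN
handling the patch gets from invert_tree_comparison/HONOR_NANS is skipped, and
so is the canonicalization get_cond_ops_from_tree performs for bare SSA
booleans (treating T as T != 0).

#include <iostream>
#include <set>
#include <string>
#include <tuple>

/* Reduced stand-in for tree comparison codes.  */
enum class CmpCode { LT, LE, GT, GE, EQ, NE };

/* Inverse comparison.  The real patch uses invert_tree_comparison with
   HONOR_NANS, which can fail for FP comparisons; ignored here.  */
static CmpCode
invert (CmpCode c)
{
  switch (c)
    {
    case CmpCode::LT: return CmpCode::GE;
    case CmpCode::LE: return CmpCode::GT;
    case CmpCode::GT: return CmpCode::LE;
    case CmpCode::GE: return CmpCode::LT;
    case CmpCode::EQ: return CmpCode::NE;
    case CmpCode::NE: return CmpCode::EQ;
    }
  return c;
}

/* Stand-in for scalar_cond_masked_key: a comparison code plus its two
   operands, spelled as strings for the example.  */
struct ScalarCond
{
  CmpCode code;
  std::string op0, op1;

  bool
  operator< (const ScalarCond &other) const
  {
    return std::tie (code, op0, op1)
           < std::tie (other.code, other.op0, other.op1);
  }
};

int
main ()
{
  /* Conditions already recorded as masks of conditional loads/stores
     (the role played by loop_vinfo->scalar_cond_masked_set).  */
  std::set<ScalarCond> masked_set;
  masked_set.insert ({CmpCode::LT, "a[i]", "b[i]"});

  /* Condition of the COND_EXPR being vectorized.  */
  ScalarCond cond {CmpCode::GE, "a[i]", "b[i]"};

  bool use_loop_mask = false;
  bool swap_cond_operands = false;

  if (masked_set.count (cond))
    /* Same condition as a masked access: AND the vector comparison
       with the loop mask so both share one predicate.  */
    use_loop_mask = true;
  else
    {
      cond.code = invert (cond.code);
      if (masked_set.count (cond))
        {
          /* The inverse condition masks an access: use the inverted
             comparison and swap the THEN/ELSE values instead.  */
          use_loop_mask = true;
          swap_cond_operands = true;
        }
    }

  std::cout << "use loop mask: " << use_loop_mask
            << ", swap operands: " << swap_cond_operands << "\n";
  return 0;
}

Compiled as C++11 and run, the example prints "use loop mask: 1, swap
operands: 1", i.e. it takes the same inverted-condition path that sets
swap_cond_operands in the quoted hunk.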