Re: [2/3][vect] Add widening add, subtract vect patterns

Richard Sandiford via Gcc-patches Fri, 13 Nov 2020 04:17:09 -0800

[ There was a discussion on irc about how easy it would be to support
  internal functions and tree codes at the same time, so the agreement
  was to go for tree codes for now with a promise to convert the
  widening-related code to use internal functions for GCC 12. ]


Like Richard said, the new patterns need to be documented in md.texi
and the new tree codes need to be documented in generic.texi.

While we're using tree codes, I think we need to make the naming
consistent with other tree codes: WIDEN_PLUS_EXPR instead of
WIDEN_ADD_EXPR and WIDEN_MINUS_EXPR instead of WIDEN_SUB_EXPR.
Same idea for the VEC_* codes.

(In constrast, the internal functions do try to follow the optab names,
since there's usually a 1:1 correspondence.)

Joel Hutton via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hi all,
>
> This patch adds widening add and widening subtract patterns to 
> tree-vect-patterns.
>
> All 3 patches together bootstrapped and regression tested on aarch64.
>
> gcc/ChangeLog:
>
> 2020-11-12  Joel Hutton  <joel.hut...@arm.com>
>
>         * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases

Not that I personally care about this stuff (would love to see changelogs
go away :-)) but some nits:

Each description is supposed to start with a capital letter and end with
a full stop (even if it's not a complete sentence).  Same for the rest
of the log.

>         * optabs-tree.c (optab_for_tree_code): optabs for widening 
> adds,subtracts

The line limit for changelogs is 80 characters.  The entry should say
what changed, so “Handle …” or “Add case for …” or something.

>         * optabs.def (OPTAB_D): define vectorized widen add, subtracts
>         * tree-cfg.c (verify_gimple_assign_binary): Add case for widening 
> adds, subtracts
>         * tree-inline.c (estimate_operator_cost): Add case for widening adds, 
> subtracts
>         * tree-vect-generic.c (expand_vector_operations_1): Add case for 
> widening adds, subtracts
>         * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog 
> ptatern

typo: pattern

>         (vect_recog_widen_sub_pattern): New recog pattern
>         (vect_recog_average_pattern): Update widened add code
>         (vect_recog_average_pattern): Update widened add code
>         * tree-vect-stmts.c (vectorizable_conversion): Add case for widened 
> add, subtract
>         (supportable_widening_operation): Add case for widened add, subtract
>         * tree.def (WIDEN_ADD_EXPR): New tree code
>         (WIDEN_SUB_EXPR): New tree code
>         (VEC_WIDEN_ADD_HI_EXPR): New tree code
>         (VEC_WIDEN_ADD_LO_EXPR): New tree code
>         (VEC_WIDEN_SUB_HI_EXPR): New tree code
>         (VEC_WIDEN_SUB_LO_EXPR): New tree code
>
> gcc/testsuite/ChangeLog:
>
> 2020-11-12  Joel Hutton  <joel.hut...@arm.com>
>
>         * gcc.target/aarch64/vect-widen-add.c: New test.
>         * gcc.target/aarch64/vect-widen-sub.c: New test.
>
>
> Ok for trunk?
>
> From e0c10ca554729b9e6d58dbd3f18ba72b2c3ee8bc Mon Sep 17 00:00:00 2001
> From: Joel Hutton <joel.hut...@arm.com>
> Date: Mon, 9 Nov 2020 15:44:18 +0000
> Subject: [PATCH 2/3] [vect] Add widening add, subtract patterns
>
> Add widening add, subtract patterns to tree-vect-patterns.
> Add aarch64 tests for patterns.
>
> fix sad

Would be good to expand on this for the final commit message.

> […]
> diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
> index 
> 4dfda756932de1693667c39c6fabed043b20b63b..009dccfa3bd298bca7b3b45401a4cc2acc90ff21
>  100644
> --- a/gcc/optabs-tree.c
> +++ b/gcc/optabs-tree.c
> @@ -170,6 +170,23 @@ optab_for_tree_code (enum tree_code code, const_tree 
> type,
>        return (TYPE_UNSIGNED (type)
>             ? vec_widen_ushiftl_lo_optab : vec_widen_sshiftl_lo_optab);
>  
> +    case VEC_WIDEN_ADD_LO_EXPR:
> +      return (TYPE_UNSIGNED (type)
> +           ? vec_widen_uaddl_lo_optab  : vec_widen_saddl_lo_optab);
> +
> +    case VEC_WIDEN_ADD_HI_EXPR:
> +      return (TYPE_UNSIGNED (type)
> +           ? vec_widen_uaddl_hi_optab  : vec_widen_saddl_hi_optab);
> +
> +    case VEC_WIDEN_SUB_LO_EXPR:
> +      return (TYPE_UNSIGNED (type)
> +           ? vec_widen_usubl_lo_optab  : vec_widen_ssubl_lo_optab);
> +
> +    case VEC_WIDEN_SUB_HI_EXPR:
> +      return (TYPE_UNSIGNED (type)
> +           ? vec_widen_usubl_hi_optab  : vec_widen_ssubl_hi_optab);
> +
> +

Nits: excess blank line at the end and excess space before the “:”s.

>      case VEC_UNPACK_HI_EXPR:
>        return (TYPE_UNSIGNED (type)
>             ? vec_unpacku_hi_optab : vec_unpacks_hi_optab);
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 
> 78409aa14537d259bf90277751aac00d452a0d3f..a97cdb360781ca9c743e2991422c600626c75aa5
>  100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -383,6 +383,14 @@ OPTAB_D (vec_widen_smult_even_optab, 
> "vec_widen_smult_even_$a")
>  OPTAB_D (vec_widen_smult_hi_optab, "vec_widen_smult_hi_$a")
>  OPTAB_D (vec_widen_smult_lo_optab, "vec_widen_smult_lo_$a")
>  OPTAB_D (vec_widen_smult_odd_optab, "vec_widen_smult_odd_$a")
> +OPTAB_D (vec_widen_ssubl_hi_optab, "vec_widen_ssubl_hi_$a")
> +OPTAB_D (vec_widen_ssubl_lo_optab, "vec_widen_ssubl_lo_$a")
> +OPTAB_D (vec_widen_saddl_hi_optab, "vec_widen_saddl_hi_$a")
> +OPTAB_D (vec_widen_saddl_lo_optab, "vec_widen_saddl_lo_$a")
> +OPTAB_D (vec_widen_usubl_hi_optab, "vec_widen_usubl_hi_$a")
> +OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
> +OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
> +OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
>  OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
>  OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
>  OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")

Looks like the current code groups signed stuff together and
unsigned stuff together, so would be good to follow that.

> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c 
> b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..fc6966fd9f257170501247411d50428aaaabbdae
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
> @@ -0,0 +1,90 @@
> +/* { dg-do run } */
> +/* { dg-options "-O3 -save-temps" } */
> +#include <stdint.h>
> +#include <string.h>
> +
> +#define ARR_SIZE 1024
> +
> +/* Should produce an uaddl */
> +void uadd_opt (uint32_t *foo, uint16_t *a, uint16_t *b)
> +{
> +    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
> +    {
> +        foo[i]   = a[i]   + b[i];
> +        foo[i+1] = a[i+1] + b[i+1];
> +        foo[i+2] = a[i+2] + b[i+2];
> +        foo[i+3] = a[i+3] + b[i+3];
> +    }
> +}
> +
> +__attribute__((optimize (0)))
> +void uadd_nonopt (uint32_t *foo, uint16_t *a, uint16_t *b)
> +{
> +    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
> +    {
> +        foo[i]   = a[i]   + b[i];
> +        foo[i+1] = a[i+1] + b[i+1];
> +        foo[i+2] = a[i+2] + b[i+2];
> +        foo[i+3] = a[i+3] + b[i+3];
> +    }
> +}
> +
> +/* Should produce an saddl */
> +void sadd_opt (int32_t *foo, int16_t *a, int16_t *b)
> +{
> +    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
> +    {
> +        foo[i]   = a[i]   + b[i];
> +        foo[i+1] = a[i+1] + b[i+1];
> +        foo[i+2] = a[i+2] + b[i+2];
> +        foo[i+3] = a[i+3] + b[i+3];
> +    }
> +}
> +
> +__attribute__((optimize (0)))
> +void sadd_nonopt (int32_t *foo, int16_t *a, int16_t *b)
> +{
> +    for( int i = 0; i < ARR_SIZE - 3;i=i+4)
> +    {
> +        foo[i]   = a[i]   + b[i];
> +        foo[i+1] = a[i+1] + b[i+1];
> +        foo[i+2] = a[i+2] + b[i+2];
> +        foo[i+3] = a[i+3] + b[i+3];
> +    }
> +}
> +
> +
> +void __attribute__((optimize (0)))
> +init(uint16_t *a, uint16_t *b)
> +{
> +    for( int i = 0; i < ARR_SIZE;i++)
> +    {
> +      a[i] = i;
> +      b[i] = 2*i;
> +    }
> +}
> +
> +int __attribute__((optimize (0)))
> +main()
> +{
> +    uint32_t foo_arr[ARR_SIZE];
> +    uint32_t bar_arr[ARR_SIZE];
> +    uint16_t a[ARR_SIZE];
> +    uint16_t b[ARR_SIZE];
> +
> +    init(a, b);
> +    uadd_opt(foo_arr, a, b);
> +    uadd_nonopt(bar_arr, a, b);
> +    if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
> +      return 1;
> +    sadd_opt((int32_t*) foo_arr, (int16_t*) a, (int16_t*) b);
> +    sadd_nonopt((int32_t*) bar_arr, (int16_t*) a, (int16_t*) b);
> +    if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0)
> +      return 1;
> +    return 0;
> +}
> +
> +/* { dg-final { scan-assembler-times "uaddl\t" 1} } */
> +/* { dg-final { scan-assembler-times "uaddl2\t" 1} } */
> +/* { dg-final { scan-assembler-times "saddl\t" 1} } */
> +/* { dg-final { scan-assembler-times "saddl2\t" 1} } */

Same comments as the previous patch about having a "+nosve" pragma
and about the scan-assembler-times lines.  Same for the sub test.

> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index 
> 5139f111fecc7ec6e0902145b808308a5e47450b..532692a5da03d6d9653e2d47a1218982b27c4539
>  100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -3864,6 +3864,12 @@ verify_gimple_assign_binary (gassign *stmt)
>          return false;
>        }
>  
> +    case VEC_WIDEN_SUB_HI_EXPR:
> +    case VEC_WIDEN_SUB_LO_EXPR:
> +    case VEC_WIDEN_ADD_HI_EXPR:
> +    case VEC_WIDEN_ADD_LO_EXPR:
> +      return false;
> +

I think these should get the same validity checking as
VEC_WIDEN_MULT_HI_EXPR etc.

>      case VEC_WIDEN_LSHIFT_HI_EXPR:
>      case VEC_WIDEN_LSHIFT_LO_EXPR:
>        {
> […]
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index 
> f68a87e05ed54145a25ccff598eeef9e57f9a759..331027444a6d95343eb110f7f9c7db19b40ee5ee
>  100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -1086,8 +1086,10 @@ vect_recog_sad_pattern (vec_info *vinfo,
>       of the above pattern.  */
>  
>    tree plus_oprnd0, plus_oprnd1;
> -  if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> -                                    &plus_oprnd0, &plus_oprnd1))
> +  if (!(vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> +                                    &plus_oprnd0, &plus_oprnd1)
> +     || vect_reassociating_reduction_p (vinfo, stmt_vinfo, WIDEN_ADD_EXPR,
> +                                    &plus_oprnd0, &plus_oprnd1)))
>      return NULL;
>  
>    tree sum_type = gimple_expr_type (last_stmt);

I think we should make:

  /* Any non-truncating sequence of conversions is OK here, since
     with a successful match, the result of the ABS(U) is known to fit
     within the nonnegative range of the result type.  (It cannot be the
     negative of the minimum signed value due to the range of the widening
     MINUS_EXPR.)  */
  vect_unpromoted_value unprom_abs;
  plus_oprnd0 = vect_look_through_possible_promotion (vinfo, plus_oprnd0,
                                                      &unprom_abs);

specific to the PLUS_EXPR case.  If we look through promotions on
the operands of a WIDEN_ADD_EXPR, we could potentially have a mixture
of signednesses involved, one on the operands of the WIDEN_ADD_EXPR
and one on its inputs.

E.g. if we had:

    uint32_t x;
    int32_t y = (int32_t) x;
    int64_t z = WIDEN_ADD_EXPR <y, …>;

I think the code above would look through the signedness change and
give the SAD_EXPR an unsigned extension instead of a signed one.

(Alternatively, we could try to handle WIDEN_ADD_EXPR with promoted
operands, but I think in practice it would be wasted work.)

> @@ -1148,7 +1150,7 @@ vect_recog_sad_pattern (vec_info *vinfo,
>    /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a 
> phi
>       inside the loop (in case we are analyzing an outer-loop).  */
>    vect_unpromoted_value unprom[2];
> -  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, MINUS_EXPR,
> +  if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, 
> WIDEN_SUB_EXPR,
>                            false, 2, unprom, &half_type))
>      return NULL;
>  
> @@ -1262,6 +1264,24 @@ vect_recog_widen_mult_pattern (vec_info *vinfo, 
> stmt_vec_info last_stmt_info,
>                                     "vect_recog_widen_mult_pattern");
>  }
>  
> +static gimple *
> +vect_recog_widen_add_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
> +                            tree *type_out)
> +{
> +  return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
> +                                   PLUS_EXPR, WIDEN_ADD_EXPR, false,
> +                                   "vect_recog_widen_add_pattern");
> +}
> +
> +static gimple *
> +vect_recog_widen_sub_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
> +                            tree *type_out)
> +{
> +  return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
> +                                   MINUS_EXPR, WIDEN_SUB_EXPR, false,
> +                                   "vect_recog_widen_sub_pattern");
> +}
> +

The new functions should have comments before them.  Can probably
just use the vect_recog_widen_mult_pattern comment as a template.

Thanks,
Richard

Re: [2/3][vect] Add widening add, subtract vect patterns

Reply via email to