[ There was a discussion on irc about how easy it would be to support internal functions and tree codes at the same time, so the agreement was to go for tree codes for now with a promise to convert the widening-related code to use internal functions for GCC 12. ]
Like Richard said, the new patterns need to be documented in md.texi and the new tree codes need to be documented in generic.texi. While we're using tree codes, I think we need to make the naming consistent with other tree codes: WIDEN_PLUS_EXPR instead of WIDEN_ADD_EXPR and WIDEN_MINUS_EXPR instead of WIDEN_SUB_EXPR. Same idea for the VEC_* codes. (In constrast, the internal functions do try to follow the optab names, since there's usually a 1:1 correspondence.) Joel Hutton via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > Hi all, > > This patch adds widening add and widening subtract patterns to > tree-vect-patterns. > > All 3 patches together bootstrapped and regression tested on aarch64. > > gcc/ChangeLog: > > 2020-11-12 Joel Hutton <joel.hut...@arm.com> > > * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases Not that I personally care about this stuff (would love to see changelogs go away :-)) but some nits: Each description is supposed to start with a capital letter and end with a full stop (even if it's not a complete sentence). Same for the rest of the log. > * optabs-tree.c (optab_for_tree_code): optabs for widening > adds,subtracts The line limit for changelogs is 80 characters. The entry should say what changed, so “Handle …” or “Add case for …” or something. > * optabs.def (OPTAB_D): define vectorized widen add, subtracts > * tree-cfg.c (verify_gimple_assign_binary): Add case for widening > adds, subtracts > * tree-inline.c (estimate_operator_cost): Add case for widening adds, > subtracts > * tree-vect-generic.c (expand_vector_operations_1): Add case for > widening adds, subtracts > * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog > ptatern typo: pattern > (vect_recog_widen_sub_pattern): New recog pattern > (vect_recog_average_pattern): Update widened add code > (vect_recog_average_pattern): Update widened add code > * tree-vect-stmts.c (vectorizable_conversion): Add case for widened > add, subtract > (supportable_widening_operation): Add case for widened add, subtract > * tree.def (WIDEN_ADD_EXPR): New tree code > (WIDEN_SUB_EXPR): New tree code > (VEC_WIDEN_ADD_HI_EXPR): New tree code > (VEC_WIDEN_ADD_LO_EXPR): New tree code > (VEC_WIDEN_SUB_HI_EXPR): New tree code > (VEC_WIDEN_SUB_LO_EXPR): New tree code > > gcc/testsuite/ChangeLog: > > 2020-11-12 Joel Hutton <joel.hut...@arm.com> > > * gcc.target/aarch64/vect-widen-add.c: New test. > * gcc.target/aarch64/vect-widen-sub.c: New test. > > > Ok for trunk? > > From e0c10ca554729b9e6d58dbd3f18ba72b2c3ee8bc Mon Sep 17 00:00:00 2001 > From: Joel Hutton <joel.hut...@arm.com> > Date: Mon, 9 Nov 2020 15:44:18 +0000 > Subject: [PATCH 2/3] [vect] Add widening add, subtract patterns > > Add widening add, subtract patterns to tree-vect-patterns. > Add aarch64 tests for patterns. > > fix sad Would be good to expand on this for the final commit message. > […] > diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c > index > 4dfda756932de1693667c39c6fabed043b20b63b..009dccfa3bd298bca7b3b45401a4cc2acc90ff21 > 100644 > --- a/gcc/optabs-tree.c > +++ b/gcc/optabs-tree.c > @@ -170,6 +170,23 @@ optab_for_tree_code (enum tree_code code, const_tree > type, > return (TYPE_UNSIGNED (type) > ? vec_widen_ushiftl_lo_optab : vec_widen_sshiftl_lo_optab); > > + case VEC_WIDEN_ADD_LO_EXPR: > + return (TYPE_UNSIGNED (type) > + ? vec_widen_uaddl_lo_optab : vec_widen_saddl_lo_optab); > + > + case VEC_WIDEN_ADD_HI_EXPR: > + return (TYPE_UNSIGNED (type) > + ? vec_widen_uaddl_hi_optab : vec_widen_saddl_hi_optab); > + > + case VEC_WIDEN_SUB_LO_EXPR: > + return (TYPE_UNSIGNED (type) > + ? vec_widen_usubl_lo_optab : vec_widen_ssubl_lo_optab); > + > + case VEC_WIDEN_SUB_HI_EXPR: > + return (TYPE_UNSIGNED (type) > + ? vec_widen_usubl_hi_optab : vec_widen_ssubl_hi_optab); > + > + Nits: excess blank line at the end and excess space before the “:”s. > case VEC_UNPACK_HI_EXPR: > return (TYPE_UNSIGNED (type) > ? vec_unpacku_hi_optab : vec_unpacks_hi_optab); > diff --git a/gcc/optabs.def b/gcc/optabs.def > index > 78409aa14537d259bf90277751aac00d452a0d3f..a97cdb360781ca9c743e2991422c600626c75aa5 > 100644 > --- a/gcc/optabs.def > +++ b/gcc/optabs.def > @@ -383,6 +383,14 @@ OPTAB_D (vec_widen_smult_even_optab, > "vec_widen_smult_even_$a") > OPTAB_D (vec_widen_smult_hi_optab, "vec_widen_smult_hi_$a") > OPTAB_D (vec_widen_smult_lo_optab, "vec_widen_smult_lo_$a") > OPTAB_D (vec_widen_smult_odd_optab, "vec_widen_smult_odd_$a") > +OPTAB_D (vec_widen_ssubl_hi_optab, "vec_widen_ssubl_hi_$a") > +OPTAB_D (vec_widen_ssubl_lo_optab, "vec_widen_ssubl_lo_$a") > +OPTAB_D (vec_widen_saddl_hi_optab, "vec_widen_saddl_hi_$a") > +OPTAB_D (vec_widen_saddl_lo_optab, "vec_widen_saddl_lo_$a") > +OPTAB_D (vec_widen_usubl_hi_optab, "vec_widen_usubl_hi_$a") > +OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a") > +OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a") > +OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a") > OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a") > OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a") > OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a") Looks like the current code groups signed stuff together and unsigned stuff together, so would be good to follow that. > diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c > b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..fc6966fd9f257170501247411d50428aaaabbdae > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c > @@ -0,0 +1,90 @@ > +/* { dg-do run } */ > +/* { dg-options "-O3 -save-temps" } */ > +#include <stdint.h> > +#include <string.h> > + > +#define ARR_SIZE 1024 > + > +/* Should produce an uaddl */ > +void uadd_opt (uint32_t *foo, uint16_t *a, uint16_t *b) > +{ > + for( int i = 0; i < ARR_SIZE - 3;i=i+4) > + { > + foo[i] = a[i] + b[i]; > + foo[i+1] = a[i+1] + b[i+1]; > + foo[i+2] = a[i+2] + b[i+2]; > + foo[i+3] = a[i+3] + b[i+3]; > + } > +} > + > +__attribute__((optimize (0))) > +void uadd_nonopt (uint32_t *foo, uint16_t *a, uint16_t *b) > +{ > + for( int i = 0; i < ARR_SIZE - 3;i=i+4) > + { > + foo[i] = a[i] + b[i]; > + foo[i+1] = a[i+1] + b[i+1]; > + foo[i+2] = a[i+2] + b[i+2]; > + foo[i+3] = a[i+3] + b[i+3]; > + } > +} > + > +/* Should produce an saddl */ > +void sadd_opt (int32_t *foo, int16_t *a, int16_t *b) > +{ > + for( int i = 0; i < ARR_SIZE - 3;i=i+4) > + { > + foo[i] = a[i] + b[i]; > + foo[i+1] = a[i+1] + b[i+1]; > + foo[i+2] = a[i+2] + b[i+2]; > + foo[i+3] = a[i+3] + b[i+3]; > + } > +} > + > +__attribute__((optimize (0))) > +void sadd_nonopt (int32_t *foo, int16_t *a, int16_t *b) > +{ > + for( int i = 0; i < ARR_SIZE - 3;i=i+4) > + { > + foo[i] = a[i] + b[i]; > + foo[i+1] = a[i+1] + b[i+1]; > + foo[i+2] = a[i+2] + b[i+2]; > + foo[i+3] = a[i+3] + b[i+3]; > + } > +} > + > + > +void __attribute__((optimize (0))) > +init(uint16_t *a, uint16_t *b) > +{ > + for( int i = 0; i < ARR_SIZE;i++) > + { > + a[i] = i; > + b[i] = 2*i; > + } > +} > + > +int __attribute__((optimize (0))) > +main() > +{ > + uint32_t foo_arr[ARR_SIZE]; > + uint32_t bar_arr[ARR_SIZE]; > + uint16_t a[ARR_SIZE]; > + uint16_t b[ARR_SIZE]; > + > + init(a, b); > + uadd_opt(foo_arr, a, b); > + uadd_nonopt(bar_arr, a, b); > + if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0) > + return 1; > + sadd_opt((int32_t*) foo_arr, (int16_t*) a, (int16_t*) b); > + sadd_nonopt((int32_t*) bar_arr, (int16_t*) a, (int16_t*) b); > + if (memcmp(foo_arr, bar_arr, ARR_SIZE) != 0) > + return 1; > + return 0; > +} > + > +/* { dg-final { scan-assembler-times "uaddl\t" 1} } */ > +/* { dg-final { scan-assembler-times "uaddl2\t" 1} } */ > +/* { dg-final { scan-assembler-times "saddl\t" 1} } */ > +/* { dg-final { scan-assembler-times "saddl2\t" 1} } */ Same comments as the previous patch about having a "+nosve" pragma and about the scan-assembler-times lines. Same for the sub test. > diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c > index > 5139f111fecc7ec6e0902145b808308a5e47450b..532692a5da03d6d9653e2d47a1218982b27c4539 > 100644 > --- a/gcc/tree-cfg.c > +++ b/gcc/tree-cfg.c > @@ -3864,6 +3864,12 @@ verify_gimple_assign_binary (gassign *stmt) > return false; > } > > + case VEC_WIDEN_SUB_HI_EXPR: > + case VEC_WIDEN_SUB_LO_EXPR: > + case VEC_WIDEN_ADD_HI_EXPR: > + case VEC_WIDEN_ADD_LO_EXPR: > + return false; > + I think these should get the same validity checking as VEC_WIDEN_MULT_HI_EXPR etc. > case VEC_WIDEN_LSHIFT_HI_EXPR: > case VEC_WIDEN_LSHIFT_LO_EXPR: > { > […] > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c > index > f68a87e05ed54145a25ccff598eeef9e57f9a759..331027444a6d95343eb110f7f9c7db19b40ee5ee > 100644 > --- a/gcc/tree-vect-patterns.c > +++ b/gcc/tree-vect-patterns.c > @@ -1086,8 +1086,10 @@ vect_recog_sad_pattern (vec_info *vinfo, > of the above pattern. */ > > tree plus_oprnd0, plus_oprnd1; > - if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR, > - &plus_oprnd0, &plus_oprnd1)) > + if (!(vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR, > + &plus_oprnd0, &plus_oprnd1) > + || vect_reassociating_reduction_p (vinfo, stmt_vinfo, WIDEN_ADD_EXPR, > + &plus_oprnd0, &plus_oprnd1))) > return NULL; > > tree sum_type = gimple_expr_type (last_stmt); I think we should make: /* Any non-truncating sequence of conversions is OK here, since with a successful match, the result of the ABS(U) is known to fit within the nonnegative range of the result type. (It cannot be the negative of the minimum signed value due to the range of the widening MINUS_EXPR.) */ vect_unpromoted_value unprom_abs; plus_oprnd0 = vect_look_through_possible_promotion (vinfo, plus_oprnd0, &unprom_abs); specific to the PLUS_EXPR case. If we look through promotions on the operands of a WIDEN_ADD_EXPR, we could potentially have a mixture of signednesses involved, one on the operands of the WIDEN_ADD_EXPR and one on its inputs. E.g. if we had: uint32_t x; int32_t y = (int32_t) x; int64_t z = WIDEN_ADD_EXPR <y, …>; I think the code above would look through the signedness change and give the SAD_EXPR an unsigned extension instead of a signed one. (Alternatively, we could try to handle WIDEN_ADD_EXPR with promoted operands, but I think in practice it would be wasted work.) > @@ -1148,7 +1150,7 @@ vect_recog_sad_pattern (vec_info *vinfo, > /* FORNOW. Can continue analyzing the def-use chain when this stmt in a > phi > inside the loop (in case we are analyzing an outer-loop). */ > vect_unpromoted_value unprom[2]; > - if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, MINUS_EXPR, > + if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, > WIDEN_SUB_EXPR, > false, 2, unprom, &half_type)) > return NULL; > > @@ -1262,6 +1264,24 @@ vect_recog_widen_mult_pattern (vec_info *vinfo, > stmt_vec_info last_stmt_info, > "vect_recog_widen_mult_pattern"); > } > > +static gimple * > +vect_recog_widen_add_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info, > + tree *type_out) > +{ > + return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out, > + PLUS_EXPR, WIDEN_ADD_EXPR, false, > + "vect_recog_widen_add_pattern"); > +} > + > +static gimple * > +vect_recog_widen_sub_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info, > + tree *type_out) > +{ > + return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out, > + MINUS_EXPR, WIDEN_SUB_EXPR, false, > + "vect_recog_widen_sub_pattern"); > +} > + The new functions should have comments before them. Can probably just use the vect_recog_widen_mult_pattern comment as a template. Thanks, Richard