Re: [PATCH] tree-optimization/94864 - vector insert of vector extract simplification
On Wed, Jul 12, 2023 at 6:37 AM Richard Biener via Gcc-patches wrote:
>
> The PRs ask for optimizing of
>
>   _1 = BIT_FIELD_REF ;
>   result_4 = BIT_INSERT_EXPR ;
>
> to a vector permutation.  The following implements this as
> match.pd pattern, improving code generation on x86_64.

This is really a more general case of PR 93080, where we had:

  _1 = BIT_FIELD_REF <_2, 64, 64>;
  result_4 = BIT_INSERT_EXPR <_2, _1, 64>;

Thanks,
Andrew Pinski

>
> On the RTL level we face the issue that backend patterns inconsistently
> use vec_merge and vec_select of vec_concat to represent permutes.
>
> I think using a (supported) permute is almost always better
> than an extract plus insert, maybe excluding the case we extract
> element zero and that's aliased to a register that can be used
> directly for insertion (not sure how to query that).
>
> But this regresses for example gcc.target/i386/pr54855-8.c because PRE
> now realizes that
>
>   _1 = BIT_FIELD_REF ;
>   if (_1 > a_4(D))
>     goto ; [50.00%]
>   else
>     goto ; [50.00%]
>
>   [local count: 536870913]:
>
>   [local count: 1073741824]:
>   # iftmp.0_2 = PHI <_1(3), a_4(D)(2)>
>   x_5 = BIT_INSERT_EXPR ;
>
> is equal to
>
>   [local count: 1073741824]:
>   _1 = BIT_FIELD_REF ;
>   if (_1 > a_4(D))
>     goto ; [50.00%]
>   else
>     goto ; [50.00%]
>
>   [local count: 536870912]:
>   _7 = BIT_INSERT_EXPR ;
>
>   [local count: 1073741824]:
>   # prephitmp_8 = PHI
>
> and that no longer produces the desired maxsd operation at the RTL
> level (we fail to match .FMAX at the GIMPLE level earlier).
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu with regressions:
>
> FAIL: gcc.target/i386/pr54855-13.c scan-assembler-times vmaxsh[ t] 1
> FAIL: gcc.target/i386/pr54855-13.c scan-assembler-not vcomish[ t]
> FAIL: gcc.target/i386/pr54855-8.c scan-assembler-times maxsd 1
> FAIL: gcc.target/i386/pr54855-8.c scan-assembler-not movsd
> FAIL: gcc.target/i386/pr54855-9.c scan-assembler-times minss 1
> FAIL: gcc.target/i386/pr54855-9.c scan-assembler-not movss
>
> I think this is also PR88540 (the lack of min/max detection, not
> sure if the SSE min/max are suitable here)
>
>         PR tree-optimization/94864
>         PR tree-optimization/94865
>         * match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..): New pattern
>         for vector insertion from vector extraction.
>
>         * gcc.target/i386/pr94864.c: New testcase.
>         * gcc.target/i386/pr94865.c: Likewise.
> ---
>  gcc/match.pd                            | 25 +
>  gcc/testsuite/gcc.target/i386/pr94864.c | 13 +
>  gcc/testsuite/gcc.target/i386/pr94865.c | 13 +
>  3 files changed, 51 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr94864.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr94865.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 8543f777a28..8cc106049c4 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -7770,6 +7770,31 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>                       wi::to_wide (@ipos) + isize))
>      (BIT_FIELD_REF @0 @rsize @rpos)
>
> +/* Simplify vector inserts of other vector extracts to a permute.  */
> +(simplify
> + (bit_insert @0 (BIT_FIELD_REF@2 @1 @rsize @rpos) @ipos)
> + (if (VECTOR_TYPE_P (type)
> +      && types_match (@0, @1)
> +      && types_match (TREE_TYPE (TREE_TYPE (@0)), TREE_TYPE (@2))
> +      && TYPE_VECTOR_SUBPARTS (type).is_constant ())
> +  (with
> +   {
> +     unsigned HOST_WIDE_INT elsz
> +       = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (TREE_TYPE (@1))));
> +     poly_uint64 relt = exact_div (tree_to_poly_uint64 (@rpos), elsz);
> +     poly_uint64 ielt = exact_div (tree_to_poly_uint64 (@ipos), elsz);
> +     unsigned nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> +     vec_perm_builder builder;
> +     builder.new_vector (nunits, nunits, 1);
> +     for (unsigned i = 0; i < nunits; ++i)
> +       builder.quick_push (known_eq (ielt, i) ? nunits + relt : i);
> +     vec_perm_indices sel (builder, 2, nunits);
> +   }
> +   (if (!VECTOR_MODE_P (TYPE_MODE (type))
> +        || can_vec_perm_const_p (TYPE_MODE (type), TYPE_MODE (type), sel,
> +                                 false))
> +    (vec_perm @0 @1 { vec_perm_indices_to_tree
> +                        (build_vector_type (ssizetype, nunits), sel); })))))
> +
>  (if (canonicalize_math_after_vectorization_p ())
>   (for fmas (FMA)
>    (simplify
> diff --git a/gcc/testsuite/gcc.target/i386/pr94864.c b/gcc/testsuite/gcc.target/i386/pr94864.c
> new file mode 100644
> index 0000000..69cb481fcfe
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr94864.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -mno-avx" } */
> +
> +typedef double v2df __attribute__((vector_size(16)));
> +
> +v2df move_sd(v2df a, v2df b)
> +{
> +    v2df result = a;
> +    result[0] = b[1];
> +    return result;
> +}
> +
> +/* { dg-final { scan-assembler "unpckhpd\[\\t \]%xmm0, %xmm1" } } */
> diff --git a/gcc/test
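
For readers following the match.pd pattern quoted in this message: the selector it builds keeps every lane of the first vector except the insert lane, which instead reads the extracted lane of the second vector (encoded as nunits + relt, since a two-input permute indexes the concatenation of both inputs). A minimal scalar sketch in plain C follows. The function names build_insert_selector and two_input_permute are made up for illustration and are not GCC APIs; the sketch only models the index arithmetic, not the types_match or can_vec_perm_const_p checks.

```c
#include <assert.h>

/* Hypothetical helper (not a GCC API): fill sel[0..nunits-1] the way the
   loop in the quoted pattern does.  Lane i keeps input a's lane i, except
   lane ielt, which selects lane relt of input b (encoded as nunits + relt).  */
void build_insert_selector(unsigned sel[], unsigned nunits,
                           unsigned ielt, unsigned relt)
{
    assert(ielt < nunits && relt < nunits);
    for (unsigned i = 0; i < nunits; ++i)
        sel[i] = (i == ielt) ? nunits + relt : i;
}

/* Scalar model of a two-input permute: indices below nunits read from a,
   indices nunits..2*nunits-1 read from b.  */
void two_input_permute(double out[], const double a[], const double b[],
                       const unsigned sel[], unsigned nunits)
{
    for (unsigned i = 0; i < nunits; ++i)
        out[i] = sel[i] < nunits ? a[sel[i]] : b[sel[i] - nunits];
}
```

For the move_sd testcase (two lanes, result[0] = b[1]) the insert lane is 0 and the extract lane is 1, so the selector comes out as {3, 1} and the permute yields {b[1], a[1]}, the same value the extract-plus-insert sequence computes.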
Re: [PATCH] tree-optimization/94864 - vector insert of vector extract simplification
On Sat, Jul 15, 2023 at 10:31 AM Andrew Pinski wrote:
>
> On Wed, Jul 12, 2023 at 6:37 AM Richard Biener via Gcc-patches wrote:
> >
> > The PRs ask for optimizing of
> >
> >   _1 = BIT_FIELD_REF ;
> >   result_4 = BIT_INSERT_EXPR ;
> >
> > to a vector permutation.  The following implements this as
> > match.pd pattern, improving code generation on x86_64.
>
> This is more general case of PR 93080 really where we had:
>
>   _1 = BIT_FIELD_REF <_2, 64, 64>;
>   result_4 = BIT_INSERT_EXPR <_2, _1, 64>;

I should mention the i386 failures show up even with the limited patch
for PR 93080 which is why I didn't move forward on my patch there.
[committed] hppa: Modify TLS patterns to provide both 32 and 64-bit support
Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11.  Committed
to trunk.

Dave
---

hppa: Modify TLS patterns to provide both 32 and 64-bit support.

2023-07-15  John David Anglin

gcc/ChangeLog:

        * config/pa/pa.md: Define constants R1_REGNUM, R19_REGNUM and
        R27_REGNUM.
        (tgd_load): Restrict to !TARGET_64BIT.  Use register constants.
        (tld_load): Likewise.
        (tgd_load_pic): Change to expander.
        (tld_load_pic, tld_offset_load, tp_load): Likewise.
        (tie_load_pic, tle_load): Likewise.
        (tgd_load_picsi, tgd_load_picdi): New.
        (tld_load_picsi, tld_load_picdi): New.
        (tld_offset_load): New.
        (tp_load): New.
        (tie_load_picsi, tie_load_picdi): New.
        (tle_load): New.

diff --git a/gcc/config/pa/pa.md b/gcc/config/pa/pa.md
index 726e12768f8..f603591447d 100644
--- a/gcc/config/pa/pa.md
+++ b/gcc/config/pa/pa.md
@@ -108,6 +108,14 @@
    (MAX_17BIT_OFFSET 262100)    ; 17-bit branch
   ])
 
+;; Register numbers
+
+(define_constants
+  [(R1_REGNUM           1)
+   (R19_REGNUM          19)
+   (R27_REGNUM          27)
+  ])
+
 ;; Mode and code iterators
 
 ;; This mode iterator allows :P to be used for patterns that operate on
@@ -10262,9 +10270,9 @@ add,l %2,%3,%3\;bv,n %%r0(%3)"
 (define_insn "tgd_load"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (unspec:SI [(match_operand 1 "tgd_symbolic_operand" "")] UNSPEC_TLSGD))
-  (clobber (reg:SI 1))
-  (use (reg:SI 27))]
-  ""
+  (clobber (reg:SI R1_REGNUM))
+  (use (reg:SI R27_REGNUM))]
+  "!TARGET_64BIT"
   "*
 {
   return \"addil LR'%1-$tls_gdidx$,%%r27\;ldo RR'%1-$tls_gdidx$(%%r1),%0\";
@@ -10272,12 +10280,25 @@ add,l %2,%3,%3\;bv,n %%r0(%3)"
   [(set_attr "type" "multi")
    (set_attr "length" "8")])
 
-(define_insn "tgd_load_pic"
+(define_expand "tgd_load_pic"
+  [(set (match_operand 0 "register_operand")
+        (unspec [(match_operand 1 "tgd_symbolic_operand")] UNSPEC_TLSGD_PIC))
+   (clobber (reg R1_REGNUM))]
+  ""
+{
+  if (TARGET_64BIT)
+    emit_insn (gen_tgd_load_picdi (operands[0], operands[1]));
+  else
+    emit_insn (gen_tgd_load_picsi (operands[0], operands[1]));
+  DONE;
+})
+
+(define_insn "tgd_load_picsi"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (unspec:SI [(match_operand 1 "tgd_symbolic_operand" "")] UNSPEC_TLSGD_PIC))
-  (clobber (reg:SI 1))
-  (use (reg:SI 19))]
-  ""
+  (clobber (reg:SI R1_REGNUM))
+  (use (reg:SI R19_REGNUM))]
+  "!TARGET_64BIT"
   "*
 {
   return \"addil LT'%1-$tls_gdidx$,%%r19\;ldo RT'%1-$tls_gdidx$(%%r1),%0\";
@@ -10285,12 +10306,25 @@ add,l %2,%3,%3\;bv,n %%r0(%3)"
   [(set_attr "type" "multi")
    (set_attr "length" "8")])
 
+(define_insn "tgd_load_picdi"
+ [(set (match_operand:DI 0 "register_operand" "=r")
+       (unspec:DI [(match_operand 1 "tgd_symbolic_operand" "")] UNSPEC_TLSGD_PIC))
+  (clobber (reg:DI R1_REGNUM))
+  (use (reg:DI R27_REGNUM))]
+  "TARGET_64BIT"
+  "*
+{
+  return \"addil LT'%1-$tls_gdidx$,%%r27\;ldo RT'%1-$tls_gdidx$(%%r1),%0\";
+}"
+  [(set_attr "type" "multi")
+   (set_attr "length" "8")])
+
 (define_insn "tld_load"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (unspec:SI [(match_operand 1 "tld_symbolic_operand" "")] UNSPEC_TLSLDM))
-  (clobber (reg:SI 1))
-  (use (reg:SI 27))]
-  ""
+  (clobber (reg:SI R1_REGNUM))
+  (use (reg:SI R27_REGNUM))]
+  "!TARGET_64BIT"
   "*
 {
   return \"addil LR'%1-$tls_ldidx$,%%r27\;ldo RR'%1-$tls_ldidx$(%%r1),%0\";
@@ -10298,12 +10332,25 @@ add,l %2,%3,%3\;bv,n %%r0(%3)"
   [(set_attr "type" "multi")
    (set_attr "length" "8")])
 
-(define_insn "tld_load_pic"
+(define_expand "tld_load_pic"
+  [(set (match_operand 0 "register_operand")
+        (unspec [(match_operand 1 "tld_symbolic_operand")] UNSPEC_TLSLDM_PIC))
+   (clobber (reg R1_REGNUM))]
+  ""
+{
+  if (TARGET_64BIT)
+    emit_insn (gen_tld_load_picdi (operands[0], operands[1]));
+  else
+    emit_insn (gen_tld_load_picsi (operands[0], operands[1]));
+  DONE;
+})
+
+(define_insn "tld_load_picsi"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (unspec:SI [(match_operand 1 "tld_symbolic_operand" "")] UNSPEC_TLSLDM_PIC))
-  (clobber (reg:SI 1))
-  (use (reg:SI 19))]
-  ""
+  (clobber (reg:SI R1_REGNUM))
+  (use (reg:SI R19_REGNUM))]
+  "!TARGET_64BIT"
   "*
 {
   return \"addil LT'%1-$tls_ldidx$,%%r19\;ldo RT'%1-$tls_ldidx$(%%r1),%0\";
@@ -10311,12 +10358,40 @@ add,l %2,%3,%3\;bv,n %%r0(%3)"
   [(set_attr "type" "multi")
    (set_attr "length" "8")])
 
-(define_insn "tld_offset_load"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-        (plus:SI (unspec:SI [(match_operand 1 "tld_symbolic_operand" "")]
+(define_insn "tld_load_picdi"
+ [(set (match_operand:DI 0 "register_operand" "=r")
+       (unspec:DI [(match_operand 1 "tld_symbolic_operand" "")] UNSPEC_TLSLDM_PIC))
+  (clobber (reg:DI R1_REGNUM))
+  (use (reg:DI R27_REGNUM))]
+  "TARGET_64BIT"
+  "*
+{
+  return \"addil LT'%1-$tls_ldidx$,%%r27\;ldo RT'%1-$tls_ldidx$(%%r1),%0\";
+}"
+  [(set_attr "typ
[PATCH] Update my contrib entry
Committed as obvious after making sure the documentation still builds.

gcc/ChangeLog:

        * doc/contrib.texi: Update my entry.
---
 gcc/doc/contrib.texi | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/contrib.texi b/gcc/doc/contrib.texi
index fa551c5f900..d7b73e179a5 100644
--- a/gcc/doc/contrib.texi
+++ b/gcc/doc/contrib.texi
@@ -809,7 +809,8 @@ Marek Polacek for his work on the C front end, the sanitizers and general
 bug fixing.
 
 @item
-Andrew Pinski for processing bug reports by the dozen.
+Andrew Pinski for processing bug reports by the dozen, maintenance of the
+Objective-C runtime libraries, and many scalar optimizations.
 
 @item
 Ovidiu Predescu for his work on the Objective-C front end and runtime
-- 
2.31.1
Re: [PATCH v3] Introduce attribute reverse_alias
Not commenting on the semantics, but the name seems unfortunate (hello
bikeshed).  The documentation starts with 'attribute causes @var{name}
to be emitted as an alias to the definition'.  So not emitting a
'reverse alias', whatever that might be.  It doesn't seem to mention how
reverse alias differs from 'alias'.  Why would 'alias' not DTRT?

Is it emitting an additional symbol -- i.e., something like 'altname'?
Or is it something else?  Is that symbol known in the current TU, or
other TUs?

nathan

On 7/14/23 21:08, Alexandre Oliva wrote:

This patch introduces an attribute to add extra aliases to a symbol
when its definition is output.  The main goal is to ease interfacing
C++ with Ada, as C++ mangled names have to be named, and in some cases
(e.g. when using stdint.h typedefs in function arguments) the symbol
names may vary across platforms.

The attribute is usable in C and C++, presumably in all C-family
languages.  It can be attached to global variables and functions.  In
C++, it can also be attached to namespace-scoped variables and
functions, static data members, member functions, explicit
instantiations and specializations of template functions, members and
classes.

When applied to constructors or destructors, additional reverse_aliases
with _Base and _Del suffixes are defined for variants other than
complete-object ones.  This changes the assumption that clones always
carry the same attributes as their abstract declarations, so there is
now a function to adjust them.

C++ also had a bug in which attributes from local extern declarations
failed to be propagated to a preexisting corresponding
namespace-scoped decl.  I've fixed that, and adjusted acc tests that
distinguished between C and C++ in this regard.

Applying the attribute to class types is only valid in C++, and the
effect is to attach the alias to the RTTI object associated with the
class type.

Regstrapped on x86_64-linux-gnu.  Ok to install?

This is refreshed and renamed from earlier versions that named the
attribute 'exalias', and that AFAICT got stuck in name bikeshedding.
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551614.html


for  gcc/ChangeLog

        * attribs.cc: Include cgraph.h.
        (decl_attributes): Allow late introduction of reverse_alias in
        types.
        (create_reverse_alias_decl, create_reverse_alias_decls): New.
        * attribs.h: Declare them.
        (FOR_EACH_REVERSE_ALIAS): New macro.
        * cgraph.cc (cgraph_node::create): Create reverse_alias decls.
        * varpool.cc (varpool_node::get_create): Create reverse_alias
        decls.
        * cgraph.h (symtab_node::remap_reverse_alias_target): New.
        * symtab.cc (symtab_node::remap_reverse_alias_target): Define.
        * cgraphunit.cc (cgraph_node::analyze): Create alias_target node
        if needed.
        (analyze_functions): Fixup visibility of implicit alias only
        after its node is analyzed.
        * doc/extend.texi (reverse_alias): Document for variables,
        functions and types.

for  gcc/ada/ChangeLog

        * doc/gnat_rm/interfacing_to_other_languages.rst: Mention
        attribute reverse_alias to give RTTI symbols mnemonic names.
        * doc/gnat_ugn/the_gnat_compilation_model.rst: Mention
        attribute reverse_alias.  Fix incorrect ref to C1 ctor variant.

for  gcc/c-family/ChangeLog

        * c-ada-spec.cc (pp_asm_name): Use first reverse_alias if
        available.
        * c-attribs.cc (handle_reverse_alias_attribute): New.
        (c_common_attribute_table): Add reverse_alias.
        (handle_copy_attribute): Do not copy reverse_alias.

for  gcc/c/ChangeLog

        * c-decl.cc (duplicate_decls): Remap reverse_alias target.

for  gcc/cp/ChangeLog

        * class.cc (adjust_clone_attributes): New.
        (copy_fndecl_with_name, build_clone): Call it.
        * cp-tree.h (adjust_clone_attributes): Declare.
        (update_reverse_alias_interface): Declare.
        (update_tinfo_reverse_alias): Declare.
        * decl.cc (duplicate_decls): Remap reverse_alias target.
        Adjust clone attributes.
        (grokfndecl): Tentatively create reverse_alias decls after
        adding attributes in e.g. a template member function explicit
        instantiation.
        * decl2.cc (cplus_decl_attributes): Update tinfo
        reverse_alias.
        (copy_interface, update_reverse_alias_interface): New.
        (determine_visibility): Update reverse_alias interface.
        (tentative_decl_linkage, import_export_decl): Likewise.
        * name-lookup.cc: Include target.h and cgraph.h.
        (push_local_extern_decl_alias): Merge attributes with
        namespace-scoped decl, and drop duplicate reverse_alias.
        * optimize.cc (maybe_clone_body): Re-adjust attributes after
        cloning them.  Update reverse_alias interface.
        * rtti.cc: Include attribs.h and cgraph.h.
        (get_tinfo_decl): Copy reverse_alias attributes from type to
        tinfo decl.  Create re
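
As a point of comparison for the 'alias' question raised at the top of this message, here is a minimal sketch of the existing GNU alias attribute. It assumes GCC or Clang on an ELF target; the names answer_impl and answer_alt are made up for illustration. It deliberately does not show the proposed reverse_alias attribute, whose syntax is not quoted in this excerpt. With the existing attribute, the extra name is written as a separate declaration that points at a target defined in the same translation unit, whereas the patch description speaks of attaching the extra names to the symbol's own definition.

```c
/* Existing GNU `alias` attribute, shown for contrast only.
   Assumes GCC or Clang on an ELF target.  */

/* The real definition.  */
int answer_impl(void)
{
    return 42;
}

/* A second symbol for the same definition.  The alias declaration names
   its target, and the target must be defined in this translation unit.  */
int answer_alt(void) __attribute__((alias("answer_impl")));
```

Calling answer_alt() runs the body of answer_impl(); both names end up in the object file's symbol table referring to the same code.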
[PATCH v1|GCC-13] RISC-V: Bugfix for riscv-vsetvl pass.
From: Ju-Zhe Zhong

This patch is part of the change below, which located a bug in the
RVV vsetvl pass during auto-vectorization.

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624523.html

Unfortunately, it is not easy to reproduce this bug with the intrinsic
APIs, but it is worth backporting to GCC 13.

Signed-off-by: Ju-Zhe Zhong

gcc/ChangeLog:

        * config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Add vl parameter.
        (change_vsetvl_insn): Ditto.
        (change_insn): Add validate change as well as assert.
        (pass_vsetvl::backward_demand_fusion): Allow forward.
---
 gcc/config/riscv/riscv-vsetvl.cc | 29 +++--
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 3355ca4e3fb..fbd26988106 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -633,7 +633,8 @@ gen_vsetvl_pat (enum vsetvl_type insn_type, const vl_vtype_info &info, rtx vl)
 }
 
 static rtx
-gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info)
+gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info,
+                rtx vl = NULL_RTX)
 {
   rtx new_pat;
   vl_vtype_info new_info = info;
@@ -644,7 +645,7 @@ gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info)
   if (vsetvl_insn_p (rinsn) || vlmax_avl_p (info.get_avl ()))
     {
       rtx dest = get_vl (rinsn);
-      new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, dest);
+      new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, vl ? vl : dest);
     }
   else if (INSN_CODE (rinsn) == CODE_FOR_vsetvl_vtype_change_only)
     new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, new_info, NULL_RTX);
@@ -926,7 +927,8 @@ change_insn (rtx_insn *rinsn, rtx new_pat)
       print_rtl_single (dump_file, PATTERN (rinsn));
     }
 
-  validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
+  bool change_p = validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
+  gcc_assert (change_p);
 
   if (dump_file)
     {
@@ -1039,7 +1041,8 @@ change_insn (function_info *ssa, insn_change change, insn_info *insn,
 }
 
 static void
-change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info)
+change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info,
+                    rtx vl = NULL_RTX)
 {
   rtx_insn *rinsn;
   if (vector_config_insn_p (insn->rtl ()))
@@ -1053,7 +1056,7 @@ change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info)
       rinsn = PREV_INSN (insn->rtl ());
       gcc_assert (vector_config_insn_p (rinsn));
     }
-  rtx new_pat = gen_vsetvl_pat (rinsn, info);
+  rtx new_pat = gen_vsetvl_pat (rinsn, info, vl);
   change_insn (rinsn, new_pat);
 }
 
@@ -3331,7 +3334,21 @@ pass_vsetvl::backward_demand_fusion (void)
                                           new_info))
             continue;
 
-          change_vsetvl_insn (new_info.get_insn (), new_info);
+          rtx vl = NULL_RTX;
+          /* Backward VLMAX VL:
+               bb 3:
+                 vsetivli zero, 1 ... -> vsetvli t1, zero
+                 vmv.s.x
+               bb 5:
+                 vsetvli t1, zero ... -> to be elided.
+                 vlse16.v
+
+             We should forward "t1".  */
+          if (!block_info.reaching_out.has_avl_reg ()
+              && vlmax_avl_p (new_info.get_avl ()))
+            vl = get_vl (prop.get_insn ()->rtl ());
+          change_vsetvl_insn (new_info.get_insn (), new_info, vl);
+
           if (block_info.local_dem == block_info.reaching_out)
             block_info.local_dem = new_info;
           block_info.reaching_out = new_info;
-- 
2.34.1
RE: Re: [PATCH] RISC-V: Support non-SLP unordered reduction
Filed a separate patch targeting GCC 13 for this bug, with the rvv.exp
and riscv.exp tests passing.  Unfortunately, it is not easy to reproduce
this with the intrinsic API.

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624574.html

Pan

-----Original Message-----
From: Gcc-patches On Behalf Of 钟居哲
Sent: Friday, July 14, 2023 8:51 PM
To: kito.cheng
Cc: gcc-patches; kito.cheng; palmer; rdapp.gcc; Jeff Law
Subject: Re: Re: [PATCH] RISC-V: Support non-SLP unordered reduction

So to be safe, I think it should be backported to GCC 13 even though I
don't have an intrinsic testcase to reproduce it.

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-07-14 20:38
To: 钟居哲
CC: GCC Patches; Kito Cheng; Palmer Dabbelt; Robin Dapp; Jeff Law
Subject: Re: [PATCH] RISC-V: Support non-SLP unordered reduction

On Fri, Jul 14, 2023, 20:31 wrote:

From: Ju-Zhe Zhong

This patch adds reduc_*_scal to support reduction auto-vectorization.

Use COND_LEN_* + reduc_*_scal to support unordered non-SLP
auto-vectorization.

Consider the following case:

int __attribute__((noipa))
and_loop (int32_t * __restrict x,
          int32_t n, int res)
{
  for (int i = 0; i < n; ++i)
    res &= x[i];
  return res;
}

ASM:
and_loop:
        ble     a1,zero,.L4
        vsetvli a3,zero,e32,m1,ta,ma
        vmv.v.i v1,-1
.L3:
        vsetvli a5,a1,e32,m1,tu,ma      -> MUST BE "TU".
        slli    a4,a5,2
        sub     a1,a1,a5
        vle32.v v2,0(a0)
        add     a0,a0,a4
        vand.vv v1,v2,v1
        bne     a1,zero,.L3
        vsetivli        zero,1,e32,m1,ta,ma
        vmv.v.i v2,-1
        vsetvli a3,zero,e32,m1,ta,ma
        vredand.vs      v1,v1,v2
        vmv.x.s a5,v1
        and     a0,a2,a5
        ret
.L4:
        mv      a0,a2
        ret

Fix bug of VSETVL PASS which is caused by reduction testcase.

Is it a performance bug or a correctness bug?  Does it also appear in
GCC 13 if it's a correctness bug?

SLP reduction and floating-point in-order reduction are not supported
yet.

gcc/ChangeLog:

        * config/riscv/autovec.md (reduc_plus_scal_): New pattern.
        (reduc_smax_scal_): Ditto.
        (reduc_umax_scal_): Ditto.
        (reduc_smin_scal_): Ditto.
        (reduc_umin_scal_): Ditto.
        (reduc_and_scal_): Ditto.
        (reduc_ior_scal_): Ditto.
        (reduc_xor_scal_): Ditto.
        * config/riscv/riscv-protos.h (enum insn_type): New enum.
        (emit_nonvlmax_integer_move_insn): Add reduction.
        (expand_reduction): New function.
        * config/riscv/riscv-v.cc (emit_vlmax_reduction_insn): Ditto.
        (emit_vlmax_fp_reduction_insn): Ditto.
        (get_m1_mode): Ditto.
        (expand_cond_len_binop): Fix name.
        (expand_reduction): New function.
        * config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Fix bug.
        (change_insn): Ditto.
        (change_vsetvl_insn): Ditto.
        (pass_vsetvl::backward_demand_fusion): Ditto.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/rvv.exp: Add reduction tests.
        * gcc.target/riscv/rvv/autovec/reduc/reduc-1.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc-2.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc-3.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc-4.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c: New test.

---
 gcc/config/riscv/autovec.md                   | 138 ++
 gcc/config/riscv/riscv-protos.h               |   3 +
 gcc/config/riscv/riscv-v.cc                   |  84 ++-
 gcc/config/riscv/riscv-vsetvl.cc              |  28 +++-
 .../riscv/rvv/autovec/reduc/reduc-1.c         | 118 +++
 .../riscv/rvv/autovec/reduc/reduc-2.c         | 129 
 .../riscv/rvv/autovec/reduc/reduc-3.c         |  65 +
 .../riscv/rvv/autovec/reduc/reduc-4.c         |  59 
 .../riscv/rvv/autovec/reduc/reduc_run-1.c     |  56 +++
 .../riscv/rvv/autovec/reduc/reduc_run-2.c     |  79 ++
 .../riscv/rvv/autovec/reduc/reduc_run-3.c     |  49 +++
 .../riscv/rvv/autovec/reduc/reduc_run-4.c     |  66 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |   2 +
 13 files changed, 868 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
 create mode 100644 gcc
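
To illustrate why the loop's vsetvli in the and_loop assembly quoted in this message is marked as needing "tu" (tail undisturbed), here is a scalar C model of that strip-mined loop. It is only a sketch under stated assumptions: VLMAX is fixed at 4 lanes for illustration, the vector length per iteration is simplified to min(n, VLMAX), and the names and_loop_ref and and_loop_model are made up. Each accumulator lane holds a partial AND from earlier iterations, so on a final short iteration the lanes at or beyond vl must keep their old values, otherwise those partial results would be lost before the final vredand.vs.

```c
#include <stdint.h>

#define VLMAX 4  /* assumed lanes per vector register, illustration only */

/* Scalar reference: the same loop as in the patch description.  */
int and_loop_ref(const int32_t *x, int32_t n, int res)
{
    for (int32_t i = 0; i < n; ++i)
        res &= x[i];
    return res;
}

/* Scalar model of the vectorized loop.  */
int and_loop_model(const int32_t *x, int32_t n, int res)
{
    int32_t acc[VLMAX];

    if (n <= 0)                         /* ble a1,zero,.L4 */
        return res;

    for (int l = 0; l < VLMAX; ++l)     /* vmv.v.i v1,-1 (identity for AND) */
        acc[l] = -1;

    while (n > 0) {
        int32_t vl = n < VLMAX ? n : VLMAX;   /* vsetvli a5,a1,... (simplified) */
        for (int32_t l = 0; l < vl; ++l)      /* vle32.v + vand.vv */
            acc[l] &= x[l];
        /* Lanes l >= vl are left untouched here: that is the "tu" policy.  */
        x += vl;
        n -= vl;
    }

    int32_t r = -1;                     /* vmv.v.i v2,-1 */
    for (int l = 0; l < VLMAX; ++l)     /* vredand.vs v1,v1,v2 */
        r &= acc[l];
    return res & r;                     /* and a0,a2,a5 */
}
```

With seven input elements the second iteration processes only three lanes, so lane 3 still carries the partial result from the first iteration when the final reduction runs; the model and the scalar reference agree on such inputs.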