Re: [PATCH] testsuite/102690 - XFAIL g++.dg/warn/Warray-bounds-16.C
On Wed, 10 Nov 2021, Martin Sebor wrote: > On 11/10/21 3:09 AM, Richard Biener via Gcc-patches wrote: > > This XFAILs the bogus diagnostic test and rectifies the expectation > > on the optimization. > > > > Tested on x86_64-unknown-linux-gnu, pushed. > > > > 2021-11-10 Richard Biener > > > > PR testsuite/102690 > > * g++.dg/warn/Warray-bounds-16.C: XFAIL diagnostic part > > and optimization. > > --- > > gcc/testsuite/g++.dg/warn/Warray-bounds-16.C | 6 +++--- > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/gcc/testsuite/g++.dg/warn/Warray-bounds-16.C > > b/gcc/testsuite/g++.dg/warn/Warray-bounds-16.C > > index 17b4d0d194e..89cbadb91c7 100644 > > --- a/gcc/testsuite/g++.dg/warn/Warray-bounds-16.C > > +++ b/gcc/testsuite/g++.dg/warn/Warray-bounds-16.C > > @@ -19,11 +19,11 @@ struct S > > p = (int*) new unsigned char [sizeof (int) * m]; > > > > for (int i = 0; i < m; i++) > > - new (p + i) int (); > > + new (p + i) int (); /* { dg-bogus "bounds" "pr102690" { xfail *-*-* } > > } */ > > } > > }; > > > > S a (0); > > > > -/* Verify the loop has been eliminated. > > - { dg-final { scan-tree-dump-not "goto" "optimized" } } */ > > +/* The loop cannot be eliminated since the global 'new' can change 'm'. */ > > I don't understand this comment. Can you please explain how > the global operator new (i.e., the one outside the loop below) > can change the member of the class whose ctor calls the new? > > The member, or more precisely the enclosing object, doesn't > yet exist at the time the global new is called because its > ctor hasn't finished, so nothing outside the ctor can access > it. A pointer to the S under construction can be used (and > could be accessed by a replacement new) but it cannot be > dereferenced to access its members because the object it > points to doesn't exist until after the ctor completes.

Yes, that's the C++ legalese - which is why I XFAILed that part of the test rather than just removing it. The middle-end sees the object *this as existing and being global, thus accessible and mutable by '::new', which, when replaced by the user, could access and alter *this. Like maybe for

  S s; void *operator new (..) { s.m = 0; } int main () { new (&s) S (1); }

That may be invalid C++, but this detail of C++ is not reflected in the GIMPLE IL. Before the change that regressed this, if S::S() had called a global function foo() instead of new to do the allocation, the behavior would have been the same as after the change. Isn't the call to new or foo part of the construction, and as such obviously allowed to access and alter the in-construction object?

> I copy the test below: > > inline void* operator new (__SIZE_TYPE__, void * v) > { > return v; > } > > struct S > { > int* p; > int m; > > S (int i) > { > m = i; > p = (int*) new unsigned char [sizeof (int) * m]; > > for (int i = 0; i < m; i++) > new (p + i) int (); /* { dg-bogus "bounds" "pr102690" { xfail *-*-* } } > */ > } > }; > > S a (0); > > Thanks > Martin

-- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
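[Editorial note: a fleshed-out version of the replacement-new sketch above, purely for illustration. The struct mirrors the testcase, but the replacement operator new[], the global pointer s and main are invented here, and as discussed above the program is likely invalid C++; the point is only that the GIMPLE IL cannot rule this behaviour out, so 'm' is not provably unchanged across the allocation.]

#include <new>
#include <cstddef>
#include <cstdlib>

struct S
{
  int *p;
  int m;

  S (int i)
  {
    m = i;
    p = (int *) new unsigned char [sizeof (int) * m]; // calls the replacement below
    for (int j = 0; j < m; j++)                       // 'm' may have been clobbered by ::new
      new (p + j) int ();
  }
};

S *s;

/* User replacement of the global array allocation function: it writes to
   the object whose constructor is still running.  */
void *
operator new[] (std::size_t n)
{
  if (s)
    s->m = 0;
  return std::malloc (n);
}

int
main ()
{
  s = static_cast<S *> (std::malloc (sizeof (S)));
  new (s) S (1);   // placement new, as in the 'new (&s) S (1)' sketch above
}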
[PATCH] dwarf2out, v2: Fix up field_byte_offset [PR101378]
Hi! Bootstrapped/regtested now successfully on x86_64-linux and i686-linux, verified the struct S { int e; int a : 1, b : 7, c : 8, d : 16; } s; struct T { int a : 1, b : 7; long long c : 8; int d : 16; } t; int main () { s.c = 0x55; s.d = 0x; t.c = 0x55; t.d = 0x; s.e++; } testcase is compiled the same way as before again, ok for trunk? > 2021-11-10 Jakub Jelinek > > PR debug/101378 > * dwarf2out.c (field_byte_offset): Do the PCC_BITFIELD_TYPE_MATTERS > handling only for DECL_BIT_FIELD_TYPE decls. > > * g++.dg/debug/dwarf2/pr101378.C: New test. > > --- gcc/dwarf2out.c.jj2021-11-05 10:19:46.339457342 +0100 > +++ gcc/dwarf2out.c 2021-11-09 15:01:51.425437717 +0100 > @@ -19646,6 +19646,7 @@ field_byte_offset (const_tree decl, stru > properly dynamic byte offsets only when PCC bitfield type doesn't > matter. */ >if (PCC_BITFIELD_TYPE_MATTERS > + && DECL_BIT_FIELD_TYPE (decl) >&& TREE_CODE (DECL_FIELD_OFFSET (decl)) == INTEGER_CST) > { >offset_int object_offset_in_bits; > --- gcc/testsuite/g++.dg/debug/dwarf2/pr101378.C.jj 2021-11-09 > 15:17:39.504975396 +0100 > +++ gcc/testsuite/g++.dg/debug/dwarf2/pr101378.C 2021-11-09 > 15:17:28.067137556 +0100 > @@ -0,0 +1,13 @@ > +// PR debug/101378 > +// { dg-do compile { target c++11 } } > +// { dg-options "-gdwarf-5 -dA" } > +// { dg-final { scan-assembler-times "0\[^0-9x\\r\\n\]* > DW_AT_data_member_location" 1 } } > +// { dg-final { scan-assembler-times "1\[^0-9x\\r\\n\]* > DW_AT_data_member_location" 1 } } > +// { dg-final { scan-assembler-times "2\[^0-9x\\r\\n\]* > DW_AT_data_member_location" 1 } } > +// { dg-final { scan-assembler-not "-1\[^0-9x\\r\\n\]* > DW_AT_data_member_location" } } > + > +struct E {}; > +struct S > +{ > + [[no_unique_address]] E e, f, g; > +} s; Jakub
[PATCH] Remove find_pdom and find_dom
This removes now useless wrappers around get_immediate_dominator. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2021-11-11 Richard Biener * cfganal.c (find_pdom): Remove. (control_dependences::find_control_dependence): Remove special-casing of entry block, call get_immediate_dominator directly. * gimple-predicate-analysis.cc (find_pdom): Remove. (find_dom): Likewise. (find_control_equiv_block): Call get_immediate_dominator directly. (compute_control_dep_chain): Likewise. (predicate::init_from_phi_def): Likewise. --- gcc/cfganal.c| 28 - gcc/gimple-predicate-analysis.cc | 36 +++- 2 files changed, 7 insertions(+), 57 deletions(-) diff --git a/gcc/cfganal.c b/gcc/cfganal.c index 11ab23623ae..0cba612738d 100644 --- a/gcc/cfganal.c +++ b/gcc/cfganal.c @@ -372,25 +372,6 @@ control_dependences::clear_control_dependence_bitmap (basic_block bb) bitmap_clear (&control_dependence_map[bb->index]); } -/* Find the immediate postdominator PDOM of the specified basic block BLOCK. - This function is necessary because some blocks have negative numbers. */ - -static inline basic_block -find_pdom (basic_block block) -{ - gcc_assert (block != ENTRY_BLOCK_PTR_FOR_FN (cfun)); - - if (block == EXIT_BLOCK_PTR_FOR_FN (cfun)) -return EXIT_BLOCK_PTR_FOR_FN (cfun); - else -{ - basic_block bb = get_immediate_dominator (CDI_POST_DOMINATORS, block); - if (! bb) - return EXIT_BLOCK_PTR_FOR_FN (cfun); - return bb; -} -} - /* Determine all blocks' control dependences on the given edge with edge_list EL index EDGE_INDEX, ala Morgan, Section 3.6. */ @@ -402,15 +383,14 @@ control_dependences::find_control_dependence (int edge_index) gcc_assert (get_edge_src (edge_index) != EXIT_BLOCK_PTR_FOR_FN (cfun)); - if (get_edge_src (edge_index) == ENTRY_BLOCK_PTR_FOR_FN (cfun)) -ending_block = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)); - else -ending_block = find_pdom (get_edge_src (edge_index)); + ending_block = get_immediate_dominator (CDI_POST_DOMINATORS, + get_edge_src (edge_index)); for (current_block = get_edge_dest (edge_index); current_block != ending_block && current_block != EXIT_BLOCK_PTR_FOR_FN (cfun); - current_block = find_pdom (current_block)) + current_block = get_immediate_dominator (CDI_POST_DOMINATORS, + current_block)) set_control_dependence_map_bit (current_block, edge_index); } diff --git a/gcc/gimple-predicate-analysis.cc b/gcc/gimple-predicate-analysis.cc index f0c84446194..454113d532e 100644 --- a/gcc/gimple-predicate-analysis.cc +++ b/gcc/gimple-predicate-analysis.cc @@ -45,36 +45,6 @@ #define DEBUG_PREDICATE_ANALYZER 1 -/* Find the immediate postdominator of the specified basic block BB. */ - -static inline basic_block -find_pdom (basic_block bb) -{ - basic_block exit_bb = EXIT_BLOCK_PTR_FOR_FN (cfun); - if (bb == exit_bb) -return exit_bb; - - if (basic_block pdom = get_immediate_dominator (CDI_POST_DOMINATORS, bb)) -return pdom; - - return exit_bb; -} - -/* Find the immediate dominator of the specified basic block BB. */ - -static inline basic_block -find_dom (basic_block bb) -{ - basic_block entry_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun); - if (bb == entry_bb) -return entry_bb; - - if (basic_block dom = get_immediate_dominator (CDI_DOMINATORS, bb)) -return dom; - - return entry_bb; -} - /* Return true if BB1 is postdominating BB2 and BB1 is not a loop exit bb. The loop exit bb check is simple and does not cover all cases. 
*/ @@ -96,7 +66,7 @@ is_non_loop_exit_postdominating (basic_block bb1, basic_block bb2) static inline basic_block find_control_equiv_block (basic_block bb) { - basic_block pdom = find_pdom (bb); + basic_block pdom = get_immediate_dominator (CDI_POST_DOMINATORS, bb); /* Skip the postdominating bb that is also a loop exit. */ if (!is_non_loop_exit_postdominating (pdom, bb)) @@ -1167,7 +1137,7 @@ compute_control_dep_chain (basic_block dom_bb, const_basic_block dep_bb, break; } - cd_bb = find_pdom (cd_bb); + cd_bb = get_immediate_dominator (CDI_POST_DOMINATORS, cd_bb); post_dom_check++; if (cd_bb == EXIT_BLOCK_PTR_FOR_FN (cfun) || post_dom_check > MAX_POSTDOM_CHECK) @@ -1788,7 +1758,7 @@ predicate::init_from_phi_def (gphi *phi) basic_block phi_bb = gimple_bb (phi); /* Find the closest dominating bb to be the control dependence root. */ - basic_block cd_root = find_dom (phi_bb); + basic_block cd_root = get_immediate_dominator (CDI_DOMINATORS, phi_bb); if (!cd_root) return false; -- 2.31.1
Re: [PATCH] vect: Remove vec_outside/inside_cost fields
On 11/10/21 18:18, Richard Sandiford wrote: Martin Liška writes: On 11/8/21 11:43, Richard Sandiford via Gcc-patches wrote: Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? I think the patch causes the following on x86_64-linux-gnu: FAIL: gfortran.dg/inline_matmul_17.f90 -O scan-tree-dump-times optimized "matmul_r4" 2 I get that failure even with d70ef65692f (from before the patches I committed today). Sorry, you are right, it's one revision before: d70ef65692fced7ab72e0aceeff7407e5a34d96d Honza, can you please take a look? Cheers, Martin Thanks, Richard
[PATCH v3] c-family: Add __builtin_assoc_barrier
On Wednesday, 8 September 2021 15:49:27 CET Matthias Kretz wrote: > On Wednesday, 8 September 2021 15:44:28 CEST Jason Merrill wrote: > > On 9/8/21 5:37 AM, Matthias Kretz wrote: > > > On Tuesday, 7 September 2021 19:36:22 CEST Jason Merrill wrote: > > >>> case PAREN_EXPR: > > >>> - RETURN (finish_parenthesized_expr (RECUR (TREE_OPERAND (t, > > >>> 0; > > >>> + if (REF_PARENTHESIZED_P (t)) > > >>> + RETURN (finish_parenthesized_expr (RECUR (TREE_OPERAND (t, > > >>> 0; > > >>> + else > > >>> + RETURN (RECUR (TREE_OPERAND (t, 0))); > > >> > > >> I think you need to build a new PAREN_EXPR in the assoc barrier case as > > >> well, for it to have any effect in templates. > > > > > > My intent was to ignore __builtin_assoc_barrier in templates / constexpr > > > evaluation since it's not affected by -fassociative-math anyway. Or do > > > you > > > mean something else? > > > > I agree about constexpr, but why wouldn't template instantiations be > > affected by -fassociative-math like any other function? > > Oh, that seems like a major misunderstanding on my part. I assumed > tsubst_copy_and_build would evaluate the expressions in template arguments > 🤦. I'll expand the test and will fix. Sorry for the long delay. New patch is attached. OK for trunk? New builtin to enable explicit use of PAREN_EXPR in C & C++ code. Signed-off-by: Matthias Kretz gcc/testsuite/ChangeLog: * c-c++-common/builtin-assoc-barrier-1.c: New test. gcc/cp/ChangeLog: * constexpr.c (cxx_eval_constant_expression): Handle PAREN_EXPR via cxx_eval_constant_expression. * cp-objcp-common.c (names_builtin_p): Handle RID_BUILTIN_ASSOC_BARRIER. * cp-tree.h: Adjust TREE_LANG_FLAG documentation to include PAREN_EXPR in REF_PARENTHESIZED_P. (REF_PARENTHESIZED_P): Add PAREN_EXPR. * parser.c (cp_parser_postfix_expression): Handle RID_BUILTIN_ASSOC_BARRIER. * pt.c (tsubst_copy_and_build): If the PAREN_EXPR is not a parenthesized initializer, build a new PAREN_EXPR. * semantics.c (force_paren_expr): Simplify conditionals. Set REF_PARENTHESIZED_P on PAREN_EXPR. (maybe_undo_parenthesized_ref): Test PAREN_EXPR for REF_PARENTHESIZED_P. gcc/c-family/ChangeLog: * c-common.c (c_common_reswords): Add __builtin_assoc_barrier. * c-common.h (enum rid): Add RID_BUILTIN_ASSOC_BARRIER. gcc/c/ChangeLog: * c-decl.c (names_builtin_p): Handle RID_BUILTIN_ASSOC_BARRIER. * c-parser.c (c_parser_postfix_expression): Likewise. gcc/ChangeLog: * doc/extend.texi: Document __builtin_assoc_barrier. --- gcc/c-family/c-common.c | 1 + gcc/c-family/c-common.h | 2 +- gcc/c/c-decl.c| 1 + gcc/c/c-parser.c | 20 ++ gcc/cp/constexpr.c| 8 +++ gcc/cp/cp-objcp-common.c | 1 + gcc/cp/cp-tree.h | 12 ++-- gcc/cp/parser.c | 14 gcc/cp/pt.c | 10 ++- gcc/cp/semantics.c| 23 ++ gcc/doc/extend.texi | 18 + .../c-c++-common/builtin-assoc-barrier-1.c| 71 +++ 12 files changed, 158 insertions(+), 23 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/builtin-assoc-barrier-1.c -- ── Dr. 
Matthias Kretz https://mattkretz.github.io GSI Helmholtz Centre for Heavy Ion Research https://gsi.de stdₓ::simd ── diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index 436df45df68..dd2a3d5da9e 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -384,6 +384,7 @@ const struct c_common_resword c_common_reswords[] = { "__builtin_convertvector", RID_BUILTIN_CONVERTVECTOR, 0 }, { "__builtin_has_attribute", RID_BUILTIN_HAS_ATTRIBUTE, 0 }, { "__builtin_launder", RID_BUILTIN_LAUNDER, D_CXXONLY }, + { "__builtin_assoc_barrier", RID_BUILTIN_ASSOC_BARRIER, 0 }, { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, 0 }, { "__builtin_shufflevector", RID_BUILTIN_SHUFFLEVECTOR, 0 }, { "__builtin_tgmath", RID_BUILTIN_TGMATH, D_CONLY }, diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h index d5dad99ff97..c089fda12e4 100644 --- a/gcc/c-family/c-common.h +++ b/gcc/c-family/c-common.h @@ -108,7 +108,7 @@ enum rid RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL, RID_CHOOSE_EXPR, RID_TYPES_COMPATIBLE_P, RID_BUILTIN_COMPLEX, RID_BUILTIN_SHUFFLE, RID_BUILTIN_SHUFFLEVECTOR, RID_BUILTIN_CONVERTVECTOR, RID_BUILTIN_TGMATH, - RID_BUILTIN_HAS_ATTRIBUTE, + RID_BUILTIN_HAS_ATTRIBUTE, RID_BUILTIN_ASSOC_BARRIER, R
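[Editorial note: a minimal usage sketch of the new builtin — an invented example, not the committed c-c++-common/builtin-assoc-barrier-1.c test. With -fassociative-math (e.g. via -Ofast) the compiler may fold (a + b) - b to plain a; wrapping the sum in __builtin_assoc_barrier keeps the addition, and its rounding, from being reassociated away.]

float
force_rounding (float a, float b)
{
  /* Without the barrier, -fassociative-math may simplify this to 'a'.  */
  return __builtin_assoc_barrier (a + b) - b;
}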
Re: [PATCH] dwarf2out, v2: Fix up field_byte_offset [PR101378]
On Thu, 11 Nov 2021, Jakub Jelinek wrote: > Hi! > > Bootstrapped/regtested now successfully on x86_64-linux and i686-linux, > verified the > struct S { int e; int a : 1, b : 7, c : 8, d : 16; } s; > struct T { int a : 1, b : 7; long long c : 8; int d : 16; } t; > > int > main () > { > s.c = 0x55; > s.d = 0x; > t.c = 0x55; > t.d = 0x; > s.e++; > } > testcase is compiled the same way as before again, ok for trunk? OK, also for affected branches. Thanks, Richard. > > 2021-11-10 Jakub Jelinek > > > > PR debug/101378 > > * dwarf2out.c (field_byte_offset): Do the PCC_BITFIELD_TYPE_MATTERS > > handling only for DECL_BIT_FIELD_TYPE decls. > > > > * g++.dg/debug/dwarf2/pr101378.C: New test. > > > > --- gcc/dwarf2out.c.jj 2021-11-05 10:19:46.339457342 +0100 > > +++ gcc/dwarf2out.c 2021-11-09 15:01:51.425437717 +0100 > > @@ -19646,6 +19646,7 @@ field_byte_offset (const_tree decl, stru > > properly dynamic byte offsets only when PCC bitfield type doesn't > > matter. */ > >if (PCC_BITFIELD_TYPE_MATTERS > > + && DECL_BIT_FIELD_TYPE (decl) > >&& TREE_CODE (DECL_FIELD_OFFSET (decl)) == INTEGER_CST) > > { > >offset_int object_offset_in_bits; > > --- gcc/testsuite/g++.dg/debug/dwarf2/pr101378.C.jj 2021-11-09 > > 15:17:39.504975396 +0100 > > +++ gcc/testsuite/g++.dg/debug/dwarf2/pr101378.C2021-11-09 > > 15:17:28.067137556 +0100 > > @@ -0,0 +1,13 @@ > > +// PR debug/101378 > > +// { dg-do compile { target c++11 } } > > +// { dg-options "-gdwarf-5 -dA" } > > +// { dg-final { scan-assembler-times "0\[^0-9x\\r\\n\]* > > DW_AT_data_member_location" 1 } } > > +// { dg-final { scan-assembler-times "1\[^0-9x\\r\\n\]* > > DW_AT_data_member_location" 1 } } > > +// { dg-final { scan-assembler-times "2\[^0-9x\\r\\n\]* > > DW_AT_data_member_location" 1 } } > > +// { dg-final { scan-assembler-not "-1\[^0-9x\\r\\n\]* > > DW_AT_data_member_location" } } > > + > > +struct E {}; > > +struct S > > +{ > > + [[no_unique_address]] E e, f, g; > > +} s; > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
[PATCH] Adjust CPP_FOR_BUILD
Hi. CPP/CPPFLAGS were changed by commit 84401ce5fb4ecab55decb472b168100e7593e01f. That commit uses CPP as a default for CPP_FOR_BUILD. Unless CPP is defined, GNU make defaults CPP to `$(CC) -E'. Given the context, this is now incorrect, since CC_FOR_BUILD should be used. Fixes PR103011. -- Pekka gcc/ChangeLog: * configure: Regenerate. * configure.ac: For CPP_FOR_BUILD use $(CC_FOR_BUILD) -E instead of $(CPP). --- configure | 2 +- configure.ac | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/configure b/configure index 58979d6e3b1..a5eca91fb2a 100755 --- a/configure +++ b/configure @@ -4092,7 +4092,7 @@ if test "${build}" != "${host}" ; then AR_FOR_BUILD=${AR_FOR_BUILD-ar} AS_FOR_BUILD=${AS_FOR_BUILD-as} CC_FOR_BUILD=${CC_FOR_BUILD-gcc} - CPP_FOR_BUILD="${CPP_FOR_BUILD-\$(CPP)}" + CPP_FOR_BUILD="${CPP_FOR_BUILD-\$(CC_FOR_BUILD) -E}" CXX_FOR_BUILD=${CXX_FOR_BUILD-g++} DSYMUTIL_FOR_BUILD=${DSYMUTIL_FOR_BUILD-dsymutil} GFORTRAN_FOR_BUILD=${GFORTRAN_FOR_BUILD-gfortran} diff --git a/configure.ac b/configure.ac index 550e6993b59..b8055dad573 100644 --- a/configure.ac +++ b/configure.ac @@ -1334,7 +1334,7 @@ if test "${build}" != "${host}" ; then AR_FOR_BUILD=${AR_FOR_BUILD-ar} AS_FOR_BUILD=${AS_FOR_BUILD-as} CC_FOR_BUILD=${CC_FOR_BUILD-gcc} - CPP_FOR_BUILD="${CPP_FOR_BUILD-\$(CPP)}" + CPP_FOR_BUILD="${CPP_FOR_BUILD-\$(CC_FOR_BUILD) -E}" CXX_FOR_BUILD=${CXX_FOR_BUILD-g++} DSYMUTIL_FOR_BUILD=${DSYMUTIL_FOR_BUILD-dsymutil} GFORTRAN_FOR_BUILD=${GFORTRAN_FOR_BUILD-gfortran}
[committed] openmp: Add support for 2 argument num_teams clause
Hi! In OpenMP 5.1, num_teams clause can accept either one expression as before, but it in that case changed meaning, rather than create <= expression teams it is now create == expression teams. Or it accepts two expressions separated by :, with the meaning that the first is low bound and second upper bound on how many teams should be created. The other ways to set number of teams are upper bounds with lower bound of 1. The following patch does parsing of this for C/C++. For host teams, we actually don't need to do anything further right now, we always create (pretend to create) exactly the requested number of teams, so we can just evaluate and throw away the lower bound for now. For teams nested in target, we don't guarantee that though and further work will be needed. In particular, omplower now turns the teams part of: struct S { S (); S (const S &); ~S (); int s; }; void bar (S &, S &); int baz (); _Pragma ("omp declare target to (baz)"); void foo (void) { S a, b; #pragma omp target private (a) map (b) { #pragma omp teams firstprivate (b) num_teams (baz ()) { bar (a, b); } } } into: retval.0 = baz (); retval.1 = retval.0; { unsigned int retval.3; struct S * D.2549; struct S b; retval.3 = (unsigned int) retval.1; D.2549 = .omp_data_i->b; S::S (&b, D.2549); #pragma omp teams num_teams(retval.1) firstprivate(b) shared(a) __builtin_GOMP_teams (retval.3, 0); { bar (&a, &b); } S::~S (&b); #pragma omp return(nowait) } IMHO we want a new API, say GOMP_teams3 which will take 3 arguments instead of 2 (the lower and upper bounds from num_teams and thread_limit) and will return a bool whether it should do the teams body or not. And, we should add right before outermost {} above while (__builtin_GOMP_teams3 ((unsigned) retval.1, (unsigned) retval.1, 0)) and remove the __builtin_GOMP_teams call. The current function performs exit equivalent (at least on NVPTX) which seems bad because that means the destructors of e.g. private variables on target aren't invoked, and at the current placement neither destructors of the already constructed privatized variables in teams. I'll do this next on the compiler side, but I'm afraid I'll need help with the nvptx and amdgcn implementations. E.g. for nvptx, we won't be able to use %ctaid.x . I think ideal would be to use a .shared integer variable for the omp_get_team_num value, but I don't have any experience with that, are .shared variables zero initialized by default, or do they have random value at start? PTX docs say they aren't initializable. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk. 2021-11-11 Jakub Jelinek gcc/ * tree.h (OMP_CLAUSE_NUM_TEAMS_EXPR): Rename to ... (OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR): ... this. (OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR): Define. * tree.c (omp_clause_num_ops): Increase num ops for OMP_CLAUSE_NUM_TEAMS to 2. * tree-pretty-print.c (dump_omp_clause): Print optional lower bound for OMP_CLAUSE_NUM_TEAMS. * gimplify.c (gimplify_scan_omp_clauses): Gimplify OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR if non-NULL. (optimize_target_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. Handle OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR. * omp-low.c (lower_omp_teams): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. * omp-expand.c (expand_teams_call, get_target_arguments): Likewise. gcc/c/ * c-parser.c (c_parser_omp_clause_num_teams): Parse optional lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR. Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. 
(c_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before combined target teams even lower-bound expression. gcc/cp/ * parser.c (cp_parser_omp_clause_num_teams): Parse optional lower-bound and store it into OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR. Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. (cp_parser_omp_target): For OMP_CLAUSE_NUM_TEAMS evaluate before combined target teams even lower-bound expression. * semantics.c (finish_omp_clauses): Handle OMP_CLAUSE_NUM_TEAMS_LOWER_EXPR of OMP_CLAUSE_NUM_TEAMS clause. * pt.c (tsubst_omp_clauses): Likewise. (tsubst_expr): For OMP_CLAUSE_NUM_TEAMS evaluate before combined target teams even lower-bound expression. gcc/fortran/ * trans-openmp.c (gfc_trans_omp_clauses): Use OMP_CLAUSE_NUM_TEAMS_UPPER_EXPR instead of OMP_CLAUSE_NUM_TEAMS_EXPR. gcc/testsuite/ * c-c++-common/gomp/clauses-1.c (bar): Supply lower-bound expression to half of the num_teams clauses. * c-c++-common/gomp/num-teams-1.c: New test. * c-c++-common/gomp/num-teams-2.c: New test. * g++.dg/gomp/
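[Editorial note: a small invented example of the clause forms described above — the directive spelling is standard OpenMP 5.1 syntax as parsed by this patch, the numbers are arbitrary.]

void
use_teams (void)
{
  /* Two-expression form: lower bound 4, upper bound 8 teams.  */
  #pragma omp target teams num_teams(4 : 8)
  { /* ... */ }

  /* Single-expression form now requests exactly 6 teams, not "at most 6".  */
  #pragma omp target teams num_teams(6)
  { /* ... */ }
}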
Re: [aarch64] PR102376 - Emit better diagnostic for arch extensions in target attr
On Tue, 9 Nov 2021 at 20:27, Richard Sandiford wrote: > > Prathamesh Kulkarni writes: > > On Thu, 4 Nov 2021 at 14:19, Richard Sandiford > > wrote: > >> > >> Prathamesh Kulkarni writes: > >> > On Wed, 20 Oct 2021 at 15:05, Richard Sandiford > >> > wrote: > >> >> > >> >> Prathamesh Kulkarni writes: > >> >> > On Tue, 19 Oct 2021 at 19:58, Richard Sandiford > >> >> > wrote: > >> >> >> > >> >> >> Prathamesh Kulkarni writes: > >> >> >> > Hi, > >> >> >> > The attached patch emits a more verbose diagnostic for target > >> >> >> > attribute that > >> >> >> > is an architecture extension needing a leading '+'. > >> >> >> > > >> >> >> > For the following test, > >> >> >> > void calculate(void) __attribute__ ((__target__ ("sve"))); > >> >> >> > > >> >> >> > With patch, the compiler now emits: > >> >> >> > 102376.c:1:1: error: arch extension ‘sve’ should be prepended with > >> >> >> > ‘+’ > >> >> >> > 1 | void calculate(void) __attribute__ ((__target__ ("sve"))); > >> >> >> > | ^~~~ > >> >> >> > > >> >> >> > instead of: > >> >> >> > 102376.c:1:1: error: pragma or attribute ‘target("sve")’ is not > >> >> >> > valid > >> >> >> > 1 | void calculate(void) __attribute__ ((__target__ ("sve"))); > >> >> >> > | ^~~~ > >> >> >> > >> >> >> Nice :-) > >> >> >> > >> >> >> > (This isn't specific to sve though). > >> >> >> > OK to commit after bootstrap+test ? > >> >> >> > > >> >> >> > Thanks, > >> >> >> > Prathamesh > >> >> >> > > >> >> >> > diff --git a/gcc/config/aarch64/aarch64.c > >> >> >> > b/gcc/config/aarch64/aarch64.c > >> >> >> > index a9a1800af53..975f7faf968 100644 > >> >> >> > --- a/gcc/config/aarch64/aarch64.c > >> >> >> > +++ b/gcc/config/aarch64/aarch64.c > >> >> >> > @@ -17821,7 +17821,16 @@ aarch64_process_target_attr (tree args) > >> >> >> >num_attrs++; > >> >> >> >if (!aarch64_process_one_target_attr (token)) > >> >> >> > { > >> >> >> > - error ("pragma or attribute % is not > >> >> >> > valid", token); > >> >> >> > + /* Check if token is possibly an arch extension without > >> >> >> > + leading '+'. */ > >> >> >> > + char *str = (char *) xmalloc (strlen (token) + 2); > >> >> >> > + str[0] = '+'; > >> >> >> > + strcpy(str + 1, token); > >> >> >> > >> >> >> I think std::string would be better here, e.g.: > >> >> >> > >> >> >> auto with_plus = std::string ("+") + token; > >> >> >> > >> >> >> > + if (aarch64_handle_attr_isa_flags (str)) > >> >> >> > + error("arch extension %<%s%> should be prepended with > >> >> >> > %<+%>", token); > >> >> >> > >> >> >> Nit: should be a space before the “(”. > >> >> >> > >> >> >> In principle, a fixit hint would have been nice here, but I don't > >> >> >> think > >> >> >> we have enough information to provide one. (Just saying for the > >> >> >> record.) > >> >> > Thanks for the suggestions. > >> >> > Does the attached patch look OK ? > >> >> > >> >> Looks good apart from a couple of formatting nits. > >> >> > > >> >> > Thanks, > >> >> > Prathamesh > >> >> >> > >> >> >> Thanks, > >> >> >> Richard > >> >> >> > >> >> >> > + else > >> >> >> > + error ("pragma or attribute % is not > >> >> >> > valid", token); > >> >> >> > + free (str); > >> >> >> > return false; > >> >> >> > } > >> >> >> > > >> >> > > >> >> > [aarch64] PR102376 - Emit better diagnostics for arch extension in > >> >> > target attribute. > >> >> > > >> >> > gcc/ChangeLog: > >> >> > PR target/102376 > >> >> > * config/aarch64/aarch64.c (aarch64_handle_attr_isa_flags): > >> >> > Change str's > >> >> > type to const char *. 
> >> >> > (aarch64_process_target_attr): Check if token is possibly an > >> >> > arch extension > >> >> > without leading '+' and emit diagnostic accordingly. > >> >> > > >> >> > gcc/testsuite/ChangeLog: > >> >> > PR target/102376 > >> >> > * gcc.target/aarch64/pr102376.c: New test. > >> >> > diff --git a/gcc/config/aarch64/aarch64.c > >> >> > b/gcc/config/aarch64/aarch64.c > >> >> > index a9a1800af53..b72079bc466 100644 > >> >> > --- a/gcc/config/aarch64/aarch64.c > >> >> > +++ b/gcc/config/aarch64/aarch64.c > >> >> > @@ -17548,7 +17548,7 @@ aarch64_handle_attr_tune (const char *str) > >> >> > modified. */ > >> >> > > >> >> > static bool > >> >> > -aarch64_handle_attr_isa_flags (char *str) > >> >> > +aarch64_handle_attr_isa_flags (const char *str) > >> >> > { > >> >> >enum aarch64_parse_opt_result parse_res; > >> >> >uint64_t isa_flags = aarch64_isa_flags; > >> >> > @@ -17821,7 +17821,13 @@ aarch64_process_target_attr (tree args) > >> >> >num_attrs++; > >> >> >if (!aarch64_process_one_target_attr (token)) > >> >> > { > >> >> > - error ("pragma or attribute % is not valid", > >> >> > token); > >> >> > + /* Check if token is possibly an arch extension without > >> >> > + leading '+'. */ > >> >> > + auto with_plus = std::string("+") + token; > >>
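[Editorial note: the user-visible effect of the patch under discussion, as illustrative declarations; 'sve' is simply the extension from the original report.]

/* Extension names in the aarch64 target attribute need a leading '+'.  */
void calculate_ok (void) __attribute__ ((__target__ ("+sve")));  /* accepted */

/* Without the '+', the patch replaces the generic "pragma or attribute
   'target("sve")' is not valid" error with the more helpful
   "arch extension 'sve' should be prepended with '+'" diagnostic.  */
void calculate_bad (void) __attribute__ ((__target__ ("sve")));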
[PATCH] middle-end/103181 - fix operation_could_trap_p for vector division
For integer vector division we only checked for all zero vector constants rather than checking whether any element in the constant vector is zero. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2021-11-11 Richard Biener PR middle-end/103181 * tree-eh.c (operation_could_trap_helper_p): Properly check vector constants for a zero element for integer division. Separate floating point and integer division code. * gcc.dg/torture/pr103181.c: New testcase. --- gcc/testsuite/gcc.dg/torture/pr103181.c | 24 +++ gcc/tree-eh.c | 26 - 2 files changed, 45 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/torture/pr103181.c diff --git a/gcc/testsuite/gcc.dg/torture/pr103181.c b/gcc/testsuite/gcc.dg/torture/pr103181.c new file mode 100644 index 000..6bc705ab52e --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr103181.c @@ -0,0 +1,24 @@ +/* { dg-do run } */ + +typedef unsigned char __attribute__((__vector_size__ (2))) U; +typedef unsigned short S; +typedef unsigned int __attribute__((__vector_size__ (64))) V; + +V v; +U a, b, c; + +U +foo (S s) +{ + v += __builtin_bswap16 (s) || (S) (a / ((U){3, 0})); + return b + c; +} + +int +main (void) +{ + U x = foo (4); + if (x[0] || x[1]) +__builtin_abort (); + return 0; +} diff --git a/gcc/tree-eh.c b/gcc/tree-eh.c index 3a09de95025..3eff07fc8fe 100644 --- a/gcc/tree-eh.c +++ b/gcc/tree-eh.c @@ -2454,15 +2454,31 @@ operation_could_trap_helper_p (enum tree_code op, case FLOOR_MOD_EXPR: case ROUND_MOD_EXPR: case TRUNC_MOD_EXPR: -case RDIV_EXPR: - if (honor_snans) - return true; - if (fp_operation) - return flag_trapping_math; if (!TREE_CONSTANT (divisor) || integer_zerop (divisor)) return true; + if (TREE_CODE (divisor) == VECTOR_CST) + { + /* Inspired by initializer_each_zero_or_onep. */ + unsigned HOST_WIDE_INT nelts = vector_cst_encoded_nelts (divisor); + if (VECTOR_CST_STEPPED_P (divisor) + && !TYPE_VECTOR_SUBPARTS (TREE_TYPE (divisor)) + .is_constant (&nelts)) + return true; + for (unsigned int i = 0; i < nelts; ++i) + { + tree elt = vector_cst_elt (divisor, i); + if (integer_zerop (elt)) + return true; + } + } return false; +case RDIV_EXPR: + if (honor_snans) + return true; + gcc_assert (fp_operation); + return flag_trapping_math; + case LT_EXPR: case LE_EXPR: case GT_EXPR: -- 2.31.1
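[Editorial note: a hand-written illustration of the bug being fixed, not part of the committed testcase. The old check only caught an all-zero constant divisor, so a divisor with a single zero lane was wrongly treated as non-trapping and could be speculated, as for the a / ((U){3, 0}) operand of the '||' in the testcase above.]

typedef unsigned char __attribute__((__vector_size__ (2))) U;

U
never_traps (U a)
{
  return a / (U){3, 3};   /* no zero element: the division cannot trap */
}

U
may_trap (U a)
{
  return a / (U){3, 0};   /* lane 1 divides by zero and may trap at run time */
}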
Re: [PATCH] vect: Remove vec_outside/inside_cost fields
> > > > > > I think the patch causes the following on x86_64-linux-gnu: > > > FAIL: gfortran.dg/inline_matmul_17.f90 -O scan-tree-dump-times > > > optimized "matmul_r4" 2 > > > > I get that failure even with d70ef65692f (from before the patches > > I committed today). > > Sorry, you are right, it's one revision before: > d70ef65692fced7ab72e0aceeff7407e5a34d96d > > Honza, can you please take a look? The test looks for matmul_r4 calls which we now optimize out in fre1. This is because alias info is better now afunc (&__var_5_mma); _188 = __var_5_mma.dim[0].ubound; _189 = __var_5_mma.dim[0].lbound; _190 = _188 - _189; _191 = _190 + 1; _192 = MAX_EXPR <_191, 0>; _193 = (real(kind=4)) _192; _194 = __var_5_mma.dim[1].ubound; _195 = __var_5_mma.dim[1].lbound; _196 = _194 - _195; _197 = _196 + 1; _198 = MAX_EXPR <_197, 0>; _199 = (real(kind=4)) _198; _200 = _193 * _199; _201 = _200 * 3.0e+0; if (_201 <= 1.0e+9) goto ; [INV] else goto ; [INV] : c = {}; afunc (&__var_5_mma); c = {}; Now afunc writes to __var_5_mma only indirectly so I think it is correct that we optimize the conditional out. Easy fix would be to add -fno-ipa-modref, but perhaps someone with better understanding of Fortran would help me to improve the testcase so the calls to matmul_r4 remains reachable? Honza
[PATCH] aarch64: Use type-qualified builtins for unsigned MLA/MLS intrinsics
Hi, This patch declares type-qualified builtins and uses them for MLA/MLS Neon intrinsics that operate on unsigned types. This eliminates lots of casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-11-08 Jonathan Wright * config/aarch64/aarch64-simd-builtins.def: Declare type- qualified builtin generators for unsigned MLA/MLS intrinsics. * config/aarch64/arm_neon.h (vmla_n_u16): Use type-qualified builtin. (vmla_n_u32): Likewise. (vmla_u8): Likewise. (vmla_u16): Likewise. (vmla_u32): Likewise. (vmlaq_n_u16): Likewise. (vmlaq_n_u32): Likewise. (vmlaq_u8): Likewise. (vmlaq_u16): Likewise. (vmlaq_u32): Likewise. (vmls_n_u16): Likewise. (vmls_n_u32): Likewise. (vmls_u8): Likewise. (vmls_u16): Likewise. (vmls_u32): Likewise. (vmlsq_n_u16): Likewise. (vmlsq_n_u32): Likewise. (vmlsq_u8): Likewise. (vmlsq_u16): Likewise. (vmlsq_u32): Likewise. rb15027.patch Description: rb15027.patch
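[Editorial note: the patch itself is attached rather than inline, so here is the general shape of the change, modelled on the vmla_u8 hunk quoted in the review later in the thread. The function names and the _uuuu builtin suffix are assumptions made for this sketch (the suffix is cut off in the quoted hunk); it is meant only to show the pattern, not the exact committed spelling.]

#include <arm_neon.h>

/* Before: the unsigned intrinsic had to cast through the signed builtin.  */
__extension__ extern __inline uint8x8_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vmla_u8_old (uint8x8_t __a, uint8x8_t __b, uint8x8_t __c)
{
  return (uint8x8_t) __builtin_aarch64_mlav8qi ((int8x8_t) __a,
                                                (int8x8_t) __b,
                                                (int8x8_t) __c);
}

/* After: a type-qualified builtin takes and returns unsigned vectors,
   so the casts disappear.  */
__extension__ extern __inline uint8x8_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vmla_u8_new (uint8x8_t __a, uint8x8_t __b, uint8x8_t __c)
{
  return __builtin_aarch64_mlav8qi_uuuu (__a, __b, __c);  /* suffix assumed */
}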
Re: [PATCH] vect: Remove vec_outside/inside_cost fields
On Thu, Nov 11, 2021 at 10:45 AM Jan Hubicka via Gcc-patches wrote: > > > > > > > > > I think the patch causes the following on x86_64-linux-gnu: > > > > FAIL: gfortran.dg/inline_matmul_17.f90 -O scan-tree-dump-times > > > > optimized "matmul_r4" 2 > > > > > > I get that failure even with d70ef65692f (from before the patches > > > I committed today). > > > > Sorry, you are right, it's one revision before: > > d70ef65692fced7ab72e0aceeff7407e5a34d96d > > > > Honza, can you please take a look? > The test looks for matmul_r4 calls which we now optimize out in fre1. > This is because alias info is better now > > afunc (&__var_5_mma); > _188 = __var_5_mma.dim[0].ubound; > _189 = __var_5_mma.dim[0].lbound; > _190 = _188 - _189; > _191 = _190 + 1; > _192 = MAX_EXPR <_191, 0>; > _193 = (real(kind=4)) _192; > _194 = __var_5_mma.dim[1].ubound; > _195 = __var_5_mma.dim[1].lbound; > _196 = _194 - _195; > _197 = _196 + 1; > _198 = MAX_EXPR <_197, 0>; > _199 = (real(kind=4)) _198; > _200 = _193 * _199; > _201 = _200 * 3.0e+0; > if (_201 <= 1.0e+9) > goto ; [INV] > else > goto ; [INV] >: > c = {}; > > > afunc (&__var_5_mma); > c = {}; > > Now afunc writes to __var_5_mma only indirectly so I think it is correct that > we optimize the conditional out. > > Easy fix would be to add -fno-ipa-modref, but perhaps someone with > better understanding of Fortran would help me to improve the testcase so > the calls to matmul_r4 remains reachable? I think the two matmul_r4 cases were missed optimizations before so just changing the expected number of calls to zero is the correct fix here. Indeed we can now statically determine the matrices are not large and so only keep the inline copy. Richard. > > Honza
[PATCH] aarch64: Use type-qualified builtins for PMUL[L] Neon intrinsics
Hi, This patch declares poly type-qualified builtins and uses them for PMUL[L] Neon intrinsics. This removes the need for casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-11-08 Jonathan Wright * config/aarch64/aarch64-simd-builtins.def: Use poly type qualifier in builtin generator macros. * config/aarch64/arm_neon.h (vmul_p8): Use type-qualified builtin and remove casts. (vmulq_p8): Likewise. (vmull_high_p8): Likewise. (vmull_p8): Likewise. rb15030.patch Description: rb15030.patch
[PATCH] aarch64: Use type-qualified builtins for XTN[2] Neon intrinsics
Hi, This patch declares unsigned type-qualified builtins and uses them for XTN[2] Neon intrinsics. This removes the need for casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-11-08 Jonathan Wright * config/aarch64/aarch64-simd-builtins.def: Declare unsigned type-qualified builtins for XTN[2]. * config/aarch64/arm_neon.h (vmovn_high_u16): Use type- qualified builtin and remove casts. (vmovn_high_u32): Likewise. (vmovn_high_u64): Likewise. (vmovn_u16): Likewise. (vmovn_u32): Likewise. (vmovn_u64): Likewise. rb15031.patch Description: rb15031.patch
Re: [PATCH] aarch64: Use type-qualified builtins for unsigned MLA/MLS intrinsics
Jonathan Wright writes: > Hi, > > This patch declares type-qualified builtins and uses them for MLA/MLS > Neon intrinsics that operate on unsigned types. This eliminates lots of > casts in arm_neon.h. > > Bootstrapped and regression tested on aarch64-none-linux-gnu - no > issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-11-08 Jonathan Wright > > * config/aarch64/aarch64-simd-builtins.def: Declare type- > qualified builtin generators for unsigned MLA/MLS intrinsics. > * config/aarch64/arm_neon.h (vmla_n_u16): Use type-qualified > builtin. > (vmla_n_u32): Likewise. > (vmla_u8): Likewise. > (vmla_u16): Likewise. > (vmla_u32): Likewise. > (vmlaq_n_u16): Likewise. > (vmlaq_n_u32): Likewise. > (vmlaq_u8): Likewise. > (vmlaq_u16): Likewise. > (vmlaq_u32): Likewise. > (vmls_n_u16): Likewise. > (vmls_n_u32): Likewise. > (vmls_u8): Likewise. > (vmls_u16): Likewise. > (vmls_u32): Likewise. > (vmlsq_n_u16): Likewise. > (vmlsq_n_u32): Likewise. > (vmlsq_u8): Likewise. > (vmlsq_u16): Likewise. > (vmlsq_u32): Likewise. OK, thanks. Richard > > diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def > b/gcc/config/aarch64/aarch64-simd-builtins.def > index > 4a7e2cf4125fe674dbb31c8f068b3b9970e9ea80..cdc44f0a22fd29715472e5b2dfe6a19ad0c729dd > 100644 > --- a/gcc/config/aarch64/aarch64-simd-builtins.def > +++ b/gcc/config/aarch64/aarch64-simd-builtins.def > @@ -238,13 +238,17 @@ > >/* Implemented by aarch64_mla. */ >BUILTIN_VDQ_BHSI (TERNOP, mla, 0, NONE) > + BUILTIN_VDQ_BHSI (TERNOPU, mla, 0, NONE) >/* Implemented by aarch64_mla_n. */ >BUILTIN_VDQHS (TERNOP, mla_n, 0, NONE) > + BUILTIN_VDQHS (TERNOPU, mla_n, 0, NONE) > >/* Implemented by aarch64_mls. */ >BUILTIN_VDQ_BHSI (TERNOP, mls, 0, NONE) > + BUILTIN_VDQ_BHSI (TERNOPU, mls, 0, NONE) >/* Implemented by aarch64_mls_n. */ >BUILTIN_VDQHS (TERNOP, mls_n, 0, NONE) > + BUILTIN_VDQHS (TERNOPU, mls_n, 0, NONE) > >/* Implemented by aarch64_shrn". 
*/ >BUILTIN_VQN (SHIFTIMM, shrn, 0, NONE) > diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h > index > 398a2e3a021fc488519acf6b54ff114805340e8a..de29b3b7da9a2ab16f6c5bdc832907df5deb7d61 > 100644 > --- a/gcc/config/aarch64/arm_neon.h > +++ b/gcc/config/aarch64/arm_neon.h > @@ -6608,18 +6608,14 @@ __extension__ extern __inline uint16x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vmla_n_u16 (uint16x4_t __a, uint16x4_t __b, uint16_t __c) > { > - return (uint16x4_t) __builtin_aarch64_mla_nv4hi ((int16x4_t) __a, > - (int16x4_t) __b, > - (int16_t) __c); > + return __builtin_aarch64_mla_nv4hi_ (__a, __b, __c); > } > > __extension__ extern __inline uint32x2_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vmla_n_u32 (uint32x2_t __a, uint32x2_t __b, uint32_t __c) > { > - return (uint32x2_t) __builtin_aarch64_mla_nv2si ((int32x2_t) __a, > - (int32x2_t) __b, > - (int32_t) __c); > + return __builtin_aarch64_mla_nv2si_ (__a, __b, __c); > } > > __extension__ extern __inline int8x8_t > @@ -6647,27 +6643,21 @@ __extension__ extern __inline uint8x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vmla_u8 (uint8x8_t __a, uint8x8_t __b, uint8x8_t __c) > { > - return (uint8x8_t) __builtin_aarch64_mlav8qi ((int8x8_t) __a, > -(int8x8_t) __b, > -(int8x8_t) __c); > + return __builtin_aarch64_mlav8qi_ (__a, __b, __c); > } > > __extension__ extern __inline uint16x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vmla_u16 (uint16x4_t __a, uint16x4_t __b, uint16x4_t __c) > { > - return (uint16x4_t) __builtin_aarch64_mlav4hi ((int16x4_t) __a, > - (int16x4_t) __b, > - (int16x4_t) __c); > + return __builtin_aarch64_mlav4hi_ (__a, __b, __c); > } > > __extension__ extern __inline uint32x2_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vmla_u32 (uint32x2_t __a, uint32x2_t __b, uint32x2_t __c) > { > - return (uint32x2_t) __builtin_aarch64_mlav2si ((int32x2_t) __a, > - (int32x2_t) __b, > - (int32x2_t) __c); > + return __builtin_aarch64_mlav2si_ (__a, __b, __c); > } > > __extension__ extern __inline int32x4_t > @@ -6955,18 +6945,14 @@ __extension__ extern __inline uint16x8_t > __attribute__
Re: [PATCH] aarch64: Use type-qualified builtins for PMUL[L] Neon intrinsics
Jonathan Wright writes: > Hi, > > This patch declares poly type-qualified builtins and uses them for > PMUL[L] Neon intrinsics. This removes the need for casts in arm_neon.h. > > Bootstrapped and regression tested on aarch64-none-linux-gnu - no > issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-11-08 Jonathan Wright > > * config/aarch64/aarch64-simd-builtins.def: Use poly type > qualifier in builtin generator macros. > * config/aarch64/arm_neon.h (vmul_p8): Use type-qualified > builtin and remove casts. > (vmulq_p8): Likewise. > (vmull_high_p8): Likewise. > (vmull_p8): Likewise. OK, thanks. Richard > > diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def > b/gcc/config/aarch64/aarch64-simd-builtins.def > index > cdc44f0a22fd29715472e5b2dfe6a19ad0c729dd..35e065fe938e6a6d488dc1b0f084f6ddf2d3618f > 100644 > --- a/gcc/config/aarch64/aarch64-simd-builtins.def > +++ b/gcc/config/aarch64/aarch64-simd-builtins.def > @@ -45,9 +45,9 @@ > >BUILTIN_VDC (COMBINE, combine, 0, AUTO_FP) >VAR1 (COMBINEP, combine, 0, NONE, di) > - BUILTIN_VB (BINOP, pmul, 0, NONE) > - VAR1 (BINOP, pmull, 0, NONE, v8qi) > - VAR1 (BINOP, pmull_hi, 0, NONE, v16qi) > + BUILTIN_VB (BINOPP, pmul, 0, NONE) > + VAR1 (BINOPP, pmull, 0, NONE, v8qi) > + VAR1 (BINOPP, pmull_hi, 0, NONE, v16qi) >BUILTIN_VHSDF_HSDF (BINOP, fmulx, 0, FP) >BUILTIN_VHSDF_DF (UNOP, sqrt, 2, FP) >BUILTIN_VDQ_I (BINOP, addp, 0, NONE) > diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h > index > de29b3b7da9a2ab16f6c5bdc832907df5deb7d61..b4a8ec3e328b138c0f368f60bf2534fb10126bd5 > 100644 > --- a/gcc/config/aarch64/arm_neon.h > +++ b/gcc/config/aarch64/arm_neon.h > @@ -1007,8 +1007,7 @@ __extension__ extern __inline poly8x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vmul_p8 (poly8x8_t __a, poly8x8_t __b) > { > - return (poly8x8_t) __builtin_aarch64_pmulv8qi ((int8x8_t) __a, > - (int8x8_t) __b); > + return __builtin_aarch64_pmulv8qi_ppp (__a, __b); > } > > __extension__ extern __inline int8x16_t > @@ -1071,8 +1070,7 @@ __extension__ extern __inline poly8x16_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vmulq_p8 (poly8x16_t __a, poly8x16_t __b) > { > - return (poly8x16_t) __builtin_aarch64_pmulv16qi ((int8x16_t) __a, > -(int8x16_t) __b); > + return __builtin_aarch64_pmulv16qi_ppp (__a, __b); > } > > __extension__ extern __inline int8x8_t > @@ -7716,8 +7714,7 @@ __extension__ extern __inline poly16x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vmull_high_p8 (poly8x16_t __a, poly8x16_t __b) > { > - return (poly16x8_t) __builtin_aarch64_pmull_hiv16qi ((int8x16_t) __a, > -(int8x16_t) __b); > + return __builtin_aarch64_pmull_hiv16qi_ppp (__a, __b); > } > > __extension__ extern __inline int16x8_t > @@ -7850,8 +7847,7 @@ __extension__ extern __inline poly16x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vmull_p8 (poly8x8_t __a, poly8x8_t __b) > { > - return (poly16x8_t) __builtin_aarch64_pmullv8qi ((int8x8_t) __a, > -(int8x8_t) __b); > + return __builtin_aarch64_pmullv8qi_ppp (__a, __b); > } > > __extension__ extern __inline int16x8_t
[PATCH] aarch64: Use type-qualified builtins for [R]SHRN[2] Neon intrinsics
Hi, This patch declares unsigned type-qualified builtins and uses them for [R]SHRN[2] Neon intrinsics. This removes the need for casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-11-08 Jonathan Wright * config/aarch64/aarch64-simd-builtins.def: Declare type- qualified builtins for [R]SHRN[2]. * config/aarch64/arm_neon.h (vshrn_n_u16): Use type-qualified builtin and remove casts. (vshrn_n_u32): Likewise. (vshrn_n_u64): Likewise. (vrshrn_high_n_u16): Likewise. (vrshrn_high_n_u32): Likewise. (vrshrn_high_n_u64): Likewise. (vrshrn_n_u16): Likewise. (vrshrn_n_u32): Likewise. (vrshrn_n_u64): Likewise. (vshrn_high_n_u16): Likewise. (vshrn_high_n_u32): Likewise. (vshrn_high_n_u64): Likewise. rb15032.patch Description: rb15032.patch
Re: [PATCH] aarch64: Use type-qualified builtins for XTN[2] Neon intrinsics
Jonathan Wright writes: > Hi, > > This patch declares unsigned type-qualified builtins and uses them for > XTN[2] Neon intrinsics. This removes the need for casts in arm_neon.h. > > Bootstrapped and regression tested on aarch64-none-linux-gnu - no > issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-11-08 Jonathan Wright > > * config/aarch64/aarch64-simd-builtins.def: Declare unsigned > type-qualified builtins for XTN[2]. > * config/aarch64/arm_neon.h (vmovn_high_u16): Use type- > qualified builtin and remove casts. > (vmovn_high_u32): Likewise. > (vmovn_high_u64): Likewise. > (vmovn_u16): Likewise. > (vmovn_u32): Likewise. > (vmovn_u64): Likewise. OK, thanks. Richard > > diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def > b/gcc/config/aarch64/aarch64-simd-builtins.def > index > 35e065fe938e6a6d488dc1b0f084f6ddf2d3618f..5e6df6abe3f5b42710a266d0b2a7a1e4597975a6 > 100644 > --- a/gcc/config/aarch64/aarch64-simd-builtins.def > +++ b/gcc/config/aarch64/aarch64-simd-builtins.def > @@ -235,6 +235,7 @@ > >/* Implemented by aarch64_xtn. */ >BUILTIN_VQN (UNOP, xtn, 0, NONE) > + BUILTIN_VQN (UNOPU, xtn, 0, NONE) > >/* Implemented by aarch64_mla. */ >BUILTIN_VDQ_BHSI (TERNOP, mla, 0, NONE) > @@ -489,7 +490,8 @@ >BUILTIN_VSDQ_I (USHIFTIMM, uqshl_n, 0, NONE) > >/* Implemented by aarch64_xtn2. */ > - BUILTIN_VQN (UNOP, xtn2, 0, NONE) > + BUILTIN_VQN (BINOP, xtn2, 0, NONE) > + BUILTIN_VQN (BINOPU, xtn2, 0, NONE) > >/* Implemented by vec_unpack_hi_. */ >BUILTIN_VQW (UNOP, vec_unpacks_hi_, 10, NONE) > diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h > index > b4a8ec3e328b138c0f368f60bf2534fb10126bd5..51cedab19d8d1c261fbcf9a6d3202c2e1b513183 > 100644 > --- a/gcc/config/aarch64/arm_neon.h > +++ b/gcc/config/aarch64/arm_neon.h > @@ -7522,24 +7522,21 @@ __extension__ extern __inline uint8x16_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vmovn_high_u16 (uint8x8_t __a, uint16x8_t __b) > { > - return (uint8x16_t) > -__builtin_aarch64_xtn2v8hi ((int8x8_t) __a, (int16x8_t) __b); > + return __builtin_aarch64_xtn2v8hi_uuu (__a, __b); > } > > __extension__ extern __inline uint16x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vmovn_high_u32 (uint16x4_t __a, uint32x4_t __b) > { > - return (uint16x8_t) > -__builtin_aarch64_xtn2v4si ((int16x4_t) __a, (int32x4_t) __b); > + return __builtin_aarch64_xtn2v4si_uuu (__a, __b); > } > > __extension__ extern __inline uint32x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vmovn_high_u64 (uint32x2_t __a, uint64x2_t __b) > { > - return (uint32x4_t) > -__builtin_aarch64_xtn2v2di ((int32x2_t) __a, (int64x2_t) __b); > + return __builtin_aarch64_xtn2v2di_uuu (__a, __b); > } > > __extension__ extern __inline int8x8_t > @@ -7567,21 +7564,21 @@ __extension__ extern __inline uint8x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vmovn_u16 (uint16x8_t __a) > { > - return (uint8x8_t)__builtin_aarch64_xtnv8hi ((int16x8_t) __a); > + return __builtin_aarch64_xtnv8hi_uu (__a); > } > > __extension__ extern __inline uint16x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vmovn_u32 (uint32x4_t __a) > { > - return (uint16x4_t) __builtin_aarch64_xtnv4si ((int32x4_t )__a); > + return __builtin_aarch64_xtnv4si_uu (__a); > } > > __extension__ extern __inline uint32x2_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vmovn_u64 (uint64x2_t __a) > { > - return (uint32x2_t) 
__builtin_aarch64_xtnv2di ((int64x2_t) __a); > + return __builtin_aarch64_xtnv2di_uu (__a); > } > > __extension__ extern __inline int8x8_t
[PATCH] aarch64: Use type-qualified builtins for UADD[LW][2] Neon intrinsics
Hi, This patch declares unsigned type-qualified builtins and uses them to implement widening-add Neon intrinsics. This removes the need for many casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-11-09 Jonathan Wright * config/aarch64/aarch64-simd-builtins.def: Use BINOPU type qualifiers in generator macros for uadd[lw][2] builtins. * config/aarch64/arm_neon.h (vaddl_s8): Remove unnecessary cast. (vaddl_s16): Likewise. (vaddl_s32): Likewise. (vaddl_u8): Use type-qualified builtin and remove casts. (vaddl_u16): Likewise. (vaddl_u32): Likewise. (vaddl_high_s8): Remove unnecessary cast. (vaddl_high_s16): Likewise. (vaddl_high_s32): Likewise. (vaddl_high_u8): Use type-qualified builtin and remove casts. (vaddl_high_u16): Likewise. (vaddl_high_u32): Likewise. (vaddw_s8): Remove unnecessary cast. (vaddw_s16): Likewise. (vaddw_s32): Likewise. (vaddw_u8): Use type-qualified builtin and remove casts. (vaddw_u16): Likewise. (vaddw_u32): Likewise. (vaddw_high_s8): Remove unnecessary cast. (vaddw_high_s16): Likewise. (vaddw_high_s32): Likewise. (vaddw_high_u8): Use type-qualified builtin and remove casts. (vaddw_high_u16): Likewise. (vaddw_high_u32): Likewise. rb15033.patch Description: rb15033.patch
[PATCH] aarch64: Use type-qualified builtins for USUB[LW][2] Neon intrinsics
Hi, This patch declares unsigned type-qualified builtins and uses them to implement widening-subtract Neon intrinsics. This removes the need for many casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-11-09 Jonathan Wright * config/aarch64/aarch64-simd-builtins.def: Use BINOPU type qualifiers in generator macros for usub[lw][2] builtins. * config/aarch64/arm_neon.h (vsubl_s8): Remove unnecessary cast. (vsubl_s16): Likewise. (vsubl_s32): Likewise. (vsubl_u8): Use type-qualified builtin and remove casts. (vsubl_u16): Likewise. (vsubl_u32): Likewise. (vsubl_high_s8): Remove unnecessary cast. (vsubl_high_s16): Likewise. (vsubl_high_s32): Likewise. (vsubl_high_u8): Use type-qualified builtin and remove casts. (vsubl_high_u16): Likewise. (vsubl_high_u32): Likewise. (vsubw_s8): Remove unnecessary casts. (vsubw_s16): Likewise. (vsubw_s32): Likewise. (vsubw_u8): Use type-qualified builtin and remove casts. (vsubw_u16): Likewise. (vsubw_u32): Likewise. (vsubw_high_s8): Remove unnecessary cast. (vsubw_high_s16): Likewise. (vsubw_high_s32): Likewise. (vsubw_high_u8): Use type-qualified builtin and remove casts. (vsubw_high_u16): Likewise. (vsubw_high_u32): Likewise. rb15034.patch Description: rb15034.patch
Re: [PATCH] aarch64: Use type-qualified builtins for [R]SHRN[2] Neon intrinsics
Jonathan Wright writes: > Hi, > > Thus patch declares unsigned type-qualified builtins and uses them for > [R]SHRN[2] Neon intrinsics. This removes the need for casts in > arm_neon.h. > > Bootstrapped and regression tested on aarch64-none-linux-gnu - no > issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-11-08 Jonathan Wright > > * config/aarch64/aarch64-simd-builtins.def: Declare type- > qualified builtins for [R]SHRN[2]. > * config/aarch64/arm_neon.h (vshrn_n_u16): Use type-qualified > builtin and remove casts. > (vshrn_n_u32): Likewise. > (vshrn_n_u64): Likewise. > (vrshrn_high_n_u16): Likewise. > (vrshrn_high_n_u32): Likewise. > (vrshrn_high_n_u64): Likewise. > (vrshrn_n_u16): Likewise. > (vrshrn_n_u32): Likewise. > (vrshrn_n_u64): Likewise. > (vshrn_high_n_u16): Likewise. > (vshrn_high_n_u32): Likewise. > (vshrn_high_n_u64): Likewise. OK, thanks. Richard > > diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def > b/gcc/config/aarch64/aarch64-simd-builtins.def > index > 5e6df6abe3f5b42710a266d0b2a7a1e4597975a6..46ec2f9bfc509e5e460334d4c5324ddf18703639 > 100644 > --- a/gcc/config/aarch64/aarch64-simd-builtins.def > +++ b/gcc/config/aarch64/aarch64-simd-builtins.def > @@ -253,15 +253,19 @@ > >/* Implemented by aarch64_shrn". */ >BUILTIN_VQN (SHIFTIMM, shrn, 0, NONE) > + BUILTIN_VQN (USHIFTIMM, shrn, 0, NONE) > >/* Implemented by aarch64_shrn2. */ > - BUILTIN_VQN (SHIFTACC, shrn2, 0, NONE) > + BUILTIN_VQN (SHIFT2IMM, shrn2, 0, NONE) > + BUILTIN_VQN (USHIFT2IMM, shrn2, 0, NONE) > >/* Implemented by aarch64_rshrn". */ >BUILTIN_VQN (SHIFTIMM, rshrn, 0, NONE) > + BUILTIN_VQN (USHIFTIMM, rshrn, 0, NONE) > >/* Implemented by aarch64_rshrn2. */ > - BUILTIN_VQN (SHIFTACC, rshrn2, 0, NONE) > + BUILTIN_VQN (SHIFT2IMM, rshrn2, 0, NONE) > + BUILTIN_VQN (USHIFT2IMM, rshrn2, 0, NONE) > >/* Implemented by aarch64_mlsl. 
*/ >BUILTIN_VD_BHSI (TERNOP, smlsl, 0, NONE) > diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h > index > 51cedab19d8d1c261fbcf9a6d3202c2e1b513183..37f02e2a24fbc85f23ea73e2fd0e06deac7db87e > 100644 > --- a/gcc/config/aarch64/arm_neon.h > +++ b/gcc/config/aarch64/arm_neon.h > @@ -7606,21 +7606,21 @@ __extension__ extern __inline uint8x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vshrn_n_u16 (uint16x8_t __a, const int __b) > { > - return (uint8x8_t)__builtin_aarch64_shrnv8hi ((int16x8_t)__a, __b); > + return __builtin_aarch64_shrnv8hi_uus (__a, __b); > } > > __extension__ extern __inline uint16x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vshrn_n_u32 (uint32x4_t __a, const int __b) > { > - return (uint16x4_t)__builtin_aarch64_shrnv4si ((int32x4_t)__a, __b); > + return __builtin_aarch64_shrnv4si_uus (__a, __b); > } > > __extension__ extern __inline uint32x2_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vshrn_n_u64 (uint64x2_t __a, const int __b) > { > - return (uint32x2_t)__builtin_aarch64_shrnv2di ((int64x2_t)__a, __b); > + return __builtin_aarch64_shrnv2di_uus (__a, __b); > } > > __extension__ extern __inline int32x4_t > @@ -8387,24 +8387,21 @@ __extension__ extern __inline uint8x16_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vrshrn_high_n_u16 (uint8x8_t __a, uint16x8_t __b, const int __c) > { > - return (uint8x16_t) __builtin_aarch64_rshrn2v8hi ((int8x8_t) __a, > - (int16x8_t) __b, __c); > + return __builtin_aarch64_rshrn2v8hi_uuus (__a, __b, __c); > } > > __extension__ extern __inline uint16x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vrshrn_high_n_u32 (uint16x4_t __a, uint32x4_t __b, const int __c) > { > - return (uint16x8_t) __builtin_aarch64_rshrn2v4si ((int16x4_t) __a, > - (int32x4_t) __b, __c); > + return __builtin_aarch64_rshrn2v4si_uuus (__a, __b, __c); > } > > __extension__ extern __inline uint32x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vrshrn_high_n_u64 (uint32x2_t __a, uint64x2_t __b, const int __c) > { > - return (uint32x4_t) __builtin_aarch64_rshrn2v2di ((int32x2_t)__a, > - (int64x2_t)__b, __c); > + return __builtin_aarch64_rshrn2v2di_uuus (__a, __b, __c); > } > > __extension__ extern __inline int8x8_t > @@ -8432,21 +8429,21 @@ __extension__ extern __inline uint8x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vrshrn_n_u16 (uint16x8_t __a, const int __b) > { > - return (uint8x8_t) __builtin_aarch64_rshrnv8hi ((int16x8_t) __a, __b); > + return __builtin_aarch64_rshrnv8hi_uus (__a, __b); > } > > __extension__ extern __inline uint16x
[PATCH] aarch64: Use type-qualified builtins for U[R]HADD Neon intrinsics
Hi, This patch declares unsigned type-qualified builtins and uses them to implement (rounding) halving-add Neon intrinsics. This removes the need for many casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-11-09 Jonathan Wright * config/aarch64/aarch64-simd-builtins.def: Use BINOPU type qualifiers in generator macros for u[r]hadd builtins. * config/aarch64/arm_neon.h (vhadd_s8): Remove unnecessary cast. (vhadd_s16): Likewise. (vhadd_s32): Likewise. (vhadd_u8): Use type-qualified builtin and remove casts. (vhadd_u16): Likewise. (vhadd_u32): Likewise. (vhaddq_s8): Remove unnecessary cast. (vhaddq_s16): Likewise. (vhaddq_s32): Likewise. (vhaddq_u8): Use type-qualified builtin and remove casts. (vhaddq_u16): Likewise. (vhaddq_u32): Likewise. (vrhadd_s8): Remove unnecessary cast. (vrhadd_s16): Likewise. (vrhadd_s32): Likewise. (vrhadd_u8): Use type-qualified builtin and remove casts. (vrhadd_u16): Likewise. (vrhadd_u32): Likewise. (vrhaddq_s8): Remove unnecessary cast. (vrhaddq_s16): Likewise. (vrhaddq_s32): Likewise. (vrhaddq_u8): Use type-qualified builtin and remove casts. (vrhaddq_u16): Likewise. (vrhaddq_u32): Likewise. rb15035.patch Description: rb15035.patch
[PATCH] aarch64: Use type-qualified builtins for UHSUB Neon intrinsics
Hi, This patch declares unsigned type-qualified builtins and uses them to implement halving-subtract Neon intrinsics. This removes the need for many casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-11-09 Jonathan Wright * config/aarch64/aarch64-simd-builtins.def: Use BINOPU type qualifiers in generator macros for uhsub builtins. * config/aarch64/arm_neon.h (vhsub_s8): Remove unnecessary cast. (vhsub_s16): Likewise. (vhsub_s32): Likewise. (vhsub_u8): Use type-qualified builtin and remove casts. (vhsub_u16): Likewise. (vhsub_u32): Likewise. (vhsubq_s8): Remove unnecessary cast. (vhsubq_s16): Likewise. (vhsubq_s32): Likewise. (vhsubq_u8): Use type-qualified builtin and remove casts. (vhsubq_u16): Likewise. (vhsubq_u32): Likewise. rb15036.patch Description: rb15036.patch
[PATCH] aarch64: Use type-qualified builtins for [R]ADDHN[2] Neon intrinsics
Hi, This patch declares unsigned type-qualified builtins and uses them to implement (rounding) halving-narrowing-add Neon intrinsics. This removes the need for many casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-11-09 Jonathan Wright * config/aarch64/aarch64-simd-builtins.def: Declare unsigned builtins for [r]addhn[2]. * config/aarch64/arm_neon.h (vaddhn_s16): Remove unnecessary cast. (vaddhn_s32): Likewise. (vaddhn_s64): Likewise. (vaddhn_u16): Use type-qualified builtin and remove casts. (vaddhn_u32): Likewise. (vaddhn_u64): Likewise. (vraddhn_s16): Remove unnecessary cast. (vraddhn_s32): Likewise. (vraddhn_s64): Likewise. (vraddhn_u16): Use type-qualified builtin and remove casts. (vraddhn_u32): Likewise. (vraddhn_u64): Likewise. (vaddhn_high_s16): Remove unnecessary cast. (vaddhn_high_s32): Likewise. (vaddhn_high_s64): Likewise. (vaddhn_high_u16): Use type-qualified builtin and remove casts. (vaddhn_high_u32): Likewise. (vaddhn_high_u64): Likewise. (vraddhn_high_s16): Remove unnecessary cast. (vraddhn_high_s32): Likewise. (vraddhn_high_s64): Likewise. (vraddhn_high_u16): Use type-qualified builtin and remove casts. (vraddhn_high_u32): Likewise. (vraddhn_high_u64): Likewise. rb15037.patch Description: rb15037.patch
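For reference, the intrinsics touched here add and then narrow by keeping the high half of each element-wise sum; a minimal use of one of them (illustrative only, not part of the patch):

#include <arm_neon.h>

/* Each 16-bit lane of __a + __b is narrowed to its high 8 bits.  */
uint8x8_t
add_high_narrow (uint16x8_t __a, uint16x8_t __b)
{
  return vaddhn_u16 (__a, __b);
}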
Re: [PATCH] aarch64: Use type-qualified builtins for USUB[LW][2] Neon intrinsics
Jonathan Wright writes: > Hi, > > This patch declares unsigned type-qualified builtins and uses them to > implement widening-subtract Neon intrinsics. This removes the need > for many casts in arm_neon.h. > > Bootstrapped and regression tested on aarch64-none-linux-gnu - no > issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-11-09 Jonathan Wright > > * config/aarch64/aarch64-simd-builtins.def: Use BINOPU type > qualifiers in generator macros for usub[lw][2] builtins. > * config/aarch64/arm_neon.h (vsubl_s8): Remove unnecessary > cast. > (vsubl_s16): Likewise. > (vsubl_s32): Likewise. > (vsubl_u8): Use type-qualified builtin and remove casts. > (vsubl_u16): Likewise. > (vsubl_u32): Likewise. > (vsubl_high_s8): Remove unnecessary cast. > (vsubl_high_s16): Likewise. > (vsubl_high_s32): Likewise. > (vsubl_high_u8): Use type-qualified builtin and remove casts. > (vsubl_high_u16): Likewise. > (vsubl_high_u32): Likewise. > (vsubw_s8): Remove unnecessary casts. > (vsubw_s16): Likewise. > (vsubw_s32): Likewise. > (vsubw_u8): Use type-qualified builtin and remove casts. > (vsubw_u16): Likewise. > (vsubw_u32): Likewise. > (vsubw_high_s8): Remove unnecessary cast. > (vsubw_high_s16): Likewise. > (vsubw_high_s32): Likewise. > (vsubw_high_u8): Use type-qualified builtin and remove casts. > (vsubw_high_u16): Likewise. > (vsubw_high_u32): Likewise. OK, thanks. Richard > > diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def > b/gcc/config/aarch64/aarch64-simd-builtins.def > index > ccd194978f948201698aec16d74baa82c187cad4..be06a80cea379b8b78c798dbec47fb95eec68db1 > 100644 > --- a/gcc/config/aarch64/aarch64-simd-builtins.def > +++ b/gcc/config/aarch64/aarch64-simd-builtins.def > @@ -160,21 +160,21 @@ >BUILTIN_VQW (BINOP, saddl2, 0, NONE) >BUILTIN_VQW (BINOPU, uaddl2, 0, NONE) >BUILTIN_VQW (BINOP, ssubl2, 0, NONE) > - BUILTIN_VQW (BINOP, usubl2, 0, NONE) > + BUILTIN_VQW (BINOPU, usubl2, 0, NONE) >BUILTIN_VQW (BINOP, saddw2, 0, NONE) >BUILTIN_VQW (BINOPU, uaddw2, 0, NONE) >BUILTIN_VQW (BINOP, ssubw2, 0, NONE) > - BUILTIN_VQW (BINOP, usubw2, 0, NONE) > + BUILTIN_VQW (BINOPU, usubw2, 0, NONE) >/* Implemented by aarch64_l. */ >BUILTIN_VD_BHSI (BINOP, saddl, 0, NONE) >BUILTIN_VD_BHSI (BINOPU, uaddl, 0, NONE) >BUILTIN_VD_BHSI (BINOP, ssubl, 0, NONE) > - BUILTIN_VD_BHSI (BINOP, usubl, 0, NONE) > + BUILTIN_VD_BHSI (BINOPU, usubl, 0, NONE) >/* Implemented by aarch64_w. */ >BUILTIN_VD_BHSI (BINOP, saddw, 0, NONE) >BUILTIN_VD_BHSI (BINOPU, uaddw, 0, NONE) >BUILTIN_VD_BHSI (BINOP, ssubw, 0, NONE) > - BUILTIN_VD_BHSI (BINOP, usubw, 0, NONE) > + BUILTIN_VD_BHSI (BINOPU, usubw, 0, NONE) >/* Implemented by aarch64_h. 
*/ >BUILTIN_VDQ_BHSI (BINOP, shadd, 0, NONE) >BUILTIN_VDQ_BHSI (BINOP, shsub, 0, NONE) > diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h > index > a3d742f25a896f8e736a5fb01535d372cd4b20db..58b3dddb2c4ebf856de0e9cf0399e42d322beff9 > 100644 > --- a/gcc/config/aarch64/arm_neon.h > +++ b/gcc/config/aarch64/arm_neon.h > @@ -1765,180 +1765,168 @@ __extension__ extern __inline int16x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vsubl_s8 (int8x8_t __a, int8x8_t __b) > { > - return (int16x8_t) __builtin_aarch64_ssublv8qi (__a, __b); > + return __builtin_aarch64_ssublv8qi (__a, __b); > } > > __extension__ extern __inline int32x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vsubl_s16 (int16x4_t __a, int16x4_t __b) > { > - return (int32x4_t) __builtin_aarch64_ssublv4hi (__a, __b); > + return __builtin_aarch64_ssublv4hi (__a, __b); > } > > __extension__ extern __inline int64x2_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vsubl_s32 (int32x2_t __a, int32x2_t __b) > { > - return (int64x2_t) __builtin_aarch64_ssublv2si (__a, __b); > + return __builtin_aarch64_ssublv2si (__a, __b); > } > > __extension__ extern __inline uint16x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vsubl_u8 (uint8x8_t __a, uint8x8_t __b) > { > - return (uint16x8_t) __builtin_aarch64_usublv8qi ((int8x8_t) __a, > -(int8x8_t) __b); > + return __builtin_aarch64_usublv8qi_uuu (__a, __b); > } > > __extension__ extern __inline uint32x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vsubl_u16 (uint16x4_t __a, uint16x4_t __b) > { > - return (uint32x4_t) __builtin_aarch64_usublv4hi ((int16x4_t) __a, > -(int16x4_t) __b); > + return __builtin_aarch64_usublv4hi_uuu (__a, __b); > } > > __extension__ extern __inline uint64x2_t > __attrib
Re: [PATCH] aarch64: Use type-qualified builtins for U[R]HADD Neon intrinsics
Jonathan Wright writes: > Hi, > > This patch declares unsigned type-qualified builtins and uses them to > implement (rounding) halving-add Neon intrinsics. This removes the > need for many casts in arm_neon.h. > > Bootstrapped and regression tested on aarch64-none-linux-gnu - no > issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-11-09 Jonathan Wright > > * config/aarch64/aarch64-simd-builtins.def: Use BINOPU type > qualifiers in generator macros for u[r]hadd builtins. > * config/aarch64/arm_neon.h (vhadd_s8): Remove unnecessary > cast. > (vhadd_s16): Likewise. > (vhadd_s32): Likewise. > (vhadd_u8): Use type-qualified builtin and remove casts. > (vhadd_u16): Likewise. > (vhadd_u32): Likewise. > (vhaddq_s8): Remove unnecessary cast. > (vhaddq_s16): Likewise. > (vhaddq_s32): Likewise. > (vhaddq_u8): Use type-qualified builtin and remove casts. > (vhaddq_u16): Likewise. > (vhaddq_u32): Likewise. > (vrhadd_s8): Remove unnecessary cast. > (vrhadd_s16): Likewise. > (vrhadd_s32): Likewise. > (vrhadd_u8): Use type-qualified builtin and remove casts. > (vrhadd_u16): Likewise. > (vrhadd_u32): Likewise. > (vrhaddq_s8): Remove unnecessary cast. > (vrhaddq_s16): Likewise. > (vrhaddq_s32): Likewise. > (vrhaddq_u8): Use type-wualified builtin and remove casts. > (vrhaddq_u16): Likewise. > (vrhaddq_u32): Likewise. OK, thanks. Richard > > diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def > b/gcc/config/aarch64/aarch64-simd-builtins.def > index > be06a80cea379b8b78c798dbec47fb95eec68db1..8f9a8d1707dfdf6111d740da53275e79500e8cde > 100644 > --- a/gcc/config/aarch64/aarch64-simd-builtins.def > +++ b/gcc/config/aarch64/aarch64-simd-builtins.def > @@ -178,10 +178,10 @@ >/* Implemented by aarch64_h. */ >BUILTIN_VDQ_BHSI (BINOP, shadd, 0, NONE) >BUILTIN_VDQ_BHSI (BINOP, shsub, 0, NONE) > - BUILTIN_VDQ_BHSI (BINOP, uhadd, 0, NONE) > + BUILTIN_VDQ_BHSI (BINOPU, uhadd, 0, NONE) >BUILTIN_VDQ_BHSI (BINOP, uhsub, 0, NONE) >BUILTIN_VDQ_BHSI (BINOP, srhadd, 0, NONE) > - BUILTIN_VDQ_BHSI (BINOP, urhadd, 0, NONE) > + BUILTIN_VDQ_BHSI (BINOPU, urhadd, 0, NONE) > >/* Implemented by aarch64_addlp. 
*/ >BUILTIN_VDQV_L (UNOP, saddlp, 0, NONE) > diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h > index > 58b3dddb2c4ebf856de0e9cf0399e42d322beff9..73eea7c261f49155d616a2ddf1d96d4be9bca53f > 100644 > --- a/gcc/config/aarch64/arm_neon.h > +++ b/gcc/config/aarch64/arm_neon.h > @@ -545,180 +545,168 @@ __extension__ extern __inline int8x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vhadd_s8 (int8x8_t __a, int8x8_t __b) > { > - return (int8x8_t) __builtin_aarch64_shaddv8qi (__a, __b); > + return __builtin_aarch64_shaddv8qi (__a, __b); > } > > __extension__ extern __inline int16x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vhadd_s16 (int16x4_t __a, int16x4_t __b) > { > - return (int16x4_t) __builtin_aarch64_shaddv4hi (__a, __b); > + return __builtin_aarch64_shaddv4hi (__a, __b); > } > > __extension__ extern __inline int32x2_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vhadd_s32 (int32x2_t __a, int32x2_t __b) > { > - return (int32x2_t) __builtin_aarch64_shaddv2si (__a, __b); > + return __builtin_aarch64_shaddv2si (__a, __b); > } > > __extension__ extern __inline uint8x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vhadd_u8 (uint8x8_t __a, uint8x8_t __b) > { > - return (uint8x8_t) __builtin_aarch64_uhaddv8qi ((int8x8_t) __a, > - (int8x8_t) __b); > + return __builtin_aarch64_uhaddv8qi_uuu (__a, __b); > } > > __extension__ extern __inline uint16x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vhadd_u16 (uint16x4_t __a, uint16x4_t __b) > { > - return (uint16x4_t) __builtin_aarch64_uhaddv4hi ((int16x4_t) __a, > -(int16x4_t) __b); > + return __builtin_aarch64_uhaddv4hi_uuu (__a, __b); > } > > __extension__ extern __inline uint32x2_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vhadd_u32 (uint32x2_t __a, uint32x2_t __b) > { > - return (uint32x2_t) __builtin_aarch64_uhaddv2si ((int32x2_t) __a, > -(int32x2_t) __b); > + return __builtin_aarch64_uhaddv2si_uuu (__a, __b); > } > > __extension__ extern __inline int8x16_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vhaddq_s8 (int8x16_t __a, int8x16_t __b) > { > - return (int8x16_t) __builtin_aarch64_shaddv16qi (__a, __b); > + return __builtin_aarch64_shaddv16qi (__a, __b); > } > > __extension__ ex
[PATCH] aarch64: Use type-qualified builtins for [R]SUBHN[2] Neon intrinsics
Hi, This patch declares unsigned type-qualified builtins and uses them to implement (rounding) halving-narrowing-subtract Neon intrinsics. This removes the need for many casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-11-09 Jonathan Wright * config/aarch64/aarch64-simd-builtins.def: Declare unsigned builtins for [r]subhn[2]. * config/aarch64/arm_neon.h (vsubhn_s16): Remove unnecessary cast. (vsubhn_s32): Likewise. (vsubhn_s64): Likewise. (vsubhn_u16): Use type-qualified builtin and remove casts. (vsubhn_u32): Likewise. (vsubhn_u64): Likewise. (vrsubhn_s16): Remove unnecessary cast. (vrsubhn_s32): Likewise. (vrsubhn_s64): Likewise. (vrsubhn_u16): Use type-qualified builtin and remove casts. (vrsubhn_u32): Likewise. (vrsubhn_u64): Likewise. (vrsubhn_high_s16): Remove unnecessary cast. (vrsubhn_high_s32): Likewise. (vrsubhn_high_s64): Likewise. (vrsubhn_high_u16): Use type-qualified builtin and remove casts. (vrsubhn_high_u32): Likewise. (vrsubhn_high_u64): Likewise. (vsubhn_high_s16): Remove unnecessary cast. (vsubhn_high_s32): Likewise. (vsubhn_high_s64): Likewise. (vsubhn_high_u16): Use type-qualified builtin and remove casts. (vsubhn_high_u32): Likewise. (vsubhn_high_u64): Likewise. rb15038.patch Description: rb15038.patch
Re: [PATCH] aarch64: Use type-qualified builtins for UHSUB Neon intrinsics
Jonathan Wright writes: > Hi, > > This patch declares unsigned type-qualified builtins and uses them to > implement halving-subtract Neon intrinsics. This removes the need for > many casts in arm_neon.h. > > Bootstrapped and regression tested on aarch64-none-linux-gnu - no > issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-11-09 Jonathan Wright > > * config/aarch64/aarch64-simd-builtins.def: Use BINOPU type > qualifiers in generator macros for uhsub builtins. > * config/aarch64/arm_neon.h (vhsub_s8): Remove unnecessary > cast. > (vhsub_s16): Likewise. > (vhsub_s32): Likewise. > (vhsub_u8): Use type-qualified builtin and remove casts. > (vhsub_u16): Likewise. > (vhsub_u32): Likewise. > (vhsubq_s8): Remove unnecessary cast. > (vhsubq_s16): Likewise. > (vhsubq_s32): Likewise. > (vhsubq_u8): Use type-qualified builtin and remove casts. > (vhsubq_u16): Likewise. > (vhsubq_u32): Likewise. OK, thanks. Richard > diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def > b/gcc/config/aarch64/aarch64-simd-builtins.def > index > 8f9a8d1707dfdf6111d740da53275e79500e8cde..af04b732227439dcaaa2f3751097050d988eb729 > 100644 > --- a/gcc/config/aarch64/aarch64-simd-builtins.def > +++ b/gcc/config/aarch64/aarch64-simd-builtins.def > @@ -179,7 +179,7 @@ >BUILTIN_VDQ_BHSI (BINOP, shadd, 0, NONE) >BUILTIN_VDQ_BHSI (BINOP, shsub, 0, NONE) >BUILTIN_VDQ_BHSI (BINOPU, uhadd, 0, NONE) > - BUILTIN_VDQ_BHSI (BINOP, uhsub, 0, NONE) > + BUILTIN_VDQ_BHSI (BINOPU, uhsub, 0, NONE) >BUILTIN_VDQ_BHSI (BINOP, srhadd, 0, NONE) >BUILTIN_VDQ_BHSI (BINOPU, urhadd, 0, NONE) > > diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h > index > 73eea7c261f49155d616a2ddf1d96d4be9bca53f..b2781f680d142b848f622d2f4965b42985885502 > 100644 > --- a/gcc/config/aarch64/arm_neon.h > +++ b/gcc/config/aarch64/arm_neon.h > @@ -1956,90 +1956,84 @@ __extension__ extern __inline int8x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vhsub_s8 (int8x8_t __a, int8x8_t __b) > { > - return (int8x8_t)__builtin_aarch64_shsubv8qi (__a, __b); > + return __builtin_aarch64_shsubv8qi (__a, __b); > } > > __extension__ extern __inline int16x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vhsub_s16 (int16x4_t __a, int16x4_t __b) > { > - return (int16x4_t) __builtin_aarch64_shsubv4hi (__a, __b); > + return __builtin_aarch64_shsubv4hi (__a, __b); > } > > __extension__ extern __inline int32x2_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vhsub_s32 (int32x2_t __a, int32x2_t __b) > { > - return (int32x2_t) __builtin_aarch64_shsubv2si (__a, __b); > + return __builtin_aarch64_shsubv2si (__a, __b); > } > > __extension__ extern __inline uint8x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vhsub_u8 (uint8x8_t __a, uint8x8_t __b) > { > - return (uint8x8_t) __builtin_aarch64_uhsubv8qi ((int8x8_t) __a, > - (int8x8_t) __b); > + return __builtin_aarch64_uhsubv8qi_uuu (__a, __b); > } > > __extension__ extern __inline uint16x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vhsub_u16 (uint16x4_t __a, uint16x4_t __b) > { > - return (uint16x4_t) __builtin_aarch64_uhsubv4hi ((int16x4_t) __a, > -(int16x4_t) __b); > + return __builtin_aarch64_uhsubv4hi_uuu (__a, __b); > } > > __extension__ extern __inline uint32x2_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vhsub_u32 (uint32x2_t __a, uint32x2_t __b) > { > - return (uint32x2_t) __builtin_aarch64_uhsubv2si 
((int32x2_t) __a, > -(int32x2_t) __b); > + return __builtin_aarch64_uhsubv2si_uuu (__a, __b); > } > > __extension__ extern __inline int8x16_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vhsubq_s8 (int8x16_t __a, int8x16_t __b) > { > - return (int8x16_t) __builtin_aarch64_shsubv16qi (__a, __b); > + return __builtin_aarch64_shsubv16qi (__a, __b); > } > > __extension__ extern __inline int16x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vhsubq_s16 (int16x8_t __a, int16x8_t __b) > { > - return (int16x8_t) __builtin_aarch64_shsubv8hi (__a, __b); > + return __builtin_aarch64_shsubv8hi (__a, __b); > } > > __extension__ extern __inline int32x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vhsubq_s32 (int32x4_t __a, int32x4_t __b) > { > - return (int32x4_t) __builtin_aarch64_shsubv4si (__a, __b); > + return __builtin_aarch64_shsubv4si (__a, __b); > } > > __extension__ extern __inline uint8x16_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vhsubq_u8 (uint8x
[PATCH] aarch64: Use type-qualified builtins for ADDP Neon intrinsics
Hi, This patch declares unsigned type-qualified builtins and uses them to implement the pairwise addition Neon intrinsics. This removes the need for many casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-11-09 Jonathan Wright * config/aarch64/aarch64-simd-builtins.def: * config/aarch64/arm_neon.h (vpaddq_u8): Use type-qualified builtin and remove casts. (vpaddq_u16): Likewise. (vpaddq_u32): Likewise. (vpaddq_u64): Likewise. (vpadd_u8): Likewise. (vpadd_u16): Likewise. (vpadd_u32): Likewise. (vpaddd_u64): Likewise. rb15039.patch Description: rb15039.patch
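A small usage sketch of one of the affected intrinsics (illustrative only): vpaddd_u64 adds the two 64-bit lanes of its input and returns the scalar sum, so with the type-qualified builtin neither the operand nor the result needs a cast.

#include <arm_neon.h>

uint64_t
pair_sum (uint64x2_t __a)
{
  /* Pairwise-add the two lanes of __a into a single scalar.  */
  return vpaddd_u64 (__a);
}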
Re: [PATCH] aarch64: Use type-qualified builtins for [R]ADDHN[2] Neon intrinsics
Jonathan Wright writes: > Hi, > > This patch declares unsigned type-qualified builtins and uses them to > implement (rounding) halving-narrowing-add Neon intrinsics. This > removes the need for many casts in arm_neon.h. > > Bootstrapped and regression tested on aarch64-none-linux-gnu - no > issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-11-09 Jonathan Wright > > * config/aarch64/aarch64-simd-builtins.def: Declare unsigned > builtins for [r]addhn[2]. > * config/aarch64/arm_neon.h (vaddhn_s16): Remove unnecessary > cast. > (vaddhn_s32): Likewise. > (vaddhn_s64): Likewise. > (vaddhn_u16): Use type-qualified builtin and remove casts. > (vaddhn_u32): Likewise. > (vaddhn_u64): Likewise. > (vraddhn_s16): Remove unnecessary cast. > (vraddhn_s32): Likewise. > (vraddhn_s64): Likewise. > (vraddhn_u16): Use type-qualified builtin and remove casts. > (vraddhn_u32): Likewise. > (vraddhn_u64): Likewise. > (vaddhn_high_s16): Remove unnecessary cast. > (vaddhn_high_s32): Likewise. > (vaddhn_high_s64): Likewise. > (vaddhn_high_u16): Use type-qualified builtin and remove > casts. > (vaddhn_high_u32): Likewise. > (vaddhn_high_u64): Likewise. > (vraddhn_high_s16): Remove unnecessary cast. > (vraddhn_high_s32): Likewise. > (vraddhn_high_s64): Likewise. > (vraddhn_high_u16): Use type-qualified builtin and remove > casts. > (vraddhn_high_u32): Likewise. > (vraddhn_high_u64): Likewise. OK, thanks. Richard > diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def > b/gcc/config/aarch64/aarch64-simd-builtins.def > index > af04b732227439dcaaa2f3751097050d988eb729..6372da80be33c40cb27e5811bfb4f4f672f28a35 > 100644 > --- a/gcc/config/aarch64/aarch64-simd-builtins.def > +++ b/gcc/config/aarch64/aarch64-simd-builtins.def > @@ -220,13 +220,17 @@ > >/* Implemented by aarch64_hn. */ >BUILTIN_VQN (BINOP, addhn, 0, NONE) > + BUILTIN_VQN (BINOPU, addhn, 0, NONE) >BUILTIN_VQN (BINOP, subhn, 0, NONE) >BUILTIN_VQN (BINOP, raddhn, 0, NONE) > + BUILTIN_VQN (BINOPU, raddhn, 0, NONE) >BUILTIN_VQN (BINOP, rsubhn, 0, NONE) >/* Implemented by aarch64_hn2. */ >BUILTIN_VQN (TERNOP, addhn2, 0, NONE) > + BUILTIN_VQN (TERNOPU, addhn2, 0, NONE) >BUILTIN_VQN (TERNOP, subhn2, 0, NONE) >BUILTIN_VQN (TERNOP, raddhn2, 0, NONE) > + BUILTIN_VQN (TERNOPU, raddhn2, 0, NONE) >BUILTIN_VQN (TERNOP, rsubhn2, 0, NONE) > >/* Implemented by aarch64_xtl. 
*/ > diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h > index > b2781f680d142b848f622d2f4965b42985885502..cb481542ba0d6ffb7cc8ffe7c1a098930fc5e746 > 100644 > --- a/gcc/config/aarch64/arm_neon.h > +++ b/gcc/config/aarch64/arm_neon.h > @@ -713,186 +713,168 @@ __extension__ extern __inline int8x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vaddhn_s16 (int16x8_t __a, int16x8_t __b) > { > - return (int8x8_t) __builtin_aarch64_addhnv8hi (__a, __b); > + return __builtin_aarch64_addhnv8hi (__a, __b); > } > > __extension__ extern __inline int16x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vaddhn_s32 (int32x4_t __a, int32x4_t __b) > { > - return (int16x4_t) __builtin_aarch64_addhnv4si (__a, __b); > + return __builtin_aarch64_addhnv4si (__a, __b); > } > > __extension__ extern __inline int32x2_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vaddhn_s64 (int64x2_t __a, int64x2_t __b) > { > - return (int32x2_t) __builtin_aarch64_addhnv2di (__a, __b); > + return __builtin_aarch64_addhnv2di (__a, __b); > } > > __extension__ extern __inline uint8x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vaddhn_u16 (uint16x8_t __a, uint16x8_t __b) > { > - return (uint8x8_t) __builtin_aarch64_addhnv8hi ((int16x8_t) __a, > - (int16x8_t) __b); > + return __builtin_aarch64_addhnv8hi_uuu (__a, __b); > } > > __extension__ extern __inline uint16x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vaddhn_u32 (uint32x4_t __a, uint32x4_t __b) > { > - return (uint16x4_t) __builtin_aarch64_addhnv4si ((int32x4_t) __a, > -(int32x4_t) __b); > + return __builtin_aarch64_addhnv4si_uuu (__a, __b); > } > > __extension__ extern __inline uint32x2_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vaddhn_u64 (uint64x2_t __a, uint64x2_t __b) > { > - return (uint32x2_t) __builtin_aarch64_addhnv2di ((int64x2_t) __a, > -(int64x2_t) __b); > + return __builtin_aarch64_addhnv2di_uuu (__a, __b); > } > > __extension__ extern __inline int8x8_t > __attribute__ ((__always_inli
Re: [PATCH 1/5] Add IFN_COND_FMIN/FMAX functions
On Wed, Nov 10, 2021 at 1:44 PM Richard Sandiford via Gcc-patches wrote: > > This patch adds conditional forms of FMAX and FMIN, following > the pattern for existing conditional binary functions. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? OK. Thanks, Richard. > Richard > > > gcc/ > * doc/md.texi (cond_fmin@var{mode}, cond_fmax@var{mode}): Document. > * optabs.def (cond_fmin_optab, cond_fmax_optab): New optabs. > * internal-fn.def (COND_FMIN, COND_FMAX): New functions. > * internal-fn.c (first_commutative_argument): Handle them. > (FOR_EACH_COND_FN_PAIR): Likewise. > * match.pd (UNCOND_BINARY, COND_BINARY): Likewise. > * config/aarch64/aarch64-sve.md (cond_): New > pattern. > > gcc/testsuite/ > * gcc.target/aarch64/sve/cond_fmaxnm_5.c: New test. > * gcc.target/aarch64/sve/cond_fmaxnm_5_run.c: Likewise. > * gcc.target/aarch64/sve/cond_fmaxnm_6.c: Likewise. > * gcc.target/aarch64/sve/cond_fmaxnm_6_run.c: Likewise. > * gcc.target/aarch64/sve/cond_fmaxnm_7.c: Likewise. > * gcc.target/aarch64/sve/cond_fmaxnm_7_run.c: Likewise. > * gcc.target/aarch64/sve/cond_fmaxnm_8.c: Likewise. > * gcc.target/aarch64/sve/cond_fmaxnm_8_run.c: Likewise. > * gcc.target/aarch64/sve/cond_fminnm_5.c: Likewise. > * gcc.target/aarch64/sve/cond_fminnm_5_run.c: Likewise. > * gcc.target/aarch64/sve/cond_fminnm_6.c: Likewise. > * gcc.target/aarch64/sve/cond_fminnm_6_run.c: Likewise. > * gcc.target/aarch64/sve/cond_fminnm_7.c: Likewise. > * gcc.target/aarch64/sve/cond_fminnm_7_run.c: Likewise. > * gcc.target/aarch64/sve/cond_fminnm_8.c: Likewise. > * gcc.target/aarch64/sve/cond_fminnm_8_run.c: Likewise. > --- > gcc/config/aarch64/aarch64-sve.md | 19 +++- > gcc/doc/md.texi | 4 +++ > gcc/internal-fn.c | 4 +++ > gcc/internal-fn.def | 2 ++ > gcc/match.pd | 2 ++ > gcc/optabs.def| 2 ++ > .../gcc.target/aarch64/sve/cond_fmaxnm_5.c| 28 ++ > .../aarch64/sve/cond_fmaxnm_5_run.c | 4 +++ > .../gcc.target/aarch64/sve/cond_fmaxnm_6.c| 22 ++ > .../aarch64/sve/cond_fmaxnm_6_run.c | 4 +++ > .../gcc.target/aarch64/sve/cond_fmaxnm_7.c| 27 + > .../aarch64/sve/cond_fmaxnm_7_run.c | 4 +++ > .../gcc.target/aarch64/sve/cond_fmaxnm_8.c| 26 + > .../aarch64/sve/cond_fmaxnm_8_run.c | 4 +++ > .../gcc.target/aarch64/sve/cond_fminnm_5.c| 29 +++ > .../aarch64/sve/cond_fminnm_5_run.c | 4 +++ > .../gcc.target/aarch64/sve/cond_fminnm_6.c| 23 +++ > .../aarch64/sve/cond_fminnm_6_run.c | 4 +++ > .../gcc.target/aarch64/sve/cond_fminnm_7.c| 28 ++ > .../aarch64/sve/cond_fminnm_7_run.c | 4 +++ > .../gcc.target/aarch64/sve/cond_fminnm_8.c| 27 + > .../aarch64/sve/cond_fminnm_8_run.c | 4 +++ > 22 files changed, 274 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fmaxnm_5.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fmaxnm_5_run.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fmaxnm_6.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fmaxnm_6_run.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fmaxnm_7.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fmaxnm_7_run.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fmaxnm_8.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fmaxnm_8_run.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fminnm_5.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fminnm_5_run.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fminnm_6.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fminnm_6_run.c > 
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fminnm_7.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fminnm_7_run.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fminnm_8.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_fminnm_8_run.c > > diff --git a/gcc/config/aarch64/aarch64-sve.md > b/gcc/config/aarch64/aarch64-sve.md > index 5de479e141a..0f5bf5ea8cb 100644 > --- a/gcc/config/aarch64/aarch64-sve.md > +++ b/gcc/config/aarch64/aarch64-sve.md > @@ -6287,7 +6287,7 @@ (define_expand "xorsign3" > ;; - > > ;; Unpredicated fmax/fmin (the libm functions). The optabs for the > -;; smin/smax rtx codes are handled in the generic section above. > +;; sma
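As a rough scalar model of what the new conditional functions compute per lane (illustrative C only; the real COND_FMIN/COND_FMAX are GIMPLE-level internal calls taking a predicate, two operands and an else value):

#include <math.h>

/* Where the predicate is set, take fmax of the two inputs; elsewhere,
   pass the "else" value through unchanged.  */
void
cond_fmax_model (int n, const _Bool *pred, const double *a,
                 const double *b, const double *els, double *out)
{
  for (int i = 0; i < n; i++)
    out[i] = pred[i] ? fmax (a[i], b[i]) : els[i];
}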
[PATCH] aarch64: Use type-qualified builtins for ADDV Neon intrinsics
Hi, This patch declares unsigned type-qualified builtins and uses them to implement the vector reduction Neon intrinsics. This removes the need for many casts in arm_neon.h. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-11-09 Jonathan Wright * config/aarch64/aarch64-simd-builtins.def: Declare unsigned builtins for vector reduction. * config/aarch64/arm_neon.h (vaddv_u8): Use type-qualified builtin and remove casts. (vaddv_u16): Likewise. (vaddv_u32): Likewise. (vaddvq_u8): Likewise. (vaddvq_u16): Likewise. (vaddvq_u32): Likewise. (vaddvq_u64): Likewise. rb15057.patch Description: rb15057.patch
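For reference, these are the across-vector reductions; a minimal use (illustrative only, not part of the patch):

#include <arm_neon.h>

/* Sum all eight lanes of __a into a scalar; with the unsigned-qualified
   builtin the result no longer needs a cast from a signed intermediate.  */
uint8_t
sum_lanes (uint8x8_t __a)
{
  return vaddv_u8 (__a);
}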
Re: [PATCH] aarch64: Use type-qualified builtins for [R]SUBHN[2] Neon intrinsics
Jonathan Wright writes: > Hi, > > This patch declares unsigned type-qualified builtins and uses them to > implement (rounding) halving-narrowing-subtract Neon intrinsics. This > removes the need for many casts in arm_neon.h. > > Bootstrapped and regression tested on aarch64-none-linux-gnu - no > issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-11-09 Jonathan Wright > > * config/aarch64/aarch64-simd-builtins.def: Declare unsigned > builtins for [r]subhn[2]. > * config/aarch64/arm_neon.h (vsubhn_s16): Remove unnecessary > cast. > (vsubhn_s32): Likewise. > (vsubhn_s64): Likewise. > (vsubhn_u16): Use type-qualified builtin and remove casts. > (vsubhn_u32): Likewise. > (vsubhn_u64): Likewise. > (vrsubhn_s16): Remove unnecessary cast. > (vrsubhn_s32): Likewise. > (vrsubhn_s64): Likewise. > (vrsubhn_u16): Use type-qualified builtin and remove casts. > (vrsubhn_u32): Likewise. > (vrsubhn_u64): Likewise. > (vrsubhn_high_s16): Remove unnecessary cast. > (vrsubhn_high_s32): Likewise. > (vrsubhn_high_s64): Likewise. > (vrsubhn_high_u16): Use type-qualified builtin and remove > casts. > (vrsubhn_high_u32): Likewise. > (vrsubhn_high_u64): Likewise. > (vsubhn_high_s16): Remove unnecessary cast. > (vsubhn_high_s32): Likewise. > (vsubhn_high_s64): Likewise. > (vsubhn_high_u16): Use type-qualified builtin and remove > casts. > (vsubhn_high_u32): Likewise. > (vsubhn_high_u64): Likewise. OK, thanks. Richard > diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def > b/gcc/config/aarch64/aarch64-simd-builtins.def > index > 6372da80be33c40cb27e5811bfb4f4f672f28a35..035bddcb660e34146b709fdae244571cdeb06272 > 100644 > --- a/gcc/config/aarch64/aarch64-simd-builtins.def > +++ b/gcc/config/aarch64/aarch64-simd-builtins.def > @@ -222,16 +222,20 @@ >BUILTIN_VQN (BINOP, addhn, 0, NONE) >BUILTIN_VQN (BINOPU, addhn, 0, NONE) >BUILTIN_VQN (BINOP, subhn, 0, NONE) > + BUILTIN_VQN (BINOPU, subhn, 0, NONE) >BUILTIN_VQN (BINOP, raddhn, 0, NONE) >BUILTIN_VQN (BINOPU, raddhn, 0, NONE) >BUILTIN_VQN (BINOP, rsubhn, 0, NONE) > + BUILTIN_VQN (BINOPU, rsubhn, 0, NONE) >/* Implemented by aarch64_hn2. */ >BUILTIN_VQN (TERNOP, addhn2, 0, NONE) >BUILTIN_VQN (TERNOPU, addhn2, 0, NONE) >BUILTIN_VQN (TERNOP, subhn2, 0, NONE) > + BUILTIN_VQN (TERNOPU, subhn2, 0, NONE) >BUILTIN_VQN (TERNOP, raddhn2, 0, NONE) >BUILTIN_VQN (TERNOPU, raddhn2, 0, NONE) >BUILTIN_VQN (TERNOP, rsubhn2, 0, NONE) > + BUILTIN_VQN (TERNOPU, rsubhn2, 0, NONE) > >/* Implemented by aarch64_xtl. 
*/ >BUILTIN_VQN (UNOP, sxtl, 0, NONE) > diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h > index > cb481542ba0d6ffb7cc8ffe7c1a098930fc5e746..ac871d4e503c634b453cd1f1d3e61182ce4a5a88 > 100644 > --- a/gcc/config/aarch64/arm_neon.h > +++ b/gcc/config/aarch64/arm_neon.h > @@ -2022,186 +2022,168 @@ __extension__ extern __inline int8x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vsubhn_s16 (int16x8_t __a, int16x8_t __b) > { > - return (int8x8_t) __builtin_aarch64_subhnv8hi (__a, __b); > + return __builtin_aarch64_subhnv8hi (__a, __b); > } > > __extension__ extern __inline int16x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vsubhn_s32 (int32x4_t __a, int32x4_t __b) > { > - return (int16x4_t) __builtin_aarch64_subhnv4si (__a, __b); > + return __builtin_aarch64_subhnv4si (__a, __b); > } > > __extension__ extern __inline int32x2_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vsubhn_s64 (int64x2_t __a, int64x2_t __b) > { > - return (int32x2_t) __builtin_aarch64_subhnv2di (__a, __b); > + return __builtin_aarch64_subhnv2di (__a, __b); > } > > __extension__ extern __inline uint8x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vsubhn_u16 (uint16x8_t __a, uint16x8_t __b) > { > - return (uint8x8_t) __builtin_aarch64_subhnv8hi ((int16x8_t) __a, > - (int16x8_t) __b); > + return __builtin_aarch64_subhnv8hi_uuu (__a, __b); > } > > __extension__ extern __inline uint16x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vsubhn_u32 (uint32x4_t __a, uint32x4_t __b) > { > - return (uint16x4_t) __builtin_aarch64_subhnv4si ((int32x4_t) __a, > -(int32x4_t) __b); > + return __builtin_aarch64_subhnv4si_uuu (__a, __b); > } > > __extension__ extern __inline uint32x2_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vsubhn_u64 (uint64x2_t __a, uint64x2_t __b) > { > - return (uint32x2_t) __builtin_aarch64_subhnv2di ((int64x2_t) __a, > -
Re: [PATCH] Allow loop header copying when first iteration condition is known.
On Thu, Nov 11, 2021 at 8:30 AM Richard Biener wrote: > > On Wed, Nov 10, 2021 at 9:42 PM Jeff Law wrote: > > > > > > > > On 11/10/2021 11:20 AM, Aldy Hernandez via Gcc-patches wrote: > > > As discussed in the PR, the loop header copying pass avoids doing so > > > when optimizing for size. However, sometimes we can determine the > > > loop entry conditional statically for the first iteration of the loop. > > > > > > This patch uses the path solver to determine the outgoing edge > > > out of preheader->header->xx. If so, it allows header copying. Doing > > > this in the loop optimizer saves us from doing gymnastics in the > > > threader which doesn't have the context to determine if a loop > > > transformation is profitable. > > > > > > I am only returning true in entry_loop_condition_is_static for > > > a true conditional. Technically a false conditional is also > > > provably static, but allowing any boolean value causes a regression > > > in gfortran.dg/vector_subscript_1.f90. > > > > > > I would have preferred not passing around the query object, but the > > > layout of pass_ch and should_duplicate_loop_header_p make it a bit > > > awkward to get it right without an outright refactor to the > > > pass. > > > > > > Tested on x86-64 Linux. > > > > > > OK? > > > > > > gcc/ChangeLog: > > > > > > PR tree-optimization/102906 > > > * tree-ssa-loop-ch.c (entry_loop_condition_is_static): New. > > > (should_duplicate_loop_header_p): Call > > > entry_loop_condition_is_static. > > > (class ch_base): Add m_ranger and m_query. > > > (ch_base::copy_headers): Pass m_query to > > > entry_loop_condition_is_static. > > > (pass_ch::execute): Allocate and deallocate m_ranger and > > > m_query. > > > (pass_ch_vect::execute): Same. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.dg/tree-ssa/pr102906.c: New test. > > OK. It also makes a nice little example of how to use a Ranger within > > an existing pass. > > Note if you just test for the condition to be true it will only catch 50% > of the desired cases since we have no idea whether the 'true' edge > is the edge exiting the loop or the edge remaining in the loop. > For loop header copying we like to resolve statically to the edge > remaining in the loop, so you want Ahh, I figured there was some block shuffling needed. I was cautious not to touch much because of the gfortran.dg/vector_subscript_1.f90 regression, but now I see that the test fails for all optimization levels except -Os. With this fix we properly fail for all levels. I assume this is expected ;-). > > extract_true_false_edges_from_block (gimple_bb (last), &true_e, &false_e); > > /* If neither edge is the exit edge this is not a case we'd like to >special-case. */ > if (!loop_exit_edge_p (l, true_e) && !loop_exit_edge_p (l, false_e)) > return false; > > tree desired_static_value; > if (loop_exit_edge_p (l, true_e)) > desired_static_value = boolean_false_node; > else > desired_static_value = boolean_true_node; > > and test for desired_static_value. Thanks for the code! OK pending tests? From 9609cff278d3ddea9f74b805b395d5c0293a126c Mon Sep 17 00:00:00 2001 From: Aldy Hernandez Date: Thu, 11 Nov 2021 11:27:07 +0100 Subject: [PATCH] Resolve entry loop condition for the edge remaining in the loop. There is a known failure for gfortran.dg/vector_subscript_1.f90. It was previously failing for all optimization levels except -Os. Getting the loop header copying right, now makes it fail for all levels :-). 
Co-authored-by: Richard Biener gcc/ChangeLog: * tree-ssa-loop-ch.c (entry_loop_condition_is_static): Resolve statically to the edge remaining in the loop. --- gcc/tree-ssa-loop-ch.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c index c7d86d751d4..af3401f112c 100644 --- a/gcc/tree-ssa-loop-ch.c +++ b/gcc/tree-ssa-loop-ch.c @@ -57,10 +57,24 @@ entry_loop_condition_is_static (class loop *l, path_range_query *query) || !irange::supports_type_p (TREE_TYPE (gimple_cond_lhs (last return false; + edge true_e, false_e; + extract_true_false_edges_from_block (e->dest, &true_e, &false_e); + + /* If neither edge is the exit edge, this is not a case we'd like to + special-case. */ + if (!loop_exit_edge_p (l, true_e) && !loop_exit_edge_p (l, false_e)) +return false; + + tree desired_static_value; + if (loop_exit_edge_p (l, true_e)) +desired_static_value = boolean_false_node; + else +desired_static_value = boolean_true_node; + int_range<2> r; query->compute_ranges (e); query->range_of_stmt (r, last); - return r == int_range<2> (boolean_true_node, boolean_true_node); + return r == int_range<2> (desired_static_value, desired_static_value); } /* Check whether we should duplicate HEADER of LOOP. At most *LIMIT -- 2.31.1
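For readers without the PR open, a minimal illustration of the kind of loop this targets (not the pr102906 testcase itself): the first-iteration entry test can be resolved statically on the preheader path, so the condition is known to take the edge remaining in the loop and header copying pays off even when optimizing for size.

void
f (int *a)
{
  /* The ranger resolves the first-iteration test 0 < 8 to true on the
     path preheader->header, i.e. the edge remaining in the loop.  */
  for (int i = 0; i < 8; i++)
    a[i] = 0;
}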
[PATCH] aarch64: Use type-qualified builtins for LD1/ST1 Neon intrinsics
Hi, This patch declares unsigned and polynomial type-qualified builtins and uses them to implement the LD1/ST1 Neon intrinsics. This removes the need for many casts in arm_neon.h. The new type-qualified builtins are also lowered to gimple - as the unqualified builtins are already. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-11-10 Jonathan Wright * config/aarch64/aarch64-builtins.c (TYPES_LOAD1_U): Define. (TYPES_LOAD1_P): Define. (TYPES_STORE1_U): Define. (TYPES_STORE1P): Rename to... (TYPES_STORE1_P): This. (get_mem_type_for_load_store): Add unsigned and poly types. (aarch64_general_gimple_fold_builtin): Add unsigned and poly type-qualified builtin declarations. * config/aarch64/aarch64-simd-builtins.def: Declare type- qualified builtins for LD1/ST1. * config/aarch64/arm_neon.h (vld1_p8): Use type-qualified builtin and remove cast. (vld1_p16): Likewise. (vld1_u8): Likewise. (vld1_u16): Likewise. (vld1_u32): Likewise. (vld1q_p8): Likewise. (vld1q_p16): Likewise. (vld1q_p64): Likewise. (vld1q_u8): Likewise. (vld1q_u16): Likewise. (vld1q_u32): Likewise. (vld1q_u64): Likewise. (vst1_p8): Likewise. (vst1_p16): Likewise. (vst1_u8): Likewise. (vst1_u16): Likewise. (vst1_u32): Likewise. (vst1q_p8): Likewise. (vst1q_p16): Likewise. (vst1q_p64): Likewise. (vst1q_u8): Likewise. (vst1q_u16): Likewise. (vst1q_u32): Likewise. (vst1q_u64): Likewise. * config/aarch64/iterators.md (VALLP_NO_DI): New iterator. rb15058.patch Description: rb15058.patch
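As a usage-level sketch of why the gimple lowering matters (illustrative only, not taken from the patch): a plain intrinsic load such as the one below now expands through the new type-qualified builtin and is still folded to an ordinary vector load in the IL, as the unqualified builtins already were.

#include <arm_neon.h>

uint32x4_t
load_four (const uint32_t *__p)
{
  /* Folded to a normal vector load at the gimple level.  */
  return vld1q_u32 (__p);
}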
[PATCH] aarch64: Use type-qualified builtins for vcombine_* Neon intrinsics
Hi, This patch declares unsigned and polynomial type-qualified builtins for vcombine_* Neon intrinsics. Using these builtins removes the need for many casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-11-10 Jonathan Wright * config/aarch64/aarch64-builtins.c (TYPES_COMBINE): Delete. (TYPES_COMBINEP): Delete. * config/aarch64/aarch64-simd-builtins.def: Declare type- qualified builtins for vcombine_* intrinsics. * config/aarch64/arm_neon.h (vcombine_s8): Remove unnecessary cast. (vcombine_s16): Likewise. (vcombine_s32): Likewise. (vcombine_f32): Likewise. (vcombine_u8): Use type-qualified builtin and remove casts. (vcombine_u16): Likewise. (vcombine_u32): Likewise. (vcombine_u64): Likewise. (vcombine_p8): Likewise. (vcombine_p16): Likewise. (vcombine_p64): Likewise. (vcombine_bf16): Remove unnecessary cast. * config/aarch64/iterators.md (VDC_I): New mode iterator. (VDC_P): New mode iterator. rb15059.patch Description: rb15059.patch
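Since the attachment is not inlined here, a sketch of what the vcombine_u8 change presumably looks like, by analogy with the other patches in this series; the exact old spelling and the _uuu suffix of the new builtin are assumptions, not quoted from the patch:

__extension__ extern __inline uint8x16_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vcombine_u8 (uint8x8_t __a, uint8x8_t __b)
{
  /* Before (assumed): cast through the unqualified combine builtin.
       return (uint8x16_t) __builtin_aarch64_combinev8qi ((int8x8_t) __a,
                                                          (int8x8_t) __b);  */
  /* After (assumed _uuu suffix): the unsigned-qualified builtin, no casts.  */
  return __builtin_aarch64_combinev8qi_uuu (__a, __b);
}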
Re: [PATCH] aarch64: Use type-qualified builtins for ADDP Neon intrinsics
Jonathan Wright writes: > Hi, > > This patch declares unsigned type-qualified builtins and uses them to > implement the pairwise addition Neon intrinsics. This removes the need > for many casts in arm_neon.h. > > Bootstrapped and regression tested on aarch64-none-linux-gnu - no > issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-11-09 Jonathan Wright > > * config/aarch64/aarch64-simd-builtins.def: > * config/aarch64/arm_neon.h (vpaddq_u8): Use type-qualified > builtin and remove casts. > (vpaddq_u16): Likewise. > (vpaddq_u32): Likewise. > (vpaddq_u64): Likewise. > (vpadd_u8): Likewise. > (vpadd_u16): Likewise. > (vpadd_u32): Likewise. > (vpaddd_u64): Likewise. OK, thanks. Was initially caught out by vpaddd_u64 not previously having a return cast, but of course that's because it's scalar, and so an implicit cast was allowed. So the patch is still avoiding two casts there. Richard > > diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def > b/gcc/config/aarch64/aarch64-simd-builtins.def > index > 035bddcb660e34146b709fdae244571cdeb06272..7d6de6728cf7c63872e09850a394101f7abf21d4 > 100644 > --- a/gcc/config/aarch64/aarch64-simd-builtins.def > +++ b/gcc/config/aarch64/aarch64-simd-builtins.def > @@ -51,7 +51,9 @@ >BUILTIN_VHSDF_HSDF (BINOP, fmulx, 0, FP) >BUILTIN_VHSDF_DF (UNOP, sqrt, 2, FP) >BUILTIN_VDQ_I (BINOP, addp, 0, NONE) > + BUILTIN_VDQ_I (BINOPU, addp, 0, NONE) >VAR1 (UNOP, addp, 0, NONE, di) > + VAR1 (UNOPU, addp, 0, NONE, di) >BUILTIN_VDQ_BHSI (UNOP, clrsb, 2, NONE) >BUILTIN_VDQ_BHSI (UNOP, clz, 2, NONE) >BUILTIN_VS (UNOP, ctz, 2, NONE) > diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h > index > ac871d4e503c634b453cd1f1d3e61182ce4a5a88..ab46897d784b81bec9654d87557640ca4c1e5681 > 100644 > --- a/gcc/config/aarch64/arm_neon.h > +++ b/gcc/config/aarch64/arm_neon.h > @@ -8011,32 +8011,28 @@ __extension__ extern __inline uint8x16_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vpaddq_u8 (uint8x16_t __a, uint8x16_t __b) > { > - return (uint8x16_t) __builtin_aarch64_addpv16qi ((int8x16_t) __a, > -(int8x16_t) __b); > + return __builtin_aarch64_addpv16qi_uuu (__a, __b); > } > > __extension__ extern __inline uint16x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vpaddq_u16 (uint16x8_t __a, uint16x8_t __b) > { > - return (uint16x8_t) __builtin_aarch64_addpv8hi ((int16x8_t) __a, > - (int16x8_t) __b); > + return __builtin_aarch64_addpv8hi_uuu (__a, __b); > } > > __extension__ extern __inline uint32x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vpaddq_u32 (uint32x4_t __a, uint32x4_t __b) > { > - return (uint32x4_t) __builtin_aarch64_addpv4si ((int32x4_t) __a, > - (int32x4_t) __b); > + return __builtin_aarch64_addpv4si_uuu (__a, __b); > } > > __extension__ extern __inline uint64x2_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vpaddq_u64 (uint64x2_t __a, uint64x2_t __b) > { > - return (uint64x2_t) __builtin_aarch64_addpv2di ((int64x2_t) __a, > - (int64x2_t) __b); > + return __builtin_aarch64_addpv2di_uuu (__a, __b); > } > > __extension__ extern __inline int16x4_t > @@ -20293,24 +20289,21 @@ __extension__ extern __inline uint8x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vpadd_u8 (uint8x8_t __a, uint8x8_t __b) > { > - return (uint8x8_t) __builtin_aarch64_addpv8qi ((int8x8_t) __a, > - (int8x8_t) __b); > + return __builtin_aarch64_addpv8qi_uuu (__a, __b); > } > > __extension__ extern __inline 
uint16x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vpadd_u16 (uint16x4_t __a, uint16x4_t __b) > { > - return (uint16x4_t) __builtin_aarch64_addpv4hi ((int16x4_t) __a, > - (int16x4_t) __b); > + return __builtin_aarch64_addpv4hi_uuu (__a, __b); > } > > __extension__ extern __inline uint32x2_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vpadd_u32 (uint32x2_t __a, uint32x2_t __b) > { > - return (uint32x2_t) __builtin_aarch64_addpv2si ((int32x2_t) __a, > - (int32x2_t) __b); > + return __builtin_aarch64_addpv2si_uuu (__a, __b); > } > > __extension__ extern __inline float32_t > @@ -20338,7 +20331,7 @@ __extension__ extern __inline uint64_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vpaddd_u64 (uint64x2_t __a) > { > - return __builtin_aarch64_addpdi ((int64x2_t) __a); > + return __builtin_aarch64_addpdi_uu (__a); > } > > /* vqabs */
[PATCH] aarch64: Use type-qualified builtins for vget_low/high intrinsics
Hi, This patch declares unsigned and polynomial type-qualified builtins for vget_low_*/vget_high_* Neon intrinsics. Using these builtins removes the need for many casts in arm_neon.h. Bootstrapped and regression tested on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-11-10 Jonathan Wright * config/aarch64/aarch64-builtins.c (TYPES_UNOPP): Define. * config/aarch64/aarch64-simd-builtins.def: Declare type- qualified builtins for vget_low/high. * config/aarch64/arm_neon.h (vget_low_p8): Use type-qualified builtin and remove casts. (vget_low_p16): Likewise. (vget_low_p64): Likewise. (vget_low_u8): Likewise. (vget_low_u16): Likewise. (vget_low_u32): Likewise. (vget_low_u64): Likewise. (vget_high_p8): Likewise. (vget_high_p16): Likewise. (vget_high_p64): Likewise. (vget_high_u8): Likewise. (vget_high_u16): Likewise. (vget_high_u32): Likewise. (vget_high_u64): Likewise. * config/aarch64/iterators.md (VQ_P): New mode iterator. rb15060.patch Description: rb15060.patch
Re: [PATCH] aarch64: Use type-qualified builtins for ADDV Neon intrinsics
Jonathan Wright writes: > Hi, > > This patch declares unsigned type-qualified builtins and uses them to > implement the vector reduction Neon intrinsics. This removes the need > for many casts in arm_neon.h. > > Regression tested and bootstrapped on aarch64-none-linux-gnu - no > issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-11-09 Jonathan Wright > > * config/aarch64/aarch64-simd-builtins.def: Declare unsigned > builtins for vector reduction. > * config/aarch64/arm_neon.h (vaddv_u8): Use type-qualified > builtin and remove casts. > (vaddv_u16): Likewise. > (vaddv_u32): Likewise. > (vaddvq_u8): Likewise. > (vaddvq_u16): Likewise. > (vaddvq_u32): Likewise. > (vaddvq_u64): Likewise. OK, thanks. Richard > > diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def > b/gcc/config/aarch64/aarch64-simd-builtins.def > index > 7d6de6728cf7c63872e09850a394101f7abf21d4..35a099e1fb8dd1acb9e35583d1267df257d961b0 > 100644 > --- a/gcc/config/aarch64/aarch64-simd-builtins.def > +++ b/gcc/config/aarch64/aarch64-simd-builtins.def > @@ -513,6 +513,7 @@ > >/* Implemented by aarch64_reduc_plus_. */ >BUILTIN_VALL (UNOP, reduc_plus_scal_, 10, NONE) > + BUILTIN_VDQ_I (UNOPU, reduc_plus_scal_, 10, NONE) > >/* Implemented by reduc__scal_ (producing scalar). */ >BUILTIN_VDQIF_F16 (UNOP, reduc_smax_scal_, 10, NONE) > diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h > index > ab46897d784b81bec9654d87557640ca4c1e5681..3c03432b5b6c6cd0f349671366615925d38121e5 > 100644 > --- a/gcc/config/aarch64/arm_neon.h > +++ b/gcc/config/aarch64/arm_neon.h > @@ -9695,21 +9695,21 @@ __extension__ extern __inline uint8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vaddv_u8 (uint8x8_t __a) > { > - return (uint8_t) __builtin_aarch64_reduc_plus_scal_v8qi ((int8x8_t) __a); > + return __builtin_aarch64_reduc_plus_scal_v8qi_uu (__a); > } > > __extension__ extern __inline uint16_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vaddv_u16 (uint16x4_t __a) > { > - return (uint16_t) __builtin_aarch64_reduc_plus_scal_v4hi ((int16x4_t) __a); > + return __builtin_aarch64_reduc_plus_scal_v4hi_uu (__a); > } > > __extension__ extern __inline uint32_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vaddv_u32 (uint32x2_t __a) > { > - return (int32_t) __builtin_aarch64_reduc_plus_scal_v2si ((int32x2_t) __a); > + return __builtin_aarch64_reduc_plus_scal_v2si_uu (__a); > } > > __extension__ extern __inline int8_t > @@ -9744,28 +9744,28 @@ __extension__ extern __inline uint8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vaddvq_u8 (uint8x16_t __a) > { > - return (uint8_t) __builtin_aarch64_reduc_plus_scal_v16qi ((int8x16_t) __a); > + return __builtin_aarch64_reduc_plus_scal_v16qi_uu (__a); > } > > __extension__ extern __inline uint16_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vaddvq_u16 (uint16x8_t __a) > { > - return (uint16_t) __builtin_aarch64_reduc_plus_scal_v8hi ((int16x8_t) __a); > + return __builtin_aarch64_reduc_plus_scal_v8hi_uu (__a); > } > > __extension__ extern __inline uint32_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vaddvq_u32 (uint32x4_t __a) > { > - return (uint32_t) __builtin_aarch64_reduc_plus_scal_v4si ((int32x4_t) __a); > + return __builtin_aarch64_reduc_plus_scal_v4si_uu (__a); > } > > __extension__ extern __inline uint64_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vaddvq_u64 
(uint64x2_t __a) > { > - return (uint64_t) __builtin_aarch64_reduc_plus_scal_v2di ((int64x2_t) __a); > + return __builtin_aarch64_reduc_plus_scal_v2di_uu (__a); > } > > __extension__ extern __inline float32_t
Re: [PATCH] aarch64: Use type-qualified builtins for LD1/ST1 Neon intrinsics
Jonathan Wright writes: > Hi, > > This patch declares unsigned and polynomial type-qualified builtins and > uses them to implement the LD1/ST1 Neon intrinsics. This removes the > need for many casts in arm_neon.h. > > The new type-qualified builtins are also lowered to gimple - as the > unqualified builtins are already. > > Regression tested and bootstrapped on aarch64-none-linux-gnu - no > issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-11-10 Jonathan Wright > > * config/aarch64/aarch64-builtins.c (TYPES_LOAD1_U): Define. > (TYPES_LOAD1_P): Define. > (TYPES_STORE1_U): Define. > (TYPES_STORE1P): Rename to... > (TYPES_STORE1_P): This. > (get_mem_type_for_load_store): Add unsigned and poly types. > (aarch64_general_gimple_fold_builtin): Add unsigned and poly > type-qualified builtin declarations. > * config/aarch64/aarch64-simd-builtins.def: Declare type- > qualified builtins for LD1/ST1. > * config/aarch64/arm_neon.h (vld1_p8): Use type-qualified > builtin and remove cast. > (vld1_p16): Likewise. > (vld1_u8): Likewise. > (vld1_u16): Likewise. > (vld1_u32): Likewise. > (vld1q_p8): Likewise. > (vld1q_p16): Likewise. > (vld1q_p64): Likewise. > (vld1q_u8): Likewise. > (vld1q_u16): Likewise. > (vld1q_u32): Likewise. > (vld1q_u64): Likewise. > (vst1_p8): Likewise. > (vst1_p16): Likewise. > (vst1_u8): Likewise. > (vst1_u16): Likewise. > (vst1_u32): Likewise. > (vst1q_p8): Likewise. > (vst1q_p16): Likewise. > (vst1q_p64): Likewise. > (vst1q_u8): Likewise. > (vst1q_u16): Likewise. > (vst1q_u32): Likewise. > (vst1q_u64): Likewise. > * config/aarch64/iterators.md (VALLP_NO_DI): New iterator. > > diff --git a/gcc/config/aarch64/aarch64-builtins.c > b/gcc/config/aarch64/aarch64-builtins.c > index > 5053bf0f8fd6638bf84a6df06c0987a0216b69e7..f286401ff3ab01dd860ae22858ca07e364247414 > 100644 > --- a/gcc/config/aarch64/aarch64-builtins.c > +++ b/gcc/config/aarch64/aarch64-builtins.c > @@ -372,10 +372,12 @@ aarch64_types_load1_qualifiers[SIMD_MAX_BUILTIN_ARGS] > static enum aarch64_type_qualifiers > aarch64_types_load1_u_qualifiers[SIMD_MAX_BUILTIN_ARGS] >= { qualifier_unsigned, qualifier_const_pointer_map_mode }; > +#define TYPES_LOAD1_U (aarch64_types_load1_u_qualifiers) > #define TYPES_LOADSTRUCT_U (aarch64_types_load1_u_qualifiers) > static enum aarch64_type_qualifiers > aarch64_types_load1_p_qualifiers[SIMD_MAX_BUILTIN_ARGS] >= { qualifier_poly, qualifier_const_pointer_map_mode }; > +#define TYPES_LOAD1_P (aarch64_types_load1_p_qualifiers) > #define TYPES_LOADSTRUCT_P (aarch64_types_load1_p_qualifiers) > > static enum aarch64_type_qualifiers > @@ -423,11 +425,12 @@ aarch64_types_store1_qualifiers[SIMD_MAX_BUILTIN_ARGS] > static enum aarch64_type_qualifiers > aarch64_types_store1_u_qualifiers[SIMD_MAX_BUILTIN_ARGS] >= { qualifier_void, qualifier_pointer_map_mode, qualifier_unsigned }; > +#define TYPES_STORE1_U (aarch64_types_store1_u_qualifiers) > #define TYPES_STORESTRUCT_U (aarch64_types_store1_u_qualifiers) > static enum aarch64_type_qualifiers > aarch64_types_store1_p_qualifiers[SIMD_MAX_BUILTIN_ARGS] >= { qualifier_void, qualifier_pointer_map_mode, qualifier_poly }; > -#define TYPES_STORE1P (aarch64_types_store1_p_qualifiers) > +#define TYPES_STORE1_P (aarch64_types_store1_p_qualifiers) > #define TYPES_STORESTRUCT_P (aarch64_types_store1_p_qualifiers) > > static enum aarch64_type_qualifiers > @@ -2590,47 +2593,83 @@ get_mem_type_for_load_store (unsigned int fcode) > { >switch (fcode) >{ > -VAR1 (LOAD1, ld1 , 0, LOAD, v8qi) > -VAR1 (STORE1, st1 , 0, STORE, v8qi) 
> +VAR1 (LOAD1, ld1, 0, LOAD, v8qi) > +VAR1 (STORE1, st1, 0, STORE, v8qi) >return Int8x8_t; > -VAR1 (LOAD1, ld1 , 0, LOAD, v16qi) > -VAR1 (STORE1, st1 , 0, STORE, v16qi) > +VAR1 (LOAD1, ld1, 0, LOAD, v16qi) > +VAR1 (STORE1, st1, 0, STORE, v16qi) >return Int8x16_t; > -VAR1 (LOAD1, ld1 , 0, LOAD, v4hi) > -VAR1 (STORE1, st1 , 0, STORE, v4hi) > +VAR1 (LOAD1, ld1, 0, LOAD, v4hi) > +VAR1 (STORE1, st1, 0, STORE, v4hi) >return Int16x4_t; > -VAR1 (LOAD1, ld1 , 0, LOAD, v8hi) > -VAR1 (STORE1, st1 , 0, STORE, v8hi) > +VAR1 (LOAD1, ld1, 0, LOAD, v8hi) > +VAR1 (STORE1, st1, 0, STORE, v8hi) >return Int16x8_t; > -VAR1 (LOAD1, ld1 , 0, LOAD, v2si) > -VAR1 (STORE1, st1 , 0, STORE, v2si) > +VAR1 (LOAD1, ld1, 0, LOAD, v2si) > +VAR1 (STORE1, st1, 0, STORE, v2si) >return Int32x2_t; > -VAR1 (LOAD1, ld1 , 0, LOAD, v4si) > -VAR1 (STORE1, st1 , 0, STORE, v4si) > +VAR1 (LOAD1, ld1, 0, LOAD, v4si) > +VAR1 (STORE1, st1, 0, STORE, v4si) >return Int32x4_t; > -VAR1 (LOAD1, ld
Re: [PATCH 2/5] gimple-match: Add a gimple_extract_op function
On Wed, Nov 10, 2021 at 1:46 PM Richard Sandiford via Gcc-patches wrote: > > code_helper and gimple_match_op seem like generally useful ways > of summing up a gimple_assign or gimple_call (or gimple_cond). > This patch adds a gimple_extract_op function that can be used > for that. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? > > Richard > > > gcc/ > * gimple-match.h (gimple_extract_op): Declare. > * gimple-match.c (gimple_extract): New function, extracted from... > (gimple_simplify): ...here. > (gimple_extract_op): New function. > --- > gcc/gimple-match-head.c | 261 +++- > gcc/gimple-match.h | 1 + > 2 files changed, 149 insertions(+), 113 deletions(-) > > diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c > index 9d88b2f8551..4c6e0883ba4 100644 > --- a/gcc/gimple-match-head.c > +++ b/gcc/gimple-match-head.c > @@ -890,12 +890,29 @@ try_conditional_simplification (internal_fn ifn, > gimple_match_op *res_op, >return true; > } > > -/* The main STMT based simplification entry. It is used by the fold_stmt > - and the fold_stmt_to_constant APIs. */ > +/* Common subroutine of gimple_extract_op and gimple_simplify. Try to > + describe STMT in RES_OP. Return: > > -bool > -gimple_simplify (gimple *stmt, gimple_match_op *res_op, gimple_seq *seq, > -tree (*valueize)(tree), tree (*top_valueize)(tree)) > + - -1 if extraction failed > + - otherwise, 0 if no simplification should take place > + - otherwise, the number of operands for a GIMPLE_ASSIGN or GIMPLE_COND > + - otherwise, -2 for a GIMPLE_CALL > + > + Before recording an operand, call: > + > + - VALUEIZE_CONDITION for a COND_EXPR condition > + - VALUEIZE_NAME if the rhs of a GIMPLE_ASSIGN is an SSA_NAME I think at least VALUEIZE_NAME is unnecessary, see below > + - VALUEIZE_OP for every other top-level operand > + > + Each routine takes a tree argument and returns a tree. */ > + > +template +typename ValueizeName> > +inline int > +gimple_extract (gimple *stmt, gimple_match_op *res_op, > + ValueizeOp valueize_op, > + ValueizeCondition valueize_condition, > + ValueizeName valueize_name) > { >switch (gimple_code (stmt)) > { > @@ -911,100 +928,53 @@ gimple_simplify (gimple *stmt, gimple_match_op > *res_op, gimple_seq *seq, > || code == VIEW_CONVERT_EXPR) > { > tree op0 = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0); > - bool valueized = false; > - op0 = do_valueize (op0, top_valueize, valueized); > - res_op->set_op (code, type, op0); > - return (gimple_resimplify1 (seq, res_op, valueize) > - || valueized); > + res_op->set_op (code, type, valueize_op (op0)); > + return 1; > } > else if (code == BIT_FIELD_REF) > { > tree rhs1 = gimple_assign_rhs1 (stmt); > - tree op0 = TREE_OPERAND (rhs1, 0); > - bool valueized = false; > - op0 = do_valueize (op0, top_valueize, valueized); > + tree op0 = valueize_op (TREE_OPERAND (rhs1, 0)); > res_op->set_op (code, type, op0, > TREE_OPERAND (rhs1, 1), > TREE_OPERAND (rhs1, 2), > REF_REVERSE_STORAGE_ORDER (rhs1)); > - if (res_op->reverse) > - return valueized; > - return (gimple_resimplify3 (seq, res_op, valueize) > - || valueized); > + return res_op->reverse ? 
0 : 3; > } > - else if (code == SSA_NAME > -&& top_valueize) > + else if (code == SSA_NAME) > { > tree op0 = gimple_assign_rhs1 (stmt); > - tree valueized = top_valueize (op0); > + tree valueized = valueize_name (op0); > if (!valueized || op0 == valueized) > - return false; > + return -1; > res_op->set_op (TREE_CODE (op0), type, valueized); > - return true; > + return 0; So the old code in an obfuscated way just knew nothing simplifies on the plain non-valueized name but returned true when valueization changed the stmt. So I'd expect tree valueized = valueize_op (op0); res_op->set_op (TREE_CODE (op0), type, valueized); return 0; here and the gimple_simplify caller returning 'valueized'. I think that the old code treating a NULL top_valueize () as "fail" was just premature optimization without any effect. > } > break; > case GIMPLE_UNARY_RHS: > { > tree rhs1 = gimple_assign_rhs1 (stmt
Re: [PATCH 3/5] gimple-match: Make code_helper conversions explicit
On Wed, Nov 10, 2021 at 1:47 PM Richard Sandiford via Gcc-patches wrote: > > code_helper provides conversions to tree_code and combined_fn. > Now that the codebase is C++11, we can mark these conversions as > explicit. This avoids accidentally using code_helpers with > functions that take tree_codes, which would previously entail > a hidden unchecked conversion. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? OK. Thanks, Richard. > Richard > > > gcc/ > * gimple-match.h (code_helper): Provide == and != overloads. > (code_helper::operator tree_code): Make explicit. > (code_helper::operator combined_fn): Likewise. > * gimple-match-head.c (convert_conditional_op): Use explicit > conversions where necessary. > (gimple_resimplify1, gimple_resimplify2, gimple_resimplify3): > Likewise. > (maybe_push_res_to_seq, gimple_simplify): Likewise. > * gimple-fold.c (replace_stmt_with_simplification): Likewise. > --- > gcc/gimple-fold.c | 18 --- > gcc/gimple-match-head.c | 51 ++--- > gcc/gimple-match.h | 9 ++-- > 3 files changed, 45 insertions(+), 33 deletions(-) > > diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c > index 6e25a7c05db..9daf2cc590c 100644 > --- a/gcc/gimple-fold.c > +++ b/gcc/gimple-fold.c > @@ -5828,18 +5828,19 @@ replace_stmt_with_simplification > (gimple_stmt_iterator *gsi, >if (gcond *cond_stmt = dyn_cast (stmt)) > { >gcc_assert (res_op->code.is_tree_code ()); > - if (TREE_CODE_CLASS ((enum tree_code) res_op->code) == tcc_comparison > + auto code = tree_code (res_op->code); > + if (TREE_CODE_CLASS (code) == tcc_comparison > /* GIMPLE_CONDs condition may not throw. */ > && (!flag_exceptions > || !cfun->can_throw_non_call_exceptions > - || !operation_could_trap_p (res_op->code, > + || !operation_could_trap_p (code, > FLOAT_TYPE_P (TREE_TYPE (ops[0])), > false, NULL_TREE))) > - gimple_cond_set_condition (cond_stmt, res_op->code, ops[0], ops[1]); > - else if (res_op->code == SSA_NAME) > + gimple_cond_set_condition (cond_stmt, code, ops[0], ops[1]); > + else if (code == SSA_NAME) > gimple_cond_set_condition (cond_stmt, NE_EXPR, ops[0], >build_zero_cst (TREE_TYPE (ops[0]))); > - else if (res_op->code == INTEGER_CST) > + else if (code == INTEGER_CST) > { > if (integer_zerop (ops[0])) > gimple_cond_make_false (cond_stmt); > @@ -5870,11 +5871,12 @@ replace_stmt_with_simplification > (gimple_stmt_iterator *gsi, >else if (is_gimple_assign (stmt) >&& res_op->code.is_tree_code ()) > { > + auto code = tree_code (res_op->code); >if (!inplace > - || gimple_num_ops (stmt) > get_gimple_rhs_num_ops (res_op->code)) > + || gimple_num_ops (stmt) > get_gimple_rhs_num_ops (code)) > { > maybe_build_generic_op (res_op); > - gimple_assign_set_rhs_with_ops (gsi, res_op->code, > + gimple_assign_set_rhs_with_ops (gsi, code, > res_op->op_or_null (0), > res_op->op_or_null (1), > res_op->op_or_null (2)); > @@ -5891,7 +5893,7 @@ replace_stmt_with_simplification (gimple_stmt_iterator > *gsi, > } > } >else if (res_op->code.is_fn_code () > - && gimple_call_combined_fn (stmt) == res_op->code) > + && gimple_call_combined_fn (stmt) == combined_fn (res_op->code)) > { >gcc_assert (num_ops == gimple_call_num_args (stmt)); >for (unsigned int i = 0; i < num_ops; ++i) > diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c > index 4c6e0883ba4..d4d7d767075 100644 > --- a/gcc/gimple-match-head.c > +++ b/gcc/gimple-match-head.c > @@ -96,7 +96,7 @@ convert_conditional_op (gimple_match_op *orig_op, > ifn = get_conditional_internal_fn ((tree_code) orig_op->code); >else > { > - combined_fn cfn = 
orig_op->code; > + auto cfn = combined_fn (orig_op->code); >if (!internal_fn_p (cfn)) > return false; >ifn = get_conditional_internal_fn (as_internal_fn (cfn)); > @@ -206,10 +206,10 @@ gimple_resimplify1 (gimple_seq *seq, gimple_match_op > *res_op, >tree tem = NULL_TREE; >if (res_op->code.is_tree_code ()) > { > - tree_code code = res_op->code; > + auto code = tree_code (res_op->code); > if (IS_EXPR_CODE_CLASS (TREE_CODE_CLASS (code)) > && TREE_CODE_LENGTH (code) == 1) > - tem = const_unop (res_op->code, res_op->type, res_op->ops[0]); > + tem = const_unop (code, res_op->type, res_op->ops[0]); > } >else >
Re: [PATCH] aarch64: Use type-qualified builtins for vcombine_* Neon intrinsics
Jonathan Wright writes: > Hi, > > This patch declares unsigned and polynomial type-qualified builtins for > vcombine_* Neon intrinsics. Using these builtins removes the need for > many casts in arm_neon.h. > > Bootstrapped and regression tested on aarch64-none-linux-gnu - no > issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-11-10 Jonathan Wright > > * config/aarch64/aarch64-builtins.c (TYPES_COMBINE): Delete. > (TYPES_COMBINEP): Delete. > * config/aarch64/aarch64-simd-builtins.def: Declare type- > qualified builtins for vcombine_* intrinsics. > * config/aarch64/arm_neon.h (vcombine_s8): Remove unnecessary > cast. > (vcombine_s16): Likewise. > (vcombine_s32): Likewise. > (vcombine_f32): Likewise. > (vcombine_u8): Use type-qualified builtin and remove casts. > (vcombine_u16): Likewise. > (vcombine_u32): Likewise. > (vcombine_u64): Likewise. > (vcombine_p8): Likewise. > (vcombine_p16): Likewise. > (vcombine_p64): Likewise. > (vcombine_bf16): Remove unnecessary cast. > * config/aarch64/iterators.md (VDC_I): New mode iterator. > (VDC_P): New mode iterator. > > diff --git a/gcc/config/aarch64/aarch64-builtins.c > b/gcc/config/aarch64/aarch64-builtins.c > index > f286401ff3ab01dd860ae22858ca07e364247414..7abf8747b69591815068709af42598c47d73269e > 100644 > --- a/gcc/config/aarch64/aarch64-builtins.c > +++ b/gcc/config/aarch64/aarch64-builtins.c > @@ -353,17 +353,6 @@ > aarch64_types_unsigned_shiftacc_qualifiers[SIMD_MAX_BUILTIN_ARGS] >qualifier_immediate }; > #define TYPES_USHIFTACC (aarch64_types_unsigned_shiftacc_qualifiers) > > - > -static enum aarch64_type_qualifiers > -aarch64_types_combine_qualifiers[SIMD_MAX_BUILTIN_ARGS] > - = { qualifier_none, qualifier_none, qualifier_none }; > -#define TYPES_COMBINE (aarch64_types_combine_qualifiers) > - > -static enum aarch64_type_qualifiers > -aarch64_types_combine_p_qualifiers[SIMD_MAX_BUILTIN_ARGS] > - = { qualifier_poly, qualifier_poly, qualifier_poly }; > -#define TYPES_COMBINEP (aarch64_types_combine_p_qualifiers) > - > static enum aarch64_type_qualifiers > aarch64_types_load1_qualifiers[SIMD_MAX_BUILTIN_ARGS] >= { qualifier_none, qualifier_const_pointer_map_mode }; > diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def > b/gcc/config/aarch64/aarch64-simd-builtins.def > index > 404696a71e0c1fc37cdf53fc42439a28bc9a745a..ab5f3a098f2047d0f1ba933f4418609678102c3d > 100644 > --- a/gcc/config/aarch64/aarch64-simd-builtins.def > +++ b/gcc/config/aarch64/aarch64-simd-builtins.def > @@ -43,8 +43,9 @@ > help describe the attributes (for example, pure) for the intrinsic > function. 
*/ > > - BUILTIN_VDC (COMBINE, combine, 0, AUTO_FP) > - VAR1 (COMBINEP, combine, 0, NONE, di) > + BUILTIN_VDC (BINOP, combine, 0, AUTO_FP) > + BUILTIN_VDC_I (BINOPU, combine, 0, NONE) > + BUILTIN_VDC_P (BINOPP, combine, 0, NONE) >BUILTIN_VB (BINOPP, pmul, 0, NONE) >VAR1 (BINOPP, pmull, 0, NONE, v8qi) >VAR1 (BINOPP, pmull_hi, 0, NONE, v16qi) > diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h > index > 7abd1821840f84a79c37c40a33214294b06edbc6..c374e90f31546886a519ba270113ccedd4ca7abf > 100644 > --- a/gcc/config/aarch64/arm_neon.h > +++ b/gcc/config/aarch64/arm_neon.h > @@ -5975,21 +5975,21 @@ __extension__ extern __inline int8x16_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vcombine_s8 (int8x8_t __a, int8x8_t __b) > { > - return (int8x16_t) __builtin_aarch64_combinev8qi (__a, __b); > + return __builtin_aarch64_combinev8qi (__a, __b); > } > > __extension__ extern __inline int16x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vcombine_s16 (int16x4_t __a, int16x4_t __b) > { > - return (int16x8_t) __builtin_aarch64_combinev4hi (__a, __b); > + return __builtin_aarch64_combinev4hi (__a, __b); > } > > __extension__ extern __inline int32x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vcombine_s32 (int32x2_t __a, int32x2_t __b) > { > - return (int32x4_t) __builtin_aarch64_combinev2si (__a, __b); > + return __builtin_aarch64_combinev2si (__a, __b); > } > > __extension__ extern __inline int64x2_t > @@ -6010,38 +6010,35 @@ __extension__ extern __inline float32x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vcombine_f32 (float32x2_t __a, float32x2_t __b) > { > - return (float32x4_t) __builtin_aarch64_combinev2sf (__a, __b); > + return __builtin_aarch64_combinev2sf (__a, __b); > } > > __extension__ extern __inline uint8x16_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vcombine_u8 (uint8x8_t __a, uint8x8_t __b) > { > - return (uint8x16_t) __builtin_aarch64_combinev8qi ((int8x8_t) __a, > - (int8x8_t) __b); > +
Re: [PATCH 2/4] Mark IFN_COMPLEX_MUL as commutative
On Wed, Nov 10, 2021 at 1:51 PM Richard Sandiford via Gcc-patches wrote: > > Mark IFN_COMPLEX_MUL as commutative. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? OK > Richard > > > gcc/ > * internal-fn.c (commutative_binary_fn_p): Handle IFN_COMPLEX_MUL. > > gcc/testsuite/ > * gcc.target/aarch64/sve/complex_mul_1.c: New test. > --- > gcc/internal-fn.c| 1 + > .../gcc.target/aarch64/sve/complex_mul_1.c | 16 > 2 files changed, 17 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/complex_mul_1.c > > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c > index 7b13db6dfe3..ff7d43f1801 100644 > --- a/gcc/internal-fn.c > +++ b/gcc/internal-fn.c > @@ -3829,6 +3829,7 @@ commutative_binary_fn_p (internal_fn fn) > case IFN_MULHRS: > case IFN_FMIN: > case IFN_FMAX: > +case IFN_COMPLEX_MUL: >return true; > > default: > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/complex_mul_1.c > b/gcc/testsuite/gcc.target/aarch64/sve/complex_mul_1.c > new file mode 100644 > index 000..d197e7d0d8e > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/complex_mul_1.c > @@ -0,0 +1,16 @@ > +/* { dg-options "-O2 -fgimple -fdump-tree-optimized" } */ > + > +void __GIMPLE > +foo (__SVFloat64_t x, __SVFloat64_t y, __SVFloat64_t *res1, > + __SVFloat64_t *res2) > +{ > + __SVFloat64_t a1; > + __SVFloat64_t a2; > + > + a1 = .COMPLEX_MUL (x, y); > + a2 = .COMPLEX_MUL (y, x); > + __MEM<__SVFloat64_t> (res1) = a1; > + __MEM<__SVFloat64_t> (res2) = a2; > +} > + > +/* { dg-final { scan-tree-dump-times {\.COMPLEX_MUL} 1 "optimized" } } */ > -- > 2.25.1 >
Re: [PATCH] aarch64: Use type-qualified builtins for vget_low/high intrinsics
Jonathan Wright writes: > Hi, > > This patch declares unsigned and polynomial type-qualified builtins for > vget_low_*/vget_high_* Neon intrinsics. Using these builtins removes > the need for many casts in arm_neon.h. > > Bootstrapped and regression tested on aarch64-none-linux-gnu - no > issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-11-10 Jonathan Wright > > * config/aarch64/aarch64-builtins.c (TYPES_UNOPP): Define. > * config/aarch64/aarch64-simd-builtins.def: Declare type- > qualified builtins for vget_low/high. > * config/aarch64/arm_neon.h (vget_low_p8): Use type-qualified > builtin and remove casts. > (vget_low_p16): Likewise. > (vget_low_p64): Likewise. > (vget_low_u8): Likewise. > (vget_low_u16): Likewise. > (vget_low_u32): Likewise. > (vget_low_u64): Likewise. > (vget_high_p8): Likewise. > (vget_high_p16): Likewise. > (vget_high_p64): Likewise. > (vget_high_u8): Likewise. > (vget_high_u16): Likewise. > (vget_high_u32): Likewise. > (vget_high_u64): Likewise. > * config/aarch64/iterators.md (VQ_P): New mode iterator. > > diff --git a/gcc/config/aarch64/aarch64-builtins.c > b/gcc/config/aarch64/aarch64-builtins.c > index > 7abf8747b69591815068709af42598c47d73269e..3edc2f55e571c1a34a24add842c47b130d900cf6 > 100644 > --- a/gcc/config/aarch64/aarch64-builtins.c > +++ b/gcc/config/aarch64/aarch64-builtins.c > @@ -204,6 +204,10 @@ aarch64_types_unopu_qualifiers[SIMD_MAX_BUILTIN_ARGS] >= { qualifier_unsigned, qualifier_unsigned }; > #define TYPES_UNOPU (aarch64_types_unopu_qualifiers) > static enum aarch64_type_qualifiers > +aarch64_types_unopp_qualifiers[SIMD_MAX_BUILTIN_ARGS] > + = { qualifier_poly, qualifier_poly }; > +#define TYPES_UNOPP (aarch64_types_unopp_qualifiers) > +static enum aarch64_type_qualifiers > aarch64_types_unopus_qualifiers[SIMD_MAX_BUILTIN_ARGS] >= { qualifier_unsigned, qualifier_none }; > #define TYPES_UNOPUS (aarch64_types_unopus_qualifiers) > diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def > b/gcc/config/aarch64/aarch64-simd-builtins.def > index > ab5f3a098f2047d0f1ba933f4418609678102c3d..08d6bbe635424217687a429709c696c3282feea0 > 100644 > --- a/gcc/config/aarch64/aarch64-simd-builtins.def > +++ b/gcc/config/aarch64/aarch64-simd-builtins.def > @@ -62,8 +62,12 @@ > >/* Implemented by aarch64_get_low. */ >BUILTIN_VQMOV (UNOP, get_low, 0, AUTO_FP) > + BUILTIN_VQ_I (UNOPU, get_low, 0, NONE) > + BUILTIN_VQ_P (UNOPP, get_low, 0, NONE) >/* Implemented by aarch64_get_high. */ >BUILTIN_VQMOV (UNOP, get_high, 0, AUTO_FP) > + BUILTIN_VQ_I (UNOPU, get_high, 0, NONE) > + BUILTIN_VQ_P (UNOPP, get_high, 0, NONE) > >/* Implemented by aarch64_qshl. 
*/ >BUILTIN_VSDQ_I (BINOP, sqshl, 0, NONE) > diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h > index > c374e90f31546886a519ba270113ccedd4ca7abf..6137d53297863aaad0cad31c7eb6eef24bc4316a > 100644 > --- a/gcc/config/aarch64/arm_neon.h > +++ b/gcc/config/aarch64/arm_neon.h > @@ -5799,21 +5799,21 @@ __extension__ extern __inline poly8x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vget_low_p8 (poly8x16_t __a) > { > - return (poly8x8_t) __builtin_aarch64_get_lowv16qi ((int8x16_t) __a); > + return __builtin_aarch64_get_lowv16qi_pp (__a); > } > > __extension__ extern __inline poly16x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vget_low_p16 (poly16x8_t __a) > { > - return (poly16x4_t) __builtin_aarch64_get_lowv8hi ((int16x8_t) __a); > + return __builtin_aarch64_get_lowv8hi_pp (__a); > } > > __extension__ extern __inline poly64x1_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vget_low_p64 (poly64x2_t __a) > { > - return (poly64x1_t) __builtin_aarch64_get_lowv2di ((int64x2_t) __a); > + return (poly64x1_t) __builtin_aarch64_get_lowv2di_pp (__a); I think we could define the intrinsics such that the return cast isn't needed either. poly64x1_t has the same mode (DI) as the scalar type, so it should “just” be a case of using qualifiers to pick the x1 vector type instead of the scalar type. Thanks, Richard > } > > __extension__ extern __inline int8x8_t > @@ -5848,28 +5848,28 @@ __extension__ extern __inline uint8x8_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vget_low_u8 (uint8x16_t __a) > { > - return (uint8x8_t) __builtin_aarch64_get_lowv16qi ((int8x16_t) __a); > + return __builtin_aarch64_get_lowv16qi_uu (__a); > } > > __extension__ extern __inline uint16x4_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vget_low_u16 (uint16x8_t __a) > { > - return (uint16x4_t) __builtin_aarch64_get_lowv8hi ((int16x8_t) __a); > + return __builtin_aarch64_get_lowv8hi_uu (__a); > } > > __extension__ extern __inlin
Re: [PATCH 1/4] Canonicalize argument order for commutative functions
On Wed, Nov 10, 2021 at 1:50 PM Richard Sandiford via Gcc-patches wrote: > > This patch uses information about internal functions to canonicalize > the argument order of calls. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? OK. Note the gimple_resimplifyN functions also canonicalize operand order, currently for is_tree_code only: /* Canonicalize operand order. */ bool canonicalized = false; if (res_op->code.is_tree_code () && (TREE_CODE_CLASS ((enum tree_code) res_op->code) == tcc_comparison || commutative_tree_code (res_op->code)) && tree_swap_operands_p (res_op->ops[0], res_op->ops[1])) { std::swap (res_op->ops[0], res_op->ops[1]); if (TREE_CODE_CLASS ((enum tree_code) res_op->code) == tcc_comparison) res_op->code = swap_tree_comparison (res_op->code); canonicalized = true; } that's maybe not the best place. The function assumes the operands are already valueized, so it maybe should be valueization that does the canonicalization - but I think doing it elsewhere made operand order unreliable (we do end up with non-canonical order in the IL sometimes). So maybe you should amend the code in resimplifyN as well. Richard. > Richard > > > gcc/ > * gimple-fold.c: Include internal-fn.h. > (fold_stmt_1): If a function maps to an internal one, use > first_commutative_argument to canonicalize the order of > commutative arguments. > > gcc/testsuite/ > * gcc.dg/fmax-fmin-1.c: New test. > --- > gcc/gimple-fold.c | 25 ++--- > gcc/testsuite/gcc.dg/fmax-fmin-1.c | 18 ++ > 2 files changed, 40 insertions(+), 3 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/fmax-fmin-1.c > > diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c > index a937f130815..6a7d4507c89 100644 > --- a/gcc/gimple-fold.c > +++ b/gcc/gimple-fold.c > @@ -69,6 +69,7 @@ along with GCC; see the file COPYING3. If not see > #include "varasm.h" > #include "memmodel.h" > #include "optabs.h" > +#include "internal-fn.h" > > enum strlen_range_kind { >/* Compute the exact constant string length. 
*/ > @@ -6140,18 +6141,36 @@ fold_stmt_1 (gimple_stmt_iterator *gsi, bool inplace, > tree (*valueize) (tree)) >break; > case GIMPLE_CALL: >{ > - for (i = 0; i < gimple_call_num_args (stmt); ++i) > + gcall *call = as_a (stmt); > + for (i = 0; i < gimple_call_num_args (call); ++i) > { > - tree *arg = gimple_call_arg_ptr (stmt, i); > + tree *arg = gimple_call_arg_ptr (call, i); > if (REFERENCE_CLASS_P (*arg) > && maybe_canonicalize_mem_ref_addr (arg)) > changed = true; > } > - tree *lhs = gimple_call_lhs_ptr (stmt); > + tree *lhs = gimple_call_lhs_ptr (call); > if (*lhs > && REFERENCE_CLASS_P (*lhs) > && maybe_canonicalize_mem_ref_addr (lhs)) > changed = true; > + if (*lhs) > + { > + combined_fn cfn = gimple_call_combined_fn (call); > + internal_fn ifn = associated_internal_fn (cfn, TREE_TYPE (*lhs)); > + int opno = first_commutative_argument (ifn); > + if (opno >= 0) > + { > + tree arg1 = gimple_call_arg (call, opno); > + tree arg2 = gimple_call_arg (call, opno + 1); > + if (tree_swap_operands_p (arg1, arg2)) > + { > + gimple_call_set_arg (call, opno, arg2); > + gimple_call_set_arg (call, opno + 1, arg1); > + changed = true; > + } > + } > + } > break; >} > case GIMPLE_ASM: > diff --git a/gcc/testsuite/gcc.dg/fmax-fmin-1.c > b/gcc/testsuite/gcc.dg/fmax-fmin-1.c > new file mode 100644 > index 000..e7e0518d8bb > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/fmax-fmin-1.c > @@ -0,0 +1,18 @@ > +/* { dg-options "-O -fdump-tree-optimized" } */ > + > +void > +f1 (double *res, double x, double y) > +{ > + res[0] = __builtin_fmax (x, y); > + res[1] = __builtin_fmax (y, x); > +} > + > +void > +f2 (double *res, double x, double y) > +{ > + res[0] = __builtin_fmin (x, y); > + res[1] = __builtin_fmin (y, x); > +} > + > +/* { dg-final { scan-tree-dump-times {__builtin_fmax} 1 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times {__builtin_fmin} 1 "optimized" } } */ > -- > 2.25.1 >
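For reference, a rough sketch of what amending the gimple_resimplifyN canonicalization for internal functions could look like, next to the existing tree_code handling quoted above (an untested sketch only, not part of the posted patch; it reuses internal_fn_p/as_internal_fn as elsewhere in gimple-match-head.c and the first_commutative_argument query used above, for the two-operand case):

  if (res_op->code.is_fn_code ())
    {
      combined_fn cfn = combined_fn (res_op->code);
      if (internal_fn_p (cfn))
	{
	  /* For a commutative internal function, put the operands into
	     canonical order, mirroring what fold_stmt_1 now does for the
	     call arguments.  */
	  int opno = first_commutative_argument (as_internal_fn (cfn));
	  if (opno >= 0
	      && tree_swap_operands_p (res_op->ops[opno], res_op->ops[opno + 1]))
	    {
	      std::swap (res_op->ops[opno], res_op->ops[opno + 1]);
	      canonicalized = true;
	    }
	}
    }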
Re: [PATCH] Allow loop header copying when first iteration condition is known.
On Thu, Nov 11, 2021 at 11:33 AM Aldy Hernandez wrote: > > On Thu, Nov 11, 2021 at 8:30 AM Richard Biener > wrote: > > > > On Wed, Nov 10, 2021 at 9:42 PM Jeff Law wrote: > > > > > > > > > > > > On 11/10/2021 11:20 AM, Aldy Hernandez via Gcc-patches wrote: > > > > As discussed in the PR, the loop header copying pass avoids doing so > > > > when optimizing for size. However, sometimes we can determine the > > > > loop entry conditional statically for the first iteration of the loop. > > > > > > > > This patch uses the path solver to determine the outgoing edge > > > > out of preheader->header->xx. If so, it allows header copying. Doing > > > > this in the loop optimizer saves us from doing gymnastics in the > > > > threader which doesn't have the context to determine if a loop > > > > transformation is profitable. > > > > > > > > I am only returning true in entry_loop_condition_is_static for > > > > a true conditional. Technically a false conditional is also > > > > provably static, but allowing any boolean value causes a regression > > > > in gfortran.dg/vector_subscript_1.f90. > > > > > > > > I would have preferred not passing around the query object, but the > > > > layout of pass_ch and should_duplicate_loop_header_p make it a bit > > > > awkward to get it right without an outright refactor to the > > > > pass. > > > > > > > > Tested on x86-64 Linux. > > > > > > > > OK? > > > > > > > > gcc/ChangeLog: > > > > > > > > PR tree-optimization/102906 > > > > * tree-ssa-loop-ch.c (entry_loop_condition_is_static): New. > > > > (should_duplicate_loop_header_p): Call > > > > entry_loop_condition_is_static. > > > > (class ch_base): Add m_ranger and m_query. > > > > (ch_base::copy_headers): Pass m_query to > > > > entry_loop_condition_is_static. > > > > (pass_ch::execute): Allocate and deallocate m_ranger and > > > > m_query. > > > > (pass_ch_vect::execute): Same. > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > * gcc.dg/tree-ssa/pr102906.c: New test. > > > OK. It also makes a nice little example of how to use a Ranger within > > > an existing pass. > > > > Note if you just test for the condition to be true it will only catch 50% > > of the desired cases since we have no idea whether the 'true' edge > > is the edge existing the loop or the edge remaining in the loop. > > For loop header copying we like to resolve statically to the edge > > remaining in the loop, so you want > > Ahh, I figured there was some block shuffling needed. > > I was cautious not to touch much because of the > gfortran.dg/vector_subscript_1.f90 regression, but now I see that the > test fails for all optimization levels except -Os. With this fix we > properly fail for all levels. I assume this is expected ;-). > > > > > extract_true_false_edges_from_block (gimple_bb (last), &true_e, &false_e); > > > > /* If neither edge is the exit edge this is not a case we'd like to > >special-case. */ > > if (!loop_exit_edge_p (l, true_e) && !loop_exit_edge_p (l, false_e)) > > return false; > > > > tree desired_static_value; > > if (loop_exit_edge_p (l, true_e)) > > desired_static_value = boolean_false_node; > > else > > desired_static_value = boolean_true_node; > > > > and test for desired_static_value. > > Thanks for the code! > > OK pending tests? OK, thanks! Richard.
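Putting the posted patch together with the suggested edge handling, a rough sketch of how the final predicate could look (the path_range_query plumbing here is an assumption, not the committed code; the edge selection is taken from the suggestion above):

static bool
entry_loop_condition_is_static (class loop *l, path_range_query *query)
{
  edge e = loop_preheader_edge (l);
  gcond *last = safe_dyn_cast <gcond *> (last_stmt (e->dest));
  if (!last)
    return false;

  edge true_e, false_e;
  extract_true_false_edges_from_block (e->dest, &true_e, &false_e);

  /* If neither edge is the exit edge this is not a case we'd like to
     special-case.  */
  if (!loop_exit_edge_p (l, true_e) && !loop_exit_edge_p (l, false_e))
    return false;

  /* Resolve statically to the edge remaining in the loop: the condition
     must be false when the true edge exits the loop, true otherwise.  */
  tree desired_static_value;
  if (loop_exit_edge_p (l, true_e))
    desired_static_value = boolean_false_node;
  else
    desired_static_value = boolean_true_node;

  /* Assumed detail: ask the path query (preheader -> header) what the
     condition evaluates to on the first iteration.  */
  int_range<2> r;
  if (!query->range_of_stmt (r, last))
    return false;
  return r == int_range<2> (desired_static_value, desired_static_value);
}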
Re: [PATCH] rs6000/doc: Rename future cpu with power10
on 2021/11/10 下午6:03, Segher Boessenkool wrote: > Hi! > > On Wed, Nov 10, 2021 at 05:39:27PM +0800, Kewen.Lin wrote: >> @@ -27779,10 +27779,10 @@ Enable/disable the @var{__float128} keyword for >> IEEE 128-bit floating point >> and use either software emulation for IEEE 128-bit floating point or >> hardware instructions. >> >> -The VSX instruction set (@option{-mvsx}, @option{-mcpu=power7}, >> -@option{-mcpu=power8}), or @option{-mcpu=power9} must be enabled to >> -use the IEEE 128-bit floating point support. The IEEE 128-bit >> -floating point support only works on PowerPC Linux systems. >> +The VSX instruction set (@option{-mvsx}, @option{-mcpu=power7} (or later >> +@var{cpu_type})) must be enabled to use the IEEE 128-bit floating point >> +support. The IEEE 128-bit floating point support only works on PowerPC >> +Linux systems. > > I'd just say -mvsx. This is default on for -mcpu=power7 and later, and > cannot be enabled elsewhere, but that is beside the point. > > If you say more than the essentials here it becomes harder to read > (simply because there is more to read then), harder to find what you > are looking for, and harder to keep it updated if things change (like > what this patch is for :-) ) > > The part about "works only on Linux" isn't quite true. "Is only > supported on Linux" is a bit better. > >> Generate (do not generate) addressing modes using prefixed load and >> -store instructions when the option @option{-mcpu=future} is used. >> +store instructions. The @option{-mprefixed} option requires that >> +the option @option{-mcpu=power10} (or later @var{cpu_type}) is enabled. > > Just "or later" please. The "CPU_TYPE" thing is local to the -mcpu= > description, let's not refer to it from elsewhere. > >> @item -mmma >> @itemx -mno-mma >> @opindex mmma >> @opindex mno-mma >> -Generate (do not generate) the MMA instructions when the option >> -@option{-mcpu=future} is used. >> +Generate (do not generate) the MMA instructions. The @option{-mma} >> +option requires that the option @option{-mcpu=power10} (or later >> +@var{cpu_type}) is enabled. > > (once more) > > Okay for trunk with those changes. Thanks! > > Thanks! All comments are addressed and committed as r12-5143. BR, Kewen > Segher >
Re: Use modref summary to DSE calls to non-pure functions
On Wed, Nov 10, 2021 at 1:43 PM Jan Hubicka via Gcc-patches wrote: > > Hi, > this patch implements DSE using modref summaries: if function has no side > effects > besides storing to memory pointed to by its argument and if we can prove > those stores > to be dead, we can optimize out. So we handle for example: > > volatile int *ptr; > struct a { > int a,b,c; > } a; > __attribute__((noinline)) > static int init (struct a*a) > { > a->a=0; > a->b=1; > } > __attribute__((noinline)) > static int use (struct a*a) > { > if (a->c != 3) > *ptr=5; > } > > void > main(void) > { > struct a a; > init (&a); > a.c=3; > use (&a); > } > > And optimize out call to init (&a). > > We work quite hard to inline such constructors and this patch is only > effective if inlining did not happen (for whatever reason). Still, we > optimize about 26 calls building tramp3d and about 70 calls during > bootstrap (mostly ctors of poly_int). During bootstrap most removal > happens early and we would inline the ctors unless we decide to optimize > for size. 1 call per cc1* binary is removed late during LTO build. > > This is more frequent in codebases with higher abstraction penalty, with > -Os or with profile feedback in sections optimized for size. I also hope > we will be able to CSE such calls and that would make DSE more > important. > > Bootstrapped/regtested x86_64-linux, OK? > > gcc/ChangeLog: > > * tree-ssa-alias.c (ao_ref_alias_ptr_type): Export. ao_ref_init_from_ptr_and_range it is > * tree-ssa-alias.h (ao_ref_init_from_ptr_and_range): Declare. > * tree-ssa-dse.c (dse_optimize_stmt): Rename to ... > (dse_optimize_store): ... this; > (dse_optimize_call): New function. > (pass_dse::execute): Use dse_optimize_call and update > call to dse_optimize_store. > > gcc/testsuite/ChangeLog: > > * gcc.dg/tree-ssa/modref-dse-1.c: New test. > * gcc.dg/tree-ssa/modref-dse-2.c: New test. > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-1.c > b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-1.c > new file mode 100644 > index 000..e78693b349a > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-1.c > @@ -0,0 +1,28 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-dse1" } */ > +volatile int *ptr; > +struct a { > + int a,b,c; > +} a; > +__attribute__((noinline)) > +static int init (struct a*a) > +{ > + a->a=0; > + a->b=1; > +} > +__attribute__((noinline)) > +static int use (struct a*a) > +{ > + if (a->c != 3) > + *ptr=5; > +} > + > +void > +main(void) > +{ > + struct a a; > + init (&a); > + a.c=3; > + use (&a); > +} > +/* { dg-final { scan-tree-dump "Deleted dead store: init" "dse1" } } */ > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-2.c > b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-2.c > new file mode 100644 > index 000..99c8ceb8127 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-2.c > @@ -0,0 +1,31 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-dse2 -fno-ipa-sra -fno-ipa-cp" } */ > +volatile int *ptr; > +struct a { > + int a,b,c; > +} a; > +__attribute__((noinline)) > +static int init (struct a*a) > +{ > + a->a=0; > + a->b=1; > + a->c=1; > +} > +__attribute__((noinline)) > +static int use (struct a*a) > +{ > + if (a->c != 3) > + *ptr=5; > +} > + > +void > +main(void) > +{ > + struct a a; > + init (&a); > + a.c=3; > + use (&a); > +} > +/* Only DSE2 is tracking live bytes needed to figure out that store to c is > + also dead above. 
*/ > +/* { dg-final { scan-tree-dump "Deleted dead store: init" "dse2" } } */ > diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c > index eabf6805f2b..affb5d40d4b 100644 > --- a/gcc/tree-ssa-alias.c > +++ b/gcc/tree-ssa-alias.c > @@ -782,7 +782,7 @@ ao_ref_alias_ptr_type (ao_ref *ref) > The access is assumed to be only to or after of the pointer target > adjusted > by the offset, not before it (even in the case RANGE_KNOWN is false). */ > > -static void > +void > ao_ref_init_from_ptr_and_range (ao_ref *ref, tree ptr, > bool range_known, > poly_int64 offset, > diff --git a/gcc/tree-ssa-alias.h b/gcc/tree-ssa-alias.h > index 275dea10397..c2e28a74999 100644 > --- a/gcc/tree-ssa-alias.h > +++ b/gcc/tree-ssa-alias.h > @@ -111,6 +111,8 @@ ao_ref::max_size_known_p () const > /* In tree-ssa-alias.c */ > extern void ao_ref_init (ao_ref *, tree); > extern void ao_ref_init_from_ptr_and_size (ao_ref *, tree, tree); > +void ao_ref_init_from_ptr_and_range (ao_ref *, tree, bool, > +poly_int64, poly_int64, poly_int64); > extern tree ao_ref_base (ao_ref *); > extern alias_set_type ao_ref_alias_set (ao_ref *); > extern alias_set_type ao_
[PATCH 01/15] frv: Fix non-robust split condition in define_insn_and_split
This patch is to fix some non-robust split conditions in some define_insn_and_splits, to make each of them applied on top of the corresponding condition for define_insn part, otherwise the splitting could perform unexpectedly. gcc/ChangeLog: * config/frv/frv.md (*abssi2_internal, *minmax_si_signed, *minmax_si_unsigned, *minmax_sf, *minmax_df): Fix split condition. --- gcc/config/frv/frv.md | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/gcc/config/frv/frv.md b/gcc/config/frv/frv.md index a2aa1b2d2ac..fea6dedc53d 100644 --- a/gcc/config/frv/frv.md +++ b/gcc/config/frv/frv.md @@ -4676,7 +4676,7 @@ (define_insn_and_split "*abssi2_internal" (clobber (match_operand:CC_CCR 3 "icr_operand" "=v,v"))] "TARGET_COND_MOVE" "#" - "reload_completed" + "&& reload_completed" [(match_dup 4)] "operands[4] = frv_split_abs (operands);" [(set_attr "length" "12,16") @@ -4717,7 +4717,7 @@ (define_insn_and_split "*minmax_si_signed" (clobber (match_operand:CC_CCR 5 "icr_operand" "=v,v,v"))] "TARGET_COND_MOVE" "#" - "reload_completed" + "&& reload_completed" [(match_dup 6)] "operands[6] = frv_split_minmax (operands);" [(set_attr "length" "12,12,16") @@ -4758,7 +4758,7 @@ (define_insn_and_split "*minmax_si_unsigned" (clobber (match_operand:CC_CCR 5 "icr_operand" "=v,v,v"))] "TARGET_COND_MOVE" "#" - "reload_completed" + "&& reload_completed" [(match_dup 6)] "operands[6] = frv_split_minmax (operands);" [(set_attr "length" "12,12,16") @@ -4799,7 +4799,7 @@ (define_insn_and_split "*minmax_sf" (clobber (match_operand:CC_CCR 5 "fcr_operand" "=w,w,w"))] "TARGET_COND_MOVE && TARGET_HARD_FLOAT" "#" - "reload_completed" + "&& reload_completed" [(match_dup 6)] "operands[6] = frv_split_minmax (operands);" [(set_attr "length" "12,12,16") @@ -4840,7 +4840,7 @@ (define_insn_and_split "*minmax_df" (clobber (match_operand:CC_CCR 5 "fcr_operand" "=w,w,w"))] "TARGET_COND_MOVE && TARGET_HARD_FLOAT && TARGET_DOUBLE" "#" - "reload_completed" + "&& reload_completed" [(match_dup 6)] "operands[6] = frv_split_minmax (operands);" [(set_attr "length" "12,12,16") -- 2.27.0
[PATCH 00/15] Fix non-robust split condition in define_insn_and_split
Hi, This trivial patch series is a secondary product of the previous investigation into how many define_insn_and_split cases have a split condition that isn't applied on top of the condition for the define_insn part and doesn't contain it, when there were some discussions on whether we should warn for an empty split condition or join both conditions implicitly, etc. (See the threads[1][2]). For some of the investigated define_insn_and_splits, the corresponding split condition is suspected to be non-robust, especially where the split condition has only reload_completed. Lacking a good understanding of the related ports and the context of the code, I could be wrong. But I think it may be a good idea to raise them and get them either fixed or clarified. It would also be good preparation for the possible joining of both conditions in the future. For some ports, with the proposed fixes applied, the split conditions in all define_insn_and_splits will either have the explicit leading "&&" or fully contain the condition for the define_insn part. In other words, implicitly joining the conditions would be a no-op for this kind of port, and we wouldn't need any other checks/fixes for it. [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571647.html [2] https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572120.html BR, Kewen - Kewen Lin (15): frv: Fix non-robust split condition in define_insn_and_split m32c: Fix non-robust split condition in define_insn_and_split rx: Fix non-robust split condition in define_insn_and_split s390: Fix non-robust split condition in define_insn_and_split v850: Fix non-robust split condition in define_insn_and_split visium: Fix non-robust split condition in define_insn_and_split xtensa: Fix non-robust split condition in define_insn_and_split alpha: Fix non-robust split condition in define_insn_and_split arm: Fix non-robust split condition in define_insn_and_split bfin: Fix non-robust split condition in define_insn_and_split csky: Fix non-robust split condition in define_insn_and_split i386: Fix non-robust split condition in define_insn_and_split ia64: Fix non-robust split condition in define_insn_and_split mips: Fix non-robust split condition in define_insn_and_split sh: Fix non-robust split condition in define_insn_and_split gcc/config/alpha/alpha.md | 4 +-- gcc/config/arm/arm.md | 2 +- gcc/config/bfin/bfin.md | 4 +-- gcc/config/csky/csky.md | 28 ++--- gcc/config/frv/frv.md | 10 gcc/config/i386/i386.md | 20 +++ gcc/config/ia64/vect.md | 4 +-- gcc/config/m32c/cond.md | 4 +-- gcc/config/mips/mips.md | 4 +-- gcc/config/rx/rx.md | 2 +- gcc/config/s390/s390.md | 2 +- gcc/config/s390/vector.md | 4 +-- gcc/config/sh/sh.md | 8 +++--- gcc/config/v850/v850.md | 8 +++--- gcc/config/visium/visium.md | 50 ++--- gcc/config/xtensa/xtensa.md | 4 +-- 16 files changed, 79 insertions(+), 79 deletions(-) -- 2.27.0
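To illustrate with a made-up pattern (not taken from any port): in a define_insn_and_split, a split condition starting with "&&" is appended to the insn condition, while a bare split condition replaces it entirely, so the resulting define_split can match independently of the define_insn condition:

(define_insn_and_split "*foo_internal"
  [(set (match_operand:SI 0 "register_operand" "=r")
	(not:SI (match_operand:SI 1 "register_operand" "r")))]
  "TARGET_FOO"                 ;; insn condition
  "#"
  "reload_completed"           ;; non-robust: TARGET_FOO is not implied here
  [(set (match_dup 0) (not:SI (match_dup 1)))]
  "")

With the proposed fixes the split condition becomes "&& reload_completed", which is effectively "TARGET_FOO && reload_completed", so splitting can only happen under the same conditions the insn was matched with.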
[PATCH 03/15] rx: Fix non-robust split condition in define_insn_and_split
This patch is to fix one non-robust split condition, to make it applied on top of the corresponding condition for define_insn part, otherwise the splitting could perform unexpectedly. gcc/ChangeLog: * config/rx/rx.md (cstoresf4): Fix split condition. --- gcc/config/rx/rx.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/rx/rx.md b/gcc/config/rx/rx.md index b76fce97bdc..c5297685a38 100644 --- a/gcc/config/rx/rx.md +++ b/gcc/config/rx/rx.md @@ -714,7 +714,7 @@ (define_insn_and_split "cstoresf4" (match_operand:SF 3 "rx_source_operand" "rFQ")]))] "ALLOW_RX_FPU_INSNS" "#" - "reload_completed" + "&& reload_completed" [(const_int 0)] { rtx flags, x; -- 2.27.0
[PATCH 02/15] m32c: Fix non-robust split condition in define_insn_and_split
This patch is to fix some non-robust split conditions in some define_insn_and_splits, to make each of them applied on top of the corresponding condition for define_insn part, otherwise the splitting could perform unexpectedly. gcc/ChangeLog: * config/m32c/cond.md (stzx_reversed_, movhicc__): Fix split condition. --- gcc/config/m32c/cond.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/m32c/cond.md b/gcc/config/m32c/cond.md index b80b10320fb..ce6493fc9f6 100644 --- a/gcc/config/m32c/cond.md +++ b/gcc/config/m32c/cond.md @@ -106,7 +106,7 @@ (define_insn_and_split "stzx_reversed_" (match_operand:QHI 2 "const_int_operand" "")))] "(TARGET_A24 || GET_MODE (operands[0]) == QImode) && reload_completed" "#" - "" + "&& 1" [(set (match_dup 0) (if_then_else:QHI (eq (reg:CC FLG_REGNO) (const_int 0)) (match_dup 2) @@ -230,7 +230,7 @@ (define_insn_and_split "movhicc__" (match_operand:HI 4 "const_int_operand" "")))] "TARGET_A24" "#" - "reload_completed" + "&& reload_completed" [(set (reg:CC FLG_REGNO) (compare (match_dup 1) (match_dup 2))) -- 2.27.0
[PATCH 04/15] s390: Fix non-robust split condition in define_insn_and_split
This patch is to fix some non-robust split conditions in some define_insn_and_splits, to make each of them applied on top of the corresponding condition for define_insn part, otherwise the splitting could perform unexpectedly. gcc/ChangeLog: * config/s390/s390.md (*cstorecc_z13): Fix split condition. * config/s390/vector.md (fprx2_to_tf, tf_to_fprx2): Likewise. --- gcc/config/s390/s390.md | 2 +- gcc/config/s390/vector.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 4debdcd1247..1d66c30b9d5 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -6941,7 +6941,7 @@ (define_insn_and_split "*cstorecc_z13" (match_operand 3 "const_int_operand" "")]))] "TARGET_Z13" "#" - "reload_completed" + "&& reload_completed" [(set (match_dup 0) (const_int 0)) (set (match_dup 0) (if_then_else:GPR diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md index 1ed1d0665d4..8aa4e82c28d 100644 --- a/gcc/config/s390/vector.md +++ b/gcc/config/s390/vector.md @@ -641,7 +641,7 @@ (define_insn_and_split "fprx2_to_tf" "@ vmrhg\t%v0,%1,%N1 #" - "!(MEM_P (operands[0]) && MEM_VOLATILE_P (operands[0]))" + "&& !(MEM_P (operands[0]) && MEM_VOLATILE_P (operands[0]))" [(set (match_dup 2) (match_dup 3)) (set (match_dup 4) (match_dup 5))] { @@ -916,7 +916,7 @@ (define_insn_and_split "tf_to_fprx2" (subreg:FPRX2 (match_operand:TF 1 "general_operand" "v,AR") 0))] "TARGET_VXE" "#" - "!(MEM_P (operands[1]) && MEM_VOLATILE_P (operands[1]))" + "&& !(MEM_P (operands[1]) && MEM_VOLATILE_P (operands[1]))" [(set (match_dup 2) (match_dup 3)) (set (match_dup 4) (match_dup 5))] { -- 2.27.0
[PATCH 05/15] v850: Fix non-robust split condition in define_insn_and_split
This patch is to fix some non-robust split conditions in some define_insn_and_splits, to make each of them applied on top of the corresponding condition for define_insn part, otherwise the splitting could perform unexpectedly. gcc/ChangeLog: * config/v850/v850.md (cbranchsf4, cbranchdf4, *movsicc_normal, *movsicc_reversed): Fix split condition. --- gcc/config/v850/v850.md | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/gcc/config/v850/v850.md b/gcc/config/v850/v850.md index 872f17913de..d4a953c6bdb 100644 --- a/gcc/config/v850/v850.md +++ b/gcc/config/v850/v850.md @@ -374,7 +374,7 @@ (define_insn_and_split "cbranchsf4" (pc)))] "TARGET_USE_FPU" "#" - "reload_completed" + "&& reload_completed" [(set (match_dup 4) (match_dup 5)) (set (pc) (if_then_else (match_dup 6) @@ -428,7 +428,7 @@ (define_insn_and_split "cbranchdf4" (pc)))] "TARGET_USE_FPU" "#" - "reload_completed" + "&& reload_completed" ;; How to get the mode here? [(set (match_dup 4) (match_dup 5)) (set (pc) @@ -1210,7 +1210,7 @@ (define_insn_and_split "*movsicc_normal" (match_operand:SI 3 "reg_or_0_operand" "rI")))] "(TARGET_V850E_UP)" "#" - "reload_completed" + "&& reload_completed" [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 4) (match_dup 5))) (set (match_dup 0) @@ -1229,7 +1229,7 @@ (define_insn_and_split "*movsicc_reversed" (match_operand:SI 3 "reg_or_0_operand" "rJ")))] "(TARGET_V850E_UP)" "#" - "reload_completed" + "&& reload_completed" [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 4) (match_dup 5))) (set (match_dup 0) -- 2.27.0
[PATCH 07/15] xtensa: Fix non-robust split condition in define_insn_and_split
This patch is to fix some non-robust split conditions in some define_insn_and_splits, to make each of them applied on top of the corresponding condition for define_insn part, otherwise the splitting could perform unexpectedly. gcc/ChangeLog: * config/xtensa/xtensa.md (movdi_internal, movdf_internal): Fix split condition. --- gcc/config/xtensa/xtensa.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md index cdf22f14b94..e0bf720d6e0 100644 --- a/gcc/config/xtensa/xtensa.md +++ b/gcc/config/xtensa/xtensa.md @@ -779,7 +779,7 @@ (define_insn_and_split "movdi_internal" "register_operand (operands[0], DImode) || register_operand (operands[1], DImode)" "#" - "reload_completed" + "&& reload_completed" [(set (match_dup 0) (match_dup 2)) (set (match_dup 1) (match_dup 3))] { @@ -1053,7 +1053,7 @@ (define_insn_and_split "movdf_internal" "register_operand (operands[0], DFmode) || register_operand (operands[1], DFmode)" "#" - "reload_completed" + "&& reload_completed" [(set (match_dup 0) (match_dup 2)) (set (match_dup 1) (match_dup 3))] { -- 2.27.0
[PATCH 06/15] visium: Fix non-robust split condition in define_insn_and_split
This patch is to fix some non-robust split conditions in some define_insn_and_splits, to make each of them applied on top of the corresponding condition for define_insn part, otherwise the splitting could perform unexpectedly. gcc/ChangeLog: * config/visium/visium.md (*add3_insn, *addsi3_insn, *addi3_insn, *sub3_insn, *subsi3_insn, *subdi3_insn, *neg2_insn, *negdi2_insn, *and3_insn, *ior3_insn, *xor3_insn, *one_cmpl2_insn, *ashl3_insn, *ashr3_insn, *lshr3_insn, *trunchiqi2_insn, *truncsihi2_insn, *truncdisi2_insn, *extendqihi2_insn, *extendqisi2_insn, *extendhisi2_insn, *extendsidi2_insn, *zero_extendqihi2_insn, *zero_extendqisi2_insn, *zero_extendsidi2_insn): Fix split condition. --- gcc/config/visium/visium.md | 50 ++--- 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/gcc/config/visium/visium.md b/gcc/config/visium/visium.md index 83ccf088124..ca2234bf253 100644 --- a/gcc/config/visium/visium.md +++ b/gcc/config/visium/visium.md @@ -792,7 +792,7 @@ (define_insn_and_split "*add3_insn" (match_operand:QHI 2 "register_operand" "r")))] "ok_for_simple_arith_logic_operands (operands, mode)" "#" - "reload_completed" + "&& reload_completed" [(parallel [(set (match_dup 0) (plus:QHI (match_dup 1) (match_dup 2))) (clobber (reg:CC R_FLAGS))])] @@ -850,7 +850,7 @@ (define_insn_and_split "*addsi3_insn" (match_operand:SI 2 "add_operand" " L,r,J")))] "ok_for_simple_arith_logic_operands (operands, SImode)" "#" - "reload_completed" + "&& reload_completed" [(parallel [(set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2))) (clobber (reg:CC R_FLAGS))])] @@ -912,7 +912,7 @@ (define_insn_and_split "*addi3_insn" (match_operand:DI 2 "add_operand" " L,J, r")))] "ok_for_simple_arith_logic_operands (operands, DImode)" "#" - "reload_completed" + "&& reload_completed" [(const_int 0)] { visium_split_double_add (PLUS, operands[0], operands[1], operands[2]); @@ -1007,7 +1007,7 @@ (define_insn_and_split "*sub3_insn" (match_operand:QHI 2 "register_operand" "r")))] "ok_for_simple_arith_logic_operands (operands, mode)" "#" - "reload_completed" + "&& reload_completed" [(parallel [(set (match_dup 0) (minus:QHI (match_dup 1) (match_dup 2))) (clobber (reg:CC R_FLAGS))])] @@ -1064,7 +1064,7 @@ (define_insn_and_split "*subsi3_insn" (match_operand:SI 2 "add_operand" " L,r, J")))] "ok_for_simple_arith_logic_operands (operands, SImode)" "#" - "reload_completed" + "&& reload_completed" [(parallel [(set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2))) (clobber (reg:CC R_FLAGS))])] @@ -1125,7 +1125,7 @@ (define_insn_and_split "*subdi3_insn" (match_operand:DI 2 "add_operand" " L,J, r")))] "ok_for_simple_arith_logic_operands (operands, DImode)" "#" - "reload_completed" + "&& reload_completed" [(const_int 0)] { visium_split_double_add (MINUS, operands[0], operands[1], operands[2]); @@ -1209,7 +1209,7 @@ (define_insn_and_split "*neg2_insn" (neg:I (match_operand:I 1 "register_operand" "r")))] "ok_for_simple_arith_logic_operands (operands, mode)" "#" - "reload_completed" + "&& reload_completed" [(parallel [(set (match_dup 0) (neg:I (match_dup 1))) (clobber (reg:CC R_FLAGS))])] "" @@ -1253,7 +1253,7 @@ (define_insn_and_split "*negdi2_insn" (neg:DI (match_operand:DI 1 "register_operand" "r")))] "ok_for_simple_arith_logic_operands (operands, DImode)" "#" - "reload_completed" + "&& reload_completed" [(const_int 0)] { visium_split_double_add (MINUS, operands[0], const0_rtx, operands[1]); @@ -1415,7 +1415,7 @@ (define_insn_and_split "*and3_insn" (match_operand:I 2 "register_operand" "r")))] "ok_for_simple_arith_logic_operands 
(operands, mode)" "#" - "reload_completed" + "&& reload_completed" [(parallel [(set (match_dup 0) (and:I (match_dup 1) (match_dup 2))) (clobber (reg:CC R_FLAGS))])] @@ -1453,7 +1453,7 @@ (define_insn_and_split "*ior3_insn" (match_operand:I 2 "register_operand" "r")))] "ok_for_simple_arith_logic_operands (operands, mode)" "#" - "reload_completed" + "&& reload_completed" [(parallel [(set (match_dup 0) (ior:I (match_dup 1) (match_dup 2))) (clobber (reg:CC R_FLAGS))])] @@ -1491,7 +1491,7 @@ (define_insn_and_split "*xor3_insn" (match_operand:I 2 "register_operand" "r")))] "ok_for_simple_arith_logic_operands (operands, mode)" "#" - "reload_completed" + "&& reload_completed" [(parallel [(set (match_dup 0) (xor:I (match_dup 1) (match_dup 2))) (clobber (reg:CC R_FLAGS))])]
[PATCH 08/15] alpha: Fix non-robust split condition in define_insn_and_split
This patch is to fix some non-robust split conditions in some define_insn_and_splits, to make each of them applied on top of the corresponding condition for define_insn part, otherwise the splitting could perform unexpectedly. gcc/ChangeLog: * config/alpha/alpha.md (*movtf_internal, *movti_internal): Fix split condition. --- gcc/config/alpha/alpha.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md index 98d09d43721..87617afd0c6 100644 --- a/gcc/config/alpha/alpha.md +++ b/gcc/config/alpha/alpha.md @@ -3830,7 +3830,7 @@ (define_insn_and_split "*movtf_internal" "register_operand (operands[0], TFmode) || reg_or_0_operand (operands[1], TFmode)" "#" - "reload_completed" + "&& reload_completed" [(set (match_dup 0) (match_dup 2)) (set (match_dup 1) (match_dup 3))] "alpha_split_tmode_pair (operands, TFmode, true);") @@ -4091,7 +4091,7 @@ (define_insn_and_split "*movti_internal" && ! CONSTANT_P (operands[1])) || reg_or_0_operand (operands[1], TImode)" "#" - "reload_completed" + "&& reload_completed" [(set (match_dup 0) (match_dup 2)) (set (match_dup 1) (match_dup 3))] "alpha_split_tmode_pair (operands, TImode, true);") -- 2.27.0
[PATCH 09/15] arm: Fix non-robust split condition in define_insn_and_split
This patch is to fix one non-robust split condition, to make it applied on top of the corresponding condition for define_insn part, otherwise the splitting could perform unexpectedly. gcc/ChangeLog: * config/arm/arm.md (*minmax_arithsi_non_canon): Fix split condition. --- gcc/config/arm/arm.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 4adc976b8b6..9a27d421484 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -4198,7 +4198,7 @@ (define_insn_and_split "*minmax_arithsi_non_canon" "TARGET_32BIT && !arm_eliminable_register (operands[1]) && !(arm_restrict_it && CONST_INT_P (operands[3]))" "#" - "TARGET_32BIT && !arm_eliminable_register (operands[1]) && reload_completed" + "&& reload_completed" [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 2) (match_dup 3))) -- 2.27.0
[PATCH 10/15] bfin: Fix non-robust split condition in define_insn_and_split
This patch is to fix some non-robust split conditions in some define_insn_and_splits, to make each of them applied on top of the corresponding condition for define_insn part, otherwise the splitting could perform unexpectedly. gcc/ChangeLog: * config/bfin/bfin.md (movdi_insn, movdf_insn): Fix split condition. --- gcc/config/bfin/bfin.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/bfin/bfin.md b/gcc/config/bfin/bfin.md index fd65f4d9e63..41a50974136 100644 --- a/gcc/config/bfin/bfin.md +++ b/gcc/config/bfin/bfin.md @@ -506,7 +506,7 @@ (define_insn_and_split "movdi_insn" (match_operand:DI 1 "general_operand" "iFx,r,mx"))] "GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) == REG" "#" - "reload_completed" + "&& reload_completed" [(set (match_dup 2) (match_dup 3)) (set (match_dup 4) (match_dup 5))] { @@ -718,7 +718,7 @@ (define_insn_and_split "movdf_insn" (match_operand:DF 1 "general_operand" "iFx,r,mx"))] "GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) == REG" "#" - "reload_completed" + "&& reload_completed" [(set (match_dup 2) (match_dup 3)) (set (match_dup 4) (match_dup 5))] { -- 2.27.0
[PATCH 13/15] ia64: Fix non-robust split condition in define_insn_and_split
This patch is to fix some non-robust split conditions in some define_insn_and_splits, to make each of them applied on top of the corresponding condition for define_insn part, otherwise the splitting could perform unexpectedly. gcc/ChangeLog: * config/ia64/vect.md (*vec_extractv2sf_0_le, *vec_extractv2sf_0_be): Fix split condition. --- gcc/config/ia64/vect.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/ia64/vect.md b/gcc/config/ia64/vect.md index 1a2452289b7..0f3a406d620 100644 --- a/gcc/config/ia64/vect.md +++ b/gcc/config/ia64/vect.md @@ -1422,7 +1422,7 @@ (define_insn_and_split "*vec_extractv2sf_0_le" UNSPEC_VECT_EXTR))] "!TARGET_BIG_ENDIAN" "#" - "reload_completed" + "&& reload_completed" [(set (match_dup 0) (match_dup 1))] { if (REG_P (operands[1]) && FR_REGNO_P (REGNO (operands[1]))) @@ -1440,7 +1440,7 @@ (define_insn_and_split "*vec_extractv2sf_0_be" UNSPEC_VECT_EXTR))] "TARGET_BIG_ENDIAN" "#" - "reload_completed" + "&& reload_completed" [(set (match_dup 0) (match_dup 1))] { if (MEM_P (operands[1])) -- 2.27.0
[PATCH 12/15] i386: Fix non-robust split condition in define_insn_and_split
This patch is to fix some non-robust split conditions in some define_insn_and_splits, to make each of them applied on top of the corresponding condition for define_insn part, otherwise the splitting could perform unexpectedly. gcc/ChangeLog: * config/i386/i386.md (*add3_doubleword, *addv4_doubleword, *addv4_doubleword_1, *sub3_doubleword, *subv4_doubleword, *subv4_doubleword_1, *add3_doubleword_cc_overflow_1, *divmodsi4_const, *neg2_doubleword, *tls_dynamic_gnu2_combine_64_): Fix split condition. --- gcc/config/i386/i386.md | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 6eb9de81921..2bd09e502ae 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -5491,7 +5491,7 @@ (define_insn_and_split "*add3_doubleword" (clobber (reg:CC FLAGS_REG))] "ix86_binary_operator_ok (PLUS, mode, operands)" "#" - "reload_completed" + "&& reload_completed" [(parallel [(set (reg:CCC FLAGS_REG) (compare:CCC (plus:DWIH (match_dup 1) (match_dup 2)) @@ -6300,7 +6300,7 @@ (define_insn_and_split "*addv4_doubleword" (plus: (match_dup 1) (match_dup 2)))] "ix86_binary_operator_ok (PLUS, mode, operands)" "#" - "reload_completed" + "&& reload_completed" [(parallel [(set (reg:CCC FLAGS_REG) (compare:CCC (plus:DWIH (match_dup 1) (match_dup 2)) @@ -6347,7 +6347,7 @@ (define_insn_and_split "*addv4_doubleword_1" && CONST_SCALAR_INT_P (operands[2]) && rtx_equal_p (operands[2], operands[3])" "#" - "reload_completed" + "&& reload_completed" [(parallel [(set (reg:CCC FLAGS_REG) (compare:CCC (plus:DWIH (match_dup 1) (match_dup 2)) @@ -6641,7 +6641,7 @@ (define_insn_and_split "*sub3_doubleword" (clobber (reg:CC FLAGS_REG))] "ix86_binary_operator_ok (MINUS, mode, operands)" "#" - "reload_completed" + "&& reload_completed" [(parallel [(set (reg:CC FLAGS_REG) (compare:CC (match_dup 1) (match_dup 2))) (set (match_dup 0) @@ -6817,7 +6817,7 @@ (define_insn_and_split "*subv4_doubleword" (minus: (match_dup 1) (match_dup 2)))] "ix86_binary_operator_ok (MINUS, mode, operands)" "#" - "reload_completed" + "&& reload_completed" [(parallel [(set (reg:CC FLAGS_REG) (compare:CC (match_dup 1) (match_dup 2))) (set (match_dup 0) @@ -6862,7 +6862,7 @@ (define_insn_and_split "*subv4_doubleword_1" && CONST_SCALAR_INT_P (operands[2]) && rtx_equal_p (operands[2], operands[3])" "#" - "reload_completed" + "&& reload_completed" [(parallel [(set (reg:CC FLAGS_REG) (compare:CC (match_dup 1) (match_dup 2))) (set (match_dup 0) @@ -7542,7 +7542,7 @@ (define_insn_and_split "*add3_doubleword_cc_overflow_1" (plus: (match_dup 1) (match_dup 2)))] "ix86_binary_operator_ok (PLUS, mode, operands)" "#" - "reload_completed" + "&& reload_completed" [(parallel [(set (reg:CCC FLAGS_REG) (compare:CCC (plus:DWIH (match_dup 1) (match_dup 2)) @@ -9000,7 +9000,7 @@ (define_insn_and_split "*divmodsi4_const" (clobber (reg:CC FLAGS_REG))] "!optimize_function_for_size_p (cfun)" "#" - "reload_completed" + "&& reload_completed" [(set (match_dup 0) (match_dup 2)) (set (match_dup 1) (match_dup 4)) (parallel [(set (match_dup 0) @@ -10515,7 +10515,7 @@ (define_insn_and_split "*neg2_doubleword" (clobber (reg:CC FLAGS_REG))] "ix86_unary_operator_ok (NEG, mode, operands)" "#" - "reload_completed" + "&& reload_completed" [(parallel [(set (reg:CCC FLAGS_REG) (ne:CCC (match_dup 1) (const_int 0))) @@ -16898,7 +16898,7 @@ (define_insn_and_split "*tls_dynamic_gnu2_combine_64_" (clobber (reg:CC FLAGS_REG))] "TARGET_64BIT && TARGET_GNU2_TLS" "#" - "" + "&& 1" [(set (match_dup 0) (match_dup 4))] { operands[4] 
= can_create_pseudo_p () ? gen_reg_rtx (ptr_mode) : operands[0]; -- 2.27.0
[PATCH 14/15] mips: Fix non-robust split condition in define_insn_and_split
This patch is to fix some non-robust split conditions in some define_insn_and_splits, to make each of them applied on top of the corresponding condition for define_insn part, otherwise the splitting could perform unexpectedly. gcc/ChangeLog: * config/mips/mips.md (*udivmod4, udivmod4_mips16): Fix split condition. --- gcc/config/mips/mips.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md index 455b9b802f6..4efb7503df3 100644 --- a/gcc/config/mips/mips.md +++ b/gcc/config/mips/mips.md @@ -2961,7 +2961,7 @@ (define_insn_and_split "*udivmod4" (match_dup 2)))] "ISA_HAS_DIV && !TARGET_MIPS16" "#" - "reload_completed" + "&& reload_completed" [(const_int 0)] { emit_insn (gen_udivmod4_split (operands[3], operands[1], operands[2])); @@ -2982,7 +2982,7 @@ (define_insn_and_split "udivmod4_mips16" (clobber (match_operand:GPR 4 "lo_operand" "=l"))] "ISA_HAS_DIV && TARGET_MIPS16" "#" - "cse_not_expected" + "&& cse_not_expected" [(const_int 0)] { emit_insn (gen_udivmod4_split (operands[3], operands[1], operands[2])); -- 2.27.0
[PATCH 15/15] sh: Fix non-robust split condition in define_insn_and_split
This patch is to fix some non-robust split conditions in some define_insn_and_splits, to make each of them applied on top of the corresponding condition for define_insn part, otherwise the splitting could perform unexpectedly. gcc/ChangeLog: * config/sh/sh.md (call_pcrel, call_value_pcrel, sibcall_pcrel, sibcall_value_pcrel): Fix split condition. --- gcc/config/sh/sh.md | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md index 93ee7c9a7de..1bb325c7044 100644 --- a/gcc/config/sh/sh.md +++ b/gcc/config/sh/sh.md @@ -6566,7 +6566,7 @@ (define_insn_and_split "call_pcrel" (clobber (match_scratch:SI 2 "=&r"))] "TARGET_SH2" "#" - "reload_completed" + "&& reload_completed" [(const_int 0)] { rtx lab = PATTERN (gen_call_site ()); @@ -6678,7 +6678,7 @@ (define_insn_and_split "call_value_pcrel" (clobber (match_scratch:SI 3 "=&r"))] "TARGET_SH2" "#" - "reload_completed" + "&& reload_completed" [(const_int 0)] { rtx lab = PATTERN (gen_call_site ()); @@ -6877,7 +6877,7 @@ (define_insn_and_split "sibcall_pcrel" (return)] "TARGET_SH2 && !TARGET_FDPIC" "#" - "reload_completed" + "&& reload_completed" [(const_int 0)] { rtx lab = PATTERN (gen_call_site ()); @@ -7043,7 +7043,7 @@ (define_insn_and_split "sibcall_value_pcrel" (return)] "TARGET_SH2 && !TARGET_FDPIC" "#" - "reload_completed" + "&& reload_completed" [(const_int 0)] { rtx lab = PATTERN (gen_call_site ()); -- 2.27.0
[PATCH 11/15] csky: Fix non-robust split condition in define_insn_and_split
This patch is to fix some non-robust split conditions in some define_insn_and_splits, to make each of them applied on top of the corresponding condition for define_insn part, otherwise the splitting could perform unexpectedly. gcc/ChangeLog: * config/csky/csky.md (*cskyv2_adddi3, *ck801_adddi3, *cskyv2_adddi1_1, *cskyv2_subdi3, *ck801_subdi3, *cskyv2_subdi1_1, cskyv2_addcc, cskyv2_addcc_invert, *cskyv2_anddi3, *ck801_anddi3, *cskyv2_iordi3, *ck801_iordi3, *cskyv2_xordi3, *ck801_xordi3,): Fix split condition. --- gcc/config/csky/csky.md | 28 ++-- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/gcc/config/csky/csky.md b/gcc/config/csky/csky.md index f91d851cb2c..54143a0efea 100644 --- a/gcc/config/csky/csky.md +++ b/gcc/config/csky/csky.md @@ -850,7 +850,7 @@ (define_insn_and_split "*cskyv2_adddi3" (clobber (reg:CC CSKY_CC_REGNUM))] "CSKY_ISA_FEATURE (E2)" "#" - "reload_completed" + "&& reload_completed" [(const_int 0)] { int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD; @@ -877,7 +877,7 @@ (define_insn_and_split "*ck801_adddi3" (clobber (reg:CC CSKY_CC_REGNUM))] "CSKY_ISA_FEATURE (E1)" "#" - "reload_completed" + "&& reload_completed" [(const_int 0)] { int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD; @@ -906,7 +906,7 @@ (define_insn_and_split "*cskyv2_adddi1_1" (clobber (reg:CC CSKY_CC_REGNUM))] "CSKY_ISA_FEATURE (E2)" "#" - "reload_completed" + "&& reload_completed" [(const_int 0)] { int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD; @@ -1048,7 +1048,7 @@ (define_insn_and_split "*cskyv2_subdi3" (clobber (reg:CC CSKY_CC_REGNUM))] "CSKY_ISA_FEATURE (E2)" "#" - "reload_completed" + "&& reload_completed" [(const_int 0)] { int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD; @@ -1075,7 +1075,7 @@ (define_insn_and_split "*ck801_subdi3" (clobber (reg:CC CSKY_CC_REGNUM))] "CSKY_ISA_FEATURE (E1)" "#" - "reload_completed" + "&& reload_completed" [(const_int 0)] { int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD; @@ -1104,7 +1104,7 @@ (define_insn_and_split "*cskyv2_subdi1_1" (clobber (reg:CC CSKY_CC_REGNUM))] "CSKY_ISA_FEATURE (E2)" "#" - "reload_completed" + "&& reload_completed" [(const_int 0)] { int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD; @@ -1276,7 +1276,7 @@ (define_insn_and_split "cskyv2_addcc" dect\t%0, %1, %M2 # #" - "reload_completed && !rtx_equal_p (operands[0], operands[1])" + "&& reload_completed && !rtx_equal_p (operands[0], operands[1])" [(set (match_dup 0) (if_then_else:SI (ne (reg:CC CSKY_CC_REGNUM) (const_int 0)) (plus:SI (match_dup 0) (match_dup 2] @@ -1302,7 +1302,7 @@ (define_insn_and_split "cskyv2_addcc_invert" decf\t%0, %1, %M2 # #" - "reload_completed && !rtx_equal_p (operands[0], operands[1])" + "&& reload_completed && !rtx_equal_p (operands[0], operands[1])" [(set (match_dup 0) (if_then_else:SI (eq (reg:CC CSKY_CC_REGNUM) (const_int 0)) (plus:SI (match_dup 0) (match_dup 2] @@ -1691,7 +1691,7 @@ (define_insn_and_split "*cskyv2_anddi3" (match_operand:DI 2 "register_operand" "b,r")))] "CSKY_ISA_FEATURE (E2)" "#" - "reload_completed" + "&& reload_completed" [(const_int 0)] { int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD; @@ -1716,7 +1716,7 @@ (define_insn_and_split "*ck801_anddi3" (match_operand:DI 2 "register_operand" "r")))] "CSKY_ISA_FEATURE (E1)" "#" - "reload_completed" + "&& reload_completed" [(const_int 0)] { int hi = TARGET_BIG_ENDIAN ? 
0 : UNITS_PER_WORD; @@ -1801,7 +1801,7 @@ (define_insn_and_split "*cskyv2_iordi3" (match_operand:DI 2 "register_operand" "b, r")))] "CSKY_ISA_FEATURE (E2)" "#" - "reload_completed" + "&& reload_completed" [(const_int 0)] { int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD; @@ -1826,7 +1826,7 @@ (define_insn_and_split "*ck801_iordi3" (match_operand:DI 2 "register_operand" "r")))] "CSKY_ISA_FEATURE (E1)" "#" - "reload_completed" + "&& reload_completed" [(const_int 0)] { int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD; @@ -1911,7 +1911,7 @@ (define_insn_and_split "*cskyv2_xordi3" (match_operand:DI 2 "register_operand" "b, r")))] "CSKY_ISA_FEATURE (E2)" "#" - "reload_completed" + "&& reload_completed" [(const_int 0)] { int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD; @@ -1936,7 +1936,7 @@ (define_insn_and_split "*ck801_xordi3" (match_operand:DI 2 "register_operand" "r")))] "CSKY_ISA_FEATURE (E1)" "#" - "reload_completed" + "&& reload_completed" [(const_int 0)] { int hi = TARGET_BIG_ENDIAN ? 0 : UNITS_PER_WORD; -- 2.27.0
Re: Use modref summary to DSE calls to non-pure functions
> > + /* Unlike alias oracle we can not skip subtrees based on TBAA check. > > + Count the size of the whole tree to verify that we will not need too > > many > > + tests. */ > > + FOR_EACH_VEC_SAFE_ELT (summary->stores->bases, i, base_node) > > +FOR_EACH_VEC_SAFE_ELT (base_node->refs, j, ref_node) > > + FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node) > > + if (num_tests++ > max_tests) > > + return false; > > at least the innermost loop can be done as > > if (num_tests += ref_node->accesses.length () > max_tests) > > no? Yep that was stupid, sorry for that ;)) > > > + > > + /* Walk all memory writes and verify that they are dead. */ > > + FOR_EACH_VEC_SAFE_ELT (summary->stores->bases, i, base_node) > > +FOR_EACH_VEC_SAFE_ELT (base_node->refs, j, ref_node) > > + FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node) > > + { > > + /* ??? if offset is unkonwn it may be negative. Not sure > > +how to construct ref here. */ > > I think you can't, you could use -poly_int64_max or so. I need a ref to give to dse_classify_store. It needs base to track live bytes etc which is not very useful if I do not know the range. However DSE is still useful since I can hit free or end of lifetime of the decl. I was wondering if I should simply implement a lightweight version of dse_clasify_store that handles this case? > > > + if (!access_node->parm_offset_known) > > + return false; > > But you could do this check in the loop computing num_tests ... > (we could also cache the count and whether any of the refs have unknown offset > in the summary?) Yep, I plan to add cache for bits like this (and the check for accessing global memory). Just want to push bit more of the cleanups I have in my local tree. > > > + tree arg; > > + if (access_node->parm_index == MODREF_STATIC_CHAIN_PARM) > > + arg = gimple_call_chain (stmt); > > + else > > + arg = gimple_call_arg (stmt, access_node->parm_index); > > + > > + ao_ref ref; > > + poly_offset_int off = (poly_offset_int)access_node->offset > > + + ((poly_offset_int)access_node->parm_offset > > + << LOG2_BITS_PER_UNIT); > > + poly_int64 off2; > > + if (!off.to_shwi (&off2)) > > + return false; > > + ao_ref_init_from_ptr_and_range > > +(&ref, arg, true, off2, access_node->size, > > + access_node->max_size); > > + ref.ref_alias_set = ref_node->ref; > > + ref.base_alias_set = base_node->base; > > + > > + bool byte_tracking_enabled > > + = setup_live_bytes_from_ref (&ref, live_bytes); > > + enum dse_store_status store_status; > > + > > + store_status = dse_classify_store (&ref, stmt, > > +byte_tracking_enabled, > > +live_bytes, &by_clobber_p); > > + if (store_status != DSE_STORE_DEAD) > > + return false; > > + } > > + /* Check also value stored by the call. 
*/ > > + if (gimple_store_p (stmt)) > > +{ > > + ao_ref ref; > > + > > + if (!initialize_ao_ref_for_dse (stmt, &ref)) > > + gcc_unreachable (); > > + bool byte_tracking_enabled > > + = setup_live_bytes_from_ref (&ref, live_bytes); > > + enum dse_store_status store_status; > > + > > + store_status = dse_classify_store (&ref, stmt, > > +byte_tracking_enabled, > > +live_bytes, &by_clobber_p); > > + if (store_status != DSE_STORE_DEAD) > > + return false; > > +} > > + delete_dead_or_redundant_assignment (gsi, "dead", need_eh_cleanup); > > + return true; > > +} > > + > > namespace { > > > > const pass_data pass_data_dse = > > @@ -1235,7 +1363,14 @@ pass_dse::execute (function *fun) > > gimple *stmt = gsi_stmt (gsi); > > > > if (gimple_vdef (stmt)) > > - dse_optimize_stmt (fun, &gsi, live_bytes); > > + { > > + gcall *call = dyn_cast (stmt); > > + > > + if (call && dse_optimize_call (&gsi, live_bytes)) > > + /* We removed a dead call. */; > > + else > > + dse_optimize_store (fun, &gsi, live_bytes); > > I think we want to refactor both functions, dse_optimize_stmt has some > early outs that apply generally, and it handles some builtin calls > that we don't want to re-handle with dse_optimize_call. > > So I wonder if it is either possible to call the new function from > inside dse_optimize_stmt instead, after we handled the return > value of call for example or different refactoring can make the flow > more obvious. It was my initial plan. However I was not sure how much I would get from that. The function starts with: /* Don't return early on *this_2(D)
Re: Use modref summary to DSE calls to non-pure functions
On Thu, Nov 11, 2021 at 1:07 PM Jan Hubicka wrote: > > > > + /* Unlike alias oracle we can not skip subtrees based on TBAA check. > > > + Count the size of the whole tree to verify that we will not need > > > too many > > > + tests. */ > > > + FOR_EACH_VEC_SAFE_ELT (summary->stores->bases, i, base_node) > > > +FOR_EACH_VEC_SAFE_ELT (base_node->refs, j, ref_node) > > > + FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node) > > > + if (num_tests++ > max_tests) > > > + return false; > > > > at least the innermost loop can be done as > > > > if (num_tests += ref_node->accesses.length () > max_tests) > > > > no? > > Yep that was stupid, sorry for that ;)) > > > > > + > > > + /* Walk all memory writes and verify that they are dead. */ > > > + FOR_EACH_VEC_SAFE_ELT (summary->stores->bases, i, base_node) > > > +FOR_EACH_VEC_SAFE_ELT (base_node->refs, j, ref_node) > > > + FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node) > > > + { > > > + /* ??? if offset is unkonwn it may be negative. Not sure > > > +how to construct ref here. */ > > > > I think you can't, you could use -poly_int64_max or so. > > I need a ref to give to dse_classify_store. It needs base to track live > bytes etc which is not very useful if I do not know the range. However > DSE is still useful since I can hit free or end of lifetime of the decl. > I was wondering if I should simply implement a lightweight version of > dse_clasify_store that handles this case? No, I think if it turns out useful then we want a way to have such ref represented by an ao_ref. Note that when we come from a ref tree we know handled-components only will increase offset, only the base MEM_REF can contain a pointer subtraction (but the result of that is the base then). In what cases does parm_offset_known end up false? Is that when seeing a POINTER_PLUS_EXPR with unknown offset? So yes, that's a case we cannot capture right now - the only thing that remains is a pointer with a known points-to-set - a similar problem as with the pure call PRE. You could in theory allocate a scratch SSA name and attach points-to-info to it. And when the call argument is &decl based then you could set offset to zero. > > > > > + if (!access_node->parm_offset_known) > > > + return false; > > > > But you could do this check in the loop computing num_tests ... > > (we could also cache the count and whether any of the refs have unknown > > offset > > in the summary?) > > Yep, I plan to add cache for bits like this (and the check for accessing > global memory). Just want to push bit more of the cleanups I have in my > local tree. 
> > > > > + tree arg; > > > + if (access_node->parm_index == MODREF_STATIC_CHAIN_PARM) > > > + arg = gimple_call_chain (stmt); > > > + else > > > + arg = gimple_call_arg (stmt, access_node->parm_index); > > > + > > > + ao_ref ref; > > > + poly_offset_int off = (poly_offset_int)access_node->offset > > > + + ((poly_offset_int)access_node->parm_offset > > > + << LOG2_BITS_PER_UNIT); > > > + poly_int64 off2; > > > + if (!off.to_shwi (&off2)) > > > + return false; > > > + ao_ref_init_from_ptr_and_range > > > +(&ref, arg, true, off2, access_node->size, > > > + access_node->max_size); > > > + ref.ref_alias_set = ref_node->ref; > > > + ref.base_alias_set = base_node->base; > > > + > > > + bool byte_tracking_enabled > > > + = setup_live_bytes_from_ref (&ref, live_bytes); > > > + enum dse_store_status store_status; > > > + > > > + store_status = dse_classify_store (&ref, stmt, > > > +byte_tracking_enabled, > > > +live_bytes, &by_clobber_p); > > > + if (store_status != DSE_STORE_DEAD) > > > + return false; > > > + } > > > + /* Check also value stored by the call. */ > > > + if (gimple_store_p (stmt)) > > > +{ > > > + ao_ref ref; > > > + > > > + if (!initialize_ao_ref_for_dse (stmt, &ref)) > > > + gcc_unreachable (); > > > + bool byte_tracking_enabled > > > + = setup_live_bytes_from_ref (&ref, live_bytes); > > > + enum dse_store_status store_status; > > > + > > > + store_status = dse_classify_store (&ref, stmt, > > > +byte_tracking_enabled, > > > +live_bytes, &by_clobber_p); > > > + if (store_status != DSE_STORE_DEAD) > > > + return false; > > > +} > > > + delete_dead_or_redundant_assignment (gsi, "dead", need_eh_cleanup); > > > + return true; > > > +} > > > + > > > namespace { > > > > > > const pass_data pass_data_dse = > > > @@ -1235,7 +1363,14 @@ pass_dse::execute (function *fun) > > > gi
Re: Use modref summary to DSE calls to non-pure functions
Hi, > > No, I think if it turns out useful then we want a way to have such ref > represented by an ao_ref. Note that when we come from a > ref tree we know handled-components only will increase offset, > only the base MEM_REF can contain a pointer subtraction (but > the result of that is the base then). Yep, that is why I introduced the parm_offset at first place - it can be negative or unknown... > > In what cases does parm_offset_known end up false? Is that > when seeing a POINTER_PLUS_EXPR with unknown offset? Yep, a typical example is a loop with pointer walking an array . > So yes, that's a case we cannot capture right now - the only > thing that remains is a pointer with a known points-to-set - a > similar problem as with the pure call PRE. You could in theory > allocate a scratch SSA name and attach points-to-info > to it. And when the call argument is &decl based then you could set > offset to zero. Hmm, I could try to do this, but possibly incrementally? Basically I want to have foo (&decl) decl = {} To be matched since even if I do not know the offset I know it is dead after end of lifetime of the decl. I am not quite sure PTA will give me that? > > It was my initial plan. However I was not sure how much I would get from > > that. > > > > The function starts with: > > > > /* Don't return early on *this_2(D) ={v} {CLOBBER}. */ > > if (gimple_has_volatile_ops (stmt) > > && (!gimple_clobber_p (stmt) > > || TREE_CODE (gimple_assign_lhs (stmt)) != MEM_REF)) > > return; > > > > ao_ref ref; > > if (!initialize_ao_ref_for_dse (stmt, &ref)) > > return; > > > > The check about clobber does not apply to calls and then it gives up on > > functions not returning aggregates (that is a common case). > > > > For functions returing aggregates it tries to prove that retval is dead > > and replace it. > > > > I guess I can simply call my analysis from the second return above and > > from the code removing dead LHS call instead of doing it from the main > > walker and drop the LHS handling? > > Yeah, something like that. OK, I will prepare updated patch, thanks! Honza > > Richard. > > > Thank you, > > Honza > > > > > > Thanks, > > > Richard. > > > > > > > + } > > > > else if (def_operand_p > > > > def_p = single_ssa_def_operand (stmt, SSA_OP_DEF)) > > > > {
Basic kill analysis for modref
Hi, This patch enables optimization of stores that are killed by calls. The modref summary is extended by an array containing a list of access ranges, relative to function parameters, that are known to be killed by the function. This array is collected during local analysis and optimized (so separate stores are glued together). Kill analysis in ipa-modref.c is quite simplistic. In particular no WPA propagation is done and also we take a very simple approach to prove that a given store is executed on each invocation of the function. I simply require it to be in the first basic block and before anything that can throw externally. I have more fancy code for that but with this patch I want to primarily discuss the interface to tree-ssa-alias.c. I wonder if there are some helpers I can re-use? From GCC linktime I get 814 functions with a non-empty kill vector. Modref stats: modref kill: 39 kills, 7162 queries modref use: 25169 disambiguations, 697722 queries modref clobber: 2290122 disambiguations, 22750147 queries 5240008 tbaa queries (0.230329 per modref query) 806190 base compares (0.035437 per modref query) (note that more kills happen at early optimization where we have not inlined that much yet). For tramp3d (non-lto -O3 build): Modref stats: modref kill: 45 kills, 630 queries modref use: 750 disambiguations, 10061 queries modref clobber: 35253 disambiguations, 543262 queries 85347 tbaa queries (0.157101 per modref query) 18727 base compares (0.034471 per modref query) So it is not that high, but it gets better after improving the analysis side and also with -Os and/or PGO (where we offline cdtors) and also wiring in same_addr_size_stores_p which I want to discuss incrementally. But at least there are not that many queries to slow down compile times noticeably :) Honza gcc/ChangeLog: * ipa-modref-tree.h (struct modref_access_node): New member function. * ipa-modref.c (modref_summary::useful_p): Kills are not useful when we can not analyze loads. (struct modref_summary_lto): Add kills. (modref_summary::dump): Dump kills. (record_access): Take access node as parameter. (record_access_lto): Likewise. (add_kill): New function. (merge_call_side_effects): Merge kills. (analyze_call): Pass around always_executed. (struct summary_ptrs): Add always_executed flag. (analyze_load): Update. (analyze_store): Handle kills. (analyze_stmt): Pass around always_executed flag; handle kills from clobbers. (analyze_function): Compute always_executed. (modref_summaries::duplicate): Copy kills. (update_signature): Release kills. * ipa-modref.h (struct modref_summary): Add kills. * tree-ssa-alias.c (dump_alias_stats): Dump kills. (stmt_kills_ref_p): Handle modref kills. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/modref-dse-2.c: New test.
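As a concrete illustration (a minimal sketch only, not the new modref-dse-2.c testcase; the function names are invented), this is the kind of store the kill information allows DSE to remove:

static void __attribute__ ((noinline))
set_to_one (int *p)
{
  /* Always-executed store: recorded as a kill for parameter 0.  */
  *p = 1;
}

void
caller (int *p)
{
  *p = 0;          /* Dead: killed on every path by the call below.  */
  set_to_one (p);
}

The first store in caller () can go away because the callee's kill vector says the same bytes are overwritten on every invocation (assuming the callee binds locally and is not inlined).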
diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c index 17ff6bb582c..6f8caa331a6 100644 --- a/gcc/tree-ssa-alias.c +++ b/gcc/tree-ssa-alias.c @@ -120,6 +120,8 @@ static struct { unsigned HOST_WIDE_INT modref_use_no_alias; unsigned HOST_WIDE_INT modref_clobber_may_alias; unsigned HOST_WIDE_INT modref_clobber_no_alias; + unsigned HOST_WIDE_INT modref_kill_no; + unsigned HOST_WIDE_INT modref_kill_yes; unsigned HOST_WIDE_INT modref_tests; unsigned HOST_WIDE_INT modref_baseptr_tests; } alias_stats; @@ -169,6 +171,12 @@ dump_alias_stats (FILE *s) + alias_stats.aliasing_component_refs_p_may_alias); dump_alias_stats_in_alias_c (s); fprintf (s, "\nModref stats:\n"); + fprintf (s, " modref kill: " + HOST_WIDE_INT_PRINT_DEC" kills, " + HOST_WIDE_INT_PRINT_DEC" queries\n", + alias_stats.modref_kill_yes, + alias_stats.modref_kill_yes + + alias_stats.modref_kill_no); fprintf (s, " modref use: " HOST_WIDE_INT_PRINT_DEC" disambiguations, " HOST_WIDE_INT_PRINT_DEC" queries\n", @@ -3373,6 +3381,107 @@ stmt_kills_ref_p (gimple *stmt, ao_ref *ref) if (is_gimple_call (stmt)) { tree callee = gimple_call_fndecl (stmt); + struct cgraph_node *node; + modref_summary *summary; + + /* Try to disambiguate using modref summary. Modref records a vector +of stores with known offsets relative to function parameters that must +happen every execution of function. Find if we have a matching +store and verify that function can not use the value. */ + if (callee != NULL_TREE + && (node = cgraph_node::get (callee)) != NULL + && node->binds_to_current_def_p () + && (summary = get_modref_function_summary (node)) != NULL + && summary->kills.length ()) + { + tree base = ao_ref_base (ref); + for (unsigned int i = 0; i < summary->kills.length (); i++) + { + modref_access_node &a = summary->kills[i]; +
Re: [PATCH] fixincludes: don't assume getcwd() can handle NULL argument
On Tue, Nov 9, 2021 at 8:50 AM Xi Ruoyao via Gcc-patches wrote: > > POSIX says: > > On some implementations, if buf is a null pointer, getcwd() may obtain > size bytes of memory using malloc(). In this case, the pointer returned > by getcwd() may be used as the argument in a subsequent call to free(). > Invoking getcwd() with buf as a null pointer is not recommended in > conforming applications. > > This produces an error building GCC with --enable-werror-always: > > ../../../fixincludes/fixincl.c: In function ‘process’: > ../../../fixincludes/fixincl.c:1356:7: error: argument 1 is null but > the corresponding size argument 2 value is 4096 [-Werror=nonnull] > > And, at least we've been leaking memory even if getcwd() supports this > non-standard extension. > > fixincludes/ChangeLog: > > * fixincl.c (process): Allocate and deallocate the buffer for > getcwd() explicitly. > --- > fixincludes/fixincl.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/fixincludes/fixincl.c b/fixincludes/fixincl.c > index 6dba2f6e830..b4b1e38ede7 100644 > --- a/fixincludes/fixincl.c > +++ b/fixincludes/fixincl.c > @@ -1353,9 +1353,11 @@ process (void) >if (access (pz_curr_file, R_OK) != 0) > { >int erno = errno; > + char *buf = xmalloc (MAXPATHLEN); >fprintf (stderr, "Cannot access %s from %s\n\terror %d (%s)\n", > - pz_curr_file, getcwd ((char *) NULL, MAXPATHLEN), > + pz_curr_file, getcwd (buf, MAXPATHLEN), > erno, xstrerror (erno)); > + free (buf); >return; > } > > -- > 2.33.1 This seems to contradict bug 21823: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21823 It would fix bug 80047, though: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80047
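For reference, a minimal sketch of the portable pattern under discussion (POSIX only guarantees getcwd () with a caller-supplied buffer; the NULL-buffer variant is an extension). The MAXPATHLEN fallback below is just for illustration:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef MAXPATHLEN
#define MAXPATHLEN 4096   /* normally provided by <sys/param.h> */
#endif

int
main (void)
{
  /* Allocate the buffer explicitly instead of relying on the
     getcwd (NULL, size) extension.  */
  char *buf = malloc (MAXPATHLEN);
  if (buf != NULL && getcwd (buf, MAXPATHLEN) != NULL)
    printf ("cwd: %s\n", buf);
  free (buf);
  return 0;
}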
Re: [PATCH] rs6000: Fix a handful of 32-bit built-in function problems in the new support
On Wed, Nov 10, 2021 at 03:28:18PM -0600, Bill Schmidt wrote: > On 11/10/21 2:33 AM, Segher Boessenkool wrote: > > On Tue, Nov 09, 2021 at 03:46:54PM -0600, Bill Schmidt wrote: > >>* config/rs6000/rs6000-builtin-new.def (CMPB): Flag as no32bit. > >>(BPERMD): Flag as 32bit. So, change this to something like "flag this as needing special handling on 32 bit" or something? > >> - void __builtin_set_texasr (unsigned long long); > >> + void __builtin_set_texasr (unsigned long); > >> SET_TEXASR nothing {htm,htmspr} > >> > >> - void __builtin_set_texasru (unsigned long long); > >> + void __builtin_set_texasru (unsigned long); > >> SET_TEXASRU nothing {htm,htmspr} > >> > >> - void __builtin_set_tfhar (unsigned long long); > >> + void __builtin_set_tfhar (unsigned long); > >> SET_TFHAR nothing {htm,htmspr} > >> > >> - void __builtin_set_tfiar (unsigned long long); > >> + void __builtin_set_tfiar (unsigned long); > >> SET_TFIAR nothing {htm,htmspr} > > This does not seem to be what the exiting code does, either? Try with > > -m32 -mpowerpc64 (it extends to 64 bit there, so the builtin does not > > have long int as parameter, it has long long int). > > This uses a tfiar_t, which is a typedef for uintptr_t, so long int is > appropriate. > This is necessary to make the HTM tests pass on 32-bit powerpc64. void f(long x) { __builtin_set_texasr(x); } built with -m32 -mpowerpc64 gives (in the expand dump): void f (long int x) { long long unsigned int _1; ;; basic block 2, loop depth 0 ;;pred: ENTRY _1 = (long long unsigned int) x_2(D); __builtin_set_texasr (_1); [tail call] return; ;;succ: EXIT } The builtins have a "long long" argument in the existing code, in this configuration. And this is not the same as "long" here. > >> --- a/gcc/testsuite/gcc.target/powerpc/cmpb-3.c > >> +++ b/gcc/testsuite/gcc.target/powerpc/cmpb-3.c > >> @@ -8,7 +8,7 @@ void abort (); > >> long long int > >> do_compare (long long int a, long long int b) > >> { > >> - return __builtin_cmpb (a, b); /* { dg-error "'__builtin_cmpb' is not > >> supported in this compiler configuration" } */ > >> + return __builtin_cmpb (a, b); /* { dg-error "'__builtin_p6_cmpb' is > >> not supported in 32-bit mode" } */ > >> } > > The original spelling is the correct one? > > This is something I have on my to-do list for the future, to see whether I > can improve it. The overloaded function __builtin_cmpb gets translated to > the underlying non-overloaded builtin __builtin_p6_cmpb, and that's the only > name that's still around by the time we get to the error processing. I want > to see whether I can add some infrastructure to recover the overloaded > function name in such cases. Is it okay to defer this for now? It is fine to defer it. It is not fine to change the testcase like this. The user did not write __builtin_p6_cmpb (which is not even documented btw), so the compiler should not talk about that. It is fine to leave the test failing for now. Segher
Re: [committed] openmp: Fix handling of numa_domains(1)
Hi! On 2021-10-18T15:03:08+0200, Jakub Jelinek via Gcc-patches wrote: > On Fri, Oct 15, 2021 at 12:26:34PM -0700, sunil.k.pandey wrote: >> 4764049dd620affcd3e2658dc7f03a6616370a29 is the first bad commit >> commit 4764049dd620affcd3e2658dc7f03a6616370a29 >> Author: Jakub Jelinek >> Date: Fri Oct 15 16:25:25 2021 +0200 >> >> openmp: Fix up handling of OMP_PLACES=threads(1) >> >> caused >> >> FAIL: libgomp.c/places-10.c execution test > > Reproduced on gcc112 in CompileFarm (my ws isn't NUMA). > If numa-domains is used with num-places count, sometimes the function > could create more places than requested and crash. This depended on the > content of /sys/devices/system/node/online file, e.g. if the file > contains > 0-1,16-17 > and all NUMA nodes contain at least one CPU in the cpuset of the program, > then numa_domains(2) or numa_domains(4) (or 5+) work fine while > numa_domains(1) or numa_domains(3) misbehave. I.e. the function was able > to stop after reaching limit on the , separators (or trivially at the end), > but not within in the ranges. > > Fixed thusly, tested on powerpc64le-linux, committed to trunk. There appears to be yet another issue: there still are quite a number of 'FAIL: libgomp.c/places-10.c execution test' reports on . Also in my testing testing, on a system where '/sys/devices/system/node/online' contains '0-1', I get a FAIL: [...] OPENMP DISPLAY ENVIRONMENT BEGIN _OPENMP = '201511' OMP_DYNAMIC = 'FALSE' OMP_NESTED = 'FALSE' OMP_NUM_THREADS = '8' OMP_SCHEDULE = 'DYNAMIC' OMP_PROC_BIND = 'TRUE' OMP_PLACES = '{0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30},{FAIL: libgomp.c/places-10.c execution test Grüße Thomas > 2021-10-18 Jakub Jelinek > > * config/linux/affinity.c (gomp_affinity_init_numa_domains): Add > && gomp_places_list_len < count after nfirst <= nlast loop condition. > > --- libgomp/config/linux/affinity.c.jj2021-10-15 16:28:30.374460522 > +0200 > +++ libgomp/config/linux/affinity.c 2021-10-18 14:44:51.559667127 +0200 > @@ -401,7 +401,7 @@ gomp_affinity_init_numa_domains (unsigne > break; > q = end; > } > - for (; nfirst <= nlast; nfirst++) > + for (; nfirst <= nlast && gomp_places_list_len < count; nfirst++) > { > sprintf (name + prefix_len, "node%lu/cpulist", nfirst); > f = fopen (name, "r"); - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
RE: [PATCH] aarch64: Use type-qualified builtins for UADD[LW][2] Neon intrinsics
Hi Jonathan, > -Original Message- > From: Jonathan Wright > Sent: Thursday, November 11, 2021 10:18 AM > To: gcc-patches@gcc.gnu.org > Cc: Richard Sandiford ; Kyrylo Tkachov > > Subject: [PATCH] aarch64: Use type-qualified builtins for UADD[LW][2] Neon > intrinsics > > Hi, > > This patch declares unsigned type-qualified builtins and uses them to > implement widening-add Neon intrinsics. This removes the need for > many casts in arm_neon.h. > > Bootstrapped and regression tested on aarch64-none-linux-gnu - no > issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-11-09 Jonathan Wright > > * config/aarch64/aarch64-simd-builtins.def: Use BINOPU type > qualifiers in generator macros for uadd[lw][2] builtins. > * config/aarch64/arm_neon.h (vaddl_s8): Remove unnecessary > cast. > (vaddl_s16): Likewise. > (vaddl_s32): Likewise. > (vaddl_u8): Use type-qualified builtin and remove casts. > (vaddl_u16): Likewise. > (vaddl_u32): Likewise. > (vaddl_high_s8): Remove unnecessary cast. > (vaddl_high_s16): Likewise. > (vaddl_high_s32): Likewise. > (vaddl_high_u8): Use type-qualified builtin and remove casts. > (vaddl_high_u16): Likewise. > (vaddl_high_u32): Likewise. > (vaddw_s8): Remove unnecessary cast. > (vaddw_s16): Likewise. > (vaddw_s32): Likewise. > (vaddw_u8): Use type-qualified builtin and remove casts. > (vaddw_u16): Likewise. > (vaddw_u32): Likewise. > (vaddw_high_s8): Remove unnecessary cast. > (vaddw_high_s16): Likewise. > (vaddw_high_s32): Likewise. > (vaddw_high_u8): Use type-qualified builtin and remove casts. > (vaddw_high_u16): Likewise. > (vaddw_high_u32): Likewise. Ok. Thanks, Kyrill
[committed] libgomp: Use TLS storage for omp_get_num_teams()/omp_get_team_num() values
Hi! When thinking about GOMP_teams3, I've realized that using global variables for the values returned by omp_get_num_teams()/omp_get_team_num() calls is incorrect even with our right now dumb way of implementing host teams. The problems are two, one is if host teams is used from multiple pthread_create created threads - the spec says that host teams can't be nested inside of explicit parallel or other teams constructs, but with pthread_create the standard says obviously nothing about it. Another more important thing is host fallback, right now we don't do anything for omp_get_num_teams() or omp_get_team_num() which was fine before host teams was introduced and the 5.1 requirement that num_teams clause specifies minimum of teams, but with the global vars it means inside of target teams num_teams (2) we happily return omp_get_num_teams() == 4 if the target teams is inside of host teams with num_teams(4). With target fallback being invoked from parallel regions global vars simply can't work right on the host. Both with nowait target and with synchronous target too, as while doing host fallback from one thread a different thread could see wrong values. So, this patch moves them to struct gomp_thread and propagates those for parallel to child threads. For host fallback, the implicit zeroing of *thr results in us returning omp_get_num_teams () == 1 and omp_get_team_num () == 0 which is fine for target teams without num_teams clause, for target teams with num_teams clause something to work on and for target without teams nested in it I've asked on omp-lang what should be done. Regtested on x86_64-linux, committed to trunk. 2021-11-11 Jakub Jelinek * libgomp.h (struct gomp_thread): Add num_teams and team_num members. * team.c (struct gomp_thread_start_data): Likewise. (gomp_thread_start): Initialize thr->num_teams and thr->team_num. (gomp_team_start): Initialize start_data->num_teams and start_data->team_num. Update nthr->num_teams and nthr->team_num. * teams.c (gomp_num_teams, gomp_team_num): Remove. (GOMP_teams_reg): Set and restore thr->num_teams and thr->team_num instead of gomp_num_teams and gomp_team_num. (omp_get_num_teams): Use thr->num_teams + 1 instead of gomp_num_teams. (omp_get_team_num): Use thr->team_num instead of gomp_team_num. * testsuite/libgomp.c/teams-4.c: New test. --- libgomp/libgomp.h.jj2021-10-20 09:34:47.004331626 +0200 +++ libgomp/libgomp.h 2021-11-11 12:44:47.710092897 +0100 @@ -768,6 +768,14 @@ struct gomp_thread /* User pthread thread pool */ struct gomp_thread_pool *thread_pool; +#ifdef LIBGOMP_USE_PTHREADS + /* omp_get_num_teams () - 1. */ + unsigned int num_teams; + + /* omp_get_team_num (). 
*/ + unsigned int team_num; +#endif + #if defined(LIBGOMP_USE_PTHREADS) \ && (!defined(HAVE_TLS) \ || !defined(__GLIBC__) \ --- libgomp/team.c.jj 2021-09-28 11:34:29.380146749 +0200 +++ libgomp/team.c 2021-11-11 12:55:22.524952564 +0100 @@ -56,6 +56,8 @@ struct gomp_thread_start_data struct gomp_task *task; struct gomp_thread_pool *thread_pool; unsigned int place; + unsigned int num_teams; + unsigned int team_num; bool nested; pthread_t handle; }; @@ -88,6 +90,8 @@ gomp_thread_start (void *xdata) thr->ts = data->ts; thr->task = data->task; thr->place = data->place; + thr->num_teams = data->num_teams; + thr->team_num = data->team_num; #ifdef GOMP_NEEDS_THREAD_HANDLE thr->handle = data->handle; #endif @@ -645,6 +649,8 @@ gomp_team_start (void (*fn) (void *), vo nthr->ts.single_count = 0; #endif nthr->ts.static_trip = 0; + nthr->num_teams = thr->num_teams; + nthr->team_num = thr->team_num; nthr->task = &team->implicit_task[i]; nthr->place = place; gomp_init_task (nthr->task, task, icv); @@ -833,6 +839,8 @@ gomp_team_start (void (*fn) (void *), vo start_data->ts.single_count = 0; #endif start_data->ts.static_trip = 0; + start_data->num_teams = thr->num_teams; + start_data->team_num = thr->team_num; start_data->task = &team->implicit_task[i]; gomp_init_task (start_data->task, task, icv); team->implicit_task[i].icv.nthreads_var = nthreads_var; --- libgomp/teams.c.jj 2021-10-11 12:20:21.927063104 +0200 +++ libgomp/teams.c 2021-11-11 12:43:58.769797557 +0100 @@ -28,14 +28,12 @@ #include "libgomp.h" #include -static unsigned gomp_num_teams = 1, gomp_team_num = 0; - void GOMP_teams_reg (void (*fn) (void *), void *data, unsigned int num_teams, unsigned int thread_limit, unsigned int flags) { + struct gomp_thread *thr = gomp_thread (); (void) flags; - (void) num_teams; unsigned old_thread_limit_var = 0; if (thread_limit == 0) thread_limit = gomp_teams_thread_limit_var; @@ -48,11 +46,11 @@ GOMP_teams_reg (void (*fn) (void *), voi } if (num_teams == 0) num_teams
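A hedged illustration (this is not the committed teams-4.c) of why the state has to be per-thread: several host threads may execute fallback target regions at the same time, and each must observe its own values rather than another thread's:

#include <omp.h>
#include <stdio.h>

int
main (void)
{
  #pragma omp parallel num_threads(4)
  {
    int n = -1;
    /* May fall back to host execution; the value reported inside must
       not leak from whatever the other threads are doing.  */
    #pragma omp target map(from: n)
    n = omp_get_num_teams ();
    printf ("thread %d sees %d team(s)\n", omp_get_thread_num (), n);
  }
  return 0;
}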
Re: Use modref summary to DSE calls to non-pure functions
On Thu, Nov 11, 2021 at 1:42 PM Jan Hubicka wrote: > > Hi, > > > > No, I think if it turns out useful then we want a way to have such ref > > represented by an ao_ref. Note that when we come from a > > ref tree we know handled-components only will increase offset, > > only the base MEM_REF can contain a pointer subtraction (but > > the result of that is the base then). > > Yep, that is why I introduced the parm_offset at first place - it can be > negative or unknown... > > > > In what cases does parm_offset_known end up false? Is that > > when seeing a POINTER_PLUS_EXPR with unknown offset? > > Yep, a typical example is a loop with pointer walking an array . > > > So yes, that's a case we cannot capture right now - the only > > thing that remains is a pointer with a known points-to-set - a > > similar problem as with the pure call PRE. You could in theory > > allocate a scratch SSA name and attach points-to-info > > to it. And when the call argument is &decl based then you could set > > offset to zero. > > Hmm, I could try to do this, but possibly incrementally? You mean handle a &decl argument specially for unknown param offset? Yeah, I guess so. > Basically I want to have > > foo (&decl) > decl = {} > > To be matched since even if I do not know the offset I know it is dead > after end of lifetime of the decl. I am not quite sure PTA will give me > that? for this case PTA should tell you the alias is to 'decl' only but then I'm not sure if stmt_kills_ref_p is up to the task to determine that 'decl = {}', from a quick look it doesn't. So indeed the only interesting case will be a &decl based parameter which we can special-case. > > > It was my initial plan. However I was not sure how much I would get from > > > that. > > > > > > The function starts with: > > > > > > /* Don't return early on *this_2(D) ={v} {CLOBBER}. */ > > > if (gimple_has_volatile_ops (stmt) > > > && (!gimple_clobber_p (stmt) > > > || TREE_CODE (gimple_assign_lhs (stmt)) != MEM_REF)) > > > return; > > > > > > ao_ref ref; > > > if (!initialize_ao_ref_for_dse (stmt, &ref)) > > > return; > > > > > > The check about clobber does not apply to calls and then it gives up on > > > functions not returning aggregates (that is a common case). > > > > > > For functions returing aggregates it tries to prove that retval is dead > > > and replace it. > > > > > > I guess I can simply call my analysis from the second return above and > > > from the code removing dead LHS call instead of doing it from the main > > > walker and drop the LHS handling? > > > > Yeah, something like that. > OK, I will prepare updated patch, thanks! > > Honza > > > > Richard. > > > > > Thank you, > > > Honza > > > > > > > > Thanks, > > > > Richard. > > > > > > > > > + } > > > > > else if (def_operand_p > > > > > def_p = single_ssa_def_operand (stmt, > > > > > SSA_OP_DEF)) > > > > > {
Re: [PATCH][committed]middle-end: Fix signbit tests when ran on ISA with support for masks.
Ah yes, that particular test checks the vector code. I see that the function wasn't vectorized but that the scalar replacement was done. _15 = _4 > 0; So the test is checking whether (-x >> bitsize-1) gets optimized to -(x > 0). I see that the replacement was made correctly on the scalar, so I will modify the test to check for the vector form when vect_int is supported, and for the scalar replacement otherwise. Cheers, Tamar From: Sandra Loosemore Sent: Wednesday, November 10, 2021 8:03 PM To: Tamar Christina ; gcc-patches@gcc.gnu.org Cc: nd ; rguent...@suse.de Subject: Re: [PATCH][committed]middle-end: Fix signbit tests when ran on ISA with support for masks. On 11/10/21 11:53 AM, Tamar Christina wrote: > FAIL: gcc.dg/signbit-2.c scan-tree-dump-times optimized > "[file://\\s+]\\s+>\\s+{ 0, > 0, 0, 0 }" 1 > > That's the old test which this patch has changed. Does it still fail > with the new patch? My test results are indeed from a couple days ago. But, I looked at your new modifications to this test, and still don't see anything like the pattern it's looking for, or understand what output you expect to be happening here. Is the whole test specific to vector ISAs, and not just your recent changes to it? I've attached the .optimized dump I got on nios2-elf. -Sandra
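For clarity, a small sketch of the scalar form being checked (this is not signbit-2.c itself and it assumes a 32-bit int):

int
sign_mask (int x)
{
  /* Arithmetic shift of the negation by bitsize-1; expected to be
     folded to -(x > 0).  */
  return -x >> 31;
}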
Re: Use modref summary to DSE calls to non-pure functions
> > Hmm, I could try to do this, but possibly incrementally? > > You mean handle a &decl argument specially for unknown param offset? > Yeah, I guess so. I think it is also a pointer that was allocated and is going to be freed... > > > Basically I want to have > > > > foo (&decl) > > decl = {} > > > > To be matched since even if I do not know the offset I know it is dead > > after end of lifetime of the decl. I am not quite sure PTA will give me > > that? > > for this case PTA should tell you the alias is to 'decl' only but then I'm > not sure if stmt_kills_ref_p is up to the task to determine that 'decl = {}', > from a quick look it doesn't. So indeed the only interesting case will > be a &decl based parameter which we can special-case. Yep, I do not think it understands this. I will look into it - I guess it is common enough to care about. Honza
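For reference, a hedged sketch of the source pattern being discussed (the names are invented and foo () stands in for a call that only stores through its pointer argument at an unknown offset):

struct S { int a[16]; };

void foo (struct S *p);   /* assumed to only store through *p */

void
bar (void)
{
  struct S decl;
  foo (&decl);
  /* decl's lifetime ends here (decl ={v} {CLOBBER} in GIMPLE), so any
     store foo () performed into it is dead, even without a known offset.  */
}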
Fix noreturn discovery
Hi, this patch fixes ipa-pure-const handling of noreturn flags. It is not safe to set it for interposable symbols and we should also set it for aliases (just like we do for other flags). This patch merely copies other flag handling and implements it here. Bootstrapped/regtested x86_64-linux, will commit it shortly. Honza gcc/ChangeLog: 2021-11-11 Jan Hubicka * cgraph.c (set_noreturn_flag_1): New function. (cgraph_node::set_noreturn_flag): New member function * cgraph.h (cgraph_node::set_noreturn_flags): Declare. * ipa-pure-const.c (pass_local_pure_const::execute): Use it. diff --git a/gcc/cgraph.c b/gcc/cgraph.c index c67d300e7a4..466b66d5ba5 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -2614,6 +2614,53 @@ cgraph_node::set_malloc_flag (bool malloc_p) return changed; } +/* Worker to set noreturng flag. */ +static void +set_noreturn_flag_1 (cgraph_node *node, bool noreturn_p, bool *changed) +{ + if (noreturn_p && !TREE_THIS_VOLATILE (node->decl)) +{ + TREE_THIS_VOLATILE (node->decl) = true; + *changed = true; +} + + ipa_ref *ref; + FOR_EACH_ALIAS (node, ref) +{ + cgraph_node *alias = dyn_cast (ref->referring); + if (!noreturn_p || alias->get_availability () > AVAIL_INTERPOSABLE) + set_noreturn_flag_1 (alias, noreturn_p, changed); +} + + for (cgraph_edge *e = node->callers; e; e = e->next_caller) +if (e->caller->thunk + && (!noreturn_p || e->caller->get_availability () > AVAIL_INTERPOSABLE)) + set_noreturn_flag_1 (e->caller, noreturn_p, changed); +} + +/* Set TREE_THIS_VOLATILE on NODE's decl and on NODE's aliases if any. */ + +bool +cgraph_node::set_noreturn_flag (bool noreturn_p) +{ + bool changed = false; + + if (!noreturn_p || get_availability () > AVAIL_INTERPOSABLE) +set_noreturn_flag_1 (this, noreturn_p, &changed); + else +{ + ipa_ref *ref; + + FOR_EACH_ALIAS (this, ref) + { + cgraph_node *alias = dyn_cast (ref->referring); + if (!noreturn_p || alias->get_availability () > AVAIL_INTERPOSABLE) + set_noreturn_flag_1 (alias, noreturn_p, &changed); + } +} + return changed; +} + /* Worker to set_const_flag. */ static void diff --git a/gcc/cgraph.h b/gcc/cgraph.h index 0a1f7c8960e..e42e305cdb6 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -1167,6 +1167,10 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node if any. */ bool set_malloc_flag (bool malloc_p); + /* SET TREE_THIS_VOLATILE on cgraph_node's decl and on aliases of the node + if any. */ + bool set_noreturn_flag (bool noreturn_p); + /* If SET_CONST is true, mark function, aliases and thunks to be ECF_CONST. If SET_CONST if false, clear the flag. diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c index 505ed4f8a3b..84a028bcf8e 100644 --- a/gcc/ipa-pure-const.c +++ b/gcc/ipa-pure-const.c @@ -2132,11 +2132,10 @@ pass_local_pure_const::execute (function *fun) current_function_name ()); /* Update declaration and reduce profile to executed once. */ - TREE_THIS_VOLATILE (current_function_decl) = 1; + if (cgraph_node::get (current_function_decl)->set_noreturn_flag (true)) + changed = true; if (node->frequency > NODE_FREQUENCY_EXECUTED_ONCE) node->frequency = NODE_FREQUENCY_EXECUTED_ONCE; - - changed = true; } switch (l->pure_const_state)
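A minimal hedged example of the situation being fixed (names invented): local pure-const discovers that die () cannot return, and the flag should be propagated to its alias as well, while interposable symbols must be left untouched:

extern void abort (void);

void
die (void)
{
  abort ();   /* makes die () noreturn as far as the analysis is concerned */
}

/* The alias should get TREE_THIS_VOLATILE (the noreturn flag) too.  */
void die_alias (void) __attribute__ ((alias ("die")));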
Fix recursion discovery in ipa-pure-const
Hi, We mark self-recursive functions as looping for fear of endless recursion. This is done correctly for local pure/const and for non-trivial SCCs in the callgraph, but for trivial SCCs we miss the flag. I think it is a bad decision since infinite recursion will run out of stack, but changing it upsets some testcases and should be done independently. So this patch fixes the current behaviour to be consistent. Bootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: 2021-11-11 Jan Hubicka * ipa-pure-const.c (propagate_pure_const): Self recursion is a side effect. diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c index 505ed4f8a3b..64777cd2d91 100644 --- a/gcc/ipa-pure-const.c +++ b/gcc/ipa-pure-const.c @@ -1513,6 +1611,9 @@ propagate_pure_const (void) enum pure_const_state_e edge_state = IPA_CONST; bool edge_looping = false; + if (e->recursive_p ()) + looping = true; + if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "Call to %s",
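For illustration, a hedged example of such a trivial SCC (name invented): the body is otherwise const, but the self-recursion has to be treated as a possible side effect, so the function must be marked looping:

int
count_down (int x)
{
  if (x <= 0)
    return 0;
  /* Self-recursive call: conservatively treated as possibly endless.  */
  return count_down (x - 1) + 1;
}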
Re: [PATCH] libgcc: fix backtrace fallback on PowerPC Big-endian. [PR103004]
Hi! On Wed, Nov 10, 2021 at 06:59:23PM -0300, Raphael Moreira Zinsly wrote: > At the end of the backtrace stream _Unwind_Find_FDE() may not be able > to find the frame unwind info and will later call the backtrace fallback > instead of finishing. This occurs when using an old libc on ppc64 due to > dl_iterate_phdr() not being able to set the fde in the last trace. > When this occurs the cfa of the trace will be behind of context's cfa. > Also, libgo’s probestackmaps() calls the backtrace with a null pointer > and can get to the backchain fallback with the same problem, in this case > we are only interested in find a stack map, we don't need nor can do a > backchain. > _Unwind_ForcedUnwind_Phase2() can hit the same issue as it uses > uw_frame_state_for(), so we need to treat _URC_NORMAL_STOP. > > libgcc/ChangeLog: > > * config/rs6000/linux-unwind.h (ppc_backchain_fallback): turn into >static to fix -Wmissing-prototypes. Check if it's called with a null >argument or at the end of the backtrace and return. > * unwind.inc (_Unwind_ForcedUnwind_Phase2): treat _URC_NORMAL_STOP. Formatting is messed up. Lines start with a capital. Two spaces after full stop, while you're at it. > -void ppc_backchain_fallback (struct _Unwind_Context *context, void *a) > +static void > +ppc_backchain_fallback (struct _Unwind_Context *context, void *a) This was already fixed in 75ef0353a2d3. > { >struct frame_layout *current; >struct trace_arg *arg = a; >int count; > > - /* Get the last address computed and start with the next. */ > + /* Get the last address computed. */ >current = context->cfa; Empty line after here please. Most of the time if you have a full-line comment it means a new paragraph is starting. > + /* If the trace CFA is not the context CFA the backtrace is done. */ > + if (arg == NULL || arg->cfa != current) > + return; > + > + /* Start with next address. */ >current = current->backchain; Like you did here :-) Do you have a testcase (that failed without this, but now doesn't)? Looks okay, but please update and resend. Segher
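Not the testcase asked for above, only a hedged sketch of how the fallback path tends to get exercised: a plain backtrace from main () makes the unwinder walk past the last frame that has unwind info:

#include <execinfo.h>
#include <stdio.h>

int
main (void)
{
  void *addrs[32];
  /* backtrace () goes through the libgcc unwinder; on the affected
     configurations the final frames take the backchain fallback.  */
  int n = backtrace (addrs, 32);
  printf ("captured %d frames\n", n);
  return 0;
}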
[PATCH] tree-optimization/103188 - avoid running ranger on not-up-to-date SSA
The following splits loop header copying into an analysis phase that uses ranger and a transform phase that can do without to avoid running ranger on IL that has SSA form not updated. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2021-11-11 Richard Biener PR tree-optimization/103188 * tree-ssa-loop-ch.c (should_duplicate_loop_header_p): Remove query parameter, split out check for size optimization. (ch_base::m_ranger, cb_base::m_query): Remove. (ch_base::copy_headers): Split processing loop into analysis around which we allocate and use ranger and transform where we do not. (pass_ch::execute): Do not allocate/free ranger here. (pass_ch_vect::execute): Likewise. * gcc.dg/torture/pr103188.c: New testcase. --- gcc/testsuite/gcc.dg/torture/pr103188.c | 38 + gcc/tree-ssa-loop-ch.c | 72 ++--- 2 files changed, 78 insertions(+), 32 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/torture/pr103188.c diff --git a/gcc/testsuite/gcc.dg/torture/pr103188.c b/gcc/testsuite/gcc.dg/torture/pr103188.c new file mode 100644 index 000..0412f6f9b79 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr103188.c @@ -0,0 +1,38 @@ +/* { dg-do compile } */ + +int a, b, c, d = 10, e = 1, f, g, h, i; +int main() +{ + int j = -1; +k: + h = c; +l: + c = ~c; + if (e) + m: +a = 0; + if (j > 1) +goto m; + if (!e) +goto l; + if (c) +goto p; +n: + goto m; +o: + if (f) { +if (g) + goto k; +j = 0; + p: +if (d) + goto o; +goto n; + } + if (i) +goto l; + for (; a < 1; a++) +while (a > d) + b++; + return 0; +} diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c index c7d86d751d4..0cee38159fb 100644 --- a/gcc/tree-ssa-loop-ch.c +++ b/gcc/tree-ssa-loop-ch.c @@ -69,26 +69,12 @@ entry_loop_condition_is_static (class loop *l, path_range_query *query) static bool should_duplicate_loop_header_p (basic_block header, class loop *loop, - int *limit, path_range_query *query) + int *limit) { gimple_stmt_iterator bsi; gcc_assert (!header->aux); - /* Avoid loop header copying when optimizing for size unless we can - determine that the loop condition is static in the first - iteration. */ - if (optimize_loop_for_size_p (loop) - && !loop->force_vectorize - && !entry_loop_condition_is_static (loop, query)) -{ - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, -" Not duplicating bb %i: optimizing for size.\n", -header->index); - return false; -} - gcc_assert (EDGE_COUNT (header->succs) > 0); if (single_succ_p (header)) { @@ -223,8 +209,6 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop, return false; } - if (dump_file && (dump_flags & TDF_DETAILS)) -fprintf (dump_file, "Will duplicate bb %i\n", header->index); return true; } @@ -289,9 +273,6 @@ class ch_base : public gimple_opt_pass /* Return true to copy headers of LOOP or false to skip. 
*/ virtual bool process_loop_p (class loop *loop) = 0; - - gimple_ranger *m_ranger = NULL; - path_range_query *m_query = NULL; }; const pass_data pass_data_ch = @@ -386,8 +367,11 @@ ch_base::copy_headers (function *fun) copied_bbs = XNEWVEC (basic_block, n_basic_blocks_for_fn (fun)); bbs_size = n_basic_blocks_for_fn (fun); + auto_vec candidates; auto_vec > copied; + gimple_ranger *ranger = new gimple_ranger; + path_range_query *query = new path_range_query (*ranger, /*resolve=*/true); for (auto loop : loops_list (cfun, 0)) { int initial_limit = param_max_loop_header_insns; @@ -406,6 +390,37 @@ ch_base::copy_headers (function *fun) || !process_loop_p (loop)) continue; + /* Avoid loop header copying when optimizing for size unless we can +determine that the loop condition is static in the first +iteration. */ + if (optimize_loop_for_size_p (loop) + && !loop->force_vectorize + && !entry_loop_condition_is_static (loop, query)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, +" Not duplicating bb %i: optimizing for size.\n", +header->index); + continue; + } + + if (should_duplicate_loop_header_p (header, loop, &remaining_limit)) + candidates.safe_push (loop); +} + /* Do not use ranger after we change the IL and not have updated SSA. */ + delete query; + delete ranger; + + for (auto loop : candidates) +{ + int initial_limit = param_max_loop_header_insns; + int remaining_limit = initial_limit; + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, +"Copying headers of loop
[PATCH v1 1/8] bswap: synthesize HImode bswap from SImode or DImode
The RISC-V Zbb extension adds an XLEN (i.e. SImode for rv32, DImode for rv64) bswap instruction (rev8). While, with the current master, SImode is synthesized correctly from DImode, HImode is not. This change adds an appropriate expansion for a HImode bswap, if a wider bswap is available. Without this change, the following rv64gc_zbb code is generated for __builtin_bswap16(): slliw a5,a0,8 zext.h a0,a0 srliw a0,a0,8 or a0,a5,a0 sext.h a0,a0 // this is a 16bit sign-extension following // the byteswap (e.g. on a 'short' function // return). After this change, a bswap (rev8) is used and any extensions are combined into the shift-right: rev8a0,a0 sraia0,a0,48 // the sign-extension is combined into the // shift; a srli is emitted otherwise... gcc/ChangeLog: * optabs.c (expand_unop): support expanding a HImode bswap using SImode or DImode, followed by a shift. gcc/testsuite/ChangeLog: * gcc.target/riscv/zbb-bswap.c: New test. Signed-off-by: Philipp Tomsich --- gcc/optabs.c | 6 ++ gcc/testsuite/gcc.target/riscv/zbb-bswap.c | 22 ++ 2 files changed, 28 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-bswap.c diff --git a/gcc/optabs.c b/gcc/optabs.c index 019bbb62882..7a3ffbe4525 100644 --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -3307,6 +3307,12 @@ expand_unop (machine_mode mode, optab unoptab, rtx op0, rtx target, return temp; } + /* If we are missing a HImode BSWAP, but have one for SImode or +DImode, use a BSWAP followed by a SHIFT. */ + temp = widen_bswap (as_a (mode), op0, target); + if (temp) + return temp; + last = get_last_insn (); temp1 = expand_binop (mode, ashl_optab, op0, diff --git a/gcc/testsuite/gcc.target/riscv/zbb-bswap.c b/gcc/testsuite/gcc.target/riscv/zbb-bswap.c new file mode 100644 index 000..6ee27d9f47a --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/zbb-bswap.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gc_zbb -mabi=lp64 -O2" } */ + +unsigned long +func64 (unsigned long i) +{ + return __builtin_bswap64(i); +} + +unsigned int +func32 (unsigned int i) +{ + return __builtin_bswap32(i); +} + +unsigned short +func16 (unsigned short i) +{ + return __builtin_bswap16(i); +} + +/* { dg-final { scan-assembler-times "rev8" 3 } } */ -- 2.32.0
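As a plain-C sanity check of the transformation (a hedged sketch, not part of the patch): a 16-bit byte swap really is a wider byte swap followed by a shift, which is the expansion enabled here:

#include <stdint.h>

static inline uint16_t
bswap16_via_bswap64 (uint16_t x)
{
  /* The two bytes of x end up in the most-significant positions of the
     64-bit swap, so a right shift by 48 recovers __builtin_bswap16 (x).  */
  return (uint16_t) (__builtin_bswap64 (x) >> 48);
}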
[PATCH v1 0/8] Improvements to bitmanip-1.0 (Zb[abcs]) support
This series provides assorted improvements for the RISC-V Zb[abcs] support collected over the last year and a half and forward-ported to the recently merged upstream support for the Zb[abcs] extensions. Improvements include: - synthesis of HImode bswap from SImode/DImode rev8 - cost-model change to support shift-and-add (sh[123]add) in the strength-reduction of multiplication operations - support for constant-loading of (1ULL << 31) on RV64 using bseti - generating a polarity-reversed mask from a bit-test - adds orc.b as UNSPEC - improves min/minu/max/maxu patterns to suppress redundant extensions Philipp Tomsich (8): bswap: synthesize HImode bswap from SImode or DImode RISC-V: costs: handle BSWAP RISC-V: costs: support shift-and-add in strength-reduction RISC-V: bitmanip: fix constant-loading for (1ULL << 31) in DImode RISC-V: bitmanip: improvements to rotate instructions RISC-V: bitmanip: add splitter to use bexti for "(a & (1 << BIT_NO)) ? 0 : -1" RISC-V: bitmanip: add orc.b as an unspec RISC-V: bitmanip: relax minmax to operate on GPR gcc/config/riscv/bitmanip.md | 74 +--- gcc/config/riscv/riscv.c | 31 gcc/config/riscv/riscv.h | 11 ++- gcc/config/riscv/riscv.md| 3 + gcc/optabs.c | 6 ++ gcc/testsuite/gcc.target/riscv/zbb-bswap.c | 22 ++ gcc/testsuite/gcc.target/riscv/zbb-min-max.c | 20 +- gcc/testsuite/gcc.target/riscv/zbs-bexti.c | 14 8 files changed, 162 insertions(+), 19 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-bswap.c create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bexti.c -- 2.32.0
[PATCH v1 2/8] RISC-V: costs: handle BSWAP
The BSWAP operation is not handled in rtx_costs. Add it. gcc/ChangeLog: * config/riscv/riscv.c (rtx_costs): Add BSWAP. Signed-off-by: Philipp Tomsich --- gcc/config/riscv/riscv.c | 8 1 file changed, 8 insertions(+) diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c index c77b0322869..8480cf09294 100644 --- a/gcc/config/riscv/riscv.c +++ b/gcc/config/riscv/riscv.c @@ -2131,6 +2131,14 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN *total = riscv_extend_cost (XEXP (x, 0), GET_CODE (x) == ZERO_EXTEND); return false; +case BSWAP: + if (TARGET_ZBB) + { + *total = COSTS_N_INSNS (1); + return true; + } + return false; + case FLOAT: case UNSIGNED_FLOAT: case FIX: -- 2.32.0
[PATCH v1 3/8] RISC-V: costs: support shift-and-add in strength-reduction
The strength-reduction implementation in expmed.c will assess the profitability of using shift-and-add using an RTL expression that wraps a MULT (with a power-of-2) in a PLUS. Unless the RISC-V rtx_costs function recognizes this as expressing a sh[123]add instruction, we will return an inflated cost, thus defeating the optimization. This change adds the necessary idiom recognition to provide an accurate cost for this form of expressing sh[123]add. Instead of expanding to li a5,200 mulw a0,a5,a0 with this change, the expression 'a * 200' is synthesized as: sh2add a0,a0,a0 // *5 = a + 4 * a sh2add a0,a0,a0 // *5 = a + 4 * a slli a0,a0,3 // *8 gcc/ChangeLog: * config/riscv/riscv.c (riscv_rtx_costs): Recognize shNadd, if expressed as a plus and multiplication with a power-of-2. Signed-off-by: Philipp Tomsich --- gcc/config/riscv/riscv.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c index 8480cf09294..dff4e370471 100644 --- a/gcc/config/riscv/riscv.c +++ b/gcc/config/riscv/riscv.c @@ -2020,6 +2020,20 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN *total = COSTS_N_INSNS (1); return true; } + /* Before strength-reduction, the shNadd can be expressed as the addition +of a multiplication with a power-of-two. If this case is not handled, +the strength-reduction in expmed.c will calculate an inflated cost. */ + if (TARGET_ZBA + && ((!TARGET_64BIT && (mode == SImode)) || + (TARGET_64BIT && (mode == DImode))) + && (GET_CODE (XEXP (x, 0)) == MULT) + && REG_P (XEXP (XEXP (x, 0), 0)) + && CONST_INT_P (XEXP (XEXP (x, 0), 1)) + && IN_RANGE (pow2p_hwi (INTVAL (XEXP (XEXP (x, 0), 1))), 1, 3)) + { + *total = COSTS_N_INSNS (1); + return true; + } /* shNadd.uw pattern for zba. [(set (match_operand:DI 0 "register_operand" "=r") (plus:DI -- 2.32.0
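A hedged example of the source that motivates this (the function name is made up); with the corrected cost, the multiplication is strength-reduced as shown above instead of using a mul:

long
mul200 (long a)
{
  /* 200 == 5 * 5 * 8, so with Zba this becomes sh2add, sh2add, slli.  */
  return a * 200;
}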
[PATCH v1 4/8] RISC-V: bitmanip: fix constant-loading for (1ULL << 31) in DImode
The SINGLE_BIT_MASK_OPERAND() test is overly restrictive, triggering for bits above 31 only (to side-step any issues with the negative SImode value 0x80000000). This moves the special handling of this SImode value (i.e. the check for -2147483648) to riscv.c and relaxes the SINGLE_BIT_MASK_OPERAND() test. This changes the code-generation for loading (1ULL << 31) from: li a0,1 slli a0,a0,31 to: bseti a0,zero,31 gcc/ChangeLog: * config/riscv/riscv.c (riscv_build_integer_1): Rewrite value as -2147483648 for the single-bit case, when operating on 0x80000000 in SImode. * gcc/config/riscv/riscv.h (SINGLE_BIT_MASK_OPERAND): Allow for any single-bit value, moving the special case for 0x80000000 to riscv_build_integer_1 (in riscv.c). Signed-off-by: Philipp Tomsich --- gcc/config/riscv/riscv.c | 9 + gcc/config/riscv/riscv.h | 11 --- 2 files changed, 13 insertions(+), 7 deletions(-) diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c index dff4e370471..4c30d4e521d 100644 --- a/gcc/config/riscv/riscv.c +++ b/gcc/config/riscv/riscv.c @@ -415,6 +415,15 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS], /* Simply BSETI. */ codes[0].code = UNKNOWN; codes[0].value = value; + + /* RISC-V sign-extends all 32bit values that life in a 32bit +register. To avoid paradoxes, we thus need to use the +sign-extended (negative) representation for the value, if we +want to build 0x80000000 in SImode. This will then expand +to an ADDI/LI instruction. */ + if (mode == SImode && value == 0x80000000) + codes[0].value = -2147483648; + return 1; } diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h index 64287124735..abb121ddbea 100644 --- a/gcc/config/riscv/riscv.h +++ b/gcc/config/riscv/riscv.h @@ -526,13 +526,10 @@ enum reg_class (((VALUE) | ((1UL<<31) - IMM_REACH)) == ((1UL<<31) - IMM_REACH) \ || ((VALUE) | ((1UL<<31) - IMM_REACH)) + IMM_REACH == 0) -/* If this is a single bit mask, then we can load it with bseti. But this - is not useful for any of the low 31 bits because we can use addi or lui - to load them. It is wrong for loading SImode 0x80000000 on rv64 because it - needs to be sign-extended. So we restrict this to the upper 32-bits - only. */ -#define SINGLE_BIT_MASK_OPERAND(VALUE) \ - (pow2p_hwi (VALUE) && (ctz_hwi (VALUE) >= 32)) +/* If this is a single bit mask, then we can load it with bseti. Special + handling of SImode 0x80000000 on RV64 is done in riscv_build_integer_1. */ +#define SINGLE_BIT_MASK_OPERAND(VALUE) \ + (pow2p_hwi (VALUE)) /* Stack layout; function entry, exit and calling. */ -- 2.32.0
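A hedged sketch of the constant-loading case this enables (not a testcase from the patch; it assumes an rv64 target where unsigned long is 64 bits):

unsigned long
bit31 (void)
{
  /* 0x80000000 as a 64-bit value: previously li + slli, now a single bseti.  */
  return 1UL << 31;
}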
[PATCH v1 5/8] RISC-V: bitmanip: improvements to rotate instructions
This change improves rotate instructions (motivated by a review of the code generated for OpenSSL): rotate-left by a constant are synthesized using a rotate-right-immediate to avoid putting the shift-amount into a temporary; to do so, we allow either a register or an immediate for the expansion of rotl3 and then check if the shift-amount is a constant. Without these changes, the function unsigned int f(unsigned int a) { return (a << 2) | (a >> 30); } turns into li a5,2 rolwa0,a0,a5 while these changes give us: roriw a0,a0,30 gcc/ChangeLog: * config/riscv/bitmanip.md (rotlsi3, rotldi3, rotlsi3_sext): Synthesize rotate-left-by-immediate from a rotate-right insn. Signed-off-by: Philipp Tomsich --- gcc/config/riscv/bitmanip.md | 39 ++-- 1 file changed, 33 insertions(+), 6 deletions(-) diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md index 59779b48f27..178d1ca0e4b 100644 --- a/gcc/config/riscv/bitmanip.md +++ b/gcc/config/riscv/bitmanip.md @@ -204,25 +204,52 @@ (define_insn "rotrsi3_sext" (define_insn "rotlsi3" [(set (match_operand:SI 0 "register_operand" "=r") (rotate:SI (match_operand:SI 1 "register_operand" "r") - (match_operand:QI 2 "register_operand" "r")))] + (match_operand:QI 2 "arith_operand" "rI")))] "TARGET_ZBB" - { return TARGET_64BIT ? "rolw\t%0,%1,%2" : "rol\t%0,%1,%2"; } + { +/* If the rotate-amount is constant, let's synthesize using a + rotate-right-immediate instead of using a temporary. */ + +if (CONST_INT_P(operands[2])) { + operands[2] = GEN_INT(32 - INTVAL(operands[2])); + return TARGET_64BIT ? "roriw\t%0,%1,%2" : "rori\t%0,%1,%2"; +} + +return TARGET_64BIT ? "rolw\t%0,%1,%2" : "rol\t%0,%1,%2"; + } [(set_attr "type" "bitmanip")]) (define_insn "rotldi3" [(set (match_operand:DI 0 "register_operand" "=r") (rotate:DI (match_operand:DI 1 "register_operand" "r") - (match_operand:QI 2 "register_operand" "r")))] + (match_operand:QI 2 "arith_operand" "rI")))] "TARGET_64BIT && TARGET_ZBB" - "rol\t%0,%1,%2" + { +if (CONST_INT_P(operands[2])) { + operands[2] = GEN_INT(64 - INTVAL(operands[2])); + return "rori\t%0,%1,%2"; +} + +return "rol\t%0,%1,%2"; + } [(set_attr "type" "bitmanip")]) +;; Until we have improved REE to understand that sign-extending the result of +;; an implicitly sign-extending operation is redundant, we need an additional +;; pattern to gobble up the redundant sign-extension. (define_insn "rotlsi3_sext" [(set (match_operand:DI 0 "register_operand" "=r") (sign_extend:DI (rotate:SI (match_operand:SI 1 "register_operand" "r") - (match_operand:QI 2 "register_operand" "r"] + (match_operand:QI 2 "arith_operand" "rI"] "TARGET_64BIT && TARGET_ZBB" - "rolw\t%0,%1,%2" + { +if (CONST_INT_P(operands[2])) { + operands[2] = GEN_INT(32 - INTVAL(operands[2])); + return "roriw\t%0,%1,%2"; +} + +return "rolw\t%0,%1,%2"; + } [(set_attr "type" "bitmanip")]) (define_insn "bswap2" -- 2.32.0
[PATCH v1 6/8] RISC-V: bitmanip: add splitter to use bexti for "(a & (1 << BIT_NO)) ? 0 : -1"
Consider creating a polarity-reversed mask from a set-bit (i.e., if the bit is set, produce all-ones; otherwise: all-zeros). Using Zbb, this can be expressed as bexti, followed by an addi of minus-one. To enable the combiner to discover this opportunity, we need to split the canonical expression for "(a & (1 << BIT_NO)) ? 0 : -1" into a form combinable into bexti. Consider the function: long f(long a) { return (a & (1 << BIT_NO)) ? 0 : -1; } This produces the following sequence prior to this change: andia0,a0,16 seqza0,a0 neg a0,a0 ret Following this change, it results in: bexti a0,a0,4 addia0,a0,-1 ret gcc/ChangeLog: * config/riscv/bitmanip.md: Add a splitter to generate polarity-reversed masks from a set bit using bexti + addi. gcc/testsuite/ChangeLog: * gcc.target/riscv/zbs-bexti.c: New test. Signed-off-by: Philipp Tomsich --- gcc/config/riscv/bitmanip.md | 13 + gcc/testsuite/gcc.target/riscv/zbs-bexti.c | 14 ++ 2 files changed, 27 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bexti.c diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md index 178d1ca0e4b..9e10280e306 100644 --- a/gcc/config/riscv/bitmanip.md +++ b/gcc/config/riscv/bitmanip.md @@ -367,3 +367,16 @@ (define_insn "*bexti" "TARGET_ZBS" "bexti\t%0,%1,%2" [(set_attr "type" "bitmanip")]) + +;; We can create a polarity-reversed mask (i.e. bit N -> { set = 0, clear = -1 }) +;; using a bext(i) followed by an addi instruction. +;; This splits the canonical representation of "(a & (1 << BIT_NO)) ? 0 : -1". +(define_split + [(set (match_operand:GPR 0 "register_operand") + (neg:GPR (eq:GPR (zero_extract:GPR (match_operand:GPR 1 "register_operand") + (const_int 1) + (match_operand 2)) +(const_int 0] + "TARGET_ZBB" + [(set (match_dup 0) (zero_extract:GPR (match_dup 1) (const_int 1) (match_dup 2))) + (set (match_dup 0) (plus:GPR (match_dup 0) (const_int -1)))]) diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bexti.c b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c new file mode 100644 index 000..d02c3f7a98d --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gc_zbs -mabi=lp64 -O2" } */ + +/* bexti */ +#define BIT_NO 27 + +long +foo0 (long a) +{ + return (a & (1 << BIT_NO)) ? 0 : -1; +} + +/* { dg-final { scan-assembler "bexti" } } */ +/* { dg-final { scan-assembler "addi" } } */ -- 2.32.0
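The splitter leans on a simple identity, sketched below with illustrative names (bexti yields the extracted bit as 0 or 1, and the addi of -1 turns that into the reversed-polarity mask):

    /* ((a >> N) & 1) - 1 == ((a & (1L << N)) ? 0 : -1);
       the left-hand side is exactly what bexti + addi compute.  */
    #define N 27
    long
    reversed_mask (long a)
    {
      return ((a >> N) & 1) - 1;
    }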
[PATCH v1 7/8] RISC-V: bitmanip: add orc.b as an unspec
As a basis for optimized string functions (e.g., the by-pieces implementations), we need orc.b available. This adds orc.b as an unspec, so we can expand to it. gcc/ChangeLog: * config/riscv/bitmanip.md (orcb2): Add orc.b as an unspec. * config/riscv/riscv.md: Add UNSPEC_ORC_B. Signed-off-by: Philipp Tomsich --- gcc/config/riscv/bitmanip.md | 8 gcc/config/riscv/riscv.md| 3 +++ 2 files changed, 11 insertions(+) diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md index 9e10280e306..000deb48b16 100644 --- a/gcc/config/riscv/bitmanip.md +++ b/gcc/config/riscv/bitmanip.md @@ -267,6 +267,14 @@ (define_insn "3" "\t%0,%1,%2" [(set_attr "type" "bitmanip")]) +;; orc.b (or-combine) is added as an unspec for the benefit of the support +;; for optimized string functions (such as strcmp). +(define_insn "orcb2" + [(set (match_operand:X 0 "register_operand" "=r") + (unspec:X [(match_operand:X 1 "register_operand")] UNSPEC_ORC_B))] + "TARGET_ZBB" + "orc.b\t%0,%1") + ;; ZBS extension. (define_insn "*bset" diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index 225e5b259c1..7a2501ec7a9 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -45,6 +45,9 @@ (define_c_enum "unspec" [ ;; Stack tie UNSPEC_TIE + + ;; Zbb OR-combine instruction + UNSPEC_ORC_B ]) (define_c_enum "unspecv" [ -- 2.32.0
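Since orc.b is only exposed as an unspec here, a brief note on its semantics may help: each byte of the result is all-ones if the corresponding input byte is non-zero and all-zeros otherwise. A C model of that behaviour (illustrative only, not part of the patch):

    unsigned long
    orc_b_model (unsigned long x)
    {
      unsigned long r = 0;
      for (unsigned i = 0; i < sizeof (x); i++)
        if ((x >> (8 * i)) & 0xff)
          r |= 0xffUL << (8 * i);
      return r;   /* a zero byte in x stays zero: handy for strlen/strcmp */
    }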
[PATCH v1 8/8] RISC-V: bitmanip: relax minmax to operate on GPR
While min/minu/max/maxu instructions are provided for XLEN only, these can safely operate on GPRs (i.e. SImode or DImode for RV64): SImode is always sign-extended, which ensures that the XLEN-wide instructions can be used for signed and unsigned comparisons on SImode yielding a correct ordering of value. This commit - relaxes the minmax pattern to express for GPR (instead of X only), providing both a si3 and di3 expansion on RV64 - adds a sign-extending form for thee si3 pattern for RV64 to all REE to eliminate redundant extensions - adds test-cases for both gcc/ChangeLog: * config/riscv/bitmanip.md: Relax minmax to GPR (i.e SImode or DImode) on RV64. * config/riscv/bitmanip.md (si3_sext): Add pattern for REE. gcc/testsuite/ChangeLog: * gcc.target/riscv/zbb-min-max.c: Add testcases for SImode operands checking that no redundant sign- or zero-extensions are emitted. Signed-off-by: Philipp Tomsich --- gcc/config/riscv/bitmanip.md | 14 +++--- gcc/testsuite/gcc.target/riscv/zbb-min-max.c | 20 +--- 2 files changed, 28 insertions(+), 6 deletions(-) diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md index 000deb48b16..2a28f78f5f6 100644 --- a/gcc/config/riscv/bitmanip.md +++ b/gcc/config/riscv/bitmanip.md @@ -260,13 +260,21 @@ (define_insn "bswap2" [(set_attr "type" "bitmanip")]) (define_insn "3" - [(set (match_operand:X 0 "register_operand" "=r") -(bitmanip_minmax:X (match_operand:X 1 "register_operand" "r") - (match_operand:X 2 "register_operand" "r")))] + [(set (match_operand:GPR 0 "register_operand" "=r") +(bitmanip_minmax:GPR (match_operand:GPR 1 "register_operand" "r") +(match_operand:GPR 2 "register_operand" "r")))] "TARGET_ZBB" "\t%0,%1,%2" [(set_attr "type" "bitmanip")]) +(define_insn "si3_sext" + [(set (match_operand:DI 0 "register_operand" "=r") +(sign_extend:DI (bitmanip_minmax:SI (match_operand:SI 1 "register_operand" "r") +(match_operand:SI 2 "register_operand" "r"] + "TARGET_64BIT && TARGET_ZBB" + "\t%0,%1,%2" + [(set_attr "type" "bitmanip")]) + ;; orc.b (or-combine) is added as an unspec for the benefit of the support ;; for optimized string functions (such as strcmp). (define_insn "orcb2" diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max.c b/gcc/testsuite/gcc.target/riscv/zbb-min-max.c index f44c398ea08..7169e873551 100644 --- a/gcc/testsuite/gcc.target/riscv/zbb-min-max.c +++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-march=rv64gc_zbb -mabi=lp64 -O2" } */ +/* { dg-options "-march=rv64gc_zba_zbb -mabi=lp64 -O2" } */ long foo1 (long i, long j) @@ -25,7 +25,21 @@ foo4 (unsigned long i, unsigned long j) return i > j ? i : j; } +unsigned int +foo5(unsigned int a, unsigned int b) +{ + return a > b ? a : b; +} + +int +foo6(int a, int b) +{ + return a > b ? a : b; +} + /* { dg-final { scan-assembler-times "min" 3 } } */ -/* { dg-final { scan-assembler-times "max" 3 } } */ +/* { dg-final { scan-assembler-times "max" 4 } } */ /* { dg-final { scan-assembler-times "minu" 1 } } */ -/* { dg-final { scan-assembler-times "maxu" 1 } } */ +/* { dg-final { scan-assembler-times "maxu" 3 } } */ +/* { dg-final { scan-assembler-not "zext.w" } } */ +/* { dg-final { scan-assembler-not "sext.w" } } */ -- 2.32.0
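The correctness argument for using the XLEN-wide instructions on SImode operands can be spelled out with a small check (illustrative, not a testcase): because SImode values are kept sign-extended in 64-bit registers, both the signed and the unsigned 64-bit comparison of the extended values agree with the corresponding 32-bit comparison:

    #include <stdint.h>
    /* Expected to return 1 for all inputs.  */
    int
    order_preserved (uint32_t a, uint32_t b)
    {
      int64_t sa = (int32_t) a, sb = (int32_t) b;   /* as held in GPRs */
      int signed_ok = (sa > sb) == ((int32_t) a > (int32_t) b);
      int unsigned_ok = ((uint64_t) sa > (uint64_t) sb) == (a > b);
      return signed_ok && unsigned_ok;
    }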
Re: [PATCH] libgcc: fix backtrace fallback on PowerPC Big-endian. [PR103004]
Hi Segher, On 11/11/2021 10:43, Segher Boessenkool wrote: Hi! On Wed, Nov 10, 2021 at 06:59:23PM -0300, Raphael Moreira Zinsly wrote: At the end of the backtrace stream _Unwind_Find_FDE() may not be able to find the frame unwind info and will later call the backtrace fallback instead of finishing. This occurs when using an old libc on ppc64 due to dl_iterate_phdr() not being able to set the fde in the last trace. When this occurs the cfa of the trace will be behind of context's cfa. Also, libgo’s probestackmaps() calls the backtrace with a null pointer and can get to the backchain fallback with the same problem, in this case we are only interested in find a stack map, we don't need nor can do a backchain. _Unwind_ForcedUnwind_Phase2() can hit the same issue as it uses uw_frame_state_for(), so we need to treat _URC_NORMAL_STOP. libgcc/ChangeLog: * config/rs6000/linux-unwind.h (ppc_backchain_fallback): turn into static to fix -Wmissing-prototypes. Check if it's called with a null argument or at the end of the backtrace and return. * unwind.inc (_Unwind_ForcedUnwind_Phase2): treat _URC_NORMAL_STOP. Formatting is messed up. Lines start with a capital. Two spaces after full stop, while you're at it. Ok. -void ppc_backchain_fallback (struct _Unwind_Context *context, void *a) +static void +ppc_backchain_fallback (struct _Unwind_Context *context, void *a) This was already fixed in 75ef0353a2d3. Ops, missed that. { struct frame_layout *current; struct trace_arg *arg = a; int count; - /* Get the last address computed and start with the next. */ + /* Get the last address computed. */ current = context->cfa; Empty line after here please. Most of the time if you have a full-line comment it means a new paragraph is starting. Ok. + /* If the trace CFA is not the context CFA the backtrace is done. */ + if (arg == NULL || arg->cfa != current) + return; + + /* Start with next address. */ current = current->backchain; Like you did here :-) Do you have a testcase (that failed without this, but now doesn't)? I don't have a simple testcase for that, but many of the asan and go tests catch that. Looks okay, but please update and resend. Segher Thanks, -- Raphael Moreira Zinsly
Re: Fix recursion discovery in ipa-pure-const
On Thu, Nov 11, 2021 at 2:41 PM Jan Hubicka via Gcc-patches wrote: > > Hi, > We mark self-recursive functions as looping for fear of endless recursion. > This is done correctly for local pure/const and for non-trivial SCCs in > callgraph, but for trivial SCCs we miss the flag. > > I think it is a bad decision since infinite recursion will run out of stack, Note it might not always in case we can eliminate the tail-recursion or avoid stack use by the recursion by other means. So I think it is conservatively correct. Richard. > but changing it upsets some testcases and should be done independently. > So this patch is fixing current behaviour to be consistent. > > Bootstrapped/regtested x86_64-linux, committed. > > gcc/ChangeLog: > > 2021-11-11 Jan Hubicka > > * ipa-pure-const.c (propagate_pure_const): Self recursion is > a side effect. > > diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c > index 505ed4f8a3b..64777cd2d91 100644 > --- a/gcc/ipa-pure-const.c > +++ b/gcc/ipa-pure-const.c > @@ -1513,6 +1611,9 @@ propagate_pure_const (void) > enum pure_const_state_e edge_state = IPA_CONST; > bool edge_looping = false; > > + if (e->recursive_p ()) > + looping = true; > + > if (dump_file && (dump_flags & TDF_DETAILS)) > { > fprintf (dump_file, "Call to %s",
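An illustrative shape of the case under discussion (not taken from the patch): a trivially self-recursive function that local analysis would discover as const; keeping the looping flag prevents the possibly non-terminating call from being removed as side-effect free:

    int
    f (int x)
    {
      if (x)
        return f (x);   /* trivial SCC: f only calls itself, may never return */
      return 0;
    }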
[COMMITTED] Move import population from threader to path solver.
Imports are our nomenclature for external SSA names to a block that are used to calculate the outgoing edges for said block. For example, in the following snippet: : _1 = b_10 == block_11; _2 = b_10 != -1; _3 = _1 & _2; if (_3 != 0) goto ; [INV] else goto ; [INV] ...the imports to the block are b_10 and block_11 since they are both needed to calculate _3. The path solver takes a bitmap of imports in addition to the path itself. This sets up the number of SSA names to be on the lookout for, while resolving the final conditional. Calculating these imports was initially done in the threader, since it was the only user of the path solver. With new clients, it has become obvious that populating the imports should be a task for the path solver, so it can be shared among the clients. This patch moves the import code to the solver, making both the solver and the threader simpler in the process. This is because intent is clearer and some duplicate code was removed. This reshuffling had the net effect of giving us a handful of new threads through my suite of .ii files (125). This was unexpected, but welcome nevertheless. There is no performance difference in callgrind over the same suite. Regstrapped on x86-64 Linux. gcc/ChangeLog: * gimple-range-path.cc (path_range_query::add_copies_to_imports): Rename to... (path_range_query::compute_imports): ...this. Adapt it so it can be passed the imports bitmap instead of working on m_imports. (path_range_query::compute_ranges): Call compute_imports in all cases unless an imports bitmap is passed. * gimple-range-path.h (path_range_query::compute_imports): New. (path_range_query::add_copies_to_imports): Remove. * tree-ssa-threadbackward.c (back_threader::resolve_def): Remove. (back_threader::find_paths_to_names): Inline resolve_def. (back_threader::find_paths): Call compute_imports. (back_threader::resolve_phi): Adjust comment. --- gcc/gimple-range-path.cc | 45 - gcc/gimple-range-path.h | 2 +- gcc/tree-ssa-threadbackward.c | 47 ++- 3 files changed, 30 insertions(+), 64 deletions(-) diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc index 6da01c7067f..4843c133e62 100644 --- a/gcc/gimple-range-path.cc +++ b/gcc/gimple-range-path.cc @@ -439,26 +439,32 @@ path_range_query::add_to_imports (tree name, bitmap imports) return false; } -// Add the copies of any SSA names in IMPORTS to IMPORTS. +// Compute the imports to the path ending in EXIT. These are +// essentially the SSA names used to calculate the final conditional +// along the path. // -// These are hints for the solver. Adding more elements (within -// reason) doesn't slow us down, because we don't solve anything that -// doesn't appear in the path. On the other hand, not having enough -// imports will limit what we can solve. +// They are hints for the solver. Adding more elements doesn't slow +// us down, because we don't solve anything that doesn't appear in the +// path. On the other hand, not having enough imports will limit what +// we can solve. void -path_range_query::add_copies_to_imports () +path_range_query::compute_imports (bitmap imports, basic_block exit) { - auto_vec worklist (bitmap_count_bits (m_imports)); + // Start with the imports from the exit block... 
+ bitmap r_imports = m_ranger.gori ().imports (exit); + bitmap_copy (imports, r_imports); + + auto_vec worklist (bitmap_count_bits (imports)); bitmap_iterator bi; unsigned i; - - EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi) + EXECUTE_IF_SET_IN_BITMAP (imports, 0, i, bi) { tree name = ssa_name (i); worklist.quick_push (name); } + // ...and add any operands used to define these imports. while (!worklist.is_empty ()) { tree name = worklist.pop (); @@ -466,15 +472,12 @@ path_range_query::add_copies_to_imports () if (is_gimple_assign (def_stmt)) { - // ?? Adding assignment copies doesn't get us much. At the - // time of writing, we got 63 more threaded paths across the - // .ii files from a bootstrap. - add_to_imports (gimple_assign_rhs1 (def_stmt), m_imports); + add_to_imports (gimple_assign_rhs1 (def_stmt), imports); tree rhs = gimple_assign_rhs2 (def_stmt); - if (rhs && add_to_imports (rhs, m_imports)) + if (rhs && add_to_imports (rhs, imports)) worklist.safe_push (rhs); rhs = gimple_assign_rhs3 (def_stmt); - if (rhs && add_to_imports (rhs, m_imports)) + if (rhs && add_to_imports (rhs, imports)) worklist.safe_push (rhs); } else if (gphi *phi = dyn_cast (def_stmt)) @@ -486,7 +489,7 @@ path_range_query::add_copies_to_imports () if (TREE_CODE (arg) == SSA_NAME && m_pat
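For context, a sketch of how a second client of the path solver might now drive it; only compute_imports and its signature come from the patch, while the surrounding calls and variable names are assumptions:

    // Hypothetical client code (C++):
    auto_bitmap imports;
    query.compute_imports (imports, exit_bb);   // exit_bb: block ending the path
    query.compute_ranges (path, imports);       // then solve the final conditional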
[PATCH v2] libgcc: fix backtrace fallback on PowerPC Big-endian. [PR103004]
Changes since v1: - Removed -Wmissing-prototypes fix. - Fixed formatting of Changelog and patch. --->8--- At the end of the backtrace stream _Unwind_Find_FDE() may not be able to find the frame unwind info and will later call the backtrace fallback instead of finishing. This occurs when using an old libc on ppc64 due to dl_iterate_phdr() not being able to set the fde in the last trace. When this occurs the cfa of the trace will be behind of context's cfa. Also, libgo’s probestackmaps() calls the backtrace with a null pointer and can get to the backchain fallback with the same problem, in this case we are only interested in find a stack map, we don't need nor can do a backchain. _Unwind_ForcedUnwind_Phase2() can hit the same issue as it uses uw_frame_state_for(), so we need to treat _URC_NORMAL_STOP. libgcc/ChangeLog: * config/rs6000/linux-unwind.h (ppc_backchain_fallback): Check if it's called with a null argument or at the end of the backtrace and return. * unwind.inc (_Unwind_ForcedUnwind_Phase2): Treat _URC_NORMAL_STOP. --- libgcc/config/rs6000/linux-unwind.h | 8 +++- libgcc/unwind.inc | 5 +++-- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/libgcc/config/rs6000/linux-unwind.h b/libgcc/config/rs6000/linux-unwind.h index 8deccc1d650..ad1ab286a2f 100644 --- a/libgcc/config/rs6000/linux-unwind.h +++ b/libgcc/config/rs6000/linux-unwind.h @@ -401,8 +401,14 @@ void ppc_backchain_fallback (struct _Unwind_Context *context, void *a) struct trace_arg *arg = a; int count; - /* Get the last address computed and start with the next. */ + /* Get the last address computed. */ current = context->cfa; + + /* If the trace CFA is not the context CFA the backtrace is done. */ + if (arg == NULL || arg->cfa != current) + return; + + /* Start with next address. */ current = current->backchain; for (count = arg->count; current != NULL; current = current->backchain) diff --git a/libgcc/unwind.inc b/libgcc/unwind.inc index 456a5ee682f..dc2f9c13e97 100644 --- a/libgcc/unwind.inc +++ b/libgcc/unwind.inc @@ -160,12 +160,13 @@ _Unwind_ForcedUnwind_Phase2 (struct _Unwind_Exception *exc, /* Set up fs to describe the FDE for the caller of cur_context. */ code = uw_frame_state_for (context, &fs); - if (code != _URC_NO_REASON && code != _URC_END_OF_STACK) + if (code != _URC_NO_REASON && code != _URC_END_OF_STACK + && code != _URC_NORMAL_STOP) return _URC_FATAL_PHASE2_ERROR; /* Unwind successful. */ action = _UA_FORCE_UNWIND | _UA_CLEANUP_PHASE; - if (code == _URC_END_OF_STACK) + if (code == _URC_END_OF_STACK || code == _URC_NORMAL_STOP) action |= _UA_END_OF_STACK; stop_code = (*stop) (1, action, exc->exception_class, exc, context, stop_argument); -- 2.31.1
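A rough illustration of the scenario exercised (not a regression test): a program that walks its own stack with backtrace(3), which on ppc64 with an old libc ends up in the backchain fallback once _Unwind_Find_FDE() no longer finds unwind info:

    #include <execinfo.h>
    int
    main (void)
    {
      void *buf[64];
      int n = backtrace (buf, 64);   /* walks all frames, ends via the fallback */
      return n > 0 ? 0 : 1;
    }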
Re: Fix recursion discovery in ipa-pure-const
> On Thu, Nov 11, 2021 at 2:41 PM Jan Hubicka via Gcc-patches > wrote: > > > > Hi, > > We mark self-recursive functions as looping for fear of endless recursion. > > This is done correctly for local pure/const and for non-trivial SCCs in > > callgraph, but for trivial SCCs we miss the flag. > > > > I think it is a bad decision since infinite recursion will run out of stack, > > Note it might not always in case we can eliminate the tail-recursion or avoid > stack use by the recursion by other means. So I think it is conservatively > correct. I don't know. If a function is pure and has infinite recursion in it, it means that it can only run forever without side effects if it gets lucky and we tail-recurse it. There are no other means to keep the stack use from growing. First, I think code relying on tail-recursion optimization to not run out of stack is not strictly valid in C/C++ or the other languages we care about. Also, in C++ there is the forward-progress guarantee, which makes even the tail-optimized code invalid. I think in high-level code such recursive accessors used for no good reason are not that infrequent. Also, we had this bug in the tree probably forever since LOOPING_PURE_CONST was added and no one complained ;) Relaxing this rule breaks some testcases, but odd ones - they are infinitely self-recursive builtin implementations where we then both prove the function noreturn & later optimize the builtin to a constant, so the assembly matching does not see the expected thing. Honza
[committed] Testsuite: Various fixes for nios2.
I've pushed the attached patch to clean up some test failures I've seen on nios2-elf. This target defaults to -fno-delete-null-pointer-checks so any optimization tests that depend on assumptions that valid pointers are non-zero have to be marked explicitly. The others ought to be obvious, except perhaps struct-by-value-1.c which was giving a link error about overflowing the small data region without -G0. My last set of test results were pretty messy but I think almost all of the problems are not nios2-specific (e.g., PR103166, PR103163). I think it is better to wait until we're into stage 3 and the churn settles down some before I make another pass to triage remaining nios2-specific problems, but I might as well check in what I have now instead of sitting on it. -Sandra commit eb43f1a95d1d7a0f88a8107d860e5343507554dd Author: Sandra Loosemore Date: Thu Nov 11 06:31:02 2021 -0800 Testsuite: Various fixes for nios2. 2021-11-11 Sandra Loosemore gcc/testsuite/ * g++.dg/warn/Wmismatched-new-delete-5.C: Add -fdelete-null-pointer-checks. * gcc.dg/attr-returns-nonnull.c: Likewise. * gcc.dg/debug/btf/btf-datasec-1.c: Add -G0 option for nios2. * gcc.dg/ifcvt-4.c: Skip on nios2. * gcc.dg/struct-by-value-1.c: Add -G0 option for nios2. diff --git a/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-5.C b/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-5.C index 92c75df..bac2b68 100644 --- a/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-5.C +++ b/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-5.C @@ -1,7 +1,7 @@ /* PR c++/100876 - -Wmismatched-new-delete should either look through or ignore placement new { dg-do compile } - { dg-options "-O2 -Wall" } */ + { dg-options "-O2 -Wall -fdelete-null-pointer-checks" } */ extern "C" { void* malloc (__SIZE_TYPE__); diff --git a/gcc/testsuite/gcc.dg/attr-returns-nonnull.c b/gcc/testsuite/gcc.dg/attr-returns-nonnull.c index 22ee30a..e4e20b8 100644 --- a/gcc/testsuite/gcc.dg/attr-returns-nonnull.c +++ b/gcc/testsuite/gcc.dg/attr-returns-nonnull.c @@ -1,7 +1,7 @@ /* Verify that attribute returns_nonnull on global and local function declarations is merged. { dg-do compile } - { dg-options "-Wall -fdump-tree-optimized" } */ + { dg-options "-Wall -fdump-tree-optimized -fdelete-null-pointer-checks" } */ void foo (void); diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c b/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c index f809d93..dbb236b 100644 --- a/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c +++ b/gcc/testsuite/gcc.dg/debug/btf/btf-datasec-1.c @@ -12,6 +12,7 @@ /* { dg-do compile ) */ /* { dg-options "-O0 -gbtf -dA" } */ /* { dg-options "-O0 -gbtf -dA -msdata=none" { target { { powerpc*-*-* } && ilp32 } } } */ +/* { dg-options "-O0 -gbtf -dA -G0" { target { nios2-*-* } } } */ /* Check for two DATASEC entries with vlen 3, and one with vlen 1. 
*/ /* { dg-final { scan-assembler-times "0xf03\[\t \]+\[^\n\]*btt_info" 2 } } */ diff --git a/gcc/testsuite/gcc.dg/ifcvt-4.c b/gcc/testsuite/gcc.dg/ifcvt-4.c index e74e449..0525102 100644 --- a/gcc/testsuite/gcc.dg/ifcvt-4.c +++ b/gcc/testsuite/gcc.dg/ifcvt-4.c @@ -2,7 +2,7 @@ /* { dg-additional-options "-misel" { target { powerpc*-*-* } } } */ /* { dg-additional-options "-march=z196" { target { s390x-*-* } } } */ /* { dg-additional-options "-mtune-ctrl=^one_if_conv_insn" { target { i?86-*-* x86_64-*-* } } } */ -/* { dg-skip-if "Multiple set if-conversion not guaranteed on all subtargets" { "arm*-*-* avr-*-* hppa*64*-*-* s390-*-* visium-*-*" riscv*-*-* msp430-*-* } } */ +/* { dg-skip-if "Multiple set if-conversion not guaranteed on all subtargets" { "arm*-*-* avr-*-* hppa*64*-*-* s390-*-* visium-*-*" riscv*-*-* msp430-*-* nios2-*-*} } */ /* { dg-skip-if "" { "s390x-*-*" } { "-m31" } } */ typedef int word __attribute__((mode(word))); diff --git a/gcc/testsuite/gcc.dg/struct-by-value-1.c b/gcc/testsuite/gcc.dg/struct-by-value-1.c index addf253..ae7adb5 100644 --- a/gcc/testsuite/gcc.dg/struct-by-value-1.c +++ b/gcc/testsuite/gcc.dg/struct-by-value-1.c @@ -1,6 +1,7 @@ /* Test structure passing by value. */ /* { dg-do run } */ /* { dg-options "-O2" } */ +/* { dg-options "-O2 -G0" { target { nios2-*-* } } } */ #define T(N) \ struct S##N { unsigned char i[N]; }; \
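For illustration, the kind of assumption involved in the -fdelete-null-pointer-checks additions (a sketch only, not one of the affected tests): with the flag, the comparison below can be folded because the return value is known non-null; without it, the nios2-elf default, the check must stay:

    extern int *get (void) __attribute__ ((returns_nonnull));
    int
    g (void)
    {
      return get () != 0;   /* folds to 1 only when null checks may be deleted */
    }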
Fix some side cases of side effects analysis
Hi, I wrote script comparing modref pure/const discovery with ipa-pure-const and found mistakes on both ends. I fixed ipa-pure-const in previous two patches. This plugs the case where modref was too optimistic in handling looping pure consts which were previously missed due to early exits on ECF_CONST | ECF_PURE. Those early exists are bit anoying and I think as a cleanup I may just drop some of them as premature optimizations coming from time modref was very simplistic on what it propagates. Bootstrapped/regtested x86_64-linux, will commit it shortly. gcc/ChangeLog: 2021-11-11 Jan Hubicka * ipa-modref.c (modref_summary::useful_p): Check also for side-effects with looping const/pure. (modref_summary_lto::useful_p): Likewise. (merge_call_side_effects): Merge side effects before early exit for pure/const. (process_fnspec): Also handle pure functions. (analyze_call): Do not early exit on looping pure const. (propagate_unknown_call): Also handle nontrivial SCC as side-effect. (modref_propagate_in_scc): diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c index f8b7b900527..45b391a565e 100644 --- a/gcc/ipa-modref.c +++ b/gcc/ipa-modref.c @@ -331,11 +331,11 @@ modref_summary::useful_p (int ecf_flags, bool check_flags) && remove_useless_eaf_flags (static_chain_flags, ecf_flags, false)) return true; if (ecf_flags & (ECF_CONST | ECF_NOVOPS)) -return false; +return (!side_effects && (ecf_flags & ECF_LOOPING_CONST_OR_PURE)); if (loads && !loads->every_base) return true; if (ecf_flags & ECF_PURE) -return false; +return (!side_effects && (ecf_flags & ECF_LOOPING_CONST_OR_PURE)); return stores && !stores->every_base; } @@ -416,11 +416,11 @@ modref_summary_lto::useful_p (int ecf_flags, bool check_flags) && remove_useless_eaf_flags (static_chain_flags, ecf_flags, false)) return true; if (ecf_flags & (ECF_CONST | ECF_NOVOPS)) -return false; +return (!side_effects && (ecf_flags & ECF_LOOPING_CONST_OR_PURE)); if (loads && !loads->every_base) return true; if (ecf_flags & ECF_PURE) -return false; +return (!side_effects && (ecf_flags & ECF_LOOPING_CONST_OR_PURE)); return stores && !stores->every_base; } @@ -925,6 +925,18 @@ merge_call_side_effects (modref_summary *cur_summary, auto_vec parm_map; modref_parm_map chain_map; bool changed = false; + int flags = gimple_call_flags (stmt); + + if (!cur_summary->side_effects && callee_summary->side_effects) +{ + if (dump_file) + fprintf (dump_file, " - merging side effects.\n"); + cur_summary->side_effects = true; + changed = true; +} + + if (flags & (ECF_CONST | ECF_NOVOPS)) +return changed; /* We can not safely optimize based on summary of callee if it does not always bind to current def: it is possible that memory load @@ -988,12 +1000,6 @@ merge_call_side_effects (modref_summary *cur_summary, changed = true; } } - if (!cur_summary->side_effects - && callee_summary->side_effects) -{ - cur_summary->side_effects = true; - changed = true; -} return changed; } @@ -1091,7 +1097,7 @@ process_fnspec (modref_summary *cur_summary, attr_fnspec fnspec = gimple_call_fnspec (call); int flags = gimple_call_flags (call); - if (!(flags & (ECF_CONST | ECF_NOVOPS)) + if (!(flags & (ECF_CONST | ECF_NOVOPS | ECF_PURE)) || (flags & ECF_LOOPING_CONST_OR_PURE) || (cfun->can_throw_non_call_exceptions && stmt_could_throw_p (cfun, call))) @@ -1101,6 +1107,8 @@ process_fnspec (modref_summary *cur_summary, if (cur_summary_lto) cur_summary_lto->side_effects = true; } + if (flags & (ECF_CONST | ECF_NOVOPS)) +return true; if (!fnspec.known_p ()) { if (dump_file && gimple_call_builtin_p (call, 
BUILT_IN_NORMAL)) @@ -1203,7 +1211,8 @@ analyze_call (modref_summary *cur_summary, modref_summary_lto *cur_summary_lto, /* Check flags on the function call. In certain cases, analysis can be simplified. */ int flags = gimple_call_flags (stmt); - if (flags & (ECF_CONST | ECF_NOVOPS)) + if ((flags & (ECF_CONST | ECF_NOVOPS)) + && !(flags & ECF_LOOPING_CONST_OR_PURE)) { if (dump_file) fprintf (dump_file, @@ -3963,7 +3972,8 @@ static bool propagate_unknown_call (cgraph_node *node, cgraph_edge *e, int ecf_flags, modref_summary *cur_summary, - modref_summary_lto *cur_summary_lto) + modref_summary_lto *cur_summary_lto, + bool nontrivial_scc) { bool changed = false; class fnspec_summary *fnspec_sum = fnspec_summaries->get (e); @@ -3973,12 +3983,12 @@ propagate_unknown_call (cgraph_node *node, if (e->callee && builtin_safe_for_const_function_p (&looping, e->callee->decl)) { - if (cur_summary && !cur_
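An illustrative example of the "looping pure" case the patch is about (not from the patch): a function that only loads memory and so would be discovered as pure, but whose termination cannot be proven, so calls to it must keep their side effects:

    extern int flag;
    int
    wait_for_flag (void)
    {
      while (!flag)
        ;               /* only loads, but possibly infinite */
      return flag;
    }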
[Patch] Fortran/openmp: Add support for 2 argument num_teams clause
Just the Fortran FE work + Fortranized version for the C tests. Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 Fortran/openmp: Add support for 2 argument num_teams clause Fortran part to commit r12-5146-g48d7327f2aaf65 gcc/fortran/ChangeLog: * gfortran.h (struct gfc_omp_clauses): Rename num_teams to num_teams_upper, add num_teams_upper. * dump-parse-tree.c (show_omp_clauses): Update to handle lower-bound num_teams clause. * frontend-passes.c (gfc_code_walker): Likewise * openmp.c (gfc_free_omp_clauses, gfc_match_omp_clauses, resolve_omp_clauses): Likewise. * trans-openmp.c (gfc_trans_omp_clauses, gfc_split_omp_clauses, gfc_trans_omp_target): Likewise. libgomp/ChangeLog: * testsuite/libgomp.fortran/teams-1.f90: New test. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/num-teams-1.f90: New test. * gfortran.dg/gomp/num-teams-2.f90: New test. gcc/fortran/dump-parse-tree.c | 9 - gcc/fortran/frontend-passes.c | 3 +- gcc/fortran/gfortran.h | 3 +- gcc/fortran/openmp.c | 32 +--- gcc/fortran/trans-openmp.c | 35 - gcc/testsuite/gfortran.dg/gomp/num-teams-1.f90 | 53 ++ gcc/testsuite/gfortran.dg/gomp/num-teams-2.f90 | 37 ++ libgomp/testsuite/libgomp.fortran/teams-1.f90 | 22 +++ 8 files changed, 175 insertions(+), 19 deletions(-) diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c index 14a307856fc..04660d5074a 100644 --- a/gcc/fortran/dump-parse-tree.c +++ b/gcc/fortran/dump-parse-tree.c @@ -1741,10 +1741,15 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses) } fprintf (dumpfile, " BIND(%s)", type); } - if (omp_clauses->num_teams) + if (omp_clauses->num_teams_upper) { fputs (" NUM_TEAMS(", dumpfile); - show_expr (omp_clauses->num_teams); + if (omp_clauses->num_teams_lower) + { + show_expr (omp_clauses->num_teams_lower); + fputc (':', dumpfile); + } + show_expr (omp_clauses->num_teams_upper); fputc (')', dumpfile); } if (omp_clauses->device) diff --git a/gcc/fortran/frontend-passes.c b/gcc/fortran/frontend-passes.c index 145bff50f3e..f5ba7cecd54 100644 --- a/gcc/fortran/frontend-passes.c +++ b/gcc/fortran/frontend-passes.c @@ -5634,7 +5634,8 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t codefn, walk_expr_fn_t exprfn, WALK_SUBEXPR (co->ext.omp_clauses->chunk_size); WALK_SUBEXPR (co->ext.omp_clauses->safelen_expr); WALK_SUBEXPR (co->ext.omp_clauses->simdlen_expr); - WALK_SUBEXPR (co->ext.omp_clauses->num_teams); + WALK_SUBEXPR (co->ext.omp_clauses->num_teams_lower); + WALK_SUBEXPR (co->ext.omp_clauses->num_teams_upper); WALK_SUBEXPR (co->ext.omp_clauses->device); WALK_SUBEXPR (co->ext.omp_clauses->thread_limit); WALK_SUBEXPR (co->ext.omp_clauses->dist_chunk_size); diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 9378b4b8a24..1ad2f0df702 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -1502,7 +1502,8 @@ typedef struct gfc_omp_clauses struct gfc_expr *chunk_size; struct gfc_expr *safelen_expr; struct gfc_expr *simdlen_expr; - struct gfc_expr *num_teams; + struct gfc_expr *num_teams_lower; + struct gfc_expr *num_teams_upper; struct gfc_expr *device; struct gfc_expr *thread_limit; struct gfc_expr *grainsize; diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c index dcf22ac2c2f..7b2df0d0be3 100644 --- a/gcc/fortran/openmp.c +++ b/gcc/fortran/openmp.c @@ -85,7 +85,8 @@ gfc_free_omp_clauses (gfc_omp_clauses *c) gfc_free_expr (c->chunk_size); 
gfc_free_expr (c->safelen_expr); gfc_free_expr (c->simdlen_expr); - gfc_free_expr (c->num_teams); + gfc_free_expr (c->num_teams_lower); + gfc_free_expr (c->num_teams_upper); gfc_free_expr (c->device); gfc_free_expr (c->thread_limit); gfc_free_expr (c->dist_chunk_size); @@ -2420,11 +2421,22 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask, continue; } if ((mask & OMP_CLAUSE_NUM_TEAMS) - && (m = gfc_match_dupl_check (!c->num_teams, "num_teams", true, - &c->num_teams)) != MATCH_NO) + && (m = gfc_match_dupl_check (!c->num_teams_upper, "num_teams", + true)) != MATCH_NO) { if (m == MATCH_ERROR) goto error; + if (gfc_match ("%e ", &c->num_teams_upper) != MATCH_YES) + goto error; + if (gfc_peek_ascii_char () == ':') + { + c->num_teams_lower = c->num_teams_upper; + c->num_teams_upper = NULL; + if (gfc_match (": %e ", &c->num_teams_upper) != MATCH_YES) + goto error; + } + if (gfc_match (") ") != MATCH_YES) + goto error; continue; }
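For comparison, the C/C++ form of the two-argument clause that this Fortran change mirrors (a sketch; the C front-end support referenced is from r12-5146):

    void
    f (void)
    {
    #pragma omp target teams num_teams (4 : 8)
      {
        /* at least 4 and at most 8 teams */
      }
    }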
[PATCH] tree-optimization/103190 - fix assert in reassoc stmt placement with asm
This makes sure to only assert we don't run into a asm goto when inserting a stmt in reassoc, matching the condition in can_reassociate_p. We can handle EH edges from an asm just like EH edges from any other stmt. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2021-11-11 Richard Biener PR tree-optimization/103190 * tree-ssa-reassoc.c (insert_stmt_after): Only assert on asm goto. --- gcc/tree-ssa-reassoc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c index 6a555e7c553..65316223047 100644 --- a/gcc/tree-ssa-reassoc.c +++ b/gcc/tree-ssa-reassoc.c @@ -1515,7 +1515,8 @@ insert_stmt_after (gimple *stmt, gimple *insert_point) gsi_insert_after (&gsi, stmt, GSI_NEW_STMT); return; } - else if (gimple_code (insert_point) == GIMPLE_ASM) + else if (gimple_code (insert_point) == GIMPLE_ASM + && gimple_asm_nlabels (as_a (insert_point)) != 0) /* We have no idea where to insert - it depends on where the uses will be placed. */ gcc_unreachable (); -- 2.31.1
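A very loose sketch of the shape involved (not the PR103190 reproducer): with -fnon-call-exceptions an asm can end its block with an EH edge, after which reassociation may want to insert a recomputed statement, which is now allowed, whereas an asm goto still hits the assert:

    int a, b, c, d;
    void
    f (void)
    {
      asm volatile ("" : "+r" (a) : : "memory");   /* may throw with -fnon-call-exceptions */
      d = a + b + c + d;
    }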