[PATCH] Speed up LC SSA rewrite
The following avoids collecting all loops exit blocks into bitmaps and computing the union of those up the loop tree possibly repeatedly. Instead we make sure to do this only once for each loop with a definition possibly requiring a LC phi node plus make sure to leverage recorded exits to avoid the intermediate bitmap allocation. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. * tree-ssa-loop-manip.cc (compute_live_loop_exits): Take the def loop exit block bitmap as argument instead of re-computing it here. (add_exit_phis_var): Adjust. (loop_name_cmp): New function. (add_exit_phis): Sort variables to insert LC PHI nodes after definition loop, for each definition loop compute the exit block bitmap once. (get_loops_exit): Remove. (rewrite_into_loop_closed_ssa_1): Do not pre-record all loop exit blocks into bitmaps. Record loop exits if required. --- gcc/tree-ssa-loop-manip.cc | 95 ++ 1 file changed, 56 insertions(+), 39 deletions(-) diff --git a/gcc/tree-ssa-loop-manip.cc b/gcc/tree-ssa-loop-manip.cc index 9f3b62652ea..0324ff60a0f 100644 --- a/gcc/tree-ssa-loop-manip.cc +++ b/gcc/tree-ssa-loop-manip.cc @@ -183,12 +183,14 @@ find_sibling_superloop (class loop *use_loop, class loop *def_loop) /* DEF_BB is a basic block containing a DEF that needs rewriting into loop-closed SSA form. USE_BLOCKS is the set of basic blocks containing uses of DEF that "escape" from the loop containing DEF_BB (i.e. blocks in - USE_BLOCKS are dominated by DEF_BB but not in the loop father of DEF_B). + USE_BLOCKS are dominated by DEF_BB but not in the loop father of DEF_BB). ALL_EXITS[I] is the set of all basic blocks that exit loop I. + DEF_LOOP_EXITS is a bitmap of loop exit blocks that exit the loop + containing DEF_BB or its outer loops. - Compute the subset of LOOP_EXITS that exit the loop containing DEF_BB - or one of its loop fathers, in which DEF is live. This set is returned - in the bitmap LIVE_EXITS. 
+ Compute the subset of loop exit destinations that exit the loop + containing DEF_BB or one of its loop fathers, in which DEF is live. + This set is returned in the bitmap LIVE_EXITS. Instead of computing the complete livein set of the def, we use the loop nesting tree as a form of poor man's structure analysis. This greatly @@ -197,18 +199,17 @@ find_sibling_superloop (class loop *use_loop, class loop *def_loop) static void compute_live_loop_exits (bitmap live_exits, bitmap use_blocks, -bitmap *loop_exits, basic_block def_bb) +basic_block def_bb, bitmap def_loop_exits) { unsigned i; bitmap_iterator bi; class loop *def_loop = def_bb->loop_father; unsigned def_loop_depth = loop_depth (def_loop); - bitmap def_loop_exits; /* Normally the work list size is bounded by the number of basic blocks in the largest loop. We don't know this number, but we can be fairly sure that it will be relatively small. */ - auto_vec worklist (MAX (8, n_basic_blocks_for_fn (cfun) / 128)); + auto_vec worklist (MAX (8, n_basic_blocks_for_fn (cfun) / 128)); EXECUTE_IF_SET_IN_BITMAP (use_blocks, 0, i, bi) { @@ -272,13 +273,7 @@ compute_live_loop_exits (bitmap live_exits, bitmap use_blocks, } } - def_loop_exits = BITMAP_ALLOC (&loop_renamer_obstack); - for (class loop *loop = def_loop; - loop != current_loops->tree_root; - loop = loop_outer (loop)) -bitmap_ior_into (def_loop_exits, loop_exits[loop->num]); bitmap_and_into (live_exits, def_loop_exits); - BITMAP_FREE (def_loop_exits); } /* Add a loop-closing PHI for VAR in basic block EXIT. */ @@ -322,23 +317,33 @@ add_exit_phi (basic_block exit, tree var) Exits of the loops are stored in LOOP_EXITS. */ static void -add_exit_phis_var (tree var, bitmap use_blocks, bitmap *loop_exits) +add_exit_phis_var (tree var, bitmap use_blocks, bitmap def_loop_exits) { unsigned index; bitmap_iterator bi; basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (var)); - bitmap live_exits = BITMAP_ALLOC (&loop_renamer_obstack); gcc_checking_assert (! 
bitmap_bit_p (use_blocks, def_bb->index)); - compute_live_loop_exits (live_exits, use_blocks, loop_exits, def_bb); + auto_bitmap live_exits (&loop_renamer_obstack); + compute_live_loop_exits (live_exits, use_blocks, def_bb, def_loop_exits); EXECUTE_IF_SET_IN_BITMAP (live_exits, 0, index, bi) { add_exit_phi (BASIC_BLOCK_FOR_FN (cfun, index), var); } +} - BITMAP_FREE (live_exits); +static int +loop_name_cmp (const void *p1, const void *p2) +{ + auto l1 = (const std::pair *)p1; + auto l2 = (const std::pair *)p2; + if (l1->first < l2->first) +return -1; + else if (l1->first > l2->first) +return 1; + return 0; } /* Add exit phis for the names marked in NAMES_TO_RENAME. @@ -346,31 +35
Re: [PATCH]middle-end: don't lower past veclower [PR106063]
On Tue, 5 Jul 2022, Tamar Christina wrote:
> Hi All,
>
> My previous patch can cause a problem if the pattern matches after veclower
> as it may replace the construct with a vector sequence which the target may
> not directly support.
>
> As such don't perform the rewriting if after veclower.

Note that doing the rewriting before veclower to a variant not supported
by the target can cause veclower to generate abysmal code.  In some cases
we are very careful and try to at least preserve code supported by the
target over transforming that into a variant not supported.

That said, a better fix would be to check whether the target can perform
the new comparison.  Before veclower it would be OK to do the transform
nevertheless in case it cannot do the original transform.

Richard.

> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
>
> Ok for master? and backport to GCC 12?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	PR tree-optimization/106063
> 	* match.pd: Do not apply pattern after veclower.
>
> gcc/testsuite/ChangeLog:
>
> 	PR tree-optimization/106063
> 	* gcc.dg/pr106063.c: New test.
> > --- inline copy of patch -- > diff --git a/gcc/match.pd b/gcc/match.pd > index > 40c09bedadb89dabb6622559a8f69df5384e61fd..ba161892a98756c0278dc40fc377d7d0deaacbcf > 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -6040,7 +6040,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >(simplify > (cmp (bit_and:c@2 @0 cst@1) integer_zerop) > (with { tree csts = bitmask_inv_cst_vector_p (@1); } > - (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2))) > + (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)) > + && optimize_vectors_before_lowering_p ()) >(if (TYPE_UNSIGNED (TREE_TYPE (@1))) > (icmp @0 { csts; }) > (with { tree utype = unsigned_type_for (TREE_TYPE (@1)); } > diff --git a/gcc/testsuite/gcc.dg/pr106063.c b/gcc/testsuite/gcc.dg/pr106063.c > new file mode 100644 > index > ..b23596724f6bb98c53af2dce77d31509bab10378 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/pr106063.c > @@ -0,0 +1,9 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fno-tree-forwprop --disable-tree-evrp" } */ > +typedef __int128 __attribute__((__vector_size__ (16))) V; > + > +V > +foo (V v) > +{ > + return (v & (V){15}) == v; > +} > > > > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstra
RE: [PATCH]middle-end: don't lower past veclower [PR106063]
> -Original Message- > From: Richard Biener > Sent: Thursday, July 7, 2022 8:19 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd > Subject: Re: [PATCH]middle-end: don't lower past veclower [PR106063] > > On Tue, 5 Jul 2022, Tamar Christina wrote: > > > Hi All, > > > > My previous patch can cause a problem if the pattern matches after > > veclower as it may replace the construct with a vector sequence which > > the target may not directly support. > > > > As such don't perform the rewriting if after veclower. > > Note that when doing the rewriting before veclower to a variant not > supported by the target can cause veclower to generate absymal code. In > some cases we are very careful and try to at least preserve code supported > by the target over transforming that into a variant not supported. > > That said, a better fix would be to check whether the target can perform the > new comparison. Before veclower it would be OK to do the transform > nevertheless in case it cannot do the original transform. This last statement is somewhat confusing. Did you want me to change it such that before veclower the rewrite is always done and after veclowering only if the target supports it? Or did you want me to never do the rewrite if the target doesn't support it? Thanks, Tamar > > Richard. > > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu > > and no issues. > > > > Ok for master? and backport to GCC 12? > > > > Thanks, > > Tamar > > > > > > gcc/ChangeLog: > > > > PR tree-optimization/106063 > > * match.pd: Do not apply pattern after veclower. > > > > gcc/testsuite/ChangeLog: > > > > PR tree-optimization/106063 > > * gcc.dg/pr106063.c: New test. 
> > > > --- inline copy of patch -- > > diff --git a/gcc/match.pd b/gcc/match.pd index > > > 40c09bedadb89dabb6622559a8f69df5384e61fd..ba161892a98756c0278dc40fc > 377 > > d7d0deaacbcf 100644 > > --- a/gcc/match.pd > > +++ b/gcc/match.pd > > @@ -6040,7 +6040,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > >(simplify > > (cmp (bit_and:c@2 @0 cst@1) integer_zerop) > > (with { tree csts = bitmask_inv_cst_vector_p (@1); } > > - (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2))) > > + (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)) > > + && optimize_vectors_before_lowering_p ()) > >(if (TYPE_UNSIGNED (TREE_TYPE (@1))) > > (icmp @0 { csts; }) > > (with { tree utype = unsigned_type_for (TREE_TYPE (@1)); } > > diff --git a/gcc/testsuite/gcc.dg/pr106063.c > > b/gcc/testsuite/gcc.dg/pr106063.c new file mode 100644 index > > > ..b23596724f6bb98c53af2dce77 > d3 > > 1509bab10378 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/pr106063.c > > @@ -0,0 +1,9 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -fno-tree-forwprop --disable-tree-evrp" } */ > > +typedef __int128 __attribute__((__vector_size__ (16))) V; > > + > > +V > > +foo (V v) > > +{ > > + return (v & (V){15}) == v; > > +} > > > > > > > > > > > > -- > Richard Biener > SUSE Software Solutions Germany GmbH, Frankenstra
RE: [PATCH]middle-end: don't lower past veclower [PR106063]
On Thu, 7 Jul 2022, Tamar Christina wrote: > > -Original Message- > > From: Richard Biener > > Sent: Thursday, July 7, 2022 8:19 AM > > To: Tamar Christina > > Cc: gcc-patches@gcc.gnu.org; nd > > Subject: Re: [PATCH]middle-end: don't lower past veclower [PR106063] > > > > On Tue, 5 Jul 2022, Tamar Christina wrote: > > > > > Hi All, > > > > > > My previous patch can cause a problem if the pattern matches after > > > veclower as it may replace the construct with a vector sequence which > > > the target may not directly support. > > > > > > As such don't perform the rewriting if after veclower. > > > > Note that when doing the rewriting before veclower to a variant not > > supported by the target can cause veclower to generate absymal code. In > > some cases we are very careful and try to at least preserve code supported > > by the target over transforming that into a variant not supported. > > > > That said, a better fix would be to check whether the target can perform the > > new comparison. Before veclower it would be OK to do the transform > > nevertheless in case it cannot do the original transform. > > This last statement is somewhat confusing. Did you want me to change it such > that > before veclower the rewrite is always done and after veclowering only if the > target > supports it? > > Or did you want me to never do the rewrite if the target doesn't support it? I meant before veclower you can do the rewrite if either the rewriting result is supported by the target OR if the original expression is _not_ supported by the target. The latter case might be not too important to worry doing (it would still canonicalize for those targets then). After veclower you can only rewrite under the former condition. Richard. > Thanks, > Tamar > > > > > Richard. > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu > > > and no issues. > > > > > > Ok for master? and backport to GCC 12? 
> > > > > > Thanks, > > > Tamar > > > > > > > > > gcc/ChangeLog: > > > > > > PR tree-optimization/106063 > > > * match.pd: Do not apply pattern after veclower. > > > > > > gcc/testsuite/ChangeLog: > > > > > > PR tree-optimization/106063 > > > * gcc.dg/pr106063.c: New test. > > > > > > --- inline copy of patch -- > > > diff --git a/gcc/match.pd b/gcc/match.pd index > > > > > 40c09bedadb89dabb6622559a8f69df5384e61fd..ba161892a98756c0278dc40fc > > 377 > > > d7d0deaacbcf 100644 > > > --- a/gcc/match.pd > > > +++ b/gcc/match.pd > > > @@ -6040,7 +6040,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > >(simplify > > > (cmp (bit_and:c@2 @0 cst@1) integer_zerop) > > > (with { tree csts = bitmask_inv_cst_vector_p (@1); } > > > - (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2))) > > > + (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)) > > > + && optimize_vectors_before_lowering_p ()) > > >(if (TYPE_UNSIGNED (TREE_TYPE (@1))) > > > (icmp @0 { csts; }) > > > (with { tree utype = unsigned_type_for (TREE_TYPE (@1)); } > > > diff --git a/gcc/testsuite/gcc.dg/pr106063.c > > > b/gcc/testsuite/gcc.dg/pr106063.c new file mode 100644 index > > > > > ..b23596724f6bb98c53af2dce77 > > d3 > > > 1509bab10378 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/pr106063.c > > > @@ -0,0 +1,9 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O2 -fno-tree-forwprop --disable-tree-evrp" } */ > > > +typedef __int128 __attribute__((__vector_size__ (16))) V; > > > + > > > +V > > > +foo (V v) > > > +{ > > > + return (v & (V){15}) == v; > > > +} > > > > > > > > > > > > > > > > > > > -- > > Richard Biener > > SUSE Software Solutions Germany GmbH, Frankenstra > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstra
RE: [PATCH]middle-end simplify complex if expressions where comparisons are inverse of one another.
> -Original Message- > From: Andrew Pinski > Sent: Wednesday, July 6, 2022 8:37 PM > To: Tamar Christina > Cc: Richard Biener ; nd ; gcc- > patc...@gcc.gnu.org > Subject: Re: [PATCH]middle-end simplify complex if expressions where > comparisons are inverse of one another. > > On Wed, Jul 6, 2022 at 9:06 AM Tamar Christina > wrote: > > > > > -Original Message- > > > From: Andrew Pinski > > > Sent: Wednesday, July 6, 2022 3:10 AM > > > To: Tamar Christina > > > Cc: Richard Biener ; nd ; gcc- > > > patc...@gcc.gnu.org > > > Subject: Re: [PATCH]middle-end simplify complex if expressions where > > > comparisons are inverse of one another. > > > > > > On Tue, Jul 5, 2022 at 8:16 AM Tamar Christina via Gcc-patches > > patc...@gcc.gnu.org> wrote: > > > > > > > > > > > > > > > > > -Original Message- > > > > > From: Richard Biener > > > > > Sent: Monday, June 20, 2022 9:57 AM > > > > > To: Tamar Christina > > > > > Cc: gcc-patches@gcc.gnu.org; nd > > > > > Subject: Re: [PATCH]middle-end simplify complex if expressions > > > > > where comparisons are inverse of one another. > > > > > > > > > > On Thu, 16 Jun 2022, Tamar Christina wrote: > > > > > > > > > > > Hi All, > > > > > > > > > > > > This optimizes the following sequence > > > > > > > > > > > > ((a < b) & c) | ((a >= b) & d) > > > > > > > > > > > > into > > > > > > > > > > > > (a < b ? c : d) & 1 > > > > > > > > > > > > for scalar. On vector we can omit the & 1. > > > > > > > > > > > > This changes the code generation from > > > > > > > > > > > > zoo2: > > > > > > cmp w0, w1 > > > > > > csetw0, lt > > > > > > csetw1, ge > > > > > > and w0, w0, w2 > > > > > > and w1, w1, w3 > > > > > > orr w0, w0, w1 > > > > > > ret > > > > > > > > > > > > into > > > > > > > > > > > > cmp w0, w1 > > > > > > cselw0, w2, w3, lt > > > > > > and w0, w0, 1 > > > > > > ret > > > > > > > > > > > > and significantly reduces the number of selects we have to do > > > > > > in the vector code. 
> > > > > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > > > > > x86_64-pc-linux-gnu and no issues. > > > > > > > > > > > > Ok for master? > > > > > > > > > > > > Thanks, > > > > > > Tamar > > > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > * fold-const.cc (inverse_conditions_p): Traverse if SSA_NAME. > > > > > > * match.pd: Add new rule. > > > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > > > > > * gcc.target/aarch64/if-compare_1.c: New test. > > > > > > * gcc.target/aarch64/if-compare_2.c: New test. > > > > > > > > > > > > --- inline copy of patch -- > > > > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index > > > > > > > > > > > > > > > 39a5a52958d87497f301826e706886b290771a2d..f180599b90150acd3ed895a64 > > > > > 280 > > > > > > aa3255061256 100644 > > > > > > --- a/gcc/fold-const.cc > > > > > > +++ b/gcc/fold-const.cc > > > > > > @@ -2833,15 +2833,38 @@ compcode_to_comparison (enum > > > > > comparison_code > > > > > > code) bool inverse_conditions_p (const_tree cond1, > > > > > > const_tree > > > > > > cond2) { > > > > > > - return (COMPARISON_CLASS_P (cond1) > > > > > > - && COMPARISON_CLASS_P (cond2) > > > > > > - && (invert_tree_comparison > > > > > > - (TREE_CODE (cond1), > > > > > > - HONOR_NANS (TREE_OPERAND (cond1, 0))) == TREE_CODE > > > > > (cond2)) > > > > > > - && operand_equal_p (TREE_OPERAND (cond1, 0), > > > > > > - TREE_OPERAND (cond2, 0), 0) > > > > > > - && operand_equal_p (TREE_OPERAND (cond1, 1), > > > > > > - TREE_OPERAND (cond2, 1), 0)); > > > > > > + if (COMPARISON_CLASS_P (cond1) > > > > > > + && COMPARISON_CLASS_P (cond2) > > > > > > + && (invert_tree_comparison > > > > > > + (TREE_CODE (cond1), > > > > > > + HONOR_NANS (TREE_OPERAND (cond1, 0))) == TREE_CODE > > > > > (cond2)) > > > > > > + && operand_equal_p (TREE_OPERAND (cond1, 0), > > > > > > + TREE_OPERAND (cond2, 0), 0) > > > > > > + && operand_equal_p (TREE_OPERAND (cond1, 1), > > > > > > + TREE_OPERAND (cond2, 1), 0)) > > > 
> > > +return true; > > > > > > + > > > > > > + if (TREE_CODE (cond1) == SSA_NAME > > > > > > + && TREE_CODE (cond2) == SSA_NAME) > > > > > > +{ > > > > > > + gimple *gcond1 = SSA_NAME_DEF_STMT (cond1); > > > > > > + gimple *gcond2 = SSA_NAME_DEF_STMT (cond2); > > > > > > + if (!is_gimple_assign (gcond1) || !is_gimple_assign (gcond2)) > > > > > > + return false; > > > > > > + > > > > > > + tree_code code1 = gimple_assign_rhs_code (gcond1); > > > > > > + tree_code code2 = gimple_assign_rhs_code (gcond2); > > > > > > + return TREE_CODE_CLASS (code1) == tcc_comparison > > > > > > +&& TREE_CODE_CLASS (code2) == tcc_comparison > > > > > > +&& invert_tree_comparison (code1, > > > > > > +
Re: [GCC 13][PATCH] PR101836: Add a new option -fstrict-flex-array[=n] and use it in __builtin_object_size
On Wed, Jul 6, 2022 at 4:20 PM Qing Zhao wrote: > > (Sorry for the late reply, just came back from a short vacation.) > > > On Jul 4, 2022, at 2:49 AM, Richard Biener > > wrote: > > > > On Fri, Jul 1, 2022 at 5:32 PM Martin Sebor wrote: > >> > >> On 7/1/22 08:01, Qing Zhao wrote: > >>> > >>> > On Jul 1, 2022, at 8:59 AM, Jakub Jelinek wrote: > > On Fri, Jul 01, 2022 at 12:55:08PM +, Qing Zhao wrote: > > If so, comparing to the current implemenation to have all the checking > > in middle-end, what’s the > > major benefit of moving part of the checking into FE, and leaving the > > other part in middle-end? > > The point is recording early what FIELD_DECLs could be vs. can't > possibly be > treated like flexible array members and just use that flag in the > decisions > in the current routines in addition to what it is doing. > >>> > >>> Okay. > >>> > >>> Based on the discussion so far, I will do the following: > >>> > >>> 1. Add a new flag “DECL_NOT_FLEXARRAY” to FIELD_DECL; > >>> 2. In C/C++ FE, set the new flag “DECL_NOT_FLEXARRAY” for a FIELD_DECL > >>> based on [0], [1], > >>> [] and the option -fstrict-flex-array, and whether it’s the last > >>> field of the DECL_CONTEXT. > >>> 3. In Middle end, Add a new utility routine is_flexible_array_member_p, > >>> which bases on > >>> DECL_NOT_FLEXARRAY + array_at_struct_end_p to decide whether the array > >>> reference is a real flexible array member reference. > > > > I would just update all existing users, not introduce another wrapper > > that takes DECL_NOT_FLEXARRAY > > into account additionally. > > Okay. > > > >>> > >>> > >>> Middle end currently is quite mess, array_at_struct_end_p, > >>> component_ref_size, and all the phases that > >>> use these routines need to be updated, + new testing cases for each of > >>> the phases. 
> >>> > >>> > >>> So, I still plan to separate the patch set into 2 parts: > >>> > >>> Part A:the above 1 + 2 + 3, and use these new utilities in > >>> tree-object-size.cc to resolve PR101836 first. > >>> Then kernel can use __FORTIFY_SOURCE correctly; > >>> > >>> Part B:update all other phases with the new utilities + new testing > >>> cases + resolving regressions. > >>> > >>> Let me know if you have any comment and suggestion. > >> > >> It might be worth considering whether it should be possible to control > >> the "flexible array" property separately for each trailing array member > >> via either a #pragma or an attribute in headers that can't change > >> the struct layout but that need to be usable in programs compiled with > >> stricter -fstrict-flex-array=N settings. > > > > Or an decl attribute. > > Yes, it might be necessary to add a corresponding decl attribute > > strict_flex_array (N) > > Which is attached to a trailing structure array member to provide the user a > finer control when -fstrict-flex-array=N is specified. > > So, I will do the following: > > > *User interface: > > 1. command line option: > -fstrict-flex-array=N (N=0, 1, 2, 3) > 2. decl attribute: > strict_flex_array (N) (N=0, 1, 2, 3) > > > *Implementation: > > 1. Add a new flag “DECL_NOT_FLEXARRAY” to FIELD_DECL; > 2. In C/C++ FE, set the new flag “DECL_NOT_FLEXARRAY” for a FIELD_DECL based > on [0], [1], > [], the option -fstrict-flex-array, the attribute strict_flex_array, > and whether it’s the last field > of the DECL_CONTEXT. > 3. In Middle end, update all users of “array_at_struct_end_p" or > “component_ref_size”, or any place that treats > Trailing array as flexible array member with the new flag > DECL_NOT_FLEXARRAY. > (Still think we need a new consistent utility routine here). > > > I still plan to separate the patch set into 2 parts: > > Part A:the above 1 + 2 + 3, and use these new utilities in > tree-object-size.cc to resolve PR101836 first. 
>Then kernel can use __FORTIFY_SOURCE correctly. > Part B:update all other phases with the new utilities + new testing cases > + resolving regressions. > > > Let me know any more comment or suggestion. Sounds good. Part 3. is "optimization" and reasonable to do separately, I'm not sure you need 'B' (since we're not supposed to have new utilities), but instead I'd do '3.' as part of 'B', just changing the pieces th resolve PR101836 for part 'A'. Richard. > Thanks a lot. > > Qing > >
[PATCH] LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method, when frame->mask or frame->fmask is zero.
gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_compute_frame_info): Modify the calculation of fp_sp_offset and gp_sp_offset: when frame->mask or frame->fmask is zero, do not subtract UNITS_PER_WORD or UNITS_PER_FP_REG. --- gcc/config/loongarch/loongarch.cc | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index d72b256df51..5c9a33c14f7 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -917,8 +917,12 @@ loongarch_compute_frame_info (void) frame->frame_pointer_offset = offset; /* Next are the callee-saved FPRs. */ if (frame->fmask) -offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); - frame->fp_sp_offset = offset - UNITS_PER_FP_REG; +{ + offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); + frame->fp_sp_offset = offset - UNITS_PER_FP_REG; +} + else +frame->fp_sp_offset = offset; /* Next are the callee-saved GPRs. */ if (frame->mask) { @@ -931,8 +935,10 @@ loongarch_compute_frame_info (void) frame->save_libcall_adjustment = x_save_size; offset += x_save_size; + frame->gp_sp_offset = offset - UNITS_PER_WORD; } - frame->gp_sp_offset = offset - UNITS_PER_WORD; + else +frame->gp_sp_offset = offset; /* The hard frame pointer points above the callee-saved GPRs. */ frame->hard_frame_pointer_offset = offset; /* Above the hard frame pointer is the callee-allocated varags save area. */ -- 2.31.1
[PATCH v2] Modify combine pattern by a pseudo AND with its nonzero bits [PR93453]
Hi, This patch modifies the combine pattern after recog fails. With a helper, change_pseudo_and_mask, it converts a single pseudo into the pseudo ANDed with a mask when the outer operator is IOR/XOR/PLUS and the inner operator is ASHIFT or AND. The conversion helps the pattern match rotate-and-mask insns on some targets. For test case rlwimi-2.c, current trunk fails on "scan-assembler-times (?n)^\\s+[a-z]". It reports 21305 times. So my patch reduces the total number of insns from 21305 to 21279. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is this okay for trunk? Any recommendations? Thanks a lot. ChangeLog 2022-07-07 Haochen Gui gcc/ PR target/93453 * combine.cc (change_pseudo_and_mask): New. (recog_for_combine): If recog fails, try again with the pattern modified by change_pseudo_and_mask. * config/rs6000/rs6000.md (plus_ior_xor): Removed. (anonymous split pattern for plus_ior_xor): Removed. gcc/testsuite/ PR target/93453 * gcc.target/powerpc/20050603-3.c: Modify dump check conditions. * gcc.target/powerpc/rlwimi-2.c: Likewise. * gcc.target/powerpc/pr93453-2.c: New. patch.diff diff --git a/gcc/combine.cc b/gcc/combine.cc index a5fabf397f7..3cd7b2b652b 100644 --- a/gcc/combine.cc +++ b/gcc/combine.cc @@ -11599,6 +11599,47 @@ change_zero_ext (rtx pat) return changed; } +/* When the outer code of set_src is IOR/XOR/PLUS and the inner code is + ASHIFT/AND, convert a pseudo to pseudo AND with a mask if its nonzero_bits + is less than its mode mask. nonzero_bits in other passes doesn't return + the same value as it does in the combine pass. 
*/ +static bool +change_pseudo_and_mask (rtx pat) +{ + rtx src = SET_SRC (pat); + if ((GET_CODE (src) == IOR + || GET_CODE (src) == XOR + || GET_CODE (src) == PLUS) + && (((GET_CODE (XEXP (src, 0)) == ASHIFT + || GET_CODE (XEXP (src, 0)) == AND) + && REG_P (XEXP (src, 1) +{ + rtx *reg = &XEXP (SET_SRC (pat), 1); + machine_mode mode = GET_MODE (*reg); + unsigned HOST_WIDE_INT nonzero = nonzero_bits (*reg, mode); + if (nonzero < GET_MODE_MASK (mode)) + { + int shift; + + if (GET_CODE (XEXP (src, 0)) == ASHIFT) + shift = INTVAL (XEXP (XEXP (src, 0), 1)); + else + shift = ctz_hwi (INTVAL (XEXP (XEXP (src, 0), 1))); + + if (shift > 0 + && ((HOST_WIDE_INT_1U << shift) - 1) >= nonzero) + { + unsigned HOST_WIDE_INT mask = (HOST_WIDE_INT_1U << shift) - 1; + rtx x = gen_rtx_AND (mode, *reg, GEN_INT (mask)); + SUBST (*reg, x); + maybe_swap_commutative_operands (SET_SRC (pat)); + return true; + } + } + } + return false; +} + /* Like recog, but we receive the address of a pointer to a new pattern. We try to match the rtx that the pointer points to. 
If that fails, we may try to modify or replace the pattern, @@ -11646,7 +11687,10 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes) } } else - changed = change_zero_ext (pat); + { + changed = change_pseudo_and_mask (pat); + changed |= change_zero_ext (pat); + } } else if (GET_CODE (pat) == PARALLEL) { diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 1367a2cb779..2bd6bd5f908 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -4207,24 +4207,6 @@ (define_insn_and_split "*rotl3_insert_3_" (ior:GPR (and:GPR (match_dup 3) (match_dup 4)) (ashift:GPR (match_dup 1) (match_dup 2]) -(define_code_iterator plus_ior_xor [plus ior xor]) - -(define_split - [(set (match_operand:GPR 0 "gpc_reg_operand") - (plus_ior_xor:GPR (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand") - (match_operand:SI 2 "const_int_operand")) - (match_operand:GPR 3 "gpc_reg_operand")))] - "nonzero_bits (operands[3], mode) - < HOST_WIDE_INT_1U << INTVAL (operands[2])" - [(set (match_dup 0) - (ior:GPR (and:GPR (match_dup 3) - (match_dup 4)) -(ashift:GPR (match_dup 1) -(match_dup 2] -{ - operands[4] = GEN_INT ((HOST_WIDE_INT_1U << INTVAL (operands[2])) - 1); -}) - (define_insn "*rotlsi3_insert_4" [(set (match_operand:SI 0 "gpc_reg_operand" "=r") (ior:SI (and:SI (match_operand:SI 3 "gpc_reg_operand" "0") diff --git a/gcc/testsuite/gcc.target/powerpc/20050603-3.c b/gcc/testsuite/gcc.target/powerpc/20050603-3.c index 4017d34f429..e628be11532 100644 --- a/gcc/testsuite/gcc.target/powerpc/20050603-3.c +++ b/gcc/testsuite/gcc.target/powerpc/20050603-3.c @@ -12,7 +12,7 @@ void rotins (unsigned int x) b.y = (x<<12) | (x>>20); } -/* { dg-final { scan-assembler-not {\mrlwinm} } } */ +/* { dg-final { scan-ass
Re: [PATCH] LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method, when frame->mask or frame->fmask is zero.
Hi, On 2022/7/7 16:04, Lulu Cheng wrote: gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_compute_frame_info): Modify fp_sp_offset and gp_sp_offset's calculation method, when frame->mask or frame->fmask is zero, don't minus UNITS_PER_WORD or UNITS_PER_FP_REG. IMO it's better to also state which problem this change is meant to solve (i.e. your intent), better yet, with an appropriate bugzilla link. --- gcc/config/loongarch/loongarch.cc | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index d72b256df51..5c9a33c14f7 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -917,8 +917,12 @@ loongarch_compute_frame_info (void) frame->frame_pointer_offset = offset; /* Next are the callee-saved FPRs. */ if (frame->fmask) -offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); - frame->fp_sp_offset = offset - UNITS_PER_FP_REG; +{ + offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); + frame->fp_sp_offset = offset - UNITS_PER_FP_REG; +} + else +frame->fp_sp_offset = offset; /* Next are the callee-saved GPRs. */ if (frame->mask) { @@ -931,8 +935,10 @@ loongarch_compute_frame_info (void) frame->save_libcall_adjustment = x_save_size; offset += x_save_size; + frame->gp_sp_offset = offset - UNITS_PER_WORD; } - frame->gp_sp_offset = offset - UNITS_PER_WORD; + else +frame->gp_sp_offset = offset; /* The hard frame pointer points above the callee-saved GPRs. */ frame->hard_frame_pointer_offset = offset; /* Above the hard frame pointer is the callee-allocated varags save area. */
Adjust 'libgomp.c-c++-common/requires-3.c' (was: [Patch][v4] OpenMP: Move omp requires checks to libgomp)
Hi! In preparation for other changes: On 2022-06-29T16:33:02+0200, Tobias Burnus wrote: > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-3-aux.c > @@ -0,0 +1,11 @@ > +/* { dg-skip-if "" { *-*-* } } */ > + > +#pragma omp requires unified_address > + > +int x; > + > +void foo (void) > +{ > + #pragma omp target > + x = 1; > +} > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-3.c > @@ -0,0 +1,24 @@ > +/* { dg-do link { target offloading_enabled } } */ Not expected to see 'offloading_enabled' here... > +/* { dg-additional-sources requires-3-aux.c } */ > + > +/* Check diagnostic by device-compiler's lto1. ..., because of this note ^. > + Other file uses: 'requires unified_address'. */ > + > +#pragma omp requires unified_address,unified_shared_memory > + > +int a[10]; > +extern void foo (void); > + > +int > +main (void) > +{ > + #pragma omp target > + for (int i = 0; i < 10; i++) > +a[i] = 0; > + > + foo (); > + return 0; > +} > + > +/* { dg-error "OpenMP 'requires' directive with non-identical clauses in > multiple compilation units: 'unified_address, unified_shared_memory' vs. > 'unified_address'" "" { target *-*-* } 0 } */ > +/* { dg-excess-errors "Ignore messages like: errors during merging of > translation units|mkoffload returned 1 exit status" } */ OK to push the attached "Adjust 'libgomp.c-c++-common/requires-3.c'"? Grüße Thomas - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 >From 6a4031b351680bdbfe3cdb9ac4e4a3aa59e4ca84 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Thu, 7 Jul 2022 09:59:45 +0200 Subject: [PATCH] Adjust 'libgomp.c-c++-common/requires-3.c' As documented, this one does "Check diagnostic by device-compiler's lto1". 
Indeed there are none when compiling with '-foffload=disable' with an offloading-enabled compiler, so we should use 'offload_target_[...]', as used in other similar test cases. Follow-up to recent commit 683f11843974f0bdf42f79cdcbb0c2b43c7b81b0 "OpenMP: Move omp requires checks to libgomp". libgomp/ * testsuite/libgomp.c-c++-common/requires-3.c: Adjust. --- libgomp/testsuite/libgomp.c-c++-common/requires-3.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-3.c b/libgomp/testsuite/libgomp.c-c++-common/requires-3.c index 4b07ffdd09b..7091f400ef0 100644 --- a/libgomp/testsuite/libgomp.c-c++-common/requires-3.c +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-3.c @@ -1,4 +1,4 @@ -/* { dg-do link { target offloading_enabled } } */ +/* { dg-do link { target { offload_target_nvptx || offload_target_amdgcn } } } */ /* { dg-additional-sources requires-3-aux.c } */ /* Check diagnostic by device-compiler's lto1. -- 2.35.1
Enhance 'libgomp.c-c++-common/requires-4.c', 'libgomp.c-c++-common/requires-5.c' testing (was: [Patch][v4] OpenMP: Move omp requires checks to libgomp)
Hi! In preparation for other changes: On 2022-06-29T16:33:02+0200, Tobias Burnus wrote: > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-4-aux.c > @@ -0,0 +1,13 @@ > +/* { dg-skip-if "" { *-*-* } } */ > + > +#pragma omp requires reverse_offload > + > +/* Note: The file does not have neither of: > + declare target directives, device constructs or device routines. */ > + > +int x; > + > +void foo (void) > +{ > + x = 1; > +} > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c > @@ -0,0 +1,23 @@ > +/* { dg-do link { target offloading_enabled } } */ > +/* { dg-additional-options "-flto" } */ > +/* { dg-additional-sources requires-4-aux.c } */ > + > +/* Check diagnostic by device-compiler's or host compiler's lto1. > + Other file uses: 'requires reverse_offload', but that's inactive as > + there are no declare target directives, device constructs nor device > routines */ > + > +#pragma omp requires unified_address,unified_shared_memory > + > +int a[10]; > +extern void foo (void); > + > +int > +main (void) > +{ > + #pragma omp target > + for (int i = 0; i < 10; i++) > +a[i] = 0; > + > + foo (); > + return 0; > +} > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-5-aux.c > @@ -0,0 +1,11 @@ > +/* { dg-skip-if "" { *-*-* } } */ > + > +#pragma omp requires unified_shared_memory, unified_address, reverse_offload > + > +int x; > + > +void foo (void) > +{ > + #pragma omp target > + x = 1; > +} > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-5.c > @@ -0,0 +1,20 @@ > +/* { dg-do run { target { offload_target_nvptx || offload_target_amdgcn } } > } */ > +/* { dg-additional-sources requires-5-aux.c } */ > + > +#pragma omp requires unified_shared_memory, unified_address, reverse_offload > + > +int a[10]; > +extern void foo (void); > + > +int > +main (void) > +{ > + #pragma omp target > + for (int i = 0; i < 10; i++) > +a[i] = 0; > + > + foo (); > + return 0; > +} > + > +/* { dg-output "devices 
present but 'omp requires unified_address, > unified_shared_memory, reverse_offload' cannot be fulfilled" } */ (The latter diagnostic later got conditionalized by 'GOMP_DEBUG=1'.) OK to push the attached "Enhance 'libgomp.c-c++-common/requires-4.c', 'libgomp.c-c++-common/requires-5.c' testing"? Grüße Thomas - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 >From ae14ccbd050d0b49073d5ea09de3e2af63f8c674 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Thu, 7 Jul 2022 09:45:42 +0200 Subject: [PATCH] Enhance 'libgomp.c-c++-common/requires-4.c', 'libgomp.c-c++-common/requires-5.c' testing These should compile and link and execute in all configurations; host-fallback execution, which we may actually verify. Follow-up to recent commit 683f11843974f0bdf42f79cdcbb0c2b43c7b81b0 "OpenMP: Move omp requires checks to libgomp". libgomp/ * testsuite/libgomp.c-c++-common/requires-4.c: Enhance testing. * testsuite/libgomp.c-c++-common/requires-5.c: Likewise. --- .../libgomp.c-c++-common/requires-4.c | 17 - .../libgomp.c-c++-common/requires-5.c | 18 +++--- 2 files changed, 23 insertions(+), 12 deletions(-) diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-4.c b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c index 128fdbb8463..deb04368108 100644 --- a/libgomp/testsuite/libgomp.c-c++-common/requires-4.c +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c @@ -1,22 +1,29 @@ -/* { dg-do link { target offloading_enabled } } */ /* { dg-additional-options "-flto" } */ /* { dg-additional-sources requires-4-aux.c } */ -/* Check diagnostic by device-compiler's or host compiler's lto1. +/* Check no diagnostic by device-compiler's or host compiler's lto1. 
Other file uses: 'requires reverse_offload', but that's inactive as there are no declare target directives, device constructs nor device routines */ +/* For actual offload execution, prints the following (only) if GOMP_DEBUG=1: + "devices present but 'omp requires unified_address, unified_shared_memory, reverse_offload' cannot be fulfilled" + and does host-fallback execution. */ + #pragma omp requires unified_address,unified_shared_memory -int a[10]; +int a[10] = { 0 }; extern void foo (void); int main (void) { - #pragma omp target + #pragma omp target map(to: a) + for (int i = 0; i < 10; i++) +a[i] = i; + for (int i = 0; i < 10; i++) -a[i] = 0; +if (a[i] != i) + __builtin_abort (); foo (); return 0; diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-5.c b/libgomp/testsuite/libgomp.c-c++-common/requires-5.c index c1e5540cfc5..68816314b94 100644 --- a/libgomp/testsuite/libgomp.c-c++-common/requires-5.c +++ b/libgomp/testsu
[PATCH] Speed up LC SSA rewrite more
In many cases loops have only one exit or a variable is only live across one of the exits. In this case we know that all uses outside of the loop will be dominated by the single LC PHI node we insert. If that holds for all variables requiring LC SSA PHIs then we can simplify the update_ssa process, avoiding the (iterated) dominance frontier computations. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. * tree-ssa-loop-manip.cc (add_exit_phis_var): Return the number of LC PHIs inserted. (add_exit_phis): Return whether any variable required multiple LC PHI nodes. (rewrite_into_loop_closed_ssa_1): Use TODO_update_ssa_no_phi when possible. --- gcc/tree-ssa-loop-manip.cc | 30 +- 1 file changed, 21 insertions(+), 9 deletions(-) diff --git a/gcc/tree-ssa-loop-manip.cc b/gcc/tree-ssa-loop-manip.cc index 0324ff60a0f..c531f1f12fd 100644 --- a/gcc/tree-ssa-loop-manip.cc +++ b/gcc/tree-ssa-loop-manip.cc @@ -314,9 +314,10 @@ add_exit_phi (basic_block exit, tree var) } /* Add exit phis for VAR that is used in LIVEIN. - Exits of the loops are stored in LOOP_EXITS. */ + Exits of the loops are stored in LOOP_EXITS. Returns the number + of PHIs added for VAR. */ -static void +static unsigned add_exit_phis_var (tree var, bitmap use_blocks, bitmap def_loop_exits) { unsigned index; @@ -328,10 +329,13 @@ add_exit_phis_var (tree var, bitmap use_blocks, bitmap def_loop_exits) auto_bitmap live_exits (&loop_renamer_obstack); compute_live_loop_exits (live_exits, use_blocks, def_bb, def_loop_exits); + unsigned cnt = 0; EXECUTE_IF_SET_IN_BITMAP (live_exits, 0, index, bi) { add_exit_phi (BASIC_BLOCK_FOR_FN (cfun, index), var); + cnt++; } + return cnt; } static int @@ -348,13 +352,15 @@ loop_name_cmp (const void *p1, const void *p2) /* Add exit phis for the names marked in NAMES_TO_RENAME. Exits of the loops are stored in EXITS. Sets of blocks where the ssa - names are used are stored in USE_BLOCKS. */ + names are used are stored in USE_BLOCKS. 
Returns whether any name + required multiple LC PHI nodes. */ -static void +static bool add_exit_phis (bitmap names_to_rename, bitmap *use_blocks) { unsigned i; bitmap_iterator bi; + bool multiple_p = false; /* Sort names_to_rename after definition loop so we can avoid re-computing def_loop_exits. */ @@ -381,9 +387,12 @@ add_exit_phis (bitmap names_to_rename, bitmap *use_blocks) for (auto exit = loop->exits->next; exit->e; exit = exit->next) bitmap_set_bit (def_loop_exits, exit->e->dest->index); } - add_exit_phis_var (ssa_name (p.second), use_blocks[p.second], -def_loop_exits); + if (add_exit_phis_var (ssa_name (p.second), use_blocks[p.second], +def_loop_exits) > 1) + multiple_p = true; } + + return multiple_p; } /* For USE in BB, if it is used outside of the loop it is defined in, @@ -588,15 +597,18 @@ rewrite_into_loop_closed_ssa_1 (bitmap changed_bbs, unsigned update_flag, } /* Add the PHI nodes on exits of the loops for the names we need to -rewrite. */ - add_exit_phis (names_to_rename, use_blocks); +rewrite. When no variable required multiple LC PHI nodes to be +inserted then we know that all uses outside of the loop are +dominated by the single LC SSA definition and no further PHI +node insertions are required. */ + bool need_phis_p = add_exit_phis (names_to_rename, use_blocks); if (release_recorded_exits_p) release_recorded_exits (cfun); /* Fix up all the names found to be used outside their original loops. */ - update_ssa (TODO_update_ssa); + update_ssa (need_phis_p ? TODO_update_ssa : TODO_update_ssa_no_phi); } bitmap_obstack_release (&loop_renamer_obstack); -- 2.35.3
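The key observation of the patch — a single LC PHI dominates every use outside the loop — can be illustrated with a small C function (illustration only, not GCC-internal code; the GIMPLE names in the comments are hypothetical):

```c
/* SUM is defined inside the loop and used after it.  With a single
   loop exit, loop-closed SSA inserts exactly one LC PHI node on the
   exit edge (conceptually: sum_lcssa = PHI <sum>), and every use
   below the loop is dominated by that single definition -- which is
   what lets rewrite_into_loop_closed_ssa_1 pass
   TODO_update_ssa_no_phi and skip the iterated dominance-frontier
   computation.  */
int
sum_then_double (int n)
{
  int sum = 0;
  for (int i = 1; i <= n; i++)   /* single exit: i > n */
    sum += i;
  /* In LC SSA, all uses here refer to the one sum_lcssa PHI.  */
  return sum * 2;
}
```

When a variable is live across several exits, more than one LC PHI is needed and the full `TODO_update_ssa` path is taken, as the patch's `multiple_p` flag tracks.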
[PATCH] target/106219 - properly mark builtins pure via ix86_add_new_builtins
The target optimize pragma path to initialize extra target specific builtins missed handling of the pure_p flag which in turn causes extra clobber side-effects of gather builtins leading to unexpected issues downhill. Bootstrap and regtest running on x86_64-unknown-linux-gnu, will push as obvious if that succeeds. * config/i386/i386-builtins.cc (ix86_add_new_builtins): Properly set DECL_PURE_P. * g++.dg/pr106219.C: New testcase. --- gcc/config/i386/i386-builtins.cc | 2 ++ gcc/testsuite/g++.dg/pr106219.C | 31 +++ 2 files changed, 33 insertions(+) create mode 100644 gcc/testsuite/g++.dg/pr106219.C diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc index 96743e6122d..fe7243c3837 100644 --- a/gcc/config/i386/i386-builtins.cc +++ b/gcc/config/i386/i386-builtins.cc @@ -385,6 +385,8 @@ ix86_add_new_builtins (HOST_WIDE_INT isa, HOST_WIDE_INT isa2) ix86_builtins[i] = decl; if (ix86_builtins_isa[i].const_p) TREE_READONLY (decl) = 1; + if (ix86_builtins_isa[i].pure_p) + DECL_PURE_P (decl) = 1; } } diff --git a/gcc/testsuite/g++.dg/pr106219.C b/gcc/testsuite/g++.dg/pr106219.C new file mode 100644 index 000..3cad1507d5f --- /dev/null +++ b/gcc/testsuite/g++.dg/pr106219.C @@ -0,0 +1,31 @@ +// { dg-do compile } +// { dg-options "-O3" } +// { dg-additional-options "-march=bdver2" { target x86_64-*-* } } + +int max(int __b) { + if (0 < __b) +return __b; + return 0; +} +struct Plane { + Plane(int, int); + int *Row(); +}; +#ifdef __x86_64__ +#pragma GCC target "sse2,ssse3,avx,avx2" +#endif +float *ConvolveXSampleAndTranspose_rowp; +int ConvolveXSampleAndTranspose_res, ConvolveXSampleAndTranspose_r; +void ConvolveXSampleAndTranspose() { + Plane out(0, ConvolveXSampleAndTranspose_res); + for (int y;;) { +float sum; +for (int i = ConvolveXSampleAndTranspose_r; i; ++i) + sum += i; +for (; ConvolveXSampleAndTranspose_r; ++ConvolveXSampleAndTranspose_r) + sum += + ConvolveXSampleAndTranspose_rowp[max(ConvolveXSampleAndTranspose_r)] * + 
ConvolveXSampleAndTranspose_r; +out.Row()[y] = sum; + } +} -- 2.35.3
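The distinction the fix restores can be sketched outside GCC's internals: `TREE_READONLY` corresponds to the source-level `__attribute__((const))`, and `DECL_PURE_P` to `__attribute__((pure))`. A minimal illustration (not GCC code; `gather_like` is a hypothetical stand-in for a gather builtin):

```c
/* - const: result depends only on the arguments, no memory access.
   - pure:  may READ global memory, but writes nothing -- exactly the
     contract of the gather builtins.  If neither flag is set, the
     optimizers must assume the call clobbers memory, which is the
     unwanted side effect the patch removes.  */
static int table[4] = { 1, 2, 3, 4 };

__attribute__((const)) int
square (int x)
{
  return x * x;
}

__attribute__((pure)) int
gather_like (int i)
{
  return table[i & 3];   /* reads memory, writes nothing */
}
```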
[PATCH 0/2] RISC-V: Support IEEE half precision operation
This patch set implements _Float16 both for soft-float and hard-float (zfh/zfhmin). _Float16 has been part of the RISC-V psABI[1] since Jul 2021, and the zfh/zfhmin extensions were ratified in 2022[2].

[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/172
[2] https://github.com/riscv/riscv-isa-manual/commit/b35a54079e0da11740ce5b1e6db999d1d5172768
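As a quick illustration of the IEEE binary16 format the series targets (illustration only, not code from the patches): 1 sign bit, 5 exponent bits with bias 15, 10 fraction bits; normal values decode as (-1)^s * 2^(e-15) * (1 + f/1024).

```c
#include <stdint.h>

/* Decode a binary16 bit pattern to float.  Infinity/NaN (exp == 31)
   are omitted for brevity; the scale loops compute 2^(exp-15) without
   needing libm.  */
float
half_to_float (uint16_t h)
{
  int sign = (h >> 15) & 1;
  int exp  = (h >> 10) & 0x1f;
  int frac = h & 0x3ff;

  float mant, scale = 1.0f;
  if (exp == 0)                         /* subnormal: 2^-14 * f/1024 */
    {
      mant = frac / 1024.0f;
      exp = 1;
    }
  else                                  /* normal: implicit leading 1 */
    mant = 1.0f + frac / 1024.0f;
  for (int e = exp; e < 15; e++)
    scale /= 2.0f;
  for (int e = exp; e > 15; e--)
    scale *= 2.0f;
  return (sign ? -1.0f : 1.0f) * mant * scale;
}
```

For example, 0x3C00 is 1.0 and 0x4000 is 2.0; the soft-float routines the patches wire up (`__extendhfsf2` etc.) implement conversions of exactly this representation.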
[PATCH 1/2] RISC-V: Support _Float16 type.
RISC-V decided to use _Float16 as its primary IEEE half-precision type, and this has already become part of the psABI. This patch adds the following support for _Float16:
- Soft-float support for _Float16.
- Make sure _Float16 is available in C++ mode.
- Name mangling for _Float16 in C++ mode.

gcc/ChangeLog

	* config/riscv/riscv-builtins.cc: include stringpool.h
	(riscv_float16_type_node): New.
	(riscv_init_builtin_types): Ditto.
	(riscv_init_builtins): Call riscv_init_builtin_types.
	* config/riscv/riscv-modes.def (HF): New.
	* gcc/config/riscv/riscv.cc (riscv_output_move): Handle HFmode.
	(riscv_mangle_type): New.
	(riscv_scalar_mode_supported_p): Ditto.
	(riscv_libgcc_floating_mode_supported_p): Ditto.
	(riscv_excess_precision): Ditto.
	(riscv_floatn_mode): Ditto.
	(riscv_init_libfuncs): Ditto.
	(TARGET_MANGLE_TYPE): Ditto.
	(TARGET_SCALAR_MODE_SUPPORTED_P): Ditto.
	(TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Ditto.
	(TARGET_INIT_LIBFUNCS): Ditto.
	(TARGET_C_EXCESS_PRECISION): Ditto.
	(TARGET_FLOATN_MODE): Ditto.
	* gcc/config/riscv/riscv.md (mode): Add HF.
	(softload): Add HF.
	(softstore): Ditto.
	(fmt): Ditto.
	(UNITMODE): Ditto.
	(movhf): New.
	(*movhf_softfloat): New.

libgcc/ChangeLog:

	* config/riscv/sfp-machine.h (_FP_NANFRAC_H): New.
	(_FP_NANFRAC_H): Ditto.
	(_FP_NANSIGN_H): Ditto.
	* config/riscv/t-softfp32 (softfp_extensions): Add HF related routines.
	(softfp_truncations): Ditto.
	(softfp_extras): Ditto.
	* config/riscv/t-softfp64 (softfp_extras): Add HF related routines.

gcc/testsuite/ChangeLog:

	* gcc/testsuite/g++.target/riscv/_Float16.C: New.
	* gcc/testsuite/gcc.target/riscv/_Float16-soft-1.c: Ditto.
	* gcc/testsuite/gcc.target/riscv/_Float16-soft-2.c: Ditto.
	* gcc/testsuite/gcc.target/riscv/_Float16-soft-3.c: Ditto.
	* gcc/testsuite/gcc.target/riscv/_Float16-soft-4.c: Ditto.
	* gcc/testsuite/gcc.target/riscv/_Float16.c: Ditto.
--- gcc/config/riscv/riscv-builtins.cc| 24 +++ gcc/config/riscv/riscv-modes.def | 1 + gcc/config/riscv/riscv.cc | 171 -- gcc/config/riscv/riscv.md | 30 ++- gcc/testsuite/g++.target/riscv/_Float16.C | 18 ++ .../gcc.target/riscv/_Float16-soft-1.c| 9 + .../gcc.target/riscv/_Float16-soft-2.c| 13 ++ .../gcc.target/riscv/_Float16-soft-3.c| 12 ++ .../gcc.target/riscv/_Float16-soft-4.c| 12 ++ gcc/testsuite/gcc.target/riscv/_Float16.c | 19 ++ libgcc/config/riscv/sfp-machine.h | 3 + libgcc/config/riscv/t-softfp32| 5 + libgcc/config/riscv/t-softfp64| 1 + 13 files changed, 300 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/g++.target/riscv/_Float16.C create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-soft-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-soft-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-soft-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-soft-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16.c diff --git a/gcc/config/riscv/riscv-builtins.cc b/gcc/config/riscv/riscv-builtins.cc index 1218fdfc67d..3009311604d 100644 --- a/gcc/config/riscv/riscv-builtins.cc +++ b/gcc/config/riscv/riscv-builtins.cc @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3. If not see #include "recog.h" #include "diagnostic-core.h" #include "stor-layout.h" +#include "stringpool.h" #include "expr.h" #include "langhooks.h" @@ -160,6 +161,8 @@ static GTY(()) int riscv_builtin_decl_index[NUM_INSN_CODES]; #define GET_BUILTIN_DECL(CODE) \ riscv_builtin_decls[riscv_builtin_decl_index[(CODE)]] +tree riscv_float16_type_node = NULL_TREE; + /* Return the function type associated with function prototype TYPE. */ static tree @@ -185,11 +188,32 @@ riscv_build_function_type (enum riscv_function_type type) return types[(int) type]; } +static void +riscv_init_builtin_types (void) +{ + /* Provide the _Float16 type and float16_type_node if needed. 
*/ + if (!float16_type_node) +{ + riscv_float16_type_node = make_node (REAL_TYPE); + TYPE_PRECISION (riscv_float16_type_node) = 16; + SET_TYPE_MODE (riscv_float16_type_node, HFmode); + layout_type (riscv_float16_type_node); +} + else +riscv_float16_type_node = float16_type_node; + + if (!maybe_get_identifier ("_Float16")) +lang_hooks.types.register_builtin_type (riscv_float16_type_node, + "_Float16"); +} + /* Implement TARGET_INIT_BUILTINS. */ void riscv_init_builtins (void) { + riscv_init_builtin_types (); + for (size_t i = 0; i < ARRAY_SIZE (riscv_builtins); i++) { const struct riscv_bu
[PATCH 2/2] RISC-V: Support zfh and zfhmin extension
Zfh and Zfhmin are extensions for IEEE half precision; both were ratified in Jan. 2022[1]:
- Zfh provides a full set of operations, like F or D for single or double precision.
- Zfhmin provides only minimal support for half-precision operations: conversion, load, store and move instructions.

[1] https://github.com/riscv/riscv-isa-manual/commit/b35a54079e0da11740ce5b1e6db999d1d5172768

gcc/ChangeLog:

	* common/config/riscv/riscv-common.cc (riscv_implied_info): Add zfh
	and zfhmin.
	(riscv_ext_version_table): Ditto.
	(riscv_ext_flag_table): Ditto.
	* config/riscv/riscv-opts.h (MASK_ZFHMIN): New.
	(MASK_ZFH): Ditto.
	(TARGET_ZFHMIN): Ditto.
	(TARGET_ZFH): Ditto.
	* config/riscv/riscv.cc (riscv_output_move): Handle HFmode move
	for zfh and zfhmin.
	(riscv_emit_float_compare): Handle HFmode.
	* config/riscv/riscv.md (ANYF): Add HF.
	(SOFTF): Add HF.
	(load): Ditto.
	(store): Ditto.
	(truncsfhf2): New.
	(truncdfhf2): Ditto.
	(extendhfsf2): Ditto.
	(extendhfdf2): Ditto.
	(*movhf_hardfloat): Ditto.
	(*movhf_softfloat): Make sure not ZFHMIN.
	* config/riscv/riscv.opt (riscv_zf_subext): New.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/_Float16-zfh-1.c: New.
	* gcc.target/riscv/_Float16-zfh-2.c: Ditto.
	* gcc.target/riscv/_Float16-zfh-3.c: Ditto.
	* gcc.target/riscv/_Float16-zfhmin-1.c: Ditto.
	* gcc.target/riscv/_Float16-zfhmin-2.c: Ditto.
	* gcc.target/riscv/_Float16-zfhmin-3.c: Ditto.
	* gcc.target/riscv/arch-16.c: Ditto.
	* gcc.target/riscv/arch-17.c: Ditto.
	* gcc.target/riscv/predef-21.c: Ditto.
	* gcc.target/riscv/predef-22.c: Ditto.
--- gcc/common/config/riscv/riscv-common.cc | 8 +++ gcc/config/riscv/riscv-opts.h | 6 ++ gcc/config/riscv/riscv.cc | 34 ++- gcc/config/riscv/riscv.md | 59 +-- gcc/config/riscv/riscv.opt| 3 + .../gcc.target/riscv/_Float16-zfh-1.c | 8 +++ .../gcc.target/riscv/_Float16-zfh-2.c | 8 +++ .../gcc.target/riscv/_Float16-zfh-3.c | 8 +++ .../gcc.target/riscv/_Float16-zfhmin-1.c | 9 +++ .../gcc.target/riscv/_Float16-zfhmin-2.c | 9 +++ .../gcc.target/riscv/_Float16-zfhmin-3.c | 9 +++ gcc/testsuite/gcc.target/riscv/arch-16.c | 5 ++ gcc/testsuite/gcc.target/riscv/arch-17.c | 5 ++ gcc/testsuite/gcc.target/riscv/predef-21.c| 59 +++ gcc/testsuite/gcc.target/riscv/predef-22.c| 59 +++ 15 files changed, 280 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfh-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfh-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfh-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfhmin-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfhmin-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfhmin-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/arch-16.c create mode 100644 gcc/testsuite/gcc.target/riscv/arch-17.c create mode 100644 gcc/testsuite/gcc.target/riscv/predef-21.c create mode 100644 gcc/testsuite/gcc.target/riscv/predef-22.c diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc index 0e5be2ce105..4ee1b3198c5 100644 --- a/gcc/common/config/riscv/riscv-common.cc +++ b/gcc/common/config/riscv/riscv-common.cc @@ -96,6 +96,9 @@ static const riscv_implied_info_t riscv_implied_info[] = {"zvl32768b", "zvl16384b"}, {"zvl65536b", "zvl32768b"}, + {"zfh", "zfhmin"}, + {"zfhmin", "f"}, + {NULL, NULL} }; @@ -193,6 +196,9 @@ static const struct riscv_ext_version riscv_ext_version_table[] = {"zvl32768b", ISA_SPEC_CLASS_NONE, 1, 0}, {"zvl65536b", ISA_SPEC_CLASS_NONE, 1, 0}, + {"zfh", ISA_SPEC_CLASS_NONE, 1, 
0}, + {"zfhmin",ISA_SPEC_CLASS_NONE, 1, 0}, + /* Terminate the list. */ {NULL, ISA_SPEC_CLASS_NONE, 0, 0} }; @@ -1148,6 +1154,8 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] = {"zvl32768b", &gcc_options::x_riscv_zvl_flags, MASK_ZVL32768B}, {"zvl65536b", &gcc_options::x_riscv_zvl_flags, MASK_ZVL65536B}, + {"zfhmin",&gcc_options::x_riscv_zf_subext, MASK_ZFHMIN}, + {"zfh", &gcc_options::x_riscv_zf_subext, MASK_ZFH}, {NULL, NULL, 0} }; diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h index 1e153b3a6e7..85e869e62e3 100644 --- a/gcc/config/riscv/riscv-opts.h +++ b/gcc/config/riscv/riscv-opts.h @@ -153,6 +153,12 @@ enum stack_protector_guard { #define TARGET_ZICBOM ((riscv_zicmo_subext & MASK_ZICBOM) != 0) #define TARGET_ZICBOP ((riscv_zicmo_subext & MASK_ZICBOP) != 0) +#define MASK_ZFHMIN (1 << 0) +#define MASK_ZFH (1 << 1) + +#define TARGET_ZFHMIN
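The implied-extension entries in the hunk above form a small dependency table: enabling zfh must transitively pull in zfhmin, which in turn pulls in f. A toy model of that lookup (not GCC code; `implies_p` is a hypothetical helper for illustration):

```c
#include <string.h>
#include <stddef.h>

/* Miniature version of the riscv_implied_info table from the patch.  */
struct implied { const char *ext; const char *implies; };

static const struct implied implied_table[] = {
  { "zfh", "zfhmin" },
  { "zfhmin", "f" },
  { NULL, NULL },
};

/* Return 1 if enabling EXT (transitively) implies DEP.  The table is
   acyclic, so the recursion terminates.  */
int
implies_p (const char *ext, const char *dep)
{
  if (strcmp (ext, dep) == 0)
    return 1;
  for (const struct implied *p = implied_table; p->ext; p++)
    if (strcmp (p->ext, ext) == 0 && implies_p (p->implies, dep))
      return 1;
  return 0;
}
```

GCC walks the same kind of table when expanding a `-march` string, which is why `-march=rv32if_zfh` ends up with zfhmin enabled as well.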
[PATCH] rs6000: Preserve REG_EH_REGION when replacing load/store [PR106091]
Hi,

As the test case in PR106091 shows, the rs6000-specific pass "swaps" doesn't preserve the REG_EH_REGION reg_note when replacing some load insns at the end of a basic block, which causes the flow info verification to fail unexpectedly. Since a memory reference rtx may trap, this patch ensures we copy the REG_EH_REGION reg_note while replacing a swapped aligned load or store.

Bootstrapped and regtested on powerpc64-linux-gnu P7 & P8, and powerpc64le-linux-gnu P9 & P10.

Richi, could you help to review this patch from the point of view of a non-call-exceptions expert? I'm going to install it if it looks good to you. Thanks!

-
	PR target/106091

gcc/ChangeLog:

	* config/rs6000/rs6000-p8swap.cc (replace_swapped_aligned_store): Copy
	REG_EH_REGION when replacing one store insn having it.
	(replace_swapped_aligned_load): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr106091.c: New test.

--- gcc/config/rs6000/rs6000-p8swap.cc | 20 ++-- gcc/testsuite/gcc.target/powerpc/pr106091.c | 15 +++ 2 files changed, 33 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106091.c diff --git a/gcc/config/rs6000/rs6000-p8swap.cc b/gcc/config/rs6000/rs6000-p8swap.cc index 275702fee1b..19fbbfb67dc 100644 --- a/gcc/config/rs6000/rs6000-p8swap.cc +++ b/gcc/config/rs6000/rs6000-p8swap.cc @@ -1690,7 +1690,15 @@ replace_swapped_aligned_store (swap_web_entry *insn_entry, gcc_assert ((GET_CODE (new_body) == SET) && MEM_P (SET_DEST (new_body))); - set_block_for_insn (new_insn, BLOCK_FOR_INSN (store_insn)); + basic_block bb = BLOCK_FOR_INSN (store_insn); + set_block_for_insn (new_insn, bb); + /* Handle REG_EH_REGION note.
*/ + if (cfun->can_throw_non_call_exceptions && BB_END (bb) == store_insn) +{ + rtx note = find_reg_note (store_insn, REG_EH_REGION, NULL_RTX); + if (note) + add_reg_note (new_insn, REG_EH_REGION, XEXP (note, 0)); +} df_insn_rescan (new_insn); df_insn_delete (store_insn); @@ -1784,7 +1792,15 @@ replace_swapped_aligned_load (swap_web_entry *insn_entry, rtx swap_insn) gcc_assert ((GET_CODE (new_body) == SET) && MEM_P (SET_SRC (new_body))); - set_block_for_insn (new_insn, BLOCK_FOR_INSN (def_insn)); + basic_block bb = BLOCK_FOR_INSN (def_insn); + set_block_for_insn (new_insn, bb); + /* Handle REG_EH_REGION note. */ + if (cfun->can_throw_non_call_exceptions && BB_END (bb) == def_insn) +{ + rtx note = find_reg_note (def_insn, REG_EH_REGION, NULL_RTX); + if (note) + add_reg_note (new_insn, REG_EH_REGION, XEXP (note, 0)); +} df_insn_rescan (new_insn); df_insn_delete (def_insn); diff --git a/gcc/testsuite/gcc.target/powerpc/pr106091.c b/gcc/testsuite/gcc.target/powerpc/pr106091.c new file mode 100644 index 000..61ce8cf4733 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr106091.c @@ -0,0 +1,15 @@ +/* { dg-options "-O -fnon-call-exceptions -fno-tree-dce -fno-tree-forwprop -w" } */ + +/* Verify there is no ICE. */ + +typedef short __attribute__ ((__vector_size__ (64))) V; +V v, w; + +inline V foo (V a, V b); + +V +foo (V a, V b) +{ + b &= v < b; + return (V){foo (b, w)[3], (V){}[3]}; +} -- 2.25.1
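The `find_reg_note` / `add_reg_note` pattern the patch relies on can be modeled with a simple linked list (toy model only — GCC's real reg notes are rtx expressions hanging off the insn, not this struct):

```c
#include <stddef.h>

/* Notes hang off an insn as a singly linked list; lookup scans for a
   kind, and copying a note onto a replacement insn just prepends a
   node carrying the same payload -- here, the EH region number.  */
enum note_kind { NOTE_EH_REGION, NOTE_DEAD };

struct note { enum note_kind kind; int value; struct note *next; };
struct insn { struct note *notes; };

/* find_reg_note-style lookup.  */
struct note *
find_note (struct insn *i, enum note_kind kind)
{
  for (struct note *n = i->notes; n; n = n->next)
    if (n->kind == kind)
      return n;
  return NULL;
}

/* add_reg_note-style insertion; SLOT is caller-provided storage.  */
void
add_note (struct insn *i, struct note *slot, enum note_kind kind, int value)
{
  slot->kind = kind;
  slot->value = value;
  slot->next = i->notes;
  i->notes = slot;
}
```

The patch does exactly this transfer: look up REG_EH_REGION on the insn being deleted and, if present, attach an equal note to the freshly emitted insn, so the EH region association survives the replacement.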
Re: [PATCH] LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method when frame->mask or frame->fmask is zero.
On Thu, 2022-07-07 at 16:31 +0800, WANG Xuerui wrote: > IMO it's better to also state which problem this change is meant to > solve (i.e. your intent), better yet, with an appropriate bugzilla > link. And/or add a testcase (which FAILs without this change) into gcc/testsuite/gcc.target/loongarch. > > --- > > gcc/config/loongarch/loongarch.cc | 12 +--- > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > diff --git a/gcc/config/loongarch/loongarch.cc > > b/gcc/config/loongarch/loongarch.cc > > index d72b256df51..5c9a33c14f7 100644 > > --- a/gcc/config/loongarch/loongarch.cc > > +++ b/gcc/config/loongarch/loongarch.cc > > @@ -917,8 +917,12 @@ loongarch_compute_frame_info (void) > > frame->frame_pointer_offset = offset; > > /* Next are the callee-saved FPRs. */ > > if (frame->fmask) > > - offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); > > - frame->fp_sp_offset = offset - UNITS_PER_FP_REG; > > + { > > + offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); > > + frame->fp_sp_offset = offset - UNITS_PER_FP_REG; > > + } > > + else > > + frame->fp_sp_offset = offset; > > /* Next are the callee-saved GPRs. */ > > if (frame->mask) > > { > > @@ -931,8 +935,10 @@ loongarch_compute_frame_info (void) > > frame->save_libcall_adjustment = x_save_size; > > > > offset += x_save_size; > > + frame->gp_sp_offset = offset - UNITS_PER_WORD; > > } > > - frame->gp_sp_offset = offset - UNITS_PER_WORD; > > + else > > + frame->gp_sp_offset = offset; > > /* The hard frame pointer points above the callee-saved GPRs. > > */ > > frame->hard_frame_pointer_offset = offset; > > /* Above the hard frame pointer is the callee-allocated varags > > save area. */ -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
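The effect of the quoted hunk is easy to show numerically. A toy model of the fp_sp_offset computation before and after the change, assuming a 16-byte `LARCH_STACK_ALIGN` and 8-byte `UNITS_PER_FP_REG` (both assumptions for illustration, not the real GCC code):

```c
#define UNITS_PER_FP_REG 8
#define STACK_ALIGN(x) (((x) + 15) & ~15L)   /* assumed 16-byte alignment */

/* Before the patch: UNITS_PER_FP_REG is subtracted unconditionally,
   so with fmask == 0 the offset points below the running offset even
   though no FPR save slot exists.  */
long
fp_sp_offset_before (long offset, unsigned fmask, int num_f_saved)
{
  if (fmask)
    offset += STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG);
  return offset - UNITS_PER_FP_REG;
}

/* After the patch: when nothing is saved, fp_sp_offset stays equal to
   the running offset.  */
long
fp_sp_offset_after (long offset, unsigned fmask, int num_f_saved)
{
  if (fmask)
    {
      offset += STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG);
      return offset - UNITS_PER_FP_REG;
    }
  return offset;
}
```

With offset 32 and no callee-saved FPRs, the old code yields 24 while the new code yields 32; when FPRs are actually saved, both variants agree, which matches the reviewer's point that only the fmask == 0 / mask == 0 cases change behavior.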
Re: Adjust 'libgomp.c-c++-common/requires-3.c' (was: [Patch][v4] OpenMP: Move omp requires checks to libgomp)
Hi Thomas, hello all, On 07.07.22 10:37, Thomas Schwinge wrote: In preparation for other changes: On 2022-06-29T16:33:02+0200, Tobias Burnus wrote: +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-3.c @@ -0,0 +1,24 @@ +/* { dg-do link { target offloading_enabled } } */ Not expected to see 'offloading_enabled' here... +/* { dg-additional-sources requires-3-aux.c } */ + +/* Check diagnostic by device-compiler's lto1. ..., because of this note ^. ... Subject: [PATCH] Adjust 'libgomp.c-c++-common/requires-3.c' ... libgomp/ * testsuite/libgomp.c-c++-common/requires-3.c: Adjust. ... index 4b07ffdd09b..7091f400ef0 100644 --- a/libgomp/testsuite/libgomp.c-c++-common/requires-3.c +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-3.c @@ -1,4 +1,4 @@ -/* { dg-do link { target offloading_enabled } } */ +/* { dg-do link { target { offload_target_nvptx || offload_target_amdgcn } } } */ LGTM. Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Re: [PATCH] rs6000: Preserve REG_EH_REGION when replacing load/store [PR106091]
On Thu, Jul 7, 2022 at 10:55 AM Kewen.Lin wrote: > > Hi, > > As test case in PR106091 shows, rs6000 specific pass swaps > doesn't preserve the reg_note REG_EH_REGION when replacing > some load insn at the end of basic block, it causes the > flow info verification to fail unexpectedly. Since memory > reference rtx may trap, this patch is to ensure we copy > REG_EH_REGION reg_note while replacing swapped aligned load > or store. > > Bootstrapped and regtested on powerpc64-linux-gnu P7 & P8, > and powerpc64le-linux-gnu P9 & P10. > > Richi, could you help to review this patch from a point view > of non-call-exceptions expert? I think it looks OK but I do wonder if in RTL there's a better way to transfer EH info from one stmt to another when you are replacing it? On gimple gsi_replace would do, but I can't immediately find a proper RTL replacement for your emit_insn_before (..., X); remove_insn (X); (plus DF assorted things). Eric? > > I'm going to install it if it looks good to you. Thanks! > > - > PR target/106091 > > gcc/ChangeLog: > > * config/rs6000/rs6000-p8swap.cc (replace_swapped_aligned_store): Copy > REG_EH_REGION when replacing one store insn having it. > (replace_swapped_aligned_load): Likewise. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/pr106091.c: New test. 
> --- > gcc/config/rs6000/rs6000-p8swap.cc | 20 ++-- > gcc/testsuite/gcc.target/powerpc/pr106091.c | 15 +++ > 2 files changed, 33 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106091.c > > diff --git a/gcc/config/rs6000/rs6000-p8swap.cc > b/gcc/config/rs6000/rs6000-p8swap.cc > index 275702fee1b..19fbbfb67dc 100644 > --- a/gcc/config/rs6000/rs6000-p8swap.cc > +++ b/gcc/config/rs6000/rs6000-p8swap.cc > @@ -1690,7 +1690,15 @@ replace_swapped_aligned_store (swap_web_entry > *insn_entry, >gcc_assert ((GET_CODE (new_body) == SET) > && MEM_P (SET_DEST (new_body))); > > - set_block_for_insn (new_insn, BLOCK_FOR_INSN (store_insn)); > + basic_block bb = BLOCK_FOR_INSN (store_insn); > + set_block_for_insn (new_insn, bb); > + /* Handle REG_EH_REGION note. */ > + if (cfun->can_throw_non_call_exceptions && BB_END (bb) == store_insn) > +{ > + rtx note = find_reg_note (store_insn, REG_EH_REGION, NULL_RTX); > + if (note) > + add_reg_note (new_insn, REG_EH_REGION, XEXP (note, 0)); > +} >df_insn_rescan (new_insn); > >df_insn_delete (store_insn); > @@ -1784,7 +1792,15 @@ replace_swapped_aligned_load (swap_web_entry > *insn_entry, rtx swap_insn) >gcc_assert ((GET_CODE (new_body) == SET) > && MEM_P (SET_SRC (new_body))); > > - set_block_for_insn (new_insn, BLOCK_FOR_INSN (def_insn)); > + basic_block bb = BLOCK_FOR_INSN (def_insn); > + set_block_for_insn (new_insn, bb); > + /* Handle REG_EH_REGION note. 
*/ > + if (cfun->can_throw_non_call_exceptions && BB_END (bb) == def_insn) > +{ > + rtx note = find_reg_note (def_insn, REG_EH_REGION, NULL_RTX); > + if (note) > + add_reg_note (new_insn, REG_EH_REGION, XEXP (note, 0)); > +} >df_insn_rescan (new_insn); > >df_insn_delete (def_insn); > diff --git a/gcc/testsuite/gcc.target/powerpc/pr106091.c > b/gcc/testsuite/gcc.target/powerpc/pr106091.c > new file mode 100644 > index 000..61ce8cf4733 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr106091.c > @@ -0,0 +1,15 @@ > +/* { dg-options "-O -fnon-call-exceptions -fno-tree-dce -fno-tree-forwprop > -w" } */ > + > +/* Verify there is no ICE. */ > + > +typedef short __attribute__ ((__vector_size__ (64))) V; > +V v, w; > + > +inline V foo (V a, V b); > + > +V > +foo (V a, V b) > +{ > + b &= v < b; > + return (V){foo (b, w)[3], (V){}[3]}; > +} > -- > 2.25.1
Re: [PATCH] rs6000: Preserve REG_EH_REGION when replacing load/store [PR106091]
on 2022/7/7 17:03, Richard Biener wrote: > On Thu, Jul 7, 2022 at 10:55 AM Kewen.Lin wrote: >> >> Hi, >> >> As test case in PR106091 shows, rs6000 specific pass swaps >> doesn't preserve the reg_note REG_EH_REGION when replacing >> some load insn at the end of basic block, it causes the >> flow info verification to fail unexpectedly. Since memory >> reference rtx may trap, this patch is to ensure we copy >> REG_EH_REGION reg_note while replacing swapped aligned load >> or store. >> >> Bootstrapped and regtested on powerpc64-linux-gnu P7 & P8, >> and powerpc64le-linux-gnu P9 & P10. >> >> Richi, could you help to review this patch from a point view >> of non-call-exceptions expert? > > I think it looks OK but I do wonder if in RTL there's a better > way to transfer EH info from one stmt to another when you > are replacing it? On gimple gsi_replace would do, but I > can't immediately find a proper RTL replacement for your > emit_insn_before (..., X); remove_insn (X); (plus DF assorted > things). > Thanks for so prompt review! For the question, I'm not sure :(, when I was drafting this patch, I wondered if there is one function passing/copying reg_note REG_EH_REGION for this kind of need, so I went through almost all the places related to REG_EH_REGION, but nothing desired was found (though I may miss sth.). BR, Kewen > Eric? > >> >> I'm going to install it if it looks good to you. Thanks! >> >> - >> PR target/106091 >> >> gcc/ChangeLog: >> >> * config/rs6000/rs6000-p8swap.cc (replace_swapped_aligned_store): >> Copy >> REG_EH_REGION when replacing one store insn having it. >> (replace_swapped_aligned_load): Likewise. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/powerpc/pr106091.c: New test. 
>> --- >> gcc/config/rs6000/rs6000-p8swap.cc | 20 ++-- >> gcc/testsuite/gcc.target/powerpc/pr106091.c | 15 +++ >> 2 files changed, 33 insertions(+), 2 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106091.c >> >> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc >> b/gcc/config/rs6000/rs6000-p8swap.cc >> index 275702fee1b..19fbbfb67dc 100644 >> --- a/gcc/config/rs6000/rs6000-p8swap.cc >> +++ b/gcc/config/rs6000/rs6000-p8swap.cc >> @@ -1690,7 +1690,15 @@ replace_swapped_aligned_store (swap_web_entry >> *insn_entry, >>gcc_assert ((GET_CODE (new_body) == SET) >> && MEM_P (SET_DEST (new_body))); >> >> - set_block_for_insn (new_insn, BLOCK_FOR_INSN (store_insn)); >> + basic_block bb = BLOCK_FOR_INSN (store_insn); >> + set_block_for_insn (new_insn, bb); >> + /* Handle REG_EH_REGION note. */ >> + if (cfun->can_throw_non_call_exceptions && BB_END (bb) == store_insn) >> +{ >> + rtx note = find_reg_note (store_insn, REG_EH_REGION, NULL_RTX); >> + if (note) >> + add_reg_note (new_insn, REG_EH_REGION, XEXP (note, 0)); >> +} >>df_insn_rescan (new_insn); >> >>df_insn_delete (store_insn); >> @@ -1784,7 +1792,15 @@ replace_swapped_aligned_load (swap_web_entry >> *insn_entry, rtx swap_insn) >>gcc_assert ((GET_CODE (new_body) == SET) >> && MEM_P (SET_SRC (new_body))); >> >> - set_block_for_insn (new_insn, BLOCK_FOR_INSN (def_insn)); >> + basic_block bb = BLOCK_FOR_INSN (def_insn); >> + set_block_for_insn (new_insn, bb); >> + /* Handle REG_EH_REGION note. 
*/ >> + if (cfun->can_throw_non_call_exceptions && BB_END (bb) == def_insn) >> +{ >> + rtx note = find_reg_note (def_insn, REG_EH_REGION, NULL_RTX); >> + if (note) >> + add_reg_note (new_insn, REG_EH_REGION, XEXP (note, 0)); >> +} >>df_insn_rescan (new_insn); >> >>df_insn_delete (def_insn); >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106091.c >> b/gcc/testsuite/gcc.target/powerpc/pr106091.c >> new file mode 100644 >> index 000..61ce8cf4733 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/pr106091.c >> @@ -0,0 +1,15 @@ >> +/* { dg-options "-O -fnon-call-exceptions -fno-tree-dce -fno-tree-forwprop >> -w" } */ >> + >> +/* Verify there is no ICE. */ >> + >> +typedef short __attribute__ ((__vector_size__ (64))) V; >> +V v, w; >> + >> +inline V foo (V a, V b); >> + >> +V >> +foo (V a, V b) >> +{ >> + b &= v < b; >> + return (V){foo (b, w)[3], (V){}[3]}; >> +} >> -- >> 2.25.1
Re: libstdc++: Minor codegen improvement for atomic wait spinloop
On Wed, 6 Jul 2022 at 22:42, Thomas Rodgers wrote: > > Ok for trunk? backport? Yes, for all branches that have the atomic wait code. > > On Wed, Jul 6, 2022 at 1:56 PM Jonathan Wakely wrote: >> >> On Wed, 6 Jul 2022 at 02:05, Thomas Rodgers via Libstdc++ >> wrote: >> > >> > This patch merges the spin loops in the atomic wait implementation which is >> > a >> > minor codegen improvement. >> > >> > libstdc++-v3/ChangeLog: >> > * include/bits/atomic_wait.h (__atomic_spin): Merge spin loops. >> >> OK, thanks. >>
Re: Enhance 'libgomp.c-c++-common/requires-4.c', 'libgomp.c-c++-common/requires-5.c' testing (was: [Patch][v4] OpenMP: Move omp requires checks to libgomp)
On 07.07.22 10:42, Thomas Schwinge wrote:

> In preparation for other changes: ...
>
> On 2022-06-29T16:33:02+0200, Tobias Burnus wrote:
>> +/* { dg-output "devices present but 'omp requires unified_address, unified_shared_memory, reverse_offload' cannot be fulfilled" } */
>
> (The latter diagnostic later got conditionalized by 'GOMP_DEBUG=1'.)
>
> OK to push the attached "Enhance 'libgomp.c-c++-common/requires-4.c',
> 'libgomp.c-c++-common/requires-5.c' testing"?
> ...
> libgomp/
> * testsuite/libgomp.c-c++-common/requires-4.c: Enhance testing.
> * testsuite/libgomp.c-c++-common/requires-5.c: Likewise.
> ...
> --- a/libgomp/testsuite/libgomp.c-c++-common/requires-4.c
> +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c
> @@ -1,22 +1,29 @@
> -/* { dg-do link { target offloading_enabled } } */
> /* { dg-additional-options "-flto" } */
> /* { dg-additional-sources requires-4-aux.c } */
> -/* Check diagnostic by device-compiler's or host compiler's lto1.
> +/* Check no diagnostic by device-compiler's or host compiler's lto1.

I note that without ENABLE_OFFLOADING there is never any lto1 diagnostic. However, given that no diagnostic is expected, it also works for "! offloading_enabled". Thus, the change is fine.

> Other file uses: 'requires reverse_offload', but that's inactive as
> there are no declare target directives, device constructs nor device
> routines */
> +/* For actual offload execution, prints the following (only) if GOMP_DEBUG=1:
> + "devices present but 'omp requires unified_address, unified_shared_memory, reverse_offload' cannot be fulfilled"
> + and does host-fallback execution. */

The latter is only true when device code is also produced – and a device is available for that/those device types. I think that's what you imply by "For actual offload execution", but it is a bit hidden. Maybe s/For actual offload execution, prints/It may print/ is clearer?

In principle, it would be nice if we could test for the output, but currently setting an env var for remote execution does not work yet. Cf.
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597773.html - When set, we could use offload_target_nvptx etc. (..._amdgcn, ..._any) to test – as this guarantees that it is compiled for that device + the device is available.

> + #pragma omp requires unified_address,unified_shared_memory
>
> -int a[10];
> +int a[10] = { 0 };
> extern void foo (void);
> int main (void)
> {
> - #pragma omp target
> + #pragma omp target map(to: a)

Hmm, I wonder whether I like it or not. Without it, there is an implicit "map(tofrom:a)". On the other hand, OpenMP permits that – even with unified-shared memory – the implementation may copy the data to the device. (For instance, to permit faster access to "a".) Thus, ...

> + for (int i = 0; i < 10; i++)
> +a[i] = i;
> +
> for (int i = 0; i < 10; i++)
> -a[i] = 0;
> +if (a[i] != i)
> + __builtin_abort ();

... this condition (back on the host) could also fail with USM. However, given that to my knowledge no USM implementation actually copies the data, I believe it is fine. (Disclaimer: I have not checked what OG12 does, but I guess it also does not copy it.)

> --- a/libgomp/testsuite/libgomp.c-c++-common/requires-5.c
> +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-5.c
> @@ -1,21 +1,25 @@
> -/* { dg-do run { target { offload_target_nvptx || offload_target_amdgcn } } } */
> /* { dg-additional-sources requires-5-aux.c } */
> +/* For actual offload execution, prints the following (only) if GOMP_DEBUG=1:
> + "devices present but 'omp requires unified_address, unified_shared_memory, reverse_offload' cannot be fulfilled"
> + and does host-fallback execution. */
> +

This wording is correct with the now-removed check – but if you remove the offload_target..., it only "might" print it, depending, well, on the conditions set by offload_target...
#pragma omp requires unified_shared_memory, unified_address, reverse_offload -int a[10]; +int a[10] = { 0 }; extern void foo (void); int main (void) { - #pragma omp target + #pragma omp target map(to: a) + for (int i = 0; i < 10; i++) +a[i] = i; + for (int i = 0; i < 10; i++) -a[i] = 0; +if (a[i] != i) + __builtin_abort (); foo (); return 0; } Thus: LGTM, if you update the GOMP_DEBUG=... wording, either using "might" (etc.) or by being more explicit. Once we have remote setenv, we probably want to add another testcase to check for the GOMP_DEBUG=1, copying an existing one, adding dg-output and restricting it to target offload_target_... Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[PATCH v2] LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method when frame->mask or frame->fmask is zero.
Under the LA architecture, when the stack drop is too large, the process of dropping the stack is divided into two steps.

step1: Drop part of the stack and save the callee-saved registers on it.
step2: Drop the rest of it.

The stack drop operation is optimized when frame->total_size minus frame->sp_fp_offset is an integer multiple of 4096, which reduces the number of instructions required to drop the stack. However, this optimization was not effective because of the original calculation method. Consider the following case:

int main()
{
  char buf[1024 * 12];
  printf ("%p\n", buf);
  return 0;
}

As you can see from the generated assembler, the old GCC emits two more instructions than the new GCC (lines 14 and 24 of the old column):

   new                              │    old
10 main:                           │ 11 main:
11   addi.d   $r3,$r3,-16          │ 12   lu12i.w  $r13,-12288>>12
12   lu12i.w  $r13,-12288>>12      │ 13   addi.d   $r3,$r3,-2032
13   lu12i.w  $r5,-12288>>12       │ 14   ori      $r13,$r13,2016
14   lu12i.w  $r12,12288>>12       │ 15   lu12i.w  $r5,-12288>>12
15   st.d     $r1,$r3,8            │ 16   lu12i.w  $r12,12288>>12
16   add.d    $r12,$r12,$r5        │ 17   st.d     $r1,$r3,2024
17   add.d    $r3,$r3,$r13         │ 18   add.d    $r12,$r12,$r5
18   add.d    $r5,$r12,$r3         │ 19   add.d    $r3,$r3,$r13
19   la.local $r4,.LC0             │ 20   add.d    $r5,$r12,$r3
20   bl       %plt(printf)         │ 21   la.local $r4,.LC0
21   lu12i.w  $r13,12288>>12       │ 22   bl       %plt(printf)
22   add.d    $r3,$r3,$r13         │ 23   lu12i.w  $r13,8192>>12
23   ld.d     $r1,$r3,8            │ 24   ori      $r13,$r13,2080
24   or       $r4,$r0,$r0          │ 25   add.d    $r3,$r3,$r13
25   addi.d   $r3,$r3,16           │ 26   ld.d     $r1,$r3,2024
26   jr       $r1                  │ 27   or       $r4,$r0,$r0
                                   │ 28   addi.d   $r3,$r3,2032
                                   │ 29   jr       $r1

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_compute_frame_info): Modify
fp_sp_offset and gp_sp_offset's calculation method: when frame->mask
or frame->fmask is zero, don't subtract UNITS_PER_WORD or
UNITS_PER_FP_REG.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/prolog-opt.c: New test.
--- gcc/config/loongarch/loongarch.cc | 12 ++-- .../gcc.target/loongarch/prolog-opt.c | 29 +++ 2 files changed, 38 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/prolog-opt.c diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index d72b256df51..5c9a33c14f7 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -917,8 +917,12 @@ loongarch_compute_frame_info (void) frame->frame_pointer_offset = offset; /* Next are the callee-saved FPRs. */ if (frame->fmask) -offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); - frame->fp_sp_offset = offset - UNITS_PER_FP_REG; +{ + offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); + frame->fp_sp_offset = offset - UNITS_PER_FP_REG; +} + else +frame->fp_sp_offset = offset; /* Next are the callee-saved GPRs. */ if (frame->mask) { @@ -931,8 +935,10 @@ loongarch_compute_frame_info (void) frame->save_libcall_adjustment = x_save_size; offset += x_save_size; + frame->gp_sp_offset = offset - UNITS_PER_WORD; } - frame->gp_sp_offset = offset - UNITS_PER_WORD; + else +frame->gp_sp_offset = offset; /* The hard frame pointer points above the callee-saved GPRs. */ frame->hard_frame_pointer_offset = offset; /* Above the hard frame pointer is the callee-allocated varags save area. */ diff --git a/gcc/testsuite/gcc.target/loongarch/prolog-opt.c b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c new file mode 100644 index 000..7f611370aa4 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c @@ -0,0 +1,29 @@ +/* Test that LoongArch backend stack drop operation optimized. 
*/ + +/* { dg-do compile } */ +/* { dg-options "-O2 -mabi=lp64d" } */ +/* { dg-final { scan-assembler "addi.d\t\\\$r3,\\\$r3,-16" } } */ + +struct test +{ + int empty1[0]; + double empty2[0]; + int : 0; + float x; + long empty3[0]; + long : 0; + float y; + unsigned : 0; + char empty4[0]; +}; + +extern void callee (struct test); + +void +caller (void) +{ + struct test test; + test.x = 114; + test.y = 514; + callee (test); +} -- 2.31.1
[PATCH v3] LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method when frame->mask or frame->fmask is zero.
Update testsuite.

--

Under the LA architecture, when the stack drop is too large, the process of dropping the stack is divided into two steps.

step1: Drop part of the stack and save the callee-saved registers on it.
step2: Drop the rest of it.

The stack drop operation is optimized when frame->total_size minus frame->sp_fp_offset is an integer multiple of 4096, which reduces the number of instructions required to drop the stack. However, this optimization was not effective because of the original calculation method. Consider the following case:

int main()
{
  char buf[1024 * 12];
  printf ("%p\n", buf);
  return 0;
}

As you can see from the generated assembler, the old GCC emits two more instructions than the new GCC (lines 14 and 24 of the old column):

   new                              │    old
10 main:                           │ 11 main:
11   addi.d   $r3,$r3,-16          │ 12   lu12i.w  $r13,-12288>>12
12   lu12i.w  $r13,-12288>>12      │ 13   addi.d   $r3,$r3,-2032
13   lu12i.w  $r5,-12288>>12       │ 14   ori      $r13,$r13,2016
14   lu12i.w  $r12,12288>>12       │ 15   lu12i.w  $r5,-12288>>12
15   st.d     $r1,$r3,8            │ 16   lu12i.w  $r12,12288>>12
16   add.d    $r12,$r12,$r5        │ 17   st.d     $r1,$r3,2024
17   add.d    $r3,$r3,$r13         │ 18   add.d    $r12,$r12,$r5
18   add.d    $r5,$r12,$r3         │ 19   add.d    $r3,$r3,$r13
19   la.local $r4,.LC0             │ 20   add.d    $r5,$r12,$r3
20   bl       %plt(printf)         │ 21   la.local $r4,.LC0
21   lu12i.w  $r13,12288>>12       │ 22   bl       %plt(printf)
22   add.d    $r3,$r3,$r13         │ 23   lu12i.w  $r13,8192>>12
23   ld.d     $r1,$r3,8            │ 24   ori      $r13,$r13,2080
24   or       $r4,$r0,$r0          │ 25   add.d    $r3,$r3,$r13
25   addi.d   $r3,$r3,16           │ 26   ld.d     $r1,$r3,2024
26   jr       $r1                  │ 27   or       $r4,$r0,$r0
                                   │ 28   addi.d   $r3,$r3,2032
                                   │ 29   jr       $r1

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_compute_frame_info): Modify
fp_sp_offset and gp_sp_offset's calculation method: when frame->mask
or frame->fmask is zero, don't subtract UNITS_PER_WORD or
UNITS_PER_FP_REG.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/prolog-opt.c: New test.
--- gcc/config/loongarch/loongarch.cc | 12 +--- gcc/testsuite/gcc.target/loongarch/prolog-opt.c | 15 +++ 2 files changed, 24 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/prolog-opt.c diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index d72b256df51..5c9a33c14f7 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -917,8 +917,12 @@ loongarch_compute_frame_info (void) frame->frame_pointer_offset = offset; /* Next are the callee-saved FPRs. */ if (frame->fmask) -offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); - frame->fp_sp_offset = offset - UNITS_PER_FP_REG; +{ + offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); + frame->fp_sp_offset = offset - UNITS_PER_FP_REG; +} + else +frame->fp_sp_offset = offset; /* Next are the callee-saved GPRs. */ if (frame->mask) { @@ -931,8 +935,10 @@ loongarch_compute_frame_info (void) frame->save_libcall_adjustment = x_save_size; offset += x_save_size; + frame->gp_sp_offset = offset - UNITS_PER_WORD; } - frame->gp_sp_offset = offset - UNITS_PER_WORD; + else +frame->gp_sp_offset = offset; /* The hard frame pointer points above the callee-saved GPRs. */ frame->hard_frame_pointer_offset = offset; /* Above the hard frame pointer is the callee-allocated varags save area. */ diff --git a/gcc/testsuite/gcc.target/loongarch/prolog-opt.c b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c new file mode 100644 index 000..c7bd71dde93 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c @@ -0,0 +1,15 @@ +/* Test that LoongArch backend stack drop operation optimized. */ + +/* { dg-do compile } */ +/* { dg-options "-O2 -mabi=lp64d" } */ +/* { dg-final { scan-assembler "addi.d\t\\\$r3,\\\$r3,-16" } } */ + +#include + +int main() +{ + char buf[1024 * 12]; + printf ("%p\n", buf); + return 0; +} + -- 2.31.1
[PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators
This patch series implements OpenMP allocators for low-latency memory on nvptx, unified shared memory on both nvptx and amdgcn, and generic pinned memory support for all Linux hosts (an nvptx-specific implementation using Cuda pinned memory is planned for the future, as is low-latency memory on amdgcn). Patches 01 to 14 are reposts of patches previously submitted, now forward ported to the current master branch and with the various follow-up patches folded in. Where it conflicts with the new memkind implementation the memkind takes precedence (but there's currently no way to implement memory that's both high-bandwidth and pinned anyway). Patches 15 to 17 are new work. I can probably approve these myself, but they can't be committed until the rest of the series is approved. Andrew Andrew Stubbs (11): libgomp, nvptx: low-latency memory allocator libgomp: pinned memory libgomp, openmp: Add ompx_pinned_mem_alloc openmp, nvptx: low-lat memory access traits openmp, nvptx: ompx_unified_shared_mem_alloc openmp: Add -foffload-memory openmp: allow requires unified_shared_memory openmp: -foffload-memory=pinned amdgcn: Support XNACK mode amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK amdgcn: libgomp plugin USM implementation Hafiz Abid Qadeer (6): openmp: Use libgomp memory allocation functions with unified shared memory. Add parsing support for allocate directive (OpenMP 5.0) Translate allocate directive (OpenMP 5.0). Handle cleanup of omp allocated variables (OpenMP 5.0). Gimplify allocate directive (OpenMP 5.0). Lower allocate directive (OpenMP 5.0). 
gcc/c/c-parser.cc | 22 +- gcc/common.opt| 16 + gcc/config/gcn/gcn-hsa.h | 3 +- gcc/config/gcn/gcn-opts.h | 10 +- gcc/config/gcn/gcn-valu.md| 29 +- gcc/config/gcn/gcn.cc | 62 ++- gcc/config/gcn/gcn.md | 113 +++-- gcc/config/gcn/gcn.opt| 18 +- gcc/config/gcn/mkoffload.cc | 56 ++- gcc/coretypes.h | 7 + gcc/cp/parser.cc | 22 +- gcc/doc/gimple.texi | 38 +- gcc/doc/invoke.texi | 16 +- gcc/fortran/dump-parse-tree.cc| 3 + gcc/fortran/gfortran.h| 5 +- gcc/fortran/match.h | 1 + gcc/fortran/openmp.cc | 242 ++- gcc/fortran/parse.cc | 10 +- gcc/fortran/resolve.cc| 1 + gcc/fortran/st.cc | 1 + gcc/fortran/trans-decl.cc | 20 + gcc/fortran/trans-openmp.cc | 50 +++ gcc/fortran/trans.cc | 1 + gcc/gimple-pretty-print.cc| 37 ++ gcc/gimple.cc | 12 + gcc/gimple.def| 6 + gcc/gimple.h | 60 ++- gcc/gimplify.cc | 19 + gcc/gsstruct.def | 1 + gcc/omp-builtins.def | 3 + gcc/omp-low.cc| 383 + gcc/passes.def| 1 + .../c-c++-common/gomp/alloc-pinned-1.c| 28 ++ gcc/testsuite/c-c++-common/gomp/usm-1.c | 4 + gcc/testsuite/c-c++-common/gomp/usm-2.c | 46 +++ gcc/testsuite/c-c++-common/gomp/usm-3.c | 44 ++ gcc/testsuite/c-c++-common/gomp/usm-4.c | 4 + gcc/testsuite/g++.dg/gomp/usm-1.C | 32 ++ gcc/testsuite/g++.dg/gomp/usm-2.C | 30 ++ gcc/testsuite/g++.dg/gomp/usm-3.C | 38 ++ gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 | 112 + gcc/testsuite/gfortran.dg/gomp/allocate-5.f90 | 73 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 | 84 gcc/testsuite/gfortran.dg/gomp/allocate-7.f90 | 13 + gcc/testsuite/gfortran.dg/gomp/allocate-8.f90 | 15 + gcc/testsuite/gfortran.dg/gomp/usm-1.f90 | 6 + gcc/testsuite/gfortran.dg/gomp/usm-2.f90 | 16 + gcc/testsuite/gfortran.dg/gomp/usm-3.f90 | 13 + gcc/testsuite/gfortran.dg/gomp/usm-4.f90 | 6 + gcc/tree-core.h | 9 + gcc/tree-pass.h | 1 + gcc/tree-pretty-print.cc | 23 ++ gcc/tree.cc | 1 + gcc/tree.def | 4 + gcc/tree.h| 15 + include/cuda/cuda.h | 12 + libgomp/allocator.c | 304 ++ libgomp/config/linux/allocator.c | 137 +++ libgomp/config/nvptx/allocator.c | 387 ++ 
libgomp/config/nvptx/team.c
[PATCH 02/17] libgomp: pinned memory
Implement the OpenMP pinned memory trait on Linux hosts using the mlock syscall. Pinned allocations are performed using mmap, not malloc, to ensure that they can be unpinned safely when freed. libgomp/ChangeLog: * allocator.c (MEMSPACE_ALLOC): Add PIN. (MEMSPACE_CALLOC): Add PIN. (MEMSPACE_REALLOC): Add PIN. (MEMSPACE_FREE): Add PIN. (xmlock): New function. (omp_init_allocator): Don't disallow the pinned trait. (omp_aligned_alloc): Add pinning to all MEMSPACE_* calls. (omp_aligned_calloc): Likewise. (omp_realloc): Likewise. (omp_free): Likewise. * config/linux/allocator.c: New file. * config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN. (MEMSPACE_CALLOC): Add PIN. (MEMSPACE_REALLOC): Add PIN. (MEMSPACE_FREE): Add PIN. * testsuite/libgomp.c/alloc-pinned-1.c: New test. * testsuite/libgomp.c/alloc-pinned-2.c: New test. * testsuite/libgomp.c/alloc-pinned-3.c: New test. * testsuite/libgomp.c/alloc-pinned-4.c: New test. --- libgomp/allocator.c | 67 ++ libgomp/config/linux/allocator.c | 99 ++ libgomp/config/nvptx/allocator.c | 8 +- libgomp/testsuite/libgomp.c/alloc-pinned-1.c | 95 + libgomp/testsuite/libgomp.c/alloc-pinned-2.c | 101 ++ libgomp/testsuite/libgomp.c/alloc-pinned-3.c | 130 ++ libgomp/testsuite/libgomp.c/alloc-pinned-4.c | 132 +++ 7 files changed, 602 insertions(+), 30 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-1.c create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-2.c create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-3.c create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-4.c diff --git a/libgomp/allocator.c b/libgomp/allocator.c index 9b33bcf529b..54310ab93ca 100644 --- a/libgomp/allocator.c +++ b/libgomp/allocator.c @@ -39,16 +39,20 @@ /* These macros may be overridden in config//allocator.c. */ #ifndef MEMSPACE_ALLOC -#define MEMSPACE_ALLOC(MEMSPACE, SIZE) malloc (SIZE) +#define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \ + (PIN ? 
NULL : malloc (SIZE)) #endif #ifndef MEMSPACE_CALLOC -#define MEMSPACE_CALLOC(MEMSPACE, SIZE) calloc (1, SIZE) +#define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \ + (PIN ? NULL : calloc (1, SIZE)) #endif #ifndef MEMSPACE_REALLOC -#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) realloc (ADDR, SIZE) +#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN) \ + ((PIN) || (OLDPIN) ? NULL : realloc (ADDR, SIZE)) #endif #ifndef MEMSPACE_FREE -#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) free (ADDR) +#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \ + (PIN ? NULL : free (ADDR)) #endif /* Map the predefined allocators to the correct memory space. @@ -351,10 +355,6 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits, break; } - /* No support for this so far. */ - if (data.pinned) -return omp_null_allocator; - ret = gomp_malloc (sizeof (struct omp_allocator_data)); *ret = data; #ifndef HAVE_SYNC_BUILTINS @@ -481,7 +481,8 @@ retry: } else #endif - ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size); + ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size, + allocator_data->pinned); if (ptr == NULL) { #ifdef HAVE_SYNC_BUILTINS @@ -511,7 +512,8 @@ retry: = (allocator_data ? 
allocator_data->memspace : predefined_alloc_mapping[allocator]); - ptr = MEMSPACE_ALLOC (memspace, new_size); + ptr = MEMSPACE_ALLOC (memspace, new_size, +allocator_data && allocator_data->pinned); } if (ptr == NULL) goto fail; @@ -542,9 +544,9 @@ fail: #ifdef LIBGOMP_USE_MEMKIND || memkind #endif - || (allocator_data - && allocator_data->pool_size < ~(uintptr_t) 0) - || !allocator_data) + || !allocator_data + || allocator_data->pool_size < ~(uintptr_t) 0 + || allocator_data->pinned) { allocator = omp_default_mem_alloc; goto retry; @@ -596,6 +598,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator) struct omp_mem_header *data; omp_memspace_handle_t memspace __attribute__((unused)) = omp_default_mem_space; + int pinned __attribute__((unused)) = false; if (ptr == NULL) return; @@ -627,6 +630,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator) #endif memspace = allocator_data->memspace; + pinned = allocator_data->pinned; } else { @@ -651,7 +655,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator) memspace = predefined_alloc_mapping[data->allocator]; } - MEMSPACE_FREE (memspace, data->ptr, data->size); + MEMSPACE_FREE (memspace, data->ptr, data->size, pinned); } ialias (omp_free) @@ -767,7 +771,8 @@ retry: } else #endif - ptr = MEMSPACE_CALLOC (allocator_data->memspace, new_size); + ptr =
[PATCH 01/17] libgomp, nvptx: low-latency memory allocator
This patch adds support for allocating low-latency ".shared" memory on the NVPTX GPU device, via omp_low_lat_mem_space and omp_alloc. The memory can be allocated, reallocated, and freed using a basic but fast algorithm, is thread safe, and the size of the low-latency heap can be configured using the GOMP_NVPTX_LOWLAT_POOL environment variable. The use of the PTX dynamic_smem_size feature means that the low-latency allocator will not work with the PTX 3.1 multilib.

libgomp/ChangeLog:

* allocator.c (MEMSPACE_ALLOC): New macro.
(MEMSPACE_CALLOC): New macro.
(MEMSPACE_REALLOC): New macro.
(MEMSPACE_FREE): New macro.
(dynamic_smem_size): New constants.
(omp_alloc): Use MEMSPACE_ALLOC.  Implement fall-backs for
predefined allocators.
(omp_free): Use MEMSPACE_FREE.
(omp_calloc): Use MEMSPACE_CALLOC.  Implement fall-backs for
predefined allocators.
(omp_realloc): Use MEMSPACE_REALLOC and MEMSPACE_ALLOC.  Implement
fall-backs for predefined allocators.
* config/nvptx/team.c (__nvptx_lowlat_heap_root): New variable.
(__nvptx_lowlat_pool): New asm variable.
(gomp_nvptx_main): Initialize the low-latency heap.
* plugin/plugin-nvptx.c (lowlat_pool_size): New variable.
(GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar.
(GOMP_OFFLOAD_run): Apply lowlat_pool_size.
* config/nvptx/allocator.c: New file.
* testsuite/libgomp.c/allocators-1.c: New test.
* testsuite/libgomp.c/allocators-2.c: New test.
* testsuite/libgomp.c/allocators-3.c: New test.
* testsuite/libgomp.c/allocators-4.c: New test.
* testsuite/libgomp.c/allocators-5.c: New test.
* testsuite/libgomp.c/allocators-6.c: New test.
co-authored-by: Kwok Cheung Yeung --- libgomp/allocator.c| 235 - libgomp/config/nvptx/allocator.c | 370 + libgomp/config/nvptx/team.c| 28 ++ libgomp/plugin/plugin-nvptx.c | 23 +- libgomp/testsuite/libgomp.c/allocators-1.c | 56 libgomp/testsuite/libgomp.c/allocators-2.c | 64 libgomp/testsuite/libgomp.c/allocators-3.c | 42 +++ libgomp/testsuite/libgomp.c/allocators-4.c | 196 +++ libgomp/testsuite/libgomp.c/allocators-5.c | 63 libgomp/testsuite/libgomp.c/allocators-6.c | 117 +++ 10 files changed, 1110 insertions(+), 84 deletions(-) create mode 100644 libgomp/config/nvptx/allocator.c create mode 100644 libgomp/testsuite/libgomp.c/allocators-1.c create mode 100644 libgomp/testsuite/libgomp.c/allocators-2.c create mode 100644 libgomp/testsuite/libgomp.c/allocators-3.c create mode 100644 libgomp/testsuite/libgomp.c/allocators-4.c create mode 100644 libgomp/testsuite/libgomp.c/allocators-5.c create mode 100644 libgomp/testsuite/libgomp.c/allocators-6.c diff --git a/libgomp/allocator.c b/libgomp/allocator.c index b04820b8cf9..9b33bcf529b 100644 --- a/libgomp/allocator.c +++ b/libgomp/allocator.c @@ -37,6 +37,34 @@ #define omp_max_predefined_alloc omp_thread_mem_alloc +/* These macros may be overridden in config//allocator.c. */ +#ifndef MEMSPACE_ALLOC +#define MEMSPACE_ALLOC(MEMSPACE, SIZE) malloc (SIZE) +#endif +#ifndef MEMSPACE_CALLOC +#define MEMSPACE_CALLOC(MEMSPACE, SIZE) calloc (1, SIZE) +#endif +#ifndef MEMSPACE_REALLOC +#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) realloc (ADDR, SIZE) +#endif +#ifndef MEMSPACE_FREE +#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) free (ADDR) +#endif + +/* Map the predefined allocators to the correct memory space. + The index to this table is the omp_allocator_handle_t enum value. */ +static const omp_memspace_handle_t predefined_alloc_mapping[] = { + omp_default_mem_space, /* omp_null_allocator. */ + omp_default_mem_space, /* omp_default_mem_alloc. */ + omp_large_cap_mem_space, /* omp_large_cap_mem_alloc. 
*/ + omp_default_mem_space, /* omp_const_mem_alloc. */ + omp_high_bw_mem_space, /* omp_high_bw_mem_alloc. */ + omp_low_lat_mem_space, /* omp_low_lat_mem_alloc. */ + omp_low_lat_mem_space, /* omp_cgroup_mem_alloc. */ + omp_low_lat_mem_space, /* omp_pteam_mem_alloc. */ + omp_low_lat_mem_space, /* omp_thread_mem_alloc. */ +}; + enum gomp_memkind_kind { GOMP_MEMKIND_NONE = 0, @@ -453,7 +481,7 @@ retry: } else #endif - ptr = malloc (new_size); + ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size); if (ptr == NULL) { #ifdef HAVE_SYNC_BUILTINS @@ -478,7 +506,13 @@ retry: } else #endif - ptr = malloc (new_size); + { + omp_memspace_handle_t memspace __attribute__((unused)) + = (allocator_data + ? allocator_data->memspace + : predefined_alloc_mapping[allocator]); + ptr = MEMSPACE_ALLOC (memspace, new_size); + } if (ptr == NULL) goto fail; } @@ -496,35 +530,38 @@ retry: return ret; fail: - if (allocator_data) + int fallback = (allocator_da
[PATCH 04/17] openmp, nvptx: low-lat memory access traits
The NVPTX low latency memory is not accessible outside the team that allocates it, and therefore should be unavailable for allocators with the access trait "all". This change means that the omp_low_lat_mem_alloc predefined allocator now implicitly implies the "pteam" trait. libgomp/ChangeLog: * allocator.c (MEMSPACE_VALIDATE): New macro. (omp_aligned_alloc): Use MEMSPACE_VALIDATE. (omp_aligned_calloc): Likewise. (omp_realloc): Likewise. * config/nvptx/allocator.c (nvptx_memspace_validate): New function. (MEMSPACE_VALIDATE): New macro. * testsuite/libgomp.c/allocators-4.c (main): Add access trait. * testsuite/libgomp.c/allocators-6.c (main): Add access trait. * testsuite/libgomp.c/allocators-7.c: New test. --- libgomp/allocator.c| 15 + libgomp/config/nvptx/allocator.c | 11 libgomp/testsuite/libgomp.c/allocators-4.c | 7 ++- libgomp/testsuite/libgomp.c/allocators-6.c | 7 ++- libgomp/testsuite/libgomp.c/allocators-7.c | 68 ++ 5 files changed, 102 insertions(+), 6 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c/allocators-7.c diff --git a/libgomp/allocator.c b/libgomp/allocator.c index 029d0d40a36..48ab0782e6b 100644 --- a/libgomp/allocator.c +++ b/libgomp/allocator.c @@ -54,6 +54,9 @@ #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \ (PIN ? NULL : free (ADDR)) #endif +#ifndef MEMSPACE_VALIDATE +#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS) 1 +#endif /* Map the predefined allocators to the correct memory space. The index to this table is the omp_allocator_handle_t enum value. 
*/ @@ -438,6 +441,10 @@ retry: if (__builtin_add_overflow (size, new_size, &new_size)) goto fail; + if (allocator_data + && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access)) +goto fail; + if (__builtin_expect (allocator_data && allocator_data->pool_size < ~(uintptr_t) 0, 0)) { @@ -733,6 +740,10 @@ retry: if (__builtin_add_overflow (size_temp, new_size, &new_size)) goto fail; + if (allocator_data + && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access)) +goto fail; + if (__builtin_expect (allocator_data && allocator_data->pool_size < ~(uintptr_t) 0, 0)) { @@ -964,6 +975,10 @@ retry: goto fail; old_size = data->size; + if (allocator_data + && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access)) +goto fail; + if (__builtin_expect (allocator_data && allocator_data->pool_size < ~(uintptr_t) 0, 0)) { diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c index f740b97f6ac..0102680b717 100644 --- a/libgomp/config/nvptx/allocator.c +++ b/libgomp/config/nvptx/allocator.c @@ -358,6 +358,15 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr, return realloc (addr, size); } +static inline int +nvptx_memspace_validate (omp_memspace_handle_t memspace, unsigned access) +{ + /* Disallow use of low-latency memory when it must be accessible by + all threads. 
*/ + return (memspace != omp_low_lat_mem_space + || access != omp_atv_all); +} + #define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \ nvptx_memspace_alloc (MEMSPACE, SIZE) #define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \ @@ -366,5 +375,7 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr, nvptx_memspace_realloc (MEMSPACE, ADDR, OLDSIZE, SIZE) #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \ nvptx_memspace_free (MEMSPACE, ADDR, SIZE) +#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS) \ + nvptx_memspace_validate (MEMSPACE, ACCESS) #include "../../allocator.c" diff --git a/libgomp/testsuite/libgomp.c/allocators-4.c b/libgomp/testsuite/libgomp.c/allocators-4.c index 9fa6aa1624f..cae27ea33c1 100644 --- a/libgomp/testsuite/libgomp.c/allocators-4.c +++ b/libgomp/testsuite/libgomp.c/allocators-4.c @@ -23,10 +23,11 @@ main () #pragma omp target { /* Ensure that the memory we get *is* low-latency with a null-fallback. */ -omp_alloctrait_t traits[1] - = { { omp_atk_fallback, omp_atv_null_fb } }; +omp_alloctrait_t traits[2] + = { { omp_atk_fallback, omp_atv_null_fb }, + { omp_atk_access, omp_atv_pteam } }; omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space, - 1, traits); + 2, traits); int size = 4; diff --git a/libgomp/testsuite/libgomp.c/allocators-6.c b/libgomp/testsuite/libgomp.c/allocators-6.c index 90bf73095ef..c03233df582 100644 --- a/libgomp/testsuite/libgomp.c/allocators-6.c +++ b/libgomp/testsuite/libgomp.c/allocators-6.c @@ -23,10 +23,11 @@ main () #pragma omp target { /* Ensure that the memory we get *is* low-latency with a null-fallback. */ -omp_alloctrait_t traits[1] - = { { omp_atk_fallback, omp_atv_null_fb } }; +omp_alloctrait_t traits[2] + = { { omp_atk_fallback, omp_atv_null_fb },
[PATCH 03/17] libgomp, openmp: Add ompx_pinned_mem_alloc
This creates a new predefined allocator as a shortcut for using pinned memory with OpenMP. The name uses the OpenMP extension space and is intended to be consistent with other OpenMP implementations currently in development. The allocator is equivalent to using a custom allocator with the pinned trait and the null fallback trait. libgomp/ChangeLog: * allocator.c (omp_max_predefined_alloc): Update. (omp_aligned_alloc): Support ompx_pinned_mem_alloc. (omp_free): Likewise. (omp_aligned_calloc): Likewise. (omp_realloc): Likewise. * omp.h.in (omp_allocator_handle_t): Add ompx_pinned_mem_alloc. * omp_lib.f90.in: Add ompx_pinned_mem_alloc. * testsuite/libgomp.c/alloc-pinned-5.c: New test. * testsuite/libgomp.c/alloc-pinned-6.c: New test. * testsuite/libgomp.fortran/alloc-pinned-1.f90: New test. --- libgomp/allocator.c | 60 +++ libgomp/omp.h.in | 1 + libgomp/omp_lib.f90.in| 2 + libgomp/testsuite/libgomp.c/alloc-pinned-5.c | 90 libgomp/testsuite/libgomp.c/alloc-pinned-6.c | 101 ++ .../libgomp.fortran/alloc-pinned-1.f90| 16 +++ 6 files changed, 252 insertions(+), 18 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-5.c create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-6.c create mode 100644 libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90 diff --git a/libgomp/allocator.c b/libgomp/allocator.c index 54310ab93ca..029d0d40a36 100644 --- a/libgomp/allocator.c +++ b/libgomp/allocator.c @@ -35,7 +35,7 @@ #include #endif -#define omp_max_predefined_alloc omp_thread_mem_alloc +#define omp_max_predefined_alloc ompx_pinned_mem_alloc /* These macros may be overridden in config//allocator.c. */ #ifndef MEMSPACE_ALLOC @@ -67,6 +67,7 @@ static const omp_memspace_handle_t predefined_alloc_mapping[] = { omp_low_lat_mem_space, /* omp_cgroup_mem_alloc. */ omp_low_lat_mem_space, /* omp_pteam_mem_alloc. */ omp_low_lat_mem_space, /* omp_thread_mem_alloc. */ + omp_default_mem_space, /* ompx_pinned_mem_alloc. 
*/ }; enum gomp_memkind_kind @@ -512,8 +513,11 @@ retry: = (allocator_data ? allocator_data->memspace : predefined_alloc_mapping[allocator]); - ptr = MEMSPACE_ALLOC (memspace, new_size, -allocator_data && allocator_data->pinned); + int pinned __attribute__((unused)) + = (allocator_data + ? allocator_data->pinned + : allocator == ompx_pinned_mem_alloc); + ptr = MEMSPACE_ALLOC (memspace, new_size, pinned); } if (ptr == NULL) goto fail; @@ -534,7 +538,8 @@ retry: fail: int fallback = (allocator_data ? allocator_data->fallback - : allocator == omp_default_mem_alloc + : (allocator == omp_default_mem_alloc + || allocator == ompx_pinned_mem_alloc) ? omp_atv_null_fb : omp_atv_default_mem_fb); switch (fallback) @@ -653,6 +658,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator) #endif memspace = predefined_alloc_mapping[data->allocator]; + pinned = (data->allocator == ompx_pinned_mem_alloc); } MEMSPACE_FREE (memspace, data->ptr, data->size, pinned); @@ -802,8 +808,11 @@ retry: = (allocator_data ? allocator_data->memspace : predefined_alloc_mapping[allocator]); - ptr = MEMSPACE_CALLOC (memspace, new_size, - allocator_data && allocator_data->pinned); + int pinned __attribute__((unused)) + = (allocator_data + ? allocator_data->pinned + : allocator == ompx_pinned_mem_alloc); + ptr = MEMSPACE_CALLOC (memspace, new_size, pinned); } if (ptr == NULL) goto fail; @@ -824,7 +833,8 @@ retry: fail: int fallback = (allocator_data ? allocator_data->fallback - : allocator == omp_default_mem_alloc + : (allocator == omp_default_mem_alloc + || allocator == ompx_pinned_mem_alloc) ? omp_atv_null_fb : omp_atv_default_mem_fb); switch (fallback) @@ -1026,11 +1036,15 @@ retry: else #endif if (prev_size) - new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr, -data->size, new_size, -(free_allocator_data - && free_allocator_data->pinned), -allocator_data->pinned); + { + int was_pinned __attribute__((unused)) + = (free_allocator_data + ? 
free_allocator_data->pinned + : free_allocator == ompx_pinned_mem_alloc); + new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr, + data->size, new_size, was_pinned, + allocator_data->pinned); + } else new_ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size, allocator_data->pinned); @@ -1079,10 +1093,16 @@ retry: = (allocator_data ? allocator_data->memspace : predefined_alloc_mapping[allocator]); + int was_pinned __attribute__((unused)) + = (free_allocator_data
[PATCH 09/17] openmp: Use libgomp memory allocation functions with unified shared memory.
This patch changes calls to malloc/free/calloc/realloc and operator new into calls to the libgomp memory allocation functions with allocator=ompx_unified_shared_mem_alloc. This helps existing code benefit from unified shared memory: libgomp does the correct thing with all the mapping constructs, and there are no memory copies if the pointer points to unified shared memory. We only replace the replaceable operator new, not class-member or placement new. gcc/ChangeLog: * omp-low.cc (usm_transform): New function. (make_pass_usm_transform): Likewise. (class pass_usm_transform): New. * passes.def: Add pass_usm_transform. * tree-pass.h (make_pass_usm_transform): New declaration. gcc/testsuite/ChangeLog: * c-c++-common/gomp/usm-2.c: New test. * c-c++-common/gomp/usm-3.c: New test. * g++.dg/gomp/usm-1.C: New test. * g++.dg/gomp/usm-2.C: New test. * g++.dg/gomp/usm-3.C: New test. * gfortran.dg/gomp/usm-2.f90: New test. * gfortran.dg/gomp/usm-3.f90: New test. libgomp/ChangeLog: * testsuite/libgomp.c/usm-6.c: New test. * testsuite/libgomp.c++/usm-1.C: Likewise.
co-authored-by: Andrew Stubbs --- gcc/omp-low.cc | 174 +++ gcc/passes.def | 1 + gcc/testsuite/c-c++-common/gomp/usm-2.c | 46 ++ gcc/testsuite/c-c++-common/gomp/usm-3.c | 44 ++ gcc/testsuite/g++.dg/gomp/usm-1.C| 32 + gcc/testsuite/g++.dg/gomp/usm-2.C| 30 gcc/testsuite/g++.dg/gomp/usm-3.C| 38 + gcc/testsuite/gfortran.dg/gomp/usm-2.f90 | 16 +++ gcc/testsuite/gfortran.dg/gomp/usm-3.f90 | 13 ++ gcc/tree-pass.h | 1 + libgomp/testsuite/libgomp.c++/usm-1.C| 54 +++ libgomp/testsuite/libgomp.c/usm-6.c | 92 12 files changed, 541 insertions(+) create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-2.c create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-3.c create mode 100644 gcc/testsuite/g++.dg/gomp/usm-1.C create mode 100644 gcc/testsuite/g++.dg/gomp/usm-2.C create mode 100644 gcc/testsuite/g++.dg/gomp/usm-3.C create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-2.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-3.f90 create mode 100644 libgomp/testsuite/libgomp.c++/usm-1.C create mode 100644 libgomp/testsuite/libgomp.c/usm-6.c diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index ba612e5c67d..cdadd6f0c96 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -15097,6 +15097,180 @@ make_pass_diagnose_omp_blocks (gcc::context *ctxt) { return new pass_diagnose_omp_blocks (ctxt); } + +/* Provide transformation required for using unified shared memory + by replacing calls to standard memory allocation functions with + function provided by the libgomp. */ + +static tree +usm_transform (gimple_stmt_iterator *gsi_p, bool *, + struct walk_stmt_info *wi) +{ + gimple *stmt = gsi_stmt (*gsi_p); + /* ompx_unified_shared_mem_alloc is 10. 
*/ + const unsigned int unified_shared_mem_alloc = 10; + + switch (gimple_code (stmt)) +{ +case GIMPLE_CALL: + { + gcall *gs = as_a (stmt); + tree fndecl = gimple_call_fndecl (gs); + if (fndecl) + { + tree allocator = build_int_cst (pointer_sized_int_node, + unified_shared_mem_alloc); + const char *name = IDENTIFIER_POINTER (DECL_NAME (fndecl)); + if ((strcmp (name, "malloc") == 0) + || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL) + && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_MALLOC) + || DECL_IS_REPLACEABLE_OPERATOR_NEW_P (fndecl) + || strcmp (name, "omp_target_alloc") == 0) + { + tree omp_alloc_type + = build_function_type_list (ptr_type_node, size_type_node, + pointer_sized_int_node, + NULL_TREE); + tree repl = build_fn_decl ("omp_alloc", omp_alloc_type); + tree size = gimple_call_arg (gs, 0); + gimple *g = gimple_build_call (repl, 2, size, allocator); + gimple_call_set_lhs (g, gimple_call_lhs (gs)); + gimple_set_location (g, gimple_location (stmt)); + gsi_replace (gsi_p, g, true); + } + else if (strcmp (name, "aligned_alloc") == 0) + { + /* May be we can also use this for new operator with + std::align_val_t parameter. */ + tree omp_alloc_type + = build_function_type_list (ptr_type_node, size_type_node, + size_type_node, + pointer_sized_int_node, + NULL_TREE); + tree repl = build_fn_decl ("omp_aligned_alloc", + omp_alloc_type); + tree align = gimple_call_arg (gs, 0); + tree size = gimple_call_arg (gs, 1); + gimple *g = gimple_build_call (repl, 3, align, size, + allocator); + gimple_call_set_lhs (g, gimple_call_lhs (gs)); + gimple_set_location (g, gimple_location (stmt)); + gsi_replace (gsi_p, g, true); + } + else if ((strcmp (name, "calloc") == 0) + || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL) + && DECL_F
[PATCH 06/17] openmp: Add -foffload-memory
Add a new option. It's inactive until I add some follow-up patches. gcc/ChangeLog: * common.opt: Add -foffload-memory and its enum values. * coretypes.h (enum offload_memory): New. * doc/invoke.texi: Document -foffload-memory. --- gcc/common.opt | 16 gcc/coretypes.h | 7 +++ gcc/doc/invoke.texi | 16 +++- 3 files changed, 38 insertions(+), 1 deletion(-) diff --git a/gcc/common.opt b/gcc/common.opt index e7a51e882ba..8d76980fbbb 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2213,6 +2213,22 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32) EnumValue Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64) +foffload-memory= +Common Joined RejectNegative Enum(offload_memory) Var(flag_offload_memory) Init(OFFLOAD_MEMORY_NONE) +-foffload-memory=[none|unified|pinned] Use an offload memory optimization. + +Enum +Name(offload_memory) Type(enum offload_memory) UnknownError(Unknown offload memory option %qs) + +EnumValue +Enum(offload_memory) String(none) Value(OFFLOAD_MEMORY_NONE) + +EnumValue +Enum(offload_memory) String(unified) Value(OFFLOAD_MEMORY_UNIFIED) + +EnumValue +Enum(offload_memory) String(pinned) Value(OFFLOAD_MEMORY_PINNED) + fomit-frame-pointer Common Var(flag_omit_frame_pointer) Optimization When possible do not generate stack frames. diff --git a/gcc/coretypes.h b/gcc/coretypes.h index 08b9ac9094c..dd52d5bb113 100644 --- a/gcc/coretypes.h +++ b/gcc/coretypes.h @@ -206,6 +206,13 @@ enum offload_abi { OFFLOAD_ABI_ILP32 }; +/* Types of memory optimization for an offload device. */ +enum offload_memory { + OFFLOAD_MEMORY_NONE, + OFFLOAD_MEMORY_UNIFIED, + OFFLOAD_MEMORY_PINNED +}; + /* Types of profile update methods. */ enum profile_update { PROFILE_UPDATE_SINGLE, diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index d5ff1018372..3df39bb06e3 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -202,7 +202,7 @@ in the following sections. 
-fno-builtin -fno-builtin-@var{function} -fcond-mismatch @gol -ffreestanding -fgimple -fgnu-tm -fgnu89-inline -fhosted @gol -flax-vector-conversions -fms-extensions @gol --foffload=@var{arg} -foffload-options=@var{arg} @gol +-foffload=@var{arg} -foffload-options=@var{arg} -foffload-memory=@var{arg} @gol -fopenacc -fopenacc-dim=@var{geom} @gol -fopenmp -fopenmp-simd @gol -fpermitted-flt-eval-methods=@var{standard} @gol @@ -2708,6 +2708,20 @@ Typical command lines are -foffload-options=amdgcn-amdhsa=-march=gfx906 -foffload-options=-lm @end smallexample +@item -foffload-memory=none +@itemx -foffload-memory=unified +@itemx -foffload-memory=pinned +@opindex foffload-memory +@cindex OpenMP offloading memory modes +Enable a memory optimization mode to use with OpenMP. The default behavior, +@option{-foffload-memory=none}, is to do nothing special (unless enabled via +a requires directive in the code). @option{-foffload-memory=unified} is +equivalent to @code{#pragma omp requires unified_shared_memory}. +@option{-foffload-memory=pinned} forces all host memory to be pinned (this +mode may require the user to increase the ulimit setting for locked memory). +All translation units must select the same setting to avoid undefined +behavior. + @item -fopenacc @opindex fopenacc @cindex OpenACC accelerator programming
[PATCH 05/17] openmp, nvptx: ompx_unified_shared_mem_alloc
This adds support for using Cuda Managed Memory with omp_alloc. It will be used as the underpinnings for "requires unified_shared_memory" in a later patch. There are two new predefined allocators, ompx_unified_shared_mem_alloc and ompx_host_mem_alloc, plus corresponding memory spaces, which can be used to allocate memory in the "managed" space and explicitly on the host (it is intended that "malloc" will be intercepted by the compiler). The nvptx plugin is modified to make the necessary Cuda calls, and libgomp is modified to switch to shared-memory mode for USM allocated mappings. include/ChangeLog: * cuda/cuda.h (CUdevice_attribute): Add definitions for CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR and CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR. (CUmemAttach_flags): New. (CUpointer_attribute): New. (cuMemAllocManaged): New prototype. (cuPointerGetAttribute): New prototype. libgomp/ChangeLog: * allocator.c (omp_max_predefined_alloc): Update. (omp_aligned_alloc): Don't fallback ompx_host_mem_alloc. (omp_aligned_calloc): Likewise. (omp_realloc): Likewise. * config/linux/allocator.c (linux_memspace_alloc): Handle USM. (linux_memspace_calloc): Handle USM. (linux_memspace_free): Handle USM. (linux_memspace_realloc): Handle USM. * config/nvptx/allocator.c (nvptx_memspace_alloc): Reject ompx_host_mem_alloc. (nvptx_memspace_calloc): Likewise. (nvptx_memspace_realloc): Likewise. * libgomp-plugin.h (GOMP_OFFLOAD_usm_alloc): New prototype. (GOMP_OFFLOAD_usm_free): New prototype. (GOMP_OFFLOAD_is_usm_ptr): New prototype. * libgomp.h (gomp_usm_alloc): New prototype. (gomp_usm_free): New prototype. (gomp_is_usm_ptr): New prototype. (struct gomp_device_descr): Add USM functions. * omp.h.in (omp_memspace_handle_t): Add ompx_unified_shared_mem_space and ompx_host_mem_space. (omp_allocator_handle_t): Add ompx_unified_shared_mem_alloc and ompx_host_mem_alloc. * omp_lib.f90.in: Likewise. * plugin/cuda-lib.def (cuMemAllocManaged): Add new call. (cuPointerGetAttribute): Likewise. 
* plugin/plugin-nvptx.c (nvptx_alloc): Add "usm" parameter. Call cuMemAllocManaged as appropriate. (GOMP_OFFLOAD_get_num_devices): Allow GOMP_REQUIRES_UNIFIED_ADDRESS and GOMP_REQUIRES_UNIFIED_SHARED_MEMORY. (GOMP_OFFLOAD_alloc): Move internals to ... (GOMP_OFFLOAD_alloc_1): ... this, and add usm parameter. (GOMP_OFFLOAD_usm_alloc): New function. (GOMP_OFFLOAD_usm_free): New function. (GOMP_OFFLOAD_is_usm_ptr): New function. * target.c (gomp_map_vars_internal): Add USM support. (gomp_usm_alloc): New function. (gomp_usm_free): New function. (gomp_load_plugin_for_device): New function. * testsuite/libgomp.c/usm-1.c: New test. * testsuite/libgomp.c/usm-2.c: New test. * testsuite/libgomp.c/usm-3.c: New test. * testsuite/libgomp.c/usm-4.c: New test. * testsuite/libgomp.c/usm-5.c: New test. co-authored-by: Kwok Cheung Yeung squash! openmp, nvptx: ompx_unified_shared_mem_alloc --- include/cuda/cuda.h | 12 ++ libgomp/allocator.c | 13 -- libgomp/config/linux/allocator.c| 48 ++ libgomp/config/nvptx/allocator.c| 6 +++ libgomp/libgomp-plugin.h| 3 ++ libgomp/libgomp.h | 6 +++ libgomp/omp.h.in| 4 ++ libgomp/omp_lib.f90.in | 8 libgomp/plugin/cuda-lib.def | 2 + libgomp/plugin/plugin-nvptx.c | 47 ++--- libgomp/target.c| 64 + libgomp/testsuite/libgomp.c/usm-1.c | 24 +++ libgomp/testsuite/libgomp.c/usm-2.c | 32 +++ libgomp/testsuite/libgomp.c/usm-3.c | 35 libgomp/testsuite/libgomp.c/usm-4.c | 36 libgomp/testsuite/libgomp.c/usm-5.c | 28 + 16 files changed, 340 insertions(+), 28 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c/usm-1.c create mode 100644 libgomp/testsuite/libgomp.c/usm-2.c create mode 100644 libgomp/testsuite/libgomp.c/usm-3.c create mode 100644 libgomp/testsuite/libgomp.c/usm-4.c create mode 100644 libgomp/testsuite/libgomp.c/usm-5.c diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h index 3938d05d150..8135e7c9247 100644 --- a/include/cuda/cuda.h +++ b/include/cuda/cuda.h @@ -77,9 +77,19 @@ typedef enum { CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS = 31, 
CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39, CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT = 40, + CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75, + CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76, CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82
[PATCH 12/17] Handle cleanup of omp allocated variables (OpenMP 5.0).
Currently we only handle the omp allocate directive that is associated with an allocate statement. This statement results in malloc and free calls. The malloc calls are easy to get to, as they are in the same block as the allocate directive, but the free calls come in a separate cleanup block. To help later passes find them, an allocate directive is generated in the cleanup block with kind=free; the normal allocate directive is given kind=allocate. gcc/fortran/ChangeLog: * gfortran.h (struct gfc_symbol): Declare new members omp_allocated and omp_allocated_end. * openmp.cc (gfc_match_omp_allocate): Set new_st.resolved_sym to NULL. (prepare_omp_allocated_var_list_for_cleanup): New function. (gfc_resolve_omp_allocate): Call it. * trans-decl.cc (gfc_trans_deferred_vars): Process omp_allocated. * trans-openmp.cc (gfc_trans_omp_allocate): Set kind for the stmt generated for allocate directive. gcc/ChangeLog: * tree-core.h (struct tree_base): Add comments. * tree-pretty-print.cc (dump_generic_node): Handle allocate directive kind. * tree.h (OMP_ALLOCATE_KIND_ALLOCATE): New define. (OMP_ALLOCATE_KIND_FREE): Likewise. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/allocate-6.f90: Test kind of allocate directive. --- gcc/fortran/gfortran.h| 1 + gcc/fortran/openmp.cc | 30 +++ gcc/fortran/trans-decl.cc | 20 + gcc/fortran/trans-openmp.cc | 6 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 | 3 +- gcc/tree-core.h | 6 gcc/tree-pretty-print.cc | 4 +++ gcc/tree.h| 4 +++ 8 files changed, 73 insertions(+), 1 deletion(-) diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 755469185a6..c6f58341cf3 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -1829,6 +1829,7 @@ typedef struct gfc_symbol gfc_array_spec *as; struct gfc_symbol *result; /* function result symbol */ gfc_component *components; /* Derived type components */ + gfc_omp_namelist *omp_allocated, *omp_allocated_end; /* Defined only for Cray pointees; points to their pointer.
*/ struct gfc_symbol *cp_pointer; diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc index 38003890bb0..4c94bc763b5 100644 --- a/gcc/fortran/openmp.cc +++ b/gcc/fortran/openmp.cc @@ -6057,6 +6057,7 @@ gfc_match_omp_allocate (void) new_st.op = EXEC_OMP_ALLOCATE; new_st.ext.omp_clauses = c; + new_st.resolved_sym = NULL; gfc_free_expr (allocator); return MATCH_YES; } @@ -9548,6 +9549,34 @@ gfc_resolve_oacc_routines (gfc_namespace *ns) } } +static void +prepare_omp_allocated_var_list_for_cleanup (gfc_omp_namelist *cn, locus loc) +{ + gfc_symbol *proc = cn->sym->ns->proc_name; + gfc_omp_namelist *p, *n; + + for (n = cn; n; n = n->next) +{ + if (n->sym->attr.allocatable && !n->sym->attr.save + && !n->sym->attr.result && !proc->attr.is_main_program) + { + p = gfc_get_omp_namelist (); + p->sym = n->sym; + p->expr = gfc_copy_expr (n->expr); + p->where = loc; + p->next = NULL; + if (proc->omp_allocated == NULL) + proc->omp_allocated_end = proc->omp_allocated = p; + else + { + proc->omp_allocated_end->next = p; + proc->omp_allocated_end = p; + } + + } +} +} + static void check_allocate_directive_restrictions (gfc_symbol *sym, gfc_expr *omp_al, gfc_namespace *ns, locus loc) @@ -9678,6 +9707,7 @@ gfc_resolve_omp_allocate (gfc_code *code, gfc_namespace *ns) code->loc); } } + prepare_omp_allocated_var_list_for_cleanup (cn, code->loc); } diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc index 6493cc2f6b1..326365f22fc 100644 --- a/gcc/fortran/trans-decl.cc +++ b/gcc/fortran/trans-decl.cc @@ -4588,6 +4588,26 @@ gfc_trans_deferred_vars (gfc_symbol * proc_sym, gfc_wrapped_block * block) } } + /* Generate a dummy allocate pragma with free kind so that cleanup + of those variables which were allocated using the allocate statement + associated with an allocate clause happens correctly. 
*/ + + if (proc_sym->omp_allocated) +{ + gfc_clear_new_st (); + new_st.op = EXEC_OMP_ALLOCATE; + gfc_omp_clauses *c = gfc_get_omp_clauses (); + c->lists[OMP_LIST_ALLOCATOR] = proc_sym->omp_allocated; + new_st.ext.omp_clauses = c; + /* This is just a hacky way to convey to handler that we are + dealing with cleanup here. Saves us from using another field + for it. */ + new_st.resolved_sym = proc_sym->omp_allocated->sym; + gfc_add_init_cleanup (block, NULL, + gfc_trans_omp_directive (&new_st)); + gfc_free_omp_clauses (c); + proc_sym->omp_allocated = NULL; +} /* Initialize the INTENT(OUT) derived type dummy argu
[PATCH 07/17] openmp: allow requires unified_shared_memory
This is the front-end portion of the Unified Shared Memory implementation. It removes the "sorry, unimplemented" message in C, C++, and Fortran, and sets flag_offload_memory, but is otherwise inactive, for now. It also checks that -foffload-memory isn't set to an incompatible mode. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_requires): Allow "requires unified_shared_memory" and "unified_address". gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_requires): Allow "requires unified_shared_memory" and "unified_address". gcc/fortran/ChangeLog: * openmp.cc (gfc_match_omp_requires): Allow "requires unified_shared_memory" and "unified_address". gcc/testsuite/ChangeLog: * c-c++-common/gomp/usm-1.c: New test. * c-c++-common/gomp/usm-4.c: New test. * gfortran.dg/gomp/usm-1.f90: New test. * gfortran.dg/gomp/usm-4.f90: New test. --- gcc/c/c-parser.cc| 22 -- gcc/cp/parser.cc | 22 -- gcc/fortran/openmp.cc| 13 + gcc/testsuite/c-c++-common/gomp/usm-1.c | 4 gcc/testsuite/c-c++-common/gomp/usm-4.c | 4 gcc/testsuite/gfortran.dg/gomp/usm-1.f90 | 6 ++ gcc/testsuite/gfortran.dg/gomp/usm-4.f90 | 6 ++ 7 files changed, 73 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-1.c create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-4.c create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-4.f90 diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc index 9c02141e2c6..c30f67cd2da 100644 --- a/gcc/c/c-parser.cc +++ b/gcc/c/c-parser.cc @@ -22726,9 +22726,27 @@ c_parser_omp_requires (c_parser *parser) enum omp_requires this_req = (enum omp_requires) 0; if (!strcmp (p, "unified_address")) - this_req = OMP_REQUIRES_UNIFIED_ADDRESS; + { + this_req = OMP_REQUIRES_UNIFIED_ADDRESS; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "% is incompatible with the " + "selected %<-foffload-memory%> option"); + flag_offload_memory = 
OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "unified_shared_memory")) - this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; + { + this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "% is incompatible with the " + "selected %<-foffload-memory%> option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "dynamic_allocators")) this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS; else if (!strcmp (p, "reverse_offload")) diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc index df657a3fb2b..3deafc7c928 100644 --- a/gcc/cp/parser.cc +++ b/gcc/cp/parser.cc @@ -46860,9 +46860,27 @@ cp_parser_omp_requires (cp_parser *parser, cp_token *pragma_tok) enum omp_requires this_req = (enum omp_requires) 0; if (!strcmp (p, "unified_address")) - this_req = OMP_REQUIRES_UNIFIED_ADDRESS; + { + this_req = OMP_REQUIRES_UNIFIED_ADDRESS; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "% is incompatible with the " + "selected %<-foffload-memory%> option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "unified_shared_memory")) - this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; + { + this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "% is incompatible with the " + "selected %<-foffload-memory%> option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "dynamic_allocators")) this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS; else if (!strcmp (p, "reverse_offload")) diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc index bd4ff259fe0..91bf8a3c50d 100644 --- a/gcc/fortran/openmp.cc +++ b/gcc/fortran/openmp.cc @@ -29,6 +29,7 @@ along with GCC; see the file COPYING3. 
If not see #include "diagnostic.h" #include "gomp-constants.h" #include "target-memory.h" /* For gfc_encode_character. */ +#include "options.h" /* Match an end of OpenMP directive. End of OpenMP directive is optional whitespace, followed by '\n' or comment '!'. */ @@ -5556,6 +5557,12 @@ gfc_match_omp_requires (void) requires_clause = OMP_REQ_UNIFIED_ADDRESS; if (requires_clauses & OMP_REQ_UNIFIED_ADDRESS) goto duplicate_clause; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + gfc_error_now ("unified_address
[PATCH 11/17] Translate allocate directive (OpenMP 5.0).
gcc/fortran/ChangeLog: * trans-openmp.cc (gfc_trans_omp_clauses): Handle OMP_LIST_ALLOCATOR. (gfc_trans_omp_allocate): New function. (gfc_trans_omp_directive): Handle EXEC_OMP_ALLOCATE. gcc/ChangeLog: * tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_ALLOCATOR. (dump_generic_node): Handle OMP_ALLOCATE. * tree.def (OMP_ALLOCATE): New. * tree.h (OMP_ALLOCATE_CLAUSES): Likewise. (OMP_ALLOCATE_DECL): Likewise. (OMP_ALLOCATE_ALLOCATOR): Likewise. * tree.cc (omp_clause_num_ops): Add entry for OMP_CLAUSE_ALLOCATOR. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/allocate-6.f90: New test. --- gcc/fortran/trans-openmp.cc | 44 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 | 72 +++ gcc/tree-core.h | 3 + gcc/tree-pretty-print.cc | 19 + gcc/tree.cc | 1 + gcc/tree.def | 4 ++ gcc/tree.h| 11 +++ 7 files changed, 154 insertions(+) create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc index de27ed52c02..3ee63e416ed 100644 --- a/gcc/fortran/trans-openmp.cc +++ b/gcc/fortran/trans-openmp.cc @@ -2728,6 +2728,28 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, } } break; + case OMP_LIST_ALLOCATOR: + for (; n != NULL; n = n->next) + if (n->sym->attr.referenced) + { + tree t = gfc_trans_omp_variable (n->sym, false); + if (t != error_mark_node) + { + tree node = build_omp_clause (input_location, + OMP_CLAUSE_ALLOCATOR); + OMP_ALLOCATE_DECL (node) = t; + if (n->expr) + { + tree allocator_; + gfc_init_se (&se, NULL); + gfc_conv_expr (&se, n->expr); + allocator_ = gfc_evaluate_now (se.expr, block); + OMP_ALLOCATE_ALLOCATOR (node) = allocator_; + } + omp_clauses = gfc_trans_add_clause (node, omp_clauses); + } + } + break; case OMP_LIST_LINEAR: { gfc_expr *last_step_expr = NULL; @@ -4982,6 +5004,26 @@ gfc_trans_omp_atomic (gfc_code *code) return gfc_finish_block (&block); } +static tree +gfc_trans_omp_allocate (gfc_code *code) +{ + stmtblock_t block; + tree stmt; + + gfc_omp_clauses 
*clauses = code->ext.omp_clauses; + gcc_assert (clauses); + + gfc_start_block (&block); + stmt = make_node (OMP_ALLOCATE); + TREE_TYPE (stmt) = void_type_node; + OMP_ALLOCATE_CLAUSES (stmt) = gfc_trans_omp_clauses (&block, clauses, + code->loc, false, + true); + gfc_add_expr_to_block (&block, stmt); + gfc_merge_block_scope (&block); + return gfc_finish_block (&block); +} + static tree gfc_trans_omp_barrier (void) { @@ -7488,6 +7530,8 @@ gfc_trans_omp_directive (gfc_code *code) { switch (code->op) { +case EXEC_OMP_ALLOCATE: + return gfc_trans_omp_allocate (code); case EXEC_OMP_ATOMIC: return gfc_trans_omp_atomic (code); case EXEC_OMP_BARRIER: diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 new file mode 100644 index 000..2de2b52ee44 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 @@ -0,0 +1,72 @@ +! { dg-do compile } +! { dg-additional-options "-fdump-tree-original" } + +module omp_lib_kinds + use iso_c_binding, only: c_int, c_intptr_t + implicit none + private :: c_int, c_intptr_t + integer, parameter :: omp_allocator_handle_kind = c_intptr_t + + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_null_allocator = 0 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_default_mem_alloc = 1 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_large_cap_mem_alloc = 2 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_const_mem_alloc = 3 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_high_bw_mem_alloc = 4 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_low_lat_mem_alloc = 5 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_cgroup_mem_alloc = 6 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_pteam_mem_alloc = 7 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_thread_mem_alloc = 8 +end module + + +subroutine foo(x, y, al) + use omp_lib_kinds + implicit none + +type 
:: my_type + integer :: i + integer :: j + real :: x +end type + + integer :: x + integer :: y + integer (kind=omp_allocator_handle_kind) :: al + + integer, allocatable :: var1 + integer, allocatable :: var2 + real, allocatable :: var3(:,:) + type (my_type), allocatable :: var4 + integer, pointer :: pii, parr(:) + + character, allocatable :: str1a, str1aarr(:) + character(len=5), allocatable :: str5a, str5aarr(:) + + !$
[PATCH 14/17] Lower allocate directive (OpenMP 5.0).
This patch looks for malloc/free calls that were generated by an allocate statement that is associated with an allocate directive and replaces them with GOMP_alloc and GOMP_free. gcc/ChangeLog: * omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_ALLOCATOR. (scan_omp_allocate): New. (scan_omp_1_stmt): Call it. (lower_omp_allocate): New function. (lower_omp_1): Call it. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/allocate-6.f90: Add tests. * gfortran.dg/gomp/allocate-7.f90: New test. * gfortran.dg/gomp/allocate-8.f90: New test. libgomp/ChangeLog: * testsuite/libgomp.fortran/allocate-2.f90: New test. --- gcc/omp-low.cc| 139 ++ gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 | 9 ++ gcc/testsuite/gfortran.dg/gomp/allocate-7.f90 | 13 ++ gcc/testsuite/gfortran.dg/gomp/allocate-8.f90 | 15 ++ .../testsuite/libgomp.fortran/allocate-2.f90 | 48 ++ 5 files changed, 224 insertions(+) create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-7.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-8.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/allocate-2.f90 diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index cdadd6f0c96..7d1a2a0d795 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -1746,6 +1746,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) case OMP_CLAUSE_FINALIZE: case OMP_CLAUSE_TASK_REDUCTION: case OMP_CLAUSE_ALLOCATE: + case OMP_CLAUSE_ALLOCATOR: break; case OMP_CLAUSE_ALIGNED: @@ -1963,6 +1964,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) case OMP_CLAUSE_FINALIZE: case OMP_CLAUSE_FILTER: case OMP_CLAUSE__CONDTEMP_: + case OMP_CLAUSE_ALLOCATOR: break; case OMP_CLAUSE__CACHE_: @@ -3033,6 +3035,16 @@ scan_omp_simd_scan (gimple_stmt_iterator *gsi, gomp_for *stmt, maybe_lookup_ctx (new_stmt)->for_simd_scan_phase = true; } +/* Scan an OpenMP allocate directive.
*/ + +static void +scan_omp_allocate (gomp_allocate *stmt, omp_context *outer_ctx) +{ + omp_context *ctx; + ctx = new_omp_context (stmt, outer_ctx); + scan_sharing_clauses (gimple_omp_allocate_clauses (stmt), ctx); +} + /* Scan an OpenMP sections directive. */ static void @@ -4332,6 +4344,9 @@ scan_omp_1_stmt (gimple_stmt_iterator *gsi, bool *handled_ops_p, insert_decl_map (&ctx->cb, var, var); } break; +case GIMPLE_OMP_ALLOCATE: + scan_omp_allocate (as_a (stmt), ctx); + break; default: *handled_ops_p = false; break; @@ -8768,6 +8783,125 @@ lower_omp_single_simple (gomp_single *single_stmt, gimple_seq *pre_p) gimple_seq_add_stmt (pre_p, gimple_build_label (flabel)); } +static void +lower_omp_allocate (gimple_stmt_iterator *gsi_p, omp_context *ctx) +{ + gomp_allocate *st = as_a (gsi_stmt (*gsi_p)); + tree clauses = gimple_omp_allocate_clauses (st); + int kind = gimple_omp_allocate_kind (st); + gcc_assert (kind == GF_OMP_ALLOCATE_KIND_ALLOCATE + || kind == GF_OMP_ALLOCATE_KIND_FREE); + + for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c)) +{ + if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_ALLOCATOR) + continue; + + bool allocate = (kind == GF_OMP_ALLOCATE_KIND_ALLOCATE); + /* The allocate directives that appear in a target region must specify + an allocator clause unless a requires directive with the + dynamic_allocators clause is present in the same compilation unit. 
*/ + if (OMP_ALLOCATE_ALLOCATOR (c) == NULL_TREE + && ((omp_requires_mask & OMP_REQUIRES_DYNAMIC_ALLOCATORS) == 0) + && omp_maybe_offloaded_ctx (ctx)) + error_at (OMP_CLAUSE_LOCATION (c), "% directive must" + " specify an allocator here"); + + tree var = OMP_ALLOCATE_DECL (c); + + gimple_stmt_iterator gsi = *gsi_p; + for (gsi_next (&gsi); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + + if (gimple_code (stmt) != GIMPLE_CALL + || (allocate && gimple_call_fndecl (stmt) + != builtin_decl_explicit (BUILT_IN_MALLOC)) + || (!allocate && gimple_call_fndecl (stmt) + != builtin_decl_explicit (BUILT_IN_FREE))) + continue; + const gcall *gs = as_a (stmt); + tree allocator = OMP_ALLOCATE_ALLOCATOR (c) + ? OMP_ALLOCATE_ALLOCATOR (c) + : integer_zero_node; + if (allocate) + { + tree lhs = gimple_call_lhs (gs); + if (lhs && TREE_CODE (lhs) == SSA_NAME) + { + gimple_stmt_iterator gsi2 = gsi; + gsi_next (&gsi2); + gimple *assign = gsi_stmt (gsi2); + if (gimple_code (assign) == GIMPLE_ASSIGN) + { + lhs = gimple_assign_lhs (as_a (assign)); + if (lhs == NULL_TREE + || TREE_CODE (lhs) != COMPONENT_REF) + continue; + lhs = TREE_OPERAND (lhs, 0); + } + } + + if (lhs == var) + { + unsigned HOST_WIDE_INT ialign = 0; + tree align; + if (TYPE_P (var)) + ialign = TYPE_ALIGN_UNIT (var); + else + ialign = DECL_ALIGN_UNIT (var);
[PATCH 08/17] openmp: -foffload-memory=pinned
Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up. The option is intended to provide a performance boost to certain offload programs without modifying the code. This feature only works on Linux, at present, and simply calls mlockall to enable always-on memory pinning. It requires that the ulimit feature is set high enough to accommodate all the program's memory usage. In this mode the ompx_pinned_memory_alloc feature is disabled as it is not needed and may conflict. gcc/ChangeLog: * omp-builtins.def (BUILT_IN_GOMP_ENABLE_PINNED_MODE): New. * omp-low.cc (omp_enable_pinned_mode): New function. (execute_lower_omp): Call omp_enable_pinned_mode. libgomp/ChangeLog: * config/linux/allocator.c (always_pinned_mode): New variable. (GOMP_enable_pinned_mode): New function. (linux_memspace_alloc): Disable pinning when always_pinned_mode set. (linux_memspace_calloc): Likewise. (linux_memspace_free): Likewise. (linux_memspace_realloc): Likewise. * libgomp.map: Add GOMP_enable_pinned_mode. * testsuite/libgomp.c/alloc-pinned-7.c: New test. gcc/testsuite/ChangeLog: * c-c++-common/gomp/alloc-pinned-1.c: New test. 
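Since the constructor is synthesized directly in GIMPLE it can be hard to picture; the C-level shape of what `omp_enable_pinned_mode` emits is roughly the following sketch. The libgomp call is replaced by a local stub here so the example is self-contained (on Linux the real `GOMP_enable_pinned_mode` would call `mlockall (MCL_CURRENT | MCL_FUTURE)`):

```c
#include <assert.h>

static int pinned_mode_enabled;

/* Stand-in stub for libgomp's GOMP_enable_pinned_mode; here it only
   records that it was called.  */
static void
enable_pinned_mode_stub (void)
{
  pinned_mode_enabled = 1;
}

/* The shape of the compiler-emitted function: a static constructor
   that runs before main and triggers the runtime mode switch.  */
static void __attribute__((constructor))
__set_pinned_mode (void)
{
  enable_pinned_mode_stub ();
}
```

By the time `main` (or any test assertion) runs, the constructor has already fired, which is exactly why the HSA/OS setup happens "before the program proper runs".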
--- gcc/omp-builtins.def | 3 + gcc/omp-low.cc| 66 +++ .../c-c++-common/gomp/alloc-pinned-1.c| 28 libgomp/config/linux/allocator.c | 26 libgomp/libgomp.map | 1 + libgomp/target.c | 4 +- libgomp/testsuite/libgomp.c/alloc-pinned-7.c | 63 ++ 7 files changed, 190 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-7.c diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def index ee5213eedcf..276dd7812f2 100644 --- a/gcc/omp-builtins.def +++ b/gcc/omp-builtins.def @@ -470,3 +470,6 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_WARNING, "GOMP_warning", BT_FN_VOID_CONST_PTR_SIZE, ATTR_NOTHROW_LEAF_LIST) DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ERROR, "GOMP_error", BT_FN_VOID_CONST_PTR_SIZE, ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST) +DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ENABLE_PINNED_MODE, + "GOMP_enable_pinned_mode", + BT_FN_VOID, ATTR_NOTHROW_LIST) diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index d73c165f029..ba612e5c67d 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -14620,6 +14620,68 @@ lower_omp (gimple_seq *body, omp_context *ctx) input_location = saved_location; } +/* Emit a constructor function to enable -foffload-memory=pinned + at runtime. Libgomp handles the OS mode setting, but we need to trigger + it by calling GOMP_enable_pinned mode before the program proper runs. 
*/ + +static void +omp_enable_pinned_mode () +{ + static bool visited = false; + if (visited) +return; + visited = true; + + /* Create a new function like this: + + static void __attribute__((constructor)) + __set_pinned_mode () + { + GOMP_enable_pinned_mode (); + } + */ + + tree name = get_identifier ("__set_pinned_mode"); + tree voidfntype = build_function_type_list (void_type_node, NULL_TREE); + tree decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL, name, voidfntype); + + TREE_STATIC (decl) = 1; + TREE_USED (decl) = 1; + DECL_ARTIFICIAL (decl) = 1; + DECL_IGNORED_P (decl) = 0; + TREE_PUBLIC (decl) = 0; + DECL_UNINLINABLE (decl) = 1; + DECL_EXTERNAL (decl) = 0; + DECL_CONTEXT (decl) = NULL_TREE; + DECL_INITIAL (decl) = make_node (BLOCK); + BLOCK_SUPERCONTEXT (DECL_INITIAL (decl)) = decl; + DECL_STATIC_CONSTRUCTOR (decl) = 1; + DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("constructor"), + NULL_TREE, NULL_TREE); + + tree t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, + void_type_node); + DECL_ARTIFICIAL (t) = 1; + DECL_IGNORED_P (t) = 1; + DECL_CONTEXT (t) = decl; + DECL_RESULT (decl) = t; + + push_struct_function (decl); + init_tree_ssa (cfun); + + tree calldecl = builtin_decl_explicit (BUILT_IN_GOMP_ENABLE_PINNED_MODE); + gcall *call = gimple_build_call (calldecl, 0); + + gimple_seq seq = NULL; + gimple_seq_add_stmt (&seq, call); + gimple_set_body (decl, gimple_build_bind (NULL_TREE, seq, NULL)); + + cfun->function_end_locus = UNKNOWN_LOCATION; + cfun->curr_properties |= PROP_gimple_any; + pop_cfun (); + cgraph_node::add_new_function (decl, true); +} + /* Main entry point. */ static unsigned int @@ -14676,6 +14738,10 @@ execute_lower_omp (void) for (auto task_stmt : task_cpyfns) finalize_task_copyfn (task_stmt); task_cpyfns.release (); + + if (flag_offload_memory == OFFLOAD_MEMORY_PINNED) +omp_enable_pinned_mode (); + return 0; } diff --git a/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c b/gcc/testsuite/c-c++-common/gomp/alloc-pi
[PATCH 13/17] Gimplify allocate directive (OpenMP 5.0).
gcc/ChangeLog: * doc/gimple.texi: Describe GIMPLE_OMP_ALLOCATE. * gimple-pretty-print.cc (dump_gimple_omp_allocate): New function. (pp_gimple_stmt_1): Call it. * gimple.cc (gimple_build_omp_allocate): New function. * gimple.def (GIMPLE_OMP_ALLOCATE): New node. * gimple.h (enum gf_mask): Add GF_OMP_ALLOCATE_KIND_MASK, GF_OMP_ALLOCATE_KIND_ALLOCATE and GF_OMP_ALLOCATE_KIND_FREE. (struct gomp_allocate): New. (is_a_helper ::test): New. (is_a_helper ::test): New. (gimple_build_omp_allocate): Declare. (gimple_omp_subcode): Replace GIMPLE_OMP_TEAMS with GIMPLE_OMP_ALLOCATE. (gimple_omp_allocate_set_clauses): New. (gimple_omp_allocate_set_kind): Likewise. (gimple_omp_allocate_clauses): Likewise. (gimple_omp_allocate_kind): Likewise. (CASE_GIMPLE_OMP): Add GIMPLE_OMP_ALLOCATE. * gimplify.cc (gimplify_omp_allocate): New. (gimplify_expr): Call it. * gsstruct.def (GSS_OMP_ALLOCATE): Define. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/allocate-6.f90: Add tests. --- gcc/doc/gimple.texi | 38 +++- gcc/gimple-pretty-print.cc| 37 gcc/gimple.cc | 12 gcc/gimple.def| 6 ++ gcc/gimple.h | 60 ++- gcc/gimplify.cc | 19 ++ gcc/gsstruct.def | 1 + gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 | 4 +- 8 files changed, 173 insertions(+), 4 deletions(-) diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi index dd9149377f3..67b9061f3a7 100644 --- a/gcc/doc/gimple.texi +++ b/gcc/doc/gimple.texi @@ -420,6 +420,9 @@ kinds, along with their relationships to @code{GSS_} values (layouts) and + gomp_continue |layout: GSS_OMP_CONTINUE, code: GIMPLE_OMP_CONTINUE | + + gomp_allocate + |layout: GSS_OMP_ALLOCATE, code: GIMPLE_OMP_ALLOCATE + | + gomp_atomic_load |layout: GSS_OMP_ATOMIC_LOAD, code: GIMPLE_OMP_ATOMIC_LOAD | @@ -454,6 +457,7 @@ The following table briefly describes the GIMPLE instruction set. 
@item @code{GIMPLE_GOTO} @tab x @tab x @item @code{GIMPLE_LABEL} @tab x @tab x @item @code{GIMPLE_NOP} @tab x @tab x +@item @code{GIMPLE_OMP_ALLOCATE} @tab x @tab x @item @code{GIMPLE_OMP_ATOMIC_LOAD} @tab x @tab x @item @code{GIMPLE_OMP_ATOMIC_STORE} @tab x @tab x @item @code{GIMPLE_OMP_CONTINUE} @tab x @tab x @@ -1029,6 +1033,7 @@ Return a deep copy of statement @code{STMT}. * @code{GIMPLE_LABEL}:: * @code{GIMPLE_GOTO}:: * @code{GIMPLE_NOP}:: +* @code{GIMPLE_OMP_ALLOCATE}:: * @code{GIMPLE_OMP_ATOMIC_LOAD}:: * @code{GIMPLE_OMP_ATOMIC_STORE}:: * @code{GIMPLE_OMP_CONTINUE}:: @@ -1729,6 +1734,38 @@ Build a @code{GIMPLE_NOP} statement. Returns @code{TRUE} if statement @code{G} is a @code{GIMPLE_NOP}. @end deftypefn +@node @code{GIMPLE_OMP_ALLOCATE} +@subsection @code{GIMPLE_OMP_ALLOCATE} +@cindex @code{GIMPLE_OMP_ALLOCATE} + +@deftypefn {GIMPLE function} gomp_allocate *gimple_build_omp_allocate ( @ +tree clauses, int kind) +Build a @code{GIMPLE_OMP_ALLOCATE} statement. @code{CLAUSES} is the clauses +associated with this node. @code{KIND} is the enumeration value +@code{GF_OMP_ALLOCATE_KIND_ALLOCATE} if this directive allocates memory +or @code{GF_OMP_ALLOCATE_KIND_FREE} if it de-allocates. +@end deftypefn + +@deftypefn {GIMPLE function} void gimple_omp_allocate_set_clauses ( @ +gomp_allocate *g, tree clauses) +Set the @code{CLAUSES} for a @code{GIMPLE_OMP_ALLOCATE}. +@end deftypefn + +@deftypefn {GIMPLE function} tree gimple_omp_aallocate_clauses ( @ +const gomp_allocate *g) +Get the @code{CLAUSES} of a @code{GIMPLE_OMP_ALLOCATE}. +@end deftypefn + +@deftypefn {GIMPLE function} void gimple_omp_allocate_set_kind ( @ +gomp_allocate *g, int kind) +Set the @code{KIND} for a @code{GIMPLE_OMP_ALLOCATE}. +@end deftypefn + +@deftypefn {GIMPLE function} tree gimple_omp_allocate_kind ( @ +const gomp_atomic_load *g) +Get the @code{KIND} of a @code{GIMPLE_OMP_ALLOCATE}. 
+@end deftypefn + @node @code{GIMPLE_OMP_ATOMIC_LOAD} @subsection @code{GIMPLE_OMP_ATOMIC_LOAD} @cindex @code{GIMPLE_OMP_ATOMIC_LOAD} @@ -1760,7 +1797,6 @@ const gomp_atomic_load *g) Get the @code{RHS} of an atomic set. @end deftypefn - @node @code{GIMPLE_OMP_ATOMIC_STORE} @subsection @code{GIMPLE_OMP_ATOMIC_STORE} @cindex @code{GIMPLE_OMP_ATOMIC_STORE} diff --git a/gcc/gimple-pretty-print.cc b/gcc/gimple-pretty-print.cc index ebd87b20a0a..bb961a900df 100644 --- a/gcc/gimple-pretty-print.cc +++ b/gcc/gimple-pretty-print.cc @@ -1967,6 +1967,38 @@ dump_gimple_omp_critical (pretty_printer *buffer, const gomp_critical *gs, } } +static void +dump_gimple_omp_allocate (pretty_printer *buffer, const gomp_allocate *gs, + int spc, dump_flags_t fl
[PATCH 10/17] Add parsing support for allocate directive (OpenMP 5.0)
Currently we only make use of this directive when it is associated with an allocate statement. gcc/fortran/ChangeLog: * dump-parse-tree.cc (show_omp_node): Handle EXEC_OMP_ALLOCATE. (show_code_node): Likewise. * gfortran.h (enum gfc_statement): Add ST_OMP_ALLOCATE. (OMP_LIST_ALLOCATOR): New enum value. (enum gfc_exec_op): Add EXEC_OMP_ALLOCATE. * match.h (gfc_match_omp_allocate): New function. * openmp.cc (enum omp_mask1): Add OMP_CLAUSE_ALLOCATOR. (OMP_ALLOCATE_CLAUSES): New define. (gfc_match_omp_allocate): New function. (resolve_omp_clauses): Add ALLOCATOR in clause_names. (omp_code_to_statement): Handle EXEC_OMP_ALLOCATE. (EMPTY_VAR_LIST): New define. (check_allocate_directive_restrictions): New function. (gfc_resolve_omp_allocate): Likewise. (gfc_resolve_omp_directive): Handle EXEC_OMP_ALLOCATE. * parse.cc (decode_omp_directive): Handle ST_OMP_ALLOCATE. (next_statement): Likewise. (gfc_ascii_statement): Likewise. * resolve.cc (gfc_resolve_code): Handle EXEC_OMP_ALLOCATE. * st.cc (gfc_free_statement): Likewise. 
* trans.cc (trans_code): Likewise --- gcc/fortran/dump-parse-tree.cc| 3 + gcc/fortran/gfortran.h| 4 +- gcc/fortran/match.h | 1 + gcc/fortran/openmp.cc | 199 +- gcc/fortran/parse.cc | 10 +- gcc/fortran/resolve.cc| 1 + gcc/fortran/st.cc | 1 + gcc/fortran/trans.cc | 1 + gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 | 112 ++ gcc/testsuite/gfortran.dg/gomp/allocate-5.f90 | 73 +++ 10 files changed, 400 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-5.f90 diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc index 5352008a63d..e0c6c0d9d96 100644 --- a/gcc/fortran/dump-parse-tree.cc +++ b/gcc/fortran/dump-parse-tree.cc @@ -2003,6 +2003,7 @@ show_omp_node (int level, gfc_code *c) case EXEC_OACC_CACHE: name = "CACHE"; is_oacc = true; break; case EXEC_OACC_ENTER_DATA: name = "ENTER DATA"; is_oacc = true; break; case EXEC_OACC_EXIT_DATA: name = "EXIT DATA"; is_oacc = true; break; +case EXEC_OMP_ALLOCATE: name = "ALLOCATE"; break; case EXEC_OMP_ATOMIC: name = "ATOMIC"; break; case EXEC_OMP_BARRIER: name = "BARRIER"; break; case EXEC_OMP_CANCEL: name = "CANCEL"; break; @@ -2204,6 +2205,7 @@ show_omp_node (int level, gfc_code *c) || c->op == EXEC_OMP_TARGET_UPDATE || c->op == EXEC_OMP_TARGET_ENTER_DATA || c->op == EXEC_OMP_TARGET_EXIT_DATA || c->op == EXEC_OMP_SCAN || c->op == EXEC_OMP_DEPOBJ || c->op == EXEC_OMP_ERROR + || c->op == EXEC_OMP_ALLOCATE || (c->op == EXEC_OMP_ORDERED && c->block == NULL)) return; if (c->op == EXEC_OMP_SECTIONS || c->op == EXEC_OMP_PARALLEL_SECTIONS) @@ -3329,6 +3331,7 @@ show_code_node (int level, gfc_code *c) case EXEC_OACC_CACHE: case EXEC_OACC_ENTER_DATA: case EXEC_OACC_EXIT_DATA: +case EXEC_OMP_ALLOCATE: case EXEC_OMP_ATOMIC: case EXEC_OMP_CANCEL: case EXEC_OMP_CANCELLATION_POINT: diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 696aadd7db6..755469185a6 100644 --- a/gcc/fortran/gfortran.h +++ 
b/gcc/fortran/gfortran.h @@ -259,7 +259,7 @@ enum gfc_statement ST_OACC_CACHE, ST_OACC_KERNELS_LOOP, ST_OACC_END_KERNELS_LOOP, ST_OACC_SERIAL_LOOP, ST_OACC_END_SERIAL_LOOP, ST_OACC_SERIAL, ST_OACC_END_SERIAL, ST_OACC_ENTER_DATA, ST_OACC_EXIT_DATA, ST_OACC_ROUTINE, - ST_OACC_ATOMIC, ST_OACC_END_ATOMIC, + ST_OACC_ATOMIC, ST_OACC_END_ATOMIC, ST_OMP_ALLOCATE, ST_OMP_ATOMIC, ST_OMP_BARRIER, ST_OMP_CRITICAL, ST_OMP_END_ATOMIC, ST_OMP_END_CRITICAL, ST_OMP_END_DO, ST_OMP_END_MASTER, ST_OMP_END_ORDERED, ST_OMP_END_PARALLEL, ST_OMP_END_PARALLEL_DO, ST_OMP_END_PARALLEL_SECTIONS, @@ -1398,6 +1398,7 @@ enum OMP_LIST_USE_DEVICE_ADDR, OMP_LIST_NONTEMPORAL, OMP_LIST_ALLOCATE, + OMP_LIST_ALLOCATOR, OMP_LIST_HAS_DEVICE_ADDR, OMP_LIST_ENTER, OMP_LIST_NUM /* Must be the last. */ @@ -2908,6 +2909,7 @@ enum gfc_exec_op EXEC_OACC_DATA, EXEC_OACC_HOST_DATA, EXEC_OACC_LOOP, EXEC_OACC_UPDATE, EXEC_OACC_WAIT, EXEC_OACC_CACHE, EXEC_OACC_ENTER_DATA, EXEC_OACC_EXIT_DATA, EXEC_OACC_ATOMIC, EXEC_OACC_DECLARE, + EXEC_OMP_ALLOCATE, EXEC_OMP_CRITICAL, EXEC_OMP_DO, EXEC_OMP_FLUSH, EXEC_OMP_MASTER, EXEC_OMP_ORDERED, EXEC_OMP_PARALLEL, EXEC_OMP_PARALLEL_DO, EXEC_OMP_PARALLEL_SECTIONS, EXEC_OMP_PARALLEL_WORKSHARE, diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h index 495c93e0b5c..fe43d4b3fd3 100644 --- a/gcc/fortran/match.h +++ b/gcc/fortran/match.h @@ -149,6 +149,7 @@ match gfc_match_oacc_routine (
[PATCH 17/17] amdgcn: libgomp plugin USM implementation
Implement the Unified Shared Memory API calls in the GCN plugin. The allocate and free are pretty straight-forward because all "target" memory allocations are compatible with USM, on the right hardware. However, there's no known way to check what memory region was used, after the fact, so we use a splay tree to record allocations so we can answer "is_usm_ptr" later. libgomp/ChangeLog: * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Allow GOMP_REQUIRES_UNIFIED_ADDRESS and GOMP_REQUIRES_UNIFIED_SHARED_MEMORY. (struct usm_splay_tree_key_s): New. (usm_splay_compare): New. (splay_tree_prefix): New. (GOMP_OFFLOAD_usm_alloc): New. (GOMP_OFFLOAD_usm_free): New. (GOMP_OFFLOAD_is_usm_ptr): New. (GOMP_OFFLOAD_supported_features): Move into the OpenMP API fold. Add GOMP_REQUIRES_UNIFIED_ADDRESS and GOMP_REQUIRES_UNIFIED_SHARED_MEMORY. (gomp_fatal): New. (splay_tree_c): New. * testsuite/lib/libgomp.exp (check_effective_target_omp_usm): New. * testsuite/libgomp.c++/usm-1.C: Use dg-require-effective-target. * testsuite/libgomp.c-c++-common/requires-1.c: Likewise. * testsuite/libgomp.c/usm-1.c: Likewise. * testsuite/libgomp.c/usm-2.c: Likewise. * testsuite/libgomp.c/usm-3.c: Likewise. * testsuite/libgomp.c/usm-4.c: Likewise. * testsuite/libgomp.c/usm-5.c: Likewise. * testsuite/libgomp.c/usm-6.c: Likewise. 
--- libgomp/plugin/plugin-gcn.c | 104 +- libgomp/testsuite/lib/libgomp.exp | 22 libgomp/testsuite/libgomp.c++/usm-1.C | 2 +- .../libgomp.c-c++-common/requires-1.c | 1 + libgomp/testsuite/libgomp.c/usm-1.c | 1 + libgomp/testsuite/libgomp.c/usm-2.c | 1 + libgomp/testsuite/libgomp.c/usm-3.c | 1 + libgomp/testsuite/libgomp.c/usm-4.c | 1 + libgomp/testsuite/libgomp.c/usm-5.c | 2 +- libgomp/testsuite/libgomp.c/usm-6.c | 2 +- 10 files changed, 133 insertions(+), 4 deletions(-) diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index ea327bf2ca0..6a9ff5cd93e 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -3226,7 +3226,11 @@ GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask) if (!init_hsa_context ()) return 0; /* Return -1 if no omp_requires_mask cannot be fulfilled but - devices were present. */ + devices were present. + Note: not all devices support USM, but the compiler refuses to create + binaries for those that don't anyway. */ + omp_requires_mask &= ~(GOMP_REQUIRES_UNIFIED_ADDRESS + | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY); if (hsa_context.agent_count > 0 && omp_requires_mask != 0) return -1; return hsa_context.agent_count; @@ -3810,6 +3814,89 @@ GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars, GOMP_PLUGIN_target_task_completion, async_data); } +/* Use a splay tree to track USM allocations. */ + +typedef struct usm_splay_tree_node_s *usm_splay_tree_node; +typedef struct usm_splay_tree_s *usm_splay_tree; +typedef struct usm_splay_tree_key_s *usm_splay_tree_key; + +struct usm_splay_tree_key_s { + void *addr; + size_t size; +}; + +static inline int +usm_splay_compare (usm_splay_tree_key x, usm_splay_tree_key y) +{ + if ((x->addr <= y->addr && x->addr + x->size > y->addr) + || (y->addr <= x->addr && y->addr + y->size > x->addr)) +return 0; + + return (x->addr > y->addr ? 
1 : -1); +} + +#define splay_tree_prefix usm +#include "../splay-tree.h" + +static struct usm_splay_tree_s usm_map = { NULL }; + +/* Allocate memory suitable for Unified Shared Memory. + + In fact, AMD memory need only be "coarse grained", which target + allocations already are. We do need to track allocations so that + GOMP_OFFLOAD_is_usm_ptr can look them up. */ + +void * +GOMP_OFFLOAD_usm_alloc (int device, size_t size) +{ + void *ptr = GOMP_OFFLOAD_alloc (device, size); + + usm_splay_tree_node node = malloc (sizeof (struct usm_splay_tree_node_s)); + node->key.addr = ptr; + node->key.size = size; + node->left = NULL; + node->right = NULL; + usm_splay_tree_insert (&usm_map, node); + + return ptr; +} + +/* Free memory allocated via GOMP_OFFLOAD_usm_alloc. */ + +bool +GOMP_OFFLOAD_usm_free (int device, void *ptr) +{ + struct usm_splay_tree_key_s key = { ptr, 1 }; + usm_splay_tree_key node = usm_splay_tree_lookup (&usm_map, &key); + if (node) +{ + usm_splay_tree_remove (&usm_map, &key); + free (node); +} + + return GOMP_OFFLOAD_free (device, ptr); +} + +/* True if the memory was allocated via GOMP_OFFLOAD_usm_alloc. */ + +bool +GOMP_OFFLOAD_is_usm_ptr (void *ptr) +{ + struct usm_splay_tree_key_s key = { ptr, 1 }; + return usm_splay_tree_lookup (&usm_map, &key); +} + +/* Indicate which GOMP_REQUIRES_* features are supported. */ + +bool +GO
[PATCH 15/17] amdgcn: Support XNACK mode
The XNACK feature allows memory load instructions to restart safely following a page-miss interrupt. This is useful for shared-memory devices, like APUs, and to implement OpenMP Unified Shared Memory. To support the feature we must be able to set the appropriate meta-data and set the load instructions to early-clobber. When the port supports scheduling of s_waitcnt instructions there will be further requirements. gcc/ChangeLog: * config/gcn/gcn-hsa.h (XNACKOPT): New macro. (ASM_SPEC): Use XNACKOPT. * config/gcn/gcn-opts.h (enum sram_ecc_type): Rename to ... (enum hsaco_attr_type): ... this, and generalize the names. (TARGET_XNACK): New macro. * config/gcn/gcn-valu.md (gather_insn_1offset): Add xnack compatible alternatives. (gather_insn_2offsets): Likewise. * config/gcn/gcn.c (gcn_option_override): Permit -mxnack for devices other than Fiji. (gcn_expand_epilogue): Remove early-clobber problems. (output_file_start): Emit xnack attributes. (gcn_hsa_declare_function_name): Obey -mxnack setting. * config/gcn/gcn.md (xnack): New attribute. (enabled): Rework to include "xnack" attribute. (*movbi): Add xnack compatible alternatives. (*mov_insn): Likewise. (*mov_insn): Likewise. (*mov_insn): Likewise. (*movti_insn): Likewise. * config/gcn/gcn.opt (-mxnack): Add the "on/off/any" syntax. (sram_ecc_type): Rename to ... (hsaco_attr_type: ... this.) * config/gcn/mkoffload.c (SET_XNACK_ANY): New macro. (TEST_XNACK): Delete. (TEST_XNACK_ANY): New macro. (TEST_XNACK_ON): New macro. (main): Support the new -mxnack=on/off/any syntax. 
--- gcc/config/gcn/gcn-hsa.h| 3 +- gcc/config/gcn/gcn-opts.h | 10 ++-- gcc/config/gcn/gcn-valu.md | 29 - gcc/config/gcn/gcn.cc | 34 ++- gcc/config/gcn/gcn.md | 113 +++- gcc/config/gcn/gcn.opt | 18 +++--- gcc/config/gcn/mkoffload.cc | 19 -- 7 files changed, 140 insertions(+), 86 deletions(-) diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h index b3079cebb43..fd08947574f 100644 --- a/gcc/config/gcn/gcn-hsa.h +++ b/gcc/config/gcn/gcn-hsa.h @@ -81,12 +81,13 @@ extern unsigned int gcn_local_sym_hash (const char *name); /* In HSACOv4 no attribute setting means the binary supports "any" hardware configuration. The name of the attribute also changed. */ #define SRAMOPT "msram-ecc=on:-mattr=+sramecc;msram-ecc=off:-mattr=-sramecc" +#define XNACKOPT "mxnack=on:-mattr=+xnack;mxnack=off:-mattr=-xnack" /* Use LLVM assembler and linker options. */ #define ASM_SPEC "-triple=amdgcn--amdhsa " \ "%:last_arg(%{march=*:-mcpu=%*}) " \ "%{!march=*|march=fiji:--amdhsa-code-object-version=3} " \ - "%{" NO_XNACK "mxnack:-mattr=+xnack;:-mattr=-xnack} " \ + "%{" NO_XNACK XNACKOPT "}" \ "%{" NO_SRAM_ECC SRAMOPT "} " \ "-filetype=obj" #define LINK_SPEC "--pie --export-dynamic" diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h index b62dfb45f59..07ddc79cda3 100644 --- a/gcc/config/gcn/gcn-opts.h +++ b/gcc/config/gcn/gcn-opts.h @@ -48,11 +48,13 @@ extern enum gcn_isa { #define TARGET_M0_LDS_LIMIT (TARGET_GCN3) #define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS) -enum sram_ecc_type +#define TARGET_XNACK (flag_xnack != HSACO_ATTR_OFF) + +enum hsaco_attr_type { - SRAM_ECC_OFF, - SRAM_ECC_ON, - SRAM_ECC_ANY + HSACO_ATTR_OFF, + HSACO_ATTR_ON, + HSACO_ATTR_ANY }; #endif diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md index abe46201344..ec114db9dd1 100644 --- a/gcc/config/gcn/gcn-valu.md +++ b/gcc/config/gcn/gcn-valu.md @@ -741,13 +741,13 @@ (define_expand "gather_expr" {}) (define_insn "gather_insn_1offset" - [(set (match_operand:V_ALL 0 
"register_operand" "=v") + [(set (match_operand:V_ALL 0 "register_operand" "=v,&v") (unspec:V_ALL - [(plus: (match_operand: 1 "register_operand" " v") + [(plus: (match_operand: 1 "register_operand" " v, v") (vec_duplicate: - (match_operand 2 "immediate_operand" " n"))) - (match_operand 3 "immediate_operand" " n") - (match_operand 4 "immediate_operand" " n") + (match_operand 2 "immediate_operand" " n, n"))) + (match_operand 3 "immediate_operand" " n, n") + (match_operand 4 "immediate_operand" " n, n") (mem:BLK (scratch))] UNSPEC_GATHER))] "(AS_FLAT_P (INTVAL (operands[3])) @@ -777,7 +777,8 @@ (define_insn "gather_insn_1offset" return buf; } [(set_attr "type" "flat") - (set_attr "length" "12")]) + (set_attr "length" "12") + (set_attr "xnack" "off,on")]) (define_insn "gather_insn_1offset_ds" [(set (match_operand:V_ALL 0 "register_operand" "=v") @@ -802,17 +803,18 @@ (define_insn "gather_insn_1offset_ds" (set_attr "length" "12")]) (define_insn "gather_insn_2o
[PATCH 16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK
The AMD GCN runtime must be set to the correct mode for Unified Shared Memory to work, but this is not always clear at compile and link time due to the split nature of the offload compilation pipeline. This patch sets a new attribute on OpenMP offload functions to ensure that the information is passed all the way to the backend. The backend then places a marker in the assembler code for mkoffload to find. Finally mkoffload places a constructor function into the final program to ensure that the HSA_XNACK environment variable passes the correct mode to the GPU. The HSA_XNACK variable must be set before the HSA runtime is even loaded, so it makes more sense to have this set within the constructor than at some point later within libgomp or the GCN plugin. gcc/ChangeLog: * config/gcn/gcn.c (unified_shared_memory_enabled): New variable. (gcn_init_cumulative_args): Handle attribute "omp unified memory". (gcn_hsa_declare_function_name): Emit "MKOFFLOAD OPTIONS: USM+". * config/gcn/mkoffload.c (TEST_XNACK_OFF): New macro. (process_asm): Detect "MKOFFLOAD OPTIONS: USM+". Emit configure_xnack constructor, as required. * omp-low.c (create_omp_child_function): Add attribute "omp unified memory". --- gcc/config/gcn/gcn.cc | 28 +++- gcc/config/gcn/mkoffload.cc | 37 - gcc/omp-low.cc | 4 3 files changed, 67 insertions(+), 2 deletions(-) diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index 4df05453604..88cc505597e 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -68,6 +68,11 @@ static bool ext_gcn_constants_init = 0; enum gcn_isa gcn_isa = ISA_GCN3; /* Default to GCN3. */ +/* Record whether the host compiler added "omp unifed memory" attributes to + any functions. We can then pass this on to mkoffload to ensure xnack is + compatible there too. */ +static bool unified_shared_memory_enabled = false; + /* Reserve this much space for LDS (for propagating variables from worker-single mode to worker-partitioned mode), per workgroup. 
Global analysis could calculate an exact bound, but we don't do that yet. @@ -2542,6 +2547,25 @@ gcn_init_cumulative_args (CUMULATIVE_ARGS *cum /* Argument info to init */ , if (!caller && cfun->machine->normal_function) gcn_detect_incoming_pointer_arg (fndecl); + if (fndecl && lookup_attribute ("omp unified memory", + DECL_ATTRIBUTES (fndecl))) +{ + unified_shared_memory_enabled = true; + + switch (gcn_arch) + { + case PROCESSOR_FIJI: + case PROCESSOR_VEGA10: + case PROCESSOR_VEGA20: + error ("GPU architecture does not support Unified Shared Memory"); + default: + ; + } + + if (flag_xnack == HSACO_ATTR_OFF) + error ("Unified Shared Memory is enabled, but XNACK is disabled"); +} + reinit_regs (); } @@ -5458,12 +5482,14 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, tree) assemble_name (file, name); fputs (":\n", file); - /* This comment is read by mkoffload. */ + /* These comments are read by mkoffload. */ if (flag_openacc) fprintf (file, "\t;; OPENACC-DIMS: %d, %d, %d : %s\n", oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_GANG), oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_WORKER), oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_VECTOR), name); + if (unified_shared_memory_enabled) +fprintf (asm_out_file, "\t;; MKOFFLOAD OPTIONS: USM+\n"); } /* Implement TARGET_ASM_SELECT_SECTION. 
diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc index cb8903c27cb..5741d0a917b 100644 --- a/gcc/config/gcn/mkoffload.cc +++ b/gcc/config/gcn/mkoffload.cc @@ -80,6 +80,8 @@ == EF_AMDGPU_FEATURE_XNACK_ANY_V4) #define TEST_XNACK_ON(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \ == EF_AMDGPU_FEATURE_XNACK_ON_V4) +#define TEST_XNACK_OFF(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \ + == EF_AMDGPU_FEATURE_XNACK_OFF_V4) #define SET_SRAM_ECC_ON(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_SRAMECC_V4) \ | EF_AMDGPU_FEATURE_SRAMECC_ON_V4) @@ -474,6 +476,7 @@ static void process_asm (FILE *in, FILE *out, FILE *cfile) { int fn_count = 0, var_count = 0, dims_count = 0, regcount_count = 0; + bool unified_shared_memory_enabled = false; struct obstack fns_os, dims_os, regcounts_os; obstack_init (&fns_os); obstack_init (&dims_os); @@ -498,6 +501,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile) fn_count += 2; char buf[1000]; + char dummy; enum { IN_CODE, IN_METADATA, @@ -517,6 +521,9 @@ process_asm (FILE *in, FILE *out, FILE *cfile) dims_count++; } + if (sscanf (buf, " ;; MKOFFLOAD OPTIONS: USM+%c", &dummy) > 0) + unified_shared_memory_enabled = true; + break; } case IN_METADATA: @@ -565,7 +572,6 @@ process_asm (FILE *in, FILE *out, FILE *cfile) } } - char dummy; if (sscanf (buf, " .section
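The mkoffload side recognizes the backend's marker with a plain `sscanf` over each assembler line; the `%c` conversion only succeeds if something (normally the newline) follows `USM+`. The matching can be exercised in isolation (`has_usm_marker` is a hypothetical wrapper around the same format string):

```c
#include <stdio.h>

/* Returns 1 if LINE carries the marker that
   gcn_hsa_declare_function_name emits for mkoffload, mirroring the
   sscanf test added to process_asm.  */
static int
has_usm_marker (const char *line)
{
  char dummy;
  return sscanf (line, " ;; MKOFFLOAD OPTIONS: USM+%c", &dummy) > 0;
}
```

The leading space in the format skips the tab the backend emits, so the check is indentation-tolerant, while unrelated comment lines (such as the OPENACC-DIMS marker) fail on the first literal mismatch.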
Re: Enhance 'libgomp.c-c++-common/requires-4.c', 'libgomp.c-c++-common/requires-5.c' testing (was: [Patch][v4] OpenMP: Move omp requires checks to libgomp)
Hi Tobias! On 2022-07-07T11:36:34+0200, Tobias Burnus wrote: > On 07.07.22 10:42, Thomas Schwinge wrote: >> In preparation for other changes: > ... >> On 2022-06-29T16:33:02+0200, Tobias Burnus wrote: >>> +/* { dg-output "devices present but 'omp requires unified_address, >>> unified_shared_memory, reverse_offload' cannot be fulfilled" } */ >> (The latter diagnostic later got conditionalized by 'GOMP_DEBUG=1'.) >> OK to push the attached "Enhance 'libgomp.c-c++-common/requires-4.c', >> 'libgomp.c-c++-common/requires-5.c' testing"? > ... >> libgomp/ >> * testsuite/libgomp.c-c++-common/requires-4.c: Enhance testing. >> * testsuite/libgomp.c-c++-common/requires-5.c: Likewise. > ... >> --- a/libgomp/testsuite/libgomp.c-c++-common/requires-4.c >> +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c >> @@ -1,22 +1,29 @@ >> -/* { dg-do link { target offloading_enabled } } */ >> /* { dg-additional-options "-flto" } */ >> /* { dg-additional-sources requires-4-aux.c } */ >> >> -/* Check diagnostic by device-compiler's or host compiler's lto1. >> +/* Check no diagnostic by device-compiler's or host compiler's lto1. > > I note that without ENABLE_OFFLOADING that there is never any lto1 > diagnostic. > > However, given that no diagnostic is expected, it also works for "! > offloading_enabled". > > Thus, the change fine. ACK. >> Other file uses: 'requires reverse_offload', but that's inactive as >> there are no declare target directives, device constructs nor device >> routines */ >> >> +/* For actual offload execution, prints the following (only) if >> GOMP_DEBUG=1: >> + "devices present but 'omp requires unified_address, >> unified_shared_memory, reverse_offload' cannot be fulfilled" >> + and does host-fallback execution. */ > > The latter is only true when also device code is produced – and a device > is available for that/those device types. I think that's what you imply > by "For actual offload execution" ACK. > but it is a bit hidden. 
> > Maybe s/For actual offload execution, prints/It may print/ is clearer? I've settled on: /* Depending on offload device capabilities, it may print something like the following (only) if GOMP_DEBUG=1: "devices present but 'omp requires unified_address, unified_shared_memory, reverse_offload' cannot be fulfilled" and in that case does host-fallback execution. */ > In principle, it would be nice if we could test for the output, but > currently setting an env var for remote execution does not work, yet. > Cf. https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597773.html Right, I'm aware of that issue with remote testing, and that's why I didn't propose such output verification. (In a few other test cases, we do have 'dg-set-target-env-var GOMP_DEBUG "1"', which then at present are UNSUPPORTED for remote testing.) > When set, we could use offload_target_nvptx etc. (..._amdgcn, ..._any) > to test – as this guarantees that it is compiled for that device + the > device is available. Use 'target offload_device_nvptx', not 'target offload_target_nvptx', etc. ;-) >> + >> #pragma omp requires unified_address,unified_shared_memory >> >> -int a[10]; >> +int a[10] = { 0 }; >> extern void foo (void); >> >> int >> main (void) >> { >> - #pragma omp target >> + #pragma omp target map(to: a) > > Hmm, I wonder whether I like it or not. Without, there is an implicit > "map(tofrom:a)". On the other hand, OpenMP permits that – even with > unified-shared memory – the implementation my copy the data to the > device. (For instance, to permit faster access to "a".) > > Thus, ... > >> + for (int i = 0; i < 10; i++) >> +a[i] = i; >> + >> for (int i = 0; i < 10; i++) >> -a[i] = 0; >> +if (a[i] != i) >> + __builtin_abort (); > ... this condition (back on the host) could also fail with USM. However, > given that to my knowledge no USM implementation actually copies the > data, I believe it is fine. 
Right, this is meant to describe/test the current GCC master branch behavior, where USM isn't supported, so I didn't consider that. But I agree, a source code comment should be added: As no offload devices support USM at present, we may verify host-fallback execution by absence of separate memory spaces. */ > (Disclaimer: I have not checked what OG12, > but I guess it also does not copy it.) >> --- a/libgomp/testsuite/libgomp.c-c++-common/requires-5.c >> +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-5.c >> @@ -1,21 +1,25 @@ >> -/* { dg-do run { target { offload_target_nvptx || offload_target_amdgcn } } >> } */ >> /* { dg-additional-sources requires-5-aux.c } */ >> >> +/* For actual offload execution, prints the following (only) if >> GOMP_DEBUG=1: >> + "devices present but 'omp requires unified_address, >> unified_shared_memory, reverse_offload' cannot be fulfilled" >> + and does host-fallback execution. */ >> + > This wording is correct with the now-removed ch
[statistics.cc] ICE in get_function_name with fortran test-case
Hi, My recent commit to emit asm name with -fdump-statistics-asmname caused the following ICE for the attached fortran test case. during IPA pass: icf power.fppized.f90:6:26: 6 | END SUBROUTINE power_print | ^ internal compiler error: Segmentation fault 0xfddc13 crash_signal ../../gcc/gcc/toplev.cc:322 0x7f6f940de51f ??? ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0 0xfc909d get_function_name ../../gcc/gcc/statistics.cc:124 0xfc929f statistics_fini_pass_2(statistics_counter**, void*) ../../gcc/gcc/statistics.cc:175 0xfc94a4 void hash_table::traverse_noresize(void*) ../../gcc/gcc/hash-table.h:1084 0xfc94a4 statistics_fini_pass() ../../gcc/gcc/statistics.cc:219 0xef12bc execute_todo ../../gcc/gcc/passes.cc:2142 This happens because fn was passed as NULL to get_function_name. The patch adds a check to see if fn is NULL before checking DECL_ASSEMBLER_NAME_SET_P, which fixes the issue. In case fn is NULL, it calls function_name (NULL) as per the old behavior, which returns "(nofn)". Bootstrap+tested on x86_64-linux-gnu. OK to commit? Thanks, Prathamesh diff --git a/gcc/statistics.cc b/gcc/statistics.cc index 6c21415bf65..01ad353e3a9 100644 --- a/gcc/statistics.cc +++ b/gcc/statistics.cc @@ -121,7 +121,7 @@ static const char * get_function_name (struct function *fn) { if ((statistics_dump_flags & TDF_ASMNAME) - && DECL_ASSEMBLER_NAME_SET_P (fn->decl)) + && fn && DECL_ASSEMBLER_NAME_SET_P (fn->decl)) { tree asmname = decl_assembler_name (fn->decl); if (asmname)
Re: Fix Intel MIC 'mkoffload' for OpenMP 'requires' (was: [Patch] OpenMP: Move omp requires checks to libgomp)
Hi Tobias! On 2022-07-06T15:30:57+0200, Tobias Burnus wrote: > On 06.07.22 14:38, Thomas Schwinge wrote: >> :-) Haha, that's actually *exactly* what I had implemented first! But >> then I realized that 'target offloading_enabled' is doing exactly that: >> check that offloading compilation is configured -- not that "there is an >> offloading device available or not" as you seem to understand? Or am I >> confused there? > > I think as you mentioned below – there is a difference. Eh, thanks for un-confusing me on that aspect! There's a reason after all that 'offloading_enabled' lives in 'gcc/testsuite/lib/'... > And that difference, > I explicitly maked use of: [...] > Granted, as the other files do not use -foffload=..., it should not > make a difference - but, still, replacing it unconditionally > with 'target offloading_enabled' feels wrong. ACK! I've pushed to master branch commit 9ef714539cb7cc1cd746312fd5dcc987bf167471 "Fix Intel MIC 'mkoffload' for OpenMP 'requires'", see attached. Grüße Thomas - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 >From 9ef714539cb7cc1cd746312fd5dcc987bf167471 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Tue, 5 Jul 2022 12:21:33 +0200 Subject: [PATCH] Fix Intel MIC 'mkoffload' for OpenMP 'requires' Similar to how the other 'mkoffload's got changed in recent commit 683f11843974f0bdf42f79cdcbb0c2b43c7b81b0 "OpenMP: Move omp requires checks to libgomp". This also means finally switching Intel MIC 'mkoffload' to 'GOMP_offload_register_ver', 'GOMP_offload_unregister_ver', making 'GOMP_offload_register', 'GOMP_offload_unregister' legacy entry points. gcc/ * config/i386/intelmic-mkoffload.cc (generate_host_descr_file) (prepare_target_image, main): Handle OpenMP 'requires'. 
(generate_host_descr_file): Switch to 'GOMP_offload_register_ver', 'GOMP_offload_unregister_ver'. libgomp/ * target.c (GOMP_offload_register, GOMP_offload_unregister): Denote as legacy entry points. * testsuite/lib/libgomp.exp (check_effective_target_offload_target_any): New proc. * testsuite/libgomp.c-c++-common/requires-1.c: Enable for 'offload_target_any'. * testsuite/libgomp.c-c++-common/requires-3.c: Likewise. * testsuite/libgomp.c-c++-common/requires-7.c: Likewise. * testsuite/libgomp.fortran/requires-1.f90: Likewise. --- gcc/config/i386/intelmic-mkoffload.cc | 56 +++ libgomp/target.c | 4 ++ libgomp/testsuite/lib/libgomp.exp | 5 ++ .../libgomp.c-c++-common/requires-1.c | 2 +- .../libgomp.c-c++-common/requires-3.c | 2 +- .../libgomp.c-c++-common/requires-7.c | 2 +- .../testsuite/libgomp.fortran/requires-1.f90 | 2 +- 7 files changed, 57 insertions(+), 16 deletions(-) diff --git a/gcc/config/i386/intelmic-mkoffload.cc b/gcc/config/i386/intelmic-mkoffload.cc index c683d6f473e..596f6f107b8 100644 --- a/gcc/config/i386/intelmic-mkoffload.cc +++ b/gcc/config/i386/intelmic-mkoffload.cc @@ -370,7 +370,7 @@ generate_target_offloadend_file (const char *target_compiler) /* Generates object file with the host side descriptor. 
*/ static const char * -generate_host_descr_file (const char *host_compiler) +generate_host_descr_file (const char *host_compiler, uint32_t omp_requires) { char *dump_filename = concat (dumppfx, "_host_descr.c", NULL); const char *src_filename = save_temps @@ -386,39 +386,50 @@ generate_host_descr_file (const char *host_compiler) if (!src_file) fatal_error (input_location, "cannot open '%s'", src_filename); + fprintf (src_file, "#include \n\n"); + fprintf (src_file, "extern const void *const __OFFLOAD_TABLE__;\n" "extern const void *const __offload_image_intelmic_start;\n" "extern const void *const __offload_image_intelmic_end;\n\n" - "static const void *const __offload_target_data[] = {\n" + "static const struct intelmic_data {\n" + " uintptr_t omp_requires_mask;\n" + " const void *const image_start;\n" + " const void *const image_end;\n" + "} intelmic_data = {\n" + " %d,\n" " &__offload_image_intelmic_start, &__offload_image_intelmic_end\n" - "};\n\n"); + "};\n\n", omp_requires); fprintf (src_file, "#ifdef __cplusplus\n" "extern \"C\"\n" "#endif\n" - "void GOMP_offload_register (const void *, int, const void *);\n" + "void GOMP_offload_register_ver (unsigned, const void *, int, const void *);\n" "#ifdef __cplusplus\n" "extern \"C\"\n" "#endif\n" - "void GOMP_offload_unregister (const void *, int, const void *);\n\n" + "void GOMP_offload_unregister_ver (unsigned, const void *, int, const void *);\n\n" "__attribute__((constructor))\n" "static void\n" "init (void)\n" "{\n" - " GOMP_offload_regi
[PATCH] Speedup update-ssa some more
The following avoids copying an sbitmap and one traversal by avoiding to re-allocate old_ssa_names when not necessary. In addition this actually checks what the comment before PHI insert iterating promises, that the old_ssa_names set does not grow. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. * tree-into-ssa.cc (iterating_old_ssa_names): New. (add_new_name_mapping): Grow {new,old}_ssa_names separately and only when actually needed. Assert we are not growing the old_ssa_names set when iterating over it. (update_ssa): Remove old_ssa_names copying and empty_p query, note we are iterating over it and expect no set changes. --- gcc/tree-into-ssa.cc | 36 1 file changed, 20 insertions(+), 16 deletions(-) diff --git a/gcc/tree-into-ssa.cc b/gcc/tree-into-ssa.cc index c90651c3a89..9f45e62c6d0 100644 --- a/gcc/tree-into-ssa.cc +++ b/gcc/tree-into-ssa.cc @@ -587,6 +587,8 @@ add_to_repl_tbl (tree new_tree, tree old) bitmap_set_bit (*set, SSA_NAME_VERSION (old)); } +/* Debugging aid to fence old_ssa_names changes when iterating over it. */ +static bool iterating_old_ssa_names; /* Add a new mapping NEW_TREE -> OLD REPL_TBL. Every entry N_i in REPL_TBL represents the set of names O_1 ... O_j replaced by N_i. This is @@ -602,10 +604,15 @@ add_new_name_mapping (tree new_tree, tree old) /* We may need to grow NEW_SSA_NAMES and OLD_SSA_NAMES because our caller may have created new names since the set was created. 
*/ - if (SBITMAP_SIZE (new_ssa_names) <= num_ssa_names - 1) + if (SBITMAP_SIZE (new_ssa_names) <= SSA_NAME_VERSION (new_tree)) { unsigned int new_sz = num_ssa_names + NAME_SETS_GROWTH_FACTOR; new_ssa_names = sbitmap_resize (new_ssa_names, new_sz, 0); +} + if (SBITMAP_SIZE (old_ssa_names) <= SSA_NAME_VERSION (old)) +{ + gcc_assert (!iterating_old_ssa_names); + unsigned int new_sz = num_ssa_names + NAME_SETS_GROWTH_FACTOR; old_ssa_names = sbitmap_resize (old_ssa_names, new_sz, 0); } @@ -619,8 +626,11 @@ add_new_name_mapping (tree new_tree, tree old) /* Register NEW_TREE and OLD in NEW_SSA_NAMES and OLD_SSA_NAMES, respectively. */ + if (iterating_old_ssa_names) +gcc_assert (bitmap_bit_p (old_ssa_names, SSA_NAME_VERSION (old))); + else +bitmap_set_bit (old_ssa_names, SSA_NAME_VERSION (old)); bitmap_set_bit (new_ssa_names, SSA_NAME_VERSION (new_tree)); - bitmap_set_bit (old_ssa_names, SSA_NAME_VERSION (old)); } @@ -3460,20 +3470,14 @@ update_ssa (unsigned update_flags) bitmap_initialize (&dfs[bb->index], &bitmap_default_obstack); compute_dominance_frontiers (dfs); - if (bitmap_first_set_bit (old_ssa_names) >= 0) - { - sbitmap_iterator sbi; - - /* insert_update_phi_nodes_for will call add_new_name_mapping -when inserting new PHI nodes, so the set OLD_SSA_NAMES -will grow while we are traversing it (but it will not -gain any new members). Copy OLD_SSA_NAMES to a temporary -for traversal. */ - auto_sbitmap tmp (SBITMAP_SIZE (old_ssa_names)); - bitmap_copy (tmp, old_ssa_names); - EXECUTE_IF_SET_IN_BITMAP (tmp, 0, i, sbi) - insert_updated_phi_nodes_for (ssa_name (i), dfs, update_flags); - } + /* insert_update_phi_nodes_for will call add_new_name_mapping +when inserting new PHI nodes, but it will not add any +new members to OLD_SSA_NAMES. 
*/ + iterating_old_ssa_names = true; + sbitmap_iterator sbi; + EXECUTE_IF_SET_IN_BITMAP (old_ssa_names, 0, i, sbi) + insert_updated_phi_nodes_for (ssa_name (i), dfs, update_flags); + iterating_old_ssa_names = false; symbols_to_rename.qsort (insert_updated_phi_nodes_compare_uids); FOR_EACH_VEC_ELT (symbols_to_rename, i, sym) -- 2.35.3
[PATCH] lto-plugin: use locking only for selected targets
For now, support locking only for linux targets that are different from riscv* where the target depends on libatomic (and fails during bootstrap). Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, Martin PR lto/106170 lto-plugin/ChangeLog: * configure.ac: Configure HAVE_PTHREAD_LOCKING. * lto-plugin.c (LOCK_SECTION): New. (UNLOCK_SECTION): New. (claim_file_handler): Use the newly added macros. (onload): Likewise. * config.h.in: Regenerate. * configure: Regenerate. --- lto-plugin/config.h.in | 4 ++-- lto-plugin/configure| 20 ++-- lto-plugin/configure.ac | 17 +++-- lto-plugin/lto-plugin.c | 30 -- 4 files changed, 51 insertions(+), 20 deletions(-) diff --git a/lto-plugin/config.h.in b/lto-plugin/config.h.in index 029e782f1ee..bf269f000d2 100644 --- a/lto-plugin/config.h.in +++ b/lto-plugin/config.h.in @@ -9,8 +9,8 @@ /* Define to 1 if you have the header file. */ #undef HAVE_MEMORY_H -/* Define to 1 if pthread.h is present. */ -#undef HAVE_PTHREAD_H +/* Define if the system-provided pthread locking mechanism. */ +#undef HAVE_PTHREAD_LOCKING /* Define to 1 if you have the header file. */ #undef HAVE_STDINT_H diff --git a/lto-plugin/configure b/lto-plugin/configure index aaa91a63623..7ea54e6008f 100755 --- a/lto-plugin/configure +++ b/lto-plugin/configure @@ -6011,14 +6011,22 @@ fi # Check for thread headers. 
-ac_fn_c_check_header_mongrel "$LINENO" "pthread.h" "ac_cv_header_pthread_h" "$ac_includes_default" -if test "x$ac_cv_header_pthread_h" = xyes; then : +use_locking=no -$as_echo "#define HAVE_PTHREAD_H 1" >>confdefs.h +case $target in + riscv*) +# do not use locking as pthread depends on libatomic +;; + *-linux*) +use_locking=yes +;; +esac -fi +if test x$use_locking = xyes; then +$as_echo "#define HAVE_PTHREAD_LOCKING 1" >>confdefs.h +fi case `pwd` in *\ * | *\*) @@ -12091,7 +12099,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 12094 "configure" +#line 12102 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -12197,7 +12205,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 12200 "configure" +#line 12208 "configure" #include "confdefs.h" #if HAVE_DLFCN_H diff --git a/lto-plugin/configure.ac b/lto-plugin/configure.ac index c2ec512880f..69bc5139193 100644 --- a/lto-plugin/configure.ac +++ b/lto-plugin/configure.ac @@ -88,8 +88,21 @@ AM_CONDITIONAL(LTO_PLUGIN_USE_SYMVER_GNU, [test "x$lto_plugin_use_symver" = xgnu AM_CONDITIONAL(LTO_PLUGIN_USE_SYMVER_SUN, [test "x$lto_plugin_use_symver" = xsun]) # Check for thread headers. -AC_CHECK_HEADER(pthread.h, - [AC_DEFINE(HAVE_PTHREAD_H, 1, [Define to 1 if pthread.h is present.])]) +use_locking=no + +case $target in + riscv*) +# do not use locking as pthread depends on libatomic +;; + *-linux*) +use_locking=yes +;; +esac + +if test x$use_locking = xyes; then + AC_DEFINE(HAVE_PTHREAD_LOCKING, 1, + [Define if the system-provided pthread locking mechanism.]) +fi AM_PROG_LIBTOOL ACX_LT_HOST_FLAGS diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c index 635e126946b..cba58f5999b 100644 --- a/lto-plugin/lto-plugin.c +++ b/lto-plugin/lto-plugin.c @@ -40,11 +40,7 @@ along with this program; see the file COPYING3. 
If not see #ifdef HAVE_CONFIG_H #include "config.h" -#if !HAVE_PTHREAD_H -#error POSIX threads are mandatory dependency #endif -#endif - #if HAVE_STDINT_H #include #endif @@ -59,7 +55,9 @@ along with this program; see the file COPYING3. If not see #include #include #include +#if HAVE_PTHREAD_LOCKING #include +#endif #ifdef HAVE_SYS_WAIT_H #include #endif @@ -162,9 +160,18 @@ enum symbol_style ss_uscore, /* Underscore prefix all symbols. */ }; +#if HAVE_PTHREAD_LOCKING /* Plug-in mutex. */ static pthread_mutex_t plugin_lock; +#define LOCK_SECTION pthread_mutex_lock (&plugin_lock) +#define UNLOCK_SECTION pthread_mutex_unlock (&plugin_lock) +#else + +#define LOCK_SECTION +#define UNLOCK_SECTION +#endif + static char *arguments_file_name; static ld_plugin_register_claim_file register_claim_file; static ld_plugin_register_all_symbols_read register_all_symbols_read; @@ -1270,18 +1277,18 @@ claim_file_handler (const struct ld_plugin_input_file *file, int *claimed) lto_file.symtab.syms); check (status == LDPS_OK, LDPL_FATAL, "could not add symbols"); - pthread_mutex_lock (&plugin_lock); + LOCK_SECTION; num_claimed_files++; claimed_files = xrealloc (claimed_files, num_claimed_files * sizeof (struct plugin_file_info)); claimed_files[num_claimed_files - 1] = lto_file; - pt
Re: [statistics.cc] ICE in get_function_name with fortran test-case
On Thu, Jul 7, 2022 at 12:44 PM Prathamesh Kulkarni wrote: > > Hi, > My recent commit to emit asm name with -fdump-statistics-asmname > caused following ICE > for attached fortran test case. > > during IPA pass: icf > power.fppized.f90:6:26: > > 6 | END SUBROUTINE power_print > | ^ > internal compiler error: Segmentation fault > 0xfddc13 crash_signal > ../../gcc/gcc/toplev.cc:322 > 0x7f6f940de51f ??? > ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0 > 0xfc909d get_function_name > ../../gcc/gcc/statistics.cc:124 > 0xfc929f statistics_fini_pass_2(statistics_counter**, void*) > ../../gcc/gcc/statistics.cc:175 > 0xfc94a4 void hash_table xcallocator>::traverse_noresize &(statistics_fini_pass_2(statistics_counter**, void*))>(void*) > ../../gcc/gcc/hash-table.h:1084 > 0xfc94a4 statistics_fini_pass() > ../../gcc/gcc/statistics.cc:219 > 0xef12bc execute_todo > ../../gcc/gcc/passes.cc:2142 > > This happens because fn was passed NULL in get_function_name. > The patch adds a check to see if fn is NULL before checking for > DECL_ASSEMBLER_NAME_SET_P, which fixes the issue. > In case the fn is NULL, it calls function_name(NULL) as per old behavior, > which returns "(nofn)". > > Bootstrap+tested on x86_64-linux-gnu. > OK to commit ? OK > > Thanks, > Prathamesh
Re: [PATCH] lto-plugin: use locking only for selected targets
On Thu, Jul 7, 2022 at 1:43 PM Martin Liška wrote: > > For now, support locking only for linux targets that are different from > riscv* where the target depends on libatomic (and fails during > bootstrap). > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > Ready to be installed? OK - that also resolves the mingw issue, correct? I suppose we need to be careful to not advertise v1 API (which includes threadsafeness) when not HAVE_PTHREAD_LOCKING. Thanks, Richard. > Thanks, > Martin > > PR lto/106170 > > lto-plugin/ChangeLog: > > * configure.ac: Configure HAVE_PTHREAD_LOCKING. > * lto-plugin.c (LOCK_SECTION): New. > (UNLOCK_SECTION): New. > (claim_file_handler): Use the newly added macros. > (onload): Likewise. > * config.h.in: Regenerate. > * configure: Regenerate. > --- > lto-plugin/config.h.in | 4 ++-- > lto-plugin/configure| 20 ++-- > lto-plugin/configure.ac | 17 +++-- > lto-plugin/lto-plugin.c | 30 -- > 4 files changed, 51 insertions(+), 20 deletions(-) > > diff --git a/lto-plugin/config.h.in b/lto-plugin/config.h.in > index 029e782f1ee..bf269f000d2 100644 > --- a/lto-plugin/config.h.in > +++ b/lto-plugin/config.h.in > @@ -9,8 +9,8 @@ > /* Define to 1 if you have the header file. */ > #undef HAVE_MEMORY_H > > -/* Define to 1 if pthread.h is present. */ > -#undef HAVE_PTHREAD_H > +/* Define if the system-provided pthread locking mechanism. */ > +#undef HAVE_PTHREAD_LOCKING > > /* Define to 1 if you have the header file. */ > #undef HAVE_STDINT_H > diff --git a/lto-plugin/configure b/lto-plugin/configure > index aaa91a63623..7ea54e6008f 100755 > --- a/lto-plugin/configure > +++ b/lto-plugin/configure > @@ -6011,14 +6011,22 @@ fi > > > # Check for thread headers. 
> -ac_fn_c_check_header_mongrel "$LINENO" "pthread.h" "ac_cv_header_pthread_h" > "$ac_includes_default" > -if test "x$ac_cv_header_pthread_h" = xyes; then : > +use_locking=no > > -$as_echo "#define HAVE_PTHREAD_H 1" >>confdefs.h > +case $target in > + riscv*) > +# do not use locking as pthread depends on libatomic > +;; > + *-linux*) > +use_locking=yes > +;; > +esac > > -fi > +if test x$use_locking = xyes; then > > +$as_echo "#define HAVE_PTHREAD_LOCKING 1" >>confdefs.h > > +fi > > case `pwd` in >*\ * | *\*) > @@ -12091,7 +12099,7 @@ else >lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 >lt_status=$lt_dlunknown >cat > conftest.$ac_ext <<_LT_EOF > -#line 12094 "configure" > +#line 12102 "configure" > #include "confdefs.h" > > #if HAVE_DLFCN_H > @@ -12197,7 +12205,7 @@ else >lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 >lt_status=$lt_dlunknown >cat > conftest.$ac_ext <<_LT_EOF > -#line 12200 "configure" > +#line 12208 "configure" > #include "confdefs.h" > > #if HAVE_DLFCN_H > diff --git a/lto-plugin/configure.ac b/lto-plugin/configure.ac > index c2ec512880f..69bc5139193 100644 > --- a/lto-plugin/configure.ac > +++ b/lto-plugin/configure.ac > @@ -88,8 +88,21 @@ AM_CONDITIONAL(LTO_PLUGIN_USE_SYMVER_GNU, [test > "x$lto_plugin_use_symver" = xgnu > AM_CONDITIONAL(LTO_PLUGIN_USE_SYMVER_SUN, [test "x$lto_plugin_use_symver" = > xsun]) > > # Check for thread headers. 
> -AC_CHECK_HEADER(pthread.h, > - [AC_DEFINE(HAVE_PTHREAD_H, 1, [Define to 1 if pthread.h is present.])]) > +use_locking=no > + > +case $target in > + riscv*) > +# do not use locking as pthread depends on libatomic > +;; > + *-linux*) > +use_locking=yes > +;; > +esac > + > +if test x$use_locking = xyes; then > + AC_DEFINE(HAVE_PTHREAD_LOCKING, 1, > + [Define if the system-provided pthread locking mechanism.]) > +fi > > AM_PROG_LIBTOOL > ACX_LT_HOST_FLAGS > diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c > index 635e126946b..cba58f5999b 100644 > --- a/lto-plugin/lto-plugin.c > +++ b/lto-plugin/lto-plugin.c > @@ -40,11 +40,7 @@ along with this program; see the file COPYING3. If not see > > #ifdef HAVE_CONFIG_H > #include "config.h" > -#if !HAVE_PTHREAD_H > -#error POSIX threads are mandatory dependency > #endif > -#endif > - > #if HAVE_STDINT_H > #include > #endif > @@ -59,7 +55,9 @@ along with this program; see the file COPYING3. If not see > #include > #include > #include > +#if HAVE_PTHREAD_LOCKING > #include > +#endif > #ifdef HAVE_SYS_WAIT_H > #include > #endif > @@ -162,9 +160,18 @@ enum symbol_style >ss_uscore, /* Underscore prefix all symbols. */ > }; > > +#if HAVE_PTHREAD_LOCKING > /* Plug-in mutex. */ > static pthread_mutex_t plugin_lock; > > +#define LOCK_SECTION pthread_mutex_lock (&plugin_lock) > +#define UNLOCK_SECTION pthread_mutex_unlock (&plugin_lock) > +#else > + > +#define LOCK_SECTION > +#define UNLOCK_SECTION > +#endif > + > static char *arguments_file_name; > static ld_plugin_register_claim_file register_claim_file; > static ld_plugin_register_all_symbols_r
Re: [PATCH v3] LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method when frame->mask or frame->fmask is zero.
On Thu, 2022-07-07 at 18:30 +0800, Lulu Cheng wrote: /* snip */ > diff --git a/gcc/testsuite/gcc.target/loongarch/prolog-opt.c > b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c > new file mode 100644 > index 000..c7bd71dde93 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c > @@ -0,0 +1,15 @@ > +/* Test that the LoongArch backend stack drop operation is optimized. */ > + > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mabi=lp64d" } */ > +/* { dg-final { scan-assembler "addi.d\t\\\$r3,\\\$r3,-16" } } */ > + > +#include <stdio.h> It's better to hard code "extern int printf(char *, ...);" here, so the test case won't unnecessarily depend on a libc header. LGTM otherwise. > + > +int main() > +{ > + char buf[1024 * 12]; > + printf ("%p\n", buf); > + return 0; > +} > +
Re: [PATCH] lto-plugin: use locking only for selected targets
Richard Biener via Gcc-patches writes: >> +if test x$use_locking = xyes; then >> + AC_DEFINE(HAVE_PTHREAD_LOCKING, 1, >> + [Define if the system-provided pthread locking mechanism.]) This isn't even a sentence. At least I cannot parse it. Besides, it seems to be misnamed since the test doesn't check if pthread_mutex_lock and friends are present on the target, but if they should be used. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH 08/17] openmp: -foffload-memory=pinned
Hi Andrew, On 07.07.22 12:34, Andrew Stubbs wrote: Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up. The option is intended to provide a performance boost to certain offload programs without modifying the code. ... gcc/ChangeLog: * omp-builtins.def (BUILT_IN_GOMP_ENABLE_PINNED_MODE): New. * omp-low.cc (omp_enable_pinned_mode): New function. (execute_lower_omp): Call omp_enable_pinned_mode. libgomp/ChangeLog: * config/linux/allocator.c (always_pinned_mode): New variable. (GOMP_enable_pinned_mode): New function. (linux_memspace_alloc): Disable pinning when always_pinned_mode set. (linux_memspace_calloc): Likewise. (linux_memspace_free): Likewise. (linux_memspace_realloc): Likewise. * libgomp.map: Add GOMP_enable_pinned_mode. * testsuite/libgomp.c/alloc-pinned-7.c: New test. ... ... --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -14620,6 +14620,68 @@ lower_omp (gimple_seq *body, omp_context *ctx) input_location = saved_location; } +/* Emit a constructor function to enable -foffload-memory=pinned + at runtime. Libgomp handles the OS mode setting, but we need to trigger + it by calling GOMP_enable_pinned mode before the program proper runs. */ + +static void +omp_enable_pinned_mode () Is there a reason not to use the mechanism of OpenMP's 'requires' directive for this? (Okay, I have to admit that the final patch was only committed on Monday. But still ...) It looks very similar in spirit. I don't know whether there are issues of having -foffload-memory=pinned in some TU and not, but that could be handled in a similar way to GOMP_REQUIRES_TARGET_USED. For requires, omp_requires_mask is streamed out if OMP_REQUIRES_TARGET_USED and g->have_offload. (For completeness, it also requires ENABLE_OFFLOADING.) This data is read in by all lto1 (in lto-cgraph.cc) and checked for consistency. This data is then also passed on to *mkoffload.cc. And in libgomp, it is processed by GOMP_register_ver. 
Likewise, the 'requires' mechanism could then also be used in '[PATCH 16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK'. Tobias
Re: [PATCH v3] LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method when frame->mask or frame->fmask is zero.
On 2022-07-07 at 7:51 PM, Xi Ruoyao wrote: On Thu, 2022-07-07 at 18:30 +0800, Lulu Cheng wrote: /* snip */ diff --git a/gcc/testsuite/gcc.target/loongarch/prolog-opt.c b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c new file mode 100644 index 000..c7bd71dde93 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c @@ -0,0 +1,15 @@ +/* Test that LoongArch backend stack drop operation optimized. */ + +/* { dg-do compile } */ +/* { dg-options "-O2 -mabi=lp64d" } */ +/* { dg-final { scan-assembler "addi.d\t\\\$r3,\\\$r3,-16" } } */ + +#include <stdio.h> It's better to hard code "extern int printf(char *, ...);" here, so the test case won't unnecessarily depend on a libc header. LGTM otherwise. OK! Thanks!:-)
[PATCH] lto-dump: Do not print output file
Right now the following is printed: lto-dump .file "" .ident "GCC: (GNU) 13.0.0 20220707 (experimental)" .section.note.GNU-stack,"",@progbits After the patch we print -help and do not emit any assembly output: lto-dump Usage: lto-dump [OPTION]... SUB_COMMAND [OPTION]... LTO dump tool command line options. -list [options] Dump the symbol list. -demangle Dump the demangled output. -defined-only Dump only the defined symbols. ... Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, Martin gcc/lto/ChangeLog: * lto-dump.cc (lto_main): Exit in the function as we don't want any LTO bytecode processing. gcc/ChangeLog: * toplev.cc (init_asm_output): Do not init asm_out_file. --- gcc/lto/lto-dump.cc | 16 ++-- gcc/toplev.cc | 2 +- 2 files changed, 11 insertions(+), 7 deletions(-) diff --git a/gcc/lto/lto-dump.cc b/gcc/lto/lto-dump.cc index f88486b5143..f3d852df51f 100644 --- a/gcc/lto/lto-dump.cc +++ b/gcc/lto/lto-dump.cc @@ -316,7 +316,10 @@ lto_main (void) { quiet_flag = true; if (flag_lto_dump_tool_help) -dump_tool_help (); +{ + dump_tool_help (); + exit (SUCCESS_EXIT_CODE); +} /* LTO is called as a front end, even though it is not a front end. Because it is called as a front end, TV_PHASE_PARSING and @@ -369,11 +372,12 @@ lto_main (void) { /* Dump specific gimple body of specified function. */ dump_body (); - return; } else if (flag_dump_callgraph) -{ - dump_symtab_graphviz (); - return; -} +dump_symtab_graphviz (); + else +dump_tool_help (); + + /* Exit right now. */ + exit (SUCCESS_EXIT_CODE); } diff --git a/gcc/toplev.cc b/gcc/toplev.cc index a24ad5db438..61d234a9ef4 100644 --- a/gcc/toplev.cc +++ b/gcc/toplev.cc @@ -721,7 +721,7 @@ init_asm_output (const char *name) "cannot open %qs for writing: %m", asm_file_name); } - if (!flag_syntax_only) + if (!flag_syntax_only && !(global_dc->lang_mask & CL_LTODump)) { targetm.asm_out.file_start (); -- 2.36.1
Re: [PATCH] lto-plugin: use locking only for selected targets
On 7/7/22 13:52, Rainer Orth wrote: > Richard Biener via Gcc-patches writes: > >>> +if test x$use_locking = xyes; then >>> + AC_DEFINE(HAVE_PTHREAD_LOCKING, 1, >>> + [Define if the system-provided pthread locking mechanism.]) > > This isn't even a sentence. At least I cannot parse it. Besides, it > seems to be misnamed since the test doesn't check if pthread_mutex_lock > and friends are present on the target, but if they should be used. You are right, fixed in v2. Martin > > Rainer > From 0c3380e64dc769b4bc00568395468bdf02e74f6f Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Thu, 7 Jul 2022 12:15:28 +0200 Subject: [PATCH] lto-plugin: use locking only for selected targets For now, support locking only for linux targets that are different from riscv* where the target depends on libatomic (and fails during bootstrap). PR lto/106170 lto-plugin/ChangeLog: * configure.ac: Configure HAVE_PTHREAD_LOCKING. * lto-plugin.c (LOCK_SECTION): New. (UNLOCK_SECTION): New. (claim_file_handler): Use the newly added macros. (onload): Likewise. * config.h.in: Regenerate. * configure: Regenerate. --- lto-plugin/config.h.in | 4 ++-- lto-plugin/configure| 21 + lto-plugin/configure.ac | 17 +++-- lto-plugin/lto-plugin.c | 29 +++-- 4 files changed, 53 insertions(+), 18 deletions(-) diff --git a/lto-plugin/config.h.in b/lto-plugin/config.h.in index 029e782f1ee..8eb9c8aa47d 100644 --- a/lto-plugin/config.h.in +++ b/lto-plugin/config.h.in @@ -9,8 +9,8 @@ /* Define to 1 if you have the header file. */ #undef HAVE_MEMORY_H -/* Define to 1 if pthread.h is present. */ -#undef HAVE_PTHREAD_H +/* Define if the system provides pthread locking mechanism. */ +#undef HAVE_PTHREAD_LOCKING /* Define to 1 if you have the header file. */ #undef HAVE_STDINT_H diff --git a/lto-plugin/configure b/lto-plugin/configure index aaa91a63623..870e49b2e62 100755 --- a/lto-plugin/configure +++ b/lto-plugin/configure @@ -6011,14 +6011,27 @@ fi # Check for thread headers. 
-ac_fn_c_check_header_mongrel "$LINENO" "pthread.h" "ac_cv_header_pthread_h" "$ac_includes_default" +use_locking=no + +case $target in + riscv*) +# do not use locking as pthread depends on libatomic +;; + *-linux*) +use_locking=yes +;; +esac + +if test x$use_locking = xyes; then + ac_fn_c_check_header_mongrel "$LINENO" "pthread.h" "ac_cv_header_pthread_h" "$ac_includes_default" if test "x$ac_cv_header_pthread_h" = xyes; then : -$as_echo "#define HAVE_PTHREAD_H 1" >>confdefs.h +$as_echo "#define HAVE_PTHREAD_LOCKING 1" >>confdefs.h fi +fi case `pwd` in *\ * | *\ *) @@ -12091,7 +12104,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 12094 "configure" +#line 12107 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -12197,7 +12210,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 12200 "configure" +#line 12213 "configure" #include "confdefs.h" #if HAVE_DLFCN_H diff --git a/lto-plugin/configure.ac b/lto-plugin/configure.ac index c2ec512880f..18eb4f60b0a 100644 --- a/lto-plugin/configure.ac +++ b/lto-plugin/configure.ac @@ -88,8 +88,21 @@ AM_CONDITIONAL(LTO_PLUGIN_USE_SYMVER_GNU, [test "x$lto_plugin_use_symver" = xgnu AM_CONDITIONAL(LTO_PLUGIN_USE_SYMVER_SUN, [test "x$lto_plugin_use_symver" = xsun]) # Check for thread headers. 
-AC_CHECK_HEADER(pthread.h, - [AC_DEFINE(HAVE_PTHREAD_H, 1, [Define to 1 if pthread.h is present.])]) +use_locking=no + +case $target in + riscv*) +# do not use locking as pthread depends on libatomic +;; + *-linux*) +use_locking=yes +;; +esac + +if test x$use_locking = xyes; then + AC_CHECK_HEADER(pthread.h, +[AC_DEFINE(HAVE_PTHREAD_LOCKING, 1, [Define if the system provides pthread locking mechanism.])]) +fi AM_PROG_LIBTOOL ACX_LT_HOST_FLAGS diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c index 635e126946b..7927dca60a4 100644 --- a/lto-plugin/lto-plugin.c +++ b/lto-plugin/lto-plugin.c @@ -40,11 +40,7 @@ along with this program; see the file COPYING3. If not see #ifdef HAVE_CONFIG_H #include "config.h" -#if !HAVE_PTHREAD_H -#error POSIX threads are mandatory dependency #endif -#endif - #if HAVE_STDINT_H #include #endif @@ -59,7 +55,9 @@ along with this program; see the file COPYING3. If not see #include #include #include +#if HAVE_PTHREAD_LOCKING #include +#endif #ifdef HAVE_SYS_WAIT_H #include #endif @@ -162,9 +160,17 @@ enum symbol_style ss_uscore, /* Underscore prefix all symbols. */ }; +#if HAVE_PTHREAD_LOCKING /* Plug-in mutex. */ static pthread_mutex_t plugin_lock; +#define LOCK_SECTION pthread_mutex_lock (&plugin_lock) +#define UNLOCK_SECTION pthread_mutex_unlock (&plugin_lock) +#else +#define LOCK_SECTION +#define UNLOCK_SECTION +#endif + static char
Re: [PATCH] lto-plugin: use locking only for selected targets
On 7/7/22 13:46, Richard Biener wrote:
> OK - that also resolves the mingw issue, correct?

Yes.

> I suppose we need to be careful to not advertise v1 API (which includes
> threadsafeness) when not HAVE_PTHREAD_LOCKING.

Will reflect that in the patch. I'm going to push it now.

Martin
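The heart of the patch above is a pair of section macros that compile away when pthread locking is unavailable. A minimal, self-contained sketch of that pattern follows; the static mutex initializer and the bump_counter function are illustrative additions, not code from the patch (the plugin itself initializes its mutex explicitly at load time rather than relying on a static initializer):

```c
#if HAVE_PTHREAD_LOCKING
#include <pthread.h>
static pthread_mutex_t plugin_lock = PTHREAD_MUTEX_INITIALIZER;
#define LOCK_SECTION pthread_mutex_lock (&plugin_lock)
#define UNLOCK_SECTION pthread_mutex_unlock (&plugin_lock)
#else
/* No pthread locking: the section macros expand to nothing.  */
#define LOCK_SECTION
#define UNLOCK_SECTION
#endif

static int counter;

/* Every shared-state access is bracketed by the two macros; in the
   single-threaded configuration this costs nothing.  */
int
bump_counter (void)
{
  LOCK_SECTION;
  int v = ++counter;
  UNLOCK_SECTION;
  return v;
}
```

Compiling with -DHAVE_PTHREAD_LOCKING=1 (and linking -lpthread) turns the macros into real mutex operations; without it the same source builds on targets where pthread would drag in libatomic.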
Fix one issue in OpenMP 'requires' directive diagnostics (was: [Patch][v5] OpenMP: Move omp requires checks to libgomp)
Hi! On 2022-07-01T23:08:16+0200, Tobias Burnus wrote: > Updated version attached – I hope I got everything right, but I start to > get tired, I am not 100% sure. ..., and so the obligatory copy'n'past-o ;-) crept in: > --- a/gcc/lto-cgraph.cc > +++ b/gcc/lto-cgraph.cc > @@ -1773,6 +1804,10 @@ input_offload_tables (bool do_force_output) >struct lto_file_decl_data **file_data_vec = lto_get_file_decl_data (); >struct lto_file_decl_data *file_data; >unsigned int j = 0; > + const char *requires_fn = NULL; > + tree requires_decl = NULL_TREE; > + > + omp_requires_mask = (omp_requires) 0; > >while ((file_data = file_data_vec[j++])) > { > @@ -1784,6 +1819,7 @@ input_offload_tables (bool do_force_output) >if (!ib) > continue; > > + tree tmp_decl = NULL_TREE; >enum LTO_symtab_tags tag > = streamer_read_enum (ib, LTO_symtab_tags, LTO_symtab_last_tag); >while (tag) > @@ -1799,6 +1835,7 @@ input_offload_tables (bool do_force_output) >LTO mode. */ > if (do_force_output) > cgraph_node::get (fn_decl)->mark_force_output (); > + tmp_decl = fn_decl; > } > else if (tag == LTO_symtab_variable) > { > @@ -1810,6 +1847,72 @@ input_offload_tables (bool do_force_output) >may be no refs to var_decl in offload LTO mode. 
*/ > if (do_force_output) > varpool_node::get (var_decl)->force_output = 1; > + tmp_decl = var_decl; > + } > + else if (tag == LTO_symtab_edge) > + { > + static bool error_emitted = false; > + HOST_WIDE_INT val = streamer_read_hwi (ib); > + > + if (omp_requires_mask == 0) > + { > + omp_requires_mask = (omp_requires) val; > + requires_decl = tmp_decl; > + requires_fn = file_data->file_name; > + } > + else if (omp_requires_mask != val && !error_emitted) > + { > + const char *fn1 = requires_fn; > + if (requires_decl != NULL_TREE) > + { > + while (DECL_CONTEXT (requires_decl) != NULL_TREE > + && TREE_CODE (requires_decl) != > TRANSLATION_UNIT_DECL) > + requires_decl = DECL_CONTEXT (requires_decl); > + if (requires_decl != NULL_TREE) > + fn1 = IDENTIFIER_POINTER (DECL_NAME (requires_decl)); > + } > + > + const char *fn2 = file_data->file_name; > + if (tmp_decl != NULL_TREE) > + { > + while (DECL_CONTEXT (tmp_decl) != NULL_TREE > + && TREE_CODE (tmp_decl) != TRANSLATION_UNIT_DECL) > + tmp_decl = DECL_CONTEXT (tmp_decl); > + if (tmp_decl != NULL_TREE) > + fn2 = IDENTIFIER_POINTER (DECL_NAME (requires_decl)); > + } ... here: tmp_decl' not 'requires_decl'. OK to push the attached "Fix one issue in OpenMP 'requires' directive diagnostics"? I'd even push that one "as obvious", but thought I'd ask whether you maybe have a quick idea about the XFAILs that I'm adding? (I'm otherwise not planning on resolving that issue at this time.) > + > + char buf1[sizeof ("unified_address, unified_shared_memory, " > + "reverse_offload")]; > + char buf2[sizeof ("unified_address, unified_shared_memory, " > + "reverse_offload")]; > + omp_requires_to_name (buf2, sizeof (buf2), > + val != OMP_REQUIRES_TARGET_USED > + ? 
val > + : (HOST_WIDE_INT) omp_requires_mask); > + if (val != OMP_REQUIRES_TARGET_USED > + && omp_requires_mask != OMP_REQUIRES_TARGET_USED) > + { > + omp_requires_to_name (buf1, sizeof (buf1), > + omp_requires_mask); > + error ("OpenMP % directive with non-identical " > + "clauses in multiple compilation units: %qs vs. " > + "%qs", buf1, buf2); > + inform (UNKNOWN_LOCATION, "%qs has %qs", fn1, buf1); > + inform (UNKNOWN_LOCATION, "%qs has %qs", fn2, buf2); > + } > + else > + { > + error ("OpenMP % directive with %qs specified " > + "only in some compilation units", buf2); > + inform (UNKNOWN_LOCATION, "%qs has %qs", > + val != OMP_REQUIRES_TARGET_USED ? fn2 : fn1, > + buf2); > + inform (UNKNOWN_LOCATION, "but %qs has not", > + val != OMP_REQUIRES_TARGET_USED ? f
[PATCH][RFC] More update-ssa speedup
When we do TODO_update_ssa_no_phi we already avoid computing dominance
frontiers for all blocks - it is worth also avoiding walking all
dominated blocks in the update domwalk and restricting the walk to the
SEME region containing the affected blocks.  We can do that by walking
the CFG in reverse from blocks_to_update to the common immediate
dominator, marking blocks in the region and telling the domwalk to STOP
when leaving it.

For an artificial testcase with N adjacent loops, each with one
unswitching opportunity, that takes the incremental SSA updating off
the -ftime-report radar:

 tree loop unswitching      :  11.25 (  3%)   0.09 (  5%)  11.53 (  3%)    36M (  9%)
 `- tree SSA incremental    :  35.74 (  9%)   0.07 (  4%)  36.65 (  9%)  2734k (  1%)

improves to

 tree loop unswitching      :  10.21 (  3%)   0.05 (  3%)  11.50 (  3%)    36M (  9%)
 `- tree SSA incremental    :   0.66 (  0%)   0.02 (  1%)   0.49 (  0%)  2734k (  1%)

For less localized updates the SEME region isn't likely constrained
enough, so I've restricted the extra work to TODO_update_ssa_no_phi
callers.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

It's probably the last change that makes a visible difference for
general update-ssa; any specialized manual updating that would be
possible as well would not show up in better numbers than above
(unless I manage to complicate the testcase more).

Comments?

Thanks,
Richard.

	* tree-into-ssa.cc (rewrite_mode::REWRITE_UPDATE_REGION): New.
	(rewrite_update_dom_walker::rewrite_update_dom_walker): Update.
	(rewrite_update_dom_walker::m_in_region_flag): New.
	(rewrite_update_dom_walker::before_dom_children): If the region
	to update is marked, STOP at exits.
	(rewrite_blocks): For REWRITE_UPDATE_REGION mark the region
	to be updated.
	(dump_update_ssa): Use bitmap_empty_p.
	(update_ssa): Likewise.  Use REWRITE_UPDATE_REGION when
	TODO_update_ssa_no_phi.
	* tree-cfgcleanup.cc (cleanup_tree_cfg_noloop): Account pending
	update_ssa to the caller.
--- gcc/tree-cfgcleanup.cc | 6 ++- gcc/tree-into-ssa.cc | 97 -- 2 files changed, 90 insertions(+), 13 deletions(-) diff --git a/gcc/tree-cfgcleanup.cc b/gcc/tree-cfgcleanup.cc index b9ff6896ce6..3535a7e28a4 100644 --- a/gcc/tree-cfgcleanup.cc +++ b/gcc/tree-cfgcleanup.cc @@ -1095,7 +1095,11 @@ cleanup_tree_cfg_noloop (unsigned ssa_update_flags) /* After doing the above SSA form should be valid (or an update SSA should be required). */ if (ssa_update_flags) -update_ssa (ssa_update_flags); +{ + timevar_pop (TV_TREE_CLEANUP_CFG); + update_ssa (ssa_update_flags); + timevar_push (TV_TREE_CLEANUP_CFG); +} /* Compute dominator info which we need for the iterative process below. */ if (!dom_info_available_p (CDI_DOMINATORS)) diff --git a/gcc/tree-into-ssa.cc b/gcc/tree-into-ssa.cc index 9f45e62c6d0..be71b629f97 100644 --- a/gcc/tree-into-ssa.cc +++ b/gcc/tree-into-ssa.cc @@ -240,7 +240,8 @@ enum rewrite_mode { /* Incrementally update the SSA web by replacing existing SSA names with new ones. See update_ssa for details. */ -REWRITE_UPDATE +REWRITE_UPDATE, +REWRITE_UPDATE_REGION }; /* The set of symbols we ought to re-write into SSA form in update_ssa. */ @@ -2155,11 +2156,14 @@ rewrite_update_phi_arguments (basic_block bb) class rewrite_update_dom_walker : public dom_walker { public: - rewrite_update_dom_walker (cdi_direction direction) -: dom_walker (direction, ALL_BLOCKS, (int *)(uintptr_t)-1) {} + rewrite_update_dom_walker (cdi_direction direction, int in_region_flag = -1) +: dom_walker (direction, ALL_BLOCKS, (int *)(uintptr_t)-1), + m_in_region_flag (in_region_flag) {} edge before_dom_children (basic_block) final override; void after_dom_children (basic_block) final override; + + int m_in_region_flag; }; /* Initialization of block data structures for the incremental SSA @@ -2179,6 +2183,10 @@ rewrite_update_dom_walker::before_dom_children (basic_block bb) /* Mark the unwind point for this block. 
*/ block_defs_stack.safe_push (NULL_TREE); + if (m_in_region_flag != -1 + && !(bb->flags & m_in_region_flag)) +return STOP; + if (!bitmap_bit_p (blocks_to_update, bb->index)) return NULL; @@ -2270,8 +2278,8 @@ rewrite_update_dom_walker::after_dom_children (basic_block bb ATTRIBUTE_UNUSED) WHAT indicates what actions will be taken by the renamer (see enum rewrite_mode). - BLOCKS are the set of interesting blocks for the dominator walker - to process. If this set is NULL, then all the nodes dominated + REGION is a SEME region of interesting blocks for the dominator walker + to process. If this set is invalid, then all the nodes dominated by ENTRY are walked. Otherwise, blocks dominated by ENTRY that are not present in
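The region-marking idea described in this patch can be sketched standalone: walk the CFG backwards from the blocks that need updating until the nearest common dominator is reached, marking everything visited. The array-based CFG (idom[]/preds[] tables) and the example graph below are invented for illustration; the real code operates on GCC's basic_block structures and records the mark in bb->flags so before_dom_children can return STOP outside the region.

```c
#include <string.h>

#define NBLOCKS 8
#define MAXPRED 4

static int idom[NBLOCKS];            /* immediate dominator, entry is 0 */
static int preds[NBLOCKS][MAXPRED];  /* predecessor lists, -1 terminated */
static int in_region[NBLOCKS];       /* stands in for the bb flag */

static int
dom_depth (int b)
{
  int d = 0;
  while (b != 0)
    {
      b = idom[b];
      d++;
    }
  return d;
}

static int
nearest_common_dominator (int a, int b)
{
  while (a != b)
    {
      if (dom_depth (a) >= dom_depth (b))
	a = idom[a];
      else
	b = idom[b];
    }
  return a;
}

/* Mark every block on a reverse path from a block in TO_UPDATE[] to
   their nearest common dominator HEAD; this is the SEME region the
   domwalk is then restricted to.  */
static void
mark_update_region (const int *to_update, int n)
{
  int head = to_update[0];
  for (int i = 1; i < n; i++)
    head = nearest_common_dominator (head, to_update[i]);

  memset (in_region, 0, sizeof in_region);
  in_region[head] = 1;		/* the single entry of the region */

  int worklist[NBLOCKS], wp = 0;
  for (int i = 0; i < n; i++)
    if (!in_region[to_update[i]])
      {
	in_region[to_update[i]] = 1;
	worklist[wp++] = to_update[i];
      }
  while (wp > 0)
    {
      int b = worklist[--wp];
      for (int j = 0; preds[b][j] != -1; j++)
	if (!in_region[preds[b][j]])
	  {
	    in_region[preds[b][j]] = 1;
	    worklist[wp++] = preds[b][j];
	  }
    }
}

/* Example CFG: 0 -> 1 -> {2,3} -> 4 -> 5.  */
static void
build_example_cfg (void)
{
  static const int id[NBLOCKS] = { 0, 0, 1, 1, 1, 4 };
  memcpy (idom, id, sizeof idom);
  for (int i = 0; i < NBLOCKS; i++)
    preds[i][0] = -1;
  preds[1][0] = 0; preds[1][1] = -1;
  preds[2][0] = 1; preds[2][1] = -1;
  preds[3][0] = 1; preds[3][1] = -1;
  preds[4][0] = 2; preds[4][1] = 3; preds[4][2] = -1;
  preds[5][0] = 4; preds[5][1] = -1;
}
```

Because the common dominator dominates every block to update, the backward walk can never escape past it, which is what makes the marked set a single-entry region the dominator walk can STOP at.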
Re: [GCC 13][PATCH] PR101836: Add a new option -fstrict-flex-array[=n] and use it in __builtin_object_size
> On Jul 7, 2022, at 4:02 AM, Richard Biener wrote: > > On Wed, Jul 6, 2022 at 4:20 PM Qing Zhao wrote: >> >> (Sorry for the late reply, just came back from a short vacation.) >> >>> On Jul 4, 2022, at 2:49 AM, Richard Biener >>> wrote: >>> >>> On Fri, Jul 1, 2022 at 5:32 PM Martin Sebor wrote: On 7/1/22 08:01, Qing Zhao wrote: > > >> On Jul 1, 2022, at 8:59 AM, Jakub Jelinek wrote: >> >> On Fri, Jul 01, 2022 at 12:55:08PM +, Qing Zhao wrote: >>> If so, comparing to the current implemenation to have all the checking >>> in middle-end, what’s the >>> major benefit of moving part of the checking into FE, and leaving the >>> other part in middle-end? >> >> The point is recording early what FIELD_DECLs could be vs. can't >> possibly be >> treated like flexible array members and just use that flag in the >> decisions >> in the current routines in addition to what it is doing. > > Okay. > > Based on the discussion so far, I will do the following: > > 1. Add a new flag “DECL_NOT_FLEXARRAY” to FIELD_DECL; > 2. In C/C++ FE, set the new flag “DECL_NOT_FLEXARRAY” for a FIELD_DECL > based on [0], [1], >[] and the option -fstrict-flex-array, and whether it’s the last field > of the DECL_CONTEXT. > 3. In Middle end, Add a new utility routine is_flexible_array_member_p, > which bases on >DECL_NOT_FLEXARRAY + array_at_struct_end_p to decide whether the array >reference is a real flexible array member reference. >>> >>> I would just update all existing users, not introduce another wrapper >>> that takes DECL_NOT_FLEXARRAY >>> into account additionally. >> >> Okay. >>> > > > Middle end currently is quite mess, array_at_struct_end_p, > component_ref_size, and all the phases that > use these routines need to be updated, + new testing cases for each of > the phases. > > > So, I still plan to separate the patch set into 2 parts: > > Part A:the above 1 + 2 + 3, and use these new utilities in > tree-object-size.cc to resolve PR101836 first. 
> Then kernel can use __FORTIFY_SOURCE correctly; > > Part B:update all other phases with the new utilities + new testing > cases + resolving regressions. > > Let me know if you have any comment and suggestion. It might be worth considering whether it should be possible to control the "flexible array" property separately for each trailing array member via either a #pragma or an attribute in headers that can't change the struct layout but that need to be usable in programs compiled with stricter -fstrict-flex-array=N settings. >>> >>> Or an decl attribute. >> >> Yes, it might be necessary to add a corresponding decl attribute >> >> strict_flex_array (N) >> >> Which is attached to a trailing structure array member to provide the user a >> finer control when -fstrict-flex-array=N is specified. >> >> So, I will do the following: >> >> >> *User interface: >> >> 1. command line option: >> -fstrict-flex-array=N (N=0, 1, 2, 3) >> 2. decl attribute: >> strict_flex_array (N) (N=0, 1, 2, 3) >> >> >> *Implementation: >> >> 1. Add a new flag “DECL_NOT_FLEXARRAY” to FIELD_DECL; >> 2. In C/C++ FE, set the new flag “DECL_NOT_FLEXARRAY” for a FIELD_DECL based >> on [0], [1], >> [], the option -fstrict-flex-array, the attribute strict_flex_array, >> and whether it’s the last field >> of the DECL_CONTEXT. >> 3. In Middle end, update all users of “array_at_struct_end_p" or >> “component_ref_size”, or any place that treats >>Trailing array as flexible array member with the new flag >> DECL_NOT_FLEXARRAY. >>(Still think we need a new consistent utility routine here). >> >> >> I still plan to separate the patch set into 2 parts: >> >> Part A:the above 1 + 2 + 3, and use these new utilities in >> tree-object-size.cc to resolve PR101836 first. >> Then kernel can use __FORTIFY_SOURCE correctly. >> Part B:update all other phases with the new utilities + new testing >> cases + resolving regressions. >> >> >> Let me know any more comment or suggestion. > > Sounds good. Part 3. 
is "optimization" and reasonable to do separately, I'm not sure you need
> 'B' (since we're not supposed to have new utilities), but instead I'd
> do '3.' as part of 'B', just changing the pieces that resolve PR101836
> for part 'A'.

Okay, I see. Then I will separate the patches to:

Part A: 1 + 2

Part B: In Middle end, use the new flag in tree-object-size.cc to
resolve PR101836, then kernel can use __FORTIFY_SOURCE correctly after
this;

Part C: in Middle end, use the new flag in all other places that use
"array_at_struct_end_p" or "component_ref_size" to make GCC consistently
behave for trailing arrays.

The reason I separat
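For readers following the thread, the trailing-array forms under discussion can be made concrete with a small example. The structs below are illustrative, not from the patch, and the level-to-form mapping sketched in the comment reflects the proposal as discussed here; the exact spelling of the attribute and the semantics of each level were still being settled:

```c
#include <stddef.h>

/* The trailing-array idioms whose "flexible array member" treatment
   -fstrict-flex-array=N is meant to control.  */
struct s_fam  { int n; char data[]; };   /* C99 flexible array member */
struct s_zero { int n; char data[0]; };  /* GNU zero-length extension */
struct s_one  { int n; char data[1]; };  /* pre-C99 "struct hack" */
struct s_four { int n; char data[4]; };  /* plain fixed-size tail */

/* Proposed semantics (higher N = stricter):
     N=0: any trailing array may be treated as flexible (historic GCC);
     N=1: only [], [0] and [1];
     N=2: only [] and [0];
     N=3: only the standard [] form.
   The per-field attribute would override the command-line level for
   headers that cannot change layout, e.g. (hypothetical spelling):

     struct s { int n; char tail[1] __attribute__ ((strict_flex_array (3))); };

   which would tell GCC that tail[] is never a flexible array.  */

/* Only the [] and [0] forms contribute no storage of their own.  */
size_t
tail_padding_demo (void)
{
  return sizeof (struct s_one) - sizeof (struct s_zero);
}
```

The distinction matters for __builtin_object_size: under strict settings, accesses past data[1] in s_one are out of bounds rather than a flexible tail, which is exactly what PR101836 and the kernel's __FORTIFY_SOURCE use depend on.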
Re: Fix one issue in OpenMP 'requires' directive diagnostics (was: [Patch][v5] OpenMP: Move omp requires checks to libgomp)
Hi Thomas, On 07.07.22 15:26, Thomas Schwinge wrote: On 2022-07-01T23:08:16+0200, Tobias Burnus wrote: Updated version attached – I hope I got everything right, but I start to get tired, I am not 100% sure. ..., and so the obligatory copy'n'past-o;-) crept in: ... + if (tmp_decl != NULL_TREE) +fn2 = IDENTIFIER_POINTER (DECL_NAME (requires_decl)); +} ... here: tmp_decl' not 'requires_decl'. OK to push the attached "Fix one issue in OpenMP 'requires' directive diagnostics"? Good that you spotted it and thanks for testing + fixing it! I'd even push that one "as obvious", but thought I'd ask whether you maybe have a quick idea about the XFAILs that I'm adding? (I'm otherwise not planning on resolving that issue at this time.) (This question relates to what's printed if there is no TRANSLATION_UNIT_DECL.) * * * Pre-remark - the code does: * If there is any offload_func or offload_var DECL in the TU, it uses that one for diagnostic. This is always the case if there is a 'declare target' or 'omp target' but not when there is only 'omp target data'. → In real-world code it likely has a proper name. * Otherwise, it takes the file name of file_data->file_name. With -save-temps, that's based on the input files, which gives a useful output. When using gcc -c *.c gcc *.o the file name is .o - which is also quite useful. However, when doing gcc *.c which combines compiling and linking in one step, the filename is /tmp/cc*.o which is not that helpful. There is no real way to avoid this, unless we explicitly store the filename or some location_t for the 'requires'. But the used LTO writer does not support it directly. It can be fixed, but requires some re-organization and increased intermittent .o file size. (Cf. https://gcc.gnu.org/pipermail/gcc-patches/2022-June/597496.html for what it needed and why it does not work.) However, in the real world, there should be usually a proper message as (a) It is unlikely to have code which only does 'omp target ... 
data' transfer - and no 'omp target' and (b) for larger code, separating compilation and linking is common while for smaller code, 'requires' mismatches is less likely and also easier to find the file causing the issue. Still, like always, having a nice diagnostic would not harm :-) * * * Subject: [PATCH] Fix one issue in OpenMP 'requires' directive diagnostics Fix-up for recent commit 683f11843974f0bdf42f79cdcbb0c2b43c7b81b0 "OpenMP: Move omp requires checks to libgomp". gcc/ * lto-cgraph.cc (input_offload_tables) : Correct 'fn2' computation. libgomp/ * testsuite/libgomp.c-c++-common/requires-1.c: Add 'dg-note's. * testsuite/libgomp.c-c++-common/requires-2.c: Likewise. * testsuite/libgomp.c-c++-common/requires-3.c: Likewise. * testsuite/libgomp.c-c++-common/requires-7.c: Likewise. * testsuite/libgomp.fortran/requires-1.f90: Likewise. Regarding the patch, it adds 'dg-note' for the location data - and fixes the wrong-decl-use bug. Those LGTM - Thanks! Regarding the xfail part: --- a/libgomp/testsuite/libgomp.c-c++-common/requires-7.c ... -/* { dg-error "OpenMP 'requires' directive with non-identical clauses in multiple compilation units: 'unified_shared_memory' vs. 'unified_address'" "" { target *-*-* } 0 } */ +/* { dg-error "OpenMP 'requires' directive with non-identical clauses in multiple compilation units: 'unified_shared_memory' vs. 'unified_address'" "" { target *-*-* } 0 } + { dg-note {requires-7\.c' has 'unified_shared_memory'} {} { target *-*-* } 0 } + TODO There is some issue that we're not seeing the source file name here (but a temporary '*.o' instead): + { dg-note {requires-7-aux\.c' has 'unified_address'} {} { xfail *-*-* } 0 } + ..., so verify that at least the rest of the diagnostic is correct: + { dg-note {' has 'unified_address'} {} { target *-*-* } 0 } */ The requires-7-aux.c file uses, on purpose, only 'omp target enter data' to trigger the .o name in 'inform' as no decl is written to offload_func/offload_vars for that TU. 
As the testsuite compiles+links the two requires-7*.c files in one step and is invoked without -save-temps, the used object file names will be /tmp/cc*.o. * * * Regarding the xfail: I think it is fine to have this xfail, but as it is clear why inform points to /tmp/cc*.o, you could reword the TODO to state why it goes wrong. Thanks, Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[PATCH] match.pd: Add new bitwise arithmetic pattern [PR98304]
Hi!

This patch is meant to solve a missed optimization in match.pd.  It
optimizes the following expression:

  n - (((n > 63) ? n : 63) & -64)

where the masked constant (in this case -64) is a negated power of 2
and the sum of the two constants is -1.  For the signed case, this gets
optimized to (n <= 63) ? n : (n & 63).  For the unsigned case, it gets
optimized to (n & 63).  In both scenarios, the number of instructions
produced decreases.

There are also tests for this optimization making sure the optimization
happens when it is supposed to, and does not happen when it isn't.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

	PR tree-optimization/98304

gcc/ChangeLog:

	* match.pd (n - (((n > C1) ? n : C1) & -C2)): New simplification.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/execute/pr98304-2.c: New test.
	* gcc.dg/pr98304-1.c: New test.
---
 gcc/match.pd                                    | 12
 .../gcc.c-torture/execute/pr98304-2.c           | 37
 gcc/testsuite/gcc.dg/pr98304-1.c                | 57 +++
 3 files changed, 106 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr98304-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr98304-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 88c6c414881..45aefd96688 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7836,3 +7836,15 @@ and,
 (match (bitwise_induction_p @0 @2 @3)
  (bit_not (nop_convert1? (bit_xor@0 (convert2? (lshift integer_onep@1 @2)) @3
+
+/* n - (((n > C1) ? n : C1) & -C2) -> n & C1 for unsigned case.
+   n - (((n > C1) ? n : C1) & -C2) -> (n <= C1) ? n : (n & C1) for signed case.  */
+(simplify
+ (minus @0 (bit_and (max @0 INTEGER_CST@1) INTEGER_CST@2))
+ (with { auto i = wi::neg (wi::to_wide (@2)); }
+  /* Check if -C2 is a power of 2 and C1 = -C2 - 1.
     */
+  (if (wi::popcount (i) == 1
+       && (wi::to_wide (@1)) == (i - 1))
+   (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
+    (bit_and @0 @1)
+    (cond (le @0 @1) @0 (bit_and @0 @1))
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr98304-2.c b/gcc/testsuite/gcc.c-torture/execute/pr98304-2.c
new file mode 100644
index 000..114c612db3b
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr98304-2.c
@@ -0,0 +1,37 @@
+/* PR tree-optimization/98304 */
+
+#include "../../gcc.dg/pr98304-1.c"
+
+/* Runtime tests. */
+int main() {
+
+    /* Signed tests. */
+    if (foo(-42) != -42
+        || foo(0) != 0
+        || foo(63) != 63
+        || foo(64) != 0
+        || foo(65) != 1
+        || foo(99) != 35) {
+        __builtin_abort();
+    }
+
+    /* Unsigned tests. */
+    if (bar(42) != 42
+        || bar(0) != 0
+        || bar(63) != 63
+        || bar(64) != 0
+        || bar(65) != 1
+        || bar(99) != 35) {
+        __builtin_abort();
+    }
+
+    /* Should not simplify. */
+    if (corge(13) != 13
+        || thud(13) != 13
+        || qux(13) != 13
+        || quux(13) != 13) {
+        __builtin_abort();
+    }
+
+    return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/pr98304-1.c b/gcc/testsuite/gcc.dg/pr98304-1.c
new file mode 100644
index 000..dce54ddffe8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr98304-1.c
@@ -0,0 +1,57 @@
+/* PR tree-optimization/98304 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+/* Signed test function. */
+__attribute__((noipa)) int foo(int n) {
+    return n - (((n > 63) ? n : 63) & -64);
+}
+
+/* Unsigned test function. */
+__attribute__((noipa)) unsigned int bar(unsigned int n) {
+    return n - (((n > 63) ? n : 63) & -64);
+}
+
+/* Different power of 2. */
+__attribute__((noipa)) int goo(int n) {
+    return n - (((n > 31) ? n : 31) & -32);
+}
+
+/* Commutative property (should be identical to foo) */
+__attribute__((noipa)) int baz(int n) {
+    return n - (((64 > n) ? 63 : n) & -64);
+}
+
+/* < instead of >. */
+__attribute__((noipa)) int fred(int n) {
+    return n - (((63 < n) ? n : 63) & -64);
+}
+
+/* Constant is not a power of 2 so should not simplify.
 */
+__attribute__((noipa)) int qux(int n) {
+    return n - (((n > 62) ? n : 62) & -63);
+}
+
+/* Constant is not a power of 2 so should not simplify. */
+__attribute__((noipa)) unsigned int quux(unsigned int n) {
+    return n - (((n > 62) ? n : 62) & -63);
+}
+
+/* Constant is a variable so should not simplify. */
+__attribute__((noipa)) int waldo(int n, int x) {
+    return n - (((n > 63) ? n : 63) & x);
+}
+
+/* Difference between constants is not -1. */
+__attribute__((noipa)) int corge(int n) {
+    return n - (((n > 1) ? n : 1) & -64);
+}
+
+/* Difference between constants is not -1. */
+__attribute__((noipa)) unsigned int thud(unsigned int n)
+{
+    return n - (((n > 1) ? n : 1) & -64);
+}
+
+/* { dg-final { scan-tree-dump-times " - " 5 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " <= " 4 "optimized" } } */

base-commit: a8b5d63503b8cf49de32d241218057409f8731ac
--
2.31.1
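The transformation can be sanity-checked outside the compiler by comparing the original and simplified expressions over a range of values. The helpers below mirror the C1 = 63 / C2 = 64 instance of the pattern (like foo/bar in the testcase) but are otherwise illustrative:

```c
/* Original expression, signed.  */
static int
orig_signed (int n)
{
  return n - (((n > 63) ? n : 63) & -64);
}

/* Simplified form for the signed case.  */
static int
simp_signed (int n)
{
  return n <= 63 ? n : (n & 63);
}

/* Original expression, unsigned.  */
static unsigned
orig_unsigned (unsigned n)
{
  return n - (((n > 63u) ? n : 63u) & -64u);
}

/* Simplified form for the unsigned case.  */
static unsigned
simp_unsigned (unsigned n)
{
  return n & 63u;
}

/* Returns 1 iff original and simplified forms agree on the range.  */
int
check_range (void)
{
  for (int n = -1024; n <= 1024; n++)
    if (orig_signed (n) != simp_signed (n))
      return 0;
  for (unsigned n = 0; n <= 4096; n++)
    if (orig_unsigned (n) != simp_unsigned (n))
      return 0;
  return 1;
}
```

The unsigned case works for all n because n - (n & ~63) is exactly n & 63; the signed case needs the n <= 63 guard since 63 & -64 is 0 and the subtraction then leaves n unchanged, including for negative n.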
Re: [PATCH][RFC] More update-ssa speedup
On 7/7/2022 7:33 AM, Richard Biener via Gcc-patches wrote: When we do TODO_update_ssa_no_phi we already avoid computing dominance frontiers for all blocks - it is worth to also avoid walking all dominated blocks in the update domwalk and restrict the walk to the SEME region with the affected blocks. We can do that by walking the CFG in reverse from blocks_to_update to the common immediate dominator, marking blocks in the region and telling the domwalk to STOP when leaving it. For an artificial testcase with N adjacent loops with one unswitching opportunity that takes the incremental SSA updating off the -ftime-report radar: tree loop unswitching : 11.25 ( 3%) 0.09 ( 5%) 11.53 ( 3%)36M ( 9%) `- tree SSA incremental: 35.74 ( 9%) 0.07 ( 4%) 36.65 ( 9%) 2734k ( 1%) improves to tree loop unswitching : 10.21 ( 3%) 0.05 ( 3%) 11.50 ( 3%)36M ( 9%) `- tree SSA incremental: 0.66 ( 0%) 0.02 ( 1%) 0.49 ( 0%) 2734k ( 1%) for less localized updates the SEME region isn't likely constrained enough so I've restricted the extra work to TODO_update_ssa_no_phi callers. Bootstrap & regtest running on x86_64-unknown-linux-gnu. It's probably the last change that makes a visible difference for general update-ssa and any specialized manual updating that would be possible as well would not show up in better numbers than above (unless I manage to complicate the testcase more). Comments? Thanks, Richard. * tree-into-ssa.cc (rewrite_mode::REWRITE_UPDATE_REGION): New. (rewrite_update_dom_walker::rewrite_update_dom_walker): Update. (rewrite_update_dom_walker::m_in_region_flag): New. (rewrite_update_dom_walker::before_dom_children): If the region to update is marked, STOP at exits. (rewrite_blocks): For REWRITE_UPDATE_REGION mark the region to be updated. (dump_update_ssa): Use bitmap_empty_p. (update_ssa): Likewise. Use REWRITE_UPDATE_REGION when TODO_update_ssa_no_phi. * tree-cfgcleanup.cc (cleanup_tree_cfg_noloop): Account pending update_ssa to the caller. 
Overall concept seems quite reasonable to me -- it also largely mirrors ideas I was exploring for incremental DOM many years ago. Jeff
[RFA] Improve initialization of objects when the initializer has trailing zeros.
This is an update to a patch originally posted by Takayuki Suwa a few
months ago.

When we initialize an array from a STRING_CST we perform the
initialization in two steps.  The first step copies the STRING_CST to
the destination.  The second step uses clear_storage to initialize
storage in the array beyond TREE_STRING_LENGTH of the initializer.

Takayuki's patch added a special case when the STRING_CST itself was
all zeros which would avoid the copy from the STRING_CST and instead do
all the initialization via clear_storage which is clearly more runtime
efficient.

Richie had the suggestion that instead of special casing when the
entire STRING_CST was NULs to instead identify when the tail of the
STRING_CST was NULs.  That's more general and handles Takayuki's case
as well.

Bootstrapped and regression tested on x86_64-linux-gnu.  Given I
rewrote Takayuki's patch I think it needs someone else to review rather
than self-approving.

OK for the trunk?

Jeff

	* expr.cc (store_expr): Identify trailing NULs in a STRING_CST
	initializer and use clear_storage rather than copying the NULs
	to the destination array.

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 62297379ec9..f94d46b969c 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6087,6 +6087,17 @@ store_expr (tree exp, rtx target, int call_param_p,
 	}
 
       str_copy_len = TREE_STRING_LENGTH (str);
+
+      /* Trailing NUL bytes in EXP will be handled by the call to
+	 clear_storage, which is more efficient than copying them from
+	 the STRING_CST, so trim those from STR_COPY_LEN.  */
+      while (str_copy_len)
+	{
+	  if (TREE_STRING_POINTER (str)[str_copy_len - 1])
+	    break;
+	  str_copy_len--;
+	}
+
       if ((STORE_MAX_PIECES & (STORE_MAX_PIECES - 1)) == 0)
 	{
 	  str_copy_len += STORE_MAX_PIECES - 1;
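At the level of library calls, the effect of the change is analogous to the following sketch; the function and its interface are invented for illustration (store_expr of course emits RTL rather than memcpy/memset calls):

```c
#include <assert.h>
#include <string.h>

/* Initialize DST[0..DST_LEN) from a string constant SRC of SRC_LEN
   bytes, clearing everything beyond the constant.  Trimming the
   trailing NULs of SRC first means the single clearing step covers
   them, instead of copying zeros and then clearing more zeros.  */
static void
init_from_string (char *dst, size_t dst_len,
		  const char *src, size_t src_len)
{
  /* Mirror of the new loop in store_expr: drop trailing NULs from the
     copy length; the clearing step will produce them anyway.  */
  while (src_len && src[src_len - 1] == '\0')
    src_len--;

  assert (src_len <= dst_len);
  memcpy (dst, src, src_len);                    /* non-NUL prefix */
  memset (dst + src_len, 0, dst_len - src_len);  /* everything else */
}
```

For an initializer like `char buf[100] = "ab";` this reduces the copied portion to two bytes and lets one clear cover the remaining 98, which is the runtime win the patch is after.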
Re: [PATCH] c++: generic targs and identity substitution [PR105956]
On Thu, 7 Jul 2022, Jason Merrill wrote: > On 7/6/22 15:26, Patrick Palka wrote: > > On Tue, 5 Jul 2022, Jason Merrill wrote: > > > > > On 7/5/22 10:06, Patrick Palka wrote: > > > > On Fri, 1 Jul 2022, Jason Merrill wrote: > > > > > > > > > On 6/29/22 13:42, Patrick Palka wrote: > > > > > > In r13-1045-gcb7fd1ea85feea I assumed that substitution into generic > > > > > > DECL_TI_ARGS corresponds to an identity mapping of the given > > > > > > arguments, > > > > > > and hence its safe to always elide such substitution. But this PR > > > > > > demonstrates that such a substitution isn't always the identity > > > > > > mapping, > > > > > > in particular when there's an ARGUMENT_PACK_SELECT argument, which > > > > > > gets > > > > > > handled specially during substitution: > > > > > > > > > > > > * when substituting an APS into a template parameter, we strip > > > > > > the > > > > > >APS to its underlying argument; > > > > > > * and when substituting an APS into a pack expansion, we strip > > > > > > the > > > > > >APS to its underlying argument pack. > > > > > > > > > > Ah, right. For instance, in variadic96.C we have > > > > > > > > > > 10 template < typename... T > > > > > > 11 struct derived > > > > > 12: public base< T, derived< T... > >... > > > > > > > > > > so when substituting into the base-specifier, we're approaching it > > > > > from > > > > > the > > > > > outside in, so when we get to the inner T... we need some way to find > > > > > the > > > > > T > > > > > pack again. It might be possible to remove the need for APS by > > > > > substituting > > > > > inner pack expansions before outer ones, which could improve > > > > > worst-case > > > > > complexity, but I don't know how relevant that is in real code; I > > > > > imagine > > > > > most > > > > > inner pack expansions are as simple as this one. > > > > > > > > Aha, that makes sense. > > > > > > > > > > > > > > > In this testcase, when expanding the pack expansion pattern (idx + > > > > > > Ns)... 
> > > > > > with Ns={0,1}, we specialize idx twice, first with Ns=APS<0,{0,1}> > > > > > > and > > > > > > then Ns=APS<1,{0,1}>. The DECL_TI_ARGS of idx are the generic > > > > > > template > > > > > > arguments of the enclosing class template impl, so before r13-1045, > > > > > > we'd substitute into its DECL_TI_ARGS which gave Ns={0,1} as > > > > > > desired. > > > > > > But after r13-1045, we elide this substitution and end up attempting > > > > > > to > > > > > > hash the original Ns argument, an APS, which ICEs. > > > > > > > > > > > > So this patch partially reverts this part of r13-1045. I considered > > > > > > using preserve_args in this case instead, but that'd break the > > > > > > static_assert in the testcase because preserve_args always strips > > > > > > APS to > > > > > > its underlying argument, but here we want to strip it to its > > > > > > underlying > > > > > > argument pack, so we'd incorrectly end up forming the > > > > > > specializations > > > > > > impl<0>::idx and impl<1>::idx instead of impl<0,1>::idx. > > > > > > > > > > > > Although we can't elide the substitution into DECL_TI_ARGS in light > > > > > > of > > > > > > ARGUMENT_PACK_SELECT, it should still be safe to elide template > > > > > > argument > > > > > > coercion in the case of a non-template decl, which this patch > > > > > > preserves. > > > > > > > > > > > > It's unfortunate that we need to remove this optimization just > > > > > > because > > > > > > it doesn't hold for one special tree code. So this patch implements > > > > > > a > > > > > > heuristic in tsubst_template_args to avoid allocating a new TREE_VEC > > > > > > if > > > > > > the substituted elements are identical to those of a level from > > > > > > ARGS. > > > > > > It turns out that about 30% of all calls to tsubst_template_args > > > > > > benefit > > > > > > from this optimization, and it reduces memory usage by about 1.5% > > > > > > for > > > > > > e.g. stdc++.h (relative to r13-1045). 
(This is the maybe_reuse > > > > > > stuff, > > > > > > the rest of the changes to tsubst_template_args are just drive-by > > > > > > cleanups.) > > > > > > > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK > > > > > > for > > > > > > trunk? Patch generated with -w to ignore noisy whitespace changes. > > > > > > > > > > > > PR c++/105956 > > > > > > > > > > > > gcc/cp/ChangeLog: > > > > > > > > > > > > * pt.cc (tsubst_template_args): Move variable declarations > > > > > > closer to their first use. Replace 'orig_t' with 'r'. Rename > > > > > > 'need_new' to 'const_subst_p'. Heuristically detect if the > > > > > > substituted elements are identical to that of a level from > > > > > > 'args' and avoid allocating a new TREE_VEC if so. > > > > > > (tsubst_decl) : Revert > > > > > > r13-1045-gcb7fd1ea85feea change for avoiding substitution into > > > > > > DECL
Re: [PATCH] Add a heuristic for eliminate redundant load and store in inline pass.
Hello,

> From: Lili
>
> Hi Hubicka,
>
> This patch is to add a heuristic inline hint to eliminate redundant
> load and store.
>
> Bootstrap and regtest pending on x86_64-unknown-linux-gnu.
> OK for trunk?
>
> Thanks,
> Lili.
>
> Add an INLINE_HINT_eliminate_load_and_store hint into the inline pass.
> We accumulate the insn number of redundant loads and stores that can
> be reduced by these three cases; when the count is greater than the
> threshold, we will enable the hint.  With the hint, inlining_insns_auto
> will enlarge the bound.
>
> 1. Caller's store is same with callee's load
> 2. Caller's load is same with callee's load
> 3. Callee's load is same with caller's local memory access
>
> With the patch applied:
>
> Icelake server: 538.imagick_r gets 14.10% improvement for multicopy
> and 38.90% improvement for single copy with no measurable changes for
> other benchmarks.
>
> Cascadelake: 538.imagick_r gets 12.5% improvement for multicopy with
> code size increased by 0.2%.  With no measurable changes for other
> benchmarks.
>
> Znver3 server: 538.imagick_r gets 14.20% improvement for multicopy
> with code size increased by 0.3%.  With no measurable changes for
> other benchmarks.
> CPU2017 single copy performance data for Icelake server
>
> BenchMarks        Score    Build time  Code size
> 500.perlbench_r   1.50%    -0.20%       0.00%
> 502.gcc_r         0.10%    -0.10%       0.00%
> 505.mcf_r         0.00%     1.70%       0.00%
> 520.omnetpp_r    -0.60%    -0.30%       0.00%
> 523.xalancbmk_r   0.60%     0.00%       0.00%
> 525.x264_r        0.00%    -0.20%       0.00%
> 531.deepsjeng_r   0.40%    -1.10%      -0.10%
> 541.leela_r       0.00%     0.00%       0.00%
> 548.exchange2_r   0.00%    -0.90%       0.00%
> 557.xz_r          0.00%     0.00%       0.00%
> 503.bwaves_r      0.00%     1.40%       0.00%
> 507.cactuBSSN_r   0.00%     1.00%       0.00%
> 508.namd_r        0.00%     0.30%       0.00%
> 510.parest_r      0.00%    -0.40%       0.00%
> 511.povray_r      0.70%    -0.60%       0.00%
> 519.lbm_r         0.00%     0.00%       0.00%
> 521.wrf_r         0.00%     0.60%       0.00%
> 526.blender_r     0.00%     0.00%       0.00%
> 527.cam4_r       -0.30%    -0.50%       0.00%
> 538.imagick_r    38.90%     0.50%       0.20%
> 544.nab_r         0.00%     1.10%       0.00%
> 549.fotonik3d_r   0.00%     0.90%       0.00%
> 554.roms_r        2.30%    -0.10%       0.00%
> Geomean-int       0.00%    -0.30%       0.00%
> Geomean-fp        3.80%     0.30%       0.00%

This is an interesting idea. Basically we want to guess if inlining will make SRA and/or store->load propagation possible. I think the solution using INLINE_HINT may be a bit too trigger-happy, since it is very common that this happens, and with -O3 the hints are taken quite seriously. We already have a mechanism to predict this situation by simply expecting that stores to addresses pointed to by function parameters will be eliminated by 50%; see eliminated_by_inlining_prob. I was thinking that we may combine it with the knowledge that the parameter points to caller-local memory (which is what LLVM's heuristic does), which can be added to IPA predicates. The idea of checking that the actual store in question is paired with a load on the caller side is a bit harder: one needs to invent a representation for such conditions. So I wonder how much extra help we need for the critical inlining to happen in ImageMagick?
Honza > > gcc/ChangeLog: > > * ipa-fnsummary.cc (ipa_dump_hints): Add print for hint > "eliminate_load_and_store" > * ipa-fnsummary.h (enum ipa_hints_vals): Add > INLINE_HINT_eliminate_load_and_store. > * ipa-inline-analysis.cc (do_estimate_edge_time): Add judgment for > INLINE_HINT_eliminate_load_and_store. > * ipa-inline.cc (want_inline_small_function_p): Add > "INLINE_HINT_eliminate_load_and_store" for hints flag. > * ipa-modref-tree.h (struct modref_access_node): Move function contains > to public.. > (struct modref_tree): Add new function "same" and > "local_vector_memory_accesse" > * ipa-modref.cc (eliminate_load_and_store): New. > (ipa_merge_modref_summary_after_inlining): Change the input value of > useful_p. > * ipa-modref.h (eliminate_load_and_store): New. > * opts.cc: Add param "min_inline_hint_eliminate_loads_num" > * params.opt: Ditto. > > gcc/testsuite/ChangeLog: > > * gcc.dg/ipa/inlinehint-6.c: New test. > --- > gcc/ipa-fnsummary.cc| 5 ++ > gcc/ipa-fnsummary.h | 4 +- > gcc/ipa-inline-analysis.cc | 7 ++ > gcc/ipa-inline.cc | 3 +- > gcc/ipa-modref-tree.h | 109 +++- > gcc/ipa-modref.cc | 46 +- > gcc/ipa-modref.h| 1 + > gcc/opts.cc | 1 + > gcc/params.opt | 4 + > gcc/testsuite/gcc.
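For reference, the kind of redundancy the three cases in the patch description target can be sketched in plain code (function names are made up for illustration; this is not code from the patch):

```cpp
#include <cassert>

struct vec { int x, y; };

/* Callee: its loads of p->x and p->y match the caller's stores
   (case 1) once the call is inlined.  */
static int sum(const struct vec *p)
{
  return p->x + p->y;
}

int caller(void)
{
  struct vec v;   /* caller-local memory (case 3) */
  v.x = 3;        /* caller stores ...            */
  v.y = 4;
  /* After inlining, store->load forwarding / SRA can remove both the
     stores and the loads entirely, so the inlined body ends up cheaper
     than the size estimate for the out-of-line callee suggests.  */
  return sum(&v);
}
```

This is also why a plain size-based estimate undervalues inlining here: the benefit only materializes once the caller's stores and the callee's loads meet in one body.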
[PATCH v2] ipa-visibility: Optimize TLS access [PR99619]
From: Artem Klimov Fix PR99619, which asks to optimize the TLS model based on visibility. The fix is implemented as an IPA optimization: this allows taking the optimized visibility status into account (as well as avoiding modifying all language frontends). 2022-04-17 Artem Klimov gcc/ChangeLog: * ipa-visibility.cc (function_and_variable_visibility): Promote TLS access model after visibility optimizations. * varasm.cc (have_optimized_refs): New helper. (optimize_dyn_tls_for_decl_p): New helper. Use it ... (decl_default_tls_model): ... here in place of 'optimize' check. gcc/testsuite/ChangeLog: * gcc.dg/tls/vis-attr-gd.c: New test. * gcc.dg/tls/vis-attr-hidden-gd.c: New test. * gcc.dg/tls/vis-attr-hidden.c: New test. * gcc.dg/tls/vis-flag-hidden-gd.c: New test. * gcc.dg/tls/vis-flag-hidden.c: New test. * gcc.dg/tls/vis-pragma-hidden-gd.c: New test. * gcc.dg/tls/vis-pragma-hidden.c: New test. Co-Authored-By: Alexander Monakov Signed-off-by: Artem Klimov --- v2: run the new loop in ipa-visibility only in the whole-program IPA pass; in decl_default_tls_model, check if any referring function is optimized when 'optimize == 0' (when running in LTO mode) Note for reviewers: I noticed there's a place which tries to avoid TLS promotion, but the comment seems wrong and I could not find a testcase. I'd suggest we remove it. The compiler can only promote general-dynamic to local-dynamic and initial-exec to local-exec.
The comment refers to promoting x-dynamic to y-exec, but that cannot happen AFAICT: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=8e1ba78f1b8eedd6c65c6f0e6d6d09a801de5d3d gcc/ipa-visibility.cc | 19 +++ gcc/testsuite/gcc.dg/tls/vis-attr-gd.c| 12 +++ gcc/testsuite/gcc.dg/tls/vis-attr-hidden-gd.c | 13 gcc/testsuite/gcc.dg/tls/vis-attr-hidden.c| 12 +++ gcc/testsuite/gcc.dg/tls/vis-flag-hidden-gd.c | 13 gcc/testsuite/gcc.dg/tls/vis-flag-hidden.c| 12 +++ .../gcc.dg/tls/vis-pragma-hidden-gd.c | 17 ++ gcc/testsuite/gcc.dg/tls/vis-pragma-hidden.c | 16 ++ gcc/varasm.cc | 32 ++- 9 files changed, 145 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/tls/vis-attr-gd.c create mode 100644 gcc/testsuite/gcc.dg/tls/vis-attr-hidden-gd.c create mode 100644 gcc/testsuite/gcc.dg/tls/vis-attr-hidden.c create mode 100644 gcc/testsuite/gcc.dg/tls/vis-flag-hidden-gd.c create mode 100644 gcc/testsuite/gcc.dg/tls/vis-flag-hidden.c create mode 100644 gcc/testsuite/gcc.dg/tls/vis-pragma-hidden-gd.c create mode 100644 gcc/testsuite/gcc.dg/tls/vis-pragma-hidden.c diff --git a/gcc/ipa-visibility.cc b/gcc/ipa-visibility.cc index 8a27e7bcd..3ed2b7cf6 100644 --- a/gcc/ipa-visibility.cc +++ b/gcc/ipa-visibility.cc @@ -873,6 +873,25 @@ function_and_variable_visibility (bool whole_program) } } + if (symtab->state >= IPA_SSA) +{ + FOR_EACH_VARIABLE (vnode) + { + tree decl = vnode->decl; + + /* Upgrade TLS access model based on optimized visibility status, +unless it was specified explicitly or no references remain. 
*/ + if (DECL_THREAD_LOCAL_P (decl) + && !lookup_attribute ("tls_model", DECL_ATTRIBUTES (decl)) + && vnode->ref_list.referring.length ()) + { + enum tls_model new_model = decl_default_tls_model (decl); + gcc_checking_assert (new_model >= decl_tls_model (decl)); + set_decl_tls_model (decl, new_model); + } + } +} + if (dump_file) { fprintf (dump_file, "\nMarking local functions:"); diff --git a/gcc/varasm.cc b/gcc/varasm.cc index 4db8506b1..de149e82c 100644 --- a/gcc/varasm.cc +++ b/gcc/varasm.cc @@ -6679,6 +6679,36 @@ init_varasm_once (void) #endif } +/* Determine whether SYMBOL is used in any optimized function. */ + +static bool +have_optimized_refs (struct symtab_node *symbol) +{ + struct ipa_ref *ref; + + for (int i = 0; symbol->iterate_referring (i, ref); i++) +{ + cgraph_node *cnode = dyn_cast (ref->referring); + + if (cnode && opt_for_fn (cnode->decl, optimize)) + return true; +} + + return false; +} + +/* Check if promoting general-dynamic TLS access model to local-dynamic is + desirable for DECL. */ + +static bool +optimize_dyn_tls_for_decl_p (const_tree decl) +{ + if (optimize) +return true; + return symtab->state >= IPA && have_optimized_refs (symtab_node::get (decl)); +} + + enum tls_model decl_default_tls_model (const_tree decl) { @@ -6696,7 +6726,7 @@ decl_default_tls_model (const_tree decl) /* Local dynamic is inefficient when we're not combining the parts of the address. */ - else if (optimize && is_local) + else if (is_local && optimize_dyn_tls_for_decl_p (decl)) kind = TLS_MODEL_LOCAL
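To illustrate the kind of code the promotion affects (a hypothetical example, not taken from the patch or its testsuite):

```cpp
#include <cassert>

/* With hidden visibility the variable cannot be interposed from another
   module, so the compiler may promote its TLS access model: e.g.
   general-dynamic -> local-dynamic under -fPIC, or initial-exec ->
   local-exec, the cheapest model, which needs no __tls_get_addr call.
   The promotion only changes code generation, not observable behavior. */
__attribute__((visibility("hidden"))) __thread int tls_counter = 5;

int bump(void)
{
  return ++tls_counter;  /* access model picked by decl_default_tls_model */
}
```

With the patch, the promotion happens in the whole-program IPA pass, after visibility has been optimized, rather than relying on the front end's view at parse time.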
[pushed] c++: -Woverloaded-virtual and dtors [PR87729]
My earlier patch broke out of the loop over base members when we found a match, but that caused trouble for dtors, which can have multiple base-class destructors for which same_signature_p is true. But as the function comment says, we know this doesn't apply to [cd]tors, so skip them. Tested x86_64-pc-linux-gnu, applying to trunk. PR c++/87729 gcc/cp/ChangeLog: * class.cc (warn_hidden): Ignore [cd]tors. gcc/testsuite/ChangeLog: * g++.dg/warn/Woverloaded-virt3.C: New test. --- gcc/cp/class.cc | 3 +++ gcc/testsuite/g++.dg/warn/Woverloaded-virt3.C | 7 +++ 2 files changed, 10 insertions(+) create mode 100644 gcc/testsuite/g++.dg/warn/Woverloaded-virt3.C diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc index 17683f421a7..eb69e7f985c 100644 --- a/gcc/cp/class.cc +++ b/gcc/cp/class.cc @@ -3020,6 +3020,9 @@ warn_hidden (tree t) tree binfo; unsigned j; + if (IDENTIFIER_CDTOR_P (name)) + continue; + /* Iterate through all of the base classes looking for possibly hidden functions. */ for (binfo = TYPE_BINFO (t), j = 0; diff --git a/gcc/testsuite/g++.dg/warn/Woverloaded-virt3.C b/gcc/testsuite/g++.dg/warn/Woverloaded-virt3.C new file mode 100644 index 000..34214ba2557 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Woverloaded-virt3.C @@ -0,0 +1,7 @@ +// PR c++/87729 +// { dg-additional-options -Woverloaded-virtual } + +struct S1 {}; +struct S2: S1 { virtual ~S2(); }; +struct S3 { virtual ~S3(); }; +struct S4: S2, S3 { virtual ~S4(); }; base-commit: c1b1c4e58bda152ae932b45396ab67b07dd8c3fe -- 2.31.1
Re: [PATCH]middle-end simplify complex if expressions where comparisons are inverse of one another.
On 7/5/2022 8:09 PM, Andrew Pinski via Gcc-patches wrote: Not your fault but there are now like two different predicates for a boolean like operand. zero_one_valued_p and truth_valued_p and a third way to describe it is to use SSA_NAME and check ssa_name_has_boolean_range. The latter is meant to catch cases where analysis indicates that a given SSA_NAME only takes on the values 0 or 1, regardless of the actual size of the SSA_NAME. It pre-dates having reasonable range information available in DOM and from reviewing the existing uses in DOM, I would expect Ranger to make most, if not all, of this code useless. jeff
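A one-line illustration of the third notion described above: ssa_name_has_boolean_range is about the values an SSA name can take, not about its declared type (the example is illustrative, not from any patch in this thread):

```cpp
#include <cassert>

int as_zero_one(int x)
{
  int b = (x > 3);  /* a full-width int, but analysis shows its range
                       is [0, 1] -- the case ssa_name_has_boolean_range
                       (and now Ranger) is meant to catch */
  return b;
}
```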
Re: [PATCH v2] tree-optimization/95821 - Convert strlen + strchr to memchr
On Tue, Jun 21, 2022 at 11:13 AM Noah Goldstein wrote: > > On Tue, Jun 21, 2022 at 5:01 AM Jakub Jelinek wrote: > > > > On Mon, Jun 20, 2022 at 02:42:20PM -0700, Noah Goldstein wrote: > > > This patch allows for strchr(x, c) to the replace with memchr(x, c, > > > strlen(x) + 1) if strlen(x) has already been computed earlier in the > > > tree. > > > > > > Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821 > > > > > > Since memchr doesn't need to re-find the null terminator it is faster > > > than strchr. > > > > > > bootstrapped and tested on x86_64-linux. > > > > > > PR tree-optimization/95821 > > > > This should be indented by a single tab, not two. > > Fixed in V3 > > > > > > gcc/ > > > > > > * tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit > > > memchr instead of strchr if strlen already computed. > > > > > > gcc/testsuite/ > > > > > > * c-c++-common/pr95821-1.c: New test. > > > * c-c++-common/pr95821-2.c: New test. > > > * c-c++-common/pr95821-3.c: New test. > > > * c-c++-common/pr95821-4.c: New test. > > > * c-c++-common/pr95821-5.c: New test. > > > * c-c++-common/pr95821-6.c: New test. > > > * c-c++-common/pr95821-7.c: New test. > > > * c-c++-common/pr95821-8.c: New test. > > > --- a/gcc/tree-ssa-strlen.cc > > > +++ b/gcc/tree-ssa-strlen.cc > > > @@ -2405,9 +2405,12 @@ strlen_pass::handle_builtin_strlen () > > > } > > > } > > > > > > -/* Handle a strchr call. If strlen of the first argument is known, > > > replace > > > - the strchr (x, 0) call with the endptr or x + strlen, otherwise > > > remember > > > - that lhs of the call is endptr and strlen of the argument is endptr - > > > x. */ > > > +/* Handle a strchr call. If strlen of the first argument is known, > > > + replace the strchr (x, 0) call with the endptr or x + strlen, > > > + otherwise remember that lhs of the call is endptr and strlen of the > > > + argument is endptr - x. 
If strlen of x is not know but has been > > > + computed earlier in the tree then replace strchr(x, c) to > > > > Still missing space before ( above. > > Sorry, fixed that in V3. > > > > > + memchr (x, c, strlen + 1). */ > > > > > > void > > > strlen_pass::handle_builtin_strchr () > > > @@ -2418,8 +2421,12 @@ strlen_pass::handle_builtin_strchr () > > >if (lhs == NULL_TREE) > > > return; > > > > > > - if (!integer_zerop (gimple_call_arg (stmt, 1))) > > > -return; > > > + tree chr = gimple_call_arg (stmt, 1); > > > + /* strchr only uses the lower char of input so to check if its > > > + strchr (s, zerop) only take into account the lower char. */ > > > + bool is_strchr_zerop > > > + = (TREE_CODE (chr) == INTEGER_CST > > > + && integer_zerop (fold_convert (char_type_node, chr))); > > > > The indentation rule is that = should be 2 columns to the right from bool, > > so > > > > Fixed in V3. > > bool is_strchr_zerop > > = (TREE_CODE (chr) == INTEGER_CST > >&& integer_zerop (fold_convert (char_type_node, chr))); > > > > > + /* If its not strchr (s, zerop) then try and convert to > > > + memchr since strlen has already been computed. */ > > > > This comment still has the second line weirdly indented. > > Sorry, have emacs with 4-space tabs so things that look right arent > as they seem :/ > > Fixed in V3 I believe. > > > > > + tree fn = builtin_decl_explicit (BUILT_IN_MEMCHR); > > > + > > > + /* Only need to check length strlen (s) + 1 if chr may be > > > zero. > > > + Otherwise the last chr (which is known to be zero) can never > > > + be a match. NB: We don't need to test if chr is a non-zero > > > + integer const with zero char bits because that is taken into > > > + account with is_strchr_zerop. */ > > > + if (!tree_expr_nonzero_p (chr)) > > > > The above is unsafe though. tree_expr_nonzero_p (chr) will return true > > if say VRP can prove it is not zero, but because of the implicit > > (char) chr cast done by the function we need something different. 
> > Say if VRP determines that chr is in [1, INT_MAX] or even just [255, 257] > > it doesn't mean (char) chr won't be 0. > > So, as I've tried to explain in the previous mail, it can be done e.g. with > > Added your code in V3. Thanks for the help. > > bool chr_nonzero = false; > > if (TREE_CODE (chr) == INTEGER_CST > > && integer_nonzerop (fold_convert (char_type_node, chr))) > > chr_nonzero = true; > > else if (TREE_CODE (chr) == SSA_NAME > >&& CHAR_TYPE_SIZE < INT_TYPE_SIZE) > > { > > value_range r; > > /* Try to determine using ranges if (char) chr must > > be always 0. That is true e.g. if all the subranges > > have the INT_TYPE_SIZE - CHAR_TY
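The hazard described above is easy to reproduce: a value proven nonzero as an int can still truncate to the NUL character, because strchr/memchr only inspect the low byte (a minimal demonstration, not code from the patch):

```cpp
#include <cassert>

int low_byte_is_nul(int chr)
{
  /* strchr (s, chr) compares against (char) chr, so a range such as
     [255, 257] that excludes zero still contains 256, whose low byte
     is 0.  tree_expr_nonzero_p on the int value is therefore not
     enough; the range check must be done after the implicit
     narrowing to char.  */
  return (char) chr == 0;
}
```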
[x86 PATCH] Support *testdi_not_doubleword during STV pass.
This patch fixes the current two FAILs of pr65105-5.c on x86 when compiled with -m32. These (temporary) breakages were fallout from my patches to improve/upgrade (scalar) double word comparisons. On mainline, the i386 backend currently represents a critical comparison using (compare (and (not reg1) reg2) (const_int 0)) which isn't/wasn't recognized by the STV pass' convertible_comparison_p. This simple STV patch adds support for this pattern (*testdi_not_doubleword) and generates the vector pandn and ptest instructions expected in the existing (failing) test case. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, where with --target_board=unix{-m32} there are two fewer failures, and without, there are no new failures. Ok for mainline? 2022-07-07 Roger Sayle gcc/ChangeLog * config/i386/i386-features.cc (convert_compare): Add support for *testdi_not_doubleword pattern (i.e. "(compare (and (not ...") by generating a pandn followed by ptest. (convertible_comparison_p): Recognize both *cmpdi_doubleword and recent *testdi_not_doubleword comparison patterns. Thanks in advance, Roger -- diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc index be38586..a7bd172 100644 --- a/gcc/config/i386/i386-features.cc +++ b/gcc/config/i386/i386-features.cc @@ -938,10 +938,10 @@ general_scalar_chain::convert_compare (rtx op1, rtx op2, rtx_insn *insn) { rtx tmp = gen_reg_rtx (vmode); rtx src; - convert_op (&op1, insn); /* Comparison against anything other than zero, requires an XOR. */ if (op2 != const0_rtx) { + convert_op (&op1, insn); convert_op (&op2, insn); /* If both operands are MEMs, explicitly load the OP1 into TMP. 
*/ if (MEM_P (op1) && MEM_P (op2)) @@ -953,8 +953,25 @@ general_scalar_chain::convert_compare (rtx op1, rtx op2, rtx_insn *insn) src = op1; src = gen_rtx_XOR (vmode, src, op2); } + else if (GET_CODE (op1) == AND + && GET_CODE (XEXP (op1, 0)) == NOT) +{ + rtx op11 = XEXP (XEXP (op1, 0), 0); + rtx op12 = XEXP (op1, 1); + convert_op (&op11, insn); + convert_op (&op12, insn); + if (MEM_P (op11)) + { + emit_insn_before (gen_rtx_SET (tmp, op11), insn); + op11 = tmp; + } + src = gen_rtx_AND (vmode, gen_rtx_NOT (vmode, op11), op12); +} else -src = op1; +{ + convert_op (&op1, insn); + src = op1; +} emit_insn_before (gen_rtx_SET (tmp, src), insn); if (vmode == V2DImode) @@ -1399,17 +1416,29 @@ convertible_comparison_p (rtx_insn *insn, enum machine_mode mode) rtx op1 = XEXP (src, 0); rtx op2 = XEXP (src, 1); - if (!CONST_INT_P (op1) - && ((!REG_P (op1) && !MEM_P (op1)) - || GET_MODE (op1) != mode)) -return false; - - if (!CONST_INT_P (op2) - && ((!REG_P (op2) && !MEM_P (op2)) - || GET_MODE (op2) != mode)) -return false; + /* *cmp_doubleword. */ + if ((CONST_INT_P (op1) + || ((REG_P (op1) || MEM_P (op1)) + && GET_MODE (op1) == mode)) + && (CONST_INT_P (op2) + || ((REG_P (op2) || MEM_P (op2)) + && GET_MODE (op2) == mode))) +return true; + + /* *test_not_doubleword. */ + if (op2 == const0_rtx + && GET_CODE (op1) == AND + && GET_CODE (XEXP (op1, 0)) == NOT) +{ + rtx op11 = XEXP (XEXP (op1, 0), 0); + rtx op12 = XEXP (op1, 1); + return (REG_P (op11) || MEM_P (op11)) +&& (REG_P (op12) || MEM_P (op12)) +&& GET_MODE (op11) == mode +&& GET_MODE (op12) == mode; +} - return true; + return false; } /* The general version of scalar_to_vector_candidate_p. */
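The source-level shape that yields the *testdi_not_doubleword pattern is a double-word "every bit of y is also set in x" test (a sketch for illustration; the actual failing test case may differ):

```cpp
#include <cassert>

/* At the RTL level this is (compare (and (not x) y) (const_int 0)).
   With -m32, a 64-bit operand is a double word, and the STV pass can
   now convert the test into vector pandn + ptest instead of scalar
   not/and pairs on both halves.  */
int bits_subset(unsigned long long x, unsigned long long y)
{
  return (~x & y) == 0;
}
```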
[PATCH v2] Simplify memchr with small constant strings
When memchr is applied on a constant string of no more than the bytes of a word, simplify memchr by checking each byte in the constant string. int f (int a) { return __builtin_memchr ("AE", a, 2) != 0; } is simplified to int f (int a) { return ((char) a == 'A' || (char) a == 'E') != 0; } gcc/ PR tree-optimization/103798 * tree-ssa-forwprop.cc: Include "tree-ssa-strlen.h". (simplify_builtin_call): Inline memchr with constant strings of no more than the bytes of a word. * tree-ssa-strlen.cc (use_in_zero_equality): Make it global. * tree-ssa-strlen.h (use_in_zero_equality): New. gcc/testsuite/ PR tree-optimization/103798 * c-c++-common/pr103798-1.c: New test. * c-c++-common/pr103798-2.c: Likewise. * c-c++-common/pr103798-3.c: Likewise. * c-c++-common/pr103798-4.c: Likewise. * c-c++-common/pr103798-5.c: Likewise. * c-c++-common/pr103798-6.c: Likewise. * c-c++-common/pr103798-7.c: Likewise. * c-c++-common/pr103798-8.c: Likewise. --- gcc/testsuite/c-c++-common/pr103798-1.c | 28 +++ gcc/testsuite/c-c++-common/pr103798-2.c | 30 gcc/testsuite/c-c++-common/pr103798-3.c | 28 +++ gcc/testsuite/c-c++-common/pr103798-4.c | 28 +++ gcc/testsuite/c-c++-common/pr103798-5.c | 26 ++ gcc/testsuite/c-c++-common/pr103798-6.c | 27 +++ gcc/testsuite/c-c++-common/pr103798-7.c | 27 +++ gcc/testsuite/c-c++-common/pr103798-8.c | 27 +++ gcc/tree-ssa-forwprop.cc| 64 + gcc/tree-ssa-strlen.cc | 4 +- gcc/tree-ssa-strlen.h | 2 + 11 files changed, 289 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/pr103798-1.c create mode 100644 gcc/testsuite/c-c++-common/pr103798-2.c create mode 100644 gcc/testsuite/c-c++-common/pr103798-3.c create mode 100644 gcc/testsuite/c-c++-common/pr103798-4.c create mode 100644 gcc/testsuite/c-c++-common/pr103798-5.c create mode 100644 gcc/testsuite/c-c++-common/pr103798-6.c create mode 100644 gcc/testsuite/c-c++-common/pr103798-7.c create mode 100644 gcc/testsuite/c-c++-common/pr103798-8.c diff --git a/gcc/testsuite/c-c++-common/pr103798-1.c 
b/gcc/testsuite/c-c++-common/pr103798-1.c new file mode 100644 index 000..cd3edf569fc --- /dev/null +++ b/gcc/testsuite/c-c++-common/pr103798-1.c @@ -0,0 +1,28 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fdump-tree-optimized -save-temps" } */ + +__attribute__ ((weak)) +int +f (char a) +{ + return __builtin_memchr ("a", a, 1) == 0; +} + +__attribute__ ((weak)) +int +g (char a) +{ + return a != 'a'; +} + +int +main () +{ + for (int i = 0; i < 255; i++) + if (f (i) != g (i)) + __builtin_abort (); + + return 0; +} + +/* { dg-final { scan-assembler-not "memchr" } } */ diff --git a/gcc/testsuite/c-c++-common/pr103798-2.c b/gcc/testsuite/c-c++-common/pr103798-2.c new file mode 100644 index 000..e7e99c3679e --- /dev/null +++ b/gcc/testsuite/c-c++-common/pr103798-2.c @@ -0,0 +1,30 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fdump-tree-optimized -save-temps" } */ + +#include + +__attribute__ ((weak)) +int +f (int a) +{ + return memchr ("aE", a, 2) != NULL; +} + +__attribute__ ((weak)) +int +g (char a) +{ + return a == 'a' || a == 'E'; +} + +int +main () +{ + for (int i = 0; i < 255; i++) + if (f (i + 256) != g (i + 256)) + __builtin_abort (); + + return 0; +} + +/* { dg-final { scan-assembler-not "memchr" } } */ diff --git a/gcc/testsuite/c-c++-common/pr103798-3.c b/gcc/testsuite/c-c++-common/pr103798-3.c new file mode 100644 index 000..ddcedc7e238 --- /dev/null +++ b/gcc/testsuite/c-c++-common/pr103798-3.c @@ -0,0 +1,28 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fdump-tree-optimized -save-temps" } */ + +__attribute__ ((weak)) +int +f (char a) +{ + return __builtin_memchr ("aEgZ", a, 3) == 0; +} + +__attribute__ ((weak)) +int +g (char a) +{ + return a != 'a' && a != 'E' && a != 'g'; +} + +int +main () +{ + for (int i = 0; i < 255; i++) + if (f (i) != g (i)) + __builtin_abort (); + + return 0; +} + +/* { dg-final { scan-assembler-not "memchr" } } */ diff --git a/gcc/testsuite/c-c++-common/pr103798-4.c b/gcc/testsuite/c-c++-common/pr103798-4.c new file mode 100644 
index 000..00e8302a833 --- /dev/null +++ b/gcc/testsuite/c-c++-common/pr103798-4.c @@ -0,0 +1,28 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fdump-tree-optimized -save-temps" } */ + +__attribute__ ((weak)) +int +f (char a) +{ + return __builtin_memchr ("aEgi", a, 4) != 0; +} + +__attribute__ ((weak)) +int +g (char a) +{ + return a == 'a' || a == 'E' || a == 'g' || a == 'i'; +} + +int +main () +{ + for (int i = 0; i < 255; i++) + if (f (i) != g (i)) + __builtin_abort (); + + return 0; +} + +/* { dg-final { scan-assembler-not "memchr" } } */ diff --git a/gcc/testsuite/c-c++-common/pr103798-5.c b/gcc/testsuite/c-c++-common
Re: [PATCH] Inline memchr with a small constant string
On Thu, Jun 23, 2022 at 9:26 AM H.J. Lu wrote: > > On Wed, Jun 22, 2022 at 11:03 PM Richard Biener > wrote: > > > > On Wed, Jun 22, 2022 at 7:13 PM H.J. Lu wrote: > > > > > > On Wed, Jun 22, 2022 at 4:39 AM Richard Biener > > > wrote: > > > > > > > > On Tue, Jun 21, 2022 at 11:03 PM H.J. Lu via Gcc-patches > > > > wrote: > > > > > > > > > > When memchr is applied on a constant string of no more than the bytes > > > > > of > > > > > a word, inline memchr by checking each byte in the constant string. > > > > > > > > > > int f (int a) > > > > > { > > > > >return __builtin_memchr ("eE", a, 2) != 0; > > > > > } > > > > > > > > > > is simplified to > > > > > > > > > > int f (int a) > > > > > { > > > > > return (char) a == 'e' || (char) a == 'E'; > > > > > } > > > > > > > > > > gcc/ > > > > > > > > > > PR tree-optimization/103798 > > > > > * match.pd (__builtin_memchr (const_str, a, N)): Inline memchr > > > > > with constant strings of no more than the bytes of a word. > > > > > > > > Please do this in strlenopt or so, with match.pd you will end up moving > > > > the memchr loads across possible aliasing stores to the point of the > > > > comparison. > > > > > > strlenopt is run after many other passes. The code won't be well > > > optimized. > > > > What followup optimizations do you expect? That is, other builtins are only > > reassociation and dce turn > > _5 = a_2(D) == 101; > _6 = a_2(D) == 69; > _1 = _5 | _6; > _4 = (int) _1; > > into > > _7 = a_2(D) & -33; > _8 = _7 == 69; > _1 = _8; > _4 = (int) _1; > > > expanded inline at RTL expansion time? > > Some high level optimizations will be missed and > TARGET_GIMPLE_FOLD_BUILTIN improves builtins > codegen. > > > > Since we are only optimizing > > > > > > __builtin_memchr ("eE", a, 2) != 0; > > > > > > I don't see any aliasing store issues here. > > > > Ah, I failed to see the STRING_CST restriction. Note that when optimizing > > for > > size this doesn't look very good. > > True. 
> > > I would expect a target might produce some vector code for > > memchr ("aAbBcCdDeE...", c, 9) != 0 by splatting 'c', doing > > a v16qimode compare, masking off excess elements beyond length > > and then comparing against zero or for == 0 against all-ones. > > > > The repetitive pattern result also suggests an implementation elsewhere, > > if you think strlenopt is too late there would be forwprop as well. > > forwprop seems a good place. The v2 patch is at https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598022.html Thanks. -- H.J.
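The reassociated form in the dump quoted above (`_7 = a_2(D) & -33; _8 = _7 == 69;`) is the classic case-folding trick: 'e' (0x65) and 'E' (0x45) differ only in bit 5, so one masked compare replaces the two equality tests. A stand-alone restatement:

```cpp
#include <cassert>

int has_e_masked(int a)
{
  /* -33 is ~32: clearing bit 5 maps both 'e' (101) and 'E' (69)
     onto 69, so a single compare covers both letters.  */
  return (char) (a & -33) == 'E';
}

int has_e_reference(int a)
{
  /* What memchr ("eE", a, 2) != 0 means: compare the low byte.  */
  return (char) a == 'e' || (char) a == 'E';
}
```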
[committed] libstdc++: Remove workaround in __gnu_cxx::char_traits::move [PR89074]
Tested aarch64-linux, pushed to trunk. -- >8 -- The front-end bug that prevented this constexpr loop from working has been fixed since GCC 12.1 so we can remove the workaround. libstdc++-v3/ChangeLog: PR c++/89074 * include/bits/char_traits.h (__gnu_cxx::char_traits::move): Remove workaround for front-end bug. --- libstdc++-v3/include/bits/char_traits.h | 9 - 1 file changed, 9 deletions(-) diff --git a/libstdc++-v3/include/bits/char_traits.h b/libstdc++-v3/include/bits/char_traits.h index b856b1da320..965ff29b75c 100644 --- a/libstdc++-v3/include/bits/char_traits.h +++ b/libstdc++-v3/include/bits/char_traits.h @@ -215,14 +215,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { if (__s1 == __s2) // unlikely, but saves a lot of work return __s1; -#if __cpp_constexpr_dynamic_alloc - // The overlap detection below fails due to PR c++/89074, - // so use a temporary buffer instead. - char_type* __tmp = new char_type[__n]; - copy(__tmp, __s2, __n); - copy(__s1, __tmp, __n); - delete[] __tmp; -#else const auto __end = __s2 + __n - 1; bool __overlap = false; for (std::size_t __i = 0; __i < __n - 1; ++__i) @@ -244,7 +236,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } else copy(__s1, __s2, __n); -#endif return __s1; } #endif -- 2.36.1
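The overlap handling that remains after removing the workaround can be read as a self-contained sketch (identifiers are illustrative, not the library's):

```cpp
#include <cstddef>

/* If the destination starts inside the source range, a forward copy
   would clobber source bytes before reading them, so copy backwards;
   otherwise a forward copy is safe.  This stays constexpr-friendly,
   unlike memmove.  */
constexpr char *move_chars(char *s1, const char *s2, std::size_t n)
{
  if (s1 == s2 || n == 0)
    return s1;
  bool overlap = false;
  for (std::size_t i = 0; i < n; ++i)
    if (s2 + i == s1)
      { overlap = true; break; }
  if (overlap)
    for (std::size_t i = n; i > 0; --i)
      s1[i - 1] = s2[i - 1];       /* backward copy */
  else
    for (std::size_t i = 0; i < n; ++i)
      s1[i] = s2[i];               /* forward copy */
  return s1;
}

bool overlap_move_ok()
{
  char buf[] = {'a', 'b', 'c', 'd', 'e', 'f'};
  move_chars(buf + 2, buf, 4);     /* overlapping, dest after source */
  return buf[2] == 'a' && buf[3] == 'b' && buf[4] == 'c' && buf[5] == 'd';
}
```

The removed temporary-buffer workaround existed only because PR c++/89074 made the pointer comparisons in the overlap loop ill-formed in constant evaluation.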
[PATCH] libstdc++: Prefer const T to std::add_const_t
Does anybody see a problem with this change? The point is to avoid unnecessary class template instantiations. Tested aarch64-linux. -- >8 -- For any typedef-name or template parameter, T, add_const_t is equivalent to T const, so we can avoid instantiating the std::add_const class template and just say T const (or const T). This isn't true for a non-typedef like int&, where int& const would be ill-formed, but we shouldn't be using add_const_t anyway, because we know what that type is. The only place we need to continue using std::add_const is in the std::bind implementation where it's used as a template template parameter to be applied as a metafunction elsewhere. libstdc++-v3/ChangeLog: * include/bits/stl_iterator.h (__iter_to_alloc_t): Replace add_const_t with const-qualifier. * include/bits/utility.h (tuple_element): Likewise for all cv-qualifiers. * include/std/type_traits (add_const, add_volatile): Replace typedef-declaration with using-declaration. (add_cv): Replace add_const and add_volatile with cv-qualifiers. * include/std/variant (variant_alternative): Replace add_const_t, add_volatile_t and add_cv_t etc. with cv-qualifiers. --- libstdc++-v3/include/bits/stl_iterator.h | 11 +-- libstdc++-v3/include/bits/utility.h | 6 +++--- libstdc++-v3/include/std/type_traits | 9 +++-- libstdc++-v3/include/std/variant | 6 +++--- 4 files changed, 14 insertions(+), 18 deletions(-) diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h index 12a89ab229f..049cb02a4c4 100644 --- a/libstdc++-v3/include/bits/stl_iterator.h +++ b/libstdc++-v3/include/bits/stl_iterator.h @@ -2536,19 +2536,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // of associative containers. 
template using __iter_key_t = remove_const_t< -typename iterator_traits<_InputIterator>::value_type::first_type>; + typename iterator_traits<_InputIterator>::value_type::first_type>; template -using __iter_val_t = -typename iterator_traits<_InputIterator>::value_type::second_type; +using __iter_val_t + = typename iterator_traits<_InputIterator>::value_type::second_type; template struct pair; template -using __iter_to_alloc_t = -pair>, -__iter_val_t<_InputIterator>>; +using __iter_to_alloc_t + = pair, __iter_val_t<_InputIterator>>; #endif // __cpp_deduction_guides _GLIBCXX_END_NAMESPACE_VERSION diff --git a/libstdc++-v3/include/bits/utility.h b/libstdc++-v3/include/bits/utility.h index e0e40309a6d..6a192e27836 100644 --- a/libstdc++-v3/include/bits/utility.h +++ b/libstdc++-v3/include/bits/utility.h @@ -86,19 +86,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION template struct tuple_element<__i, const _Tp> { - typedef typename add_const<__tuple_element_t<__i, _Tp>>::type type; + using type = const __tuple_element_t<__i, _Tp>; }; template struct tuple_element<__i, volatile _Tp> { - typedef typename add_volatile<__tuple_element_t<__i, _Tp>>::type type; + using type = volatile __tuple_element_t<__i, _Tp>; }; template struct tuple_element<__i, const volatile _Tp> { - typedef typename add_cv<__tuple_element_t<__i, _Tp>>::type type; + using type = const volatile __tuple_element_t<__i, _Tp>; }; #if __cplusplus >= 201402L diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits index 2572d8edd69..e5f58bc2e3f 100644 --- a/libstdc++-v3/include/std/type_traits +++ b/libstdc++-v3/include/std/type_traits @@ -1577,20 +1577,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION /// add_const template struct add_const -{ typedef _Tp const type; }; +{ using type = _Tp const; }; /// add_volatile template struct add_volatile -{ typedef _Tp volatile type; }; +{ using type = _Tp volatile; }; /// add_cv template struct add_cv -{ - typedef typename - add_const::type>::type type; -}; 
+{ using type = _Tp const volatile; }; #if __cplusplus > 201103L diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant index 5ff1e3edcdf..f8f15665433 100644 --- a/libstdc++-v3/include/std/variant +++ b/libstdc++-v3/include/std/variant @@ -107,15 +107,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION template struct variant_alternative<_Np, const _Variant> -{ using type = add_const_t>; }; +{ using type = const variant_alternative_t<_Np, _Variant>; }; template struct variant_alternative<_Np, volatile _Variant> -{ using type = add_volatile_t>; }; +{ using type = volatile variant_alternative_t<_Np, _Variant>; }; template struct variant_alternative<_Np, const volatile _Variant> -{ using type = add_cv_t>; }; +{ using type = const volatile variant_alternative_t<_Np, _Variant>; }; inline constexpr size_t variant_npos = -
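The equivalence the patch relies on can be checked directly (a stand-alone illustration; compile as C++17):

```cpp
#include <type_traits>

// "T const" names the same type as add_const_t<T> for any template
// parameter T -- including references, where the added const is
// harmlessly dropped -- but without instantiating the add_const
// class template.
template<typename T>
using directly_const = T const;

static_assert(std::is_same_v<directly_const<int>, std::add_const_t<int>>);
static_assert(std::is_same_v<directly_const<int&>, std::add_const_t<int&>>);
static_assert(std::is_same_v<directly_const<int&>, int&>);  // const dropped
```

The ill-formed case the patch description mentions only arises for a literal non-typedef like `int& const` written in source; through a typedef-name or template parameter the const is silently ignored.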
[PATCH] c++: Define built-in for std::tuple_element [PR100157]
This adds a new built-in to replace the recursive class template instantiations done by traits such as std::tuple_element and std::variant_alternative. The purpose is to select the Nth type from a list of types, e.g. __builtin_type_pack_element(1, char, int, float) is int. For a pathological example tuple_element_t<1000, tuple<2000 types...>> the compilation time is reduced by more than 90% and the memory used by the compiler is reduced by 97%. In realistic examples the gains will be much smaller, but still relevant. Clang has a similar built-in, __type_pack_element, but that's a "magic template" built-in using <> syntax, which GCC doesn't support. So this provides an equivalent feature, but as a built-in function using parens instead of <>. I don't really like the name "type pack element" (it gives you an element from a pack of types) but the semi-consistency with Clang seems like a reasonable argument in favour of keeping the name. I'd be open to alternative names though, e.g. __builtin_nth_type or __builtin_type_at_index. The patch has some problems though ... FIXME 1: Marek pointed out that this ICEs: template using type = __builtin_type_pack_element(sizeof(T), T...); type c; The sizeof(T) expression is invalid, because T is an unexpanded pack, but it's not rejected and instead crashes: ice.C: In substitution of 'template using type = __builtin_type_pack_element (sizeof (T), T ...)
[with T = {int, char}]': ice.C:2:15: required from here ice.C:1:63: internal compiler error: in dependent_type_p, at cp/pt.cc:27490 1 | template using type = __builtin_type_pack_element(sizeof(T), T...); | ^ 0xe13eea dependent_type_p(tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:27490 0xeb1286 cxx_sizeof_or_alignof_type(unsigned int, tree_node*, tree_code, bool, bool) /home/jwakely/src/gcc/gcc/gcc/cp/typeck.cc:1912 0xdf4fcc tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:20582 0xdd9121 tsubst_tree_list(tree_node*, tree_node*, int, tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15587 0xddb583 tsubst(tree_node*, tree_node*, int, tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:16056 0xddcc9d tsubst(tree_node*, tree_node*, int, tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:16436 0xdd6d45 tsubst_decl /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15038 0xdd952a tsubst(tree_node*, tree_node*, int, tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15668 0xdfb9a1 instantiate_template(tree_node*, tree_node*, int) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:21811 0xdfc1b6 instantiate_alias_template /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:21896 0xdd9796 tsubst(tree_node*, tree_node*, int, tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15696 0xdbaba5 lookup_template_class(tree_node*, tree_node*, tree_node*, tree_node*, int, int) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:10131 0xe4bac0 finish_template_type(tree_node*, tree_node*, int) /home/jwakely/src/gcc/gcc/gcc/cp/semantics.cc:3727 0xd334c8 cp_parser_template_id /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:18458 0xd429b0 cp_parser_class_name /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:25923 0xd1ade9 cp_parser_qualifying_entity /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:7193 0xd1a2c8 cp_parser_nested_name_specifier_opt /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:6875 0xd4eefd cp_parser_template_introduction /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:31668 
0xd4f416 cp_parser_template_declaration_after_export /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:31840 0xd2d60e cp_parser_declaration /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:15083 FIXME 2: I want to mangle __builtin_type_pack_element(N, T...) the same as typename std::_Nth_type::type but I don't know how. Instead of trying to fake the mangled string, it's probably better to build a decl for that nested type, right? Any suggestions where to find something similar I can learn from? The reason to mangle it that way is that it preserves the same symbol names as the library produced in GCC 12, and that it will still produce with non-GCC compilers (see the definitions of std::_Nth_type in the library parts of the patch). If we don't do that then either we need to ensure it never appears in a mangled name, or define some other GCC-specific mangling for this built-in (e.g. we could just mangle it as though it's a function, "19__builtin_type_pack_elementELm1EJDpT_E" or something like that!). If we ensure it doesn't appear in mangled names that means we still need to instantiate the _Nth_type class template, rather than defining the alias template _Nth_type_t to use the built-in directly. That loses a little of the compilation performance gain that comes from defining the built-in in the first place (although avoid th
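For context, the recursive technique the built-in is meant to replace can be sketched as follows. This is a simplified stand-in modeled on the library's _Nth_type helper, not the patch's actual code: each step peels one type off the pack, so selecting element N instantiates N intermediate class templates, which is exactly the cost __builtin_type_pack_element avoids.

```cpp
#include <cstddef>
#include <type_traits>

// Recursive selection of the Nth type from a pack (sketch).
template<std::size_t N, typename... Ts>
struct nth_type { };

// Base case: the first type of the pack is element 0.
template<typename T, typename... Rest>
struct nth_type<0, T, Rest...> { using type = T; };

// Recursive case: drop the head, decrement the index.
template<std::size_t N, typename T, typename... Rest>
struct nth_type<N, T, Rest...> : nth_type<N - 1, Rest...> { };

template<std::size_t N, typename... Ts>
using nth_type_t = typename nth_type<N, Ts...>::type;

// With the proposed built-in, this whole recursion collapses to a single
// __builtin_type_pack_element(N, Ts...) with no template instantiations.
static_assert(std::is_same_v<nth_type_t<1, char, int, float>, int>);
```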
Re: [PATCH] c++: generic targs and identity substitution [PR105956]
On Thu, 7 Jul 2022, Patrick Palka wrote: > On Thu, 7 Jul 2022, Jason Merrill wrote: > > > On 7/6/22 15:26, Patrick Palka wrote: > > > On Tue, 5 Jul 2022, Jason Merrill wrote: > > > > > > > On 7/5/22 10:06, Patrick Palka wrote: > > > > > On Fri, 1 Jul 2022, Jason Merrill wrote: > > > > > > > > > > > On 6/29/22 13:42, Patrick Palka wrote: > > > > > > > In r13-1045-gcb7fd1ea85feea I assumed that substitution into > > > > > > > generic > > > > > > > DECL_TI_ARGS corresponds to an identity mapping of the given > > > > > > > arguments, > > > > > > > and hence its safe to always elide such substitution. But this PR > > > > > > > demonstrates that such a substitution isn't always the identity > > > > > > > mapping, > > > > > > > in particular when there's an ARGUMENT_PACK_SELECT argument, which > > > > > > > gets > > > > > > > handled specially during substitution: > > > > > > > > > > > > > > * when substituting an APS into a template parameter, we > > > > > > > strip > > > > > > > the > > > > > > >APS to its underlying argument; > > > > > > > * and when substituting an APS into a pack expansion, we > > > > > > > strip > > > > > > > the > > > > > > >APS to its underlying argument pack. > > > > > > > > > > > > Ah, right. For instance, in variadic96.C we have > > > > > > > > > > > > 10template < typename... T > > > > > > > 11struct derived > > > > > > 12 : public base< T, derived< T... > >... > > > > > > > > > > > > so when substituting into the base-specifier, we're approaching it > > > > > > from > > > > > > the > > > > > > outside in, so when we get to the inner T... we need some way to > > > > > > find > > > > > > the > > > > > > T > > > > > > pack again. 
It might be possible to remove the need for APS by > > > > > > substituting > > > > > > inner pack expansions before outer ones, which could improve > > > > > > worst-case > > > > > > complexity, but I don't know how relevant that is in real code; I > > > > > > imagine > > > > > > most > > > > > > inner pack expansions are as simple as this one. > > > > > > > > > > Aha, that makes sense. > > > > > > > > > > > > > > > > > > In this testcase, when expanding the pack expansion pattern (idx + > > > > > > > Ns)... > > > > > > > with Ns={0,1}, we specialize idx twice, first with Ns=APS<0,{0,1}> > > > > > > > and > > > > > > > then Ns=APS<1,{0,1}>. The DECL_TI_ARGS of idx are the generic > > > > > > > template > > > > > > > arguments of the enclosing class template impl, so before > > > > > > > r13-1045, > > > > > > > we'd substitute into its DECL_TI_ARGS which gave Ns={0,1} as > > > > > > > desired. > > > > > > > But after r13-1045, we elide this substitution and end up > > > > > > > attempting > > > > > > > to > > > > > > > hash the original Ns argument, an APS, which ICEs. > > > > > > > > > > > > > > So this patch partially reverts this part of r13-1045. I > > > > > > > considered > > > > > > > using preserve_args in this case instead, but that'd break the > > > > > > > static_assert in the testcase because preserve_args always strips > > > > > > > APS to > > > > > > > its underlying argument, but here we want to strip it to its > > > > > > > underlying > > > > > > > argument pack, so we'd incorrectly end up forming the > > > > > > > specializations > > > > > > > impl<0>::idx and impl<1>::idx instead of impl<0,1>::idx. > > > > > > > > > > > > > > Although we can't elide the substitution into DECL_TI_ARGS in > > > > > > > light > > > > > > > of > > > > > > > ARGUMENT_PACK_SELECT, it should still be safe to elide template > > > > > > > argument > > > > > > > coercion in the case of a non-template decl, which this patch > > > > > > > preserves. 
> > > > > > > > > > > > > > It's unfortunate that we need to remove this optimization just > > > > > > > because > > > > > > > it doesn't hold for one special tree code. So this patch > > > > > > > implements > > > > > > > a > > > > > > > heuristic in tsubst_template_args to avoid allocating a new > > > > > > > TREE_VEC > > > > > > > if > > > > > > > the substituted elements are identical to those of a level from > > > > > > > ARGS. > > > > > > > It turns out that about 30% of all calls to tsubst_template_args > > > > > > > benefit > > > > > > > from this optimization, and it reduces memory usage by about 1.5% > > > > > > > for > > > > > > > e.g. stdc++.h (relative to r13-1045). (This is the maybe_reuse > > > > > > > stuff, > > > > > > > the rest of the changes to tsubst_template_args are just drive-by > > > > > > > cleanups.) > > > > > > > > > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look > > > > > > > OK > > > > > > > for > > > > > > > trunk? Patch generated with -w to ignore noisy whitespace > > > > > > > changes. > > > > > > > > > > > > > > PR c++/105956 > > > > > > > > > > > > > > gcc/cp/ChangeLog: > > > > > > > > > > > > > > * pt.cc (tsubst_template_args): Move variable declarations > > > > > >
Re: [PATCH] c++: Define built-in for std::tuple_element [PR100157]
On Thu, Jul 07, 2022 at 06:14:36PM +0100, Jonathan Wakely wrote: > This adds a new built-in to replace the recursive class template > instantiations done by traits such as std::tuple_element and > std::variant_alternative. The purpose is to select the Nth type from a > list of types, e.g. __builtin_type_pack_element(1, char, int, float) is > int. > > For a pathological example tuple_element_t<1000, tuple<2000 types...>> > the compilation time is reduced by more than 90% and the memory used by > the compiler is reduced by 97%. In realistic examples the gains will be > much smaller, but still relevant. > > Clang has a similar built-in, __type_pack_element, but that's a > "magic template" built-in using <> syntax, which GCC doesn't support. So > this provides an equivalent feature, but as a built-in function using > parens instead of <>. I don't really like the name "type pack element" > (it gives you an element from a pack of types) but the semi-consistency > with Clang seems like a reasonable argument in favour of keeping the > name. I'd be open to alternative names though, e.g. __builtin_nth_type > or __builtin_type_at_index. > > > The patch has some problems though ... > > FIXME 1: Marek pointed out that this this ICEs: > template using type = __builtin_type_pack_element(sizeof(T), > T...); > type c; > > The sizeof(T) expression is invalid, because T is an unexpanded pack, > but it's not rejected and instead crashes: I think this could be fixed by if (check_for_bare_parameter_packs (n)) return error_mark_node; in finish_type_pack_element. (I haven't looked at the rest of the patch yet.) Marek
Re: [PATCH v2] Modify combine pattern by a pseudo AND with its nonzero bits [PR93453]
Hi! On Thu, Jul 07, 2022 at 04:30:50PM +0800, HAO CHEN GUI wrote: > This patch modifies the combine pattern after recog fails. With a helper It modifies combine itself, not just a pattern in the machine description. > - change_pseudo_and_mask, it converts a single pseudo to the pseudo AND with > a mask when the outer operator is IOR/XOR/PLUS and inner operator is ASHIFT > or AND. The conversion helps pattern to match rotate and mask insn on some > targets. > For test case rlwimi-2.c, current trunk fails on > "scan-assembler-times (?n)^\\s+[a-z]". It reports 21305 times. So my patch > reduces the total number of insns from 21305 to 21279. That is incorrect. You need to figure out what actually changed, and if that is wanted or not, and then write some explanation about that. > * config/rs6000/rs6000.md (plus_ior_xor): Removed. > (anonymous split pattern for plus_ior_xor): Removed. "Remove.", in both cases. Always use imperative in changelogs and commit messages and the like, not some passive tense. > +/* When the outer code of set_src is IOR/XOR/PLUS and the inner code is > + ASHIFT/AND, convert a pseudo to psuedo AND with a mask if its nonzero_bits s/psuedo/pseudo/ > + is less than its mode mask. The nonzero_bits in other pass doesn't return > + the same value as it does in combine pass. */ That isn't quite the problem. Later passes can return a mask for nonzero_bits (which means: bits that are not known to be zero) that is not a superset of what was known during combine. If you use nonzero_bits in the insn condition of a define_insn (or define_insn_and_split, same thing under the covers) you then can end up with an insns that is fine during combine, but no longer recog()ed later. 
> +static bool > +change_pseudo_and_mask (rtx pat) > +{ > + rtx src = SET_SRC (pat); > + if ((GET_CODE (src) == IOR > + || GET_CODE (src) == XOR > + || GET_CODE (src) == PLUS) > + && (((GET_CODE (XEXP (src, 0)) == ASHIFT > + || GET_CODE (XEXP (src, 0)) == AND) > +&& REG_P (XEXP (src, 1) > +{ > + rtx *reg = &XEXP (SET_SRC (pat), 1); Why the extra indirection? SUBST is a macro, it can take lvalues just fine :-) > + machine_mode mode = GET_MODE (*reg); > + unsigned HOST_WIDE_INT nonzero = nonzero_bits (*reg, mode); > + if (nonzero < GET_MODE_MASK (mode)) > + { > + int shift; > + > + if (GET_CODE (XEXP (src, 0)) == ASHIFT) > + shift = INTVAL (XEXP (XEXP (src, 0), 1)); > + else > + shift = ctz_hwi (INTVAL (XEXP (XEXP (src, 0), 1))); > + > + if (shift > 0 > + && ((HOST_WIDE_INT_1U << shift) - 1) >= nonzero) Too many parens. > + { > + unsigned HOST_WIDE_INT mask = (HOST_WIDE_INT_1U << shift) - 1; > + rtx x = gen_rtx_AND (mode, *reg, GEN_INT (mask)); > + SUBST (*reg, x); > + maybe_swap_commutative_operands (SET_SRC (pat)); > + return true; > + } > + } > + } > + return false; Broken indentation. > --- a/gcc/testsuite/gcc.target/powerpc/20050603-3.c > +++ b/gcc/testsuite/gcc.target/powerpc/20050603-3.c > @@ -12,7 +12,7 @@ void rotins (unsigned int x) >b.y = (x<<12) | (x>>20); > } > > -/* { dg-final { scan-assembler-not {\mrlwinm} } } */ > +/* { dg-final { scan-assembler-not {\mrlwinm} { target ilp32 } } } */ Why? > +/* { dg-final { scan-assembler-times {\mrlwimi\M} 2 { target ilp32 } } } */ > +/* { dg-final { scan-assembler-times {\mrldimi\M} 2 { target lp64 } } } */ Can this just be /* { dg-final { scan-assembler-times {\mrl[wd]imi\M} 2 } } */ or is it necessary to not want rlwimi on 64-bit? 
> --- a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c > +++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c > @@ -2,14 +2,14 @@ > /* { dg-options "-O2" } */ > > /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 14121 { target ilp32 } > } } */ > -/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 20217 { target lp64 } } > } */ > +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 21279 { target lp64 } } > } */ You are saying there should be 21279 instructions generated by this test case. What makes you say that? Yes, we regressed some time ago, we generate too many insns in many cases, but that is *bad*. > /* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target ilp32 } > } } */ > -/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1666 { target lp64 } } > } */ > +/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target lp64 } } > } */ This needs an explanation (and then the 32-bit and 64-bit checks can be merged). This probably needs changes after 4306339798b6 (if it is still wanted?) Segher
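For context, the rotate idiom that 20050603-3.c's scan-assembler directives are probing can be reduced to plain C (a sketch, not the testcase itself); on 32-bit Power the shift pair is expected to collapse into a single rotate-class instruction, which is why the test counts rlwinm/rlwimi occurrences:

```c
#include <assert.h>
#include <stdint.h>

/* A left-rotate by 12 written as a shift pair; combine recognizes the
   (x << 12) | (x >> 20) form as a 32-bit rotate.  */
static uint32_t rotl12(uint32_t x)
{
    return (x << 12) | (x >> 20);
}
```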
Re: [PATCH] c++: generic targs and identity substitution [PR105956]
On 7/7/22 11:16, Patrick Palka wrote: On Thu, 7 Jul 2022, Jason Merrill wrote: On 7/6/22 15:26, Patrick Palka wrote: On Tue, 5 Jul 2022, Jason Merrill wrote: On 7/5/22 10:06, Patrick Palka wrote: On Fri, 1 Jul 2022, Jason Merrill wrote: On 6/29/22 13:42, Patrick Palka wrote: In r13-1045-gcb7fd1ea85feea I assumed that substitution into generic DECL_TI_ARGS corresponds to an identity mapping of the given arguments, and hence its safe to always elide such substitution. But this PR demonstrates that such a substitution isn't always the identity mapping, in particular when there's an ARGUMENT_PACK_SELECT argument, which gets handled specially during substitution: * when substituting an APS into a template parameter, we strip the APS to its underlying argument; * and when substituting an APS into a pack expansion, we strip the APS to its underlying argument pack. Ah, right. For instance, in variadic96.C we have 10 template < typename... T > 11 struct derived 12 : public base< T, derived< T... > >... so when substituting into the base-specifier, we're approaching it from the outside in, so when we get to the inner T... we need some way to find the T pack again. It might be possible to remove the need for APS by substituting inner pack expansions before outer ones, which could improve worst-case complexity, but I don't know how relevant that is in real code; I imagine most inner pack expansions are as simple as this one. Aha, that makes sense. In this testcase, when expanding the pack expansion pattern (idx + Ns)... with Ns={0,1}, we specialize idx twice, first with Ns=APS<0,{0,1}> and then Ns=APS<1,{0,1}>. The DECL_TI_ARGS of idx are the generic template arguments of the enclosing class template impl, so before r13-1045, we'd substitute into its DECL_TI_ARGS which gave Ns={0,1} as desired. But after r13-1045, we elide this substitution and end up attempting to hash the original Ns argument, an APS, which ICEs. So this patch partially reverts this part of r13-1045. 
I considered using preserve_args in this case instead, but that'd break the static_assert in the testcase because preserve_args always strips APS to its underlying argument, but here we want to strip it to its underlying argument pack, so we'd incorrectly end up forming the specializations impl<0>::idx and impl<1>::idx instead of impl<0,1>::idx. Although we can't elide the substitution into DECL_TI_ARGS in light of ARGUMENT_PACK_SELECT, it should still be safe to elide template argument coercion in the case of a non-template decl, which this patch preserves. It's unfortunate that we need to remove this optimization just because it doesn't hold for one special tree code. So this patch implements a heuristic in tsubst_template_args to avoid allocating a new TREE_VEC if the substituted elements are identical to those of a level from ARGS. It turns out that about 30% of all calls to tsubst_template_args benefit from this optimization, and it reduces memory usage by about 1.5% for e.g. stdc++.h (relative to r13-1045). (This is the maybe_reuse stuff, the rest of the changes to tsubst_template_args are just drive-by cleanups.) Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? Patch generated with -w to ignore noisy whitespace changes. PR c++/105956 gcc/cp/ChangeLog: * pt.cc (tsubst_template_args): Move variable declarations closer to their first use. Replace 'orig_t' with 'r'. Rename 'need_new' to 'const_subst_p'. Heuristically detect if the substituted elements are identical to that of a level from 'args' and avoid allocating a new TREE_VEC if so. (tsubst_decl) : Revert r13-1045-gcb7fd1ea85feea change for avoiding substitution into DECL_TI_ARGS, but still avoid coercion in this case. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/variadic183.C: New test. 
--- gcc/cp/pt.cc | 113 ++- gcc/testsuite/g++.dg/cpp0x/variadic183.C | 14 +++ 2 files changed, 85 insertions(+), 42 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/variadic183.C diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc index 8672da123f4..7898834faa6 100644 --- a/gcc/cp/pt.cc +++ b/gcc/cp/pt.cc @@ -27,6 +27,7 @@ along with GCC; see the file COPYING3. If not see Fixed by: C++20 modules. */ #include "config.h" +#define INCLUDE_ALGORITHM // for std::equal #include "system.h" #include "coretypes.h" #include "cp-tree.h" @@ -13544,17 +13545,22 @@ tsubst_argument_pack (tree orig_arg, tree args, tsubst_flags_t complain, tree tsubst_template_args (tree t, tree args, tsubst_flags_t complain, tree in_decl) { - tree orig_t = t; - int len, need_new = 0, i, expanded_len_adjust = 0, out; - tree *elts; - if (t == error_mark_node) return error_mark_node; - len = TRE
Re: [PATCH] c++: Define built-in for std::tuple_element [PR100157]
On 7/7/22 13:14, Jonathan Wakely wrote: This adds a new built-in to replace the recursive class template instantiations done by traits such as std::tuple_element and std::variant_alternative. The purpose is to select the Nth type from a list of types, e.g. __builtin_type_pack_element(1, char, int, float) is int. For a pathological example tuple_element_t<1000, tuple<2000 types...>> the compilation time is reduced by more than 90% and the memory used by the compiler is reduced by 97%. In realistic examples the gains will be much smaller, but still relevant. Clang has a similar built-in, __type_pack_element, but that's a "magic template" built-in using <> syntax, which GCC doesn't support. So this provides an equivalent feature, but as a built-in function using parens instead of <>. I don't really like the name "type pack element" (it gives you an element from a pack of types) but the semi-consistency with Clang seems like a reasonable argument in favour of keeping the name. I'd be open to alternative names though, e.g. __builtin_nth_type or __builtin_type_at_index. The patch has some problems though ... FIXME 1: Marek pointed out that this this ICEs: template using type = __builtin_type_pack_element(sizeof(T), T...); type c; The sizeof(T) expression is invalid, because T is an unexpanded pack, but it's not rejected and instead crashes: ice.C: In substitution of 'template using type = __builtin_type_pack_element (sizeof (T), T ...) 
[with T = {int, char}]': ice.C:2:15: required from here ice.C:1:63: internal compiler error: in dependent_type_p, at cp/pt.cc:27490 1 | template using type = __builtin_type_pack_element(sizeof(T), T...); | ^ 0xe13eea dependent_type_p(tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:27490 0xeb1286 cxx_sizeof_or_alignof_type(unsigned int, tree_node*, tree_code, bool, bool) /home/jwakely/src/gcc/gcc/gcc/cp/typeck.cc:1912 0xdf4fcc tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:20582 0xdd9121 tsubst_tree_list(tree_node*, tree_node*, int, tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15587 0xddb583 tsubst(tree_node*, tree_node*, int, tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:16056 0xddcc9d tsubst(tree_node*, tree_node*, int, tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:16436 0xdd6d45 tsubst_decl /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15038 0xdd952a tsubst(tree_node*, tree_node*, int, tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15668 0xdfb9a1 instantiate_template(tree_node*, tree_node*, int) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:21811 0xdfc1b6 instantiate_alias_template /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:21896 0xdd9796 tsubst(tree_node*, tree_node*, int, tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15696 0xdbaba5 lookup_template_class(tree_node*, tree_node*, tree_node*, tree_node*, int, int) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:10131 0xe4bac0 finish_template_type(tree_node*, tree_node*, int) /home/jwakely/src/gcc/gcc/gcc/cp/semantics.cc:3727 0xd334c8 cp_parser_template_id /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:18458 0xd429b0 cp_parser_class_name /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:25923 0xd1ade9 cp_parser_qualifying_entity /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:7193 0xd1a2c8 cp_parser_nested_name_specifier_opt /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:6875 0xd4eefd cp_parser_template_introduction /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:31668 
0xd4f416 cp_parser_template_declaration_after_export /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:31840 0xd2d60e cp_parser_declaration /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:15083 FIXME 2: I want to mangle __builtin_type_pack_element(N, T...) the same as typename std::_Nth_type::type but I don't know how. Instead of trying to fake the mangled string, it's probably better to build a decl for that nested type, right? Any suggestions where to find something similar I can learn from? The tricky thing is dealing with mangling compression, where we use a substitution instead of repeating a type; that's definitely easier if we actually have the type. So you'd probably want to have a declaration of std::_Nth_type to work with, and lookup_template_class to get the type of that specialization. And then if it's complete, look up ...::type; if not, we could probably stuff a ...::type in its TYPE_FIELDS that would get clobbered if we actually instantiated the type... The reason to mangle it that way is that it preserves the same symbol names as the library produced in GCC 12, and that it will still produce with non-GCC compilers (see the definitions of std::_Nth_type in the library parts of the patch). If we don't do that then either we need to e
[PATCH/RFC] combine_completed global variable.
Hi Kewen (and Segher),

Many thanks for stress testing my patch to improve multiplication by integer constants on rs6000 by using the rldimi instruction. Although I've not been able to reproduce your ICE (using gcc135 on the compile farm), I completely agree with Segher's analysis that the Achilles heel with my approach/patch is that there's currently no way for the backend/recog to know that we're in a pass after combine.

Rather than give up on this optimization (and a similar one for i386.md where test;sete can be replaced by xor $1 when combine knows that nonzero_bits is 1, but loses that information afterwards), I thought I'd post this "strawman" proposal to add a combine_completed global variable, matching the reload_completed and regstack_completed global variables already used (to track progress) by the middle-end.

I was wondering if I could ask you to test the attached patch in combination with my previous rs6000.md patch (with the obvious change of reload_completed to combine_completed) to confirm that it fixes the problems you were seeing. Segher/Richard, would this sort of patch be considered acceptable? Or is there a better approach/solution?

2022-07-07  Roger Sayle

gcc/ChangeLog
	* combine.cc (combine_completed): New global variable.
	(rest_of_handle_combine): Set combine_completed after pass.
	* final.cc (rest_of_clean_state): Reset combine_completed.
	* rtl.h (combine_completed): Prototype here.

Many thanks in advance,
Roger
--

> -Original Message-
> From: Kewen.Lin
> Sent: 27 June 2022 10:04
> To: Roger Sayle
> Cc: gcc-patches@gcc.gnu.org; Segher Boessenkool
> ; David Edelsohn
> Subject: Re: [rs6000 PATCH] Improve constant integer multiply using rldimi.
>
> Hi Roger,
>
> on 2022/6/27 04:56, Roger Sayle wrote:
> >
> > This patch tweaks the code generated on POWER for integer
> > multiplications
> >
> > by a constant, by making use of rldimi instructions. 
Much like x86's > > > > lea instruction, rldimi can be used to implement a shift and add pair > > > > in some circumstances. For rldimi this is when the shifted operand > > > > is known to have no bits in common with the added operand. > > > > > > > > Hence for the new testcase below: > > > > > > > > int foo(int x) > > > > { > > > > int t = x & 42; > > > > return t * 0x2001; > > > > } > > > > > > > > when compiled with -O2, GCC currently generates: > > > > > > > > andi. 3,3,0x2a > > > > slwi 9,3,13 > > > > add 3,9,3 > > > > extsw 3,3 > > > > blr > > > > > > > > with this patch, we now generate: > > > > > > > > andi. 3,3,0x2a > > > > rlwimi 3,3,13,0,31-13 > > > > extsw 3,3 > > > > blr > > > > > > > > It turns out this optimization already exists in the form of a combine > > > > splitter in rs6000.md, but the constraints on combine splitters, > > > > requiring three of four input instructions (and generating one or two > > > > output instructions) mean it doesn't get applied as often as it could. > > > > This patch converts the define_split into a define_insn_and_split to > > > > catch more cases (such as the one above). > > > > > > > > The one bit that's tricky/controversial is the use of RTL's > > > > nonzero_bits which is accurate during the combine pass when this > > > > pattern is first recognized, but not as advanced (not kept up to > > > > date) when this pattern is eventually split. To support this, > > > > I've used a "|| reload_completed" idiom. Does this approach seem > > > > reasonable? [I've another patch of x86 that uses the same idiom]. > > > > > > I tested this patch on powerpc64-linux-gnu, it caused the below ICE against > test > case gcc/testsuite/gcc.c-torture/compile/pr93098.c. 
> > gcc/testsuite/gcc.c-torture/compile/pr93098.c: In function ‘foo’: > gcc/testsuite/gcc.c-torture/compile/pr93098.c:10:1: error: unrecognizable > insn: > (insn 104 32 34 2 (set (reg:SI 185 [+4 ]) > (ior:SI (and:SI (reg:SI 200 [+4 ]) > (const_int 4294967295 [0x])) > (ashift:SI (reg:SI 140) > (const_int 32 [0x20] "gcc/testsuite/gcc.c- > torture/compile/pr93098.c":6:11 -1 > (nil)) > during RTL pass: subreg3 > dump file: pr93098.c.291r.subreg3 > gcc/testsuite/gcc.c-torture/compile/pr93098.c:10:1: internal compiler error: > in > extract_insn, at recog.cc:2791 0x101f664b _fatal_insn(char const*, rtx_def > const*, char const*, int, char const*) > gcc/rtl-error.cc:108 > 0x101f6697 _fatal_insn_not_found(rtx_def const*, char const*, int, char > const*) > gcc/rtl-error.cc:116 > 0x10ae427f extract_insn(rtx_insn*) > gcc/recog.cc:2791 > 0x11b239bb decompose_multiword_subregs > gcc/lower-subreg.cc:1555 > 0x11b25013 execute > gcc/lower-subreg.cc:1818 > > The above trace shows we fails to recog the pattern again due to the > inaccurate > nonzero_bits information as you pointed out above. > > There was another patch
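To see why the rotate-and-insert transformation in the quoted patch is sound for the example: with t = x & 42, every nonzero bit of t sits below bit 13, so (t << 13) and t cannot overlap, and the add in t * 0x2001 degenerates to a bitwise OR — precisely what rldimi/rlwimi performs. A plain C check of that identity (illustration only, not part of the patch):

```c
#include <assert.h>

/* foo() from the quoted patch description: the multiply by 0x2001
   decomposes as (t << 13) + t, and because t's nonzero bits are all
   below bit 13 the add is equivalent to a bitwise OR.  */
static int foo(int x)
{
    int t = x & 42;
    return t * 0x2001;   /* == (t << 13) + t == (t << 13) | t */
}
```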
[pushed] analyzer: fix false positives from -Wanalyzer-tainted-divisor [PR106225]
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to trunk as r13-1562-g897b3b31f0a94b. gcc/analyzer/ChangeLog: PR analyzer/106225 * sm-taint.cc (taint_state_machine::on_stmt): Move handling of assignments from division to... (taint_state_machine::check_for_tainted_divisor): ...this new function. Reject warning when the divisor is known to be non-zero. * sm.cc: Include "analyzer/program-state.h". (sm_context::get_old_region_model): New. * sm.h (sm_context::get_old_region_model): New decl. gcc/testsuite/ChangeLog: PR analyzer/106225 * gcc.dg/analyzer/taint-divisor-1.c: Add test coverage for various correct and incorrect checks against zero. Signed-off-by: David Malcolm --- gcc/analyzer/sm-taint.cc | 51 ++ gcc/analyzer/sm.cc| 12 gcc/analyzer/sm.h | 2 + .../gcc.dg/analyzer/taint-divisor-1.c | 66 +++ 4 files changed, 119 insertions(+), 12 deletions(-) diff --git a/gcc/analyzer/sm-taint.cc b/gcc/analyzer/sm-taint.cc index d2d03c3d602..4075cf6d868 100644 --- a/gcc/analyzer/sm-taint.cc +++ b/gcc/analyzer/sm-taint.cc @@ -109,6 +109,9 @@ private: const supernode *node, const gcall *call, tree callee_fndecl) const; + void check_for_tainted_divisor (sm_context *sm_ctxt, + const supernode *node, + const gassign *assign) const; public: /* State for a "tainted" value: unsanitized data potentially under an @@ -803,18 +806,7 @@ taint_state_machine::on_stmt (sm_context *sm_ctxt, case ROUND_MOD_EXPR: case RDIV_EXPR: case EXACT_DIV_EXPR: - { - tree divisor = gimple_assign_rhs2 (assign);; - state_t state = sm_ctxt->get_state (stmt, divisor); - enum bounds b; - if (get_taint (state, TREE_TYPE (divisor), &b)) - { - tree diag_divisor = sm_ctxt->get_diagnostic_tree (divisor); - sm_ctxt->warn (node, stmt, divisor, - new tainted_divisor (*this, diag_divisor, b)); - sm_ctxt->set_next_state (stmt, divisor, m_stop); - } - } + check_for_tainted_divisor (sm_ctxt, node, assign); break; } } @@ -989,6 +981,41 @@ taint_state_machine::check_for_tainted_size_arg (sm_context 
*sm_ctxt, } } +/* Complain if ASSIGN (a division operation) has a tainted divisor + that could be zero. */ + +void +taint_state_machine::check_for_tainted_divisor (sm_context *sm_ctxt, + const supernode *node, + const gassign *assign) const +{ + const region_model *old_model = sm_ctxt->get_old_region_model (); + if (!old_model) +return; + + tree divisor_expr = gimple_assign_rhs2 (assign);; + const svalue *divisor_sval = old_model->get_rvalue (divisor_expr, NULL); + + state_t state = sm_ctxt->get_state (assign, divisor_sval); + enum bounds b; + if (get_taint (state, TREE_TYPE (divisor_expr), &b)) +{ + const svalue *zero_sval + = old_model->get_manager ()->get_or_create_int_cst + (TREE_TYPE (divisor_expr), 0); + tristate ts + = old_model->eval_condition (divisor_sval, NE_EXPR, zero_sval); + if (ts.is_true ()) + /* The divisor is known to not equal 0: don't warn. */ + return; + + tree diag_divisor = sm_ctxt->get_diagnostic_tree (divisor_expr); + sm_ctxt->warn (node, assign, divisor_expr, +new tainted_divisor (*this, diag_divisor, b)); + sm_ctxt->set_next_state (assign, divisor_sval, m_stop); +} +} + } // anonymous namespace /* Internal interface to this file. */ diff --git a/gcc/analyzer/sm.cc b/gcc/analyzer/sm.cc index 24c20b894cd..d17d5c765b4 100644 --- a/gcc/analyzer/sm.cc +++ b/gcc/analyzer/sm.cc @@ -40,6 +40,7 @@ along with GCC; see the file COPYING3. If not see #include "analyzer/program-point.h" #include "analyzer/store.h" #include "analyzer/svalue.h" +#include "analyzer/program-state.h" #if ENABLE_ANALYZER @@ -159,6 +160,17 @@ state_machine::to_json () const return sm_obj; } +/* class sm_context. */ + +const region_model * +sm_context::get_old_region_model () const +{ + if (const program_state *old_state = get_old_program_state ()) +return old_state->m_region_model; + else +return NULL; +} + /* Create instances of the various state machines, each using LOGGER, and populate OUT with them. 
*/ diff --git a/gcc/analyzer/sm.h b/gcc/analyzer/sm.h index e80ef1fac37..353a6db53b0 100644 --- a/gcc/analyzer/sm.h +++ b/gcc/analyzer/sm.h @@ -279,6 +279,8 @@ public: virtual const program_state *get_old_program_state () const =
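The policy the patch implements — report a tainted divisor only when the analyzer cannot prove it non-zero — can be sketched outside the analyzer in a few lines. The names below (`taint`, `divisor_info`, `should_warn`) are illustrative only, not analyzer API:

```cpp
#include <cassert>

/* Sketch of the divisor policy: a tainted divisor is only reported
   when it is not known to be non-zero.  known_nonzero stands in for
   what eval_condition (divisor, NE_EXPR, zero) would answer.  */

enum class taint { none, tainted };

struct divisor_info
{
  taint state;
  bool known_nonzero;
};

/* Return true iff a "tainted divisor" diagnostic should be issued.  */

static bool
should_warn (const divisor_info &d)
{
  if (d.state != taint::tainted)
    return false;
  if (d.known_nonzero)
    return false;  /* an "if (d == 0) return;" guard already ran  */
  return true;
}
```

This mirrors the new `check_for_tainted_divisor`: the warning fires for `y = x / d` with attacker-controlled `d`, but is suppressed once the path condition establishes `d != 0`.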
[PATCH] Introduce hardbool attribute for C
This patch introduces hardened booleans in C. The hardbool attribute, when attached to an integral type, turns it into an enumerated type with boolean semantics, using the named or implied constants as representations for false and true. Expressions of such types decay to _Bool, trapping if the value is neither true nor false, and _Bool can convert implicitly back to them. Other conversions go through _Bool first. Regstrapped on x86_64-linux-gnu. Ok to install? for gcc/c-family/ChangeLog * c-attribs.cc (c_common_attribute_table): Add hardbool. (handle_hardbool_attribute): New. (type_valid_for_vector_size): Reject hardbool. * c-common.cc (convert_and_check): Skip warnings for convert and check for hardbool. (c_hardbool_type_attr_1): New. * c-common.h (c_hardbool_type_attr): New. for gcc/c/ChangeLog * c-typeck.cc (convert_lvalue_to_rvalue): Decay hardbools. * c-convert.cc (convert): Convert to hardbool through truthvalue. * c-decl.cc (check_bitfield_type_and_width): Skip enumeral truncation warnings for hardbool. (finish_struct): Propagate hardbool attribute to bitfield types. (digest_init): Convert to hardbool. for gcc/ChangeLog * doc/extend.texi (hardbool): New type attribute. for gcc/testsuite/ChangeLog * gcc.dg/hardbool-err.c: New. * gcc.dg/hardbool-trap.c: New. * gcc.dg/hardbool.c: New. * gcc.dg/hardbool-s.c: New. * gcc.dg/hardbool-us.c: New. * gcc.dg/hardbool-i.c: New. * gcc.dg/hardbool-ul.c: New. * gcc.dg/hardbool-ll.c: New. * gcc.dg/hardbool-5a.c: New. * gcc.dg/hardbool-s-5a.c: New. * gcc.dg/hardbool-us-5a.c: New. * gcc.dg/hardbool-i-5a.c: New. * gcc.dg/hardbool-ul-5a.c: New. * gcc.dg/hardbool-ll-5a.c: New.
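The "decay to _Bool, trapping otherwise" semantics can be modelled in ordinary code. A minimal sketch — the 0x5a/0xa5 representation constants are invented for illustration; the real attribute takes its representations from the named or implied enum constants:

```cpp
#include <cassert>
#include <cstdlib>

/* Model of hardbool semantics: two magic byte patterns stand for false
   and true; decaying any other pattern to _Bool traps.  */

constexpr unsigned char HB_FALSE = 0x5a;
constexpr unsigned char HB_TRUE  = 0xa5;

static bool
hardbool_decay (unsigned char v)
{
  if (v == HB_TRUE)
    return true;
  if (v == HB_FALSE)
    return false;
  abort ();  /* e.g. a stray memset or an uninitialized byte  */
}
```

The point of the non-trivial representations is exactly this trap: an all-zeroes or attacker-scribbled byte is neither HB_FALSE nor HB_TRUE, so corrupted flags are caught at the first use rather than silently read as false.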
--- gcc/c-family/c-attribs.cc | 97 - gcc/c-family/c-common.cc | 21 gcc/c-family/c-common.h | 18 gcc/c/c-convert.cc| 14 +++ gcc/c/c-decl.cc | 10 ++ gcc/c/c-typeck.cc | 31 ++- gcc/doc/extend.texi | 37 gcc/testsuite/gcc.dg/hardbool-err.c | 28 ++ gcc/testsuite/gcc.dg/hardbool-trap.c | 13 +++ gcc/testsuite/gcc.dg/torture/hardbool-5a.c|6 + gcc/testsuite/gcc.dg/torture/hardbool-i-5a.c |6 + gcc/testsuite/gcc.dg/torture/hardbool-i.c |5 + gcc/testsuite/gcc.dg/torture/hardbool-ll-5a.c |6 + gcc/testsuite/gcc.dg/torture/hardbool-ll.c|5 + gcc/testsuite/gcc.dg/torture/hardbool-s-5a.c |6 + gcc/testsuite/gcc.dg/torture/hardbool-s.c |5 + gcc/testsuite/gcc.dg/torture/hardbool-ul-5a.c |6 + gcc/testsuite/gcc.dg/torture/hardbool-ul.c|5 + gcc/testsuite/gcc.dg/torture/hardbool-us-5a.c |6 + gcc/testsuite/gcc.dg/torture/hardbool-us.c|5 + gcc/testsuite/gcc.dg/torture/hardbool.c | 118 + 21 files changed, 444 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/hardbool-err.c create mode 100644 gcc/testsuite/gcc.dg/hardbool-trap.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-5a.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-i-5a.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-i.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-ll-5a.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-ll.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-s-5a.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-s.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-ul-5a.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-ul.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-us-5a.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-us.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool.c diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc index c8d96723f4c30..e385d780c49ce 100644 --- a/gcc/c-family/c-attribs.cc +++ b/gcc/c-family/c-attribs.cc @@ -172,6 +172,7 @@ static tree 
handle_objc_root_class_attribute (tree *, tree, tree, int, bool *); static tree handle_objc_nullability_attribute (tree *, tree, tree, int, bool *); static tree handle_signed_bool_precision_attribute (tree *, tree, tree, int, bool *); +static tree handle_hardbool_attribute (tree *, tree, tree, int, bool *); static tree handle_retain_attribute (tree *, tree, tree, int, bool *); /* Helper to define attribute exclusions. */ @@ -288,6 +289,8 @@ const struct attribute_spec c_common_attribute_table[] = affects_type_identity, handler, exclude } */ { "signed_bool_precision", 1, 1, false, true, false, true, ha
[pushed 2/2] analyzer: use label_text for superedge::get_description
No functional change intended. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Lightly tested with valgrind. Pushed to trunk as r13-1564-g52f538fa4a13d5. gcc/analyzer/ChangeLog: * checker-path.cc (start_cfg_edge_event::get_desc): Update for superedge::get_description returning a label_text. * engine.cc (feasibility_state::maybe_update_for_edge): Likewise. * supergraph.cc (superedge::dump): Likewise. (superedge::get_description): Convert return type from char * to label_text. * supergraph.h (superedge::get_description): Likewise. Signed-off-by: David Malcolm --- gcc/analyzer/checker-path.cc | 3 +-- gcc/analyzer/engine.cc | 5 ++--- gcc/analyzer/supergraph.cc | 13 + gcc/analyzer/supergraph.h| 2 +- 4 files changed, 9 insertions(+), 14 deletions(-) diff --git a/gcc/analyzer/checker-path.cc b/gcc/analyzer/checker-path.cc index 959ffdd853c..211cf3e0333 100644 --- a/gcc/analyzer/checker-path.cc +++ b/gcc/analyzer/checker-path.cc @@ -594,8 +594,7 @@ label_text start_cfg_edge_event::get_desc (bool can_colorize) const { bool user_facing = !flag_analyzer_verbose_edges; - label_text edge_desc -= label_text::take (m_sedge->get_description (user_facing)); + label_text edge_desc (m_sedge->get_description (user_facing)); if (user_facing) { if (edge_desc.m_buffer && strlen (edge_desc.m_buffer) > 0) diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc index 0674c8ba3b6..888123f2b95 100644 --- a/gcc/analyzer/engine.cc +++ b/gcc/analyzer/engine.cc @@ -4586,12 +4586,11 @@ feasibility_state::maybe_update_for_edge (logger *logger, { if (logger) { - char *desc = sedge->get_description (false); + label_text desc (sedge->get_description (false)); logger->log (" sedge: SN:%i -> SN:%i %s", sedge->m_src->m_index, sedge->m_dest->m_index, - desc); - free (desc); + desc.m_buffer); } const gimple *last_stmt = src_point.get_supernode ()->get_last_stmt (); diff --git a/gcc/analyzer/supergraph.cc b/gcc/analyzer/supergraph.cc index f023c533a09..52b4852404d 100644 --- 
a/gcc/analyzer/supergraph.cc +++ b/gcc/analyzer/supergraph.cc @@ -854,13 +854,12 @@ void superedge::dump (pretty_printer *pp) const { pp_printf (pp, "edge: SN: %i -> SN: %i", m_src->m_index, m_dest->m_index); - char *desc = get_description (false); - if (strlen (desc) > 0) + label_text desc (get_description (false)); + if (strlen (desc.m_buffer) > 0) { pp_space (pp); - pp_string (pp, desc); + pp_string (pp, desc.m_buffer); } - free (desc); } /* Dump this superedge to stderr. */ @@ -998,17 +997,15 @@ superedge::get_any_callgraph_edge () const /* Build a description of this superedge (e.g. "true" for the true edge of a conditional, or "case 42:" for a switch case). - The caller is responsible for freeing the result. - If USER_FACING is false, the result also contains any underlying CFG edge flags. e.g. " (flags FALLTHRU | DFS_BACK)". */ -char * +label_text superedge::get_description (bool user_facing) const { pretty_printer pp; dump_label_to_pp (&pp, user_facing); - return xstrdup (pp_formatted_text (&pp)); + return label_text::take (xstrdup (pp_formatted_text (&pp))); } /* Implementation of superedge::dump_label_to_pp for non-switch CFG diff --git a/gcc/analyzer/supergraph.h b/gcc/analyzer/supergraph.h index 42c6df57435..e9a5be27d88 100644 --- a/gcc/analyzer/supergraph.h +++ b/gcc/analyzer/supergraph.h @@ -331,7 +331,7 @@ class superedge : public dedge ::edge get_any_cfg_edge () const; cgraph_edge *get_any_callgraph_edge () const; - char *get_description (bool user_facing) const; + label_text get_description (bool user_facing) const; protected: superedge (supernode *src, supernode *dest, enum edge_kind kind) -- 2.26.3
[pushed 1/2] Convert label_text to C++11 move semantics
libcpp's class label_text stores a char * for a string and a flag saying whether it owns the buffer. I added this class before we could use C++11, and so to avoid lots of copying it required an explicit call to label_text::maybe_free to potentially free the buffer. Now that we can use C++11, this patch removes label_text::maybe_free in favor of doing the cleanup in the destructor, and using C++ move semantics to avoid any copying. This allows lots of messy cleanup code to be eliminated in favor of implicit destruction (mostly in the analyzer). No functional change intended. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Lightly tested with valgrind. Pushed to trunk as r13-1563-ga8dce13c076019. gcc/analyzer/ChangeLog: * call-info.cc (call_info::print): Update for removal of label_text::maybe_free in favor of automatic memory management. * checker-path.cc (checker_event::dump): Likewise. (checker_event::prepare_for_emission): Likewise. (state_change_event::get_desc): Likewise. (superedge_event::should_filter_p): Likewise. (start_cfg_edge_event::get_desc): Likewise. (warning_event::get_desc): Likewise. (checker_path::dump): Likewise. (checker_path::debug): Likewise. * diagnostic-manager.cc (diagnostic_manager::prune_for_sm_diagnostic): Likewise. (diagnostic_manager::prune_interproc_events): Likewise. * program-state.cc (sm_state_map::to_json): Likewise. * region.cc (region::to_json): Likewise. * sm-malloc.cc (inform_nonnull_attribute): Likewise. * store.cc (binding_map::to_json): Likewise. (store::to_json): Likewise. * svalue.cc (svalue::to_json): Likewise. gcc/c-family/ChangeLog: * c-format.cc (range_label_for_format_type_mismatch::get_text): Update for removal of label_text::maybe_free in favor of automatic memory management. gcc/ChangeLog: * diagnostic-format-json.cc (json_from_location_range): Update for removal of label_text::maybe_free in favor of automatic memory management. 
* diagnostic-format-sarif.cc (sarif_builder::make_location_object): Likewise. * diagnostic-show-locus.cc (struct pod_label_text): New. (class line_label): Convert m_text from label_text to pod_label_text. (layout::print_any_labels): Move "text" to the line_label. * tree-diagnostic-path.cc (path_label::get_text): Update for removal of label_text::maybe_free in favor of automatic memory management. (event_range::print): Likewise. (default_tree_diagnostic_path_printer): Likewise. (default_tree_make_json_for_path): Likewise. libcpp/ChangeLog: * include/line-map.h: Include . (class label_text): Delete maybe_free method in favor of a destructor. Add move ctor and assignment operator. Add deletion of the copy ctor and copy-assignment operator. Rename field m_caller_owned to m_owned. Add std::move where necessary; add moved_from member function. Signed-off-by: David Malcolm --- gcc/analyzer/call-info.cc | 1 - gcc/analyzer/checker-path.cc | 97 ++ gcc/analyzer/diagnostic-manager.cc | 8 --- gcc/analyzer/program-state.cc | 1 - gcc/analyzer/region.cc | 1 - gcc/analyzer/sm-malloc.cc | 3 - gcc/analyzer/store.cc | 3 - gcc/analyzer/svalue.cc | 1 - gcc/c-family/c-format.cc | 1 - gcc/diagnostic-format-json.cc | 4 +- gcc/diagnostic-format-sarif.cc | 1 - gcc/diagnostic-show-locus.cc | 35 +-- gcc/tree-diagnostic-path.cc| 4 -- libcpp/include/line-map.h | 46 +++--- 14 files changed, 101 insertions(+), 105 deletions(-) diff --git a/gcc/analyzer/call-info.cc b/gcc/analyzer/call-info.cc index b3ff51e7460..e1142d743a3 100644 --- a/gcc/analyzer/call-info.cc +++ b/gcc/analyzer/call-info.cc @@ -76,7 +76,6 @@ call_info::print (pretty_printer *pp) const { label_text desc (get_desc (pp_show_color (pp))); pp_string (pp, desc.m_buffer); - desc.maybe_free (); } /* Implementation of custom_edge_info::add_events_to_path vfunc for diff --git a/gcc/analyzer/checker-path.cc b/gcc/analyzer/checker-path.cc index 953e192cd55..959ffdd853c 100644 --- a/gcc/analyzer/checker-path.cc +++ 
b/gcc/analyzer/checker-path.cc @@ -196,7 +196,6 @@ checker_event::dump (pretty_printer *pp) const label_text event_desc (get_desc (false)); pp_printf (pp, "\"%s\" (depth %i", event_desc.m_buffer, m_effective_depth); - event_desc.maybe_free (); if (m_effective_depth != m_original_depth) pp_printf (pp, " corrected from %i", @@ -235,7 +234,6 @@ checker_event::prepare_for_emission (checker_path *, m_emission_id = emission_id; label_text desc = get_desc (false); - desc.maybe_free (); } /* class debug_event : pub
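The shape of the converted class — owning pointer freed in the destructor, copies deleted, moves transferring ownership — can be sketched in miniature (names here are illustrative, not the actual libcpp declaration):

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <utility>

/* Local strdup stand-in, to keep the sketch self-contained.  */
static char *
dup_cstr (const char *s)
{
  char *p = static_cast<char *> (std::malloc (std::strlen (s) + 1));
  std::strcpy (p, s);
  return p;
}

/* Miniature version of the converted label_text: the buffer is freed
   in the destructor, copies are deleted, and moves steal the buffer.  */
class owned_text
{
  char *m_buffer = nullptr;

public:
  owned_text () = default;
  explicit owned_text (const char *s) : m_buffer (dup_cstr (s)) {}
  ~owned_text () { std::free (m_buffer); }

  owned_text (const owned_text &) = delete;
  owned_text &operator= (const owned_text &) = delete;

  owned_text (owned_text &&other) noexcept : m_buffer (other.m_buffer)
  {
    other.m_buffer = nullptr;  /* moved-from object owns nothing  */
  }
  owned_text &operator= (owned_text &&other) noexcept
  {
    std::swap (m_buffer, other.m_buffer);
    return *this;
  }

  const char *get () const { return m_buffer; }
};

static bool
demo_move_semantics ()
{
  owned_text a ("edge: true");
  owned_text b (std::move (a));
  return a.get () == nullptr && std::strcmp (b.get (), "edge: true") == 0;
}
```

With this shape, the explicit `maybe_free`/`free` calls scattered through callers become unnecessary: ownership ends when the object goes out of scope, which is why the patch above is mostly deletions.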
[PATCH] Control flow redundancy hardening
This patch introduces an optional hardening pass to catch unexpected execution flows. Functions are transformed so that basic blocks set a bit in an automatic array, and (non-exceptional) function exit edges check that the bits in the array represent an expected execution path in the CFG. Functions with multiple exit edges, or with too many blocks, call an out-of-line checker builtin implemented in libgcc. For simpler functions, the verification is performed in-line. -fharden-control-flow-redundancy enables the pass for eligible functions, --param hardcfr-max-blocks sets a block count limit for functions to be eligible, and --param hardcfr-max-inline-blocks tunes the "too many blocks" limit for in-line verification. Regstrapped on x86_64-linux-gnu. Also bootstrapped with a patchlet that enables it by default, with --param hardcfr-max-blocks=32. Ok to install? for gcc/ChangeLog * Makefile.in (OBJS): Add gimple-harden-control-flow.o. * builtins.def (BUILT_IN___HARDCFR_CHECK): New. * common.opt (fharden-control-flow-redundancy): New. * doc/invoke.texi (fharden-control-flow-redundancy): New. (hardcfr-max-blocks, hardcfr-max-inline-blocks): New params. * gimple-harden-control-flow.cc: New. * params.opt (-param=hardcfr-max-blocks=): New. (-param=hardcfr-max-inline-blocks=): New. * passes.def (pass_harden_control_flow_redundancy): Add. * tree-pass.h (make_pass_harden_control_flow_redundancy): Declare. for gcc/testsuite/ChangeLog * c-c++-common/torture/harden-cfr.c: New. * c-c++-common/torture/harden-cfr-abrt.c: New. * c-c++-common/torture/harden-cfr-bret.c: New. * c-c++-common/torture/harden-cfr-tail.c: New. * gnat.dg/hardcfr.adb: New. for libgcc/ChangeLog * Makefile.in (LIB2ADD): Add hardcfr.c. * hardcfr.c: New.
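The visited-block bookkeeping described above can be pictured with an explicit bitmap. A sketch, not the libgcc implementation — the block numbering and the valid-path table are invented for a toy two-way CFG:

```cpp
#include <cassert>
#include <cstdint>

/* Control-flow-redundancy idea in miniature: each visited basic block
   sets its bit; at function exit the accumulated mask must match a path
   that actually exists in the CFG.  Toy CFG: 0 -> 1 -> 3 or 0 -> 2 -> 3.  */

static bool
check_visited (uint64_t visited)
{
  const uint64_t valid_paths[] = {
    (1u << 0) | (1u << 1) | (1u << 3),
    (1u << 0) | (1u << 2) | (1u << 3),
  };
  for (uint64_t p : valid_paths)
    if (visited == p)
      return true;
  return false;  /* unexpected execution flow: the pass would trap here  */
}
```

A control-flow hijack that jumps into block 3 without passing through block 0, or that somehow runs both arms of the branch, leaves a bit pattern matching no real path, which is exactly what the exit-edge check (or the out-of-line `__builtin___hardcfr_check`) rejects.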
--- gcc/Makefile.in|1 gcc/builtins.def |3 gcc/common.opt |4 gcc/doc/invoke.texi| 19 + gcc/gimple-harden-control-flow.cc | 713 gcc/params.opt |8 gcc/passes.def |1 .../c-c++-common/torture/harden-cfr-abrt.c | 11 .../c-c++-common/torture/harden-cfr-bret.c | 11 .../c-c++-common/torture/harden-cfr-tail.c | 17 gcc/testsuite/c-c++-common/torture/harden-cfr.c| 81 ++ gcc/testsuite/gnat.dg/hardcfr.adb | 76 ++ gcc/tree-pass.h|2 libgcc/Makefile.in |3 libgcc/hardcfr.c | 176 + 15 files changed, 1126 insertions(+) create mode 100644 gcc/gimple-harden-control-flow.cc create mode 100644 gcc/testsuite/c-c++-common/torture/harden-cfr-abrt.c create mode 100644 gcc/testsuite/c-c++-common/torture/harden-cfr-bret.c create mode 100644 gcc/testsuite/c-c++-common/torture/harden-cfr-tail.c create mode 100644 gcc/testsuite/c-c++-common/torture/harden-cfr.c create mode 100644 gcc/testsuite/gnat.dg/hardcfr.adb create mode 100644 libgcc/hardcfr.c diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 3ae237024265c..2a15e6ecf0802 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1403,6 +1403,7 @@ OBJS = \ gimple-iterator.o \ gimple-fold.o \ gimple-harden-conditionals.o \ + gimple-harden-control-flow.o \ gimple-laddress.o \ gimple-loop-interchange.o \ gimple-loop-jam.o \ diff --git a/gcc/builtins.def b/gcc/builtins.def index 005976f34e913..b987f9af425fd 100644 --- a/gcc/builtins.def +++ b/gcc/builtins.def @@ -1055,6 +1055,9 @@ DEF_GCC_BUILTIN (BUILT_IN_FILE, "FILE", BT_FN_CONST_STRING, ATTR_NOTHROW_LEAF_LI DEF_GCC_BUILTIN (BUILT_IN_FUNCTION, "FUNCTION", BT_FN_CONST_STRING, ATTR_NOTHROW_LEAF_LIST) DEF_GCC_BUILTIN (BUILT_IN_LINE, "LINE", BT_FN_INT, ATTR_NOTHROW_LEAF_LIST) +/* Control Flow Redundancy hardening out-of-line checker. */ +DEF_BUILTIN_STUB (BUILT_IN___HARDCFR_CHECK, "__builtin___hardcfr_check") + /* Synchronization Primitives. 
*/ #include "sync-builtins.def" diff --git a/gcc/common.opt b/gcc/common.opt index e7a51e882bade..54eb30fba642f 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1797,6 +1797,10 @@ fharden-conditional-branches Common Var(flag_harden_conditional_branches) Optimization Harden conditional branches by checking reversed conditions. +fharden-control-flow-redundancy +Common Var(flag_harden_control_flow_redundancy) Optimization +Harden control flow by recording and checking execution paths. + ; Nonzero means ignore `#ident' directives. 0 means handle them. ; Generate position-independent code for executables if possible ; On SVR4 targets, it also controls whether or not to emit a diff --git a/gcc/doc/invoke.texi b/gcc/do
Re: kernel sparse annotations vs. compiler attributes and debug_annotate_{type,decl} WAS: Re: [PATCH 0/9] Add debug_annotate attributes
Hi Yonghong. > On 6/21/22 9:12 AM, Jose E. Marchesi wrote: >> >>> On 6/17/22 10:18 AM, Jose E. Marchesi wrote: Hi Yonghong. > On 6/15/22 1:57 PM, David Faust wrote: >> >> On 6/14/22 22:53, Yonghong Song wrote: >>> >>> >>> On 6/7/22 2:43 PM, David Faust wrote: Hello, This patch series adds support for: - Two new C-language-level attributes that allow to associate (to "annotate" or to "tag") particular declarations and types with arbitrary strings. As explained below, this is intended to be used to, for example, characterize certain pointer types. - The conveyance of that information in the DWARF output in the form of a new DIE: DW_TAG_GNU_annotation. - The conveyance of that information in the BTF output in the form of two new kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG. All of these facilities are being added to the eBPF ecosystem, and support for them exists in some form in LLVM. Purpose === 1) Addition of C-family language constructs (attributes) to specify free-text tags on certain language elements, such as struct fields. The purpose of these annotations is to provide additional information about types, variables, and function parameters of interest to the kernel. A driving use case is to tag pointer types within the linux kernel and eBPF programs with additional semantic information, such as '__user' or '__rcu'. For example, consider the linux kernel function do_execve with the following declaration: static int do_execve(struct filename *filename, const char __user *const __user *__argv, const char __user *const __user *__envp); Here, __user could be defined with these annotations to record semantic information about the pointer parameters (e.g., they are user-provided) in DWARF and BTF information. Other kernel facilites such as the eBPF verifier can read the tags and make use of the information. 2) Conveying the tags in the generated DWARF debug info. 
The main motivation for emitting the tags in DWARF is that the Linux kernel generates its BTF information via pahole, using DWARF as a source:

    +--------+  BTF                 BTF   +----------+
    | pahole | ---> vmlinux.btf --------> | verifier |
    +--------+                            +----------+
        ^                                      ^
        | DWARF                                | BTF
        |                                      |
     vmlinux                            +-------------+
     module1.ko                         | BPF program |
     module2.ko                         +-------------+

This is because: a) Unlike GCC, LLVM will only generate BTF for BPF programs. b) GCC can generate BTF for whatever target with -gbtf, but there is no support for linking/deduplicating BTF in the linker. In the scenario above, the verifier needs access to the pointer tags of both the kernel types/declarations (conveyed in the DWARF and translated to BTF by pahole) and those of the BPF program (available directly in BTF). Another motivation for having the tag information in DWARF, unrelated to BPF and BTF, is that the drgn project (another DWARF consumer) also wants to benefit from these tags in order to differentiate between different kinds of pointers in the kernel. 3) Conveying the tags in the generated BTF debug info. This is easy: the main purpose of having this info in BTF is for the compiled eBPF programs. The kernel verifier can then access the tags of pointers used by the eBPF programs. For more information about these tags and the motivation behind
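A kernel-style `__user` tag on top of the proposed attribute would be spelled roughly as below. This is a sketch under assumptions: the attribute name `debug_annotate_type` is the one proposed in this series, and the macro deliberately degrades to nothing on compilers without it, so the code compiles either way:

```cpp
#include <cassert>
#include <cstring>

/* Hypothetical spelling of the kernel's __user tag using the proposed
   attribute; expands to nothing where the compiler lacks it.  */
#if defined (__has_attribute)
# if __has_attribute (debug_annotate_type)
#  define __user __attribute__ ((debug_annotate_type ("user")))
# endif
#endif
#ifndef __user
# define __user
#endif

/* The tag changes only the emitted debug info, not codegen: a consumer
   such as the BPF verifier or drgn reads it from DWARF/BTF.  */
static std::size_t
copy_len (const char __user *p)
{
  return std::strlen (p);
}
```

With such a definition, declarations like `do_execve`'s `const char __user *const __user *__argv` carry their semantic annotation through to DWARF and BTF without affecting the generated code.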
Re: [PATCH] c++: Define built-in for std::tuple_element [PR100157]
On Thu, 7 Jul 2022 at 20:29, Jason Merrill wrote: > > On 7/7/22 13:14, Jonathan Wakely wrote: > > This adds a new built-in to replace the recursive class template > > instantiations done by traits such as std::tuple_element and > > std::variant_alternative. The purpose is to select the Nth type from a > > list of types, e.g. __builtin_type_pack_element(1, char, int, float) is > > int. > > > > For a pathological example tuple_element_t<1000, tuple<2000 types...>> > > the compilation time is reduced by more than 90% and the memory used by > > the compiler is reduced by 97%. In realistic examples the gains will be > > much smaller, but still relevant. > > > > Clang has a similar built-in, __type_pack_element, but that's a > > "magic template" built-in using <> syntax, which GCC doesn't support. So > > this provides an equivalent feature, but as a built-in function using > > parens instead of <>. I don't really like the name "type pack element" > > (it gives you an element from a pack of types) but the semi-consistency > > with Clang seems like a reasonable argument in favour of keeping the > > name. I'd be open to alternative names though, e.g. __builtin_nth_type > > or __builtin_type_at_index. > > > > > > The patch has some problems though ... > > > > FIXME 1: Marek pointed out that this ICEs: > > template<typename... T> using type = __builtin_type_pack_element(sizeof(T), > > T...); > > type<int, char> c; > > > > The sizeof(T) expression is invalid, because T is an unexpanded pack, > > but it's not rejected and instead crashes: > > > > ice.C: In substitution of 'template<typename... T> using type = > > __builtin_type_pack_element (sizeof (T), T ...)
[with T = {int, char}]': > > ice.C:2:15: required from here > > ice.C:1:63: internal compiler error: in dependent_type_p, at cp/pt.cc:27490 > > 1 | template using type = > > __builtin_type_pack_element(sizeof(T), T...); > >| > > ^ > > 0xe13eea dependent_type_p(tree_node*) > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:27490 > > 0xeb1286 cxx_sizeof_or_alignof_type(unsigned int, tree_node*, tree_code, > > bool, bool) > > /home/jwakely/src/gcc/gcc/gcc/cp/typeck.cc:1912 > > 0xdf4fcc tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, > > bool, bool) > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:20582 > > 0xdd9121 tsubst_tree_list(tree_node*, tree_node*, int, tree_node*) > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15587 > > 0xddb583 tsubst(tree_node*, tree_node*, int, tree_node*) > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:16056 > > 0xddcc9d tsubst(tree_node*, tree_node*, int, tree_node*) > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:16436 > > 0xdd6d45 tsubst_decl > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15038 > > 0xdd952a tsubst(tree_node*, tree_node*, int, tree_node*) > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15668 > > 0xdfb9a1 instantiate_template(tree_node*, tree_node*, int) > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:21811 > > 0xdfc1b6 instantiate_alias_template > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:21896 > > 0xdd9796 tsubst(tree_node*, tree_node*, int, tree_node*) > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15696 > > 0xdbaba5 lookup_template_class(tree_node*, tree_node*, tree_node*, > > tree_node*, int, int) > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:10131 > > 0xe4bac0 finish_template_type(tree_node*, tree_node*, int) > > /home/jwakely/src/gcc/gcc/gcc/cp/semantics.cc:3727 > > 0xd334c8 cp_parser_template_id > > /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:18458 > > 0xd429b0 cp_parser_class_name > > /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:25923 > > 0xd1ade9 cp_parser_qualifying_entity > > /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:7193 > > 0xd1a2c8 
cp_parser_nested_name_specifier_opt > > /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:6875 > > 0xd4eefd cp_parser_template_introduction > > /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:31668 > > 0xd4f416 cp_parser_template_declaration_after_export > > /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:31840 > > 0xd2d60e cp_parser_declaration > > /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:15083 > > > > > > FIXME 2: I want to mangle __builtin_type_pack_element(N, T...) the same as > > typename std::_Nth_type<N, T...>::type but I don't know how. Instead of > > trying to fake the mangled string, it's probably better to build a decl > > for that nested type, right? Any suggestions where to find something > > similar I can learn from? > > The tricky thing is dealing with mangling compression, where we use a > substitution instead of repeating a type; that's definitely easier if we > actually have the type. Yeah, that's what I discovered when trying to fudge it as "19__builtin_type_pack_elementE" etc. > So you'd probably want to have a declaration of std::_Nth_type to work > with, and lookup_template_class
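What the built-in replaces — the linearly recursive trait — fits in a dozen lines; the cost the built-in removes is one class template instantiation per peeled type:

```cpp
#include <cassert>
#include <type_traits>

/* The recursive selection that __builtin_type_pack_element would
   short-circuit: peel one type off the pack per instantiation until
   the index reaches zero.  */
template<unsigned N, typename T, typename... Rest>
struct nth_type : nth_type<N - 1, Rest...> {};

template<typename T, typename... Rest>
struct nth_type<0, T, Rest...>
{
  typedef T type;
};

static_assert (std::is_same<nth_type<1, char, int, float>::type, int>::value,
	       "index 1 of (char, int, float) is int");
static_assert (std::is_same<nth_type<0, char, int, float>::type, char>::value,
	       "index 0 is char");
```

For `tuple_element_t<1000, tuple<2000 types...>>` this chain is 1000 instantiations deep, each one kept alive by the compiler, which is where the reported 97% memory saving comes from.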
[PATCH] Be careful with MODE_CC in simplify_const_relational_operation.
I think it's fair to describe RTL's representation of condition flags using MODE_CC as a little counter-intuitive. For example, the i386 backend represents the carry flag (in adc instructions) using RTL of the form "(ltu:SI (reg:CCC) (const_int 0))", where great care needs to be taken not to treat this like a normal RTX expression, after all LTU (less-than-unsigned) against const0_rtx would normally always be false. Hence, MODE_CC comparisons need to be treated with caution, and simplify_const_relational_operation returns early (to avoid problems) when GET_MODE_CLASS (GET_MODE (op0)) == MODE_CC. However, consider the (currently) hypothetical situation, where the RTL optimizers determine that a previous instruction unconditionally sets or clears the carry flag, and this gets propagated by combine into the above expression, we'd end up with something that looks like (ltu:SI (const_int 1) (const_int 0)), which doesn't mean what it says. Fortunately, simplify_const_relational_operation is passed the original mode of the comparison (cmp_mode, the original mode of op0) which can be checked for MODE_CC, even when op0 is now VOIDmode (const_int) after the substitution. Defending against this is clearly the right thing to do. More controversially, rather than just abort simplification/optimization in this case, we can use the comparison operator to infer/select the semantics of the CC_MODE flag. Hopefully, whenever a backend uses LTU, it represents the (set) carry flag (and behaves like i386.md), in which case the result of the simplified expression is the first operand. [If there's no standardization of semantics across backends, then we should always just return 0; but then miss potential optimizations]. 
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32}, with no new failures, and in combination with a i386 backend patch (that introduces support for x86's stc and clc instructions) where it avoids failures. However, I'm submitting this middle-end piece independently, to confirm that maintainers/reviewers are happy with the approach, and also to check there are no issues on other platforms, before building upon this infrastructure. Thoughts? Ok for mainline? 2022-07-07 Roger Sayle gcc/ChangeLog * simplify-rtx.cc (simplify_const_relational_operation): Handle case where both operands of a MODE_CC comparison have been simplified to constant integers. Thanks in advance, Roger -- diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index fa20665..73ec5c7 100644 --- a/gcc/simplify-rtx.cc +++ b/gcc/simplify-rtx.cc @@ -6026,6 +6026,18 @@ simplify_const_relational_operation (enum rtx_code code, return 0; } + /* Handle MODE_CC comparisons that have been simplified to + constants. */ + if (GET_MODE_CLASS (mode) == MODE_CC + && op1 == const0_rtx + && CONST_INT_P (op0)) +{ + /* LTU represents the carry flag. */ + if (code == LTU) + return op0 == const0_rtx ? const0_rtx : const_true_rtx; + return 0; +} + /* We can't simplify MODE_CC values since we don't know what the actual comparison is. */ if (GET_MODE_CLASS (GET_MODE (op0)) == MODE_CC)
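The carry-flag reading of LTU that the patch relies on corresponds to a simple property of unsigned addition, restated here in plain C++ (not RTL):

```cpp
#include <cassert>
#include <cstdint>

/* "(ltu:SI (reg:CCC) (const_int 0))" models the carry out of an
   unsigned add on x86: the flag is set exactly when the addition
   wrapped around.  */
static unsigned
add_carry_out (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;
  return sum < a;  /* the LTU test: wraparound iff sum < either operand  */
}
```

So when combine substitutes a known flags value into `(ltu (reg:CCC) 0)`, treating the expression as a literal "unsigned less than zero" comparison would fold it to constant false, which is precisely the miscompilation the early-return guards against.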
Re: [PATCH 08/17] openmp: -foffload-memory=pinned
On 07/07/2022 12:54, Tobias Burnus wrote: Hi Andrew, On 07.07.22 12:34, Andrew Stubbs wrote: Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up. The option is intended to provide a performance boost to certain offload programs without modifying the code. ... gcc/ChangeLog: * omp-builtins.def (BUILT_IN_GOMP_ENABLE_PINNED_MODE): New. * omp-low.cc (omp_enable_pinned_mode): New function. (execute_lower_omp): Call omp_enable_pinned_mode. libgomp/ChangeLog: * config/linux/allocator.c (always_pinned_mode): New variable. (GOMP_enable_pinned_mode): New function. (linux_memspace_alloc): Disable pinning when always_pinned_mode set. (linux_memspace_calloc): Likewise. (linux_memspace_free): Likewise. (linux_memspace_realloc): Likewise. * libgomp.map: Add GOMP_enable_pinned_mode. * testsuite/libgomp.c/alloc-pinned-7.c: New test. ... ... --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -14620,6 +14620,68 @@ lower_omp (gimple_seq *body, omp_context *ctx) input_location = saved_location; } +/* Emit a constructor function to enable -foffload-memory=pinned + at runtime. Libgomp handles the OS mode setting, but we need to trigger + it by calling GOMP_enable_pinned mode before the program proper runs. */ + +static void +omp_enable_pinned_mode () Is there a reason not to use the mechanism of OpenMP's 'requires' directive for this? (Okay, I have to admit that the final patch was only committed on Monday. But still ...) Possibly, I had most of this done before then. I'll have a look next time I visit this patch. The Cuda-specific solution can't work this way anyway, because there's no mlockall equivalent, so I will make conditional adjustments anyway. Likewise, the 'requires' mechanism could then also be used in '[PATCH 16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK'. No, I don't think so; that environment variable needs to be set before the libraries are loaded or it's too late. 
There are other ways to achieve the same thing, by leaving messages for the libgomp plugin to pick up, perhaps, but it's all extra complexity for no real gain. Andrew
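On Linux, the "fully-pinned memory at start-up" mode amounts to one syscall. A tolerant sketch of what a constructor-triggered hook like GOMP_enable_pinned_mode does underneath — `try_pin_all` is a made-up name, and failure is expected (not fatal) when RLIMIT_MEMLOCK is small:

```cpp
#include <cassert>
#include <sys/mman.h>

/* Lock all current and future mappings into RAM.  Returns 1 on
   success, 0 if the OS refuses (e.g. RLIMIT_MEMLOCK too small),
   in which case a runtime could fall back to per-page pinning.  */
static int
try_pin_all ()
{
  if (mlockall (MCL_CURRENT | MCL_FUTURE) == 0)
    return 1;
  return 0;
}
```

Because MCL_FUTURE covers mappings created later, running this once from a constructor pins every subsequent allocation, which is why libgomp can then skip its per-allocation pinning logic in this mode.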
Re: [PATCH] Be careful with MODE_CC in simplify_const_relational_operation.
Hi!

On Thu, Jul 07, 2022 at 10:08:04PM +0100, Roger Sayle wrote:
> I think it's fair to describe RTL's representation of condition flags
> using MODE_CC as a little counter-intuitive.

"A little challenging", and you should see that as a good thing, as a
puzzle to crack :-)

> For example, the i386 backend represents the carry flag (in adc
> instructions) using RTL of the form "(ltu:SI (reg:CCC) (const_int 0))",
> where great care needs to be taken not to treat this like a normal RTX
> expression, after all LTU (less-than-unsigned) against const0_rtx would
> normally always be false.

A comparison of a MODE_CC thing against 0 means the result of a
*previous* comparison (or other cc setter) is looked at.  Usually it
simply looks at some condition bits in a flags register.  It does not do
any actual comparison: that has been done before (if at all even).

> Hence, MODE_CC comparisons need to be treated with caution, and
> simplify_const_relational_operation returns early (to avoid problems)
> when GET_MODE_CLASS (GET_MODE (op0)) == MODE_CC.

Not just to avoid problems: there simply isn't enough information to do
a correct job.

> However, consider the (currently) hypothetical situation, where the
> RTL optimizers determine that a previous instruction unconditionally
> sets or clears the carry flag, and this gets propagated by combine into
> the above expression, we'd end up with something that looks like
> (ltu:SI (const_int 1) (const_int 0)), which doesn't mean what it says.
> Fortunately, simplify_const_relational_operation is passed the
> original mode of the comparison (cmp_mode, the original mode of op0)
> which can be checked for MODE_CC, even when op0 is now VOIDmode
> (const_int) after the substitution.

Defending against this is clearly the right thing to do.

> More controversially, rather than just abort simplification/optimization
> in this case, we can use the comparison operator to infer/select the
> semantics of the CC_MODE flag.  Hopefully, whenever a backend uses LTU,
> it represents the (set) carry flag (and behaves like i386.md), in which
> case the result of the simplified expression is the first operand.
> [If there's no standardization of semantics across backends, then
> we should always just return 0; but then miss potential optimizations].

On PowerPC, ltu means the result of an unsigned comparison (we have
instructions for that, cmpl[wd][i] mainly) was "smaller than".  It does
not mean anything is unsigned smaller than zero.  It also has nothing to
do with carries, which are done via a different register (the XER).

> +  /* Handle MODE_CC comparisons that have been simplified to
> +     constants.  */
> +  if (GET_MODE_CLASS (mode) == MODE_CC
> +      && op1 == const0_rtx
> +      && CONST_INT_P (op0))
> +    {
> +      /* LTU represents the carry flag.  */
> +      if (code == LTU)
> +	return op0 == const0_rtx ? const0_rtx : const_true_rtx;
> +      return 0;
> +    }
> +
>    /* We can't simplify MODE_CC values since we don't know what the
>       actual comparison is.  */

^^^ This comment is 100% true.  We cannot simplify any MODE_CC
comparison without having more context.  The combiner does have that
context when it tries to combine the CC setter with the CC consumer,
for example.

Do you have some piece of motivating example code?


Segher
libbacktrace patch committed: Don't let "make clean" remove allocfail.sh
The script allocfail.sh was being incorrectly removed by "make clean".
This patch fixes the problem.  This fixes
https://github.com/ianlancetaylor/libbacktrace/issues/81.  Ran
libbacktrace "make check" and "make clean" on x86_64-pc-linux-gnu.
Committed to mainline.

Ian

For https://github.com/ianlancetaylor/libbacktrace/issues/81
	* Makefile.am (MAKETESTS): New variable split out of TESTS.
	(CLEANFILES): Replace TESTS with BUILDTESTS and MAKETESTS.
	* Makefile.in: Regenerate.

9ed57796235abcd24e06b1ce10fe72c3d0d07cc5
diff --git a/libbacktrace/Makefile.am b/libbacktrace/Makefile.am
index bf507b73918..9f8516d00e2 100644
--- a/libbacktrace/Makefile.am
+++ b/libbacktrace/Makefile.am
@@ -85,13 +85,19 @@ libbacktrace_la_DEPENDENCIES = $(libbacktrace_la_LIBADD)
 
 # Testsuite.
 
-# Add a test to this variable if you want it to be built.
+# Add a test to this variable if you want it to be built as a program,
+# with SOURCES, etc.
 check_PROGRAMS =
 
 # Add a test to this variable if you want it to be run.
 TESTS =
 
-# Add a test to this variable if you want it to be built and run.
+# Add a test to this variable if you want it to be built as a Makefile
+# target and run.
+MAKETESTS =
+
+# Add a test to this variable if you want it to be built as a program,
+# with SOURCES, etc., and run.
 BUILDTESTS =
 
 # Add a file to this variable if you want it to be built for testing.
@@ -250,7 +256,7 @@ b2test_LDFLAGS = -Wl,--build-id
 b2test_LDADD = libbacktrace_elf_for_test.la
 
 check_PROGRAMS += b2test
-TESTS += b2test_buildid
+MAKETESTS += b2test_buildid
 
 if HAVE_DWZ
@@ -260,7 +266,7 @@ b3test_LDFLAGS = -Wl,--build-id
 b3test_LDADD = libbacktrace_elf_for_test.la
 
 check_PROGRAMS += b3test
-TESTS += b3test_dwz_buildid
+MAKETESTS += b3test_dwz_buildid
 
 endif HAVE_DWZ
@@ -311,11 +317,11 @@ if HAVE_DWZ
	  cp $< $@; \
	fi
 
-TESTS += btest_dwz
+MAKETESTS += btest_dwz
 
 if HAVE_OBJCOPY_DEBUGLINK
 
-TESTS += btest_dwz_gnudebuglink
+MAKETESTS += btest_dwz_gnudebuglink
 
 endif HAVE_OBJCOPY_DEBUGLINK
@@ -416,7 +422,7 @@ endif HAVE_PTHREAD
 
 if HAVE_OBJCOPY_DEBUGLINK
 
-TESTS += btest_gnudebuglink
+MAKETESTS += btest_gnudebuglink
 
 %_gnudebuglink: %
	$(OBJCOPY) --only-keep-debug $< $@.debug
@@ -494,7 +500,7 @@ endif USE_DSYMUTIL
 
 if HAVE_MINIDEBUG
 
-TESTS += mtest_minidebug
+MAKETESTS += mtest_minidebug
 
 %_minidebug: %
	$(NM) -D $< -P --defined-only | $(AWK) '{ print $$1 }' | sort > $<.dsyms
@@ -536,10 +542,11 @@ endif HAVE_ELF
 
 check_PROGRAMS += $(BUILDTESTS)
 
-TESTS += $(BUILDTESTS)
+TESTS += $(MAKETESTS) $(BUILDTESTS)
 
 CLEANFILES = \
-	$(TESTS) *.debug elf_for_test.c edtest2_build.c gen_edtest2_build \
+	$(MAKETESTS) $(BUILDTESTS) *.debug elf_for_test.c edtest2_build.c \
+	gen_edtest2_build \
	*.dsyms *.fsyms *.keepsyms *.dbg *.mdbg *.mdbg.xz *.strip
 
 clean-local:
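A minimal sketch of how the variables divide the work after this patch (the test names below are hypothetical, not taken from the real Makefile.am):

```makefile
# Built as a program from SOURCES, and run:
BUILDTESTS += foo_test                 # hypothetical name

# Produced by a Makefile rule (e.g. an objcopy post-processing step), and run:
MAKETESTS += foo_test_gnudebuglink     # hypothetical name

# Both feed into the automake TESTS and CLEANFILES variables:
TESTS += $(MAKETESTS) $(BUILDTESTS)
CLEANFILES = $(MAKETESTS) $(BUILDTESTS)
```

The point of the split is that CLEANFILES no longer contains all of TESTS, so a checked-in script like allocfail.sh that is listed directly in TESTS survives "make clean".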
libbacktrace patch committed: Don't exit Mach-O dyld loop on failure
This libbacktrace patch changes the loop over dynamic libraries on
Mach-O to keep going if we fail to find the debug info for a particular
library.  We can still pick up debug info for other libraries even if
one fails.  Tested on x86_64-pc-linux-gnu, which admittedly does little,
but others have tested it on Mach-O.  Committed to mainline.

Ian

	* macho.c (backtrace_initialize) [HAVE_MACH_O_DYLD_H]: Don't exit
	loop if we can't find debug info for one shared library.

d8ddf1fa098fa50929ea0a1569a8e38d80fadbaf
diff --git a/libbacktrace/macho.c b/libbacktrace/macho.c
index 3f40811719e..16f406507d2 100644
--- a/libbacktrace/macho.c
+++ b/libbacktrace/macho.c
@@ -1268,7 +1268,7 @@ backtrace_initialize (struct backtrace_state *state, const char *filename,
	  mff = macho_nodebug;
	  if (!macho_add (state, name, d, 0, NULL, base_address, 0,
			  error_callback, data, &mff, &mfs))
-	    return 0;
+	    continue;
 
	  if (mff != macho_nodebug)
	    macho_fileline_fn = mff;
Re: [RFA] Improve initialization of objects when the initializer has trailing zeros.
On 2022/07/07 23:46, Jeff Law wrote:
> This is an update to a patch originally posted by Takayuki Suwa a few
> months ago.
>
> When we initialize an array from a STRING_CST we perform the
> initialization in two steps.  The first step copies the STRING_CST to
> the destination.  The second step uses clear_storage to initialize
> storage in the array beyond TREE_STRING_LENGTH of the initializer.
>
> Takayuki's patch added a special case when the STRING_CST itself was
> all zeros which would avoid the copy from the STRING_CST and instead
> do all the initialization via clear_storage which is clearly more
> runtime efficient.

Thank you for understanding what I mean...

> Richie had the suggestion that instead of special casing when the
> entire STRING_CST was NULs to instead identify when the tail of the
> STRING_CST was NULs.  That's more general and handles Takayuki's case
> as well.

and offering a good explanation.

> Bootstrapped and regression tested on x86_64-linux-gnu.  Given I
> rewrote Takayuki's patch I think it needs someone else to review
> rather than self-approving.

LGTM, and of course it resolves the original report that started this
in the first place
(https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595685.html).

> OK for the trunk?
>
> Jeff
>
Re: [PATCH 0/2] loongarch: improve code generation for integer division
On 2022/7/7 10:23 AM, Xi Ruoyao wrote:
> We were generating some unnecessary instructions for integer division.
> These two patches improve the code generation to compile
>
>     template <class T>
>     T div (T a, T b)
>     {
>       return a / b;
>     }
>
> into a single division instruction (along with a return instruction of
> course) as we expected, for T in {int32_t, uint32_t, int64_t}.
>
> Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
>
> Xi Ruoyao (2):
>   loongarch: add alternatives for idiv insns to improve code generation
>   loongarch: avoid unnecessary sign-extend after 32-bit division
>
>  gcc/config/loongarch/loongarch-protos.h    | 1 +
>  gcc/config/loongarch/loongarch.cc          | 2 +-
>  gcc/config/loongarch/loongarch.md          | 34 --
>  gcc/testsuite/gcc.target/loongarch/div-1.c | 9 ++
>  gcc/testsuite/gcc.target/loongarch/div-2.c | 9 ++
>  gcc/testsuite/gcc.target/loongarch/div-3.c | 9 ++
>  gcc/testsuite/gcc.target/loongarch/div-4.c | 9 ++
>  7 files changed, 63 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/div-1.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/div-2.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/div-3.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/div-4.c

I am testing the spec and it can be done today or tomorrow.
[PATCH] diagnostics: Make line-ending logic consistent with libcpp [PR91733]
Hello-

The PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91733) points out
that, while libcpp recognizes a lone '\r' as a valid line-ending
character, the infrastructure that obtains source lines to be printed
in diagnostics does not, and hence diagnostics do not output the
intended portion of a source file that uses such line endings.  The
PR's author suggests that libcpp should stop accepting '\r' line
endings, but that seems rather controversial and not likely to change.
Fixing the diagnostics is easy enough though, and that's done by the
attached patch.  Please let me know if it looks OK, thanks!

bootstrap + regtest all languages looks good, with just new PASSes for
the new testcase:

               before    after
  FAIL            103      103
  PASS         543592   543627
  UNSUPPORTED   15298    15298
  UNTESTED        136      136
  XFAIL          4130     4130
  XPASS            20       20

-Lewis

[PATCH] diagnostics: Make line-ending logic consistent with libcpp [PR91733]

libcpp recognizes a lone \r as a valid line ending, so the
infrastructure for retrieving source lines to be output in diagnostics
needs to do the same.  This patch fixes
file_cache_slot::get_next_line () accordingly so that diagnostics
display the correct part of the source when \r line endings are in use.

gcc/ChangeLog:

	PR preprocessor/91733
	* input.cc (find_end_of_line): New helper function.
	(file_cache_slot::get_next_line): Recognize \r as a line ending.
	* diagnostic-show-locus.cc (test_escaping_bytes_1): Adapt selftest
	since \r will now be interpreted as a line-ending.

gcc/testsuite/ChangeLog:

	PR preprocessor/91733
	* c-c++-common/pr91733.c: New test.
diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
index 6eafe19785f..d267d2c258d 100644
--- a/gcc/diagnostic-show-locus.cc
+++ b/gcc/diagnostic-show-locus.cc
@@ -5508,7 +5508,7 @@ test_tab_expansion (const line_table_case &case_)
 static void
 test_escaping_bytes_1 (const line_table_case &case_)
 {
-  const char content[] = "before\0\1\2\3\r\x80\xff""after\n";
+  const char content[] = "before\0\1\2\3\v\x80\xff""after\n";
   const size_t sz = sizeof (content);
   temp_source_file tmp (SELFTEST_LOCATION, ".c", content, sz);
   line_table_test ltt (case_);
@@ -5523,18 +5523,18 @@ test_escaping_bytes_1 (const line_table_case &case_)
   if (finish > LINE_MAP_MAX_LOCATION_WITH_COLS)
     return;
 
-  /* Locations of the NUL and \r bytes.  */
+  /* Locations of the NUL and \v bytes.  */
   location_t nul_loc
     = linemap_position_for_line_and_column (line_table, ord_map, 1, 7);
-  location_t r_loc
+  location_t v_loc
     = linemap_position_for_line_and_column (line_table, ord_map, 1, 11);
   gcc_rich_location richloc (nul_loc);
-  richloc.add_range (r_loc);
+  richloc.add_range (v_loc);
 
   {
     test_diagnostic_context dc;
     diagnostic_show_locus (&dc, &richloc, DK_ERROR);
-    ASSERT_STREQ (" before \1\2\3 \x80\xff""after\n"
+    ASSERT_STREQ (" before \1\2\3\v\x80\xff""after\n"
		  " ^ ~\n",
		  pp_formatted_text (dc.printer));
   }
@@ -5544,7 +5544,7 @@ test_escaping_bytes_1 (const line_table_case &case_)
     dc.escape_format = DIAGNOSTICS_ESCAPE_FORMAT_UNICODE;
     diagnostic_show_locus (&dc, &richloc, DK_ERROR);
     ASSERT_STREQ
-      (" before<80>after\n"
+      (" before<80>after\n"
       " ^~~~\n",
       pp_formatted_text (dc.printer));
   }
@@ -5552,7 +5552,7 @@ test_escaping_bytes_1 (const line_table_case &case_)
     test_diagnostic_context dc;
     dc.escape_format = DIAGNOSTICS_ESCAPE_FORMAT_BYTES;
     diagnostic_show_locus (&dc, &richloc, DK_ERROR);
-    ASSERT_STREQ (" before<00><01><02><03><0d><80>after\n"
+    ASSERT_STREQ (" before<00><01><02><03><0b><80>after\n"
		  " ^~~~\n",
		  pp_formatted_text (dc.printer));
   }
diff --git a/gcc/input.cc b/gcc/input.cc
index 2acbfdea4f8..060ca160126 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -646,6 +646,37 @@ file_cache_slot::maybe_read_data ()
   return read_data ();
 }
 
+/* Helper function for file_cache_slot::get_next_line (), to find the end of
+   the next line.  Returns with the memchr convention, i.e. nullptr if a line
+   terminator was not found.  We need to determine line endings in the same
+   manner that libcpp does: any of \n, \r\n, or \r is a line ending.  */
+
+static char *
+find_end_of_line (char *s, size_t len)
+{
+  for (const auto end = s + len; s != end; ++s)
+    {
+      if (*s == '\n')
+	return s;
+      if (*s == '\r')
+	{
+	  const auto next = s + 1;
+	  if (next == end)
+	    {
+	      /* Don't find the line ending if \r is the very last character
+		 in the buffer; we do not know if it's the end of the file or
+		 just the end of what has been read so far, and we wouldn't
+		 want to break in the middle of what's actually a \r\n
+		 sequence.  Instead, we will handle the case of a file end
[pushed][PATCH v4] LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method when frame->mask or frame->fmask is zero.
Pushed for trunk and gcc-12: r13-1569-gaa8fd7f65683ef,
r12-8558-ge623829c18ec29.

Under the LA architecture, when the stack is dropped too far, the
process of dropping the stack is divided into two steps:

step1: After dropping the stack, save callee-saved registers on the stack.
step2: The rest of it.

The stack drop operation is optimized when frame->total_size minus
frame->sp_fp_offset is an integer multiple of 4096; this can reduce the
number of instructions required to drop the stack.  However, this
optimization was not taking effect because of the original calculation
method.

The following case:

int main()
{
  char buf[1024 * 12];
  printf ("%p\n", buf);
  return 0;
}

As you can see from the generated assembler, the old GCC emits two more
instructions than the new GCC (lines 14 and 24 of the old column):

           new                       |            old
10 main:                             | 11 main:
11  addi.d   $r3,$r3,-16             | 12  lu12i.w  $r13,-12288>>12
12  lu12i.w  $r13,-12288>>12         | 13  addi.d   $r3,$r3,-2032
13  lu12i.w  $r5,-12288>>12          | 14  ori      $r13,$r13,2016
14  lu12i.w  $r12,12288>>12          | 15  lu12i.w  $r5,-12288>>12
15  st.d     $r1,$r3,8               | 16  lu12i.w  $r12,12288>>12
16  add.d    $r12,$r12,$r5           | 17  st.d     $r1,$r3,2024
17  add.d    $r3,$r3,$r13            | 18  add.d    $r12,$r12,$r5
18  add.d    $r5,$r12,$r3            | 19  add.d    $r3,$r3,$r13
19  la.local $r4,.LC0                | 20  add.d    $r5,$r12,$r3
20  bl       %plt(printf)            | 21  la.local $r4,.LC0
21  lu12i.w  $r13,12288>>12          | 22  bl       %plt(printf)
22  add.d    $r3,$r3,$r13            | 23  lu12i.w  $r13,8192>>12
23  ld.d     $r1,$r3,8               | 24  ori      $r13,$r13,2080
24  or       $r4,$r0,$r0             | 25  add.d    $r3,$r3,$r13
25  addi.d   $r3,$r3,16              | 26  ld.d     $r1,$r3,2024
26  jr       $r1                     | 27  or       $r4,$r0,$r0
                                     | 28  addi.d   $r3,$r3,2032
                                     | 29  jr       $r1

gcc/ChangeLog:

	* config/loongarch/loongarch.cc (loongarch_compute_frame_info):
	Modify fp_sp_offset and gp_sp_offset's calculation method,
	when frame->mask or frame->fmask is zero, don't minus
	UNITS_PER_WORD or UNITS_PER_FP_REG.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/prolog-opt.c: New test.
---
 gcc/config/loongarch/loongarch.cc               | 12 +---
 gcc/testsuite/gcc.target/loongarch/prolog-opt.c | 15 +++
 2 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/prolog-opt.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index d72b256df51..5c9a33c14f7 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -917,8 +917,12 @@ loongarch_compute_frame_info (void)
   frame->frame_pointer_offset = offset;
   /* Next are the callee-saved FPRs.  */
   if (frame->fmask)
-    offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG);
-  frame->fp_sp_offset = offset - UNITS_PER_FP_REG;
+    {
+      offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG);
+      frame->fp_sp_offset = offset - UNITS_PER_FP_REG;
+    }
+  else
+    frame->fp_sp_offset = offset;
   /* Next are the callee-saved GPRs.  */
   if (frame->mask)
     {
@@ -931,8 +935,10 @@ loongarch_compute_frame_info (void)
	frame->save_libcall_adjustment = x_save_size;
 
       offset += x_save_size;
+      frame->gp_sp_offset = offset - UNITS_PER_WORD;
     }
-  frame->gp_sp_offset = offset - UNITS_PER_WORD;
+  else
+    frame->gp_sp_offset = offset;
   /* The hard frame pointer points above the callee-saved GPRs.  */
   frame->hard_frame_pointer_offset = offset;
   /* Above the hard frame pointer is the callee-allocated varags save area.  */
diff --git a/gcc/testsuite/gcc.target/loongarch/prolog-opt.c b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c
new file mode 100644
index 000..0470a1f1eee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c
@@ -0,0 +1,15 @@
+/* Test that LoongArch backend stack drop operation optimized.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -mabi=lp64d" } */
+/* { dg-final { scan-assembler "addi.d\t\\\$r3,\\\$r3,-16" } } */
+
+extern int printf (char *, ...);
+
+int main()
+{
+  char buf[1024 * 12];
+  printf ("%p\n", buf);
+  return 0;
+}
+
-- 
2.31.1
Re: [PATCH v2 1/7] config: use $EGREP instead of egrep
On Mon, Jun 27, 2022 at 2:07 AM Xi Ruoyao via Gcc-patches wrote:
>
> egrep has been deprecated in favor of grep -E for a long time, and the
> next GNU grep release (3.8 or 4.0) will print a warning if egrep is
> used.  Unfortunately, old hosts with non-GNU grep may lack the support
> for the -E option.  Use AC_PROG_EGREP and the $EGREP variable so we'll
> work fine on both old and new platforms.
>
> ChangeLog:
>
> 	* configure.ac: Call AC_PROG_EGREP, and use $EGREP instead of
> 	egrep.
> 	* config.rpath: Use $EGREP instead of egrep.

config.rpath is imported from gnulib where this problem is already
fixed apparently; wouldn't it make more sense to re-import a fresh
config.rpath from upstream gnulib instead of patching GCC's local copy?

> 	* configure: Regenerate.
>
> config/ChangeLog:
>
> 	* lib-ld.m4 (AC_LIB_PROG_LD_GNU): Call AC_PROG_EGREP, and use
> 	$EGREP instead of egrep.
> 	(acl_cv_path_LD): Likewise.
> 	* lib-link.m4 (AC_LIB_RPATH): Call AC_PROG_EGREP, and pass
> 	$EGREP to config.rpath.
>
> gcc/ChangeLog:
>
> 	* configure: Regenerate.
>
> intl/ChangeLog:
>
> 	* configure: Regenerate.
>
> libcpp/ChangeLog:
>
> 	* configure: Regenerate.
>
> libgcc/ChangeLog:
>
> 	* configure: Regenerate.
>
> libstdc++-v3/ChangeLog:
>
> 	* configure: Regenerate.
> ---
>  config.rpath           |  10 +--
>  config/lib-ld.m4       |   6 +-
>  config/lib-link.m4     |   4 +-
>  configure              | 136 -
>  configure.ac           |   5 +-
>  gcc/configure          |  13 ++--
>  intl/configure         |   9 +--
>  libcpp/configure       |   9 +--
>  libgcc/configure       |   2 +-
>  libstdc++-v3/configure |   9 +--
>  10 files changed, 172 insertions(+), 31 deletions(-)
>
> diff --git a/config.rpath b/config.rpath
> index 4dea75957c2..4ada7468c22 100755
> --- a/config.rpath
> +++ b/config.rpath
> @@ -29,7 +29,7 @@
>  #    CPU_TYPE-MANUFACTURER-OPERATING_SYSTEM
>  # or
>  #    CPU_TYPE-MANUFACTURER-KERNEL-OPERATING_SYSTEM
> -# The environment variables CC, GCC, LDFLAGS, LD, with_gnu_ld
> +# The environment variables CC, GCC, EGREP, LDFLAGS, LD, with_gnu_ld
>  # should be set by the caller.
>  #
>  # The set of defined variables is at the end of this script.
> @@ -143,7 +143,7 @@ if test "$with_gnu_ld" = yes; then
>        ld_shlibs=no
>        ;;
>      beos*)
> -      if $LD --help 2>&1 | egrep ': supported targets:.* elf' > /dev/null; then
> +      if $LD --help 2>&1 | $EGREP ': supported targets:.* elf' > /dev/null; then
>        :
>      else
>        ld_shlibs=no
> @@ -162,9 +162,9 @@ if test "$with_gnu_ld" = yes; then
>      netbsd*)
>        ;;
>      solaris* | sysv5*)
> -      if $LD -v 2>&1 | egrep 'BFD 2\.8' > /dev/null; then
> +      if $LD -v 2>&1 | $EGREP 'BFD 2\.8' > /dev/null; then
>        ld_shlibs=no
> -      elif $LD --help 2>&1 | egrep ': supported targets:.* elf' > /dev/null; then
> +      elif $LD --help 2>&1 | $EGREP ': supported targets:.* elf' > /dev/null; then
>        :
>      else
>        ld_shlibs=no
> @@ -174,7 +174,7 @@ if test "$with_gnu_ld" = yes; then
>        hardcode_direct=yes
>        ;;
>      *)
> -      if $LD --help 2>&1 | egrep ': supported targets:.* elf' > /dev/null; then
> +      if $LD --help 2>&1 | $EGREP ': supported targets:.* elf' > /dev/null; then
>        :
>      else
>        ld_shlibs=no
> diff --git a/config/lib-ld.m4 b/config/lib-ld.m4
> index 11d0ce77342..88a014b7a74 100644
> --- a/config/lib-ld.m4
> +++ b/config/lib-ld.m4
> @@ -14,7 +14,8 @@ dnl From libtool-1.4. Sets the variable with_gnu_ld to yes or no.
>  AC_DEFUN([AC_LIB_PROG_LD_GNU],
>  [AC_CACHE_CHECK([if the linker ($LD) is GNU ld], acl_cv_prog_gnu_ld,
>  [# I'd rather use --version here, but apparently some GNU ld's only accept -v.
> -if $LD -v 2>&1 &5; then
> +AC_REQUIRE([AC_PROG_EGREP])dnl
> +if $LD -v 2>&1 &5; then
>    acl_cv_prog_gnu_ld=yes
>  else
>    acl_cv_prog_gnu_ld=no
> @@ -28,6 +29,7 @@ AC_DEFUN([AC_LIB_PROG_LD],
>  [  --with-gnu-ld           assume the C compiler uses GNU ld [default=no]],
>  test "$withval" = no || with_gnu_ld=yes, with_gnu_ld=no)
>  AC_REQUIRE([AC_PROG_CC])dnl
> +AC_REQUIRE([AC_PROG_EGREP])dnl
>  AC_REQUIRE([AC_CANONICAL_HOST])dnl
>  # Prepare PATH_SEPARATOR.
>  # The user is always right.
> @@ -88,7 +90,7 @@ AC_CACHE_VAL(acl_cv_path_LD,
>        # Check to see if the program is GNU ld.  I'd rather use --version,
>        # but apparently some GNU ld's only accept -v.
>        # Break only if it was the GNU/non-GNU ld that we prefer.
> -      if "$acl_cv_path_LD" -v 2>&1 < /dev/null | egrep '(GNU|with BFD)' > /dev/null; then
> +      if "$acl_cv_path_LD" -v 2>&1 < /dev/null | $EGREP '(GNU|with BFD)' > /dev/null; then
>        test "$with_gnu_ld" != no && break
>      else
>        test "$with_gnu_ld" != yes && break
> diff --git a/config/lib-link.m4 b/config/lib-link.m4
> index 20e2
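The pattern the patch installs, as a standalone shell sketch (assuming a GNU host where the AC_PROG_EGREP probe would select `grep -E`):

```shell
# Probe once (as AC_PROG_EGREP does) and reuse the result everywhere,
# instead of spelling the deprecated `egrep` at each use site.
EGREP="grep -E"   # assumed result of the configure-time probe

# The same test config.rpath/lib-ld.m4 run on the linker's version text:
if printf 'GNU ld (GNU Binutils) 2.38\n' | $EGREP '(GNU|with BFD)' > /dev/null; then
  echo "gnu-ld"
fi
```

On an old host whose grep lacks -E, the probe would instead set EGREP to a working egrep, which is why the substitution keeps both old and new platforms happy.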
[PATCH] Fix tree-opt/PR106087: ICE with inline-asm with multiple output and assigned only static vars
From: Andrew Pinski

The problem here is that when we mark the ssa name that was referenced
in the now removed dead store (to a write-only static variable), the
inline-asm would also be removed even though it was defining another
ssa name.  This fixes the problem by checking to make sure that the
statement was only defining one ssa name.

OK? Bootstrapped and tested on x86_64 with no regressions.

	PR tree-optimization/106087

gcc/ChangeLog:

	* tree-ssa-dce.cc (simple_dce_from_worklist): Check to make sure
	the statement is only defining one operand.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/compile/inline-asm-1.c: New test.
---
 gcc/testsuite/gcc.c-torture/compile/inline-asm-1.c | 14 ++
 gcc/tree-ssa-dce.cc                                |  5 +
 2 files changed, 19 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/inline-asm-1.c

diff --git a/gcc/testsuite/gcc.c-torture/compile/inline-asm-1.c b/gcc/testsuite/gcc.c-torture/compile/inline-asm-1.c
new file mode 100644
index 000..0044cb761b6
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/inline-asm-1.c
@@ -0,0 +1,14 @@
+/* PR tree-opt/106087,
+   simple_dce_from_worklist would delete the
+   inline-asm when it was still being referenced
+   by the other ssa name.  */
+
+static int t;
+
+int f(void)
+{
+  int tt, tt1;
+  asm("":"=r"(tt), "=r"(tt1));
+  t = tt1;
+  return tt;
+}
diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
index bc533582673..602cdb30ceb 100644
--- a/gcc/tree-ssa-dce.cc
+++ b/gcc/tree-ssa-dce.cc
@@ -2061,6 +2061,11 @@ simple_dce_from_worklist (bitmap worklist)
       if (gimple_has_side_effects (t))
	continue;
 
+      /* The defining statement needs to be defining only this name.  */
+      if (!is_a<gphi *> (t)
+	  && !single_ssa_def_operand (t, SSA_OP_DEF))
+	continue;
+
       /* Don't remove statements that are needed for non-call
	 eh to work.  */
       if (stmt_unremovable_because_of_non_call_eh_p (cfun, t))
-- 
2.17.1