[PATCH] dwarf2out: Don't call expand_expr during early_dwarf [PR104407]
Hi! As mentioned in the PR, since PR96690 r11-2834 we call rtl_for_decl_init which can call expand_expr already during early_dwarf. The comment and PR explains it that the intent is to ensure the referenced vars and functions are properly mangled because free_lang_data doesn't cover everything, like template parameters etc. It doesn't work well though, because expand_expr can set DECL_RTLs e.g. on referenced vars and keep them there, and they can be created e.g. with different MEM_ALIGN compared to what they would be created with if they were emitted later. So, the following patch stops calling rtl_for_decl_init and instead for cases for which rtl_for_decl_init does anything at all walks the initializer and ensures referenced vars or functions are mangled. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2022-02-08 Jakub Jelinek PR debug/104407 * dwarf2out.cc (mangle_referenced_decls): New function. (tree_add_const_value_attribute): Don't call rtl_for_decl_init if early_dwarf. Instead walk the initializer and try to mangle vars or functions referenced from it. * g++.dg/debug/dwarf2/pr104407.C: New test. --- gcc/dwarf2out.cc.jj 2022-02-04 14:36:54.966605844 +0100 +++ gcc/dwarf2out.cc2022-02-07 12:28:13.000520666 +0100 @@ -20881,6 +20881,19 @@ add_location_or_const_value_attribute (d return tree_add_const_value_attribute_for_decl (die, decl); } +/* Mangle referenced decls. */ +static tree +mangle_referenced_decls (tree *tp, int *walk_subtrees, void *) +{ + if (! EXPR_P (*tp) && ! CONSTANT_CLASS_P (*tp)) +*walk_subtrees = 0; + + if (VAR_OR_FUNCTION_DECL_P (*tp)) +assign_assembler_name_if_needed (*tp); + + return NULL_TREE; +} + /* Attach a DW_AT_const_value attribute to DIE. The value of the attribute is the const value T. */ @@ -20889,7 +20902,6 @@ tree_add_const_value_attribute (dw_die_r { tree init; tree type = TREE_TYPE (t); - rtx rtl; if (!t || !TREE_TYPE (t) || TREE_TYPE (t) == error_mark_node) return false; @@ -20910,11 +20922,26 @@ tree_add_const_value_attribute (dw_die_r return true; } } - /* Generate the RTL even if early_dwarf to force mangling of all refered to - symbols. */ - rtl = rtl_for_decl_init (init, type); - if (rtl && !early_dwarf) -return add_const_value_attribute (die, TYPE_MODE (type), rtl); + if (!early_dwarf) +{ + rtx rtl = rtl_for_decl_init (init, type); + if (rtl && !early_dwarf) + return add_const_value_attribute (die, TYPE_MODE (type), rtl); +} + else +{ + /* For early_dwarf force mangling of all referenced symbols. */ + tree initializer = init; + STRIP_NOPS (initializer); + /* rtl_for_decl_init punts on other aggregates, and complex values. */ + if (AGGREGATE_TYPE_P (type) + || (TREE_CODE (initializer) == VIEW_CONVERT_EXPR + && AGGREGATE_TYPE_P (TREE_TYPE (TREE_OPERAND (initializer, 0 + || TREE_CODE (type) == COMPLEX_TYPE) + ; + else if (initializer_constant_valid_p (initializer, type)) + walk_tree (&initializer, mangle_referenced_decls, NULL, NULL); +} /* If the host and target are sane, try harder. */ if (CHAR_BIT == 8 && BITS_PER_UNIT == 8 && initializer_constant_valid_p (init, type)) --- gcc/testsuite/g++.dg/debug/dwarf2/pr104407.C.jj 2022-02-07 10:45:51.945041031 +0100 +++ gcc/testsuite/g++.dg/debug/dwarf2/pr104407.C2022-02-07 10:45:09.834635340 +0100 @@ -0,0 +1,12 @@ +// PR debug/104407 +// { dg-do compile { target c++17 } } +// { dg-options "-O1 -fcompare-debug" } + +struct A { int i; long j; int k : 2; char l; } a; + +auto [ aa, bb, cc, dd ] = a; + +namespace N +{ + auto & [ m, n, o, ppp ] = a; +} Jakub
Re: [PATCH 2/4] tree-optimization/104288 - Register side effects during ranger vrp domwalk.
On Mon, Feb 7, 2022 at 3:32 PM Andrew MacLeod via Gcc-patches wrote: > > This patch adds the ability to register side effects within a block to > ranger. This is currently only for tracking non-null values. > > the rvrp_folder performs a DOM walk and then attempts to simplify and > the fold each stmt during the walk. This patch adds a call to > register_side_effects() to record any effects the stmt might have before > returning. > > This checks if the non-null property is set for any ssa-names on the > statement, and if they are, moves the state to only non-null for the > block... This allows us to adjust the property within the block as we > do the DOM walk. All further queries within the block will then come > back as non-null. ALl other queries will be from outside the block, so > they will see the same results anyway. > > Unfortunately, none of the non-null routines in gimple.cc appear to work > "in bulk", but rather, on a specific tree-ssa operand request at a > time. For this patch, I need to scan the entire statement looking for > any operand that is nonnull (or the performance impact is awful). In tree.cc we have get_nonnull_args which you'd pass the gimple_call_fntype, its expense is allocating a bitmap (if any arg is nonnull) which you'd then need to iterate over and free. I think that's an OK overhead for not duplicating the walk? > I took the code in the various infer_nonnull routines, and packed it all > together into the two routines block_apply_nonnull && > infer_nonnull_call_attributes, which basically perform the same > functionality as infer_nonnull_range_by_dereference and > infer_nonnull_range_by_attribute, only on all the operands at once. I > think its right, but you may want to have a look. I intend to leverage > this code in GCC13 for a more generalized range_after_stmt mechanism > that will do away with the immediate-use visits of the current non-null +void +non_null_ref::block_apply_nonnull (gimple *s) +{ + if (!flag_delete_null_pointer_checks) +return; + if (is_a (s)) +return; + if (gimple_code (s) == GIMPLE_ASM || gimple_clobber_p (s)) +return; + if (is_a (s)) +{ + infer_nonnull_call_attributes (as_a (s)); + return; +} + walk_stmt_load_store_ops (s, (void *)this, non_null_loadstore, + non_null_loadstore); Both ASM and calls can have pointer dereferences embedded so I wonder whether you should do the walk_stmt_load_store_ops anyway. Note that any exceptional control flow (EH or abnormal) will not have realized the outputs on a stmt (LHS of call, outputs of asms), so nonnull carries over only across fallthru or conditional edges. I suppose realizing the LHS could be postponed for GCC 13, for calls you could still walk loads so you get the non-nullness of aggregate arguments like foo (*p) after the stmt. But you can of course do all this for GCC 13 only. > This patch also has zero effect on generated code as none of the > nonnull_deref_p() queries make use of the new granularity available > yet. It does set the table for the next patch. Performance impact of > this is also marginal, about a 1% hit to EVRP/VRP. > > Bootstraps with no regressions. OK for trunk? > > Andrew > > >
[PATCH] c++: Remove superflous assert [PR104403]
Hi! I've added the assert because start_decl diagnoses such vars for C++20 and earlier: if (current_function_decl && VAR_P (decl) && DECL_DECLARED_CONSTEXPR_P (current_function_decl) && cxx_dialect < cxx23) but as can be seen, we cam trigger the assert in older standards e.g. during non-manifestly constant evaluation. Rather than refining the assert that DECL_EXPRs for such vars don't appear for C++20 and older if they are inside of functions declared constexpr this patch just removes the assert, the code rejects encountering those vars in constant expressions anyway. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2022-02-08 Jakub Jelinek PR c++/104403 * constexpr.cc (cxx_eval_constant_expression): Don't assert DECL_EXPRs of TREE_STATIC vars may only appear in -std=c++23. * g++.dg/cpp0x/lambda/lambda-104403.C: New test. --- gcc/cp/constexpr.cc.jj 2022-02-06 11:16:22.353814431 +0100 +++ gcc/cp/constexpr.cc 2022-02-07 11:38:59.131165171 +0100 @@ -6652,7 +6652,6 @@ cxx_eval_constant_expression (const cons /* Allow __FUNCTION__ etc. */ && !DECL_ARTIFICIAL (r)) { - gcc_assert (cxx_dialect >= cxx23); if (!ctx->quiet) { if (CP_DECL_THREAD_LOCAL_P (r)) --- gcc/testsuite/g++.dg/cpp0x/lambda/lambda-104403.C.jj2022-02-07 11:40:32.760847269 +0100 +++ gcc/testsuite/g++.dg/cpp0x/lambda/lambda-104403.C 2022-02-07 11:40:06.109223079 +0100 @@ -0,0 +1,8 @@ +// PR c++/104403 +// { dg-do compile { target c++11 } } + +int +main () +{ + []{ switch (0) { case 0: static int value; return &value; } }; +} Jakub
[PATCH] c++: Don't emit repeated -Wshadow warnings for templates/ctors [PR104379]
Hi! The following patch suppresses extraneous -Wshadow warnings. On the testcase without the patch we emit 14 -Wshadow warnings, with the patch just 4. It is enough to warn once e.g. during parsing of the template or the abstract ctor, while previously we'd warn also on the clones of the ctors and on instantiation. In GCC 8 and earlier we didn't warn because check_local_shadow did /* Inline decls shadow nothing. */ if (DECL_FROM_INLINE (decl)) return; Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2022-02-08 Jakub Jelinek PR c++/104379 * name-lookup.cc (check_local_shadow): When diagnosing shadowing of a member or global declaration, add warning suppression for the decl and don't warn again on it. * g++.dg/warn/Wshadow-18.C: New test. --- gcc/cp/name-lookup.cc.jj2022-01-18 11:58:59.0 +0100 +++ gcc/cp/name-lookup.cc 2022-02-07 15:00:19.781820426 +0100 @@ -3296,18 +3296,22 @@ check_local_shadow (tree decl) /* Warn if a variable shadows a non-function, or the variable is a function or a pointer-to-function. */ - if (!OVL_P (member) - || TREE_CODE (decl) == FUNCTION_DECL - || (TREE_TYPE (decl) - && (TYPE_PTRFN_P (TREE_TYPE (decl)) - || TYPE_PTRMEMFUNC_P (TREE_TYPE (decl) + if ((!OVL_P (member) +|| TREE_CODE (decl) == FUNCTION_DECL +|| (TREE_TYPE (decl) +&& (TYPE_PTRFN_P (TREE_TYPE (decl)) +|| TYPE_PTRMEMFUNC_P (TREE_TYPE (decl) + && !warning_suppressed_p (decl, OPT_Wshadow)) { auto_diagnostic_group d; if (warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wshadow, "declaration of %qD shadows a member of %qT", decl, current_nonlambda_class_type ()) && DECL_P (member)) - inform_shadowed (member); + { + inform_shadowed (member); + suppress_warning (decl, OPT_Wshadow); + } } return; } @@ -3319,14 +3323,18 @@ check_local_shadow (tree decl) || (TREE_CODE (old) == TYPE_DECL && (!DECL_ARTIFICIAL (old) || TREE_CODE (decl) == TYPE_DECL))) - && !instantiating_current_function_p ()) + && !instantiating_current_function_p () + && !warning_suppressed_p (decl, OPT_Wshadow)) /* XXX shadow warnings in outer-more namespaces */ { auto_diagnostic_group d; if (warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wshadow, "declaration of %qD shadows a global declaration", decl)) - inform_shadowed (old); + { + inform_shadowed (old); + suppress_warning (decl, OPT_Wshadow); + } return; } --- gcc/testsuite/g++.dg/warn/Wshadow-18.C.jj 2022-02-07 15:01:56.509456240 +0100 +++ gcc/testsuite/g++.dg/warn/Wshadow-18.C 2022-02-07 15:03:34.951067880 +0100 @@ -0,0 +1,22 @@ +// PR c++/104379 +// { dg-do compile } +// { dg-options "-Wshadow" } + +int x; + +template +struct S +{ + int i; + S(int i) { (void) i; } // { dg-warning "declaration of 'i' shadows a member of 'S'" } + S(float x) { (void) x; } // { dg-warning "declaration of 'x' shadows a global declaration" } + S(int *p) { int a = 1; (void) p; (void) a; + { int a = 2; (void) a; } }// { dg-warning "declaration of 'a' shadows a previous local" } +}; + +S i(1); +S j(1); +S k(1.0f); +S l(1.0f); +S m(&x); +S n(&x); Jakub
Re: [PATCH 3/4] tree-optimization/104288 - Update non-null interface to provide better precision.
On Mon, Feb 7, 2022 at 3:33 PM Andrew MacLeod via Gcc-patches wrote: > > This changes the non-null interface to provide more granularity and > facilitate more precise queries. > > Previously, we could only ask if NAME was nonnull anywhere in the block, > and it applied to the entire block. now we can ask: >bool always_nonnull_p (tree name, basic_block bb);// all uses are > nonnull in the block >bool contains_nonnull_p (tree name, basic_block bb); // there is a > nonnull use somewhere in the block. How is the latter useful? To me it seems that bool nonnull_at_p (tree name, gimple *stmt); // 'name' is nonnull at the execution of STMT would be (or nonnull_from_p with starting with STMT), nonnull_from_p could be overloaded with basic_block even. I see the patch uses contains_nonnull_p for range_on_exit. In the previous patch I commented that for : *p = foo (*q); // fallthru + EH on the fallthru edge we know p and q are not NULL but on the EH edge we only know q is not NULL. [with -fnon-call-exceptions we have a representational issue since the *q load might trap but we cannot currenty separate that EH on GIMPLE] So given that wouldn't we instead need range_on_edge () instead of range_on_exit? At least contains_nonnull_p cannot be used for range_on_exit in case we derive nonnull from *p? Btw, for -fnon-call-exceptions all ranges derived from the possibly throwing stmt are not realized on the exceptional path. Like for : _1 = _3 / _5; // fallthru + EH on the fallthru edge we have _5 ~[0,0] but on the EH edge we do not (in fact there we have _5 equals to [0,0] in this specific case). Short-term it might be neccessary to not derive ranges from a stmt that throws internally but it would be nice to sort those issues out? > Ranger was using this query for on-entry and range_of_expr within > block. Range-on-entry was simply wrong.. it should have been asking if > non-null was true in the dominator of this block. And this PR > demonstrate the issue with range_of_expr. > > This patch changes all the places which checked the state of non-null. > An Audit of every use now: > > ranger_cache::exit_range - Applies non-null if contains_nonnull (bb) is true > ranger_cache::range_of_expr - Applies non-null only if always_nonnull (bb). > ranger_cache::fill_block_cache - marks a block to have on-entry > calculated if predecessor contains_nonnull. > ranger_cache::range_from_dom - Applies non-null if contains_nonnull() is > true in a visited dominator block. > > gimple_ranger::range_of_expr - Applies non-null only if always_nonnull > in the block. > gimple_ranger::range_on_entry - Checks if any dominator block > contains_nonnull. > gimple_range::range_on_exit - Calls one of the above 2 routines, then > checks if contains_nonull in this block > > gimple_range_path has 2 uses in range_defined_in_block() and and > adjust_for_non_null_uses. Aldy audited both, and as the values are used > at the end of the path only, so both are correct in using > contains_nonull. So there should be no impact to threading from this. > > This patch allows us to properly optimize: > > void h(int result) __attribute__((noipa)); > void h(int result) > { > if (result) > __builtin_exit(0); > } > > void foo (void *p) __attribute__((nonnull(1))); > > void bar (void *p) > { >h (p == 0); >foo (p); >if (!p) > __builtin_abort (); > } > > to > >_1 = p_3(D) == 0B; >_2 = (int) _1; >h (_2); >foo (p_3(D)); >return; > > > From a risk perspective, I don't think this patch can make us more > incorrect than we were before. > > Oh, and compilation speed. All the patches combined with checking a > little less than 1% regression overall in EVRP/VRP. The audit exposed > that the threader was invoking an unnecessary DOM lookup version of the > query, and as a result, we see a 1.3% reduction in the time spent in the > threader. the overall compile time impact turned out to be a slight > improvement of 0.02% overall in 380 GCC files (according to valgrind) > so its basically all good. > > Bootstraps with no regressions. OK for trunk? > > Andrew
Re: [PATCH 4/4] [GCC11] tree-optimization/104288 - range on entry should check dominators for non-null.
On Mon, Feb 7, 2022 at 3:34 PM Andrew MacLeod via Gcc-patches wrote: > > The patches resolves the issue for GCC11 in a much simpler way. > > By default, ranger and EVRP are running in hybrid mode. This means if > ranger misses something, evrp will pick up the slack. This enables us to > change the 2 places which check for non-null to ignore potentially > incorrect block-wide information and only query dominator blocks for > on-entry ranges. This allows ranger to be conservative, and EVRP will > pick up the intra-block changes. > > Bootstraps with no regressions. OK for gcc 11 branch? OK. Thanks, Richard. > >
Re: [PATCH 5/4] tree-optimization/104288 - An alternative approach
On Tue, Feb 8, 2022 at 2:33 AM Andrew MacLeod via Gcc-patches wrote: > > On 2/7/22 09:29, Andrew MacLeod wrote: > > I have a proposal for PR 104288. > > > > ALL patches (in sequence) bootstrap on their own and each cause no > > regressions. > > I've been continuing to work with this towards the GCC13 solution for > statement side effects. And There is another possibility we could > pursue which is less intrusive > > I adapted the portions of patch 2/4 which process nonnull, but changes > it from being in the nonnull class to being in the cache. > > THe main difference is it continues to use a single bit, just changing > all uses of it to *always* mean its non-null on exit to the block. > > Range-on-entry is changed to only check dominator blocks, and > range_of_expr doesn't check the bit at all.. in both ranger and the cache. > > When we are doing the block walk and process side effects, the on-entry > *cache* range for the block is adjusted to be non-null instead. The > statements which precede this stmt have already been processed, and all > remaining statements in this block will now pick up this new non-value > from the on-entry cache. This should be perfectly safe. > > The final tweak is when range_of_expr is looking the def block, it > normally does not have an on entry cache value. (the def is in the > block, so there is no entry cache value). It now checks to see if we > have set one, which can only happen when we have been doing the > statement walk and set the on-entry cache with a non-null value. This > allows us to pick up the non-null range in the def block too... (once we > have passed a statement which set nonnull of course). > > THis patch provides much less change to trunk, and is probably a better > approach as its closer to what is going to happen in GCC13. > > Im still working with it, so the patch isnt fully refined yet... but it > bootstraps and passes all test with no regressions. And passes the new > tests too. I figured I would mention this before you look to hard at > the other patchset.the GCC11 patch doesn't change. > > Let me know if you prefer this I think I do :-) less churn, and > closer to end state. Yes, I very much prefer this - some comments to the other patches still apply to this one. Like using get_nonnull_args and probably adding a bail-out to computing ranges from stmts that can throw internally or have abnormal control flow (to not get into range-on-exit vs. normal vs. exceptional or abnormal edges). Richard. > > Andrew > > > > >
Re: [PATCH] dwarf2out: Don't call expand_expr during early_dwarf [PR104407]
On Tue, 8 Feb 2022, Jakub Jelinek wrote: > Hi! > > As mentioned in the PR, since PR96690 r11-2834 we call rtl_for_decl_init > which can call expand_expr already during early_dwarf. The comment and PR > explains it that the intent is to ensure the referenced vars and functions > are properly mangled because free_lang_data doesn't cover everything, like > template parameters etc. It doesn't work well though, because expand_expr free_lang_data mangling also only runs when generating LTO bytecode... > can set DECL_RTLs e.g. on referenced vars and keep them there, and they can > be created e.g. with different MEM_ALIGN compared to what they would be > created with if they were emitted later. > So, the following patch stops calling rtl_for_decl_init and instead > for cases for which rtl_for_decl_init does anything at all walks the > initializer and ensures referenced vars or functions are mangled. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2022-02-08 Jakub Jelinek > > PR debug/104407 > * dwarf2out.cc (mangle_referenced_decls): New function. > (tree_add_const_value_attribute): Don't call rtl_for_decl_init if > early_dwarf. Instead walk the initializer and try to mangle vars or > functions referenced from it. > > * g++.dg/debug/dwarf2/pr104407.C: New test. > > --- gcc/dwarf2out.cc.jj 2022-02-04 14:36:54.966605844 +0100 > +++ gcc/dwarf2out.cc 2022-02-07 12:28:13.000520666 +0100 > @@ -20881,6 +20881,19 @@ add_location_or_const_value_attribute (d >return tree_add_const_value_attribute_for_decl (die, decl); > } > > +/* Mangle referenced decls. */ > +static tree > +mangle_referenced_decls (tree *tp, int *walk_subtrees, void *) > +{ > + if (! EXPR_P (*tp) && ! CONSTANT_CLASS_P (*tp)) > +*walk_subtrees = 0; > + > + if (VAR_OR_FUNCTION_DECL_P (*tp)) > +assign_assembler_name_if_needed (*tp); > + > + return NULL_TREE; > +} > + > /* Attach a DW_AT_const_value attribute to DIE. The value of the > attribute is the const value T. */ > > @@ -20889,7 +20902,6 @@ tree_add_const_value_attribute (dw_die_r > { >tree init; >tree type = TREE_TYPE (t); > - rtx rtl; > >if (!t || !TREE_TYPE (t) || TREE_TYPE (t) == error_mark_node) > return false; > @@ -20910,11 +20922,26 @@ tree_add_const_value_attribute (dw_die_r > return true; > } > } > - /* Generate the RTL even if early_dwarf to force mangling of all refered to > - symbols. */ > - rtl = rtl_for_decl_init (init, type); > - if (rtl && !early_dwarf) > -return add_const_value_attribute (die, TYPE_MODE (type), rtl); > + if (!early_dwarf) > +{ > + rtx rtl = rtl_for_decl_init (init, type); > + if (rtl && !early_dwarf) the !early_dwarf is redundant Otherwise looks good to me, please give Jason the opportunity to comment though. Thanks, Richard. > + return add_const_value_attribute (die, TYPE_MODE (type), rtl); > +} > + else > +{ > + /* For early_dwarf force mangling of all referenced symbols. */ > + tree initializer = init; > + STRIP_NOPS (initializer); > + /* rtl_for_decl_init punts on other aggregates, and complex values. */ > + if (AGGREGATE_TYPE_P (type) > + || (TREE_CODE (initializer) == VIEW_CONVERT_EXPR > + && AGGREGATE_TYPE_P (TREE_TYPE (TREE_OPERAND (initializer, 0 > + || TREE_CODE (type) == COMPLEX_TYPE) > + ; > + else if (initializer_constant_valid_p (initializer, type)) > + walk_tree (&initializer, mangle_referenced_decls, NULL, NULL); > +} >/* If the host and target are sane, try harder. */ >if (CHAR_BIT == 8 && BITS_PER_UNIT == 8 >&& initializer_constant_valid_p (init, type)) > --- gcc/testsuite/g++.dg/debug/dwarf2/pr104407.C.jj 2022-02-07 > 10:45:51.945041031 +0100 > +++ gcc/testsuite/g++.dg/debug/dwarf2/pr104407.C 2022-02-07 > 10:45:09.834635340 +0100 > @@ -0,0 +1,12 @@ > +// PR debug/104407 > +// { dg-do compile { target c++17 } } > +// { dg-options "-O1 -fcompare-debug" } > + > +struct A { int i; long j; int k : 2; char l; } a; > + > +auto [ aa, bb, cc, dd ] = a; > + > +namespace N > +{ > + auto & [ m, n, o, ppp ] = a; > +} > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
[committed] libgomp: Fix segfault with posthumous orphan tasks [PR104385]
Hi! The following patch fixes crashes with posthumous orphan tasks. When a parent task finishes, gomp_clear_parent clears the parent pointers of its children tasks present in the parent->children_queue. But children that are still waiting for dependencies aren't in that queue yet, they will be added there only when the sibling they are waiting for exits. Unfortunately we were adding those tasks into the queues with the original task->parent which then causes crashes because that task is gone and freed. The following patch fixes that by clearing the parent field when we schedule such task for running by adding it into the queues and we know that the sibling task which is about to finish has NULL parent. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk so far, queued for backporting. 2022-02-08 Jakub Jelinek PR libgomp/104385 * task.c (gomp_task_run_post_handle_dependers): If parent is NULL, clear task->parent. * testsuite/libgomp.c/pr104385.c: New test. --- libgomp/task.c.jj 2022-01-11 23:11:23.914268738 +0100 +++ libgomp/task.c 2022-02-07 18:33:14.399083055 +0100 @@ -1248,6 +1248,8 @@ gomp_task_run_post_handle_dependers (str } } } + else + task->parent = NULL; if (taskgroup) { priority_queue_insert (PQ_TASKGROUP, &taskgroup->taskgroup_queue, --- libgomp/testsuite/libgomp.c/pr104385.c.jj 2022-02-07 18:45:36.677690756 +0100 +++ libgomp/testsuite/libgomp.c/pr104385.c 2022-02-07 18:45:31.005770166 +0100 @@ -0,0 +1,26 @@ +/* PR libgomp/104385 */ + +#include + +int +main () +{ + int j = 0; + #pragma omp parallel shared(j) num_threads(2) + { +#pragma omp barrier +#pragma omp master +#pragma omp task shared(j) +{ + #pragma omp task depend(out: j) shared(j) + { +usleep (1); +j = 1; + } + + #pragma omp task depend(inout: j) shared(j) + j += 1; +} + } + return j - 2; +} Jakub
Re: [PATCH] [vect] Add vect_recog_cond_expr_convert_pattern.
On Mon, Jan 24, 2022 at 2:01 PM liuhongt via Gcc-patches wrote: > > The pattern converts (cond (cmp a b) (convert c) (convert d)) > to (convert (cond (cmp a b) c d)) when > 1) types_match (c, d) > 2) single_use for (convert c) and (convert d) > 3) TYPE_PRECISION (TREE_TYPE (c)) == TYPE_PRECISION (TREE_TYPE (a)) > 4) INTEGERAL_TYPE_P (TREE_TYPE (c)) > > The pattern can save packing of mask and data(partial for data, 2 vs > 1). > > Bootstrapped and regtested for x86_64-pc-linux-gnu{-m32,} > and x86_64-pc-linux-gnu{-m32\ -march=native,\ -march=native} on CLX. > Ok for trunk? Sorry for the slow response, see for some comments below > gcc/ChangeLog: > > PR target/103771 > * match.pd (cond_expr_convert_p): New match. > * tree-vect-patterns.cc (gimple_cond_expr_convert_p): Declare. > (vect_recog_cond_expr_convert_pattern): New. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr103771-2.c: New test. > --- > gcc/match.pd | 8 +++ > gcc/testsuite/gcc.target/i386/pr103771-2.c | 8 +++ > gcc/tree-vect-patterns.cc | 83 ++ > 3 files changed, 99 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/i386/pr103771-2.c > > diff --git a/gcc/match.pd b/gcc/match.pd > index c68eed70a26..5808c4561ee 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -7647,3 +7647,11 @@ and, > to the number of trailing zeroes. */ > (match (ctz_table_index @1 @2 @3) >(rshift (mult (bit_and:c (negate @1) @1) INTEGER_CST@2) INTEGER_CST@3)) > + > +(match (cond_expr_convert_p @0 @2 @3 @6) > + (cond (simple_comparison@6 @0 @1) (convert@4 @2) (convert@5 @3)) > + (if (types_match (TREE_TYPE (@2), TREE_TYPE (@3)) But in principle @2 or @3 could safely differ in sign, you'd then need to ensure to insert sign conversions to @2/@3 to the signedness of @4/@5. > + && INTEGRAL_TYPE_P (type) > + && INTEGRAL_TYPE_P (TREE_TYPE (@2)) > + && single_use (@4) > + && single_use (@5 > diff --git a/gcc/testsuite/gcc.target/i386/pr103771-2.c > b/gcc/testsuite/gcc.target/i386/pr103771-2.c > new file mode 100644 > index 000..962a3a74ecf > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr103771-2.c > @@ -0,0 +1,8 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=cascadelake -O3" } */ > +/* { dg-final { scan-assembler-not "kunpck" } } */ > +/* { dg-final { scan-assembler-not "kand" } } */ > +/* { dg-final { scan-assembler-not "kor" } } */ > +/* { dg-final { scan-assembler-not "kshift" } } */ > + > +#include "pr103771.c" > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > index bea04992160..cbdfa96789c 100644 > --- a/gcc/tree-vect-patterns.cc > +++ b/gcc/tree-vect-patterns.cc > @@ -924,6 +924,88 @@ vect_reassociating_reduction_p (vec_info *vinfo, >return true; > } > > +/* match.pd function to match > + (cond (cmp@3 a b) (convert@1 c) (convert@2 d)) > + with conditions: > + 1) there's single_use for both @1 and @2. > + 2) c and d have same type. > + record a and c and d and @3. */ > + > +extern bool gimple_cond_expr_convert_p (tree, tree*, tree (*)(tree)); > + > +/* Function vect_recog_cond_expr_convert > + > + Try to find the following pattern: > + > + TYPE1 A, B; > + TYPE2 C,D; > + TYPE3 E; > + TYPE3 op_true = (TYPE3)A; > + TYPE4 op_false = (TYPE3)B; > + > + E = C cmp D ? op_true : op_false; > + > + where > + TYPE_PRECISION (TYPE1) != TYPE_PRECISION (TYPE3); you are not testing for this anywhere? Btw, matching up the comments with the types is somewhat difficult, maybe using TYPE_AB, TYPE_CD, TYPE_E instead of 1,2,3 will make that easier ;) > + TYPE_PRECISION (TYPE1) == TYPE_PRECISION (TYPE2); > + single_use of op_true and op_false. > + > + Input: > + > + * STMT_VINFO: The stmt from which the pattern search begins. > + here it starts with E = c cmp D ? op_true : op_false; > + > + Output: > + > + TYPE1 E' = C cmp D ? A : B; > + TYPE3 E = (TYPE3) E'; > + > + * TYPE_OUT: The vector type of the output of this pattern. > + > + * Return value: A new stmt that will be used to replace the sequence of > + stmts that constitute the pattern. In this case it will be: > + E = (TYPE3)E'; > + E' = C cmp D ? A : B; is recorded in pattern definition statements; */ > + > +static gimple * > +vect_recog_cond_expr_convert_pattern (vec_info *vinfo, > + stmt_vec_info stmt_vinfo, tree > *type_out) > +{ > + gassign *last_stmt = dyn_cast (stmt_vinfo->stmt); > + tree lhs, match[4], temp, type, new_lhs; > + gimple *cond_stmt; > + gimple *pattern_stmt; > + > + if (!last_stmt) > +return NULL; > + > + lhs = gimple_assign_lhs (last_stmt); > + > + /* Find E = C cmp D ? (TYPE3) A ? (TYPE3) B; > + TYPE_PRECISION (A) == TYPE_PRECISION (C). */ > + if (!gimple_cond_expr_convert_p (lhs, &match[0], NULL) > + || (TYPE_PRECISION (TREE_TYPE (match[0])) > + != TYPE_PRECISION (TREE_TYP
[committed][nvptx] Fix .local atomic regressions
Hi, In PR target/104364, two problems were reported: - in muniform-simt mode, an atom.cas insn is no longer executed in the "master lane" only. - in msoft-stack mode, an __atomic_compare_exchange_n on stack memory is translated assuming it accesses local memory, while that's not the case. Fix these by: - ensuring that all insns with atomic attribute are also predicable, such that the validate_change in nvptx_reorg_uniform_simt will succeed, and asserting that it does, and - guarding the local atomics implementation with a new function nvptx_mem_local_p that correctly handles msoft-stack. Tested on x86_64 with nvptx accelerator. Committed to trunk. Thanks, - Tom [nvptx] Fix .local atomic regressions gcc/ChangeLog: 2022-02-04 Tom de Vries PR target/104364 * config/nvptx/nvptx-protos.h (nvptx_mem_local_p): Declare. * config/nvptx/nvptx.cc (nvptx_reorg_uniform_simt): Assert that change is validated. (nvptx_mem_local_p): New function. * config/nvptx/nvptx.md: Use nvptx_mem_local_p. (define_c_enum "unspecv"): Add UNSPECV_CAS_LOCAL. (define_insn "atomic_compare_and_swap_1_local"): New non-atomic, non-predicable define_insn, factored out of ... (define_insn "atomic_compare_and_swap_1"): ... here. Make predicable again. (define_expand "atomic_compare_and_swap"): Use atomic_compare_and_swap_1_local. gcc/testsuite/ChangeLog: 2022-02-04 Tom de Vries PR target/104364 * gcc.target/nvptx/softstack-2.c: New test. * gcc.target/nvptx/uniform-simt-1.c: New test. --- gcc/config/nvptx/nvptx-protos.h | 1 + gcc/config/nvptx/nvptx.cc | 25 +- gcc/config/nvptx/nvptx.md | 63 + gcc/testsuite/gcc.target/nvptx/softstack-2.c| 11 + gcc/testsuite/gcc.target/nvptx/uniform-simt-1.c | 18 +++ 5 files changed, 87 insertions(+), 31 deletions(-) diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h index 3d6ad148cb4..a846e341917 100644 --- a/gcc/config/nvptx/nvptx-protos.h +++ b/gcc/config/nvptx/nvptx-protos.h @@ -59,5 +59,6 @@ extern const char *nvptx_output_simt_enter (rtx, rtx, rtx); extern const char *nvptx_output_simt_exit (rtx); extern const char *nvptx_output_red_partition (rtx, rtx); extern const char *nvptx_output_atomic_insn (const char *, rtx *, int, int); +extern bool nvptx_mem_local_p (rtx); #endif #endif diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc index b3bb97c3c14..2a694926b7a 100644 --- a/gcc/config/nvptx/nvptx.cc +++ b/gcc/config/nvptx/nvptx.cc @@ -3150,7 +3150,8 @@ nvptx_reorg_uniform_simt () rtx pred = nvptx_get_unisimt_predicate (); pred = gen_rtx_NE (BImode, pred, const0_rtx); pat = gen_rtx_COND_EXEC (VOIDmode, pred, pat); - validate_change (insn, &PATTERN (insn), pat, false); + bool changed_p = validate_change (insn, &PATTERN (insn), pat, false); + gcc_assert (changed_p); } } @@ -6894,6 +6895,28 @@ nvptx_libc_has_function (enum function_class fn_class, tree type) return default_libc_has_function (fn_class, type); } +bool +nvptx_mem_local_p (rtx mem) +{ + gcc_assert (GET_CODE (mem) == MEM); + + struct address_info info; + decompose_mem_address (&info, mem); + + if (info.base != NULL && REG_P (*info.base) + && REGNO_PTR_FRAME_P (REGNO (*info.base))) +{ + if (TARGET_SOFT_STACK) + { + /* Frame-related doesn't mean local. */ + } + else + return true; +} + + return false; +} + #undef TARGET_OPTION_OVERRIDE #define TARGET_OPTION_OVERRIDE nvptx_option_override diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index 92768dd9e95..d64dbfd8b33 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -54,6 +54,7 @@ (define_c_enum "unspec" [ (define_c_enum "unspecv" [ UNSPECV_LOCK UNSPECV_CAS + UNSPECV_CAS_LOCAL UNSPECV_XCHG UNSPECV_BARSYNC UNSPECV_WARPSYNC @@ -1771,8 +1772,14 @@ (define_expand "atomic_compare_and_swap" (match_operand:SI 7 "const_int_operand")] ;; failure model "" { - emit_insn (gen_atomic_compare_and_swap_1 -(operands[1], operands[2], operands[3], operands[4], operands[6])); + if (nvptx_mem_local_p (operands[2])) +emit_insn (gen_atomic_compare_and_swap_1_local + (operands[1], operands[2], operands[3], operands[4], +operands[6])); + else +emit_insn (gen_atomic_compare_and_swap_1 + (operands[1], operands[2], operands[3], operands[4], +operands[6])); rtx cond = gen_reg_rtx (BImode); emit_move_insn (cond, gen_rtx_EQ (BImode, operands[1], operands[3])); @@ -1780,23 +1787,18 @@ (define_expand "atomic_compare_and_swap" DONE; }) -(define_insn "atomic_compare_and_swap_1" +(define_insn "atomic_compare_and_swap_1_local" [(set (match_operand:SD
[committed][testsuite] Require c99_runtime to run builtin-sprintf.c
Hi, On nvptx, I run into an execution failure in test-case gcc.dg/tree-ssa/builtin-sprintf.c because the test-case uses the 'hh' modifier. The port uses newlib, which does by default not support that modifier. There's a configure option --enable-newlib-io-c99-formats to enable this support, but that requires alloca support, which nvptx doesn't have. Fix this by requiring c99_runtime for running the test-case. Tested on nvptx. Committed to trunk. Thanks, - Tom [testsuite] Require c99_runtime to run builtin-sprintf.c gcc/testsuite/ChangeLog: 2022-02-07 Tom de Vries * gcc.dg/tree-ssa/builtin-sprintf.c: Require c99_runtime for dg-do run. --- gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf.c index f90558e9b7e..9368a2e0e50 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf.c @@ -3,7 +3,8 @@ constant folding. With optimization enabled the test will fail to link if any of the assertions fails. Without optimization the test aborts at runtime if any of the assertions fails. */ -/* { dg-do run } */ +/* { dg-do run { target c99_runtime } } */ +/* { dg-do link { target { ! c99_runtime } } } */ /* { dg-skip-if "not IEEE float layout" { "pdp11-*-*" } } */ /* { dg-additional-options "-O2 -Wall -Wno-pedantic -fprintf-return-value" } */
[committed][testsuite] Require c99_runtime to run builtin-sprintf.c
Hi, On nvptx, I run into an execution failure in test-case gcc.dg/tree-ssa/builtin-sprintf.c because the test-case uses the 'hh' modifier. The port uses newlib, which does by default not support that modifier. There's a configure option --enable-newlib-io-c99-formats to enable this support, but that requires alloca support, which nvptx doesn't have. Fix this by requiring c99_runtime for running the test-case. Tested on nvptx. Committed to trunk. Thanks, - Tom [testsuite] Require c99_runtime to run builtin-sprintf.c gcc/testsuite/ChangeLog: 2022-02-07 Tom de Vries * gcc.dg/tree-ssa/builtin-sprintf.c: Require c99_runtime for dg-do run. --- gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf.c index f90558e9b7e..9368a2e0e50 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf.c @@ -3,7 +3,8 @@ constant folding. With optimization enabled the test will fail to link if any of the assertions fails. Without optimization the test aborts at runtime if any of the assertions fails. */ -/* { dg-do run } */ +/* { dg-do run { target c99_runtime } } */ +/* { dg-do link { target { ! c99_runtime } } } */ /* { dg-skip-if "not IEEE float layout" { "pdp11-*-*" } } */ /* { dg-additional-options "-O2 -Wall -Wno-pedantic -fprintf-return-value" } */
Re: [PATCH] doc: invoke: RISC-V: Clean up the -mstrict-align wording
LGTM, thanks :) On Tue, Feb 8, 2022 at 12:30 PM Palmer Dabbelt wrote: > > The polarity of do/do not was reversed for this option when compored to > the rest of them. This seems to have been copied from PowerPC, when the > polarity of the arguments in the docs was reversed (presumably to match > the default), but appears to have never made sense on RISC-V. > > gcc/ChangeLog > > * doc/invoke.texi (RISC-V -mstrict-align): Re-word the do/do not > language. > > Signed-off-by: Palmer Dabbelt > --- > gcc/doc/invoke.texi | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 0ebe538ccdc..5e8af05e359 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -27702,7 +27702,7 @@ integer load/stores only. > @item -mstrict-align > @itemx -mno-strict-align > @opindex mstrict-align > -Do not or do generate unaligned memory accesses. The default is set > depending > +Do or do not generate unaligned memory accesses. The default is set > depending > on whether the processor we are optimizing for supports fast unaligned access > or not. > > -- > 2.34.1 >
Re: [PATCH] doc: invoke: RISC-V: Clean up the -memit-attribute wording
Hi Palmer: Seems like...you update the wrong part? you are updating -mmcount-ra-address/-mno-mcount-ra-address rather than -memit-attribute/ -mno-emit-attribute ? On Tue, Feb 8, 2022 at 12:30 PM Palmer Dabbelt wrote: > > The previous wording makes it sound like "do not emit" is a > clarification of "emit", which is not accurate. This cleans up the > wording to match (most) of the other arguments. > > gcc/ChangeLog > > * doc/invoke.texi (RISC-V -memit-attribute): Re-word the do/do > not language. > > Signed-off-by: Palmer Dabbelt > --- > gcc/doc/invoke.texi | 11 +-- > 1 file changed, 5 insertions(+), 6 deletions(-) > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 8bd5293fce7..0ebe538ccdc 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -26294,12 +26294,11 @@ assembler and the linker alone without help from > the compiler. > @itemx -mno-mcount-ra-address > @opindex mmcount-ra-address > @opindex mno-mcount-ra-address > -Emit (do not emit) code that allows @code{_mcount} to modify the > -calling function's return address. When enabled, this option extends > -the usual @code{_mcount} interface with a new @var{ra-address} > -parameter, which has type @code{intptr_t *} and is passed in register > -@code{$12}. @code{_mcount} can then modify the return address by > -doing both of the following: > +Do or do not emit code that allows @code{_mcount} to modify the calling > +function's return address. When enabled, this option extends the usual > +@code{_mcount} interface with a new @var{ra-address} parameter, which has > type > +@code{intptr_t *} and is passed in register @code{$12}. @code{_mcount} can > +then modify the return address by doing both of the following: > @itemize > @item > Returning the new address in register @code{$31}. > -- > 2.34.1 >
Re: [PATCH] doc: invoke: RISC-V: Clean up the -mstrict-align wording
On Feb 07 2022, Palmer Dabbelt wrote: > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 0ebe538ccdc..5e8af05e359 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -27702,7 +27702,7 @@ integer load/stores only. > @item -mstrict-align > @itemx -mno-strict-align > @opindex mstrict-align > -Do not or do generate unaligned memory accesses. The default is set > depending I think the logic is that -mstrict-align is "do not" and -mno-strict-align is "do". -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
Re: [PATCH 1/4][RFC] middle-end/90348 - add explicit birth
> I don't think an option to go to pre-12 behavior is useful. I'll > postpone the series to stage1. FWIW fine with me. -- Eric Botcazou
Re: [Patch, fortran] PR37336 (Finalization) - [F03] Finish derived-type finalization
Hi Harald, Thanks for giving the patch a whirl. > the parent components as an array. I strongly suspect that, from reading > > 7.5.6.2 paragraphs 2 and 3 closely, that ifort has it right. However, > this > > is another issue to come back to in the future. > > Could you specify which version of Intel you tried? > ifort (IFORT) 2021.1 Beta 20201112 > > Testcase finalize_38.f90 fails for me with ifort 2021.5.0 with: > > 131 > That's the point where the interpretation of the standard diverges. Ifort uses the scalar finalization for the parent component, whereas gfortran uses the rank 1. Thus the final count is different by one. I have a version of the patch, where gfortran behaves in the same way as ifort. > This test also fails with crayftn 11 & 12 and nagfor 7.0, > but in a different place. > > > (Also finalize_45.f90 fails with that version with something that > looks like memory corruption, but that might be just a compiler bug.) > I take it 'that version' is of ifort? Mine does the same. I suspect that it is one of the perils of using pointer components in such circumstances! You will notice that I had to nullify pointer components when doing the copy. > > Thanks, > Harald > Could you use the attached version of finalize_38.f90 with crayftn and NAG? All the stop statements are replaced with prints. Ifort gives: 131 3 2 132 0 4 133 5 6 | 0 0 141 4 3 151 7 5 152 3 0 153 0 0 | 1 3 161 13 9 162 20 0 163 0 0 | 10 20 171 14 11 Best regards Paul ! { dg-do run } ! ! Test finalization on intrinsic assignment (F2018 (7.5.6.3)) ! module testmode implicit none type :: simple integer :: ind contains final :: destructor1, destructor2 end type simple type, extends(simple) :: complicated real :: rind contains final :: destructor3, destructor4 end type complicated integer :: check_scalar integer :: check_array(4) real :: check_real real :: check_rarray(4) integer :: final_count = 0 contains subroutine destructor1(self) type(simple), intent(inout) :: self check_scalar = self%ind check_array = 0 final_count = final_count + 1 end subroutine destructor1 subroutine destructor2(self) type(simple), intent(inout) :: self(:) check_scalar = 0 check_array(1:size(self, 1)) = self%ind final_count = final_count + 1 end subroutine destructor2 subroutine destructor3(self) type(complicated), intent(inout) :: self check_real = self%rind check_array = 0.0 final_count = final_count + 1 end subroutine destructor3 subroutine destructor4(self) type(complicated), intent(inout) :: self(:) check_real = 0.0 check_rarray(1:size(self, 1)) = self%rind final_count = final_count + 1 end subroutine destructor4 function constructor1(ind) result(res) type(simple), allocatable :: res integer, intent(in) :: ind allocate (res, source = simple (ind)) end function constructor1 function constructor2(ind, rind) result(res) class(simple), allocatable :: res(:) integer, intent(in) :: ind(:) real, intent(in), optional :: rind(:) type(complicated), allocatable :: src(:) integer :: sz integer :: i if (present (rind)) then sz = min (size (ind, 1), size (rind, 1)) src = [(complicated (ind(i), rind(i)), i = 1, sz)] allocate (res, source = src) else sz = size (ind, 1) allocate (res, source = [(simple (ind(i)), i = 1, sz)]) end if end function constructor2 subroutine test (cnt, scalar, array, off, rind, rarray) integer :: cnt integer :: scalar integer :: array(:) integer :: off real, optional :: rind real, optional :: rarray(:) if (final_count .ne. cnt) print *, 1 + off, final_count, cnt if (check_scalar .ne. scalar) print *, 2 + off, check_scalar, scalar if (any (check_array(1:size (array, 1)) .ne. array)) print *, 3 + off, & check_array(1:size (array, 1)), "|", array if (present (rind)) then if (check_real .ne. rind) print *, 4+off, check_real, rind end if if (present (rarray)) then if (any (check_rarray(1:size (rarray, 1)) .ne. rarray)) print *, 5 + off, & check_rarray(1:size (rarray, 1)), "|", rarray end if end subroutine test end module testmode program test_final use testmode implicit none type(simple), allocatable :: MyType, MyType2 type(simple), allocatable :: MyTypeArray(:) type(simple) :: ThyType = simple(21), ThyType2 = simple(22) class(simple), allocatable :: MyClass class(simple), allocatable :: MyCl
Re: [PATCH] RISC-V: Add target machine headers as a dependency for riscv-sr.o
On Mon, 7 Feb 2022, Kito Cheng wrote: > OK to trunk, thanks for fixing this issue, I hit that issue before but > I didn't figure out what happened...since that issue will disappear > when I clean build :p Committed to trunk now, thanks for you review. How about release branches? Maciej
Re: [PATCH v2] doc: RISC-V: Document the `-misa-spec=' option
On Mon, 7 Feb 2022, Palmer Dabbelt wrote: > > > Good point. I have updated the text to your suggested wording, which is > > > also what I would use if I were to propose it (modulo capitalisation). I > > > will commit the change as included here shortly then unless I hear an > > > objection. > > I wasn't sure about the capitalization either, but that's how it is on the > website. It's in a regular-looking sentence, not a title where I'd expect the > extra capital, so I guess the whole spec name is a proper noun? That has been my understanding as well. > I don't really care either way, just chiming in as I don't see this on trunk. > The original message sounded to me like you intended on committing it, but > just in case I misunderstood and you're waiting for me I wasn't waiting for you to ack, I just gave people at least one full working day to chime in, as not everyone works 24/7. > Reviewed-by: Palmer Dabbelt > Acked-by: Palmer Dabbelt I think we're not using such tags (maybe we should), so I haven't added them to the commit message at this time. I have committed this change now. Thank you and Kito for your reviews. Maciej
Re: [PATCH] RISC-V/testsuite: Run target testing over all the usual optimization levels
On Tue, 8 Feb 2022, Kito Cheng wrote: > Thanks for doing this, OK to trunk. I didn't realise we still have a separate ChangeLog for the testsuite and I have mechanically updated the entry accordingly. Change applied now with that update, thanks for your review. Maciej
[PATCH v4][GCC13] RISC-V: Provide `fmin'/`fmax' RTL patterns
As at r2.2 of the RISC-V ISA specification[1] (equivalent to version 2.0 of the "F" and "D" standard architecture extensions for single-precision and double-precision floating-point respectively) the FMIN and FMAX machine instructions fully match our requirement for the `fminM3' and `fmaxM3' standard RTL patterns: "For FMIN and FMAX, if at least one input is a signaling NaN, or if both inputs are quiet NaNs, the result is the canonical NaN. If one operand is a quiet NaN and the other is not a NaN, the result is the non-NaN operand." suitably for the IEEE 754-2008 `minNum' and `maxNum' operations. However we only define `sminM3' and `smaxM3' standard RTL patterns to produce the FMIN and FMAX machine instructions, which in turn causes the `__builtin_fmin' and `__builtin_fmax' family of intrinsics to emit the corresponding libcalls rather than the relevant machine instructions. This is according to earlier revisions of the RISC-V ISA specification, which we however do not support anymore, as from commit 4b81528241ca ("RISC-V: Support version controling for ISA standard extensions"). As from r20190608 of the RISC-V ISA specification (equivalent to version 2.2 of the "F" and "D" standard ISA extensions for single-precision and double-precision floating-point respectively) the definition of the FMIN and FMAX machine instructions has been updated[2]: "Defined the signed-zero behavior of FMIN.fmt and FMAX.fmt, and changed their behavior on signaling-NaN inputs to conform to the minimumNumber and maximumNumber operations in the proposed IEEE 754-201x specification." and specifically[3]: "Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For the purposes of these instructions only, the value -0.0 is considered to be less than the value +0.0. If both inputs are NaNs, the result is the canonical NaN. If only one operand is a NaN, the result is the non-NaN operand. Signaling NaN inputs set the invalid operation exception flag, even when the result is not NaN." Consequently for forwards compatibility with r20190608+ hardware we cannot use the FMIN and FMAX machine instructions unconditionally even where the ISA level of r2.2 has been specified with the `-misa-spec=2.2' option where operation would be different between ISA revisions, that is the handling of signaling NaN inputs. Therefore provide new `fmin3' and `fmax3' patterns removing the need to emit libcalls with the `__builtin_fmin' and `__builtin_fmax' family of intrinsics, however limit them to where `-fno-signaling-nans' is in effect, deferring to other code generation strategies otherwise as applicable. Use newly-defined UNSPECs as the operation codes so that the patterns are only ever used if referred to by their names, as there is no RTL expression defined for the IEEE 754-2008 `minNum' and `maxNum' operations. References: [1] "The RISC-V Instruction Set Manual, Volume I: User-Level ISA", Document Version 2.2, May 7, 2017, Section 8.3 "NaN Generation and Propagation", p. 48 [1] "The RISC-V Instruction Set Manual, Volume I: Unprivileged ISA", Document Version 20190608-Base-Ratified, June 8, 2019, "Preface", p. ii [2] same, Section 11.6 "Single-Precision Floating-Point Computational Instructions", p. 66 gcc/ * config/riscv/riscv.md (UNSPEC_FMIN, UNSPEC_FMAX): New constants. (fmin3, fmax3): New insns. gcc/testsuite/ * gcc.target/riscv/fmax-snan.c: New test. * gcc.target/riscv/fmax.c: New test. * gcc.target/riscv/fmaxf-snan.c: New test. * gcc.target/riscv/fmaxf.c: New test. * gcc.target/riscv/fmin-snan.c: New test. * gcc.target/riscv/fmin.c: New test. * gcc.target/riscv/fminf-snan.c: New test. * gcc.target/riscv/fminf.c: New test. * gcc.target/riscv/smax-ieee.c: New test. * gcc.target/riscv/smax.c: New test. * gcc.target/riscv/smaxf-ieee.c: New test. * gcc.target/riscv/smaxf.c: New test. * gcc.target/riscv/smin-ieee.c: New test. * gcc.target/riscv/smin.c: New test. * gcc.target/riscv/sminf-ieee.c: New test. * gcc.target/riscv/sminf.c: New test. --- Hi, For the record here's v4 of this change as I observed an issue with the quotation of literal `.' in `scan-assembler' regexps. As TCL strips one level of quotations before handing the string to the regexp interpreter any backslash characters need to be doubled to be received by the regexp parser. The updated test cases have been verified to still pass. Maciej Changes from v3: - Quote `\' characters in `scan-assembler' for literal `.' in regexp. - Mechanically update ChangeLog to have its entries separately for the testsuite. Changes from v2: - Use UNSPECs for the `fmin'/`fmax' patterns, so that they cannot be matched by the RTL expression. - Revert the `HONOR
[committed][nvptx] Choose -mptx default based on -misa
Hi, While testing with driver version 390.147 I ran into the problem that it doesn't support ptx isa version 6.3 (the new default), only 6.1. Furthermore, using the -mptx option is a bit user-unfriendly. Say we want to compile for sm_80. We can use -misa=sm_80 to specify that, but then run into errors because the default ptx version is 6.3, which doesn't support sm_80 yet. Address both these issues by: - picking a default -mptx based on the active -misa, and - ensuring that the default -mptx is at least 6.0 (instead of 6.3). Also add an error in case of incompatible options like "-misa=sm_80 -mptx=6.3": ... cc1: error: PTX version (-mptx) needs to be at least 7.0 to support \ selected -misa (sm_80) ... Tested on x86_64-linux with nvptx accelerator. Committed to trunk. Thanks, - Tom [nvptx] Choose -mptx default based on -misa gcc/ChangeLog: 2022-02-08 Tom de Vries PR target/104283 * config/nvptx/nvptx-opts.h (enum ptx_version): Add PTX_VERSION_3_0 and PTX_VERSION_4_2. * config/nvptx/nvptx.cc (first_ptx_version_supporting_sm) (default_ptx_version_option, ptx_version_to_string) (sm_version_to_string, handle_ptx_version_option): New function. (nvptx_option_override): Call handle_ptx_version_option. (nvptx_file_start): Use ptx_version_to_string and sm_version_to_string. * config/nvptx/nvptx.md (define_insn "nvptx_shuffle") (define_insn "nvptx_vote_ballot"): Use TARGET_PTX_6_0. * config/nvptx/nvptx.opt (mptx): Remove 'Init'. --- gcc/config/nvptx/nvptx-opts.h | 2 + gcc/config/nvptx/nvptx.cc | 133 +- gcc/config/nvptx/nvptx.md | 4 +- gcc/config/nvptx/nvptx.opt| 2 +- 4 files changed, 122 insertions(+), 19 deletions(-) diff --git a/gcc/config/nvptx/nvptx-opts.h b/gcc/config/nvptx/nvptx-opts.h index c754a5193ce..cc488b23720 100644 --- a/gcc/config/nvptx/nvptx-opts.h +++ b/gcc/config/nvptx/nvptx-opts.h @@ -31,7 +31,9 @@ enum ptx_isa enum ptx_version { + PTX_VERSION_3_0, PTX_VERSION_3_1, + PTX_VERSION_4_2, PTX_VERSION_6_0, PTX_VERSION_6_3, PTX_VERSION_7_0 diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc index 006fac8c839..1b0227a2c31 100644 --- a/gcc/config/nvptx/nvptx.cc +++ b/gcc/config/nvptx/nvptx.cc @@ -205,6 +205,109 @@ diagnose_openacc_conflict (bool optval, const char *optname) error ("option %s is not supported together with %<-fopenacc%>", optname); } +static enum ptx_version +first_ptx_version_supporting_sm (enum ptx_isa sm) +{ + switch (sm) +{ +case PTX_ISA_SM30: + return PTX_VERSION_3_0; +case PTX_ISA_SM35: + return PTX_VERSION_3_1; +case PTX_ISA_SM53: + return PTX_VERSION_4_2; +case PTX_ISA_SM75: + return PTX_VERSION_6_3; +case PTX_ISA_SM80: + return PTX_VERSION_7_0; +default: + gcc_unreachable (); +} +} + +static enum ptx_version +default_ptx_version_option (void) +{ + enum ptx_version first += first_ptx_version_supporting_sm ((enum ptx_isa) ptx_isa_option); + + /* Pick a version that supports the sm. */ + enum ptx_version res = first; + + /* Pick at least 3.1. This has been the smallest version historically. */ + res = MAX (res, PTX_VERSION_3_1); + + /* Pick at least 6.0, to enable using bar.warp.sync to have a way to force + warp convergence. */ + res = MAX (res, PTX_VERSION_6_0); + + /* Verify that we pick a version that supports the sm. */ + gcc_assert (first <= res); + return res; +} + +static const char * +ptx_version_to_string (enum ptx_version v) +{ + switch (v) +{ +case PTX_VERSION_3_0: + return "3.0"; +case PTX_VERSION_3_1: + return "3.1"; +case PTX_VERSION_4_2: + return "4.2"; +case PTX_VERSION_6_0: + return "6.0"; +case PTX_VERSION_6_3: + return "6.3"; +case PTX_VERSION_7_0: + return "7.0"; +default: + gcc_unreachable (); +} +} + +static const char * +sm_version_to_string (enum ptx_isa sm) +{ + switch (sm) +{ +case PTX_ISA_SM30: + return "30"; +case PTX_ISA_SM35: + return "35"; +case PTX_ISA_SM53: + return "53"; +case PTX_ISA_SM70: + return "70"; +case PTX_ISA_SM75: + return "75"; +case PTX_ISA_SM80: + return "80"; +default: + gcc_unreachable (); +} +} + +static void +handle_ptx_version_option (void) +{ + if (!OPTION_SET_P (ptx_version_option)) +{ + ptx_version_option = default_ptx_version_option (); + return; +} + + enum ptx_version first += first_ptx_version_supporting_sm ((enum ptx_isa) ptx_isa_option); + + if (ptx_version_option < first) +error ("PTX version (-mptx) needs to be at least %s to support selected" + " -misa (sm_%s)", ptx_version_to_string (first), + sm_version_to_string ((enum ptx_isa)ptx_isa_option)); +} + /* Implement TARGET_OPTION_OVERRIDE. */ static void @@ -212,6 +315,8 @@ nvptx_opt
[PATCH] nvptx: Tweak constraints on copysign instructions.
Many thanks to Thomas Schwinge for confirming my hypothesis that the register usage regression, PR target/104345, is solely due to libgcc's _muldc3 function. In addition to the isinf functionality in the previously proposed nvptx patch at https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588453.html which significantly reduces the number of instructions in _muldc3, the patch below further reduces both the number of instructions and the number of explicitly declared registers, by permitting floating point constant immediate operands in nvptx's copysign instruction. Fingers-crossed, the combination with all of the previous proposed nvptx patches improves things. Ultimately, increasing register usage from 50 to 51 registers, reducing the number of concurrent threads by ~2%, can easily be countered if we're now executing significantly fewer instructions in each kernel, for a net performance win. This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu with a "make" and "make -k check" with no new failures. Ok for mainline? 2022-02-08 Roger Sayle gcc/ChangeLog * config/nvptx/nvptx.md (copysign3): Allow immediate floating point constants as operands 1 and/or 2. Thanks in advance, Roger -- diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index 92768dd..0f34792 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -1059,8 +1059,8 @@ (define_insn "copysign3" [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R") - (unspec:SDFM [(match_operand:SDFM 1 "nvptx_register_operand" "R") - (match_operand:SDFM 2 "nvptx_register_operand" "R")] + (unspec:SDFM [(match_operand:SDFM 1 "nvptx_nonmemory_operand" "RF") + (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")] UNSPEC_COPYSIGN))] "" "%.\\tcopysign%t0\\t%0, %2, %1;")
Re: [committed][nvptx] Choose -mptx default based on -misa
On 2/8/22 13:57, Tom de Vries via Gcc-patches wrote: +static const char * +sm_version_to_string (enum ptx_isa sm) +{ + switch (sm) +{ +case PTX_ISA_SM30: + return "30"; +case PTX_ISA_SM35: + return "35"; +case PTX_ISA_SM53: + return "53"; +case PTX_ISA_SM70: + return "70"; This broke the nvptx build, PTX_ISA_SM70 is not defined yet, that was done in another patch, that I didn't commit yet. I'm doing a rebuild with a patch to fix this, and will commit afterwards. Sorry for the disturbance. Thanks, - Tom
Re: [committed][nvptx] Choose -mptx default based on -misa
Hi Tom, if I understand the patch correctly, -misa=sm_53 -mptx=3.1 will ... On 08.02.22 13:57, Tom de Vries via Gcc-patches wrote: Furthermore, using the -mptx option is a bit user-unfriendly. Say we want to compile for sm_80. We can use -misa=sm_80 to specify that, but then run into errors because the default ptx version is 6.3, which doesn't support sm_80 yet. ... Also add an error in case of incompatible options like "-misa=sm_80 -mptx=6.3": ... cc1: error: PTX version (-mptx) needs to be at least 7.0 to support \ selected -misa (sm_80) ... output that -mptx needs to be at least 4.1. I think that's okay but -mptx only supports the values 3.1, 6.3, and 7.0. As the default is ptx = 6.3, it only occurs when both are specified in the way shown above. Thus, we can live with that. (Misleading message for odd corner case, only. In particular, I am not sure we really want to add another PTX version...) (Which points to the question of supporting multilib configurations – as discussed before: https://gcc.gnu.org/pipermail/gcc/2022-February/238224.html ) Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Re: [committed][nvptx] Choose -mptx default based on -misa
On 2/8/22 14:24, Tobias Burnus wrote: Hi Tom, if I understand the patch correctly, -misa=sm_53 -mptx=3.1 will ... On 08.02.22 13:57, Tom de Vries via Gcc-patches wrote: Furthermore, using the -mptx option is a bit user-unfriendly. Say we want to compile for sm_80. We can use -misa=sm_80 to specify that, but then run into errors because the default ptx version is 6.3, which doesn't support sm_80 yet. ... Also add an error in case of incompatible options like "-misa=sm_80 -mptx=6.3": ... cc1: error: PTX version (-mptx) needs to be at least 7.0 to support \ selected -misa (sm_80) ... output that -mptx needs to be at least 4.1. Um, you mean 4.2 I suppose: ... $ ./gcc.sh ~/hello.c -misa=sm_53 -mptx=3.1 cc1: error: PTX version (-mptx) needs to be at least 4.2 to support selected -misa (sm_53) ... ? I think that's okay but -mptx only supports the values 3.1, 6.3, and 7.0. I know. I'm sort of hoping that the new default setting will make using -mptx unnecessary. As the default is ptx = 6.3, Well, that's no longer the case, that's one of the things that this patch changes. It's by default the maximum of: - 6.0, and - the minimal ptx isa required by the sm version it only occurs when both are specified in the way shown above. Thus, we can live with that. (Misleading message for odd corner case, only. In particular, I am not sure we really want to add another PTX version...) Agreed, it's misleading, but I'm hoping people will just specify the sm version. Thanks, - Tom
[committed] libstdc++: Fix filesystem::remove_all for Windows [PR104161]
On Sat, 5 Feb 2022 at 01:08, Jonathan Wakely wrote: > > On Fri, 4 Feb 2022 at 23:55, Jonathan Wakely wrote: > > +// Used to implement filesystem::remove_all. > > +fs::recursive_directory_iterator& > > +fs::recursive_directory_iterator::__erase(error_code* ecptr) > > +{ > > + error_code ec; > > + if (!_M_dirs) > > +{ > > + ec = std::make_error_code(errc::invalid_argument); > > + return *this; > > +} > > + > > + // We never want to skip permission denied when removing files. > > + const bool skip_permission_denied = false; > > + // We never want to follow directory symlinks when removing files. > > + const bool nofollow = true; > > + > > + // Loop until we find something we can remove. > > + while (!ec) > > +{ > > + auto& top = _M_dirs->top(); > > + > > + if (top.entry._M_type == file_type::directory) > > + { > > + _Dir dir = top.open_subdir(skip_permission_denied, nofollow, ec); > > + if (!ec) > > + { > > + __glibcxx_assert(dir.dirp != nullptr); > > + if (dir.advance(skip_permission_denied, ec)) > > + { > > + // Non-empty directory, recurse into it. > > + _M_dirs->push(std::move(dir)); > > + continue; > > + } > > + if (!ec) > > + { > > + // Directory is empty so we can remove it. > > + if (top.rmdir(ec)) > > + break; // Success > > + } > > + } > > + } > > + else if (top.unlink(ec)) > > + break; // Success > > + else if (top.entry._M_type == file_type::none) > > + { > > + // We did not have a cached type, so it's possible that top.entry > > + // is actually a directory, and that's why the unlink above > > failed. > > +#ifdef EPERM > > + // POSIX.1-2017 says unlinking a directory returns EPERM, > > + // but LSB allows EISDIR too. Some targets don't even define > > EPERM. > > + if (ec.value() == EPERM || ec.value() == EISDIR) > > +#else > > + if (ec.value() == EISDIR) > > +#endif > > This doesn't work on Windows because the top.unlink(ec) sets a Windows > error using the system category, so doesn't match the errno values > here. > > I have a fix. > > > std::uintmax_t > > fs::remove_all(const path& p) > > { > > - return fs::do_remove_all(p, ErrorReporter{"cannot remove all", p}); > > + uintmax_t count = 0; > > + auto st = filesystem::status(p); > > + if (!exists(st)) > > +return 0; > > + if (is_directory(st)) > > Gah, this remove_all(const path&) overload was supposed to be using > the same logic as the one below with an error_code parameter. > > I'll fix it on Monday. Here's that fix. Tested x86_64-linux, powerpc-aix, x86_64-w64-mingw. Pushed to trunk. commit 5750952bec1e632d1f804f4a1bed2f74c0f3b189 Author: Jonathan Wakely Date: Mon Feb 7 23:36:47 2022 libstdc++: Fix filesystem::remove_all for Windows [PR104161] The recursive_directory_iterator::__erase member was failing for Windows, because the entry._M_type value is always file_type::none (because _Dir_base::advance doesn't populate it for Windows) and top.unlink uses fs::remove which sets an error using the system_category. That meant that ec.value() was a Windows error code and not an errno value, so the comparisons to EPERM and EISDIR failed. Instead of depending on a specific Windows error code for attempting to remove a directory, just use directory_entry::refresh() to query the type first. This doesn't avoid the TOCTTOU races with directory symlinks, but we can't avoid them on Windows without openat and unlinkat, and creating symlinks requires admin privs on Windows anyway. This also fixes the fs::remove_all(const path&) overload, which was supposed to use the same logic as the other overload, but I forgot to change it before my previous commit. libstdc++-v3/ChangeLog: PR libstdc++/104161 * src/c++17/fs_dir.cc (fs::recursive_directory_iterator::__erase): [i_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Refresh entry._M_type member, instead of checking for errno values indicating a directory. * src/c++17/fs_ops.cc (fs::remove_all(const path&)): Use similar logic to non-throwing overload. (fs::remove_all(const path&, error_code&)): Add comments. * src/filesystem/ops-common.h: Likewise. diff --git a/libstdc++-v3/src/c++17/fs_dir.cc b/libstdc++-v3/src/c++17/fs_dir.cc index 01b8c0d5693..54f135d2baf 100644 --- a/libstdc++-v3/src/c++17/fs_dir.cc +++ b/libstdc++-v3/src/c++17/fs_dir.cc @@ -476,6 +476,16 @@ fs::recursive_directory_iterator::__erase(error_code* ecptr) { auto& top = _M_dirs->top(); +#if _GLIBCXX_FILESYSTEM_IS_WINDOWS + // _Dir::unlink uses fs::remove which uses std::system_category() for + // Windo
[committed] libstdc++: Adjust Filesystem TS test for Windows
Tested x86_64-linux and x86_64-w64-mingw, pushed to trunk. The Filesystem TS isn't really supported for Windows, but the FAIL for this test is just because it doesn't match what happens on Windows. libstdc++-v3/ChangeLog: * testsuite/experimental/filesystem/operations/create_directories.cc: Adjust expected results for Windows. --- .../filesystem/operations/create_directories.cc | 15 +++ 1 file changed, 15 insertions(+) diff --git a/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directories.cc b/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directories.cc index 8cdc7030441..03060c6cbb3 100644 --- a/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directories.cc +++ b/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directories.cc @@ -108,8 +108,15 @@ test02() VERIFY( !result ); VERIFY( ec == std::errc::not_a_directory ); result = create_directories(file.path/"../bar", ec); +#if defined(__MINGW32__) || defined(__MINGW64__) +VERIFY( result ); +VERIFY( !ec ); +VERIFY( is_directory(dir.path/"bar") ); +remove(dir.path/"bar"); +#else VERIFY( !result ); VERIFY( ec ); +#endif } } @@ -120,11 +127,19 @@ test03() const auto p = __gnu_test::nonexistent_path() / "/"; bool result = create_directories(p); VERIFY( result ); +#if defined(__MINGW32__) || defined(__MINGW64__) + VERIFY( exists(p/".") ); // needed due to PR libstdc++/1 +#else VERIFY( exists(p) ); +#endif remove(p); result = create_directories(p/"foo/"); VERIFY( result ); +#if defined(__MINGW32__) || defined(__MINGW64__) + VERIFY( exists(p/".") ); // needed due to PR libstdc++/1 +#else VERIFY( exists(p) ); +#endif VERIFY( exists(p/"foo") ); remove_all(p); } -- 2.34.1
Re: [PATCH 1/4][RFC] middle-end/90348 - add explicit birth
Hello, On Tue, 8 Feb 2022, Richard Biener wrote: > int foo(int always_true_at_runtime) > { > { > int *p; > if (always_true_at_runtime) > goto after; > lab: > return *p; > after: > int i = 0; > p = &i; > if (always_true_at_runtime) > goto lab; > } > return 0; > } > > For the implementation I considered starting lifetime at a DECL_EXPR > if it exists and otherwise at the start of the BIND_EXPR. Note the > complication is that control flow has to reach the lifetime-start > clobber before the first access. But that might also save us here, > since for the example above 'i' would be live via the backedge > from goto lab. > > That said, the existing complication is that when the gimplifier > visits the BIND_EXPR it has no way to know whether there will be > a following DECL_EXPR or not (and the gimplifier notes there are > cases a DECL_EXPR variable is not in a BIND_EXPR). My plan is to > compute this during one of the body walks the gimplifier performs > before gimplifying. > > Another complication is that at least the C frontend + gimplifier > combination for > > switch (i) > { > case 1: > int i; > break; > } > > will end up having the BIND_EXPR mentioning 'i' containing the > case label, so just placing the birth at the BIND will make it > unreachable as > > i = BIRTH; > case_label_for_1: > ... I think anything that needs to happen (conceptually) during the jump from switch to case-label needs to happen right before the jump, that might mean creating a new fake BLOCK that contains just the jump and the associated births. > conveniently at least the C frontend has a DECL_EXPR for 'i' > which avoids this and I did not find a way (yet) in the gimplifier > to rectify this when gimplifying switches. In C a 'case' label is nothing else than a normal label, it doesn't start it's own block or the like. So also for that one the births would need to be at the start of their containing blocks. > So there's still work to do but I think that starting the lifetime at a > DECL_EXPR if it exists is the way to go? Depends on where the DECL_EXPR is exactly placed, but probably it wouldn't be okay. You can't place the birth at the fall-through path, because this needs to work (basically your jump example above rewritten as switch): int state = 2, *p, camefrom1 = 0; for (;;) switch (state) { case 1: case 2: ; int i; if (state != 1) { p = &i; i = 0; } if (state == 1) { (*p)++; return *p; } state = 1; continue; } Note how i is initialized during state 2, and needs to be kept initialized during state 1, so there must not be a CLOBBER (birth or other) at the point of the declaration of 'i'. AFAICS in my simple tests a DECL_EXPR for 'i' is placed with the statement associated with 'case 2' label, putting a CLOBBER there would be the wrong thing. If the decl had an initializer it might be harmless, as it would be overwritten at that place, but even so, in this case there is no initializer. Hmm. Another complication arises from simple forward jumps: goto forw; { int i; printf("not reachable\n"); forw: i = 1; } Despite the jump skiping the unreachable head of the BLOCK, 'i' needs to be considered birthed at the label. (In a way the placement of births exactly mirrors the placements of deaths (of course), every path from outside a BLOCK to inside needs to birth-clobber all variables (in C), like every path leaving needs to kill them. It's just that we have a convenient construct for the latter (try-finally), but not for the former) Ciao, Michael.
[committed] libstdc++: Add comment to acinclude.m4
Tested x86_64-linux, pushed to trunk. libstdc++-v3/ChangeLog: * acinclude.m4 (GLIBCXX_ENABLE_LOCK_POLICY): Add comment about checking for CAS on correct word size. --- libstdc++-v3/acinclude.m4 | 2 ++ 1 file changed, 2 insertions(+) diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4 index 7cc52f4db96..f53461c85a5 100644 --- a/libstdc++-v3/acinclude.m4 +++ b/libstdc++-v3/acinclude.m4 @@ -3609,6 +3609,8 @@ AC_DEFUN([GLIBCXX_ENABLE_LOCK_POLICY], [ ac_save_CXXFLAGS="$CXXFLAGS" dnl Why do we care about 2-byte CAS on targets with 4-byte _Atomic_word?! +dnl Why don't we check 8-byte CAS for sparc64, where _Atomic_word is long?! +dnl New targets should only check for CAS for the _Atomic_word type. AC_TRY_COMPILE([ #if ! defined __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 # error "No 2-byte compare-and-swap" -- 2.34.1
Re: [PATCH] s390: Change costs for load on condition.
> Patch is ok. Thanks! As discussed off-list, committed as attached. Regards Robin commit 1e3185e714e877b2b4d14ade0865322f71a8cbf6 Author: Robin Dapp Date: Tue Feb 8 14:56:29 2022 +0100 s390: Increase costs for load on condition and change movqicc expander. This patch changes the costs for a load on condition from 5 to 6 in order to ensure that we only if-convert two and not three or more SETS like if (cond) { a = b; c = d; e = f; } In the movqicc expander we emit a paradoxical subreg directly that combine would otherwise try to create by using a non-optimal sequence (which would be too expensive). Also, fix two oversights in ifcvt testcases. gcc/ChangeLog: * config/s390/s390.cc (s390_rtx_costs): Increase costs for load on condition. * config/s390/s390.md: Use paradoxical subreg. gcc/testsuite/ChangeLog: * gcc.target/s390/ifcvt-two-insns-int.c: Fix array size. * gcc.target/s390/ifcvt-two-insns-long.c: Dito. diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc index c6cfe41ad7b..d2af6d8813d 100644 --- a/gcc/config/s390/s390.cc +++ b/gcc/config/s390/s390.cc @@ -3636,7 +3636,7 @@ s390_rtx_costs (rtx x, machine_mode mode, int outer_code, /* It is going to be a load/store on condition. Make it slightly more expensive than a normal load. */ - *total = COSTS_N_INSNS (1) + 1; + *total = COSTS_N_INSNS (1) + 2; rtx dst = SET_DEST (x); rtx then = XEXP (SET_SRC (x), 1); diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index e3ccbac58c0..5eee8e86b42 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -7003,9 +7003,9 @@ if (!CONSTANT_P (els)) els = simplify_gen_subreg (E_SImode, els, mode, 0); - rtx tmp_target = gen_reg_rtx (E_SImode); + rtx tmp_target = simplify_gen_subreg (E_SImode, operands[0], mode, 0); + emit_insn (gen_movsicc (tmp_target, operands[1], then, els)); - emit_move_insn (operands[0], gen_lowpart (mode, tmp_target)); DONE; }) diff --git a/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c index 1ced7107de0..031cc433f56 100644 --- a/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c +++ b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c @@ -5,7 +5,6 @@ /* { dg-final { scan-assembler "lochinle\t%r.?,1" } } */ /* { dg-final { scan-assembler "locrnle\t.*" } } */ -#include #include #include #include @@ -33,7 +32,7 @@ int main() { int a[] = {2, 1, -13, INT_MAX, INT_MIN, 0}; - int res = foo (a, sizeof (a)); + int res = foo (a, sizeof (a) / sizeof (a[0])); assert (res == (INT_MIN + 1)); } diff --git a/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c index 5a1173fc6ef..cd04d2ad33e 100644 --- a/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c +++ b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c @@ -5,7 +5,6 @@ /* { dg-final { scan-assembler "locghinle\t%r.?,1" } } */ /* { dg-final { scan-assembler "locgrnle\t.*" } } */ -#include #include #include #include @@ -33,7 +32,7 @@ int main() { long a[] = {2, 1, -13, LONG_MAX, LONG_MIN, 0}; - long res = foo (a, sizeof (a)); + long res = foo (a, sizeof (a) / sizeof (a[0])); assert (res == (LONG_MIN + 1)); }
[committed][nvptx] Unbreak build, add PTX_ISA_SM70
Hi, With the commit "[nvptx] Choose -mptx default based on -misa" I introduced a use of PTX_ISA_SM70, without adding it first. Add it, as well as the corresponding TARGET_SM70. Build for x86_64 with nvptx accelerator. Committed to trunk. Thanks, - Tom [nvptx] Unbreak build, add PTX_ISA_SM70 gcc/ChangeLog: 2022-02-08 Tom de Vries * config/nvptx/nvptx-opts.h (enum ptx_isa): Add PTX_ISA_SM70. * config/nvptx/nvptx.h (TARGET_SM70): Define. --- gcc/config/nvptx/nvptx-opts.h | 1 + gcc/config/nvptx/nvptx.h | 1 + 2 files changed, 2 insertions(+) diff --git a/gcc/config/nvptx/nvptx-opts.h b/gcc/config/nvptx/nvptx-opts.h index cc488b23720..e918d43ea16 100644 --- a/gcc/config/nvptx/nvptx-opts.h +++ b/gcc/config/nvptx/nvptx-opts.h @@ -25,6 +25,7 @@ enum ptx_isa PTX_ISA_SM30, PTX_ISA_SM35, PTX_ISA_SM53, + PTX_ISA_SM70, PTX_ISA_SM75, PTX_ISA_SM80 }; diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h index 065d7aa210c..edffd088b15 100644 --- a/gcc/config/nvptx/nvptx.h +++ b/gcc/config/nvptx/nvptx.h @@ -88,6 +88,7 @@ #define TARGET_SM35 (ptx_isa_option >= PTX_ISA_SM35) #define TARGET_SM53 (ptx_isa_option >= PTX_ISA_SM53) +#define TARGET_SM70 (ptx_isa_option >= PTX_ISA_SM70) #define TARGET_SM75 (ptx_isa_option >= PTX_ISA_SM75) #define TARGET_SM80 (ptx_isa_option >= PTX_ISA_SM80)
Re: [PATCH 3/4] tree-optimization/104288 - Update non-null interface to provide better precision.
On 2/8/22 03:25, Richard Biener wrote: On Mon, Feb 7, 2022 at 3:33 PM Andrew MacLeod via Gcc-patches wrote: This changes the non-null interface to provide more granularity and facilitate more precise queries. Previously, we could only ask if NAME was nonnull anywhere in the block, and it applied to the entire block. now we can ask: bool always_nonnull_p (tree name, basic_block bb);// all uses are nonnull in the block bool contains_nonnull_p (tree name, basic_block bb); // there is a nonnull use somewhere in the block. How is the latter useful? To me it seems that bool nonnull_at_p (tree name, gimple *stmt); // 'name' is nonnull at the execution of STMT would be (or nonnull_from_p with starting with STMT), nonnull_from_p could be overloaded with basic_block even. I see the patch uses contains_nonnull_p for range_on_exit. In the previous patch I commented that for Huh, I dont see that comment. but nonetheless... : *p = foo (*q); // fallthru + EH on the fallthru edge we know p and q are not NULL but on the EH edge we only know q is not NULL. [with -fnon-call-exceptions we have a representational issue since the *q load might trap but we cannot currenty separate that EH on GIMPLE] we bail all over the place if cfun->can_throw_non_call_exceptions is true anyway. So given that wouldn't we instead need range_on_edge () instead of range_on_exit? At least contains_nonnull_p cannot be used for range_on_exit in case we derive nonnull from *p? Btw, for -fnon-call-exceptions all ranges derived from the possibly throwing stmt are not realized on the exceptional path. Like for : _1 = _3 / _5; // fallthru + EH on the fallthru edge we have _5 ~[0,0] but on the EH edge we do not (in fact there we have _5 equals to [0,0] in this specific case). Short-term it might be neccessary to not derive ranges from a stmt that throws internally but it would be nice to sort those issues out? yes. So, range_on_exit is only ever called from range_on_edge. You are suggesting that I move the adjustment from range_on_exit into range_on_edge and guarded with !(e->flags & EDGE_EH) Thats pretty straightforward. Andrew
Re: [PATCH 1/4][RFC] middle-end/90348 - add explicit birth
On Tue, 8 Feb 2022, Michael Matz wrote: > Hello, > > On Tue, 8 Feb 2022, Richard Biener wrote: > > > int foo(int always_true_at_runtime) > > { > > { > > int *p; > > if (always_true_at_runtime) > > goto after; > > lab: > > return *p; > > after: > > int i = 0; > > p = &i; > > if (always_true_at_runtime) > > goto lab; > > } > > return 0; > > } > > > > For the implementation I considered starting lifetime at a DECL_EXPR > > if it exists and otherwise at the start of the BIND_EXPR. Note the > > complication is that control flow has to reach the lifetime-start > > clobber before the first access. But that might also save us here, > > since for the example above 'i' would be live via the backedge > > from goto lab. > > > > That said, the existing complication is that when the gimplifier > > visits the BIND_EXPR it has no way to know whether there will be > > a following DECL_EXPR or not (and the gimplifier notes there are > > cases a DECL_EXPR variable is not in a BIND_EXPR). My plan is to > > compute this during one of the body walks the gimplifier performs > > before gimplifying. > > > > Another complication is that at least the C frontend + gimplifier > > combination for > > > > switch (i) > > { > > case 1: > > int i; > > break; > > } > > > > will end up having the BIND_EXPR mentioning 'i' containing the > > case label, so just placing the birth at the BIND will make it > > unreachable as > > > > i = BIRTH; > > case_label_for_1: > > ... > > I think anything that needs to happen (conceptually) during the jump from > switch to case-label needs to happen right before the jump, that might > mean creating a new fake BLOCK that contains just the jump and the > associated births. > > > conveniently at least the C frontend has a DECL_EXPR for 'i' > > which avoids this and I did not find a way (yet) in the gimplifier > > to rectify this when gimplifying switches. > > In C a 'case' label is nothing else than a normal label, it doesn't start > it's own block or the like. So also for that one the births would need to > be at the start of their containing blocks. > > > So there's still work to do but I think that starting the lifetime at a > > DECL_EXPR if it exists is the way to go? > > Depends on where the DECL_EXPR is exactly placed, but probably it wouldn't > be okay. You can't place the birth at the fall-through path, because this > needs to work (basically your jump example above rewritten as switch): > > int state = 2, *p, camefrom1 = 0; > for (;;) switch (state) { > case 1: > case 2: ; > int i; > if (state != 1) { p = &i; i = 0; } > if (state == 1) { (*p)++; return *p; } > state = 1; > continue; > } > > Note how i is initialized during state 2, and needs to be kept initialized > during state 1, so there must not be a CLOBBER (birth or other) at the > point of the declaration of 'i'. AFAICS in my simple tests a DECL_EXPR > for 'i' is placed with the statement associated with 'case 2' label, > putting a CLOBBER there would be the wrong thing. If the decl had an > initializer it might be harmless, as it would be overwritten at that > place, but even so, in this case there is no initializer. Hmm. You get after gimplification: state = 2; camefrom1 = 0; : switch (state) , case 1: , case 2: > { int i; try { i = {CLOBBER(birth)}; /// ignore, should go away : : i = {CLOBBER(birth)}; if (state != 1) goto ; else goto ; : p = &i; i = 0; : if (state == 1) goto ; else goto ; : _1 = *p; _2 = _1 + 1; *p = _2; D.1995 = *p; // predicted unlikely by early return (on trees) predictor. return D.1995; : state = 1; // predicted unlikely by continue predictor. goto ; } finally { i = {CLOBBER(eol)}; } } which I think is OK? That is, when the abstract machine arrives at 'int i;' then the previous content in 'i' goes away? Or would int foo() { goto ick; use: int i, *p; return *p; ick: i = 1; p = &i; goto use; } require us to return 1? With the current patch 'i' is clobbered before the return. > Another complication arises from simple forward jumps: > > goto forw; > { > int i; > printf("not reachable\n"); > forw: > i = 1; > } > > Despite the jump skiping the unreachable head of the BLOCK, 'i' needs to > be considered birthed at the label. (In a way the placement of births > exactly mirrors the placements of deaths (of course), every path from > outside a BLOCK to inside needs to birth-clobber all variables (in C), > like every path leaving needs to kill them. It's just that we have a > convenient construct for the latter (try-finally), but not for the former) The situation with an unreachable birth is handled conservatively since we consider
Fwd: [PATCH 2/4] tree-optimization/104288 - Register side effects during ranger vrp domwalk.
bah.. wrong reply function Forwarded Message Subject: Re: [PATCH 2/4] tree-optimization/104288 - Register side effects during ranger vrp domwalk. Date: Tue, 8 Feb 2022 09:52:49 -0500 From: Andrew MacLeod To: Richard Biener On 2/8/22 03:14, Richard Biener wrote: On Mon, Feb 7, 2022 at 3:32 PM Andrew MacLeod via Gcc-patches wrote: This patch adds the ability to register side effects within a block to ranger. This is currently only for tracking non-null values. the rvrp_folder performs a DOM walk and then attempts to simplify and the fold each stmt during the walk. This patch adds a call to register_side_effects() to record any effects the stmt might have before returning. This checks if the non-null property is set for any ssa-names on the statement, and if they are, moves the state to only non-null for the block... This allows us to adjust the property within the block as we do the DOM walk. All further queries within the block will then come back as non-null. ALl other queries will be from outside the block, so they will see the same results anyway. Unfortunately, none of the non-null routines in gimple.cc appear to work "in bulk", but rather, on a specific tree-ssa operand request at a time. For this patch, I need to scan the entire statement looking for any operand that is nonnull (or the performance impact is awful). In tree.cc we have get_nonnull_args which you'd pass the gimple_call_fntype, its expense is allocating a bitmap (if any arg is nonnull) which you'd then need to iterate over and free. I think that's an OK overhead for not duplicating the walk? I will experiment. It would be nice to be able to pass an optional bitmap into that routine Since I have bitmap obstacks and temporary bitmaps laying around. Would you be OK if I added an optional bitmap pointer to that? no allocs would be handy I suspect. I took the code in the various infer_nonnull routines, and packed it all together into the two routines block_apply_nonnull && infer_nonnull_call_attributes, which basically perform the same functionality as infer_nonnull_range_by_dereference and infer_nonnull_range_by_attribute, only on all the operands at once. I think its right, but you may want to have a look. I intend to leverage this code in GCC13 for a more generalized range_after_stmt mechanism that will do away with the immediate-use visits of the current non-null +void +non_null_ref::block_apply_nonnull (gimple *s) +{ + if (!flag_delete_null_pointer_checks) + return; + if (is_a (s)) + return; + if (gimple_code (s) == GIMPLE_ASM || gimple_clobber_p (s)) + return; + if (is_a (s)) + { + infer_nonnull_call_attributes (as_a (s)); + return; + } + walk_stmt_load_store_ops (s, (void *)this, non_null_loadstore, + non_null_loadstore); Both ASM and calls can have pointer dereferences embedded so I wonder whether you should do the walk_stmt_load_store_ops anyway. Note that any exceptional control flow (EH or abnormal) for ASM I just did what infer_nonnull_range_by_dereference does: if (!flag_delete_null_pointer_checks || !POINTER_TYPE_P (TREE_TYPE (op)) || gimple_code (stmt) == GIMPLE_ASM || gimple_clobber_p (stmt)) return false; if (walk_stmt_load_store_ops (stmt, (void *)op, check_loadstore, check_loadstore)) ASMs are also skipped in infer_nonnull_range_by_attribute(), and the rest of that routine only deals with calls. The calls.. yes, perhaps, it shgould also fall through. I was thinking that calls were handled by the attribute, but with the || condition it could be looking for load/stores too. will not have realized the outputs on a stmt (LHS of call, outputs of asms), so nonnull carries over only across fallthru or conditional edges. I suppose realizing the LHS could be postponed for GCC 13, for calls you could still walk loads so you get the non-nullness of aggregate arguments like foo (*p) after the stmt. But you can of course do all this for GCC 13 only. The LHS should get covered via the gimple-range-fold mechanism that produces the range for the LHS. fold_using_range::range_of_call() handles the calls already: else if (gimple_call_nonnull_result_p (call) || gimple_call_nonnull_arg (call)) r = range_nonzero (type); DO we handle asms differently somewhere else? the infer() code seems to skip them completely. Andrew
Re: [PATCH 5/4] tree-optimization/104288 - An alternative approach
On 2/8/22 03:32, Richard Biener wrote: On Tue, Feb 8, 2022 at 2:33 AM Andrew MacLeod via Gcc-patches wrote: On 2/7/22 09:29, Andrew MacLeod wrote: I have a proposal for PR 104288. ALL patches (in sequence) bootstrap on their own and each cause no regressions. I've been continuing to work with this towards the GCC13 solution for statement side effects. And There is another possibility we could pursue which is less intrusive I adapted the portions of patch 2/4 which process nonnull, but changes it from being in the nonnull class to being in the cache. THe main difference is it continues to use a single bit, just changing all uses of it to *always* mean its non-null on exit to the block. Range-on-entry is changed to only check dominator blocks, and range_of_expr doesn't check the bit at all.. in both ranger and the cache. When we are doing the block walk and process side effects, the on-entry *cache* range for the block is adjusted to be non-null instead. The statements which precede this stmt have already been processed, and all remaining statements in this block will now pick up this new non-value from the on-entry cache. This should be perfectly safe. The final tweak is when range_of_expr is looking the def block, it normally does not have an on entry cache value. (the def is in the block, so there is no entry cache value). It now checks to see if we have set one, which can only happen when we have been doing the statement walk and set the on-entry cache with a non-null value. This allows us to pick up the non-null range in the def block too... (once we have passed a statement which set nonnull of course). THis patch provides much less change to trunk, and is probably a better approach as its closer to what is going to happen in GCC13. Im still working with it, so the patch isnt fully refined yet... but it bootstraps and passes all test with no regressions. And passes the new tests too. I figured I would mention this before you look to hard at the other patchset.the GCC11 patch doesn't change. Let me know if you prefer this I think I do :-) less churn, and closer to end state. Yes, I very much prefer this - some comments to the other patches still apply to this one. Like using get_nonnull_args and probably adding a bail-out to computing ranges from stmts that can throw internally or have abnormal control flow (to not get into range-on-exit vs. normal vs. exceptional or abnormal edges). Richard. Yeah, I like this one way better too. I will address and repost Andrew
Re: [PATCH 3/4] tree-optimization/104288 - Update non-null interface to provide better precision.
On Tue, Feb 8, 2022 at 3:53 PM Andrew MacLeod wrote: > > On 2/8/22 03:25, Richard Biener wrote: > > On Mon, Feb 7, 2022 at 3:33 PM Andrew MacLeod via Gcc-patches > > wrote: > >> This changes the non-null interface to provide more granularity and > >> facilitate more precise queries. > >> > >> Previously, we could only ask if NAME was nonnull anywhere in the block, > >> and it applied to the entire block. now we can ask: > >> bool always_nonnull_p (tree name, basic_block bb);// all uses are > >> nonnull in the block > >> bool contains_nonnull_p (tree name, basic_block bb); // there is a > >> nonnull use somewhere in the block. > > How is the latter useful? To me it seems that > > > >bool nonnull_at_p (tree name, gimple *stmt); > > // 'name' is nonnull at the execution of STMT > > > > would be (or nonnull_from_p with starting with STMT), nonnull_from_p could > > be overloaded with basic_block even. > > > > I see the patch uses contains_nonnull_p for range_on_exit. In the previous > > patch I commented that for > Huh, I dont see that comment. but nonetheless... > >: > > *p = foo (*q); // fallthru + EH > > > > on the fallthru edge we know p and q are not NULL but on the EH edge > > we only know q is not NULL. [with -fnon-call-exceptions we have a > > representational issue since the *q load might trap but we cannot currenty > > separate that EH on GIMPLE] > > we bail all over the place if cfun->can_throw_non_call_exceptions is > true anyway. > > > > > > So given that wouldn't we instead need range_on_edge () instead of > > range_on_exit? At least contains_nonnull_p cannot be used for > > range_on_exit in case we derive nonnull from *p? > > > > > Btw, for -fnon-call-exceptions all ranges derived from the possibly throwing > > stmt are not realized on the exceptional path. Like for > > > > : > > _1 = _3 / _5; // fallthru + EH > > > > on the fallthru edge we have _5 ~[0,0] but on the EH edge we do not > > (in fact there we have _5 equals to [0,0] in this specific case). > > > > Short-term it might be neccessary to not derive ranges from a > > stmt that throws internally but it would be nice to sort those issues > > out? > > > yes. > > So, range_on_exit is only ever called from range_on_edge. You are > suggesting that I move the adjustment from range_on_exit into > range_on_edge and guarded with > !(e->flags & EDGE_EH) and EDGE_ABNORMAL > Thats pretty straightforward. So yes, if you use nonnull from the dominator for EH/ABNORMAL edges and the anywhere-in-BB nonnull from the fallthru then that would work. But I guess it's easier to exchange the cfun->can_throw_non_call_exceptions checks with stmt_throws_internal () so you don't get ranges from stmts that eventually cause us to go the abnormal/EH edge, then it's valid to use the "rest" of the ranges defined in e->src on those edges as well. > > > Andrew > >
Re: [PATCH 1/4][RFC] middle-end/90348 - add explicit birth
Hello, On Tue, 8 Feb 2022, Richard Biener wrote: > > int state = 2, *p, camefrom1 = 0; > > for (;;) switch (state) { > > case 1: > > case 2: ; > > int i; > > if (state != 1) { p = &i; i = 0; } > > if (state == 1) { (*p)++; return *p; } > > state = 1; > > continue; > > } > > > > Note how i is initialized during state 2, and needs to be kept initialized > > during state 1, so there must not be a CLOBBER (birth or other) at the > > point of the declaration of 'i'. AFAICS in my simple tests a DECL_EXPR > > for 'i' is placed with the statement associated with 'case 2' label, > > putting a CLOBBER there would be the wrong thing. If the decl had an > > initializer it might be harmless, as it would be overwritten at that > > place, but even so, in this case there is no initializer. Hmm. > > You get after gimplification: > > state = 2; > camefrom1 = 0; > : > switch (state) , case 1: , case 2: > > { > int i; > > try > { > i = {CLOBBER(birth)}; /// ignore, should go away > : > : > i = {CLOBBER(birth)}; I think this clobber here would be the problem, because ... > which I think is OK? That is, when the abstract machine > arrives at 'int i;' then the previous content in 'i' goes > away? Or would > > int foo() > { >goto ick; > use: >int i, *p; >return *p; > ick: >i = 1; >p = &i; >goto use; > > } > > require us to return 1? ... I think this is exactly the logical consequence of lifetime of 'i' being the whole block. We need to return 1. (Joseph: correct me on that! :) ) That's what I was trying to get at with my switch example as well. > With the current patch 'i' is clobbered before the return. > > > Another complication arises from simple forward jumps: > > > > goto forw; > > { > > int i; > > printf("not reachable\n"); > > forw: > > i = 1; > > } > > > > Despite the jump skiping the unreachable head of the BLOCK, 'i' needs to > > be considered birthed at the label. (In a way the placement of births > > exactly mirrors the placements of deaths (of course), every path from > > outside a BLOCK to inside needs to birth-clobber all variables (in C), > > like every path leaving needs to kill them. It's just that we have a > > convenient construct for the latter (try-finally), but not for the former) > > The situation with an unreachable birth is handled conservatively > since we consider a variable without a (visible at RTL expansion time) > birth to conflict with all other variables. That breaks down when a birth is there (because it was otherwise reachable) but not on the taken path: if (nevertrue) goto start; goto forw; start: { int i; printf("not really reachable, but unknowingly so\n"); forw: i = 1; } > I don't see a good way to have a birth magically appear at 'forw' > without trying to argue that the following stmt is the first mentioning > the variable. That's what my 'Hmm' aluded to :) The only correct and precise way I see is to implement something similar like try-finally topside-down. An easier but less precise way is to place the births at the (start of) innermost block containing the decl _and all jumps into the block_. Even less presice, but perhaps even easier is to place the births for decls in blocks with side-entries into the function scope (and for blocks without side entries at their start). Except for switches side-entries are really really seldom, so we might usefully get away with that latter solution. And for switches it might be okay to put the births at the block containing the switch (if it itself doesn't have side entries, and the switch block doesn't have side entries except the case labels). If the birth is moved to outward blocks it might be best if also the dealloc/death clobbers are moved to it, otherwise there might be paths containing a birth but no death. The less precise you get with those births the more non-sharing you'll get, but that's the price. Ciao, Michael.
[Patch]middle-end: updating the reg use in exit block for -fzero-call-used-regs [PR100775]
Hi, Richard, Could you please review this patch? This is a fix to the previous -fzero-call-used-regs implementation. PR 100775 ( ICE: in df_exit_block_bitmap_verify, at df-scan.c:4164 with -mthumb -fzero-call-used-regs=used) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100775 Although the ICE only happens on arm, but this is a bug in the middle end. So, I think this bug has higher priority, Need to be included into gcc12, and also need to be back ported to gcc11. In the pass_zero_call_used_regs, when updating dataflow info after adding the register zeroing sequence in the epilogue of the function, we should call "df_update_exit_block_uses" to update the register use information in the exit block to include all the registers that have been zeroed. The change has been bootstrapped and reg-tested on both x86 and aarch64 (with -enable-checking=yes,rtl,df). Since I cannot find an arm machine, no bootstrap and reg-tested on arm yet. For the arm failure, I just tested it with the cross build and it has no issue withe the fix. (One question here: Previously, I though “df_set_bb_dirty (EXIT_BLOCK_PTR_FOR_FN (cfun))” and a later “df_analyze()” should rescan the changed exit block of the function, and update all the df info automatically, it apparently not the case, the register use info at exit block is not automatically updated, we have to add an explicitly call to “df_update_exit_block_uses”. I checked the pass_thread_prologue_and_epilogue, looks like it also explicitly calls “df_update_entry_exit_and_calls” to update the register use info. Shall the “df_set_bb_dirty” + “df_analyze” automatically update the reg use info of the dirty block?). Let me know whether there is any issue with the fix? Thanks Qing === From e1cca5659c85e7c536f5016a2c75c615e65dba75 Mon Sep 17 00:00:00 2001 From: Qing Zhao Date: Fri, 28 Jan 2022 16:29:51 + Subject: [PATCH] middle-end: updating the reg use in exit block for -fzero-call-used-regs [PR100775] In the pass_zero_call_used_regs, when updating dataflow info after adding the register zeroing sequence in the epilogue of the function, we should call "df_update_exit_block_uses" to update the register use information in the exit block to include all the registers that have been zeroed. 2022-01-27 Qing Zhao gcc/ChangeLog: * function.cc (gen_call_used_regs_seq): Call df_update_exit_block_uses when updating df. gcc/testsuite/ChangeLog: * gcc.target/arm/pr100775.c: New test. --- gcc/function.cc | 1 + gcc/testsuite/gcc.target/arm/pr100775.c | 8 2 files changed, 9 insertions(+) create mode 100644 gcc/testsuite/gcc.target/arm/pr100775.c diff --git a/gcc/function.cc b/gcc/function.cc index e1d2565f8d92..c8a77c9a6246 100644 --- a/gcc/function.cc +++ b/gcc/function.cc @@ -5942,6 +5942,7 @@ gen_call_used_regs_seq (rtx_insn *ret, unsigned int zero_regs_type) /* Update the data flow information. */ crtl->must_be_zero_on_return |= zeroed_hardregs; df_set_bb_dirty (EXIT_BLOCK_PTR_FOR_FN (cfun)); + df_update_exit_block_uses (); } } diff --git a/gcc/testsuite/gcc.target/arm/pr100775.c b/gcc/testsuite/gcc.target/arm/pr100775.c new file mode 100644 index ..dd2255a95492 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr100775.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-options "-mthumb -fzero-call-used-regs=used" } */ + +int +foo (int x) +{ + return x; +} -- 2.27.0
Re: [PATCH] rs6000: Add support for vmsumcud and vec_msumc
On Mon, Feb 07, 2022 at 10:06:36PM -0600, Bill Schmidt wrote: > On 2/7/22 5:05 PM, Segher Boessenkool wrote: > > On Mon, Feb 07, 2022 at 04:20:24PM -0600, Bill Schmidt wrote: > >> I observed recently that a couple of Power10 instructions and built-in > >> functions > >> were somehow not implemented. This patch adds one of them (vmsumcud). > >> Although > >> this isn't normally stage-4 material, this is really simple and carries no > >> discernible risk, so I hope it can be considered. > > But what is the advantage? That will be very tiny as well, afaics? > > > > Ah, this implements a builtin as well. But that builtin is not in the > > PVIPR, so no one yet uses it most likely? > > It's in the yet unpublished version of PVIPR that adds ISA 3.1 support, > currently awaiting public review. It should have been implemented with > the rest of the ISA 3.1 built-ins. (There are two more that were missed > as well, which I haven't yet addressed.) Ugh. Too much process, not enough speed. > >> +;; vmsumcud > >> +(define_insn "vmsumcud" > >> +[(set (match_operand:V1TI 0 "register_operand" "+v") > >> + (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v") > >> +(match_operand:V2DI 2 "register_operand" "v") > >> + (match_operand:V1TI 3 "register_operand" "v")] > >> + UNSPEC_VMSUMCUD))] > >> + "TARGET_POWER10" > >> + "vmsumcud %0,%1,%2,%3" > >> + [(set_attr "type" "vecsimple")] > >> +) > > This can be properly described in RTL instead of using an unspec. This > > is much preferable. I would say compare to maddhd[u], but those insns > > aren't implemented either (maddld is though). > > Is it? Note that vmsumcud produces the carry out of the final > result, not the result itself. I couldn't immediately see how > to express this in RTL. It produces thw top 128 bits of the (infinitely precise) result. But yeah that requires an OImode here (for the temp itself), and we do not have that in the backend yet. > The full operation multiplies the corresponding lanes of each > doubleword of arguments 1 and 2, adds them together with the > 128-bit value in argument 3, and produces the carry out of the > result as a 128-bit value in the result. I think I'd need to > have a 256-bit mode to express this properly in RTL, right? Not if you actually calculate the carry, instead of computing the 256-bit result and truncating it. But this is very unwieldy (it would be fine if adding just two datums, but here there are three). Should the type be vecsimple? Don't we have a type for multiplications? Hrm it looks like we use veccomplex usually. Okay for trunk with that taken care of. Thanks! Segher
Re: [PATCH] tree-optimization/104373 - early uninit diagnostic on unreachable code
On 2/8/2022 12:03 AM, Richard Biener via Gcc-patches wrote: The following improves early uninit diagnostics by computing edge reachability using our value-numbering framework in its cheapest mode and ignoring unreachable blocks when looking for uninitialized uses. To not ICE with -fdump-tree-all the early uninit pass needs a dumpfile since VN tries to dump statistics. For gimple-match.c at -O0 -g this causes a 2% increase in compile-time. In theory all early diagnostic passes could benefit from a VN run but that would require more refactoring that's not appropriate at this stage. This patch addresses a GCC 12 diagnostic regression and also happens to fix one XFAIL in gcc.dg/uninit-pr20644-O0.c Bootstrapped and tested on x86_64-unknown-linux-gnu, OK for trunk? Thanks, Richard. 2022-02-04 Richard Biener PR tree-optimization/104373 * tree-ssa-sccvn.h (do_rpo_vn): New export exposing the walk kind. * tree-ssa-sccvn.cc (do_rpo_vn): Export, get the default walk kind as argument. (run_rpo_vn): Adjust. (pass_fre::execute): Likewise. * tree-ssa-uninit.cc (warn_uninitialized_vars): Skip blocks not reachable. (execute_late_warn_uninitialized): Mark all edges as executable. (execute_early_warn_uninitialized): Use VN to compute executable edges. (pass_data_early_warn_uninitialized): Enable a dump file, change dump name to warn_uninit. * g++.dg/warn/Wuninitialized-32.C: New testcase. * gcc.dg/uninit-pr20644-O0.c: Remove XFAIL. I'm conflicted on this ;-) I generally lean on the side of eliminating false positives in these diagnostics. So identifying unreachable blocks and using that to prune the set of warnings we generate, even at -O0 is good from that point of view. But doing something like this has many of the problems that relying on optimizations does, even if we don't optimize away the unreachable code. Right now the warning should be fairly stable at -O0 -- the set of diagnostics you get isn't going to change a lot release to release which is important to some users. Second, at -O0 whether or not you get a warning isn't a function of how good our unreachable code analysis might be. This was quite a contentious topic many years ago. So much that I dropped some work on Wuninit on the floor as I just couldn't take the arguing. So be aware that you might be opening a can of worms. So the question comes down to a design decision. What's more important to the end users? Fewer false positives or better stability in the warning? I think the former, but there's certainly been a vocal group that prefers the latter. On the implementation side I have zero concerns. Looking further out, ISTM we could mark the blocks as unreachable (rather than deducing it from edge flags). That would make it pretty easy to mark those blocks relatively early and allow us to suppress any middle end diagnostic occurring in an unreachable block. jeff
Re: [PATCH, V2] Use system default for long double if not specified on PowerPC.
On 2/4/22 4:26 PM, Segher Boessenkool wrote: > As I said before, I didn't even read the patch, just the one line > summary was enough for a NAK. If the patch in fact does something else, > then it is still incorrect, and needs a very different subject and > summary. > > I hope you see how "using the default of the underlying system" is > questionable in itself, but is something completely different from using > the default of the build compiler, which makes even less sense. > > You want a configure flag to set the default long double format to be > IEEE QP. This cannot be enabled by default until a (big) majority of > systems "in the wild" will work with that (only on powerpc64le-linux > or some *big* thing like that is fine, only default it to enabled there > then). At that point in time, configure shouls complain, and the user > would have to explicitly *disable* it to build without that support. Can you please clarify one thing for me. Do you think it's possible that we can come up with some type of configure patch that automatically sets the long double default given something on the system we can test for or do you think that's impossible and we'll just have to live with explicitly using a configure option to set the default? ...at least until some time in the far future when enough systems/distros have changed to IEEE128 that we can change the default without a test. Peter
[PATCH] rs6000: Fix up vspltis_shifted [PR102140]
Hi! The following testcase ICEs, because (const_vector:V4SI [ (const_int 0 [0]) repeated x3 (const_int -2147483648 [0x8000]) ]) is recognized as valid easy_vector_constant in between split1 pass and end of RA. The problem is that such constants need to be split, and the only splitter for that is: (define_split [(set (match_operand:VM 0 "altivec_register_operand") (match_operand:VM 1 "easy_vector_constant_vsldoi"))] "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode) && can_create_pseudo_p ()" There is only a single splitting pass before RA, so after that finishes, if something gets matched in between that and end of RA (after that can_create_pseudo_p () would be no longer true), it will never be successfully split and we ICE at final.cc time or earlier. The i386 backend (and a few others) already use (cfun->curr_properties & PROP_rtl_split_insns) as a test for split1 pass finished, so that some insns that should be split during split1 and shouldn't be matched afterwards are properly guarded. So, the following patch does that for vspltis_shifted too. Bootstrapped/regtested on powerpc64le-linux, ok for trunk? 2022-02-08 Jakub Jelinek PR target/102140 * config/rs6000/rs6000.cc (vspltis_shifted): Return false also if split1 pass has finished already. * gcc.dg/pr102140.c: New test. --- gcc/config/rs6000/rs6000.cc.jj 2022-02-07 17:38:20.873123915 +0100 +++ gcc/config/rs6000/rs6000.cc 2022-02-08 14:15:31.619505410 +0100 @@ -6257,8 +6257,11 @@ vspltis_shifted (rtx op) return false; /* We need to create pseudo registers to do the shift, so don't recognize - shift vector constants after reload. */ - if (!can_create_pseudo_p ()) + shift vector constants after reload. Don't match it even before RA + after split1 is done, because there won't be further splitting pass + before RA to do the splitting. */ + if (!can_create_pseudo_p () + || (cfun->curr_properties & PROP_rtl_split_insns)) return false; nunits = GET_MODE_NUNITS (mode); --- gcc/testsuite/gcc.dg/pr102140.c.jj 2022-02-08 14:24:25.839041166 +0100 +++ gcc/testsuite/gcc.dg/pr102140.c 2022-02-08 14:24:03.038359745 +0100 @@ -0,0 +1,23 @@ +/* PR target/102140 */ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-Og -fipa-cp -fno-tree-ccp -fno-tree-ter -Wno-psabi" } */ + +typedef int __attribute__((__vector_size__ (64))) U; +typedef __int128 __attribute__((__vector_size__ (64))) V; + +int a, b; + +static void +bar (char c, V v) +{ + v *= c; + U u = a + (U) v; + (union { U b; }) { u }; + b = 0; +} + +void +foo (void) +{ + bar (1, (V){((__int128) 9223372036854775808ULL) << 64}); +} Jakub
Re: [PATCH 5/4] tree-optimization/104288 - An alternative approach
On 2/8/22 03:32, Richard Biener wrote: On Tue, Feb 8, 2022 at 2:33 AM Andrew MacLeod via Gcc-patches wrote: On 2/7/22 09:29, Andrew MacLeod wrote: I have a proposal for PR 104288. ALL patches (in sequence) bootstrap on their own and each cause no regressions. I've been continuing to work with this towards the GCC13 solution for statement side effects. And There is another possibility we could pursue which is less intrusive I adapted the portions of patch 2/4 which process nonnull, but changes it from being in the nonnull class to being in the cache. THe main difference is it continues to use a single bit, just changing all uses of it to *always* mean its non-null on exit to the block. Range-on-entry is changed to only check dominator blocks, and range_of_expr doesn't check the bit at all.. in both ranger and the cache. When we are doing the block walk and process side effects, the on-entry *cache* range for the block is adjusted to be non-null instead. The statements which precede this stmt have already been processed, and all remaining statements in this block will now pick up this new non-value from the on-entry cache. This should be perfectly safe. The final tweak is when range_of_expr is looking the def block, it normally does not have an on entry cache value. (the def is in the block, so there is no entry cache value). It now checks to see if we have set one, which can only happen when we have been doing the statement walk and set the on-entry cache with a non-null value. This allows us to pick up the non-null range in the def block too... (once we have passed a statement which set nonnull of course). THis patch provides much less change to trunk, and is probably a better approach as its closer to what is going to happen in GCC13. Im still working with it, so the patch isnt fully refined yet... but it bootstraps and passes all test with no regressions. And passes the new tests too. I figured I would mention this before you look to hard at the other patchset.the GCC11 patch doesn't change. Let me know if you prefer this I think I do :-) less churn, and closer to end state. Yes, I very much prefer this - some comments to the other patches still apply to this one. Like using get_nonnull_args and probably adding a bail-out to computing ranges from stmts that can throw internally or have abnormal control flow (to not get into range-on-exit vs. normal vs. exceptional or abnormal edges). Richard. Here's the update which is going thru testing now. I think I covered the comments? Andrew From e3286a6a185952f3acaa89b90beee33149cd7512 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Mon, 7 Feb 2022 15:52:16 -0500 Subject: [PATCH] Register non-null side effects properly. This patch adjusts uses of nonnull to accurately reflect "somewhere in block". It also adds the ability to register statement side effects within a block for ranger which will apply for the rest of the block. PR tree-optimization/104288 gcc/ * gimple-range-cache.cc (non_null_ref::set_nonnull): New. (ranger_cache::range_of_def): Don't check non-null. (ranger_cache::entry_range): Don't check non-null. (ranger_cache::range_on_edge): Check for nonnull on normal edges. (ranger_cache::update_to_nonnull): New. (non_null_loadstore): New. (ranger_cache::block_apply_nonnull): New. * ranger-cache.h (class non_null_ref): Update prototypes. (class ranger_cache): Update prototypes. * gimple-range-path.cc (path_range_query::range_defined_in_block): Do not search dominators. (path_range_query::adjust_for_non_null_uses): Ditto. * gimple-range.cc (gimple_ranger::range_of_expr): Check on-entry for def overrides. Do not check nonnull. (gimple_ranger::range_on_entry): Check dominators for nonnull. (gimple_ranger::range_on_edge): Check for nonnull on normal edges.. * gimple-range.cc (gimple_ranger::register_side_effects): New. * gimple-range.h (gimple_ranger::register_side_effects): New. * tree-vrp.cc (rvrp_folder::fold_stmt): Call register_side_effects. testsuite/gcc/ * gcc.dg/pr104288.c: New. --- gcc/gimple-range-cache.cc | 103 ++-- gcc/gimple-range-cache.h| 3 + gcc/gimple-range-path.cc| 4 +- gcc/gimple-range.cc | 27 - gcc/gimple-range.h | 1 + gcc/testsuite/gcc.dg/pr104288.c | 23 +++ gcc/tree-vrp.cc | 8 ++- 7 files changed, 156 insertions(+), 13 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/pr104288.c diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc index 16c881b13e1..583ba97ff75 100644 --- a/gcc/gimple-range-cache.cc +++ b/gcc/gimple-range-cache.cc @@ -29,6 +29,10 @@ along with GCC; see the file COPYING3. If not see #include "gimple-pretty-print.h" #include "gimple-range.h" #include "tree-cfg.h" +#include "target.h" +#include "attribs.h" +#include "gimple-iterator.h" +#include "gimple-walk.h" #define DEBUG_RANGE_CACHE (dump_
Re: [PATCH, V2] Use system default for long double if not specified on PowerPC.
On Feb 08 2022, Peter Bergner wrote: > Can you please clarify one thing for me. Do you think it's possible > that we can come up with some type of configure patch that automatically > sets the long double default given something on the system we can test > for or do you think that's impossible and we'll just have to live with > explicitly using a configure option to set the default? It should be handled the same as the double->long double switch. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
Re: [PATCH] rs6000: Fix up vspltis_shifted [PR102140]
On Tue, Feb 8, 2022 at 12:25 PM Jakub Jelinek wrote: > > Hi! > > The following testcase ICEs, because > (const_vector:V4SI [ > (const_int 0 [0]) repeated x3 > (const_int -2147483648 [0x8000]) > ]) > is recognized as valid easy_vector_constant in between split1 pass and > end of RA. > The problem is that such constants need to be split, and the only > splitter for that is: > (define_split > [(set (match_operand:VM 0 "altivec_register_operand") > (match_operand:VM 1 "easy_vector_constant_vsldoi"))] > "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode) && can_create_pseudo_p ()" > There is only a single splitting pass before RA, so after that finishes, > if something gets matched in between that and end of RA (after that > can_create_pseudo_p () would be no longer true), it will never be > successfully split and we ICE at final.cc time or earlier. > > The i386 backend (and a few others) already use > (cfun->curr_properties & PROP_rtl_split_insns) > as a test for split1 pass finished, so that some insns that should be split > during split1 and shouldn't be matched afterwards are properly guarded. > > So, the following patch does that for vspltis_shifted too. > > Bootstrapped/regtested on powerpc64le-linux, ok for trunk? Okay. Thanks, David > > 2022-02-08 Jakub Jelinek > > PR target/102140 > * config/rs6000/rs6000.cc (vspltis_shifted): Return false also if > split1 pass has finished already. > > * gcc.dg/pr102140.c: New test. > > --- gcc/config/rs6000/rs6000.cc.jj 2022-02-07 17:38:20.873123915 +0100 > +++ gcc/config/rs6000/rs6000.cc 2022-02-08 14:15:31.619505410 +0100 > @@ -6257,8 +6257,11 @@ vspltis_shifted (rtx op) > return false; > >/* We need to create pseudo registers to do the shift, so don't recognize > - shift vector constants after reload. */ > - if (!can_create_pseudo_p ()) > + shift vector constants after reload. Don't match it even before RA > + after split1 is done, because there won't be further splitting pass > + before RA to do the splitting. */ > + if (!can_create_pseudo_p () > + || (cfun->curr_properties & PROP_rtl_split_insns)) > return false; > >nunits = GET_MODE_NUNITS (mode); > --- gcc/testsuite/gcc.dg/pr102140.c.jj 2022-02-08 14:24:25.839041166 +0100 > +++ gcc/testsuite/gcc.dg/pr102140.c 2022-02-08 14:24:03.038359745 +0100 > @@ -0,0 +1,23 @@ > +/* PR target/102140 */ > +/* { dg-do compile { target int128 } } */ > +/* { dg-options "-Og -fipa-cp -fno-tree-ccp -fno-tree-ter -Wno-psabi" } */ > + > +typedef int __attribute__((__vector_size__ (64))) U; > +typedef __int128 __attribute__((__vector_size__ (64))) V; > + > +int a, b; > + > +static void > +bar (char c, V v) > +{ > + v *= c; > + U u = a + (U) v; > + (union { U b; }) { u }; > + b = 0; > +} > + > +void > +foo (void) > +{ > + bar (1, (V){((__int128) 9223372036854775808ULL) << 64}); > +} > > Jakub >
Re: [Patch, fortran] PR37336 (Finalization) - [F03] Finish derived-type finalization
Hi Paul, Am 08.02.22 um 12:22 schrieb Paul Richard Thomas via Fortran: Hi Harald, Thanks for giving the patch a whirl. the parent components as an array. I strongly suspect that, from reading 7.5.6.2 paragraphs 2 and 3 closely, that ifort has it right. However, this is another issue to come back to in the future. Could you specify which version of Intel you tried? ifort (IFORT) 2021.1 Beta 20201112 ok, that's good to know. Testcase finalize_38.f90 fails for me with ifort 2021.5.0 with: 131 This test also fails with crayftn 11 & 12 and nagfor 7.0, but in a different place. I have run your modified version of finalize_38.f90, and now I see that you can get a bloody head just from scratching too much... crayftn 12.0.2: 1, 3, 1 2, 21, 0 11, 3, 2 12, 21, 1 21, 4, 3 23, 21, 22 | 42, 43 31, 6, 4 41, 7, 5 51, 9, 7 61, 10, 8 71, 13, 10 101, 2, 1 102, 4, 3 111, 3, 2 121, 4, 2 122, 0, 4 123, 5, 6 | 2*0 131, 5, 2 132, 0, 4 133, 7, 8 | 2*0 141, 6, 3 151, 10, 5 161, 16, 9 171, 18, 11 175, 0., 20. | 10., 20. nagfor 7.0: 1 0 1 11 1 2 23 21 22 | 42 43 71 9 10 72 11 99 131 3 2 132 5 4 141 4 3 151 6 5 161 10 9 171 12 11 Intel 2021.5.0: 131 3 2 132 0 4 133 5 6 | 0 0 141 4 3 151 7 5 152 3 0 153 0 0 | 1 3 forrtl: severe (174): SIGSEGV, segmentation fault occurred [...] That got me reading 7.5.6.3, where is says in paragraph 1: "When an intrinsic assignment statement is executed (10.2.1.3), if the variable is not an unallocated allocatable variable, it is finalized after evaluation of expr and before the definition of the variable. ..." Looking at the beginning of the testcase code (abridged): type(simple), allocatable :: MyType, MyType2 type(simple) :: ThyType = simple(21), ThyType2 = simple(22) ! The original PR - one finalization of 'var' before (re)allocation. MyType = ThyType call test(1, 0, [0,0], 0) This is an intrinsic assignment. Naively I would expect MyType to be initially unallocated. ThyType is not allocatable and non-pointer and cannot become undefined here and would not play any role in finalization. I am probably too blind-sighted to see why there should be a finalization here. What am I missing? Could you use the attached version of finalize_38.f90 with crayftn and NAG? All the stop statements are replaced with prints. Ifort gives: 131 3 2 132 0 4 133 5 6 | 0 0 141 4 3 151 7 5 152 3 0 153 0 0 | 1 3 161 13 9 162 20 0 163 0 0 | 10 20 171 14 11 I think it is a good idea to have these prints in the testcase whenever there is a departure from expectations. So print&stop? Furthermore, for the sake of health of people reading the testcases later, I think it would not harm to add more explanations why we expect a certain behavior... ;-) Best regards Paul Best regards, Harald
Re: [PATCH 1/4][RFC] middle-end/90348 - add explicit birth
On Tue, 8 Feb 2022, Richard Biener via Gcc-patches wrote: > which I think is OK? That is, when the abstract machine > arrives at 'int i;' then the previous content in 'i' goes > away? Or would Yes, that's correct. "If an initialization is specified for the object, it is performed each time the declaration or compound literal is reached in the execution of the block; otherwise, the value becomes indeterminate each time the declaration is reached.". -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] rs6000: Add support for vmsumcud and vec_msumc
On 2/8/22 9:45 AM, Segher Boessenkool wrote: > On Mon, Feb 07, 2022 at 10:06:36PM -0600, Bill Schmidt wrote: >> On 2/7/22 5:05 PM, Segher Boessenkool wrote: >>> On Mon, Feb 07, 2022 at 04:20:24PM -0600, Bill Schmidt wrote: I observed recently that a couple of Power10 instructions and built-in functions were somehow not implemented. This patch adds one of them (vmsumcud). Although this isn't normally stage-4 material, this is really simple and carries no discernible risk, so I hope it can be considered. >>> But what is the advantage? That will be very tiny as well, afaics? >>> >>> Ah, this implements a builtin as well. But that builtin is not in the >>> PVIPR, so no one yet uses it most likely? >> It's in the yet unpublished version of PVIPR that adds ISA 3.1 support, >> currently awaiting public review. It should have been implemented with >> the rest of the ISA 3.1 built-ins. (There are two more that were missed >> as well, which I haven't yet addressed.) > Ugh. Too much process, not enough speed. > +;; vmsumcud +(define_insn "vmsumcud" +[(set (match_operand:V1TI 0 "register_operand" "+v") + (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v") +(match_operand:V2DI 2 "register_operand" "v") + (match_operand:V1TI 3 "register_operand" "v")] + UNSPEC_VMSUMCUD))] + "TARGET_POWER10" + "vmsumcud %0,%1,%2,%3" + [(set_attr "type" "vecsimple")] +) >>> This can be properly described in RTL instead of using an unspec. This >>> is much preferable. I would say compare to maddhd[u], but those insns >>> aren't implemented either (maddld is though). >> Is it? Note that vmsumcud produces the carry out of the final >> result, not the result itself. I couldn't immediately see how >> to express this in RTL. > It produces thw top 128 bits of the (infinitely precise) result. But > yeah that requires an OImode here (for the temp itself), and we do not > have that in the backend yet. > >> The full operation multiplies the corresponding lanes of each >> doubleword of arguments 1 and 2, adds them together with the >> 128-bit value in argument 3, and produces the carry out of the >> result as a 128-bit value in the result. I think I'd need to >> have a 256-bit mode to express this properly in RTL, right? > Not if you actually calculate the carry, instead of computing the > 256-bit result and truncating it. But this is very unwieldy (it > would be fine if adding just two datums, but here there are three). > > Should the type be vecsimple? Don't we have a type for multiplications? > Hrm it looks like we use veccomplex usually. > > Okay for trunk with that taken care of. Thanks! Thanks! Revised as requested and pushed as r12-7110 (943d631abdd7be623c). Bill > > > Segher
Re: [PATCH] c++: Remove superflous assert [PR104403]
On 2/8/22 03:16, Jakub Jelinek wrote: Hi! I've added the assert because start_decl diagnoses such vars for C++20 and earlier: if (current_function_decl && VAR_P (decl) && DECL_DECLARED_CONSTEXPR_P (current_function_decl) && cxx_dialect < cxx23) but as can be seen, we cam trigger the assert in older standards e.g. during non-manifestly constant evaluation. Rather than refining the assert that DECL_EXPRs for such vars don't appear for C++20 and older if they are inside of functions declared constexpr this patch just removes the assert, the code rejects encountering those vars in constant expressions anyway. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. 2022-02-08 Jakub Jelinek PR c++/104403 * constexpr.cc (cxx_eval_constant_expression): Don't assert DECL_EXPRs of TREE_STATIC vars may only appear in -std=c++23. * g++.dg/cpp0x/lambda/lambda-104403.C: New test. --- gcc/cp/constexpr.cc.jj 2022-02-06 11:16:22.353814431 +0100 +++ gcc/cp/constexpr.cc 2022-02-07 11:38:59.131165171 +0100 @@ -6652,7 +6652,6 @@ cxx_eval_constant_expression (const cons /* Allow __FUNCTION__ etc. */ && !DECL_ARTIFICIAL (r)) { - gcc_assert (cxx_dialect >= cxx23); if (!ctx->quiet) { if (CP_DECL_THREAD_LOCAL_P (r)) --- gcc/testsuite/g++.dg/cpp0x/lambda/lambda-104403.C.jj2022-02-07 11:40:32.760847269 +0100 +++ gcc/testsuite/g++.dg/cpp0x/lambda/lambda-104403.C 2022-02-07 11:40:06.109223079 +0100 @@ -0,0 +1,8 @@ +// PR c++/104403 +// { dg-do compile { target c++11 } } + +int +main () +{ + []{ switch (0) { case 0: static int value; return &value; } }; +} Jakub
Re: [PATCH] c++: Don't emit repeated -Wshadow warnings for templates/ctors [PR104379]
On 2/8/22 03:22, Jakub Jelinek wrote: Hi! The following patch suppresses extraneous -Wshadow warnings. On the testcase without the patch we emit 14 -Wshadow warnings, with the patch just 4. It is enough to warn once e.g. during parsing of the template or the abstract ctor, while previously we'd warn also on the clones of the ctors and on instantiation. In GCC 8 and earlier we didn't warn because check_local_shadow did /* Inline decls shadow nothing. */ if (DECL_FROM_INLINE (decl)) return; Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. 2022-02-08 Jakub Jelinek PR c++/104379 * name-lookup.cc (check_local_shadow): When diagnosing shadowing of a member or global declaration, add warning suppression for the decl and don't warn again on it. * g++.dg/warn/Wshadow-18.C: New test. --- gcc/cp/name-lookup.cc.jj2022-01-18 11:58:59.0 +0100 +++ gcc/cp/name-lookup.cc 2022-02-07 15:00:19.781820426 +0100 @@ -3296,18 +3296,22 @@ check_local_shadow (tree decl) /* Warn if a variable shadows a non-function, or the variable is a function or a pointer-to-function. */ - if (!OVL_P (member) - || TREE_CODE (decl) == FUNCTION_DECL - || (TREE_TYPE (decl) - && (TYPE_PTRFN_P (TREE_TYPE (decl)) - || TYPE_PTRMEMFUNC_P (TREE_TYPE (decl) + if ((!OVL_P (member) +|| TREE_CODE (decl) == FUNCTION_DECL +|| (TREE_TYPE (decl) +&& (TYPE_PTRFN_P (TREE_TYPE (decl)) +|| TYPE_PTRMEMFUNC_P (TREE_TYPE (decl) + && !warning_suppressed_p (decl, OPT_Wshadow)) { auto_diagnostic_group d; if (warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wshadow, "declaration of %qD shadows a member of %qT", decl, current_nonlambda_class_type ()) && DECL_P (member)) - inform_shadowed (member); + { + inform_shadowed (member); + suppress_warning (decl, OPT_Wshadow); + } } return; } @@ -3319,14 +3323,18 @@ check_local_shadow (tree decl) || (TREE_CODE (old) == TYPE_DECL && (!DECL_ARTIFICIAL (old) || TREE_CODE (decl) == TYPE_DECL))) - && !instantiating_current_function_p ()) + && !instantiating_current_function_p () + && !warning_suppressed_p (decl, OPT_Wshadow)) /* XXX shadow warnings in outer-more namespaces */ { auto_diagnostic_group d; if (warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wshadow, "declaration of %qD shadows a global declaration", decl)) - inform_shadowed (old); + { + inform_shadowed (old); + suppress_warning (decl, OPT_Wshadow); + } return; } --- gcc/testsuite/g++.dg/warn/Wshadow-18.C.jj 2022-02-07 15:01:56.509456240 +0100 +++ gcc/testsuite/g++.dg/warn/Wshadow-18.C 2022-02-07 15:03:34.951067880 +0100 @@ -0,0 +1,22 @@ +// PR c++/104379 +// { dg-do compile } +// { dg-options "-Wshadow" } + +int x; + +template +struct S +{ + int i; + S(int i) { (void) i; } // { dg-warning "declaration of 'i' shadows a member of 'S'" } + S(float x) { (void) x; } // { dg-warning "declaration of 'x' shadows a global declaration" } + S(int *p) { int a = 1; (void) p; (void) a; + { int a = 2; (void) a; } }// { dg-warning "declaration of 'a' shadows a previous local" } +}; + +S i(1); +S j(1); +S k(1.0f); +S l(1.0f); +S m(&x); +S n(&x); Jakub
Re: [PATCH] dwarf2out: Don't call expand_expr during early_dwarf [PR104407]
On 2/8/22 03:35, Richard Biener wrote: On Tue, 8 Feb 2022, Jakub Jelinek wrote: Hi! As mentioned in the PR, since PR96690 r11-2834 we call rtl_for_decl_init which can call expand_expr already during early_dwarf. The comment and PR explains it that the intent is to ensure the referenced vars and functions are properly mangled because free_lang_data doesn't cover everything, like template parameters etc. It doesn't work well though, because expand_expr free_lang_data mangling also only runs when generating LTO bytecode... can set DECL_RTLs e.g. on referenced vars and keep them there, and they can be created e.g. with different MEM_ALIGN compared to what they would be created with if they were emitted later. So, the following patch stops calling rtl_for_decl_init and instead for cases for which rtl_for_decl_init does anything at all walks the initializer and ensures referenced vars or functions are mangled. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2022-02-08 Jakub Jelinek PR debug/104407 * dwarf2out.cc (mangle_referenced_decls): New function. (tree_add_const_value_attribute): Don't call rtl_for_decl_init if early_dwarf. Instead walk the initializer and try to mangle vars or functions referenced from it. * g++.dg/debug/dwarf2/pr104407.C: New test. --- gcc/dwarf2out.cc.jj 2022-02-04 14:36:54.966605844 +0100 +++ gcc/dwarf2out.cc2022-02-07 12:28:13.000520666 +0100 @@ -20881,6 +20881,19 @@ add_location_or_const_value_attribute (d return tree_add_const_value_attribute_for_decl (die, decl); } +/* Mangle referenced decls. */ +static tree +mangle_referenced_decls (tree *tp, int *walk_subtrees, void *) +{ + if (! EXPR_P (*tp) && ! CONSTANT_CLASS_P (*tp)) +*walk_subtrees = 0; + + if (VAR_OR_FUNCTION_DECL_P (*tp)) +assign_assembler_name_if_needed (*tp); + + return NULL_TREE; +} + /* Attach a DW_AT_const_value attribute to DIE. The value of the attribute is the const value T. */ @@ -20889,7 +20902,6 @@ tree_add_const_value_attribute (dw_die_r { tree init; tree type = TREE_TYPE (t); - rtx rtl; if (!t || !TREE_TYPE (t) || TREE_TYPE (t) == error_mark_node) return false; @@ -20910,11 +20922,26 @@ tree_add_const_value_attribute (dw_die_r return true; } } - /* Generate the RTL even if early_dwarf to force mangling of all refered to - symbols. */ - rtl = rtl_for_decl_init (init, type); - if (rtl && !early_dwarf) -return add_const_value_attribute (die, TYPE_MODE (type), rtl); + if (!early_dwarf) +{ + rtx rtl = rtl_for_decl_init (init, type); + if (rtl && !early_dwarf) the !early_dwarf is redundant Otherwise looks good to me, please give Jason the opportunity to comment though. Thanks, Richard. + return add_const_value_attribute (die, TYPE_MODE (type), rtl); +} + else +{ + /* For early_dwarf force mangling of all referenced symbols. */ + tree initializer = init; + STRIP_NOPS (initializer); + /* rtl_for_decl_init punts on other aggregates, and complex values. */ + if (AGGREGATE_TYPE_P (type) + || (TREE_CODE (initializer) == VIEW_CONVERT_EXPR + && AGGREGATE_TYPE_P (TREE_TYPE (TREE_OPERAND (initializer, 0 + || TREE_CODE (type) == COMPLEX_TYPE) + ; Hmm, I don't think we need to mirror the limitations of rtl_for_decl_init here; IMO we might as well go ahead mangle everything so we don't need to change this place as well when the FIXME is resolved. OK either way, but if you decide to keep this condition please update the FIXME in rtl_for_decl_init to point here. + else if (initializer_constant_valid_p (initializer, type)) + walk_tree (&initializer, mangle_referenced_decls, NULL, NULL); +} /* If the host and target are sane, try harder. */ if (CHAR_BIT == 8 && BITS_PER_UNIT == 8 && initializer_constant_valid_p (init, type)) --- gcc/testsuite/g++.dg/debug/dwarf2/pr104407.C.jj 2022-02-07 10:45:51.945041031 +0100 +++ gcc/testsuite/g++.dg/debug/dwarf2/pr104407.C2022-02-07 10:45:09.834635340 +0100 @@ -0,0 +1,12 @@ +// PR debug/104407 +// { dg-do compile { target c++17 } } +// { dg-options "-O1 -fcompare-debug" } + +struct A { int i; long j; int k : 2; char l; } a; + +auto [ aa, bb, cc, dd ] = a; + +namespace N +{ + auto & [ m, n, o, ppp ] = a; +} Jakub
Re: [PATCH] dwarf2out: Don't call expand_expr during early_dwarf [PR104407]
On Tue, Feb 08, 2022 at 02:20:28PM -0500, Jason Merrill wrote: > > > + return add_const_value_attribute (die, TYPE_MODE (type), rtl); > > > +} > > > + else > > > +{ > > > + /* For early_dwarf force mangling of all referenced symbols. */ > > > + tree initializer = init; > > > + STRIP_NOPS (initializer); > > > + /* rtl_for_decl_init punts on other aggregates, and complex > > > values. */ > > > + if (AGGREGATE_TYPE_P (type) > > > + || (TREE_CODE (initializer) == VIEW_CONVERT_EXPR > > > + && AGGREGATE_TYPE_P (TREE_TYPE (TREE_OPERAND (initializer, 0 > > > + || TREE_CODE (type) == COMPLEX_TYPE) > > > + ; > > Hmm, I don't think we need to mirror the limitations of rtl_for_decl_init > here; IMO we might as well go ahead mangle everything so we don't need to > change this place as well when the FIXME is resolved. OK either way, but if > you decide to keep this condition please update the FIXME in > rtl_for_decl_init to point here. I did that because the aggregate case can mean very large initializers that would need mangling a lot of symbols even when we actually don't need that because we'd punt on it, and because I'm a little bit afraid how safe the mangling is for -fcompare-debug if done only for the -g case (I vaguely remember it can try to instantiate some templates etc.). Can certainly adjust the FIXME comment. Jakub
[PATCH] Fix PR 101515 (ICE in pp_cxx_unqualified_id, at cp/cxx-pretty-print.c:128)
Hi, This is the patch to fix PR101515 (ICE in pp_cxx_unqualified_id, at cp/cxx-pretty-print.c:128) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101515 It's possible that the TYPE_NAME of a record_type is NULL, therefore when printing the TYPE_NAME, we should check and handle this special case. Please see the comment of pr101515 for more details. The fix is very simple, just check and special handle cases when TYPE_NAME is NULL. Bootstrapped and regression tested on both x86 and aarch64, no issues. Okay for commit? Thanks. Qing = >From f37ee8d21b80cb77d8108cb97a487c84c530545b Mon Sep 17 00:00:00 2001 From: Qing Zhao Date: Tue, 8 Feb 2022 16:10:37 + Subject: [PATCH] Fix PR 101515 ICE in pp_cxx_unqualified_id, at cp/cxx-pretty-print.c:128. It's possible that the TYPE_NAME of a record_type is NULL, therefore when printing the TYPE_NAME, we should check and handle this special case. gcc/cp/ChangeLog: * cxx-pretty-print.cc (pp_cxx_unqualified_id): Check and handle the case when TYPE_NAME is NULL. gcc/testsuite/ChangeLog: * g++.dg/pr101515.C: New test. --- gcc/cp/cxx-pretty-print.cc | 5 - gcc/testsuite/g++.dg/pr101515.C | 25 + 2 files changed, 29 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/pr101515.C diff --git a/gcc/cp/cxx-pretty-print.cc b/gcc/cp/cxx-pretty-print.cc index 4f9a090e520d..744ed0add5ba 100644 --- a/gcc/cp/cxx-pretty-print.cc +++ b/gcc/cp/cxx-pretty-print.cc @@ -171,7 +171,10 @@ pp_cxx_unqualified_id (cxx_pretty_printer *pp, tree t) case ENUMERAL_TYPE: case TYPENAME_TYPE: case UNBOUND_CLASS_TEMPLATE: - pp_cxx_unqualified_id (pp, TYPE_NAME (t)); + if (TYPE_NAME (t)) + pp_cxx_unqualified_id (pp, TYPE_NAME (t)); + else + pp_string (pp, ""); if (tree ti = TYPE_TEMPLATE_INFO_MAYBE_ALIAS (t)) if (PRIMARY_TEMPLATE_P (TI_TEMPLATE (ti))) { diff --git a/gcc/testsuite/g++.dg/pr101515.C b/gcc/testsuite/g++.dg/pr101515.C new file mode 100644 index ..898c7e003c22 --- /dev/null +++ b/gcc/testsuite/g++.dg/pr101515.C @@ -0,0 +1,25 @@ +/* PR101515 - ICE in pp_cxx_unqualified_id, at cp/cxx-pretty-print.c:128 + { dg-do compile } + { dg-options "-Wuninitialized -O1" } */ + +struct S +{ + int j; +}; +struct T : public S +{ + virtual void h () {} +}; +struct ptrmemfunc +{ + void (*ptr) (); +}; +typedef void (S::*sp)(); +int main () +{ + T t; + sp x; + ptrmemfunc *xp = (ptrmemfunc *) &x; + if (xp->ptr != ((void (*)())(sizeof(void * /* { dg-warning "is used uninitialized" } */ +return 1; +} -- 2.27.0
Re: [PATCH v3] rs6000: Fix some issues in rs6000_can_inline_p [PR102059]
Hi! >From some discussion today, I think we want to limit the scope of this patch to just the power8-fusion flag that's causing trouble for now, given stage 4. We've talked about making power8-fusion a do- nothing flag, since it doesn't add much benefit now and probably shouldn't be a separate flag anyway. Having it as a meaningless flag makes it more palatable to add an exception for it in the inlining path. Others, feel free to weigh in. Thanks, Bill On 1/5/22 1:34 AM, Kewen.Lin wrote: > Hi, > > This patch is to fix the inconsistent behaviors for non-LTO mode > and LTO mode. As Martin pointed out, currently the function > rs6000_can_inline_p simply makes it inlinable if callee_tree is > NULL, but it's unexpected, we should use the command line options > from target_option_default_node as default. > > It replaces rs6000_isa_flags with target_option_default_node when > caller_tree is NULL since it's more straightforward and doesn't > suffer from some bug not to keep rs6000_isa_flags as default. > > It also extends the scope of the check for the case that callee > has explicit set options, inlining in test case pr102059-5.c can > happen unexpectedly before, it's fixed accordingly. > > As Richi/Mike pointed out, some tuning flags like MASK_P8_FUSION > can be neglected for always inlining, this patch also takes some > flags when the callee is attributed by always_inline. > > v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578552.html > v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586112.html > > This patch is one re-post of this updated version[1] and also > rebased and adjusted on top of the related commit r12-6219. > > Bootstrapped and regtested on powerpc64-linux-gnu P8 and > powerpc64le-linux-gnu P9 and P10. > > Is it ok for trunk? > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586296.html > > BR, > Kewen > - > gcc/ChangeLog: > > PR target/102059 > * config/rs6000/rs6000.c (rs6000_can_inline_p): Adjust with > target_option_default_node and consider always_inline_safe flags. > > gcc/testsuite/ChangeLog: > > PR target/102059 > * gcc.target/powerpc/pr102059-4.c: New test. > * gcc.target/powerpc/pr102059-5.c: New test. > * gcc.target/powerpc/pr102059-6.c: New test. > * gcc.target/powerpc/pr102059-7.c: New test. > * gcc.target/powerpc/pr102059-8.c: New test. > * gcc.dg/lto/pr102059-1_0.c: Remove unneeded option. > >
[PATCH V2] tree-optimization/104288 - Register non-null side effects properly.
On 2/8/22 03:32, Richard Biener wrote: On Tue, Feb 8, 2022 at 2:33 AM Andrew MacLeod via Gcc-patches wrote: On 2/7/22 09:29, Andrew MacLeod wrote: I have a proposal for PR 104288. ALL patches (in sequence) bootstrap on their own and each cause no regressions. I've been continuing to work with this towards the GCC13 solution for statement side effects. And There is another possibility we could pursue which is less intrusive I adapted the portions of patch 2/4 which process nonnull, but changes it from being in the nonnull class to being in the cache. THe main difference is it continues to use a single bit, just changing all uses of it to *always* mean its non-null on exit to the block. Range-on-entry is changed to only check dominator blocks, and range_of_expr doesn't check the bit at all.. in both ranger and the cache. When we are doing the block walk and process side effects, the on-entry *cache* range for the block is adjusted to be non-null instead. The statements which precede this stmt have already been processed, and all remaining statements in this block will now pick up this new non-value from the on-entry cache. This should be perfectly safe. The final tweak is when range_of_expr is looking the def block, it normally does not have an on entry cache value. (the def is in the block, so there is no entry cache value). It now checks to see if we have set one, which can only happen when we have been doing the statement walk and set the on-entry cache with a non-null value. This allows us to pick up the non-null range in the def block too... (once we have passed a statement which set nonnull of course). THis patch provides much less change to trunk, and is probably a better approach as its closer to what is going to happen in GCC13. Im still working with it, so the patch isnt fully refined yet... but it bootstraps and passes all test with no regressions. And passes the new tests too. I figured I would mention this before you look to hard at the other patchset.the GCC11 patch doesn't change. Let me know if you prefer this I think I do :-) less churn, and closer to end state. Yes, I very much prefer this - some comments to the other patches still apply to this one. Like using get_nonnull_args and probably adding a bail-out to computing ranges from stmts that can throw internally or have abnormal control flow (to not get into range-on-exit vs. normal vs. exceptional or abnormal edges). Richard. with some minor performance tweaks, such as moving adjust_range() to the header so it can be inlined . range-on-edge now applies the non-null from the src block if appropriate, not in range-on-exit. That should resolve the internal throwing statements I think. and I have switched over to get_nonnull_args(). Bootstraps on build-x86_64-pc-linux-gnu and passes all regressions. OK for trunk, or did I miss something? Andrew PS. odd.. I haven't seen the git diff be wrong before, but it shows the ranger_cache::range_on_edge changes as being in gimple_cache::range_of_expr... They are most definitely are not and it applies/de-applies fine.. so its just an oddity I guess. From 2f38b3de7958d83dab383c73029aedbd108d8930 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Mon, 7 Feb 2022 15:52:16 -0500 Subject: [PATCH] Register non-null side effects properly. This patch adjusts uses of nonnull to accurately reflect "somewhere in block". It also adds the ability to register statement side effects within a block for ranger which will apply for the rest of the block. PR tree-optimization/104288 gcc/ * gimple-range-cache.cc (non_null_ref::set_nonnull): New. (non_null_ref::adjust_range): Move to header. (ranger_cache::range_of_def): Don't check non-null. (ranger_cache::entry_range): Don't check non-null. (ranger_cache::range_on_edge): Check for nonnull on normal edges. (ranger_cache::update_to_nonnull): New. (non_null_loadstore): New. (ranger_cache::block_apply_nonnull): New. * ranger-cache.h (class non_null_ref): Update prototypes. (non_null_ref::adjust_range): Move to here and inline. (class ranger_cache): Update prototypes. * gimple-range-path.cc (path_range_query::range_defined_in_block): Do not search dominators. (path_range_query::adjust_for_non_null_uses): Ditto. * gimple-range.cc (gimple_ranger::range_of_expr): Check on-entry for def overrides. Do not check nonnull. (gimple_ranger::range_on_entry): Check dominators for nonnull. (gimple_ranger::range_on_edge): Check for nonnull on normal edges.. * gimple-range.cc (gimple_ranger::register_side_effects): New. * gimple-range.h (gimple_ranger::register_side_effects): New. * tree-vrp.cc (rvrp_folder::fold_stmt): Call register_side_effects. testsuite/gcc/ * gcc.dg/pr104288.c: New. --- gcc/gimple-range-cache.cc | 135 gcc/gimple-range-cache.h| 31 gcc/gimple-range-path.cc| 4 +- gcc/gimple-range.cc
Re: [PATCH] libstdc++: Decouple HAVE_FCNTL_H from HAVE_DIRENT_H check
On Mon, Feb 07, 2022 at 09:05:45PM +, Jonathan Wakely wrote: > On Mon, 7 Feb 2022 at 21:01, Jonathan Wakely wrote: > > > > > > > On Mon, 7 Feb 2022 at 20:12, Dimitar Dimitrov wrote: > > > >> On PRU target with newlib, we have the following combination in config.h: > >> /* #undef HAVE_DIRENT_H */ > >> #define HAVE_FCNTL_H 1 > >> #define HAVE_UNLINKAT 1 > >> > >> In newlib, targets which do not define dirent.h, get a build error when > >> including : > >> > >> https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/include/sys/dirent.h;hb=HEAD > >> > >> While fs_dir.cc correctly checks for HAVE_FCNTL_H, dir-common.h doesn't, > >> and instead uses HAVE_DIRENT_H. This results in unlinkat() function call > >> in fs_dir.cc without the needed include in dir-common.h. Thus > >> a build failure: > >> .../gcc/libstdc++-v3/src/c++17/fs_dir.cc:151:11: error: ‘::unlinkat’ > >> has not been declared; did you mean ‘unlink’? > >> > >> Fix by encapsulating include with the correct check. > >> > > > > But there's no point doing anything in that file if we don't have > > , the whole thing is unusable. There's no point making the > > members using unlinkat compile if you can't ever construct the type. > > > > So I think we want a different fix. > > > > > Maybe something like: > > --- a/libstdc++-v3/src/filesystem/dir-common.h > +++ b/libstdc++-v3/src/filesystem/dir-common.h > @@ -70,6 +70,8 @@ struct DIR { }; > inline DIR* opendir(const char*) { return nullptr; } > inline dirent* readdir(DIR*) { return nullptr; } > inline int closedir(DIR*) { return -1; } > +#undef _GLIBCXX_HAVE_DIRFD > +#undef _GLIBCXX_HAVE_UNLINKAT > #endif > } // namespace __gnu_posix Yes, this fixes the PRU target, and does not regress x86_64-pc-linux-gnu. Regards, Dimitar
[committed] libstdc++: Simplify resource management in directory iterators
Tested powerpc64le-linux, pushed to trunk. This replaces the _Dir constructor that takes ownership of an existing DIR* resource with one that takes a _Dir_base rvalue instead. This means a raw DIR* is never passed around, but is always owned by a _Dir_base object. libstdc++-v3/ChangeLog: * src/c++17/fs_dir.cc (_Dir(DIR*, const path&)): Change first parameter to _Dir_base&&. * src/filesystem/dir-common.h (_Dir_base(DIR*)): Remove. * src/filesystem/dir.cc (_Dir(DIR*, const path&)): Change first parameter to _Dir_base&&. --- libstdc++-v3/src/c++17/fs_dir.cc | 4 ++-- libstdc++-v3/src/filesystem/dir-common.h | 2 -- libstdc++-v3/src/filesystem/dir.cc | 4 ++-- 3 files changed, 4 insertions(+), 6 deletions(-) diff --git a/libstdc++-v3/src/c++17/fs_dir.cc b/libstdc++-v3/src/c++17/fs_dir.cc index 54f135d2baf..c67fe76bc14 100644 --- a/libstdc++-v3/src/c++17/fs_dir.cc +++ b/libstdc++-v3/src/c++17/fs_dir.cc @@ -57,7 +57,7 @@ struct fs::_Dir : _Dir_base path = p; } - _Dir(posix::DIR* dirp, const path& p) : _Dir_base(dirp), path(p) { } + _Dir(_Dir_base&& d, const path& p) : _Dir_base(std::move(d)), path(p) { } _Dir(_Dir&&) = default; @@ -140,7 +140,7 @@ struct fs::_Dir : _Dir_base _Dir_base d(dirfd, pathname, skip_permission_denied, nofollow, ec); // If this->path is empty, the new _Dir should have an empty path too. const fs::path& p = this->path.empty() ? this->path : this->entry.path(); -return _Dir(std::exchange(d.dirp, nullptr), p); +return _Dir(std::move(d), p); } bool diff --git a/libstdc++-v3/src/filesystem/dir-common.h b/libstdc++-v3/src/filesystem/dir-common.h index 0b7665a3f70..511b988f1c7 100644 --- a/libstdc++-v3/src/filesystem/dir-common.h +++ b/libstdc++-v3/src/filesystem/dir-common.h @@ -89,8 +89,6 @@ is_permission_denied_error(int e) struct _Dir_base { - _Dir_base(posix::DIR* dirp = nullptr) : dirp(dirp) { } - // If no error occurs then dirp is non-null, // otherwise null (even if a permission denied error is ignored). _Dir_base(int fd, const posix::char_type* pathname, diff --git a/libstdc++-v3/src/filesystem/dir.cc b/libstdc++-v3/src/filesystem/dir.cc index e838b4bc6bf..b451902c4a1 100644 --- a/libstdc++-v3/src/filesystem/dir.cc +++ b/libstdc++-v3/src/filesystem/dir.cc @@ -59,7 +59,7 @@ struct fs::_Dir : std::filesystem::_Dir_base path = p; } - _Dir(posix::DIR* dirp, const path& p) : _Dir_base(dirp), path(p) { } + _Dir(_Dir_base&& d, const path& p) : _Dir_base(std::move(d)), path(p) { } _Dir(_Dir&&) = default; @@ -133,7 +133,7 @@ struct fs::_Dir : std::filesystem::_Dir_base { auto [dirfd, pathname] = dir_and_pathname(); _Dir_base d(dirfd, pathname, skip_permission_denied, nofollow, ec); -return _Dir(std::exchange(d.dirp, nullptr), entry.path()); +return _Dir(std::move(d), entry.path()); } fs::path path; -- 2.34.1
Re: [PATCH] libstdc++: Decouple HAVE_FCNTL_H from HAVE_DIRENT_H check
On Tue, 8 Feb 2022 at 21:02, Dimitar Dimitrov wrote: > On Mon, Feb 07, 2022 at 09:05:45PM +, Jonathan Wakely wrote: > > On Mon, 7 Feb 2022 at 21:01, Jonathan Wakely > wrote: > > > > > > > > > > > On Mon, 7 Feb 2022 at 20:12, Dimitar Dimitrov > wrote: > > > > > >> On PRU target with newlib, we have the following combination in > config.h: > > >> /* #undef HAVE_DIRENT_H */ > > >> #define HAVE_FCNTL_H 1 > > >> #define HAVE_UNLINKAT 1 > > >> > > >> In newlib, targets which do not define dirent.h, get a build error > when > > >> including : > > >> > > >> > https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/include/sys/dirent.h;hb=HEAD > > >> > > >> While fs_dir.cc correctly checks for HAVE_FCNTL_H, dir-common.h > doesn't, > > >> and instead uses HAVE_DIRENT_H. This results in unlinkat() function > call > > >> in fs_dir.cc without the needed include in dir-common.h. > Thus > > >> a build failure: > > >> .../gcc/libstdc++-v3/src/c++17/fs_dir.cc:151:11: error: ‘::unlinkat’ > > >> has not been declared; did you mean ‘unlink’? > > >> > > >> Fix by encapsulating include with the correct check. > > >> > > > > > > But there's no point doing anything in that file if we don't have > > > , the whole thing is unusable. There's no point making the > > > members using unlinkat compile if you can't ever construct the type. > > > > > > So I think we want a different fix. > > > > > > > > > Maybe something like: > > > > --- a/libstdc++-v3/src/filesystem/dir-common.h > > +++ b/libstdc++-v3/src/filesystem/dir-common.h > > @@ -70,6 +70,8 @@ struct DIR { }; > > inline DIR* opendir(const char*) { return nullptr; } > > inline dirent* readdir(DIR*) { return nullptr; } > > inline int closedir(DIR*) { return -1; } > > +#undef _GLIBCXX_HAVE_DIRFD > > +#undef _GLIBCXX_HAVE_UNLINKAT > > #endif > > } // namespace __gnu_posix > Yes, this fixes the PRU target, and does not regress > x86_64-pc-linux-gnu. > Thanks for checking it. I'm just testing it myself on powerpc64le-linux-gnu and will push when it finishes.
[PATCH] rs6000: Correct function prototypes for vec_replace_unaligned
Hi! Due to a pasto error in the documentation, vec_replace_unaligned was implemented with the same function prototypes as vec_replace_elt. It was intended that vec_replace_unaligned always specify output vectors as having type vector unsigned char, to emphasize that elements are potentially misaligned by this built-in function. This patch corrects the misimplementation. Bootstrapped and tested on powerpc64le-linux-gnu with no regressions. Is this okay for trunk? Eventually I would also like to backport it to GCC 11, after burn-in. Thanks! Bill 2022-02-04 Bill Schmidt gcc/ * config/rs6000/rs6000-builtins.def (VREPLACE_UN_UV2DI): Change function prototype. (VREPLACE_UN_UV4SI): Likewise. (VREPLACE_UN_V2DF): Likewise. (VREPLACE_UN_V2DI): Likewise. (VREPLACE_UN_V4SF): Likewise. (VREPLACE_UN_V4SI): Likewise. * config/rs6000/rs6000-overload.def (VEC_REPLACE_UN): Change all function prototypes. * config/rs6000/vsx.md (vreplace_un_): Remove define_expand. (vreplace_un_): New define_insn. gcc/testsuite/ * gcc.target/powerpc/vec-replace-word-runnable.c: Handle expected prototypes for each call to vec_replace_unaligned. --- gcc/config/rs6000/rs6000-builtins.def | 16 ++-- gcc/config/rs6000/rs6000-overload.def | 12 - gcc/config/rs6000/vsx.md | 25 --- .../powerpc/vec-replace-word-runnable.c | 20 ++- 4 files changed, 38 insertions(+), 35 deletions(-) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 5c988cc1152..846c0bafd45 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -3387,25 +3387,25 @@ const vull __builtin_altivec_vpextd (vull, vull); VPEXTD vpextd {} - const vull __builtin_altivec_vreplace_un_uv2di (vull, unsigned long long, \ - const int<4>); + const vuc __builtin_altivec_vreplace_un_uv2di (vull, unsigned long long, \ + const int<4>); VREPLACE_UN_UV2DI vreplace_un_v2di {} - const vui __builtin_altivec_vreplace_un_uv4si (vui, unsigned int, \ + const vuc __builtin_altivec_vreplace_un_uv4si (vui, unsigned int, \ const int<4>); VREPLACE_UN_UV4SI vreplace_un_v4si {} - const vd __builtin_altivec_vreplace_un_v2df (vd, double, const int<4>); + const vuc __builtin_altivec_vreplace_un_v2df (vd, double, const int<4>); VREPLACE_UN_V2DF vreplace_un_v2df {} - const vsll __builtin_altivec_vreplace_un_v2di (vsll, signed long long, \ - const int<4>); + const vuc __builtin_altivec_vreplace_un_v2di (vsll, signed long long, \ +const int<4>); VREPLACE_UN_V2DI vreplace_un_v2di {} - const vf __builtin_altivec_vreplace_un_v4sf (vf, float, const int<4>); + const vuc __builtin_altivec_vreplace_un_v4sf (vf, float, const int<4>); VREPLACE_UN_V4SF vreplace_un_v4sf {} - const vsi __builtin_altivec_vreplace_un_v4si (vsi, signed int, const int<4>); + const vuc __builtin_altivec_vreplace_un_v4si (vsi, signed int, const int<4>); VREPLACE_UN_V4SI vreplace_un_v4si {} const vull __builtin_altivec_vreplace_uv2di (vull, unsigned long long, \ diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def index 49a6104ddd2..44e2945aaa0 100644 --- a/gcc/config/rs6000/rs6000-overload.def +++ b/gcc/config/rs6000/rs6000-overload.def @@ -3059,17 +3059,17 @@ VREPLACE_ELT_V2DF [VEC_REPLACE_UN, vec_replace_unaligned, __builtin_vec_replace_un] - vui __builtin_vec_replace_un (vui, unsigned int, const int); + vuc __builtin_vec_replace_un (vui, unsigned int, const int); VREPLACE_UN_UV4SI - vsi __builtin_vec_replace_un (vsi, signed int, const int); + vuc __builtin_vec_replace_un (vsi, signed int, const int); VREPLACE_UN_V4SI - vull __builtin_vec_replace_un (vull, unsigned long long, const int); + vuc __builtin_vec_replace_un (vull, unsigned long long, const int); VREPLACE_UN_UV2DI - vsll __builtin_vec_replace_un (vsll, signed long long, const int); + vuc __builtin_vec_replace_un (vsll, signed long long, const int); VREPLACE_UN_V2DI - vf __builtin_vec_replace_un (vf, float, const int); + vuc __builtin_vec_replace_un (vf, float, const int); VREPLACE_UN_V4SF - vd __builtin_vec_replace_un (vd, double, const int); + vuc __builtin_vec_replace_un (vd, double, const int); VREPLACE_UN_V2DF [VEC_REVB, vec_revb, __builtin_vec_revb] diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 2f5a2f7828d..b53de103872 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -4197,21 +4197,6 @@ (define_expand "vreplace_elt_" } [(set_attr "type" "vecsimple")]) -(def
[PATCH] handle "invisible" reference in -Wdangling-pointer (PR104436)
Transforming a by-value arguments to by-reference as GCC does for some class types can trigger -Wdangling-pointer when the argument is used to store the address of a local variable. Since the stored value is not accessible in the caller the warning is a false positive. The attached patch handles this case by excluding PARM_DECLs with the DECL_BY_REFERENCE bit set from consideration. While testing the patch I noticed some instances of the warning are uninitentionally duplicated as the pass runs more than once. To avoid that, I also introduce warning suppression into the handler for this instance of the warning. (There might still be others.) Tested on x86_64-linux. MartinAvoid -Wdangling-pointer for by-transparent-reference arguments [PR104436]. Resolves: PR middle-end/104436 - spurious -Wdangling-pointer assigning local address to a class passed by value gcc/ChangeLog: PR middle-end/104436 * gimple-ssa-warn-access.cc (pass_waccess::check_dangling_stores): Check for warning suppression. Avoid by-value arguments transformed into by-transparent-reference. gcc/testsuite/ChangeLog: PR middle-end/104436 * c-c++-common/Wdangling-pointer-7.c: New test. * g++.dg/warn/Wdangling-pointer-4.C: New test. diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc index 80d41ea4383..0c319a32b70 100644 --- a/gcc/gimple-ssa-warn-access.cc +++ b/gcc/gimple-ssa-warn-access.cc @@ -4517,6 +4517,9 @@ pass_waccess::check_dangling_stores (basic_block bb, if (!stmt) break; + if (warning_suppressed_p (stmt, OPT_Wdangling_pointer_)) + continue; + if (is_gimple_call (stmt) && !(gimple_call_flags (stmt) & (ECF_CONST | ECF_PURE))) /* Avoid looking before nonconst, nonpure calls since those might @@ -4542,10 +4545,16 @@ pass_waccess::check_dangling_stores (basic_block bb, } else if (TREE_CODE (lhs_ref.ref) == SSA_NAME) { - /* Avoid looking at or before stores into unknown objects. */ gimple *def_stmt = SSA_NAME_DEF_STMT (lhs_ref.ref); if (!gimple_nop_p (def_stmt)) + /* Avoid looking at or before stores into unknown objects. */ return; + + tree var = SSA_NAME_VAR (lhs_ref.ref); + if (DECL_BY_REFERENCE (var)) + /* Avoid by-value arguments transformed into by-reference. */ + continue; + } else if (TREE_CODE (lhs_ref.ref) == MEM_REF) { @@ -4578,6 +4587,8 @@ pass_waccess::check_dangling_stores (basic_block bb, "storing the address of local variable %qD in %qE", rhs_ref.ref, lhs)) { + suppress_warning (stmt, OPT_Wdangling_pointer_); + location_t loc = DECL_SOURCE_LOCATION (rhs_ref.ref); inform (loc, "%qD declared here", rhs_ref.ref); diff --git a/gcc/testsuite/c-c++-common/Wdangling-pointer-7.c b/gcc/testsuite/c-c++-common/Wdangling-pointer-7.c new file mode 100644 index 000..433727dd845 --- /dev/null +++ b/gcc/testsuite/c-c++-common/Wdangling-pointer-7.c @@ -0,0 +1,20 @@ +/* Verify -Wdangling-pointer is issued only once. + { dg-do compile } + { dg-options "-O -Wall" } */ + +void *p; + +void escape_global_warn_once (void) +{ + int x[5]; + + p = &x[3];// { dg-regexp "\[^\n\r\]+: warning: \[^\n\r\]+ \\\[-Wdangling-pointer.?\\\]" "message" } +} + + +void escape_param_warn_once (void **p) +{ + int x[5]; + + *p = &x[3]; // { dg-regexp "\[^\n\r\]+: warning: \[^\n\r\]+ \\\[-Wdangling-pointer.?\\\]" "message" } +} diff --git a/gcc/testsuite/g++.dg/warn/Wdangling-pointer-4.C b/gcc/testsuite/g++.dg/warn/Wdangling-pointer-4.C new file mode 100644 index 000..b3d144a9e6d --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Wdangling-pointer-4.C @@ -0,0 +1,34 @@ +/* PR middle-end/104436 - spurious -Wdangling-pointer assigning local + address to a class passed by value + { dg-do compile } + { dg-options "-O1 -Wall" } */ + +struct S +{ + S (void *p): p (p) { } + S (const S &s): p (s.p) { } + + void *p; +}; + + +void nowarn_assign_by_value (S s) +{ + int i; + S t (&i); + s = t;// { dg-bogus "-Wdangling-pointer" } +} + +void nowarn_assign_by_value_arg (S s) +{ + S t (&s); + s = t;// { dg-bogus "-Wdangling-pointer" } +} + + +void warn_assign_local_by_reference (S &s) +{ + int i; + S t (&i); + s = t;// { dg-warning "-Wdangling-pointer" } +}
Re: [PATCH] Fix PR 101515 (ICE in pp_cxx_unqualified_id, at cp/cxx-pretty-print.c:128)
On 2/8/22 15:11, Qing Zhao wrote: Hi, This is the patch to fix PR101515 (ICE in pp_cxx_unqualified_id, at cp/cxx-pretty-print.c:128) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101515 It's possible that the TYPE_NAME of a record_type is NULL, therefore when printing the TYPE_NAME, we should check and handle this special case. Please see the comment of pr101515 for more details. The fix is very simple, just check and special handle cases when TYPE_NAME is NULL. Bootstrapped and regression tested on both x86 and aarch64, no issues. Okay for commit? Thanks. Qing = From f37ee8d21b80cb77d8108cb97a487c84c530545b Mon Sep 17 00:00:00 2001 From: Qing Zhao Date: Tue, 8 Feb 2022 16:10:37 + Subject: [PATCH] Fix PR 101515 ICE in pp_cxx_unqualified_id, at cp/cxx-pretty-print.c:128. It's possible that the TYPE_NAME of a record_type is NULL, therefore when printing the TYPE_NAME, we should check and handle this special case. gcc/cp/ChangeLog: * cxx-pretty-print.cc (pp_cxx_unqualified_id): Check and handle the case when TYPE_NAME is NULL. gcc/testsuite/ChangeLog: * g++.dg/pr101515.C: New test. --- gcc/cp/cxx-pretty-print.cc | 5 - gcc/testsuite/g++.dg/pr101515.C | 25 + 2 files changed, 29 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/pr101515.C diff --git a/gcc/cp/cxx-pretty-print.cc b/gcc/cp/cxx-pretty-print.cc index 4f9a090e520d..744ed0add5ba 100644 --- a/gcc/cp/cxx-pretty-print.cc +++ b/gcc/cp/cxx-pretty-print.cc @@ -171,7 +171,10 @@ pp_cxx_unqualified_id (cxx_pretty_printer *pp, tree t) case ENUMERAL_TYPE: case TYPENAME_TYPE: case UNBOUND_CLASS_TEMPLATE: - pp_cxx_unqualified_id (pp, TYPE_NAME (t)); + if (TYPE_NAME (t)) + pp_cxx_unqualified_id (pp, TYPE_NAME (t)); + else + pp_string (pp, ""); Hmm, but it's not an unnamed class, it's a pointer to member function type, and it would be better to avoid dumping compiler internal representations like the __pfn field name. I think the real problem comes sooner, when c_fold_indirect_ref_for_warn turns a MEM_REF with RECORD_TYPE into a COMPONENT_REF with POINTER_TYPE. if (tree ti = TYPE_TEMPLATE_INFO_MAYBE_ALIAS (t)) if (PRIMARY_TEMPLATE_P (TI_TEMPLATE (ti))) { diff --git a/gcc/testsuite/g++.dg/pr101515.C b/gcc/testsuite/g++.dg/pr101515.C new file mode 100644 index ..898c7e003c22 --- /dev/null +++ b/gcc/testsuite/g++.dg/pr101515.C @@ -0,0 +1,25 @@ +/* PR101515 - ICE in pp_cxx_unqualified_id, at cp/cxx-pretty-print.c:128 + { dg-do compile } + { dg-options "-Wuninitialized -O1" } */ + +struct S +{ + int j; +}; +struct T : public S +{ + virtual void h () {} +}; +struct ptrmemfunc +{ + void (*ptr) (); +}; +typedef void (S::*sp)(); +int main () +{ + T t; + sp x; + ptrmemfunc *xp = (ptrmemfunc *) &x; + if (xp->ptr != ((void (*)())(sizeof(void * /* { dg-warning "is used uninitialized" } */ +return 1; +}
Re: [PATCH] handle "invisible" reference in -Wdangling-pointer (PR104436)
On 2/8/22 16:59, Martin Sebor wrote: Transforming a by-value arguments to by-reference as GCC does for some class types can trigger -Wdangling-pointer when the argument is used to store the address of a local variable. Since the stored value is not accessible in the caller the warning is a false positive. The attached patch handles this case by excluding PARM_DECLs with the DECL_BY_REFERENCE bit set from consideration. While testing the patch I noticed some instances of the warning are uninitentionally duplicated as the pass runs more than once. To avoid that, I also introduce warning suppression into the handler for this instance of the warning. (There might still be others.) The second test should verify that we do warn about returning 't' from a function; we don't want to ignore the DECL_BY_REFERENCE RESULT_DECL. + tree var = SSA_NAME_VAR (lhs_ref.ref); + if (DECL_BY_REFERENCE (var)) + /* Avoid by-value arguments transformed into by-reference. */ + continue; I wonder if we can we express this property of invisiref parms somewhere more general? I imagine optimizations would find it useful as well. Could pointer_query somehow treat the reference as pointing to a function-local object? I previously tried to express this by marking the reference as 'restrict', but that was wrong (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97474). Jason
[PATCH] PR tree-optimization/104420: Fix checks for constant folding X*0.0
This patch resolves PR tree-optimization/104420, which is a P1 regression where, as observed by Jakub Jelinek, the conditions for constant folding x*0.0 are incorrect (following my patch for PR tree-optimization/96392). The multiplication x*0.0 may yield a negative zero result, -0.0, if X is negative (not just if x may be negative zero). Hence (without -ffast-math) (int)x*0.0 can't be optimized to 0.0, but (unsigned)x*0.0 can be constant folded. This adds a bunch of test cases to confirm the desired behaviour, and removes an incorrect test from gcc.dg/pr96392.c which checked for the wrong behaviour. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check no new failures. Ok for mainline? 2022-02-08 Roger Sayle gcc/ChangeLog PR tree-optimization/104420 * match.pd (mult @0 real_zerop): Tweak conditions for constant folding X*0.0 (or X*-0.0) to HONOR_SIGNED_ZEROS when appropriate. gcc/testsuite/ChangeLog PR tree-optimization/104420 * gcc.dg/pr104420-1.c: New test case. * gcc.dg/pr104420-2.c: New test case. * gcc.dg/pr104420-3.c: New test case. * gcc.dg/pr104420-4.c: New test case. * gcc.dg/pr96392.c: Remove incorrect test. Thanks in advance (and sorry for the breakage/thinko). Roger -- diff --git a/gcc/match.pd b/gcc/match.pd index 7bbb801..4fe5909 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -262,8 +262,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (mult @0 real_zerop@1) (if (!tree_expr_maybe_nan_p (@0) && (!HONOR_NANS (type) || !tree_expr_maybe_infinite_p (@0)) - && !tree_expr_maybe_real_minus_zero_p (@0) - && !tree_expr_maybe_real_minus_zero_p (@1)) + && (!HONOR_SIGNED_ZEROS (type) || tree_expr_nonnegative_p (@0))) @1)) /* In IEEE floating point, x*1 is not equivalent to x for snans. diff --git a/gcc/testsuite/gcc.dg/pr104420-1.c b/gcc/testsuite/gcc.dg/pr104420-1.c new file mode 100644 index 000..48385fa --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr104420-1.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ +/* { dg-add-options ieee } */ + +double f(int a) +{ + return a * 0.0; +} + +/* { dg-final { scan-tree-dump " \\\* 0.0" "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/pr104420-2.c b/gcc/testsuite/gcc.dg/pr104420-2.c new file mode 100644 index 000..49d0189 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr104420-2.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ +/* { dg-add-options ieee } */ + +double f(int a) +{ + return a * -0.0; +} + +/* { dg-final { scan-tree-dump " \\\* -0.0" "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/pr104420-3.c b/gcc/testsuite/gcc.dg/pr104420-3.c new file mode 100644 index 000..962dfff --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr104420-3.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ +/* { dg-add-options ieee } */ + +double f(unsigned int a) +{ + return a * 0.0; +} + +/* { dg-final { scan-tree-dump "return 0.0" "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/pr104420-4.c b/gcc/testsuite/gcc.dg/pr104420-4.c new file mode 100644 index 000..95ed0cc --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr104420-4.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ +/* { dg-add-options ieee } */ + +double f(unsigned int a) +{ + return a * -0.0; +} + +/* { dg-final { scan-tree-dump "return -0.0" "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/pr96392.c b/gcc/testsuite/gcc.dg/pr96392.c index 662bacb..fb7de21 100644 --- a/gcc/testsuite/gcc.dg/pr96392.c +++ b/gcc/testsuite/gcc.dg/pr96392.c @@ -12,11 +12,6 @@ double sub0(int x) return x - 0.0; } -double mult0(int x) -{ - return 0.0 * x; -} - double negate(int x) { return 0.0 - x; @@ -29,5 +24,4 @@ double subtract(int x) /* { dg-final { scan-tree-dump-not " \\+ " "optimized" } } */ /* { dg-final { scan-tree-dump-not " \\- " "optimized" } } */ -/* { dg-final { scan-tree-dump-not " \\* " "optimized" } } */
[PATCH] libstdc++: Fix deadlock in atomic wait [PR104442]
This issue was observed as a deadloack in 29_atomics/atomic/wait_notify/100334.cc on vxworks. When a wait is "laundered" (e.g. type T* does not suffice as a waitable address for the platform's native waiting primitive), the address waited is that of the _M_ver member of __waiter_pool_base, so several threads may wait on the same address for unrelated atomic's. As noted in the PR, the implementation correctly exits the wait for the thread who's data changed, but not for any other threads waiting on the same address. As noted in the PR the __waiter::_M_do_wait_v member was correctly exiting but the other waiters were not reloaded the value of _M_ver before re-entering the wait. Moving the spin call inside the loop accomplishes this, and is consistent with the predicate accepting version of __waiter::_M_do_wait. From ee66736beca3dce4bc09350c5407a2ac7219fbec Mon Sep 17 00:00:00 2001 From: Thomas Rodgers Date: Tue, 8 Feb 2022 16:33:36 -0800 Subject: [PATCH] libstdc++: Fix deadlock in atomic wait [PR104442] This issue was observed as a deadloack in 29_atomics/atomic/wait_notify/100334.cc on vxworks. When a wait is "laundered" (e.g. type T* does not suffice as a waitable address for the platform's native waiting primitive), the address waited is that of the _M_ver member of __waiter_pool_base, so several threads may wait on the same address for unrelated atomic's. As noted in the PR, the implementation correctly exits the wait for the thread who's data changed, but not for any other threads waiting on the same address. As noted in the PR the __waiter::_M_do_wait_v member was correctly exiting but the other waiters were not reloaded the value of _M_ver before re-entering the wait. Moving the spin call inside the loop accomplishes this, and is consistent with the predicate accepting version of __waiter::_M_do_wait. libstdc++-v3/ChangeLog: PR libstdc++/104442 * include/bits/atomic_wait.h (__waiter::_M_do_wait_v): Move spin loop inside do loop so that threads failing the wait, reload _M_ver. --- libstdc++-v3/include/bits/atomic_wait.h | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h index d7de0d7eb9e..33ce26ade1b 100644 --- a/libstdc++-v3/include/bits/atomic_wait.h +++ b/libstdc++-v3/include/bits/atomic_wait.h @@ -332,7 +332,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } else { - __atomic_load(__addr, &__val, __ATOMIC_RELAXED); + __atomic_load(__addr, &__val, __ATOMIC_SEQ_CST); } return __atomic_spin(__pred, __spin); } @@ -388,12 +388,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION void _M_do_wait_v(_Tp __old, _ValFn __vfn) { - __platform_wait_t __val; - if (__base_type::_M_do_spin_v(__old, __vfn, __val)) - return; - do { + __platform_wait_t __val; + if (__base_type::_M_do_spin_v(__old, __vfn, __val)) + return; __base_type::_M_w._M_do_wait(__base_type::_M_addr, __val); } while (__detail::__atomic_compare(__old, __vfn())); -- 2.34.1
[pushed] c++: cleanup constant-init'd members [PR96876]
This is a case missed by my recent fixes to aggregate initialization and exception cleanup for PR94041 et al: we also need to clean up members with constant initialization if initialization of a later member throws. It also occurs to me that we needn't bother building the cleanups if -fno-exceptions; build_vec_init already doesn't. Tested x86_64-pc-linux-gnu, applying to trunk. PR c++/96876 gcc/cp/ChangeLog: * typeck2.cc (split_nonconstant_init_1): Push cleanups for preceding members with constant initialization. (maybe_push_temp_cleanup): Do nothing if -fno-exceptions. gcc/testsuite/ChangeLog: * g++.dg/cpp1z/aggr-base11.C: New test. * g++.dg/eh/aggregate2.C: New test. --- gcc/cp/typeck2.cc| 26 +++ gcc/testsuite/g++.dg/cpp1z/aggr-base11.C | 19 + gcc/testsuite/g++.dg/eh/aggregate2.C | 27 3 files changed, 72 insertions(+) create mode 100644 gcc/testsuite/g++.dg/cpp1z/aggr-base11.C create mode 100644 gcc/testsuite/g++.dg/eh/aggregate2.C diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc index 4015bd53257..39d03e4c3c4 100644 --- a/gcc/cp/typeck2.cc +++ b/gcc/cp/typeck2.cc @@ -467,6 +467,8 @@ cxx_incomplete_type_error (location_t loc, const_tree value, const_tree type) static void maybe_push_temp_cleanup (tree sub, vec **flags) { + if (!flag_exceptions) +return; if (tree cleanup = cxx_maybe_build_cleanup (sub, tf_warning_or_error)) { @@ -496,6 +498,7 @@ split_nonconstant_init_1 (tree dest, tree init, bool last, bool array_type_p = false; bool complete_p = true; HOST_WIDE_INT num_split_elts = 0; + tree last_split_elt = NULL_TREE; switch (TREE_CODE (type)) { @@ -572,6 +575,7 @@ split_nonconstant_init_1 (tree dest, tree init, bool last, else { /* Mark element for removal. */ + last_split_elt = field_index; CONSTRUCTOR_ELT (init, idx)->index = NULL_TREE; if (idx < tidx) tidx = idx; @@ -584,6 +588,7 @@ split_nonconstant_init_1 (tree dest, tree init, bool last, flags)); /* Mark element for removal. */ + last_split_elt = field_index; CONSTRUCTOR_ELT (init, idx)->index = NULL_TREE; if (idx < tidx) tidx = idx; @@ -593,6 +598,26 @@ split_nonconstant_init_1 (tree dest, tree init, bool last, { tree code; + /* Push cleanups for any preceding members with constant +initialization. */ + if (CLASS_TYPE_P (type)) + for (tree prev = (last_split_elt ? + DECL_CHAIN (last_split_elt) + : TYPE_FIELDS (type)); +; prev = DECL_CHAIN (prev)) + { + prev = next_initializable_field (prev); + if (prev == field_index) + break; + tree ptype = TREE_TYPE (prev); + if (type_build_dtor_call (ptype)) + { + tree pcref = build3 (COMPONENT_REF, ptype, dest, prev, +NULL_TREE); + maybe_push_temp_cleanup (pcref, flags); + } + } + /* Mark element for removal. */ CONSTRUCTOR_ELT (init, idx)->index = NULL_TREE; if (idx < tidx) @@ -645,6 +670,7 @@ split_nonconstant_init_1 (tree dest, tree init, bool last, maybe_push_temp_cleanup (sub, flags); } + last_split_elt = field_index; num_split_elts++; } } diff --git a/gcc/testsuite/g++.dg/cpp1z/aggr-base11.C b/gcc/testsuite/g++.dg/cpp1z/aggr-base11.C new file mode 100644 index 000..88625dc9533 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1z/aggr-base11.C @@ -0,0 +1,19 @@ +// PR c++/96876 +// { dg-do compile { target c++17 } } + +struct B { +protected: +~B() {}// { dg-message "" } +}; + +struct A { }; +struct C1: B { int n; }; +struct C2: A, B { int n; }; + +A af (); +int f(); + +void g() { + C1 c1{ {}, f()}; // { dg-error "protected" } + C2 c2{ af(), {}, f()}; // { dg-error "protected" } +} diff --git a/gcc/testsuite/g++.dg/eh/aggregate2.C b/gcc/testsuite/g++.dg/eh/aggregate2.C new file mode 100644 index 000..8424d63de2d --- /dev/null +++ b/gcc/testsuite/g++.dg/eh/aggregate2.C @@ -0,0 +1,27 @@ +// PR c++/96876 +// { dg-do run { target c++11 } } + +int d; +struct B { + ~B() { ++d; } +}; + +struct C1 { B b; int n; }; +struct C2 { int i; B b; int n; }; + +int f() { throw 24; return 42; } +int dummy; +int g() { ++dummy; return 42; } + +int main() { + try { +C1 c{{}, f()}; + } catch (...) { }
Re: [PATCH v3] rs6000: Fix some issues in rs6000_can_inline_p [PR102059]
Hi Bill, on 2022/2/9 上午4:29, Bill Schmidt wrote: > Hi! > > From some discussion today, I think we want to limit the scope of > this patch to just the power8-fusion flag that's causing trouble for > now, given stage 4. We've talked about making power8-fusion a do- > nothing flag, since it doesn't add much benefit now and probably > shouldn't be a separate flag anyway. Having it as a meaningless > flag makes it more palatable to add an exception for it in the > inlining path. > > Others, feel free to weigh in. > Many thanks for caring about this. It's a good idea to handle this power8-fusion specially for now. BR, Kewen > Thanks, > Bill > > On 1/5/22 1:34 AM, Kewen.Lin wrote: >> Hi, >> >> This patch is to fix the inconsistent behaviors for non-LTO mode >> and LTO mode. As Martin pointed out, currently the function >> rs6000_can_inline_p simply makes it inlinable if callee_tree is >> NULL, but it's unexpected, we should use the command line options >> from target_option_default_node as default. >> >> It replaces rs6000_isa_flags with target_option_default_node when >> caller_tree is NULL since it's more straightforward and doesn't >> suffer from some bug not to keep rs6000_isa_flags as default. >> >> It also extends the scope of the check for the case that callee >> has explicit set options, inlining in test case pr102059-5.c can >> happen unexpectedly before, it's fixed accordingly. >> >> As Richi/Mike pointed out, some tuning flags like MASK_P8_FUSION >> can be neglected for always inlining, this patch also takes some >> flags when the callee is attributed by always_inline. >> >> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578552.html >> v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586112.html >> >> This patch is one re-post of this updated version[1] and also >> rebased and adjusted on top of the related commit r12-6219. >> >> Bootstrapped and regtested on powerpc64-linux-gnu P8 and >> powerpc64le-linux-gnu P9 and P10. >> >> Is it ok for trunk? >> >> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586296.html >> >> BR, >> Kewen >> - >> gcc/ChangeLog: >> >> PR target/102059 >> * config/rs6000/rs6000.c (rs6000_can_inline_p): Adjust with >> target_option_default_node and consider always_inline_safe flags. >> >> gcc/testsuite/ChangeLog: >> >> PR target/102059 >> * gcc.target/powerpc/pr102059-4.c: New test. >> * gcc.target/powerpc/pr102059-5.c: New test. >> * gcc.target/powerpc/pr102059-6.c: New test. >> * gcc.target/powerpc/pr102059-7.c: New test. >> * gcc.target/powerpc/pr102059-8.c: New test. >> * gcc.dg/lto/pr102059-1_0.c: Remove unneeded option. >> >>
Re: [Patch, fortran] PR37336 (Finalization) - [F03] Finish derived-type finalization
Remember the days when reading very old cryptic Fortran code? Remember the fixed line lengths and cryptic variable names! I fear the Standards committee has achieved history with the Standard itself it is so difficult to understand sometimes. Cheers to Paul and Harald for digging on this. Jerry On 2/8/22 11:29 AM, Harald Anlauf via Fortran wrote: Hi Paul, Am 08.02.22 um 12:22 schrieb Paul Richard Thomas via Fortran: Hi Harald, Thanks for giving the patch a whirl. the parent components as an array. I strongly suspect that, from reading 7.5.6.2 paragraphs 2 and 3 closely, that ifort has it right. However, this is another issue to come back to in the future. Could you specify which version of Intel you tried? ifort (IFORT) 2021.1 Beta 20201112 ok, that's good to know. Testcase finalize_38.f90 fails for me with ifort 2021.5.0 with: 131 This test also fails with crayftn 11 & 12 and nagfor 7.0, but in a different place. I have run your modified version of finalize_38.f90, and now I see that you can get a bloody head just from scratching too much... crayftn 12.0.2: 1, 3, 1 2, 21, 0 11, 3, 2 12, 21, 1 21, 4, 3 23, 21, 22 | 42, 43 31, 6, 4 41, 7, 5 51, 9, 7 61, 10, 8 71, 13, 10 101, 2, 1 102, 4, 3 111, 3, 2 121, 4, 2 122, 0, 4 123, 5, 6 | 2*0 131, 5, 2 132, 0, 4 133, 7, 8 | 2*0 141, 6, 3 151, 10, 5 161, 16, 9 171, 18, 11 175, 0., 20. | 10., 20. nagfor 7.0: 1 0 1 11 1 2 23 21 22 | 42 43 71 9 10 72 11 99 131 3 2 132 5 4 141 4 3 151 6 5 161 10 9 171 12 11 Intel 2021.5.0: 131 3 2 132 0 4 133 5 6 | 0 0 141 4 3 151 7 5 152 3 0 153 0 0 | 1 3 forrtl: severe (174): SIGSEGV, segmentation fault occurred [...] That got me reading 7.5.6.3, where is says in paragraph 1: "When an intrinsic assignment statement is executed (10.2.1.3), if the variable is not an unallocated allocatable variable, it is finalized after evaluation of expr and before the definition of the variable. ..." Looking at the beginning of the testcase code (abridged): type(simple), allocatable :: MyType, MyType2 type(simple) :: ThyType = simple(21), ThyType2 = simple(22) ! The original PR - one finalization of 'var' before (re)allocation. MyType = ThyType call test(1, 0, [0,0], 0) This is an intrinsic assignment. Naively I would expect MyType to be initially unallocated. ThyType is not allocatable and non-pointer and cannot become undefined here and would not play any role in finalization. I am probably too blind-sighted to see why there should be a finalization here. What am I missing? Could you use the attached version of finalize_38.f90 with crayftn and NAG? All the stop statements are replaced with prints. Ifort gives: 131 3 2 132 0 4 133 5 6 | 0 0 141 4 3 151 7 5 152 3 0 153 0 0 | 1 3 161 13 9 162 20 0 163 0 0 | 10 20 171 14 11 I think it is a good idea to have these prints in the testcase whenever there is a departure from expectations. So print&stop? Furthermore, for the sake of health of people reading the testcases later, I think it would not harm to add more explanations why we expect a certain behavior... ;-) Best regards Paul Best regards, Harald
Re: [PATCH] analyzer: Fix tests for glibc 2.35 [PR101081]
On Fri, 2022-02-04 at 11:35 -0500, Joel Teichroeb via Gcc-patches wrote: > In recent versions of glibc fopen has __attribute__((malloc)). > Since we can not detect wether this attribute is present or not, > we avoid including stdio.h and instead forward declare what we > need in each test. > > Signed-off-by: Joel Teichroeb Thanks! I wrote a ChangeLog entry and pushed this to trunk (as commit r12-7119-ge52a683170877d140eebc9782731eaf11897db71). Dave
Re: [PATCH v4] x86: Add -m[no-]direct-extern-access
On Fri, Jan 28, 2022 at 5:53 AM H.J. Lu via Gcc-patches wrote: > > The v3 patch was posted at > > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574847.html > > There is no progress with repeated pings since then. Glibc 2.35 and > binutils 2.38 will support GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS. > I'd like to remove copy relocation to improve security and performance > on x86 in GCC 12. Here is the v4 patch This patch won't change default behavior for copy relocation(to avoid conflict with existing .so files), just give users an option under which copy relocation can be removed. The removal of copy relocation is an optimization of the linker(Also improves security), whose patches have been approved and committed to glibc2.35[1], binutils2.38[2]. The compiler part is the final step in completing this optimization. [1] https://sourceware.org/pipermail/libc-alpha/2021-October/131742.html [2] https://sourceware.org/pipermail/binutils/2021-July/117308.html > > 1. Rename the common option to x86 specific -mdirect-extern-access option. > 2. Remove GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS linker check. > 3. Use the existing GNU property function in x86 backend. > > This new behavior is off by default. Are there any objections? > > Changes in the v3 patch. > > 1. GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS support has been added to > GNU binutils 2.38. But the -z indirect-extern-access linker option is > only available for Linux/x86. However, the --max-cache-size=SIZE linker > option was also addded within a day. --max-cache-size=SIZE is used to > check for GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS support. > > Changes in the v2 patch. > > 1. Rename the option to -fdirect-extern-access. > > --- > On systems with copy relocation: > * A copy in executable is created for the definition in a shared library > at run-time by ld.so. > * The copy is referenced by executable and shared libraries. > * Executable can access the copy directly. > > Issues are: > * Overhead of a copy, time and space, may be visible at run-time. > * Read-only data in the shared library becomes read-write copy in > executable at run-time. > * Local access to data with the STV_PROTECTED visibility in the shared > library must use GOT. > > On systems without function descriptor, function pointers vary depending > on where and how the functions are defined. > * If the function is defined in executable, it can be the address of > function body. > * If the function, including the function with STV_PROTECTED visibility, > is defined in the shared library, it can be the address of the PLT entry > in executable or shared library. > > Issues are: > * The address of function body may not be used as its function pointer. > * ld.so needs to search loaded shared libraries for the function pointer > of the function with STV_PROTECTED visibility. > > Here is a proposal to remove copy relocation and use canonical function > pointer: > > 1. Accesses, including in PIE and non-PIE, to undefined symbols must > use GOT. > a. Linker may optimize out GOT access if the data is defined in PIE or > non-PIE. > 2. Read-only data in the shared library remain read-only at run-time > 3. Address of global data with the STV_PROTECTED visibility in the shared > library is the address of data body. > a. Can use IP-relative access. > b. May need GOT without IP-relative access. > 4. For systems without function descriptor, > a. All global function pointers of undefined functions in PIE and > non-PIE must use GOT. Linker may optimize out GOT access if the > function is defined in PIE or non-PIE. > b. Function pointer of functions with the STV_PROTECTED visibility in > executable and shared library is the address of function body. >i. Can use IP-relative access. >ii. May need GOT without IP-relative access. >iii. Branches to undefined functions may use PLT. > 5. Single global definition marker: > > Add GNU_PROPERTY_1_NEEDED: > > #define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO > > to indicate the needed properties by the object file. > > Add GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS: > > #define GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (1U << 0) > > to indicate that the object file requires canonical function pointers and > cannot be used with copy relocation. This bit should be cleared in > executable when there are non-GOT or non-PLT relocations in relocatable > input files without this bit set. > > a. Protected symbol access within the shared library can be treated as > local. > b. Copy relocation should be disallowed at link-time and run-time. > c. GOT function pointer reference is required at link-time and run-time. > > The indirect external access marker can be used in the following ways: > > 1. Linker can decide the best way to resolve a relocation against a > protected symbol before seeing all relocations against the symbol. > 2. Dynamic linker can decide if it is an error to have a copy rel
[committed] analyzer: fix hashing of bit_range_region::key_t [PR104452]
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu; manually verified using valgrind. Pushed to trunk as r12-7118-g391512ade5f6cda95456133296c8dcc42d5fbefd. gcc/analyzer/ChangeLog: PR analyzer/104452 * region-model.cc (selftest::test_bit_range_regions): New. (selftest::analyzer_region_model_cc_tests): Call it. * region.h (bit_range_region::key_t::hash): Fix hashing of m_bits to avoid using uninitialized data. gcc/testsuite/ChangeLog: PR analyzer/104452 * gcc.dg/analyzer/pr104452.c: New test. Signed-off-by: David Malcolm --- gcc/analyzer/region-model.cc | 18 ++ gcc/analyzer/region.h| 3 ++- gcc/testsuite/gcc.dg/analyzer/pr104452.c | 10 ++ 3 files changed, 30 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr104452.c diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc index 6e7a21d0f9c..f8f19769258 100644 --- a/gcc/analyzer/region-model.cc +++ b/gcc/analyzer/region-model.cc @@ -4710,6 +4710,23 @@ test_descendent_of_p () ASSERT_TRUE (cast_reg->descendent_of_p (x_reg)); } +/* Verify that bit_range_region works as expected. */ + +static void +test_bit_range_regions () +{ + tree x = build_global_decl ("x", integer_type_node); + region_model_manager mgr; + const region *x_reg = mgr.get_region_for_global (x); + const region *byte0 += mgr.get_bit_range (x_reg, char_type_node, bit_range (0, 8)); + const region *byte1 += mgr.get_bit_range (x_reg, char_type_node, bit_range (8, 8)); + ASSERT_TRUE (byte0->descendent_of_p (x_reg)); + ASSERT_TRUE (byte1->descendent_of_p (x_reg)); + ASSERT_NE (byte0, byte1); +} + /* Verify that simple assignments work as expected. */ static void @@ -6009,6 +6026,7 @@ analyzer_region_model_cc_tests () test_binop_svalue_folding (); test_sub_svalue_folding (); test_descendent_of_p (); + test_bit_range_regions (); test_assignment (); test_compound_assignment (); test_stack_frames (); diff --git a/gcc/analyzer/region.h b/gcc/analyzer/region.h index 206b157e908..53112175266 100644 --- a/gcc/analyzer/region.h +++ b/gcc/analyzer/region.h @@ -1156,7 +1156,8 @@ public: inchash::hash hstate; hstate.add_ptr (m_parent); hstate.add_ptr (m_type); - hstate.add (&m_bits, sizeof (m_bits)); + hstate.add_wide_int (m_bits.m_start_bit_offset); + hstate.add_wide_int (m_bits.m_size_in_bits); return hstate.end (); } diff --git a/gcc/testsuite/gcc.dg/analyzer/pr104452.c b/gcc/testsuite/gcc.dg/analyzer/pr104452.c new file mode 100644 index 000..85eb82defbf --- /dev/null +++ b/gcc/testsuite/gcc.dg/analyzer/pr104452.c @@ -0,0 +1,10 @@ +/* { dg-additional-options "-O" } */ + +void +test_1 (void) +{ + int __attribute__((__vector_size__ (16))) x; + for (unsigned i = 0; i < 4;) +if (x[i]) /* { dg-warning "use of uninitialized value" } */ + __builtin_abort (); +} -- 2.26.3
[PATCH, rs6000] Remove TImode from mode iterator BOOL_128 [PR100694]
Hi, This patch removes TImode from mode iterator BOOL_128. Thus, bool operations (AND, IOR, XOR, NOT) on TImode will be split to the relevant operations on word mode during expand (in optabs.c). Potential optimizations can be implemented after the split. The former practice splits it after the reload pass which is too later for some optimizations. The new test case illustrates it. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is this okay for trunk? Any recommendations? Thanks a lot. ChangeLog 2022-02-08 Haochen Gui gcc/ PR target/100694 * config/rs6000/rs6000.md (BOOL_128): Remove TI. gcc/testsuite/ PR target/100694 * gcc.target/powerpc/pr100694.c: New. * gcc.target/powerpc/pr92398.c: New. * gcc.target/powerpc/pr92398.h: Remove. * gcc.target/powerpc/pr92398.p9-.c: Remove. * gcc.target/powerpc/pr92398.p9+.c: Remove. patch.diff diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 6f74075f58d..2bc1b8f497a 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -750,8 +750,7 @@ (define_mode_attr SI_CONVERT_FP [(SF "TARGET_FCFIDS") (DF "TARGET_FCFID")]) ;; Mode iterator for logical operations on 128-bit types -(define_mode_iterator BOOL_128 [TI -PTI +(define_mode_iterator BOOL_128 [PTI (V16QI "TARGET_ALTIVEC") (V8HI "TARGET_ALTIVEC") (V4SI "TARGET_ALTIVEC") diff --git a/gcc/testsuite/gcc.target/powerpc/pr100694.c b/gcc/testsuite/gcc.target/powerpc/pr100694.c new file mode 100644 index 000..7b41d920140 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr100694.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target int128 } */ +/* { dg-options "-O2" } */ +/* { dg-final { scan-assembler-times {\mstd\M} 2 } } */ +/* { dg-final { scan-assembler-not {\mli\M} } } */ +/* { dg-final { scan-assembler-not {\mor\M} } } */ + +/* It just needs two std. */ +void foo (unsigned __int128* res, unsigned long long hi, unsigned long long lo) +{ + unsigned __int128 i = hi; + i <<= 64; + i |= lo; + *res = i; +} diff --git a/gcc/testsuite/gcc.target/powerpc/pr92398.c b/gcc/testsuite/gcc.target/powerpc/pr92398.c new file mode 100644 index 000..7d6201cc5bb --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr92398.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target int128 } */ +/* { dg-options "-O2" } */ +/* { dg-final { scan-assembler-times {\mnot\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mstd\M} 2 } } */ + +/* All platforms should generate the same instructions: not;not;std;std. */ +void bar (__int128_t *dst, __int128_t src) +{ + *dst = ~src; +} + diff --git a/gcc/testsuite/gcc.target/powerpc/pr92398.h b/gcc/testsuite/gcc.target/powerpc/pr92398.h deleted file mode 100644 index 5a4a8bcab80..000 --- a/gcc/testsuite/gcc.target/powerpc/pr92398.h +++ /dev/null @@ -1,17 +0,0 @@ -/* This test code is included into pr92398.p9-.c and pr92398.p9+.c. - The two files have the tests for the number of instructions generated for - P9- versus P9+. - - store generates difference instructions as below: - P9+: mtvsrdd;xxlnot;stxv. - P8/P7/P6 LE: not;not;std;std. - P8 BE: mtvsrd;mtvsrd;xxpermdi;xxlnor;stxvd2x. - P7/P6 BE: std;std;addi;lxvd2x;xxlnor;stxvd2x. - P9+ and P9- LE are expected, P6/P7/P8 BE are unexpected. */ - -void -bar (__int128_t *dst, __int128_t src) -{ - *dst = ~src; -} - diff --git a/gcc/testsuite/gcc.target/powerpc/pr92398.p9+.c b/gcc/testsuite/gcc.target/powerpc/pr92398.p9+.c deleted file mode 100644 index 72dd1d9a274..000 --- a/gcc/testsuite/gcc.target/powerpc/pr92398.p9+.c +++ /dev/null @@ -1,12 +0,0 @@ -/* { dg-do compile { target { lp64 && has_arch_pwr9 } } } */ -/* { dg-require-effective-target powerpc_vsx_ok } */ -/* { dg-options "-O2 -mvsx" } */ - -/* { dg-final { scan-assembler-times {\mmtvsrdd\M} 1 } } */ -/* { dg-final { scan-assembler-times {\mxxlnor\M} 1 } } */ -/* { dg-final { scan-assembler-times {\mstxv\M} 1 } } */ -/* { dg-final { scan-assembler-not {\mld\M} } } */ -/* { dg-final { scan-assembler-not {\mnot\M} } } */ - -/* Source code for the test in pr92398.h */ -#include "pr92398.h" diff --git a/gcc/testsuite/gcc.target/powerpc/pr92398.p9-.c b/gcc/testsuite/gcc.target/powerpc/pr92398.p9-.c deleted file mode 100644 index bd7fa98af51..000 --- a/gcc/testsuite/gcc.target/powerpc/pr92398.p9-.c +++ /dev/null @@ -1,10 +0,0 @@ -/* { dg-do compile { target { lp64 && {! has_arch_pwr9} } } } */ -/* { dg-require-effective-target powerpc_vsx_ok } */ -/* { dg-options "-O2 -mvsx" } */ - -/* { dg-final { scan-assembler-times {\mnot\M} 2 { xfail be } } } */ -/* { dg-final { scan-assembler-times {\mstd\M} 2 { xfail { { {! has_arch_pwr9}
[PATCH] x86: Check each component of source operand for AVX_U128_DIRTY
commit 9775e465c1fbfc32656de77c618c61acf5bd905d Author: H.J. Lu Date: Tue Jul 27 07:46:04 2021 -0700 x86: Don't set AVX_U128_DIRTY when zeroing YMM/ZMM register called ix86_check_avx_upper_register to check mode on source operand. But ix86_check_avx_upper_register doesn't work on source operand like (vec_select:V2DI (reg/v:V4DI 23 xmm3 [orig:91 ymm ] [91]) (parallel [ (const_int 2 [0x2]) (const_int 3 [0x3]) ])) Add ix86_avx_u128_mode_source to check mode for each component of source operand. gcc/ PR target/104441 * config/i386/i386.cc (ix86_avx_u128_mode_source): New function. (ix86_avx_u128_mode_needed): Return AVX_U128_ANY for debug INSN. Call ix86_avx_u128_mode_source to check mode for each component of source operand. gcc/testsuite/ PR target/104441 * gcc.target/i386/pr104441-1a.c: New test. * gcc.target/i386/pr104441-1b.c: Likewise. --- gcc/config/i386/i386.cc | 145 +++- gcc/testsuite/gcc.target/i386/pr104441-1a.c | 57 gcc/testsuite/gcc.target/i386/pr104441-1b.c | 32 + 3 files changed, 168 insertions(+), 66 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr104441-1a.c create mode 100644 gcc/testsuite/gcc.target/i386/pr104441-1b.c diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index dd5584fb8ed..2d87acca7ff 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -14365,11 +14365,82 @@ ix86_check_avx_upper_stores (rtx dest, const_rtx, void *data) } } +/* For YMM/ZMM store or YMM/ZMM extract. Return mode for the source + operand of SRC DEFs in the same basic block before INSN. */ + +static int +ix86_avx_u128_mode_source (rtx_insn *insn, const_rtx src) +{ + basic_block bb = BLOCK_FOR_INSN (insn); + rtx_insn *end = BB_END (bb); + + /* Return AVX_U128_DIRTY if there is no DEF in the same basic + block. */ + int status = AVX_U128_DIRTY; + + for (df_ref def = DF_REG_DEF_CHAIN (REGNO (src)); + def; def = DF_REF_NEXT_REG (def)) +if (DF_REF_BB (def) == bb) + { + /* Ignore DEF from different basic blocks. */ + rtx_insn *def_insn = DF_REF_INSN (def); + + /* Check if DEF_INSN is before INSN. */ + rtx_insn *next; + for (next = NEXT_INSN (def_insn); +next != nullptr && next != end && next != insn; +next = NEXT_INSN (next)) + ; + + /* Skip if DEF_INSN isn't before INSN. */ + if (next != insn) + continue; + + /* Return AVX_U128_DIRTY if the source operand of DEF_INSN + isn't constant zero. */ + + if (CALL_P (def_insn)) + { + bool avx_upper_reg_found = false; + note_stores (def_insn, +ix86_check_avx_upper_stores, +&avx_upper_reg_found); + + /* Return AVX_U128_DIRTY if call returns AVX. */ + if (avx_upper_reg_found) + return AVX_U128_DIRTY; + + continue; + } + + rtx set = single_set (def_insn); + if (!set) + return AVX_U128_DIRTY; + + rtx dest = SET_DEST (set); + + /* Skip if DEF_INSN is not an AVX load. Return AVX_U128_DIRTY + if the source operand isn't constant zero. */ + if (ix86_check_avx_upper_register (dest) + && standard_sse_constant_p (SET_SRC (set), + GET_MODE (dest)) != 1) + return AVX_U128_DIRTY; + + /* We get here only if all AVX loads are from constant zero. */ + status = AVX_U128_ANY; + } + + return status; +} + /* Return needed mode for entity in optimize_mode_switching pass. */ static int ix86_avx_u128_mode_needed (rtx_insn *insn) { + if (DEBUG_INSN_P (insn)) +return AVX_U128_ANY; + if (CALL_P (insn)) { rtx link; @@ -14409,6 +14480,8 @@ ix86_avx_u128_mode_needed (rtx_insn *insn) return AVX_U128_CLEAN; } + subrtx_iterator::array_type array; + rtx set = single_set (insn); if (set) { @@ -14423,74 +14496,15 @@ ix86_avx_u128_mode_needed (rtx_insn *insn) else return AVX_U128_ANY; } - else if (ix86_check_avx_upper_register (src)) + else { - /* This is an YMM/ZMM store. Check for the source operand -of SRC DEFs in the same basic block before INSN. */ - basic_block bb = BLOCK_FOR_INSN (insn); - rtx_insn *end = BB_END (bb); - - /* Return AVX_U128_DIRTY if there is no DEF in the same basic -block. */ - int status = AVX_U128_DIRTY; - - for (df_ref def = DF_REG_DEF_CHAIN (REGNO (src)); - def; def = DF_REF_NEXT_REG (def)) - if (DF_REF_BB (def) == bb) + FOR_EACH_SUBRTX (iter, array, src, NONCONST) + if (ix86_check_avx_upper_register (*iter)) { - /* Ignore DEF from different ba
Re: [PATCH v4] x86: Add -m[no-]direct-extern-access
On Tue, Feb 8, 2022 at 6:38 PM Hongtao Liu wrote: > > On Fri, Jan 28, 2022 at 5:53 AM H.J. Lu via Gcc-patches > wrote: > > > > The v3 patch was posted at > > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574847.html > > > > There is no progress with repeated pings since then. Glibc 2.35 and > > binutils 2.38 will support GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS. > > I'd like to remove copy relocation to improve security and performance > > on x86 in GCC 12. Here is the v4 patch > This patch won't change default behavior for copy relocation(to > avoid conflict with existing .so files), > just give users an option under which copy relocation can be removed. > The removal of copy relocation is an optimization of the linker(Also > improves security), whose > patches have been approved and committed to glibc2.35[1], binutils2.38[2]. > The compiler part is the final step in completing this optimization. Thanks for looking into it. Since the new behavior is off by default and none of the maintainers feel comfortable to review my patch, which is related to linker and glibc, for the last 6 months, now is the good time to check it in as any other time. I will check it in tomorrow. > [1] https://sourceware.org/pipermail/libc-alpha/2021-October/131742.html > [2] https://sourceware.org/pipermail/binutils/2021-July/117308.html > > > > 1. Rename the common option to x86 specific -mdirect-extern-access option. > > 2. Remove GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS linker check. > > 3. Use the existing GNU property function in x86 backend. > > > > This new behavior is off by default. Are there any objections? > > > > Changes in the v3 patch. > > > > 1. GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS support has been added to > > GNU binutils 2.38. But the -z indirect-extern-access linker option is > > only available for Linux/x86. However, the --max-cache-size=SIZE linker > > option was also addded within a day. --max-cache-size=SIZE is used to > > check for GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS support. > > > > Changes in the v2 patch. > > > > 1. Rename the option to -fdirect-extern-access. > > > > --- > > On systems with copy relocation: > > * A copy in executable is created for the definition in a shared library > > at run-time by ld.so. > > * The copy is referenced by executable and shared libraries. > > * Executable can access the copy directly. > > > > Issues are: > > * Overhead of a copy, time and space, may be visible at run-time. > > * Read-only data in the shared library becomes read-write copy in > > executable at run-time. > > * Local access to data with the STV_PROTECTED visibility in the shared > > library must use GOT. > > > > On systems without function descriptor, function pointers vary depending > > on where and how the functions are defined. > > * If the function is defined in executable, it can be the address of > > function body. > > * If the function, including the function with STV_PROTECTED visibility, > > is defined in the shared library, it can be the address of the PLT entry > > in executable or shared library. > > > > Issues are: > > * The address of function body may not be used as its function pointer. > > * ld.so needs to search loaded shared libraries for the function pointer > > of the function with STV_PROTECTED visibility. > > > > Here is a proposal to remove copy relocation and use canonical function > > pointer: > > > > 1. Accesses, including in PIE and non-PIE, to undefined symbols must > > use GOT. > > a. Linker may optimize out GOT access if the data is defined in PIE or > > non-PIE. > > 2. Read-only data in the shared library remain read-only at run-time > > 3. Address of global data with the STV_PROTECTED visibility in the shared > > library is the address of data body. > > a. Can use IP-relative access. > > b. May need GOT without IP-relative access. > > 4. For systems without function descriptor, > > a. All global function pointers of undefined functions in PIE and > > non-PIE must use GOT. Linker may optimize out GOT access if the > > function is defined in PIE or non-PIE. > > b. Function pointer of functions with the STV_PROTECTED visibility in > > executable and shared library is the address of function body. > >i. Can use IP-relative access. > >ii. May need GOT without IP-relative access. > >iii. Branches to undefined functions may use PLT. > > 5. Single global definition marker: > > > > Add GNU_PROPERTY_1_NEEDED: > > > > #define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO > > > > to indicate the needed properties by the object file. > > > > Add GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS: > > > > #define GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (1U << 0) > > > > to indicate that the object file requires canonical function pointers and > > cannot be used with copy relocation. This bit should be cleared in > > executable when there are non-GOT or non-PLT relocations in relocatable > > input file
[PATCH v2] MIPS: IPL is 8bit in Cause register if TARGET_MCU
If MIPS MCU extension is enable, the IPL section in Cause register has been expand to 8bit instead of 6bit. gcc/ChangeLog: * config/mips/mips.cc (mips_expand_prologue): IPL is 8bit for MCU ASE. --- gcc/config/mips/mips.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc index 4f9683e8bf4..d823c459b75 100644 --- a/gcc/config/mips/mips.cc +++ b/gcc/config/mips/mips.cc @@ -12255,7 +12255,7 @@ mips_expand_prologue (void) if (!cfun->machine->keep_interrupts_masked_p && cfun->machine->int_mask == INT_MASK_EIC) emit_insn (gen_insvsi (gen_rtx_REG (SImode, K1_REG_NUM), - GEN_INT (6), + TARGET_MCU ? GEN_INT (8) : GEN_INT (6), GEN_INT (SR_IPL), gen_rtx_REG (SImode, K0_REG_NUM))); -- 2.30.2
[PATCH] PR target/102059 Fix inline of target specific functions
Reset -mpower8-fusion for power9 inlining power8 functions, PR 102059. This patch is an attempt to make a much simpler patch to fix PR target/102059 than the previous patch. It just fixes the issue that if a function is specifically declared as a power8 function, you can't inline in functions that are specified with power9 or power10 options. The issue is -mpower8-fusion is cleared when you use -mcpu=power9 or -mcpu=power10. When I wrote the code for controlling which function can inline other functions, -mpower8-fusion was set for -mcpu=power9 (-mcpu=power10 was not an option at that time). This patch fixes this particular problem. Perhaps -mpower8-fusion should go away in the GCC 13 time frame. This patch just goes in and resets the fusion bit for testing inlines. I have built a bootstrapped little endian compiler on power9 and the tests had no regressions. I have built a bootstrapped big endian compiler on power8 and I tested both 32-bit and 64-bit builds, and there were no regressions. Can I install this into the trunk and back port it into GCC 11 after a burn-in period? 2022-02-08 Michael Meissner gcc/ PR target/102059 * config/rs6000/rs6000.cc (rs6000_can_inline_p): Don't test for power8-fusion if the caller was power9 or power10. gcc/testsuite/ PR target/102059 * gcc.target/powerpc/pr102059-4.c: New test. --- gcc/config/rs6000/rs6000.cc | 8 gcc/testsuite/gcc.target/powerpc/pr102059-4.c | 20 +++ 2 files changed, 28 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-4.c diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index eaba9a2d698..e2d94421ae9 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -25278,6 +25278,14 @@ rs6000_can_inline_p (tree caller, tree callee) callee_isa &= ~OPTION_MASK_HTM; explicit_isa &= ~OPTION_MASK_HTM; } + + /* Power9 and power10 do not set power8-fusion. If the callee was +explicitly compiled for power8, and the caller was power9 or +power10, ignore the power8-fusion bits if it was set by +default. */ + if ((caller_isa & OPTION_MASK_P8_FUSION) == 0 + && (explicit_isa & OPTION_MASK_P8_FUSION) == 0) + callee_isa &= ~OPTION_MASK_P8_FUSION; } /* The callee's options must be a subset of the caller's options, i.e. diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-4.c b/gcc/testsuite/gcc.target/powerpc/pr102059-4.c new file mode 100644 index 000..627c6f820c7 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr102059-4.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mdejagnu-cpu=power10 -Wno-attributes" } */ + +/* Verify that power10 can explicity include functions compiled for power8. + The issue is -mcpu=power8 enables -mpower8-fusion, but -mcpu=power9 or + -mcpu=power10 do not set power8-fusion by default. */ + +static inline int __attribute__ ((always_inline,target("cpu=power8"))) +foo (int *b) +{ + *b += 10; + return *b; +} + +int +bar (int *a) +{ + *a = foo (a); + return 0; +} -- 2.34.1 -- Michael Meissner, IBM PO Box 98, Ayer, Massachusetts, USA, 01432 email: meiss...@linux.ibm.com
Go patch committed: Implement FuncPCABI0, FuncPCABIInternal
This patch to the Go frontend and libgo implements the abi.FuncPCABI0 and abi.FuncPCABIInternal functions. The Go 1.18 standard library uses an internal/abi package with two functions that are implemented in the compiler. This patch implements them in the gofrontend, to support the upcoming update to 1.18. For the gofrontend, which uses the platform ABI, the functions behave identically. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian patc 56fa431758bcc37c4dade7b2d8d38feb0c59097e diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE index f78561c3483..5f4adf9d0e7 100644 --- a/gcc/go/gofrontend/MERGE +++ b/gcc/go/gofrontend/MERGE @@ -1,4 +1,4 @@ -61f7cf4b9db0587ff099aa36832a355b90ee1bf9 +262cb89fd5ed82ab135a3933b2ddf4eb67683149 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. diff --git a/gcc/go/gofrontend/expressions.cc b/gcc/go/gofrontend/expressions.cc index 79702821336..1e6890a3cc5 100644 --- a/gcc/go/gofrontend/expressions.cc +++ b/gcc/go/gofrontend/expressions.cc @@ -12177,6 +12177,50 @@ Call_expression::intrinsify(Gogo* gogo, return Runtime::make_call(code, loc, 3, a1, a2, a3); } } + else if (package == "internal/abi") +{ + if ((name == "FuncPCABI0" || name == "FuncPCABIInternal") + && this->args_ != NULL + && this->args_->size() == 1) + { + // We expect to see a conversion from the expression to "any". + Expression* expr = this->args_->front(); + Type_conversion_expression* tce = expr->conversion_expression(); + if (tce != NULL) + expr = tce->expr(); + Func_expression* fe = expr->func_expression(); + Interface_field_reference_expression* interface_method = + expr->interface_field_reference_expression(); + if (fe != NULL) + { + Named_object* no = fe->named_object(); + Expression* ref = Expression::make_func_code_reference(no, loc); + Type* uintptr_type = Type::lookup_integer_type("uintptr"); + return Expression::make_cast(uintptr_type, ref, loc); + } + else if (interface_method != NULL) + return interface_method->get_function(); + else + { + expr = this->args_->front(); + go_assert(expr->type()->interface_type() != NULL + && expr->type()->interface_type()->is_empty()); + expr = Expression::make_interface_info(expr, +INTERFACE_INFO_OBJECT, +loc); + // Trust that this is a function type, which means that + // it is a direct iface type and we can use EXPR + // directly. The backend representation of this + // function is a pointer to a struct whose first field + // is the actual function to call. + Type* pvoid = Type::make_pointer_type(Type::make_void_type()); + Type* pfntype = Type::make_pointer_type(pvoid); + Expression* ref = make_unsafe_cast(pfntype, expr, loc); + return Expression::make_dereference(ref, NIL_CHECK_NOT_NEEDED, + loc); + } + } +} return NULL; } diff --git a/libgo/go/internal/abi/abi.go b/libgo/go/internal/abi/abi.go new file mode 100644 index 000..c4a108847ca --- /dev/null +++ b/libgo/go/internal/abi/abi.go @@ -0,0 +1,35 @@ +// Copyright 2020 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +package abi + +// FuncPC* intrinsics. +// +// CAREFUL: In programs with plugins, FuncPC* can return different values +// for the same function (because there are actually multiple copies of +// the same function in the address space). To be safe, don't use the +// results of this function in any == expression. It is only safe to +// use the result as an address at which to start executing code. + +// FuncPCABI0 returns the entry PC of the function f, which must be a +// direct reference of a function defined as ABI0. Otherwise it is a +// compile-time error. +// +// Implemented as a compile intrinsic. +func FuncPCABI0(f any) uintptr { + // The compiler should remove all calls. + panic("FuncPCABI0") +} + +// FuncPCABIInternal returns the entry PC of the function f. If f is a +// direct reference of a function, it must be defined as ABIInternal. +// Otherwise it is a compile-time error. If f is not a direct reference +// of a defined function, it assumes that f is a func value. Otherwise +// the behavior is undefined. +// +// Implemented as a compile intrinsic. +func FuncPCABIInternal(f any) uintptr { + // The compiler should remove all calls. + panic("FuncPCABIInternal") +} diff --git
Go patch committed: Recognize Go 1.18 runtime/internal/atomic methods
The Go 1.18 library introduces specific types in runtime/internal/atomic. This patch to the Go frontend recognizes and optimizes the methods on those types, as we do with the functions in runtime/internal/atomic. While we're here avoid getting confused by methods in any other package that we recognize specially. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian * go-gcc.cc (Gcc_backend::Gcc_backend): Define builtins __atomic_load_1 and __atomic_store_1. pa 869fb813039e8933a85f5f2a3a53cde156030b0a diff --git a/gcc/go/go-gcc.cc b/gcc/go/go-gcc.cc index 631996073f4..f3de7a8c183 100644 --- a/gcc/go/go-gcc.cc +++ b/gcc/go/go-gcc.cc @@ -897,6 +897,20 @@ Gcc_backend::Gcc_backend() this->define_builtin(BUILT_IN_ATOMIC_ADD_FETCH_8, "__atomic_add_fetch_8", NULL, t, 0); + t = build_function_type_list(unsigned_char_type_node, + ptr_type_node, + integer_type_node, + NULL_TREE); + this->define_builtin(BUILT_IN_ATOMIC_LOAD_1, "__atomic_load_1", NULL, t, 0); + + t = build_function_type_list(void_type_node, + ptr_type_node, + unsigned_char_type_node, + integer_type_node, + NULL_TREE); + this->define_builtin(BUILT_IN_ATOMIC_STORE_1, "__atomic_store_1", NULL, + t, 0); + t = build_function_type_list(unsigned_char_type_node, ptr_type_node, unsigned_char_type_node, diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE index 5f4adf9d0e7..9cd22ef011e 100644 --- a/gcc/go/gofrontend/MERGE +++ b/gcc/go/gofrontend/MERGE @@ -1,4 +1,4 @@ -262cb89fd5ed82ab135a3933b2ddf4eb67683149 +3b1e46937d11b043d0986a3dfefaee27454c3da0 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. diff --git a/gcc/go/gofrontend/expressions.cc b/gcc/go/gofrontend/expressions.cc index 1e6890a3cc5..d7b64767a00 100644 --- a/gcc/go/gofrontend/expressions.cc +++ b/gcc/go/gofrontend/expressions.cc @@ -11613,12 +11613,16 @@ Call_expression::intrinsify(Gogo* gogo, std::string package = (no->package() != NULL ? no->package()->pkgpath() : gogo->pkgpath()); + bool is_method = ((no->is_function() && no->func_value()->is_method()) + || (no->is_function_declaration() + && no->func_declaration_value()->is_method())); Location loc = this->location(); Type* int_type = Type::lookup_integer_type("int"); Type* int32_type = Type::lookup_integer_type("int32"); Type* int64_type = Type::lookup_integer_type("int64"); Type* uint_type = Type::lookup_integer_type("uint"); + Type* uint8_type = Type::lookup_integer_type("uint8"); Type* uint32_type = Type::lookup_integer_type("uint32"); Type* uint64_type = Type::lookup_integer_type("uint64"); Type* uintptr_type = Type::lookup_integer_type("uintptr"); @@ -11629,6 +11633,9 @@ Call_expression::intrinsify(Gogo* gogo, if (package == "sync/atomic") { + if (is_method) + return NULL; + // sync/atomic functions and runtime/internal/atomic functions // are very similar. In order not to duplicate code, we just // redirect to the latter and let the code below to handle them. @@ -11694,6 +11701,9 @@ Call_expression::intrinsify(Gogo* gogo, if (package == "runtime/internal/sys") { + if (is_method) + return NULL; + // runtime/internal/sys functions and math/bits functions // are very similar. In order not to duplicate code, we just // redirect to the latter and let the code below to handle them. @@ -11713,6 +11723,9 @@ Call_expression::intrinsify(Gogo* gogo, if (package == "runtime") { + if (is_method) + return NULL; + // Handle a couple of special runtime functions. In the runtime // package, getcallerpc returns the PC of the caller, and // getcallersp returns the frame pointer of the caller. Implement @@ -11743,6 +11756,9 @@ Call_expression::intrinsify(Gogo* gogo, } else if (package == "math/bits") { + if (is_method) + return NULL; + if ((name == "ReverseBytes16" || name == "ReverseBytes32" || name == "ReverseBytes64" || name == "ReverseBytes") && this->args_ != NULL && this->args_->size() == 1) @@ -11913,9 +11929,137 @@ Call_expression::intrinsify(Gogo* gogo, { int memorder = __ATOMIC_SEQ_CST; + if (is_method) + { + Function_type* ftype = (no->is_function() + ? no->func_value()->type() + : no->func_declaration_value()->type()); + Type* rtype = ftype->receiver()->type()->deref(); + go_assert(rtype->named_type() !=
Re: [PATCH] x86: Check each component of source operand for AVX_U128_DIRTY
On Wed, Feb 9, 2022 at 10:53 AM H.J. Lu via Gcc-patches wrote: > > commit 9775e465c1fbfc32656de77c618c61acf5bd905d > Author: H.J. Lu > Date: Tue Jul 27 07:46:04 2021 -0700 > > x86: Don't set AVX_U128_DIRTY when zeroing YMM/ZMM register > > called ix86_check_avx_upper_register to check mode on source operand. > But ix86_check_avx_upper_register doesn't work on source operand like > The new function ix86_avx_u128_mode_source just takes the code from the *else if (ix86_check_avx_upper_register (src))* branch to check each component of src that meets the ix86_check_avx_upper_register condition, which seems reasonable. The patch LGTM. > (vec_select:V2DI (reg/v:V4DI 23 xmm3 [orig:91 ymm ] [91]) > (parallel [ > (const_int 2 [0x2]) > (const_int 3 [0x3]) > ])) > > Add ix86_avx_u128_mode_source to check mode for each component of source > operand. > > gcc/ > > PR target/104441 > * config/i386/i386.cc (ix86_avx_u128_mode_source): New function. > (ix86_avx_u128_mode_needed): Return AVX_U128_ANY for debug INSN. > Call ix86_avx_u128_mode_source to check mode for each component > of source operand. > > gcc/testsuite/ > > PR target/104441 > * gcc.target/i386/pr104441-1a.c: New test. > * gcc.target/i386/pr104441-1b.c: Likewise. > --- > gcc/config/i386/i386.cc | 145 +++- > gcc/testsuite/gcc.target/i386/pr104441-1a.c | 57 > gcc/testsuite/gcc.target/i386/pr104441-1b.c | 32 + > 3 files changed, 168 insertions(+), 66 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr104441-1a.c > create mode 100644 gcc/testsuite/gcc.target/i386/pr104441-1b.c > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > index dd5584fb8ed..2d87acca7ff 100644 > --- a/gcc/config/i386/i386.cc > +++ b/gcc/config/i386/i386.cc > @@ -14365,11 +14365,82 @@ ix86_check_avx_upper_stores (rtx dest, const_rtx, > void *data) > } > } > > +/* For YMM/ZMM store or YMM/ZMM extract. Return mode for the source > + operand of SRC DEFs in the same basic block before INSN. */ > + > +static int > +ix86_avx_u128_mode_source (rtx_insn *insn, const_rtx src) > +{ > + basic_block bb = BLOCK_FOR_INSN (insn); > + rtx_insn *end = BB_END (bb); > + > + /* Return AVX_U128_DIRTY if there is no DEF in the same basic > + block. */ > + int status = AVX_U128_DIRTY; > + > + for (df_ref def = DF_REG_DEF_CHAIN (REGNO (src)); > + def; def = DF_REF_NEXT_REG (def)) > +if (DF_REF_BB (def) == bb) > + { > + /* Ignore DEF from different basic blocks. */ > + rtx_insn *def_insn = DF_REF_INSN (def); > + > + /* Check if DEF_INSN is before INSN. */ > + rtx_insn *next; > + for (next = NEXT_INSN (def_insn); > +next != nullptr && next != end && next != insn; > +next = NEXT_INSN (next)) > + ; > + > + /* Skip if DEF_INSN isn't before INSN. */ > + if (next != insn) > + continue; > + > + /* Return AVX_U128_DIRTY if the source operand of DEF_INSN > + isn't constant zero. */ > + > + if (CALL_P (def_insn)) > + { > + bool avx_upper_reg_found = false; > + note_stores (def_insn, > +ix86_check_avx_upper_stores, > +&avx_upper_reg_found); > + > + /* Return AVX_U128_DIRTY if call returns AVX. */ > + if (avx_upper_reg_found) > + return AVX_U128_DIRTY; > + > + continue; > + } > + > + rtx set = single_set (def_insn); > + if (!set) > + return AVX_U128_DIRTY; > + > + rtx dest = SET_DEST (set); > + > + /* Skip if DEF_INSN is not an AVX load. Return AVX_U128_DIRTY > + if the source operand isn't constant zero. */ > + if (ix86_check_avx_upper_register (dest) > + && standard_sse_constant_p (SET_SRC (set), > + GET_MODE (dest)) != 1) > + return AVX_U128_DIRTY; > + > + /* We get here only if all AVX loads are from constant zero. */ > + status = AVX_U128_ANY; > + } > + > + return status; > +} > + > /* Return needed mode for entity in optimize_mode_switching pass. */ > > static int > ix86_avx_u128_mode_needed (rtx_insn *insn) > { > + if (DEBUG_INSN_P (insn)) > +return AVX_U128_ANY; > + >if (CALL_P (insn)) > { >rtx link; > @@ -14409,6 +14480,8 @@ ix86_avx_u128_mode_needed (rtx_insn *insn) >return AVX_U128_CLEAN; > } > > + subrtx_iterator::array_type array; > + >rtx set = single_set (insn); >if (set) > { > @@ -14423,74 +14496,15 @@ ix86_avx_u128_mode_needed (rtx_insn *insn) > else > return AVX_U128_ANY; > } > - else if (ix86_check_avx_upper_register (src)) > + else > { > - /* This is an YMM/ZMM store. Check for the source operand > -of
Re: PING^5 [PATCH v4 0/2] Implement indirect external access
On Mon, Jan 3, 2022 at 7:33 PM H.J. Lu via Gcc-patches wrote: > > On Sat, Dec 11, 2021 at 10:44 AM H.J. Lu wrote: > > > > On Thu, Nov 25, 2021 at 9:54 AM H.J. Lu wrote: > > > > > > On Mon, Nov 1, 2021 at 7:02 AM H.J. Lu wrote: > > > > > > > > On Thu, Oct 21, 2021 at 12:56 PM H.J. Lu wrote: > > > > > > > > > > On Wed, Sep 22, 2021 at 7:02 PM H.J. Lu wrote: > > > > > > > > > > > > Changes in the v4 patch. > > > > > > > > > > > > 1. Add nodirect_extern_access attribute. > > > > > > > > > > > > Changes in the v3 patch. > > > > > > > > > > > > 1. GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS support has been > > > > > > added to > > > > > > GNU binutils 2.38. But the -z indirect-extern-access linker option > > > > > > is > > > > > > only available for Linux/x86. However, the --max-cache-size=SIZE > > > > > > linker > > > > > > option was also addded within a day. --max-cache-size=SIZE is used > > > > > > to > > > > > > check for GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS support. > > > > > > > > > > > > Changes in the v2 patch. > > > > > > > > > > > > 1. Rename the option to -fdirect-extern-access. > > > > > > > > > > > > --- > > > > > > On systems with copy relocation: > > > > > > * A copy in executable is created for the definition in a shared > > > > > > library > > > > > > at run-time by ld.so. > > > > > > * The copy is referenced by executable and shared libraries. > > > > > > * Executable can access the copy directly. > > > > > > > > > > > > Issues are: > > > > > > * Overhead of a copy, time and space, may be visible at run-time. > > > > > > * Read-only data in the shared library becomes read-write copy in > > > > > > executable at run-time. > > > > > > * Local access to data with the STV_PROTECTED visibility in the > > > > > > shared > > > > > > library must use GOT. > > > > > > > > > > > > On systems without function descriptor, function pointers vary > > > > > > depending > > > > > > on where and how the functions are defined. > > > > > > * If the function is defined in executable, it can be the address of > > > > > > function body. > > > > > > * If the function, including the function with STV_PROTECTED > > > > > > visibility, > > > > > > is defined in the shared library, it can be the address of the PLT > > > > > > entry > > > > > > in executable or shared library. > > > > > > > > > > > > Issues are: > > > > > > * The address of function body may not be used as its function > > > > > > pointer. > > > > > > * ld.so needs to search loaded shared libraries for the function > > > > > > pointer > > > > > > of the function with STV_PROTECTED visibility. > > > > > > > > > > > > Here is a proposal to remove copy relocation and use canonical > > > > > > function > > > > > > pointer: > > > > > > > > > > > > 1. Accesses, including in PIE and non-PIE, to undefined symbols must > > > > > > use GOT. > > > > > > a. Linker may optimize out GOT access if the data is defined in > > > > > > PIE or > > > > > > non-PIE. > > > > > > 2. Read-only data in the shared library remain read-only at run-time > > > > > > 3. Address of global data with the STV_PROTECTED visibility in the > > > > > > shared > > > > > > library is the address of data body. > > > > > > a. Can use IP-relative access. > > > > > > b. May need GOT without IP-relative access. > > > > > > 4. For systems without function descriptor, > > > > > > a. All global function pointers of undefined functions in PIE and > > > > > > non-PIE must use GOT. Linker may optimize out GOT access if the > > > > > > function is defined in PIE or non-PIE. > > > > > > b. Function pointer of functions with the STV_PROTECTED > > > > > > visibility in > > > > > > executable and shared library is the address of function body. > > > > > >i. Can use IP-relative access. > > > > > >ii. May need GOT without IP-relative access. > > > > > >iii. Branches to undefined functions may use PLT. > > > > > > 5. Single global definition marker: > > > > > > > > > > > > Add GNU_PROPERTY_1_NEEDED: > > > > > > > > > > > > #define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO > > > > > > > > > > > > to indicate the needed properties by the object file. > > > > > > > > > > > > Add GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS: > > > > > > > > > > > > #define GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (1U << 0) > > > > > > > > > > > > to indicate that the object file requires canonical function > > > > > > pointers and > > > > > > cannot be used with copy relocation. This bit should be cleared in > > > > > > executable when there are non-GOT or non-PLT relocations in > > > > > > relocatable > > > > > > input files without this bit set. > > > > > > > > > > > > a. Protected symbol access within the shared library can be > > > > > > treated as > > > > > > local. > > > > > > b. Copy relocation should be disallowed at link-time and run-time. > > > > > > c. GOT function pointer reference is required at link-time and > > > > > > run-time. > >
Re: [PATCH] tree-optimization/104373 - early uninit diagnostic on unreachable code
On Tue, 8 Feb 2022, Jeff Law wrote: > > > On 2/8/2022 12:03 AM, Richard Biener via Gcc-patches wrote: > > The following improves early uninit diagnostics by computing edge > > reachability using our value-numbering framework in its cheapest > > mode and ignoring unreachable blocks when looking > > for uninitialized uses. To not ICE with -fdump-tree-all the > > early uninit pass needs a dumpfile since VN tries to dump statistics. > > > > For gimple-match.c at -O0 -g this causes a 2% increase in compile-time. > > > > In theory all early diagnostic passes could benefit from a VN run but > > that would require more refactoring that's not appropriate at this stage. > > This patch addresses a GCC 12 diagnostic regression and also happens to > > fix one XFAIL in gcc.dg/uninit-pr20644-O0.c > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, OK for trunk? > > > > Thanks, > > Richard. > > > > 2022-02-04 Richard Biener > > > > PR tree-optimization/104373 > > * tree-ssa-sccvn.h (do_rpo_vn): New export exposing the > > walk kind. > > * tree-ssa-sccvn.cc (do_rpo_vn): Export, get the default > > walk kind as argument. > > (run_rpo_vn): Adjust. > > (pass_fre::execute): Likewise. > > * tree-ssa-uninit.cc (warn_uninitialized_vars): Skip > > blocks not reachable. > > (execute_late_warn_uninitialized): Mark all edges as > > executable. > > (execute_early_warn_uninitialized): Use VN to compute > > executable edges. > > (pass_data_early_warn_uninitialized): Enable a dump file, > > change dump name to warn_uninit. > > > > * g++.dg/warn/Wuninitialized-32.C: New testcase. > > * gcc.dg/uninit-pr20644-O0.c: Remove XFAIL. > I'm conflicted on this ;-) > > I generally lean on the side of eliminating false positives in these > diagnostics. So identifying unreachable blocks and using that to prune the > set of warnings we generate, even at -O0 is good from that point of view. > > But doing something like this has many of the problems that relying on > optimizations does, even if we don't optimize away the unreachable code. > Right now the warning should be fairly stable at -O0 -- the set of diagnostics > you get isn't going to change a lot release to release which is important to > some users. Second, at -O0 whether or not you get a warning isn't a function > of how good our unreachable code analysis might be. > > This was quite a contentious topic many years ago. So much that I dropped > some work on Wuninit on the floor as I just couldn't take the arguing. So be > aware that you might be opening a can of worms. > > So the question comes down to a design decision. What's more important to > the end users? Fewer false positives or better stability in the warning? I > think the former, but there's certainly been a vocal group that prefers the > latter. I see - I didn't think of this aspect at all but that means I have no idea on whether it is important or not ... In our own setup we're running into "instabilities" with optimization when building packages that enable -Werror, so I can see shops doing dev builds at -O0 with warnings and -Werror but drop -Werror for optimized builds. > On the implementation side I have zero concerns. Looking further out, ISTM > we could mark the blocks as unreachable (rather than deducing it from edge > flags). That would make it pretty easy to mark those blocks relatively early > and allow us to suppress any middle end diagnostic occurring in an unreachable > block. So what I had in mind is that for the set of early diagnostic passes PUSH_INSERT_PASSES_WITHIN (pass_build_ssa_passes) NEXT_PASS (pass_fixup_cfg); NEXT_PASS (pass_build_ssa); NEXT_PASS (pass_warn_printf); NEXT_PASS (pass_warn_nonnull_compare); NEXT_PASS (pass_early_warn_uninitialized); NEXT_PASS (pass_warn_access, /*early=*/true); we'd run VN and keep it's lattice around (and not just the EDGE_EXECUTABLE flags). That would for example allow pass_warn_nonnull_compare to see that in void foo (void *p __attribute__((nonnull))) { void *q = p; if (q) bar (q); } we are comparing a never NULL pointer. Currently the q = p copy makes it not realize this. Likewise some constants can be propagated this way. Of course using the VN lattice means quite some changes in those passes. Even without the VN lattice having unreachable edges marked could improve diagnostics for, say PHI nodes, if only a single executable edge remains. Martin, do you have any thoughts here? Any opinion on the patch that for now just marks (not) executable edges to prune diagnostics at -O0? Thanks, Richard.
[PATCH] AutoFDO: Don't try to promote indirect calls that result in recursive direct calls
AutoFDO tries to promote and inline all indirect calls that were promoted and inlined in the original binary and that are still hot. In the included test case, the promotion results in a direct call that is a recursive call. inline_call and optimize_inline_calls can't handle recursive calls at this stage. Currently, inline_call fails with a segmentation fault. This change leaves the indirect call alone if promotion will result in a recursive call. Tested on x86_64-pc-linux-gnu. gcc/ChangeLog: * auto-profile.cc (afdo_indirect_call): Don't attempt to promote indirect calls that will result in direct recursive calls. gcc/testsuite/g++.dg/tree-prof/ChangeLog: * indir-call-recursive-inlining.C : New test. --- gcc/auto-profile.cc | 40 -- .../tree-prof/indir-call-recursive-inlining.C | 54 +++ 2 files changed, 78 insertions(+), 16 deletions(-) create mode 100644 gcc/testsuite/g++.dg/tree-prof/indir-call-recursive-inlining.C diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc index c7cee639c85..2b34b80b82d 100644 --- a/gcc/auto-profile.cc +++ b/gcc/auto-profile.cc @@ -975,7 +975,7 @@ read_profile (void) * after annotation, we just need to mark, and let follow-up logic to decide if it needs to promote and inline. */ -static void +static bool afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map &map, bool transform) { @@ -983,12 +983,12 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map &map, tree callee; if (map.size () == 0) -return; +return false; gcall *stmt = dyn_cast (gs); if (!stmt || gimple_call_internal_p (stmt) || gimple_call_fndecl (stmt) != NULL_TREE) -return; +return false; gcov_type total = 0; icall_target_map::const_iterator max_iter = map.end (); @@ -1003,7 +1003,7 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map &map, struct cgraph_node *direct_call = cgraph_node::get_for_asmname ( get_identifier (afdo_string_table->get_name (max_iter->first))); if (direct_call == NULL || !direct_call->profile_id) -return; +return false; callee = gimple_call_fn (stmt); @@ -1013,20 +1013,27 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map &map, hist->hvalue.counters = XNEWVEC (gcov_type, hist->n_counters); gimple_add_histogram_value (cfun, stmt, hist); - // Total counter + /* Total counter */ hist->hvalue.counters[0] = total; - // Number of value/counter pairs + /* Number of value/counter pairs */ hist->hvalue.counters[1] = 1; - // Value + /* Value */ hist->hvalue.counters[2] = direct_call->profile_id; - // Counter + /* Counter */ hist->hvalue.counters[3] = max_iter->second; if (!transform) -return; +return false; + + cgraph_node* current_function_node = cgraph_node::get (current_function_decl); + + /* If the direct call is a recursive call, don't promote it since + we are not set up to inline recursive calls at this stage. */ + if (direct_call == current_function_node) +return false; struct cgraph_edge *indirect_edge - = cgraph_node::get (current_function_decl)->get_edge (stmt); + = current_function_node->get_edge (stmt); if (dump_file) { @@ -1040,13 +1047,13 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map &map, { if (dump_file) fprintf (dump_file, " not transforming\n"); - return; + return false; } if (DECL_STRUCT_FUNCTION (direct_call->decl) == NULL) { if (dump_file) fprintf (dump_file, " no declaration\n"); - return; + return false; } if (dump_file) @@ -1063,16 +1070,17 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map &map, cgraph_edge::redirect_call_stmt_to_callee (new_edge); gimple_remove_histogram_value (cfun, stmt, hist); inline_call (new_edge, true, NULL, NULL, false); + return true; } /* From AutoFDO profiles, find values inside STMT for that we want to measure histograms and adds them to list VALUES. */ -static void +static bool afdo_vpt (gimple_stmt_iterator *gsi, const icall_target_map &map, bool transform) { - afdo_indirect_call (gsi, map, transform); + return afdo_indirect_call (gsi, map, transform); } typedef std::set bb_set; @@ -1498,8 +1506,8 @@ afdo_vpt_for_early_inline (stmt_set *promoted_stmts) { /* Promote the indirect call and update the promoted_stmts. */ promoted_stmts->insert (stmt); -afdo_vpt (&gsi, info.targets, true); -has_vpt = true; +if (afdo_vpt (&gsi, info.targets, true)) + has_vpt = true; } } } diff --git a/gcc/testsuite/g++.dg/tree-prof/indir-call-recursive-inlining.C b/gcc/testsuite/g++.dg/tree-prof/indir-call-recursive-inlining.C new file mode