date:20220208

[PATCH] dwarf2out: Don't call expand_expr during early_dwarf [PR104407]

2022-02-08 Thread Jakub Jelinek via Gcc-patches

Hi!

As mentioned in the PR, since PR96690 r11-2834 we call rtl_for_decl_init
which can call expand_expr already during early_dwarf.  The comment and PR
explains it that the intent is to ensure the referenced vars and functions
are properly mangled because free_lang_data doesn't cover everything, like
template parameters etc.  It doesn't work well though, because expand_expr
can set DECL_RTLs e.g. on referenced vars and keep them there, and they can
be created e.g. with different MEM_ALIGN compared to what they would be
created with if they were emitted later.
So, the following patch stops calling rtl_for_decl_init and instead
for cases for which rtl_for_decl_init does anything at all walks the
initializer and ensures referenced vars or functions are mangled.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-02-08  Jakub Jelinek  

PR debug/104407
* dwarf2out.cc (mangle_referenced_decls): New function.
(tree_add_const_value_attribute): Don't call rtl_for_decl_init if
early_dwarf.  Instead walk the initializer and try to mangle vars or
functions referenced from it.

* g++.dg/debug/dwarf2/pr104407.C: New test.

--- gcc/dwarf2out.cc.jj 2022-02-04 14:36:54.966605844 +0100
+++ gcc/dwarf2out.cc2022-02-07 12:28:13.000520666 +0100
@@ -20881,6 +20881,19 @@ add_location_or_const_value_attribute (d
   return tree_add_const_value_attribute_for_decl (die, decl);
 }
 
+/* Mangle referenced decls.  */
+static tree
+mangle_referenced_decls (tree *tp, int *walk_subtrees, void *)
+{
+  if (! EXPR_P (*tp) && ! CONSTANT_CLASS_P (*tp))
+*walk_subtrees = 0;
+
+  if (VAR_OR_FUNCTION_DECL_P (*tp))
+assign_assembler_name_if_needed (*tp);
+
+  return NULL_TREE;
+}
+
 /* Attach a DW_AT_const_value attribute to DIE. The value of the
attribute is the const value T.  */
 
@@ -20889,7 +20902,6 @@ tree_add_const_value_attribute (dw_die_r
 {
   tree init;
   tree type = TREE_TYPE (t);
-  rtx rtl;
 
   if (!t || !TREE_TYPE (t) || TREE_TYPE (t) == error_mark_node)
 return false;
@@ -20910,11 +20922,26 @@ tree_add_const_value_attribute (dw_die_r
  return true;
}
 }
-  /* Generate the RTL even if early_dwarf to force mangling of all refered to
- symbols.  */
-  rtl = rtl_for_decl_init (init, type);
-  if (rtl && !early_dwarf)
-return add_const_value_attribute (die, TYPE_MODE (type), rtl);
+  if (!early_dwarf)
+{
+  rtx rtl = rtl_for_decl_init (init, type);
+  if (rtl && !early_dwarf)
+   return add_const_value_attribute (die, TYPE_MODE (type), rtl);
+}
+  else
+{
+  /* For early_dwarf force mangling of all referenced symbols.  */
+  tree initializer = init;
+  STRIP_NOPS (initializer);
+  /* rtl_for_decl_init punts on other aggregates, and complex values.  */
+  if (AGGREGATE_TYPE_P (type)
+ || (TREE_CODE (initializer) == VIEW_CONVERT_EXPR
+ && AGGREGATE_TYPE_P (TREE_TYPE (TREE_OPERAND (initializer, 0
+ || TREE_CODE (type) == COMPLEX_TYPE)
+   ;
+  else if (initializer_constant_valid_p (initializer, type))
+   walk_tree (&initializer, mangle_referenced_decls, NULL, NULL);
+}
   /* If the host and target are sane, try harder.  */
   if (CHAR_BIT == 8 && BITS_PER_UNIT == 8
   && initializer_constant_valid_p (init, type))
--- gcc/testsuite/g++.dg/debug/dwarf2/pr104407.C.jj 2022-02-07 
10:45:51.945041031 +0100
+++ gcc/testsuite/g++.dg/debug/dwarf2/pr104407.C2022-02-07 
10:45:09.834635340 +0100
@@ -0,0 +1,12 @@
+// PR debug/104407
+// { dg-do compile { target c++17 } }
+// { dg-options "-O1 -fcompare-debug" }
+
+struct A { int i; long j; int k : 2; char l; } a;
+
+auto [ aa, bb, cc, dd ] = a;
+
+namespace N
+{
+  auto & [ m, n, o, ppp ] = a;
+}

Jakub

Re: [PATCH 2/4] tree-optimization/104288 - Register side effects during ranger vrp domwalk.

2022-02-08 Thread Richard Biener via Gcc-patches

On Mon, Feb 7, 2022 at 3:32 PM Andrew MacLeod via Gcc-patches
 wrote:
>
> This patch adds the ability to register side effects within a block to
> ranger. This is currently only for tracking non-null values.
>
> the rvrp_folder performs a DOM walk and then attempts to simplify and
> the fold each stmt during the walk.  This patch adds a call to
> register_side_effects() to record any effects the stmt might have before
> returning.
>
> This checks if the non-null property is set for any ssa-names on the
> statement, and if they are, moves the state to only non-null for the
> block...  This allows us to adjust the property within the block as we
> do the DOM walk.  All further queries within the block will then come
> back as non-null.   ALl other queries will be from outside the block, so
> they will see the same results anyway.
>
> Unfortunately, none of the non-null routines in gimple.cc appear to work
> "in bulk", but rather, on a specific tree-ssa operand request at a
> time.  For this patch, I need to scan the entire statement looking for
> any operand that is nonnull (or the performance impact is awful).

In tree.cc we have get_nonnull_args which you'd pass the
gimple_call_fntype, its expense is allocating a bitmap (if any arg is nonnull)
which you'd then need to iterate over and free.  I think that's an OK
overhead for not duplicating the walk?

> I took the code in the various infer_nonnull routines, and packed it all
> together into the two routines block_apply_nonnull &&
> infer_nonnull_call_attributes,  which basically perform the same
> functionality as  infer_nonnull_range_by_dereference and
> infer_nonnull_range_by_attribute, only on all the operands at once.  I
> think its right, but you may want to have a look.   I intend to leverage
> this code in GCC13  for a more generalized range_after_stmt mechanism
> that will do away with the immediate-use visits of the current non-null

+void
+non_null_ref::block_apply_nonnull (gimple *s)
+{
+  if (!flag_delete_null_pointer_checks)
+return;
+  if (is_a (s))
+return;
+  if (gimple_code (s) == GIMPLE_ASM || gimple_clobber_p (s))
+return;
+  if (is_a (s))
+{
+  infer_nonnull_call_attributes (as_a (s));
+  return;
+}
+  walk_stmt_load_store_ops (s, (void *)this, non_null_loadstore,
+   non_null_loadstore);

Both ASM and calls can have pointer dereferences embedded so
I wonder whether you should do the walk_stmt_load_store_ops
anyway.  Note that any exceptional control flow (EH or abnormal)
will not have realized the outputs on a stmt (LHS of call, outputs
of asms), so nonnull carries over only across fallthru or conditional
edges.

I suppose realizing the LHS could be postponed for GCC 13, for
calls you could still walk loads so you get the non-nullness of
aggregate arguments like foo (*p) after the stmt.

But you can of course do all this for GCC 13 only.

> This patch also has zero effect on generated code as none of the
> nonnull_deref_p() queries make use of the new granularity available
> yet.  It does set the table for the next patch. Performance impact of
> this is also marginal, about a 1% hit to EVRP/VRP.
>
> Bootstraps with no regressions.  OK for trunk?
>
> Andrew
>
>
>

[PATCH] c++: Remove superflous assert [PR104403]

2022-02-08 Thread Jakub Jelinek via Gcc-patches

Hi!

I've added the assert because start_decl diagnoses such vars for C++20 and
earlier:
  if (current_function_decl && VAR_P (decl)
  && DECL_DECLARED_CONSTEXPR_P (current_function_decl)
  && cxx_dialect < cxx23)
but as can be seen, we cam trigger the assert in older standards e.g. during
non-manifestly constant evaluation.  Rather than refining the assert that
DECL_EXPRs for such vars don't appear for C++20 and older if they are inside
of functions declared constexpr this patch just removes the assert, the
code rejects encountering those vars in constant expressions anyway.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-02-08  Jakub Jelinek  

PR c++/104403
* constexpr.cc (cxx_eval_constant_expression): Don't assert DECL_EXPRs
of TREE_STATIC vars may only appear in -std=c++23.

* g++.dg/cpp0x/lambda/lambda-104403.C: New test.

--- gcc/cp/constexpr.cc.jj  2022-02-06 11:16:22.353814431 +0100
+++ gcc/cp/constexpr.cc 2022-02-07 11:38:59.131165171 +0100
@@ -6652,7 +6652,6 @@ cxx_eval_constant_expression (const cons
/* Allow __FUNCTION__ etc.  */
&& !DECL_ARTIFICIAL (r))
  {
-   gcc_assert (cxx_dialect >= cxx23);
if (!ctx->quiet)
  {
if (CP_DECL_THREAD_LOCAL_P (r))
--- gcc/testsuite/g++.dg/cpp0x/lambda/lambda-104403.C.jj2022-02-07 
11:40:32.760847269 +0100
+++ gcc/testsuite/g++.dg/cpp0x/lambda/lambda-104403.C   2022-02-07 
11:40:06.109223079 +0100
@@ -0,0 +1,8 @@
+// PR c++/104403
+// { dg-do compile { target c++11 } }
+
+int
+main ()
+{
+  []{ switch (0) { case 0: static int value; return &value; } };
+}

Jakub

[PATCH] c++: Don't emit repeated -Wshadow warnings for templates/ctors [PR104379]

2022-02-08 Thread Jakub Jelinek via Gcc-patches

Hi!

The following patch suppresses extraneous -Wshadow warnings.
On the testcase without the patch we emit 14 -Wshadow warnings,
with the patch just 4.  It is enough to warn once e.g. during parsing of the
template or the abstract ctor, while previously we'd warn also on the clones
of the ctors and on instantiation.
In GCC 8 and earlier we didn't warn because check_local_shadow did
  /* Inline decls shadow nothing.  */
  if (DECL_FROM_INLINE (decl))
return;

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-02-08  Jakub Jelinek  

PR c++/104379
* name-lookup.cc (check_local_shadow): When diagnosing shadowing
of a member or global declaration, add warning suppression for
the decl and don't warn again on it.

* g++.dg/warn/Wshadow-18.C: New test.

--- gcc/cp/name-lookup.cc.jj2022-01-18 11:58:59.0 +0100
+++ gcc/cp/name-lookup.cc   2022-02-07 15:00:19.781820426 +0100
@@ -3296,18 +3296,22 @@ check_local_shadow (tree decl)
 
/* Warn if a variable shadows a non-function, or the variable
   is a function or a pointer-to-function.  */
-   if (!OVL_P (member)
-   || TREE_CODE (decl) == FUNCTION_DECL
-   || (TREE_TYPE (decl)
-   && (TYPE_PTRFN_P (TREE_TYPE (decl))
-   || TYPE_PTRMEMFUNC_P (TREE_TYPE (decl)
+   if ((!OVL_P (member)
+|| TREE_CODE (decl) == FUNCTION_DECL
+|| (TREE_TYPE (decl)
+&& (TYPE_PTRFN_P (TREE_TYPE (decl))
+|| TYPE_PTRMEMFUNC_P (TREE_TYPE (decl)
+   && !warning_suppressed_p (decl, OPT_Wshadow))
  {
auto_diagnostic_group d;
if (warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wshadow,
"declaration of %qD shadows a member of %qT",
decl, current_nonlambda_class_type ())
&& DECL_P (member))
- inform_shadowed (member);
+ {
+   inform_shadowed (member);
+   suppress_warning (decl, OPT_Wshadow);
+ }
  }
return;
   }
@@ -3319,14 +3323,18 @@ check_local_shadow (tree decl)
  || (TREE_CODE (old) == TYPE_DECL
  && (!DECL_ARTIFICIAL (old)
  || TREE_CODE (decl) == TYPE_DECL)))
-  && !instantiating_current_function_p ())
+  && !instantiating_current_function_p ()
+  && !warning_suppressed_p (decl, OPT_Wshadow))
 /* XXX shadow warnings in outer-more namespaces */
 {
   auto_diagnostic_group d;
   if (warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wshadow,
  "declaration of %qD shadows a global declaration",
  decl))
-   inform_shadowed (old);
+   {
+ inform_shadowed (old);
+ suppress_warning (decl, OPT_Wshadow);
+   }
   return;
 }
 
--- gcc/testsuite/g++.dg/warn/Wshadow-18.C.jj   2022-02-07 15:01:56.509456240 
+0100
+++ gcc/testsuite/g++.dg/warn/Wshadow-18.C  2022-02-07 15:03:34.951067880 
+0100
@@ -0,0 +1,22 @@
+// PR c++/104379
+// { dg-do compile }
+// { dg-options "-Wshadow" }
+
+int x;
+
+template
+struct S
+{
+  int i;
+  S(int i) { (void) i; }   // { dg-warning "declaration of 
'i' shadows a member of 'S'" }
+  S(float x) { (void) x; } // { dg-warning "declaration of 
'x' shadows a global declaration" }
+  S(int *p) { int a = 1; (void) p; (void) a;
+ { int a = 2; (void) a; } }// { dg-warning "declaration of 
'a' shadows a previous local" }
+};
+
+S i(1);
+S j(1);
+S k(1.0f);
+S l(1.0f);
+S m(&x);
+S n(&x);

Jakub

Re: [PATCH 3/4] tree-optimization/104288 - Update non-null interface to provide better precision.

2022-02-08 Thread Richard Biener via Gcc-patches

On Mon, Feb 7, 2022 at 3:33 PM Andrew MacLeod via Gcc-patches
 wrote:
>
> This changes the non-null interface to provide more granularity and
> facilitate more precise queries.
>
> Previously, we could only ask if NAME was nonnull anywhere in the block,
> and it applied to the entire block.  now we can ask:
>bool always_nonnull_p (tree name, basic_block bb);// all uses are
> nonnull in the block
>bool contains_nonnull_p (tree name, basic_block bb); // there is a
> nonnull use somewhere in the block.

How is the latter useful?  To me it seems that

  bool nonnull_at_p (tree name, gimple *stmt);
// 'name' is nonnull at the execution of STMT

would be (or nonnull_from_p with starting with STMT), nonnull_from_p could
be overloaded with basic_block even.

I see the patch uses contains_nonnull_p for range_on_exit.  In the previous
patch I commented that for

  :
 *p = foo (*q);  // fallthru + EH

on the fallthru edge we know p and q are not NULL but on the EH edge
we only know q is not NULL.  [with -fnon-call-exceptions we have a
representational issue since the *q load might trap but we cannot currenty
separate that EH on GIMPLE]

So given that wouldn't we instead need range_on_edge () instead of
range_on_exit?  At least contains_nonnull_p cannot be used for
range_on_exit in case we derive nonnull from *p?

Btw, for -fnon-call-exceptions all ranges derived from the possibly throwing
stmt are not realized on the exceptional path.  Like for

 :
_1 = _3 / _5;  // fallthru + EH

on the fallthru edge we have _5 ~[0,0] but on the EH edge we do not
(in fact there we have _5 equals to [0,0] in this specific case).

Short-term it might be neccessary to not derive ranges from a
stmt that throws internally but it would be nice to sort those issues
out?

> Ranger was using this query for on-entry and range_of_expr within
> block.  Range-on-entry was simply wrong.. it should have been asking if
> non-null was true in the dominator of this block.   And this PR
> demonstrate the issue with range_of_expr.
>
> This patch changes all the places which checked the state of non-null.
> An Audit of every use now:
>
> ranger_cache::exit_range - Applies non-null if contains_nonnull (bb) is true
> ranger_cache::range_of_expr -  Applies non-null only if always_nonnull (bb).
> ranger_cache::fill_block_cache - marks a block to have on-entry
> calculated if predecessor contains_nonnull.
> ranger_cache::range_from_dom - Applies non-null if contains_nonnull() is
> true in a visited dominator block.
>
> gimple_ranger::range_of_expr - Applies non-null only if always_nonnull
> in the block.
> gimple_ranger::range_on_entry - Checks if any dominator block
> contains_nonnull.
> gimple_range::range_on_exit - Calls one of the above 2 routines, then
> checks if contains_nonull in this block
>
> gimple_range_path has 2 uses in range_defined_in_block() and  and
> adjust_for_non_null_uses.  Aldy audited both, and as the values are used
> at the end of the path only, so both are correct in using
> contains_nonull.  So there should be no impact to threading from this.
>
> This patch allows us to properly optimize:
>
> void h(int result) __attribute__((noipa));
> void h(int result)
> {
>  if (result)
>  __builtin_exit(0);
> }
>
> void foo (void *p) __attribute__((nonnull(1)));
>
> void bar (void *p)
> {
>h (p == 0);
>foo (p);
>if (!p)
>  __builtin_abort ();
> }
>
> to
>
>_1 = p_3(D) == 0B;
>_2 = (int) _1;
>h (_2);
>foo (p_3(D));
>return;
>
>
>  From a risk perspective, I don't think this patch can make us more
> incorrect than we were before.
>
> Oh, and compilation speed. All the patches combined with checking a
> little less than 1% regression overall in EVRP/VRP.  The audit exposed
> that the threader was invoking an unnecessary DOM lookup version of the
> query, and as a result, we see a 1.3% reduction in the time spent in the
> threader.  the overall compile time impact turned out to be a slight
> improvement of 0.02% overall in 380 GCC files (according to valgrind)
> so its basically all good.
>
> Bootstraps with no regressions.  OK for trunk?
>
> Andrew

Re: [PATCH 4/4] [GCC11] tree-optimization/104288 - range on entry should check dominators for non-null.

2022-02-08 Thread Richard Biener via Gcc-patches

On Mon, Feb 7, 2022 at 3:34 PM Andrew MacLeod via Gcc-patches
 wrote:
>
> The patches resolves the issue for GCC11 in a much simpler way.
>
> By default, ranger and EVRP are running in hybrid mode. This means if
> ranger misses something, evrp will pick up the slack. This enables us to
> change the 2 places which check for non-null to ignore potentially
> incorrect block-wide information and only query dominator blocks for
> on-entry ranges.  This allows ranger to be conservative, and EVRP will
> pick up the intra-block changes.
>
> Bootstraps with no regressions.  OK for gcc 11 branch?

OK.

Thanks,
Richard.

>
>

Re: [PATCH 5/4] tree-optimization/104288 - An alternative approach

2022-02-08 Thread Richard Biener via Gcc-patches

On Tue, Feb 8, 2022 at 2:33 AM Andrew MacLeod via Gcc-patches
 wrote:
>
> On 2/7/22 09:29, Andrew MacLeod wrote:
> > I have a proposal for PR 104288.
> >
> > ALL patches (in sequence) bootstrap on their own and each cause no
> > regressions.
>
> I've been continuing to work with this towards the GCC13 solution for
> statement side effects.  And There is another possibility we could
> pursue which is less intrusive
>
> I adapted the portions of patch 2/4 which process nonnull, but changes
> it from being in the nonnull class to being in the cache.
>
> THe main difference is it continues to use a single bit, just changing
> all uses of it to *always* mean its non-null on exit to the block.
>
> Range-on-entry is changed to only check dominator blocks, and
> range_of_expr doesn't check the bit at all.. in both ranger and the cache.
>
> When we are doing the block walk and process side effects, the on-entry
> *cache* range for the block is adjusted to be non-null instead.   The
> statements which precede this stmt have already been processed, and all
> remaining statements in this block will now pick up this new non-value
> from the on-entry cache.  This should be perfectly safe.
>
> The final tweak is when range_of_expr is looking the def block, it
> normally does not have an on entry cache value.  (the def is in the
> block, so there is no entry cache value).  It now checks to see if we
> have set one, which can only happen when we have been doing the
> statement walk and set the on-entry cache with  a non-null value.  This
> allows us to pick up the non-null range in the def block too... (once we
> have passed a statement which set nonnull of course).
>
> THis patch provides much less change to trunk, and is probably a better
> approach as its closer to what is going to happen in GCC13.
>
> Im still working with it, so the patch isnt fully refined yet... but it
> bootstraps and passes all test with no regressions.  And passes the new
> tests too.   I figured I would mention this before you look to hard at
> the other patchset.the GCC11 patch doesn't change.
>
> Let me know if you prefer this I think I do :-)  less churn, and
> closer to end state.

Yes, I very much prefer this - some comments to the other patches
still apply to this one.  Like using get_nonnull_args and probably
adding a bail-out to computing ranges from stmts that can throw
internally or have abnormal control flow (to not get into range-on-exit
vs. normal vs. exceptional or abnormal edges).

Richard.

>
> Andrew
>
>
>
>
>

Re: [PATCH] dwarf2out: Don't call expand_expr during early_dwarf [PR104407]

2022-02-08 Thread Richard Biener via Gcc-patches

On Tue, 8 Feb 2022, Jakub Jelinek wrote:

> Hi!
> 
> As mentioned in the PR, since PR96690 r11-2834 we call rtl_for_decl_init
> which can call expand_expr already during early_dwarf.  The comment and PR
> explains it that the intent is to ensure the referenced vars and functions
> are properly mangled because free_lang_data doesn't cover everything, like
> template parameters etc.  It doesn't work well though, because expand_expr

free_lang_data mangling also only runs when generating LTO bytecode...

> can set DECL_RTLs e.g. on referenced vars and keep them there, and they can
> be created e.g. with different MEM_ALIGN compared to what they would be
> created with if they were emitted later.
> So, the following patch stops calling rtl_for_decl_init and instead
> for cases for which rtl_for_decl_init does anything at all walks the
> initializer and ensures referenced vars or functions are mangled.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2022-02-08  Jakub Jelinek  
> 
>   PR debug/104407
>   * dwarf2out.cc (mangle_referenced_decls): New function.
>   (tree_add_const_value_attribute): Don't call rtl_for_decl_init if
>   early_dwarf.  Instead walk the initializer and try to mangle vars or
>   functions referenced from it.
> 
>   * g++.dg/debug/dwarf2/pr104407.C: New test.
> 
> --- gcc/dwarf2out.cc.jj   2022-02-04 14:36:54.966605844 +0100
> +++ gcc/dwarf2out.cc  2022-02-07 12:28:13.000520666 +0100
> @@ -20881,6 +20881,19 @@ add_location_or_const_value_attribute (d
>return tree_add_const_value_attribute_for_decl (die, decl);
>  }
>  
> +/* Mangle referenced decls.  */
> +static tree
> +mangle_referenced_decls (tree *tp, int *walk_subtrees, void *)
> +{
> +  if (! EXPR_P (*tp) && ! CONSTANT_CLASS_P (*tp))
> +*walk_subtrees = 0;
> +
> +  if (VAR_OR_FUNCTION_DECL_P (*tp))
> +assign_assembler_name_if_needed (*tp);
> +
> +  return NULL_TREE;
> +}
> +
>  /* Attach a DW_AT_const_value attribute to DIE. The value of the
> attribute is the const value T.  */
>  
> @@ -20889,7 +20902,6 @@ tree_add_const_value_attribute (dw_die_r
>  {
>tree init;
>tree type = TREE_TYPE (t);
> -  rtx rtl;
>  
>if (!t || !TREE_TYPE (t) || TREE_TYPE (t) == error_mark_node)
>  return false;
> @@ -20910,11 +20922,26 @@ tree_add_const_value_attribute (dw_die_r
> return true;
>   }
>  }
> -  /* Generate the RTL even if early_dwarf to force mangling of all refered to
> - symbols.  */
> -  rtl = rtl_for_decl_init (init, type);
> -  if (rtl && !early_dwarf)
> -return add_const_value_attribute (die, TYPE_MODE (type), rtl);
> +  if (!early_dwarf)
> +{
> +  rtx rtl = rtl_for_decl_init (init, type);
> +  if (rtl && !early_dwarf)

the !early_dwarf is redundant

Otherwise looks good to me, please give Jason the opportunity to comment
though.

Thanks,
Richard.

> + return add_const_value_attribute (die, TYPE_MODE (type), rtl);
> +}
> +  else
> +{
> +  /* For early_dwarf force mangling of all referenced symbols.  */
> +  tree initializer = init;
> +  STRIP_NOPS (initializer);
> +  /* rtl_for_decl_init punts on other aggregates, and complex values.  */
> +  if (AGGREGATE_TYPE_P (type)
> +   || (TREE_CODE (initializer) == VIEW_CONVERT_EXPR
> +   && AGGREGATE_TYPE_P (TREE_TYPE (TREE_OPERAND (initializer, 0
> +   || TREE_CODE (type) == COMPLEX_TYPE)
> + ;
> +  else if (initializer_constant_valid_p (initializer, type))
> + walk_tree (&initializer, mangle_referenced_decls, NULL, NULL);
> +}
>/* If the host and target are sane, try harder.  */
>if (CHAR_BIT == 8 && BITS_PER_UNIT == 8
>&& initializer_constant_valid_p (init, type))
> --- gcc/testsuite/g++.dg/debug/dwarf2/pr104407.C.jj   2022-02-07 
> 10:45:51.945041031 +0100
> +++ gcc/testsuite/g++.dg/debug/dwarf2/pr104407.C  2022-02-07 
> 10:45:09.834635340 +0100
> @@ -0,0 +1,12 @@
> +// PR debug/104407
> +// { dg-do compile { target c++17 } }
> +// { dg-options "-O1 -fcompare-debug" }
> +
> +struct A { int i; long j; int k : 2; char l; } a;
> +
> +auto [ aa, bb, cc, dd ] = a;
> +
> +namespace N
> +{
> +  auto & [ m, n, o, ppp ] = a;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)

[committed] libgomp: Fix segfault with posthumous orphan tasks [PR104385]

2022-02-08 Thread Jakub Jelinek via Gcc-patches

Hi!

The following patch fixes crashes with posthumous orphan tasks.
When a parent task finishes, gomp_clear_parent clears the parent
pointers of its children tasks present in the parent->children_queue.
But children that are still waiting for dependencies aren't in that
queue yet, they will be added there only when the sibling they are
waiting for exits.  Unfortunately we were adding those tasks into
the queues with the original task->parent which then causes crashes
because that task is gone and freed.  The following patch fixes that
by clearing the parent field when we schedule such task for running
by adding it into the queues and we know that the sibling task which
is about to finish has NULL parent.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to
trunk so far, queued for backporting.

2022-02-08  Jakub Jelinek  

PR libgomp/104385
* task.c (gomp_task_run_post_handle_dependers): If parent is NULL,
clear task->parent.
* testsuite/libgomp.c/pr104385.c: New test.

--- libgomp/task.c.jj   2022-01-11 23:11:23.914268738 +0100
+++ libgomp/task.c  2022-02-07 18:33:14.399083055 +0100
@@ -1248,6 +1248,8 @@ gomp_task_run_post_handle_dependers (str
}
}
}
+  else
+   task->parent = NULL;
   if (taskgroup)
{
  priority_queue_insert (PQ_TASKGROUP, &taskgroup->taskgroup_queue,
--- libgomp/testsuite/libgomp.c/pr104385.c.jj   2022-02-07 18:45:36.677690756 
+0100
+++ libgomp/testsuite/libgomp.c/pr104385.c  2022-02-07 18:45:31.005770166 
+0100
@@ -0,0 +1,26 @@
+/* PR libgomp/104385 */
+
+#include 
+
+int
+main ()
+{
+  int j = 0;
+  #pragma omp parallel shared(j) num_threads(2)
+  {
+#pragma omp barrier
+#pragma omp master
+#pragma omp task shared(j)
+{
+  #pragma omp task depend(out: j) shared(j)
+  {
+usleep (1);
+j = 1;
+  }
+
+  #pragma omp task depend(inout: j) shared(j)
+  j += 1;
+}
+  }
+  return j - 2;
+}

Jakub

Re: [PATCH] [vect] Add vect_recog_cond_expr_convert_pattern.

2022-02-08 Thread Richard Biener via Gcc-patches

On Mon, Jan 24, 2022 at 2:01 PM liuhongt via Gcc-patches
 wrote:
>
> The pattern converts (cond (cmp a b) (convert c) (convert d))
> to (convert (cond (cmp a b) c d)) when
> 1) types_match (c, d)
> 2) single_use for (convert c) and (convert d)
> 3) TYPE_PRECISION (TREE_TYPE (c)) == TYPE_PRECISION (TREE_TYPE (a))
> 4) INTEGERAL_TYPE_P (TREE_TYPE (c))
>
> The pattern can save packing of mask and data(partial for data, 2 vs
> 1).
>
> Bootstrapped and regtested for x86_64-pc-linux-gnu{-m32,}
> and x86_64-pc-linux-gnu{-m32\ -march=native,\ -march=native} on CLX.
> Ok for trunk?

Sorry for the slow response, see for some comments below

> gcc/ChangeLog:
>
> PR target/103771
> * match.pd (cond_expr_convert_p): New match.
> * tree-vect-patterns.cc (gimple_cond_expr_convert_p): Declare.
> (vect_recog_cond_expr_convert_pattern): New.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr103771-2.c: New test.
> ---
>  gcc/match.pd   |  8 +++
>  gcc/testsuite/gcc.target/i386/pr103771-2.c |  8 +++
>  gcc/tree-vect-patterns.cc  | 83 ++
>  3 files changed, 99 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103771-2.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index c68eed70a26..5808c4561ee 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -7647,3 +7647,11 @@ and,
> to the number of trailing zeroes.  */
>  (match (ctz_table_index @1 @2 @3)
>(rshift (mult (bit_and:c (negate @1) @1) INTEGER_CST@2) INTEGER_CST@3))
> +
> +(match (cond_expr_convert_p @0 @2 @3 @6)
> + (cond (simple_comparison@6 @0 @1) (convert@4 @2) (convert@5 @3))
> +  (if (types_match (TREE_TYPE (@2), TREE_TYPE (@3))

But in principle @2 or @3 could safely differ in sign, you'd then need to ensure
to insert sign conversions to @2/@3 to the signedness of @4/@5.

> +   && INTEGRAL_TYPE_P (type)
> +   && INTEGRAL_TYPE_P (TREE_TYPE (@2))
> +   && single_use (@4)
> +   && single_use (@5
> diff --git a/gcc/testsuite/gcc.target/i386/pr103771-2.c 
> b/gcc/testsuite/gcc.target/i386/pr103771-2.c
> new file mode 100644
> index 000..962a3a74ecf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103771-2.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=cascadelake -O3" } */
> +/* { dg-final { scan-assembler-not "kunpck" } } */
> +/* { dg-final { scan-assembler-not "kand" } } */
> +/* { dg-final { scan-assembler-not "kor" } } */
> +/* { dg-final { scan-assembler-not "kshift" } } */
> +
> +#include "pr103771.c"
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index bea04992160..cbdfa96789c 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -924,6 +924,88 @@ vect_reassociating_reduction_p (vec_info *vinfo,
>return true;
>  }
>
> +/* match.pd function to match
> +   (cond (cmp@3 a b) (convert@1 c) (convert@2 d))
> +   with conditions:
> +   1) there's single_use for both @1 and @2.
> +   2) c and d have same type.
> +   record a and c and d and @3.  */
> +
> +extern bool gimple_cond_expr_convert_p (tree, tree*, tree (*)(tree));
> +
> +/* Function vect_recog_cond_expr_convert
> +
> +   Try to find the following pattern:
> +
> +   TYPE1 A, B;
> +   TYPE2 C,D;
> +   TYPE3 E;
> +   TYPE3 op_true = (TYPE3)A;
> +   TYPE4 op_false = (TYPE3)B;
> +
> +   E = C cmp D ? op_true : op_false;
> +
> +   where
> +   TYPE_PRECISION (TYPE1) != TYPE_PRECISION (TYPE3);

you are not testing for this anywhere?

Btw, matching up the comments with the types is somewhat difficult,
maybe using TYPE_AB, TYPE_CD, TYPE_E instead of 1,2,3 will
make that easier ;)

> +   TYPE_PRECISION (TYPE1) == TYPE_PRECISION (TYPE2);
> +   single_use of op_true and op_false.
> +
> +   Input:
> +
> +   * STMT_VINFO: The stmt from which the pattern search begins.
> +   here it starts with E = c cmp D ? op_true : op_false;
> +
> +   Output:
> +
> +   TYPE1 E' = C cmp D ? A : B;
> +   TYPE3 E = (TYPE3) E';
> +
> +   * TYPE_OUT: The vector type of the output of this pattern.
> +
> +   * Return value: A new stmt that will be used to replace the sequence of
> +   stmts that constitute the pattern. In this case it will be:
> +   E = (TYPE3)E';
> +   E' = C cmp D ? A : B; is recorded in pattern definition statements;  */
> +
> +static gimple *
> +vect_recog_cond_expr_convert_pattern (vec_info *vinfo,
> + stmt_vec_info stmt_vinfo, tree 
> *type_out)
> +{
> +  gassign *last_stmt = dyn_cast  (stmt_vinfo->stmt);
> +  tree lhs, match[4], temp, type, new_lhs;
> +  gimple *cond_stmt;
> +  gimple *pattern_stmt;
> +
> +  if (!last_stmt)
> +return NULL;
> +
> +  lhs = gimple_assign_lhs (last_stmt);
> +
> +  /* Find E = C cmp D ? (TYPE3) A ? (TYPE3) B;
> + TYPE_PRECISION (A) == TYPE_PRECISION (C).  */
> +  if (!gimple_cond_expr_convert_p (lhs, &match[0], NULL)
> +  || (TYPE_PRECISION (TREE_TYPE (match[0]))
> + != TYPE_PRECISION (TREE_TYP

[committed][nvptx] Fix .local atomic regressions

2022-02-08 Thread Tom de Vries via Gcc-patches

Hi,

In PR target/104364, two problems were reported:
- in muniform-simt mode, an atom.cas insn is no longer executed in the
  "master lane" only.
- in msoft-stack mode, an __atomic_compare_exchange_n on stack memory is
  translated assuming it accesses local memory, while that's not the case.

Fix these by:
- ensuring that all insns with atomic attribute are also predicable, such
  that the validate_change in nvptx_reorg_uniform_simt will succeed, and
  asserting that it does, and
- guarding the local atomics implementation with a new function
  nvptx_mem_local_p that correctly handles msoft-stack.

Tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[nvptx] Fix .local atomic regressions

gcc/ChangeLog:

2022-02-04  Tom de Vries  

PR target/104364
* config/nvptx/nvptx-protos.h (nvptx_mem_local_p): Declare.
* config/nvptx/nvptx.cc (nvptx_reorg_uniform_simt): Assert that
change is validated.
(nvptx_mem_local_p): New function.
* config/nvptx/nvptx.md: Use nvptx_mem_local_p.
(define_c_enum "unspecv"): Add UNSPECV_CAS_LOCAL.
(define_insn "atomic_compare_and_swap_1_local"): New
non-atomic, non-predicable define_insn, factored out of ...
(define_insn "atomic_compare_and_swap_1"): ... here.
Make predicable again.
(define_expand "atomic_compare_and_swap"): Use
atomic_compare_and_swap_1_local.

gcc/testsuite/ChangeLog:

2022-02-04  Tom de Vries  

PR target/104364
* gcc.target/nvptx/softstack-2.c: New test.
* gcc.target/nvptx/uniform-simt-1.c: New test.

---
 gcc/config/nvptx/nvptx-protos.h |  1 +
 gcc/config/nvptx/nvptx.cc   | 25 +-
 gcc/config/nvptx/nvptx.md   | 63 +
 gcc/testsuite/gcc.target/nvptx/softstack-2.c| 11 +
 gcc/testsuite/gcc.target/nvptx/uniform-simt-1.c | 18 +++
 5 files changed, 87 insertions(+), 31 deletions(-)

diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h
index 3d6ad148cb4..a846e341917 100644
--- a/gcc/config/nvptx/nvptx-protos.h
+++ b/gcc/config/nvptx/nvptx-protos.h
@@ -59,5 +59,6 @@ extern const char *nvptx_output_simt_enter (rtx, rtx, rtx);
 extern const char *nvptx_output_simt_exit (rtx);
 extern const char *nvptx_output_red_partition (rtx, rtx);
 extern const char *nvptx_output_atomic_insn (const char *, rtx *, int, int);
+extern bool nvptx_mem_local_p (rtx);
 #endif
 #endif
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index b3bb97c3c14..2a694926b7a 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -3150,7 +3150,8 @@ nvptx_reorg_uniform_simt ()
   rtx pred = nvptx_get_unisimt_predicate ();
   pred = gen_rtx_NE (BImode, pred, const0_rtx);
   pat = gen_rtx_COND_EXEC (VOIDmode, pred, pat);
-  validate_change (insn, &PATTERN (insn), pat, false);
+  bool changed_p = validate_change (insn, &PATTERN (insn), pat, false);
+  gcc_assert (changed_p);
 }
 }
 
@@ -6894,6 +6895,28 @@ nvptx_libc_has_function (enum function_class fn_class, 
tree type)
   return default_libc_has_function (fn_class, type);
 }
 
+bool
+nvptx_mem_local_p (rtx mem)
+{
+  gcc_assert (GET_CODE (mem) == MEM);
+
+  struct address_info info;
+  decompose_mem_address (&info, mem);
+
+  if (info.base != NULL && REG_P (*info.base)
+  && REGNO_PTR_FRAME_P (REGNO (*info.base)))
+{
+  if (TARGET_SOFT_STACK)
+   {
+ /* Frame-related doesn't mean local.  */
+   }
+  else
+   return true;
+}
+
+  return false;
+}
+
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE nvptx_option_override
 
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 92768dd9e95..d64dbfd8b33 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -54,6 +54,7 @@ (define_c_enum "unspec" [
 (define_c_enum "unspecv" [
UNSPECV_LOCK
UNSPECV_CAS
+   UNSPECV_CAS_LOCAL
UNSPECV_XCHG
UNSPECV_BARSYNC
UNSPECV_WARPSYNC
@@ -1771,8 +1772,14 @@ (define_expand "atomic_compare_and_swap"
(match_operand:SI 7 "const_int_operand")]   ;; failure model
   ""
 {
-  emit_insn (gen_atomic_compare_and_swap_1
-(operands[1], operands[2], operands[3], operands[4], operands[6]));
+  if (nvptx_mem_local_p (operands[2]))
+emit_insn (gen_atomic_compare_and_swap_1_local
+   (operands[1], operands[2], operands[3], operands[4],
+operands[6]));
+  else
+emit_insn (gen_atomic_compare_and_swap_1
+   (operands[1], operands[2], operands[3], operands[4],
+operands[6]));
 
   rtx cond = gen_reg_rtx (BImode);
   emit_move_insn (cond, gen_rtx_EQ (BImode, operands[1], operands[3]));
@@ -1780,23 +1787,18 @@ (define_expand "atomic_compare_and_swap"
   DONE;
 })
 
-(define_insn "atomic_compare_and_swap_1"
+(define_insn "atomic_compare_and_swap_1_local"
   [(set (match_operand:SD

[committed][testsuite] Require c99_runtime to run builtin-sprintf.c

2022-02-08 Thread Tom de Vries via Gcc-patches

Hi,

On nvptx, I run into an execution failure in test-case
gcc.dg/tree-ssa/builtin-sprintf.c because the test-case uses the 'hh'
modifier.

The port uses newlib, which does by default not support that modifier.
There's a configure option --enable-newlib-io-c99-formats to enable this
support, but that requires alloca support, which nvptx doesn't have.

Fix this by requiring c99_runtime for running the test-case.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[testsuite] Require c99_runtime to run builtin-sprintf.c

gcc/testsuite/ChangeLog:

2022-02-07  Tom de Vries  

* gcc.dg/tree-ssa/builtin-sprintf.c: Require c99_runtime for
dg-do run.

---
 gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf.c 
b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf.c
index f90558e9b7e..9368a2e0e50 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf.c
@@ -3,7 +3,8 @@
constant folding.  With optimization enabled the test will fail to
link if any of the assertions fails.  Without optimization the test
aborts at runtime if any of the assertions fails.  */
-/* { dg-do run } */
+/* { dg-do run { target c99_runtime } } */
+/* { dg-do link { target { ! c99_runtime } } } */
 /* { dg-skip-if "not IEEE float layout" { "pdp11-*-*" } } */
 /* { dg-additional-options "-O2 -Wall -Wno-pedantic -fprintf-return-value" } */

[committed][testsuite] Require c99_runtime to run builtin-sprintf.c

2022-02-08 Thread Tom de Vries via Gcc-patches

Hi,

On nvptx, I run into an execution failure in test-case
gcc.dg/tree-ssa/builtin-sprintf.c because the test-case uses the 'hh'
modifier.

The port uses newlib, which does by default not support that modifier.
There's a configure option --enable-newlib-io-c99-formats to enable this
support, but that requires alloca support, which nvptx doesn't have.

Fix this by requiring c99_runtime for running the test-case.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[testsuite] Require c99_runtime to run builtin-sprintf.c

gcc/testsuite/ChangeLog:

2022-02-07  Tom de Vries  

* gcc.dg/tree-ssa/builtin-sprintf.c: Require c99_runtime for
dg-do run.

---
 gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf.c 
b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf.c
index f90558e9b7e..9368a2e0e50 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf.c
@@ -3,7 +3,8 @@
constant folding.  With optimization enabled the test will fail to
link if any of the assertions fails.  Without optimization the test
aborts at runtime if any of the assertions fails.  */
-/* { dg-do run } */
+/* { dg-do run { target c99_runtime } } */
+/* { dg-do link { target { ! c99_runtime } } } */
 /* { dg-skip-if "not IEEE float layout" { "pdp11-*-*" } } */
 /* { dg-additional-options "-O2 -Wall -Wno-pedantic -fprintf-return-value" } */

Re: [PATCH] doc: invoke: RISC-V: Clean up the -mstrict-align wording

2022-02-08 Thread Kito Cheng via Gcc-patches

LGTM, thanks :)

On Tue, Feb 8, 2022 at 12:30 PM Palmer Dabbelt  wrote:
>
> The polarity of do/do not was reversed for this option when compored to
> the rest of them.  This seems to have been copied from PowerPC, when the
> polarity of the arguments in the docs was reversed (presumably to match
> the default), but appears to have never made sense on RISC-V.
>
> gcc/ChangeLog
>
> * doc/invoke.texi (RISC-V -mstrict-align): Re-word the do/do not
> language.
>
> Signed-off-by: Palmer Dabbelt 
> ---
>  gcc/doc/invoke.texi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 0ebe538ccdc..5e8af05e359 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -27702,7 +27702,7 @@ integer load/stores only.
>  @item -mstrict-align
>  @itemx -mno-strict-align
>  @opindex mstrict-align
> -Do not or do generate unaligned memory accesses.  The default is set 
> depending
> +Do or do not generate unaligned memory accesses.  The default is set 
> depending
>  on whether the processor we are optimizing for supports fast unaligned access
>  or not.
>
> --
> 2.34.1
>

Re: [PATCH] doc: invoke: RISC-V: Clean up the -memit-attribute wording

2022-02-08 Thread Kito Cheng via Gcc-patches

Hi Palmer:

Seems like...you update the wrong part? you are updating
-mmcount-ra-address/-mno-mcount-ra-address rather than
-memit-attribute/ -mno-emit-attribute ?

On Tue, Feb 8, 2022 at 12:30 PM Palmer Dabbelt  wrote:
>
> The previous wording makes it sound like "do not emit" is a
> clarification of "emit", which is not accurate.  This cleans up the
> wording to match (most) of the other arguments.
>
> gcc/ChangeLog
>
> * doc/invoke.texi (RISC-V -memit-attribute): Re-word the do/do
> not language.
>
> Signed-off-by: Palmer Dabbelt 
> ---
>  gcc/doc/invoke.texi | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 8bd5293fce7..0ebe538ccdc 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -26294,12 +26294,11 @@ assembler and the linker alone without help from 
> the compiler.
>  @itemx -mno-mcount-ra-address
>  @opindex mmcount-ra-address
>  @opindex mno-mcount-ra-address
> -Emit (do not emit) code that allows @code{_mcount} to modify the
> -calling function's return address.  When enabled, this option extends
> -the usual @code{_mcount} interface with a new @var{ra-address}
> -parameter, which has type @code{intptr_t *} and is passed in register
> -@code{$12}.  @code{_mcount} can then modify the return address by
> -doing both of the following:
> +Do or do not emit code that allows @code{_mcount} to modify the calling
> +function's return address.  When enabled, this option extends the usual
> +@code{_mcount} interface with a new @var{ra-address} parameter, which has 
> type
> +@code{intptr_t *} and is passed in register @code{$12}.  @code{_mcount} can
> +then modify the return address by doing both of the following:
>  @itemize
>  @item
>  Returning the new address in register @code{$31}.
> --
> 2.34.1
>

Re: [PATCH] doc: invoke: RISC-V: Clean up the -mstrict-align wording

2022-02-08 Thread Andreas Schwab

On Feb 07 2022, Palmer Dabbelt wrote:

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 0ebe538ccdc..5e8af05e359 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -27702,7 +27702,7 @@ integer load/stores only.
>  @item -mstrict-align
>  @itemx -mno-strict-align
>  @opindex mstrict-align
> -Do not or do generate unaligned memory accesses.  The default is set 
> depending

I think the logic is that -mstrict-align is "do not" and
-mno-strict-align is "do".

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

Re: [PATCH 1/4][RFC] middle-end/90348 - add explicit birth

2022-02-08 Thread Eric Botcazou via Gcc-patches

> I don't think an option to go to pre-12 behavior is useful.  I'll
> postpone the series to stage1.

FWIW fine with me.

-- 
Eric Botcazou

Re: [Patch, fortran] PR37336 (Finalization) - [F03] Finish derived-type finalization

2022-02-08 Thread Paul Richard Thomas via Gcc-patches

Hi Harald,

Thanks for giving the patch a whirl.


> the parent components as an array. I strongly suspect that, from reading
> > 7.5.6.2 paragraphs 2 and 3 closely, that ifort has it right. However,
> this
> > is another issue to come back to in the future.
>
> Could you specify which version of Intel you tried?
>

ifort (IFORT) 2021.1 Beta 20201112

>
> Testcase finalize_38.f90 fails for me with ifort 2021.5.0 with:
>
> 131
>

That's the point where the interpretation of the standard diverges. Ifort
uses the scalar finalization for the parent component, whereas gfortran
uses the rank 1. Thus the final count is different by one. I have a version
of the patch, where gfortran behaves in the same way as ifort.


> This test also fails with crayftn 11 & 12 and nagfor 7.0,
> but in a different place.
>



>
> (Also finalize_45.f90 fails with that version with something that
> looks like memory corruption, but that might be just a compiler bug.)
>

I take it 'that version' is of ifort? Mine does the same. I suspect that it
is one of the perils of using pointer components in such circumstances! You
will notice that I had to nullify pointer components when doing the copy.

>
> Thanks,
> Harald
>

Could you use the attached version of finalize_38.f90 with crayftn and NAG?
All the stop statements are replaced with prints. Ifort gives:
 131   3   2
 132   0   4
 133   5   6 |   0   0
 141   4   3
 151   7   5
 152   3   0
 153   0   0 |   1   3
 161  13   9
 162  20   0
 163   0   0 |  10  20
 171  14  11

Best regards

Paul
! { dg-do run }
!
! Test finalization on intrinsic assignment (F2018 (7.5.6.3))
!
module testmode
  implicit none

  type :: simple
integer :: ind
  contains
final :: destructor1, destructor2
  end type simple

  type, extends(simple) :: complicated
real :: rind
  contains
final :: destructor3, destructor4
  end type complicated

  integer :: check_scalar
  integer :: check_array(4)
  real :: check_real
  real :: check_rarray(4)
  integer :: final_count = 0

contains

  subroutine destructor1(self)
type(simple), intent(inout) :: self
check_scalar = self%ind
check_array = 0
final_count = final_count + 1
  end subroutine destructor1

  subroutine destructor2(self)
type(simple), intent(inout) :: self(:)
check_scalar = 0
check_array(1:size(self, 1)) = self%ind
final_count = final_count + 1
  end subroutine destructor2

  subroutine destructor3(self)
type(complicated), intent(inout) :: self
check_real = self%rind
check_array = 0.0
final_count = final_count + 1
  end subroutine destructor3

  subroutine destructor4(self)
type(complicated), intent(inout) :: self(:)
check_real = 0.0
check_rarray(1:size(self, 1)) = self%rind
final_count = final_count + 1
  end subroutine destructor4

  function constructor1(ind) result(res)
type(simple), allocatable :: res
integer, intent(in) :: ind
allocate (res, source = simple (ind))
  end function constructor1

  function constructor2(ind, rind) result(res)
class(simple), allocatable :: res(:)
integer, intent(in) :: ind(:)
real, intent(in), optional :: rind(:)
type(complicated), allocatable :: src(:)
integer :: sz
integer :: i
if (present (rind)) then
  sz = min (size (ind, 1), size (rind, 1))
  src  = [(complicated (ind(i), rind(i)), i = 1, sz)]
  allocate (res, source = src)
else
  sz = size (ind, 1)
  allocate (res, source = [(simple (ind(i)), i = 1, sz)])
end if
  end function constructor2

  subroutine test (cnt, scalar, array, off, rind, rarray)
integer :: cnt
integer :: scalar
integer :: array(:)
integer :: off
real, optional :: rind
real, optional :: rarray(:)
if (final_count .ne. cnt)  print *, 1 + off, final_count, cnt
if (check_scalar .ne. scalar) print *, 2 + off, check_scalar, scalar
if (any (check_array(1:size (array, 1)) .ne. array)) print *,  3 + off, &
   check_array(1:size (array, 1)), "|", array
if (present (rind)) then
  if (check_real .ne. rind)  print *,  4+off, check_real, rind
end if
if (present (rarray)) then
  if (any (check_rarray(1:size (rarray, 1)) .ne. rarray)) print *,  5 + off, &
   check_rarray(1:size (rarray, 1)), "|", rarray
end if
  end subroutine test

end module testmode

program test_final
  use testmode
  implicit none

  type(simple), allocatable :: MyType, MyType2
  type(simple), allocatable :: MyTypeArray(:)
  type(simple) :: ThyType = simple(21), ThyType2 = simple(22)
  class(simple), allocatable :: MyClass
  class(simple), allocatable :: MyCl

Re: [PATCH] RISC-V: Add target machine headers as a dependency for riscv-sr.o

2022-02-08 Thread Maciej W. Rozycki

On Mon, 7 Feb 2022, Kito Cheng wrote:

> OK to trunk, thanks for fixing this issue, I hit that issue before but
> I didn't figure out what happened...since that issue will disappear
> when I clean build :p

 Committed to trunk now, thanks for you review.

 How about release branches?

  Maciej

Re: [PATCH v2] doc: RISC-V: Document the `-misa-spec=' option

2022-02-08 Thread Maciej W. Rozycki

On Mon, 7 Feb 2022, Palmer Dabbelt wrote:

> > >  Good point.  I have updated the text to your suggested wording, which is
> > > also what I would use if I were to propose it (modulo capitalisation).  I
> > > will commit the change as included here shortly then unless I hear an
> > > objection.
> 
> I wasn't sure about the capitalization either, but that's how it is on the
> website.  It's in a regular-looking sentence, not a title where I'd expect the
> extra capital, so I guess the whole spec name is a proper noun?

 That has been my understanding as well.

> I don't really care either way, just chiming in as I don't see this on trunk.
> The original message sounded to me like you intended on committing it, but
> just in case I misunderstood and you're waiting for me

 I wasn't waiting for you to ack, I just gave people at least one full 
working day to chime in, as not everyone works 24/7.

> Reviewed-by: Palmer Dabbelt 
> Acked-by: Palmer Dabbelt 

 I think we're not using such tags (maybe we should), so I haven't added 
them to the commit message at this time.

 I have committed this change now.  Thank you and Kito for your reviews.

  Maciej

Re: [PATCH] RISC-V/testsuite: Run target testing over all the usual optimization levels

2022-02-08 Thread Maciej W. Rozycki

On Tue, 8 Feb 2022, Kito Cheng wrote:

> Thanks for doing this, OK to trunk.

 I didn't realise we still have a separate ChangeLog for the testsuite and 
I have mechanically updated the entry accordingly.

 Change applied now with that update, thanks for your review.

  Maciej

[PATCH v4][GCC13] RISC-V: Provide `fmin'/`fmax' RTL patterns

2022-02-08 Thread Maciej W. Rozycki

As at r2.2 of the RISC-V ISA specification[1] (equivalent to version 2.0 
of the "F" and "D" standard architecture extensions for single-precision 
and double-precision floating-point respectively) the FMIN and FMAX 
machine instructions fully match our requirement for the `fminM3' and 
`fmaxM3' standard RTL patterns:

"For FMIN and FMAX, if at least one input is a signaling NaN, or if both 
inputs are quiet NaNs, the result is the canonical NaN.  If one operand 
is a quiet NaN and the other is not a NaN, the result is the non-NaN 
operand."

suitably for the IEEE 754-2008 `minNum' and `maxNum' operations.

However we only define `sminM3' and `smaxM3' standard RTL patterns to 
produce the FMIN and FMAX machine instructions, which in turn causes the 
`__builtin_fmin' and `__builtin_fmax' family of intrinsics to emit the 
corresponding libcalls rather than the relevant machine instructions.  
This is according to earlier revisions of the RISC-V ISA specification, 
which we however do not support anymore, as from commit 4b81528241ca 
("RISC-V: Support version controling for ISA standard extensions").

As from r20190608 of the RISC-V ISA specification (equivalent to version 
2.2 of the "F" and "D" standard ISA extensions for single-precision and 
double-precision floating-point respectively) the definition of the FMIN 
and FMAX machine instructions has been updated[2]:

"Defined the signed-zero behavior of FMIN.fmt and FMAX.fmt, and changed 
their behavior on signaling-NaN inputs to conform to the minimumNumber 
and maximumNumber operations in the proposed IEEE 754-201x 
specification."

and specifically[3]:

"Floating-point minimum-number and maximum-number instructions FMIN.S 
and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to 
rd.  For the purposes of these instructions only, the value -0.0 is 
considered to be less than the value +0.0.  If both inputs are NaNs, the 
result is the canonical NaN.  If only one operand is a NaN, the result 
is the non-NaN operand.  Signaling NaN inputs set the invalid operation 
exception flag, even when the result is not NaN."

Consequently for forwards compatibility with r20190608+ hardware we 
cannot use the FMIN and FMAX machine instructions unconditionally even 
where the ISA level of r2.2 has been specified with the `-misa-spec=2.2' 
option where operation would be different between ISA revisions, that 
is the handling of signaling NaN inputs.

Therefore provide new `fmin3' and `fmax3' patterns removing 
the need to emit libcalls with the `__builtin_fmin' and `__builtin_fmax' 
family of intrinsics, however limit them to where `-fno-signaling-nans' 
is in effect, deferring to other code generation strategies otherwise as 
applicable.  Use newly-defined UNSPECs as the operation codes so that 
the patterns are only ever used if referred to by their names, as there
is no RTL expression defined for the IEEE 754-2008 `minNum' and `maxNum' 
operations.

References:

[1] "The RISC-V Instruction Set Manual, Volume I: User-Level ISA",
Document Version 2.2, May 7, 2017, Section 8.3 "NaN Generation and 
Propagation", p. 48

[1] "The RISC-V Instruction Set Manual, Volume I: Unprivileged ISA",
Document Version 20190608-Base-Ratified, June 8, 2019, "Preface",
p. ii

[2] same, Section 11.6 "Single-Precision Floating-Point Computational
Instructions", p. 66

gcc/
* config/riscv/riscv.md (UNSPEC_FMIN, UNSPEC_FMAX): New
constants.
(fmin3, fmax3): New insns.

gcc/testsuite/
* gcc.target/riscv/fmax-snan.c: New test.
* gcc.target/riscv/fmax.c: New test.
* gcc.target/riscv/fmaxf-snan.c: New test.
* gcc.target/riscv/fmaxf.c: New test.
* gcc.target/riscv/fmin-snan.c: New test.
* gcc.target/riscv/fmin.c: New test.
* gcc.target/riscv/fminf-snan.c: New test.
* gcc.target/riscv/fminf.c: New test.
* gcc.target/riscv/smax-ieee.c: New test.
* gcc.target/riscv/smax.c: New test.
* gcc.target/riscv/smaxf-ieee.c: New test.
* gcc.target/riscv/smaxf.c: New test.
* gcc.target/riscv/smin-ieee.c: New test.
* gcc.target/riscv/smin.c: New test.
* gcc.target/riscv/sminf-ieee.c: New test.
* gcc.target/riscv/sminf.c: New test.
---
Hi,

 For the record here's v4 of this change as I observed an issue with the 
quotation of literal `.' in `scan-assembler' regexps.  As TCL strips one 
level of quotations before handing the string to the regexp interpreter 
any backslash characters need to be doubled to be received by the regexp 
parser.  The updated test cases have been verified to still pass.

  Maciej

Changes from v3:

- Quote `\' characters in `scan-assembler' for literal `.' in regexp.

- Mechanically update ChangeLog to have its entries separately for the 
  testsuite.

Changes from v2:

- Use UNSPECs for the `fmin'/`fmax' patterns, so that they cannot be 
  matched by the RTL expression.

- Revert the `HONOR

[committed][nvptx] Choose -mptx default based on -misa

2022-02-08 Thread Tom de Vries via Gcc-patches

Hi,

While testing with driver version 390.147 I ran into the problem that it
doesn't support ptx isa version 6.3 (the new default), only 6.1.

Furthermore, using the -mptx option is a bit user-unfriendly.

Say we want to compile for sm_80.  We can use -misa=sm_80 to specify that, but
then run into errors because the default ptx version is 6.3, which doesn't
support sm_80 yet.

Address both these issues by:
- picking a default -mptx based on the active -misa, and
- ensuring that the default -mptx is at least 6.0 (instead
  of 6.3).

Also add an error in case of incompatible options like
"-misa=sm_80 -mptx=6.3":
...
cc1: error: PTX version (-mptx) needs to be at least 7.0 to support \
  selected -misa (sm_80)
...

Tested on x86_64-linux with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[nvptx] Choose -mptx default based on -misa

gcc/ChangeLog:

2022-02-08  Tom de Vries  

PR target/104283
* config/nvptx/nvptx-opts.h (enum ptx_version): Add PTX_VERSION_3_0
and PTX_VERSION_4_2.
* config/nvptx/nvptx.cc (first_ptx_version_supporting_sm)
(default_ptx_version_option, ptx_version_to_string)
(sm_version_to_string, handle_ptx_version_option): New function.
(nvptx_option_override): Call handle_ptx_version_option.
(nvptx_file_start): Use ptx_version_to_string and sm_version_to_string.
* config/nvptx/nvptx.md (define_insn "nvptx_shuffle")
(define_insn "nvptx_vote_ballot"): Use TARGET_PTX_6_0.
* config/nvptx/nvptx.opt (mptx): Remove 'Init'.

---
 gcc/config/nvptx/nvptx-opts.h |   2 +
 gcc/config/nvptx/nvptx.cc | 133 +-
 gcc/config/nvptx/nvptx.md |   4 +-
 gcc/config/nvptx/nvptx.opt|   2 +-
 4 files changed, 122 insertions(+), 19 deletions(-)

diff --git a/gcc/config/nvptx/nvptx-opts.h b/gcc/config/nvptx/nvptx-opts.h
index c754a5193ce..cc488b23720 100644
--- a/gcc/config/nvptx/nvptx-opts.h
+++ b/gcc/config/nvptx/nvptx-opts.h
@@ -31,7 +31,9 @@ enum ptx_isa
 
 enum ptx_version
 {
+  PTX_VERSION_3_0,
   PTX_VERSION_3_1,
+  PTX_VERSION_4_2,
   PTX_VERSION_6_0,
   PTX_VERSION_6_3,
   PTX_VERSION_7_0
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 006fac8c839..1b0227a2c31 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -205,6 +205,109 @@ diagnose_openacc_conflict (bool optval, const char 
*optname)
 error ("option %s is not supported together with %<-fopenacc%>", optname);
 }
 
+static enum ptx_version
+first_ptx_version_supporting_sm (enum ptx_isa sm)
+{
+  switch (sm)
+{
+case PTX_ISA_SM30:
+  return PTX_VERSION_3_0;
+case PTX_ISA_SM35:
+  return PTX_VERSION_3_1;
+case PTX_ISA_SM53:
+  return PTX_VERSION_4_2;
+case PTX_ISA_SM75:
+  return PTX_VERSION_6_3;
+case PTX_ISA_SM80:
+  return PTX_VERSION_7_0;
+default:
+  gcc_unreachable ();
+}
+}
+
+static enum ptx_version
+default_ptx_version_option (void)
+{
+  enum ptx_version first
+= first_ptx_version_supporting_sm ((enum ptx_isa) ptx_isa_option);
+
+  /* Pick a version that supports the sm.  */
+  enum ptx_version res = first;
+
+  /* Pick at least 3.1.  This has been the smallest version historically.  */
+  res = MAX (res, PTX_VERSION_3_1);
+
+  /* Pick at least 6.0, to enable using bar.warp.sync to have a way to force
+ warp convergence.  */
+  res = MAX (res, PTX_VERSION_6_0);
+
+  /* Verify that we pick a version that supports the sm.  */
+  gcc_assert (first <= res);
+  return res;
+}
+
+static const char *
+ptx_version_to_string (enum ptx_version v)
+{
+  switch (v)
+{
+case PTX_VERSION_3_0:
+  return "3.0";
+case PTX_VERSION_3_1:
+  return "3.1";
+case PTX_VERSION_4_2:
+  return "4.2";
+case PTX_VERSION_6_0:
+  return "6.0";
+case PTX_VERSION_6_3:
+  return "6.3";
+case PTX_VERSION_7_0:
+  return "7.0";
+default:
+  gcc_unreachable ();
+}
+}
+
+static const char *
+sm_version_to_string (enum ptx_isa sm)
+{
+  switch (sm)
+{
+case PTX_ISA_SM30:
+  return "30";
+case PTX_ISA_SM35:
+  return "35";
+case PTX_ISA_SM53:
+  return "53";
+case PTX_ISA_SM70:
+  return "70";
+case PTX_ISA_SM75:
+  return "75";
+case PTX_ISA_SM80:
+  return "80";
+default:
+  gcc_unreachable ();
+}
+}
+
+static void
+handle_ptx_version_option (void)
+{
+  if (!OPTION_SET_P (ptx_version_option))
+{
+  ptx_version_option = default_ptx_version_option ();
+  return;
+}
+
+  enum ptx_version first
+= first_ptx_version_supporting_sm ((enum ptx_isa) ptx_isa_option);
+
+  if (ptx_version_option < first)
+error ("PTX version (-mptx) needs to be at least %s to support selected"
+  " -misa (sm_%s)", ptx_version_to_string (first),
+  sm_version_to_string ((enum ptx_isa)ptx_isa_option));
+}
+
 /* Implement TARGET_OPTION_OVERRIDE.  */
 
 static void
@@ -212,6 +315,8 @@ nvptx_opt

[PATCH] nvptx: Tweak constraints on copysign instructions.

2022-02-08 Thread Roger Sayle


Many thanks to Thomas Schwinge for confirming my hypothesis that the
register
usage regression, PR target/104345, is solely due to libgcc's _muldc3
function.
In addition to the isinf functionality in the previously proposed nvptx
patch at
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588453.html which
significantly reduces the number of instructions in _muldc3, the patch below
further reduces both the number of instructions and the number of explicitly
declared registers, by permitting floating point constant immediate operands
in nvptx's copysign instruction.

Fingers-crossed, the combination with all of the previous proposed nvptx
patches improves things.  Ultimately, increasing register usage from 50 to
51 registers, reducing the number of concurrent threads by ~2%, can easily
be countered if we're now executing significantly fewer instructions in each
kernel, for a net performance win.

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
with a "make" and "make -k check" with no new failures.  Ok for mainline?


2022-02-08  Roger Sayle  

gcc/ChangeLog
* config/nvptx/nvptx.md (copysign3): Allow immediate
floating point constants as operands 1 and/or 2.

Thanks in advance,
Roger
--

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 92768dd..0f34792 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1059,8 +1059,8 @@
 
 (define_insn "copysign3"
   [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
-   (unspec:SDFM [(match_operand:SDFM 1 "nvptx_register_operand" "R")
- (match_operand:SDFM 2 "nvptx_register_operand" "R")]
+   (unspec:SDFM [(match_operand:SDFM 1 "nvptx_nonmemory_operand" "RF")
+ (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")]
  UNSPEC_COPYSIGN))]
   ""
   "%.\\tcopysign%t0\\t%0, %2, %1;")

Re: [committed][nvptx] Choose -mptx default based on -misa

2022-02-08 Thread Tom de Vries via Gcc-patches


On 2/8/22 13:57, Tom de Vries via Gcc-patches wrote:

+static const char *
+sm_version_to_string (enum ptx_isa sm)
+{
+  switch (sm)
+{
+case PTX_ISA_SM30:
+  return "30";
+case PTX_ISA_SM35:
+  return "35";
+case PTX_ISA_SM53:
+  return "53";
+case PTX_ISA_SM70:
+  return "70";


This broke the nvptx build, PTX_ISA_SM70 is not defined yet, that was 
done in another patch, that I didn't commit yet.


I'm doing a rebuild with a patch to fix this, and will commit afterwards.

Sorry for the disturbance.

Thanks,
- Tom

Re: [committed][nvptx] Choose -mptx default based on -misa

2022-02-08 Thread Tobias Burnus


Hi Tom,

if I understand the patch correctly, -misa=sm_53 -mptx=3.1 will ...

On 08.02.22 13:57, Tom de Vries via Gcc-patches wrote:

Furthermore, using the -mptx option is a bit user-unfriendly.

Say we want to compile for sm_80.  We can use -misa=sm_80 to specify that, but
then run into errors because the default ptx version is 6.3, which doesn't
support sm_80 yet.
...
Also add an error in case of incompatible options like
"-misa=sm_80 -mptx=6.3":
...
cc1: error: PTX version (-mptx) needs to be at least 7.0 to support \
   selected -misa (sm_80)


... output that -mptx needs to be at least 4.1.

I think that's okay but -mptx only supports the values 3.1, 6.3, and 7.0.

As the default is ptx = 6.3, it only occurs when both are specified in
the way shown above. Thus, we can live with that. (Misleading message
for odd corner case, only. In particular, I am not sure we really want
to add another PTX version...)

(Which points to the question of supporting multilib configurations – as
discussed before:
https://gcc.gnu.org/pipermail/gcc/2022-February/238224.html )

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955

Re: [committed][nvptx] Choose -mptx default based on -misa

2022-02-08 Thread Tom de Vries via Gcc-patches


On 2/8/22 14:24, Tobias Burnus wrote:

Hi Tom,

if I understand the patch correctly, -misa=sm_53 -mptx=3.1 will ...

On 08.02.22 13:57, Tom de Vries via Gcc-patches wrote:

Furthermore, using the -mptx option is a bit user-unfriendly.

Say we want to compile for sm_80.  We can use -misa=sm_80 to specify 
that, but
then run into errors because the default ptx version is 6.3, which 
doesn't

support sm_80 yet.
...
Also add an error in case of incompatible options like
"-misa=sm_80 -mptx=6.3":
...
cc1: error: PTX version (-mptx) needs to be at least 7.0 to support \
   selected -misa (sm_80)


... output that -mptx needs to be at least 4.1.


Um, you mean 4.2 I suppose:
...
$ ./gcc.sh ~/hello.c -misa=sm_53 -mptx=3.1
cc1: error: PTX version (-mptx) needs to be at least 4.2 to support 
selected -misa (sm_53)

...
?


I think that's okay but -mptx only supports the values 3.1, 6.3, and 7.0.



I know.  I'm sort of hoping that the new default setting will make using 
-mptx unnecessary.



As the default is ptx = 6.3,


Well, that's no longer the case, that's one of the things that this 
patch changes.  It's by default the maximum of:

- 6.0, and
- the minimal ptx isa required by the sm version


it only occurs when both are specified in
the way shown above. Thus, we can live with that. (Misleading message
for odd corner case, only. In particular, I am not sure we really want
to add another PTX version...)



Agreed, it's misleading, but I'm hoping people will just specify the sm 
version.


Thanks,
- Tom

[committed] libstdc++: Fix filesystem::remove_all for Windows [PR104161]

2022-02-08 Thread Jonathan Wakely via Gcc-patches

On Sat, 5 Feb 2022 at 01:08, Jonathan Wakely wrote:
>
> On Fri, 4 Feb 2022 at 23:55, Jonathan Wakely wrote:
> > +// Used to implement filesystem::remove_all.
> > +fs::recursive_directory_iterator&
> > +fs::recursive_directory_iterator::__erase(error_code* ecptr)
> > +{
> > +  error_code ec;
> > +  if (!_M_dirs)
> > +{
> > +  ec = std::make_error_code(errc::invalid_argument);
> > +  return *this;
> > +}
> > +
> > +  // We never want to skip permission denied when removing files.
> > +  const bool skip_permission_denied = false;
> > +  // We never want to follow directory symlinks when removing files.
> > +  const bool nofollow = true;
> > +
> > +  // Loop until we find something we can remove.
> > +  while (!ec)
> > +{
> > +  auto& top = _M_dirs->top();
> > +
> > +  if (top.entry._M_type == file_type::directory)
> > +   {
> > + _Dir dir = top.open_subdir(skip_permission_denied, nofollow, ec);
> > + if (!ec)
> > +   {
> > + __glibcxx_assert(dir.dirp != nullptr);
> > + if (dir.advance(skip_permission_denied, ec))
> > +   {
> > + // Non-empty directory, recurse into it.
> > + _M_dirs->push(std::move(dir));
> > + continue;
> > +   }
> > + if (!ec)
> > +   {
> > + // Directory is empty so we can remove it.
> > + if (top.rmdir(ec))
> > +   break; // Success
> > +   }
> > +   }
> > +   }
> > +  else if (top.unlink(ec))
> > +   break; // Success
> > +  else if (top.entry._M_type == file_type::none)
> > +   {
> > + // We did not have a cached type, so it's possible that top.entry
> > + // is actually a directory, and that's why the unlink above 
> > failed.
> > +#ifdef EPERM
> > + // POSIX.1-2017 says unlinking a directory returns EPERM,
> > + // but LSB allows EISDIR too. Some targets don't even define 
> > EPERM.
> > + if (ec.value() == EPERM || ec.value() == EISDIR)
> > +#else
> > + if (ec.value() == EISDIR)
> > +#endif
>
> This doesn't work on Windows because the top.unlink(ec) sets a Windows
> error using the system category, so doesn't match the errno values
> here.
>
> I have a fix.
>
> >  std::uintmax_t
> >  fs::remove_all(const path& p)
> >  {
> > -  return fs::do_remove_all(p, ErrorReporter{"cannot remove all", p});
> > +  uintmax_t count = 0;
> > +  auto st = filesystem::status(p);
> > +  if (!exists(st))
> > +return 0;
> > +  if (is_directory(st))
>
> Gah, this remove_all(const path&) overload was supposed to be using
> the same logic as the one below with an error_code parameter.
>
> I'll fix it on Monday.

Here's that fix.
Tested x86_64-linux, powerpc-aix, x86_64-w64-mingw.
Pushed to trunk.
commit 5750952bec1e632d1f804f4a1bed2f74c0f3b189
Author: Jonathan Wakely 
Date:   Mon Feb 7 23:36:47 2022

libstdc++: Fix filesystem::remove_all for Windows [PR104161]

The recursive_directory_iterator::__erase member was failing for
Windows, because the entry._M_type value is always file_type::none
(because _Dir_base::advance doesn't populate it for Windows) and
top.unlink uses fs::remove which sets an error using the
system_category. That meant that ec.value() was a Windows error code and
not an errno value, so the comparisons to EPERM and EISDIR failed.
Instead of depending on a specific Windows error code for attempting to
remove a directory, just use directory_entry::refresh() to query the
type first. This doesn't avoid the TOCTTOU races with directory
symlinks, but we can't avoid them on Windows without openat and
unlinkat, and creating symlinks requires admin privs on Windows anyway.

This also fixes the fs::remove_all(const path&) overload, which was
supposed to use the same logic as the other overload, but I forgot to
change it before my previous commit.

libstdc++-v3/ChangeLog:

PR libstdc++/104161
* src/c++17/fs_dir.cc (fs::recursive_directory_iterator::__erase):
[i_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Refresh entry._M_type member,
instead of checking for errno values indicating a directory.
* src/c++17/fs_ops.cc (fs::remove_all(const path&)): Use similar
logic to non-throwing overload.
(fs::remove_all(const path&, error_code&)): Add comments.
* src/filesystem/ops-common.h: Likewise.

diff --git a/libstdc++-v3/src/c++17/fs_dir.cc b/libstdc++-v3/src/c++17/fs_dir.cc
index 01b8c0d5693..54f135d2baf 100644
--- a/libstdc++-v3/src/c++17/fs_dir.cc
+++ b/libstdc++-v3/src/c++17/fs_dir.cc
@@ -476,6 +476,16 @@ fs::recursive_directory_iterator::__erase(error_code* 
ecptr)
 {
   auto& top = _M_dirs->top();

+#if _GLIBCXX_FILESYSTEM_IS_WINDOWS
+  // _Dir::unlink uses fs::remove which uses std::system_category() for
+  // Windo

[committed] libstdc++: Adjust Filesystem TS test for Windows

2022-02-08 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux and x86_64-w64-mingw, pushed to trunk.


The Filesystem TS isn't really supported for Windows, but the FAIL for
this test is just because it doesn't match what happens on Windows.

libstdc++-v3/ChangeLog:

* testsuite/experimental/filesystem/operations/create_directories.cc:
Adjust expected results for Windows.
---
 .../filesystem/operations/create_directories.cc   | 15 +++
 1 file changed, 15 insertions(+)

diff --git 
a/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directories.cc
 
b/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directories.cc
index 8cdc7030441..03060c6cbb3 100644
--- 
a/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directories.cc
+++ 
b/libstdc++-v3/testsuite/experimental/filesystem/operations/create_directories.cc
@@ -108,8 +108,15 @@ test02()
 VERIFY( !result );
 VERIFY( ec == std::errc::not_a_directory );
 result = create_directories(file.path/"../bar", ec);
+#if defined(__MINGW32__) || defined(__MINGW64__)
+VERIFY( result );
+VERIFY( !ec );
+VERIFY( is_directory(dir.path/"bar") );
+remove(dir.path/"bar");
+#else
 VERIFY( !result );
 VERIFY( ec );
+#endif
   }
 }
 
@@ -120,11 +127,19 @@ test03()
   const auto p = __gnu_test::nonexistent_path() / "/";
   bool result = create_directories(p);
   VERIFY( result );
+#if defined(__MINGW32__) || defined(__MINGW64__)
+  VERIFY( exists(p/".") ); // needed due to PR libstdc++/1
+#else
   VERIFY( exists(p) );
+#endif
   remove(p);
   result = create_directories(p/"foo/");
   VERIFY( result );
+#if defined(__MINGW32__) || defined(__MINGW64__)
+  VERIFY( exists(p/".") ); // needed due to PR libstdc++/1
+#else
   VERIFY( exists(p) );
+#endif
   VERIFY( exists(p/"foo") );
   remove_all(p);
 }
-- 
2.34.1

Re: [PATCH 1/4][RFC] middle-end/90348 - add explicit birth

2022-02-08 Thread Michael Matz via Gcc-patches

Hello,

On Tue, 8 Feb 2022, Richard Biener wrote:

> int foo(int always_true_at_runtime)
> {
>   {
> int *p;
> if (always_true_at_runtime)
>   goto after;
> lab:
> return *p;
> after:
> int i = 0;
> p = &i;
> if (always_true_at_runtime)
>   goto lab;
>   }
>   return 0;
> }
> 
> For the implementation I considered starting lifetime at a DECL_EXPR
> if it exists and otherwise at the start of the BIND_EXPR.  Note the
> complication is that control flow has to reach the lifetime-start
> clobber before the first access.  But that might also save us here,
> since for the example above 'i' would be live via the backedge
> from goto lab.
> 
> That said, the existing complication is that when the gimplifier
> visits the BIND_EXPR it has no way to know whether there will be
> a following DECL_EXPR or not (and the gimplifier notes there are
> cases a DECL_EXPR variable is not in a BIND_EXPR).  My plan is to
> compute this during one of the body walks the gimplifier performs
> before gimplifying.
> 
> Another complication is that at least the C frontend + gimplifier
> combination for
> 
>   switch (i)
>   {
>   case 1:
> int i;
> break;
>   }
> 
> will end up having the BIND_EXPR mentioning 'i' containing the
> case label, so just placing the birth at the BIND will make it
> unreachable as
> 
>   i = BIRTH;
> case_label_for_1:
>   ...

I think anything that needs to happen (conceptually) during the jump from 
switch to case-label needs to happen right before the jump, that might 
mean creating a new fake BLOCK that contains just the jump and the 
associated births.

> conveniently at least the C frontend has a DECL_EXPR for 'i'
> which avoids this and I did not find a way (yet) in the gimplifier
> to rectify this when gimplifying switches.

In C a 'case' label is nothing else than a normal label, it doesn't start 
it's own block or the like.  So also for that one the births would need to 
be at the start of their containing blocks.

> So there's still work to do but I think that starting the lifetime at a 
> DECL_EXPR if it exists is the way to go?

Depends on where the DECL_EXPR is exactly placed, but probably it wouldn't 
be okay.  You can't place the birth at the fall-through path, because this 
needs to work (basically your jump example above rewritten as switch):

int state = 2, *p, camefrom1 = 0;
for (;;) switch (state) {
  case 1: 
  case 2: ;
int i;
if (state != 1) { p = &i; i = 0; }
if (state == 1) { (*p)++; return *p; }
state = 1;
continue;
}

Note how i is initialized during state 2, and needs to be kept initialized 
during state 1, so there must not be a CLOBBER (birth or other) at the 
point of the declaration of 'i'.  AFAICS in my simple tests a DECL_EXPR 
for 'i' is placed with the statement associated with 'case 2' label, 
putting a CLOBBER there would be the wrong thing.  If the decl had an 
initializer it might be harmless, as it would be overwritten at that 
place, but even so, in this case there is no initializer.  Hmm.

Another complication arises from simple forward jumps:

  goto forw;
  {
int i;
printf("not reachable\n");
  forw:
i = 1;
  }

Despite the jump skiping the unreachable head of the BLOCK, 'i' needs to 
be considered birthed at the label.  (In a way the placement of births 
exactly mirrors the placements of deaths (of course), every path from 
outside a BLOCK to inside needs to birth-clobber all variables (in C), 
like every path leaving needs to kill them.  It's just that we have a 
convenient construct for the latter (try-finally), but not for the former)

Ciao,
Michael.

[committed] libstdc++: Add comment to acinclude.m4

2022-02-08 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux, pushed to trunk.


libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_ENABLE_LOCK_POLICY): Add comment about
checking for CAS on correct word size.
---
 libstdc++-v3/acinclude.m4 | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 7cc52f4db96..f53461c85a5 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -3609,6 +3609,8 @@ AC_DEFUN([GLIBCXX_ENABLE_LOCK_POLICY], [
 ac_save_CXXFLAGS="$CXXFLAGS"
 
 dnl Why do we care about 2-byte CAS on targets with 4-byte _Atomic_word?!
+dnl Why don't we check 8-byte CAS for sparc64, where _Atomic_word is long?!
+dnl New targets should only check for CAS for the _Atomic_word type.
 AC_TRY_COMPILE([
 #if ! defined __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2
 # error "No 2-byte compare-and-swap"
-- 
2.34.1

Re: [PATCH] s390: Change costs for load on condition.

2022-02-08 Thread Robin Dapp via Gcc-patches

> Patch is ok. Thanks!

As discussed off-list, committed as attached.

Regards
 Robin
commit 1e3185e714e877b2b4d14ade0865322f71a8cbf6
Author: Robin Dapp 
Date:   Tue Feb 8 14:56:29 2022 +0100

s390: Increase costs for load on condition and change movqicc expander.

This patch changes the costs for a load on condition from 5 to 6 in
order to ensure that we only if-convert two and not three or more SETS like

if (cond)
{
  a = b;
  c = d;
  e = f;
}

In the movqicc expander we emit a paradoxical subreg directly that
combine would otherwise try to create by using a non-optimal sequence
(which would be too expensive).

Also, fix two oversights in ifcvt testcases.

gcc/ChangeLog:

* config/s390/s390.cc (s390_rtx_costs): Increase costs for load
on condition.
* config/s390/s390.md: Use paradoxical subreg.

gcc/testsuite/ChangeLog:

* gcc.target/s390/ifcvt-two-insns-int.c: Fix array size.
* gcc.target/s390/ifcvt-two-insns-long.c: Dito.

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index c6cfe41ad7b..d2af6d8813d 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -3636,7 +3636,7 @@ s390_rtx_costs (rtx x, machine_mode mode, int outer_code,
 
 	/* It is going to be a load/store on condition.  Make it
 	   slightly more expensive than a normal load.  */
-	*total = COSTS_N_INSNS (1) + 1;
+	*total = COSTS_N_INSNS (1) + 2;
 
 	rtx dst = SET_DEST (x);
 	rtx then = XEXP (SET_SRC (x), 1);
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index e3ccbac58c0..5eee8e86b42 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -7003,9 +7003,9 @@
   if (!CONSTANT_P (els))
 els = simplify_gen_subreg (E_SImode, els, mode, 0);
 
-  rtx tmp_target = gen_reg_rtx (E_SImode);
+  rtx tmp_target = simplify_gen_subreg (E_SImode, operands[0], mode, 0);
+
   emit_insn (gen_movsicc (tmp_target, operands[1], then, els));
-  emit_move_insn (operands[0], gen_lowpart (mode, tmp_target));
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c
index 1ced7107de0..031cc433f56 100644
--- a/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c
+++ b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-int.c
@@ -5,7 +5,6 @@
 
 /* { dg-final { scan-assembler "lochinle\t%r.?,1" } } */
 /* { dg-final { scan-assembler "locrnle\t.*" } } */
-#include 
 #include 
 #include 
 #include 
@@ -33,7 +32,7 @@ int main()
 {
   int a[] = {2, 1, -13, INT_MAX, INT_MIN, 0};
 
-  int res = foo (a, sizeof (a));
+  int res = foo (a, sizeof (a) / sizeof (a[0]));
 
   assert (res == (INT_MIN + 1));
 }
diff --git a/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c
index 5a1173fc6ef..cd04d2ad33e 100644
--- a/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c
+++ b/gcc/testsuite/gcc.target/s390/ifcvt-two-insns-long.c
@@ -5,7 +5,6 @@
 
 /* { dg-final { scan-assembler "locghinle\t%r.?,1" } } */
 /* { dg-final { scan-assembler "locgrnle\t.*" } } */
-#include 
 #include 
 #include 
 #include 
@@ -33,7 +32,7 @@ int main()
 {
   long a[] = {2, 1, -13, LONG_MAX, LONG_MIN, 0};
 
-  long res = foo (a, sizeof (a));
+  long res = foo (a, sizeof (a) / sizeof (a[0]));
 
   assert (res == (LONG_MIN + 1));
 }

[committed][nvptx] Unbreak build, add PTX_ISA_SM70

2022-02-08 Thread Tom de Vries via Gcc-patches

Hi,

With the commit "[nvptx] Choose -mptx default based on -misa" I introduced a
use of PTX_ISA_SM70, without adding it first.

Add it, as well as the corresponding TARGET_SM70.

Build for x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[nvptx] Unbreak build, add PTX_ISA_SM70

gcc/ChangeLog:

2022-02-08  Tom de Vries  

* config/nvptx/nvptx-opts.h (enum ptx_isa): Add PTX_ISA_SM70.
* config/nvptx/nvptx.h (TARGET_SM70): Define.

---
 gcc/config/nvptx/nvptx-opts.h | 1 +
 gcc/config/nvptx/nvptx.h  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/gcc/config/nvptx/nvptx-opts.h b/gcc/config/nvptx/nvptx-opts.h
index cc488b23720..e918d43ea16 100644
--- a/gcc/config/nvptx/nvptx-opts.h
+++ b/gcc/config/nvptx/nvptx-opts.h
@@ -25,6 +25,7 @@ enum ptx_isa
   PTX_ISA_SM30,
   PTX_ISA_SM35,
   PTX_ISA_SM53,
+  PTX_ISA_SM70,
   PTX_ISA_SM75,
   PTX_ISA_SM80
 };
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index 065d7aa210c..edffd088b15 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -88,6 +88,7 @@
 
 #define TARGET_SM35 (ptx_isa_option >= PTX_ISA_SM35)
 #define TARGET_SM53 (ptx_isa_option >= PTX_ISA_SM53)
+#define TARGET_SM70 (ptx_isa_option >= PTX_ISA_SM70)
 #define TARGET_SM75 (ptx_isa_option >= PTX_ISA_SM75)
 #define TARGET_SM80 (ptx_isa_option >= PTX_ISA_SM80)

Re: [PATCH 3/4] tree-optimization/104288 - Update non-null interface to provide better precision.

2022-02-08 Thread Andrew MacLeod via Gcc-patches


On 2/8/22 03:25, Richard Biener wrote:

On Mon, Feb 7, 2022 at 3:33 PM Andrew MacLeod via Gcc-patches
 wrote:

This changes the non-null interface to provide more granularity and
facilitate more precise queries.

Previously, we could only ask if NAME was nonnull anywhere in the block,
and it applied to the entire block.  now we can ask:
bool always_nonnull_p (tree name, basic_block bb);// all uses are
nonnull in the block
bool contains_nonnull_p (tree name, basic_block bb); // there is a
nonnull use somewhere in the block.

How is the latter useful?  To me it seems that

   bool nonnull_at_p (tree name, gimple *stmt);
// 'name' is nonnull at the execution of STMT

would be (or nonnull_from_p with starting with STMT), nonnull_from_p could
be overloaded with basic_block even.

I see the patch uses contains_nonnull_p for range_on_exit.  In the previous
patch I commented that for

Huh, I dont see that comment.  but nonetheless...

   :
  *p = foo (*q);  // fallthru + EH

on the fallthru edge we know p and q are not NULL but on the EH edge
we only know q is not NULL.  [with -fnon-call-exceptions we have a
representational issue since the *q load might trap but we cannot currenty
separate that EH on GIMPLE]


we bail all over the place if cfun->can_throw_non_call_exceptions is 
true anyway.





So given that wouldn't we instead need range_on_edge () instead of
range_on_exit?  At least contains_nonnull_p cannot be used for
range_on_exit in case we derive nonnull from *p?




Btw, for -fnon-call-exceptions all ranges derived from the possibly throwing
stmt are not realized on the exceptional path.  Like for

  :
 _1 = _3 / _5;  // fallthru + EH

on the fallthru edge we have _5 ~[0,0] but on the EH edge we do not
(in fact there we have _5 equals to [0,0] in this specific case).

Short-term it might be neccessary to not derive ranges from a
stmt that throws internally but it would be nice to sort those issues
out?


yes.

So, range_on_exit is only ever called from range_on_edge. You are 
suggesting that I move the adjustment from range_on_exit into 
range_on_edge and guarded with

!(e->flags & EDGE_EH)
Thats pretty straightforward.


Andrew

Re: [PATCH 1/4][RFC] middle-end/90348 - add explicit birth

2022-02-08 Thread Richard Biener via Gcc-patches

On Tue, 8 Feb 2022, Michael Matz wrote:

> Hello,
> 
> On Tue, 8 Feb 2022, Richard Biener wrote:
> 
> > int foo(int always_true_at_runtime)
> > {
> >   {
> > int *p;
> > if (always_true_at_runtime)
> >   goto after;
> > lab:
> > return *p;
> > after:
> > int i = 0;
> > p = &i;
> > if (always_true_at_runtime)
> >   goto lab;
> >   }
> >   return 0;
> > }
> > 
> > For the implementation I considered starting lifetime at a DECL_EXPR
> > if it exists and otherwise at the start of the BIND_EXPR.  Note the
> > complication is that control flow has to reach the lifetime-start
> > clobber before the first access.  But that might also save us here,
> > since for the example above 'i' would be live via the backedge
> > from goto lab.
> > 
> > That said, the existing complication is that when the gimplifier
> > visits the BIND_EXPR it has no way to know whether there will be
> > a following DECL_EXPR or not (and the gimplifier notes there are
> > cases a DECL_EXPR variable is not in a BIND_EXPR).  My plan is to
> > compute this during one of the body walks the gimplifier performs
> > before gimplifying.
> > 
> > Another complication is that at least the C frontend + gimplifier
> > combination for
> > 
> >   switch (i)
> >   {
> >   case 1:
> > int i;
> > break;
> >   }
> > 
> > will end up having the BIND_EXPR mentioning 'i' containing the
> > case label, so just placing the birth at the BIND will make it
> > unreachable as
> > 
> >   i = BIRTH;
> > case_label_for_1:
> >   ...
> 
> I think anything that needs to happen (conceptually) during the jump from 
> switch to case-label needs to happen right before the jump, that might 
> mean creating a new fake BLOCK that contains just the jump and the 
> associated births.
> 
> > conveniently at least the C frontend has a DECL_EXPR for 'i'
> > which avoids this and I did not find a way (yet) in the gimplifier
> > to rectify this when gimplifying switches.
> 
> In C a 'case' label is nothing else than a normal label, it doesn't start 
> it's own block or the like.  So also for that one the births would need to 
> be at the start of their containing blocks.
> 
> > So there's still work to do but I think that starting the lifetime at a 
> > DECL_EXPR if it exists is the way to go?
> 
> Depends on where the DECL_EXPR is exactly placed, but probably it wouldn't 
> be okay.  You can't place the birth at the fall-through path, because this 
> needs to work (basically your jump example above rewritten as switch):
> 
> int state = 2, *p, camefrom1 = 0;
> for (;;) switch (state) {
>   case 1: 
>   case 2: ;
> int i;
> if (state != 1) { p = &i; i = 0; }
> if (state == 1) { (*p)++; return *p; }
> state = 1;
> continue;
> }
> 
> Note how i is initialized during state 2, and needs to be kept initialized 
> during state 1, so there must not be a CLOBBER (birth or other) at the 
> point of the declaration of 'i'.  AFAICS in my simple tests a DECL_EXPR 
> for 'i' is placed with the statement associated with 'case 2' label, 
> putting a CLOBBER there would be the wrong thing.  If the decl had an 
> initializer it might be harmless, as it would be overwritten at that 
> place, but even so, in this case there is no initializer.  Hmm.

You get after gimplification:

  state = 2;
  camefrom1 = 0;
  :
  switch (state) , case 1: , case 2: >
  {
int i;

try
  {
i = {CLOBBER(birth)};  /// ignore, should go away
:
:
i = {CLOBBER(birth)};
if (state != 1) goto ; else goto ;
:
p = &i;
i = 0;
:
if (state == 1) goto ; else goto ;
:
_1 = *p;
_2 = _1 + 1;
*p = _2;
D.1995 = *p;
// predicted unlikely by early return (on trees) predictor.
return D.1995;
:
state = 1;
// predicted unlikely by continue predictor.
goto ;
  }
finally
  {
i = {CLOBBER(eol)};
  }
  }

which I think is OK?  That is, when the abstract machine
arrives at 'int i;' then the previous content in 'i' goes
away?  Or would

int foo()
{
   goto ick;
use:
   int i, *p;
   return *p;
ick:
   i = 1;
   p = &i;
   goto use;

}

require us to return 1?  With the current patch 'i' is clobbered
before the return.

> Another complication arises from simple forward jumps:
> 
>   goto forw;
>   {
> int i;
> printf("not reachable\n");
>   forw:
> i = 1;
>   }
> 
> Despite the jump skiping the unreachable head of the BLOCK, 'i' needs to 
> be considered birthed at the label.  (In a way the placement of births 
> exactly mirrors the placements of deaths (of course), every path from 
> outside a BLOCK to inside needs to birth-clobber all variables (in C), 
> like every path leaving needs to kill them.  It's just that we have a 
> convenient construct for the latter (try-finally), but not for the former)

The situation with an unreachable birth is handled conservatively
since we consider

Fwd: [PATCH 2/4] tree-optimization/104288 - Register side effects during ranger vrp domwalk.

2022-02-08 Thread Andrew MacLeod via Gcc-patches

bah.. wrong reply function

 Forwarded Message 
Subject: 	Re: [PATCH 2/4] tree-optimization/104288 - Register side 
effects during ranger vrp domwalk.

Date:   Tue, 8 Feb 2022 09:52:49 -0500
From:   Andrew MacLeod 
To: Richard Biener 

On 2/8/22 03:14, Richard Biener wrote:

On Mon, Feb 7, 2022 at 3:32 PM Andrew MacLeod via Gcc-patches
 wrote:

This patch adds the ability to register side effects within a block to
ranger. This is currently only for tracking non-null values.

the rvrp_folder performs a DOM walk and then attempts to simplify and
the fold each stmt during the walk. This patch adds a call to
register_side_effects() to record any effects the stmt might have before
returning.

This checks if the non-null property is set for any ssa-names on the
statement, and if they are, moves the state to only non-null for the
block... This allows us to adjust the property within the block as we
do the DOM walk. All further queries within the block will then come
back as non-null. ALl other queries will be from outside the block, so
they will see the same results anyway.

Unfortunately, none of the non-null routines in gimple.cc appear to work
"in bulk", but rather, on a specific tree-ssa operand request at a
time. For this patch, I need to scan the entire statement looking for
any operand that is nonnull (or the performance impact is awful).

In tree.cc we have get_nonnull_args which you'd pass the
gimple_call_fntype, its expense is allocating a bitmap (if any arg is 
nonnull)

which you'd then need to iterate over and free. I think that's an OK
overhead for not duplicating the walk?

I will experiment.

It would be nice to be able to pass an optional bitmap into that 
routine Since I have bitmap obstacks and temporary bitmaps laying 
around.

Would you be OK if I added an optional bitmap pointer to that? no allocs 
would be handy I suspect.

I took the code in the various infer_nonnull routines, and packed it all
together into the two routines block_apply_nonnull &&
infer_nonnull_call_attributes, which basically perform the same
functionality as infer_nonnull_range_by_dereference and
infer_nonnull_range_by_attribute, only on all the operands at once. I
think its right, but you may want to have a look. I intend to leverage
this code in GCC13 for a more generalized range_after_stmt mechanism
that will do away with the immediate-use visits of the current non-null

+void
+non_null_ref::block_apply_nonnull (gimple *s)
+{
+ if (!flag_delete_null_pointer_checks)
+ return;
+ if (is_a (s))
+ return;
+ if (gimple_code (s) == GIMPLE_ASM || gimple_clobber_p (s))
+ return;
+ if (is_a (s))
+ {
+ infer_nonnull_call_attributes (as_a (s));
+ return;
+ }
+ walk_stmt_load_store_ops (s, (void *)this, non_null_loadstore,
+ non_null_loadstore);

Both ASM and calls can have pointer dereferences embedded so
I wonder whether you should do the walk_stmt_load_store_ops
anyway. Note that any exceptional control flow (EH or abnormal)

for ASM I just did what infer_nonnull_range_by_dereference does:

  if (!flag_delete_null_pointer_checks
  || !POINTER_TYPE_P (TREE_TYPE (op))
  || gimple_code (stmt) == GIMPLE_ASM
  || gimple_clobber_p (stmt))
    return false;

if (walk_stmt_load_store_ops (stmt, (void *)op,
    check_loadstore, check_loadstore))

ASMs are also skipped in infer_nonnull_range_by_attribute(), and the 
rest  of that routine only deals with calls.

The calls.. yes, perhaps, it shgould also fall through.   I was thinking 
that calls were handled by the attribute, but with the || condition it 
could be looking for  load/stores too.

will not have realized the outputs on a stmt (LHS of call, outputs
of asms), so nonnull carries over only across fallthru or conditional
edges.

I suppose realizing the LHS could be postponed for GCC 13, for
calls you could still walk loads so you get the non-nullness of
aggregate arguments like foo (*p) after the stmt.

But you can of course do all this for GCC 13 only.

The LHS should get covered via  the gimple-range-fold mechanism that 
produces the range for the LHS. fold_using_range::range_of_call() 
handles the calls already:

else if (gimple_call_nonnull_result_p (call)
   || gimple_call_nonnull_arg (call))
    r = range_nonzero (type);

DO we handle asms differently somewhere else? the infer() code seems to 
skip them completely.

Andrew

Re: [PATCH 5/4] tree-optimization/104288 - An alternative approach

2022-02-08 Thread Andrew MacLeod via Gcc-patches


On 2/8/22 03:32, Richard Biener wrote:

On Tue, Feb 8, 2022 at 2:33 AM Andrew MacLeod via Gcc-patches
 wrote:

On 2/7/22 09:29, Andrew MacLeod wrote:

I have a proposal for PR 104288.

ALL patches (in sequence) bootstrap on their own and each cause no
regressions.

I've been continuing to work with this towards the GCC13 solution for
statement side effects.  And There is another possibility we could
pursue which is less intrusive

I adapted the portions of patch 2/4 which process nonnull, but changes
it from being in the nonnull class to being in the cache.

THe main difference is it continues to use a single bit, just changing
all uses of it to *always* mean its non-null on exit to the block.

Range-on-entry is changed to only check dominator blocks, and
range_of_expr doesn't check the bit at all.. in both ranger and the cache.

When we are doing the block walk and process side effects, the on-entry
*cache* range for the block is adjusted to be non-null instead.   The
statements which precede this stmt have already been processed, and all
remaining statements in this block will now pick up this new non-value
from the on-entry cache.  This should be perfectly safe.

The final tweak is when range_of_expr is looking the def block, it
normally does not have an on entry cache value.  (the def is in the
block, so there is no entry cache value).  It now checks to see if we
have set one, which can only happen when we have been doing the
statement walk and set the on-entry cache with  a non-null value.  This
allows us to pick up the non-null range in the def block too... (once we
have passed a statement which set nonnull of course).

THis patch provides much less change to trunk, and is probably a better
approach as its closer to what is going to happen in GCC13.

Im still working with it, so the patch isnt fully refined yet... but it
bootstraps and passes all test with no regressions.  And passes the new
tests too.   I figured I would mention this before you look to hard at
the other patchset.the GCC11 patch doesn't change.

Let me know if you prefer this I think I do :-)  less churn, and
closer to end state.

Yes, I very much prefer this - some comments to the other patches
still apply to this one.  Like using get_nonnull_args and probably
adding a bail-out to computing ranges from stmts that can throw
internally or have abnormal control flow (to not get into range-on-exit
vs. normal vs. exceptional or abnormal edges).

Richard.


Yeah, I like this one way better too.  I will address and repost

Andrew

Re: [PATCH 3/4] tree-optimization/104288 - Update non-null interface to provide better precision.

2022-02-08 Thread Richard Biener via Gcc-patches

On Tue, Feb 8, 2022 at 3:53 PM Andrew MacLeod  wrote:
>
> On 2/8/22 03:25, Richard Biener wrote:
> > On Mon, Feb 7, 2022 at 3:33 PM Andrew MacLeod via Gcc-patches
> >  wrote:
> >> This changes the non-null interface to provide more granularity and
> >> facilitate more precise queries.
> >>
> >> Previously, we could only ask if NAME was nonnull anywhere in the block,
> >> and it applied to the entire block.  now we can ask:
> >> bool always_nonnull_p (tree name, basic_block bb);// all uses are
> >> nonnull in the block
> >> bool contains_nonnull_p (tree name, basic_block bb); // there is a
> >> nonnull use somewhere in the block.
> > How is the latter useful?  To me it seems that
> >
> >bool nonnull_at_p (tree name, gimple *stmt);
> > // 'name' is nonnull at the execution of STMT
> >
> > would be (or nonnull_from_p with starting with STMT), nonnull_from_p could
> > be overloaded with basic_block even.
> >
> > I see the patch uses contains_nonnull_p for range_on_exit.  In the previous
> > patch I commented that for
> Huh, I dont see that comment.  but nonetheless...
> >:
> >   *p = foo (*q);  // fallthru + EH
> >
> > on the fallthru edge we know p and q are not NULL but on the EH edge
> > we only know q is not NULL.  [with -fnon-call-exceptions we have a
> > representational issue since the *q load might trap but we cannot currenty
> > separate that EH on GIMPLE]
>
> we bail all over the place if cfun->can_throw_non_call_exceptions is
> true anyway.
>
>
> >
> > So given that wouldn't we instead need range_on_edge () instead of
> > range_on_exit?  At least contains_nonnull_p cannot be used for
> > range_on_exit in case we derive nonnull from *p?
>
> >
> > Btw, for -fnon-call-exceptions all ranges derived from the possibly throwing
> > stmt are not realized on the exceptional path.  Like for
> >
> >   :
> >  _1 = _3 / _5;  // fallthru + EH
> >
> > on the fallthru edge we have _5 ~[0,0] but on the EH edge we do not
> > (in fact there we have _5 equals to [0,0] in this specific case).
> >
> > Short-term it might be neccessary to not derive ranges from a
> > stmt that throws internally but it would be nice to sort those issues
> > out?
> >
> yes.
>
> So, range_on_exit is only ever called from range_on_edge. You are
> suggesting that I move the adjustment from range_on_exit into
> range_on_edge and guarded with
> !(e->flags & EDGE_EH)

and EDGE_ABNORMAL

> Thats pretty straightforward.

So yes, if you use nonnull from the dominator for EH/ABNORMAL edges
and the anywhere-in-BB nonnull from the fallthru then that would work.

But I guess it's easier to exchange the cfun->can_throw_non_call_exceptions
checks with stmt_throws_internal () so you don't get ranges from stmts
that eventually cause us to go the abnormal/EH edge, then it's valid to
use the "rest" of the ranges defined in e->src on those edges as well.

>
>
> Andrew
>
>

Re: [PATCH 1/4][RFC] middle-end/90348 - add explicit birth

2022-02-08 Thread Michael Matz via Gcc-patches

Hello,

On Tue, 8 Feb 2022, Richard Biener wrote:

> > int state = 2, *p, camefrom1 = 0;
> > for (;;) switch (state) {
> >   case 1: 
> >   case 2: ;
> > int i;
> > if (state != 1) { p = &i; i = 0; }
> > if (state == 1) { (*p)++; return *p; }
> > state = 1;
> > continue;
> > }
> > 
> > Note how i is initialized during state 2, and needs to be kept initialized 
> > during state 1, so there must not be a CLOBBER (birth or other) at the 
> > point of the declaration of 'i'.  AFAICS in my simple tests a DECL_EXPR 
> > for 'i' is placed with the statement associated with 'case 2' label, 
> > putting a CLOBBER there would be the wrong thing.  If the decl had an 
> > initializer it might be harmless, as it would be overwritten at that 
> > place, but even so, in this case there is no initializer.  Hmm.
> 
> You get after gimplification:
> 
>   state = 2;
>   camefrom1 = 0;
>   :
>   switch (state) , case 1: , case 2: >
>   {
> int i;
> 
> try
>   {
> i = {CLOBBER(birth)};  /// ignore, should go away
> :
> :
> i = {CLOBBER(birth)};

I think this clobber here would be the problem, because ...

> which I think is OK?  That is, when the abstract machine
> arrives at 'int i;' then the previous content in 'i' goes
> away?  Or would
> 
> int foo()
> {
>goto ick;
> use:
>int i, *p;
>return *p;
> ick:
>i = 1;
>p = &i;
>goto use;
> 
> }
> 
> require us to return 1?

... I think this is exactly the logical consequence of lifetime of 'i' 
being the whole block.  We need to return 1. (Joseph: correct me on that! 
:) )  That's what I was trying to get at with my switch example as well.

> With the current patch 'i' is clobbered before the return.
> 
> > Another complication arises from simple forward jumps:
> > 
> >   goto forw;
> >   {
> > int i;
> > printf("not reachable\n");
> >   forw:
> > i = 1;
> >   }
> > 
> > Despite the jump skiping the unreachable head of the BLOCK, 'i' needs to 
> > be considered birthed at the label.  (In a way the placement of births 
> > exactly mirrors the placements of deaths (of course), every path from 
> > outside a BLOCK to inside needs to birth-clobber all variables (in C), 
> > like every path leaving needs to kill them.  It's just that we have a 
> > convenient construct for the latter (try-finally), but not for the former)
> 
> The situation with an unreachable birth is handled conservatively
> since we consider a variable without a (visible at RTL expansion time)
> birth to conflict with all other variables.

That breaks down when a birth is there (because it was otherwise 
reachable) but not on the taken path:

  if (nevertrue)
goto start;
  goto forw;
  start:
  {
int i;
printf("not really reachable, but unknowingly so\n");
  forw:
i = 1;
  }

> I don't see a good way to have a birth magically appear at 'forw' 
> without trying to argue that the following stmt is the first mentioning 
> the variable.

That's what my 'Hmm' aluded to :)  The only correct and precise way I see 
is to implement something similar like try-finally topside-down.  An 
easier but less precise way is to place the births at the (start of) 
innermost block containing the decl _and all jumps into the block_.  Even 
less presice, but perhaps even easier is to place the births for decls in 
blocks with side-entries into the function scope (and for blocks without 
side entries at their start).

Except for switches side-entries are really really seldom, so we might 
usefully get away with that latter solution.  And for switches it might be 
okay to put the births at the block containing the switch (if it itself 
doesn't have side entries, and the switch block doesn't have side 
entries except the case labels).

If the birth is moved to outward blocks it might be best if also the 
dealloc/death clobbers are moved to it, otherwise there might be paths 
containing a birth but no death.

The less precise you get with those births the more non-sharing you'll 
get, but that's the price.

Ciao,
Michael.

[Patch]middle-end: updating the reg use in exit block for -fzero-call-used-regs [PR100775]

2022-02-08 Thread Qing Zhao via Gcc-patches

Hi, Richard,

Could you please review this patch? This is a fix to the previous 
-fzero-call-used-regs implementation. 

PR 100775 ( ICE: in df_exit_block_bitmap_verify, at df-scan.c:4164 with -mthumb 
-fzero-call-used-regs=used)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100775

Although the ICE only happens on arm, but this is a bug in the middle end. So, 
I think this bug has higher priority, 
Need to be included into gcc12, and also need to be back ported to gcc11. 

In the pass_zero_call_used_regs, when updating dataflow info after adding
the register zeroing sequence in the epilogue of the function, we should
call "df_update_exit_block_uses" to update the register use information in
the exit block to include all the registers that have been zeroed.

The change has been bootstrapped and reg-tested on both x86 and aarch64 (with 
-enable-checking=yes,rtl,df). 
Since I cannot find an arm machine,  no bootstrap and reg-tested on arm yet.

For the arm failure, I just tested it with the cross build and it has no issue 
withe the fix.

(One question here:
Previously, I though “df_set_bb_dirty (EXIT_BLOCK_PTR_FOR_FN (cfun))” and a 
later “df_analyze()” should rescan 
the changed exit block of the function, and update all the df info 
automatically, it apparently not the case, the register
use info at exit block is not automatically updated, we have to add an 
explicitly call to “df_update_exit_block_uses”.
I checked the pass_thread_prologue_and_epilogue, looks like it also explicitly 
calls “df_update_entry_exit_and_calls” 
to update the register use info.
Shall the “df_set_bb_dirty” + “df_analyze” automatically update the reg use 
info of the dirty block?).

Let me know whether there is any issue with the fix?

Thanks

Qing

===

From e1cca5659c85e7c536f5016a2c75c615e65dba75 Mon Sep 17 00:00:00 2001
From: Qing Zhao 
Date: Fri, 28 Jan 2022 16:29:51 +
Subject: [PATCH] middle-end: updating the reg use in exit block for
-fzero-call-used-regs [PR100775]

In the pass_zero_call_used_regs, when updating dataflow info after adding
the register zeroing sequence in the epilogue of the function, we should
call "df_update_exit_block_uses" to update the register use information in
the exit block to include all the registers that have been zeroed.

2022-01-27  Qing Zhao  

gcc/ChangeLog:

* function.cc (gen_call_used_regs_seq): Call
df_update_exit_block_uses when updating df.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr100775.c: New test.
---
gcc/function.cc | 1 +
gcc/testsuite/gcc.target/arm/pr100775.c | 8 
2 files changed, 9 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/arm/pr100775.c

diff --git a/gcc/function.cc b/gcc/function.cc
index e1d2565f8d92..c8a77c9a6246 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -5942,6 +5942,7 @@ gen_call_used_regs_seq (rtx_insn *ret, unsigned int 
zero_regs_type)
  /* Update the data flow information.  */
  crtl->must_be_zero_on_return |= zeroed_hardregs;
  df_set_bb_dirty (EXIT_BLOCK_PTR_FOR_FN (cfun));
+  df_update_exit_block_uses ();
}
}

diff --git a/gcc/testsuite/gcc.target/arm/pr100775.c 
b/gcc/testsuite/gcc.target/arm/pr100775.c
new file mode 100644
index ..dd2255a95492
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr100775.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-mthumb -fzero-call-used-regs=used" } */
+
+int
+foo (int x)
+{
+  return x;
+}
-- 
2.27.0

Re: [PATCH] rs6000: Add support for vmsumcud and vec_msumc

2022-02-08 Thread Segher Boessenkool

On Mon, Feb 07, 2022 at 10:06:36PM -0600, Bill Schmidt wrote:
> On 2/7/22 5:05 PM, Segher Boessenkool wrote:
> > On Mon, Feb 07, 2022 at 04:20:24PM -0600, Bill Schmidt wrote:
> >> I observed recently that a couple of Power10 instructions and built-in 
> >> functions
> >> were somehow not implemented.  This patch adds one of them (vmsumcud).  
> >> Although
> >> this isn't normally stage-4 material, this is really simple and carries no
> >> discernible risk, so I hope it can be considered.
> > But what is the advantage?  That will be very tiny as well, afaics?
> >
> > Ah, this implements a builtin as well.  But that builtin is not in the
> > PVIPR, so no one yet uses it most likely?
> 
> It's in the yet unpublished version of PVIPR that adds ISA 3.1 support,
> currently awaiting public review.  It should have been implemented with
> the rest of the ISA 3.1 built-ins.  (There are two more that were missed
> as well, which I haven't yet addressed.)

Ugh.  Too much process, not enough speed.

> >> +;; vmsumcud
> >> +(define_insn "vmsumcud"
> >> +[(set (match_operand:V1TI 0 "register_operand" "+v")
> >> +  (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
> >> +(match_operand:V2DI 2 "register_operand" "v")
> >> +  (match_operand:V1TI 3 "register_operand" "v")]
> >> + UNSPEC_VMSUMCUD))]
> >> +  "TARGET_POWER10"
> >> +  "vmsumcud %0,%1,%2,%3"
> >> +  [(set_attr "type" "vecsimple")]
> >> +)
> > This can be properly described in RTL instead of using an unspec.  This
> > is much preferable.  I would say compare to maddhd[u], but those insns
> > aren't implemented either (maddld is though).
> 
> Is it?  Note that vmsumcud produces the carry out of the final
> result, not the result itself.  I couldn't immediately see how
> to express this in RTL.

It produces thw top 128 bits of the (infinitely precise) result.  But
yeah that requires an OImode here (for the temp itself), and we do not
have that in the backend yet.

> The full operation multiplies the corresponding lanes of each
> doubleword of arguments 1 and 2, adds them together with the
> 128-bit value in argument 3, and produces the carry out of the
> result as a 128-bit value in the result.  I think I'd need to
> have a 256-bit mode to express this properly in RTL, right?

Not if you actually calculate the carry, instead of computing the
256-bit result and truncating it.  But this is very unwieldy (it
would be fine if adding just two datums, but here there are three).

Should the type be vecsimple?  Don't we have a type for multiplications?
Hrm it looks like we use veccomplex usually.

Okay for trunk with that taken care of.  Thanks!


Segher

Re: [PATCH] tree-optimization/104373 - early uninit diagnostic on unreachable code

2022-02-08 Thread Jeff Law via Gcc-patches





On 2/8/2022 12:03 AM, Richard Biener via Gcc-patches wrote:

The following improves early uninit diagnostics by computing edge
reachability using our value-numbering framework in its cheapest
mode and ignoring unreachable blocks when looking
for uninitialized uses.  To not ICE with -fdump-tree-all the
early uninit pass needs a dumpfile since VN tries to dump statistics.

For gimple-match.c at -O0 -g this causes a 2% increase in compile-time.

In theory all early diagnostic passes could benefit from a VN run but
that would require more refactoring that's not appropriate at this stage.
This patch addresses a GCC 12 diagnostic regression and also happens to
fix one XFAIL in gcc.dg/uninit-pr20644-O0.c

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK for trunk?

Thanks,
Richard.

2022-02-04  Richard Biener  

PR tree-optimization/104373
* tree-ssa-sccvn.h (do_rpo_vn): New export exposing the
walk kind.
* tree-ssa-sccvn.cc (do_rpo_vn): Export, get the default
walk kind as argument.
(run_rpo_vn): Adjust.
(pass_fre::execute): Likewise.
* tree-ssa-uninit.cc (warn_uninitialized_vars): Skip
blocks not reachable.
(execute_late_warn_uninitialized): Mark all edges as
executable.
(execute_early_warn_uninitialized): Use VN to compute
executable edges.
(pass_data_early_warn_uninitialized): Enable a dump file,
change dump name to warn_uninit.

* g++.dg/warn/Wuninitialized-32.C: New testcase.
* gcc.dg/uninit-pr20644-O0.c: Remove XFAIL.

I'm conflicted on this ;-)

I generally lean on the side of eliminating false positives in these 
diagnostics.  So identifying unreachable blocks and using that to prune 
the set of warnings we generate, even at -O0 is good from that point of 
view.


But doing something like this has many of the problems that relying on 
optimizations does, even if we don't optimize away the unreachable 
code.  Right now the warning should be fairly stable at -O0 -- the set 
of diagnostics you get isn't going to change a lot release to release 
which is important to some users.   Second, at -O0 whether or not you 
get a warning isn't a function of how good our unreachable code analysis 
might be.


This was quite a contentious topic many years ago.  So much that I 
dropped some work on Wuninit on the floor as I just couldn't take the 
arguing.  So be aware that you might be opening a can of worms.



So the question comes down to a design decision.   What's more important 
to the end users?  Fewer false positives or better stability in the 
warning?  I think the former, but there's certainly been a vocal group 
that prefers the latter.


On the implementation side I have zero concerns.    Looking further out, 
ISTM we could mark the blocks as unreachable (rather than deducing it 
from edge flags).  That would make it pretty easy to mark those blocks 
relatively early and allow us to suppress any middle end diagnostic 
occurring in an unreachable block.


jeff

Re: [PATCH, V2] Use system default for long double if not specified on PowerPC.

2022-02-08 Thread Peter Bergner via Gcc-patches

On 2/4/22 4:26 PM, Segher Boessenkool wrote:
> As I said before, I didn't even read the patch, just the one line
> summary was enough for a NAK.  If the patch in fact does something else,
> then it is still incorrect, and needs a very different subject and
> summary.
> 
> I hope you see how "using the default of the underlying system" is
> questionable in itself, but is something completely different from using
> the default of the build compiler, which makes even less sense.
> 
> You want a configure flag to set the default long double format to be
> IEEE QP.  This cannot be enabled by default until a (big) majority of
> systems "in the wild" will work with that (only on powerpc64le-linux
> or some *big* thing like that is fine, only default it to enabled there
> then).  At that point in time, configure shouls complain, and the user
> would have to explicitly *disable* it to build without that support.

Can you please clarify one thing for me.  Do you think it's possible
that we can come up with some type of configure patch that automatically
sets the long double default given something on the system we can test
for or do you think that's impossible and we'll just have to live with
explicitly using a configure option to set the default?

...at least until some time in the far future when enough systems/distros
have changed to IEEE128 that we can change the default without a test.

Peter

[PATCH] rs6000: Fix up vspltis_shifted [PR102140]

2022-02-08 Thread Jakub Jelinek via Gcc-patches

Hi!

The following testcase ICEs, because
(const_vector:V4SI [
(const_int 0 [0]) repeated x3
(const_int -2147483648 [0x8000])
])
is recognized as valid easy_vector_constant in between split1 pass and
end of RA.
The problem is that such constants need to be split, and the only
splitter for that is:
(define_split
  [(set (match_operand:VM 0 "altivec_register_operand")
(match_operand:VM 1 "easy_vector_constant_vsldoi"))]
  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode) && can_create_pseudo_p ()"
There is only a single splitting pass before RA, so after that finishes,
if something gets matched in between that and end of RA (after that
can_create_pseudo_p () would be no longer true), it will never be
successfully split and we ICE at final.cc time or earlier.

The i386 backend (and a few others) already use
(cfun->curr_properties & PROP_rtl_split_insns)
as a test for split1 pass finished, so that some insns that should be split
during split1 and shouldn't be matched afterwards are properly guarded.

So, the following patch does that for vspltis_shifted too.

Bootstrapped/regtested on powerpc64le-linux, ok for trunk?

2022-02-08  Jakub Jelinek  

PR target/102140
* config/rs6000/rs6000.cc (vspltis_shifted): Return false also if
split1 pass has finished already.

* gcc.dg/pr102140.c: New test.

--- gcc/config/rs6000/rs6000.cc.jj  2022-02-07 17:38:20.873123915 +0100
+++ gcc/config/rs6000/rs6000.cc 2022-02-08 14:15:31.619505410 +0100
@@ -6257,8 +6257,11 @@ vspltis_shifted (rtx op)
 return false;
 
   /* We need to create pseudo registers to do the shift, so don't recognize
- shift vector constants after reload.  */
-  if (!can_create_pseudo_p ())
+ shift vector constants after reload.  Don't match it even before RA
+ after split1 is done, because there won't be further splitting pass
+ before RA to do the splitting.  */
+  if (!can_create_pseudo_p ()
+  || (cfun->curr_properties & PROP_rtl_split_insns))
 return false;
 
   nunits = GET_MODE_NUNITS (mode);
--- gcc/testsuite/gcc.dg/pr102140.c.jj  2022-02-08 14:24:25.839041166 +0100
+++ gcc/testsuite/gcc.dg/pr102140.c 2022-02-08 14:24:03.038359745 +0100
@@ -0,0 +1,23 @@
+/* PR target/102140 */
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-Og -fipa-cp -fno-tree-ccp -fno-tree-ter -Wno-psabi" } */
+
+typedef int __attribute__((__vector_size__ (64))) U;
+typedef __int128 __attribute__((__vector_size__ (64))) V;
+
+int a, b;
+
+static void
+bar (char c, V v)
+{
+  v *= c;
+  U u = a + (U) v;
+  (union { U b; }) { u };
+  b = 0;
+}
+
+void
+foo (void)
+{
+  bar (1, (V){((__int128) 9223372036854775808ULL) << 64});
+}

Jakub

Re: [PATCH 5/4] tree-optimization/104288 - An alternative approach

2022-02-08 Thread Andrew MacLeod via Gcc-patches


On 2/8/22 03:32, Richard Biener wrote:

On Tue, Feb 8, 2022 at 2:33 AM Andrew MacLeod via Gcc-patches
 wrote:

On 2/7/22 09:29, Andrew MacLeod wrote:

I have a proposal for PR 104288.

ALL patches (in sequence) bootstrap on their own and each cause no
regressions.

I've been continuing to work with this towards the GCC13 solution for
statement side effects.  And There is another possibility we could
pursue which is less intrusive

I adapted the portions of patch 2/4 which process nonnull, but changes
it from being in the nonnull class to being in the cache.

THe main difference is it continues to use a single bit, just changing
all uses of it to *always* mean its non-null on exit to the block.

Range-on-entry is changed to only check dominator blocks, and
range_of_expr doesn't check the bit at all.. in both ranger and the cache.

When we are doing the block walk and process side effects, the on-entry
*cache* range for the block is adjusted to be non-null instead.   The
statements which precede this stmt have already been processed, and all
remaining statements in this block will now pick up this new non-value
from the on-entry cache.  This should be perfectly safe.

The final tweak is when range_of_expr is looking the def block, it
normally does not have an on entry cache value.  (the def is in the
block, so there is no entry cache value).  It now checks to see if we
have set one, which can only happen when we have been doing the
statement walk and set the on-entry cache with  a non-null value.  This
allows us to pick up the non-null range in the def block too... (once we
have passed a statement which set nonnull of course).

THis patch provides much less change to trunk, and is probably a better
approach as its closer to what is going to happen in GCC13.

Im still working with it, so the patch isnt fully refined yet... but it
bootstraps and passes all test with no regressions.  And passes the new
tests too.   I figured I would mention this before you look to hard at
the other patchset.the GCC11 patch doesn't change.

Let me know if you prefer this I think I do :-)  less churn, and
closer to end state.

Yes, I very much prefer this - some comments to the other patches
still apply to this one.  Like using get_nonnull_args and probably
adding a bail-out to computing ranges from stmts that can throw
internally or have abnormal control flow (to not get into range-on-exit
vs. normal vs. exceptional or abnormal edges).

Richard.


Here's the update which is going thru testing now.  I think I covered 
the comments?


Andrew


From e3286a6a185952f3acaa89b90beee33149cd7512 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 7 Feb 2022 15:52:16 -0500
Subject: [PATCH] Register non-null side effects properly.

This patch adjusts uses of nonnull to accurately reflect "somewhere in block".
It also adds the ability to register statement side effects within a block
for ranger which will apply for the rest of the block.

	PR tree-optimization/104288
	gcc/
	* gimple-range-cache.cc (non_null_ref::set_nonnull): New.
	(ranger_cache::range_of_def): Don't check non-null.
	(ranger_cache::entry_range): Don't check non-null.
	(ranger_cache::range_on_edge): Check for nonnull on normal edges.
	(ranger_cache::update_to_nonnull): New.
(non_null_loadstore): New.
(ranger_cache::block_apply_nonnull): New.
	* ranger-cache.h (class non_null_ref): Update prototypes.
	(class ranger_cache): Update prototypes.
	* gimple-range-path.cc (path_range_query::range_defined_in_block): Do
	not search dominators.
	(path_range_query::adjust_for_non_null_uses): Ditto.
	* gimple-range.cc (gimple_ranger::range_of_expr): Check on-entry for
	def overrides.  Do not check nonnull.
	(gimple_ranger::range_on_entry): Check dominators for nonnull.
	(gimple_ranger::range_on_edge): Check for nonnull on normal edges..
	* gimple-range.cc (gimple_ranger::register_side_effects): New.
	* gimple-range.h (gimple_ranger::register_side_effects): New.
	* tree-vrp.cc (rvrp_folder::fold_stmt): Call register_side_effects.

	testsuite/gcc/
	* gcc.dg/pr104288.c: New.
---
 gcc/gimple-range-cache.cc   | 103 ++--
 gcc/gimple-range-cache.h|   3 +
 gcc/gimple-range-path.cc|   4 +-
 gcc/gimple-range.cc |  27 -
 gcc/gimple-range.h  |   1 +
 gcc/testsuite/gcc.dg/pr104288.c |  23 +++
 gcc/tree-vrp.cc |   8 ++-
 7 files changed, 156 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr104288.c

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 16c881b13e1..583ba97ff75 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -29,6 +29,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-pretty-print.h"
 #include "gimple-range.h"
 #include "tree-cfg.h"
+#include "target.h"
+#include "attribs.h"
+#include "gimple-iterator.h"
+#include "gimple-walk.h"
 
 #define DEBUG_RANGE_CACHE (dump_

Re: [PATCH, V2] Use system default for long double if not specified on PowerPC.

2022-02-08 Thread Andreas Schwab

On Feb 08 2022, Peter Bergner wrote:

> Can you please clarify one thing for me.  Do you think it's possible
> that we can come up with some type of configure patch that automatically
> sets the long double default given something on the system we can test
> for or do you think that's impossible and we'll just have to live with
> explicitly using a configure option to set the default?

It should be handled the same as the double->long double switch.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

Re: [PATCH] rs6000: Fix up vspltis_shifted [PR102140]

2022-02-08 Thread David Edelsohn via Gcc-patches

On Tue, Feb 8, 2022 at 12:25 PM Jakub Jelinek  wrote:
>
> Hi!
>
> The following testcase ICEs, because
> (const_vector:V4SI [
> (const_int 0 [0]) repeated x3
> (const_int -2147483648 [0x8000])
> ])
> is recognized as valid easy_vector_constant in between split1 pass and
> end of RA.
> The problem is that such constants need to be split, and the only
> splitter for that is:
> (define_split
>   [(set (match_operand:VM 0 "altivec_register_operand")
> (match_operand:VM 1 "easy_vector_constant_vsldoi"))]
>   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode) && can_create_pseudo_p ()"
> There is only a single splitting pass before RA, so after that finishes,
> if something gets matched in between that and end of RA (after that
> can_create_pseudo_p () would be no longer true), it will never be
> successfully split and we ICE at final.cc time or earlier.
>
> The i386 backend (and a few others) already use
> (cfun->curr_properties & PROP_rtl_split_insns)
> as a test for split1 pass finished, so that some insns that should be split
> during split1 and shouldn't be matched afterwards are properly guarded.
>
> So, the following patch does that for vspltis_shifted too.
>
> Bootstrapped/regtested on powerpc64le-linux, ok for trunk?

Okay.

Thanks, David

>
> 2022-02-08  Jakub Jelinek  
>
> PR target/102140
> * config/rs6000/rs6000.cc (vspltis_shifted): Return false also if
> split1 pass has finished already.
>
> * gcc.dg/pr102140.c: New test.
>
> --- gcc/config/rs6000/rs6000.cc.jj  2022-02-07 17:38:20.873123915 +0100
> +++ gcc/config/rs6000/rs6000.cc 2022-02-08 14:15:31.619505410 +0100
> @@ -6257,8 +6257,11 @@ vspltis_shifted (rtx op)
>  return false;
>
>/* We need to create pseudo registers to do the shift, so don't recognize
> - shift vector constants after reload.  */
> -  if (!can_create_pseudo_p ())
> + shift vector constants after reload.  Don't match it even before RA
> + after split1 is done, because there won't be further splitting pass
> + before RA to do the splitting.  */
> +  if (!can_create_pseudo_p ()
> +  || (cfun->curr_properties & PROP_rtl_split_insns))
>  return false;
>
>nunits = GET_MODE_NUNITS (mode);
> --- gcc/testsuite/gcc.dg/pr102140.c.jj  2022-02-08 14:24:25.839041166 +0100
> +++ gcc/testsuite/gcc.dg/pr102140.c 2022-02-08 14:24:03.038359745 +0100
> @@ -0,0 +1,23 @@
> +/* PR target/102140 */
> +/* { dg-do compile { target int128 } } */
> +/* { dg-options "-Og -fipa-cp -fno-tree-ccp -fno-tree-ter -Wno-psabi" } */
> +
> +typedef int __attribute__((__vector_size__ (64))) U;
> +typedef __int128 __attribute__((__vector_size__ (64))) V;
> +
> +int a, b;
> +
> +static void
> +bar (char c, V v)
> +{
> +  v *= c;
> +  U u = a + (U) v;
> +  (union { U b; }) { u };
> +  b = 0;
> +}
> +
> +void
> +foo (void)
> +{
> +  bar (1, (V){((__int128) 9223372036854775808ULL) << 64});
> +}
>
> Jakub
>

Re: [Patch, fortran] PR37336 (Finalization) - [F03] Finish derived-type finalization

2022-02-08 Thread Harald Anlauf via Gcc-patches


Hi Paul,

Am 08.02.22 um 12:22 schrieb Paul Richard Thomas via Fortran:

Hi Harald,

Thanks for giving the patch a whirl.



the parent components as an array. I strongly suspect that, from reading

7.5.6.2 paragraphs 2 and 3 closely, that ifort has it right. However,

this

is another issue to come back to in the future.


Could you specify which version of Intel you tried?



ifort (IFORT) 2021.1 Beta 20201112


ok, that's good to know.



Testcase finalize_38.f90 fails for me with ifort 2021.5.0 with:

131

This test also fails with crayftn 11 & 12 and nagfor 7.0,
but in a different place.



I have run your modified version of finalize_38.f90, and now I see
that you can get a bloody head just from scratching too much...

crayftn 12.0.2:

 1,  3,  1
 2,  21,  0
 11,  3,  2
 12,  21,  1
 21,  4,  3
 23,  21,  22 | 42,  43
 31,  6,  4
 41,  7,  5
 51,  9,  7
 61,  10,  8
 71,  13,  10
 101,  2,  1
 102,  4,  3
 111,  3,  2
 121,  4,  2
 122,  0,  4
 123,  5,  6 | 2*0
 131,  5,  2
 132,  0,  4
 133,  7,  8 | 2*0
 141,  6,  3
 151,  10,  5
 161,  16,  9
 171,  18,  11
 175,  0.,  20. | 10.,  20.

nagfor 7.0:

 1 0 1
 11 1 2
 23 21 22 | 42 43
 71 9 10
 72 11 99
 131 3 2
 132 5 4
 141 4 3
 151 6 5
 161 10 9
 171 12 11

Intel 2021.5.0:

 131   3   2
 132   0   4
 133   5   6 |   0   0
 141   4   3
 151   7   5
 152   3   0
 153   0   0 |   1   3
forrtl: severe (174): SIGSEGV, segmentation fault occurred
[...]


That got me reading 7.5.6.3, where is says in paragraph 1:

"When an intrinsic assignment statement is executed (10.2.1.3), if the
variable is not an unallocated allocatable variable, it is finalized
after evaluation of expr and before the definition of the variable.
..."

Looking at the beginning of the testcase code (abridged):

  type(simple), allocatable :: MyType, MyType2
  type(simple) :: ThyType = simple(21), ThyType2 = simple(22)

! The original PR - one finalization of 'var' before (re)allocation.
  MyType = ThyType
  call test(1, 0, [0,0], 0)


This is an intrinsic assignment.

Naively I would expect MyType to be initially unallocated.

ThyType is not allocatable and non-pointer and cannot become
undefined here and would not play any role in finalization.

I am probably too blind-sighted to see why there should be
a finalization here.  What am I missing?


Could you use the attached version of finalize_38.f90 with crayftn and NAG?
All the stop statements are replaced with prints. Ifort gives:
  131   3   2
  132   0   4
  133   5   6 |   0   0
  141   4   3
  151   7   5
  152   3   0
  153   0   0 |   1   3
  161  13   9
  162  20   0
  163   0   0 |  10  20
  171  14  11


I think it is a good idea to have these prints in the testcase
whenever there is a departure from expectations.  So print&stop?

Furthermore, for the sake of health of people reading the testcases
later, I think it would not harm to add more explanations why we
expect a certain behavior... ;-)


Best regards

Paul


Best regards,
Harald

Re: [PATCH 1/4][RFC] middle-end/90348 - add explicit birth

2022-02-08 Thread Joseph Myers

On Tue, 8 Feb 2022, Richard Biener via Gcc-patches wrote:

> which I think is OK?  That is, when the abstract machine
> arrives at 'int i;' then the previous content in 'i' goes
> away?  Or would

Yes, that's correct.  "If an initialization is specified for the object, 
it is performed each time the declaration or compound literal is reached 
in the execution of the block; otherwise, the value becomes indeterminate 
each time the declaration is reached.".

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] rs6000: Add support for vmsumcud and vec_msumc

2022-02-08 Thread Bill Schmidt via Gcc-patches



On 2/8/22 9:45 AM, Segher Boessenkool wrote:
> On Mon, Feb 07, 2022 at 10:06:36PM -0600, Bill Schmidt wrote:
>> On 2/7/22 5:05 PM, Segher Boessenkool wrote:
>>> On Mon, Feb 07, 2022 at 04:20:24PM -0600, Bill Schmidt wrote:
 I observed recently that a couple of Power10 instructions and built-in 
 functions
 were somehow not implemented.  This patch adds one of them (vmsumcud).  
 Although
 this isn't normally stage-4 material, this is really simple and carries no
 discernible risk, so I hope it can be considered.
>>> But what is the advantage?  That will be very tiny as well, afaics?
>>>
>>> Ah, this implements a builtin as well.  But that builtin is not in the
>>> PVIPR, so no one yet uses it most likely?
>> It's in the yet unpublished version of PVIPR that adds ISA 3.1 support,
>> currently awaiting public review.  It should have been implemented with
>> the rest of the ISA 3.1 built-ins.  (There are two more that were missed
>> as well, which I haven't yet addressed.)
> Ugh.  Too much process, not enough speed.
>
 +;; vmsumcud
 +(define_insn "vmsumcud"
 +[(set (match_operand:V1TI 0 "register_operand" "+v")
 +  (unspec:V1TI [(match_operand:V2DI 1 "register_operand" "v")
 +(match_operand:V2DI 2 "register_operand" "v")
 +  (match_operand:V1TI 3 "register_operand" "v")]
 + UNSPEC_VMSUMCUD))]
 +  "TARGET_POWER10"
 +  "vmsumcud %0,%1,%2,%3"
 +  [(set_attr "type" "vecsimple")]
 +)
>>> This can be properly described in RTL instead of using an unspec.  This
>>> is much preferable.  I would say compare to maddhd[u], but those insns
>>> aren't implemented either (maddld is though).
>> Is it?  Note that vmsumcud produces the carry out of the final
>> result, not the result itself.  I couldn't immediately see how
>> to express this in RTL.
> It produces thw top 128 bits of the (infinitely precise) result.  But
> yeah that requires an OImode here (for the temp itself), and we do not
> have that in the backend yet.
>
>> The full operation multiplies the corresponding lanes of each
>> doubleword of arguments 1 and 2, adds them together with the
>> 128-bit value in argument 3, and produces the carry out of the
>> result as a 128-bit value in the result.  I think I'd need to
>> have a 256-bit mode to express this properly in RTL, right?
> Not if you actually calculate the carry, instead of computing the
> 256-bit result and truncating it.  But this is very unwieldy (it
> would be fine if adding just two datums, but here there are three).
>
> Should the type be vecsimple?  Don't we have a type for multiplications?
> Hrm it looks like we use veccomplex usually.
>
> Okay for trunk with that taken care of.  Thanks!

Thanks!  Revised as requested and pushed as r12-7110 (943d631abdd7be623c).

Bill

>
>
> Segher

Re: [PATCH] c++: Remove superflous assert [PR104403]

2022-02-08 Thread Jason Merrill via Gcc-patches


On 2/8/22 03:16, Jakub Jelinek wrote:

Hi!

I've added the assert because start_decl diagnoses such vars for C++20 and
earlier:
   if (current_function_decl && VAR_P (decl)
   && DECL_DECLARED_CONSTEXPR_P (current_function_decl)
   && cxx_dialect < cxx23)
but as can be seen, we cam trigger the assert in older standards e.g. during
non-manifestly constant evaluation.  Rather than refining the assert that
DECL_EXPRs for such vars don't appear for C++20 and older if they are inside
of functions declared constexpr this patch just removes the assert, the
code rejects encountering those vars in constant expressions anyway.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2022-02-08  Jakub Jelinek  

PR c++/104403
* constexpr.cc (cxx_eval_constant_expression): Don't assert DECL_EXPRs
of TREE_STATIC vars may only appear in -std=c++23.

* g++.dg/cpp0x/lambda/lambda-104403.C: New test.

--- gcc/cp/constexpr.cc.jj  2022-02-06 11:16:22.353814431 +0100
+++ gcc/cp/constexpr.cc 2022-02-07 11:38:59.131165171 +0100
@@ -6652,7 +6652,6 @@ cxx_eval_constant_expression (const cons
/* Allow __FUNCTION__ etc.  */
&& !DECL_ARTIFICIAL (r))
  {
-   gcc_assert (cxx_dialect >= cxx23);
if (!ctx->quiet)
  {
if (CP_DECL_THREAD_LOCAL_P (r))
--- gcc/testsuite/g++.dg/cpp0x/lambda/lambda-104403.C.jj2022-02-07 
11:40:32.760847269 +0100
+++ gcc/testsuite/g++.dg/cpp0x/lambda/lambda-104403.C   2022-02-07 
11:40:06.109223079 +0100
@@ -0,0 +1,8 @@
+// PR c++/104403
+// { dg-do compile { target c++11 } }
+
+int
+main ()
+{
+  []{ switch (0) { case 0: static int value; return &value; } };
+}

Jakub

Re: [PATCH] c++: Don't emit repeated -Wshadow warnings for templates/ctors [PR104379]

2022-02-08 Thread Jason Merrill via Gcc-patches


On 2/8/22 03:22, Jakub Jelinek wrote:

Hi!

The following patch suppresses extraneous -Wshadow warnings.
On the testcase without the patch we emit 14 -Wshadow warnings,
with the patch just 4.  It is enough to warn once e.g. during parsing of the
template or the abstract ctor, while previously we'd warn also on the clones
of the ctors and on instantiation.
In GCC 8 and earlier we didn't warn because check_local_shadow did
   /* Inline decls shadow nothing.  */
   if (DECL_FROM_INLINE (decl))
 return;

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2022-02-08  Jakub Jelinek  

PR c++/104379
* name-lookup.cc (check_local_shadow): When diagnosing shadowing
of a member or global declaration, add warning suppression for
the decl and don't warn again on it.

* g++.dg/warn/Wshadow-18.C: New test.

--- gcc/cp/name-lookup.cc.jj2022-01-18 11:58:59.0 +0100
+++ gcc/cp/name-lookup.cc   2022-02-07 15:00:19.781820426 +0100
@@ -3296,18 +3296,22 @@ check_local_shadow (tree decl)
  
  	/* Warn if a variable shadows a non-function, or the variable

   is a function or a pointer-to-function.  */
-   if (!OVL_P (member)
-   || TREE_CODE (decl) == FUNCTION_DECL
-   || (TREE_TYPE (decl)
-   && (TYPE_PTRFN_P (TREE_TYPE (decl))
-   || TYPE_PTRMEMFUNC_P (TREE_TYPE (decl)
+   if ((!OVL_P (member)
+|| TREE_CODE (decl) == FUNCTION_DECL
+|| (TREE_TYPE (decl)
+&& (TYPE_PTRFN_P (TREE_TYPE (decl))
+|| TYPE_PTRMEMFUNC_P (TREE_TYPE (decl)
+   && !warning_suppressed_p (decl, OPT_Wshadow))
  {
auto_diagnostic_group d;
if (warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wshadow,
"declaration of %qD shadows a member of %qT",
decl, current_nonlambda_class_type ())
&& DECL_P (member))
- inform_shadowed (member);
+ {
+   inform_shadowed (member);
+   suppress_warning (decl, OPT_Wshadow);
+ }
  }
return;
}
@@ -3319,14 +3323,18 @@ check_local_shadow (tree decl)
  || (TREE_CODE (old) == TYPE_DECL
  && (!DECL_ARTIFICIAL (old)
  || TREE_CODE (decl) == TYPE_DECL)))
-  && !instantiating_current_function_p ())
+  && !instantiating_current_function_p ()
+  && !warning_suppressed_p (decl, OPT_Wshadow))
  /* XXX shadow warnings in outer-more namespaces */
  {
auto_diagnostic_group d;
if (warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wshadow,
  "declaration of %qD shadows a global declaration",
  decl))
-   inform_shadowed (old);
+   {
+ inform_shadowed (old);
+ suppress_warning (decl, OPT_Wshadow);
+   }
return;
  }
  
--- gcc/testsuite/g++.dg/warn/Wshadow-18.C.jj	2022-02-07 15:01:56.509456240 +0100

+++ gcc/testsuite/g++.dg/warn/Wshadow-18.C  2022-02-07 15:03:34.951067880 
+0100
@@ -0,0 +1,22 @@
+// PR c++/104379
+// { dg-do compile }
+// { dg-options "-Wshadow" }
+
+int x;
+
+template
+struct S
+{
+  int i;
+  S(int i) { (void) i; }   // { dg-warning "declaration of 'i' 
shadows a member of 'S'" }
+  S(float x) { (void) x; } // { dg-warning "declaration of 'x' 
shadows a global declaration" }
+  S(int *p) { int a = 1; (void) p; (void) a;
+ { int a = 2; (void) a; } }// { dg-warning "declaration of 'a' 
shadows a previous local" }
+};
+
+S i(1);
+S j(1);
+S k(1.0f);
+S l(1.0f);
+S m(&x);
+S n(&x);

Jakub

Re: [PATCH] dwarf2out: Don't call expand_expr during early_dwarf [PR104407]

2022-02-08 Thread Jason Merrill via Gcc-patches


On 2/8/22 03:35, Richard Biener wrote:

On Tue, 8 Feb 2022, Jakub Jelinek wrote:


Hi!

As mentioned in the PR, since PR96690 r11-2834 we call rtl_for_decl_init
which can call expand_expr already during early_dwarf.  The comment and PR
explains it that the intent is to ensure the referenced vars and functions
are properly mangled because free_lang_data doesn't cover everything, like
template parameters etc.  It doesn't work well though, because expand_expr


free_lang_data mangling also only runs when generating LTO bytecode...


can set DECL_RTLs e.g. on referenced vars and keep them there, and they can
be created e.g. with different MEM_ALIGN compared to what they would be
created with if they were emitted later.
So, the following patch stops calling rtl_for_decl_init and instead
for cases for which rtl_for_decl_init does anything at all walks the
initializer and ensures referenced vars or functions are mangled.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-02-08  Jakub Jelinek  

PR debug/104407
* dwarf2out.cc (mangle_referenced_decls): New function.
(tree_add_const_value_attribute): Don't call rtl_for_decl_init if
early_dwarf.  Instead walk the initializer and try to mangle vars or
functions referenced from it.

* g++.dg/debug/dwarf2/pr104407.C: New test.

--- gcc/dwarf2out.cc.jj 2022-02-04 14:36:54.966605844 +0100
+++ gcc/dwarf2out.cc2022-02-07 12:28:13.000520666 +0100
@@ -20881,6 +20881,19 @@ add_location_or_const_value_attribute (d
return tree_add_const_value_attribute_for_decl (die, decl);
  }
  
+/* Mangle referenced decls.  */

+static tree
+mangle_referenced_decls (tree *tp, int *walk_subtrees, void *)
+{
+  if (! EXPR_P (*tp) && ! CONSTANT_CLASS_P (*tp))
+*walk_subtrees = 0;
+
+  if (VAR_OR_FUNCTION_DECL_P (*tp))
+assign_assembler_name_if_needed (*tp);
+
+  return NULL_TREE;
+}
+
  /* Attach a DW_AT_const_value attribute to DIE. The value of the
 attribute is the const value T.  */
  
@@ -20889,7 +20902,6 @@ tree_add_const_value_attribute (dw_die_r

  {
tree init;
tree type = TREE_TYPE (t);
-  rtx rtl;
  
if (!t || !TREE_TYPE (t) || TREE_TYPE (t) == error_mark_node)

  return false;
@@ -20910,11 +20922,26 @@ tree_add_const_value_attribute (dw_die_r
  return true;
}
  }
-  /* Generate the RTL even if early_dwarf to force mangling of all refered to
- symbols.  */
-  rtl = rtl_for_decl_init (init, type);
-  if (rtl && !early_dwarf)
-return add_const_value_attribute (die, TYPE_MODE (type), rtl);
+  if (!early_dwarf)
+{
+  rtx rtl = rtl_for_decl_init (init, type);
+  if (rtl && !early_dwarf)


the !early_dwarf is redundant

Otherwise looks good to me, please give Jason the opportunity to comment
though.

Thanks,
Richard.


+   return add_const_value_attribute (die, TYPE_MODE (type), rtl);
+}
+  else
+{
+  /* For early_dwarf force mangling of all referenced symbols.  */
+  tree initializer = init;
+  STRIP_NOPS (initializer);
+  /* rtl_for_decl_init punts on other aggregates, and complex values.  */
+  if (AGGREGATE_TYPE_P (type)
+ || (TREE_CODE (initializer) == VIEW_CONVERT_EXPR
+ && AGGREGATE_TYPE_P (TREE_TYPE (TREE_OPERAND (initializer, 0
+ || TREE_CODE (type) == COMPLEX_TYPE)
+   ;


Hmm, I don't think we need to mirror the limitations of 
rtl_for_decl_init here; IMO we might as well go ahead mangle everything 
so we don't need to change this place as well when the FIXME is 
resolved.  OK either way, but if you decide to keep this condition 
please update the FIXME in rtl_for_decl_init to point here.



+  else if (initializer_constant_valid_p (initializer, type))
+   walk_tree (&initializer, mangle_referenced_decls, NULL, NULL);
+}
/* If the host and target are sane, try harder.  */
if (CHAR_BIT == 8 && BITS_PER_UNIT == 8
&& initializer_constant_valid_p (init, type))
--- gcc/testsuite/g++.dg/debug/dwarf2/pr104407.C.jj 2022-02-07 
10:45:51.945041031 +0100
+++ gcc/testsuite/g++.dg/debug/dwarf2/pr104407.C2022-02-07 
10:45:09.834635340 +0100
@@ -0,0 +1,12 @@
+// PR debug/104407
+// { dg-do compile { target c++17 } }
+// { dg-options "-O1 -fcompare-debug" }
+
+struct A { int i; long j; int k : 2; char l; } a;
+
+auto [ aa, bb, cc, dd ] = a;
+
+namespace N
+{
+  auto & [ m, n, o, ppp ] = a;
+}

Jakub

Re: [PATCH] dwarf2out: Don't call expand_expr during early_dwarf [PR104407]

2022-02-08 Thread Jakub Jelinek via Gcc-patches

On Tue, Feb 08, 2022 at 02:20:28PM -0500, Jason Merrill wrote:
> > > + return add_const_value_attribute (die, TYPE_MODE (type), rtl);
> > > +}
> > > +  else
> > > +{
> > > +  /* For early_dwarf force mangling of all referenced symbols.  */
> > > +  tree initializer = init;
> > > +  STRIP_NOPS (initializer);
> > > +  /* rtl_for_decl_init punts on other aggregates, and complex 
> > > values.  */
> > > +  if (AGGREGATE_TYPE_P (type)
> > > +   || (TREE_CODE (initializer) == VIEW_CONVERT_EXPR
> > > +   && AGGREGATE_TYPE_P (TREE_TYPE (TREE_OPERAND (initializer, 0
> > > +   || TREE_CODE (type) == COMPLEX_TYPE)
> > > + ;
> 
> Hmm, I don't think we need to mirror the limitations of rtl_for_decl_init
> here; IMO we might as well go ahead mangle everything so we don't need to
> change this place as well when the FIXME is resolved.  OK either way, but if
> you decide to keep this condition please update the FIXME in
> rtl_for_decl_init to point here.

I did that because the aggregate case can mean very large initializers that
would need mangling a lot of symbols even when we actually don't need that
because we'd punt on it, and because I'm a little bit afraid how safe the
mangling is for -fcompare-debug if done only for the -g case (I vaguely
remember it can try to instantiate some templates etc.).
Can certainly adjust the FIXME comment.

Jakub

[PATCH] Fix PR 101515 (ICE in pp_cxx_unqualified_id, at cp/cxx-pretty-print.c:128)

2022-02-08 Thread Qing Zhao via Gcc-patches

Hi, 

This is the patch to fix PR101515 (ICE in pp_cxx_unqualified_id, at  
cp/cxx-pretty-print.c:128)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101515

It's possible that the TYPE_NAME of a record_type is NULL, therefore when
printing the TYPE_NAME, we should check and handle this special case.

Please see the comment of pr101515 for more details. 

The fix is very simple, just check and special handle cases when TYPE_NAME is 
NULL. 

Bootstrapped and regression tested on both x86 and aarch64, no issues.

Okay for commit?

Thanks.

Qing

=

>From f37ee8d21b80cb77d8108cb97a487c84c530545b Mon Sep 17 00:00:00 2001
From: Qing Zhao 
Date: Tue, 8 Feb 2022 16:10:37 +
Subject: [PATCH] Fix PR 101515 ICE in pp_cxx_unqualified_id, at
 cp/cxx-pretty-print.c:128.

It's possible that the TYPE_NAME of a record_type is NULL, therefore when
printing the TYPE_NAME, we should check and handle this special case.

gcc/cp/ChangeLog:

* cxx-pretty-print.cc (pp_cxx_unqualified_id): Check and handle
the case when TYPE_NAME is NULL.

gcc/testsuite/ChangeLog:

* g++.dg/pr101515.C: New test.
---
 gcc/cp/cxx-pretty-print.cc  |  5 -
 gcc/testsuite/g++.dg/pr101515.C | 25 +
 2 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/pr101515.C

diff --git a/gcc/cp/cxx-pretty-print.cc b/gcc/cp/cxx-pretty-print.cc
index 4f9a090e520d..744ed0add5ba 100644
--- a/gcc/cp/cxx-pretty-print.cc
+++ b/gcc/cp/cxx-pretty-print.cc
@@ -171,7 +171,10 @@ pp_cxx_unqualified_id (cxx_pretty_printer *pp, tree t)
 case ENUMERAL_TYPE:
 case TYPENAME_TYPE:
 case UNBOUND_CLASS_TEMPLATE:
-  pp_cxx_unqualified_id (pp, TYPE_NAME (t));
+  if (TYPE_NAME (t))
+   pp_cxx_unqualified_id (pp, TYPE_NAME (t));
+  else
+   pp_string (pp, "");
   if (tree ti = TYPE_TEMPLATE_INFO_MAYBE_ALIAS (t))
if (PRIMARY_TEMPLATE_P (TI_TEMPLATE (ti)))
  {
diff --git a/gcc/testsuite/g++.dg/pr101515.C b/gcc/testsuite/g++.dg/pr101515.C
new file mode 100644
index ..898c7e003c22
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr101515.C
@@ -0,0 +1,25 @@
+/* PR101515 -  ICE in pp_cxx_unqualified_id, at cp/cxx-pretty-print.c:128 
+   { dg-do compile }
+   { dg-options "-Wuninitialized -O1" } */
+
+struct S
+{
+  int j;
+};
+struct T : public S
+{
+  virtual void h () {}
+};
+struct ptrmemfunc
+{
+  void (*ptr) ();
+};
+typedef void (S::*sp)();
+int main ()
+{
+  T t;
+  sp x;
+  ptrmemfunc *xp = (ptrmemfunc *) &x;
+  if (xp->ptr != ((void (*)())(sizeof(void *  /* { dg-warning "is used 
uninitialized" } */
+return 1;
+}
-- 
2.27.0

Re: [PATCH v3] rs6000: Fix some issues in rs6000_can_inline_p [PR102059]

2022-02-08 Thread Bill Schmidt via Gcc-patches

Hi!

>From some discussion today, I think we want to limit the scope of
this patch to just the power8-fusion flag that's causing trouble for
now, given stage 4.  We've talked about making power8-fusion a do-
nothing flag, since it doesn't add much benefit now and probably
shouldn't be a separate flag anyway.  Having it as a meaningless
flag makes it more palatable to add an exception for it in the
inlining path.

Others, feel free to weigh in.

Thanks,
Bill

On 1/5/22 1:34 AM, Kewen.Lin wrote:
> Hi,
>
> This patch is to fix the inconsistent behaviors for non-LTO mode
> and LTO mode.  As Martin pointed out, currently the function
> rs6000_can_inline_p simply makes it inlinable if callee_tree is
> NULL, but it's unexpected, we should use the command line options
> from target_option_default_node as default.
>
> It replaces rs6000_isa_flags with target_option_default_node when
> caller_tree is NULL since it's more straightforward and doesn't
> suffer from some bug not to keep rs6000_isa_flags as default.
>
> It also extends the scope of the check for the case that callee
> has explicit set options, inlining in test case pr102059-5.c can
> happen unexpectedly before, it's fixed accordingly.
>
> As Richi/Mike pointed out, some tuning flags like MASK_P8_FUSION
> can be neglected for always inlining, this patch also takes some
> flags when the callee is attributed by always_inline.
>
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578552.html
> v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586112.html
>
> This patch is one re-post of this updated version[1] and also
> rebased and adjusted on top of the related commit r12-6219.
>
> Bootstrapped and regtested on powerpc64-linux-gnu P8 and
> powerpc64le-linux-gnu P9 and P10.
>
> Is it ok for trunk?
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586296.html
>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
>   PR target/102059
>   * config/rs6000/rs6000.c (rs6000_can_inline_p): Adjust with
>   target_option_default_node and consider always_inline_safe flags.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/102059
>   * gcc.target/powerpc/pr102059-4.c: New test.
>   * gcc.target/powerpc/pr102059-5.c: New test.
>   * gcc.target/powerpc/pr102059-6.c: New test.
>   * gcc.target/powerpc/pr102059-7.c: New test.
>   * gcc.target/powerpc/pr102059-8.c: New test.
>   * gcc.dg/lto/pr102059-1_0.c: Remove unneeded option.
>
>

[PATCH V2] tree-optimization/104288 - Register non-null side effects properly.

2022-02-08 Thread Andrew MacLeod via Gcc-patches


On 2/8/22 03:32, Richard Biener wrote:

On Tue, Feb 8, 2022 at 2:33 AM Andrew MacLeod via Gcc-patches
 wrote:

On 2/7/22 09:29, Andrew MacLeod wrote:

I have a proposal for PR 104288.

ALL patches (in sequence) bootstrap on their own and each cause no
regressions.

I've been continuing to work with this towards the GCC13 solution for
statement side effects.  And There is another possibility we could
pursue which is less intrusive

I adapted the portions of patch 2/4 which process nonnull, but changes
it from being in the nonnull class to being in the cache.

THe main difference is it continues to use a single bit, just changing
all uses of it to *always* mean its non-null on exit to the block.

Range-on-entry is changed to only check dominator blocks, and
range_of_expr doesn't check the bit at all.. in both ranger and the cache.

When we are doing the block walk and process side effects, the on-entry
*cache* range for the block is adjusted to be non-null instead.   The
statements which precede this stmt have already been processed, and all
remaining statements in this block will now pick up this new non-value
from the on-entry cache.  This should be perfectly safe.

The final tweak is when range_of_expr is looking the def block, it
normally does not have an on entry cache value.  (the def is in the
block, so there is no entry cache value).  It now checks to see if we
have set one, which can only happen when we have been doing the
statement walk and set the on-entry cache with  a non-null value.  This
allows us to pick up the non-null range in the def block too... (once we
have passed a statement which set nonnull of course).

THis patch provides much less change to trunk, and is probably a better
approach as its closer to what is going to happen in GCC13.

Im still working with it, so the patch isnt fully refined yet... but it
bootstraps and passes all test with no regressions.  And passes the new
tests too.   I figured I would mention this before you look to hard at
the other patchset.the GCC11 patch doesn't change.

Let me know if you prefer this I think I do :-)  less churn, and
closer to end state.

Yes, I very much prefer this - some comments to the other patches
still apply to this one.  Like using get_nonnull_args and probably
adding a bail-out to computing ranges from stmts that can throw
internally or have abnormal control flow (to not get into range-on-exit
vs. normal vs. exceptional or abnormal edges).

Richard.


with some minor performance tweaks, such as moving adjust_range() to the 
header so it can be inlined .


range-on-edge now applies the non-null from the src block if 
appropriate, not in range-on-exit.   That should resolve  the internal 
throwing statements I think.  and I have switched over to 
get_nonnull_args().


Bootstraps on build-x86_64-pc-linux-gnu and passes all regressions.

OK for trunk, or did I miss something?

Andrew

PS. odd.. I haven't seen the git diff be wrong before, but it shows the 
ranger_cache::range_on_edge changes as being in 
gimple_cache::range_of_expr...  They are most definitely are not   
and it applies/de-applies fine.. so its just an oddity I guess.


From 2f38b3de7958d83dab383c73029aedbd108d8930 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 7 Feb 2022 15:52:16 -0500
Subject: [PATCH] Register non-null side effects properly.

This patch adjusts uses of nonnull to accurately reflect "somewhere in block".
It also adds the ability to register statement side effects within a block
for ranger which will apply for the rest of the block.

	PR tree-optimization/104288
	gcc/
	* gimple-range-cache.cc (non_null_ref::set_nonnull): New.
	(non_null_ref::adjust_range): Move to header.
	(ranger_cache::range_of_def): Don't check non-null.
	(ranger_cache::entry_range): Don't check non-null.
	(ranger_cache::range_on_edge): Check for nonnull on normal edges.
	(ranger_cache::update_to_nonnull): New.
(non_null_loadstore): New.
(ranger_cache::block_apply_nonnull): New.
	* ranger-cache.h (class non_null_ref): Update prototypes.
	(non_null_ref::adjust_range): Move to here and inline.
	(class ranger_cache): Update prototypes.
	* gimple-range-path.cc (path_range_query::range_defined_in_block): Do
	not search dominators.
	(path_range_query::adjust_for_non_null_uses): Ditto.
	* gimple-range.cc (gimple_ranger::range_of_expr): Check on-entry for
	def overrides.  Do not check nonnull.
	(gimple_ranger::range_on_entry): Check dominators for nonnull.
	(gimple_ranger::range_on_edge): Check for nonnull on normal edges..
	* gimple-range.cc (gimple_ranger::register_side_effects): New.
	* gimple-range.h (gimple_ranger::register_side_effects): New.
	* tree-vrp.cc (rvrp_folder::fold_stmt): Call register_side_effects.

	testsuite/gcc/
	* gcc.dg/pr104288.c: New.
---
 gcc/gimple-range-cache.cc   | 135 
 gcc/gimple-range-cache.h|  31 
 gcc/gimple-range-path.cc|   4 +-
 gcc/gimple-range.cc

Re: [PATCH] libstdc++: Decouple HAVE_FCNTL_H from HAVE_DIRENT_H check

2022-02-08 Thread Dimitar Dimitrov

On Mon, Feb 07, 2022 at 09:05:45PM +, Jonathan Wakely wrote:
> On Mon, 7 Feb 2022 at 21:01, Jonathan Wakely  wrote:
> 
> >
> >
> > On Mon, 7 Feb 2022 at 20:12, Dimitar Dimitrov  wrote:
> >
> >> On PRU target with newlib, we have the following combination in config.h:
> >>   /* #undef HAVE_DIRENT_H */
> >>   #define HAVE_FCNTL_H 1
> >>   #define HAVE_UNLINKAT 1
> >>
> >> In newlib, targets which do not define dirent.h, get a build error when
> >> including :
> >>
> >> https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/include/sys/dirent.h;hb=HEAD
> >>
> >> While fs_dir.cc correctly checks for HAVE_FCNTL_H, dir-common.h doesn't,
> >> and instead uses HAVE_DIRENT_H. This results in unlinkat() function call
> >> in fs_dir.cc without the needed  include in dir-common.h. Thus
> >> a build failure:
> >>   .../gcc/libstdc++-v3/src/c++17/fs_dir.cc:151:11: error: ‘::unlinkat’
> >> has not been declared; did you mean ‘unlink’?
> >>
> >> Fix by encapsulating  include with the correct check.
> >>
> >
> > But there's no point doing anything in that file if we don't have
> > , the whole thing is unusable. There's no point making the
> > members using unlinkat compile if you can't ever construct the type.
> >
> > So I think we want a different fix.
> >
> 
> 
> Maybe something like:
> 
> --- a/libstdc++-v3/src/filesystem/dir-common.h
> +++ b/libstdc++-v3/src/filesystem/dir-common.h
> @@ -70,6 +70,8 @@ struct DIR { };
> inline DIR* opendir(const char*) { return nullptr; }
> inline dirent* readdir(DIR*) { return nullptr; }
> inline int closedir(DIR*) { return -1; }
> +#undef _GLIBCXX_HAVE_DIRFD
> +#undef _GLIBCXX_HAVE_UNLINKAT
> #endif
> } // namespace __gnu_posix
Yes, this fixes the PRU target, and does not regress
x86_64-pc-linux-gnu.

Regards,
Dimitar

[committed] libstdc++: Simplify resource management in directory iterators

2022-02-08 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux, pushed to trunk.


This replaces the _Dir constructor that takes ownership of an existing
DIR* resource with one that takes a _Dir_base rvalue instead. This means
a raw DIR* is never passed around, but is always owned by a _Dir_base
object.

libstdc++-v3/ChangeLog:

* src/c++17/fs_dir.cc (_Dir(DIR*, const path&)): Change first
parameter to _Dir_base&&.
* src/filesystem/dir-common.h (_Dir_base(DIR*)): Remove.
* src/filesystem/dir.cc (_Dir(DIR*, const path&)): Change first
parameter to _Dir_base&&.
---
 libstdc++-v3/src/c++17/fs_dir.cc | 4 ++--
 libstdc++-v3/src/filesystem/dir-common.h | 2 --
 libstdc++-v3/src/filesystem/dir.cc   | 4 ++--
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/src/c++17/fs_dir.cc b/libstdc++-v3/src/c++17/fs_dir.cc
index 54f135d2baf..c67fe76bc14 100644
--- a/libstdc++-v3/src/c++17/fs_dir.cc
+++ b/libstdc++-v3/src/c++17/fs_dir.cc
@@ -57,7 +57,7 @@ struct fs::_Dir : _Dir_base
   path = p;
   }
 
-  _Dir(posix::DIR* dirp, const path& p) : _Dir_base(dirp), path(p) { }
+  _Dir(_Dir_base&& d, const path& p) : _Dir_base(std::move(d)), path(p) { }
 
   _Dir(_Dir&&) = default;
 
@@ -140,7 +140,7 @@ struct fs::_Dir : _Dir_base
 _Dir_base d(dirfd, pathname, skip_permission_denied, nofollow, ec);
 // If this->path is empty, the new _Dir should have an empty path too.
 const fs::path& p = this->path.empty() ? this->path : this->entry.path();
-return _Dir(std::exchange(d.dirp, nullptr), p);
+return _Dir(std::move(d), p);
   }
 
   bool
diff --git a/libstdc++-v3/src/filesystem/dir-common.h 
b/libstdc++-v3/src/filesystem/dir-common.h
index 0b7665a3f70..511b988f1c7 100644
--- a/libstdc++-v3/src/filesystem/dir-common.h
+++ b/libstdc++-v3/src/filesystem/dir-common.h
@@ -89,8 +89,6 @@ is_permission_denied_error(int e)
 
 struct _Dir_base
 {
-  _Dir_base(posix::DIR* dirp = nullptr) : dirp(dirp) { }
-
   // If no error occurs then dirp is non-null,
   // otherwise null (even if a permission denied error is ignored).
   _Dir_base(int fd, const posix::char_type* pathname,
diff --git a/libstdc++-v3/src/filesystem/dir.cc 
b/libstdc++-v3/src/filesystem/dir.cc
index e838b4bc6bf..b451902c4a1 100644
--- a/libstdc++-v3/src/filesystem/dir.cc
+++ b/libstdc++-v3/src/filesystem/dir.cc
@@ -59,7 +59,7 @@ struct fs::_Dir : std::filesystem::_Dir_base
   path = p;
   }
 
-  _Dir(posix::DIR* dirp, const path& p) : _Dir_base(dirp), path(p) { }
+  _Dir(_Dir_base&& d, const path& p) : _Dir_base(std::move(d)), path(p) { }
 
   _Dir(_Dir&&) = default;
 
@@ -133,7 +133,7 @@ struct fs::_Dir : std::filesystem::_Dir_base
   {
 auto [dirfd, pathname] = dir_and_pathname();
 _Dir_base d(dirfd, pathname, skip_permission_denied, nofollow, ec);
-return _Dir(std::exchange(d.dirp, nullptr), entry.path());
+return _Dir(std::move(d), entry.path());
   }
 
   fs::path path;
-- 
2.34.1

Re: [PATCH] libstdc++: Decouple HAVE_FCNTL_H from HAVE_DIRENT_H check

2022-02-08 Thread Jonathan Wakely via Gcc-patches

On Tue, 8 Feb 2022 at 21:02, Dimitar Dimitrov  wrote:

> On Mon, Feb 07, 2022 at 09:05:45PM +, Jonathan Wakely wrote:
> > On Mon, 7 Feb 2022 at 21:01, Jonathan Wakely 
> wrote:
> >
> > >
> > >
> > > On Mon, 7 Feb 2022 at 20:12, Dimitar Dimitrov 
> wrote:
> > >
> > >> On PRU target with newlib, we have the following combination in
> config.h:
> > >>   /* #undef HAVE_DIRENT_H */
> > >>   #define HAVE_FCNTL_H 1
> > >>   #define HAVE_UNLINKAT 1
> > >>
> > >> In newlib, targets which do not define dirent.h, get a build error
> when
> > >> including :
> > >>
> > >>
> https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/include/sys/dirent.h;hb=HEAD
> > >>
> > >> While fs_dir.cc correctly checks for HAVE_FCNTL_H, dir-common.h
> doesn't,
> > >> and instead uses HAVE_DIRENT_H. This results in unlinkat() function
> call
> > >> in fs_dir.cc without the needed  include in dir-common.h.
> Thus
> > >> a build failure:
> > >>   .../gcc/libstdc++-v3/src/c++17/fs_dir.cc:151:11: error: ‘::unlinkat’
> > >> has not been declared; did you mean ‘unlink’?
> > >>
> > >> Fix by encapsulating  include with the correct check.
> > >>
> > >
> > > But there's no point doing anything in that file if we don't have
> > > , the whole thing is unusable. There's no point making the
> > > members using unlinkat compile if you can't ever construct the type.
> > >
> > > So I think we want a different fix.
> > >
> >
> >
> > Maybe something like:
> >
> > --- a/libstdc++-v3/src/filesystem/dir-common.h
> > +++ b/libstdc++-v3/src/filesystem/dir-common.h
> > @@ -70,6 +70,8 @@ struct DIR { };
> > inline DIR* opendir(const char*) { return nullptr; }
> > inline dirent* readdir(DIR*) { return nullptr; }
> > inline int closedir(DIR*) { return -1; }
> > +#undef _GLIBCXX_HAVE_DIRFD
> > +#undef _GLIBCXX_HAVE_UNLINKAT
> > #endif
> > } // namespace __gnu_posix
> Yes, this fixes the PRU target, and does not regress
> x86_64-pc-linux-gnu.
>

Thanks for checking it. I'm just testing it myself on powerpc64le-linux-gnu
and will push when it finishes.

[PATCH] rs6000: Correct function prototypes for vec_replace_unaligned

2022-02-08 Thread Bill Schmidt via Gcc-patches

Hi!

Due to a pasto error in the documentation, vec_replace_unaligned was
implemented with the same function prototypes as vec_replace_elt.  It was
intended that vec_replace_unaligned always specify output vectors as having
type vector unsigned char, to emphasize that elements are potentially
misaligned by this built-in function.  This patch corrects the
misimplementation.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
Is this okay for trunk?  Eventually I would also like to backport it
to GCC 11, after burn-in.

Thanks!
Bill


2022-02-04  Bill Schmidt  

gcc/
* config/rs6000/rs6000-builtins.def (VREPLACE_UN_UV2DI): Change
function prototype.
(VREPLACE_UN_UV4SI): Likewise.
(VREPLACE_UN_V2DF): Likewise.
(VREPLACE_UN_V2DI): Likewise.
(VREPLACE_UN_V4SF): Likewise.
(VREPLACE_UN_V4SI): Likewise.
* config/rs6000/rs6000-overload.def (VEC_REPLACE_UN): Change all
function prototypes.
* config/rs6000/vsx.md (vreplace_un_): Remove define_expand.
(vreplace_un_): New define_insn.

gcc/testsuite/
* gcc.target/powerpc/vec-replace-word-runnable.c: Handle expected
prototypes for each call to vec_replace_unaligned.
---
 gcc/config/rs6000/rs6000-builtins.def | 16 ++--
 gcc/config/rs6000/rs6000-overload.def | 12 -
 gcc/config/rs6000/vsx.md  | 25 ---
 .../powerpc/vec-replace-word-runnable.c   | 20 ++-
 4 files changed, 38 insertions(+), 35 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 5c988cc1152..846c0bafd45 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -3387,25 +3387,25 @@
   const vull __builtin_altivec_vpextd (vull, vull);
 VPEXTD vpextd {}
 
-  const vull __builtin_altivec_vreplace_un_uv2di (vull, unsigned long long, \
-  const int<4>);
+  const vuc __builtin_altivec_vreplace_un_uv2di (vull, unsigned long long, \
+ const int<4>);
 VREPLACE_UN_UV2DI vreplace_un_v2di {}
 
-  const vui __builtin_altivec_vreplace_un_uv4si (vui, unsigned int, \
+  const vuc __builtin_altivec_vreplace_un_uv4si (vui, unsigned int, \
  const int<4>);
 VREPLACE_UN_UV4SI vreplace_un_v4si {}
 
-  const vd __builtin_altivec_vreplace_un_v2df (vd, double, const int<4>);
+  const vuc __builtin_altivec_vreplace_un_v2df (vd, double, const int<4>);
 VREPLACE_UN_V2DF vreplace_un_v2df {}
 
-  const vsll __builtin_altivec_vreplace_un_v2di (vsll, signed long long, \
- const int<4>);
+  const vuc __builtin_altivec_vreplace_un_v2di (vsll, signed long long, \
+const int<4>);
 VREPLACE_UN_V2DI vreplace_un_v2di {}
 
-  const vf __builtin_altivec_vreplace_un_v4sf (vf, float, const int<4>);
+  const vuc __builtin_altivec_vreplace_un_v4sf (vf, float, const int<4>);
 VREPLACE_UN_V4SF vreplace_un_v4sf {}
 
-  const vsi __builtin_altivec_vreplace_un_v4si (vsi, signed int, const int<4>);
+  const vuc __builtin_altivec_vreplace_un_v4si (vsi, signed int, const int<4>);
 VREPLACE_UN_V4SI vreplace_un_v4si {}
 
   const vull __builtin_altivec_vreplace_uv2di (vull, unsigned long long, \
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 49a6104ddd2..44e2945aaa0 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3059,17 +3059,17 @@
 VREPLACE_ELT_V2DF
 
 [VEC_REPLACE_UN, vec_replace_unaligned, __builtin_vec_replace_un]
-  vui __builtin_vec_replace_un (vui, unsigned int, const int);
+  vuc __builtin_vec_replace_un (vui, unsigned int, const int);
 VREPLACE_UN_UV4SI
-  vsi __builtin_vec_replace_un (vsi, signed int, const int);
+  vuc __builtin_vec_replace_un (vsi, signed int, const int);
 VREPLACE_UN_V4SI
-  vull __builtin_vec_replace_un (vull, unsigned long long, const int);
+  vuc __builtin_vec_replace_un (vull, unsigned long long, const int);
 VREPLACE_UN_UV2DI
-  vsll __builtin_vec_replace_un (vsll, signed long long, const int);
+  vuc __builtin_vec_replace_un (vsll, signed long long, const int);
 VREPLACE_UN_V2DI
-  vf __builtin_vec_replace_un (vf, float, const int);
+  vuc __builtin_vec_replace_un (vf, float, const int);
 VREPLACE_UN_V4SF
-  vd __builtin_vec_replace_un (vd, double, const int);
+  vuc __builtin_vec_replace_un (vd, double, const int);
 VREPLACE_UN_V2DF
 
 [VEC_REVB, vec_revb, __builtin_vec_revb]
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 2f5a2f7828d..b53de103872 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4197,21 +4197,6 @@ (define_expand "vreplace_elt_"
  }
 [(set_attr "type" "vecsimple")])
 
-(def

[PATCH] handle "invisible" reference in -Wdangling-pointer (PR104436)

2022-02-08 Thread Martin Sebor via Gcc-patches


Transforming a by-value arguments to by-reference as GCC does for some
class types can trigger -Wdangling-pointer when the argument is used
to store the address of a local variable.  Since the stored value is
not accessible in the caller the warning is a false positive.

The attached patch handles this case by excluding PARM_DECLs with
the DECL_BY_REFERENCE bit set from consideration.

While testing the patch I noticed some instances of the warning are
uninitentionally duplicated as the pass runs more than once.  To avoid
that, I also introduce warning suppression into the handler for this
instance of the warning.  (There might still be others.)

Tested on x86_64-linux.

MartinAvoid -Wdangling-pointer for by-transparent-reference arguments [PR104436].

Resolves:
PR middle-end/104436 - spurious -Wdangling-pointer assigning local address to a class passed by value

gcc/ChangeLog:

	PR middle-end/104436
	* gimple-ssa-warn-access.cc (pass_waccess::check_dangling_stores):
	Check for warning suppression.  Avoid by-value arguments transformed
	into by-transparent-reference.

gcc/testsuite/ChangeLog:

	PR middle-end/104436
	* c-c++-common/Wdangling-pointer-7.c: New test.
	* g++.dg/warn/Wdangling-pointer-4.C: New test.

diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 80d41ea4383..0c319a32b70 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -4517,6 +4517,9 @@ pass_waccess::check_dangling_stores (basic_block bb,
   if (!stmt)
 	break;
 
+  if (warning_suppressed_p (stmt, OPT_Wdangling_pointer_))
+	continue;
+
   if (is_gimple_call (stmt)
 	  && !(gimple_call_flags (stmt) & (ECF_CONST | ECF_PURE)))
 	/* Avoid looking before nonconst, nonpure calls since those might
@@ -4542,10 +4545,16 @@ pass_waccess::check_dangling_stores (basic_block bb,
 	}
   else if (TREE_CODE (lhs_ref.ref) == SSA_NAME)
 	{
-	  /* Avoid looking at or before stores into unknown objects.  */
 	  gimple *def_stmt = SSA_NAME_DEF_STMT (lhs_ref.ref);
 	  if (!gimple_nop_p (def_stmt))
+	/* Avoid looking at or before stores into unknown objects.  */
 	return;
+
+	  tree var = SSA_NAME_VAR (lhs_ref.ref);
+	  if (DECL_BY_REFERENCE (var))
+	/* Avoid by-value arguments transformed into by-reference.  */
+	continue;
+
 	}
   else if (TREE_CODE (lhs_ref.ref) == MEM_REF)
 	{
@@ -4578,6 +4587,8 @@ pass_waccess::check_dangling_stores (basic_block bb,
 		  "storing the address of local variable %qD in %qE",
 		  rhs_ref.ref, lhs))
 	{
+	  suppress_warning (stmt, OPT_Wdangling_pointer_);
+
 	  location_t loc = DECL_SOURCE_LOCATION (rhs_ref.ref);
 	  inform (loc, "%qD declared here", rhs_ref.ref);
 
diff --git a/gcc/testsuite/c-c++-common/Wdangling-pointer-7.c b/gcc/testsuite/c-c++-common/Wdangling-pointer-7.c
new file mode 100644
index 000..433727dd845
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Wdangling-pointer-7.c
@@ -0,0 +1,20 @@
+/* Verify -Wdangling-pointer is issued only once.
+   { dg-do compile }
+   { dg-options "-O -Wall" } */
+
+void *p;
+
+void escape_global_warn_once (void)
+{
+  int x[5];
+
+  p = &x[3];// { dg-regexp "\[^\n\r\]+: warning: \[^\n\r\]+ \\\[-Wdangling-pointer.?\\\]" "message" }
+}
+
+
+void escape_param_warn_once (void **p)
+{
+  int x[5];
+
+  *p = &x[3];   // { dg-regexp "\[^\n\r\]+: warning: \[^\n\r\]+ \\\[-Wdangling-pointer.?\\\]" "message" }
+}
diff --git a/gcc/testsuite/g++.dg/warn/Wdangling-pointer-4.C b/gcc/testsuite/g++.dg/warn/Wdangling-pointer-4.C
new file mode 100644
index 000..b3d144a9e6d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wdangling-pointer-4.C
@@ -0,0 +1,34 @@
+/* PR middle-end/104436 - spurious -Wdangling-pointer assigning local
+   address to a class passed by value
+   { dg-do compile }
+   { dg-options "-O1 -Wall" } */
+
+struct S
+{
+  S (void *p): p (p) { }
+  S (const S &s): p (s.p) { }
+
+  void *p;
+};
+
+
+void nowarn_assign_by_value (S s)
+{
+  int i;
+  S t (&i);
+  s = t;// { dg-bogus "-Wdangling-pointer" }
+}
+
+void nowarn_assign_by_value_arg (S s)
+{
+  S t (&s);
+  s = t;// { dg-bogus "-Wdangling-pointer" }
+}
+
+
+void warn_assign_local_by_reference (S &s)
+{
+  int i;
+  S t (&i);
+  s = t;// { dg-warning "-Wdangling-pointer" }
+}

Re: [PATCH] Fix PR 101515 (ICE in pp_cxx_unqualified_id, at cp/cxx-pretty-print.c:128)

2022-02-08 Thread Jason Merrill via Gcc-patches


On 2/8/22 15:11, Qing Zhao wrote:

Hi,

This is the patch to fix PR101515 (ICE in pp_cxx_unqualified_id, at  
cp/cxx-pretty-print.c:128)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101515

It's possible that the TYPE_NAME of a record_type is NULL, therefore when
printing the TYPE_NAME, we should check and handle this special case.

Please see the comment of pr101515 for more details.

The fix is very simple, just check and special handle cases when TYPE_NAME is 
NULL.

Bootstrapped and regression tested on both x86 and aarch64, no issues.

Okay for commit?

Thanks.

Qing

=

 From f37ee8d21b80cb77d8108cb97a487c84c530545b Mon Sep 17 00:00:00 2001
From: Qing Zhao 
Date: Tue, 8 Feb 2022 16:10:37 +
Subject: [PATCH] Fix PR 101515 ICE in pp_cxx_unqualified_id, at
  cp/cxx-pretty-print.c:128.

It's possible that the TYPE_NAME of a record_type is NULL, therefore when
printing the TYPE_NAME, we should check and handle this special case.

gcc/cp/ChangeLog:

* cxx-pretty-print.cc (pp_cxx_unqualified_id): Check and handle
the case when TYPE_NAME is NULL.

gcc/testsuite/ChangeLog:

* g++.dg/pr101515.C: New test.
---
  gcc/cp/cxx-pretty-print.cc  |  5 -
  gcc/testsuite/g++.dg/pr101515.C | 25 +
  2 files changed, 29 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/pr101515.C

diff --git a/gcc/cp/cxx-pretty-print.cc b/gcc/cp/cxx-pretty-print.cc
index 4f9a090e520d..744ed0add5ba 100644
--- a/gcc/cp/cxx-pretty-print.cc
+++ b/gcc/cp/cxx-pretty-print.cc
@@ -171,7 +171,10 @@ pp_cxx_unqualified_id (cxx_pretty_printer *pp, tree t)
  case ENUMERAL_TYPE:
  case TYPENAME_TYPE:
  case UNBOUND_CLASS_TEMPLATE:
-  pp_cxx_unqualified_id (pp, TYPE_NAME (t));
+  if (TYPE_NAME (t))
+   pp_cxx_unqualified_id (pp, TYPE_NAME (t));
+  else
+   pp_string (pp, "");


Hmm, but it's not an unnamed class, it's a pointer to member function 
type, and it would be better to avoid dumping compiler internal 
representations like the __pfn field name.


I think the real problem comes sooner, when c_fold_indirect_ref_for_warn 
turns a MEM_REF with RECORD_TYPE into a COMPONENT_REF with POINTER_TYPE.



if (tree ti = TYPE_TEMPLATE_INFO_MAYBE_ALIAS (t))
if (PRIMARY_TEMPLATE_P (TI_TEMPLATE (ti)))
  {
diff --git a/gcc/testsuite/g++.dg/pr101515.C b/gcc/testsuite/g++.dg/pr101515.C
new file mode 100644
index ..898c7e003c22
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr101515.C
@@ -0,0 +1,25 @@
+/* PR101515 -  ICE in pp_cxx_unqualified_id, at cp/cxx-pretty-print.c:128
+   { dg-do compile }
+   { dg-options "-Wuninitialized -O1" } */
+
+struct S
+{
+  int j;
+};
+struct T : public S
+{
+  virtual void h () {}
+};
+struct ptrmemfunc
+{
+  void (*ptr) ();
+};
+typedef void (S::*sp)();
+int main ()
+{
+  T t;
+  sp x;
+  ptrmemfunc *xp = (ptrmemfunc *) &x;
+  if (xp->ptr != ((void (*)())(sizeof(void *  /* { dg-warning "is used 
uninitialized" } */
+return 1;
+}

Re: [PATCH] handle "invisible" reference in -Wdangling-pointer (PR104436)

2022-02-08 Thread Jason Merrill via Gcc-patches


On 2/8/22 16:59, Martin Sebor wrote:

Transforming a by-value arguments to by-reference as GCC does for some
class types can trigger -Wdangling-pointer when the argument is used
to store the address of a local variable.  Since the stored value is
not accessible in the caller the warning is a false positive.

The attached patch handles this case by excluding PARM_DECLs with
the DECL_BY_REFERENCE bit set from consideration.

While testing the patch I noticed some instances of the warning are
uninitentionally duplicated as the pass runs more than once.  To avoid
that, I also introduce warning suppression into the handler for this
instance of the warning.  (There might still be others.)


The second test should verify that we do warn about returning 't' from a 
function; we don't want to ignore the DECL_BY_REFERENCE RESULT_DECL.



+ tree var = SSA_NAME_VAR (lhs_ref.ref);
+ if (DECL_BY_REFERENCE (var))
+   /* Avoid by-value arguments transformed into by-reference.  */
+   continue;


I wonder if we can we express this property of invisiref parms somewhere 
more general?  I imagine optimizations would find it useful as well. 
Could pointer_query somehow treat the reference as pointing to a 
function-local object?


I previously tried to express this by marking the reference as 
'restrict', but that was wrong 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97474).


Jason

[PATCH] PR tree-optimization/104420: Fix checks for constant folding X*0.0

2022-02-08 Thread Roger Sayle


This patch resolves PR tree-optimization/104420, which is a P1 regression
where, as observed by Jakub Jelinek, the conditions for constant folding
x*0.0 are incorrect (following my patch for PR tree-optimization/96392).
The multiplication x*0.0 may yield a negative zero result, -0.0, if X is
negative (not just if x may be negative zero).  Hence (without -ffast-math)
(int)x*0.0 can't be optimized to 0.0, but (unsigned)x*0.0 can be constant
folded.  This adds a bunch of test cases to confirm the desired behaviour,
and removes an incorrect test from gcc.dg/pr96392.c which checked for the
wrong behaviour.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
make -k check no new failures.  Ok for mainline?

2022-02-08  Roger Sayle  

gcc/ChangeLog
PR tree-optimization/104420
* match.pd (mult @0 real_zerop): Tweak conditions for constant
folding X*0.0 (or X*-0.0) to HONOR_SIGNED_ZEROS when appropriate.

gcc/testsuite/ChangeLog
PR tree-optimization/104420
* gcc.dg/pr104420-1.c: New test case.
* gcc.dg/pr104420-2.c: New test case.
* gcc.dg/pr104420-3.c: New test case.
* gcc.dg/pr104420-4.c: New test case.
* gcc.dg/pr96392.c: Remove incorrect test.

Thanks in advance (and sorry for the breakage/thinko).
Roger
--

diff --git a/gcc/match.pd b/gcc/match.pd
index 7bbb801..4fe5909 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -262,8 +262,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (mult @0 real_zerop@1)
  (if (!tree_expr_maybe_nan_p (@0)
   && (!HONOR_NANS (type) || !tree_expr_maybe_infinite_p (@0))
-  && !tree_expr_maybe_real_minus_zero_p (@0)
-  && !tree_expr_maybe_real_minus_zero_p (@1))
+  && (!HONOR_SIGNED_ZEROS (type) || tree_expr_nonnegative_p (@0)))
   @1))
 
 /* In IEEE floating point, x*1 is not equivalent to x for snans.
diff --git a/gcc/testsuite/gcc.dg/pr104420-1.c 
b/gcc/testsuite/gcc.dg/pr104420-1.c
new file mode 100644
index 000..48385fa
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr104420-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-add-options ieee } */
+
+double f(int a)
+{
+  return a * 0.0;
+}
+
+/* { dg-final { scan-tree-dump " \\\* 0.0" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/pr104420-2.c 
b/gcc/testsuite/gcc.dg/pr104420-2.c
new file mode 100644
index 000..49d0189
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr104420-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-add-options ieee } */
+
+double f(int a)
+{
+  return a * -0.0;
+}
+
+/* { dg-final { scan-tree-dump " \\\* -0.0" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/pr104420-3.c 
b/gcc/testsuite/gcc.dg/pr104420-3.c
new file mode 100644
index 000..962dfff
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr104420-3.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-add-options ieee } */
+
+double f(unsigned int a)
+{
+  return a * 0.0;
+}
+
+/* { dg-final { scan-tree-dump "return 0.0" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/pr104420-4.c 
b/gcc/testsuite/gcc.dg/pr104420-4.c
new file mode 100644
index 000..95ed0cc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr104420-4.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-add-options ieee } */
+
+double f(unsigned int a)
+{
+  return a * -0.0;
+}
+
+/* { dg-final { scan-tree-dump "return -0.0" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/pr96392.c b/gcc/testsuite/gcc.dg/pr96392.c
index 662bacb..fb7de21 100644
--- a/gcc/testsuite/gcc.dg/pr96392.c
+++ b/gcc/testsuite/gcc.dg/pr96392.c
@@ -12,11 +12,6 @@ double sub0(int x)
   return x - 0.0;
 }
 
-double mult0(int x)
-{
-  return 0.0 * x;
-}
-
 double negate(int x)
 {
   return 0.0 - x;
@@ -29,5 +24,4 @@ double subtract(int x)
 
 /* { dg-final { scan-tree-dump-not " \\+ " "optimized" } } */
 /* { dg-final { scan-tree-dump-not " \\- " "optimized" } } */
-/* { dg-final { scan-tree-dump-not " \\* " "optimized" } } */

[PATCH] libstdc++: Fix deadlock in atomic wait [PR104442]

2022-02-08 Thread Thomas Rodgers via Gcc-patches

This issue was observed as a deadloack in
29_atomics/atomic/wait_notify/100334.cc on vxworks. When a wait is
"laundered" (e.g. type T* does not suffice as a waitable address for the
platform's native waiting primitive), the address waited is that of the
_M_ver member of __waiter_pool_base, so several threads may wait on the
same address for unrelated atomic's. As noted in the PR, the
implementation correctly exits the wait for the thread who's data
changed, but not for any other threads waiting on the same address.

As noted in the PR the __waiter::_M_do_wait_v member was correctly exiting
but the other waiters were not reloaded the value of _M_ver before
re-entering the wait.

Moving the spin call inside the loop accomplishes this, and is
consistent with the predicate accepting version of __waiter::_M_do_wait.
From ee66736beca3dce4bc09350c5407a2ac7219fbec Mon Sep 17 00:00:00 2001
From: Thomas Rodgers 
Date: Tue, 8 Feb 2022 16:33:36 -0800
Subject: [PATCH] libstdc++: Fix deadlock in atomic wait [PR104442]

This issue was observed as a deadloack in
29_atomics/atomic/wait_notify/100334.cc on vxworks. When a wait is
"laundered" (e.g. type T* does not suffice as a waitable address for the
platform's native waiting primitive), the address waited is that of the
_M_ver member of __waiter_pool_base, so several threads may wait on the
same address for unrelated atomic's. As noted in the PR, the
implementation correctly exits the wait for the thread who's data
changed, but not for any other threads waiting on the same address.

As noted in the PR the __waiter::_M_do_wait_v member was correctly exiting
but the other waiters were not reloaded the value of _M_ver before
re-entering the wait.

Moving the spin call inside the loop accomplishes this, and is
consistent with the predicate accepting version of __waiter::_M_do_wait.

libstdc++-v3/ChangeLog:

	PR libstdc++/104442
	* include/bits/atomic_wait.h (__waiter::_M_do_wait_v): Move spin
	 loop inside do loop so that threads failing the wait, reload
	 _M_ver.
---
 libstdc++-v3/include/bits/atomic_wait.h | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h
index d7de0d7eb9e..33ce26ade1b 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -332,7 +332,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  }
 	else
 	  {
-		__atomic_load(__addr, &__val, __ATOMIC_RELAXED);
+		__atomic_load(__addr, &__val, __ATOMIC_SEQ_CST);
 	  }
 	return __atomic_spin(__pred, __spin);
 	  }
@@ -388,12 +388,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  void
 	  _M_do_wait_v(_Tp __old, _ValFn __vfn)
 	  {
-	__platform_wait_t __val;
-	if (__base_type::_M_do_spin_v(__old, __vfn, __val))
-	  return;
-
 	do
 	  {
+		__platform_wait_t __val;
+		if (__base_type::_M_do_spin_v(__old, __vfn, __val))
+		  return;
 		__base_type::_M_w._M_do_wait(__base_type::_M_addr, __val);
 	  }
 	while (__detail::__atomic_compare(__old, __vfn()));
-- 
2.34.1

[pushed] c++: cleanup constant-init'd members [PR96876]

2022-02-08 Thread Jason Merrill via Gcc-patches

This is a case missed by my recent fixes to aggregate initialization and
exception cleanup for PR94041 et al: we also need to clean up members with
constant initialization if initialization of a later member throws.

It also occurs to me that we needn't bother building the cleanups if
-fno-exceptions; build_vec_init already doesn't.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/96876

gcc/cp/ChangeLog:

* typeck2.cc (split_nonconstant_init_1): Push cleanups for
preceding members with constant initialization.
(maybe_push_temp_cleanup): Do nothing if -fno-exceptions.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/aggr-base11.C: New test.
* g++.dg/eh/aggregate2.C: New test.
---
 gcc/cp/typeck2.cc| 26 +++
 gcc/testsuite/g++.dg/cpp1z/aggr-base11.C | 19 +
 gcc/testsuite/g++.dg/eh/aggregate2.C | 27 
 3 files changed, 72 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/aggr-base11.C
 create mode 100644 gcc/testsuite/g++.dg/eh/aggregate2.C

diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
index 4015bd53257..39d03e4c3c4 100644
--- a/gcc/cp/typeck2.cc
+++ b/gcc/cp/typeck2.cc
@@ -467,6 +467,8 @@ cxx_incomplete_type_error (location_t loc, const_tree 
value, const_tree type)
 static void
 maybe_push_temp_cleanup (tree sub, vec **flags)
 {
+  if (!flag_exceptions)
+return;
   if (tree cleanup
   = cxx_maybe_build_cleanup (sub, tf_warning_or_error))
 {
@@ -496,6 +498,7 @@ split_nonconstant_init_1 (tree dest, tree init, bool last,
   bool array_type_p = false;
   bool complete_p = true;
   HOST_WIDE_INT num_split_elts = 0;
+  tree last_split_elt = NULL_TREE;
 
   switch (TREE_CODE (type))
 {
@@ -572,6 +575,7 @@ split_nonconstant_init_1 (tree dest, tree init, bool last,
  else
{
  /* Mark element for removal.  */
+ last_split_elt = field_index;
  CONSTRUCTOR_ELT (init, idx)->index = NULL_TREE;
  if (idx < tidx)
tidx = idx;
@@ -584,6 +588,7 @@ split_nonconstant_init_1 (tree dest, tree init, bool last,
  flags));
 
  /* Mark element for removal.  */
+ last_split_elt = field_index;
  CONSTRUCTOR_ELT (init, idx)->index = NULL_TREE;
  if (idx < tidx)
tidx = idx;
@@ -593,6 +598,26 @@ split_nonconstant_init_1 (tree dest, tree init, bool last,
{
  tree code;
 
+ /* Push cleanups for any preceding members with constant
+initialization.  */
+ if (CLASS_TYPE_P (type))
+   for (tree prev = (last_split_elt ?
+ DECL_CHAIN (last_split_elt)
+ : TYPE_FIELDS (type));
+; prev = DECL_CHAIN (prev))
+ {
+   prev = next_initializable_field (prev);
+   if (prev == field_index)
+ break;
+   tree ptype = TREE_TYPE (prev);
+   if (type_build_dtor_call (ptype))
+ {
+   tree pcref = build3 (COMPONENT_REF, ptype, dest, prev,
+NULL_TREE);
+   maybe_push_temp_cleanup (pcref, flags);
+ }
+ }
+
  /* Mark element for removal.  */
  CONSTRUCTOR_ELT (init, idx)->index = NULL_TREE;
  if (idx < tidx)
@@ -645,6 +670,7 @@ split_nonconstant_init_1 (tree dest, tree init, bool last,
maybe_push_temp_cleanup (sub, flags);
}
 
+ last_split_elt = field_index;
  num_split_elts++;
}
}
diff --git a/gcc/testsuite/g++.dg/cpp1z/aggr-base11.C 
b/gcc/testsuite/g++.dg/cpp1z/aggr-base11.C
new file mode 100644
index 000..88625dc9533
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/aggr-base11.C
@@ -0,0 +1,19 @@
+// PR c++/96876
+// { dg-do compile { target c++17 } }
+
+struct B {
+protected:
+~B() {}// { dg-message "" }
+};
+
+struct A { };
+struct C1: B { int n; };
+struct C2: A, B { int n; };
+
+A af ();
+int f();
+
+void g() {
+  C1 c1{ {}, f()}; // { dg-error "protected" }
+  C2 c2{ af(), {}, f()};   // { dg-error "protected" }
+}
diff --git a/gcc/testsuite/g++.dg/eh/aggregate2.C 
b/gcc/testsuite/g++.dg/eh/aggregate2.C
new file mode 100644
index 000..8424d63de2d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/eh/aggregate2.C
@@ -0,0 +1,27 @@
+// PR c++/96876
+// { dg-do run { target c++11 } }
+
+int d;
+struct B {
+  ~B() { ++d; }
+};
+
+struct C1 { B b; int n; };
+struct C2 { int i; B b; int n; };
+
+int f() { throw 24; return 42; }
+int dummy;
+int g() { ++dummy; return 42; }
+
+int main() {
+  try {
+C1 c{{}, f()};
+  } catch (...) { }

Re: [PATCH v3] rs6000: Fix some issues in rs6000_can_inline_p [PR102059]

2022-02-08 Thread Kewen.Lin via Gcc-patches

Hi Bill,

on 2022/2/9 上午4:29, Bill Schmidt wrote:
> Hi!
> 
> From some discussion today, I think we want to limit the scope of
> this patch to just the power8-fusion flag that's causing trouble for
> now, given stage 4.  We've talked about making power8-fusion a do-
> nothing flag, since it doesn't add much benefit now and probably
> shouldn't be a separate flag anyway.  Having it as a meaningless
> flag makes it more palatable to add an exception for it in the
> inlining path.
> 
> Others, feel free to weigh in.
> 

Many thanks for caring about this.  It's a good idea to handle this
power8-fusion specially for now.

BR,
Kewen

> Thanks,
> Bill
> 
> On 1/5/22 1:34 AM, Kewen.Lin wrote:
>> Hi,
>>
>> This patch is to fix the inconsistent behaviors for non-LTO mode
>> and LTO mode.  As Martin pointed out, currently the function
>> rs6000_can_inline_p simply makes it inlinable if callee_tree is
>> NULL, but it's unexpected, we should use the command line options
>> from target_option_default_node as default.
>>
>> It replaces rs6000_isa_flags with target_option_default_node when
>> caller_tree is NULL since it's more straightforward and doesn't
>> suffer from some bug not to keep rs6000_isa_flags as default.
>>
>> It also extends the scope of the check for the case that callee
>> has explicit set options, inlining in test case pr102059-5.c can
>> happen unexpectedly before, it's fixed accordingly.
>>
>> As Richi/Mike pointed out, some tuning flags like MASK_P8_FUSION
>> can be neglected for always inlining, this patch also takes some
>> flags when the callee is attributed by always_inline.
>>
>> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578552.html
>> v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586112.html
>>
>> This patch is one re-post of this updated version[1] and also
>> rebased and adjusted on top of the related commit r12-6219.
>>
>> Bootstrapped and regtested on powerpc64-linux-gnu P8 and
>> powerpc64le-linux-gnu P9 and P10.
>>
>> Is it ok for trunk?
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586296.html
>>
>> BR,
>> Kewen
>> -
>> gcc/ChangeLog:
>>
>>  PR target/102059
>>  * config/rs6000/rs6000.c (rs6000_can_inline_p): Adjust with
>>  target_option_default_node and consider always_inline_safe flags.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  PR target/102059
>>  * gcc.target/powerpc/pr102059-4.c: New test.
>>  * gcc.target/powerpc/pr102059-5.c: New test.
>>  * gcc.target/powerpc/pr102059-6.c: New test.
>>  * gcc.target/powerpc/pr102059-7.c: New test.
>>  * gcc.target/powerpc/pr102059-8.c: New test.
>>  * gcc.dg/lto/pr102059-1_0.c: Remove unneeded option.
>>
>>

Re: [Patch, fortran] PR37336 (Finalization) - [F03] Finish derived-type finalization

2022-02-08 Thread Jerry D via Gcc-patches

Remember the days when reading very old cryptic Fortran code? Remember 
the fixed line lengths and cryptic variable names!


I fear the Standards committee has achieved history with the Standard 
itself it is so difficult to understand sometimes.


Cheers to Paul and Harald for digging on this.

Jerry

On 2/8/22 11:29 AM, Harald Anlauf via Fortran wrote:

Hi Paul,

Am 08.02.22 um 12:22 schrieb Paul Richard Thomas via Fortran:

Hi Harald,

Thanks for giving the patch a whirl.


the parent components as an array. I strongly suspect that, from 
reading

7.5.6.2 paragraphs 2 and 3 closely, that ifort has it right. However,

this

is another issue to come back to in the future.


Could you specify which version of Intel you tried?



ifort (IFORT) 2021.1 Beta 20201112


ok, that's good to know.



Testcase finalize_38.f90 fails for me with ifort 2021.5.0 with:

131

This test also fails with crayftn 11 & 12 and nagfor 7.0,
but in a different place.



I have run your modified version of finalize_38.f90, and now I see
that you can get a bloody head just from scratching too much...

crayftn 12.0.2:

 1,  3,  1
 2,  21,  0
 11,  3,  2
 12,  21,  1
 21,  4,  3
 23,  21,  22 | 42,  43
 31,  6,  4
 41,  7,  5
 51,  9,  7
 61,  10,  8
 71,  13,  10
 101,  2,  1
 102,  4,  3
 111,  3,  2
 121,  4,  2
 122,  0,  4
 123,  5,  6 | 2*0
 131,  5,  2
 132,  0,  4
 133,  7,  8 | 2*0
 141,  6,  3
 151,  10,  5
 161,  16,  9
 171,  18,  11
 175,  0.,  20. | 10.,  20.

nagfor 7.0:

 1 0 1
 11 1 2
 23 21 22 | 42 43
 71 9 10
 72 11 99
 131 3 2
 132 5 4
 141 4 3
 151 6 5
 161 10 9
 171 12 11

Intel 2021.5.0:

 131   3   2
 132   0   4
 133   5   6 |   0   0
 141   4   3
 151   7   5
 152   3   0
 153   0   0 |   1   3
forrtl: severe (174): SIGSEGV, segmentation fault occurred
[...]


That got me reading 7.5.6.3, where is says in paragraph 1:

"When an intrinsic assignment statement is executed (10.2.1.3), if the
variable is not an unallocated allocatable variable, it is finalized
after evaluation of expr and before the definition of the variable.
..."

Looking at the beginning of the testcase code (abridged):

  type(simple), allocatable :: MyType, MyType2
  type(simple) :: ThyType = simple(21), ThyType2 = simple(22)

! The original PR - one finalization of 'var' before (re)allocation.
  MyType = ThyType
  call test(1, 0, [0,0], 0)


This is an intrinsic assignment.

Naively I would expect MyType to be initially unallocated.

ThyType is not allocatable and non-pointer and cannot become
undefined here and would not play any role in finalization.

I am probably too blind-sighted to see why there should be
a finalization here.  What am I missing?

Could you use the attached version of finalize_38.f90 with crayftn 
and NAG?

All the stop statements are replaced with prints. Ifort gives:
  131   3   2
  132   0   4
  133   5   6 |   0   0
  141   4   3
  151   7   5
  152   3   0
  153   0   0 |   1   3
  161  13   9
  162  20   0
  163   0   0 |  10  20
  171  14  11


I think it is a good idea to have these prints in the testcase
whenever there is a departure from expectations.  So print&stop?

Furthermore, for the sake of health of people reading the testcases
later, I think it would not harm to add more explanations why we
expect a certain behavior... ;-)


Best regards

Paul


Best regards,
Harald

Re: [PATCH] analyzer: Fix tests for glibc 2.35 [PR101081]

2022-02-08 Thread David Malcolm via Gcc-patches

On Fri, 2022-02-04 at 11:35 -0500, Joel Teichroeb via Gcc-patches
wrote:
> In recent versions of glibc fopen has __attribute__((malloc)).
> Since we can not detect wether this attribute is present or not,
> we avoid including stdio.h and instead forward declare what we
> need in each test.
> 
> Signed-off-by: Joel Teichroeb 

Thanks!

I wrote a ChangeLog entry and pushed this to trunk (as 
commit r12-7119-ge52a683170877d140eebc9782731eaf11897db71).

Dave

Re: [PATCH v4] x86: Add -m[no-]direct-extern-access

2022-02-08 Thread Hongtao Liu via Gcc-patches

On Fri, Jan 28, 2022 at 5:53 AM H.J. Lu via Gcc-patches
 wrote:
>
> The v3 patch was posted at
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574847.html
>
> There is no progress with repeated pings since then.  Glibc 2.35 and
> binutils 2.38 will support GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS.
> I'd like to remove copy relocation to improve security and performance
> on x86 in GCC 12.  Here is the v4 patch
  This patch won't change default behavior for copy relocation(to
avoid conflict with existing .so files),
just give users an option under which copy relocation can be removed.
  The removal of copy relocation is an optimization of the linker(Also
improves security), whose
patches have been approved and committed to glibc2.35[1], binutils2.38[2].
  The compiler part is the final step in completing this optimization.

[1] https://sourceware.org/pipermail/libc-alpha/2021-October/131742.html
[2] https://sourceware.org/pipermail/binutils/2021-July/117308.html
>
> 1. Rename the common option to x86 specific -mdirect-extern-access option.
> 2. Remove GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS linker check.
> 3. Use the existing GNU property function in x86 backend.
>
> This new behavior is off by default.  Are there any objections?
>
> Changes in the v3 patch.
>
> 1. GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS support has been added to
> GNU binutils 2.38.  But the -z indirect-extern-access linker option is
> only available for Linux/x86.  However, the --max-cache-size=SIZE linker
> option was also addded within a day.  --max-cache-size=SIZE is used to
> check for GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS support.
>
> Changes in the v2 patch.
>
> 1. Rename the option to -fdirect-extern-access.
>
> ---
> On systems with copy relocation:
> * A copy in executable is created for the definition in a shared library
> at run-time by ld.so.
> * The copy is referenced by executable and shared libraries.
> * Executable can access the copy directly.
>
> Issues are:
> * Overhead of a copy, time and space, may be visible at run-time.
> * Read-only data in the shared library becomes read-write copy in
> executable at run-time.
> * Local access to data with the STV_PROTECTED visibility in the shared
> library must use GOT.
>
> On systems without function descriptor, function pointers vary depending
> on where and how the functions are defined.
> * If the function is defined in executable, it can be the address of
> function body.
> * If the function, including the function with STV_PROTECTED visibility,
> is defined in the shared library, it can be the address of the PLT entry
> in executable or shared library.
>
> Issues are:
> * The address of function body may not be used as its function pointer.
> * ld.so needs to search loaded shared libraries for the function pointer
> of the function with STV_PROTECTED visibility.
>
> Here is a proposal to remove copy relocation and use canonical function
> pointer:
>
> 1. Accesses, including in PIE and non-PIE, to undefined symbols must
> use GOT.
>   a. Linker may optimize out GOT access if the data is defined in PIE or
>   non-PIE.
> 2. Read-only data in the shared library remain read-only at run-time
> 3. Address of global data with the STV_PROTECTED visibility in the shared
> library is the address of data body.
>   a. Can use IP-relative access.
>   b. May need GOT without IP-relative access.
> 4. For systems without function descriptor,
>   a. All global function pointers of undefined functions in PIE and
>   non-PIE must use GOT.  Linker may optimize out GOT access if the
>   function is defined in PIE or non-PIE.
>   b. Function pointer of functions with the STV_PROTECTED visibility in
>   executable and shared library is the address of function body.
>i. Can use IP-relative access.
>ii. May need GOT without IP-relative access.
>iii. Branches to undefined functions may use PLT.
> 5. Single global definition marker:
>
> Add GNU_PROPERTY_1_NEEDED:
>
> #define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO
>
> to indicate the needed properties by the object file.
>
> Add GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS:
>
> #define GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (1U << 0)
>
> to indicate that the object file requires canonical function pointers and
> cannot be used with copy relocation.  This bit should be cleared in
> executable when there are non-GOT or non-PLT relocations in relocatable
> input files without this bit set.
>
>   a. Protected symbol access within the shared library can be treated as
>   local.
>   b. Copy relocation should be disallowed at link-time and run-time.
>   c. GOT function pointer reference is required at link-time and run-time.
>
> The indirect external access marker can be used in the following ways:
>
> 1. Linker can decide the best way to resolve a relocation against a
> protected symbol before seeing all relocations against the symbol.
> 2. Dynamic linker can decide if it is an error to have a copy rel

[committed] analyzer: fix hashing of bit_range_region::key_t [PR104452]

2022-02-08 Thread David Malcolm via Gcc-patches

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu;
manually verified using valgrind.

Pushed to trunk as r12-7118-g391512ade5f6cda95456133296c8dcc42d5fbefd.

gcc/analyzer/ChangeLog:
PR analyzer/104452
* region-model.cc (selftest::test_bit_range_regions): New.
(selftest::analyzer_region_model_cc_tests): Call it.
* region.h (bit_range_region::key_t::hash): Fix hashing of m_bits
to avoid using uninitialized data.

gcc/testsuite/ChangeLog:
PR analyzer/104452
* gcc.dg/analyzer/pr104452.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model.cc | 18 ++
 gcc/analyzer/region.h|  3 ++-
 gcc/testsuite/gcc.dg/analyzer/pr104452.c | 10 ++
 3 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr104452.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 6e7a21d0f9c..f8f19769258 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -4710,6 +4710,23 @@ test_descendent_of_p ()
   ASSERT_TRUE (cast_reg->descendent_of_p (x_reg));
 }
 
+/* Verify that bit_range_region works as expected.  */
+
+static void
+test_bit_range_regions ()
+{
+  tree x = build_global_decl ("x", integer_type_node);
+  region_model_manager mgr;
+  const region *x_reg = mgr.get_region_for_global (x);
+  const region *byte0
+= mgr.get_bit_range (x_reg, char_type_node, bit_range (0, 8));
+  const region *byte1
+= mgr.get_bit_range (x_reg, char_type_node, bit_range (8, 8));
+  ASSERT_TRUE (byte0->descendent_of_p (x_reg));
+  ASSERT_TRUE (byte1->descendent_of_p (x_reg));
+  ASSERT_NE (byte0, byte1);
+}
+
 /* Verify that simple assignments work as expected.  */
 
 static void
@@ -6009,6 +6026,7 @@ analyzer_region_model_cc_tests ()
   test_binop_svalue_folding ();
   test_sub_svalue_folding ();
   test_descendent_of_p ();
+  test_bit_range_regions ();
   test_assignment ();
   test_compound_assignment ();
   test_stack_frames ();
diff --git a/gcc/analyzer/region.h b/gcc/analyzer/region.h
index 206b157e908..53112175266 100644
--- a/gcc/analyzer/region.h
+++ b/gcc/analyzer/region.h
@@ -1156,7 +1156,8 @@ public:
   inchash::hash hstate;
   hstate.add_ptr (m_parent);
   hstate.add_ptr (m_type);
-  hstate.add (&m_bits, sizeof (m_bits));
+  hstate.add_wide_int (m_bits.m_start_bit_offset);
+  hstate.add_wide_int (m_bits.m_size_in_bits);
   return hstate.end ();
 }
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr104452.c 
b/gcc/testsuite/gcc.dg/analyzer/pr104452.c
new file mode 100644
index 000..85eb82defbf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr104452.c
@@ -0,0 +1,10 @@
+/* { dg-additional-options "-O" } */
+
+void
+test_1 (void)
+{
+  int __attribute__((__vector_size__ (16))) x;
+  for (unsigned i = 0; i < 4;)
+if (x[i]) /* { dg-warning "use of uninitialized value" } */
+  __builtin_abort ();
+}
-- 
2.26.3

[PATCH, rs6000] Remove TImode from mode iterator BOOL_128 [PR100694]

2022-02-08 Thread HAO CHEN GUI via Gcc-patches

Hi,
  This patch removes TImode from mode iterator BOOL_128. Thus, bool operations 
(AND, IOR, XOR, NOT)
on TImode will be split to the relevant operations on word mode during expand 
(in optabs.c). Potential
optimizations can be implemented after the split. The former practice splits it 
after the reload
pass which is too later for some optimizations. The new test case illustrates 
it.

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is 
this okay for trunk?
Any recommendations? Thanks a lot.

ChangeLog
2022-02-08 Haochen Gui 

gcc/
PR target/100694
* config/rs6000/rs6000.md (BOOL_128): Remove TI.

gcc/testsuite/
PR target/100694
* gcc.target/powerpc/pr100694.c: New.
* gcc.target/powerpc/pr92398.c: New.
* gcc.target/powerpc/pr92398.h: Remove.
* gcc.target/powerpc/pr92398.p9-.c: Remove.
* gcc.target/powerpc/pr92398.p9+.c: Remove.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6f74075f58d..2bc1b8f497a 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -750,8 +750,7 @@ (define_mode_attr SI_CONVERT_FP [(SF "TARGET_FCFIDS")
 (DF "TARGET_FCFID")])

 ;; Mode iterator for logical operations on 128-bit types
-(define_mode_iterator BOOL_128 [TI
-PTI
+(define_mode_iterator BOOL_128 [PTI
 (V16QI "TARGET_ALTIVEC")
 (V8HI  "TARGET_ALTIVEC")
 (V4SI  "TARGET_ALTIVEC")
diff --git a/gcc/testsuite/gcc.target/powerpc/pr100694.c 
b/gcc/testsuite/gcc.target/powerpc/pr100694.c
new file mode 100644
index 000..7b41d920140
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr100694.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-times {\mstd\M} 2 } } */
+/* { dg-final { scan-assembler-not {\mli\M} } } */
+/* { dg-final { scan-assembler-not {\mor\M} } } */
+
+/* It just needs two std.  */
+void foo (unsigned __int128* res, unsigned long long hi, unsigned long long lo)
+{
+   unsigned __int128 i = hi;
+   i <<= 64;
+   i |= lo;
+   *res = i;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr92398.c 
b/gcc/testsuite/gcc.target/powerpc/pr92398.c
new file mode 100644
index 000..7d6201cc5bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr92398.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-times {\mnot\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstd\M} 2 } } */
+
+/* All platforms should generate the same instructions: not;not;std;std.  */
+void bar (__int128_t *dst, __int128_t src)
+{
+  *dst =  ~src;
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr92398.h 
b/gcc/testsuite/gcc.target/powerpc/pr92398.h
deleted file mode 100644
index 5a4a8bcab80..000
--- a/gcc/testsuite/gcc.target/powerpc/pr92398.h
+++ /dev/null
@@ -1,17 +0,0 @@
-/* This test code is included into pr92398.p9-.c and pr92398.p9+.c.
-   The two files have the tests for the number of instructions generated for
-   P9- versus P9+.
-
-   store generates difference instructions as below:
-   P9+: mtvsrdd;xxlnot;stxv.
-   P8/P7/P6 LE: not;not;std;std.
-   P8 BE: mtvsrd;mtvsrd;xxpermdi;xxlnor;stxvd2x.
-   P7/P6 BE: std;std;addi;lxvd2x;xxlnor;stxvd2x.
-   P9+ and P9- LE are expected, P6/P7/P8 BE are unexpected.  */
-
-void
-bar (__int128_t *dst, __int128_t src)
-{
-  *dst =  ~src;
-}
-
diff --git a/gcc/testsuite/gcc.target/powerpc/pr92398.p9+.c 
b/gcc/testsuite/gcc.target/powerpc/pr92398.p9+.c
deleted file mode 100644
index 72dd1d9a274..000
--- a/gcc/testsuite/gcc.target/powerpc/pr92398.p9+.c
+++ /dev/null
@@ -1,12 +0,0 @@
-/* { dg-do compile { target { lp64 && has_arch_pwr9 } } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
-/* { dg-options "-O2 -mvsx" } */
-
-/* { dg-final { scan-assembler-times {\mmtvsrdd\M} 1 } } */
-/* { dg-final { scan-assembler-times {\mxxlnor\M} 1 } } */
-/* { dg-final { scan-assembler-times {\mstxv\M} 1 } } */
-/* { dg-final { scan-assembler-not {\mld\M} } } */
-/* { dg-final { scan-assembler-not {\mnot\M} } } */
-
-/* Source code for the test in pr92398.h */
-#include "pr92398.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr92398.p9-.c 
b/gcc/testsuite/gcc.target/powerpc/pr92398.p9-.c
deleted file mode 100644
index bd7fa98af51..000
--- a/gcc/testsuite/gcc.target/powerpc/pr92398.p9-.c
+++ /dev/null
@@ -1,10 +0,0 @@
-/* { dg-do compile { target { lp64 && {! has_arch_pwr9} } } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
-/* { dg-options "-O2 -mvsx" } */
-
-/* { dg-final { scan-assembler-times {\mnot\M} 2 { xfail be } } } */
-/* { dg-final { scan-assembler-times {\mstd\M} 2 { xfail { { {! has_arch_pwr9}

[PATCH] x86: Check each component of source operand for AVX_U128_DIRTY

2022-02-08 Thread H.J. Lu via Gcc-patches

commit 9775e465c1fbfc32656de77c618c61acf5bd905d
Author: H.J. Lu 
Date:   Tue Jul 27 07:46:04 2021 -0700

x86: Don't set AVX_U128_DIRTY when zeroing YMM/ZMM register

called ix86_check_avx_upper_register to check mode on source operand.
But ix86_check_avx_upper_register doesn't work on source operand like

(vec_select:V2DI (reg/v:V4DI 23 xmm3 [orig:91 ymm ] [91])
(parallel [
(const_int 2 [0x2])
(const_int 3 [0x3])
]))

Add ix86_avx_u128_mode_source to check mode for each component of source
operand.

gcc/

PR target/104441
* config/i386/i386.cc (ix86_avx_u128_mode_source): New function.
(ix86_avx_u128_mode_needed): Return AVX_U128_ANY for debug INSN.
Call ix86_avx_u128_mode_source to check mode for each component
of source operand.

gcc/testsuite/

PR target/104441
* gcc.target/i386/pr104441-1a.c: New test.
* gcc.target/i386/pr104441-1b.c: Likewise.
---
 gcc/config/i386/i386.cc | 145 +++-
 gcc/testsuite/gcc.target/i386/pr104441-1a.c |  57 
 gcc/testsuite/gcc.target/i386/pr104441-1b.c |  32 +
 3 files changed, 168 insertions(+), 66 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr104441-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr104441-1b.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index dd5584fb8ed..2d87acca7ff 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -14365,11 +14365,82 @@ ix86_check_avx_upper_stores (rtx dest, const_rtx, 
void *data)
 }
  }
 
+/* For YMM/ZMM store or YMM/ZMM extract.  Return mode for the source
+   operand of SRC DEFs in the same basic block before INSN.  */
+
+static int
+ix86_avx_u128_mode_source (rtx_insn *insn, const_rtx src)
+{
+  basic_block bb = BLOCK_FOR_INSN (insn);
+  rtx_insn *end = BB_END (bb);
+
+  /* Return AVX_U128_DIRTY if there is no DEF in the same basic
+ block.  */
+  int status = AVX_U128_DIRTY;
+
+  for (df_ref def = DF_REG_DEF_CHAIN (REGNO (src));
+   def; def = DF_REF_NEXT_REG (def))
+if (DF_REF_BB (def) == bb)
+  {
+   /* Ignore DEF from different basic blocks.  */
+   rtx_insn *def_insn = DF_REF_INSN (def);
+
+   /* Check if DEF_INSN is before INSN.  */
+   rtx_insn *next;
+   for (next = NEXT_INSN (def_insn);
+next != nullptr && next != end && next != insn;
+next = NEXT_INSN (next))
+ ;
+
+   /* Skip if DEF_INSN isn't before INSN.  */
+   if (next != insn)
+ continue;
+
+   /* Return AVX_U128_DIRTY if the source operand of DEF_INSN
+  isn't constant zero.  */
+
+   if (CALL_P (def_insn))
+ {
+   bool avx_upper_reg_found = false;
+   note_stores (def_insn,
+ix86_check_avx_upper_stores,
+&avx_upper_reg_found);
+
+   /* Return AVX_U128_DIRTY if call returns AVX.  */
+   if (avx_upper_reg_found)
+ return AVX_U128_DIRTY;
+
+   continue;
+ }
+
+   rtx set = single_set (def_insn);
+   if (!set)
+ return AVX_U128_DIRTY;
+
+   rtx dest = SET_DEST (set);
+
+   /* Skip if DEF_INSN is not an AVX load.  Return AVX_U128_DIRTY
+  if the source operand isn't constant zero.  */
+   if (ix86_check_avx_upper_register (dest)
+   && standard_sse_constant_p (SET_SRC (set),
+   GET_MODE (dest)) != 1)
+ return AVX_U128_DIRTY;
+
+   /* We get here only if all AVX loads are from constant zero.  */
+   status = AVX_U128_ANY;
+  }
+
+  return status;
+}
+
 /* Return needed mode for entity in optimize_mode_switching pass.  */
 
 static int
 ix86_avx_u128_mode_needed (rtx_insn *insn)
 {
+  if (DEBUG_INSN_P (insn))
+return AVX_U128_ANY;
+
   if (CALL_P (insn))
 {
   rtx link;
@@ -14409,6 +14480,8 @@ ix86_avx_u128_mode_needed (rtx_insn *insn)
   return AVX_U128_CLEAN;
 }
 
+  subrtx_iterator::array_type array;
+
   rtx set = single_set (insn);
   if (set)
 {
@@ -14423,74 +14496,15 @@ ix86_avx_u128_mode_needed (rtx_insn *insn)
  else
return AVX_U128_ANY;
}
-  else if (ix86_check_avx_upper_register (src))
+  else
{
- /* This is an YMM/ZMM store.  Check for the source operand
-of SRC DEFs in the same basic block before INSN.  */
- basic_block bb = BLOCK_FOR_INSN (insn);
- rtx_insn *end = BB_END (bb);
-
- /* Return AVX_U128_DIRTY if there is no DEF in the same basic
-block.  */
- int status = AVX_U128_DIRTY;
-
- for (df_ref def = DF_REG_DEF_CHAIN (REGNO (src));
-  def; def = DF_REF_NEXT_REG (def))
-   if (DF_REF_BB (def) == bb)
+ FOR_EACH_SUBRTX (iter, array, src, NONCONST)
+   if (ix86_check_avx_upper_register (*iter))
  {
-   /* Ignore DEF from different ba

Re: [PATCH v4] x86: Add -m[no-]direct-extern-access

2022-02-08 Thread H.J. Lu via Gcc-patches

On Tue, Feb 8, 2022 at 6:38 PM Hongtao Liu  wrote:
>
> On Fri, Jan 28, 2022 at 5:53 AM H.J. Lu via Gcc-patches
>  wrote:
> >
> > The v3 patch was posted at
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574847.html
> >
> > There is no progress with repeated pings since then.  Glibc 2.35 and
> > binutils 2.38 will support GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS.
> > I'd like to remove copy relocation to improve security and performance
> > on x86 in GCC 12.  Here is the v4 patch
>   This patch won't change default behavior for copy relocation(to
> avoid conflict with existing .so files),
> just give users an option under which copy relocation can be removed.
>   The removal of copy relocation is an optimization of the linker(Also
> improves security), whose
> patches have been approved and committed to glibc2.35[1], binutils2.38[2].
>   The compiler part is the final step in completing this optimization.

Thanks for looking into it.  Since the new behavior is off by default and
none of the maintainers feel comfortable to review my patch,  which is
related to linker and glibc, for the last 6 months,  now is the good time
to check it in as any other time.  I will check it in tomorrow.

> [1] https://sourceware.org/pipermail/libc-alpha/2021-October/131742.html
> [2] https://sourceware.org/pipermail/binutils/2021-July/117308.html
> >
> > 1. Rename the common option to x86 specific -mdirect-extern-access option.
> > 2. Remove GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS linker check.
> > 3. Use the existing GNU property function in x86 backend.
> >
> > This new behavior is off by default.  Are there any objections?
> >
> > Changes in the v3 patch.
> >
> > 1. GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS support has been added to
> > GNU binutils 2.38.  But the -z indirect-extern-access linker option is
> > only available for Linux/x86.  However, the --max-cache-size=SIZE linker
> > option was also addded within a day.  --max-cache-size=SIZE is used to
> > check for GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS support.
> >
> > Changes in the v2 patch.
> >
> > 1. Rename the option to -fdirect-extern-access.
> >
> > ---
> > On systems with copy relocation:
> > * A copy in executable is created for the definition in a shared library
> > at run-time by ld.so.
> > * The copy is referenced by executable and shared libraries.
> > * Executable can access the copy directly.
> >
> > Issues are:
> > * Overhead of a copy, time and space, may be visible at run-time.
> > * Read-only data in the shared library becomes read-write copy in
> > executable at run-time.
> > * Local access to data with the STV_PROTECTED visibility in the shared
> > library must use GOT.
> >
> > On systems without function descriptor, function pointers vary depending
> > on where and how the functions are defined.
> > * If the function is defined in executable, it can be the address of
> > function body.
> > * If the function, including the function with STV_PROTECTED visibility,
> > is defined in the shared library, it can be the address of the PLT entry
> > in executable or shared library.
> >
> > Issues are:
> > * The address of function body may not be used as its function pointer.
> > * ld.so needs to search loaded shared libraries for the function pointer
> > of the function with STV_PROTECTED visibility.
> >
> > Here is a proposal to remove copy relocation and use canonical function
> > pointer:
> >
> > 1. Accesses, including in PIE and non-PIE, to undefined symbols must
> > use GOT.
> >   a. Linker may optimize out GOT access if the data is defined in PIE or
> >   non-PIE.
> > 2. Read-only data in the shared library remain read-only at run-time
> > 3. Address of global data with the STV_PROTECTED visibility in the shared
> > library is the address of data body.
> >   a. Can use IP-relative access.
> >   b. May need GOT without IP-relative access.
> > 4. For systems without function descriptor,
> >   a. All global function pointers of undefined functions in PIE and
> >   non-PIE must use GOT.  Linker may optimize out GOT access if the
> >   function is defined in PIE or non-PIE.
> >   b. Function pointer of functions with the STV_PROTECTED visibility in
> >   executable and shared library is the address of function body.
> >i. Can use IP-relative access.
> >ii. May need GOT without IP-relative access.
> >iii. Branches to undefined functions may use PLT.
> > 5. Single global definition marker:
> >
> > Add GNU_PROPERTY_1_NEEDED:
> >
> > #define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO
> >
> > to indicate the needed properties by the object file.
> >
> > Add GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS:
> >
> > #define GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (1U << 0)
> >
> > to indicate that the object file requires canonical function pointers and
> > cannot be used with copy relocation.  This bit should be cleared in
> > executable when there are non-GOT or non-PLT relocations in relocatable
> > input file

[PATCH v2] MIPS: IPL is 8bit in Cause register if TARGET_MCU

2022-02-08 Thread YunQiang Su

If MIPS MCU extension is enable, the IPL section in Cause register
has been expand to 8bit instead of 6bit.

gcc/ChangeLog:

* config/mips/mips.cc (mips_expand_prologue):
  IPL is 8bit for MCU ASE.
---
 gcc/config/mips/mips.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 4f9683e8bf4..d823c459b75 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -12255,7 +12255,7 @@ mips_expand_prologue (void)
  if (!cfun->machine->keep_interrupts_masked_p
  && cfun->machine->int_mask == INT_MASK_EIC)
emit_insn (gen_insvsi (gen_rtx_REG (SImode, K1_REG_NUM),
-  GEN_INT (6),
+  TARGET_MCU ? GEN_INT (8) : GEN_INT (6),
   GEN_INT (SR_IPL),
   gen_rtx_REG (SImode, K0_REG_NUM)));
 
-- 
2.30.2

[PATCH] PR target/102059 Fix inline of target specific functions

2022-02-08 Thread Michael Meissner via Gcc-patches

Reset -mpower8-fusion for power9 inlining power8 functions, PR 102059.

This patch is an attempt to make a much simpler patch to fix PR target/102059
than the previous patch.

It just fixes the issue that if a function is specifically declared as a power8
function, you can't inline in functions that are specified with power9 or
power10 options.

The issue is -mpower8-fusion is cleared when you use -mcpu=power9 or
-mcpu=power10.  When I wrote the code for controlling which function can inline
other functions, -mpower8-fusion was set for -mcpu=power9 (-mcpu=power10 was
not an option at that time).  This patch fixes this particular problem.

Perhaps -mpower8-fusion should go away in the GCC 13 time frame.  This patch
just goes in and resets the fusion bit for testing inlines.

I have built a bootstrapped little endian compiler on power9 and the tests had
no regressions.

I have built a bootstrapped big endian compiler on power8 and I tested both
32-bit and 64-bit builds, and there were no regressions.

Can I install this into the trunk and back port it into GCC 11 after a burn-in
period?

2022-02-08   Michael Meissner  

gcc/

PR target/102059
* config/rs6000/rs6000.cc (rs6000_can_inline_p): Don't test for
power8-fusion if the caller was power9 or power10.

gcc/testsuite/
PR target/102059
* gcc.target/powerpc/pr102059-4.c: New test.
---
 gcc/config/rs6000/rs6000.cc   |  8 
 gcc/testsuite/gcc.target/powerpc/pr102059-4.c | 20 +++
 2 files changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102059-4.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index eaba9a2d698..e2d94421ae9 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -25278,6 +25278,14 @@ rs6000_can_inline_p (tree caller, tree callee)
  callee_isa &= ~OPTION_MASK_HTM;
  explicit_isa &= ~OPTION_MASK_HTM;
}
+
+ /* Power9 and power10 do not set power8-fusion.  If the callee was
+explicitly compiled for power8, and the caller was power9 or
+power10, ignore the power8-fusion bits if it was set by
+default.  */
+ if ((caller_isa & OPTION_MASK_P8_FUSION) == 0
+ && (explicit_isa & OPTION_MASK_P8_FUSION) == 0)
+   callee_isa &= ~OPTION_MASK_P8_FUSION;
}
 
   /* The callee's options must be a subset of the caller's options, i.e.
diff --git a/gcc/testsuite/gcc.target/powerpc/pr102059-4.c 
b/gcc/testsuite/gcc.target/powerpc/pr102059-4.c
new file mode 100644
index 000..627c6f820c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr102059-4.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -Wno-attributes" } */
+
+/* Verify that power10 can explicity include functions compiled for power8.
+   The issue is -mcpu=power8 enables -mpower8-fusion, but -mcpu=power9 or
+   -mcpu=power10 do not set power8-fusion by default.  */
+
+static inline int __attribute__ ((always_inline,target("cpu=power8")))
+foo (int *b)
+{
+  *b += 10;
+  return *b;
+}
+
+int
+bar (int *a)
+{
+  *a = foo (a);
+  return 0;
+}
-- 
2.34.1


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Go patch committed: Implement FuncPCABI0, FuncPCABIInternal

2022-02-08 Thread Ian Lance Taylor via Gcc-patches

This patch to the Go frontend and libgo implements the abi.FuncPCABI0
and abi.FuncPCABIInternal functions.  The Go 1.18 standard library
uses an internal/abi package with two functions that are implemented
in the compiler. This patch implements them in the gofrontend, to
support the upcoming update to 1.18.  For the gofrontend, which uses
the platform ABI, the functions behave identically.  Bootstrapped and
ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian

patc
56fa431758bcc37c4dade7b2d8d38feb0c59097e
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index f78561c3483..5f4adf9d0e7 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-61f7cf4b9db0587ff099aa36832a355b90ee1bf9
+262cb89fd5ed82ab135a3933b2ddf4eb67683149
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/expressions.cc b/gcc/go/gofrontend/expressions.cc
index 79702821336..1e6890a3cc5 100644
--- a/gcc/go/gofrontend/expressions.cc
+++ b/gcc/go/gofrontend/expressions.cc
@@ -12177,6 +12177,50 @@ Call_expression::intrinsify(Gogo* gogo,
   return Runtime::make_call(code, loc, 3, a1, a2, a3);
 }
 }
+  else if (package == "internal/abi")
+{
+  if ((name == "FuncPCABI0" || name == "FuncPCABIInternal")
+ && this->args_ != NULL
+ && this->args_->size() == 1)
+   {
+ // We expect to see a conversion from the expression to "any".
+ Expression* expr = this->args_->front();
+ Type_conversion_expression* tce = expr->conversion_expression();
+ if (tce != NULL)
+   expr = tce->expr();
+ Func_expression* fe = expr->func_expression();
+ Interface_field_reference_expression* interface_method =
+   expr->interface_field_reference_expression();
+ if (fe != NULL)
+   {
+ Named_object* no = fe->named_object();
+ Expression* ref = Expression::make_func_code_reference(no, loc);
+ Type* uintptr_type = Type::lookup_integer_type("uintptr");
+ return Expression::make_cast(uintptr_type, ref, loc);
+   }
+ else if (interface_method != NULL)
+   return interface_method->get_function();
+ else
+   {
+ expr = this->args_->front();
+ go_assert(expr->type()->interface_type() != NULL
+   && expr->type()->interface_type()->is_empty());
+ expr = Expression::make_interface_info(expr,
+INTERFACE_INFO_OBJECT,
+loc);
+ // Trust that this is a function type, which means that
+ // it is a direct iface type and we can use EXPR
+ // directly.  The backend representation of this
+ // function is a pointer to a struct whose first field
+ // is the actual function to call.
+ Type* pvoid = Type::make_pointer_type(Type::make_void_type());
+ Type* pfntype = Type::make_pointer_type(pvoid);
+ Expression* ref = make_unsafe_cast(pfntype, expr, loc);
+ return Expression::make_dereference(ref, NIL_CHECK_NOT_NEEDED,
+ loc);
+   }
+   }
+}
 
   return NULL;
 }
diff --git a/libgo/go/internal/abi/abi.go b/libgo/go/internal/abi/abi.go
new file mode 100644
index 000..c4a108847ca
--- /dev/null
+++ b/libgo/go/internal/abi/abi.go
@@ -0,0 +1,35 @@
+// Copyright 2020 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package abi
+
+// FuncPC* intrinsics.
+//
+// CAREFUL: In programs with plugins, FuncPC* can return different values
+// for the same function (because there are actually multiple copies of
+// the same function in the address space). To be safe, don't use the
+// results of this function in any == expression. It is only safe to
+// use the result as an address at which to start executing code.
+
+// FuncPCABI0 returns the entry PC of the function f, which must be a
+// direct reference of a function defined as ABI0. Otherwise it is a
+// compile-time error.
+//
+// Implemented as a compile intrinsic.
+func FuncPCABI0(f any) uintptr {
+   // The compiler should remove all calls.
+   panic("FuncPCABI0")
+}
+
+// FuncPCABIInternal returns the entry PC of the function f. If f is a
+// direct reference of a function, it must be defined as ABIInternal.
+// Otherwise it is a compile-time error. If f is not a direct reference
+// of a defined function, it assumes that f is a func value. Otherwise
+// the behavior is undefined.
+//
+// Implemented as a compile intrinsic.
+func FuncPCABIInternal(f any) uintptr {
+   // The compiler should remove all calls.
+   panic("FuncPCABIInternal")
+}
diff --git

Go patch committed: Recognize Go 1.18 runtime/internal/atomic methods

2022-02-08 Thread Ian Lance Taylor via Gcc-patches

The Go 1.18 library introduces specific types in
runtime/internal/atomic.  This patch to the Go frontend recognizes and
optimizes the methods on those types, as we do with the functions in
runtime/internal/atomic.

While we're here avoid getting confused by methods in any other
package that we recognize specially.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian

* go-gcc.cc (Gcc_backend::Gcc_backend): Define builtins
__atomic_load_1 and __atomic_store_1.

pa
869fb813039e8933a85f5f2a3a53cde156030b0a
diff --git a/gcc/go/go-gcc.cc b/gcc/go/go-gcc.cc
index 631996073f4..f3de7a8c183 100644
--- a/gcc/go/go-gcc.cc
+++ b/gcc/go/go-gcc.cc
@@ -897,6 +897,20 @@ Gcc_backend::Gcc_backend()
   this->define_builtin(BUILT_IN_ATOMIC_ADD_FETCH_8, "__atomic_add_fetch_8", 
NULL,
t, 0);
 
+  t = build_function_type_list(unsigned_char_type_node,
+  ptr_type_node,
+  integer_type_node,
+  NULL_TREE);
+  this->define_builtin(BUILT_IN_ATOMIC_LOAD_1, "__atomic_load_1", NULL, t, 0);
+
+  t = build_function_type_list(void_type_node,
+  ptr_type_node,
+  unsigned_char_type_node,
+  integer_type_node,
+  NULL_TREE);
+  this->define_builtin(BUILT_IN_ATOMIC_STORE_1, "__atomic_store_1", NULL,
+  t, 0);
+
   t = build_function_type_list(unsigned_char_type_node,
ptr_type_node,
unsigned_char_type_node,
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 5f4adf9d0e7..9cd22ef011e 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-262cb89fd5ed82ab135a3933b2ddf4eb67683149
+3b1e46937d11b043d0986a3dfefaee27454c3da0
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/expressions.cc b/gcc/go/gofrontend/expressions.cc
index 1e6890a3cc5..d7b64767a00 100644
--- a/gcc/go/gofrontend/expressions.cc
+++ b/gcc/go/gofrontend/expressions.cc
@@ -11613,12 +11613,16 @@ Call_expression::intrinsify(Gogo* gogo,
   std::string package = (no->package() != NULL
  ? no->package()->pkgpath()
  : gogo->pkgpath());
+  bool is_method = ((no->is_function() && no->func_value()->is_method())
+   || (no->is_function_declaration()
+   && no->func_declaration_value()->is_method()));
   Location loc = this->location();
 
   Type* int_type = Type::lookup_integer_type("int");
   Type* int32_type = Type::lookup_integer_type("int32");
   Type* int64_type = Type::lookup_integer_type("int64");
   Type* uint_type = Type::lookup_integer_type("uint");
+  Type* uint8_type = Type::lookup_integer_type("uint8");
   Type* uint32_type = Type::lookup_integer_type("uint32");
   Type* uint64_type = Type::lookup_integer_type("uint64");
   Type* uintptr_type = Type::lookup_integer_type("uintptr");
@@ -11629,6 +11633,9 @@ Call_expression::intrinsify(Gogo* gogo,
 
   if (package == "sync/atomic")
 {
+  if (is_method)
+   return NULL;
+
   // sync/atomic functions and runtime/internal/atomic functions
   // are very similar. In order not to duplicate code, we just
   // redirect to the latter and let the code below to handle them.
@@ -11694,6 +11701,9 @@ Call_expression::intrinsify(Gogo* gogo,
 
   if (package == "runtime/internal/sys")
 {
+  if (is_method)
+   return NULL;
+
   // runtime/internal/sys functions and math/bits functions
   // are very similar. In order not to duplicate code, we just
   // redirect to the latter and let the code below to handle them.
@@ -11713,6 +11723,9 @@ Call_expression::intrinsify(Gogo* gogo,
 
   if (package == "runtime")
 {
+  if (is_method)
+   return NULL;
+
   // Handle a couple of special runtime functions.  In the runtime
   // package, getcallerpc returns the PC of the caller, and
   // getcallersp returns the frame pointer of the caller.  Implement
@@ -11743,6 +11756,9 @@ Call_expression::intrinsify(Gogo* gogo,
 }
   else if (package == "math/bits")
 {
+  if (is_method)
+   return NULL;
+
   if ((name == "ReverseBytes16" || name == "ReverseBytes32"
|| name == "ReverseBytes64" || name == "ReverseBytes")
   && this->args_ != NULL && this->args_->size() == 1)
@@ -11913,9 +11929,137 @@ Call_expression::intrinsify(Gogo* gogo,
 {
   int memorder = __ATOMIC_SEQ_CST;
 
+  if (is_method)
+   {
+ Function_type* ftype = (no->is_function()
+ ? no->func_value()->type()
+ : no->func_declaration_value()->type());
+ Type* rtype = ftype->receiver()->type()->deref();
+ go_assert(rtype->named_type() !=

Re: [PATCH] x86: Check each component of source operand for AVX_U128_DIRTY

2022-02-08 Thread Hongtao Liu via Gcc-patches

On Wed, Feb 9, 2022 at 10:53 AM H.J. Lu via Gcc-patches
 wrote:
>
> commit 9775e465c1fbfc32656de77c618c61acf5bd905d
> Author: H.J. Lu 
> Date:   Tue Jul 27 07:46:04 2021 -0700
>
> x86: Don't set AVX_U128_DIRTY when zeroing YMM/ZMM register
>
> called ix86_check_avx_upper_register to check mode on source operand.
> But ix86_check_avx_upper_register doesn't work on source operand like
>
The new function ix86_avx_u128_mode_source just takes the code from the
 *else if (ix86_check_avx_upper_register (src))* branch to check each
 component of src that meets the ix86_check_avx_upper_register condition,
 which seems reasonable.

The patch LGTM.
> (vec_select:V2DI (reg/v:V4DI 23 xmm3 [orig:91 ymm ] [91])
> (parallel [
> (const_int 2 [0x2])
> (const_int 3 [0x3])
> ]))
>
> Add ix86_avx_u128_mode_source to check mode for each component of source
> operand.
>
> gcc/
>
> PR target/104441
> * config/i386/i386.cc (ix86_avx_u128_mode_source): New function.
> (ix86_avx_u128_mode_needed): Return AVX_U128_ANY for debug INSN.
> Call ix86_avx_u128_mode_source to check mode for each component
> of source operand.
>
> gcc/testsuite/
>
> PR target/104441
> * gcc.target/i386/pr104441-1a.c: New test.
> * gcc.target/i386/pr104441-1b.c: Likewise.
> ---
>  gcc/config/i386/i386.cc | 145 +++-
>  gcc/testsuite/gcc.target/i386/pr104441-1a.c |  57 
>  gcc/testsuite/gcc.target/i386/pr104441-1b.c |  32 +
>  3 files changed, 168 insertions(+), 66 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr104441-1a.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr104441-1b.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index dd5584fb8ed..2d87acca7ff 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -14365,11 +14365,82 @@ ix86_check_avx_upper_stores (rtx dest, const_rtx, 
> void *data)
>  }
>   }
>
> +/* For YMM/ZMM store or YMM/ZMM extract.  Return mode for the source
> +   operand of SRC DEFs in the same basic block before INSN.  */
> +
> +static int
> +ix86_avx_u128_mode_source (rtx_insn *insn, const_rtx src)
> +{
> +  basic_block bb = BLOCK_FOR_INSN (insn);
> +  rtx_insn *end = BB_END (bb);
> +
> +  /* Return AVX_U128_DIRTY if there is no DEF in the same basic
> + block.  */
> +  int status = AVX_U128_DIRTY;
> +
> +  for (df_ref def = DF_REG_DEF_CHAIN (REGNO (src));
> +   def; def = DF_REF_NEXT_REG (def))
> +if (DF_REF_BB (def) == bb)
> +  {
> +   /* Ignore DEF from different basic blocks.  */
> +   rtx_insn *def_insn = DF_REF_INSN (def);
> +
> +   /* Check if DEF_INSN is before INSN.  */
> +   rtx_insn *next;
> +   for (next = NEXT_INSN (def_insn);
> +next != nullptr && next != end && next != insn;
> +next = NEXT_INSN (next))
> + ;
> +
> +   /* Skip if DEF_INSN isn't before INSN.  */
> +   if (next != insn)
> + continue;
> +
> +   /* Return AVX_U128_DIRTY if the source operand of DEF_INSN
> +  isn't constant zero.  */
> +
> +   if (CALL_P (def_insn))
> + {
> +   bool avx_upper_reg_found = false;
> +   note_stores (def_insn,
> +ix86_check_avx_upper_stores,
> +&avx_upper_reg_found);
> +
> +   /* Return AVX_U128_DIRTY if call returns AVX.  */
> +   if (avx_upper_reg_found)
> + return AVX_U128_DIRTY;
> +
> +   continue;
> + }
> +
> +   rtx set = single_set (def_insn);
> +   if (!set)
> + return AVX_U128_DIRTY;
> +
> +   rtx dest = SET_DEST (set);
> +
> +   /* Skip if DEF_INSN is not an AVX load.  Return AVX_U128_DIRTY
> +  if the source operand isn't constant zero.  */
> +   if (ix86_check_avx_upper_register (dest)
> +   && standard_sse_constant_p (SET_SRC (set),
> +   GET_MODE (dest)) != 1)
> + return AVX_U128_DIRTY;
> +
> +   /* We get here only if all AVX loads are from constant zero.  */
> +   status = AVX_U128_ANY;
> +  }
> +
> +  return status;
> +}
> +
>  /* Return needed mode for entity in optimize_mode_switching pass.  */
>
>  static int
>  ix86_avx_u128_mode_needed (rtx_insn *insn)
>  {
> +  if (DEBUG_INSN_P (insn))
> +return AVX_U128_ANY;
> +
>if (CALL_P (insn))
>  {
>rtx link;
> @@ -14409,6 +14480,8 @@ ix86_avx_u128_mode_needed (rtx_insn *insn)
>return AVX_U128_CLEAN;
>  }
>
> +  subrtx_iterator::array_type array;
> +
>rtx set = single_set (insn);
>if (set)
>  {
> @@ -14423,74 +14496,15 @@ ix86_avx_u128_mode_needed (rtx_insn *insn)
>   else
> return AVX_U128_ANY;
> }
> -  else if (ix86_check_avx_upper_register (src))
> +  else
> {
> - /* This is an YMM/ZMM store.  Check for the source operand
> -of

Re: PING^5 [PATCH v4 0/2] Implement indirect external access

2022-02-08 Thread Fāng-ruì Sòng via Gcc-patches

On Mon, Jan 3, 2022 at 7:33 PM H.J. Lu via Gcc-patches
 wrote:
>
> On Sat, Dec 11, 2021 at 10:44 AM H.J. Lu  wrote:
> >
> > On Thu, Nov 25, 2021 at 9:54 AM H.J. Lu  wrote:
> > >
> > > On Mon, Nov 1, 2021 at 7:02 AM H.J. Lu  wrote:
> > > >
> > > > On Thu, Oct 21, 2021 at 12:56 PM H.J. Lu  wrote:
> > > > >
> > > > > On Wed, Sep 22, 2021 at 7:02 PM H.J. Lu  wrote:
> > > > > >
> > > > > > Changes in the v4 patch.
> > > > > >
> > > > > > 1. Add nodirect_extern_access attribute.
> > > > > >
> > > > > > Changes in the v3 patch.
> > > > > >
> > > > > > 1. GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS support has been 
> > > > > > added to
> > > > > > GNU binutils 2.38.  But the -z indirect-extern-access linker option 
> > > > > > is
> > > > > > only available for Linux/x86.  However, the --max-cache-size=SIZE 
> > > > > > linker
> > > > > > option was also addded within a day.  --max-cache-size=SIZE is used 
> > > > > > to
> > > > > > check for GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS support.
> > > > > >
> > > > > > Changes in the v2 patch.
> > > > > >
> > > > > > 1. Rename the option to -fdirect-extern-access.
> > > > > >
> > > > > > ---
> > > > > > On systems with copy relocation:
> > > > > > * A copy in executable is created for the definition in a shared 
> > > > > > library
> > > > > > at run-time by ld.so.
> > > > > > * The copy is referenced by executable and shared libraries.
> > > > > > * Executable can access the copy directly.
> > > > > >
> > > > > > Issues are:
> > > > > > * Overhead of a copy, time and space, may be visible at run-time.
> > > > > > * Read-only data in the shared library becomes read-write copy in
> > > > > > executable at run-time.
> > > > > > * Local access to data with the STV_PROTECTED visibility in the 
> > > > > > shared
> > > > > > library must use GOT.
> > > > > >
> > > > > > On systems without function descriptor, function pointers vary 
> > > > > > depending
> > > > > > on where and how the functions are defined.
> > > > > > * If the function is defined in executable, it can be the address of
> > > > > > function body.
> > > > > > * If the function, including the function with STV_PROTECTED 
> > > > > > visibility,
> > > > > > is defined in the shared library, it can be the address of the PLT 
> > > > > > entry
> > > > > > in executable or shared library.
> > > > > >
> > > > > > Issues are:
> > > > > > * The address of function body may not be used as its function 
> > > > > > pointer.
> > > > > > * ld.so needs to search loaded shared libraries for the function 
> > > > > > pointer
> > > > > > of the function with STV_PROTECTED visibility.
> > > > > >
> > > > > > Here is a proposal to remove copy relocation and use canonical 
> > > > > > function
> > > > > > pointer:
> > > > > >
> > > > > > 1. Accesses, including in PIE and non-PIE, to undefined symbols must
> > > > > > use GOT.
> > > > > >   a. Linker may optimize out GOT access if the data is defined in 
> > > > > > PIE or
> > > > > >   non-PIE.
> > > > > > 2. Read-only data in the shared library remain read-only at run-time
> > > > > > 3. Address of global data with the STV_PROTECTED visibility in the 
> > > > > > shared
> > > > > > library is the address of data body.
> > > > > >   a. Can use IP-relative access.
> > > > > >   b. May need GOT without IP-relative access.
> > > > > > 4. For systems without function descriptor,
> > > > > >   a. All global function pointers of undefined functions in PIE and
> > > > > >   non-PIE must use GOT.  Linker may optimize out GOT access if the
> > > > > >   function is defined in PIE or non-PIE.
> > > > > >   b. Function pointer of functions with the STV_PROTECTED 
> > > > > > visibility in
> > > > > >   executable and shared library is the address of function body.
> > > > > >i. Can use IP-relative access.
> > > > > >ii. May need GOT without IP-relative access.
> > > > > >iii. Branches to undefined functions may use PLT.
> > > > > > 5. Single global definition marker:
> > > > > >
> > > > > > Add GNU_PROPERTY_1_NEEDED:
> > > > > >
> > > > > > #define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO
> > > > > >
> > > > > > to indicate the needed properties by the object file.
> > > > > >
> > > > > > Add GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS:
> > > > > >
> > > > > > #define GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (1U << 0)
> > > > > >
> > > > > > to indicate that the object file requires canonical function 
> > > > > > pointers and
> > > > > > cannot be used with copy relocation.  This bit should be cleared in
> > > > > > executable when there are non-GOT or non-PLT relocations in 
> > > > > > relocatable
> > > > > > input files without this bit set.
> > > > > >
> > > > > >   a. Protected symbol access within the shared library can be 
> > > > > > treated as
> > > > > >   local.
> > > > > >   b. Copy relocation should be disallowed at link-time and run-time.
> > > > > >   c. GOT function pointer reference is required at link-time and 
> > > > > > run-time.
> >

Re: [PATCH] tree-optimization/104373 - early uninit diagnostic on unreachable code

2022-02-08 Thread Richard Biener via Gcc-patches

On Tue, 8 Feb 2022, Jeff Law wrote:

> 
> 
> On 2/8/2022 12:03 AM, Richard Biener via Gcc-patches wrote:
> > The following improves early uninit diagnostics by computing edge
> > reachability using our value-numbering framework in its cheapest
> > mode and ignoring unreachable blocks when looking
> > for uninitialized uses.  To not ICE with -fdump-tree-all the
> > early uninit pass needs a dumpfile since VN tries to dump statistics.
> >
> > For gimple-match.c at -O0 -g this causes a 2% increase in compile-time.
> >
> > In theory all early diagnostic passes could benefit from a VN run but
> > that would require more refactoring that's not appropriate at this stage.
> > This patch addresses a GCC 12 diagnostic regression and also happens to
> > fix one XFAIL in gcc.dg/uninit-pr20644-O0.c
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, OK for trunk?
> >
> > Thanks,
> > Richard.
> >
> > 2022-02-04  Richard Biener  
> >
> >  PR tree-optimization/104373
> >  * tree-ssa-sccvn.h (do_rpo_vn): New export exposing the
> >  walk kind.
> >  * tree-ssa-sccvn.cc (do_rpo_vn): Export, get the default
> >  walk kind as argument.
> >  (run_rpo_vn): Adjust.
> >  (pass_fre::execute): Likewise.
> >  * tree-ssa-uninit.cc (warn_uninitialized_vars): Skip
> >  blocks not reachable.
> >  (execute_late_warn_uninitialized): Mark all edges as
> >  executable.
> >  (execute_early_warn_uninitialized): Use VN to compute
> >  executable edges.
> >  (pass_data_early_warn_uninitialized): Enable a dump file,
> >  change dump name to warn_uninit.
> >
> >  * g++.dg/warn/Wuninitialized-32.C: New testcase.
> >  * gcc.dg/uninit-pr20644-O0.c: Remove XFAIL.
> I'm conflicted on this ;-)
> 
> I generally lean on the side of eliminating false positives in these
> diagnostics.  So identifying unreachable blocks and using that to prune the
> set of warnings we generate, even at -O0 is good from that point of view.
> 
> But doing something like this has many of the problems that relying on
> optimizations does, even if we don't optimize away the unreachable code. 
> Right now the warning should be fairly stable at -O0 -- the set of diagnostics
> you get isn't going to change a lot release to release which is important to
> some users.   Second, at -O0 whether or not you get a warning isn't a function
> of how good our unreachable code analysis might be.
> 
> This was quite a contentious topic many years ago.  So much that I dropped
> some work on Wuninit on the floor as I just couldn't take the arguing.  So be
> aware that you might be opening a can of worms.
> 
> So the question comes down to a design decision.   What's more important to
> the end users?  Fewer false positives or better stability in the warning?  I
> think the former, but there's certainly been a vocal group that prefers the
> latter.

I see - I didn't think of this aspect at all but that means I have no
idea on whether it is important or not ...

In our own setup we're running into "instabilities" with optimization
when building packages that enable -Werror, so I can see shops doing
dev builds at -O0 with warnings and -Werror but drop -Werror for
optimized builds.

> On the implementation side I have zero concerns.    Looking further out, ISTM
> we could mark the blocks as unreachable (rather than deducing it from edge
> flags).  That would make it pretty easy to mark those blocks relatively early
> and allow us to suppress any middle end diagnostic occurring in an unreachable
> block.

So what I had in mind is that for the set of early diagnostic passes

  PUSH_INSERT_PASSES_WITHIN (pass_build_ssa_passes)
  NEXT_PASS (pass_fixup_cfg);
  NEXT_PASS (pass_build_ssa);
  NEXT_PASS (pass_warn_printf);
  NEXT_PASS (pass_warn_nonnull_compare);
  NEXT_PASS (pass_early_warn_uninitialized);
  NEXT_PASS (pass_warn_access, /*early=*/true);

we'd run VN and keep it's lattice around (and not just the
EDGE_EXECUTABLE flags).  That would for example allow
pass_warn_nonnull_compare to see that in

void foo (void *p __attribute__((nonnull)))
{
  void *q = p;
  if (q)
bar (q);
}

we are comparing a never NULL pointer.  Currently the q = p copy
makes it not realize this.  Likewise some constants can be
propagated this way.

Of course using the VN lattice means quite some changes in those
passes.  Even without the VN lattice having unreachable edges
marked could improve diagnostics for, say PHI nodes, if only
a single executable edge remains.

Martin, do you have any thoughts here?  Any opinion on the patch
that for now just marks (not) executable edges to prune diagnostics at 
-O0?

Thanks,
Richard.

[PATCH] AutoFDO: Don't try to promote indirect calls that result in recursive direct calls

2022-02-08 Thread Eugene Rozenfeld via Gcc-patches

AutoFDO tries to promote and inline all indirect calls that were promoted
and inlined in the original binary and that are still hot. In the included
test case, the promotion results in a direct call that is a recursive call.
inline_call and optimize_inline_calls can't handle recursive calls at this 
stage.
Currently, inline_call fails with a segmentation fault.

This change leaves the indirect call alone if promotion will result in a 
recursive call.

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:
* auto-profile.cc (afdo_indirect_call): Don't attempt to promote 
indirect calls
that will result in direct recursive calls.

gcc/testsuite/g++.dg/tree-prof/ChangeLog:
* indir-call-recursive-inlining.C : New test.
---
 gcc/auto-profile.cc   | 40 --
 .../tree-prof/indir-call-recursive-inlining.C | 54 +++
 2 files changed, 78 insertions(+), 16 deletions(-)
 create mode 100644 
gcc/testsuite/g++.dg/tree-prof/indir-call-recursive-inlining.C

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index c7cee639c85..2b34b80b82d 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -975,7 +975,7 @@ read_profile (void)
  * after annotation, we just need to mark, and let follow-up logic to
decide if it needs to promote and inline.  */

-static void
+static bool
 afdo_indirect_call (gimple_stmt_iterator *gsi, const icall_target_map &map,
 bool transform)
 {
@@ -983,12 +983,12 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const 
icall_target_map &map,
   tree callee;

   if (map.size () == 0)
-return;
+return false;
   gcall *stmt = dyn_cast  (gs);
   if (!stmt
   || gimple_call_internal_p (stmt)
   || gimple_call_fndecl (stmt) != NULL_TREE)
-return;
+return false;

   gcov_type total = 0;
   icall_target_map::const_iterator max_iter = map.end ();
@@ -1003,7 +1003,7 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const 
icall_target_map &map,
   struct cgraph_node *direct_call = cgraph_node::get_for_asmname (
   get_identifier (afdo_string_table->get_name (max_iter->first)));
   if (direct_call == NULL || !direct_call->profile_id)
-return;
+return false;

   callee = gimple_call_fn (stmt);

@@ -1013,20 +1013,27 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const 
icall_target_map &map,
   hist->hvalue.counters = XNEWVEC (gcov_type, hist->n_counters);
   gimple_add_histogram_value (cfun, stmt, hist);

-  // Total counter
+  /* Total counter */
   hist->hvalue.counters[0] = total;
-  // Number of value/counter pairs
+  /* Number of value/counter pairs */
   hist->hvalue.counters[1] = 1;
-  // Value
+  /* Value */
   hist->hvalue.counters[2] = direct_call->profile_id;
-  // Counter
+  /* Counter */
   hist->hvalue.counters[3] = max_iter->second;

   if (!transform)
-return;
+return false;
+
+  cgraph_node* current_function_node = cgraph_node::get 
(current_function_decl);
+
+  /* If the direct call is a recursive call, don't promote it since
+ we are not set up to inline recursive calls at this stage. */
+  if (direct_call == current_function_node)
+return false;

   struct cgraph_edge *indirect_edge
-  = cgraph_node::get (current_function_decl)->get_edge (stmt);
+  = current_function_node->get_edge (stmt);

   if (dump_file)
 {
@@ -1040,13 +1047,13 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const 
icall_target_map &map,
 {
   if (dump_file)
 fprintf (dump_file, " not transforming\n");
-  return;
+  return false;
 }
   if (DECL_STRUCT_FUNCTION (direct_call->decl) == NULL)
 {
   if (dump_file)
 fprintf (dump_file, " no declaration\n");
-  return;
+  return false;
 }

   if (dump_file)
@@ -1063,16 +1070,17 @@ afdo_indirect_call (gimple_stmt_iterator *gsi, const 
icall_target_map &map,
   cgraph_edge::redirect_call_stmt_to_callee (new_edge);
   gimple_remove_histogram_value (cfun, stmt, hist);
   inline_call (new_edge, true, NULL, NULL, false);
+  return true;
 }

 /* From AutoFDO profiles, find values inside STMT for that we want to measure
histograms and adds them to list VALUES.  */

-static void
+static bool
 afdo_vpt (gimple_stmt_iterator *gsi, const icall_target_map &map,
   bool transform)
 {
-  afdo_indirect_call (gsi, map, transform);
+  return afdo_indirect_call (gsi, map, transform);
 }

 typedef std::set bb_set;
@@ -1498,8 +1506,8 @@ afdo_vpt_for_early_inline (stmt_set *promoted_stmts)
   {
 /* Promote the indirect call and update the promoted_stmts.  */
 promoted_stmts->insert (stmt);
-afdo_vpt (&gsi, info.targets, true);
-has_vpt = true;
+if (afdo_vpt (&gsi, info.targets, true))
+  has_vpt = true;
   }
   }
   }
diff --git a/gcc/testsuite/g++.dg/tree-prof/indir-call-recursive-inlining.C 
b/gcc/testsuite/g++.dg/tree-prof/indir-call-recursive-inlining.C
new file mode

83 matches

Mail list logo