[committed] openmp: Optimize for OpenMP atomics 2x__builtin_clear_padding+__builtin_memcmp if possible [PR102571]

2021-10-06 Thread Jakub Jelinek via Gcc-patches
Hi!

For the few long double types that do have padding bits, e.g. on x86
the clear_type_padding_in_mask computed mask is
ff ff ff ff ff ff ff ff ff ff 00 00 for 32-bit and
ff ff ff ff ff ff ff ff ff ff 00 00 00 00 00 00 for 64-bit.
Instead of doing __builtin_clear_padding on both operands that will clear the
last 2 or 6 bytes and then memcmp on the whole 12/16 bytes, we can just
memcmp 10 bytes.  The code also handles if the padding would be at the start
or both at the start and end, but everything on byte boundaries only and
non-padding bits being contiguous.
This works around a tree-ssa-dse.c bug (but we need to fix it anyway,
as libstdc++ won't do this and as it can deal with arbitrary types, it even
can't do that generally).

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-10-06  Jakub Jelinek  

PR tree-optimization/102571
* c-omp.c (c_finish_omp_atomic): Optimize the case where type has
padding, but the non-padding bits are contiguous set of bytes
by adjusting the memcmp call arguments instead of emitting
__builtin_clear_padding and then comparing all the type's bytes.

--- gcc/c-family/c-omp.c.jj 2021-10-01 10:45:37.909412708 +0200
+++ gcc/c-family/c-omp.c2021-10-05 15:19:36.853387522 +0200
@@ -379,6 +379,8 @@ c_finish_omp_atomic (location_t loc, enu
   if (SCALAR_FLOAT_TYPE_P (cmptype) && !test)
{
  bool clear_padding = false;
+ HOST_WIDE_INT non_padding_start = 0;
+ HOST_WIDE_INT non_padding_end = 0;
  if (BITS_PER_UNIT == 8 && CHAR_BIT == 8)
{
  HOST_WIDE_INT sz = int_size_in_bytes (cmptype), i;
@@ -392,6 +394,40 @@ c_finish_omp_atomic (location_t loc, enu
clear_padding = true;
break;
  }
+ if (clear_padding && buf[i] == 0)
+   {
+ /* Try to optimize.  In the common case where
+non-padding bits are all continuous and start
+and end at a byte boundary, we can just adjust
+the memcmp call arguments and don't need to
+emit __builtin_clear_padding calls.  */
+ if (i == 0)
+   {
+ for (i = 0; i < sz; i++)
+   if (buf[i] != 0)
+ break;
+ if (i < sz && buf[i] == (unsigned char) ~0)
+   {
+ non_padding_start = i;
+ for (; i < sz; i++)
+   if (buf[i] != (unsigned char) ~0)
+ break;
+   }
+ else
+   i = 0;
+   }
+ if (i != 0)
+   {
+ non_padding_end = i;
+ for (; i < sz; i++)
+   if (buf[i] != 0)
+ {
+   non_padding_start = 0;
+   non_padding_end = 0;
+   break;
+ }
+   }
+   }
}
  tree inttype = NULL_TREE;
  if (!clear_padding && tree_fits_uhwi_p (TYPE_SIZE (cmptype)))
@@ -428,12 +464,22 @@ c_finish_omp_atomic (location_t loc, enu
  tmp2 = build4 (TARGET_EXPR, cmptype, tmp2,
 TREE_OPERAND (rhs1, 1), NULL, NULL);
  tmp2 = build1 (ADDR_EXPR, pcmptype, tmp2);
+ if (non_padding_start)
+   {
+ tmp1 = build2 (POINTER_PLUS_EXPR, pcmptype, tmp1,
+size_int (non_padding_start));
+ tmp2 = build2 (POINTER_PLUS_EXPR, pcmptype, tmp2,
+size_int (non_padding_start));
+   }
  tree fndecl = builtin_decl_explicit (BUILT_IN_MEMCMP);
  rhs1 = build_call_expr_loc (loc, fndecl, 3, tmp1, tmp2,
- TYPE_SIZE_UNIT (cmptype));
+ non_padding_end
+ ? size_int (non_padding_end
+ - non_padding_start)
+ : TYPE_SIZE_UNIT (cmptype));
  rhs1 = build2 (EQ_EXPR, boolean_type_node, rhs1,
 integer_zero_node);
- if (clear_padding)
+ if (clear_padding && non_padding_end == 0)
{
  fndecl = builtin_decl_explicit (BUILT_IN_CLEAR_PADDING);
  tree cp1 = build_call_expr_loc (loc, fndecl, 1, tmp1);

Jakub



Re: [PATCH, v5] c++: Fix up synthetization of defaulted comparison operators on classes with bitfields [PR102490]

2021-10-06 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 05, 2021 at 10:40:45PM -0400, Jason Merrill wrote:
> I've switched to handling bases via binfo as discussed on IRC and added
> spaceship-synth14.C to test proper base handling with virtual <=>. Here's
> what I'm committing:

Thanks a lot.

I see spaceship-synth8.C is accepted without errors (| LOOKUP_NONVIRTUAL |
LOOKUP_DEFAULTED didn't help it for me back when playing with it), but if I add

E e1, e2;
auto x = e1 <=> e2;
at the end of it, it is rejected:
spaceship-synth8.C:26:17: error: use of deleted function ‘virtual constexpr 
std::strong_ordering E::operator<=>(const E&) const’
   26 | auto x = e1 <=> e2;
  | ^~
spaceship-synth8.C:22:24: note: ‘virtual constexpr std::strong_ordering 
E::operator<=>(const E&) const’ is implicitly deleted because the default 
definition would be ill-formed:
   22 |   std::strong_ordering operator<=>(const E&) const override = default;
  |^~~~
spaceship-synth8.C:21:8: error: no match for ‘operator<=>’ (operand types are 
‘const D’ and ‘const D’)
   21 | struct E : D {
  |^
spaceship-synth8.C:19:32: note: candidate: ‘virtual std::strong_ordering 
D::operator<=>(const E&) const’ (reversed)
   19 |   virtual std::strong_ordering operator<=>(const struct E&) const = 0;
  |^~~~
spaceship-synth8.C:19:44: note:   no known conversion for argument 1 from 
‘const D’ to ‘const E&’
   19 |   virtual std::strong_ordering operator<=>(const struct E&) const = 0;
  |^~~

Is that ok (i.e. whether it is accepted or rejected when the operator<=>
is actually not called falls into "no diagnostics required" category)?

Note, before this fix we were accepting it even with those
E e1, e2;
auto x = e1 <=> e2;
lines in there.  Perhaps we want to copy spaceship-synth8.C to another
test that will add those two lines and check for the errors...

Jakub



[PATCH] Properly parse invariant &MEM addresses in the GIMPLE FE

2021-10-06 Thread Richard Biener via Gcc-patches
Currently the frontend rejects those addresses as not lvalues
because the C frontend doens't expect MEM_REF or TARGET_MEM_REF
to appear (but they would be valid lvalues there).  The following
fixes that by amending lvalue_p.

The change also makes the dumping of the source of the testcase
valid for the GIMPLE FE by not eliding the '&' when dumping
string literals.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Joseph, is the C frontend change OK?

Thanks,
Richard.

2021-10-06  Richard Biener  

gcc/c/
* c-typeck.c (lvalue_p): Also allow MEM_REF and TARGET_MEM_REF.

gcc/
* tree-pretty-print.c (dump_generic_node): Do not elide
printing '&' when dumping with -gimple.

gcc/testsuite/
* gcc.dg/gimplefe-47.c: New testcase.
---
 gcc/c/c-typeck.c   |  4 
 gcc/testsuite/gcc.dg/gimplefe-47.c | 27 +++
 gcc/tree-pretty-print.c|  7 +--
 3 files changed, 36 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gimplefe-47.c

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index f9eb0e5176f..0aac978c02e 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -4968,6 +4968,10 @@ lvalue_p (const_tree ref)
 case STRING_CST:
   return true;
 
+case MEM_REF:
+case TARGET_MEM_REF:
+  /* MEM_REFs can appear from -fgimple parsing or folding, so allow them
+here as well.  */
 case INDIRECT_REF:
 case ARRAY_REF:
 case VAR_DECL:
diff --git a/gcc/testsuite/gcc.dg/gimplefe-47.c 
b/gcc/testsuite/gcc.dg/gimplefe-47.c
new file mode 100644
index 000..3bbd34d669f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gimplefe-47.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-fgimple" } */
+
+char * begfield (int tab, char * ptr, char * lim, int sword, int schar);
+
+int __GIMPLE (ssa)
+main ()
+{
+  char * lim;
+  char * s;
+  char * _1;
+
+  __BB(2):
+  _1 = begfield (58, ":ab", &__MEM  ((void *)&":ab" + _Literal
+(void *) 3), 1, 1);
+  if (_1 != _Literal (char *) &__MEM  ((void *)&":ab" + _Literal 
(void *) 2))
+goto __BB3;
+  else
+goto __BB4;
+
+  __BB(3):
+  __builtin_abort ();
+
+  __BB(4):
+  __builtin_exit (0);
+}
+
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 30a3945c37c..275dc7d8af7 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -2888,10 +2888,13 @@ dump_generic_node (pretty_printer *pp, tree node, int 
spc, dump_flags_t flags,
 case PREDECREMENT_EXPR:
 case PREINCREMENT_EXPR:
 case INDIRECT_REF:
-  if (TREE_CODE (node) == ADDR_EXPR
+  if (!(flags & TDF_GIMPLE)
+ && TREE_CODE (node) == ADDR_EXPR
  && (TREE_CODE (TREE_OPERAND (node, 0)) == STRING_CST
  || TREE_CODE (TREE_OPERAND (node, 0)) == FUNCTION_DECL))
-   ;   /* Do not output '&' for strings and function pointers.  */
+   /* Do not output '&' for strings and function pointers when not
+  dumping GIMPLE FE syntax.  */
+   ;
   else
pp_string (pp, op_symbol (node));
 
-- 
2.31.1


[PATCH][v2] More consistently dump GIMPLE FE consumable stmts

2021-10-06 Thread Richard Biener via Gcc-patches
The following makes more stmts consumable with the GIMPLE FE
when dumping with -gimple.  In particular addresses in GIMPLE
operand position require wrapping with _Literal.

The TDF_ flag space is now exhausted and I've removed overlaps
and re-ordered things as to how it is supposed to work and
made it uint32_t and prepared the operator overloads for an
easy migration to uint64_t once required.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-05  Richard Biener  

PR c/102605
* dumpfile.h (TDF_GIMPLE_VAL): New.
(dump_flag): Re-order and adjust TDF_* flags.  Make
the enum uint32_t.  Use std::underlying_type in the
operator overloads.
(optgroup_flag): Likewise for the operator overloads.
* tree-pretty-print.c (dump_generic_node): Wrap ADDR_EXPR
in _Literal if TDF_GIMPLE_VAL.
* gimple-pretty-print.c (dump_gimple_assign): Add
TDF_GIMPLE_VAL to flags when dumping operands where only
is_gimple_val are allowed.
(dump_gimple_cond): Likewise.
---
 gcc/dumpfile.h| 55 +++
 gcc/gimple-pretty-print.c | 21 +++
 gcc/tree-pretty-print.c   | 10 ++-
 3 files changed, 57 insertions(+), 29 deletions(-)

diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h
index 6c7758dd2fb..76226753845 100644
--- a/gcc/dumpfile.h
+++ b/gcc/dumpfile.h
@@ -74,7 +74,7 @@ enum dump_kind
the DUMP_OPTIONS array in dumpfile.c. The TDF_* flags coexist with
MSG_* flags (for -fopt-info) and the bit values must be chosen to
allow that.  */
-enum dump_flag
+enum dump_flag : uint32_t
 {
   /* Value of TDF_NONE is used just for bits filtered by TDF_KIND_MASK.  */
   TDF_NONE  = 0,
@@ -140,23 +140,26 @@ enum dump_flag
   /* Dump SCEV details.  */
   TDF_SCEV = (1 << 19),
 
-  /* Dump in GIMPLE FE syntax  */
+  /* Dump in GIMPLE FE syntax.  */
   TDF_GIMPLE = (1 << 20),
 
   /* Dump folding details.  */
   TDF_FOLDING = (1 << 21),
 
+  /* Dumping for range path solver.  */
+  TDF_THREADING = (1 << 22),
+
   /* MSG_* flags for expressing the kinds of message to
  be emitted by -fopt-info.  */
 
   /* -fopt-info optimized sources.  */
-  MSG_OPTIMIZED_LOCATIONS = (1 << 22),
+  MSG_OPTIMIZED_LOCATIONS = (1 << 23),
 
   /* Missed opportunities.  */
-  MSG_MISSED_OPTIMIZATION = (1 << 23),
+  MSG_MISSED_OPTIMIZATION = (1 << 24),
 
   /* General optimization info.  */
-  MSG_NOTE = (1 << 24),
+  MSG_NOTE = (1 << 25),
 
   /* Mask for selecting MSG_-kind flags.  */
   MSG_ALL_KINDS = (MSG_OPTIMIZED_LOCATIONS
@@ -175,33 +178,33 @@ enum dump_flag
  sub-option of -fopt-info to show the internal messages.  */
 
   /* Implicitly supplied for messages at the top-level dump scope.  */
-  MSG_PRIORITY_USER_FACING = (1 << 25),
+  MSG_PRIORITY_USER_FACING = (1 << 26),
 
   /* Implicitly supplied for messages within nested dump scopes.  */
-  MSG_PRIORITY_INTERNALS = (1 << 26),
+  MSG_PRIORITY_INTERNALS = (1 << 27),
 
   /* Supplied when an opt_problem generated in a nested scope is re-emitted
  at the top-level.   We want to default to showing these in -fopt-info
  output, but to *not* show them in dump files, as the message would be
  shown twice, messing up "scan-tree-dump-times" in DejaGnu tests.  */
-  MSG_PRIORITY_REEMITTED = (1 << 27),
+  MSG_PRIORITY_REEMITTED = (1 << 28),
 
   /* Mask for selecting MSG_PRIORITY_* flags.  */
   MSG_ALL_PRIORITIES = (MSG_PRIORITY_USER_FACING
| MSG_PRIORITY_INTERNALS
| MSG_PRIORITY_REEMITTED),
 
-  /* Dumping for -fcompare-debug.  */
-  TDF_COMPARE_DEBUG = (1 << 28),
+  /* All -fdump- flags.  */
+  TDF_ALL_VALUES = (1 << 29) - 1,
 
-  /* For error.  */
-  TDF_ERROR = (1 << 26),
+  /* Dumping for -fcompare-debug.  */
+  TDF_COMPARE_DEBUG = (1 << 29),
 
-  /* Dumping for range path solver.  */
-  TDF_THREADING = (1 << 27),
+  /* Dump a GIMPLE value which means wrapping certain things with _Literal.  */
+  TDF_GIMPLE_VAL = (1 << 30),
 
-  /* All values.  */
-  TDF_ALL_VALUES = (1 << 29) - 1
+  /* For error.  */
+  TDF_ERROR = ((uint32_t)1 << 31),
 };
 
 /* Dump flags type.  */
@@ -211,32 +214,36 @@ typedef enum dump_flag dump_flags_t;
 static inline dump_flags_t
 operator| (dump_flags_t lhs, dump_flags_t rhs)
 {
-  return (dump_flags_t)((int)lhs | (int)rhs);
+  return (dump_flags_t)((std::underlying_type::type)lhs
+   | (std::underlying_type::type)rhs);
 }
 
 static inline dump_flags_t
 operator& (dump_flags_t lhs, dump_flags_t rhs)
 {
-  return (dump_flags_t)((int)lhs & (int)rhs);
+  return (dump_flags_t)((std::underlying_type::type)lhs
+   & (std::underlying_type::type)rhs);
 }
 
 static inline dump_flags_t
 operator~ (dump_flags_t flags)
 {
-  return (dump_flags_t)~((int)flags);
+  return (dump_flags_t)~((std::underlying_type::type)flags);
 }
 
 static inline dump_flags_t &
 operator|= (dump_flags_t &lhs, dump_flags_t rhs)
 {
-  lhs = (dump_flags_t)(

Re: [RFC] More jump threading restrictions in the presence of loops.

2021-10-06 Thread Aldy Hernandez via Gcc-patches
[Here go the bits by Richi, tested on x86-64 Linux, and ongoing tests
on aarch64 and ppc64le.]

There is a lot of fall-out from this patch, as there were many threading
tests that assumed the restrictions introduced by this patch were valid.
Some tests have merely shifted the threading to after loop
optimizations, but others ended up with no threading opportunities at
all.  Surprisingly some tests ended up with more total threads.  It was
a crapshoot all around.

On a postive note, there are 6 tests that no longer XFAIL, and one
guality test which now passes.

I would appreciate someone reviewing the test changes.  I am unsure
whether some of the tests should be removed, XFAILed, or some other
thing.

I felt a bit queasy about such a fundamental change wrt threading, so I
ran it through my callgrind test harness (.ii files from a bootstrap).
There was no change in overall compilation, DOM, or the VRP threaders.

However, there was a slight increase of 1.63% in the backward threader.
I'm pretty sure we could reduce this if we incorporated the restrictions
into their profitability code.  This way we could stop the search when
we ran into one of these restrictions.  Not sure it's worth it at this
point.

Note, that this ad-hoc testing is not meant to replace a more thorough
SPEC, etc test.

Tested on x86-64 Linux.

OK pending tests on aarch64 and ppc64le?

Co-authored-by: Richard Biener 
From 5c0fb55b45733ec38918f5d7a576719da32e4a6c Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Mon, 4 Oct 2021 09:47:02 +0200
Subject: [PATCH] Disallow loop rotation and loop header crossing in jump
 threaders.

There is a lot of fall-out from this patch, as there are many threading
tests that assumed the restrictions introduced by this patch were valid.
Some tests have merely shifted the threading to after loop
optimizations, but others ended up with no threading opportunities at
all.  Surprisingly some tests ended up with more total threads.  It was
a crapshoot all around.

On a postive note, there are 6 tests that no longer XFAIL, and one
guality test which now passes.

I would appreciate someone reviewing the test changes.  I am unsure
whether some of the tests should be removed, XFAILed, or some other
thing.

I felt a bit queasy about such a fundamental change wrt threading, so I
ran it through my callgrind test harness (.ii files from a bootstrap).
There was no change in overall compilation, DOM, or the VRP threaders.

However, there was a slight increase of 1.63% in the backward threader.
I'm pretty sure we could reduce this if we incorporated the restrictions
into their profitability code.  This way we could stop the search when
we ran into one of these restrictions.  Not sure it's worth it at this
point.

Note, that this ad-hoc testing is not meant to replace a more thorough
SPEC, etc test.

Tested on x86-64 Linux.

OK pending tests on aarch64 and ppc64le?

Co-authored-by: Richard Biener 

gcc/ChangeLog:

	* tree-ssa-threadupdate.c (cancel_thread): Dump threading reason
	on the same line as the threading cancellation.
	(jt_path_registry::cancel_invalid_paths): Avoid rotating loops.
	Avoid threading through loop headers where the path remains in the
	loop.

libgomp/ChangeLog:

	* testsuite/libgomp.graphite/force-parallel-5.c: Remove xfail.

gcc/testsuite/ChangeLog:

	* gcc.dg/Warray-bounds-87.c: Remove xfail.
	* gcc.dg/analyzer/pr94851-2.c: Remove xfail.
	* gcc.dg/graphite/pr69728.c: Remove xfail.
	* gcc.dg/graphite/scop-dsyr2k.c: Remove xfail.
	* gcc.dg/graphite/scop-dsyrk.c: Remove xfail.
	* gcc.dg/shrink-wrap-loop.c: Remove xfail.
	* gcc.dg/loop-8.c: Adjust for new threading restrictions.
	* gcc.dg/tree-ssa/ifc-20040816-1.c: Same.
	* gcc.dg/tree-ssa/pr21559.c: Same.
	* gcc.dg/tree-ssa/pr59597.c: Same.
	* gcc.dg/tree-ssa/pr71437.c: Same.
	* gcc.dg/tree-ssa/pr77445-2.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-2a.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-4.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
	* gcc.dg/vect/bb-slp-16.c: Same.
	* gcc.dg/tree-ssa/ssa-thread-invalid.c: New test.
---
 gcc/testsuite/gcc.dg/Warray-bounds-87.c   |   2 +-
 gcc/testsuite/gcc.dg/analyzer/pr94851-2.c |   2 +-
 gcc/testsuite/gcc.dg/graphite/pr69728.c   |   4 +-
 gcc/testsuite/gcc.dg/graphite/scop-dsyr2k.c   |   2 +-
 gcc/testsuite/gcc.dg/graphite/scop-dsyrk.c|   2 +-
 gcc/testsuite/gcc.dg/loop-8.c |   6 +-
 gcc/testsuite/gcc.dg/shrink-wrap-loop.c   |  54 +-
 .../gcc.dg/tree-ssa/ifc-20040816-1.c  |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr21559.c   |   7 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr59597.c   |  10 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr71437.c   |   8 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c |   3 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-18.c   |   8 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-2a.c   |  11 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-4.c|  1

PING - (Re: [Patch] Fortran: Various CLASS + assumed-rank fixed [PR102541])

2021-10-06 Thread Tobias Burnus

Early ping for this patch.

(I still plan to review Harald's pending patch soon, unless someone
beats me:
https://gcc.gnu.org/pipermail/gcc-patches/2021-October/580810.html )

On 01.10.21 02:43, Tobias Burnus wrote:

Hi all,

this patch fixes a bunch of issues with CLASS.

 * * *

Side remark: I disliked the way CLASS is represented when it was
introduced;
when writing the testcase for this PR and kept fixing the testcase
fallout,
I started to hate it!
I am sure that there are more issues – but I tried hard not too look
closer
at surrounding code to avoid hitting more issues.
(If you look for a project, I think if you put attributes on separate
lines,
like a separate "POINTER :: var" line, you have a high chance to hit the
error.)

 * * *

What I found rather puzzling is that the 'optional' argument could be
either
on sym->attr.optional or on CLASS_DATA (sym)->attr.optional. I think one
occurs for 'foo2' and the other for 'foo4' - not that I understand why it
differs.

I think it is otherwise straight forward. Regarding the original issue:

In order to detect an assumed-size argument to an assumed-rank array,
the last dimension has 'ubound = -1' to indicate an assume-size array;
for those size(x, dim=rank(x)-1) == -1 and size(x) < 0

However, when the dummy argument (and hence: actual argument) is either
a pointer or an allocatable, the bound is passed as is (in particular,
"-1" is a valid ubound and size(x) >= 0). – However, if the actual
argument is unallocated/not associated, rank(var) still is supposed to
work - hence, it has to be set.

The last two items did work before - but not for CLASS -> CLASS.
Additionally,
the ubound = -1 had an issue for CLASS -> TYPE as the code assumed that
expr->ref is the whole array ("var(full-array-ref)") but for CLASS the
expr->ref is a component and only expr->ref->next is the array ref.
("var%_data(full-array-ref)").

OK for mainline?

Tobias


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[SVE] [gimple-isel] PR93183 - SVE does not use neg as conditional

2021-10-06 Thread Prathamesh Kulkarni via Gcc-patches
Hi,
As mentioned in PR, for the following test-case:

typedef unsigned char uint8_t;

static inline uint8_t
x264_clip_uint8(uint8_t x)
{
  uint8_t t = -x;
  uint8_t t1 = x & ~63;
  return (t1 != 0) ? t : x;
}

void
mc_weight(uint8_t *restrict dst, uint8_t *restrict src, int n)
{
  for (int x = 0; x < n*16; x++)
dst[x] = x264_clip_uint8(src[x]);
}

-O3 -mcpu=generic+sve generates following code for the inner loop:

.L3:
ld1bz0.b, p0/z, [x1, x2]
movprfx z2, z0
and z2.b, z2.b, #0xc0
movprfx z1, z0
neg z1.b, p1/m, z0.b
cmpeq   p2.b, p1/z, z2.b, #0
sel z0.b, p2, z0.b, z1.b
st1bz0.b, p0, [x0, x2]
add x2, x2, x4
whilelo p0.b, w2, w3
b.any   .L3

The sel is redundant since we could conditionally negate z0 based on
the predicate
comparing z2 with 0.

As suggested in the PR, the attached patch, introduces a new
conditional internal function .COND_NEG, and in gimple-isel replaces
the following sequence:
   op2 = -op1
   op0 = A cmp B
   lhs = op0 ? op1 : op2

with:
   op0 = A inverted_cmp B
   lhs = .COND_NEG (op0, op1, op1).

lhs = .COD_NEG (op0, op1, op1)
implies
lhs = neg (op1) if cond is true OR fall back to op1 if cond is false.

With patch, it generates the following code-gen:
.L3:
ld1bz0.b, p0/z, [x1, x2]
movprfx z1, z0
and z1.b, z1.b, #0xc0
cmpne   p1.b, p2/z, z1.b, #0
neg z0.b, p1/m, z0.b
st1bz0.b, p0, [x0, x2]
add x2, x2, x4
whilelo p0.b, w2, w3
b.any   .L3

While it seems to work for this test-case, I am not entirely sure if
the patch is correct. Does it look in the right direction ?

Thanks,
Prathamesh


pr93183-1.diff
Description: Binary data


Re: [PATCH] openmp, fortran: Add support for declare variant in Fortran

2021-10-06 Thread Jakub Jelinek via Gcc-patches
On Wed, Oct 06, 2021 at 12:39:01PM +0100, Kwok Cheung Yeung wrote:
> In secion 2.3.1 of the OpenMP 5.0 spec, it says:
> 
> 3. For functions within a declare target block, the target trait is added to
> the beginning of the set...
> 
> But OpenMP in Fortran doesn't have the notion of a declare target _block_
> (i.e. the #pragma omp declare target/#pragma omp end declare target form),
> only the !$omp declare target (extended-list)/[clause] form (which C/C++
> also has). The C FE differentiates between the two (it applies an 'omp
> declare target block' attribute for the first, an 'omp declare target' for
> the second) but only the first matches against the 'target' construct in a
> context selector. I opted to match against 'omp declare target' for Fortran
> only, otherwise this functionality won't get exercised in Fortran at all.
> This difference is tested in test3 of declare-variant-8.f90, which I have
> XFAILed for now.

Let me answer this separately.  The 5.0 wording I believe means it doesn't
apply to Fortran at all.  This has been noticed in 5.1 and changed to:

For device routines, the target trait is added to the beginning of the set...

which was actually far worse for the LLVM/ICC way of doing things (see
below), whether something is a device routine is determined either through 
explicit
#pragma omp {,begin} declare target
...
#pragma omp end declare target
block which has the advantage that it is known at compile time during
parsing (that is the reason why the 5.0 wording was written that way),
or through explicit or implicit to clauses on explicit declare target
(which can appear before the call site to the function or after it),
or during the implicit declare target to propagation that is done even
later.

Now, in 5.2 some language committee members wanted to make the presence of
absence of target in the construct selector a dynamic property, but that
would make all of the score computations dynamic as well, the presence or
absence of target in the construct selector affects what bit positions other
selectors get during the score computation.

For procedures that are determined to be target function variants by
a declare target directive, the target trait is added to the beginning of the 
set...

So, in the 5.2 wording and the current GCC implementation of offloading,
the presence or absence of target in the construct set or everything it
depends on needs to be deferred until omp_discover_implicit_declare_target
(i.e. before gimplification) for functions that aren't going to be marked
as "declare target to", and till after IPA for functions that are marked
that way (in that case, the host copy will not get target in the constructor
set and the offloading lto1 copies will get it).

As has been said multiple times, the way we do it in GCC is different from
the way LLVM/ICC etc. do it; they preprocess and parse, analyze etc. the
source code multiple times, once for each offloading target, trust user
isn't doing anything nasty and that e.g. preprocessor macros will not make
the host and offloading targets structures used during mapping different,
different target regions etc.  And the way they are implementing is then
shown in the amount of features that assume their way as the only way,
e.g. about begin declare variant ... end declare variant allowing
effectively not to parse what is in between those, which assumes that pretty
much all the static conditions can be resolved already during the parsing,
which is rarely the case for GCC.
So, perhaps we'll need one day to reconsider what we do and we could say
preprocess just once, but parse multiple times if we determine we need to
offload, and at that point questions like "is this the host or offloading
target variant of declare target?" can be answered already during that parsing.
E.g. C++ FE isn't that far from it, it creates the array of lexical tokens
and then parses those tokens (but I think from time to time modifies the
token array, e.g. the purged_p or error_reported bits).
The C FE doesn't do that.
And users can supply - as the source filename and have the source read from
stdin, at which time there is no file to parse again.

So, for this question, my preference would be for now to implement the 5.0
semantics and never add target to construct set for Fortran unless in
explicit !$omp target body.
When we implement the 5.1 semantics, that would basically mean we have to
defer everything related to target construct in the set outside of explicit
target until gimplification time (after the implicit declare target
discovery), though note at least the C/C++ FEs decide everything declare
variant related at gimplification time only (perhaps to be changed in the
future).

Jakub



Re: [PATCH v3] libgcc: Add a backchain fallback to _Unwind_Backtrace() on PowerPC

2021-10-06 Thread Raphael M Zinsly via Gcc-patches

On 05/10/2021 19:03, Segher Boessenkool wrote:

On Tue, Oct 05, 2021 at 03:32:52PM -0300, Raphael Moreira Zinsly wrote:

Without dwarf2 unwind tables available _Unwind_Backtrace() is not
able to return the full backtrace.
This patch adds a fallback function on powerpc to get the backtrace
by doing a backchain, this code was originally at glibc.

libgcc/ChangeLog:

* config/rs6000/linux-unwind.h (struct rt_sigframe): Move it to
outside of get_regs() in order to use it in another function, this
is done twice: for __powerpc64__ and for !__powerpc64__.
(struct trace_arg): New struct.
(struct layout): New struct.
(ppc_backchain_fallback): New function.
* unwind.inc (_Unwind_Backtrace): Look for _URC_NORMAL_STOP code
state and call MD_BACKCHAIN_FALLBACK.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/unwind-backchain.c: New test.
---
  .../gcc.target/powerpc/unwind-backchain.c |  24 +
  libgcc/config/rs6000/linux-unwind.h   | 102 +++---
  libgcc/unwind.inc |  14 ++-
  3 files changed, 124 insertions(+), 16 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/unwind-backchain.c


So what is new or different in v3 compared to v2?



gcc/testsuite/gcc.target/powerpc/unwind-backchain.c:
- Added a comment explaining why test only on *-linux targets.
- Changed the dejagnu target from powerpc*-*-linux* to *-*-linux*.

--
Raphael Moreira Zinsly


Re: [pushed] Darwin, D: Fix bootstrap when target does not support -Bstatic/dynamic.

2021-10-06 Thread ibuclaw--- via Gcc-patches
> On 05/10/2021 21:57 Iain Sandoe  wrote:
> 
>  
> This fixes a bootstrap fail because saw_static_libcxx was unused for
> targets without support for -Bstatic/dynamic.
> 
> The fix applied pushes the -static-libstdc++ back onto the command
> line, which allows a target to substitute a static version of the
> c++ standard library using specs.
> 
> tested on x86_64-darwin, pushed as bootstrap fix (it, or an alternate
> will also be needed on open branches), thanks

Thanks, it will apply cleanly on the other branches, so feel free to backport.

Iain.


Re: [RFC] More jump threading restrictions in the presence of loops.

2021-10-06 Thread Andreas Schwab
On Okt 05 2021, Aldy Hernandez via Gcc-patches wrote:

> From 5abe65668f602d53b8f3dc515508dc4616beb048 Mon Sep 17 00:00:00 2001
> From: Aldy Hernandez 
> Date: Tue, 5 Oct 2021 15:03:34 +0200
> Subject: [PATCH] Loosen loop crossing restriction in threader.
>
> Crossing loops is generally discouraged from the threader, but we can
> make an exception when we don't cross the latch or enter another loop,
> since this is just an early exit out of the loop.

This breaks bootstrap on aarch64 (in stage2):

In function 'void mark_stack_region_used(poly_uint64, poly_uint64)',
inlined from 'rtx_def* emit_library_call_value_1(int, rtx, rtx, 
libcall_type, machine_mode, int, rtx_mode_t*)' at ../../gcc/calls.c:4536:29:
../../gcc/calls.c:206:26: error: 'const_upper' may be used uninitialized in 
this function [-Werror=maybe-uninitialized]
  206 |   stack_usage_map[i] = 1;
  |   ~~~^~~
../../gcc/calls.c: In function 'rtx_def* emit_library_call_value_1(int, rtx, 
rtx, libcall_type, machine_mode, int, rtx_mode_t*)':
../../gcc/calls.c:202:39: note: 'const_upper' was declared here
  202 |   unsigned HOST_WIDE_INT const_lower, const_upper;
  |   ^~~

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [RFC] More jump threading restrictions in the presence of loops.

2021-10-06 Thread Aldy Hernandez via Gcc-patches
The pending patch I have from Richi fixes this.  Even so, it's the
uninit code that's confused.

Sigh...every single change to the threading code shines the light on
some warning bug.

If you take the calls.ii file from the aarch64 bootstrap and break on
the warning, you can see that the uninitalized use is for
const_upper_3934 here:

  [local count: 315357954]:
  # const_upper_3934 = PHI 
  if (_881 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 157678977]:
  if (const_upper_3934 > _6699)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 17344687]:

   [local count: 157678977]:
  goto ; [100.00%]

   [local count: 140334290]:
  stack_usage_map.481_3930 = stack_usage_map;
  _6441 = const_upper_3934 - _6699;


PROBLEMATIC READ HERE

  _4819 = stack_usage_map.481_3930 + _6699;
  __builtin_memset (_4819, 1, _6441);
  goto ; [11.00%]

const_upper_3934 could be undefined if it comes from BB101
(const_upper_3937(D)), but it only gets read for _881 != 0, so it
shouldn't warn.

I suggest -Wmaybe-uninitialized be turned off for that file until the
warning is fixed.

And yes, the proposed patch will also cure this, but the underlying
problem in the warning is still there.

Aldy

On Wed, Oct 6, 2021 at 3:24 PM Andreas Schwab  wrote:
>
> On Okt 05 2021, Aldy Hernandez via Gcc-patches wrote:
>
> > From 5abe65668f602d53b8f3dc515508dc4616beb048 Mon Sep 17 00:00:00 2001
> > From: Aldy Hernandez 
> > Date: Tue, 5 Oct 2021 15:03:34 +0200
> > Subject: [PATCH] Loosen loop crossing restriction in threader.
> >
> > Crossing loops is generally discouraged from the threader, but we can
> > make an exception when we don't cross the latch or enter another loop,
> > since this is just an early exit out of the loop.
>
> This breaks bootstrap on aarch64 (in stage2):
>
> In function 'void mark_stack_region_used(poly_uint64, poly_uint64)',
> inlined from 'rtx_def* emit_library_call_value_1(int, rtx, rtx, 
> libcall_type, machine_mode, int, rtx_mode_t*)' at ../../gcc/calls.c:4536:29:
> ../../gcc/calls.c:206:26: error: 'const_upper' may be used uninitialized in 
> this function [-Werror=maybe-uninitialized]
>   206 |   stack_usage_map[i] = 1;
>   |   ~~~^~~
> ../../gcc/calls.c: In function 'rtx_def* emit_library_call_value_1(int, rtx, 
> rtx, libcall_type, machine_mode, int, rtx_mode_t*)':
> ../../gcc/calls.c:202:39: note: 'const_upper' was declared here
>   202 |   unsigned HOST_WIDE_INT const_lower, const_upper;
>   |   ^~~
>
> Andreas.
>
> --
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> "And now for something completely different."
>



Re: [PATCH #2] Introduce smul_highpart and umul_highpart RTX for high-part multiplications

2021-10-06 Thread Richard Sandiford via Gcc-patches
"Roger Sayle"  writes:
> Hi Richard,
>
> All excellent suggestions.  The revised patch below implements all of
> your (and Andreas') recommendations.  I'm happy to restrict GCC's support
> for saturating arithmetic to integer types, even though I do know of one
> target (nvptx) that supports saturating floating point math, where results
> are clamped to [0.0, 1.0], but I've not investigated how NaNs or signed
> zeros are handled.
>
> Good catch on my min/max typo.  It convinced me to work harder to come
> up with some test cases for these simplifications, which I've managed to
> trigger on x86_64-pc-linux-gnu in the four new attached test cases.
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures.  Ok for mainline?
>
> 2021-09-29  Roger Sayle  
>   Richard Sandiford  

Thanks for the cite, but I don't think I should be listed here.
I didn't really write anything :-)

The new patch is OK, thanks.  I don't know whether the x86 tests need
special markup.

Richard

>
> gcc/ChangeLog
>   * gcc/rtl.def (SMUL_HIGHPART, UMUL_HIGHPART): New RTX codes for
>   representing signed and unsigned high-part multiplication resp.
>   * gcc/simplify-rtx.c (simplify_binary_operation_1) [SMUL_HIGHPART,
>   UMUL_HIGHPART]: Simplify high-part multiplications by zero.
>   [SS_PLUS, US_PLUS, SS_MINUS, US_MINUS, SS_MULT, US_MULT,
>   SS_DIV, US_DIV]: Similar simplifications for saturating
>   arithmetic.
>   (simplify_const_binary_operation) [SS_PLUS, US_PLUS, SS_MINUS,
>   US_MINUS, SS_MULT, US_MULT, SMUL_HIGHPART, UMUL_HIGHPART]:
>   Implement compile-time evaluation for constant operands.
>
>   * gcc/dwarf2out.c (mem_loc_descriptor): Skip SMUL_HIGHPART and
>   UMUL_HIGHPART.
>   * doc/rtl.texi (smul_highpart, umul_highpart): Document RTX codes.
>   * doc/md.texi (smul@var{m}3_highpart, umul@var{m3}_highpart):
>   Mention the new smul_highpart and umul_highpart RTX codes.
>   * doc/invoke.texi: Silence @xref "compilation" warnings.
>
> gcc/testsuite/ChangeLog
>   * gcc.target/i386/sse2-mmx-paddsb-2.c: New test case.
>   * gcc.target/i386/sse2-mmx-paddusb-2.c: New test case.
>   * gcc.target/i386/sse2-mmx-subsb-2.c: New test case.
>   * gcc.target/i386/sse2-mmx-subusb-2.c: New test case.
>
> Roger
> --
>
> -Original Message-
> From: Richard Sandiford  
> Sent: 27 September 2021 16:44
> To: Roger Sayle 
> Cc: 'GCC Patches' 
> Subject: Re: [PATCH] Introduce sh_mul and uh_mul RTX codes for high-part 
> multiplications
>
> "Roger Sayle"  writes:
>> This patch introduces new RTX codes to allow the RTL passes and 
>> backends to consistently represent high-part multiplications.
>> Currently, the RTL used by different backends for expanding 
>> smul3_highpart and umul3_highpart varies greatly, with 
>> many but not all choosing to express this something like:
>>
>> (define_insn "smuldi3_highpart"
>>   [(set (match_operand:DI 0 "nvptx_register_operand" "=R")
>>(truncate:DI
>> (lshiftrt:TI
>>  (mult:TI (sign_extend:TI
>>(match_operand:DI 1 "nvptx_register_operand" "R"))
>>   (sign_extend:TI
>>(match_operand:DI 2 "nvptx_register_operand" "R")))
>>  (const_int 64]
>>   ""
>>   "%.\\tmul.hi.s64\\t%0, %1, %2;")
>>
>> One complication with using this "widening multiplication" 
>> representation is that it requires an intermediate in a wider mode, 
>> making it difficult or impossible to encode a high-part multiplication 
>> of the widest supported integer mode.
>
> Yeah.  It's also a problem when representing vector ops.
>
>> A second is that it can interfere with optimization; for example 
>> simplify-rtx.c contains the comment:
>>
>>case TRUNCATE:
>>   /* Don't optimize (lshiftrt (mult ...)) as it would interfere
>>  with the umulXi3_highpart patterns.  */
>>
>> Hopefully these problems are solved (or reduced) by introducing a new 
>> canonical form for high-part multiplications in RTL passes.
>> This also simplifies insn patterns when one operand is constant.
>>
>> Whilst implementing some constant folding simplifications and 
>> compile-time evaluation of these new RTX codes, I noticed that this 
>> functionality could also be added for the existing saturating 
>> arithmetic RTX codes.  Then likewise when documenting these new RTX 
>> codes, I also took the opportunity to silence the @xref warnings in 
>> invoke.texi.
>>
>> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
>> and "make -k check" with no new failures.  Ok for mainline?
>>
>>
>> 2021-09-25  Roger Sayle  
>>
>> gcc/ChangeLog
>>  * gcc/rtl.def (SH_MULT, UH_MULT): New RTX codes for representing
>>  signed and unsigned high-part multiplication respectively.
>>  * gcc/simplify-rtx.c (simplify_binary_operation_1) [SH_MULT,
>>  UH_MULT]: Simplify high-part multiplications by zero

Re: [PATCH 1/7]AArch64 Add combine patterns for right shift and narrow

2021-10-06 Thread Richard Sandiford via Gcc-patches
(Nice optimisations!)

Kyrylo Tkachov  writes:
> Hi Tamar,
>
>> -Original Message-
>> From: Tamar Christina 
>> Sent: Wednesday, September 29, 2021 5:19 PM
>> To: gcc-patches@gcc.gnu.org
>> Cc: nd ; Richard Earnshaw ;
>> Marcus Shawcroft ; Kyrylo Tkachov
>> ; Richard Sandiford
>> 
>> Subject: [PATCH 1/7]AArch64 Add combine patterns for right shift and
>> narrow
>>
>> Hi All,
>>
>> This adds a simple pattern for combining right shifts and narrows into
>> shifted narrows.
>>
>> i.e.
>>
>> typedef short int16_t;
>> typedef unsigned short uint16_t;
>>
>> void foo (uint16_t * restrict a, int16_t * restrict d, int n)
>> {
>> for( int i = 0; i < n; i++ )
>>   d[i] = (a[i] * a[i]) >> 10;
>> }
>>
>> now generates:
>>
>> .L4:
>> ldr q0, [x0, x3]
>> umull   v1.4s, v0.4h, v0.4h
>> umull2  v0.4s, v0.8h, v0.8h
>> shrnv1.4h, v1.4s, 10
>> shrn2   v1.8h, v0.4s, 10
>> str q1, [x1, x3]
>> add x3, x3, 16
>> cmp x4, x3
>> bne .L4
>>
>> instead of:
>>
>> .L4:
>> ldr q0, [x0, x3]
>> umull   v1.4s, v0.4h, v0.4h
>> umull2  v0.4s, v0.8h, v0.8h
>> sshrv1.4s, v1.4s, 10
>> sshrv0.4s, v0.4s, 10
>> xtn v1.4h, v1.4s
>> xtn2v1.8h, v0.4s
>> str q1, [x1, x3]
>> add x3, x3, 16
>> cmp x4, x3
>> bne .L4
>>
>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>>
>> Ok for master?
>>
>> Thanks,
>> Tamar
>>
>> gcc/ChangeLog:
>>
>>   * config/aarch64/aarch64-simd.md
>> (*aarch64_shrn_vect,
>>   *aarch64_shrn2_vect): New.
>>   * config/aarch64/iterators.md (srn_op): New.
>>
>> gcc/testsuite/ChangeLog:
>>
>>   * gcc.target/aarch64/shrn-combine.c: New test.
>>
>> --- inline copy of patch --
>> diff --git a/gcc/config/aarch64/aarch64-simd.md
>> b/gcc/config/aarch64/aarch64-simd.md
>> index
>> 48eddf64e05afe3788abfa05141f6544a9323ea1..d7b6cae424622d259f97a3d5
>> fa9093c0fb0bd5ce 100644
>> --- a/gcc/config/aarch64/aarch64-simd.md
>> +++ b/gcc/config/aarch64/aarch64-simd.md
>> @@ -1818,6 +1818,28 @@ (define_insn "aarch64_shrn_insn_be"
>>[(set_attr "type" "neon_shift_imm_narrow_q")]
>>  )
>>
>> +(define_insn "*aarch64_shrn_vect"
>> +  [(set (match_operand: 0 "register_operand" "=w")
>> +(truncate:
>> +  (SHIFTRT:VQN (match_operand:VQN 1 "register_operand" "w")
>> +(match_operand:VQN 2
>> "aarch64_simd_shift_imm_vec_"]
>> +  "TARGET_SIMD"
>> +  "shrn\\t%0., %1., %2"
>> +  [(set_attr "type" "neon_shift_imm_narrow_q")]
>> +)
>> +
>> +(define_insn "*aarch64_shrn2_vect"
>> +  [(set (match_operand: 0 "register_operand" "=w")
>> + (vec_concat:
>> +   (match_operand: 1 "register_operand" "0")
>> +   (truncate:
>> + (SHIFTRT:VQN (match_operand:VQN 2 "register_operand" "w")
>> +   (match_operand:VQN 3
>> "aarch64_simd_shift_imm_vec_")]
>> +  "TARGET_SIMD"
>> +  "shrn2\\t%0., %2., %3"
>> +  [(set_attr "type" "neon_shift_imm_narrow_q")]
>> +)
>
> I think this needs to be guarded on !BYTES_BIG_ENDIAN and a similar pattern 
> added for BYTES_BIG_ENDIAN with the vec_concat operands swapped around.
> This is similar to the aarch64_xtn2_insn_be pattern, for example.

Yeah.  I think that applies to 2/7 and 4/7 too.

Thanks,
Richard


[COMMITTED] Ranger: More efficient zero/nonzero check.

2021-10-06 Thread Andrew MacLeod via Gcc-patches


This is the first in a set of performance improvements.  Over the summer 
a few things have snuck in that are slowing down EVRP.


A recent change introduced a frequent check for zero and non-zero which 
has caused a lot of extra temporary trees to be created.


This patch makes the check more efficient as the non-null processing is 
always a pointer and thus unsigned.


Over a set of 380 GCC source files, it produces a 13% speedup in EVRP.

Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew


>From 4b8ca6c6177b2bd948c1cb2a116955b942751559 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 27 Sep 2021 18:53:54 -0400
Subject: [PATCH 1/4] Ranger: More efficient zero/nonzero check.

A recent change introduced a frequent check for zero and non-zero which has
caused a lot of extra temporary trees to be created.  Make the check more
efficent as it is always a pointer and thus unsigned.

	* gimple-range-cache.cc (non_null_ref::adjust_range): Check for
	zero and non-zero more efficently.
---
 gcc/gimple-range-cache.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 61043d3f375..91dd5a5c087 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -98,9 +98,10 @@ non_null_ref::adjust_range (irange &r, tree name, basic_block bb,
 return false;
 
   // We only care about the null / non-null property of pointers.
-  if (!POINTER_TYPE_P (TREE_TYPE (name)) || r.zero_p () || r.nonzero_p ())
+  if (!POINTER_TYPE_P (TREE_TYPE (name)))
+return false;
+  if (r.undefined_p () || r.lower_bound () != 0 || r.upper_bound () == 0)
 return false;
-
   // Check if pointers have any non-null dereferences.
   if (non_null_deref_p (name, bb, search_dom))
 {
-- 
2.17.2



[COMMITTED] Use TYPE_MIN/MAX_VALUE in set_varying when possible.

2021-10-06 Thread Andrew MacLeod via Gcc-patches


We set the upper and lower bounds in an irange whenever we call 
set_varying (). Unfortunately, we create a new tree for the bounds each 
time, and this is quite expensive.. creating a lot of unnecessary work.


This patch will use the values of TYPE_MIN_VALUE and TYPE_MAX_VALUE if 
possible, instead of creating new trees.  We cant simply use it always 
because a) pointers dont have such a thing, and b) enum types require 
varying to have the range of the underlying type rather than the range 
of the enum as reflected by the macros.


Regardless, even with the requisite checks this produces decent 
results.  Over a set of 380 GCC source files, it produces a further 4.5% 
speedup in EVRP.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

>From e828f4b5898896240b2ae5d5030c539aff28ea24 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 28 Sep 2021 13:11:22 -0400
Subject: [PATCH 2/4] Use TYPE_MIN/MAX_VALUE in set_varying when possible.

We currently create new trees every time... which is very wasteful and time
consuming. Instead, just use the TYPE_MIN/MAX_VALUE.

	* value-range.h (irange::set_varying): Use TYPE_MIN_VALUE and
	TYPE_MAX_VALUE instead of creating new trees when possible.
---
 gcc/value-range.h | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/gcc/value-range.h b/gcc/value-range.h
index a8adc50b98e..39e8f3bcdee 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -476,10 +476,21 @@ irange::set_varying (tree type)
 
   if (INTEGRAL_TYPE_P (type))
 {
+  // Strict enum's require varying to be not TYPE_MIN/MAX, but rather
+  // min_value and max_value.
   wide_int min = wi::min_value (TYPE_PRECISION (type), TYPE_SIGN (type));
   wide_int max = wi::max_value (TYPE_PRECISION (type), TYPE_SIGN (type));
-  m_base[0] = wide_int_to_tree (type, min);
-  m_base[1] = wide_int_to_tree (type, max);
+  if (wi::eq_p (max, wi::to_wide (TYPE_MAX_VALUE (type)))
+	  && wi::eq_p (min, wi::to_wide (TYPE_MIN_VALUE (type
+	{
+	  m_base[0] = TYPE_MIN_VALUE (type);
+	  m_base[1] = TYPE_MAX_VALUE (type);
+	}
+  else
+	{
+	  m_base[0] = wide_int_to_tree (type, min);
+	  m_base[1] = wide_int_to_tree (type, max);
+	}
 }
   else if (POINTER_TYPE_P (type))
 {
-- 
2.17.2



[COMMITTED] Introduce a param-switch-limit for EVRP.

2021-10-06 Thread Andrew MacLeod via Gcc-patches
One of the cases which causes ranger to bog down is processing large 
switch statements.


We create ranges for each individual case, and at meet points need to 
intersect those ranges and combine them.  As Switches get larger, the 
number of subranges at meets points increase and this becomes a much 
more expensive operation, and it can becomes quadratic or worse in 
pathological cases.


This patch introduces a --param=evrp-switch-limit variables which is 
used to decide a switch is too large to process, and we will pretend we 
don't see it.  I've run various values thru a testsuite, and the current 
default of 50 appears to give a good trade-off between speedup in 
compilation times vs results.


Over 380 GCC source files, this default setting only misses 20 out of 
5000 simplification opportunities, which producing a whopping 40% 
speedup in EVRP compilation.


I expect to be able to remove this in the next release with some plans I 
have for reducing propagation, but this is a decent solution for now.


We could consider increasing that default value when running at -O3.

I also ran experiments to limit the number of sub-ranges instead of, and 
along with, this change.  Limiting the number of subranges had no 
measurable effect on things.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew



>From 3ca950c3525527846f13e8c547368ef432547a23 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 29 Sep 2021 17:25:50 -0400
Subject: [PATCH 3/4] Introduce a param-switch-limit for EVRP.

Very large switches cause a lot of range calculations with multiple subranges
to happen.  This can cause quadratic or even exponetial time increases in
large testcases.  This patch introduces a param variable to limit
the size of switches EVRP will process.

	* gimple-range-edge.cc (gimple_outgoing_range::gimple_outgoing_range):
	Add parameter to limit size when recognizing switches.
	(gimple_outgoing_range::edge_range_p): Check size limit.
	* gimple-range-edge.h (gimple_outgoing_range): Add size field.
	* gimple-range-gori.cc (gori_map::calculate_gori): Ignore switches
	that exceed the size limit.
	(gori_compute::gori_compute): Add initializer.
	* params.opt (evrp-switch-limit): New.
	* doc/invoke.texi: Update docs.
---
 gcc/doc/invoke.texi  | 3 +++
 gcc/gimple-range-edge.cc | 7 ++-
 gcc/gimple-range-edge.h  | 3 ++-
 gcc/gimple-range-gori.cc | 6 +-
 gcc/params.opt   | 4 
 5 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index b4eaa7793b7..80bc7fe41e0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14505,6 +14505,9 @@ Maximum number of basic blocks before EVRP uses a sparse cache.
 @item evrp-mode
 Specifies the mode Early VRP should operate in.
 
+@item evrp-switch-limit
+Specifies the maximum number of switch cases before EVRP ignores a switch.
+
 @item unroll-jam-min-percent
 The minimum percentage of memory references that must be optimized
 away for the unroll-and-jam transformation to be considered profitable.
diff --git a/gcc/gimple-range-edge.cc b/gcc/gimple-range-edge.cc
index d11153e677e..afffc8dbcae 100644
--- a/gcc/gimple-range-edge.cc
+++ b/gcc/gimple-range-edge.cc
@@ -65,9 +65,10 @@ gcond_edge_range (irange &r, edge e)
 }
 
 
-gimple_outgoing_range::gimple_outgoing_range ()
+gimple_outgoing_range::gimple_outgoing_range (int max_sw_edges)
 {
   m_edge_table = NULL;
+  m_max_edges = max_sw_edges;
 }
 
 
@@ -192,6 +193,10 @@ gimple_outgoing_range::edge_range_p (irange &r, edge e)
   return s;
 }
 
+  // Only process switches if it within the size limit.
+  if (EDGE_COUNT (e->src->succs) > (unsigned)m_max_edges)
+return NULL;
+
   gcc_checking_assert (is_a (s));
   gswitch *sw = as_a (s);
   tree type = TREE_TYPE (gimple_switch_index (sw));
diff --git a/gcc/gimple-range-edge.h b/gcc/gimple-range-edge.h
index 87b4124d01d..03e8e82bd03 100644
--- a/gcc/gimple-range-edge.h
+++ b/gcc/gimple-range-edge.h
@@ -38,13 +38,14 @@ along with GCC; see the file COPYING3.  If not see
 class gimple_outgoing_range
 {
 public:
-  gimple_outgoing_range ();
+  gimple_outgoing_range (int max_sw_edges = INT_MAX);
   ~gimple_outgoing_range ();
   gimple *edge_range_p (irange &r, edge e);
 private:
   void calc_switch_ranges (gswitch *sw);
   bool get_edge_range (irange &r, gimple *s, edge e);
 
+  int m_max_edges;
   hash_map *m_edge_table;
   irange_allocator m_range_allocator;
 };
diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc
index 4a1ade7f921..6946fa65dda 100644
--- a/gcc/gimple-range-gori.cc
+++ b/gcc/gimple-range-gori.cc
@@ -565,6 +565,9 @@ gori_map::calculate_gori (basic_block bb)
 }
   else
 {
+  // Do not process switches if they are too large.
+  if (EDGE_COUNT (bb->succs) > (unsigned)param_evrp_switch_limit)
+	return;
   gswitch *gs = as_a(stmt);
   name = gimple_range_ssa_p (gimple_switch_index (gs));
   maybe_add_gori (name, gimple_bb (stmt));
@@ -

[COMMITTED] Add range intersect with 2 wide-ints.

2021-10-06 Thread Andrew MacLeod via Gcc-patches
This patch provides an intersect to irange which accepts a wide_int 
lower and upper bound instead of an irange.


This both prevents us from having to creates trees when we needs a 
temporary irange of single bounds.


This is also a bit more efficient as we can do the intersect directly in 
the sub-range vector of the destination irange (which we can't do as 
easily in the general case).


As a result, I've changed the general intersect routine to call this if 
the other irange has only a single pair.  I've also changed the 3 caller 
locations I could find which are amenable to this change.


Over a set of 380 GCC source files, it produces a nominal 0.8% speedup 
in EVRP.


All 4 patch sets combined produce a 50% speedup in EVRP.

I'm also looking at something similar for intersect which is both a more 
expensive operation, and of course, more complicated.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew


>From ad451b020a24fe7111e668f8c41a3ba648104569 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 4 Oct 2021 15:30:44 -0400
Subject: [PATCH 4/4] Add range intersect with 2 wide-ints.

Add a more efficent intersect using a lower/upper bound single pair of
wide_ints.

	* gimple-range-cache.cc (non_null_ref::adjust_range): Call new
	intersect routine.
	* gimple-range-fold.cc (adjust_pointer_diff_expr): Ditto.
	(adjust_imagpart_expr): Ditto.
	* value-range.cc (irange::irange_intersect): Call new routine if
	RHS is a single pair.
	(irange::intersect): New wide_int version.
	* value-range.h (class irange): New prototype.
---
 gcc/gimple-range-cache.cc |  6 ++--
 gcc/gimple-range-fold.cc  | 14 +++-
 gcc/value-range.cc| 69 +++
 gcc/value-range.h |  1 +
 4 files changed, 78 insertions(+), 12 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 91dd5a5c087..7d994798e52 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -105,9 +105,9 @@ non_null_ref::adjust_range (irange &r, tree name, basic_block bb,
   // Check if pointers have any non-null dereferences.
   if (non_null_deref_p (name, bb, search_dom))
 {
-  int_range<2> nz;
-  nz.set_nonzero (TREE_TYPE (name));
-  r.intersect (nz);
+  // Remove zero from the range.
+  unsigned prec = TYPE_PRECISION (TREE_TYPE (name));
+  r.intersect (wi::one (prec), wi::max_value (prec, UNSIGNED));
   return true;
 }
   return false;
diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index bb09b751a4e..ed2fbe121cf 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -360,12 +360,9 @@ adjust_pointer_diff_expr (irange &res, const gimple *diff_stmt)
   && integer_zerop (gimple_call_arg (call, 1)))
 {
   tree max = vrp_val_max (ptrdiff_type_node);
-  wide_int wmax = wi::to_wide (max, TYPE_PRECISION (TREE_TYPE (max)));
-  tree expr_type = gimple_range_type (diff_stmt);
-  tree range_min = build_zero_cst (expr_type);
-  tree range_max = wide_int_to_tree (expr_type, wmax - 1);
-  int_range<2> r (range_min, range_max);
-  res.intersect (r);
+  unsigned prec = TYPE_PRECISION (TREE_TYPE (max));
+  wide_int wmaxm1 = wi::to_wide (max, prec) - 1;
+  res.intersect (wi::zero (prec), wmaxm1);
 }
 }
 
@@ -405,9 +402,8 @@ adjust_imagpart_expr (irange &res, const gimple *stmt)
   tree cst = gimple_assign_rhs1 (def_stmt);
   if (TREE_CODE (cst) == COMPLEX_CST)
 	{
-	  tree imag = TREE_IMAGPART (cst);
-	  int_range<2> tmp (imag, imag);
-	  res.intersect (tmp);
+	  wide_int imag = wi::to_wide (TREE_IMAGPART (cst));
+	  res.intersect (imag, imag);
 	}
 }
 }
diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index f113fd7c905..147c4b04c1d 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -1648,6 +1648,8 @@ void
 irange::irange_intersect (const irange &r)
 {
   gcc_checking_assert (!legacy_mode_p () && !r.legacy_mode_p ());
+  gcc_checking_assert (undefined_p () || r.undefined_p ()
+		   || range_compatible_p (type (), r.type ()));
 
   if (undefined_p () || r.varying_p ())
 return;
@@ -1662,6 +1664,13 @@ irange::irange_intersect (const irange &r)
   return;
 }
 
+  if (r.num_pairs () == 1)
+   {
+ // R cannot be undefined, use more efficent pair routine.
+ intersect (r.lower_bound(), r.upper_bound ());
+ return;
+   }
+
   signop sign = TYPE_SIGN (TREE_TYPE(m_base[0]));
   unsigned bld_pair = 0;
   unsigned bld_lim = m_max_ranges;
@@ -1737,6 +1746,66 @@ irange::irange_intersect (const irange &r)
 verify_range ();
 }
 
+// Multirange intersect for a specified wide_int [lb, ub] range.
+
+void
+irange::intersect (const wide_int& lb, const wide_int& ub)
+{
+  // Undefined remains undefined.
+  if (undefined_p ())
+return;
+
+  if (legacy_mode_p ())
+{
+  intersect (int_range<1> (type (), lb, ub));
+  return;
+}
+
+  tree range_type = type();
+  signop sign = TYPE_SIGN

Re: [RFC] More jump threading restrictions in the presence of loops.

2021-10-06 Thread Martin Sebor via Gcc-patches

On 10/6/21 7:47 AM, Aldy Hernandez via Gcc-patches wrote:

The pending patch I have from Richi fixes this.  Even so, it's the
uninit code that's confused.

Sigh...every single change to the threading code shines the light on
some warning bug.

If you take the calls.ii file from the aarch64 bootstrap and break on
the warning, you can see that the uninitalized use is for
const_upper_3934 here:

   [local count: 315357954]:
   # const_upper_3934 = PHI 
   if (_881 != 0)
 goto ; [50.00%]
   else
 goto ; [50.00%]

[local count: 157678977]:
   if (const_upper_3934 > _6699)
 goto ; [89.00%]
   else
 goto ; [11.00%]

[local count: 17344687]:

[local count: 157678977]:
   goto ; [100.00%]

[local count: 140334290]:
   stack_usage_map.481_3930 = stack_usage_map;
   _6441 = const_upper_3934 - _6699;


PROBLEMATIC READ HERE

   _4819 = stack_usage_map.481_3930 + _6699;
   __builtin_memset (_4819, 1, _6441);
   goto ; [11.00%]

const_upper_3934 could be undefined if it comes from BB101
(const_upper_3937(D)), but it only gets read for _881 != 0, so it
shouldn't warn.

I suggest -Wmaybe-uninitialized be turned off for that file until the
warning is fixed.


-Wmaybe-uninitialized certainly suffers from a high rate of false
positives (I count 40 open PRs).  Even after all this time debugging
it I still struggle with the code for it but in all the bug reports
I've reviewed, only a small subset seems to be due to bugs or even
limitations in it.  Most are inherent in its design goal to warn
for reads from variables that cannot be proven to have been
initialized.

If this one is a bug with some chance of getting fixed it would
be good to distill it to a small test case (even though it's not
unlikely that there already is an existing report with the same
root cause).

That said, from from your description and the IL above it sounds
to me like the uninitialized read here is possible (it takes place
when _881 != 0), and so -Wmaybe-uininitialized triggers as it's
supposed to.

Either way, rather than suppressing the warning for the whole file
I would suggest to consider initializing the local variable instead.
Looking at the code in calls.c, I can't convince myself that
the variable cannot, in fact, be used uninitialized.

Martin



And yes, the proposed patch will also cure this, but the underlying
problem in the warning is still there.

Aldy

On Wed, Oct 6, 2021 at 3:24 PM Andreas Schwab  wrote:


On Okt 05 2021, Aldy Hernandez via Gcc-patches wrote:


 From 5abe65668f602d53b8f3dc515508dc4616beb048 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Tue, 5 Oct 2021 15:03:34 +0200
Subject: [PATCH] Loosen loop crossing restriction in threader.

Crossing loops is generally discouraged from the threader, but we can
make an exception when we don't cross the latch or enter another loop,
since this is just an early exit out of the loop.


This breaks bootstrap on aarch64 (in stage2):

In function 'void mark_stack_region_used(poly_uint64, poly_uint64)',
 inlined from 'rtx_def* emit_library_call_value_1(int, rtx, rtx, 
libcall_type, machine_mode, int, rtx_mode_t*)' at ../../gcc/calls.c:4536:29:
../../gcc/calls.c:206:26: error: 'const_upper' may be used uninitialized in 
this function [-Werror=maybe-uninitialized]
   206 |   stack_usage_map[i] = 1;
   |   ~~~^~~
../../gcc/calls.c: In function 'rtx_def* emit_library_call_value_1(int, rtx, 
rtx, libcall_type, machine_mode, int, rtx_mode_t*)':
../../gcc/calls.c:202:39: note: 'const_upper' was declared here
   202 |   unsigned HOST_WIDE_INT const_lower, const_upper;
   |   ^~~

Andreas.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."







Re: [PATCH v3] libgcc: Add a backchain fallback to _Unwind_Backtrace() on PowerPC

2021-10-06 Thread Segher Boessenkool
On Wed, Oct 06, 2021 at 10:09:31AM -0300, Raphael M Zinsly wrote:
> On 05/10/2021 19:03, Segher Boessenkool wrote:
> >So what is new or different in v3 compared to v2?
> 
> gcc/testsuite/gcc.target/powerpc/unwind-backchain.c:
>   - Added a comment explaining why test only on *-linux targets.
>   - Changed the dejagnu target from powerpc*-*-linux* to *-*-linux*.

I said in v2 that it is fine with that changed, so: okay for trunk :-)
Thanks!


Segher


Re: [PATCH 4/4] ipa-cp: Select saner profile count to base heuristics on

2021-10-06 Thread Jan Hubicka
> 2021-08-23  Martin Jambor  
> 
>   * params.opt (param_ipa_cp_profile_count_base): New parameter.
>   * ipa-cp.c (max_count): Replace with base_count, replace all
>   occurrences too, unless otherwise stated.
>   (ipcp_cloning_candidate_p): identify mostly-directly called
>   functions based on their counts, not max_count.
>   (compare_edge_profile_counts): New function.
>   (ipcp_propagate_stage): Instead of setting max_count, find the
>   appropriate edge count in a sorted vector of counts of eligible
>   edges and make it the base_count.
> ---
>  gcc/ipa-cp.c   | 82 +-
>  gcc/params.opt |  4 +++
>  2 files changed, 78 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 53cca7aa804..6ab74f61e83 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -400,9 +400,9 @@ object_allocator > 
> ipcp_sources_pool
>  object_allocator ipcp_agg_lattice_pool
>("IPA_CP aggregate lattices");
>  
> -/* Maximal count found in program.  */
> +/* Base count to use in heuristics when using profile feedback.  */
>  
> -static profile_count max_count;
> +static profile_count base_count;
>  
>  /* Original overall size of the program.  */
>  
> @@ -809,7 +809,8 @@ ipcp_cloning_candidate_p (struct cgraph_node *node)
>/* When profile is available and function is hot, propagate into it even if
>   calls seems cold; constant propagation can improve function's speed
>   significantly.  */
> -  if (max_count > profile_count::zero ())
> +  if (stats.count_sum > profile_count::zero ()
> +  && node->count.ipa ().initialized_p ())
>  {
>if (stats.count_sum > node->count.ipa ().apply_scale (90, 100))
>   {
> @@ -3310,10 +3311,10 @@ good_cloning_opportunity_p (struct cgraph_node *node, 
> sreal time_benefit,
>  
>ipa_node_params *info = ipa_node_params_sum->get (node);
>int eval_threshold = opt_for_fn (node->decl, param_ipa_cp_eval_threshold);
> -  if (max_count > profile_count::zero ())
> +  if (base_count > profile_count::zero ())
>  {
>  
> -  sreal factor = count_sum.probability_in (max_count).to_sreal ();
> +  sreal factor = count_sum.probability_in (base_count).to_sreal ();

If you have part of program built with -fprofile-use and part built without 
this will
disable cloning for functions called only from places w/o profile.
I think we want to count frequencies when ipa profile is uninitialized
and then pass the cloning oportunity if either count or freqs says it is
good idea.

>sreal evaluation = (time_benefit * factor) / size_cost;
>evaluation = incorporate_penalties (node, info, evaluation);
>evaluation *= 1000;
> @@ -3950,6 +3951,21 @@ value_topo_info::propagate_effects ()
>  }
>  }
>  
> +/* Callback for qsort to sort counts of all edges.  */
> +
> +static int
> +compare_edge_profile_counts (const void *a, const void *b)
> +{
> +  const profile_count *cnt1 = (const profile_count *) a;
> +  const profile_count *cnt2 = (const profile_count *) b;
> +
> +  if (*cnt1 < *cnt2)
> +return 1;
> +  if (*cnt1 > *cnt2)
> +return -1;
> +  return 0;
> +}
> +
>  
>  /* Propagate constants, polymorphic contexts and their effects from the
> summaries interprocedurally.  */
> @@ -3962,8 +3978,10 @@ ipcp_propagate_stage (class ipa_topo_info *topo)
>if (dump_file)
>  fprintf (dump_file, "\n Propagating constants:\n\n");
>  
> -  max_count = profile_count::uninitialized ();
> +  base_count = profile_count::uninitialized ();
>  
> +  bool compute_count_base = false;
> +  unsigned base_count_pos_percent = 0;
>FOR_EACH_DEFINED_FUNCTION (node)
>{
>  if (node->has_gimple_body_p ()
> @@ -3981,9 +3999,57 @@ ipcp_propagate_stage (class ipa_topo_info *topo)
>  ipa_size_summary *s = ipa_size_summaries->get (node);
>  if (node->definition && !node->alias && s != NULL)
>overall_size += s->self_size;
> -max_count = max_count.max (node->count.ipa ());
> +if (node->count.ipa ().initialized_p ())
> +  {
> + compute_count_base = true;
> + unsigned pos_percent = opt_for_fn (node->decl,
> +param_ipa_cp_profile_count_base);
> + base_count_pos_percent = MAX (base_count_pos_percent, pos_percent);
> +  }
>}
>  
> +  if (compute_count_base)
> +{
> +  auto_vec all_edge_counts;
> +  all_edge_counts.reserve_exact (symtab->edges_count);
> +  FOR_EACH_DEFINED_FUNCTION (node)
> + for (cgraph_edge *cs = node->callees; cs; cs = cs->next_callee)
> +   {
> + profile_count count = cs->count.ipa ();
> + if (!(count > profile_count::zero ()))
> +   continue;
> +
> + enum availability avail;
> + cgraph_node *tgt
> +   = cs->callee->function_or_virtual_thunk_symbol (&avail);
> + ipa_node_params *info = ipa_node_params_sum->get (tgt);
> + if (info && info->versionable)
> +   al

Re: [PATCH] openmp, fortran: Add support for declare variant in Fortran

2021-10-06 Thread Jakub Jelinek via Gcc-patches
On Wed, Oct 06, 2021 at 12:39:01PM +0100, Kwok Cheung Yeung wrote:
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -11599,8 +11599,11 @@ omp_construct_selector_matches (enum tree_code 
> *constructs, int nconstructs,
>   }
>  }
>if (!target_seen
> -  && lookup_attribute ("omp declare target block",
> -DECL_ATTRIBUTES (current_function_decl)))
> +  && (lookup_attribute ("omp declare target block",
> + DECL_ATTRIBUTES (current_function_decl))
> +   || (lang_GNU_Fortran ()
> +   && lookup_attribute ("omp declare target",
> +DECL_ATTRIBUTES (current_function_decl)
>  {
>if (scores)
>   codes.safe_push (OMP_TARGET);

So, as I wrote in the other mail, my preference would be to drop this hunk
and adjust testcases appropriately (perhaps with a comment).
The 5.1 way will then be different for all 3 languages and 5.2 way as well.

> +   const char *str = TREE_STRING_POINTER (TREE_VALUE (t2));
> +   if (!strcmp (str, props[i].props[j])
> +   && ((size_t) TREE_STRING_LENGTH (TREE_VALUE (t2))
> +   == strlen (str) + (lang_GNU_Fortran () ? 0 : 1)))

This is a little bit strange but if all identifiers from Fortran FE behave
that way and differently from C/C++ FEs, I guess we can live with that
(multiple occurrences thereof).

> +   DECL_ATTRIBUTES (base_fn_decl)
> + = tree_cons (
> + get_identifier ("omp declare variant base"),
> + build_tree_list (gfc_get_symbol_decl (variant_proc_sym),
> +  set_selectors),
> + DECL_ATTRIBUTES (base_fn_decl));

Perhaps that is just my private coding convention preference, but I really
don't like these calls with function (
on one line and less indented arguments on another one.  I'd find it more
readable to do use temporaries where possible,
  tree id = get_identifier ("omp declare variant base");
  tree var = gfc_get_symbol_decl (variant_proc_sym);
  DECL_ATTRIBUTES (base_fn_decl)
= tree_cons (id, build_tree_list (var, set_selectors),
 DECL_ATTRIBUTES (base_fn_decl));
is IMHO more radable and fits on fewer lines even.

Other than that the patch looks mostly good, what I miss in the testcases
though is Fortran specific stuff, e.g. I couldn't find a single testcase
that uses the
!$omp declare variant (procname:variantprocname) match (construct={parallel})
syntax and couldn't find testcase coverage or resolving of the Fortran
specific declare variant restrictions.
See OpenMP 5.0 [60:1-12], which includes
base-proc-name must not be a generic name, procedure pointer, or entry name.
and
If base-proc-name is omitted then the declare variant directive must appear in 
the
specification part of a subroutine subprogram or a function subprogram.
etc.  So unless I've completely missed that in the patch somewhere, please
try to add new testcases (i.e. with no c-c++-common counterparts) that test
all those restrictions in there, have one !$omp declare variant with
base-proc-name that is a generic name and dg-error for it, another one
for procedure pointer, another one for entry name, another one for
!$omp declare variant with base-proc-name omitted that appears where it
isn't allowed, some !$omp declare variant (both with and without
proc-base-name) that appears e.g. in execution part, etc.
My Fortran knowledge is rusty, but I hope Tobias could help there if needed,
and if some of the restrictions make no sense, look at what has changed in
5.1 or 5.2 current state if it hasn't been clarified.

Jakub



Re: [PATCH 2/4] ipa-cp: Propagation boost for recursion generated values

2021-10-06 Thread Jan Hubicka
> Recursive call graph edges, even when they are hot and important for
> the compiled program, can never have frequency bigger than one, even
> when the actual time savings in the next recursion call are not
> realized just once but depend on the depth of recursion.  The current
> IPA-CP effect propagation code did not take that into account and just
> used the frequency, thus severely underestimating the effect.
> 
> This patch artificially boosts values taking part in such calls.  If a
> value feeds into itself through a recursive call, the frequency of the
> edge is multiplied by a parameter with default value of 6, basically
> assuming that the recursion will take place 6 times.  This value can
> of course be subject to change.
> 
> Moreover, values which do not feed into themselves but which were
> generated for a self-recursive call with an arithmetic
> pass-function (aka the 548.exchange "hack" which however is generally
> applicable for recursive functions which count the recursion depth in
> a parameter) have the edge frequency multiplied as many times as there
> are generated values in the chain.  In essence, we will assume they
> are all useful.
> 
> This patch partially fixes the current situation when we fail to
> optimize 548.exchange with PGO.  In the benchmark one recursive edge
> count overwhelmingly dominates all other counts in the program and so
> we fail to perform the first cloning (for the nonrecursive entry call)
> because it looks totally insignificant.
> 
> gcc/ChangeLog:
> 
> 2021-07-16  Martin Jambor  
> 
>   * params.opt (ipa-cp-recursive-freq-factor): New.
>   * ipa-cp.c (ipcp_value): Switch to inline initialization.  New members
>   scc_no, self_recursion_generated_level, same_scc and
>   self_recursion_generated_p.
>   (ipcp_lattice::add_value): Replaced parameter unlimited with
>   same_lat_gen_level, usit it determine limit of values and store it to
>   the value.
>   (ipcp_lattice::print): Dump the new fileds.
>   (allocate_and_init_ipcp_value): Take same_lat_gen_level as a new
>   parameter and store it to the new value.
>   (self_recursively_generated_p): Removed.
>   (propagate_vals_across_arith_jfunc): Use self_recursion_generated_p
>   instead of self_recursively_generated_p, store self generation level
>   to such values.
>   (value_topo_info::add_val): Set scc_no.
>   (value_topo_info::propagate_effects): Multiply frequencies of
>   recursively feeding values and self generated values by appropriate
>   new factors.

If you boost every self fed value by factor of 6, I wonder how quickly
we run into exponential explosion of the cost (since the frequency
should be close to 1 and 6^9=10077696

I think it would be more robust to simply assume that the job will
distribute evenly across the clones.  How hard is to implement that?

Honza
> ---
>  gcc/ipa-cp.c   | 161 -
>  gcc/params.opt |   4 ++
>  2 files changed, 84 insertions(+), 81 deletions(-)
> 
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 55b9216337f..b987d975793 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -184,30 +184,52 @@ public:
>/* The actual value for the given parameter.  */
>valtype value;
>/* The list of sources from which this value originates.  */
> -  ipcp_value_source  *sources;
> +  ipcp_value_source  *sources = nullptr;
>/* Next pointers in a linked list of all values in a lattice.  */
> -  ipcp_value *next;
> +  ipcp_value *next = nullptr;
>/* Next pointers in a linked list of values in a strongly connected 
> component
>   of values. */
> -  ipcp_value *scc_next;
> +  ipcp_value *scc_next = nullptr;
>/* Next pointers in a linked list of SCCs of values sorted topologically
>   according their sources.  */
> -  ipcp_value  *topo_next;
> +  ipcp_value  *topo_next = nullptr;
>/* A specialized node created for this value, NULL if none has been (so 
> far)
>   created.  */
> -  cgraph_node *spec_node;
> +  cgraph_node *spec_node = nullptr;
>/* Depth first search number and low link for topological sorting of
>   values.  */
> -  int dfs, low_link;
> +  int dfs = 0;
> +  int low_link = 0;
> +  /* SCC number to identify values which recursively feed into each other.
> + Values in the same SCC have the same SCC number.  */
> +  int scc_no = 0;
> +  /* Non zero if the value is generated from another value in the same 
> lattice
> + for a self-recursive call, the actual number is how many times the
> + operation has been performed.  In the unlikely event of the value being
> + present in two chains fo self-recursive value generation chains, it is 
> the
> + maximum.  */
> +  unsigned self_recursion_generated_level = 0;
>/* True if this value is currently on the topo-sort stack.  */
> -  bool on_stack;
> -
> -  ipcp_value()
> -: sources (0), next (0), scc_next (0), topo_next (0),
> -  spec_node

Re: [RFC] More jump threading restrictions in the presence of loops.

2021-10-06 Thread Aldy Hernandez via Gcc-patches
On Wed, Oct 6, 2021 at 5:03 PM Martin Sebor  wrote:
>
> On 10/6/21 7:47 AM, Aldy Hernandez via Gcc-patches wrote:

> -Wmaybe-uninitialized certainly suffers from a high rate of false
> positives (I count 40 open PRs).  Even after all this time debugging
> it I still struggle with the code for it but in all the bug reports
> I've reviewed, only a small subset seems to be due to bugs or even
> limitations in it.  Most are inherent in its design goal to warn
> for reads from variables that cannot be proven to have been
> initialized.
>
> If this one is a bug with some chance of getting fixed it would
> be good to distill it to a small test case (even though it's not
> unlikely that there already is an existing report with the same
> root cause).
>
> That said, from from your description and the IL above it sounds
> to me like the uninitialized read here is possible (it takes place
> when _881 != 0), and so -Wmaybe-uininitialized triggers as it's
> supposed to.
>
> Either way, rather than suppressing the warning for the whole file
> I would suggest to consider initializing the local variable instead.
> Looking at the code in calls.c, I can't convince myself that
> the variable cannot, in fact, be used uninitialized.
>
> Martin

FWIW, libgomp/team.c suffers from the same problem if you remove the
stop-gap in gomp_team_start():

  struct gomp_thread_start_data *start_data = NULL;

The use of start_data in the block predicated by "if (nested)" is the
only path that can use start_data without passing through its
initialization.  But the only way to elide the initialization is from:

if (!nested)
{
  ...
 
  if (i == nthreads)
goto do_release;
}

So the use of start_data either crosses the gomp_alloca
initialization, or is predicated by if(nested).

And the gimple shows a very similar pattern to what I described earlier:

   [local count: 59055800]:
  # start_data_876 = PHI 
do_release:
  if (_1 != 0)
goto ; [55.90%]
  else
goto ; [44.10%]

   [local count: 26046541]:
  goto ; [100.00%]

   [local count: 33009259]:
  _223 = &team_417(D)->barrier;
  gomp_barrier_wait (_223);
  goto ; [100.00%]

   [local count: 6962719]:

   [local count: 33009259]:
  # start_data_781 = PHI 
 # old_threads_used_887 = PHI 
  # affinity_count_825 = PHI 
  # affinity_thr_904 = PHI 
  # force_display_840 = PHI 
  _589 = &MEM[(struct gomp_simple_barrier_t *)pool_410 + 64B].bar;
  gomp_barrier_wait (_589);

   [local count: 66018519]:
  # start_data_870 = PHI 

The use of start_data_781 coming in from  BB265 (start_data_518(D))
would be uninitialized if the read wasn't guarded by "if (_1 != 0)".

It seems uninit has issues seeing this pattern.

Unfortunately reducing this has been, ahem...challenging, but the
problem is still there, both within calls.c on aarch64 and on x86-64
for libgomp/team.c.

Aldy



Re: [PATCH] Properly parse invariant &MEM addresses in the GIMPLE FE

2021-10-06 Thread Joseph Myers
On Wed, 6 Oct 2021, Richard Biener via Gcc-patches wrote:

> Currently the frontend rejects those addresses as not lvalues
> because the C frontend doens't expect MEM_REF or TARGET_MEM_REF
> to appear (but they would be valid lvalues there).  The following
> fixes that by amending lvalue_p.
> 
> The change also makes the dumping of the source of the testcase
> valid for the GIMPLE FE by not eliding the '&' when dumping
> string literals.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> Joseph, is the C frontend change OK?

Yes.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] c++: odr-use argument to a function NTTP [PR53164]

2021-10-06 Thread Patrick Palka via Gcc-patches
On Tue, 5 Oct 2021, Jason Merrill wrote:

> On 10/5/21 15:17, Patrick Palka wrote:
> > On Mon, 4 Oct 2021, Patrick Palka wrote:
> > 
> > > When passing a function template as the argument to a function NTTP
> > > inside a template, we resolve it to the right specialization ahead of
> > > time via resolve_address_of_overloaded_function, though the call to
> > > mark_used within defers odr-using it until instantiation time (as usual).
> > > But at instantiation time we end up never calling mark_used on the
> > > specialization.
> > > 
> > > This patch fixes this by adding a call to mark_used in
> > > convert_nontype_argument_function.
> > > 
> > >   PR c++/53164
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   * pt.c (convert_nontype_argument_function): Call mark_used.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * g++.dg/template/non-dependent16.C: New test.
> > > ---
> > >   gcc/cp/pt.c |  3 +++
> > >   gcc/testsuite/g++.dg/template/non-dependent16.C | 16 
> > >   2 files changed, 19 insertions(+)
> > >   create mode 100644 gcc/testsuite/g++.dg/template/non-dependent16.C
> > > 
> > > diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> > > index f950f4a21b7..5e819c9598c 100644
> > > --- a/gcc/cp/pt.c
> > > +++ b/gcc/cp/pt.c
> > > @@ -6668,6 +6668,9 @@ convert_nontype_argument_function (tree type, tree
> > > expr,
> > > return NULL_TREE;
> > >   }
> > >   +  if (!mark_used (fn_no_ptr, complain) && !(complain & tf_error))
> > > +return NULL_TREE;
> > > +
> > > linkage = decl_linkage (fn_no_ptr);
> > > if (cxx_dialect >= cxx11 ? linkage == lk_none : linkage !=
> > > lk_external)
> > >   {
> > > diff --git a/gcc/testsuite/g++.dg/template/non-dependent16.C
> > > b/gcc/testsuite/g++.dg/template/non-dependent16.C
> > > new file mode 100644
> > > index 000..b7dca8f6752
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/template/non-dependent16.C
> > > @@ -0,0 +1,16 @@
> > > +// PR c++/53164
> > > +
> > > +template
> > > +void f(T) {
> > > +  T::fail; // { dg-error "not a member" }
> > > +}
> > > +
> > > +template
> > > +struct A { };
> > > +
> > > +template
> > > +void g() {
> > > +  A a;
> > > +}
> > 
> > I should mention that the original testcase in the PR was slightly
> > different than this one in that it also performed a call to the NTTP,
> > e.g.
> > 
> >template
> >struct A {
> >  static void h() {
> >p(0);
> >  }
> >};
> > 
> >template
> >void g() {
> >  A::h();
> >}
> > 
> >templated void g<0>();
> > 
> > and not even the call was enough to odr-use f, apparently because the
> > CALL_EXPR case of tsubst_expr calls mark_used on the callee only when
> > it's a FUNCTION_DECL, but in this case after substitution it's an
> > ADDR_EXPR of a FUNCTION_DECL.  Fixing this by looking through the ADDR_EXPR
> > worked, but IIUC the call isn't necessary for f to be odr-used, simply
> > using f as a template argument should be sufficient, so it seems the
> > above is better fix.
> 
> I agree that pedantically the use happens when substituting into the use of
> A, but convert_nontype_argument_function seems like a weird place to
> implement that; it's only called again during instantiation of A, when we
> instantiate the injected-class-name.  If A isn't instantiated, e.g. if 'a'
> is a pointer to A, we again don't instantiate f.

I see, makes sense..  I'm not sure where else we can mark the use, then.
Since we resolve the OVERLOAD f to the FUNCTION_DECL f ahead of
time (during which mark_used doesn't actually instantiate f because
we're inside a template), at instantiation time the type A is already
non-dependent so tsubst_aggr_type avoids doing the work that would end
up calling convert_nontype_argument_function.

> 
> I see that clang doesn't reject your testcase, either, but MSVC and icc do
> (even with 'a' a pointer): https://godbolt.org/z/MGE6TcMch

FWIW although Clang doesn't reject 'A a;', it does reject
'using type = A;' weirdly enough: https://godbolt.org/z/T9qEn6bWW


Shall we just go with the other more specific approach, that makes sure
the CALL_EXPR case of tsubst_expr calls mark_used when the callee is an
ADDR_EXPR?  Something like (bootstrapped and regtested):

-- >8 --

PR c++/53164

gcc/cp/ChangeLog:

* pt.c (tsubst_copy_and_build) : Look through
ADDR_EXPR after substituting into the callee.

gcc/testsuite/ChangeLog:

* g++.dg/template/fn-ptr3.C: New test.
---
 gcc/cp/pt.c |  4 
 gcc/testsuite/g++.dg/template/fn-ptr3.C | 20 
 2 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/template/fn-ptr3.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 19e03369ffa..5af3a6472f8 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -20310,6 +20310,10 @@ tsubst_copy_and_build (tree t,
 
if (BASELINK_P (function))
  qualified_p = true;
+
+   if (TREE_CODE (functi

Re: [RFC] More jump threading restrictions in the presence of loops.

2021-10-06 Thread Aldy Hernandez via Gcc-patches
FWIW, I've opened a PR with both testcases:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102631



Re: [committed] libstdc++: Specialize std::pointer_traits<__normal_iterator>

2021-10-06 Thread François Dumont via Gcc-patches

Here is another proposal with the __to_address overload.

I preferred to let it open to any kind of __normal_iterator 
instantiation cause afaics std::vector supports fancy pointer types. It 
is better if __to_address works fine also in this case, no ?


    libstdc++: Overload std::__to_address for __gnu_cxx::__normal_iterator.

    Prefer to overload __to_address to partially specialize 
std::pointer_traits because
    std::pointer_traits would be mostly useless. In the case of 
__gnu_debug::_Safe_iterator
    the to_pointer method is even impossible to implement correctly 
because we are missing

    the parent container to associate the iterator to.

    libstdc++-v3/ChangeLog:

    * include/bits/stl_iterator.h
(std::pointer_traits<__gnu_cxx::__normal_iterator<>>): Remove.
    (std::__to_address(const __gnu_cxx::__normal_iterator<>&)): 
New.

    * include/debug/safe_iterator.h
    (std::__to_address(const 
__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<>, _Sequence>&)):

    New.
    * testsuite/24_iterators/normal_iterator/to_address.cc: Add 
check on std::vector::iterator
    to validate both __gnu_cxx::__normal_iterator<> 
__to_address overload in normal mode and the


Tested under Linux x86_64.

Ok to commit ?

François


On 04/10/21 10:30 pm, Jonathan Wakely wrote:

On Mon, 4 Oct 2021 at 21:28, François Dumont via Libstdc++
  wrote:

On 04/10/21 10:05 pm, François Dumont wrote:

On 02/10/21 10:24 pm, Jonathan Wakely wrote:

On Sat, 2 Oct 2021 at 18:27, François Dumont wrote:

I would like to propose this alternative approach.

In this patch I make __normal_iterator and random iterator
_Safe_iterator compatible for pointer_traits primary template.

Regarding pointer_traits I wonder if it shouldn't check for the
to_pointer method availability and use per default: return {
std::addressof(__e) }; otherwise. This way we wouldn't have to
provide a
pointer_to method on __normal_iterator.

But I would rather not have these members present in vector::iterator
and string::iterator, in case users accidentally start to rely on them
being present.

Making pointer_traits friends would help but I do not like it neither.



Another option would be to overload std::__to_address so it knows how
to get the address from __normal_iterator and _Safe_iterator.

.

I start thinking that rather than proposing not-useful and even
incorrect code in the case of the _Safe_iterator<> it might be a
better approach.

Even the rebind for __normal_iterator is a little strange because when
doing rebind on std::vector::iterator for long it produces
__normal_iterator>, quite inconsistent even if
useless.


diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h
index 8afd6756613..6b915ec011c 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -1312,32 +1312,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #if __cplusplus >= 201103L
 
-  // Need to specialize pointer_traits because the primary template will
-  // deduce element_type of __normal_iterator as T* rather than T.
+  // Need to overload __to_address because the pointer_traits primary template
+  // will deduce element_type of __normal_iterator as T* rather than T.
   template
-struct pointer_traits<__gnu_cxx::__normal_iterator<_Iterator, _Container>>
-{
-private:
-  using _Base = pointer_traits<_Iterator>;
-
-public:
-  using element_type = typename _Base::element_type;
-  using pointer = __gnu_cxx::__normal_iterator<_Iterator, _Container>;
-  using difference_type = typename _Base::difference_type;
-
-  template
-	using rebind = __gnu_cxx::__normal_iterator<_Tp, _Container>;
-
-  static pointer
-  pointer_to(element_type& __e) noexcept
-  { return pointer(_Base::pointer_to(__e)); }
-
-#if __cplusplus >= 202002L
-  static element_type*
-  to_address(pointer __p) noexcept
-  { return __p.base(); }
-#endif
-};
+constexpr auto
+__to_address(const __gnu_cxx::__normal_iterator<_Iterator,
+		_Container>& __it) noexcept
+-> decltype(std::__to_address(__it.base()))
+{ return std::__to_address(__it.base()); }
 
   /**
* @addtogroup iterators
diff --git a/libstdc++-v3/include/debug/safe_iterator.h b/libstdc++-v3/include/debug/safe_iterator.h
index 5584d06de5a..09e35f79067 100644
--- a/libstdc++-v3/include/debug/safe_iterator.h
+++ b/libstdc++-v3/include/debug/safe_iterator.h
@@ -1018,6 +1018,23 @@ namespace __gnu_debug
 #undef _GLIBCXX_DEBUG_VERIFY_EQ_OPERANDS
 #undef _GLIBCXX_DEBUG_VERIFY_OPERANDS
 
+#if __cplusplus >= 201103L
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  template
+constexpr auto
+__to_address(const __gnu_debug::_Safe_iterator<
+		 __gnu_cxx::__normal_iterator<_Iterator, _Container>,
+		 _Sequence>& __it) noexcept
+-> decltype(std::__to_address(__it.base()))
+{ return s

Re: [committed] libstdc++: Specialize std::pointer_traits<__normal_iterator>

2021-10-06 Thread François Dumont via Gcc-patches

I forgot to ask if with this patch this overload:

  template
    constexpr auto
    __to_address(const _Ptr& __ptr, _None...) noexcept
    {
  if constexpr (is_base_of_v<__gnu_debug::_Safe_iterator_base, _Ptr>)
    return std::__to_address(__ptr.base().operator->());
  else
    return std::__to_address(__ptr.operator->());
    }

should be removed ?

Or perhaps just the _Safe_iterator_base branch in it ?

On 06/10/21 7:18 pm, François Dumont wrote:

Here is another proposal with the __to_address overload.

I preferred to let it open to any kind of __normal_iterator 
instantiation cause afaics std::vector supports fancy pointer types. 
It is better if __to_address works fine also in this case, no ?


    libstdc++: Overload std::__to_address for 
__gnu_cxx::__normal_iterator.


    Prefer to overload __to_address to partially specialize 
std::pointer_traits because
    std::pointer_traits would be mostly useless. In the case of 
__gnu_debug::_Safe_iterator
    the to_pointer method is even impossible to implement correctly 
because we are missing

    the parent container to associate the iterator to.

    libstdc++-v3/ChangeLog:

    * include/bits/stl_iterator.h
(std::pointer_traits<__gnu_cxx::__normal_iterator<>>): Remove.
    (std::__to_address(const 
__gnu_cxx::__normal_iterator<>&)): New.

    * include/debug/safe_iterator.h
    (std::__to_address(const 
__gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<>, 
_Sequence>&)):

    New.
    * testsuite/24_iterators/normal_iterator/to_address.cc: 
Add check on std::vector::iterator
    to validate both __gnu_cxx::__normal_iterator<> 
__to_address overload in normal mode and the


Tested under Linux x86_64.

Ok to commit ?

François


On 04/10/21 10:30 pm, Jonathan Wakely wrote:

On Mon, 4 Oct 2021 at 21:28, François Dumont via Libstdc++
  wrote:

On 04/10/21 10:05 pm, François Dumont wrote:

On 02/10/21 10:24 pm, Jonathan Wakely wrote:

On Sat, 2 Oct 2021 at 18:27, François Dumont wrote:

I would like to propose this alternative approach.

In this patch I make __normal_iterator and random iterator
_Safe_iterator compatible for pointer_traits primary template.

Regarding pointer_traits I wonder if it shouldn't check for the
to_pointer method availability and use per default: return {
std::addressof(__e) }; otherwise. This way we wouldn't have to
provide a
pointer_to method on __normal_iterator.

But I would rather not have these members present in vector::iterator
and string::iterator, in case users accidentally start to rely on 
them

being present.

Making pointer_traits friends would help but I do not like it neither.



Another option would be to overload std::__to_address so it knows how
to get the address from __normal_iterator and _Safe_iterator.

.

I start thinking that rather than proposing not-useful and even
incorrect code in the case of the _Safe_iterator<> it might be a
better approach.

Even the rebind for __normal_iterator is a little strange because when
doing rebind on std::vector::iterator for long it produces
__normal_iterator>, quite inconsistent even if
useless.






*PING 3* [PATCH] doc: improve -fsanitize=undefined description

2021-10-06 Thread Diane Meirowitz via Gcc-patches
Please review my small doc patch. Thank you.

Diane

On 9/15/21, 5:02 PM, "Diane Meirowitz"  
wrote:


doc: improve -fsanitize=undefined description

gcc/ChangeLog:
* doc/invoke.texi: add link to 
UndefinedBehaviorSanitizer documentation,
mention UBSAN_OPTIONS, similar to what is done for 
AddressSanitizer.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 78cfc100ac2..f022885edf8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15200,7 +15200,8 @@ The option cannot be combined with 
@option{-fsanitize=thread}.
@opindex fsanitize=undefined
Enable UndefinedBehaviorSanitizer, a fast undefined behavior 
detector.
Various computations are instrumented to detect undefined behavior
-at runtime.  Current suboptions are:
+at runtime.  See 
@uref{https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html} for more 
details.   The run-time behavior can be influenced using the
+@env{UBSAN_OPTIONS} environment variable.  Current suboptions are:

@table @gcctabopt







[PATCH] libsanitizer: Add AM_CCASFLAGS to Makefile.am

2021-10-06 Thread H.J. Lu via Gcc-patches
commit 9069eb28d45baaa8baf5e3790b03b0e2cc5b49b3
Author: Igor Tsimbalist 
Date:   Fri Nov 17 22:34:50 2017 +0100

Enable building libsanitizer with Intel CET

libsanitizer/
* acinclude.m4: Add enable.m4 and cet.m4.
* Makefile.in: Regenerate.
* asan/Makefile.am: Update AM_CXXFLAGS.
* asan/Makefile.in: Regenerate.
* configure: Likewise.
* configure.ac: Set CET_FLAGS. Update EXTRA_CFLAGS,
EXTRA_CXXFLAGS, EXTRA_ASFLAGS.
* interception/Makefile.am: Update AM_CXXFLAGS.
* interception/Makefile.in: Regenerate.
* libbacktrace/Makefile.am: Update AM_CFLAGS, AM_CXXFLAGS.
* libbacktrace/Makefile.in: Regenerate.
* lsan/Makefile.am: Update AM_CXXFLAGS.
* lsan/Makefile.in: Regenerate.
* sanitizer_common/Makefile.am: Update AM_CXXFLAGS,
AM_CCASFLAGS.
* sanitizer_common/sanitizer_linux_x86_64.S: Include cet.h.
Add _CET_ENDBR macro.
* sanitizer_common/Makefile.in: Regenerate.
* tsan/Makefile.am: Update AM_CXXFLAGS.
* tsan/Makefile.in: Regenerate.
* tsan/tsan_rtl_amd64.S Include cet.h. Add _CET_ENDBR macro.
* ubsan/Makefile.am: Update AM_CXXFLAGS.
* ubsan/Makefile.in: Regenerate.

failed to add EXTRA_ASFLAGS to AM_CCASFLAGS in all Makefile.am.  As
the result, CET aren't enabled in all assembly codes.

Add AM_CCASFLAGS to Makefile.am to compile assembly codes with $CET_FLAGS.

PR sanitizer/102632
* asan/Makefile.am (AM_CCASFLAGS): New.  Set to $(EXTRA_ASFLAGS).
* hwasan/Makefile.am (AM_CCASFLAGS): Likewise.
* interception/Makefile.am (AM_CCASFLAGS): Likewise.
* lsan/Makefile.am (AM_CCASFLAGS): Likewise.
* tsan/Makefile.am (AM_CCASFLAGS): Likewise.
* usan/Makefile.am (AM_CCASFLAGS): Likewise.
* asan/Makefile.in: Regenerate.
* hwasan/Makefile.in: Likewise.
* interception/Makefile.in: Likewise.
* lsan/Makefile.in: Likewise.
* tsan/Makefile.in: Likewise.
* usan/Makefile.in: Likewise.
---
 libsanitizer/asan/Makefile.am | 1 +
 libsanitizer/asan/Makefile.in | 1 +
 libsanitizer/hwasan/Makefile.am   | 1 +
 libsanitizer/hwasan/Makefile.in   | 1 +
 libsanitizer/interception/Makefile.am | 1 +
 libsanitizer/interception/Makefile.in | 1 +
 libsanitizer/lsan/Makefile.am | 1 +
 libsanitizer/lsan/Makefile.in | 1 +
 libsanitizer/tsan/Makefile.am | 1 +
 libsanitizer/tsan/Makefile.in | 1 +
 libsanitizer/ubsan/Makefile.am| 1 +
 libsanitizer/ubsan/Makefile.in| 1 +
 12 files changed, 12 insertions(+)

diff --git a/libsanitizer/asan/Makefile.am b/libsanitizer/asan/Makefile.am
index 74658ca7b9c..4f802f723d6 100644
--- a/libsanitizer/asan/Makefile.am
+++ b/libsanitizer/asan/Makefile.am
@@ -11,6 +11,7 @@ AM_CXXFLAGS = -Wall -W -Wno-unused-parameter -Wwrite-strings 
-pedantic -Wno-long
 AM_CXXFLAGS += $(LIBSTDCXX_RAW_CXX_CXXFLAGS)
 AM_CXXFLAGS += -std=gnu++14
 AM_CXXFLAGS += $(EXTRA_CXXFLAGS)
+AM_CCASFLAGS = $(EXTRA_ASFLAGS)
 ACLOCAL_AMFLAGS = -I $(top_srcdir) -I $(top_srcdir)/config
 
 toolexeclib_LTLIBRARIES = libasan.la
diff --git a/libsanitizer/asan/Makefile.in b/libsanitizer/asan/Makefile.in
index 53efe526f9c..528ab61312c 100644
--- a/libsanitizer/asan/Makefile.in
+++ b/libsanitizer/asan/Makefile.in
@@ -421,6 +421,7 @@ AM_CXXFLAGS = -Wall -W -Wno-unused-parameter 
-Wwrite-strings -pedantic \
-fomit-frame-pointer -funwind-tables -fvisibility=hidden \
-Wno-variadic-macros -fno-ipa-icf \
$(LIBSTDCXX_RAW_CXX_CXXFLAGS) -std=gnu++14 $(EXTRA_CXXFLAGS)
+AM_CCASFLAGS = $(EXTRA_ASFLAGS)
 ACLOCAL_AMFLAGS = -I $(top_srcdir) -I $(top_srcdir)/config
 toolexeclib_LTLIBRARIES = libasan.la
 nodist_toolexeclib_HEADERS = libasan_preinit.o
diff --git a/libsanitizer/hwasan/Makefile.am b/libsanitizer/hwasan/Makefile.am
index cfc1bfe8f01..e12c0a0ce71 100644
--- a/libsanitizer/hwasan/Makefile.am
+++ b/libsanitizer/hwasan/Makefile.am
@@ -8,6 +8,7 @@ AM_CXXFLAGS = -Wall -W -Wno-unused-parameter -Wwrite-strings 
-pedantic -Wno-long
 AM_CXXFLAGS += $(LIBSTDCXX_RAW_CXX_CXXFLAGS)
 AM_CXXFLAGS += -std=gnu++14
 AM_CXXFLAGS += $(EXTRA_CXXFLAGS)
+AM_CCASFLAGS = $(EXTRA_ASFLAGS)
 ACLOCAL_AMFLAGS = -I $(top_srcdir) -I $(top_srcdir)/config
 
 toolexeclib_LTLIBRARIES = libhwasan.la
diff --git a/libsanitizer/hwasan/Makefile.in b/libsanitizer/hwasan/Makefile.in
index f63670b50d1..1729349e682 100644
--- a/libsanitizer/hwasan/Makefile.in
+++ b/libsanitizer/hwasan/Makefile.in
@@ -409,6 +409,7 @@ AM_CXXFLAGS = -Wall -W -Wno-unused-parameter 
-Wwrite-strings -pedantic \
-funwind-tables -fvisibility=hidden -Wno-variadic-macros \
-fno-ipa-icf $(LIBSTDCXX_RAW_CXX_CXXFLAGS) -std=gnu++14 \
$(EXTRA_CXXFLAGS)
+AM_CCASFLAGS = $(EXTRA_ASFLAGS)
 ACLOCAL_AMFLAGS = -I $(top_srcdir) -I $(top_srcdir)/config
 

Re: [RFC] More jump threading restrictions in the presence of loops.

2021-10-06 Thread Martin Sebor via Gcc-patches

On 10/6/21 11:03 AM, Aldy Hernandez wrote:

FWIW, I've opened a PR with both testcases:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102631



Okay, thanks.  I see what you mean that reducing it to a test case
is challenging.  I'll see if I can find some time to distill it to
something useful.  Until then, to get past the warning, I recommend
to simply initialize the variable.  For simple scalars there's little
reason not to.

Martin


[committed] libstdc++: Implement std::move_only_function for C++23 (P0288R9)

2021-10-06 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

* include/Makefile.am: Add new headers.
* include/Makefile.in: Regenerate.
* include/std/functional: Include .
* include/std/version (__cpp_lib_move_only_function): Define.
* include/bits/mofunc_impl.h: New file.
* include/bits/move_only_function.h: New file.
* testsuite/20_util/move_only_function/call.cc: New test.
* testsuite/20_util/move_only_function/cons.cc: New test.
* testsuite/20_util/move_only_function/move.cc: New test.
* testsuite/20_util/move_only_function/version.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit 881d1689a42cc7a1fd63bde53c883e52a56eded3
Author: Jonathan Wakely 
Date:   Wed Oct 6 13:28:57 2021

libstdc++: Implement std::move_only_function for C++23 (P0288R9)

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add new headers.
* include/Makefile.in: Regenerate.
* include/std/functional: Include .
* include/std/version (__cpp_lib_move_only_function): Define.
* include/bits/mofunc_impl.h: New file.
* include/bits/move_only_function.h: New file.
* testsuite/20_util/move_only_function/call.cc: New test.
* testsuite/20_util/move_only_function/cons.cc: New test.
* testsuite/20_util/move_only_function/move.cc: New test.
* testsuite/20_util/move_only_function/version.cc: New test.

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 5a5b5fd1d1b..27b548607b9 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -154,7 +154,9 @@ bits_headers = \
${bits_srcdir}/mask_array.h \
${bits_srcdir}/max_size_type.h \
${bits_srcdir}/memoryfwd.h \
+   ${bits_srcdir}/mofunc_impl.h \
${bits_srcdir}/move.h \
+   ${bits_srcdir}/move_only_function.h \
${bits_srcdir}/node_handle.h \
${bits_srcdir}/ostream.tcc \
${bits_srcdir}/ostream_insert.h \
diff --git a/libstdc++-v3/include/bits/mofunc_impl.h 
b/libstdc++-v3/include/bits/mofunc_impl.h
new file mode 100644
index 000..543c6f547b7
--- /dev/null
+++ b/libstdc++-v3/include/bits/mofunc_impl.h
@@ -0,0 +1,200 @@
+// Implementation of std::move_only_function -*- C++ -*-
+
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file include/bits/mofunc_impl.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{functional}
+ */
+
+#ifndef _GLIBCXX_MOF_CV
+# define _GLIBCXX_MOF_CV
+#endif
+
+#ifdef _GLIBCXX_MOF_REF
+# define _GLIBCXX_MOF_INV_QUALS _GLIBCXX_MOF_CV _GLIBCXX_MOF_REF
+#else
+# define _GLIBCXX_MOF_REF
+# define _GLIBCXX_MOF_INV_QUALS _GLIBCXX_MOF_CV &
+#endif
+
+#define _GLIBCXX_MOF_CV_REF _GLIBCXX_MOF_CV _GLIBCXX_MOF_REF
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  template
+class move_only_function<_Res(_ArgTypes...) _GLIBCXX_MOF_CV
+  _GLIBCXX_MOF_REF noexcept(_Noex)>
+: _Mofunc_base
+{
+  template
+   using __callable
+ = __conditional_t<_Noex,
+   is_nothrow_invocable_r<_Res, _Tp, _ArgTypes...>,
+   is_invocable_r<_Res, _Tp, _ArgTypes...>>;
+
+  // [func.wrap.mov.con]/1 is-callable-from
+  template
+   static constexpr bool __is_callable_from
+ = __and_v<__callable<_Vt _GLIBCXX_MOF_CV_REF>,
+   __callable<_Vt _GLIBCXX_MOF_INV_QUALS>>;
+
+public:
+  using result_type = _Res;
+
+  move_only_function() noexcept { }
+
+  move_only_function(nullptr_t) noexcept { }
+
+  move_only_function(move_only_function&& __x) noexcept
+  : _Mofunc_base(static_cast<_Mofunc_base&&>(__x)),
+   _M_invoke(std::__exchange(__x._M_invoke, nullptr))
+  { }
+
+  template>
+   requires (!is_same_v<_Vt, move_only_function>

Re: [PATCH] c++: odr-use argument to a function NTTP [PR53164]

2021-10-06 Thread Patrick Palka via Gcc-patches
On Wed, 6 Oct 2021, Patrick Palka wrote:

> On Tue, 5 Oct 2021, Jason Merrill wrote:
> 
> > On 10/5/21 15:17, Patrick Palka wrote:
> > > On Mon, 4 Oct 2021, Patrick Palka wrote:
> > > 
> > > > When passing a function template as the argument to a function NTTP
> > > > inside a template, we resolve it to the right specialization ahead of
> > > > time via resolve_address_of_overloaded_function, though the call to
> > > > mark_used within defers odr-using it until instantiation time (as 
> > > > usual).
> > > > But at instantiation time we end up never calling mark_used on the
> > > > specialization.
> > > > 
> > > > This patch fixes this by adding a call to mark_used in
> > > > convert_nontype_argument_function.
> > > > 
> > > > PR c++/53164
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * pt.c (convert_nontype_argument_function): Call mark_used.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > * g++.dg/template/non-dependent16.C: New test.
> > > > ---
> > > >   gcc/cp/pt.c |  3 +++
> > > >   gcc/testsuite/g++.dg/template/non-dependent16.C | 16 
> > > >   2 files changed, 19 insertions(+)
> > > >   create mode 100644 gcc/testsuite/g++.dg/template/non-dependent16.C
> > > > 
> > > > diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> > > > index f950f4a21b7..5e819c9598c 100644
> > > > --- a/gcc/cp/pt.c
> > > > +++ b/gcc/cp/pt.c
> > > > @@ -6668,6 +6668,9 @@ convert_nontype_argument_function (tree type, tree
> > > > expr,
> > > > return NULL_TREE;
> > > >   }
> > > >   +  if (!mark_used (fn_no_ptr, complain) && !(complain & tf_error))
> > > > +return NULL_TREE;
> > > > +
> > > > linkage = decl_linkage (fn_no_ptr);
> > > > if (cxx_dialect >= cxx11 ? linkage == lk_none : linkage !=
> > > > lk_external)
> > > >   {
> > > > diff --git a/gcc/testsuite/g++.dg/template/non-dependent16.C
> > > > b/gcc/testsuite/g++.dg/template/non-dependent16.C
> > > > new file mode 100644
> > > > index 000..b7dca8f6752
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/g++.dg/template/non-dependent16.C
> > > > @@ -0,0 +1,16 @@
> > > > +// PR c++/53164
> > > > +
> > > > +template
> > > > +void f(T) {
> > > > +  T::fail; // { dg-error "not a member" }
> > > > +}
> > > > +
> > > > +template
> > > > +struct A { };
> > > > +
> > > > +template
> > > > +void g() {
> > > > +  A a;
> > > > +}
> > > 
> > > I should mention that the original testcase in the PR was slightly
> > > different than this one in that it also performed a call to the NTTP,
> > > e.g.
> > > 
> > >template
> > >struct A {
> > >  static void h() {
> > >p(0);
> > >  }
> > >};
> > > 
> > >template
> > >void g() {
> > >  A::h();
> > >}
> > > 
> > >templated void g<0>();
> > > 
> > > and not even the call was enough to odr-use f, apparently because the
> > > CALL_EXPR case of tsubst_expr calls mark_used on the callee only when
> > > it's a FUNCTION_DECL, but in this case after substitution it's an
> > > ADDR_EXPR of a FUNCTION_DECL.  Fixing this by looking through the 
> > > ADDR_EXPR
> > > worked, but IIUC the call isn't necessary for f to be odr-used, simply
> > > using f as a template argument should be sufficient, so it seems the
> > > above is better fix.
> > 
> > I agree that pedantically the use happens when substituting into the use of
> > A, but convert_nontype_argument_function seems like a weird place to
> > implement that; it's only called again during instantiation of A, when we
> > instantiate the injected-class-name.  If A isn't instantiated, e.g. if 
> > 'a'
> > is a pointer to A, we again don't instantiate f.
> 
> I see, makes sense..  I'm not sure where else we can mark the use, then.
> Since we resolve the OVERLOAD f to the FUNCTION_DECL f ahead of
> time (during which mark_used doesn't actually instantiate f because
> we're inside a template), at instantiation time the type A is already
> non-dependent so tsubst_aggr_type avoids doing the work that would end
> up calling convert_nontype_argument_function.
> 
> > 
> > I see that clang doesn't reject your testcase, either, but MSVC and icc do
> > (even with 'a' a pointer): https://godbolt.org/z/MGE6TcMch
> 
> FWIW although Clang doesn't reject 'A a;', it does reject
> 'using type = A;' weirdly enough: https://godbolt.org/z/T9qEn6bWW
> 
> 
> Shall we just go with the other more specific approach, that makes sure
> the CALL_EXPR case of tsubst_expr calls mark_used when the callee is an
> ADDR_EXPR?  Something like (bootstrapped and regtested):

Err, this approach is wrong because by stripping the ADDR_EXPR here we
end up checking access of the unwrapped FUNCTION_DECL again after
substituting into the call.  So we incorrectly reject e.g.

  template
  void g() {
P(); // error: ‘static void A::h()’ is private within this context
  }

  struct A {
void f() {
  g();
}
  private:
static void h();
  };

since A::h isn't accessible from g.

[PATCH 0/2] libsanitizer: Merge with upstream commit fdf4c035225d

2021-10-06 Thread H.J. Lu via Gcc-patches
I am checking in these patches to merge with upstream commit:

commit fdf4c035225de52f596899931b1f6100e5e3e928
Author: H.J. Lu 
Date:   Fri Sep 10 06:24:36 2021 -0700

[sanitizer] Support Intel CET

1. Include  in sanitizer_common/sanitizer_asm.h to mark Intel CET
support when Intel CET is enabled.
2. Add _CET_ENDBR to function entries in assembly codes so that ENDBR
instruction will be generated when Intel CET is enabled.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D85

H.J. Lu (2):
  libsanitizer: Merge with upstream
  libsanitizer: Apply local patches

 libsanitizer/MERGE|  2 +-
 libsanitizer/asan/asan_allocator.cpp  | 15 -
 libsanitizer/asan/asan_allocator.h|  2 -
 libsanitizer/asan/asan_debugging.cpp  |  5 +-
 libsanitizer/asan/asan_stats.cpp  |  4 +-
 libsanitizer/hwasan/hwasan.cpp|  4 +-
 libsanitizer/hwasan/hwasan_setjmp_x86_64.S|  2 +
 libsanitizer/lsan/lsan_common.cpp | 56 -
 libsanitizer/lsan/lsan_common.h   |  3 +
 libsanitizer/sanitizer_common/sanitizer_asm.h |  4 ++
 .../sanitizer_chained_origin_depot.cpp| 11 ++--
 .../sanitizer_chained_origin_depot.h  |  9 +--
 ...tizer_common_interceptors_vfork_i386.inc.S |  1 +
 ...zer_common_interceptors_vfork_x86_64.inc.S |  1 +
 .../sanitizer_common_libcdep.cpp  | 20 +++---
 .../sanitizer_common/sanitizer_hash.h | 24 +++
 .../sanitizer_platform_interceptors.h |  4 +-
 .../sanitizer_platform_limits_freebsd.cpp | 21 +++
 .../sanitizer_platform_limits_freebsd.h   | 16 +
 .../sanitizer_common/sanitizer_stackdepot.cpp | 49 ---
 .../sanitizer_common/sanitizer_stackdepot.h   | 10 +--
 .../sanitizer_stackdepotbase.h|  9 +--
 libsanitizer/tsan/tsan_interceptors.h |  6 +-
 libsanitizer/tsan/tsan_interceptors_posix.cpp | 13 ++--
 libsanitizer/tsan/tsan_interface.cpp  |  5 +-
 libsanitizer/tsan/tsan_platform_linux.cpp | 18 +++---
 libsanitizer/tsan/tsan_platform_mac.cpp   | 62 ++-
 libsanitizer/tsan/tsan_rtl.cpp|  8 +--
 libsanitizer/tsan/tsan_rtl.h  |  9 +--
 libsanitizer/tsan/tsan_rtl_amd64.S|  6 ++
 30 files changed, 227 insertions(+), 172 deletions(-)

-- 
2.31.1



[PATCH 2/2] libsanitizer: Apply local patches

2021-10-06 Thread H.J. Lu via Gcc-patches
---
 libsanitizer/asan/asan_globals.cpp| 19 --
 libsanitizer/asan/asan_interceptors.h |  7 ++-
 libsanitizer/asan/asan_mapping.h  |  2 +-
 .../sanitizer_linux_libcdep.cpp   |  4 
 .../sanitizer_common/sanitizer_mac.cpp| 12 +--
 libsanitizer/sanitizer_common/sanitizer_mac.h | 20 +++
 .../sanitizer_platform_limits_linux.cpp   |  5 -
 .../sanitizer_platform_limits_posix.h |  2 +-
 .../sanitizer_common/sanitizer_stacktrace.cpp | 17 +++-
 libsanitizer/tsan/tsan_rtl_ppc64.S|  1 +
 libsanitizer/ubsan/ubsan_flags.cpp|  1 +
 libsanitizer/ubsan/ubsan_handlers.cpp | 15 ++
 libsanitizer/ubsan/ubsan_handlers.h   |  8 
 libsanitizer/ubsan/ubsan_platform.h   |  2 ++
 14 files changed, 85 insertions(+), 30 deletions(-)

diff --git a/libsanitizer/asan/asan_globals.cpp 
b/libsanitizer/asan/asan_globals.cpp
index 9bf378f6207..763d3c6d2c0 100644
--- a/libsanitizer/asan/asan_globals.cpp
+++ b/libsanitizer/asan/asan_globals.cpp
@@ -154,23 +154,6 @@ static void CheckODRViolationViaIndicator(const Global *g) 
{
   }
 }
 
-// Check ODR violation for given global G by checking if it's already poisoned.
-// We use this method in case compiler doesn't use private aliases for global
-// variables.
-static void CheckODRViolationViaPoisoning(const Global *g) {
-  if (__asan_region_is_poisoned(g->beg, g->size_with_redzone)) {
-// This check may not be enough: if the first global is much larger
-// the entire redzone of the second global may be within the first global.
-for (ListOfGlobals *l = list_of_all_globals; l; l = l->next) {
-  if (g->beg == l->g->beg &&
-  (flags()->detect_odr_violation >= 2 || g->size != l->g->size) &&
-  !IsODRViolationSuppressed(g->name))
-ReportODRViolation(g, FindRegistrationSite(g),
-   l->g, FindRegistrationSite(l->g));
-}
-  }
-}
-
 // Clang provides two different ways for global variables protection:
 // it can poison the global itself or its private alias. In former
 // case we may poison same symbol multiple times, that can help us to
@@ -216,8 +199,6 @@ static void RegisterGlobal(const Global *g) {
 // where two globals with the same name are defined in different modules.
 if (UseODRIndicator(g))
   CheckODRViolationViaIndicator(g);
-else
-  CheckODRViolationViaPoisoning(g);
   }
   if (CanPoisonMemory())
 PoisonRedZones(*g);
diff --git a/libsanitizer/asan/asan_interceptors.h 
b/libsanitizer/asan/asan_interceptors.h
index 047b044c8bf..105c672cc24 100644
--- a/libsanitizer/asan/asan_interceptors.h
+++ b/libsanitizer/asan/asan_interceptors.h
@@ -81,7 +81,12 @@ void InitializePlatformInterceptors();
 #if ASAN_HAS_EXCEPTIONS && !SANITIZER_WINDOWS && !SANITIZER_SOLARIS && \
 !SANITIZER_NETBSD
 # define ASAN_INTERCEPT___CXA_THROW 1
-# define ASAN_INTERCEPT___CXA_RETHROW_PRIMARY_EXCEPTION 1
+# if ! defined(ASAN_HAS_CXA_RETHROW_PRIMARY_EXCEPTION) \
+ || ASAN_HAS_CXA_RETHROW_PRIMARY_EXCEPTION
+#   define ASAN_INTERCEPT___CXA_RETHROW_PRIMARY_EXCEPTION 1
+# else
+#   define ASAN_INTERCEPT___CXA_RETHROW_PRIMARY_EXCEPTION 0
+# endif
 # if defined(_GLIBCXX_SJLJ_EXCEPTIONS) || (SANITIZER_IOS && defined(__arm__))
 #  define ASAN_INTERCEPT__UNWIND_SJLJ_RAISEEXCEPTION 1
 # else
diff --git a/libsanitizer/asan/asan_mapping.h b/libsanitizer/asan/asan_mapping.h
index e5a7f2007ae..4b0037fced3 100644
--- a/libsanitizer/asan/asan_mapping.h
+++ b/libsanitizer/asan/asan_mapping.h
@@ -165,7 +165,7 @@ static const u64 kAArch64_ShadowOffset64 = 1ULL << 36;
 static const u64 kRiscv64_ShadowOffset64 = 0xd;
 static const u64 kMIPS32_ShadowOffset32 = 0x0aaa;
 static const u64 kMIPS64_ShadowOffset64 = 1ULL << 37;
-static const u64 kPPC64_ShadowOffset64 = 1ULL << 44;
+static const u64 kPPC64_ShadowOffset64 = 1ULL << 41;
 static const u64 kSystemZ_ShadowOffset64 = 1ULL << 52;
 static const u64 kSPARC64_ShadowOffset64 = 1ULL << 43;  // 0x800
 static const u64 kFreeBSD_ShadowOffset32 = 1ULL << 30;  // 0x4000
diff --git a/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp 
b/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp
index 7ce9e25da34..fc5619e4b37 100644
--- a/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp
+++ b/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp
@@ -759,9 +759,13 @@ u32 GetNumberOfCPUs() {
 #elif SANITIZER_SOLARIS
   return sysconf(_SC_NPROCESSORS_ONLN);
 #else
+#if defined(CPU_COUNT)
   cpu_set_t CPUs;
   CHECK_EQ(sched_getaffinity(0, sizeof(cpu_set_t), &CPUs), 0);
   return CPU_COUNT(&CPUs);
+#else
+  return 1;
+#endif
 #endif
 }
 
diff --git a/libsanitizer/sanitizer_common/sanitizer_mac.cpp 
b/libsanitizer/sanitizer_common/sanitizer_mac.cpp
index b8839f197d2..fa077a129c2 100644
--- a/libsanitizer/sanitizer_common/sanitizer_mac.cpp
+++ b/libsanitizer/sanitizer_common/sa

[PATCH 1/2] libsanitizer: Merge with upstream

2021-10-06 Thread H.J. Lu via Gcc-patches
Merged revision: fdf4c035225de52f596899931b1f6100e5e3e928
---
 libsanitizer/MERGE|  2 +-
 libsanitizer/asan/asan_allocator.cpp  | 15 -
 libsanitizer/asan/asan_allocator.h|  2 -
 libsanitizer/asan/asan_debugging.cpp  |  5 +-
 libsanitizer/asan/asan_globals.cpp| 19 ++
 libsanitizer/asan/asan_interceptors.h |  7 +--
 libsanitizer/asan/asan_mapping.h  |  2 +-
 libsanitizer/asan/asan_stats.cpp  |  4 +-
 libsanitizer/hwasan/hwasan.cpp|  4 +-
 libsanitizer/hwasan/hwasan_setjmp_x86_64.S|  2 +
 libsanitizer/lsan/lsan_common.cpp | 56 -
 libsanitizer/lsan/lsan_common.h   |  3 +
 libsanitizer/sanitizer_common/sanitizer_asm.h |  4 ++
 .../sanitizer_chained_origin_depot.cpp| 11 ++--
 .../sanitizer_chained_origin_depot.h  |  9 +--
 ...tizer_common_interceptors_vfork_i386.inc.S |  1 +
 ...zer_common_interceptors_vfork_x86_64.inc.S |  1 +
 .../sanitizer_common_libcdep.cpp  | 20 +++---
 .../sanitizer_common/sanitizer_hash.h | 24 +++
 .../sanitizer_linux_libcdep.cpp   |  4 --
 .../sanitizer_common/sanitizer_mac.cpp| 12 +---
 libsanitizer/sanitizer_common/sanitizer_mac.h | 20 --
 .../sanitizer_platform_interceptors.h |  4 +-
 .../sanitizer_platform_limits_freebsd.cpp | 21 +++
 .../sanitizer_platform_limits_freebsd.h   | 16 +
 .../sanitizer_platform_limits_linux.cpp   |  5 +-
 .../sanitizer_platform_limits_posix.h |  2 +-
 .../sanitizer_common/sanitizer_stackdepot.cpp | 49 ---
 .../sanitizer_common/sanitizer_stackdepot.h   | 10 +--
 .../sanitizer_stackdepotbase.h|  9 +--
 .../sanitizer_common/sanitizer_stacktrace.cpp | 17 ++---
 libsanitizer/tsan/tsan_interceptors.h |  6 +-
 libsanitizer/tsan/tsan_interceptors_posix.cpp | 13 ++--
 libsanitizer/tsan/tsan_interface.cpp  |  5 +-
 libsanitizer/tsan/tsan_platform_linux.cpp | 18 +++---
 libsanitizer/tsan/tsan_platform_mac.cpp   | 62 ++-
 libsanitizer/tsan/tsan_rtl.cpp|  8 +--
 libsanitizer/tsan/tsan_rtl.h  |  9 +--
 libsanitizer/tsan/tsan_rtl_amd64.S|  6 ++
 libsanitizer/tsan/tsan_rtl_ppc64.S|  1 -
 libsanitizer/ubsan/ubsan_flags.cpp|  1 -
 libsanitizer/ubsan/ubsan_handlers.cpp | 15 -
 libsanitizer/ubsan/ubsan_handlers.h   |  8 ---
 libsanitizer/ubsan/ubsan_platform.h   |  2 -
 44 files changed, 257 insertions(+), 257 deletions(-)

diff --git a/libsanitizer/MERGE b/libsanitizer/MERGE
index 2094a8beb3e..5ea083a693a 100644
--- a/libsanitizer/MERGE
+++ b/libsanitizer/MERGE
@@ -1,4 +1,4 @@
-1c2e5fd66ea27d0c51360ba4e22099124a915562
+fdf4c035225de52f596899931b1f6100e5e3e928
 
 The first line of this file holds the git revision number of the
 last merge done from the master library sources.
diff --git a/libsanitizer/asan/asan_allocator.cpp 
b/libsanitizer/asan/asan_allocator.cpp
index 414fba3b427..268feac59dd 100644
--- a/libsanitizer/asan/asan_allocator.cpp
+++ b/libsanitizer/asan/asan_allocator.cpp
@@ -908,13 +908,6 @@ AllocType AsanChunkView::GetAllocType() const {
   return (AllocType)chunk_->alloc_type;
 }
 
-static StackTrace GetStackTraceFromId(u32 id) {
-  CHECK(id);
-  StackTrace res = StackDepotGet(id);
-  CHECK(res.trace);
-  return res;
-}
-
 u32 AsanChunkView::GetAllocStackId() const {
   u32 tid = 0;
   u32 stack = 0;
@@ -931,14 +924,6 @@ u32 AsanChunkView::GetFreeStackId() const {
   return stack;
 }
 
-StackTrace AsanChunkView::GetAllocStack() const {
-  return GetStackTraceFromId(GetAllocStackId());
-}
-
-StackTrace AsanChunkView::GetFreeStack() const {
-  return GetStackTraceFromId(GetFreeStackId());
-}
-
 void InitializeAllocator(const AllocatorOptions &options) {
   instance.InitLinkerInitialized(options);
 }
diff --git a/libsanitizer/asan/asan_allocator.h 
b/libsanitizer/asan/asan_allocator.h
index 2963e979b55..27d826fb613 100644
--- a/libsanitizer/asan/asan_allocator.h
+++ b/libsanitizer/asan/asan_allocator.h
@@ -64,8 +64,6 @@ class AsanChunkView {
   bool Eq(const AsanChunkView &c) const { return chunk_ == c.chunk_; }
   u32 GetAllocStackId() const;
   u32 GetFreeStackId() const;
-  StackTrace GetAllocStack() const;
-  StackTrace GetFreeStack() const;
   AllocType GetAllocType() const;
   bool AddrIsInside(uptr addr, uptr access_size, sptr *offset) const {
 if (addr >= Beg() && (addr + access_size) <= End()) {
diff --git a/libsanitizer/asan/asan_debugging.cpp 
b/libsanitizer/asan/asan_debugging.cpp
index c01360b52fc..0b4bf52f249 100644
--- a/libsanitizer/asan/asan_debugging.cpp
+++ b/libsanitizer/asan/asan_debugging.cpp
@@ -19,6 +19,7 @@
 #include "asan_mapping.h"
 #include "asan_report.h"
 #include "asan_thread.h"
+#include "sanitizer_common/sanitizer_stackdepot.h"
 
 namespace {
 using namespace __asan;
@@ -54,11 +55,11 @@ uptr AsanGetSt

Re: [PATCH, v5] c++: Fix up synthetization of defaulted comparison operators on classes with bitfields [PR102490]

2021-10-06 Thread Jason Merrill via Gcc-patches

On 10/6/21 05:06, Jakub Jelinek wrote:

On Tue, Oct 05, 2021 at 10:40:45PM -0400, Jason Merrill wrote:

I've switched to handling bases via binfo as discussed on IRC and added
spaceship-synth14.C to test proper base handling with virtual <=>. Here's
what I'm committing:


Thanks a lot.

I see spaceship-synth8.C is accepted without errors (| LOOKUP_NONVIRTUAL |
LOOKUP_DEFAULTED didn't help it for me back when playing with it), but if I add

E e1, e2;
auto x = e1 <=> e2;
at the end of it, it is rejected:
spaceship-synth8.C:26:17: error: use of deleted function ‘virtual constexpr 
std::strong_ordering E::operator<=>(const E&) const’
26 | auto x = e1 <=> e2;
   | ^~
spaceship-synth8.C:22:24: note: ‘virtual constexpr std::strong_ordering 
E::operator<=>(const E&) const’ is implicitly deleted because the default 
definition would be ill-formed:
22 |   std::strong_ordering operator<=>(const E&) const override = default;
   |^~~~
spaceship-synth8.C:21:8: error: no match for ‘operator<=>’ (operand types are 
‘const D’ and ‘const D’)
21 | struct E : D {
   |^
spaceship-synth8.C:19:32: note: candidate: ‘virtual std::strong_ordering 
D::operator<=>(const E&) const’ (reversed)
19 |   virtual std::strong_ordering operator<=>(const struct E&) const = 0;
   |^~~~
spaceship-synth8.C:19:44: note:   no known conversion for argument 1 from ‘const D’ 
to ‘const E&’
19 |   virtual std::strong_ordering operator<=>(const struct E&) const = 0;
   |^~~

Is that ok (i.e. whether it is accepted or rejected when the operator<=>
is actually not called falls into "no diagnostics required" category)?


It's not even NDR, it's implicitly deleted, so it's well-formed until 
called.



Note, before this fix we were accepting it even with those
E e1, e2;
auto x = e1 <=> e2;
lines in there.  Perhaps we want to copy spaceship-synth8.C to another
test that will add those two lines and check for the errors...


Done:

From 15b57c0ffe8ac9d568e76e70244cf723b72b1a82 Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Wed, 6 Oct 2021 17:12:02 -0400
Subject: [PATCH] c++: One more spaceship test.
To: gcc-patches@gcc.gnu.org

Jakub suggested adding a variant where we actually try to call the
implicitly deleted operator.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/spaceship-synth8a.C: New test.
---
 .../g++.dg/cpp2a/spaceship-synth8a.C  | 25 +++
 1 file changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/spaceship-synth8a.C

diff --git a/gcc/testsuite/g++.dg/cpp2a/spaceship-synth8a.C b/gcc/testsuite/g++.dg/cpp2a/spaceship-synth8a.C
new file mode 100644
index 000..42a32da77f9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/spaceship-synth8a.C
@@ -0,0 +1,25 @@
+// PR c++/94907
+// { dg-do compile { target c++20 } }
+
+namespace std { struct strong_ordering {
+  int _v;
+  constexpr strong_ordering (int v) :_v(v) {}
+  constexpr operator int (void) const { return _v; }
+  static const strong_ordering less;
+  static const strong_ordering equal;
+  static const strong_ordering greater;
+};
+constexpr strong_ordering strong_ordering::less = -1;
+constexpr strong_ordering strong_ordering::equal = 0;
+constexpr strong_ordering strong_ordering::greater = 1;
+}
+
+struct E;
+struct D {
+  virtual std::strong_ordering operator<=>(const struct E&) const = 0;
+};
+struct E : D {			   // { dg-error "no match" }
+  std::strong_ordering operator<=>(const E&) const override = default; // { dg-message "default" }
+};
+
+auto x = E() <=> E();		// { dg-error "deleted" }
-- 
2.27.0



[PATCH, Fortran] Add diagnostic for F2018:C839 (TS29113:C535c)

2021-10-06 Thread Sandra Loosemore
This patch is for PR fortran/54753, to add a diagnostic for violations 
of this constraint in the 2018 standard:


  C839 If an assumed-size or nonallocatable nonpointer assumed-rank
  array is an actual argument that corresponds to a dummy argument that
  is an INTENT (OUT) assumed-rank array, it shall not be polymorphic,
  finalizable, of a type with an allocatable ultimate component, or of a
  type for which default initialization is specified.

The last 3 clauses were fairly straightforward, but the "polymorphic" 
case gave me fits because I didn't initially understand that the front 
end stores flags for class types in different places than for non-class 
types.  I must give Tobias credit for straightening me out on that and 
some other obscure points that were confusing me, but he deserves none 
of the blame for this patch.  :-P


This patch fixes all the missing diagnostics and ICEs I previously 
reported in the PR, but I ended up completely rewriting the c535c-1 test 
case that formerly produced a bogus diagnostic.  (It now uses an 
interface instead of an actual subroutine definition, since Tobias 
recently committed a patch to fix interfaces in order to unblock my work 
on this one.)  That bug is independent of enforcing this constraint so 
I'm planning to open a new issue for it with its own test case, if there 
isn't already one in Bugzilla.


OK to commit?

-Sandra
commit d11d942503c884c06155f2743f8ed6c981a65533
Author: Sandra Loosemore 
Date:   Mon Sep 27 07:05:32 2021 -0700

Fortran: Add diagnostic for F2018:C839 (TS29113:C535c)

2021-10-06 Sandra Loosemore  

PR fortran/54753

gcc/fortran/
* interface.c (gfc_compare_actual_formal): Add diagnostic
for F2018:C839.  Refactor shared code and fix bugs with class
array info lookup, and add comments to diagnostic from PR94110
that is structured similarly to the new diagnostic.

gcc/testsuite/
* gfortran.dg/c-interop/c535c-1.f90: Rewrite and expand.
* gfortran.dg/c-interop/c535c-2.f90: Remove xfails.
* gfortran.dg/c-interop/c535c-3.f90: Likewise.
* gfortran.dg/c-interop/c535c-4.f90: Likewise.

diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
index a2fea0e97b8..9d13575cbf0 100644
--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -3061,6 +3061,8 @@ gfc_compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal,
   unsigned long actual_size, formal_size;
   bool full_array = false;
   gfc_array_ref *actual_arr_ref;
+  gfc_array_spec *fas, *aas;
+  bool pointer_arg, allocatable_arg;;
 
   actual = *ap;
 
@@ -3329,13 +3331,48 @@ gfc_compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal,
 	  return false;
 	}
 
-  if (f->sym->as
-	  && (f->sym->as->type == AS_ASSUMED_SHAPE
-	  || f->sym->as->type == AS_DEFERRED
-	  || (f->sym->as->type == AS_ASSUMED_RANK && f->sym->attr.pointer))
-	  && a->expr->expr_type == EXPR_VARIABLE
-	  && a->expr->symtree->n.sym->as
-	  && a->expr->symtree->n.sym->as->type == AS_ASSUMED_SIZE
+  /* Class array variables and expressions store array info in a
+	 different place from non-class objects; consolidate the logic
+	 to access it here instead of repeating it below.  */
+  fas = (f->sym->ts.type == BT_CLASS
+	 ? CLASS_DATA (f->sym)->as
+	 : f->sym->as);
+  if (a->expr->expr_type != EXPR_VARIABLE)
+	{
+	  aas = NULL;
+	  pointer_arg = false;
+	  allocatable_arg = false;
+	}
+  else if (a->expr->ts.type == BT_CLASS
+	   && a->expr->symtree->n.sym
+	   && CLASS_DATA (a->expr->symtree->n.sym))
+	{
+	  gfc_component *classdata = CLASS_DATA (a->expr->symtree->n.sym);
+	  aas = classdata->as;
+	  pointer_arg = classdata->attr.class_pointer;
+	  allocatable_arg = classdata->attr.allocatable;
+	}
+  else
+	{
+	  aas = a->expr->symtree->n.sym->as;
+	  pointer_arg = a->expr->symtree->n.sym->attr.pointer;
+	  allocatable_arg = a->expr->symtree->n.sym->attr.allocatable;
+	}
+
+  /* F2018:9.5.2(2) permits assumed-size whole array expressions as
+	 actual arguments only if the shape is not required; thus it
+	 cannot be passed to an assumed-shape array dummy.
+	 F2018:15.5.2.(2) permits passing a nonpointer actual to an
+	 intent(in) pointer dummy argument and this is accepted by
+	 the compare_pointer check below, but this also requires shape
+	 information.
+	 There's more discussion of this in PR94110.  */
+  if (fas
+	  && (fas->type == AS_ASSUMED_SHAPE
+	  || fas->type == AS_DEFERRED
+	  || (fas->type == AS_ASSUMED_RANK && f->sym->attr.pointer))
+	  && aas
+	  && aas->type == AS_ASSUMED_SIZE
 	  && (a->expr->ref == NULL
 	  || (a->expr->ref->type == REF_ARRAY
 		  && a->expr->ref->u.ar.type == AR_FULL)))
@@ -3346,6 +3383,35 @@ gfc_compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal,
 	  return false;
 	}
 
+  /* Diagnose F2018 C839 (TS29113 C

Re: [PATCH] c++: odr-use argument to a function NTTP [PR53164]

2021-10-06 Thread Jason Merrill via Gcc-patches

On 10/6/21 15:52, Patrick Palka wrote:

On Wed, 6 Oct 2021, Patrick Palka wrote:


On Tue, 5 Oct 2021, Jason Merrill wrote:


On 10/5/21 15:17, Patrick Palka wrote:

On Mon, 4 Oct 2021, Patrick Palka wrote:


When passing a function template as the argument to a function NTTP
inside a template, we resolve it to the right specialization ahead of
time via resolve_address_of_overloaded_function, though the call to
mark_used within defers odr-using it until instantiation time (as usual).
But at instantiation time we end up never calling mark_used on the
specialization.

This patch fixes this by adding a call to mark_used in
convert_nontype_argument_function.

PR c++/53164

gcc/cp/ChangeLog:

* pt.c (convert_nontype_argument_function): Call mark_used.

gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent16.C: New test.
---
   gcc/cp/pt.c |  3 +++
   gcc/testsuite/g++.dg/template/non-dependent16.C | 16 
   2 files changed, 19 insertions(+)
   create mode 100644 gcc/testsuite/g++.dg/template/non-dependent16.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index f950f4a21b7..5e819c9598c 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -6668,6 +6668,9 @@ convert_nontype_argument_function (tree type, tree
expr,
 return NULL_TREE;
   }
   +  if (!mark_used (fn_no_ptr, complain) && !(complain & tf_error))
+return NULL_TREE;
+
 linkage = decl_linkage (fn_no_ptr);
 if (cxx_dialect >= cxx11 ? linkage == lk_none : linkage !=
lk_external)
   {
diff --git a/gcc/testsuite/g++.dg/template/non-dependent16.C
b/gcc/testsuite/g++.dg/template/non-dependent16.C
new file mode 100644
index 000..b7dca8f6752
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/non-dependent16.C
@@ -0,0 +1,16 @@
+// PR c++/53164
+
+template
+void f(T) {
+  T::fail; // { dg-error "not a member" }
+}
+
+template
+struct A { };
+
+template
+void g() {
+  A a;
+}


I should mention that the original testcase in the PR was slightly
different than this one in that it also performed a call to the NTTP,
e.g.

template
struct A {
  static void h() {
p(0);
  }
};

template
void g() {
  A::h();
}

templated void g<0>();

and not even the call was enough to odr-use f, apparently because the
CALL_EXPR case of tsubst_expr calls mark_used on the callee only when
it's a FUNCTION_DECL, but in this case after substitution it's an
ADDR_EXPR of a FUNCTION_DECL.  Fixing this by looking through the ADDR_EXPR
worked, but IIUC the call isn't necessary for f to be odr-used, simply
using f as a template argument should be sufficient, so it seems the
above is better fix.


I agree that pedantically the use happens when substituting into the use of
A, but convert_nontype_argument_function seems like a weird place to
implement that; it's only called again during instantiation of A, when we
instantiate the injected-class-name.  If A isn't instantiated, e.g. if 'a'
is a pointer to A, we again don't instantiate f.


I see, makes sense..  I'm not sure where else we can mark the use, then.
Since we resolve the OVERLOAD f to the FUNCTION_DECL f ahead of
time (during which mark_used doesn't actually instantiate f because
we're inside a template), at instantiation time the type A is already
non-dependent so tsubst_aggr_type avoids doing the work that would end
up calling convert_nontype_argument_function.



I see that clang doesn't reject your testcase, either, but MSVC and icc do
(even with 'a' a pointer): https://godbolt.org/z/MGE6TcMch


FWIW although Clang doesn't reject 'A a;', it does reject
'using type = A;' weirdly enough: https://godbolt.org/z/T9qEn6bWW


Shall we just go with the other more specific approach, that makes sure
the CALL_EXPR case of tsubst_expr calls mark_used when the callee is an
ADDR_EXPR?  Something like (bootstrapped and regtested):


Err, this approach is wrong because by stripping the ADDR_EXPR here we
end up checking access of the unwrapped FUNCTION_DECL again after
substituting into the call.  So we incorrectly reject e.g.

   template
   void g() {
 P(); // error: ‘static void A::h()’ is private within this context
   }

   struct A {
 void f() {
   g();
 }
   private:
 static void h();
   };

since A::h isn't accessible from g.


I guess you could call mark_used directly instead of stripping the 
ADDR_EXPR.


Or for the general problem, perhaps we could mark the TEMPLATE_INFO or 
TI_ARGS to indicate that we still need to mark_used the arguments when 
we encounter A again during instantiation?




-- >8 --

PR c++/53164

gcc/cp/ChangeLog:

* pt.c (tsubst_copy_and_build) : Look through
ADDR_EXPR after substituting into the callee.

gcc/testsuite/ChangeLog:

* g++.dg/template/fn-ptr3.C: New test.
---
  gcc/cp/pt.c |  4 
  gcc/testsuite/g++.dg/template/fn-ptr3.C | 20 
  2 files changed, 24 insertions(+)
  

Re: [Ada] introduce stack scrub (strub) feature

2021-10-06 Thread Alexandre Oliva via Gcc-patches
On Oct  5, 2021, Pierre-Marie de Rodat  wrote:

>   * gcc-interface/utils.c (handle_strub_attribute): New.
>   (gnat_internal_attribute_table): Add strub.

FTR, this is just a dummy.  The actual implementation was submitted
along with the language- and machine-independent infrastructure at
https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579091.html and
it's yet to be reviewed.

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [RFC] More jump threading restrictions in the presence of loops.

2021-10-06 Thread Jeff Law via Gcc-patches




On 10/6/2021 10:22 AM, Aldy Hernandez via Gcc-patches wrote:

On Wed, Oct 6, 2021 at 5:03 PM Martin Sebor  wrote:

On 10/6/21 7:47 AM, Aldy Hernandez via Gcc-patches wrote:
-Wmaybe-uninitialized certainly suffers from a high rate of false
positives (I count 40 open PRs).  Even after all this time debugging
it I still struggle with the code for it but in all the bug reports
I've reviewed, only a small subset seems to be due to bugs or even
limitations in it.  Most are inherent in its design goal to warn
for reads from variables that cannot be proven to have been
initialized.

If this one is a bug with some chance of getting fixed it would
be good to distill it to a small test case (even though it's not
unlikely that there already is an existing report with the same
root cause).

That said, from from your description and the IL above it sounds
to me like the uninitialized read here is possible (it takes place
when _881 != 0), and so -Wmaybe-uininitialized triggers as it's
supposed to.

Either way, rather than suppressing the warning for the whole file
I would suggest to consider initializing the local variable instead.
Looking at the code in calls.c, I can't convince myself that
the variable cannot, in fact, be used uninitialized.

Martin

FWIW, libgomp/team.c suffers from the same problem if you remove the
stop-gap in gomp_team_start():

   struct gomp_thread_start_data *start_data = NULL;

The use of start_data in the block predicated by "if (nested)" is the
only path that can use start_data without passing through its
initialization.  But the only way to elide the initialization is from:

if (!nested)
{
   ...
  
   if (i == nthreads)
 goto do_release;
}

So the use of start_data either crosses the gomp_alloca
initialization, or is predicated by if(nested).
It may simply be the case that the uninit analysis gave up or it may 
have failed to simplify its predicates enough to realize the use is 
properly guarded.  These things certainly do happen.


Of course this is all part of why we try so hard to thread jumps to 
aggressively.  Missed jump threads closely correlate with bogus 
uninitialized warnings due to infeasible paths in the CFG.  In my mind 
the predicate analysis we do for uninit is primarily there to catch 
cases that jump threading discovers, but does not optimize due to cost.


Jeff


Re: [RFC] More jump threading restrictions in the presence of loops.

2021-10-06 Thread Jeff Law via Gcc-patches




On 10/6/2021 7:47 AM, Aldy Hernandez via Gcc-patches wrote:

The pending patch I have from Richi fixes this.  Even so, it's the
uninit code that's confused.

Sigh...every single change to the threading code shines the light on
some warning bug.

If you take the calls.ii file from the aarch64 bootstrap and break on
the warning, you can see that the uninitalized use is for
const_upper_3934 here:

   [local count: 315357954]:
   # const_upper_3934 = PHI 
   if (_881 != 0)
 goto ; [50.00%]
   else
 goto ; [50.00%]

[local count: 157678977]:
   if (const_upper_3934 > _6699)
 goto ; [89.00%]
   else
 goto ; [11.00%]

[local count: 17344687]:

[local count: 157678977]:
   goto ; [100.00%]

[local count: 140334290]:
   stack_usage_map.481_3930 = stack_usage_map;
   _6441 = const_upper_3934 - _6699;


PROBLEMATIC READ HERE

   _4819 = stack_usage_map.481_3930 + _6699;
   __builtin_memset (_4819, 1, _6441);
   goto ; [11.00%]

const_upper_3934 could be undefined if it comes from BB101
(const_upper_3937(D)), but it only gets read for _881 != 0, so it
shouldn't warn.

I suggest -Wmaybe-uninitialized be turned off for that file until the
warning is fixed.

And yes, the proposed patch will also cure this, but the underlying
problem in the warning is still there.
You haven't shown enough context for me to agree or disagree.  Are there 
other preds to bb105?   I hate these slimmed down dumps.  I work better 
with the full pred/succ lists. -fdump-tree--blocks-details :-)


It appears to me that for _881 != 0 we certainly flow into the read of 
_const_upper_3934 in bb103 and bb105.  Why do you think that's safe?


Jeff


Re: [PATCH] c++: Do not warn about lifetime of std::initializer_list& [PR102482]

2021-10-06 Thread Jason Merrill via Gcc-patches

On 10/1/21 16:58, Jonathan Wakely wrote:

An initializer-list constructor taking a non-const lvalue cannot be
called with a temporary, so the array's lifetime probably doesn't end
with the full expression. -Winit-list-lifetime should not warn for that
case.

PR c++/102482

gcc/cp/ChangeLog:

* init.c (maybe_warn_list_ctor): Do not warn for a reference to
a non-const std::initializer_list.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Winit-list5.C: New test.



Tested x86_64-linux. OK for trunk?


OK, thanks.

Jason



Re: *PING* [PATCH] libiberty: allow comments in option file

2021-10-06 Thread Jeff Law via Gcc-patches




On 10/5/2021 10:15 AM, Hans-Peter Nilsson wrote:

On Sat, 25 Sep 2021, Hu Jialun wrote:

Hello,

Sorry for bumping it again but I guess it was getting overlooked.

I am very junior with mailing list open source contributions so please feel
free to point out if I have inadvertantly done something in an incorrect way.

The archive of the original email can be found at
https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579232.html
and is also appended below.

Best regards,
Hu Jialun

This looks useful, I hope a reviewer will eventually notice it.
Until then keep pinging (say, weekly)!
I read the upstream thread from a while back and I got the impression 
the utility was in question, so I pushed this near the bottom of my 
review list.


jeff



Re: [PATCH] c++: ttp variadic constraint subsumption [PR99904]

2021-10-06 Thread Jason Merrill via Gcc-patches

On 10/1/21 16:10, Patrick Palka wrote:

Here we're crashing when level-lowering the variadic constraints on the
template template parameter TT because tsubst_pack_expansion expects
processing_template_decl to be set during a partial substitution.

bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/11?  Also tested on cmcstl2 and range-v3.


OK.


PR c++/99904

gcc/cp/ChangeLog:

* pt.c (is_compatible_template_arg): Set processing_template_decl
around tsubst_constraint_info.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-ttp4.C: New test.
---
  gcc/cp/pt.c| 2 ++
  gcc/testsuite/g++.dg/cpp2a/concepts-ttp4.C | 9 +
  2 files changed, 11 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-ttp4.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 1dcdffe322a..f950f4a21b7 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -8223,8 +8223,10 @@ is_compatible_template_arg (tree parm, tree arg)
  {
tree aparms = DECL_INNERMOST_TEMPLATE_PARMS (arg);
new_args = template_parms_level_to_args (aparms);
+  ++processing_template_decl;
parm_cons = tsubst_constraint_info (parm_cons, new_args,
  tf_none, NULL_TREE);
+  --processing_template_decl;
if (parm_cons == error_mark_node)
  return false;
  }
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-ttp4.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-ttp4.C
new file mode 100644
index 000..cf3e71ea974
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-ttp4.C
@@ -0,0 +1,9 @@
+// PR c++/99904
+// { dg-do compile { target c++20 } }
+
+template concept C = (Ts::value && ...);
+template requires C struct A;
+template requires true struct B;
+template requires C class TT> struct S;
+using ty1 = S;
+using ty2 = S; // { dg-error "constraint" } TT's constraints don't subsume 
B's