[PATCH] tree-optimization/104825 - guard modref query

2022-03-08 Thread Richard Biener via Gcc-patches
The following makes sure to guard the modref query in VN on a
pointer typed argument.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2022-03-08  Richard Biener  

PR tree-optimization/104825
* tree-ssa-sccvn.cc (visit_reference_op_call): Properly
guard modref get_ao_ref on a pointer typed argument.

* gcc.dg/torture/pr104825.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr104825.c | 14 ++
 gcc/tree-ssa-sccvn.cc   |  5 +++--
 2 files changed, 17 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr104825.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr104825.c b/gcc/testsuite/gcc.dg/torture/pr104825.c
new file mode 100644
index 000..7affacc2094
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr104825.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Wno-stringop-overread" } */
+
+int foo (fmt)
+char* fmt;
+{
+  return (__builtin_strchr (fmt, '*') != 0
+  || __builtin_strchr (fmt, 'n') != 0);
+}
+void bar ()
+{
+  if (foo (1))
+__builtin_abort ();
+}
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index d4d0aba880c..66b4fc21f5b 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -5140,11 +5140,12 @@ visit_reference_op_call (tree lhs, gcall *stmt)
{
  accesses.quick_grow (accesses.length () + 1);
  ao_ref *r = &accesses.last ();
- if (!access_node.get_ao_ref (stmt, r))
+ tree arg = access_node.get_call_arg (stmt);
+ if (!POINTER_TYPE_P (TREE_TYPE (arg))
+ || !access_node.get_ao_ref (stmt, r))
{
  /* Initialize a ref based on the argument and
 unknown offset if possible.  */
- tree arg = access_node.get_call_arg (stmt);
  if (arg && TREE_CODE (arg) == SSA_NAME)
arg = SSA_VAL (arg);
  if (arg
-- 
2.34.1


Re: [PATCH] c++: Don't suggest cdtor or conversion op identifiers in spelling hints [PR104806]

2022-03-08 Thread Richard Biener via Gcc-patches
On Tue, Mar 8, 2022 at 8:27 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> Hi!
>
> On the following testcase, we emit "did you mean '__dt '?" in the error
> message.  "__dt " shows there because it is dtor_identifier, but we
> shouldn't suggest those to the user, they are purely internal and can't
> be really typed by the user because of the final space in it.

Are those maybe also DECL_ARTIFICIAL?

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2022-03-08  Jakub Jelinek  
>
> PR c++/104806
> * search.cc (lookup_field_fuzzy_info::fuzzy_lookup_field): Ignore
> identifiers with space at the end.
>
> * g++.dg/spellcheck-pr104806.C: New test.
>
> --- gcc/cp/search.cc.jj 2022-01-18 11:58:59.407984557 +0100
> +++ gcc/cp/search.cc    2022-03-07 10:44:33.455673155 +0100
> @@ -1275,6 +1275,13 @@ lookup_field_fuzzy_info::fuzzy_lookup_fi
>if (is_lambda_ignored_entity (field))
> continue;
>
> +  /* Ignore special identifiers with space at the end like cdtor or
> +conversion op identifiers.  */
> +  if (TREE_CODE (DECL_NAME (field)) == IDENTIFIER_NODE)
> +   if (unsigned int len = IDENTIFIER_LENGTH (DECL_NAME (field)))
> + if (IDENTIFIER_POINTER (DECL_NAME (field))[len - 1] == ' ')
> +   continue;
> +
>m_candidates.safe_push (DECL_NAME (field));
>  }
>  }
> --- gcc/testsuite/g++.dg/spellcheck-pr104806.C.jj   2022-03-07 10:34:07.224499657 +0100
> +++ gcc/testsuite/g++.dg/spellcheck-pr104806.C  2022-03-07 10:43:41.900399808 +0100
> @@ -0,0 +1,5 @@
> +// PR c++/104806
> +
> +struct S {};
> +int main() { S s; s.__d; } // { dg-bogus "'struct S' has no member named '__d'; did you mean '__\[a-z]* '" }
> +   // { dg-error "'struct S' has no member named '__d'" "" { target *-*-* } .-1 }
>
> Jakub
>


Re: [PATCH] c++: Don't suggest cdtor or conversion op identifiers in spelling hints [PR104806]

2022-03-08 Thread Jakub Jelinek via Gcc-patches
On Tue, Mar 08, 2022 at 10:23:28AM +0100, Richard Biener wrote:
> On Tue, Mar 8, 2022 at 8:27 AM Jakub Jelinek via Gcc-patches
> > On the following testcase, we emit "did you mean '__dt '?" in the error
> > message.  "__dt " shows there because it is dtor_identifier, but we
> > shouldn't suggest those to the user, they are purely internal and can't
> > be really typed by the user because of the final space in it.
> 
> Are those maybe also DECL_ARTIFICIAL?

You mean the FUNCTION_DECLs in the TYPE_FIELDS chain?  No, they aren't.
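
For readers following along outside the GCC tree, the filter in the patch boils down to a trailing-space test on the identifier's spelling.  Below is a minimal standalone sketch of that test; it assumes plain NUL-terminated C strings in place of IDENTIFIER_POINTER/IDENTIFIER_LENGTH, and the helper name is hypothetical, not something from search.cc.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the check in the patch: return true when
   NAME is non-empty and ends in a space, which is how internal
   identifiers such as "__dt " are spelled.  A user cannot type such a
   name, so it makes no sense as a spelling suggestion.  */
static bool
is_internal_spelling (const char *name)
{
  size_t len = strlen (name);
  return len > 0 && name[len - 1] == ' ';
}
```

A candidate loop would then skip any name for which this returns true before pushing it onto the suggestion list, mirroring the `continue` in the patch.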

> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > 2022-03-08  Jakub Jelinek  
> >
> > PR c++/104806
> > * search.cc (lookup_field_fuzzy_info::fuzzy_lookup_field): Ignore
> > identifiers with space at the end.
> >
> > * g++.dg/spellcheck-pr104806.C: New test.
> >
> > --- gcc/cp/search.cc.jj 2022-01-18 11:58:59.407984557 +0100
> > +++ gcc/cp/search.cc    2022-03-07 10:44:33.455673155 +0100
> > @@ -1275,6 +1275,13 @@ lookup_field_fuzzy_info::fuzzy_lookup_fi
> >if (is_lambda_ignored_entity (field))
> > continue;
> >
> > +  /* Ignore special identifiers with space at the end like cdtor or
> > +conversion op identifiers.  */
> > +  if (TREE_CODE (DECL_NAME (field)) == IDENTIFIER_NODE)
> > +   if (unsigned int len = IDENTIFIER_LENGTH (DECL_NAME (field)))
> > + if (IDENTIFIER_POINTER (DECL_NAME (field))[len - 1] == ' ')
> > +   continue;
> > +
> >m_candidates.safe_push (DECL_NAME (field));
> >  }
> >  }
> > --- gcc/testsuite/g++.dg/spellcheck-pr104806.C.jj   2022-03-07 10:34:07.224499657 +0100
> > +++ gcc/testsuite/g++.dg/spellcheck-pr104806.C  2022-03-07 10:43:41.900399808 +0100
> > @@ -0,0 +1,5 @@
> > +// PR c++/104806
> > +
> > +struct S {};
> > +int main() { S s; s.__d; } // { dg-bogus "'struct S' has no member named '__d'; did you mean '__\[a-z]* '" }
> > +   // { dg-error "'struct S' has no member named '__d'" "" { target *-*-* } .-1 }

Jakub



Re: [PATCH] PR tree-optimization/98335: Improvements to DSE's compute_trims.

2022-03-08 Thread Richard Biener via Gcc-patches
On Mon, Mar 7, 2022 at 11:04 AM Roger Sayle  wrote:
>
>
> This patch is the main middle-end piece of a fix for PR tree-opt/98335,
> which is a code-quality regression affecting mainline.  The issue occurs
> in DSE's (dead store elimination's) compute_trims function that determines
> where a store to memory can be trimmed.  In the testcase given in the
> PR, this function notices that the first byte of a DImode store is dead,
> and replaces the 8-byte store at (aligned) offset zero, with a 7-byte store
> at (unaligned) offset one.  Most architectures can store a power-of-two
> bytes (up to a maximum) in single instruction, so writing 7 bytes requires
> more instructions than writing 8 bytes.  This patch follows Jakub Jelinek's
> suggestion in comment 5, that compute_trims needs improved heuristics.
>
> In this patch, decision of whether and how to align trim_head is based
> on the number of bytes being written, the alignment of the start of the
> object and where within the object the first byte is written.  The first
> tests check whether we're already writing to the start of the object,
> and that we're writing three or more bytes.  If we're only writing one
> or two bytes, there's no benefit from providing additional alignment.
> Then we determine the alignment of the object, which is either 1, 2,
> 4, 8 or 16 byte aligned (capping at 16 guarantees that we never write
> more than 7 bytes beyond the minimum required).  If the buffer is only
> 1 or 2 byte aligned there's no benefit from additional alignment.  For
> the remaining cases, alignment of trim_head is based upon where within
> each aligned block (word) the first byte is written.  For example,
> storing the last byte (or last half-word) of a word can be performed
> with a single insn.
>
> On x86_64-pc-linux-gnu with -O2 the new test case in the PR goes from:
>
> movl    $0, -24(%rsp)
> movabsq $72057594037927935, %rdx
> movl    $0, -21(%rsp)
> andq    -24(%rsp), %rdx
> movq    %rdx, %rax
> salq    $8, %rax
> movb    c(%rip), %al
> ret
>
> to
>
> xorl    %eax, %eax
> movb    c(%rip), %al
> ret
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures.  I've also added new testcases
> for the original motivating PR tree-optimization/86010, to ensure that
> those remain optimized (in future).  Ok for mainline?
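
The cost argument in the description above (7 bytes at offset one versus 8 bytes at offset zero) can be illustrated with a small standalone model.  This is only a sketch under simplifying assumptions: it counts non-overlapping, naturally aligned power-of-two stores chosen greedily, it ignores overlapping stores such as the two movl insns in the assembly above, and the function name is hypothetical rather than anything in tree-ssa-dse.cc.

```c
#include <stddef.h>

/* Count the naturally aligned power-of-two stores needed to write the
   bytes [START, START + LEN), using stores of at most MAX_STORE bytes.
   MAX_STORE is assumed to be a power of two.  Stores are picked
   greedily from the lowest address upwards.  */
static unsigned
count_aligned_stores (size_t start, size_t len, size_t max_store)
{
  unsigned n = 0;
  while (len > 0)
    {
      /* Largest power of two dividing START (MAX_STORE if START is 0).  */
      size_t size = start ? (start & (~start + 1)) : max_store;
      if (size > max_store)
        size = max_store;
      /* Shrink the store until it fits in the remaining length.  */
      while (size > len)
        size >>= 1;
      start += size;
      len -= size;
      n++;
    }
  return n;
}
```

Under this model count_aligned_stores (0, 8, 8) is 1, while count_aligned_stores (1, 7, 8) is 3 (one byte, then a half-word, then a word), which is the effect the trim_head heuristic tries to avoid.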

diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index 2b22a61..080e406 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -405,10 +405,36 @@ compute_trims (ao_ref *ref, sbitmap live, int *trim_head, int *trim_tail,
   int first_live = bitmap_first_set_bit (live);
   *trim_head = first_live - first_orig;

-  /* If more than a word remains, then make sure to keep the
- starting point at least word aligned.  */
-  if (last_live - first_live > UNITS_PER_WORD)
-*trim_head &= ~(UNITS_PER_WORD - 1);
+  /* If REF is aligned, try to maintain this alignment if it reduces
+ the number of (power-of-two sized aligned) writes to memory.
+ First check that we're writing >= 3 bytes at a non-zero offset.  */
+  if (first_live
+  && last_live - first_live >= 2)
+{
+  unsigned int align = TYPE_ALIGN_UNIT (TREE_TYPE (ref->base));

You can't simply use TYPE_ALIGN_* on ref->base.  You can use
get_object_alignment on ref->ref, but ref->ref can be NULL in case the
ref was initialized from a builtin call like memcpy.

Also, ref->base is offset by ref->offset, which you don't seem to
account for.  In theory one could export get_object_alignment_2 and,
if ref->ref is NULL, use that on ref->base, passing addr_p = true,
and then adjust the resulting bitpos by ref->offset and fix align accordingly
(trimming might also align an access if the original access was offset
from a known alignment).

That said, a helper like ao_ref_alignment () might be useful here.
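
To make the "adjust by the offset and fix align accordingly" step concrete, here is a standalone sketch of the arithmetic.  It is not the proposed ao_ref_alignment () helper and does not use GCC's bit-based units; the function name and the byte-based parameters are assumptions made purely for illustration.

```c
/* Hypothetical helper: given that a base address is known to satisfy
   base % ALIGN == MISALIGN (ALIGN a power of two, in bytes), return the
   largest power-of-two alignment that base + OFFSET is known to have,
   capped at ALIGN.  */
static unsigned
known_alignment_at (unsigned align, unsigned misalign, unsigned offset)
{
  /* Residual misalignment after applying the offset.  */
  unsigned m = (misalign + offset) & (align - 1);
  if (m == 0)
    return align;
  /* Otherwise only the lowest set bit of the residue is guaranteed.  */
  return m & (~m + 1);
}
```

For example, an 8-byte-aligned base accessed at offset 4 is only known to be 4-byte aligned, while a base that is misaligned by 1 and then advanced by 7 becomes 8-byte aligned again, which is the "trimming might also align an access" case.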

I wonder if we can apply good heuristics to compute_trims without taking
context into account: maybe_trim_complex_store already limits itself
to useful subsets, and the constructor and memstar cases
will only benefit if they end up being expanded inline via *_by_pieces,
not if expanded as a call.

You don't seem to adjust *trim_tail at all; if an aligned 16-byte region
is trimmed there by 3, that will result in two extra stores as well, no?

+  if (DECL_P (ref->base) && DECL_ALIGN_UNIT (ref->base) > align)
+   align = DECL_ALIGN_UNIT (ref->base);
+  if (align > UNITS_PER_WORD)
+   align = UNITS_PER_WORD;
+  if (align > 16)
+   align = 16;
+  if (align > 2)
+   {
+ /* ALIGN is 4, 8 or 16.  */
+ unsigned int low = first_live & (align - 1);
+ if (low * 2 < align)
+   {
+ if (align == 16 && low >= 4 && last_live < 15)
+   *trim_head &= ~3;
+ else
+   *trim_head &= ~(align - 1);
+   }
+ else if (low 

Re: [PATCH v3] x86: Disable SSE on unwind-c.c and unwind-dw2.c

2022-03-08 Thread Jakub Jelinek via Gcc-patches
On Mon, Mar 07, 2022 at 07:06:28AM -0800, H.J. Lu wrote:
> Since eh_return doesn't work with stack realignment, disable SSE on
> unwind-c.c and unwind-dw2.c to avoid stack realignment with the 4-byte
> incoming stack to avoid SSE usage which is caused by
> 
> commit 609e8c492d62d92465460eae3d43dfc4b2c68288
> Author: H.J. Lu 
> Date:   Sat Feb 26 14:17:23 2022 -0800
> 
> x86: Always return pseudo register in ix86_gen_scratch_sse_rtx
> 
> when pseudo vector registers are used to expand memset.

>   PR target/104781
>   * config.host (tmake_file): Add i386/32/t-eh-return-no-sse for
>   32-bit x86 Cygwin, MinGW and Solaris.
>   * config/i386/32/t-eh-return-no-sse: New file.

For this, isn't the right fix instead something like:

--- gcc/config/i386/i386.h.jj   2022-02-25 12:06:45.535493490 +0100
+++ gcc/config/i386/i386.h  2022-03-08 11:20:43.207043370 +0100
@@ -2848,6 +2848,10 @@ extern enum attr_cpu ix86_schedule;
 #define NUM_X86_64_MS_CLOBBERED_REGS 12
 #endif
 
+/* __builtin_eh_return can't handle stack realignment, so disable SSE in
+   libgcc functions that call it.  */
+#define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-sse")))
+
 /*
 Local variables:
 version-control: t


As mentioned in PR104838, this likely isn't specific to just Solaris and
cygwin/mingw.  Fedora uses -msse2 -mfpmath=sse -mstackrealign in its C{,XX}FLAGS
among other things for i686.

Jakub



Re: [PATCH] OpenMP, libgomp: Add new runtime routine omp_get_mapped_ptr.

2022-03-08 Thread Marcel Vollweiler

Hi Jakub,


diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 2ac5809..00a4858 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -224,6 +224,7 @@ OMP_5.1 {
 omp_set_teams_thread_limit_8_;
 omp_get_teams_thread_limit;
 omp_get_teams_thread_limit_;
+omp_get_mapped_ptr;
  } OMP_5.0.2;


I think it is too late for this to be targeted for GCC 12, and
for GCC 13 it will need to go into the OMP_5.1.1 symver.


Agreed and changed accordingly.


+void *
+omp_get_mapped_ptr (const void *ptr, int device_num)
+{
+  if (device_num < 0 || device_num > omp_get_num_devices ())
+return NULL;
+
+  if (device_num == omp_get_initial_device ())
+return (void*)ptr;


Space before * and space after )


Changed.


+  struct gomp_device_descr *devicep = resolve_device (device_num);
+  if (devicep == NULL)
+return NULL;
+
+  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+  || devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
+return (void*)ptr;


Likewise.


Changed.


+
+  gomp_mutex_lock (&devicep->lock);
+
+  struct splay_tree_s *mem_map = &devicep->mem_map;
+  struct splay_tree_key_s cur_node;
+  void *ret = NULL;
+  uintptr_t offset = 0;


offset should be moved to the only place that defines it.


Changed.


+
+  cur_node.host_start = (uintptr_t) ptr;
+  cur_node.host_end = cur_node.host_start;
+  splay_tree_key n = gomp_map_0len_lookup (mem_map, &cur_node);
+
+  if (n && n->host_start == cur_node.host_start)
+{
+  ret = (void*) n->tgt->tgt_start + n->tgt_offset;
+}


Single statement body, so without {}s and reindented, space before *.

+  else if (n)
+{
+  offset = cur_node.host_start - n->host_start;

   uintptr_t offset = cur_node.host_start - n->host_start;


+  ret = (void*) n->tgt->tgt_start + n->tgt_offset + offset;


Space before *.

Though, looking at this more, what is the point of the first if?
The second if would compute offset = 0...


Absolutely true :)
Changed.



Also, void * arithmetic is a GNU extension, maybe better use char *.


I changed it to (enclosing parentheses):

   ret = (void *) (n->tgt->tgt_start + n->tgt_offset + offset);

i.e. pointer arithmetic is done on uintptr_t, but I am not completely sure if
that's sufficient in terms of compatibility. On the other hand,

   ret = (void *) ((char *) n->tgt->tgt_start + (char *) n->tgt_offset + (char *) offset);

is perhaps overcomplicated if not really necessary. What do you think?
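
As a standalone illustration of the uintptr_t variant discussed above, the sketch below does the whole computation on integers and casts once at the end.  The struct is a hypothetical, simplified stand-in for the fields used in the patch (libgomp's real splay tree key and target descriptor differ), and the function name is made up.  Note also that the all-char * alternative would not compile as written, since C does not allow adding two pointer operands.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the fields used in the patch;
   libgomp's real structures differ.  */
struct mapping
{
  uintptr_t host_start;  /* Host address where the mapping begins.  */
  uintptr_t tgt_start;   /* Device address of the target block.  */
  uintptr_t tgt_offset;  /* Offset of this mapping inside the block.  */
};

/* Translate host address HOST inside mapping M to a device address.
   All arithmetic is done on uintptr_t with a single cast at the end,
   so no void * arithmetic (a GNU extension) is needed.  When HOST
   equals m->host_start the offset is simply zero, so that case needs
   no separate branch.  */
static void *
device_address (const struct mapping *m, uintptr_t host)
{
  uintptr_t offset = host - m->host_start;
  return (void *) (m->tgt_start + m->tgt_offset + offset);
}
```

This also shows why the first `if` in the reviewed code is redundant: the exact-match case falls out of the general formula with an offset of zero.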


+  if (omp_get_mapped_ptr (q, -1) != NULL)
+__builtin_abort ();


When you do include stdlib.h, what is the point of using __builtin_abort ?
Just use abort then.


Good point. Changed.

Thanks,

Marcel
OpenMP, libgomp: Add new runtime routine omp_get_mapped_ptr.

libgomp/ChangeLog:

* libgomp.map: Added omp_get_mapped_ptr.
* libgomp.texi: Tagged omp_get_mapped_ptr as supported.
* omp.h.in: Added omp_get_mapped_ptr.
* omp_lib.f90.in: Added interface for omp_get_mapped_ptr.
* omp_lib.h.in: Likewise.
* target.c (omp_get_mapped_ptr): Added implementation of
omp_get_mapped_ptr.
* testsuite/libgomp.c-c++-common/get-mapped-ptr-1.c: New test.
* testsuite/libgomp.c-c++-common/get-mapped-ptr-2.c: New test.
* testsuite/libgomp.c-c++-common/get-mapped-ptr-3.c: New test.
* testsuite/libgomp.c-c++-common/get-mapped-ptr-4.c: New test.
* testsuite/libgomp.fortran/get-mapped-ptr-1.f90: New test.
* testsuite/libgomp.fortran/get-mapped-ptr-2.f90: New test.
* testsuite/libgomp.fortran/get-mapped-ptr-3.f90: New test.
* testsuite/libgomp.fortran/get-mapped-ptr-4.f90: New test.

diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 2ac5809..608a54c 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -226,6 +226,11 @@ OMP_5.1 {
omp_get_teams_thread_limit_;
 } OMP_5.0.2;
 
+OMP_5.1.1 {
+  global:
+   omp_get_mapped_ptr;
+} OMP_5.1;
+
 GOMP_1.0 {
   global:
GOMP_atomic_end;
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 161a423..c163b56 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -314,7 +314,7 @@ The OpenMP 4.5 specification is fully supported.
 @item @code{omp_target_is_accessible} runtime routine @tab N @tab
 @item @code{omp_target_memcpy_async} and @code{omp_target_memcpy_rect_async}
   runtime routines @tab N @tab
-@item @code{omp_get_mapped_ptr} runtime routine @tab N @tab
+@item @code{omp_get_mapped_ptr} runtime routine @tab Y @tab
 @item @code{omp_calloc}, @code{omp_realloc}, @code{omp_aligned_alloc} and
   @code{omp_aligned_calloc} runtime routines @tab Y @tab
 @item @code{omp_alloctrait_key_t} enum: @code{omp_atv_serialize

Re: [PATCH] PR tree-optimization/98335: Improvements to DSE's compute_trims.

2022-03-08 Thread Richard Biener via Gcc-patches
On Tue, Mar 8, 2022 at 11:10 AM Richard Biener
 wrote:
>
> On Mon, Mar 7, 2022 at 11:04 AM Roger Sayle  
> wrote:
> >
> >
> > This patch is the main middle-end piece of a fix for PR tree-opt/98335,
> > which is a code-quality regression affecting mainline.  The issue occurs
> > in DSE's (dead store elimination's) compute_trims function that determines
> > where a store to memory can be trimmed.  In the testcase given in the
> > PR, this function notices that the first byte of a DImode store is dead,
> > and replaces the 8-byte store at (aligned) offset zero, with a 7-byte store
> > at (unaligned) offset one.  Most architectures can store a power-of-two
> > bytes (up to a maximum) in single instruction, so writing 7 bytes requires
> > more instructions than writing 8 bytes.  This patch follows Jakub Jelinek's
> > suggestion in comment 5, that compute_trims needs improved heuristics.
> >
> > In this patch, decision of whether and how to align trim_head is based
> > on the number of bytes being written, the alignment of the start of the
> > object and where within the object the first byte is written.  The first
> > tests check whether we're already writing to the start of the object,
> > and that we're writing three or more bytes.  If we're only writing one
> > or two bytes, there's no benefit from providing additional alignment.
> > Then we determine the alignment of the object, which is either 1, 2,
> > 4, 8 or 16 byte aligned (capping at 16 guarantees that we never write
> > more than 7 bytes beyond the minimum required).  If the buffer is only
> > 1 or 2 byte aligned there's no benefit from additional alignment.  For
> > the remaining cases, alignment of trim_head is based upon where within
> > each aligned block (word) the first byte is written.  For example,
> > storing the last byte (or last half-word) of a word can be performed
> > with a single insn.
> >
> > On x86_64-pc-linux-gnu with -O2 the new test case in the PR goes from:
> >
> > movl    $0, -24(%rsp)
> > movabsq $72057594037927935, %rdx
> > movl    $0, -21(%rsp)
> > andq    -24(%rsp), %rdx
> > movq    %rdx, %rax
> > salq    $8, %rax
> > movb    c(%rip), %al
> > ret
> >
> > to
> >
> > xorl    %eax, %eax
> > movb    c(%rip), %al
> > ret
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check with no new failures.  I've also added new testcases
> > for the original motivating PR tree-optimization/86010, to ensure that
> > those remain optimized (in future).  Ok for mainline?
>
> diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
> index 2b22a61..080e406 100644
> --- a/gcc/tree-ssa-dse.cc
> +++ b/gcc/tree-ssa-dse.cc
> @@ -405,10 +405,36 @@ compute_trims (ao_ref *ref, sbitmap live, int *trim_head, int *trim_tail,
>int first_live = bitmap_first_set_bit (live);
>*trim_head = first_live - first_orig;
>
> -  /* If more than a word remains, then make sure to keep the
> - starting point at least word aligned.  */
> -  if (last_live - first_live > UNITS_PER_WORD)
> -*trim_head &= ~(UNITS_PER_WORD - 1);
> +  /* If REF is aligned, try to maintain this alignment if it reduces
> + the number of (power-of-two sized aligned) writes to memory.
> + First check that we're writing >= 3 bytes at a non-zero offset.  */
> +  if (first_live
> +  && last_live - first_live >= 2)
> +{
> +  unsigned int align = TYPE_ALIGN_UNIT (TREE_TYPE (ref->base));
>
> You can't simply use TYPE_ALIGN_* on ref->base.  You can use
> get_object_alignment on ref->ref, but ref->ref can be NULL in case the
> ref was initialized from a builtin call like memcpy.
>
> Also, ref->base is offset by ref->offset, which you don't seem to
> account for.  In theory one could export get_object_alignment_2 and,
> if ref->ref is NULL, use that on ref->base, passing addr_p = true,
> and then adjust the resulting bitpos by ref->offset and fix align accordingly
> (trimming might also align an access if the original access was offset
> from a known alignment).
>
> That said, a helper like ao_ref_alignment () might be useful here.

Like the attached - feel free to use that.

Richard.

>
> I wonder if we can apply good heuristics to compute_trims without taking
> context into account: maybe_trim_complex_store already limits itself
> to useful subsets, and the constructor and memstar cases
> will only benefit if they end up being expanded inline via *_by_pieces,
> not if expanded as a call.
>
> You don't seem to adjust *trim_tail at all; if an aligned 16-byte region
> is trimmed there by 3, that will result in two extra stores as well, no?
>
> +  if (DECL_P (ref->base) && DECL_ALIGN_UNIT (ref->base) > align)
> +   align = DECL_ALIGN_UNIT (ref->base);
> +  if (align > UNITS_PER_WORD)
> +   align = UNITS_PER_WORD;
> +  if (align > 16)
> +   align = 16;
> +  if (align > 2)
> +   {
> + /* ALIGN i

[PATCH] params: Remove repeated word "that" in parameter description

2022-03-08 Thread Martin Jambor
Hi,

One of the mistakes reported in PR 104552 is a repeated "that" in the
description of ipa-cp-recursive-freq-factor, which I introduced.  This
patch removes one of them.

Included in a bootstrap and test run on x86_64-linux, which passed;
committed as obvious.

Thanks,

Martin


gcc/ChangeLog:

2022-03-07  Martin Jambor  

PR translation/104552
* params.opt (ipa-cp-recursive-freq-factor): Remove repeated word
"that" in the description.
---
 gcc/params.opt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/params.opt b/gcc/params.opt
index b07663daa05..f76f7839916 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -251,7 +251,7 @@ Recursive cloning only when the probability of call being executed exceeds the p
 
 -param=ipa-cp-recursive-freq-factor=
 Common Joined UInteger Var(param_ipa_cp_recursive_freq_factor) Init(6) Param Optimization
-When propagating IPA-CP effect estimates, multiply frequencies of recursive edges that that bring back an unchanged value by this factor.
+When propagating IPA-CP effect estimates, multiply frequencies of recursive edges that bring back an unchanged value by this factor.
 
 -param=ipa-cp-recursion-penalty=
 Common Joined UInteger Var(param_ipa_cp_recursion_penalty) Init(40) IntegerRange(0, 100) Param Optimization
-- 
2.35.1



Re: [PATCH v3] x86: Disable SSE on unwind-c.c and unwind-dw2.c

2022-03-08 Thread Jakub Jelinek via Gcc-patches
On Tue, Mar 08, 2022 at 11:23:51AM +0100, Jakub Jelinek via Gcc-patches wrote:
> On Mon, Mar 07, 2022 at 07:06:28AM -0800, H.J. Lu wrote:
> > Since eh_return doesn't work with stack realignment, disable SSE on
> > unwind-c.c and unwind-dw2.c to avoid stack realignment with the 4-byte
> > incoming stack to avoid SSE usage which is caused by
> > 
> > commit 609e8c492d62d92465460eae3d43dfc4b2c68288
> > Author: H.J. Lu 
> > Date:   Sat Feb 26 14:17:23 2022 -0800
> > 
> > x86: Always return pseudo register in ix86_gen_scratch_sse_rtx
> > 
> > when pseudo vector registers are used to expand memset.
> 
> > PR target/104781
> > * config.host (tmake_file): Add i386/32/t-eh-return-no-sse for
> > 32-bit x86 Cygwin, MinGW and Solaris.
> > * config/i386/32/t-eh-return-no-sse: New file.
> 
> For this, isn't the right fix instead something like:
> 
> --- gcc/config/i386/i386.h.jj 2022-02-25 12:06:45.535493490 +0100
> +++ gcc/config/i386/i386.h    2022-03-08 11:20:43.207043370 +0100
> @@ -2848,6 +2848,10 @@ extern enum attr_cpu ix86_schedule;
>  #define NUM_X86_64_MS_CLOBBERED_REGS 12
>  #endif
>  
> +/* __builtin_eh_return can't handle stack realignment, so disable SSE in
> +   libgcc functions that call it.  */
> +#define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-sse")))
> +
>  /*
>  Local variables:
>  version-control: t
> 
> 
> As mentioned in PR104838, this likely isn't specific to just Solaris and
> cygwin/mingw.  Fedora uses -msse2 -mfpmath=sse -mstackrealign in its C{,XX}FLAGS
> among other things for i686.

I have now verified that, with your full patch applied, gcc on Fedora/i686
doesn't build (it gets those sorry messages when compiling unwind-dw2), while
if I replace those 2 libgcc hunks from your patch with the above one, it seems
to get past the previous hanging point of gnat1 processes.

Jakub



Re: [PATCH] Check if loading const from mem is faster

2022-03-08 Thread Jiufu Guo via Gcc-patches
Jiufu Guo  writes:

Hi!

> Hi Segher,
>
> Segher Boessenkool  writes:
>
>> On Tue, Mar 01, 2022 at 10:28:57PM +0800, Jiufu Guo wrote:
>>> Segher Boessenkool  writes:
>>> > No.  insn_cost is only for correct, existing instructions, not for
>>> > made-up nonsense.  I created insn_cost precisely to get away from that
>>> > aspect of rtx_cost (and some other issues, like, it is incredibly hard
>>> > and cumbersome to write a correct rtx_cost).
>>> 
>>> Thanks! The implementations of the insn_cost hook are aligned with this
>>> design; they check the insn's attributes and COSTS_N_INSNS.
>>> 
>>> One question on a special case:
>>> For the instruction "r119:DI=0x100803004101001",
>>> would we treat it as a valid instruction?
>>
>> Currently we do, alternative 6 in *movdi_internal64: we allow any r<-n.
>> This is costed as 5 insns (cost=20).
>>
>> It generally is better to split things into patterns close to the
>> eventual machine instructions as early as possible: all the more generic
>> optimisations can take advantage of that then.
> Got it!
>>
>>> A patch, which is attached at the end of this mail, accepts
>>> "r119:DI=0x100803004101001" as input of insn_cost.
>>> In this patch,
>>> - A tmp instruction is generated via make_insn_raw.
>>> - A few calls to rtx_cost (in cse_insn) are replaced by insn_cost.
>>> - The insn_cost hook checks for the special 'constant' instruction.
>>> Do these make sense?
>>
>> I'll review that patch inline.

I drafted a new patch that replaces rtx_cost with insn_cost in cse.cc.
Unlike the previous partial patch, this patch replaces all uses
of rtx_cost. It may be better (or more aggressive) than the previous one.

With this patch, bootstrap passes.
In the regtest, only the output of fusion-p10-ldcmpi.c changed, and the
change looks as expected.


BR,
Jiufu

diff --git a/gcc/cse.cc b/gcc/cse.cc
index a18b599d324..e623ad298db 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -262,6 +262,9 @@ static struct qty_table_elem *qty_table;
 static rtx_insn *this_insn;
 static bool optimize_this_for_speed_p;
 
+/* Used for insn_cost. */
+static rtx_insn *estimate_insn;
+
 /* Index by register number, gives the number of the next (or
previous) register in the chain of registers sharing the same
value.
@@ -445,7 +448,7 @@ struct table_elt
 /* Compute cost of X, as stored in the `cost' field of a table_elt.  Fixed
hard registers and pointers into the frame are the cheapest with a cost
of 0.  Next come pseudos with a cost of one and other hard registers with
-   a cost of 2.  Aside from these special cases, call `rtx_cost'.  */
+   a cost of 2.  Aside from these special cases, call `insn_cost'.  */
 
 #define CHEAP_REGNO(N) \
   (REGNO_PTR_FRAME_P (N)   \
@@ -698,18 +701,33 @@ preferable (int cost_a, int regcost_a, int cost_b, int regcost_b)
from COST macro to keep it simple.  */
 
 static int
-notreg_cost (rtx x, machine_mode mode, enum rtx_code outer, int opno)
+notreg_cost (rtx x, machine_mode mode, enum rtx_code /*outer*/, int /*opno*/)
 {
   scalar_int_mode int_mode, inner_mode;
-  return ((GET_CODE (x) == SUBREG
-  && REG_P (SUBREG_REG (x))
-  && is_int_mode (mode, &int_mode)
-  && is_int_mode (GET_MODE (SUBREG_REG (x)), &inner_mode)
-  && GET_MODE_SIZE (int_mode) < GET_MODE_SIZE (inner_mode)
-  && subreg_lowpart_p (x)
-  && TRULY_NOOP_TRUNCATION_MODES_P (int_mode, inner_mode))
- ? 0
- : rtx_cost (x, mode, outer, opno, optimize_this_for_speed_p) * 2);
+  if (GET_CODE (x) == SUBREG && REG_P (SUBREG_REG (x))
+  && is_int_mode (mode, &int_mode)
+  && is_int_mode (GET_MODE (SUBREG_REG (x)), &inner_mode)
+  && GET_MODE_SIZE (int_mode) < GET_MODE_SIZE (inner_mode)
+  && subreg_lowpart_p (x)
+  && TRULY_NOOP_TRUNCATION_MODES_P (int_mode, inner_mode))
+return 0;
+
+  if (estimate_insn == NULL)
+{
+  estimate_insn = make_insn_raw (
+   gen_rtx_SET (gen_rtx_REG (mode, LAST_VIRTUAL_REGISTER + 1), x));
+  SET_PREV_INSN (estimate_insn) = NULL_RTX;
+  SET_NEXT_INSN (estimate_insn) = NULL_RTX;
+  INSN_LOCATION (estimate_insn) = 0;
+}
+  else
+{
+  /* Update for new context.  */
+  INSN_CODE (estimate_insn) = -1;
+  PUT_MODE (SET_DEST (PATTERN (estimate_insn)), mode);
+  SET_SRC (PATTERN (estimate_insn)) = x;
+}
+  return insn_cost (estimate_insn, optimize_this_for_speed_p);
 }
 
 
@@ -6667,6 +6685,7 @@ cse_main (rtx_insn *f ATTRIBUTE_UNUSED, int nregs)
 
   init_recog ();
   init_alias_analysis ();
+  estimate_insn = NULL;
 
   reg_eqv_table = XNEWVEC (struct reg_eqv_elem, nregs);
 


[PATCH][committed][testsuite] vect: disable bitmask tests on sparc

2022-03-08 Thread Tamar Christina via Gcc-patches
Hi All,

These testcases require vect_int, which sparc also declares.  However,
sparc doesn't have an optab to vectorize comparisons, so the testcases fail to
vectorize and the tests fail.

As such, the best course of action is to just skip them on sparc, as comparisons
are somewhat expected from a target that can do SIMD.

Regtested on aarch64-none-linux-gnu and no issues.

Committed under the obvious rule.

Thanks,
Tamar

gcc/testsuite/ChangeLog:

PR tree-optimization/104755
* gcc.dg/vect/vect-bic-bitmask-10.c: Disable sparc.
* gcc.dg/vect/vect-bic-bitmask-11.c: Likewise.
* gcc.dg/vect/vect-bic-bitmask-12.c: Likewise.
* gcc.dg/vect/vect-bic-bitmask-2.c: Likewise.
* gcc.dg/vect/vect-bic-bitmask-23.c: Likewise.
* gcc.dg/vect/vect-bic-bitmask-3.c: Likewise.
* gcc.dg/vect/vect-bic-bitmask-4.c: Likewise.
* gcc.dg/vect/vect-bic-bitmask-5.c: Likewise.
* gcc.dg/vect/vect-bic-bitmask-6.c: Likewise.
* gcc.dg/vect/vect-bic-bitmask-8.c: Likewise.
* gcc.dg/vect/vect-bic-bitmask-9.c: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
index fe4f677b64dc96862683faf503eb4900a01e7407..e9ec9603af62b67afcf82bc79f66005a5d146be1 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-10.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
 /* { dg-do run } */
 /* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
index b77f4d42450fe6496d277a4429f0e051f5178781..06c103d38858dd15c21afc13cfc459312c0b4f5b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-11.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
 /* { dg-do run } */
 /* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-12.c 
b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-12.c
index 
30d36f452014cdb90eeccf6eb7f0a4cd6d8f8234..36ec5a8b19bb88ea69dcae0807b7c230510d3c25
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-12.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-12.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
 /* { dg-do assemble } */
 /* { dg-additional-options "-O3 -fdump-tree-dce -w" } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c 
b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
index 
58c0b9254badc2aeae01bd181a60830ed3eba44a..059bfb3ae62379c74b2075e3467c36fbb7d4077a
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-2.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
 /* { dg-do run } */
 /* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-23.c 
b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-23.c
index 
67119d32f751fa107b5d4927809e122c1bcbf3ef..5b4c3b6e19bcf17c03e93cec315f5940601e060e
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-23.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-23.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
 /* { dg-do assemble } */
 /* { dg-additional-options "-O1 -fdump-tree-dce -w" } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c 
b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
index 
58c0b9254badc2aeae01bd181a60830ed3eba44a..059bfb3ae62379c74b2075e3467c36fbb7d4077a
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-3.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
 /* { dg-do run } */
 /* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c 
b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
index 
6e2da41bac127d82a6a83f3e99c6f68b77ac2b42..91b82fb598871090a8dc1d78b12bc051902f621e
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-4.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
 /* { dg-do run } */
 /* { dg-additional-options "-O3 -save-temps -fdump-tree-dce -w" } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c 
b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c
index 
5ef0f46c0b1709db633d3aa801cd7211baef31ef..59f339fb8c58590f74e03c002a12d6b6f7bd7f7b
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bic-bitmask-5.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "missing optab for vectorization" { sparc*-*-* } } */
 /* { dg-

[PATCH 0/5] openmp: Handle pinned and unified shared memory.

2022-03-08 Thread Hafiz Abid Qadeer
This patch series adds support for unified shared memory (USM) and pinned
memory. The support in libgomp is for nvptx offloading only.  A new
command line option, -foffload-memory, allows the user to choose either USM
or pinned memory. USM can also be enabled using the requires construct.

When USM is in use, calls to memory allocation functions like malloc are
changed to omp_alloc with the appropriate allocator.  No transformations are
required for pinned memory, which is implemented using mlockall and so is
only available on Linux.
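
To illustrate the kind of user code this series targets, here is a minimal
sketch (not taken from the patches).  It assumes a compiler that accepts the
requires directive; the function name sum_on_device is made up for the
example.  Built without -fopenmp the pragmas are ignored and the loop simply
runs on the host.

```c
#include <stdlib.h>

/* Assumption: the compiler accepts this directive (the series is what
   makes GCC do so).  It must appear before any target construct.  */
#pragma omp requires unified_shared_memory

/* Hypothetical example function: sums 0..n-1 from a heap buffer that is
   never mapped explicitly.  With USM the device can dereference the
   malloc'ed pointer directly.  Returns -1 if the allocation fails.  */
long
sum_on_device (int n)
{
  int *data = malloc (n * sizeof *data);
  if (!data)
    return -1;

  for (int i = 0; i < n; i++)
    data[i] = i;

  long sum = 0;
  /* No map clause for data: the pointer value itself is usable on the
     device when unified shared memory is in effect.  */
  #pragma omp target teams distribute parallel for reduction(+:sum)
  for (int i = 0; i < n; i++)
    sum += data[i];

  free (data);
  return sum;
}
```

Per the description above, the same effect could be requested for unmodified
sources with -foffload-memory=unified instead of the directive.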

Andrew Stubbs (4):
  openmp: Add -foffload-memory
  openmp: allow requires unified_shared_memory
  openmp, nvptx: ompx_unified_shared_mem_alloc
  openmp: -foffload-memory=pinned

Hafiz Abid Qadeer (1):
  openmp: Use libgomp memory allocation functions with unified shared
memory.

 gcc/c/c-parser.cc |  13 +-
 gcc/common.opt|  16 ++
 gcc/coretypes.h   |   7 +
 gcc/cp/parser.cc  |  13 +-
 gcc/doc/invoke.texi   |  16 +-
 gcc/fortran/openmp.cc |  10 +-
 gcc/omp-low.cc| 220 ++
 gcc/passes.def|   1 +
 .../c-c++-common/gomp/alloc-pinned-1.c|  28 +++
 gcc/testsuite/c-c++-common/gomp/usm-1.c   |   4 +
 gcc/testsuite/c-c++-common/gomp/usm-2.c   |  34 +++
 gcc/testsuite/c-c++-common/gomp/usm-3.c   |  32 +++
 gcc/testsuite/g++.dg/gomp/usm-1.C |  32 +++
 gcc/testsuite/g++.dg/gomp/usm-2.C |  30 +++
 gcc/testsuite/g++.dg/gomp/usm-3.C |  38 +++
 gcc/testsuite/gfortran.dg/gomp/usm-1.f90  |   6 +
 gcc/testsuite/gfortran.dg/gomp/usm-2.f90  |  16 ++
 gcc/testsuite/gfortran.dg/gomp/usm-3.f90  |  13 ++
 gcc/tree-pass.h   |   1 +
 libgomp/allocator.c   |  13 +-
 libgomp/config/linux/allocator.c  |  70 --
 libgomp/config/nvptx/allocator.c  |   6 +
 libgomp/libgomp-plugin.h  |   3 +
 libgomp/libgomp.h |   6 +
 libgomp/libgomp.map   |   5 +
 libgomp/omp.h.in  |   4 +
 libgomp/omp_lib.f90.in|   8 +
 libgomp/plugin/plugin-nvptx.c |  45 +++-
 libgomp/target.c  |  70 ++
 libgomp/testsuite/libgomp.c++/usm-1.C |  54 +
 libgomp/testsuite/libgomp.c/alloc-pinned-7.c  |  66 ++
 libgomp/testsuite/libgomp.c/usm-1.c   |  24 ++
 libgomp/testsuite/libgomp.c/usm-2.c   |  32 +++
 libgomp/testsuite/libgomp.c/usm-3.c   |  35 +++
 libgomp/testsuite/libgomp.c/usm-4.c   |  36 +++
 libgomp/testsuite/libgomp.c/usm-5.c   |  28 +++
 libgomp/testsuite/libgomp.c/usm-6.c   |  70 ++
 37 files changed, 1075 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-3.c
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-3.C
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-3.f90
 create mode 100644 libgomp/testsuite/libgomp.c++/usm-1.C
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-7.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-6.c

-- 
2.25.1



[PATCH 1/5] openmp: Add -foffload-memory

2022-03-08 Thread Hafiz Abid Qadeer
From: Andrew Stubbs 

Add a new option.  It will be used in follow-up patches.

gcc/ChangeLog:

* common.opt: Add -foffload-memory and its enum values.
* coretypes.h (enum offload_memory): New.
* doc/invoke.texi: Document -foffload-memory.
---
 gcc/common.opt  | 16 
 gcc/coretypes.h |  7 +++
 gcc/doc/invoke.texi | 16 +++-
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 8b6513de47c..17426523e23 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2182,6 +2182,22 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
 EnumValue
 Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
 
+foffload-memory=
+Common Joined RejectNegative Enum(offload_memory) Var(flag_offload_memory) 
Init(OFFLOAD_MEMORY_NONE)
+-foffload-memory=[none|unified|pinned] Use an offload memory optimization.
+
+Enum
+Name(offload_memory) Type(enum offload_memory) UnknownError(Unknown offload 
memory option %qs)
+
+EnumValue
+Enum(offload_memory) String(none) Value(OFFLOAD_MEMORY_NONE)
+
+EnumValue
+Enum(offload_memory) String(unified) Value(OFFLOAD_MEMORY_UNIFIED)
+
+EnumValue
+Enum(offload_memory) String(pinned) Value(OFFLOAD_MEMORY_PINNED)
+
 fomit-frame-pointer
 Common Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames.
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 08b9ac9094c..dd52d5bb113 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -206,6 +206,13 @@ enum offload_abi {
   OFFLOAD_ABI_ILP32
 };
 
+/* Types of memory optimization for an offload device.  */
+enum offload_memory {
+  OFFLOAD_MEMORY_NONE,
+  OFFLOAD_MEMORY_UNIFIED,
+  OFFLOAD_MEMORY_PINNED
+};
+
 /* Types of profile update methods.  */
 enum profile_update {
   PROFILE_UPDATE_SINGLE,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 248ed534aee..d16019fc8c3 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -202,7 +202,7 @@ in the following sections.
 -fno-builtin  -fno-builtin-@var{function}  -fcond-mismatch @gol
 -ffreestanding  -fgimple  -fgnu-tm  -fgnu89-inline  -fhosted @gol
 -flax-vector-conversions  -fms-extensions @gol
--foffload=@var{arg}  -foffload-options=@var{arg} @gol
+-foffload=@var{arg}  -foffload-options=@var{arg} -foffload-memory=@var{arg} 
@gol
 -fopenacc  -fopenacc-dim=@var{geom} @gol
 -fopenmp  -fopenmp-simd @gol
 -fpermitted-flt-eval-methods=@var{standard} @gol
@@ -2694,6 +2694,20 @@ Typical command lines are
 -foffload-options=amdgcn-amdhsa=-march=gfx906 -foffload-options=-lm
 @end smallexample
 
+@item -foffload-memory=none
+@itemx -foffload-memory=unified
+@itemx -foffload-memory=pinned
+@opindex foffload-memory
+@cindex OpenMP offloading memory modes
+Enable a memory optimization mode to use with OpenMP.  The default behavior,
+@option{-foffload-memory=none}, is to do nothing special (unless enabled via
+a requires directive in the code).  @option{-foffload-memory=unified} is
+equivalent to @code{#pragma omp requires unified_shared_memory}.
+@option{-foffload-memory=pinned} forces all host memory to be pinned (this
+mode may require the user to increase the ulimit setting for locked memory).
+All translation units must select the same setting to avoid undefined
+behavior.
+
 @item -fopenacc
 @opindex fopenacc
 @cindex OpenACC accelerator programming
-- 
2.25.1



[PATCH 2/5] openmp: allow requires unified_shared_memory

2022-03-08 Thread Hafiz Abid Qadeer
From: Andrew Stubbs 

This is the front-end portion of the Unified Shared Memory implementation.
It removes the "sorry, unimplemented" message in C, C++, and Fortran, and sets
flag_offload_memory, but is otherwise inactive, for now.

It also checks that -foffload-memory isn't set to an incompatible mode.
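
The compatibility rule can be summarised by the following stand-alone sketch.
It is not the front-end code: the enum is copied from patch 1/5, while
apply_requires_usm is a hypothetical helper that returns the result instead
of calling error_at.

```c
#include <stdbool.h>

/* Same values as the enum added to coretypes.h in patch 1/5.  */
enum offload_memory {
  OFFLOAD_MEMORY_NONE,
  OFFLOAD_MEMORY_UNIFIED,
  OFFLOAD_MEMORY_PINNED
};

/* Hypothetical helper modelling what the parsers do when they see
   "requires unified_shared_memory": the clause is compatible only with
   the "none" and "unified" modes, and the mode becomes "unified"
   afterwards in either case.  Returns false where the front ends
   would emit an error.  */
bool
apply_requires_usm (enum offload_memory *mode)
{
  bool compatible = (*mode == OFFLOAD_MEMORY_UNIFIED
                     || *mode == OFFLOAD_MEMORY_NONE);
  *mode = OFFLOAD_MEMORY_UNIFIED;
  return compatible;
}
```

So -foffload-memory=pinned combined with the directive is the one rejected
combination.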

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_requires): Allow "requires
  unified_shared_memory".

gcc/cp/ChangeLog:

* parser.cc (cp_parser_omp_requires): Allow "requires
unified_shared_memory".

gcc/fortran/ChangeLog:

* openmp.cc (gfc_match_omp_requires): Allow "requires
unified_shared_memory".

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/usm-1.c: New test.
* gfortran.dg/gomp/usm-1.f90: New test.
---
 gcc/c/c-parser.cc| 13 -
 gcc/cp/parser.cc | 13 -
 gcc/fortran/openmp.cc| 10 +-
 gcc/testsuite/c-c++-common/gomp/usm-1.c  |  4 
 gcc/testsuite/gfortran.dg/gomp/usm-1.f90 |  6 ++
 5 files changed, 43 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-1.c
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-1.f90

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 84deac04c44..dc834158d1c 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -22542,7 +22542,16 @@ c_parser_omp_requires (c_parser *parser)
  if (!strcmp (p, "unified_address"))
this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
  else if (!strcmp (p, "unified_shared_memory"))
+ {
this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
+
+   if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+   && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+ error_at (cloc,
+   "unified_shared_memory is incompatible with the "
+   "selected -foffload-memory option");
+   flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+ }
  else if (!strcmp (p, "dynamic_allocators"))
this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS;
  else if (!strcmp (p, "reverse_offload"))
@@ -22609,7 +22618,9 @@ c_parser_omp_requires (c_parser *parser)
  c_parser_skip_to_pragma_eol (parser, false);
  return;
}
- if (p && this_req != OMP_REQUIRES_DYNAMIC_ALLOCATORS)
+ if (p
+ && this_req != OMP_REQUIRES_DYNAMIC_ALLOCATORS
+ && this_req != OMP_REQUIRES_UNIFIED_SHARED_MEMORY)
sorry_at (cloc, "%qs clause on %<requires%> directive not "
"supported yet", p);
  if (p)
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 03d99aba13e..ba263152aaf 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -46464,7 +46464,16 @@ cp_parser_omp_requires (cp_parser *parser, cp_token 
*pragma_tok)
  if (!strcmp (p, "unified_address"))
this_req = OMP_REQUIRES_UNIFIED_ADDRESS;
  else if (!strcmp (p, "unified_shared_memory"))
+ {
this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY;
+
+   if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+   && flag_offload_memory != OFFLOAD_MEMORY_NONE)
+ error_at (cloc,
+   "unified_shared_memory is incompatible with the "
+   "selected -foffload-memory option");
+   flag_offload_memory = OFFLOAD_MEMORY_UNIFIED;
+ }
  else if (!strcmp (p, "dynamic_allocators"))
this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS;
  else if (!strcmp (p, "reverse_offload"))
@@ -46537,7 +46546,9 @@ cp_parser_omp_requires (cp_parser *parser, cp_token 
*pragma_tok)
  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
  return false;
}
- if (p && this_req != OMP_REQUIRES_DYNAMIC_ALLOCATORS)
+ if (p
+ && this_req != OMP_REQUIRES_DYNAMIC_ALLOCATORS
+ && this_req != OMP_REQUIRES_UNIFIED_SHARED_MEMORY)
sorry_at (cloc, "%qs clause on %<requires%> directive not "
"supported yet", p);
  if (p)
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 16cd03a3d67..1f434857719 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "gomp-constants.h"
 #include "target-memory.h"  /* For gfc_encode_character.  */
+#include "options.h"
 
 /* Match an end of OpenMP directive.  End of OpenMP directive is optional
whitespace, followed by '\n' or comment '!'.  */
@@ -5373,6 +5374,12 @@ gfc_match_omp_requires (void)
  requires_clause = OMP_REQ_UNIFIED_SHARED_MEMORY;
  if (requires_clauses & OMP_REQ_UNIFIED_SHARED_MEMORY)
goto duplicate_clause;
+
+ if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED
+ && flag_offload_memory != OFFLOAD_M

[PATCH 3/5] openmp, nvptx: ompx_unified_shared_mem_alloc

2022-03-08 Thread Hafiz Abid Qadeer
From: Andrew Stubbs 

This adds support for using Cuda Managed Memory with omp_alloc.  It will be
used as the underpinnings for "requires unified_shared_memory" in a later
patch.

There are two new predefined allocators, ompx_unified_shared_mem_alloc and
ompx_host_mem_alloc, plus corresponding memory spaces, which can be used to
allocate memory in the "managed" space and explicitly on the host (it is
intended that "malloc" will be intercepted by the compiler).

The nvptx plugin is modified to make the necessary Cuda calls, and libgomp
is modified to switch to shared-memory mode for USM allocated mappings.
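
As a rough usage sketch (not part of the patch), explicit allocation from the
new allocator might look like the code below.  HAVE_OMPX_USM_ALLOC is a
hypothetical macro to be defined only when building against a libgomp that
carries this series; without it the sketch falls back to malloc/free so it
still builds and runs on the host.  scaled_sum is an invented name.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumption: HAVE_OMPX_USM_ALLOC is a made-up configuration macro, not
   something the patch defines.  */
#ifdef HAVE_OMPX_USM_ALLOC
#include <omp.h>
#define USM_ALLOC(n) omp_alloc ((n), ompx_unified_shared_mem_alloc)
#define USM_FREE(p)  omp_free ((p), ompx_unified_shared_mem_alloc)
#else
#define USM_ALLOC(n) malloc (n)
#define USM_FREE(p)  free (p)
#endif

/* Fill a buffer with ones on the host, scale it in a target region and
   return the sum.  Returns -1.0 if the allocation fails.  */
double
scaled_sum (size_t n, double factor)
{
  double *v = USM_ALLOC (n * sizeof *v);
  if (!v)
    return -1.0;

  for (size_t i = 0; i < n; i++)
    v[i] = 1.0;

  /* The map clause is kept; per the description above, libgomp is
     expected to recognise USM pointers and share rather than copy.  */
  #pragma omp target teams distribute parallel for map(tofrom: v[0:n])
  for (size_t i = 0; i < n; i++)
    v[i] *= factor;

  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += v[i];

  USM_FREE (v);
  return sum;
}
```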

libgomp/ChangeLog:

* allocator.c (omp_max_predefined_alloc): Update.
(omp_aligned_alloc): Don't fallback ompx_host_mem_alloc.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* config/linux/allocator.c (linux_memspace_alloc): Handle USM.
(linux_memspace_calloc): Handle USM.
(linux_memspace_free): Handle USM.
(linux_memspace_realloc): Handle USM.
* config/nvptx/allocator.c (nvptx_memspace_alloc): Reject
ompx_host_mem_alloc.
(nvptx_memspace_calloc): Likewise.
(nvptx_memspace_realloc): Likewise.
* libgomp-plugin.h (GOMP_OFFLOAD_usm_alloc): New prototype.
(GOMP_OFFLOAD_usm_free): New prototype.
(GOMP_OFFLOAD_is_usm_ptr): New prototype.
* libgomp.h (gomp_usm_alloc): New prototype.
(gomp_usm_free): New prototype.
(gomp_is_usm_ptr): New prototype.
(struct gomp_device_descr): Add USM functions.
* omp.h.in (omp_memspace_handle_t): Add ompx_unified_shared_mem_space
and ompx_host_mem_space.
(omp_allocator_handle_t): Add ompx_unified_shared_mem_alloc and
ompx_host_mem_alloc.
* omp_lib.f90.in: Likewise.
* plugin/plugin-nvptx.c (nvptx_alloc): Add "usm" parameter.
Call cuMemAllocManaged as appropriate.
(GOMP_OFFLOAD_alloc): Move internals to ...
(GOMP_OFFLOAD_alloc_1): ... this, and add usm parameter.
(GOMP_OFFLOAD_usm_alloc): New function.
(GOMP_OFFLOAD_usm_free): New function.
(GOMP_OFFLOAD_is_usm_ptr): New function.
* target.c (gomp_map_vars_internal): Add USM support.
(gomp_usm_alloc): New function.
(gomp_usm_free): New function.
(gomp_load_plugin_for_device): New function.
* testsuite/libgomp.c/usm-1.c: New test.
* testsuite/libgomp.c/usm-2.c: New test.
* testsuite/libgomp.c/usm-3.c: New test.
* testsuite/libgomp.c/usm-4.c: New test.
* testsuite/libgomp.c/usm-5.c: New test.
---
 libgomp/allocator.c | 13 --
 libgomp/config/linux/allocator.c| 48 
 libgomp/config/nvptx/allocator.c|  6 +++
 libgomp/libgomp-plugin.h|  3 ++
 libgomp/libgomp.h   |  6 +++
 libgomp/omp.h.in|  4 ++
 libgomp/omp_lib.f90.in  |  8 
 libgomp/plugin/plugin-nvptx.c   | 45 ---
 libgomp/target.c| 70 +
 libgomp/testsuite/libgomp.c/usm-1.c | 24 ++
 libgomp/testsuite/libgomp.c/usm-2.c | 32 +
 libgomp/testsuite/libgomp.c/usm-3.c | 35 +++
 libgomp/testsuite/libgomp.c/usm-4.c | 36 +++
 libgomp/testsuite/libgomp.c/usm-5.c | 28 
 14 files changed, 330 insertions(+), 28 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/usm-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/usm-5.c

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 000ccc2dd9c..18045dbe0c4 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -32,7 +32,7 @@
 #include 
 #include 
 
-#define omp_max_predefined_alloc ompx_pinned_mem_alloc
+#define omp_max_predefined_alloc ompx_host_mem_alloc
 
 /* These macros may be overridden in config//allocator.c.  */
 #ifndef MEMSPACE_ALLOC
@@ -68,6 +68,8 @@ static const omp_memspace_handle_t predefined_alloc_mapping[] 
= {
   omp_low_lat_mem_space,   /* omp_pteam_mem_alloc. */
   omp_low_lat_mem_space,   /* omp_thread_mem_alloc. */
   omp_default_mem_space,   /* ompx_pinned_mem_alloc. */
+  ompx_unified_shared_mem_space,  /* ompx_unified_shared_mem_alloc. */
+  ompx_host_mem_space, /* ompx_host_mem_alloc.  */
 };
 
 struct omp_allocator_data
@@ -367,7 +369,8 @@ fail:
   int fallback = (allocator_data
  ? allocator_data->fallback
  : (allocator == omp_default_mem_alloc
-|| allocator == ompx_pinned_mem_alloc)
+|| allocator == ompx_pinned_mem_alloc
+|| allocator == ompx_host_mem_alloc)
  ? omp_atv_null_fb
  : omp_atv_default_mem_fb);
   switch (fallback)
@@ -597,7 +600,8 @@ fail:
   

[PATCH 4/5] openmp: Use libgomp memory allocation functions with unified shared memory.

2022-03-08 Thread Hafiz Abid Qadeer
This patch changes calls to malloc/free/calloc/realloc and operator new into
calls to the memory allocation functions in libgomp with
allocator=ompx_unified_shared_mem_alloc.  This helps existing code benefit
from unified shared memory.  libgomp does the correct thing with all
the mapping constructs, and there are no memory copies if the pointer points
to unified shared memory.

We only replace the replaceable operator new, not class member or placement
new.
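
The name mapping performed by the pass can be pictured with the small lookup
below.  This is only an illustration: usm_replacement is a hypothetical
helper, and the realloc and free entries are assumptions, since the portion
of the patch shown here only covers omp_alloc and omp_calloc.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the libgomp function that a call to
   CALLEE would be redirected to, or NULL if the call is left alone.
   The real pass works on GIMPLE calls and also appends the allocator
   argument; this sketch only models the name mapping.  */
const char *
usm_replacement (const char *callee)
{
  static const struct { const char *from, *to; } map[] = {
    { "malloc",  "omp_alloc"   },
    { "calloc",  "omp_calloc"  },
    { "realloc", "omp_realloc" },  /* Assumed, not shown in the excerpt.  */
    { "free",    "omp_free"    },  /* Assumed, not shown in the excerpt.  */
  };

  for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
    if (strcmp (callee, map[i].from) == 0)
      return map[i].to;
  return NULL;
}
```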

gcc/ChangeLog:

* omp-low.cc (usm_transform): New function.
(make_pass_usm_transform): Likewise.
(class pass_usm_transform): New.
* passes.def: Add pass_usm_transform.
* tree-pass.h (make_pass_usm_transform): New declaration.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/usm-2.c: New test.
* c-c++-common/gomp/usm-3.c: New test.
* g++.dg/gomp/usm-1.C: New test.
* g++.dg/gomp/usm-2.C: New test.
* g++.dg/gomp/usm-3.C: New test.
* gfortran.dg/gomp/usm-2.f90: New test.
* gfortran.dg/gomp/usm-3.f90: New test.

libgomp/ChangeLog:

* testsuite/libgomp.c/usm-6.c: New test.
* testsuite/libgomp.c++/usm-1.C: Likewise.
---
 gcc/omp-low.cc   | 152 +++
 gcc/passes.def   |   1 +
 gcc/testsuite/c-c++-common/gomp/usm-2.c  |  34 +
 gcc/testsuite/c-c++-common/gomp/usm-3.c  |  32 +
 gcc/testsuite/g++.dg/gomp/usm-1.C|  32 +
 gcc/testsuite/g++.dg/gomp/usm-2.C|  30 +
 gcc/testsuite/g++.dg/gomp/usm-3.C|  38 ++
 gcc/testsuite/gfortran.dg/gomp/usm-2.f90 |  16 +++
 gcc/testsuite/gfortran.dg/gomp/usm-3.f90 |  13 ++
 gcc/tree-pass.h  |   1 +
 libgomp/testsuite/libgomp.c++/usm-1.C|  54 
 libgomp/testsuite/libgomp.c/usm-6.c  |  70 +++
 12 files changed, 473 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-3.c
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/usm-3.C
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-3.f90
 create mode 100644 libgomp/testsuite/libgomp.c++/usm-1.C
 create mode 100644 libgomp/testsuite/libgomp.c/usm-6.c

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 5ce3a50709a..ec08d59f676 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -14849,6 +14849,158 @@ make_pass_diagnose_omp_blocks (gcc::context *ctxt)
 {
   return new pass_diagnose_omp_blocks (ctxt);
 }
+
+/* Provide transformation required for using unified shared memory
+   by replacing calls to standard memory allocation functions with
+   function provided by the libgomp.  */
+
+static tree
+usm_transform (gimple_stmt_iterator *gsi_p, bool *,
+  struct walk_stmt_info *wi)
+{
+  gimple *stmt = gsi_stmt (*gsi_p);
+  /* ompx_unified_shared_mem_alloc is 10.  */
+  const unsigned int unified_shared_mem_alloc = 10;
+
+  switch (gimple_code (stmt))
+{
+case GIMPLE_CALL:
+  {
+   gcall *gs = as_a <gcall *> (stmt);
+   tree fndecl = gimple_call_fndecl (gs);
+   if (fndecl)
+ {
+   tree allocator = build_int_cst (pointer_sized_int_node,
+   unified_shared_mem_alloc);
+   const char *name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+   if ((strcmp (name, "malloc") == 0)
+|| (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL)
+&& DECL_FUNCTION_CODE (fndecl) == BUILT_IN_MALLOC)
+|| DECL_IS_REPLACEABLE_OPERATOR_NEW_P (fndecl))
+ {
+ tree omp_alloc_type
+   = build_function_type_list (ptr_type_node, size_type_node,
+   pointer_sized_int_node,
+   NULL_TREE);
+   tree repl = build_fn_decl ("omp_alloc", omp_alloc_type);
+   tree size = gimple_call_arg (gs, 0);
+   gimple *g = gimple_build_call (repl, 2, size, allocator);
+   gimple_call_set_lhs (g, gimple_call_lhs (gs));
+   gimple_set_location (g, gimple_location (stmt));
+   gsi_replace (gsi_p, g, true);
+ }
+   else if ((strcmp (name, "calloc") == 0)
+ || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL)
+ && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_CALLOC))
+ {
+   tree omp_calloc_type
+ = build_function_type_list (ptr_type_node, size_type_node,
+ size_type_node,
+ pointer_sized_int_node,
+ NULL_TREE);
+   tree repl = build_fn_decl ("omp_calloc", omp_calloc_type);
+

[PATCH 5/5] openmp: -foffload-memory=pinned

2022-03-08 Thread Hafiz Abid Qadeer
From: Andrew Stubbs 

Implement the -foffload-memory=pinned option such that libgomp is
instructed to enable fully-pinned memory at start-up.  The option is
intended to provide a performance boost to certain offload programs without
modifying the code.

This feature only works on Linux, at present, and simply calls mlockall to
enable always-on memory pinning.  It requires that the ulimit for locked
memory is set high enough to accommodate all the program's memory usage.

In this mode the ompx_pinned_mem_alloc feature is disabled as it is not
needed and may conflict.
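
For readers unfamiliar with mlockall, the runtime side could look roughly
like this sketch.  It is not the libgomp implementation: enable_pinned_mode
is a stand-in name, and the MCL_CURRENT | MCL_FUTURE flags are an assumption
about what "always-on" pinning would need.

```c
#include <stdbool.h>
#include <sys/mman.h>

/* Set once pinning has been enabled successfully.  */
static bool pinned_mode_enabled = false;

/* Hypothetical stand-in for the libgomp entry point.  Locks all current
   and future pages of the process into RAM.  Returns false if the
   kernel refuses, e.g. because the locked-memory limit is too low.  */
bool
enable_pinned_mode (void)
{
  if (pinned_mode_enabled)
    return true;

  /* Assumed flags: pin what is mapped now and everything mapped later.  */
  if (mlockall (MCL_CURRENT | MCL_FUTURE) != 0)
    return false;

  pinned_mode_enabled = true;
  return true;
}
```

A failure here is the case where the user would need to raise the
locked-memory limit (ulimit -l) before running the program.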

gcc/ChangeLog:

* omp-low.cc (omp_enable_pinned_mode): New function.
(execute_lower_omp): Call omp_enable_pinned_mode.

libgomp/ChangeLog:

* config/linux/allocator.c (always_pinned_mode): New variable.
(GOMP_enable_pinned_mode): New function.
(linux_memspace_alloc): Disable pinning when always_pinned_mode set.
(linux_memspace_calloc): Likewise.
(linux_memspace_free): Likewise.
(linux_memspace_realloc): Likewise.
* libgomp.map (GOMP_5.1.1): New version space with
GOMP_enable_pinned_mode.
* testsuite/libgomp.c/alloc-pinned-7.c: New test.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/alloc-pinned-1.c: New test.
---
 gcc/omp-low.cc| 68 +++
 .../c-c++-common/gomp/alloc-pinned-1.c| 28 
 libgomp/config/linux/allocator.c  | 26 +++
 libgomp/libgomp.map   |  5 ++
 libgomp/testsuite/libgomp.c/alloc-pinned-7.c  | 66 ++
 5 files changed, 193 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-7.c

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index ec08d59f676..ce21b3bd6f8 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -14441,6 +14441,70 @@ lower_omp (gimple_seq *body, omp_context *ctx)
   input_location = saved_location;
 }
 
+/* Emit a constructor function to enable -foffload-memory=pinned
+   at runtime.  Libgomp handles the OS mode setting, but we need to trigger
+   it by calling GOMP_enable_pinned_mode before the program proper runs.  */
+
+static void
+omp_enable_pinned_mode ()
+{
+  static bool visited = false;
+  if (visited)
+return;
+  visited = true;
+
+  /* Create a new function like this:
+
+   static void __attribute__((constructor))
+   __set_pinned_mode ()
+   {
+GOMP_enable_pinned_mode ();
+   }
+  */
+
+  tree name = get_identifier ("__set_pinned_mode");
+  tree voidfntype = build_function_type_list (void_type_node, NULL_TREE);
+  tree decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL, name, voidfntype);
+
+  TREE_STATIC (decl) = 1;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 0;
+  TREE_PUBLIC (decl) = 0;
+  DECL_UNINLINABLE (decl) = 1;
+  DECL_EXTERNAL (decl) = 0;
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  BLOCK_SUPERCONTEXT (DECL_INITIAL (decl)) = decl;
+  DECL_STATIC_CONSTRUCTOR (decl) = 1;
+  DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("constructor"),
+ NULL_TREE, NULL_TREE);
+
+  tree t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE,
+  void_type_node);
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_CONTEXT (t) = decl;
+  DECL_RESULT (decl) = t;
+
+  push_struct_function (decl);
+  init_tree_ssa (cfun);
+
+  tree callname = get_identifier ("GOMP_enable_pinned_mode");
+  tree calldecl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL, callname,
+ voidfntype);
+  gcall *call = gimple_build_call (calldecl, 0);
+
+  gimple_seq seq = NULL;
+  gimple_seq_add_stmt (&seq, call);
+  gimple_set_body (decl, gimple_build_bind (NULL_TREE, seq, NULL));
+
+  cfun->function_end_locus = UNKNOWN_LOCATION;
+  cfun->curr_properties |= PROP_gimple_any;
+  pop_cfun ();
+  cgraph_node::add_new_function (decl, true);
+}
+
 /* Main entry point.  */
 
 static unsigned int
@@ -14497,6 +14561,10 @@ execute_lower_omp (void)
   for (auto task_stmt : task_cpyfns)
 finalize_task_copyfn (task_stmt);
   task_cpyfns.release ();
+
+  if (flag_offload_memory == OFFLOAD_MEMORY_PINNED)
+omp_enable_pinned_mode ();
+
   return 0;
 }
 
diff --git a/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c 
b/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c
new file mode 100644
index 000..e0e08019bff
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-additional-options "-foffload-memory=pinned" } */
+/* { dg-xfail-run-if "Pinning not implemented on this host" { ! *-*-linux-gnu 
} } */
+
+#if __cplusplus
+#define EXTERNC extern "C"
+#else
+#define EXTERNC
+#endif
+
+/* Intercept the libgomp initialization call to check it happens.  */
+
+int good = 0;
+
+EXTERNC void
+GOMP_enable_pinned_m

Re: [PATCH] Check if loading const from mem is faster

2022-03-08 Thread Richard Biener via Gcc-patches
On Tue, 8 Mar 2022, Jiufu Guo wrote:

> Jiufu Guo  writes:
> 
> Hi!
> 
> > Hi Segher,
> >
> > Segher Boessenkool  writes:
> >
> >> On Tue, Mar 01, 2022 at 10:28:57PM +0800, Jiufu Guo wrote:
> >>> Segher Boessenkool  writes:
> >>> > No.  insn_cost is only for correct, existing instructions, not for
> >>> > made-up nonsense.  I created insn_cost precisely to get away from that
> >>> > aspect of rtx_cost (and some other issues, like, it is incredibly hard
> >>> > and cumbersome to write a correct rtx_cost).
> >>> 
> >>> Thanks! The implementations of hook insn_cost are aligned with this
> >>> design, they are checking insn's attributes and COSTS_N_INSNS.
> >>> 
> >>> One question on the special case:
> >>> For instruction: "r119:DI=0x100803004101001"
> >>> Would we treat it as a valid instruction?
> >>
> >> Currently we do, alternative 6 in *movdi_internal64: we allow any r<-n.
> >> This is costed as 5 insns (cost=20).
> >>
> >> It generally is better to split things into patterns close to the
> >> eventual machine instructions as early as possible: all the more generic
> >> optimisations can take advantage of that then.
> > Get it!
> >>
> >>> A patch, which is attached at the end of this mail, accepts
> >>> "r119:DI=0x100803004101001" as input of insn_cost.
> >>> In this patch,
> >>> - A tmp instruction is generated via make_insn_raw.
> >>> - A few calls to rtx_cost (in cse_insn) are replaced by insn_cost.
> >>> - In the insn_cost hook, the special 'constant' instruction is checked.
> >>> Do these make sense?
> >>
> >> I'll review that patch inline.
> 
> I drafted a new patch that replaces rtx_cost with insn_cost for cse.cc.
> Different from the previous partial patch, this patch replaces all usage
> of rtx_cost. It may be better/more aggressive than the previous one.

I think there's no advantage for using insn_cost over rtx_cost for
the simple SET case.

Richard.

> With this patch, bootstrap passes.
> From regtest, only the output of fusion-p10-ldcmpi.c is changed, and the
> change seems as expected.
> 
> 
> BR,
> Jiufu
> 
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index a18b599d324..e623ad298db 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -262,6 +262,9 @@ static struct qty_table_elem *qty_table;
>  static rtx_insn *this_insn;
>  static bool optimize_this_for_speed_p;
>  
> +/* Used for insn_cost. */
> +static rtx_insn *estimate_insn;
> +
>  /* Index by register number, gives the number of the next (or
> previous) register in the chain of registers sharing the same
> value.
> @@ -445,7 +448,7 @@ struct table_elt
>  /* Compute cost of X, as stored in the `cost' field of a table_elt.  Fixed
> hard registers and pointers into the frame are the cheapest with a cost
> of 0.  Next come pseudos with a cost of one and other hard registers with
> -   a cost of 2.  Aside from these special cases, call `rtx_cost'.  */
> +   a cost of 2.  Aside from these special cases, call `insn_cost'.  */
>  
>  #define CHEAP_REGNO(N)   
> \
>(REGNO_PTR_FRAME_P (N) \
> @@ -698,18 +701,33 @@ preferable (int cost_a, int regcost_a, int cost_b, int 
> regcost_b)
> from COST macro to keep it simple.  */
>  
>  static int
> -notreg_cost (rtx x, machine_mode mode, enum rtx_code outer, int opno)
> +notreg_cost (rtx x, machine_mode mode, enum rtx_code /*outer*/, int /*opno*/)
>  {
>scalar_int_mode int_mode, inner_mode;
> -  return ((GET_CODE (x) == SUBREG
> -&& REG_P (SUBREG_REG (x))
> -&& is_int_mode (mode, &int_mode)
> -&& is_int_mode (GET_MODE (SUBREG_REG (x)), &inner_mode)
> -&& GET_MODE_SIZE (int_mode) < GET_MODE_SIZE (inner_mode)
> -&& subreg_lowpart_p (x)
> -&& TRULY_NOOP_TRUNCATION_MODES_P (int_mode, inner_mode))
> -   ? 0
> -   : rtx_cost (x, mode, outer, opno, optimize_this_for_speed_p) * 2);
> +  if (GET_CODE (x) == SUBREG && REG_P (SUBREG_REG (x))
> +  && is_int_mode (mode, &int_mode)
> +  && is_int_mode (GET_MODE (SUBREG_REG (x)), &inner_mode)
> +  && GET_MODE_SIZE (int_mode) < GET_MODE_SIZE (inner_mode)
> +  && subreg_lowpart_p (x)
> +  && TRULY_NOOP_TRUNCATION_MODES_P (int_mode, inner_mode))
> +return 0;
> +
> +  if (estimate_insn == NULL)
> +{
> +  estimate_insn = make_insn_raw (
> + gen_rtx_SET (gen_rtx_REG (mode, LAST_VIRTUAL_REGISTER + 1), x));
> +  SET_PREV_INSN (estimate_insn) = NULL_RTX;
> +  SET_NEXT_INSN (estimate_insn) = NULL_RTX;
> +  INSN_LOCATION (estimate_insn) = 0;
> +}
> +  else
> +{
> +  /* Update for new context.  */
> +  INSN_CODE (estimate_insn) = -1;
> +  PUT_MODE (SET_DEST (PATTERN (estimate_insn)), mode);
> +  SET_SRC (PATTERN (estimate_insn)) = x;
> +}
> +  return insn_cost (estimate_insn, optimize_this_for_speed_p);
>  }
>  
>  
> @@ -6667,6 +6685,7 @@ cse_main (rtx_insn *f ATTRIBUTE_UNUSED, int nregs)
>  
>init_recog ();
>init_alias_analysis ();
> +  estimat

Re: [PATCH v3] x86: Disable SSE on unwind-c.c and unwind-dw2.c

2022-03-08 Thread Jakub Jelinek via Gcc-patches
On Tue, Mar 08, 2022 at 12:15:15PM +0100, Jakub Jelinek via Gcc-patches wrote:
> > --- gcc/config/i386/i386.h.jj   2022-02-25 12:06:45.535493490 +0100
> > +++ gcc/config/i386/i386.h  2022-03-08 11:20:43.207043370 +0100
> > @@ -2848,6 +2848,10 @@ extern enum attr_cpu ix86_schedule;
> >  #define NUM_X86_64_MS_CLOBBERED_REGS 12
> >  #endif
> >  
> > +/* __builtin_eh_return can't handle stack realignment, so disable SSE in
> > +   libgcc functions that call it.  */
> > +#define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-sse")))
> > +
> >  /*
> >  Local variables:
> >  version-control: t
> > 
> > 
> > As mentioned in PR104838, this likely isn't specific to just Solaris and
> > cygwin/mingw.  Fedora uses -msse2 -mfpmath=sse -mstackrealign in its 
> > C{,XX}FLAGS
> > among other things for i686.
> 
> Now verified that with your full patch applied gcc on Fedora/i686 doesn't
> build (gets those sorry messages when compiling unwind-dw2), while if I
> replace those 2 libgcc hunks from your patch with the above one it seems to
> get past the previous hanging point of gnat1 processes.

Though, perhaps it should be
#ifndef __x86_64__
#define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-sse")))
#endif
or something similar, on x86-64 one at least normally doesn't use lower
stack realignment unless avx or later.  Maybe we want to use
no-avx for the x86-64 case though.
Disabling sse/sse2 might be a problem especially on mingw where we need to
restore SSE registers in the EH return, no?

Even better would be to make __builtin_eh_return work even with DRAP,
but I admit I haven't understood what exactly is the problem that prevents
it from working.

Jakub
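
The conditional definition suggested above can be sketched in isolation. This is only an illustration of the pattern: DEMO_UNWIND_ATTRIBUTE and demo_unwind_step are made-up names, not part of GCC or libgcc, and the real macro would live in the target headers.

```c
/* Hypothetical sketch: the DEMO_ names are not part of GCC or libgcc.  */
#if defined(__GNUC__) && defined(__i386__)
/* 32-bit x86 only: keep SSE out of the function so that no stack
   realignment is needed for vector spills.  */
# define DEMO_UNWIND_ATTRIBUTE __attribute__((target ("no-sse")))
#else
/* Everywhere else (including x86-64) the attribute expands to nothing.  */
# define DEMO_UNWIND_ATTRIBUTE
#endif

/* Stand-in for an unwinder routine that would end in
   __builtin_eh_return; here it only does integer work.  */
DEMO_UNWIND_ATTRIBUTE
int demo_unwind_step (int frames_left)
{
  return frames_left > 0 ? frames_left - 1 : 0;
}
```

Keeping the attribute behind a macro means the unwinder sources themselves stay unchanged on targets where the restriction is not wanted.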



Re: [PATCH v3] x86: Disable SSE on unwind-c.c and unwind-dw2.c

2022-03-08 Thread Eric Botcazou via Gcc-patches
> Disabling sse/sse2 might be a problem especially on mingw where we need to
> restore SSE registers in the EH return, no?

Not in 32-bit mode I think, all XMM registers are call used.

-- 
Eric Botcazou




[committed] libstdc++: Remove incorrect copyright notice from header

2022-03-08 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk. Backport to gcc-11 to follow.

-- >8 --

This file has the SGI copyright notice, but contains no code from
the SGI STL. It was entirely written by me in 2019, originally as part
of the <memory> header. When I extracted it into a new header I
accidentally copied across the SGI copyright, but that only applies to
some much older parts of <memory>.

libstdc++-v3/ChangeLog:

* include/bits/uses_allocator_args.h: Remove incorrect copyright
notice.
---
 libstdc++-v3/include/bits/uses_allocator_args.h | 14 --
 1 file changed, 14 deletions(-)

diff --git a/libstdc++-v3/include/bits/uses_allocator_args.h 
b/libstdc++-v3/include/bits/uses_allocator_args.h
index d92dd1f0901..09cdbf1aaa8 100644
--- a/libstdc++-v3/include/bits/uses_allocator_args.h
+++ b/libstdc++-v3/include/bits/uses_allocator_args.h
@@ -22,20 +22,6 @@
 // see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 // <http://www.gnu.org/licenses/>.
 
-/*
- * Copyright (c) 1997-1999
- * Silicon Graphics Computer Systems, Inc.
- *
- * Permission to use, copy, modify, distribute and sell this software
- * and its documentation for any purpose is hereby granted without fee,
- * provided that the above copyright notice appear in all copies and
- * that both that copyright notice and this permission notice appear
- * in supporting documentation.  Silicon Graphics makes no
- * representations about the suitability of this software for any
- * purpose.  It is provided "as is" without express or implied warranty.
- *
- */
-
 /** @file include/bits/uses_allocator_args.h
  *  This is an internal header file, included by other library headers.
  *  Do not attempt to use it directly. @headername{memory}
-- 
2.34.1



[PATCH][RFC] tree-optimization/84201 - add --param vect-induction-float

2022-03-08 Thread Richard Biener via Gcc-patches
This adds a --param to allow disabling of vectorization of
floating point inductions.  On top of -Ofast this should allow
549.fotonik3d_r to not miscompare.

While I thought of a more elaborate way of disabling certain
vectorization kinds (reductions also came to my mind) this
for now simply uses a --param rather than some sophisticated -fvectorize-*
scheme.

Bootstrapped and tested on x86_64-unknown-linux-gnu.  I've
verified that 549.fotonik3d_r miscompares with -Ofast -march=znver2
and passes when adding --param vect-induction-float=0 which
should be valid at least for peak (but I guess also base for
FOPTIMIZE for example).  I did not benchmark against other
workarounds (it has been said -fno-unsafe-math-optimizations
or other similar things work as well).

OK for trunk?

Thanks,
Richard.

2022-03-08  Richard Biener  

PR tree-optimization/84201
* params.opt (-param=vect-induction-float): Add.
* doc/invoke.texi (vect-induction-float): Document.
* tree-vect-loop.cc (vectorizable_induction): Honor
param_vect_induction_float.

* gcc.dg/vect/pr84201.c: New testcase.
---
 gcc/doc/invoke.texi |  3 +++
 gcc/params.opt  |  4 
 gcc/testsuite/gcc.dg/vect/pr84201.c | 22 ++
 gcc/tree-vect-loop.cc   |  8 
 4 files changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr84201.c

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index b01ffab566a..a0fa5e1cf43 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14989,6 +14989,9 @@ in an inner loop relative to the loop being vectorized. 
 The factor applied
 is the maximum of the estimated number of iterations of the inner loop and
 this parameter.  The default value of this parameter is 50.
 
+@item vect-induction-float
+Enable loop vectorization of floating point inductions.
+
 @item avoid-fma-max-bits
 Maximum number of bits for which we avoid creating FMAs.
 
diff --git a/gcc/params.opt b/gcc/params.opt
index f76f7839916..9561aa61a50 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1176,6 +1176,10 @@ Controls how loop vectorizer uses partial vectors.  0 
means never, 1 means only
 Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) 
IntegerRange(1, 1) Param Optimization
 The maximum factor which the loop vectorizer applies to the cost of statements 
in an inner loop relative to the loop being vectorized.
 
+-param=vect-induction-float=
+Common Joined UInteger Var(param_vect_induction_float) Init(1) IntegerRange(0, 1) Param Optimization
+Enable loop vectorization of floating point inductions.
+
 -param=vrp1-mode=
 Common Joined Var(param_vrp1_mode) Enum(vrp_mode) Init(VRP_MODE_VRP) Param 
Optimization
 --param=vrp1-mode=[vrp|ranger] Specifies the mode VRP1 should operate in.
diff --git a/gcc/testsuite/gcc.dg/vect/pr84201.c 
b/gcc/testsuite/gcc.dg/vect/pr84201.c
new file mode 100644
index 000..1cc6d1ff13c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr84201.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Ofast --param vect-induction-float=0" } */
+
+void foo (float *a, float f, float s, int n)
+{
+  for (int i = 0; i < n; ++i)
+{
+  a[i] = f;
+  f += s;
+}
+}
+
+void bar (double *a, double f, double s, int n)
+{
+  for (int i = 0; i < n; ++i)
+{
+  a[i] = f;
+  f += s;
+}
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 2 "vect" } } */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 1f30fc82ca1..7fcec12a3e9 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8175,6 +8175,14 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   return false;
 }
 
+  if (FLOAT_TYPE_P (vectype) && !param_vect_induction_float)
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"floating point induction vectorization disabled\n");
+  return false;
+}
+
   step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info);
   gcc_assert (step_expr != NULL_TREE);
   tree step_vectype = get_same_sized_vectype (TREE_TYPE (step_expr), vectype);
-- 
2.34.1
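
A rough way to see why vectorizing a floating point induction can change results: the scalar loop rounds after every addition, while a vectorized form typically derives each lane from the start value and a multiple of the step. The sketch below assumes a simple closed form (f + i * s) as a stand-in for what the vectorizer does; the function names are illustrative only.

```c
#include <stddef.h>

/* Scalar loop from the testcase: each value depends on the previous
   rounded sum.  */
void fill_sequential (float *a, float f, float s, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      a[i] = f;
      f += s;
    }
}

/* Assumed closed form: every element is computed independently as
   f + i * s, with one multiply and one add per element.  */
void fill_closed_form (float *a, float f, float s, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    a[i] = f + (float) i * s;
}

/* Count the positions (at most 64) where the two schemes disagree.  */
size_t count_mismatches (float f, float s, size_t n)
{
  float x[64], y[64];
  size_t bad = 0;

  if (n > 64)
    n = 64;
  fill_sequential (x, f, s, n);
  fill_closed_form (y, f, s, n);
  for (size_t i = 0; i < n; ++i)
    bad += (x[i] != y[i]);
  return bad;
}
```

With a step that is exactly representable (such as 0.25f) the two schemes agree; with a step like 0.1f the rounding errors accumulate differently and the count may be nonzero, which is the kind of difference a benchmark's result check can trip over.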


Re: [PATCH v3] libgo: Don't use pt_regs member in mcontext_t

2022-03-08 Thread Rich Felker
On Mon, Mar 07, 2022 at 02:59:02PM -0800, Ian Lance Taylor wrote:
> On Sun, Mar 6, 2022 at 11:11 PM  wrote:
> >
> > +#ifdef __PPC64__
> > +   ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gp_regs[32];
> > +#else
> > +   ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gregs[32];
> > +#endif
> 
> Have you tested this in 32-bit mode?  It does not look correct based
> on the glibc definitions.  Looking at glibc it seems that it ought to
> be
> 
> ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.uc_regs->gregs[32];

Indeed, I think it has to use that conditional on __GLIBC__. I was
thinking the union glibc has was an anon union, but no, it's named
uc_mcontext despite not having type mcontext_t.

Ideally glibc could fix this by doing:

union {
        union __ctx(uc_regs_ptr) {
                struct __ctx(pt_regs) *__ctx(regs);
                mcontext_t *__ctx(uc_regs);
        } uc_mcontext;
        mcontext_t *__ctx(uc_regs);
};

so that there would be a common API for accessing it that doesn't
conflict with the properties the standard mandates.

Rich
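
To make the proposed layout concrete, here is a small mock of the idea using an anonymous union (C11). All demo_ types are hypothetical stand-ins, not the real glibc or musl definitions; the index 32 is the one used in the patch under discussion.

```c
/* Hypothetical stand-ins; these are not the real glibc or musl types.  */
#define DEMO_PT_NIP 32   /* index used for the PC in the patch above */

typedef struct { unsigned long gregs[48]; } demo_mcontext_t;
struct demo_pt_regs;     /* opaque in this sketch */

typedef struct
{
  union
  {
    union
    {
      struct demo_pt_regs *regs;
      demo_mcontext_t *uc_regs;
    } uc_mcontext;              /* existing spelling: uc_mcontext.uc_regs */
    demo_mcontext_t *uc_regs;   /* proposed common spelling: uc_regs */
  };
} demo_ucontext_t;

/* Read the saved PC through the common spelling.  */
unsigned long demo_sigpc (const demo_ucontext_t *uc)
{
  return uc->uc_regs->gregs[DEMO_PT_NIP];
}
```

Because both pointers sit at the same offset, code written against either spelling reads the same register block.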


Re: [PATCH] arm: Remove unused variable arm_binop_none_none_unone_qualifiers

2022-03-08 Thread Christophe Lyon via Gcc-patches

Hi,


On 3/4/22 15:39, Christophe Lyon via Gcc-patches wrote:

From: Christophe Lyon 
I went ahead and committed this patch after fixing my committer address 
(above); I hope this is obvious enough.


Christophe



Commits r12-7342 and r12-7344 made some cleanup, leaving
arm_binop_none_none_unone_qualifiers unused.
This is causing build failures with -Werror (e.g. bootstrap).

This patch fixes the problem by removing the definition of
arm_binop_none_none_unone_qualifiers and
BINOP_NONE_NONE_UNONE_QUALIFIERS which are now unused.

Tested by bootstrapping on arm-linux-gnueabihf.

2022-03-04  Christophe Lyon  

gcc/
* config/arm/arm-builtins.cc
(arm_binop_none_none_unone_qualifiers): Delete.
(BINOP_NONE_NONE_UNONE_QUALIFIERS): Delete.
---
  gcc/config/arm/arm-builtins.cc | 6 --
  1 file changed, 6 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index a7acc1d71e7..6afca7a82cb 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -432,12 +432,6 @@ arm_binop_unone_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
  #define BINOP_UNONE_NONE_IMM_QUALIFIERS \
(arm_binop_unone_none_imm_qualifiers)
  
-static enum arm_type_qualifiers
-arm_binop_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_none, qualifier_unsigned };
-#define BINOP_NONE_NONE_UNONE_QUALIFIERS \
-  (arm_binop_none_none_unone_qualifiers)
-
  static enum arm_type_qualifiers
  arm_binop_pred_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
= { qualifier_predicate, qualifier_none, qualifier_none };


[Patch] Fortran: OpenMP/OpenACC avoid uninit access in size calc for mapping

2022-03-08 Thread Tobias Burnus

Hi Thomas & Jakub,

found when working on the deep-mapping patch* with OpenMP code
(and part of that patch) but it already shows up in an existing
OpenACC testcase. I think it makes sense to fix it already for GCC 12.

Problem: Also for unallocated allocatables, their size was
calculated - the 'if(desc.data == NULL)' check was only added
for pointers.

Result after the patch: When compiling with -O (which is the default
for goacc.exp), the warning now disappears. Thus, I now use '-O0'
and the previous "is uninitialized" is now "may be uninitialized".

Unrelated to the patch and the testcase, I added some
'allocate'**/'if(allocated())' to the testcase - as otherwise
uninit vars would be accessed. (Not relevant for the warning
or the patch - but I prefer no invalid code in testcases,
if it can be avoided.)

OK for mainline?

Tobias
* https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591144.html

** I am actually not sure whether 'acc update(b)' will or should map a
previously allocated variable. But that's
unrelated to this bug fix. See also: https://gcc.gnu.org/PR96668
for the re-mapping in OpenMP (works for arrays but not scalars).
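
The guard described above (only look at the bounds when the data pointer is non-NULL) can be illustrated with a simplified descriptor in C. This is a sketch under assumptions: demo_desc is a made-up one-dimensional descriptor, not gfortran's actual array descriptor layout.

```c
#include <stddef.h>

/* Hypothetical, simplified one-dimensional array descriptor.  */
typedef struct
{
  void *data;        /* NULL while the array is unallocated */
  ptrdiff_t lbound;  /* meaningful only when data != NULL */
  ptrdiff_t ubound;
} demo_desc;

/* Bytes to map: the bounds are read only for allocated arrays, so an
   unallocated descriptor never has its bounds inspected.  */
size_t demo_map_size (const demo_desc *d, size_t elemsz)
{
  if (d->data == NULL)
    return 0;

  ptrdiff_t extent = d->ubound - d->lbound + 1;
  return extent > 0 ? (size_t) extent * elemsz : 0;
}
```

Without the NULL check, the size would be computed from whatever happens to be in the bounds fields, which is what the uninitialized-use warnings in the testcases point at.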
Fortran: OpenMP/OpenACC avoid uninit access in size calc for mapping

gcc/fortran/ChangeLog:

	* trans-openmp.cc (gfc_trans_omp_clauses, gfc_omp_finish_clause):
	Obtain size for mapping only if allocatable array is allocated.

gcc/testsuite/ChangeLog:

	* gfortran.dg/goacc/array-with-dt-1.f90: Run with -O0 and
	update dg-warning.
	* gfortran.dg/goacc/pr93464.f90: Likewise.

 gcc/fortran/trans-openmp.cc |  6 --
 gcc/testsuite/gfortran.dg/goacc/array-with-dt-1.f90 | 12 +---
 gcc/testsuite/gfortran.dg/goacc/pr93464.f90 |  8 
 3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 4d56a771349..fad76a4791f 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -1597,7 +1597,8 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool openacc)
   tree size = create_tmp_var (gfc_array_index_type);
   tree elemsz = TYPE_SIZE_UNIT (gfc_get_element_type (type));
   elemsz = fold_convert (gfc_array_index_type, elemsz);
-  if (GFC_TYPE_ARRAY_AKIND (type) == GFC_ARRAY_POINTER
+  if (GFC_TYPE_ARRAY_AKIND (type) == GFC_ARRAY_ALLOCATABLE
+	  || GFC_TYPE_ARRAY_AKIND (type) == GFC_ARRAY_POINTER
 	  || GFC_TYPE_ARRAY_AKIND (type) == GFC_ARRAY_POINTER_CONT)
 	{
 	  stmtblock_t cond_block;
@@ -3208,7 +3209,8 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
 
 		  /* We have to check for n->sym->attr.dimension because
 			 of scalar coarrays.  */
-		  if (n->sym->attr.pointer && n->sym->attr.dimension)
+		  if ((n->sym->attr.pointer || n->sym->attr.allocatable)
+			  && n->sym->attr.dimension)
 			{
 			  stmtblock_t cond_block;
 			  tree size
diff --git a/gcc/testsuite/gfortran.dg/goacc/array-with-dt-1.f90 b/gcc/testsuite/gfortran.dg/goacc/array-with-dt-1.f90
index 136e42acd59..f6880238c89 100644
--- a/gcc/testsuite/gfortran.dg/goacc/array-with-dt-1.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/array-with-dt-1.f90
@@ -1,4 +1,4 @@
-! { dg-additional-options -Wuninitialized }
+! { dg-additional-options "-Wuninitialized -O0" }
 
 type t
integer, allocatable :: A(:,:)
@@ -8,9 +8,15 @@ type(t), allocatable :: b(:)
 ! { dg-note {'b' declared here} {} { target *-*-* } .-1 }
 
 !$acc update host(b)
-! { dg-warning {'b\.dim\[0\]\.ubound' is used uninitialized} {} { target *-*-* } .-1 }
-! { dg-warning {'b\.dim\[0\]\.lbound' is used uninitialized} {} { target *-*-* } .-2 }
+! { dg-warning {'b\.dim\[0\]\.ubound' may be used uninitialized} {} { target *-*-* } .-1 }
+! { dg-warning {'b\.dim\[0\]\.lbound' may be used uninitialized} {} { target *-*-* } .-2 }
+
+allocate(b(1))
+!$acc update host(b)
 !$acc update host(b(:))
+
+!$acc update host(b(1)%A)
+allocate(b(1)%A(1,1))
 !$acc update host(b(1)%A)
 !$acc update host(b(1)%A(:,:))
 end
diff --git a/gcc/testsuite/gfortran.dg/goacc/pr93464.f90 b/gcc/testsuite/gfortran.dg/goacc/pr93464.f90
index c92f1d3d8b2..18531abdf77 100644
--- a/gcc/testsuite/gfortran.dg/goacc/pr93464.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/pr93464.f90
@@ -2,17 +2,17 @@
 !
 ! Contributed by G. Steinmetz
 
-! { dg-additional-options -Wuninitialized }
+! { dg-additional-options "-Wuninitialized -O0" }
 
 program p
character :: c(2) = 'a'
character, allocatable :: z(:)
! { dg-note {'z' declared here} {} { target *-*-* } .-1 }
!$acc parallel
-   ! { dg-warning {'z\.dim\[0\]\.ubound' is used uninitialized} {} { target *-*-* } .-1 }
-   ! { dg-warning {'z\.dim\[0\]\.lbound' is used uninitialized} {} { targ

Re: [PATCH][RFC] tree-optimization/84201 - add --param vect-induction-float

2022-03-08 Thread Jeff Law via Gcc-patches




On 3/8/2022 5:56 AM, Richard Biener via Gcc-patches wrote:

> This adds a --param to allow disabling of vectorization of
> floating point inductions.  On top of -Ofast this should allow
> 549.fotonik3d_r to not miscompare.
>
> While I thought of a more elaborate way of disabling certain
> vectorization kinds (reductions also came to my mind) this
> for now simply uses a --param rather than some sophisticated -fvectorize-*
> scheme.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.  I've
> verified that 549.fotonik3d_r miscompares with -Ofast -march=znver2
> and passes when adding --param vect-induction-float=0 which
> should be valid at least for peak (but I guess also base for
> FOPTIMIZE for example).  I did not benchmark against other
> workarounds (it has been said -fno-unsafe-math-optimizations
> or other similar things work as well).
Those other options do work well to throttle things back for fotonik.   
Presumably the idea of having the param is to be more surgical about 
what needs to be disabled to keep fotonik happy. Given the benchmark 
specific nature of the option, I'm not opposed to it being a param 
(primarily for developer use) vs a -f which are more geared to the 
user community.






> OK for trunk?
>
> Thanks,
> Richard.
>
> 2022-03-08  Richard Biener  
>
> PR tree-optimization/84201
> * params.opt (-param=vect-induction-float): Add.
> * doc/invoke.texi (vect-induction-float): Document.
> * tree-vect-loop.cc (vectorizable_induction): Honor
> param_vect_induction_float.
>
> * gcc.dg/vect/pr84201.c: New testcase.

LGTM.
jeff



Re: [PATCH] PR tree-optimization/98335: Improvements to DSE's compute_trims.

2022-03-08 Thread Jeff Law via Gcc-patches




On 3/8/2022 3:10 AM, Richard Biener via Gcc-patches wrote:

On Mon, Mar 7, 2022 at 11:04 AM Roger Sayle  wrote:


This patch is the main middle-end piece of a fix for PR tree-opt/98335,
which is a code-quality regression affecting mainline.  The issue occurs
in DSE's (dead store elimination's) compute_trims function that determines
where a store to memory can be trimmed.  In the testcase given in the
PR, this function notices that the first byte of a DImode store is dead,
and replaces the 8-byte store at (aligned) offset zero, with a 7-byte store
at (unaligned) offset one.  Most architectures can store a power-of-two
bytes (up to a maximum) in single instruction, so writing 7 bytes requires
more instructions than writing 8 bytes.  This patch follows Jakub Jelinek's
suggestion in comment 5, that compute_trims needs improved heuristics.

In this patch, decision of whether and how to align trim_head is based
on the number of bytes being written, the alignment of the start of the
object and where within the object the first byte is written.  The first
tests check whether we're already writing to the start of the object,
and that we're writing three or more bytes.  If we're only writing one
or two bytes, there's no benefit from providing additional alignment.
Then we determine the alignment of the object, which is either 1, 2,
4, 8 or 16 byte aligned (capping at 16 guarantees that we never write
more than 7 bytes beyond the minimum required).  If the buffer is only
1 or 2 byte aligned there's no benefit from additional alignment.  For
the remaining cases, alignment of trim_head is based upon where within
each aligned block (word) the first byte is written.  For example,
storing the last byte (or last half-word) of a word can be performed
with a single insn.

On x86_64-pc-linux-gnu with -O2 the new test case in the PR goes from:

        movl    $0, -24(%rsp)
        movabsq $72057594037927935, %rdx
        movl    $0, -21(%rsp)
        andq    -24(%rsp), %rdx
        movq    %rdx, %rax
        salq    $8, %rax
        movb    c(%rip), %al
        ret

to

        xorl    %eax, %eax
        movb    c(%rip), %al
        ret

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  I've also added new testcases
for the original motivating PR tree-optimization/86010, to ensure that
those remain optimized (in future).  Ok for mainline?

diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index 2b22a61..080e406 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -405,10 +405,36 @@ compute_trims (ao_ref *ref, sbitmap live, int
*trim_head, int *trim_tail,
int first_live = bitmap_first_set_bit (live);
*trim_head = first_live - first_orig;

-  /* If more than a word remains, then make sure to keep the
- starting point at least word aligned.  */
-  if (last_live - first_live > UNITS_PER_WORD)
-*trim_head &= ~(UNITS_PER_WORD - 1);
+  /* If REF is aligned, try to maintain this alignment if it reduces
+ the number of (power-of-two sized aligned) writes to memory.
+ First check that we're writing >= 3 bytes at a non-zero offset.  */
+  if (first_live
+  && last_live - first_live >= 2)
+{
+  unsigned int align = TYPE_ALIGN_UNIT (TREE_TYPE (ref->base));

you can't simply use TYPE_ALIGN_* on ref->base.  You can use
get_object_alignment on ref->ref, but ref->ref can be NULL in case the
ref was initialized from a builtin call like memcpy.

Also ref->base is offsetted by ref->offset which you don't seem to
account for.  In theory one could export get_object_alignment_2 and
if ref->ref is NULL, use that on ref->base, passing addr_p = true,
and then adjust the resulting bitpos by ref->offset and fix align accordingly
(trimming might also align an access if the original access was offsetted
from known alignment).

That said, a helper like ao_ref_alignment () might be useful here.

I wonder if we can apply good heuristics to compute_trims without taking
into account context, like maybe_trim_complex_store is already
limiting itself to useful subsets and the constructor and memstar cases
will only benefit if they end up being expanded inline via *_by_pieces,
not if expanded as a call.
Trimming heuristics are minimal and localized to what we know about a 
given statement.  Alignment is probably the most important thing we need 
to keep in mind.


For complex stores, trimming really means removing one of the two
loads/stores; that was the motivating case behind introducing trimming to
begin with -- the memcpy stuff and friends came along for the ride later.


For trimming on mem*, str*, it can still be beneficial if they're not 
expanded by pieces, though it's more beneficial to trim in cases where 
they get expanded by pieces.




You don't seem to adjust *trim_tail at all, if an aligned 16 byte region
is trimmed there by 3 that will result in two extra stores as well, no?

It would and that's probably bad.

Jeff
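
The cost argument in this thread (a 7-byte store at offset one is more expensive than an 8-byte store at offset zero) can be checked with a small counting sketch. It assumes naturally aligned, non-overlapping power-of-two stores, which is a simplification; real targets may use overlapping or unaligned stores, as the two movl instructions above do. demo_count_stores is an illustrative name, not a GCC function.

```c
/* Count the naturally aligned power-of-two stores needed to write LEN
   bytes starting at byte OFFSET, using stores of at most MAX_SIZE
   bytes.  Assumes non-overlapping stores and MAX_SIZE a power of two.  */
unsigned demo_count_stores (unsigned offset, unsigned len, unsigned max_size)
{
  unsigned count = 0;

  if (max_size == 0)
    return 0;

  while (len > 0)
    {
      unsigned size = max_size;

      /* Shrink until the store is aligned and fits in what is left.  */
      while (size > 1 && (offset % size != 0 || size > len))
        size /= 2;

      offset += size;
      len -= size;
      ++count;
    }
  return count;
}
```

Under these assumptions, trimming one dead byte off the head of an aligned 8-byte store turns one store into three (1 + 2 + 4 bytes), while writing only the last half-word of a word still takes a single store.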


Re: [PATCH][RFC] tree-optimization/84201 - add --param vect-induction-float

2022-03-08 Thread Richard Biener via Gcc-patches
On Tue, 8 Mar 2022, Jeff Law wrote:

> 
> 
> On 3/8/2022 5:56 AM, Richard Biener via Gcc-patches wrote:
> > This adds a --param to allow disabling of vectorization of
> > floating point inductions.  On top of -Ofast this should allow
> > 549.fotonik3d_r to not miscompare.
> >
> > While I thought of a more elaborate way of disabling certain
> > vectorization kinds (reductions also came to my mind) this
> > for now simply uses a --param rather than some sophisticated -fvectorize-*
> > scheme.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.  I've
> > verified that 549.fotonik3d_r miscompares with -Ofast -march=znver2
> > and passes when adding --param vect-induction-float=0 which
> > should be valid at least for peak (but I guess also base for
> > FOPTIMIZE for example).  I did not benchmark against other
> > workarounds (it has been said -fno-unsafe-math-optimizations
> > or other similar things work as well).
> Those other options do work well to throttle things back for fotonik.  
> Presumably the idea of having the param is to be more surgical about what
> needs to be disabled to keep fotonik happy. Given the benchmark specific
> nature of the option, I'm not opposed to it being a param (primarily for
> developer use) vs a -f which are more geared to the user community.

Yes, my original idea was to provide more fine-grained control in
a more user-friendly way but that requires quite a bit of bike-shedding
that's not appropriate for this stage.

> >
> > OK for trunk?
> >
> > Thanks,
> > Richard.
> >
> > 2022-03-08  Richard Biener  
> >
> >  PR tree-optimization/84201
> >  * params.opt (-param=vect-induction-float): Add.
> >  * doc/invoke.texi (vect-induction-float): Document.
> >  * tree-vect-loop.cc (vectorizable_induction): Honor
> >  param_vect_induction_float.
> >
> >  * gcc.dg/vect/pr84201.c: New testcase.
> LGTM.

Thanks - pushed.

Richard.


Re: [PATCH] rtl: ICE with thread_local and inline asm [PR104777]

2022-03-08 Thread Marek Polacek via Gcc-patches
On Mon, Mar 07, 2022 at 07:19:09PM -0600, Segher Boessenkool wrote:
> On Mon, Mar 07, 2022 at 07:03:17PM -0500, Marek Polacek via Gcc-patches wrote:
> > In r270550, Jakub fixed classify_insn to handle asm goto: if the asm can
> > jump to a label, the insn should be a JUMP_INSN.
> > 
> > However, as the following testcase shows, non-null ASM_OPERANDS_LABEL_VEC
> > doesn't guarantee that the rtx has any actual labels it can branch to.
> 
> But it should.

Ok, that would make sense.  However...
 
> > Here, the rtvec has 0 elements because of the __thread variable: we perform
> > ix86_rewrite_tls_address which calls copy_insn and that allocates the rtvec:
> > 
> > XVEC (copy, i) = rtvec_alloc (XVECLEN (orig, i));
> 
> So fix *that* instead?  Everywhere else does not use length zero RTL
> vectors.  copy_rtx makes sure to do the right thing here, for example.

...I don't see that.  In fact copy_rtx does the same thing as
copy_insn:

   case 'V':
 if (XVEC (orig, i) != NULL)
   {
 XVEC (copy, i) = rtvec_alloc (XVECLEN (orig, i));

which will copy a zero-length vector too, right?  But even if it didn't
we'd still ICE on the original rtx as I found out.  The zero-length
rtvec is originally created in expand_asm_stmt:

  rtvec labelvec = rtvec_alloc (nlabels);

where nlabels is 0 but using NULL_RTVEC instead just means crashes everywhere.
  
So I'm afraid I don't have a better fix (except that I should have used
ASM_OPERANDS_LABEL_LENGTH).

> We do not have notation to create zero-length vectors in RTL source
> code either, btw.:
> case 'V':
>   /* 'V' is an optional vector: if a closeparen follows,
>  just store NULL for this element.  */
> (optional vectors are at the end of an RTX), and if you write [] you
> will hit
>   fatal_with_file_and_line ("vector must have at least one element");

Marek



Re: [PATCH] rtl: ICE with thread_local and inline asm [PR104777]

2022-03-08 Thread Segher Boessenkool
Hi!

On Tue, Mar 08, 2022 at 10:08:25AM -0500, Marek Polacek wrote:
> On Mon, Mar 07, 2022 at 07:19:09PM -0600, Segher Boessenkool wrote:
> > On Mon, Mar 07, 2022 at 07:03:17PM -0500, Marek Polacek via Gcc-patches wrote:
> > > In r270550, Jakub fixed classify_insn to handle asm goto: if the asm can
> > > jump to a label, the insn should be a JUMP_INSN.
> > > 
> > > However, as the following testcase shows, non-null ASM_OPERANDS_LABEL_VEC
> > > doesn't guarantee that the rtx has any actual labels it can branch to.
> > 
> > But it should.
> 
> Ok, that would make sense.  However...
>  
> > > Here, the rtvec has 0 elements because of the __thread variable: we perform
> > > ix86_rewrite_tls_address which calls copy_insn and that allocates the rtvec:
> > > 
> > > XVEC (copy, i) = rtvec_alloc (XVECLEN (orig, i));
> > 
> > So fix *that* instead?  Everywhere else does not use length zero RTL
> > vectors.  copy_rtx makes sure to do the right thing here, for example.
> 
> ...I don't see that.  In fact copy_rtx does the same thing as
> copy_insn:
> 
>case 'V':
>  if (XVEC (orig, i) != NULL)
>{
>  XVEC (copy, i) = rtvec_alloc (XVECLEN (orig, i));
> 
> which will copy a zero-length vector too, right?

It doesn't.  It copies NULL as NULL.  That is what that "if" is for.
You can do similar in copy_insn_1?


Segher
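
The distinction being argued about (a NULL optional vector versus a non-NULL vector of length zero) can be shown with a small stand-in type. Everything here is hypothetical: demo_vec only imitates the shape of an rtvec, and the helpers are not GCC functions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an rtvec: a length plus trailing elements.  */
typedef struct
{
  int len;
  int elem[];
} demo_vec;

demo_vec *demo_vec_alloc (int len)
{
  demo_vec *v = malloc (sizeof *v + (size_t) len * sizeof v->elem[0]);

  if (v)
    v->len = len;
  return v;
}

/* Copy an optional vector: NULL stays NULL, but an empty vector is
   still copied as a non-NULL empty vector.  */
demo_vec *demo_vec_copy (const demo_vec *orig)
{
  if (orig == NULL)
    return NULL;

  demo_vec *copy = demo_vec_alloc (orig->len);

  if (copy && orig->len > 0)
    memcpy (copy->elem, orig->elem,
            (size_t) orig->len * sizeof orig->elem[0]);
  return copy;
}

/* "Can this branch anywhere?": check the length, not only whether the
   pointer is non-NULL.  */
int demo_has_labels (const demo_vec *labels)
{
  return labels != NULL && labels->len > 0;
}
```

A check that only tests the pointer would treat the copied empty vector as if it had labels; testing the length handles both representations of "no labels".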


[aarch64] update reg-costs to include predicate move costs

2022-03-08 Thread Andre Vieira (lists) via Gcc-patches

Hi,

This patch adds predicate move costs to several SVE enabled cores.
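
As a rough sketch of what adding a per-core predicate move cost involves: the cost table gains a trailing member, every positional initializer gains a matching entry, and the lookup returns it for predicate-to-predicate moves. The demo_ names and the register-class enum are assumptions for illustration; the numbers mirror the new Neoverse N1 table in the diff below.

```c
/* Hypothetical sketch; the demo_ names are not GCC's.  */
enum demo_reg_class { DEMO_GENERAL, DEMO_FP, DEMO_PRED };

struct demo_regmove_cost
{
  const int GP2GP;
  const int GP2FP;
  const int FP2GP;
  const int FP2FP;
  const int PR2PR;   /* new member: predicate-to-predicate moves */
};

/* Positional initializer: the new member needs a new trailing entry.  */
static const struct demo_regmove_cost demo_costs =
{
  1, /* GP2GP  */
  3, /* GP2FP  */
  2, /* FP2GP  */
  2, /* FP2FP  */
  1  /* PR2PR  */
};

int demo_register_move_cost (const struct demo_regmove_cost *c,
                             enum demo_reg_class from,
                             enum demo_reg_class to)
{
  if (from == DEMO_PRED && to == DEMO_PRED)
    return c->PR2PR;
  if (from == DEMO_GENERAL && to == DEMO_GENERAL)
    return c->GP2GP;
  if (from == DEMO_GENERAL && to == DEMO_FP)
    return c->GP2FP;
  if (from == DEMO_FP && to == DEMO_GENERAL)
    return c->FP2GP;
  if (from == DEMO_FP && to == DEMO_FP)
    return c->FP2FP;
  return -1; /* moves mixing predicates with other classes are outside this sketch */
}
```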


2022-02-25  Tamar Christina  
   Andre Vieira 

gcc/ChangeLog:

	* config/aarch64/aarch64-protos.h (struct cpu_regmove_cost): Add
	PR2PR member.
	* config/aarch64/aarch64.cc (aarch64_register_move_cost): Use
	PR2PR costs when moving a predicate.
	(generic_regmove_cost, cortexa57_regmove_cost,
	cortexa53_regmove_cost, exynosm1_regmove_cost,
	thunderx_regmove_cost, xgene1_regmove_cost, qdf24xx_regmove_cost,
	thunderx2t99_regmove_cost, thunderx3t110_regmove_cost,
	tsv110_regmove_cost, a64fx_regmove_cost): Add PR2PR entry.
	(neoversen1_regmove_cost): New.
	(neoversen1_tunings): Use neoversen1_regmove_cost.

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index d0e78d6a559a7c310b7f8c7877081a0e2baf6a05..f2fde35c6eb4989af8736db8fad004171c160282 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -192,6 +192,7 @@ struct cpu_regmove_cost
   const int GP2FP;
   const int FP2GP;
   const int FP2FP;
+  const int PR2PR;
 };
 
 struct simd_vec_cost
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index dbeaaf484dbc070ae3fcc08530ec9bd20b8ab651..9a94f3a30b0f1acc3c9b8a0e3d703e60780d0cbc 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -526,7 +526,8 @@ static const struct cpu_regmove_cost generic_regmove_cost =
  their cost higher than memmov_cost.  */
   5, /* GP2FP  */
   5, /* FP2GP  */
-  2 /* FP2FP  */
+  2, /* FP2FP  */
+  2 /* PR2PR.  */
 };
 
 static const struct cpu_regmove_cost cortexa57_regmove_cost =
@@ -536,7 +537,8 @@ static const struct cpu_regmove_cost cortexa57_regmove_cost =
  their cost higher than memmov_cost.  */
   5, /* GP2FP  */
   5, /* FP2GP  */
-  2 /* FP2FP  */
+  2, /* FP2FP  */
+  2 /* PR2PR.  */
 };
 
 static const struct cpu_regmove_cost cortexa53_regmove_cost =
@@ -546,7 +548,8 @@ static const struct cpu_regmove_cost cortexa53_regmove_cost =
  their cost higher than memmov_cost.  */
   5, /* GP2FP  */
   5, /* FP2GP  */
-  2 /* FP2FP  */
+  2, /* FP2FP  */
+  2 /* PR2PR.  */
 };
 
 static const struct cpu_regmove_cost exynosm1_regmove_cost =
@@ -556,7 +559,8 @@ static const struct cpu_regmove_cost exynosm1_regmove_cost =
  their cost higher than memmov_cost (actual, 4 and 9).  */
   9, /* GP2FP  */
   9, /* FP2GP  */
-  1 /* FP2FP  */
+  1, /* FP2FP  */
+  1 /* PR2PR.  */
 };
 
 static const struct cpu_regmove_cost thunderx_regmove_cost =
@@ -564,7 +568,8 @@ static const struct cpu_regmove_cost thunderx_regmove_cost =
   2, /* GP2GP  */
   2, /* GP2FP  */
   6, /* FP2GP  */
-  4 /* FP2FP  */
+  4, /* FP2FP  */
+  4 /* PR2PR.  */
 };
 
 static const struct cpu_regmove_cost xgene1_regmove_cost =
@@ -574,7 +579,8 @@ static const struct cpu_regmove_cost xgene1_regmove_cost =
  their cost higher than memmov_cost.  */
   8, /* GP2FP  */
   8, /* FP2GP  */
-  2 /* FP2FP  */
+  2, /* FP2FP  */
+  2 /* PR2PR.  */
 };
 
 static const struct cpu_regmove_cost qdf24xx_regmove_cost =
@@ -583,7 +589,8 @@ static const struct cpu_regmove_cost qdf24xx_regmove_cost =
   /* Avoid the use of int<->fp moves for spilling.  */
   6, /* GP2FP  */
   6, /* FP2GP  */
-  4 /* FP2FP  */
+  4, /* FP2FP  */
+  4 /* PR2PR.  */
 };
 
 static const struct cpu_regmove_cost thunderx2t99_regmove_cost =
@@ -593,6 +600,7 @@ static const struct cpu_regmove_cost thunderx2t99_regmove_cost =
   5, /* GP2FP  */
   6, /* FP2GP  */
   3, /* FP2FP  */
+  3 /* PR2PR.  */
 };
 
 static const struct cpu_regmove_cost thunderx3t110_regmove_cost =
@@ -601,7 +609,8 @@ static const struct cpu_regmove_cost thunderx3t110_regmove_cost =
   /* Avoid the use of int<->fp moves for spilling.  */
   4, /* GP2FP  */
   5, /* FP2GP  */
-  4  /* FP2FP  */
+  4,  /* FP2FP  */
+  4 /* PR2PR.  */
 };
 
 static const struct cpu_regmove_cost tsv110_regmove_cost =
@@ -611,7 +620,8 @@ static const struct cpu_regmove_cost tsv110_regmove_cost =
  their cost higher than memmov_cost.  */
   2, /* GP2FP  */
   3, /* FP2GP  */
-  2  /* FP2FP  */
+  2,  /* FP2FP  */
+  2 /* PR2PR.  */
 };
 
 static const struct cpu_regmove_cost a64fx_regmove_cost =
@@ -621,7 +631,19 @@ static const struct cpu_regmove_cost a64fx_regmove_cost =
  their cost higher than memmov_cost.  */
   5, /* GP2FP  */
   7, /* FP2GP  */
-  2 /* FP2FP  */
+  2, /* FP2FP  */
+  2 /* PR2PR.  */
+};
+
+static const struct cpu_regmove_cost neoversen1_regmove_cost =
+{
+  1, /* GP2GP  */
+  /* Spilling to int<->fp instead of memory is recommended so set
+ realistic costs compared to memmov_cost.  */
+  3, /* GP2FP  */
+  2, /* FP2GP  */
+  2, /* FP2FP  */
+  1 /* PR2PR.  */
 };
 
 /* Generic costs for Advanced SIMD vector operations.   */
@@ -1698,7 +1720,7 @@ static const struct tune_params neoversen1_tunings =
 {
   &cortexa76_extra_costs,
   &generic_addrcost_table,
-  &generic_regmove_cost,
+  &neoversen1_regmove_cost,
   &cor
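
As a rough illustration of how a five-entry cost table like the ones in this patch could be consulted, here is a minimal sketch.  The names RegClass, RegmoveCost and move_cost are made up for the example and are not the back end's API; "PR" is assumed to stand for predicate registers, which the patch text does not spell out.  Only the numbers are taken from the neoversen1_regmove_cost hunk.

```cpp
#include <cassert>

// Toy register classes; the real back end distinguishes many more.
enum class RegClass { GP, FP, PR };

// Hypothetical mirror of the cost table once the PR2PR entry is added.
struct RegmoveCost {
  int gp2gp;
  int gp2fp;
  int fp2gp;
  int fp2fp;
  int pr2pr;
};

// Values copied from the neoversen1_regmove_cost hunk in the patch.
constexpr RegmoveCost neoversen1_cost = {1, 3, 2, 2, 1};

// Relative cost of moving a value from class `from` to class `to`.
// Combinations this toy table does not model return -1.
constexpr int move_cost(const RegmoveCost &c, RegClass from, RegClass to) {
  if (from == RegClass::GP && to == RegClass::GP) return c.gp2gp;
  if (from == RegClass::GP && to == RegClass::FP) return c.gp2fp;
  if (from == RegClass::FP && to == RegClass::GP) return c.fp2gp;
  if (from == RegClass::FP && to == RegClass::FP) return c.fp2fp;
  if (from == RegClass::PR && to == RegClass::PR) return c.pr2pr;
  return -1;
}
```

Because the initializers are positional, adding a field to the struct means every table needs a new trailing entry, which is why the patch touches each CPU's table.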

[aarch64] Enable FP16 feature by default for Armv9

2022-03-08 Thread Andre Vieira (lists) via Gcc-patches

Hi all,

This patch adds the feature bit for FP16 to the feature set for Armv9 
since Armv9 requires SVE to be implemented and SVE requires FP16 to be 
implemented.
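
The dependency chain (Armv9 implies SVE, SVE implies FP16) can be pictured with a small bitmask sketch.  The FL_* names and bit positions below are invented for the illustration and do not match the real AARCH64_FL_* values; FL_FOR_ARCH8_5 is just a stand-in for the inherited Armv8.5 set.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments; the real AARCH64_FL_* values differ.
constexpr std::uint64_t FL_SVE = 1u << 0;
constexpr std::uint64_t FL_SVE2 = 1u << 1;
constexpr std::uint64_t FL_F16 = 1u << 2;
constexpr std::uint64_t FL_V9 = 1u << 3;
constexpr std::uint64_t FL_FOR_ARCH8_5 = 1u << 4;  // stand-in for the Armv8.5 set

// Feature set before the patch: SVE and SVE2 but no FP16 bit.
constexpr std::uint64_t FL_FOR_ARCH9_OLD =
    FL_FOR_ARCH8_5 | FL_SVE | FL_SVE2 | FL_V9;

// Feature set after the patch: the FP16 bit is part of the default.
constexpr std::uint64_t FL_FOR_ARCH9_NEW = FL_FOR_ARCH9_OLD | FL_F16;

// True if every bit of `feature` is set in `isa_flags`.
constexpr bool has_feature(std::uint64_t isa_flags, std::uint64_t feature) {
  return (isa_flags & feature) == feature;
}
```

In this model, code that tests for FP16 support sees it with the new Armv9 set without the user having to request it separately.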


2022-03-04  Andre Vieira  

    * config/aarch64/aarch64.h (AARCH64_FL_FOR_ARCH9): Add FP16 feature bit.
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 27ba4f4ca3fa78585733cfe68e2dee32c55282a7..efa46ac0b8799b5849b609d591186e26e5cb37ff 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -278,7 +278,8 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_FOR_ARCH8_R \
   (AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_V8_R)
 #define AARCH64_FL_FOR_ARCH9   \
-  (AARCH64_FL_FOR_ARCH8_5 | AARCH64_FL_SVE | AARCH64_FL_SVE2 | AARCH64_FL_V9)
+  (AARCH64_FL_FOR_ARCH8_5 | AARCH64_FL_SVE | AARCH64_FL_SVE2 | AARCH64_FL_V9 \
+   | AARCH64_FL_F16)
 
 /* Macros to test ISA flags.  */
 


Re: [PATCH] rtl: ICE with thread_local and inline asm [PR104777]

2022-03-08 Thread Marek Polacek via Gcc-patches
On Tue, Mar 08, 2022 at 09:14:56AM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Mar 08, 2022 at 10:08:25AM -0500, Marek Polacek wrote:
> > On Mon, Mar 07, 2022 at 07:19:09PM -0600, Segher Boessenkool wrote:
> > > On Mon, Mar 07, 2022 at 07:03:17PM -0500, Marek Polacek via Gcc-patches wrote:
> > > > In r270550, Jakub fixed classify_insn to handle asm goto: if the asm can
> > > > jump to a label, the insn should be a JUMP_INSN.
> > > > 
> > > > However, as the following testcase shows, non-null ASM_OPERANDS_LABEL_VEC
> > > > doesn't guarantee that the rtx has any actual labels it can branch to.
> > > 
> > > But it should.
> > 
> > Ok, that would make sense.  However...
> >  
> > > > Here, the rtvec has 0 elements because of the __thread variable: we perform
> > > > ix86_rewrite_tls_address which calls copy_insn and that allocates the rtvec:
> > > > 
> > > > XVEC (copy, i) = rtvec_alloc (XVECLEN (orig, i));
> > > 
> > > So fix *that* instead?  Everywhere else does not use length zero RTL
> > > vectors.  copy_rtx makes sure to do the right thing here, for example.
> > 
> > ...I don't see that.  In fact copy_rtx does the same thing as
> > copy_insn:
> > 
> >case 'V':
> >  if (XVEC (orig, i) != NULL)
> >{
> >  XVEC (copy, i) = rtvec_alloc (XVECLEN (orig, i));
> > 
> > which will copy a zero-length vector too, right?
> 
> It doesn't.  It copies NULL as NULL.  That is what that "if" is for.

But XVEC (orig, i) is not null, it just has XVECLEN 0.

> You can do similar in copy_insn_1?

You mean copy_rtx?  It already has the same XVEC (orig, i) != NULL check.

But like I said above, even if we didn't copy these XVECLEN 0 rtvecs,
the crash would not go away.
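
The distinction being argued about (a null vector versus a non-null vector of length 0) can be shown with a toy model.  ToyVec and copy_vec are made-up names, not GCC code; the sketch only mirrors the shape of the quoted 'V' case, assuming the copy is guarded by a null check alone.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy stand-in for an rtvec: a container whose length may be zero.
struct ToyVec {
  std::vector<int> elems;
  std::size_t length() const { return elems.size(); }
};

// Mirrors the quoted 'V' case: a null pointer stays null, anything else
// (including a vector of length 0) gets a freshly allocated copy.
std::unique_ptr<ToyVec> copy_vec(const ToyVec *orig) {
  if (orig == nullptr)
    return nullptr;
  auto copy = std::make_unique<ToyVec>();
  copy->elems = orig->elems;
  return copy;
}
```

With a guard like this, an empty but non-null input produces an empty but non-null copy, so the copy routine by itself neither creates nor removes zero-length vectors.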

Marek



Re: [PATCH] c++: detecting copy-init context during CTAD [PR102137]

2022-03-08 Thread Patrick Palka via Gcc-patches
On Mon, 7 Mar 2022, Jason Merrill wrote:

> On 3/7/22 10:47, Patrick Palka wrote:
> > On Fri, 4 Mar 2022, Jason Merrill wrote:
> > 
> > > On 3/4/22 14:24, Patrick Palka wrote:
> > > > Here we're failing to communicate to cp_finish_decl from tsubst_expr
> > > > that we're in a copy-initialization context (via the
> > > > LOOKUP_ONLYCONVERTING
> > > > flag), which causes do_class_deduction to always consider explicit
> > > > deduction guides when performing CTAD for a templated variable
> > > > initializer.
> > > > 
> > > > We could fix this by passing LOOKUP_ONLYCONVERTING appropriately when
> > > > calling cp_finish_decl from tsubst_expr, but it seems do_class_deduction
> > > > can determine if we're in a copy-init context by simply inspecting the
> > > > initializer, and thus render its flags parameter unnecessary, which is
> > > > what this patch implements.  (If we were to fix this in tsubst_expr
> > > > instead, I think we'd have to inspect the initializer in the same way
> > > > in order to detect a copy-init context?)
> > > 
> > > Hmm, does this affect conversions as well?
> > > 
> > > Looks like it does:
> > > 
> > > > struct A
> > > > {
> > > >   explicit operator int();
> > > > };
> > > > 
> > > > template <class T> void f()
> > > > {
> > > >   T t = A();
> > > > }
> > > > 
> > > > int main()
> > > > {
> > > >   f<int>(); // wrongly accepted
> > > > }
> > > 
> > > The reverse, initializing via an explicit constructor, is caught by code
> > > in
> > > build_aggr_init much like the code your patch adds to do_auto_deduction;
> > > perhaps we should move/copy that code to cp_finish_decl?
> > 
> > Ah, makes sense.  Moving that code from build_aggr_init to
> > cp_finish_decl broke things, but using it in both spots seems to work
> > well.  And I suppose we might as well use it in do_class_deduction too,
> > since doing so lets us remove the flags parameter.
> 
> Before removing the flags parameter please try asserting that it now matches
> is_copy_initialization and see if anything breaks.

I added to do_class_deduction:

  gcc_assert (bool(flags & LOOKUP_ONLYCONVERTING) == is_copy_initialization (init));

Turns out removing the flags parameter breaks CTAD for new-expressions
of the form 'new TT(x)' because in this case build_new passes just 'x'
as the initializer to do_auto_deduction (as opposed to a single TREE_LIST),
for which is_copy_initialization returns true even though it's really
direct initialization.

Also turns out we're similarly not passing the right LOOKUP_* flags to
cp_finish_decl from instantiate_body, which breaks consideration of
explicit conversions/deduction guides when instantiating the initializer
of a static data member.  I added some xfailed testcases for these
situations.

Here's a patch that keeps the flags parameter of do_auto_deduction, and
only changes the call to cp_finish_decl from tsubst_expr:

-- >8 --

Subject: [PATCH] c++: detecting copy-init context during CTAD [PR102137]

Here we're failing to communicate to cp_finish_decl from tsubst_expr
that we're in a copy-initialization context (via the LOOKUP_ONLYCONVERTING
flag), which causes us to always consider explicit deduction guides when
performing CTAD for a templated variable initializer.
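
For readers less familiar with the rule at stake, here is a small non-template sketch of how an explicit deduction guide is meant to behave (C++17).  Box and the helper functions are invented for the example; it shows the language rule in the ordinary case, not the templated-variable case this patch is about.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

template <class T>
struct Box {
  T value;
  Box(T v) : value(v) {}
};

// Explicit deduction guide: a candidate only for direct-initialization.
explicit Box(const char *) -> Box<std::string>;

// Direct-initialization: the explicit guide is considered and wins.
inline auto direct_init() { Box b("x"); return b; }

// Copy-initialization: the explicit guide is ignored, so the implicit
// guide from the constructor deduces Box<const char *>.
inline auto copy_init() { Box b = "x"; return b; }

static_assert(std::is_same_v<decltype(direct_init()), Box<std::string>>);
static_assert(std::is_same_v<decltype(copy_init()), Box<const char *>>);

// Runtime spot check of the deduced payloads.
inline void check_payloads() {
  assert(direct_init().value == "x");
  assert(std::string(copy_init().value) == "x");
}
```

The bug described here amounts to the copy-initialization form being treated like the direct-initialization form once the declaration sits inside a template.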

It turns out this bug also affects consideration of explicit conversion
operators for the same reason.  But consideration of explicit constructors
is unaffected and seems to work correctly thanks to code in build_aggr_init
that sets LOOKUP_ONLYCONVERTING when the initializer represents
copy-initialization.
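
The two "explicit" cases mentioned above can be checked with standard type traits, as in the following sketch.  A and B are example types; std::is_convertible models copy-initialization and std::is_constructible models direct-initialization.

```cpp
#include <cassert>
#include <type_traits>

struct A {
  explicit operator int() const { return 42; }  // explicit conversion operator
};

struct B {
  explicit B(int v) : value(v) {}  // explicit constructor
  int value;
};

// Copy-initialization (T t = expr;) may only use implicit conversions.
static_assert(!std::is_convertible_v<A, int>);
static_assert(!std::is_convertible_v<int, B>);

// Direct-initialization (T t(expr);) may use explicit ones as well.
static_assert(std::is_constructible_v<int, A>);
static_assert(std::is_constructible_v<B, int>);

// Direct-initialization from A through the explicit conversion operator.
inline int direct_from_a() {
  int i(A{});
  return i;
}
```

So `int i = A();` should be rejected while `int i(A{});` is fine, and the same should hold when `int` is a template parameter substituted at instantiation time.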

This patch factors out the copy-initialization check from build_aggr_init
and reuses it in tsubst_expr for sake of cp_finish_decl.  This fixes
consideration of explicit dguides/conversion when instantiating the
initializer of block-scope variables, but the static data member case is
still similarly broken since those are handled from instantiate_body
not tsubst_expr.

PR c++/102137
PR c++/87820

gcc/cp/ChangeLog:

* cp-tree.h (is_copy_initialization): Declare.
* init.cc (build_aggr_init): Split out copy-initialization
check into ...
(is_copy_initialization): ... here.
(tsubst_expr) : Pass LOOKUP_ONLYCONVERTING
to cp_finish_decl when is_copy_initialization is true.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/explicit15.C: New test.
* g++.dg/cpp1z/class-deduction108.C: New test.
---
 gcc/cp/cp-tree.h  |  1 +
 gcc/cp/init.cc| 20 +++--
 gcc/cp/pt.cc  |  6 +-
 gcc/testsuite/g++.dg/cpp0x/explicit15.C   | 83 +++
 .../g++.dg/cpp1z/class-deduction108.C | 78 +
 5 files changed, 182 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/explicit15.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction108.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index ac723901098..fd76909ca75 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.

Re: [PATCH v3] x86: Disable SSE on unwind-c.c and unwind-dw2.c

2022-03-08 Thread H.J. Lu via Gcc-patches
On Tue, Mar 8, 2022 at 4:29 AM Jakub Jelinek  wrote:
>
> On Tue, Mar 08, 2022 at 12:15:15PM +0100, Jakub Jelinek via Gcc-patches wrote:
> > > --- gcc/config/i386/i386.h.jj   2022-02-25 12:06:45.535493490 +0100
> > > +++ gcc/config/i386/i386.h  2022-03-08 11:20:43.207043370 +0100
> > > @@ -2848,6 +2848,10 @@ extern enum attr_cpu ix86_schedule;
> > >  #define NUM_X86_64_MS_CLOBBERED_REGS 12
> > >  #endif
> > >
> > > +/* __builtin_eh_return can't handle stack realignment, so disable SSE in
> > > +   libgcc functions that call it.  */
> > > +#define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-sse")))
> > > +
> > >  /*
> > >  Local variables:
> > >  version-control: t
> > >
> > >
> > > As mentioned in PR104838, this likely isn't specific to just Solaris and
> > > cygwin/mingw.  Fedora uses -msse2 -mfpmath=sse -mstackrealign in its C{,XX}FLAGS
> > > among other things for i686.
> >
> > Now verified that with your full patch applied gcc on Fedora/i686 doesn't
> > build (gets those sorry messages when compiling unwind-dw2), while if I
> > replace those 2 libgcc hunks from your patch with the above one it seems to
> > get past the previous hanging point of gnat1 processes.
>
> Though, perhaps it should be
> #ifndef __x86_64__
> #define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-sse")))
> #endif
> or something similar, on x86-64 one at least normally doesn't use lower
> stack realignment unless avx or later.  Maybe we want to use
> no-avx for the x86-64 case though.

I have verified that AVX and AVX512 have no issues on x86-64.  In 32-bit,
-mstackrealign triggers the problem.

> Disabling sse/sse2 might be a problem especially on mingw where we need to
> restore SSE registers in the EH return, no?

No, it isn't needed.

> Even better would be to make __builtin_eh_return work even with DRAP,
> but I admit I haven't understood what exactly is the problem that prevents
> it from working.

The EH return is a very special case.  Disabling SSE in 32-bit mode is the simplest
way to make the EH return work.

-- 
H.J.


Re: [PATCH v2] Add TARGET_MOVE_WITH_MODE_P

2022-03-08 Thread H.J. Lu via Gcc-patches
On Mon, Mar 7, 2022 at 5:45 AM Richard Biener
 wrote:
>
> On Wed, Mar 2, 2022 at 10:18 PM H.J. Lu  wrote:
> >
> > On Wed, Mar 02, 2022 at 09:51:26AM +0100, Richard Biener wrote:
> > > On Tue, Mar 1, 2022 at 11:41 PM H.J. Lu via Gcc-patches
> > >  wrote:
> > > >
> > > > Add TARGET_FOLD_MEMCPY_MAX for the maximum number of bytes to fold memcpy.
> > > > The default is
> > > >
> > > > MOVE_MAX * MOVE_RATIO (optimize_function_for_size_p (cfun))
> > > >
> > > > For x86, it is MOVE_MAX to restore the old behavior before
> > >
> > > I know we've discussed this to death in the PR, I just want to repeat here
> > > that the GIMPLE folding expects to generate a single load and a single
> > > store (that is what it does on the GIMPLE level) which is why MOVE_MAX
> > > was chosen originally (it's documented to what a "single instruction" does).
> > > In practice MOVE_MAX does not seem to cover vector register sizes
> > > so Richard pulled MOVE_RATIO which is really intended to cover
> > > the case of using multiple instructions for moving memory (but then I
> > > don't remember whether for the ARM case the single load/store GIMPLE
> > > will be expanded to multiple load/store instructions).
> > >
> > > TARGET_FOLD_MEMCPY_MAX sounds like a stop-gap solution,
> > > being very specific for memcpy folding (we also fold memmove btw).
> > >
> > > There is also MOVE_MAX_PIECES which _might_ be more appropriate
> > > than MOVE_MAX here and still honor the idea of single instructions.
> > > Now neither arm nor aarch64 define this and it defaults to MOVE_MAX,
> > > not MOVE_MAX * MOVE_RATIO.
> > >
> > > So if we need a new hook then that hook should at least get the
> > > 'speed' argument of MOVE_RATIO and it should get a better name.
> > >
> > > I still think that it should be possible to improve the insn check to
> > > avoid use of "disabled" modes, maybe that's also a point to add
> > > a new hook like .move_with_mode_p or so?  To quote, we do
> >
> > Here is the v2 patch to add TARGET_MOVE_WITH_MODE_P.
>
> Again I'd like to shine light on MOVE_MAX_PIECES which explicitly
> mentions "a load or store used TO COPY MEMORY" (emphasis mine)
> and whose x86 implementation would already be fine (doing larger moves
> and also not doing too large moves).  But apparently the arm folks
> decided that that's not fit and instead (mis-?)used MOVE_MAX * MOVE_RATIO.
>
> Yes, MOVE_MAX_PIECES is documented to apply to move_by_pieces.
> Still GIMPLE memcpy/memmove inlining wants to mimic exactly that but
> restrict itself to a single load and a single store.
>
> > >
> > >   scalar_int_mode mode;
> > >   if (int_mode_for_size (ilen * 8, 0).exists (&mode)
> > >   && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
> > >   && have_insn_for (SET, mode)
> > >   /* If the destination pointer is not aligned we must be able
> > >  to emit an unaligned store.  */
> > >   && (dest_align >= GET_MODE_ALIGNMENT (mode)
> > >   || !targetm.slow_unaligned_access (mode, dest_align)
> > >   || (optab_handler (movmisalign_optab, mode)
> > >   != CODE_FOR_nothing)))
> > >
> > > where I understand the ISA is enabled and if the user explicitly
> > > uses it that's OK but -mprefer-avx128 should tell GCC to never
> > > generate AVX256 code where the user was not explicitly using it
> > > (still for example glibc might happily use AVX256 code to implement
> > > the memcpy we are folding!)
> > >
> > > Note the BB vectorizer also might end up with using AVX256 because
> > > in places it also relies on optab queries and the vector_mode_supported_p
> > > check (but the memcpy folding uses the fake integer modes).  So
> > > x86 might need to implement the related_mode hook to avoid "auto"-using
> > > a larger vector mode which the default implementation would happily do.
> > >
> > > Richard.
> >
> > OK for master?
>
> Looking for opinions from others as well.
>
> Btw, there's a similar use in expand_DEFERRED_INIT:
>
>   && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
> 0).exists (&var_mode)
>   && have_insn_for (SET, var_mode))
>
> So it occurred to me that maybe targetm.move_with_mode_p should eventually

TARGET_MOVE_WITH_MODE_P shouldn't be used here since we do want to
use var_mode.

> check have_insn_for (SET, var_mode) or we should abstract checking the two
> things to a generic API somewhere (in optabs-query.h maybe, or expmed.h,
> not sure where it would be more appropriate).
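
One way to picture the "generic API" idea is a helper that combines the instruction-availability check with the proposed hook.  Everything below is a toy model with invented names (ToyTarget, have_move_insn, can_move_implicitly, fold_memcpy_p); sizes are in bytes, and the movmisalign optab part of the quoted condition is left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a target.
struct ToyTarget {
  std::size_t widest_move_insn_bytes;  // widest mode with a move pattern
  std::size_t widest_implicit_bytes;   // widest mode to use when not asked for
  bool slow_unaligned_access;          // unaligned stores are expensive
};

constexpr bool is_power_of_two(std::size_t n) {
  return n != 0 && (n & (n - 1)) == 0;
}

// Stand-in for have_insn_for (SET, mode): the move pattern exists.
constexpr bool have_move_insn(const ToyTarget &t, std::size_t bytes) {
  return is_power_of_two(bytes) && bytes <= t.widest_move_insn_bytes;
}

// Stand-in for the proposed hook: may the compiler pick this mode itself?
constexpr bool move_with_mode_p(const ToyTarget &t, std::size_t bytes) {
  return bytes <= t.widest_implicit_bytes;
}

// The combined query that callers would use instead of two checks.
constexpr bool can_move_implicitly(const ToyTarget &t, std::size_t bytes) {
  return have_move_insn(t, bytes) && move_with_mode_p(t, bytes);
}

// Would a memcpy of `len` bytes become one load plus one store?
constexpr bool fold_memcpy_p(const ToyTarget &t, std::size_t len,
                             std::size_t dest_align) {
  return can_move_implicitly(t, len)
         && (dest_align >= len || !t.slow_unaligned_access);
}
```

A target modelled as {32, 16, false} has a 32-byte move instruction but, in the spirit of -mprefer-avx128, would only fold copies of up to 16 bytes.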
>
> > +@deftypefn {Target Hook} bool TARGET_MOVE_WITH_MODE_P (machine_mode @var{mode})
> > +This target hook returns true if move with mode @var{mode} can be
> > +generated implicitly.  The default definition returns true.
> > +@end deftypefn
>
> I know what you mean but I'm not sure "can be generated implicitly" captures
> that.  The 

Re: [PATCH v3] x86: Disable SSE on unwind-c.c and unwind-dw2.c

2022-03-08 Thread Jakub Jelinek via Gcc-patches
On Tue, Mar 08, 2022 at 07:37:17AM -0800, H.J. Lu wrote:
> > Though, perhaps it should be
> > #ifndef __x86_64__
> > #define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-sse")))
> > #endif
> > or something similar, on x86-64 one at least normally doesn't use lower
> > stack realignment unless avx or later.  Maybe we want to use
> > no-avx for the x86-64 case though.
> 
> I have verified that AVX and AVX512 have no issues on x86-64.  In 32-bit,
> -mstackrealign triggers the problem.

I bet it would be a problem if we started vectorizing something in there
using avx/avx2/avx512*.  But given the sorry, I think we'd find that out
immediately.

> > Disabling sse/sse2 might be a problem especially on mingw where we need to
> > restore SSE registers in the EH return, no?
> 
> No, it isn't needed.

I meant for 64-bit where I think the Windows ABI preserves some XMM regs
(low 128-bits of them).  So my earlier patch to just define
LIBGCC2_UNWIND_ATTRIBUTE unconditionally would be wrong for it.

> > Even better would be to make __builtin_eh_return work even with DRAP,
> > but I admit I haven't understood what exactly is the problem that prevents
> > it from working.
> 
> The EH return is a very special case.  Disabling SSE in 32-bit mode is the simplest
> way to make the EH return work.

Ok.  So, what do you think about replacing the libgcc/ part of your patch
with that
/* __builtin_eh_return can't handle stack realignment, so disable SSE in
   32-bit libgcc functions that call it.  */
#ifndef __x86_64__
#define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-sse")))
#endif
?
I'm bootstrapping/regtesting such a patch right now (because I needed some
quick fix for the gnat1 hangs).
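
For illustration, this is roughly how such a conditionally defined attribute macro ends up on a function.  TOY_UNWIND_ATTRIBUTE and toy_unwind_step are made-up names, and the extra __i386__ test is only there so the sketch also compiles on non-x86 hosts; the proposal itself lives in i386.h and only needs the __x86_64__ check.

```cpp
#include <cassert>

// Hypothetical macro modelled on LIBGCC2_UNWIND_ATTRIBUTE: expands to
// target("no-sse") on 32-bit x86 and to nothing everywhere else.
#if defined(__i386__) && !defined(__x86_64__)
# define TOY_UNWIND_ATTRIBUTE __attribute__((target("no-sse")))
#else
# define TOY_UNWIND_ATTRIBUTE
#endif

// Stand-in for an unwinder routine; the real ones call
// __builtin_eh_return, which this sketch does not attempt.
TOY_UNWIND_ATTRIBUTE
int toy_unwind_step(int frames_left) {
  return frames_left > 0 ? frames_left - 1 : 0;
}
```

On x86-64 the macro is empty, so the function is compiled with whatever ISA options are in effect, which matches the intent of leaving the 64-bit case alone.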

Jakub



Re: [PATCH] rtl: ICE with thread_local and inline asm [PR104777]

2022-03-08 Thread Segher Boessenkool
On Tue, Mar 08, 2022 at 10:25:45AM -0500, Marek Polacek wrote:
> On Tue, Mar 08, 2022 at 09:14:56AM -0600, Segher Boessenkool wrote:
> > On Tue, Mar 08, 2022 at 10:08:25AM -0500, Marek Polacek wrote:
> > > ...I don't see that.  In fact copy_rtx does the same thing as
> > > copy_insn:
> > > 
> > >case 'V':
> > >  if (XVEC (orig, i) != NULL)
> > >{
> > >  XVEC (copy, i) = rtvec_alloc (XVECLEN (orig, i));
> > > 
> > > which will copy a zero-length vector too, right?
> > 
> > It doesn't.  It copies NULL as NULL.  That is what that "if" is for.
> 
> But XVEC (orig, i) is not null, it just has XVECLEN 0.

So where did *that* come from?  This isn't correct RTL.

> > You can do similar in copy_insn_1?
> 
> You mean copy_rtx?  It already has the same XVEC (orig, i) != NULL check.

No, I mean do similar in copy_insn_1 as what copy_rtx already
(correctly) does.

> But like I said above, even if we didn't copy these XVECLEN 0 rtvecs,
> the crash would not go away.

An rtvec should never have length 0.  Look at gen_rtvec for another
example.

You can get rid of the crash, sure.  But it is a much better plan to try
and get rid of the actual problem!  (And then add some more checking to
make sure this doesn't happen in the future.)


Segher


[PATCH] simplify-rtx: Fix up SUBREG_PROMOTED_SET arguments [PR104839]

2022-03-08 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase is miscompiled on powerpc64le-linux at -O1 and higher
(except for -Og).  The bug was introduced in r12-3252-gcad36f38576a6a7
which for SIGN_EXTEND from SUBREG_PROMOTED_SIGNED_P SUBREG used
SUBREG_PROMOTED_SET (temp, 1) (but that makes temp
SUBREG_PROMOTED_UNSIGNED_P because SRP_UNSIGNED is 1) and similarly the
ZERO_EXTEND from SUBREG_PROMOTED_UNSIGNED_P SUBREG used
SUBREG_PROMOTED_SET (temp, 0) (but that makes temp
SUBREG_PROMOTED_SIGNED_P because SRP_SIGNED is 0).
The following patch fixes that (swaps the 0s and 1s), but for better
readability uses the SRP_* constants.
rtl.h has:
/* Valid for subregs which are SUBREG_PROMOTED_VAR_P().  In that case
   this gives the necessary extensions:
   0  - signed (SPR_SIGNED)
   1  - normal unsigned (SPR_UNSIGNED)
   2  - value is both sign and unsign extended for mode
(SPR_SIGNED_AND_UNSIGNED).
   -1 - pointer unsigned, which most often can be handled like unsigned
extension, except for generating instructions where we need to
emit special code (ptr_extend insns) on some architectures
(SPR_POINTER). */
The expr.c change in the same commit looks ok to me (passes unsignedp
to SUBREG_PROMOTED_SET, so 0 for signed, 1 for unsigned).
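
The mix-up is easy to see in a toy model that uses the numeric values from the comment above.  ToySubreg and the two predicates are illustrative only; the real macros store this state in rtx flag bits.

```cpp
#include <cassert>

// Values as listed in the rtl.h comment quoted above.
constexpr int SRP_POINTER = -1;
constexpr int SRP_SIGNED = 0;
constexpr int SRP_UNSIGNED = 1;
constexpr int SRP_SIGNED_AND_UNSIGNED = 2;

// Toy stand-in for a promoted SUBREG.
struct ToySubreg {
  int promoted = SRP_SIGNED;
};

constexpr bool promoted_signed_p(const ToySubreg &s) {
  return s.promoted == SRP_SIGNED || s.promoted == SRP_SIGNED_AND_UNSIGNED;
}

constexpr bool promoted_unsigned_p(const ToySubreg &s) {
  return s.promoted == SRP_UNSIGNED || s.promoted == SRP_SIGNED_AND_UNSIGNED;
}
```

Passing a literal 1 where "signed" was meant marks the value as unsigned in this model, which is the inversion the patch fixes by spelling out the SRP_* names.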

Starting bootstrap/regtest on powerpc64{,le}-linux now, ok for trunk?

2022-03-08  Jakub Jelinek  

PR rtl-optimization/104839
* simplify-rtx.cc (simplify_unary_operation_1) <case SIGN_EXTEND>:
Use SRP_SIGNED instead of incorrect 1 in SUBREG_PROMOTED_SET.
(simplify_unary_operation_1) <case ZERO_EXTEND>: Use SRP_UNSIGNED
instead of incorrect 0 in SUBREG_PROMOTED_SET.

* gcc.c-torture/execute/pr104839.c: New test.

--- gcc/simplify-rtx.cc.jj  2022-02-23 09:17:04.0 +0100
+++ gcc/simplify-rtx.cc 2022-03-08 16:31:20.823246404 +0100
@@ -1527,7 +1527,7 @@ simplify_context::simplify_unary_operati
  if (partial_subreg_p (temp))
{
  SUBREG_PROMOTED_VAR_P (temp) = 1;
- SUBREG_PROMOTED_SET (temp, 1);
+ SUBREG_PROMOTED_SET (temp, SRP_SIGNED);
}
  return temp;
}
@@ -1662,7 +1662,7 @@ simplify_context::simplify_unary_operati
  if (partial_subreg_p (temp))
{
  SUBREG_PROMOTED_VAR_P (temp) = 1;
- SUBREG_PROMOTED_SET (temp, 0);
+ SUBREG_PROMOTED_SET (temp, SRP_UNSIGNED);
}
  return temp;
}
--- gcc/testsuite/gcc.c-torture/execute/pr104839.c.jj   2022-03-08 16:46:51.418440078 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr104839.c  2022-03-08 16:46:27.044774203 +0100
@@ -0,0 +1,37 @@
+/* PR rtl-optimization/104839 */
+
+__attribute__((noipa)) short
+foo (void)
+{
+  return -1;
+}
+
+__attribute__((noipa)) int
+bar (void)
+{
+  short i = foo ();
+  if (i == -2)
+return 2;
+  long k = i;
+  int j = -1;
+  volatile long s = 300;
+  if (k < 0)
+{
+  k += s;
+  if (k < 0)
+   j = 0;
+}
+  else if (k >= s)
+j = 0;
+  if (j != -1)
+return 1;
+  return 0;
+}
+
+int
+main ()
+{
+  if (bar () != 0)
+__builtin_abort ();
+  return 0;
+}

Jakub



Re: Porting the Docs to Sphinx - project status

2022-03-08 Thread Martin Liška

On 2/4/22 14:40, Matthias Klose wrote:

On 1/31/22 15:06, Martin Liška wrote:

Hello.

It's about 5 months since the last project status update:
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577108.html
Now it's pretty clear that it won't be merged before GCC 12.1 gets released.

So where are we? At the end of the year I contacted the documentation
maintainers (Gerald, Sandra and Joseph) in a private email, where I pinged
the patches. My takeaway is that both Gerald and Joseph are fine with the
porting, while Sandra has some concerns. Based on her feedback, I was able
to improve the generated PDF output significantly and I'm pleased with the
feedback provided. That led to the following 2 Sphinx pull requests that
need to be merged before we can migrate the documentation: [1], [2].

Since the last time I also made one more round of proofreading and the
layout was improved (mainly for the PDF part). The current version of the
documentation can be seen here:
https://splichal.eu/scripts/sphinx/

I would like to finish the transition once GCC 12.1 gets released in
May/June this year. There are still some minor regressions, but overall the
Sphinx-based documentation should be a significant improvement over what
we've got right now.

Please take this email as an urgent call for feedback!


Please take care of the copyrights.  I only checked the D frontend manual,
and it suddenly has a copyright with invariant sections, whereas the
current gdc.texi has a copyright *without* the invariant sections.  Debian
doesn't allow me to ship documentation with invariant sections ...


Oh, thank you very much for the pointer. I didn't notice the Copyright sections
differ quite a lot. It should be fixed now.



I didn't look at how much you reorganized the sources, but it would be nice to
split the files into those documenting command line options (used to generate
the man pages) and other documentation.


Well, the current split is into multiple .rst files, and a bunch of them
actually construct the command line options. Please check the View page source
button on each HTML page.


This is already done for gcc/doc, but not for
other frontends.  It would allow having manual pages with a copyright requiring
front and back cover texts in the manual pages.


How exactly does it work? Does it mean you don't use the official GCC tarballs?
I would expect you just package the built man/info pages and don't distribute
a PDF/HTML version of the documentation, or?



It would also be nice not to require the latest sphinx version (and probably
some plugins), so that distros can build the docs with older sphinx versions
as well.


I'm sorry but this would be very difficult. It's mainly caused by the fact
that I've reported quite a few changes upstream, and having them leads to
reasonable HTML/PDF output.

Note you can quite easily utilize pip&virtualenv for Sphinx installation.

Cheers,
Martin



Matthias




Re: [PATCH] rtl: ICE with thread_local and inline asm [PR104777]

2022-03-08 Thread Marek Polacek via Gcc-patches
On Tue, Mar 08, 2022 at 09:49:15AM -0600, Segher Boessenkool wrote:
> On Tue, Mar 08, 2022 at 10:25:45AM -0500, Marek Polacek wrote:
> > On Tue, Mar 08, 2022 at 09:14:56AM -0600, Segher Boessenkool wrote:
> > > On Tue, Mar 08, 2022 at 10:08:25AM -0500, Marek Polacek wrote:
> > > > ...I don't see that.  In fact copy_rtx does the same thing as
> > > > copy_insn:
> > > > 
> > > >case 'V':
> > > >  if (XVEC (orig, i) != NULL)
> > > >{
> > > >  XVEC (copy, i) = rtvec_alloc (XVECLEN (orig, i));
> > > > 
> > > > which will copy a zero-length vector too, right?
> > > 
> > > It doesn't.  It copies NULL as NULL.  That is what that "if" is for.
> > 
> > But XVEC (orig, i) is not null, it just has XVECLEN 0.
> 
> So where did *that* come from?  This isn't correct RTL.

I already said it before:

The zero-length rtvec is originally created in expand_asm_stmt:

  rtvec labelvec = rtvec_alloc (nlabels);

where nlabels is 0 but using NULL_RTVEC instead just means crashes everywhere.

> > > You can do similar in copy_insn_1?
> > 
> > You mean copy_rtx?  It already has the same XVEC (orig, i) != NULL check.
> 
> No, I mean do similar in copy_insn_1 as what copy_rtx already
> (correctly) does.

Do similar what?  They already do the same thing with XVECs as I've said
twice.  If you mean something other than the 'V' case, please be explicit.
 
> > But like I said above, even if we didn't copy these XVECLEN 0 rtvecs,
> > the crash would not go away.
> 
> An rtvec should never have length 0.  Look at gen_rtvec for another
> example.
> 
> You can get rid of the crash, sure.  But it is a much better plan to try
> and get rid of the actual problem!  (And then add some more checking to
> make sure this doesn't happen in the future.)

Yes, I realize that.  That's why I've tried using NULL_RTVEC in expand_asm_stmt
rather than using a zero-length rtvec.  It resulted in crashes in, for instance,
jump.cc.  So I'm not sure if such a fix is suitable for stage4.  I may try again.

Marek



Re: [PATCH v3] x86: Disable SSE on unwind-c.c and unwind-dw2.c

2022-03-08 Thread H.J. Lu via Gcc-patches
On Tue, Mar 8, 2022 at 7:46 AM Jakub Jelinek  wrote:
>
> On Tue, Mar 08, 2022 at 07:37:17AM -0800, H.J. Lu wrote:
> > > Though, perhaps it should be
> > > #ifndef __x86_64__
> > > #define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-sse")))
> > > #endif
> > > or something similar, on x86-64 one at least normally doesn't use lower
> > > stack realignment unless avx or later.  Maybe we want to use
> > > no-avx for the x86-64 case though.
> >
> > I have verified that AVX and AVX512 have no issues on x86-64.  In 32-bit,
> > -mstackrealign triggers the problem.
>
> I bet it would be a problem if we started vectorizing something in there
> using avx/avx2/avx512*.  But given the sorry, I think we'd find that out

YMM and ZMM can be used to expand memset with -march=native.
It works fine on Linux.  No stack realignment is needed.

> immediately.

True.

> > > Disabling sse/sse2 might be a problem especially on mingw where we need to
> > > restore SSE registers in the EH return, no?
> >
> > No, it isn't needed.
>
> I meant for 64-bit where I think the Windows ABI preserves some XMM regs

Does it need to realign the stack?

> (low 128-bits of them).  So my earlier patch to just define
> LIBGCC2_UNWIND_ATTRIBUTE unconditionally would be wrong for it.
>
> > > Even better would be to make __builtin_eh_return work even with DRAP,
> > > but I admit I haven't understood what exactly is the problem that prevents
> > > it from working.
> >
> > The EH return is a very special case.  Disabling SSE in 32-bit mode is the simplest
> > way to make the EH return work.
>
> Ok.  So, what do you think about replacing the libgcc/ part of your patch
> with that
> /* __builtin_eh_return can't handle stack realignment, so disable SSE in
>32-bit libgcc functions that call it.  */
> #ifndef __x86_64__
> #define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-sse")))
> #endif
> ?

Yes, it should work.

Thanks.

> I'm bootstrapping/regtesting such a patch right now (because I needed some
> quick fix for the gnat1 hangs).
>
> Jakub
>


-- 
H.J.


Re: [PATCH] rtl: ICE with thread_local and inline asm [PR104777]

2022-03-08 Thread Jakub Jelinek via Gcc-patches
On Tue, Mar 08, 2022 at 09:49:15AM -0600, Segher Boessenkool wrote:
> > But like I said above, even if we didn't copy these XVECLEN 0 rtvecs,
> > the crash would not go away.
> 
> An rtvec should never have length 0.  Look at gen_rtvec for another
> example.

That is not true.  In case of ASM_OPERANDS, lots of code relies on being able
to use ASM_OPERANDS_{INPUT,LABEL}_LENGTH without checking if
ASM_OPERANDS_{INPUT,LABEL}_VEC is non-NULL.  Those ASM*LENGTH macros are
defined as XVECLEN which I believe will just segfault if the vec is NULL:
#define XVECLEN(RTX, N) GET_NUM_ELEM (XVEC (RTX, N))
#define GET_NUM_ELEM(RTVEC) ((RTVEC)->num_elem)
#define XVEC(RTX, N)(RTL_CHECK2 (RTX, N, 'E', 'V').rt_rtvec)
cfgexpand.cc as Marek said will allocate even zero length vectors using
rtvec_alloc (0):
  rtvec argvec = rtvec_alloc (ninputs);
  rtvec constraintvec = rtvec_alloc (ninputs);
  rtvec labelvec = rtvec_alloc (nlabels);
or e.g. in
  PATTERN (insn) = gen_rtx_ASM_OPERANDS (VOIDmode, ggc_strdup (""), "", 0,
 rtvec_alloc (0),
 rtvec_alloc (0),
 ASM_OPERANDS_LABEL_VEC (tmp),
 ASM_OPERANDS_SOURCE_LOCATION(tmp));
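
To make the point about the length macros concrete, here is a toy model of the difference between an unchecked length query and a "does it have any labels" test.  ToyRtvec, vec_len_unchecked, vec_len_or_zero and has_labels are invented for this sketch and are not GCC functions.

```cpp
#include <cassert>

// Toy stand-in for an rtvec header; only the element count matters here.
struct ToyRtvec {
  int num_elem;
};

// Like XVECLEN: dereferences unconditionally, so `vec` must not be null.
inline int vec_len_unchecked(const ToyRtvec *vec) { return vec->num_elem; }

// Guarded variant that every caller would need if null were allowed.
inline int vec_len_or_zero(const ToyRtvec *vec) {
  return vec ? vec->num_elem : 0;
}

// A non-null label vector does not imply there is a label to jump to;
// the length has to be checked as well.
inline bool has_labels(const ToyRtvec *label_vec) {
  return label_vec != nullptr && label_vec->num_elem > 0;
}
```

Keeping zero-length vectors non-null lets existing callers keep using the unchecked form, at the price that "non-null" can no longer be read as "has labels".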

Jakub



Re: [PATCH] rtl: ICE with thread_local and inline asm [PR104777]

2022-03-08 Thread Marek Polacek via Gcc-patches
On Tue, Mar 08, 2022 at 05:12:43PM +0100, Jakub Jelinek wrote:
> On Tue, Mar 08, 2022 at 09:49:15AM -0600, Segher Boessenkool wrote:
> > > But like I said above, even if we didn't copy these XVECLEN 0 rtvecs,
> > > the crash would not go away.
> > 
> > An rtvec should never have length 0.  Look at gen_rtvec for another
> > example.
> 
> That is not true.  In case of ASM_OPERANDS, lots of code relies on being able
> to use ASM_OPERANDS_{INPUT,LABEL}_LENGTH without checking if
> ASM_OPERANDS_{INPUT,LABEL}_VEC is non-NULL.  Those ASM*LENGTH macros are
> defined as XVECLEN which I believe will just segfault if the vec is NULL:

Yup, they will segv.  I've guarded a few spots with ASM_OPERANDS_LABEL_VEC
before using _LENGTH but there were just more and more crashes so I gave up.

> #define XVECLEN(RTX, N) GET_NUM_ELEM (XVEC (RTX, N))
> #define GET_NUM_ELEM(RTVEC) ((RTVEC)->num_elem)
> #define XVEC(RTX, N)(RTL_CHECK2 (RTX, N, 'E', 'V').rt_rtvec)
> cfgexpand.cc as Marek said will allocate even zero length vectors using
> rtvec_alloc (0):
>   rtvec argvec = rtvec_alloc (ninputs);
>   rtvec constraintvec = rtvec_alloc (ninputs);
>   rtvec labelvec = rtvec_alloc (nlabels);
> or e.g. in
>   PATTERN (insn) = gen_rtx_ASM_OPERANDS (VOIDmode, ggc_strdup (""), "", 0,
>  rtvec_alloc (0),
>  rtvec_alloc (0),
>  ASM_OPERANDS_LABEL_VEC (tmp),
>  ASM_OPERANDS_SOURCE_LOCATION(tmp));

I didn't see the latter, but I wouldn't be surprised if there were more.

Marek
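
To make the distinction in this exchange concrete, here is a minimal sketch of a vector that is allocated but empty, as opposed to a null vector.  The names toy_rtvec, toy_rtvec_alloc, TOY_NUM_ELEM and toy_has_labels are hypothetical stand-ins, not GCC's real types or macros; only the shape (a length field read through an unconditional dereference) follows the macros quoted above.

```cpp
#include <cstdlib>

// Simplified, hypothetical stand-in for an rtvec; not the real GCC type.
struct toy_rtvec {
  int num_elem;    // number of elements, may legitimately be 0
  void *elem[1];   // trailing array, over-allocated when n > 1
};

// Analogue of GET_NUM_ELEM: dereferences unconditionally, so passing a
// null pointer would crash, as described for XVECLEN in the thread.
#define TOY_NUM_ELEM(V) ((V)->num_elem)

// Analogue of rtvec_alloc: returns a real object even when n == 0.
toy_rtvec *toy_rtvec_alloc(int n) {
  std::size_t extra = n > 1 ? (n - 1) * sizeof(void *) : 0;
  toy_rtvec *v =
      static_cast<toy_rtvec *>(std::calloc(1, sizeof(toy_rtvec) + extra));
  if (v)
    v->num_elem = n;
  return v;
}

// Callers can ask for the length without a null check only because the
// vector is always allocated, even when it is empty.
bool toy_has_labels(const toy_rtvec *labelvec) {
  return TOY_NUM_ELEM(labelvec) > 0;
}
```

In this model a zero-length vector is a valid object whose length reads as 0, which is the property the existing ASM_OPERANDS users are said to rely on.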



Re: [PATCH] c++: detecting copy-init context during CTAD [PR102137]

2022-03-08 Thread Jason Merrill via Gcc-patches

On 3/8/22 11:36, Patrick Palka wrote:

On Mon, 7 Mar 2022, Jason Merrill wrote:


On 3/7/22 10:47, Patrick Palka wrote:

On Fri, 4 Mar 2022, Jason Merrill wrote:


On 3/4/22 14:24, Patrick Palka wrote:

Here we're failing to communicate to cp_finish_decl from tsubst_expr
that we're in a copy-initialization context (via the
LOOKUP_ONLYCONVERTING
flag), which causes do_class_deduction to always consider explicit
deduction guides when performing CTAD for a templated variable
initializer.

We could fix this by passing LOOKUP_ONLYCONVERTING appropriately when
calling cp_finish_decl from tsubst_expr, but it seems do_class_deduction
can determine if we're in a copy-init context by simply inspecting the
initializer, and thus render its flags parameter unnecessary, which is
what this patch implements.  (If we were to fix this in tsubst_expr
instead, I think we'd have to inspect the initializer in the same way
in order to detect a copy-init context?)
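
As an illustration of the language rule at stake (not of the GCC internals), the following C++17 sketch shows how an explicit deduction guide is expected to take part in direct-initialization but not in copy-initialization.  Box and ctad_demo are hypothetical names, and the variables are deliberately declared outside any template so the example does not depend on the bug discussed here.

```cpp
#include <type_traits>

template <class T>
struct Box {
  Box(T) {}
};

// Explicit, non-template deduction guide: a candidate only for
// direct-initialization.
explicit Box(int) -> Box<long>;

inline void ctad_demo() {
  Box direct(1);  // explicit guide participates and is preferred: Box<long>
  Box copy = 1;   // explicit guide ignored; the implicit guide gives Box<int>
  static_assert(std::is_same_v<decltype(direct), Box<long>>);
  static_assert(std::is_same_v<decltype(copy), Box<int>>);
}
```

The PR is about getting the second behaviour when the declaration appears inside a template and is instantiated later.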


Hmm, does this affect conversions as well?

Looks like it does:

struct A
{
explicit operator int();
};

template <class T> void f()
{
T t = A();
}

int main()
{
f<int>(); // wrongly accepted
}

The reverse, initializing via an explicit constructor, is caught by code
in
build_aggr_init much like the code your patch adds to do_auto_deduction;
perhaps we should move/copy that code to cp_finish_decl?


Ah, makes sense.  Moving that code from build_aggr_init to
cp_finish_decl broke things, but using it in both spots seems to work
well.  And I suppose we might as well use it in do_class_deduction too,
since doing so lets us remove the flags parameter.


Before removing the flags parameter please try asserting that it now matches
is_copy_initialization and see if anything breaks.


I added to do_class_deduction:

   gcc_assert (bool(flags & LOOKUP_ONLYCONVERTING) == is_copy_initialization 
(init));

Turns out removing the flags parameter breaks CTAD for new-expressions
of the form 'new TT(x)' because in this case build_new passes just 'x'
as the initializer to do_auto_deduction (as opposed to a single TREE_LIST),
for which is_copy_initialization returns true even though it's really
direct initialization.

Also turns out we're similarly not passing the right LOOKUP_* flags to
cp_finish_decl from instantiate_body, which breaks consideration of
explicit conversions/deduction guides when instantiating the initializer
of a static data member.  I added some xfailed testcases for these
situations.


Maybe we want to check is_copy_initialization in cp_finish_decl?


Here's a patch that keeps the flags parameter of do_auto_deduction, and
only changes the call to cp_finish_decl from tsubst_expr:

-- >8 --

Subject: [PATCH] c++: detecting copy-init context during CTAD [PR102137]

Here we're failing to communicate to cp_finish_decl from tsubst_expr
that we're in a copy-initialization context (via the LOOKUP_ONLYCONVERTING
flag), which causes us to always consider explicit deduction guides when
performing CTAD for a templated variable initializer.

It turns out this bug also affects consideration of explicit conversion
operators for the same reason.  But consideration of explicit constructors
is unaffected and seems to work correctly thanks to code in build_aggr_init
that sets LOOKUP_ONLYCONVERTING when the initializer represents
copy-initialization.

This patch factors out the copy-initialization check from build_aggr_init
and reuses it in tsubst_expr for sake of cp_finish_decl.  This fixes
consideration of explicit dguides/conversion when instantiating the
initializer of block-scope variables, but the static data member case is
still similarly broken since those are handled from instantiate_body
not tsubst_expr.

PR c++/102137
PR c++/87820

gcc/cp/ChangeLog:

* cp-tree.h (is_copy_initialization): Declare.
* init.cc (build_aggr_init): Split out copy-initialization
check into ...
(is_copy_initialization): ... here.
(tsubst_expr) : Pass LOOKUP_ONLYCONVERTING
to cp_finish_decl when is_copy_initialization is true.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/explicit15.C: New test.
* g++.dg/cpp1z/class-deduction108.C: New test.
---
  gcc/cp/cp-tree.h  |  1 +
  gcc/cp/init.cc| 20 +++--
  gcc/cp/pt.cc  |  6 +-
  gcc/testsuite/g++.dg/cpp0x/explicit15.C   | 83 +++
  .../g++.dg/cpp1z/class-deduction108.C | 78 +
  5 files changed, 182 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/explicit15.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction108.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index ac723901098..fd76909ca75 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7039,6 +7039,7 @@ extern void emit_mem_initializers (tree);
  extern tree build_aggr_init   (tree, tree, int,
  

Re: [PATCH] rtl: ICE with thread_local and inline asm [PR104777]

2022-03-08 Thread Segher Boessenkool
On Tue, Mar 08, 2022 at 05:12:43PM +0100, Jakub Jelinek wrote:
> On Tue, Mar 08, 2022 at 09:49:15AM -0600, Segher Boessenkool wrote:
> > > But like I said above, even if we didn't copy these XVECLEN 0 rtvecs,
> > > the crash would not go away.
> > 
> > An rtvec should never have length 0.  Look at gen_rtvec for another
> > example.
> 
> That is not true.  In case of ASM_OPERANDS, lots of code relies that it
> can use ASM_OPERANDS_{INPUT,LABEL}_LENGTH without checking if
> ASM_OPERANDS_{INPUT,LABEL}_VEC is non-NULL.  Those ASM*LENGTH macros are
> defined as XVECLEN which I believe will just segfault if the vec is NULL:
> #define XVECLEN(RTX, N) GET_NUM_ELEM (XVEC (RTX, N))
> #define GET_NUM_ELEM(RTVEC) ((RTVEC)->num_elem)
> #define XVEC(RTX, N)(RTL_CHECK2 (RTX, N, 'E', 'V').rt_rtvec)
> cfgexpand.cc as Marek said will allocate even zero length vectors using
> rtvec_alloc (0):
>   rtvec argvec = rtvec_alloc (ninputs);
>   rtvec constraintvec = rtvec_alloc (ninputs);
>   rtvec labelvec = rtvec_alloc (nlabels);
> or e.g. in
>   PATTERN (insn) = gen_rtx_ASM_OPERANDS (VOIDmode, ggc_strdup (""), "", 0,
>  rtvec_alloc (0),
>  rtvec_alloc (0),
>  ASM_OPERANDS_LABEL_VEC (tmp),
>  ASM_OPERANDS_SOURCE_LOCATION(tmp));

Wow, what a mess.  And this part is completely undocumented even :-(
It seems unintentional (and wrong) to me, but yes we are in stage 4, if
we want to clean this up one way or the other, now is not the time.

In that case: your patch looks good to me Marek.


Segher


[PATCH v2] rtl: ICE with thread_local and inline asm [PR104777]

2022-03-08 Thread Marek Polacek via Gcc-patches
On Tue, Mar 08, 2022 at 10:24:50AM -0600, Segher Boessenkool wrote:
> On Tue, Mar 08, 2022 at 05:12:43PM +0100, Jakub Jelinek wrote:
> > On Tue, Mar 08, 2022 at 09:49:15AM -0600, Segher Boessenkool wrote:
> > > > But like I said above, even if we didn't copy these XVECLEN 0 rtvecs,
> > > > the crash would not go away.
> > > 
> > > An rtvec should never have length 0.  Look at gen_rtvec for another
> > > example.
> > 
> > That is not true.  In case of ASM_OPERANDS, lots of code relies that it
> > can use ASM_OPERANDS_{INPUT,LABEL}_LENGTH without checking if
> > ASM_OPERANDS_{INPUT,LABEL}_VEC is non-NULL.  Those ASM*LENGTH macros are
> > defined as XVECLEN which I believe will just segfault if the vec is NULL:
> > #define XVECLEN(RTX, N) GET_NUM_ELEM (XVEC (RTX, N))
> > #define GET_NUM_ELEM(RTVEC) ((RTVEC)->num_elem)
> > #define XVEC(RTX, N)(RTL_CHECK2 (RTX, N, 'E', 'V').rt_rtvec)
> > cfgexpand.cc as Marek said will allocate even zero length vectors using
> > rtvec_alloc (0):
> >   rtvec argvec = rtvec_alloc (ninputs);
> >   rtvec constraintvec = rtvec_alloc (ninputs);
> >   rtvec labelvec = rtvec_alloc (nlabels);
> > or e.g. in
> >   PATTERN (insn) = gen_rtx_ASM_OPERANDS (VOIDmode, ggc_strdup (""), "", 0,
> >  rtvec_alloc (0),
> >  rtvec_alloc (0),
> >  ASM_OPERANDS_LABEL_VEC (tmp),
> >  ASM_OPERANDS_SOURCE_LOCATION(tmp));
> 
> Wow, what a mess.  And this part is completely undocumented even :-(
> It seems unintentional (and wrong) to me, but yes we are in stage 4, if
> we want to clean this up one way or the other, now is not the time.

Yeah.  I guess rtvec_alloc should either assert n > 0 or return NULL_RTVEC
when n == 0.

> In that case: your patch looks good to me Marek.

Thanks.  I've tweaked the patch to use ASM_OPERANDS_LABEL_LENGTH rather than
GET_NUM_ELEM.  I'll push it if it passes testing on x86_64-pc-linux-gnu.

-- >8 --
In r270550, Jakub fixed classify_insn to handle asm goto: if the asm can
jump to a label, the insn should be a JUMP_INSN.

However, as the following testcase shows, non-null ASM_OPERANDS_LABEL_VEC
doesn't guarantee that the rtx has any actual labels it can branch to.
Here, the rtvec has 0 elements because expand_asm_stmt created it:

  rtvec labelvec = rtvec_alloc (nlabels); // nlabels == 0

This causes an ICE in update_br_prob_note: BRANCH_EDGE (bb) crashes
because there's no branch edge.  I think we can fix this by checking
that there is at least one label the asm can jump to before wrapping
the ASM_OPERANDS in a JUMP_INSN.

PR rtl-optimization/104777

gcc/ChangeLog:

* rtl.cc (classify_insn): For ASM_OPERANDS, return JUMP_INSN only if
ASM_OPERANDS_LABEL_VEC has at least one element.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/tls/pr104777.c: New test.
---
 gcc/rtl.cc  |  4 +--
 gcc/testsuite/gcc.dg/torture/tls/pr104777.c | 30 +
 2 files changed, 32 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/tls/pr104777.c

diff --git a/gcc/rtl.cc b/gcc/rtl.cc
index f17474bfee1..d383ae9c099 100644
--- a/gcc/rtl.cc
+++ b/gcc/rtl.cc
@@ -765,7 +765,7 @@ classify_insn (rtx x)
 return CALL_INSN;
   if (ANY_RETURN_P (x))
 return JUMP_INSN;
-  if (GET_CODE (x) == ASM_OPERANDS && ASM_OPERANDS_LABEL_VEC (x))
+  if (GET_CODE (x) == ASM_OPERANDS && ASM_OPERANDS_LABEL_LENGTH (x) > 0)
 return JUMP_INSN;
   if (GET_CODE (x) == SET)
 {
@@ -794,7 +794,7 @@ classify_insn (rtx x)
   if (has_return_p)
return JUMP_INSN;
   if (GET_CODE (XVECEXP (x, 0, 0)) == ASM_OPERANDS
- && ASM_OPERANDS_LABEL_VEC (XVECEXP (x, 0, 0)))
+ && ASM_OPERANDS_LABEL_LENGTH (XVECEXP (x, 0, 0)) > 0)
return JUMP_INSN;
 }
 #ifdef GENERATOR_FILE
diff --git a/gcc/testsuite/gcc.dg/torture/tls/pr104777.c 
b/gcc/testsuite/gcc.dg/torture/tls/pr104777.c
new file mode 100644
index 000..abaf59731fc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/tls/pr104777.c
@@ -0,0 +1,30 @@
+/* PR rtl-optimization/104777 */
+/* { dg-do compile } */
+/* { dg-require-effective-target tls } */
+ 
+int savestate_r;
+int savestate_ssb;
+extern void abort();
+__thread int  loop;
+void f (void)
+{
+  int savestate_r0_5;
+  int savestate_r1_6;
+
+  __asm__("" : "=m" (savestate_ssb), "=r" (savestate_r));
+  savestate_r0_5 = savestate_r;
+  if (savestate_r0_5 == 0)
+  {
+__asm__ __volatile__("" :  : "m" (loop));
+abort ();
+  }
+
+  __asm__("" : "=m" (savestate_ssb), "=r" (savestate_r));
+  savestate_r1_6 = savestate_r;
+  if (savestate_r1_6 != 0)
+return;
+
+  __asm__ __volatile__("" :  : "m" (loop));
+  abort ();
+
+}

base-commit: 058d19b42ad4c4c22635f70db6913a80884aedec
-- 
2.35.1
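
The effect of the change can be sketched with a toy model, assuming (as the commit message says) that the label vector is always allocated but may be empty.  toy_asm, classify_old and classify_new are hypothetical names used only for illustration; they are not the real classify_insn.

```cpp
#include <vector>

enum toy_insn_kind { TOY_INSN, TOY_JUMP_INSN };

// Hypothetical model of an asm statement: the label vector always exists,
// but it is empty for a plain asm and non-empty for an asm goto.
struct toy_asm {
  const std::vector<int> *labels;  // stands in for ASM_OPERANDS_LABEL_VEC
};

// Old test: only checks that the vector exists.
toy_insn_kind classify_old(const toy_asm &a) {
  return a.labels ? TOY_JUMP_INSN : TOY_INSN;
}

// New test: requires at least one label to jump to.
toy_insn_kind classify_new(const toy_asm &a) {
  return a.labels->size() > 0 ? TOY_JUMP_INSN : TOY_INSN;
}
```

With an empty vector the old test reports a jump that has nowhere to go, which corresponds to the missing branch edge in the ICE.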



Re: [PATCH v2] rtl: ICE with thread_local and inline asm [PR104777]

2022-03-08 Thread Jakub Jelinek via Gcc-patches
On Tue, Mar 08, 2022 at 11:33:59AM -0500, Marek Polacek wrote:
>   PR rtl-optimization/104777
> 
> gcc/ChangeLog:
> 
>   * rtl.cc (classify_insn): For ASM_OPERANDS, return JUMP_INSN only if
>   ASM_OPERANDS_LABEL_VEC has at least one element.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/torture/tls/pr104777.c: New test.
> ---
>  gcc/rtl.cc  |  4 +--
>  gcc/testsuite/gcc.dg/torture/tls/pr104777.c | 30 +
>  2 files changed, 32 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/tls/pr104777.c
> 
> diff --git a/gcc/rtl.cc b/gcc/rtl.cc
> index f17474bfee1..d383ae9c099 100644
> --- a/gcc/rtl.cc
> +++ b/gcc/rtl.cc
> @@ -765,7 +765,7 @@ classify_insn (rtx x)
>  return CALL_INSN;
>if (ANY_RETURN_P (x))
>  return JUMP_INSN;
> -  if (GET_CODE (x) == ASM_OPERANDS && ASM_OPERANDS_LABEL_VEC (x))
> +  if (GET_CODE (x) == ASM_OPERANDS && ASM_OPERANDS_LABEL_LENGTH (x) > 0)
>  return JUMP_INSN;
>if (GET_CODE (x) == SET)
>  {
> @@ -794,7 +794,7 @@ classify_insn (rtx x)
>if (has_return_p)
>   return JUMP_INSN;
>if (GET_CODE (XVECEXP (x, 0, 0)) == ASM_OPERANDS
> -   && ASM_OPERANDS_LABEL_VEC (XVECEXP (x, 0, 0)))
> +   && ASM_OPERANDS_LABEL_LENGTH (XVECEXP (x, 0, 0)) > 0)

I think the > 0 in there is unnecessary, negative XVECLEN would be invalid
RTL.
Ok for trunk either way if it passes testing.

Jakub



RE: [aarch64] Enable FP16 feature by default for Armv9

2022-03-08 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Andre Vieira (lists) 
> Sent: Tuesday, March 8, 2022 3:20 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Kyrylo Tkachov
> 
> Subject: [aarch64] Enable FP16 feature by default for Armv9
> 
> Hi all,
> 
> This patch adds the feature bit for FP16 to the feature set for Armv9
> since Armv9 requires SVE to be implemented and SVE requires FP16 to be
> implemented.

Ok.
Thanks,
Kyrill

P.S. We may want to update the Neoverse N2 entry in aarch64-cores.def to use 
AARCH64_FL_FOR_ARCH9. That entry was added before AARCH64_FL_FOR_ARCH9 so it 
uses the old v8.5-based flag.

> 
> 2022-03-04  Andre Vieira  
> 
>      * config/aarch64/aarch64.h (AARCH64_FL_FOR_ARCH9): Add FP16
> feature bit.


[committed] contrib: Fix gcc-descr script [PR102664]

2022-03-08 Thread Jonathan Wakely via Gcc-patches
Pushed to trunk.

-- >8 --

POSIX expr does not support the 'match' keyword, so the git-descr.sh
scripts should use ':' instead.

contrib/ChangeLog:

PR other/102664
* git-descr.sh: Use portable form of expr match.
---
 contrib/git-descr.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/git-descr.sh b/contrib/git-descr.sh
index eb258148a66..ba5d711f330 100755
--- a/contrib/git-descr.sh
+++ b/contrib/git-descr.sh
@@ -23,7 +23,7 @@ elif test x$long = xyes; then
 r=$(git describe --all --abbrev=40 --match 'basepoints/gcc-[0-9]*' $c | 
sed -n 's,^\(tags/\)\?basepoints/gcc-,r,p')
 else
 r=$(git describe --all --abbrev=14 --match 'basepoints/gcc-[0-9]*' $c | 
sed -n 's,^\(tags/\)\?basepoints/gcc-,r,p');
-expr match ${r:-no} 'r[0-9]\+$' >/dev/null && r=${r}-0-g$(git rev-parse 
$c);
+expr ${r:-no} : 'r[0-9]\+$' >/dev/null && r=${r}-0-g$(git rev-parse $c);
 fi;
 if test -n $r; then
 o=$(git config --get gcc-config.upstream);
-- 
2.34.1



Re: [PATCH] c++: non-constant non-dependent decltype folding [PR104823]

2022-03-08 Thread Patrick Palka via Gcc-patches



On Mon, 7 Mar 2022, Jason Merrill wrote:

> On 3/7/22 14:41, Patrick Palka wrote:
> > instantiate_non_dependent_expr_sfinae instantiates only potentially
> > constant expressions
> 
> Hmm, that now strikes me as a problematic interface, as we don't know whether
> what we get back is template or non-template trees.
> 
> Maybe we want to change instantiate_non_dependent_expr to checking_assert that
> the argument is non-dependent (callers are already checking that), and drop
> the potentially-constant test?

That sounds like a nice improvement.  But it happens to break

  template using type = decltype(N);

because finish_decltype_type checks instantiation_dependent_uneval_expression_p
(which is false here) instead of instantiation_dependent_expression_p
(which is true here) before calling instantiate_non_dependent_expr, so
we end up tripping over the proposed checking_assert (which checks the
latter stronger form of dependence).

I suspect other callers of instantiate_non_dependent_expr might have a
similar problem if they use a weaker dependence check than
instantiation_dependent_expression_p, e.g. build_noexcept_spec only
checks value_dependent_expression_p.

I wonder if we should relax the proposed checking_assert in i_n_d_e, or
strengthen the dependence checks performed by its callers, or something
else?

Here's the diff I'm working with:

-- >8 --

 gcc/cp/parser.cc|  2 +-
 gcc/cp/pt.cc| 21 ++---
 gcc/cp/semantics.cc | 11 ---
 3 files changed, 11 insertions(+), 23 deletions(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 20aab5eb6b1..a570a9163b9 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -7986,7 +7986,7 @@ cp_parser_parenthesized_expression_list_elt (cp_parser 
*parser, bool cast_p,
 expr = cp_parser_assignment_expression (parser, /*pidk=*/NULL, cast_p);
 
   if (fold_expr_p)
-expr = instantiate_non_dependent_expr (expr);
+expr = fold_non_dependent_expr (expr);
 
   /* If we have an ellipsis, then this is an expression expansion.  */
   if (allow_expansion_p
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 53a74636279..1b2d9a7e4b1 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -6350,8 +6350,7 @@ redeclare_class_template (tree type, tree parms, tree 
cons)
 /* The actual substitution part of instantiate_non_dependent_expr_sfinae,
to be used when the caller has already checked
(processing_template_decl
-&& !instantiation_dependent_expression_p (expr)
-&& potential_constant_expression (expr))
+&& !instantiation_dependent_expression_p (expr))
and cleared processing_template_decl.  */
 
 tree
@@ -6365,8 +6364,7 @@ instantiate_non_dependent_expr_internal (tree expr, 
tsubst_flags_t complain)
/*integral_constant_expression_p=*/true);
 }
 
-/* Simplify EXPR if it is a non-dependent expression.  Returns the
-   (possibly simplified) expression.  */
+/* Instantiate the non-dependent expression EXPR.  */
 
 tree
 instantiate_non_dependent_expr_sfinae (tree expr, tsubst_flags_t complain)
@@ -6374,16 +6372,9 @@ instantiate_non_dependent_expr_sfinae (tree expr, 
tsubst_flags_t complain)
   if (expr == NULL_TREE)
 return NULL_TREE;
 
-  /* If we're in a template, but EXPR isn't value dependent, simplify
- it.  We're supposed to treat:
-
-   template <class T> void f(T[1 + 1]);
-   template <class T> void f(T[2]);
-
- as two declarations of the same function, for example.  */
-  if (processing_template_decl
-  && is_nondependent_constant_expression (expr))
+  if (processing_template_decl)
 {
+  gcc_checking_assert (!instantiation_dependent_expression_p (expr));
   processing_template_decl_sentinel s;
   expr = instantiate_non_dependent_expr_internal (expr, complain);
 }
@@ -6396,8 +6387,8 @@ instantiate_non_dependent_expr (tree expr)
   return instantiate_non_dependent_expr_sfinae (expr, tf_error);
 }
 
-/* Like instantiate_non_dependent_expr, but return NULL_TREE rather than
-   an uninstantiated expression.  */
+/* Like instantiate_non_dependent_expr, but return NULL_TREE if the
+   expression is dependent or non-constant.  */
 
 tree
 instantiate_non_dependent_or_null (tree expr)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 66d90c2f7be..8f744eb21b6 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -11234,16 +11234,13 @@ finish_decltype_type (tree expr, bool 
id_expression_or_member_access_p,
 }
   else if (processing_template_decl)
 {
-  /* Instantiate the non-dependent operand to diagnose any ill-formed
-expressions.  And keep processing_template_decl cleared for the rest
+  expr = instantiate_non_dependent_expr_sfinae (expr, complain);
+  if (expr == error_mark_node)
+   return error_mark_node;
+  /* Keep processing_template_decl cleared for the rest
 of the function (for sake of the call to lvalue_kind below, which
 handles templated and non-templated COND_EXPR differently).  */
  

[PATCH] mips: avoid signed overflow in LUI_OPERAND [PR104842]

2022-03-08 Thread Xi Ruoyao via Gcc-patches
I think this one is obvious.  Ok for trunk?

gcc/

PR target/104842
* config/mips/mips.h (LUI_OPERAND): Cast the input to an unsigned
value before adding an offset.
---
 gcc/config/mips/mips.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index bf5c1d5a709..0029864fdcd 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -2309,7 +2309,7 @@ enum reg_class
 
 #define LUI_OPERAND(VALUE) \
   (((VALUE) | 0x7fff0000) == 0x7fff0000\
-   || ((VALUE) | 0x7fff0000) + 0x10000 == 0)
+   || ((unsigned HOST_WIDE_INT) (VALUE) | 0x7fff0000) + 0x10000 == 0)
 
 /* Return a value X with the low 16 bits clear, and such that
VALUE - X is a signed 16-bit value.  */
-- 
2.35.1
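
A function-style sketch of the patched test shows why the cast helps: the addition is done on an unsigned value, so it wraps instead of overflowing.  toy_hwi and toy_lui_operand are hypothetical names, toy_hwi is assumed to be a signed 64-bit stand-in for HOST_WIDE_INT, and the constants 0x7fff0000 and 0x10000 are assumptions about the intended immediate range rather than something to copy verbatim.

```cpp
#include <cstdint>

// Assumed stand-in for HOST_WIDE_INT: a signed 64-bit integer.
using toy_hwi = std::int64_t;

// Function form of the test.  All arithmetic is done on the unsigned
// value, so the addition wraps modulo 2^64 instead of being undefined.
constexpr bool toy_lui_operand(toy_hwi value) {
  const std::uint64_t v = static_cast<std::uint64_t>(value);
  return (v | 0x7fff0000u) == 0x7fff0000u
         || (v | 0x7fff0000u) + 0x10000u == 0;
}
```

With a signed operand, an input close to the maximum value would make the addition overflow; in the unsigned form the same input simply yields a nonzero result and the test returns false.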




Re: Ping: [PATCH] PR target/102059 Fix inline of target specific functions

2022-03-08 Thread Segher Boessenkool
On Fri, Feb 11, 2022 at 12:53:07PM -0500, Michael Meissner wrote:
> Ping patch for PR target/102059 to ignore implicit -mpower8-fusion that
> prevents a function targeting power9 or power10 from inlining a function that
> declared it needed power8 via attribute/pragma target.

Can we just disable any effect from this flag, instead?  It should just
be implied by -mcpu=power8, and be impossible to be enabled otherwise
(or disabled!)


Segher


[PATCH] x86: Define LIBGCC2_UNWIND_ATTRIBUTE on ia32 [PR104781]

2022-03-08 Thread Jakub Jelinek via Gcc-patches
On Tue, Mar 08, 2022 at 08:09:25AM -0800, H.J. Lu wrote:
> > Ok.  So, what do you think about replacing the libgcc/ part of your patch
> > with that
> > /* __builtin_eh_return can't handle stack realignment, so disable SSE in
> >32-bit libgcc functions that call it.  */
> > #ifndef __x86_64__
> > #define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-sse")))
> > #endif
> > ?
> 
> Yes, it should work.

So, how do we move on with this?
I can't self-approve my own patch, so can anyone please ack the following
provided it passes bootstraps/regtests ({x86_64,i686}-linux) that are
currently pending?

That can go in independently from your patch, and if it is committed,
your V3 patch with the libgcc/ hunks removed is preapproved for trunk.

2022-03-08  Jakub Jelinek  

PR target/104781
* config/i386/i386.h (LIBGCC2_UNWIND_ATTRIBUTE): Define for ia32.

--- gcc/config/i386/i386.h.jj   2022-02-25 12:06:45.535493490 +0100
+++ gcc/config/i386/i386.h  2022-03-08 11:20:43.207043370 +0100
@@ -2848,6 +2848,12 @@ extern enum attr_cpu ix86_schedule;
 #define NUM_X86_64_MS_CLOBBERED_REGS 12
 #endif
 
+/* __builtin_eh_return can't handle stack realignment, so disable SSE in
+   32-bit libgcc functions that call it.  */
+#ifndef __x86_64__
+#define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-sse")))
+#endif
+
 /*
 Local variables:
 version-control: t


Jakub
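
As a rough sketch of how a macro of this kind is consumed, assuming the using code supplies an empty fallback when the target does not define it: TOY_UNWIND_ATTRIBUTE and toy_unwind_step are hypothetical names, and the condition is narrowed to 32-bit x86 with a GNU-compatible compiler so the fragment also builds elsewhere.

```cpp
// Hypothetical consumer of a LIBGCC2_UNWIND_ATTRIBUTE-style macro.
#if defined(__GNUC__) && defined(__i386__)
# define TOY_UNWIND_ATTRIBUTE __attribute__((target("no-sse")))
#else
# define TOY_UNWIND_ATTRIBUTE  /* empty fallback on other targets */
#endif

// The attribute, when present, applies to this one function only; the
// rest of the translation unit keeps the command-line ISA flags.
TOY_UNWIND_ATTRIBUTE int toy_unwind_step(int frames) {
  return frames > 0 ? frames - 1 : 0;
}
```

Scoping the attribute per function keeps SSE available everywhere except in the functions that call __builtin_eh_return.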



[PATCH v4] x86: Disallow unsupported EH return

2022-03-08 Thread H.J. Lu via Gcc-patches
Disallow stack realignment and regparm nested function with EH return
since they don't work together.

gcc/

PR target/104781
* config/i386/i386.cc (ix86_expand_epilogue): Sorry if there is
stack realignment or regparm nested function with EH return.

gcc/testsuite/

PR target/104781
* gcc.target/i386/eh_return-1.c: Add -mincoming-stack-boundary=4.
* gcc.target/i386/eh_return-2.c: Likewise.
---
 gcc/config/i386/i386.cc | 11 +++
 gcc/testsuite/gcc.target/i386/eh_return-1.c |  2 +-
 gcc/testsuite/gcc.target/i386/eh_return-2.c |  2 +-
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index efa947f9795..4121f986221 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -9444,12 +9444,15 @@ ix86_expand_epilogue (int style)
  rtx sa = EH_RETURN_STACKADJ_RTX;
  rtx_insn *insn;
 
- /* %ecx can't be used for both DRAP register and eh_return.  */
- if (crtl->drap_reg)
-   gcc_assert (REGNO (crtl->drap_reg) != CX_REG);
+ /* Stack realignment doesn't work with eh_return.  */
+ if (crtl->stack_realign_needed)
+   sorry ("Stack realignment not supported with "
+  "%<__builtin_eh_return%>");
 
  /* regparm nested functions don't work with eh_return.  */
- gcc_assert (!ix86_static_chain_on_stack);
+ if (ix86_static_chain_on_stack)
+   sorry ("regparm nested function not supported with "
+  "%<__builtin_eh_return%>");
 
  if (frame_pointer_needed)
{
diff --git a/gcc/testsuite/gcc.target/i386/eh_return-1.c 
b/gcc/testsuite/gcc.target/i386/eh_return-1.c
index b21fd75fc93..43f94f01a97 100644
--- a/gcc/testsuite/gcc.target/i386/eh_return-1.c
+++ b/gcc/testsuite/gcc.target/i386/eh_return-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -march=haswell -mno-avx512f 
-mtune-ctrl=avx256_move_by_pieces" } */
+/* { dg-options "-O2 -mincoming-stack-boundary=4 -march=haswell -mno-avx512f 
-mtune-ctrl=avx256_move_by_pieces" } */
 
 struct _Unwind_Context
 {
diff --git a/gcc/testsuite/gcc.target/i386/eh_return-2.c 
b/gcc/testsuite/gcc.target/i386/eh_return-2.c
index f23f4492dac..cb762f92cc2 100644
--- a/gcc/testsuite/gcc.target/i386/eh_return-2.c
+++ b/gcc/testsuite/gcc.target/i386/eh_return-2.c
@@ -1,6 +1,6 @@
 /* PR target/101772  */
 /* { dg-do compile } */
-/* { dg-additional-options "-O0 -march=x86-64 -mstackrealign" } */
+/* { dg-additional-options "-O0 -mincoming-stack-boundary=4 -march=x86-64 
-mstackrealign" } */
 
 struct _Unwind_Context _Unwind_Resume_or_Rethrow_this_context;
 
-- 
2.35.1



Re: [PATCH] x86: Define LIBGCC2_UNWIND_ATTRIBUTE on ia32 [PR104781]

2022-03-08 Thread H.J. Lu via Gcc-patches
On Tue, Mar 8, 2022 at 9:35 AM Jakub Jelinek  wrote:
>
> On Tue, Mar 08, 2022 at 08:09:25AM -0800, H.J. Lu wrote:
> > > Ok.  So, what do you think about replacing the libgcc/ part of your patch
> > > with that
> > > /* __builtin_eh_return can't handle stack realignment, so disable SSE in
> > >32-bit libgcc functions that call it.  */
> > > #ifndef __x86_64__
> > > #define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-sse")))
> > > #endif
> > > ?
> >
> > Yes, it should work.
>
> So, how do we move on with this?
> I can't self-approve my own patch, so can anyone please ack the following
> provided it passes bootstraps/regtests ({x86_64,i686}-linux) that are
> currently pending?
>
> That can go in independently from your patch, and if it is committed,
> your V3 patch with the libgcc/ hunks removed is preapproved for trunk.
>
> 2022-03-08  Jakub Jelinek  
>
> PR target/104781
> * config/i386/i386.h (LIBGCC2_UNWIND_ATTRIBUTE): Define for ia32.
>
> --- gcc/config/i386/i386.h.jj   2022-02-25 12:06:45.535493490 +0100
> +++ gcc/config/i386/i386.h  2022-03-08 11:20:43.207043370 +0100
> @@ -2848,6 +2848,12 @@ extern enum attr_cpu ix86_schedule;
>  #define NUM_X86_64_MS_CLOBBERED_REGS 12
>  #endif
>
> +/* __builtin_eh_return can't handle stack realignment, so disable SSE in
> +   32-bit libgcc functions that call it.  */
> +#ifndef __x86_64__
> +#define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-sse")))
> +#endif
> +
>  /*
>  Local variables:
>  version-control: t
>
>
> Jakub
>

LGTM.

Thanks.

-- 
H.J.


Re: [PATCH] x86: Define LIBGCC2_UNWIND_ATTRIBUTE on ia32 [PR104781]

2022-03-08 Thread H.J. Lu via Gcc-patches
On Tue, Mar 8, 2022 at 9:35 AM Jakub Jelinek  wrote:
>
> On Tue, Mar 08, 2022 at 08:09:25AM -0800, H.J. Lu wrote:
> > > Ok.  So, what do you think about replacing the libgcc/ part of your patch
> > > with that
> > > /* __builtin_eh_return can't handle stack realignment, so disable SSE in
> > >32-bit libgcc functions that call it.  */
> > > #ifndef __x86_64__
> > > #define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-sse")))
> > > #endif
> > > ?
> >
> > Yes, it should work.
>
> So, how do we move on with this?
> I can't self-approve my own patch, so can anyone please ack the following
> provided it passes bootstraps/regtests ({x86_64,i686}-linux) that are
> currently pending?
>
> That can go in independently from your patch, and if it is committed,
> your V3 patch with the libgcc/ hunks removed is preapproved for trunk.

I am checking in this:

https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591392.html

Thanks.

> 2022-03-08  Jakub Jelinek  
>
> PR target/104781
> * config/i386/i386.h (LIBGCC2_UNWIND_ATTRIBUTE): Define for ia32.
>
> --- gcc/config/i386/i386.h.jj   2022-02-25 12:06:45.535493490 +0100
> +++ gcc/config/i386/i386.h  2022-03-08 11:20:43.207043370 +0100
> @@ -2848,6 +2848,12 @@ extern enum attr_cpu ix86_schedule;
>  #define NUM_X86_64_MS_CLOBBERED_REGS 12
>  #endif
>
> +/* __builtin_eh_return can't handle stack realignment, so disable SSE in
> +   32-bit libgcc functions that call it.  */
> +#ifndef __x86_64__
> +#define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-sse")))
> +#endif
> +
>  /*
>  Local variables:
>  version-control: t
>
>
> Jakub
>


-- 
H.J.


Re: [PATCH][RFC] tree-optimization/84201 - add --param vect-induction-float

2022-03-08 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> This adds a --param to allow disabling of vectorization of
> floating point inductions.  On top of -Ofast this should allow
> 549.fotonik3d_r to not miscompare.
>
> While I thought of a more elaborate way of disabling certain
> vectorization kinds (reductions also came to my mind) this
> for now simply uses a --param rather than some sophisticated -fvectorize-*
> scheme.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.  I've
> verified that 549.fotonik3d_r miscompares with -Ofast -march=znver2
> and passes when adding --param vect-induction-float=0 which
> should be valid at least for peak (but I guess also base for
> FOPTIMIZE for example).  I did not benchmark against other
> workarounds (it has been said -fno-unsafe-math-optimizations
> or other similar things work as well).
>
> OK for trunk?
>
> Thanks,
> Richard.
>
> 2022-03-08  Richard Biener  
>
>   PR tree-optimization/84201
>   * params.opt (-param=vect-induction-float): Add.
>   * doc/invoke.texi (vect-induction-float): Document.
>   * tree-vect-loop.cc (vectorizable_induction): Honor
>   param_vect_induction_float.
>
>   * gcc.dg/vect/pr84201.c: New testcase.

LGTM FWIW.

Thanks,
Richard

> ---
>  gcc/doc/invoke.texi |  3 +++
>  gcc/params.opt  |  4 
>  gcc/testsuite/gcc.dg/vect/pr84201.c | 22 ++
>  gcc/tree-vect-loop.cc   |  8 
>  4 files changed, 37 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr84201.c
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index b01ffab566a..a0fa5e1cf43 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -14989,6 +14989,9 @@ in an inner loop relative to the loop being 
> vectorized.  The factor applied
>  is the maximum of the estimated number of iterations of the inner loop and
>  this parameter.  The default value of this parameter is 50.
>  
> +@item vect-induction-float
> +Enable loop vectorization of floating point inductions.
> +
>  @item avoid-fma-max-bits
>  Maximum number of bits for which we avoid creating FMAs.
>  
> diff --git a/gcc/params.opt b/gcc/params.opt
> index f76f7839916..9561aa61a50 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -1176,6 +1176,10 @@ Controls how loop vectorizer uses partial vectors.  0 
> means never, 1 means only
>  Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) 
> IntegerRange(1, 1) Param Optimization
>  The maximum factor which the loop vectorizer applies to the cost of 
> statements in an inner loop relative to the loop being vectorized.
>  
> +-param=vect-induction-float=
> +Common Joined UInteger Var(param_vect_induction_float) Init(1) 
> IntegerRange(0, 1) Param Optimization
> +Enable loop vectorization of floating point inductions.
> +
>  -param=vrp1-mode=
>  Common Joined Var(param_vrp1_mode) Enum(vrp_mode) Init(VRP_MODE_VRP) Param 
> Optimization
>  --param=vrp1-mode=[vrp|ranger] Specifies the mode VRP1 should operate in.
> diff --git a/gcc/testsuite/gcc.dg/vect/pr84201.c 
> b/gcc/testsuite/gcc.dg/vect/pr84201.c
> new file mode 100644
> index 000..1cc6d1ff13c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr84201.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Ofast --param vect-induction-float=0" } */
> +
> +void foo (float *a, float f, float s, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +{
> +  a[i] = f;
> +  f += s;
> +}
> +}
> +
> +void bar (double *a, double f, double s, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +{
> +  a[i] = f;
> +  f += s;
> +}
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 2 "vect" } } */
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 1f30fc82ca1..7fcec12a3e9 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -8175,6 +8175,14 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>return false;
>  }
>  
> +  if (FLOAT_TYPE_P (vectype) && !param_vect_induction_float)
> +{
> +  if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +  "floating point induction vectorization disabled\n");
> +  return false;
> +}
> +
>step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info);
>gcc_assert (step_expr != NULL_TREE);
>tree step_vectype = get_same_sized_vectype (TREE_TYPE (step_expr), 
> vectype);
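
For readers unfamiliar with the term: a floating point induction is a loop-carried float that advances by a fixed step, like f in the testcase above. Below is a rough sketch of the kind of transformation the new parameter switches off. It assumes a vectorization factor of 4, and the helper names (scalar_fill, lanes_fill) are made up for illustration and are not part of GCC.

```cpp
#include <cstddef>
#include <vector>

// Scalar form, as in foo () from the testcase: one addition per iteration.
std::vector<float> scalar_fill(float f, float s, std::size_t n) {
  std::vector<float> a(n);
  for (std::size_t i = 0; i < n; ++i) {
    a[i] = f;
    f += s;
  }
  return a;
}

// Hand-written imitation of a vectorized induction with 4 lanes (an
// assumption for illustration): the lanes start at f, f+s, f+2s, f+3s
// and every lane advances by 4*s per vector iteration.
std::vector<float> lanes_fill(float f, float s, std::size_t n) {
  const std::size_t vf = 4;
  std::vector<float> a(n);
  float lane[4] = {f, f + s, f + 2 * s, f + 3 * s};
  const float step = 4 * s;
  std::size_t i = 0;
  for (; i + vf <= n; i += vf)
    for (std::size_t l = 0; l < vf; ++l) {
      a[i + l] = lane[l];
      lane[l] += step;
    }
  // Scalar epilogue for the elements that do not fill a whole vector.
  float tail = lane[0];
  for (; i < n; ++i) {
    a[i] = tail;
    tail += s;
  }
  return a;
}
```

The two forms perform their additions in a different order, so with steps that are not exactly representable they may round differently. That is presumably why the testcase needs -Ofast before the vectorizer would consider the loop at all.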


[PATCH] contrib: Fix non-portable sed commands in gcc-descr [PR102664/]

2022-03-08 Thread Jonathan Wakely via Gcc-patches
This now works with Solaris /usr/xpg4/bin/sed and should work with BSD
sed too.

OK for trunk?

-- >8 --

POSIX sed does not support \? or \+ in its Basic Regular Expression
grammar. Replace the \(tags/\)\? part of the pattern with a substitution
to remove ^tags/ before other substitutions. Replace \([0-9]\+\) with
\([0-9][0-9]*\) or with \([1-9][0-9]*\) in release branch numbers, where
a leading zero does not occur.

contrib/ChangeLog:

PR other/102664
* git-descr.sh: Use portable sed commands.
* git-undescr.sh: Likewise.
---
 contrib/git-descr.sh   | 6 +++---
 contrib/git-undescr.sh | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/contrib/git-descr.sh b/contrib/git-descr.sh
index ba5d711f330..95363279d8c 100755
--- a/contrib/git-descr.sh
+++ b/contrib/git-descr.sh
@@ -18,11 +18,11 @@ do
 done
 
 if test x$short = xyes; then
-r=$(git describe --all --match 'basepoints/gcc-[0-9]*' $c | sed -n 
's,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)-\([0-9]\+\)-g[0-9a-f]*$,r\2-\3,p;s,^\(tags/\)\?basepoints/gcc-\([0-9]\+\)$,r\2-0,p');
+r=$(git describe --all --match 'basepoints/gcc-[1-9]*' $c | sed -n 
's,^tags/,,;s,^basepoints/gcc-\([1-9][0-9]*\)-\([0-9][0-9]*\)-g[0-9a-f]*$,r\1-\2,p;s,^basepoints/gcc-\([1-9][0-9]*\)$,r\1-0,p');
 elif test x$long = xyes; then
-r=$(git describe --all --abbrev=40 --match 'basepoints/gcc-[0-9]*' $c | 
sed -n 's,^\(tags/\)\?basepoints/gcc-,r,p')
+r=$(git describe --all --abbrev=40 --match 'basepoints/gcc-[1-9]*' $c | 
sed -n 's,^tags/,,;s,^basepoints/gcc-,r,p')
 else
-r=$(git describe --all --abbrev=14 --match 'basepoints/gcc-[0-9]*' $c | 
sed -n 's,^\(tags/\)\?basepoints/gcc-,r,p');
+r=$(git describe --all --abbrev=14 --match 'basepoints/gcc-[1-9]*' $c | 
sed -n 's,^tags/,,;s,^basepoints/gcc-,r,p')
 expr ${r:-no} : 'r[0-9]\+$' >/dev/null && r=${r}-0-g$(git rev-parse $c);
 fi;
 if test -n $r; then
diff --git a/contrib/git-undescr.sh b/contrib/git-undescr.sh
index 9d882a6814e..fd694077467 100755
--- a/contrib/git-undescr.sh
+++ b/contrib/git-undescr.sh
@@ -3,11 +3,11 @@
 # Script to undescribe a GCC revision
 
 o=$(git config --get gcc-config.upstream);
-r=$(echo $1 | sed -n 's,^r\([0-9]\+\)-[0-9]\+\(-g[0-9a-f]\+\)\?$,\1,p');
-n=$(echo $1 | sed -n 's,^r[0-9]\+-\([0-9]\+\)\(-g[0-9a-f]\+\)\?$,\1,p');
+r=$(echo $1 | sed -n 's,^r\([1-9][0-9]*\)-[0-9][0-9]*\(-g[0-9a-f]*\)*$,\1,p');
+n=$(echo $1 | sed -n 's,^r[1-9][0-9]*-\([0-9][0-9]*\)\(-g[0-9a-f]*\)*$,\1,p');
 
 test -z $r && echo Invalid id $1 && exit 1;
 h=$(git rev-parse --verify --quiet ${o:-origin}/releases/gcc-$r);
 test -z $h && h=$(git rev-parse --verify --quiet ${o:-origin}/master);
-p=$(git describe --all --match 'basepoints/gcc-'$r $h | sed -n 
's,^\(tags/\)\?basepoints/gcc-[0-9]\+-\([0-9]\+\)-g[0-9a-f]*$,\2,p;s,^\(tags/\)\?basepoints/gcc-[0-9]\+$,0,p');
+p=$(git describe --all --match 'basepoints/gcc-'$r $h | sed -n 
's,^tags/,,;s,^basepoints/gcc-[1-9][0-9]*-\([0-9][0-9]*\)-g[0-9a-f]*$,\1,p;s,^basepoints/gcc-[1-9][0-9]*$,0,p');
 git rev-parse --verify $h~$(expr $p - $n);
-- 
2.34.1



Re: [PATCH] contrib: Fix non-portable sed commands in gcc-descr [PR102664/]

2022-03-08 Thread Jakub Jelinek via Gcc-patches
On Tue, Mar 08, 2022 at 05:58:34PM +0000, Jonathan Wakely via Gcc-patches wrote:
> This now works with Solaris /usr/xpg4/bin/sed and should work with BSD
> sed too.
> 
> OK for trunk?
> 
> -- >8 --
> 
> POSIX sed does not support \? or \+ in its Basic Regular Expression
> grammar. Replace the \(tags/\)\? part of the pattern with a substitution
> to remove ^tags/ before other substitutions. Replace \([0-9]\+\) with
> \([0-9][0-9]*\) or with \([1-9][0-9]*\) in release branch numbers, where
> a leading zero does not occur.
> 
> contrib/ChangeLog:
> 
>   PR other/102664
>   * git-descr.sh: Use portable sed commands.
>   * git-undescr.sh: Likewise.

LGTM, thanks.

Jakub



Re: [PATCH] mips: avoid signed overflow in LUI_OPERAND [PR104842]

2022-03-08 Thread Richard Sandiford via Gcc-patches
Xi Ruoyao  writes:
> I think this one is obvious.  Ok for trunk?

OK, thanks.

Richard

>
> gcc/
>
>   PR target/104842
>   * config/mips/mips.h (LUI_OPERAND): Cast the input to an unsigned
>   value before adding an offset.
> ---
>  gcc/config/mips/mips.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
> index bf5c1d5a709..0029864fdcd 100644
> --- a/gcc/config/mips/mips.h
> +++ b/gcc/config/mips/mips.h
> @@ -2309,7 +2309,7 @@ enum reg_class
>  
>  #define LUI_OPERAND(VALUE)   \
>(((VALUE) | 0x7fff0000) == 0x7fff0000  \
> -   || ((VALUE) | 0x7fff0000) + 0x10000 == 0)
> +   || ((unsigned HOST_WIDE_INT) (VALUE) | 0x7fff0000) + 0x10000 == 0)
>  
>  /* Return a value X with the low 16 bits clear, and such that
> VALUE - X is a signed 16-bit value.  */
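
To illustrate why the cast matters, here is a minimal sketch of the patched check as a standalone function. It assumes HOST_WIDE_INT is a 64-bit signed type and that the macro's constants are 0x7fff0000 and 0x10000; the names hwi, uhwi and lui_operand are stand-ins, not GCC identifiers.

```cpp
#include <cstdint>

// Stand-ins for GCC's HOST_WIDE_INT types (assumed to be 64 bits wide).
using hwi = std::int64_t;
using uhwi = std::uint64_t;

// Patched form of the check: the OR and the addition in the second arm
// are carried out in the unsigned type, so the addition wraps around
// instead of overflowing a signed integer.
constexpr bool lui_operand(hwi value) {
  return (value | 0x7fff0000) == 0x7fff0000
         || (static_cast<uhwi>(value) | 0x7fff0000) + 0x10000 == 0;
}
```

With the signed form, an input near the top of the range (for example the largest representable value) would make the addition overflow, which is undefined behavior. The unsigned form simply wraps and the comparison with zero fails as intended.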


Re: [PATCH] c++: Don't suggest cdtor or conversion op identifiers in spelling hints [PR104806]

2022-03-08 Thread Jason Merrill via Gcc-patches

On 3/8/22 05:32, Jakub Jelinek wrote:

On Tue, Mar 08, 2022 at 10:23:28AM +0100, Richard Biener wrote:

On Tue, Mar 8, 2022 at 8:27 AM Jakub Jelinek via Gcc-patches

On the following testcase, we emit "did you mean '__dt '?" in the error
message.  "__dt " shows there because it is dtor_identifier, but we
shouldn't suggest those to the user; they are purely internal and can't
really be typed by the user because of the final space in them.


Are those maybe also DECL_ARTIFICIAL?


You mean the FUNCTION_DECLs in the TYPE_FIELDS chain?  No, they aren't.


These identifiers should have various IDENTIFIER_KIND_BIT_? set, but it 
certainly makes sense to ignore all identifiers with spaces in them. 
The patch is OK.



Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-03-08  Jakub Jelinek  

 PR c++/104806
 * search.cc (lookup_field_fuzzy_info::fuzzy_lookup_field): Ignore
 identifiers with space at the end.

 * g++.dg/spellcheck-pr104806.C: New test.

--- gcc/cp/search.cc.jj 2022-01-18 11:58:59.407984557 +0100
+++ gcc/cp/search.cc2022-03-07 10:44:33.455673155 +0100
@@ -1275,6 +1275,13 @@ lookup_field_fuzzy_info::fuzzy_lookup_fi
if (is_lambda_ignored_entity (field))
 continue;

+  /* Ignore special identifiers with space at the end like cdtor or
+conversion op identifiers.  */
+  if (TREE_CODE (DECL_NAME (field)) == IDENTIFIER_NODE)
+   if (unsigned int len = IDENTIFIER_LENGTH (DECL_NAME (field)))
+ if (IDENTIFIER_POINTER (DECL_NAME (field))[len - 1] == ' ')
+   continue;
+
m_candidates.safe_push (DECL_NAME (field));
  }
  }
--- gcc/testsuite/g++.dg/spellcheck-pr104806.C.jj   2022-03-07 
10:34:07.224499657 +0100
+++ gcc/testsuite/g++.dg/spellcheck-pr104806.C  2022-03-07 10:43:41.900399808 
+0100
@@ -0,0 +1,5 @@
+// PR c++/104806
+
+struct S {};
+int main() { S s; s.__d; } // { dg-bogus "'struct S' has no member named '__d'; 
did you mean '__\[a-z]* '" }
+   // { dg-error "'struct S' has no member named '__d'" 
"" { target *-*-* } .-1 }


Jakub
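
Outside of GCC's tree representation, the filtering idea in the patch can be sketched as below. This is only an illustration with std::string standing in for identifier nodes; ends_with_space and suggestion_candidates are hypothetical names.

```cpp
#include <string>
#include <vector>

// Internal identifiers such as "__dt " end in a space, so a user could
// never have typed them and they make no sense as spelling suggestions.
bool ends_with_space(const std::string &name) {
  return !name.empty() && name.back() == ' ';
}

// Keep only the names that are plausible "did you mean" candidates.
std::vector<std::string>
suggestion_candidates(const std::vector<std::string> &names) {
  std::vector<std::string> out;
  for (const std::string &n : names)
    if (!ends_with_space(n))
      out.push_back(n);
  return out;
}
```

As in the patch, the length is checked before the last character is inspected, so an empty name is never indexed out of bounds.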





Re: [PATCH] c++: non-constant non-dependent decltype folding [PR104823]

2022-03-08 Thread Jason Merrill via Gcc-patches

On 3/8/22 12:54, Patrick Palka wrote:



On Mon, 7 Mar 2022, Jason Merrill wrote:


On 3/7/22 14:41, Patrick Palka wrote:

instantiate_non_dependent_expr_sfinae instantiates only potentially
constant expressions


Hmm, that now strikes me as a problematic interface, as we don't know whether
what we get back is template or non-template trees.

Maybe we want to change instantiate_non_dependent_expr to checking_assert that
the argument is non-dependent (callers are already checking that), and drop
the potentially-constant test?


That sounds like a nice improvement.  But it happens to break

   template using type = decltype(N);

because finish_decltype_type checks instantiation_dependent_uneval_expression_p
(which is false here) instead of instantiation_dependent_expression_p
(which is true here) before calling instantiate_non_dependent_expr, so
we end up tripping over the proposed checking_assert (which checks the
latter stronger form of dependence).

I suspect other callers of instantiate_non_dependent_expr might have a
similar problem if they use a weaker dependence check than
instantiation_dependent_expression_p, e.g. build_noexcept_spec only
checks value_dependent_expression_p.

I wonder if we should relax the proposed checking_assert in i_n_d_e, or
strengthen the dependence checks performed by its callers, or something
else?


I think relax the assert to _uneval and strengthen callers that use 
value_dep.



Here's the diff I'm working with:

-- >8 --

  gcc/cp/parser.cc|  2 +-
  gcc/cp/pt.cc| 21 ++---
  gcc/cp/semantics.cc | 11 ---
  3 files changed, 11 insertions(+), 23 deletions(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 20aab5eb6b1..a570a9163b9 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -7986,7 +7986,7 @@ cp_parser_parenthesized_expression_list_elt (cp_parser 
*parser, bool cast_p,
  expr = cp_parser_assignment_expression (parser, /*pidk=*/NULL, cast_p);
  
if (fold_expr_p)

-expr = instantiate_non_dependent_expr (expr);
+expr = fold_non_dependent_expr (expr);
  
/* If we have an ellipsis, then this is an expression expansion.  */

if (allow_expansion_p
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 53a74636279..1b2d9a7e4b1 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -6350,8 +6350,7 @@ redeclare_class_template (tree type, tree parms, tree 
cons)
  /* The actual substitution part of instantiate_non_dependent_expr_sfinae,
 to be used when the caller has already checked
 (processing_template_decl
-&& !instantiation_dependent_expression_p (expr)
-&& potential_constant_expression (expr))
+&& !instantiation_dependent_expression_p (expr))
 and cleared processing_template_decl.  */
  
  tree

@@ -6365,8 +6364,7 @@ instantiate_non_dependent_expr_internal (tree expr, 
tsubst_flags_t complain)
/*integral_constant_expression_p=*/true);
  }
  
-/* Simplify EXPR if it is a non-dependent expression.  Returns the

-   (possibly simplified) expression.  */
+/* Instantiate the non-dependent expression EXPR.  */
  
  tree

  instantiate_non_dependent_expr_sfinae (tree expr, tsubst_flags_t complain)
@@ -6374,16 +6372,9 @@ instantiate_non_dependent_expr_sfinae (tree expr, 
tsubst_flags_t complain)
if (expr == NULL_TREE)
  return NULL_TREE;
  
-  /* If we're in a template, but EXPR isn't value dependent, simplify

- it.  We're supposed to treat:
-
-   template  void f(T[1 + 1]);
-   template  void f(T[2]);
-
- as two declarations of the same function, for example.  */
-  if (processing_template_decl
-  && is_nondependent_constant_expression (expr))
+  if (processing_template_decl)
  {
+  gcc_checking_assert (!instantiation_dependent_expression_p (expr));
processing_template_decl_sentinel s;
expr = instantiate_non_dependent_expr_internal (expr, complain);
  }
@@ -6396,8 +6387,8 @@ instantiate_non_dependent_expr (tree expr)
return instantiate_non_dependent_expr_sfinae (expr, tf_error);
  }
  
-/* Like instantiate_non_dependent_expr, but return NULL_TREE rather than

-   an uninstantiated expression.  */
+/* Like instantiate_non_dependent_expr, but return NULL_TREE if the
+   expression is dependent or non-constant.  */
  
  tree

  instantiate_non_dependent_or_null (tree expr)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 66d90c2f7be..8f744eb21b6 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -11234,16 +11234,13 @@ finish_decltype_type (tree expr, bool 
id_expression_or_member_access_p,
  }
else if (processing_template_decl)
  {
-  /* Instantiate the non-dependent operand to diagnose any ill-formed
-expressions.  And keep processing_template_decl cleared for the rest
+  expr = instantiate_non_dependent_expr_sfinae (expr, complain);
+  if (expr == error_mark_node)
+   return error_mark_node;
+  /* Keep processing_template_decl cleared for the rest
 

Re: [PATCH] c++: detecting copy-init context during CTAD [PR102137]

2022-03-08 Thread Patrick Palka via Gcc-patches
On Tue, 8 Mar 2022, Jason Merrill wrote:

> On 3/8/22 11:36, Patrick Palka wrote:
> > On Mon, 7 Mar 2022, Jason Merrill wrote:
> > 
> > > On 3/7/22 10:47, Patrick Palka wrote:
> > > > On Fri, 4 Mar 2022, Jason Merrill wrote:
> > > > 
> > > > > On 3/4/22 14:24, Patrick Palka wrote:
> > > > > > Here we're failing to communicate to cp_finish_decl from tsubst_expr
> > > > > > that we're in a copy-initialization context (via the
> > > > > > LOOKUP_ONLYCONVERTING
> > > > > > flag), which causes do_class_deduction to always consider explicit
> > > > > > deduction guides when performing CTAD for a templated variable
> > > > > > initializer.
> > > > > > 
> > > > > > We could fix this by passing LOOKUP_ONLYCONVERTING appropriately
> > > > > > when
> > > > > > calling cp_finish_decl from tsubst_expr, but it seems
> > > > > > do_class_deduction
> > > > > > can determine if we're in a copy-init context by simply inspecting
> > > > > > the
> > > > > > initializer, and thus render its flags parameter unnecessary, which
> > > > > > is
> > > > > > what this patch implements.  (If we were to fix this in tsubst_expr
> > > > > > instead, I think we'd have to inspect the initializer in the same
> > > > > > way
> > > > > > in order to detect a copy-init context?)
> > > > > 
> > > > > Hmm, does this affect conversions as well?
> > > > > 
> > > > > Looks like it does:
> > > > > 
> > > > > struct A
> > > > > {
> > > > > explicit operator int();
> > > > > };
> > > > > 
> > > > > > template <class T> void f()
> > > > > > {
> > > > > > T t = A();
> > > > > > }
> > > > > > 
> > > > > > int main()
> > > > > > {
> > > > > > f<int>(); // wrongly accepted
> > > > > }
> > > > > 
> > > > > The reverse, initializing via an explicit constructor, is caught by
> > > > > code
> > > > > in
> > > > > build_aggr_init much like the code your patch adds to
> > > > > do_auto_deduction;
> > > > > perhaps we should move/copy that code to cp_finish_decl?
> > > > 
> > > > Ah, makes sense.  Moving that code from build_aggr_init to
> > > > cp_finish_decl broke things, but using it in both spots seems to work
> > > > well.  And I suppose we might as well use it in do_class_deduction too,
> > > > since doing so lets us remove the flags parameter.
> > > 
> > > Before removing the flags parameter please try asserting that it now
> > > matches
> > > is_copy_initialization and see if anything breaks.
> > 
> > I added to do_class_deduction:
> > 
> >gcc_assert (bool(flags & LOOKUP_ONLYCONVERTING) == is_copy_initialization
> > (init));
> > 
> > Turns out removing the flags parameter breaks CTAD for new-expressions
> > of the form 'new TT(x)' because in this case build_new passes just 'x'
> > as the initializer to do_auto_deduction (as opposed to a single TREE_LIST),
> > for which is_copy_initialization returns true even though it's really
> > direct-initialization.
> > 
> > Also turns out we're similarly not passing the right LOOKUP_* flags to
> > cp_finish_decl from instantiate_body, which breaks consideration of
> > explicit conversions/deduction guides when instantiating the initializer
> > of a static data member.  I added some xfailed testcases for these
> > situations.
> 
> Maybe we want to check is_copy_initialization in cp_finish_decl?

That seems to work nicely :) All xfailed tests for the static data
member initialization case now also pass.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

-- >8 --

Subject: [PATCH] c++: detecting copy-init context during CTAD [PR102137]

Here we're failing to communicate to cp_finish_decl from tsubst_expr
that we're in a copy-initialization context (via the LOOKUP_ONLYCONVERTING
flag), which causes us to always consider explicit deduction guides when
performing CTAD for a templated variable initializer.

It turns out this bug also affects consideration of explicit conversion
operators for the same reason.  But consideration of explicit constructors
seems to do the right thing thanks to code in build_aggr_init that sets
LOOKUP_ONLYCONVERTING when the initializer represents copy-initialization.

This patch fixes this by making cp_finish_decl set LOOKUP_ONLYCONVERTING
by inspecting the initializer like build_aggr_init does, so that callers
don't need to explicitly pass this flag.
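
As a reminder of the language rule involved, here is a small non-template sketch of how the two initialization forms treat an explicit conversion operator. It uses only standard type traits; the struct A mirrors the one from the discussion and direct_init_value is a hypothetical helper.

```cpp
#include <type_traits>

struct A {
  explicit operator int() const { return 1; }
};

// Direct-initialization, as in "int i(A());", may use the explicit
// conversion operator.
static_assert(std::is_constructible<int, A>::value,
              "direct-init considers explicit conversions");

// Copy-initialization, as in "int i = A();", must not use it.
static_assert(!std::is_convertible<A, int>::value,
              "copy-init ignores explicit conversions");

int direct_init_value() {
  int i(A{});  // direct-initialization: well-formed
  return i;
}
```

The bug described above is that the templated form of the copy-initialization was being treated like the direct form at instantiation time.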

PR c++/102137
PR c++/87820

gcc/cp/ChangeLog:

* cp-tree.h (is_copy_initialization): Declare.
* decl.cc (cp_finish_decl): Set LOOKUP_ONLYCONVERTING
when is_copy_initialization is true.
* init.cc (build_aggr_init): Split out copy-initialization
check into ...
(is_copy_initialization): ... here.
* pt.cc (instantiate_decl): Pass 0 instead of
LOOKUP_ONLYCONVERTING as flags to cp_finish_decl.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/explicit15.C: New test.
* g++.dg/cpp1z/class-deduction108.C: New test.
---
 gcc/cp/cp-tree.h  |  1 +
 gcc/cp/decl.cc|  3 +
 gcc/cp/init.cc 

Re: [PATCH v8 06/12] LoongArch Port: Builtin functions.

2022-03-08 Thread Richard Sandiford via Gcc-patches
xucheng...@loongson.cn writes:
> +#ifndef _GCC_LOONGARCH_BASE_INTRIN_H
> +#define _GCC_LOONGARCH_BASE_INTRIN_H
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +typedef struct drdtime
> +{
> +  unsigned long dvalue;
> +  unsigned long dtimeid;
> +} __drdtime_t;
> +
> +typedef struct rdtime
> +{
> +  unsigned int value;
> +  unsigned int timeid;
> +} __rdtime_t;
> +
> +#ifdef __loongarch64
> +extern __inline __drdtime_t
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +__builtin_loongarch_rdtime_d (void)
> +{
> +  __drdtime_t drdtime;
> +  __asm__ volatile (
> +"rdtime.d\t%[val],%[tid]\n\t"
> +: [val]"=&r"(drdtime.dvalue),[tid]"=&r"(drdtime.dtimeid)
> +:);
> +  return drdtime;

It's usually better to use __foo names for local variables and
parameters, in case the user defines a macro called (in this case)
drdtime.
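
A small sketch of the hazard, with made-up names (demo_time, read_time_stub) and placeholder values standing in for the real asm:

```cpp
// Hypothetical user code that is included before the header: a macro that
// happens to share its name with a local variable in the header.
#define drdtime 0

struct demo_time {  // illustrative stand-in for __drdtime_t
  unsigned long value;
  unsigned long id;
};

// Header-style function.  Had the local been spelled "drdtime", the macro
// above would turn "demo_time drdtime;" into "demo_time 0;" and break the
// build.  Names starting with a double underscore are reserved for the
// implementation, so a conforming user macro cannot collide with them.
static inline demo_time read_time_stub() {
  demo_time __drdtime;
  __drdtime.value = 1;  // placeholder; the real header fills this via asm
  __drdtime.id = 2;
  return __drdtime;
}
```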

> +}
> +#define __rdtime_d __builtin_loongarch_rdtime_d

Are both of these names “public”?  In other words, can users use
__builtin_loongarch_rdtime_d directly, instead of using __rdtime_d?

If only __rdtime_d is public then it might be better to define
the function directly, since that will give better error messages.

> […]
> +#if defined __loongarch64
> +/* Assembly instruction format:  ui5, rj, si12.  */
> +/* Data types in instruction templates:  VOID, USI, UDI, SI.  */
> +#define __dcacop(/*ui5*/ _1, /*unsigned long int*/ _2, /*si12*/ _3) \
> +  ((void) __builtin_loongarch_dcacop ((_1), (unsigned long int) (_2), (_3)))
> +#else
> +#error "Don't support this ABI."

“Unsupported ABI” might be better.  Same for the rest of the file.

> +#endif
> […]
> +/* Invoke MACRO (COND) for each fcmp.cond.{s/d} condition.  */
> +#define LARCH_FP_CONDITIONS(MACRO) \
> +  MACRO (f), \
> +  MACRO (un),\
> +  MACRO (eq),\
> +  MACRO (ueq),   \
> +  MACRO (olt),   \
> +  MACRO (ult),   \
> +  MACRO (ole),   \
> +  MACRO (ule),   \
> +  MACRO (sf),\
> +  MACRO (ngle),  \
> +  MACRO (seq),   \
> +  MACRO (ngl),   \
> +  MACRO (lt),\
> +  MACRO (nge),   \
> +  MACRO (le),\
> +  MACRO (ngt)
> +
> +/* Enumerates the codes above as LARCH_FP_COND_.  */
> +#define DECLARE_LARCH_COND(X) LARCH_FP_COND_##X
> +enum loongarch_fp_condition
> +{
> +  LARCH_FP_CONDITIONS (DECLARE_LARCH_COND)
> +};
> +#undef DECLARE_LARCH_COND
> +
> +/* Index X provides the string representation of LARCH_FP_COND_.  */
> +#define STRINGIFY(X) #X
> +const char *const
> +loongarch_fp_conditions[16]= {LARCH_FP_CONDITIONS (STRINGIFY)};
> +#undef STRINGIFY

It doesn't look like the code above is needed, since none of the current
built-ins have a condition code attached.

Same applies to the later “cond” field and related comments.

> +
> +/* Declare an availability predicate for built-in functions that require
> + * COND to be true.  NAME is the main part of the predicate's name.  */

Formatting nit: GNU style is not to have the “*” at the start
of the line.

> +#define AVAIL_ALL(NAME, COND) \
> +  static unsigned int \
> +  loongarch_builtin_avail_##NAME (void) \
> +  { \
> +return (COND) ? 1 : 0; \
> +  }
> +
> +static unsigned int
> +loongarch_builtin_avail_default (void)
> +{
> +  return 1;
> +}
> +/* This structure describes a single built-in function.  */
> +struct loongarch_builtin_description

Very minor nit, sorry, but: missing blank line before the comment.

> […]
> +/* Loongson support crc.  */
> +#define CODE_FOR_loongarch_crc_w_b_w CODE_FOR_crc_w_b_w
> +#define CODE_FOR_loongarch_crc_w_h_w CODE_FOR_crc_w_h_w
> +#define CODE_FOR_loongarch_crc_w_w_w CODE_FOR_crc_w_w_w
> +#define CODE_FOR_loongarch_crc_w_d_w CODE_FOR_crc_w_d_w
> +#define CODE_FOR_loongarch_crcc_w_b_w CODE_FOR_crcc_w_b_w
> +#define CODE_FOR_loongarch_crcc_w_h_w CODE_FOR_crcc_w_h_w
> +#define CODE_FOR_loongarch_crcc_w_w_w CODE_FOR_crcc_w_w_w
> +#define CODE_FOR_loongarch_crcc_w_d_w CODE_FOR_crcc_w_d_w
> +
> +/* Privileged state instruction.  */
> +#define CODE_FOR_loongarch_cpucfg CODE_FOR_cpucfg
> +#define CODE_FOR_loongarch_asrtle_d CODE_FOR_asrtle_d
> +#define CODE_FOR_loongarch_asrtgt_d CODE_FOR_asrtgt_d
> +#define CODE_FOR_loongarch_csrrd CODE_FOR_csrrd
> +#define CODE_FOR_loongarch_dcsrrd CODE_FOR_dcsrrd
> +#define CODE_FOR_loongarch_csrwr CODE_FOR_csrwr
> +#define CODE_FOR_loongarch_dcsrwr CODE_FOR_dcsrwr
> +#define CODE_FOR_loongarch_csrxchg CODE_FOR_csrxchg
> +#define CODE_FOR_loongarch_dcsrxchg CODE_FOR_dcsrxchg
> +#define CODE_FOR_loongarch_iocsrrd_b CODE_FOR_iocsrrd_b
> +#define CODE_FOR_loongarch_iocsrrd_h CODE_FOR_iocsrrd_h
> +#define CODE_FOR_loongarch_iocsrrd_w CODE_FOR_iocsrrd_w
> +#define CODE_FOR_loongarch_iocsrrd_d CODE_FOR_iocsrrd_d
> +#define CODE_FOR_loongarch_iocsrwr_b CODE_FOR_iocsrwr_b
> +#define CODE_FOR_loongarch_iocsrwr_h CODE_FOR_iocsrwr_h
> +#define CODE_FOR_loongarch_iocsrwr_w CODE_FOR_iocsrwr_w
> +#define CODE_FOR_loongarch_iocsrwr_d CODE_FOR_iocsrwr_d
> +#define CODE_FOR_loongarch_lddir CODE_FOR_lddir
> +#define COD

Re: [PATCH v8 08/12] LoongArch Port: libgcc

2022-03-08 Thread Richard Sandiford via Gcc-patches
xucheng...@loongson.cn writes:
> diff --git a/libgcc/config/loongarch/crti.S b/libgcc/config/loongarch/crti.S
> new file mode 100644
> index 000..27b7eab3626
> --- /dev/null
> +++ b/libgcc/config/loongarch/crti.S
> @@ -0,0 +1,43 @@
> +/* Copyright (C) 2021-2022 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +Under Section 7 of GPL version 3, you are granted additional
> +permissions described in the GCC Runtime Library Exception, version
> +3.1, as published by the Free Software Foundation.
> +
> +You should have received a copy of the GNU General Public License and
> +a copy of the GCC Runtime Library Exception along with this program;
> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +<http://www.gnu.org/licenses/>.  */
> +
> +/* 4 slots for argument spill area.  1 for cpreturn, 1 for stack.
> +   Return spill offset of 8.  Aligned to 16 bytes for lp64.  */

The comment doesn't apply to the Loongson code.  Probably best
to delete it.

Same for crtn.S.

> +
> + .section .init,"ax",@progbits
> + .globl  _init
> + .type   _init,@function
> +_init:
> + addi.d   $r3,$r3,-16
> + st.d  $r1,$r3,8
> + addi.d   $r3,$r3,16
> + jirl$r0,$r1,0
> +
> + .section .fini,"ax",@progbits
> + .globl  _fini
> + .type   _fini,@function
> +_fini:
> + addi.d   $r3,$r3,-16
> + st.d  $r1,$r3,8
> + addi.d   $r3,$r3,16
> + jirl$r0,$r1,0

Are you sure this is right?  It looks like it pushes LR and then
immediately pops it and returns, which would have the effect of
bypassing the rest of the .init and .fini code.

The idea instead is that .init starts with the code in crti.S,
then contains any .init code linked in from .o files, then ends
with the .init code in crtn.S.  Same for .fini.

Looks good to me otherwise.

Thanks,
Richard


Re: [PATCH v8 11/12] LoongArch Port: gcc/testsuite

2022-03-08 Thread Richard Sandiford via Gcc-patches
xucheng...@loongson.cn writes:
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 737e1a8913b..843b508b010 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -286,6 +286,10 @@ proc check_configured_with { pattern } {
>  proc check_weak_available { } {
>  global target_cpu
>  
> +if { [ string first "loongarch" $target_cpu ] >= 0 } {
> +return 1
> +}
> +
>  # All mips targets should support it
>  
>  if { [ string first "mips" $target_cpu ] >= 0 } {

For modern targets, the procedure ought to give the right answer without
this change.  I'm not sure off-hand which MIPS target required the
special case, but it's probably not one we support any more.

It would be good to have tests in gcc.target/loongarch that cover
all of the intrinsics defined in larchintrin.h.

Looks good to me otherwise, thanks.

Richard


[committed] analyzer: more test coverage of leak detection [PR99771]

2022-03-08 Thread David Malcolm via Gcc-patches
Successfully tested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-7541-gb7175f36812b32d3de242f15c065b9cb68e957a9.

gcc/testsuite/ChangeLog:
PR analyzer/99771
* gcc.dg/analyzer/leak-4.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/testsuite/gcc.dg/analyzer/leak-4.c | 103 +
 1 file changed, 103 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/leak-4.c

diff --git a/gcc/testsuite/gcc.dg/analyzer/leak-4.c 
b/gcc/testsuite/gcc.dg/analyzer/leak-4.c
new file mode 100644
index 000..75090e6be83
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/leak-4.c
@@ -0,0 +1,103 @@
+/* Various tests of memory leak detection.  */
+
+#include <stdlib.h>
+
+/* Example of a leak due to incomplete cleanup when freeing a struct.  */
+
+struct s1
+{
+  void *ptr;
+};
+
+void test_1 (void)
+{
+  struct s1 *a = malloc (sizeof (struct s1));
+  if (!a)
+return;
+  a->ptr = malloc (1024); /* { dg-message "allocated here" } */
+  free (a); /* { dg-warning "leak of '<unknown>'" } */
+  /* TODO: we should print 'a->ptr' here, rather than '<unknown>'
+ (PR analyzer/99771).  */
+}
+
+
+/* Examples involving arrays.  */
+
+struct s2
+{
+  void *m_arr[10];
+};
+
+void test_2a (void)
+{
+  struct s2 arr[5];
+  arr[3].m_arr[4] = malloc (1024); /* { dg-message "allocated here" } */
+} /* { dg-warning "leak of 'arr\\\[3\\\].m_arr\\\[4\\\]'" } */
+
+void test_2b (int i)
+{
+  struct s2 arr[5];
+  arr[3].m_arr[i] = malloc (1024); /* { dg-message "allocated here" } */
+} /* { dg-warning "leak of 'arr\\\[3\\\].m_arr\\\[i\\\]'" } */
+
+void test_2c (int i)
+{
+  struct s2 arr[5];
+  arr[i].m_arr[4] = malloc (1024); /* { dg-message "allocated here" } */
+} /* { dg-warning "leak of 'arr\\\[i\\\].m_arr\\\[4\\\]'" } */
+
+void test_2d (int i, int j)
+{
+  struct s2 arr[5];
+  arr[i].m_arr[j] = malloc (1024); /* { dg-message "allocated here" } */
+} /* { dg-warning "leak of 'arr\\\[i\\\].m_arr\\\[j\\\]'" } */
+
+
+/* Example involving fields.  */
+
+struct s3
+{
+  struct s3 *m_left;
+  struct s3 *m_right;  
+};
+
+void test_3 (void)
+{
+  struct s3 *a = malloc (sizeof (struct s3));
+  a->m_right = malloc (sizeof (struct s3)); /* { dg-warning "dereference of 
possibly-NULL 'a'" } */
+  a->m_right->m_left = malloc (sizeof (struct s3)); /* { dg-warning 
"dereference of possibly-NULL '\\*a.m_right'" } */
+} /* { dg-warning "leak of 'a'" "leak of a" } */
+/* { dg-warning "leak of '<unknown>'" "leak of unknown" { target *-*-* } .-1 } 
*/
+/* TODO: rather than '<unknown>', we should print 'a->m_right'
+   and 'a->m_right->m_left' (PR analyzer/99771).  */
+
+
+/* Example involving faking inheritance via casts.  */
+
+struct s4_base
+{
+  int m_placeholder;
+};
+
+struct s4_sub
+{
+  void *m_buffer;
+};
+
+static struct s4_sub *
+make_s4_sub (void)
+{
+  struct s4_sub *sub = malloc (sizeof (struct s4_sub)); /* { dg-message 
"allocated here" } */
+  if (!sub)
+return NULL;
+  sub->m_buffer = malloc (1024); /* { dg-message "allocated here" } */
+  return sub;
+}
+
+void test_4 (void)
+{
+  struct s4_base *base = (struct s4_base *)make_s4_sub ();
+} /* { dg-warning "leak of 'base'" "leak of base" } */
+/* { dg-warning "leak of '<unknown>'" "leak of sub buffer" { target *-*-* } 
.-1 } */
+/* TODO: rather than 'unknown', we should print something
+   like '((struct s4_sub *)base)->m_buffer' (PR analyzer/99771).  */
-- 
2.26.3



[pushed] Darwin: Address a translation comment [PR104552].

2022-03-08 Thread Iain Sandoe via Gcc-patches
This amends an error message to correct its punctuation and slightly
improve its wording.

bootstrapped on x86_64-darwin18, pushed to master,
thanks, Iain

Signed-off-by: Iain Sandoe 

PR translation/104552

gcc/ChangeLog:

* config/host-darwin.cc (darwin_gt_pch_get_address): Amend
the PCH out of memory error message punctuation and wording.
---
 gcc/config/host-darwin.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/host-darwin.cc b/gcc/config/host-darwin.cc
index d4289aeb4d3..01554d565f9 100644
--- a/gcc/config/host-darwin.cc
+++ b/gcc/config/host-darwin.cc
@@ -104,7 +104,7 @@ darwin_gt_pch_get_address (size_t sz, int fd)
  space.  */
   if (addr == (void *) MAP_FAILED)
 {
-  error ("PCH memory not available %m");
+  error ("PCH memory is not available: %m");
   return NULL;
 }
 
-- 
2.24.3 (Apple Git-128)



Re: [PATCH v8 08/12] LoongArch Port: libgcc

2022-03-08 Thread Andreas Schwab
On Mar 08 2022, Richard Sandiford via Gcc-patches wrote:

>> +
>> +.section .init,"ax",@progbits
>> +.globl  _init
>> +.type   _init,@function
>> +_init:
>> +addi.d   $r3,$r3,-16
>> +st.d  $r1,$r3,8
>> +addi.d   $r3,$r3,16
>> +jirl$r0,$r1,0
>> +
>> +.section .fini,"ax",@progbits
>> +.globl  _fini
>> +.type   _fini,@function
>> +_fini:
>> +addi.d   $r3,$r3,-16
>> +st.d  $r1,$r3,8
>> +addi.d   $r3,$r3,16
>> +jirl$r0,$r1,0
>
> Are you sure this is right?  It looks like it pushes LR and then
> immediately pops it and returns, which would have the effect of
> bypassing the rest of the .init and .fini code.
>
> The idea instead is that .init starts with the code in crti.S,
> then contains any .init code linked in from .o files, then ends
> with the .init code in crtn.S.  Same for .fini.

New architectures should not use .init/.fini at all.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH] c++: Don't allow type-constraint auto(x) [PR104752]

2022-03-08 Thread Jason Merrill via Gcc-patches

On 3/2/22 14:31, Marek Polacek wrote:

104752 points out that

   template
   concept C = true;
   auto y = C auto(1);

is ill-formed as per [dcl.type.auto.deduct]: "For an explicit type conversion,
T is the specified type, which shall be auto." which doesn't allow
type-constraint auto.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/104752

gcc/cp/ChangeLog:

* semantics.cc (finish_compound_literal): Disallow auto{x} for
is_constrained_auto.
* typeck2.cc (build_functional_cast_1): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/auto-fncast12.C: New test.
---
  gcc/cp/semantics.cc| 1 +
  gcc/cp/typeck2.cc  | 1 +
  gcc/testsuite/g++.dg/cpp23/auto-fncast12.C | 8 
  3 files changed, 10 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp23/auto-fncast12.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index a2c0eb050e6..5129b12f00f 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -3148,6 +3148,7 @@ finish_compound_literal (tree type, tree compound_literal,
/* C++23 auto{x}.  */
else if (is_auto (type)
   && !AUTO_IS_DECLTYPE (type)
+  && !is_constrained_auto (type)
   && CONSTRUCTOR_NELTS (compound_literal) == 1)
  {
if (cxx_dialect < cxx23)
diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
index 39d03e4c3c4..c9314bbeb6f 100644
--- a/gcc/cp/typeck2.cc
+++ b/gcc/cp/typeck2.cc
@@ -2305,6 +2305,7 @@ build_functional_cast_1 (location_t loc, tree exp, tree parms,
init = parms;
/* C++23 auto(x).  */
else if (!AUTO_IS_DECLTYPE (anode)
+  && !is_constrained_auto (anode)
   && list_length (parms) == 1)
{
  init = TREE_VALUE (parms);


We might get a better diagnostic by moving this test inside the block 
and saying that the auto can't be constrained, instead of "invalid use 
of auto"?



diff --git a/gcc/testsuite/g++.dg/cpp23/auto-fncast12.C b/gcc/testsuite/g++.dg/cpp23/auto-fncast12.C
new file mode 100644
index 000..f513f7c9325
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp23/auto-fncast12.C
@@ -0,0 +1,8 @@
+// PR c++/104752
+// { dg-do compile { target c++23 } }
+
+template<typename T>
+concept C = true;
+auto x = auto(1); // valid (P0849R8)
+auto y = C auto(1);   // { dg-error "invalid use" }
+auto z = C auto{1};   // { dg-error "invalid use" }

base-commit: 8977f4bec650bb6975792772245b07b722ee9843




Re: [PATCH] c++: Attribute deprecated/unavailable divergence

2022-03-08 Thread Jason Merrill via Gcc-patches

On 3/2/22 14:31, Marek Polacek wrote:

Attributes deprecated and unavailable are largely the same, except
that the former produces a warning whereas the latter produces an error.
So is_late_template_attribute should treat them the same.  Confirmed by
Iain that this divergence is not intentional:
.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk or defer to GCC 13?


OK.
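
A minimal sketch of the deprecated/unavailable distinction described above, with hypothetical names Old, Gone and Replacement. Both attributes are written in GNU syntax. `unavailable` is a GNU extension assumed to be supported by the compiler (GCC 12 and later, also Clang). The templates are only declared here, since naming Gone<int> would stop the build.

```cpp
// Naming Old<T> in code would only produce a -Wdeprecated-declarations warning.
template <typename T>
struct __attribute__ ((deprecated ("use Replacement"))) Old { T value; };

// Naming Gone<T> in code would be rejected with an error.
template <typename T>
struct __attribute__ ((unavailable ("use Replacement"))) Gone { T value; };

// The type users are expected to migrate to.
template <typename T>
struct Replacement { T value; };

constexpr int replacement_payload () { return Replacement<int>{42}.value; }
```

With the patch, both attributes apply to the template itself, so the diagnostic is expected at the point where S<...> is named and does not wait for an instantiation-time attribute.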


gcc/cp/ChangeLog:

* decl2.cc (is_late_template_attribute): Do not defer attribute
unavailable.
* pt.cc (tsubst_enum): Set TREE_UNAVAILABLE.

gcc/testsuite/ChangeLog:

* g++.dg/ext/attr-unavailable-9.C: Add dg-error.
---
  gcc/cp/decl2.cc   | 1 +
  gcc/cp/pt.cc  | 4 +---
  gcc/testsuite/g++.dg/ext/attr-unavailable-9.C | 4 ++--
  3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 22edc2ba7f9..2752426546c 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -1314,6 +1314,7 @@ is_late_template_attribute (tree attr, tree decl)
   /* But some attributes specifically apply to templates.  */
   && !is_attribute_p ("abi_tag", name)
   && !is_attribute_p ("deprecated", name)
+  && !is_attribute_p ("unavailable", name)
   && !is_attribute_p ("visibility", name))
return true;
else
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 8fb17349ee1..853738410be 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -26976,9 +26976,7 @@ tsubst_enum (tree tag, tree newtag, tree args)
DECL_SOURCE_LOCATION (TYPE_NAME (newtag))
  = DECL_SOURCE_LOCATION (TYPE_NAME (tag));
TREE_DEPRECATED (newtag) = TREE_DEPRECATED (tag);
-  /* We don't need to propagate TREE_UNAVAILABLE here, because it is, unlike
- deprecated, applied at instantiation time rather than template
- definition time.  */
+  TREE_UNAVAILABLE (newtag) = TREE_UNAVAILABLE (tag);
  }
  
  /* DECL is a FUNCTION_DECL that is a template specialization.  Return

diff --git a/gcc/testsuite/g++.dg/ext/attr-unavailable-9.C b/gcc/testsuite/g++.dg/ext/attr-unavailable-9.C
index 44161336e78..6df55d534f8 100644
--- a/gcc/testsuite/g++.dg/ext/attr-unavailable-9.C
+++ b/gcc/testsuite/g++.dg/ext/attr-unavailable-9.C
@@ -3,10 +3,10 @@
  /* { dg-options "" } */
  
  template struct __attribute__ ((unavailable)) S {};

-S s;
+S s;// { dg-error "unavailable" }
  
  template  class T> struct A { };

-A a;
+A a;  // { dg-error "unavailable" }
  
  template  void f() __attribute__ ((unavailable));
  


base-commit: 8fede2876a751d53a28442dcca32466daa929daa




Re: [PATCH v8 12/12] LoongArch Port: Add doc.

2022-03-08 Thread Richard Sandiford via Gcc-patches
xucheng...@loongson.cn writes:
> From: chenglulu 
>
> 2022-03-04  Chenghua Xu  
>   Lulu Cheng  
>
>   * contrib/config-list.mk: Add LoongArch triplet.
>   * gcc/doc/install.texi: Add LoongArch options section.
>   * gcc/doc/invoke.texi: Add LoongArch options section.
>   * gcc/doc/md.texi: Add LoongArch options section.
> ---
>  contrib/config-list.mk |   5 +-
>  gcc/doc/install.texi   |  47 +-
>  gcc/doc/invoke.texi| 202 +
>  gcc/doc/md.texi|  55 +++
>  4 files changed, 303 insertions(+), 6 deletions(-)
>
> diff --git a/contrib/config-list.mk b/contrib/config-list.mk
> index 3e1d1321861..ba6f12e4693 100644
> --- a/contrib/config-list.mk
> +++ b/contrib/config-list.mk
> @@ -57,7 +57,10 @@ LIST = aarch64-elf aarch64-linux-gnu aarch64-rtems \
>i686-wrs-vxworksae \
>i686-cygwinOPT-enable-threads=yes i686-mingw32crt ia64-elf \
>ia64-freebsd6 ia64-linux ia64-hpux ia64-hp-vms iq2000-elf lm32-elf \
> -  lm32-rtems lm32-uclinux m32c-rtems m32c-elf m32r-elf m32rle-elf \
> +  lm32-rtems lm32-uclinux \
> +  loongarch64-linux-gnu loongarch64-linux-gnuf64 \
> +  loongarch64-linux-gnuf32 loongarch64-linux-gnusf \

If I've understood correctly, loongarch64-linux-gnu defaults to
the same ABI as loongarch64-linux-gnuf64, is that right?  If so,
it's probably worth dropping one of them from this list to reduce
duplication.  In other words, it feels like there should just be
3 entries here rather than 4.

> […]
> @@ -1254,6 +1255,14 @@ profile.  The union of these options is considered when specifying both
>  @code{-mfloat-abi=hard}
>  @end multitable
>  
> +@item loongarch*-*-*
> +@var{list} is a comma-separated list of the following ABI identifiers:
> +@code{lp64d[/base]} @code{lp64f[/base]} @code{lp64d[/base]}, where the
> +@code{/base} suffix may be omitted, to enable their respective run-time
> +libraries.  If @var{list} is empty, @code{default}
> +or @option{--with-multilib-list} is not specified, then the default ABI

Maybe clearer as:

  If @var{list} is empty or @code{default}, or if
  @option{--with-multilib-list} is not specified, […]

> +as specified by @option{--with-abi} or implied by @option{--target} is selected.
> +
>  @item riscv*-*-*
>  @var{list} is a single ABI name.  The target architecture must be either
>  @code{rv32gc} or @code{rv64gc}.  This will build a single multilib for the
> […]
> @@ -995,6 +995,16 @@ Objective-C and Objective-C++ Dialects}.
>  @gccoptlist{-mbarrel-shift-enabled  -mdivide-enabled  -mmultiply-enabled @gol
>  -msign-extend-enabled  -muser-enabled}
>  
> +@emph{LoongArch Options}
> +@gccoptlist{-march=@var{cpu-type}  -mtune=@var{cpu-type} -mabi=@var{base-abi-type} @gol
> +-mfpu=@var{fpu-type} -msoft-float -msingle-float -mdouble-float @gol
> +-mbranch-cost=@var{n}  -mcheck-zero-division -mno-check-zero-division @gol
> +-mcond-move-int  -mno-cond-move-int @gol
> +-mcond-move-float  -mno-cond-move-float @gol
> +-memcpy  -mno-memcpy -mstrict-align -mno-strict-align @gol
> +-mmax-inline-memcpy-size=@var{n} @gol
> +-mlra -mcmodel=@var{code-model}}

Following on from earlier comments, please remove -mlra :-)
(Or more specifically, -mno-lra.)

> +
>  @emph{M32R/D Options}
>  @gccoptlist{-m32r2  -m32rx  -m32r @gol
>  -mdebug @gol
> @@ -18863,6 +18873,7 @@ platform.
>  * HPPA Options::
>  * IA-64 Options::
>  * LM32 Options::
> +* LoongArch Options::
>  * M32C Options::
>  * M32R/D Options::
>  * M680x0 Options::
> @@ -24378,6 +24389,197 @@ Enable user-defined instructions.
>  
>  @end table
>  
> +@node LoongArch Options
> +@subsection LoongArch Options
> +@cindex LoongArch Options
> +
> +These command-line options are defined for LoongArch targets:
> +
> +@table @gcctabopt
> +@item -march=@var{cpu-type}
> +@opindex -march
> +Generate instructions for the machine type @var{cpu-type}.  In contrast to
> +@option{-mtune=@var{cpu-type}}, which merely tunes the generated code
> +for the specified @var{cpu-type}, @option{-march=@var{cpu-type}} allows GCC
> +to generate code that may not run at all on processors other than the one
> +indicated.  Specifying @option{-march=@var{cpu-type}} implies
> +@option{-mtune=@var{cpu-type}}, except where noted otherwise.
> +
> +The choices for @var{cpu-type} are:
> +
> +@table @samp
> +@item native
> +This selects the CPU to generate code for at compilation time by determining
> +the processor type of the compiling machine.  Using @option{-march=native}
> +enables all instruction subsets supported by the local machine (hence
> +the result might not run on different machines).  Using @option{-mtune=native}
> +produces code optimized for the local machine under the constraints
> +of the selected instruction set.
> +@item loongarch64
> +A generic CPU with 64-bit extensions.
> +@item la464
> +LoongArch LA464 CPU with LBT, LSX, LASX, LVZ.
> +@end table
> +
> +
> +@item -mtune=@var{cpu-type}
> +@opindex mtune
> +Optimize the output for the given proc

Re: [PATCH v8 00/12] Add LoongArch support.

2022-03-08 Thread Richard Sandiford via Gcc-patches
Xi Ruoyao via Gcc-patches  writes:
> On Fri, 2022-03-04 at 15:17 +0800, xucheng...@loongson.cn wrote:
>
>> The binutils has been merged into trunk:
>> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=560b3fe208255ae909b4b1c88ba9c28b09043307
>> 
>> Note: We split -mabi= into -mabi=lp64d/f/s; the new options are not supported
>> by upstream binutils yet, so this GCC port requires the following patch to be
>> applied to binutils in order to build:
>> https://github.com/loongson/binutils-gdb/commit/aacb0bf860f02aa5a7dcb76dd0e392bf871c7586
>> (will be submitted upstream once the GCC side is confirmed)
>
> I think you don't need a review for binutils change here.  You should
> get it reviewed and applied in binutils-gdb ASAP.  Then in install.texi
> you would add a note like "loongarch64-*-* requires binutils >= 2.39" in
> "Target specific installation notes", as an unpatched 2.38 does not
> work.
>
> And based on the history of RISC-V port
> (https://gcc.gnu.org/pipermail/gcc/2017-January/222595.html) the process
> for a new port seems:
>
> 1. Get a permission from the Steering Committee.
> 2. Add one or two port maintainers into MAINTAINERS file.
> 3. Only then does the technical review of the patch series begin.
>
>
> I'm not an expert in software engineering (or social interaction :) and
> I don't know if the process has been changed in these years.

I'm not sure either, but yeah, this is what I understood the process to be.

On the technical side: I've gone through the series and sent comments
about some of the patches.  The ones I didn't reply to looked good as-is.

Generally the series looks in very good shape to me FWIW.  There are no
target-independent changes, so I agree there's no reason to delay the
patches until GCC 13.  If for some reason they don't go in before
GCC 12.1, they would be safe to backport to GCC 12.2.

Thanks,
Richard


Re: [PING PATCH 3/3] rs6000: Move more g++.dg powerpc tests to g++.target

2022-03-08 Thread Paul A. Clarke via Gcc-patches
Ping.

On Mon, Feb 21, 2022 at 03:17:47PM -0600, Paul A. Clarke via Gcc-patches wrote:
> Also adjust DejaGnu directives, as specifically requiring "powerpc*-*-*" is no
> longer required.
> 
> 2021-02-21  Paul A. Clarke  
> 
> gcc/testsuite
>   * g++.dg/debug/dwarf2/const2.C: Move to g++.target/powerpc.
>   * g++.dg/other/darwin-minversion-1.C: Likewise.
>   * g++.dg/eh/ppc64-sighandle-cr.C: Likewise.
>   * g++.dg/eh/simd-5.C: Likewise.
>   * g++.dg/eh/simd-4.C: Move to g++.target/powerpc, adjust dg directives.
>   * g++.dg/eh/uncaught3.C: Likewise.
>   * g++.dg/other/spu2vmx-1.C: Likewise.
> ---
>  .../{g++.dg/debug/dwarf2 => g++.target/powerpc}/const2.C| 0
>  .../{g++.dg/other => g++.target/powerpc}/darwin-minversion-1.C  | 0
>  .../{g++.dg/eh => g++.target/powerpc}/ppc64-sighandle-cr.C  | 0
>  gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/simd-4.C| 2 +-
>  gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/simd-5.C| 0
>  gcc/testsuite/{g++.dg/other => g++.target/powerpc}/spu2vmx-1.C  | 2 +-
>  gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/uncaught3.C | 2 +-
>  7 files changed, 3 insertions(+), 3 deletions(-)
>  rename gcc/testsuite/{g++.dg/debug/dwarf2 => g++.target/powerpc}/const2.C (100%)
>  rename gcc/testsuite/{g++.dg/other => g++.target/powerpc}/darwin-minversion-1.C (100%)
>  rename gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/ppc64-sighandle-cr.C (100%)
>  rename gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/simd-4.C (95%)
>  rename gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/simd-5.C (100%)
>  rename gcc/testsuite/{g++.dg/other => g++.target/powerpc}/spu2vmx-1.C (84%)
>  rename gcc/testsuite/{g++.dg/eh => g++.target/powerpc}/uncaught3.C (96%)
> 
> diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/const2.C b/gcc/testsuite/g++.target/powerpc/const2.C
> similarity index 100%
> rename from gcc/testsuite/g++.dg/debug/dwarf2/const2.C
> rename to gcc/testsuite/g++.target/powerpc/const2.C
> diff --git a/gcc/testsuite/g++.dg/other/darwin-minversion-1.C b/gcc/testsuite/g++.target/powerpc/darwin-minversion-1.C
> similarity index 100%
> rename from gcc/testsuite/g++.dg/other/darwin-minversion-1.C
> rename to gcc/testsuite/g++.target/powerpc/darwin-minversion-1.C
> diff --git a/gcc/testsuite/g++.dg/eh/ppc64-sighandle-cr.C b/gcc/testsuite/g++.target/powerpc/ppc64-sighandle-cr.C
> similarity index 100%
> rename from gcc/testsuite/g++.dg/eh/ppc64-sighandle-cr.C
> rename to gcc/testsuite/g++.target/powerpc/ppc64-sighandle-cr.C
> diff --git a/gcc/testsuite/g++.dg/eh/simd-4.C b/gcc/testsuite/g++.target/powerpc/simd-4.C
> similarity index 95%
> rename from gcc/testsuite/g++.dg/eh/simd-4.C
> rename to gcc/testsuite/g++.target/powerpc/simd-4.C
> index 8c9b58bf8684..a01f19c27369 100644
> --- a/gcc/testsuite/g++.dg/eh/simd-4.C
> +++ b/gcc/testsuite/g++.target/powerpc/simd-4.C
> @@ -1,4 +1,4 @@
> -/* { dg-do run { target powerpc*-*-darwin* } } */
> +/* { dg-do run { target *-*-darwin* } } */
>  /* { dg-options "-fexceptions -fnon-call-exceptions -O -maltivec" } */
>  
>  #include 
> diff --git a/gcc/testsuite/g++.dg/eh/simd-5.C b/gcc/testsuite/g++.target/powerpc/simd-5.C
> similarity index 100%
> rename from gcc/testsuite/g++.dg/eh/simd-5.C
> rename to gcc/testsuite/g++.target/powerpc/simd-5.C
> diff --git a/gcc/testsuite/g++.dg/other/spu2vmx-1.C b/gcc/testsuite/g++.target/powerpc/spu2vmx-1.C
> similarity index 84%
> rename from gcc/testsuite/g++.dg/other/spu2vmx-1.C
> rename to gcc/testsuite/g++.target/powerpc/spu2vmx-1.C
> index d9c8faf94592..496b46c22c95 100644
> --- a/gcc/testsuite/g++.dg/other/spu2vmx-1.C
> +++ b/gcc/testsuite/g++.target/powerpc/spu2vmx-1.C
> @@ -1,4 +1,4 @@
> -/* { dg-do compile { target powerpc*-*-* } } */
> +/* { dg-do compile } */
>  /* { dg-require-effective-target powerpc_spu } */
>  /* { dg-options "-maltivec" } */
>  
> diff --git a/gcc/testsuite/g++.dg/eh/uncaught3.C b/gcc/testsuite/g++.target/powerpc/uncaught3.C
> similarity index 96%
> rename from gcc/testsuite/g++.dg/eh/uncaught3.C
> rename to gcc/testsuite/g++.target/powerpc/uncaught3.C
> index 1beaab3f..f891401584ec 100644
> --- a/gcc/testsuite/g++.dg/eh/uncaught3.C
> +++ b/gcc/testsuite/g++.target/powerpc/uncaught3.C
> @@ -1,4 +1,4 @@
> -// { dg-do compile { target powerpc*-*-darwin* } }
> +// { dg-do compile { target *-*-darwin* } }
>  // { dg-final { scan-assembler-not "__cxa_get_exception" } }
>  // { dg-options "-mmacosx-version-min=10.4" }
>  // { dg-additional-options "-Wno-deprecated" { target c++17 } }
> -- 
> 2.27.0
> 


Re: [PING PATCH 2/3] rs6000: Move g++.dg powerpc PR tests to g++.target

2022-03-08 Thread Paul A. Clarke via Gcc-patches
Gentle ping. I am grateful for the initial review, but seek closure on the
final couple of discussion items. Thanks!

PC

On Tue, Feb 22, 2022 at 07:56:40PM -0600, Paul A. Clarke via Gcc-patches wrote:
> On Tue, Feb 22, 2022 at 06:41:45PM -0600, Segher Boessenkool wrote:
> > On Mon, Feb 21, 2022 at 03:17:46PM -0600, Paul A. Clarke wrote:
> > > Also adjust DejaGnu directives, as specifically requiring "powerpc*-*-*"
> > > is no longer required.
> > > 
> > > 2021-02-21  Paul A. Clarke  
> > > 
> > > gcc/testsuite
> > >   * g++.dg/pr65240.h: Move to g++.target/powerpc.
> > >   * g++.dg/pr93974.C: Likewise.
> > >   * g++.dg/pr65240-1.C: Move to g++.target/powerpc, adjust dg directives.
> > >   * g++.dg/pr65240-2.C: Likewise.
> > >   * g++.dg/pr65240-3.C: Likewise.
> > >   * g++.dg/pr65240-4.C: Likewise.
> > >   * g++.dg/pr65242.C: Likewise.
> > >   * g++.dg/pr67211.C: Likewise.
> > >   * g++.dg/pr69667.C: Likewise.
> > >   * g++.dg/pr71294.C: Likewise.
> > >   * g++.dg/pr84264.C: Likewise.
> > >   * g++.dg/pr84279.C: Likewise.
> > >   * g++.dg/pr85657.C: Likewise.
> > 
> > Okay for trunk.  Thanks!
> 
> Thanks for the review! More below...
> 
> > That said...
> > 
> > > -/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
> > > -/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> > > +/* { dg-do compile { target lp64 } } */
> > > +/* { dg-skip-if "" { *-*-darwin* } } */
> > 
> > That skip-if is most likely cargo cult, and it's not clear why lp64
> > would be needed either (there is no comment what it is needed for, for
> > example).
> 
> I can't speak to darwin, nor have an easy way of testing on it.
> 
> As for lp64, these tests fail on -m32 with:
>   cc1plus: error: '-mcmodel' not supported in this configuration
> - g++.dg/pr65240-1.C
> - g++.dg/pr65240-2.C
> - g++.dg/pr65240-3.C
> 
> '-mcmodel' is in the dg-options line for the above tests.
> 
> The rest PASSed.  Shall I remove the 'lp64' restriction for those that PASS?
> 
> > > +++ b/gcc/testsuite/g++.target/powerpc/pr85657.C
> > > @@ -1,4 +1,4 @@
> > > -// { dg-do compile { target { powerpc*-*-linux* } } }
> > > +// { dg-do compile { target { *-*-linux* } } }
> > 
> > A comment here would help as well.  All of that is pre-existing of
> > course.
> 
> I'm not sure what such a comment would say. I suspect it was a testing issue
> (only tested on Linux), but I have similar limitations, so I'm also reluctant
> to enable the test for what would be untested (by me) platforms.
> 
> PC
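
For readers following the lp64 point above, a rough sketch of what a test header under g++.target/powerpc might look like once the "powerpc*-*-*" selector is dropped. dg-do and dg-options are the usual DejaGnu directives from the GCC testsuite; the file name, the option values and the function body are hypothetical and not taken from any of the pr65240 tests.

```cpp
// Hypothetical file: gcc/testsuite/g++.target/powerpc/example.C
// No target triplet is needed in the selector: the directory's driver is
// assumed to run these tests for PowerPC targets only.
// lp64 stays because -mcmodel is rejected for -m32, as reported above.
// { dg-do compile { target lp64 } }
// { dg-options "-O2 -mcmodel=medium" }

// Placeholder body; a real test would exercise the code from the PR.
constexpr long scale (long x) { return 2 * x; }

long table[4] = { scale (1), scale (2), scale (3), scale (4) };
```

A test that does not pass -mcmodel would, by the same reasoning, not need the lp64 selector at all.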


[Patch] Fortran: Fix CLASS handling in SIZEOF intrinsic

2022-03-08 Thread Tobias Burnus

Fix SIZEOF handling.

I have to admit that I do understand what the current code does,
but do not understand what the previous code did. However, it
still passes the testsuite - and also some code which did ICE
now compiles :-)

While writing the testcase, I did find two issues:
* Passing a CLASS to TYPE(*),dimension(..) will have an
  elem_len of the declared type and not of the dynamic type.
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104844
* var%class_array(1,1)%array will have size(...) == 0
  instead of size(... % array).
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104845
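
The declared-type versus dynamic-type distinction in the first item can be pictured with a small C++ analogy. This is only a sketch and not gfortran's implementation: VTab, ClassRef, Parent and Extended are invented names, standing in loosely for the vtab whose size field gfc_class_vtab_size_get reads and for a CLASS container.

```cpp
#include <cstddef>

// Stand-in for a per-type descriptor that records the type's size.
struct VTab { std::size_t size; };

struct Parent { int b[5]; int c; };
struct Extended : Parent { int q[7]; int z; };

constexpr VTab parent_vtab{sizeof (Parent)};
constexpr VTab extended_vtab{sizeof (Extended)};

// Stand-in for a polymorphic reference: data plus the dynamic type's vtab.
struct ClassRef { const void *data; const VTab *vptr; };

// Size of the dynamic type, read from the descriptor (what SIZEOF should use).
constexpr std::size_t dynamic_size (ClassRef r) { return r.vptr->size; }

// Size of the declared type, which is too small for an Extended object.
constexpr std::size_t declared_size (ClassRef) { return sizeof (Parent); }
```

For a reference declared as Parent but holding an Extended object, dynamic_size and declared_size differ, which is the kind of mismatch the first bug report describes for elem_len.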

OK for mainline? (Unless you want to hold off until GCC 13)

Tobias

-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
München; limited liability company; Managing Directors: Thomas 
Heurung, Frank Thürauf; Registered office: München; Registration court: 
München, HRB 106955
Fortran: Fix CLASS handling in SIZEOF intrinsic

gcc/fortran/ChangeLog:

	* trans-intrinsic.cc (gfc_conv_intrinsic_sizeof): Fix CLASS handling.

gcc/testsuite/ChangeLog:

	* gfortran.dg/sizeof_6.f90: New test.

 gcc/fortran/trans-intrinsic.cc |  16 +-
 gcc/testsuite/gfortran.dg/sizeof_6.f90 | 437 +
 2 files changed, 446 insertions(+), 7 deletions(-)

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index e680de1dbd1..2249723540d 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -8099,12 +8099,14 @@ gfc_conv_intrinsic_sizeof (gfc_se *se, gfc_expr *expr)
 	 class object.  The class object may be a non-pointer object, e.g.
 	 located on the stack, or a memory location pointed to, e.g. a
 	 parameter, i.e., an indirect_ref.  */
-  if (arg->rank < 0
-	  || (arg->rank > 0 && !VAR_P (argse.expr)
-	  && ((INDIRECT_REF_P (TREE_OPERAND (argse.expr, 0))
-		   && GFC_DECL_CLASS (TREE_OPERAND (
-	TREE_OPERAND (argse.expr, 0), 0)))
-		  || GFC_DECL_CLASS (TREE_OPERAND (argse.expr, 0)
+  if (POINTER_TYPE_P (TREE_TYPE (argse.expr))
+	  && GFC_CLASS_TYPE_P (TREE_TYPE (TREE_TYPE (argse.expr
+	byte_size
+	  = gfc_class_vtab_size_get (build_fold_indirect_ref (argse.expr));
+  else if (GFC_CLASS_TYPE_P (TREE_TYPE (argse.expr)))
+	byte_size = gfc_class_vtab_size_get (argse.expr);
+  else if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (argse.expr))
+	   && TREE_CODE (argse.expr) == COMPONENT_REF)
 	byte_size = gfc_class_vtab_size_get (TREE_OPERAND (argse.expr, 0));
   else if (arg->rank > 0
 	   || (arg->rank == 0
@@ -8114,7 +8116,7 @@ gfc_conv_intrinsic_sizeof (gfc_se *se, gfc_expr *expr)
 	byte_size = gfc_class_vtab_size_get (
 	  GFC_DECL_SAVED_DESCRIPTOR (arg->symtree->n.sym->backend_decl));
   else
-	byte_size = gfc_class_vtab_size_get (argse.expr);
+	gcc_unreachable ();
 }
   else
 {
diff --git a/gcc/testsuite/gfortran.dg/sizeof_6.f90 b/gcc/testsuite/gfortran.dg/sizeof_6.f90
new file mode 100644
index 000..21b57350dc3
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/sizeof_6.f90
@@ -0,0 +1,437 @@
+! { dg-do run }
+!
+! Check that sizeof is properly handled
+!
+use iso_c_binding
+implicit none (type, external)
+
+type t
+  integer, allocatable :: a(:,:,:), aa
+  integer :: b(5), c
+end type t
+
+type t2
+   class(t), allocatable :: d(:,:), e
+end type t2
+
+type, extends(t2) :: t2e
+  integer :: q(7), z
+end type t2e
+
+type t3
+   class(t2), allocatable :: ct2, ct2a(:,:,:)
+   type(t2), allocatable :: tt2, tt2a(:,:,:)
+   integer, allocatable :: ii, iia(:,:,:)
+end type t3
+
+type(t3) :: var, vara(5)
+type(t3), allocatable :: avar, avara(:)
+class(t3), allocatable :: cvar, cvara(:)
+type(t2), allocatable :: ax, axa(:,:,:)
+class(t2), allocatable :: cx, cxa(:,:,:)
+
+integer(c_size_t) :: n
+
+allocate (t3 :: avar, avara(5))
+allocate (t3 :: cvar, cvara(5))
+
+n = sizeof(var)
+
+! Assume alignment plays no tricks and system has 32bit/64bit.
+! If needed change
+if (n /= 376 .and. n /= 200) error stop
+
+if (n /= sizeof(avar)) error stop
+if (n /= sizeof(cvar)) error stop
+if (n * 5 /= sizeof(vara)) error stop
+if (n * 5 /= sizeof(avara)) error stop
+if (n * 5 /= sizeof(cvara)) error stop
+
+if (n /= sz_ar(var,var,var,var)) error stop
+if (n /= sz_s(var,var)) error stop
+if (n /= sz_t3(var,var,var,var)) error stop
+if (n /= sz_ar(avar,avar,avar,avar)) error stop
+if (n /= sz_s(avar,avar)) error stop
+if (n /= sz_t3(avar,avar,avar,avar)) error stop
+if (n /= sz_t3_at(avar,avar)) error stop
+if (n /= sz_ar(cvar,cvar,cvar,cvar)) error stop
+if (n /= sz_s(cvar,cvar)) error stop
+if (n /= sz_t3(cvar,cvar,cvar,cvar)) error stop
+if (n /= sz_t3_a(cvar,cvar)) error stop
+
+if (n*5 /= sz_ar(vara,vara,vara,vara)) error stop
+if (n*5 /= sz_r1(vara,vara,vara,vara)) error stop
+if (n*5 /= sz_t3(vara,vara,vara,vara)) error stop
+if (n*5 /= sz_ar(avara,avara,avara,avara)) error stop
+if (n*5 /= sz_r1(avara,avara,avara,avara)) error stop
+if (n*5 /= sz_t3(avara,avara,avara,avara))

Re: [PATCH] c++: detecting copy-init context during CTAD [PR102137]

2022-03-08 Thread Jason Merrill via Gcc-patches

On 3/8/22 14:38, Patrick Palka wrote:

On Tue, 8 Mar 2022, Jason Merrill wrote:


On 3/8/22 11:36, Patrick Palka wrote:

On Mon, 7 Mar 2022, Jason Merrill wrote:


On 3/7/22 10:47, Patrick Palka wrote:

On Fri, 4 Mar 2022, Jason Merrill wrote:


On 3/4/22 14:24, Patrick Palka wrote:

Here we're failing to communicate to cp_finish_decl from tsubst_expr
that we're in a copy-initialization context (via the LOOKUP_ONLYCONVERTING
flag), which causes do_class_deduction to always consider explicit
deduction guides when performing CTAD for a templated variable
initializer.

We could fix this by passing LOOKUP_ONLYCONVERTING appropriately when
calling cp_finish_decl from tsubst_expr, but it seems do_class_deduction
can determine if we're in a copy-init context by simply inspecting the
initializer, and thus render its flags parameter unnecessary, which is
what this patch implements.  (If we were to fix this in tsubst_expr
instead, I think we'd have to inspect the initializer in the same way
in order to detect a copy-init context?)


Hmm, does this affect conversions as well?

Looks like it does:

struct A
{
 explicit operator int();
};

template <class T> void f()
{
 T t = A();
}

int main()
{
 f<int>(); // wrongly accepted
}

The reverse, initializing via an explicit constructor, is caught by code
in build_aggr_init much like the code your patch adds to
do_auto_deduction; perhaps we should move/copy that code to
cp_finish_decl?


Ah, makes sense.  Moving that code from build_aggr_init to
cp_finish_decl broke things, but using it in both spots seems to work
well.  And I suppose we might as well use it in do_class_deduction too,
since doing so lets us remove the flags parameter.


Before removing the flags parameter please try asserting that it now
matches is_copy_initialization and see if anything breaks.


I added to do_class_deduction:

gcc_assert (bool(flags & LOOKUP_ONLYCONVERTING) == is_copy_initialization (init));

Turns out removing the flags parameter breaks CTAD for new-expressions
of the form 'new TT(x)' because in this case build_new passes just 'x'
as the initializer to do_auto_deduction (as opposed to a single TREE_LIST),
for which is_copy_initialization returns true even though it's really
direct initialization.

Also turns out we're similarly not passing the right LOOKUP_* flags to
cp_finish_decl from instantiate_body, which breaks consideration of
explicit conversions/deduction guides when instantiating the initializer
of a static data member.  I added some xfailed testcases for these
situations.


Maybe we want to check is_copy_initialization in cp_finish_decl?


That seems to work nicely :) All xfailed tests for the static data
member initialization case now also pass.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

-- >8 --

Subject: [PATCH] c++: detecting copy-init context during CTAD [PR102137]

Here we're failing to communicate to cp_finish_decl from tsubst_expr
that we're in a copy-initialization context (via the LOOKUP_ONLYCONVERTING
flag), which causes us to always consider explicit deduction guides when
performing CTAD for a templated variable initializer.

It turns out this bug also affects consideration of explicit conversion
operators for the same reason.  But consideration of explicit constructors
seems to do the right thing thanks to code in build_aggr_init that sets
LOOKUP_ONLYCONVERTING when the initializer represents copy-initialization.

This patch fixes this by making cp_finish_decl set LOOKUP_ONLYCONVERTING
by inspecting the initializer like build_aggr_init does, so that callers
don't need to explicitly pass this flag.
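
As a language-level reminder of what LOOKUP_ONLYCONVERTING is modelling, here is a small sketch outside any template. It uses only standard type traits; the struct names A and B and the helper from_a are illustrative. std::is_convertible tests copy-initialization, where explicit conversion functions and constructors are not candidates, while std::is_constructible tests direct-initialization, where they are.

```cpp
#include <type_traits>

struct A { constexpr explicit operator int () const { return 42; } };
struct B { explicit B (int) {} };

// Copy-initialization (T t = expr;) ignores explicit functions.
static_assert (!std::is_convertible<A, int>::value, "int i = A{}; is ill-formed");
static_assert (!std::is_convertible<int, B>::value, "B b = 1; is ill-formed");

// Direct-initialization (T t (expr);) considers them.
static_assert (std::is_constructible<int, A>::value, "int i (A{}); is well-formed");
static_assert (std::is_constructible<B, int>::value, "B b (1); is well-formed");

// static_cast performs direct-initialization, so the explicit operator is used.
constexpr int from_a () { return static_cast<int> (A{}); }
```

The bug discussed here is that the templated form of the copy-initialization cases was accepted, because the flag that excludes explicit candidates did not reach cp_finish_decl.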

PR c++/102137
PR c++/87820

gcc/cp/ChangeLog:

* cp-tree.h (is_copy_initialization): Declare.
* decl.cc (cp_finish_decl): Set LOOKUP_ONLYCONVERTING
when is_copy_initialization is true.
* init.cc (build_aggr_init): Split out copy-initialization
check into ...
(is_copy_initialization): ... here.
* pt.cc (instantiate_decl): Pass 0 instead of
LOOKUP_ONLYCONVERTING as flags to cp_finish_decl.


Any reason not to use LOOKUP_NORMAL, both here and in tsubst_expr?  OK either way.



gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/explicit15.C: New test.
* g++.dg/cpp1z/class-deduction108.C: New test.
---
  gcc/cp/cp-tree.h  |  1 +
  gcc/cp/decl.cc|  3 +
  gcc/cp/init.cc| 20 +++--
  gcc/cp/pt.cc  |  3 +-
  gcc/testsuite/g++.dg/cpp0x/explicit15.C   | 83 +++
  .../g++.dg/cpp1z/class-deduction108.C | 78 +
  6 files changed, 181 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/explicit15.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction108.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index ac723901098..fd76909ca75 100644
--- a/gcc/cp/cp-tr

Re: [Patch] Fortran: Fix gfc_maybe_dereference_var [PR104430]

2022-03-08 Thread Tobias Burnus

Hi Harald,

On 07.03.22 20:58, Harald Anlauf wrote:

I think there are other PRs which profit from this fix.
Can you please have a look at PR99585, and in particular
the link in comment#0?  ;-)


Good pointer – the testcase looks nearly identical and it is indeed fixed.

I included it in addition in the same testcase file. (See attached patch
for the commit,  .)

Thanks,

Tobias

PS: Can I make you review my two pending patches? (NULL and SIZEOF) ;-)

PPS: I lost track a bit while working on other things – are there patches
pending review?

PPPS: I think someone still has to deal with the approved and pending
patches by José ...
-
commit c0134b7383992aab5c1a91440dbdd8fbb747169c
Author: Tobias Burnus 
Date:   Mon Mar 7 22:11:33 2022 +0100

Fortran: Fix gfc_maybe_dereference_var [PR104430][PR99585]

PR fortran/99585
PR fortran/104430

gcc/fortran/ChangeLog:

* trans-expr.cc (conv_parent_component_references): Fix comment;
simplify comparison.
(gfc_maybe_dereference_var): Avoid dereferencing a nonpointer.

gcc/testsuite/ChangeLog:

* gfortran.dg/class_result_10.f90: New test.
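
The guard added to gfc_maybe_dereference_var (return the expression unchanged when its type is not a pointer type) can be mirrored in a few lines of C++. This is an analogy only, using compile-time tag dispatch on std::is_pointer in place of the POINTER_TYPE_P check on trees; maybe_dereference and deref_impl are hypothetical names.

```cpp
#include <type_traits>

// Non-pointer case: hand the value back untouched (the new early return).
template <typename T>
constexpr const T &deref_impl (const T &v, std::false_type) { return v; }

// Pointer case: dereference as before.
template <typename T>
constexpr auto deref_impl (T p, std::true_type) -> decltype (*p) { return *p; }

// Dispatch on whether T is a pointer type.
template <typename T>
constexpr auto maybe_dereference (const T &v)
  -> decltype (deref_impl (v, std::is_pointer<T>{}))
{
  return deref_impl (v, std::is_pointer<T>{});
}

constexpr int value = 42;
constexpr const int *pointer = &value;
```

Both maybe_dereference (value) and maybe_dereference (pointer) yield the same object, so callers do not need to know in advance which form they hold.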

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index c9d9a916c28..71d037101d4 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -2805,9 +2805,9 @@ conv_parent_component_references (gfc_se * se, gfc_ref * ref)
   dt = ref->u.c.sym;
   c = ref->u.c.component;
 
-  /* Return if the component is in the parent type.  */
+  /* Return if the component is in this type, i.e. not in the parent type.  */
   for (cmp = dt->components; cmp; cmp = cmp->next)
-if (strcmp (c->name, cmp->name) == 0)
+if (c == cmp)
   return;
 
   /* Build a gfc_ref to recursively call gfc_conv_component_ref.  */
@@ -2867,6 +2867,8 @@ tree
 gfc_maybe_dereference_var (gfc_symbol *sym, tree var, bool descriptor_only_p,
 			   bool is_classarray)
 {
+  if (!POINTER_TYPE_P (TREE_TYPE (var)))
+return var;
   if (is_CFI_desc (sym, NULL))
 return build_fold_indirect_ref_loc (input_location, var);
 
diff --git a/gcc/testsuite/gfortran.dg/class_result_10.f90 b/gcc/testsuite/gfortran.dg/class_result_10.f90
new file mode 100644
index 000..a4d29ab9c1d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/class_result_10.f90
@@ -0,0 +1,52 @@
+! { dg-do run}
+
+
+! PR fortran/99585
+
+module m2
+  type t
+ class(*), pointer :: bar(:)
+  end type
+  type t2
+ class(t), allocatable :: my(:)
+  end type t2
+contains
+  function f (x, y) result(z)
+class(t) :: x(:)
+class(t) :: y(size(x(1)%bar))
+type(t)  :: z(size(x(1)%bar))
+  end
+  function g (x) result(z)
+class(t) :: x(:)
+type(t)  :: z(size(x(1)%bar))
+  end
+  subroutine s ()
+class(t2), allocatable :: a(:), b(:), c(:), d(:)
+class(t2), pointer :: p(:)
+c(1)%my = f (a(1)%my, b(1)%my)
+d(1)%my = g (p(1)%my)
+  end
+end
+
+! Contributed by  G. Steinmetz:
+! PR fortran/104430
+
+module m
+   type t
+  integer :: a
+   end type
+contains
+   function f(x) result(z)
+  class(t) :: x(:)
+  type(t) :: z(size(x%a))
+  z%a = 42
+   end
+end
+program p
+   use m
+   class(t), allocatable :: y(:), z(:)
+   allocate (y(32))
+   z = f(y)
+   if (size(z) /= 32) stop 1
+   if (any (z%a /= 42)) stop 2
+end


Re: [PATCH] c++: non-constant non-dependent decltype folding [PR104823]

2022-03-08 Thread Patrick Palka via Gcc-patches
On Tue, 8 Mar 2022, Jason Merrill wrote:

> On 3/8/22 12:54, Patrick Palka wrote:
> > 
> > 
> > On Mon, 7 Mar 2022, Jason Merrill wrote:
> > 
> > > On 3/7/22 14:41, Patrick Palka wrote:
> > > > instantiate_non_dependent_expr_sfinae instantiates only potentially
> > > > constant expressions
> > > 
> > > Hmm, that now strikes me as a problematic interface, as we don't know
> > > whether
> > > what we get back is template or non-template trees.
> > > 
> > > Maybe we want to change instantiate_non_dependent_expr to checking_assert
> > > that
> > > the argument is non-dependent (callers are already checking that), and
> > > drop
> > > the potentially-constant test?
> > 
> > That sounds like a nice improvement.  But it happens to break
> > 
> >template using type = decltype(N);
> > 
> > because finish_decltype_type checks
> > instantiation_dependent_uneval_expression_p
> > (which is false here) instead of instantiation_dependent_expression_p
> > (which is true here) before calling instantiate_non_dependent_expr, so
> > we end up tripping over the proposed checking_assert (which checks the
> > latter stronger form of dependence).
> > 
> > I suspect other callers of instantiate_non_dependent_expr might have a
> > similar problem if they use a weaker dependence check than
> > instantiation_dependent_expression_p, e.g. build_noexcept_spec only
> > checks value_dependent_expression_p.
> > 
> > I wonder if we should relax the proposed checking_assert in i_n_d_e, or
> > strengthen the dependence checks performed by its callers, or something
> > else?
> 
> I think relax the assert to _uneval and strengthen callers that use value_dep.

Sounds good, like so?  Note this patch doesn't touch
instantiate_non_dependent_or_null or fold_non_dependent_expr, since the
former already never returns a templated tree, and callers of the latter
should only care about the constant-ness not template-ness of the result
IIUC.

Bootstrapped and regtested on x86_64-pc-linux-gnu.

-- >8 --

Subject: [PATCH] c++: non-constant non-dependent decltype folding [PR104823]

When processing a non-dependent decltype operand we want to instantiate
it even if it's non-constant since non-dependent decltype is always
resolved ahead of time.  But currently finish_decltype_type uses
instantiate_non_dependent_expr, which instantiates only potentially
constant expressions, and this causes us to miss diagnosing the narrowing
conversion in S{id(v)} in the below testcase because we never instantiate
this non-constant non-dependent decltype operand.

In light of

  > On Mon, 7 Mar 2022, Jason Merrill wrote:
  >> On 3/7/22 14:41, Patrick Palka wrote:
  >>> instantiate_non_dependent_expr instantiates only potentially constant
  >>> expressions
  >>
  >> Hmm, that now strikes me as a problematic interface, as we don't know
  >> whether what we get back is template or non-template trees.

this patch drops the potentially-constant check in i_n_d_e, and turns
its dependence check into a checking_assert, since most callers already
check that the argument is non-dependent.  This patch also relaxes the
dependence check in i_n_d_e to use the _uneval version and strengthens
the dependence checks used by callers accordingly.
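
The interface change described above can be pictured with a toy model.  This
is only a sketch: Expr, instantiate_old and instantiate_new are hypothetical
stand-ins invented for illustration, not GCC types or functions.  It shows why
a silent early return leaves the caller unsure whether the result is still
templated, while an asserted precondition does not.

```cpp
#include <cassert>

// Hypothetical stand-in for a tree node; not a GCC type.
struct Expr {
  bool dependent;             // depends on a template parameter
  bool potentially_constant;  // could be a constant expression
  bool templated;             // still in template (uninstantiated) form
};

// Old shape: silently hands the input back when it is dependent or not
// potentially constant, so the result may or may not still be templated.
Expr instantiate_old(Expr e) {
  if (e.dependent || !e.potentially_constant)
    return e;
  e.templated = false;
  return e;
}

// New shape: non-dependence is a precondition checked by an assert, and
// every accepted input is instantiated, constant or not.
Expr instantiate_new(Expr e) {
  assert(!e.dependent);
  e.templated = false;
  return e;
}
```

For a non-dependent, non-constant operand the old shape returns a templated
Expr and the new shape returns an instantiated one, which is the case the
decltype operand in this PR falls into.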

In cp_parser_parenthesized_expression_list_elt we were calling
instantiate_non_dependent_expr without first checking for non-dependence.
We could fix this by guarding the call appropriately, but I noticed we
also fold non-dependent attributes later from cp_check_const_attribute.
This double instantiation causes us to reject constexpr-attribute4.C
below due to the second folding seeing non-templated trees (an existing
bug).  Thus the right solution here seems to be to remove this unguarded
call to i_n_d_e so that we end up folding non-dependent attributes only
once.

Finally, after calling i_n_d_e in finish_decltype_type we need to keep
processing_template_decl cleared for sake of the later call to
lvalue_kind, which handles templated and non-templated COND_EXPR
differently.  Otherwise we'd incorrectly reject the declaration of g in
cpp0x/cond2.C with:

  error: 'g' declared as function returning a function

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/104823

gcc/cp/ChangeLog:

* except.cc (build_noexcept_spec): Strengthen dependence check
to instantiation_dependent_expression_p.
* parser.cc (cp_parser_parenthesized_expression_list_elt):
Remove fold_expr_p parameter and call
instantiate_non_dependent_expr.
(cp_parser_parenthesized_expression_list): Adjust accordingly.
* pt.cc (expand_integer_pack): Strengthen dependence check
to instantiation_dependent_expression_p.
(instantiate_non_dependent_expr_internal): Adjust comment.
(instantiate_non_dependent_expr_sfinae): Likewise.  Drop
the potentially-constant check, and relax and turn the
dependence check into a checking assert.
(instantiate_non_depende

[PATCH, committed] PR fortran/104811 - maxloc/minloc cannot accept character arguments without `dim` optional argument

2022-03-08 Thread Harald Anlauf via Gcc-patches
Dear all,

frontend-optimization of MINLOC/MAXLOC tries to generate code for rank-1
arrays that may be expanded inline later and optimized.  Except when the
argument is a character array...

As there is even a comment in trans-intrinsic.cc that we will call a
library function for character arguments anyway, we better punt here.
The attached obvious patch does this and was pre-approved by Thomas in
the PR.
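
For reference, the result the new testcase expects can be reproduced with a
small C++ sketch.  The helper names maxloc1 and minloc1 are made up for this
illustration, and std::string comparison is assumed to match Fortran character
comparison only for equal-length strings such as the character(1) elements in
the test.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// 1-based index of the first largest element of a rank-1 array,
// or 0 for an empty array.
std::size_t maxloc1(const std::vector<std::string>& a) {
  if (a.empty()) return 0;
  std::size_t best = 0;
  for (std::size_t i = 1; i < a.size(); ++i)
    if (a[i] > a[best]) best = i;   // strict: keeps the first maximum
  return best + 1;
}

// 1-based index of the first smallest element, or 0 for an empty array.
std::size_t minloc1(const std::vector<std::string>& a) {
  if (a.empty()) return 0;
  std::size_t best = 0;
  for (std::size_t i = 1; i < a.size(); ++i)
    if (a[i] < a[best]) best = i;   // strict: keeps the first minimum
  return best + 1;
}
```

With the array ["a", "c", "a"] from the testcase this gives 2 for the maximum
and 1 for the minimum, matching the stop conditions in minmaxloc_16.f90.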

Regtested on x86_64-pc-linux-gnu and pushed to mainline as

https://gcc.gnu.org/g:e3e369dad6cbecb1b490b3f3b154c600fba5a6f3

As this is a wrong-code issue, I'd like to backport this to 11-branch.

Thanks,
Harald

From e3e369dad6cbecb1b490b3f3b154c600fba5a6f3 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 8 Mar 2022 21:47:04 +0100
Subject: [PATCH] Fortran: do not frontend-optimize MINLOC/MAXLOC for character
 arrays

gcc/fortran/ChangeLog:

	PR fortran/104811
	* frontend-passes.cc (optimize_minmaxloc): Do not attempt
	frontend-optimization of MINLOC/MAXLOC for character arrays, as
	there is no suitable code yet for inline expansion.

gcc/testsuite/ChangeLog:

	PR fortran/104811
	* gfortran.dg/minmaxloc_16.f90: New test.
---
 gcc/fortran/frontend-passes.cc |  1 +
 gcc/testsuite/gfortran.dg/minmaxloc_16.f90 | 14 ++
 2 files changed, 15 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/minmaxloc_16.f90

diff --git a/gcc/fortran/frontend-passes.cc b/gcc/fortran/frontend-passes.cc
index 4033f27df99..5eba6345145 100644
--- a/gcc/fortran/frontend-passes.cc
+++ b/gcc/fortran/frontend-passes.cc
@@ -2276,6 +2276,7 @@ optimize_minmaxloc (gfc_expr **e)
   if (fn->rank != 1
   || fn->value.function.actual == NULL
   || fn->value.function.actual->expr == NULL
+  || fn->value.function.actual->expr->ts.type == BT_CHARACTER
   || fn->value.function.actual->expr->rank != 1)
 return;

diff --git a/gcc/testsuite/gfortran.dg/minmaxloc_16.f90 b/gcc/testsuite/gfortran.dg/minmaxloc_16.f90
new file mode 100644
index 000..099248df2e3
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/minmaxloc_16.f90
@@ -0,0 +1,14 @@
+! { dg-do run }
+! { dg-options "-fdump-tree-original" }
+! PR fortran/104811
+! Frontend-optimization mis-optimized minloc/maxloc of character arrays
+
+program p
+  character(1) :: str(3)
+  str = ["a", "c", "a"]
+  if (any (maxloc (str) /= 2)) stop 1
+  if (minloc (str,dim=1) /= 1) stop 2
+end
+
+! { dg-final { scan-tree-dump-times "_gfortran_maxloc0_4_s1" 1 "original" } }
+! { dg-final { scan-tree-dump-times "_gfortran_minloc2_4_s1" 1 "original" } }
--
2.34.1



Re: Ping: [PATCH] PR target/102059 Fix inline of target specific functions

2022-03-08 Thread Michael Meissner via Gcc-patches
On Tue, Mar 08, 2022 at 11:28:03AM -0600, Segher Boessenkool wrote:
> On Fri, Feb 11, 2022 at 12:53:07PM -0500, Michael Meissner wrote:
> > Ping patch for PR target/102059 to ignore implicit -mpower8-fusion that
> > prevents a function targeting power9 or power10 from inlining a function
> > that declared it needed power8 via attribute/pragma target.
> 
> Can we just disable any effect from this flag, instead?  It should just
> be implied by -mcpu=power8, and be impossible to be enabled otherwise
> (or disabled!)

Yes, I can do that.  We should also do the same solution for power10 fusion.

What I propose is to set a regular variable with the results of the
-mpower8-fusion option.  This option would be true by default.

Then change TARGET_POWER8_FUSION to be a macro that tests whether the current
tuning CPU (not the CPU we are compiling code for) is a power8.  I would likely
then remove any TARGET_POWER8 in places that test for TARGET_POWER8_FUSION.

And similarly for TARGET_POWER10_FUSION.  Note, I haven't looked at Pat's
latest changes for power10 fusion.
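
A rough sketch of the shape of this proposal, with every name hypothetical
(Processor, Options and target_p8_fusion are invented here and are not the
rs6000 back end's identifiers), and with an inline function standing in for
the macro:

```cpp
#include <cassert>

// Hypothetical names throughout; not the rs6000 back end's.
enum Processor { CPU_POWER8, CPU_POWER9, CPU_POWER10 };

struct Options {
  bool p8_fusion_enabled;  // plain variable holding the -mpower8-fusion setting
  Processor tune;          // CPU we are tuning for
  Processor arch;          // CPU we are generating code for
};

// Fusion is derived from the option variable and the tuning CPU only;
// the CPU we generate code for plays no part.
inline bool target_p8_fusion(const Options& o) {
  return o.p8_fusion_enabled && o.tune == CPU_POWER8;
}
```

Under this assumption, code generated for power9 but tuned for power8 would
still see fusion enabled, and code tuned for power10 would not, regardless of
the option's default value.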

Is this acceptable?

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH] c++: Wrong error with alias template in class tmpl [PR104108]

2022-03-08 Thread Marek Polacek via Gcc-patches
In r10-6329 I tried to optimize the number of calls to v_d_e_p in
convert_nontype_argument by remembering whether the expression was
value-dependent in a bool flag.  I did that wrongly assuming that its
value-dependence will not be changed by build_converted_constant_expr.
This testcase shows that it can: b_c_c_e gets a VAR_DECL for m_parameter,
which is not value-dependent, but we're converting it to "const int &"
so it returns

  (const int &)(const int *) &m_parameter

which suddenly becomes value-dependent because of the added ADDR_EXPR:
has_value_dependent_address is now true because m_parameter's context S
is dependent.  With this bug in place, we went to the second branch here:

  if (TYPE_REF_OBJ_P (TREE_TYPE (expr)) && val_dep_p)
/* OK, dependent reference.  We don't want to ask whether a DECL is
   itself value-dependent, since what we want here is its address.  */;
  else
{
  expr = build_address (expr);

  if (invalid_tparm_referent_p (type, expr, complain))
return NULL_TREE;
}

wherein build_address created a bad tree and then i_t_r_p complained.
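
The underlying mistake is a cached predicate that goes stale.  A minimal
sketch, using invented stand-ins (Expr, value_dependent and
convert_to_reference are hypothetical and much simpler than the real GCC
routines):

```cpp
#include <cassert>

// Hypothetical stand-in for an expression tree.
struct Expr {
  bool is_address;         // an address-of has been wrapped around it
  bool context_dependent;  // its enclosing class is dependent
};

// Stand-in for a value-dependence test: a plain variable is not
// dependent, but its address is when its context is dependent.
bool value_dependent(const Expr& e) {
  return e.is_address && e.context_dependent;
}

// Stand-in for a conversion that wraps the operand in an address-of.
Expr convert_to_reference(Expr e) {
  e.is_address = true;
  return e;
}

// Buggy shape: the flag is computed once and never refreshed.
bool stale_flag(Expr e) {
  const bool dep = value_dependent(e);
  e = convert_to_reference(e);   // the expression changes here
  return dep;                    // out of date
}

// Fixed shape: recompute the flag after the conversion.
bool fresh_flag(Expr e) {
  bool dep = value_dependent(e);
  e = convert_to_reference(e);
  dep = value_dependent(e);
  return dep;
}
```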

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/11?

PR c++/104108

gcc/cp/ChangeLog:

* pt.cc (convert_nontype_argument): Recompute
value_dependent_expression_p after build_converted_constant_expr.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/alias-decl-74.C: New test.
---
 gcc/cp/pt.cc   | 4 +++-
 gcc/testsuite/g++.dg/cpp0x/alias-decl-74.C | 9 +
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/alias-decl-74.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 8b5faeed8ea..b8bf533a747 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -7316,7 +7316,7 @@ convert_nontype_argument (tree type, tree expr, tsubst_flags_t complain)
   if (non_dep)
 expr = instantiate_non_dependent_expr_internal (expr, complain);
 
-  const bool val_dep_p = value_dependent_expression_p (expr);
+  bool val_dep_p = value_dependent_expression_p (expr);
   if (val_dep_p)
 expr = canonicalize_expr_argument (expr, complain);
   else
@@ -7357,6 +7357,8 @@ convert_nontype_argument (tree type, tree expr, tsubst_flags_t complain)
  expr = maybe_constant_value (expr, NULL_TREE,
   /*manifestly_const_eval=*/true);
  expr = convert_from_reference (expr);
+ /* EXPR may have become value-dependent.  */
+ val_dep_p = value_dependent_expression_p (expr);
}
   else if (TYPE_PTR_OR_PTRMEM_P (type))
{
diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-74.C b/gcc/testsuite/g++.dg/cpp0x/alias-decl-74.C
new file mode 100644
index 000..8382d856382
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-74.C
@@ -0,0 +1,9 @@
+// PR c++/104108
+// { dg-do compile { target c++11 } }
+
+template class T>
+struct S {
+  static int m_parameter;
+  template class TT>
+  using U = TT;
+};

base-commit: b7175f36812b32d3de242f15c065b9cb68e957a9
-- 
2.35.1



Re: [PATCH] c++: detecting copy-init context during CTAD [PR102137]

2022-03-08 Thread Patrick Palka via Gcc-patches
On Tue, 8 Mar 2022, Jason Merrill wrote:

> On 3/8/22 14:38, Patrick Palka wrote:
> > On Tue, 8 Mar 2022, Jason Merrill wrote:
> > 
> > > On 3/8/22 11:36, Patrick Palka wrote:
> > > > On Mon, 7 Mar 2022, Jason Merrill wrote:
> > > > 
> > > > > On 3/7/22 10:47, Patrick Palka wrote:
> > > > > > On Fri, 4 Mar 2022, Jason Merrill wrote:
> > > > > > 
> > > > > > > On 3/4/22 14:24, Patrick Palka wrote:
> > > > > > > > Here we're failing to communicate to cp_finish_decl from tsubst_expr
> > > > > > > > that we're in a copy-initialization context (via the
> > > > > > > > LOOKUP_ONLYCONVERTING flag), which causes do_class_deduction to always
> > > > > > > > consider explicit deduction guides when performing CTAD for a templated
> > > > > > > > variable initializer.
> > > > > > > > 
> > > > > > > > We could fix this by passing LOOKUP_ONLYCONVERTING appropriately when
> > > > > > > > calling cp_finish_decl from tsubst_expr, but it seems do_class_deduction
> > > > > > > > can determine if we're in a copy-init context by simply inspecting the
> > > > > > > > initializer, and thus render its flags parameter unnecessary, which is
> > > > > > > > what this patch implements.  (If we were to fix this in tsubst_expr
> > > > > > > > instead, I think we'd have to inspect the initializer in the same way
> > > > > > > > in order to detect a copy-init context?)
> > > > > > > 
> > > > > > > Hmm, does this affect conversions as well?
> > > > > > > 
> > > > > > > Looks like it does:
> > > > > > > 
> > > > > > > struct A
> > > > > > > {
> > > > > > >  explicit operator int();
> > > > > > > };
> > > > > > > 
> > > > > > > template <class T> void f()
> > > > > > > {
> > > > > > >  T t = A();
> > > > > > > }
> > > > > > > 
> > > > > > > int main()
> > > > > > > {
> > > > > > >  f<int>(); // wrongly accepted
> > > > > > > }
> > > > > > > 
> > > > > > > The reverse, initializing via an explicit constructor, is caught by code in
> > > > > > > build_aggr_init much like the code your patch adds to do_auto_deduction;
> > > > > > > perhaps we should move/copy that code to cp_finish_decl?
> > > > > > 
> > > > > > Ah, makes sense.  Moving that code from build_aggr_init to
> > > > > > cp_finish_decl broke things, but using it in both spots seems to work
> > > > > > well.  And I suppose we might as well use it in do_class_deduction too,
> > > > > > since doing so lets us remove the flags parameter.
> > > > > 
> > > > > Before removing the flags parameter please try asserting that it now matches
> > > > > is_copy_initialization and see if anything breaks.
> > > > 
> > > > I added to do_class_deduction:
> > > > 
> > > > gcc_assert (bool(flags & LOOKUP_ONLYCONVERTING) == is_copy_initialization (init));
> > > > 
> > > > Turns out removing the flags parameter breaks CTAD for new-expressions
> > > > of the form 'new TT(x)' because in this case build_new passes just 'x'
> > > > as the initializer to do_auto_deduction (as opposed to a single TREE_LIST),
> > > > for which is_copy_initialization returns true even though it's really
> > > > direct initialization.
> > > > 
> > > > Also turns out we're similarly not passing the right LOOKUP_* flags to
> > > > cp_finish_decl from instantiate_body, which breaks consideration of
> > > > explicit conversions/deduction guides when instantiating the initializer
> > > > of a static data member.  I added some xfailed testcases for these
> > > > situations.
> > > 
> > > Maybe we want to check is_copy_initialization in cp_finish_decl?
> > 
> > That seems to work nicely :) All xfailed tests for the static data
> > member initialization case now also pass.
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk?
> > 
> > -- >8 --
> > 
> > Subject: [PATCH] c++: detecting copy-init context during CTAD [PR102137]
> > 
> > Here we're failing to communicate to cp_finish_decl from tsubst_expr
> > that we're in a copy-initialization context (via the LOOKUP_ONLYCONVERTING
> > flag), which causes us to always consider explicit deduction guides when
> > performing CTAD for a templated variable initializer.
> > 
> > It turns out this bug also affects consideration of explicit conversion
> > operators for the same reason.  But consideration of explicit constructors
> > seems to do the right thing thanks to code in build_aggr_init that sets
> > LOOKUP_ONLYCONVERTING when the initializer represents copy-initialization.
> > 
> > This patch fixes this by making cp_finish_decl set LOOKUP_ONLYCONVERTING
> > by inspecting the initializer like build_aggr_init does, so that callers
> > don't need to explicitly pass this flag.
> > 
> > PR c++/102137
> > PR c++/87820
> > 
> > gcc/cp/ChangeLog:
> > 
>
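
The language rule at the heart of this thread can be checked outside a
template with standard type traits.  This is a sketch written for
illustration, not part of the patch: std::is_convertible models
copy-initialization, std::is_constructible models direct-initialization, and
the struct A mirrors the one in the quoted example with a body added so it
can be called.

```cpp
#include <cassert>
#include <type_traits>

struct A {
  explicit operator int() const { return 42; }
};

// Copy-initialization (int t = A();) uses only implicit conversions,
// so the explicit conversion operator must not be considered.
static_assert(!std::is_convertible<A, int>::value,
              "copy-init must ignore explicit conversion operators");

// Direct-initialization (int t(a);) may use explicit conversions.
static_assert(std::is_constructible<int, A>::value,
              "direct-init may use explicit conversion operators");

int direct_init() {
  A a;
  int t(a);   // direct-initialization: calls the explicit operator
  return t;
}
```

The templated form in the quoted example should be rejected for the same
reason the first static_assert holds.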

Re: [Patch] Fortran: Fix gfc_conv_gfc_desc_to_cfi_desc with NULL [PR104126]

2022-03-08 Thread Harald Anlauf via Gcc-patches

Hi Tobias,

Am 07.03.22 um 15:16 schrieb Tobias Burnus:

Pre-remark: Related to NULL, there are some accepts-invalid issues that are
not addressed in this patch. See https://gcc.gnu.org/PR104819

This patch fixes an ICE (12 regression) with NULL() that has no MOLD
argument.


the patch does fix the ICE.  But given your short pre-remark:
are you saying that the testcase is invalid, and with the patch
we silently accept it now?

(The testcase compiles with Intel, but triggers a funny bug in
crayftn, which made me read 16.9.144 to learn more about the
tricks of NULL.  But I tend to think this case is valid.)


OK for mainline?


LGTM.

Thanks for the patch!

Harald


Tobias




[PATCH] ipa-cp: Avoid adjusting references through self-recursion (PR 104813)

2022-03-08 Thread Martin Jambor
Hi,

when writing the patch that downgrades address-taken references to
load references when IPA-CP can prove that all uses of the taken
address end up in loads, I unfortunately did not take into account
that find_more_scalar_values_for_callers_subset now happily adds
self-recursive edges to the set of callers which should be immediately
redirected (originally recursion was meant to be handled as edge
redirection in a second pass over the SCC).

The code as it is can now decrement the reference counters too many
times.  This can be remedied by removing self-recursive edges earlier; we
already do it because of thunk expansion issues, and so this patch
does exactly that.
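
The removal loop that the patch moves earlier can be sketched with standard
containers.  This is an illustration only: Edge and split_self_recursive are
hypothetical names, and std::vector stands in for GCC's vec with a hand-written
swap-and-pop in place of unordered_remove.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a call graph edge; nodes are plain ints.
struct Edge {
  int caller;
  int callee;
};

// Move every edge whose caller is NODE out of CALLERS and return them.
// Iterating from the back makes swap-and-pop removal safe: the element
// swapped into slot i has already been examined.
std::vector<Edge> split_self_recursive(std::vector<Edge>& callers, int node) {
  std::vector<Edge> self;
  for (int i = static_cast<int>(callers.size()) - 1; i >= 0; --i)
    if (callers[i].caller == node) {
      self.push_back(callers[i]);
      callers[i] = callers.back();   // unordered removal
      callers.pop_back();
    }
  return self;
}
```

Running this before any per-caller bookkeeping means later loops over the
callers vector never see the self-recursive edges, which is the ordering the
patch establishes.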

Bootstrapped and LTO-bootstrapped and tested on x86_64-linux.  OK for
master?

Thanks,

Martin


gcc/ChangeLog:

2022-03-07  Martin Jambor  

PR ipa/104813
* ipa-cp.cc (create_specialized_node): Move removal of
self-recursive calls from callers vector before reference
adjustments.

gcc/testsuite/ChangeLog:

2022-03-07  Martin Jambor  

PR ipa/104813
* gcc.dg/ipa/pr104813.c: New test.
---
 gcc/ipa-cp.cc   | 20 +-
 gcc/testsuite/gcc.dg/ipa/pr104813.c | 32 +
 2 files changed, 42 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr104813.c

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 453e9c93cc3..18047c209a8 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -5099,6 +5099,16 @@ create_specialized_node (struct cgraph_node *node,
   else
 new_adjustments = NULL;
 
+  auto_vec self_recursive_calls;
+  for (i = callers.length () - 1; i >= 0; i--)
+{
+  cgraph_edge *cs = callers[i];
+  if (cs->caller == node)
+   {
+ self_recursive_calls.safe_push (cs);
+ callers.unordered_remove (i);
+   }
+}
   replace_trees = cinfo ? vec_safe_copy (cinfo->tree_map) : NULL;
   for (i = 0; i < count; i++)
 {
@@ -5129,16 +5139,6 @@ create_specialized_node (struct cgraph_node *node,
   if (replace_map)
vec_safe_push (replace_trees, replace_map);
 }
-  auto_vec self_recursive_calls;
-  for (i = callers.length () - 1; i >= 0; i--)
-{
-  cgraph_edge *cs = callers[i];
-  if (cs->caller == node)
-   {
- self_recursive_calls.safe_push (cs);
- callers.unordered_remove (i);
-   }
-}
 
   unsigned &suffix_counter = clone_num_suffixes->get_or_insert (
   IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (
diff --git a/gcc/testsuite/gcc.dg/ipa/pr104813.c b/gcc/testsuite/gcc.dg/ipa/pr104813.c
new file mode 100644
index 000..34f413e3823
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr104813.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O3"  } */
+
+int a, b, c, d, *e;
+void f(int h) {
+  if (b) {
+int g;
+while (g++)
+  d = *e;
+e++;
+  }
+}
+static void i();
+static void j(int *h, int k, int *l) {
+  if (c) {
+int *o = h, m;
+f(*l);
+i(m);
+j(o, 1, o);
+for (;;)
+  ;
+  }
+}
+void i() {
+  int *n = &a;
+  while (1)
+j(n, 1, n);
+}
+int main() {
+  j(&a, 0, &a);
+  return 0;
+}
-- 
2.35.1



Go patch committed: ignore function type result name in export data

2022-03-08 Thread Ian Lance Taylor via Gcc-patches
This patch to the Go frontend ignores the function type result name
when producing export data.  This change ensures that we never output
a result name in the export data if there is only a single result.
Previously we would output a ? if the single result had a name.  That
made the output unstable, because the hashing ignores the result name,
so whether we output a ? or not depended on how equal hash elements
were handled.  This is for GCC PR 104832.  Bootstrapped and ran Go
testsuite on x86_64-pc-linux-gnu.  Committed to mainline.
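
Why the old output was unstable can be shown with a small sketch.  All names
here (FuncType, export_old, export_new, canonical) are invented, and the
textual export format is an assumption made only for illustration; the point
is that a table whose hash and equality ignore the result name keeps whichever
equal element arrives first.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical single-result function type.
struct FuncType {
  std::string result_type;
  std::string result_name;   // empty when the result is unnamed
};

// Hash and equality both ignore the result name.
struct Hash {
  std::size_t operator()(const FuncType& f) const {
    return std::hash<std::string>()(f.result_type);
  }
};
struct Eq {
  bool operator()(const FuncType& a, const FuncType& b) const {
    return a.result_type == b.result_type;
  }
};

// Insert two equal types in order and return the surviving representative.
FuncType canonical(const FuncType& first, const FuncType& second) {
  std::unordered_set<FuncType, Hash, Eq> table;
  table.insert(first);
  table.insert(second);      // no-op: compares equal to first
  return *table.begin();
}

// Old behaviour (assumed format): a named single result gets a "?" marker.
std::string export_old(const FuncType& f) {
  return f.result_name.empty() ? f.result_type : "(? " + f.result_type + ")";
}

// New behaviour: the name of a single result is never written.
std::string export_new(const FuncType& f) {
  return f.result_type;
}
```

With the old exporter the text depends on insertion order; with the new one
both orders produce the same text.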

Ian
d94b2d7240906da100946b596050f1020b87415d
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index e68d2d967cc..d9b12695e5c 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-787fd4475f9d9101bc138d0b9763b0f5ecca89a9
+5042f7efbdb2d64537dfef53a19e96ee5ec4db2d
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/types.cc b/gcc/go/gofrontend/types.cc
index 8267f1565ce..3de0bd3ae61 100644
--- a/gcc/go/gofrontend/types.cc
+++ b/gcc/go/gofrontend/types.cc
@@ -5303,7 +5303,7 @@ Function_type::do_export(Export* exp) const
   if (results != NULL)
 {
   exp->write_c_string(" ");
-  if (results->size() == 1 && results->begin()->name().empty())
+  if (results->size() == 1)
exp->write_type(results->begin()->type());
   else
{


Re: [Patch] Fortran: Fix gfc_maybe_dereference_var [PR104430]

2022-03-08 Thread Harald Anlauf via Gcc-patches

Hi Tobias,

Am 08.03.22 um 21:19 schrieb Tobias Burnus:

PS: Can I make you review my two pending patches? (NULL and SIZEOF) ;-)


I just approved the former one, but rather hope that Paul or Mikael
or somebody else would jump in on the other one.


PPS: I lost a bit track working on other things – are there patches
pending review?

PPPS: I think someone still has to deal with the approved and pending
patches by José ...


What prevented them from being processed after approval?

Cheers,
Harald


Re: [PATCH] c++: merge default targs for function templates [PR65396]

2022-03-08 Thread Jason Merrill via Gcc-patches

On 3/3/22 16:06, Patrick Palka wrote:

We currently merge default template arguments for class templates, but
not for function templates.  This patch fixes this by splitting out the
argument merging logic in redeclare_class_template into a separate
function and using it in duplicate_decls as well.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/65396

gcc/cp/ChangeLog:

* cp-tree.h (merge_default_template_args): Declare.
* decl.cc (merge_default_template_args): Define, split out from
redeclare_class_template.
(duplicate_decls): Use it when merging member function template
and free function declarations.
* pt.cc (redeclare_class_template): Split out default argument
merging logic into merge_default_template_args.  Improve location
of a note when there's a template parameter kind mismatch.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/vt-34314.C: Adjust expected location of
"redeclared here" note.
* g++.dg/template/pr92440.C: Likewise.
* g++.old-deja/g++.pt/redecl1.C: Adjust expected location of
"redefinition of default argument" error.
* g++.dg/template/defarg23.C: New test.
* g++.dg/template/defarg23a.C: New test.
---
  gcc/cp/cp-tree.h|  1 +
  gcc/cp/decl.cc  | 60 -
  gcc/cp/pt.cc| 31 ++-
  gcc/testsuite/g++.dg/cpp0x/vt-34314.C   | 12 ++---
  gcc/testsuite/g++.dg/template/defarg23.C| 21 
  gcc/testsuite/g++.dg/template/defarg23a.C   | 24 +
  gcc/testsuite/g++.dg/template/pr92440.C |  4 +-
  gcc/testsuite/g++.old-deja/g++.pt/redecl1.C | 12 ++---
  8 files changed, 123 insertions(+), 42 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/template/defarg23.C
  create mode 100644 gcc/testsuite/g++.dg/template/defarg23a.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 8a44218611f..ea53e2d0ef2 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6783,6 +6783,7 @@ extern void note_iteration_stmt_body_end  (bool);
  extern void determine_local_discriminator (tree);
  extern int decls_match(tree, tree, bool = true);
  extern bool maybe_version_functions   (tree, tree, bool);
+extern bool merge_default_template_args(tree, tree, bool);
  extern tree duplicate_decls   (tree, tree,
 bool hiding = false,
 bool was_hidden = false);
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 23c06655bde..a0bce56c121 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -1470,6 +1470,43 @@ duplicate_function_template_decls (tree newdecl, tree olddecl)
return false;
  }
  
+/* OLD_PARMS is the innermost set of template parameters for some template
+   declaration, and NEW_PARMS is the corresponding set of template parameters
+   for a redeclaration of that template.  Merge the default arguments within
+   these two sets of parameters.  CLASS_P is true iff the template in
+   question is a class template.  */
+
+bool
+merge_default_template_args (tree new_parms, tree old_parms, bool class_p)
+{
+  gcc_checking_assert (TREE_VEC_LENGTH (new_parms)
+  == TREE_VEC_LENGTH (old_parms));
+  for (int i = 0; i < TREE_VEC_LENGTH (new_parms); i++)
+{
+  tree new_parm = TREE_VALUE (TREE_VEC_ELT (new_parms, i));
+  tree old_parm = TREE_VALUE (TREE_VEC_ELT (old_parms, i));
+  tree& new_default = TREE_PURPOSE (TREE_VEC_ELT (new_parms, i));
+  tree& old_default = TREE_PURPOSE (TREE_VEC_ELT (old_parms, i));
+  if (new_default != NULL_TREE && old_default != NULL_TREE)
+   {
+ auto_diagnostic_group d;
+ error ("redefinition of default argument for %q+#D", new_parm);
+ inform (DECL_SOURCE_LOCATION (old_parm),
+ "original definition appeared here");
+ return false;
+   }
+  else if (new_default != NULL_TREE)
+   /* Update the previous template parameters (which are the ones
+  that will really count) with the new default value.  */
+   old_default = new_default;
+  else if (class_p && old_default != NULL_TREE)
+   /* Update the new parameters, too; they'll be used as the
+  parameters for any members.  */
+   new_default = old_default;
+}
+  return true;
+}
+
  /* If NEWDECL is a redeclaration of OLDDECL, merge the declarations.
 If the redeclaration is invalid, a diagnostic is issued, and the
 error_mark_node is returned.  Otherwise, OLDDECL is returned.
@@ -1990,7 +2027,23 @@ duplicate_decls (tree newdecl, tree olddecl, bool hiding, bool was_hidden)
 template shall be specified on the initial declaration
 of the member function within the class template.  */
  

Re: [PATCH] c++: non-constant non-dependent decltype folding [PR104823]

2022-03-08 Thread Jason Merrill via Gcc-patches

On 3/8/22 16:57, Patrick Palka wrote:

On Tue, 8 Mar 2022, Jason Merrill wrote:


On 3/8/22 12:54, Patrick Palka wrote:



On Mon, 7 Mar 2022, Jason Merrill wrote:


On 3/7/22 14:41, Patrick Palka wrote:

instantiate_non_dependent_expr_sfinae instantiates only potentially
constant expressions


Hmm, that now strikes me as a problematic interface, as we don't know
whether
what we get back is template or non-template trees.

Maybe we want to change instantiate_non_dependent_expr to checking_assert
that
the argument is non-dependent (callers are already checking that), and
drop
the potentially-constant test?


That sounds like a nice improvement.  But it happens to break

template using type = decltype(N);

because finish_decltype_type checks
instantiation_dependent_uneval_expression_p
(which is false here) instead of instantiation_dependent_expression_p
(which is true here) before calling instantiate_non_dependent_expr, so
we end up tripping over the proposed checking_assert (which checks the
latter stronger form of dependence).

I suspect other callers of instantiate_non_dependent_expr might have a
similar problem if they use a weaker dependence check than
instantiation_dependent_expression_p, e.g. build_noexcept_spec only
checks value_dependent_expression_p.

I wonder if we should relax the proposed checking_assert in i_n_d_e, or
strengthen the dependence checks performed by its callers, or something
else?


I think relax the assert to _uneval and strengthen callers that use value_dep.


Sounds good, like so?  Note this patch doesn't touch
instantiate_non_dependent_or_null or fold_non_dependent_expr, since the
former already never returns a templated tree, and callers of the latter
should only care about the constant-ness not template-ness of the result
IIUC.

Bootstrapped and regtested on x86_64-pc-linux-gnu.

-- >8 --

Subject: [PATCH] c++: non-constant non-dependent decltype folding [PR104823]

When processing a non-dependent decltype operand we want to instantiate
it even if it's non-constant since non-dependent decltype is always
resolved ahead of time.  But currently finish_decltype_type uses
instantiate_non_dependent_expr, which instantiates only potentially
constant expressions, and this causes us to miss diagnosing the narrowing
conversion in S{id(v)} in the below testcase because we never instantiate
this non-constant non-dependent decltype operand.

In light of

   > On Mon, 7 Mar 2022, Jason Merrill wrote:
   >> On 3/7/22 14:41, Patrick Palka wrote:
   >>> instantiate_non_dependent_expr instantiates only potentially constant
   >>> expressions
   >>
   >> Hmm, that now strikes me as a problematic interface, as we don't know
   >> whether what we get back is template or non-template trees.

this patch drops the potentially-constant check in i_n_d_e, and turns
its dependence check into a checking_assert, since most callers already
check that the argument is non-dependent.  This patch also relaxes the
dependence check in i_n_d_e to use the _uneval version and strengthens
the dependence checks used by callers accordingly.

In cp_parser_parenthesized_expression_list_elt we were calling
instantiate_non_dependent_expr without first checking for non-dependence.
We could fix this by guarding the call appropriately, but I noticed we
also fold non-dependent attributes later from cp_check_const_attribute.
This double instantiation causes us to reject constexpr-attribute4.C
below due to the second folding seeing non-templated trees (an existing
bug).  Thus the right solution here seems to be to remove this unguarded
call to i_n_d_e so that we end up folding non-dependent attributes only
once.

Finally, after calling i_n_d_e in finish_decltype_type we need to keep
processing_template_decl cleared for sake of the later call to
lvalue_kind, which handles templated and non-templated COND_EXPR
differently.  Otherwise we'd incorrectly reject the declaration of g in
cpp0x/cond2.C with:

   error: 'g' declared as function returning a function

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/104823

gcc/cp/ChangeLog:

* except.cc (build_noexcept_spec): Strengthen dependence check
to instantiation_dependent_expression_p.
* parser.cc (cp_parser_parenthesized_expression_list_elt):
Remove fold_expr_p parameter and call
instantiate_non_dependent_expr.
(cp_parser_parenthesized_expression_list): Adjust accordingly.
* pt.cc (expand_integer_pack): Strengthen dependence check
to instantiation_dependent_expression_p.
(instantiate_non_dependent_expr_internal): Adjust comment.
(instantiate_non_dependent_expr_sfinae): Likewise.  Drop
the potentially-constant check, and relax and turn the
dependence check into a checking assert.
(instantiate_non_dependent_or_null): Adjust comment.
* semantics.cc (finish_decltype_type): Keep
processing_template_decl cleared after

Re: [PATCH] c++: Wrong error with alias template in class tmpl [PR104108]

2022-03-08 Thread Jason Merrill via Gcc-patches

On 3/8/22 17:14, Marek Polacek wrote:

In r10-6329 I tried to optimize the number of calls to v_d_e_p in
convert_nontype_argument by remembering whether the expression was
value-dependent in a bool flag.  I did that wrongly assuming that its
value-dependence will not be changed by build_converted_constant_expr.
This testcase shows that it can: b_c_c_e gets a VAR_DECL for m_parameter,
which is not value-dependent, but we're converting it to "const int &"
so it returns

   (const int &)(const int *) &m_parameter

which suddenly becomes value-dependent because of the added ADDR_EXPR:
has_value_dependent_address is now true because m_parameter's context S
is dependent.  With this bug in place, we went to the second branch here:

   if (TYPE_REF_OBJ_P (TREE_TYPE (expr)) && val_dep_p)
 /* OK, dependent reference.  We don't want to ask whether a DECL is
itself value-dependent, since what we want here is its address.  */;
   else
 {
   expr = build_address (expr);

   if (invalid_tparm_referent_p (type, expr, complain))
 return NULL_TREE;
 }

wherein build_address created a bad tree and then i_t_r_p complained.
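
The failure mode described above is a cached predicate going stale after a
transformation.  A minimal sketch of that pattern follows; Expr,
is_value_dependent and convert_to_reference are hypothetical stand-ins
invented for illustration and are not GCC's tree API.

```cpp
// Toy model only: none of these names exist in GCC.
struct Expr {
  bool context_is_dependent;  // e.g. a static member of a class template
  bool address_taken;         // set once a conversion wraps it in '&'
};

// Stand-in for value_dependent_expression_p: in this toy model a plain
// variable is not dependent, but its address is when its context is.
bool is_value_dependent(const Expr &e) {
  return e.address_taken && e.context_is_dependent;
}

// Stand-in for the conversion to a reference type, which ends up
// referring to the operand's address.
Expr convert_to_reference(Expr e) {
  e.address_taken = true;
  return e;
}

// Buggy shape: the predicate is computed once and reused after conversion.
bool dependent_after_conversion_stale(Expr e) {
  const bool val_dep_p = is_value_dependent(e);
  e = convert_to_reference(e);
  (void) e;
  return val_dep_p;  // stale: reflects the expression before conversion
}

// Fixed shape: recompute the predicate once the expression has changed.
bool dependent_after_conversion_fresh(Expr e) {
  bool val_dep_p = is_value_dependent(e);
  e = convert_to_reference(e);
  val_dep_p = is_value_dependent(e);
  return val_dep_p;
}
```

In this sketch the only difference between the two functions is the extra
recomputation, which mirrors dropping the const and re-querying in the patch.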

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/11?


OK.


PR c++/104108

gcc/cp/ChangeLog:

* pt.cc (convert_nontype_argument): Recompute
value_dependent_expression_p after build_converted_constant_expr.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/alias-decl-74.C: New test.
---
  gcc/cp/pt.cc   | 4 +++-
  gcc/testsuite/g++.dg/cpp0x/alias-decl-74.C | 9 +
  2 files changed, 12 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/alias-decl-74.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 8b5faeed8ea..b8bf533a747 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -7316,7 +7316,7 @@ convert_nontype_argument (tree type, tree expr, tsubst_flags_t complain)
if (non_dep)
  expr = instantiate_non_dependent_expr_internal (expr, complain);
  
-  const bool val_dep_p = value_dependent_expression_p (expr);
+  bool val_dep_p = value_dependent_expression_p (expr);
if (val_dep_p)
  expr = canonicalize_expr_argument (expr, complain);
else
@@ -7357,6 +7357,8 @@ convert_nontype_argument (tree type, tree expr, tsubst_flags_t complain)
  expr = maybe_constant_value (expr, NULL_TREE,
   /*manifestly_const_eval=*/true);
  expr = convert_from_reference (expr);
+ /* EXPR may have become value-dependent.  */
+ val_dep_p = value_dependent_expression_p (expr);
}
else if (TYPE_PTR_OR_PTRMEM_P (type))
{
diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-74.C b/gcc/testsuite/g++.dg/cpp0x/alias-decl-74.C
new file mode 100644
index 000..8382d856382
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-74.C
@@ -0,0 +1,9 @@
+// PR c++/104108
+// { dg-do compile { target c++11 } }
+
+template class T>
+struct S {
+  static int m_parameter;
+  template class TT>
+  using U = TT;
+};

base-commit: b7175f36812b32d3de242f15c065b9cb68e957a9




Re: [PATCH] c++: detecting copy-init context during CTAD [PR102137]

2022-03-08 Thread Jason Merrill via Gcc-patches

On 3/8/22 17:17, Patrick Palka wrote:

On Tue, 8 Mar 2022, Jason Merrill wrote:


On 3/8/22 14:38, Patrick Palka wrote:

On Tue, 8 Mar 2022, Jason Merrill wrote:


On 3/8/22 11:36, Patrick Palka wrote:

On Mon, 7 Mar 2022, Jason Merrill wrote:


On 3/7/22 10:47, Patrick Palka wrote:

On Fri, 4 Mar 2022, Jason Merrill wrote:


On 3/4/22 14:24, Patrick Palka wrote:

Here we're failing to communicate to cp_finish_decl from tsubst_expr
that we're in a copy-initialization context (via the LOOKUP_ONLYCONVERTING
flag), which causes do_class_deduction to always consider explicit
deduction guides when performing CTAD for a templated variable
initializer.

We could fix this by passing LOOKUP_ONLYCONVERTING appropriately when
calling cp_finish_decl from tsubst_expr, but it seems do_class_deduction
can determine if we're in a copy-init context by simply inspecting the
initializer, and thus render its flags parameter unnecessary, which is
what this patch implements.  (If we were to fix this in tsubst_expr
instead, I think we'd have to inspect the initializer in the same way
in order to detect a copy-init context?)


Hmm, does this affect conversions as well?

Looks like it does:

struct A
{
  explicit operator int();
};

template <class T> void f()
{
  T t = A();
}

int main()
{
  f<int>(); // wrongly accepted
}

The reverse, initializing via an explicit constructor, is caught by code in
build_aggr_init much like the code your patch adds to do_auto_deduction;
perhaps we should move/copy that code to cp_finish_decl?


Ah, makes sense.  Moving that code from build_aggr_init to
cp_finish_decl broke things, but using it in both spots seems to work
well.  And I suppose we might as well use it in do_class_deduction too,
since doing so lets us remove the flags parameter.


Before removing the flags parameter please try asserting that it now
matches is_copy_initialization and see if anything breaks.


I added to do_class_deduction:

 gcc_assert (bool(flags & LOOKUP_ONLYCONVERTING)
             == is_copy_initialization (init));

Turns out removing the flags parameter breaks CTAD for new-expressions
of the form 'new TT(x)' because in this case build_new passes just 'x'
as the initializer to do_auto_deduction (as opposed to a single TREE_LIST),
for which is_copy_initialization returns true even though it's really
direct initialization.

Also turns out we're similarly not passing the right LOOKUP_* flags to
cp_finish_decl from instantiate_body, which breaks consideration of
explicit conversions/deduction guides when instantiating the initializer
of a static data member.  I added some xfailed testcases for these
situations.


Maybe we want to check is_copy_initialization in cp_finish_decl?


That seems to work nicely :) All xfailed tests for the static data
member initialization case now also pass.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

-- >8 --

Subject: [PATCH] c++: detecting copy-init context during CTAD [PR102137]

Here we're failing to communicate to cp_finish_decl from tsubst_expr
that we're in a copy-initialization context (via the LOOKUP_ONLYCONVERTING
flag), which causes us to always consider explicit deduction guides when
performing CTAD for a templated variable initializer.

It turns out this bug also affects consideration of explicit conversion
operators for the same reason.  But consideration of explicit constructors
seems to do the right thing thanks to code in build_aggr_init that sets
LOOKUP_ONLYCONVERTING when the initializer represents copy-initialization.

This patch fixes this by making cp_finish_decl set LOOKUP_ONLYCONVERTING
by inspecting the initializer like build_aggr_init does, so that callers
don't need to explicitly pass this flag.
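
The distinction the patch relies on can be checked outside of templates with
standard type traits.  The following is only an illustrative sketch, not part
of the patch or its testcases; it reuses the struct A from the earlier example
and adds a const qualifier and a body so that it links.

```cpp
#include <type_traits>

struct A {
  explicit operator int() const { return 42; }
};

// Direct-initialization, as in 'int t(A());', may use the explicit operator.
static_assert(std::is_constructible<int, A>::value,
              "direct-initialization should consider explicit conversions");

// Copy-initialization, as in 'int t = A();', must not use it.
static_assert(!std::is_convertible<A, int>::value,
              "copy-initialization must ignore explicit conversions");

// The direct-initialization form, spelled out.
int direct_init_value() {
  int t(A{});
  return t;
}
```

The bug discussed here is that the templated form 'T t = A();' was treated
like the first static_assert when it should behave like the second.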

PR c++/102137
PR c++/87820

gcc/cp/ChangeLog:

* cp-tree.h (is_copy_initialization): Declare.
* decl.cc (cp_finish_decl): Set LOOKUP_ONLYCONVERTING
when is_copy_initialization is true.
* init.cc (build_aggr_init): Split out copy-initialization
check into ...
(is_copy_initialization): ... here.
* pt.cc (instantiate_decl): Pass 0 instead of
LOOKUP_ONLYCONVERTING as flags to cp_finish_decl.


Any reason not to use LOOKUP_NORMAL, both here and in tsubst_expr?  OK either
way.


Thanks a lot.  I'm not really sure what the consequences of using
LOOKUP_NORMAL (which implies LOOKUP_PROTECT) instead of 0 would be here,
and there's a bunch of other callers of cp_finish_decl which pass 0 or
some other value that excludes LOOKUP_PROTECT.  So I figure such a change
would be better off as a separate patch that changes/audits all such
cp_finish_decl at once.


Agreed.  Not important, just curious.




gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/explicit15.C: New test.
* g++.dg/cpp1z/class-deduction108.C: New test.
---
   gcc/cp/cp-tree.h  |  1 +
   gcc/cp/decl.cc|  3 +
   gcc/cp/init.cc

Re: [PATCH] c++: naming a dependently-scoped template for CTAD [PR104641]

2022-03-08 Thread Jason Merrill via Gcc-patches

On 3/2/22 14:32, Patrick Palka wrote:

In order to be able to perform CTAD for a dependently-scoped template
such as A::B in the testcase below, we need to permit a
typename-specifier to resolve to a template as per [dcl.type.simple]/2,
at least when it appears in a CTAD-enabled context.

This patch implements this using a new tsubst flag tf_tst_ok to control
when a TYPENAME_TYPE is allowed to name a template, and sets this flag
when substituting into the type of a CAST_EXPR, CONSTRUCTOR or VAR_DECL
(each of which is a CTAD-enabled context).
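
As a rough sketch of how such a flag can be scoped to a single level of
substitution: the code below is a toy model, not GCC code.  Only the name and
bit position of tf_tst_ok come from the patch; the value of tf_error and the
functions substitute and typename_may_name_template are assumptions made for
illustration.

```cpp
using flags_t = unsigned;

constexpr flags_t tf_error  = 1u << 0;   // value assumed for this sketch
constexpr flags_t tf_tst_ok = 1u << 12;  // bit position as in the patch

// Hypothetical innermost consumer: a typename-specifier may name a
// template only if the caller passed tf_tst_ok.
bool typename_may_name_template(flags_t complain) {
  return (complain & tf_tst_ok) != 0;
}

struct Result {
  bool direct;  // what the typename-specifier at this level sees
  bool nested;  // what any nested substitution sees
};

// Hypothetical model of the substitution routine: remember the flag, clear
// it so nested substitutions do not inherit it, and hand it back only to
// the typename-specifier being resolved directly.
Result substitute(flags_t complain) {
  const bool tst_ok = (complain & tf_tst_ok) != 0;
  complain &= ~tf_tst_ok;

  flags_t tcomplain = complain;
  if (tst_ok)
    tcomplain |= tf_tst_ok;

  return { typename_may_name_template(tcomplain),
           typename_may_name_template(complain) };
}
```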


What breaks if we always allow that, or at least in -std that support CTAD?


Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk perhaps?

PR c++/104641

gcc/cp/ChangeLog:

* cp-tree.h (tsubst_flags::tf_tst_ok): New flag.
* decl.cc (make_typename_type): Allow a typename-specifier to
resolve to a template when tf_tst_ok.
* pt.cc (tsubst_decl) : Set tf_tst_ok when
substituting the type.
(tsubst): Clear tf_tst_ok and remember if it was set.
: Pass tf_tst_ok to make_typename_type
appropriately.  Do make_template_placeholder when
make_typename_type returns a TEMPLATE_DECL.
(tsubst_copy) : Set tf_tst_ok when substituting
the type.
(tsubst_copy_and_build) : Likewise.
: Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction107.C: New test.
---
  gcc/cp/cp-tree.h  |  2 ++
  gcc/cp/decl.cc| 14 +---
  gcc/cp/pt.cc  | 35 +++
  .../g++.dg/cpp1z/class-deduction107.C | 20 +++
  4 files changed, 61 insertions(+), 10 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction107.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index da63d51d9bc..8a44218611f 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5551,6 +5551,8 @@ enum tsubst_flags {
(build_target_expr and friends) */
tf_norm = 1 << 11, /* Build diagnostic information during
constraint normalization.  */
+  tf_tst_ok = 1 << 12,/* Allow a typename-specifier to name
+   a template.  */
/* Convenient substitution flags combinations.  */
tf_warning_or_error = tf_warning | tf_error
  };
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 7f80f9d4d7a..23c06655bde 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -4148,10 +4148,16 @@ make_typename_type (tree context, tree name, enum tag_types tag_type,
  }
if (!want_template && TREE_CODE (t) != TYPE_DECL)
  {
-  if (complain & tf_error)
-   error ("%<typename %T::%D%> names %q#T, which is not a type",
-  context, name, t);
-  return error_mark_node;
+  if ((complain & tf_tst_ok) && DECL_TYPE_TEMPLATE_P (t))
+   /* The caller permits this typename-specifier to name a template
+  (because it appears in a CTAD-enabled context).  */;
+  else
+   {
+ if (complain & tf_error)
+   error ("%<typename %T::%D%> names %q#T, which is not a type",
+  context, name, t);
+ return error_mark_node;
+   }
  }
  
if (!check_accessibility_of_qualified_id (t, /*object_type=*/NULL_TREE,

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 8fb17349ee1..18a21572ce3 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -14926,7 +14926,10 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
&& VAR_HAD_UNKNOWN_BOUND (t)
&& type != error_mark_node)
  type = strip_array_domain (type);
-   type = tsubst (type, args, complain, in_decl);
+   tsubst_flags_t tcomplain = complain;
+   if (VAR_P (t))
+ tcomplain |= tf_tst_ok;
+   type = tsubst (type, args, tcomplain, in_decl);
/* Substituting the type might have recursively instantiated this
   same alias (c++/86171).  */
if (gen_tmpl && DECL_ALIAS_TEMPLATE_P (gen_tmpl)
@@ -15612,6 +15615,9 @@ tsubst (tree t, tree args, tsubst_flags_t complain, tree in_decl)
bool fndecl_type = (complain & tf_fndecl_type);
complain &= ~tf_fndecl_type;
  
+  bool tst_ok = (complain & tf_tst_ok);
+  complain &= ~tf_tst_ok;
+
if (type
&& code != TYPENAME_TYPE
&& code != TEMPLATE_TYPE_PARM
@@ -16199,8 +16205,10 @@ tsubst (tree t, tree args, tsubst_flags_t complain, tree in_decl)
  return error_mark_node;
  }
  
-	f = make_typename_type (ctx, f, typename_type,
-   complain | tf_keep_type_decl);
+   tsubst_flags_t tcomplain = complain | tf_keep_type_decl;
+   if (tst_ok)
+ tcomplain |= tf_tst_ok;
+   f = make_typename_type (ctx, f, typename_type, tcomplain);
if (f == error_mark_node)
  return f;
if (TREE_CODE (f) == TYPE_DE

Re: [PATCH] c, c++, c-family: -Wshift-negative-value and -Wshift-overflow* tweaks for -fwrapv and C++20+ [PR104711]

2022-03-08 Thread Jason Merrill via Gcc-patches

On 3/2/22 05:22, Jakub Jelinek wrote:

Hi!

As mentioned in the PR, different standards have different definition
on what is an UB left shift.  They all agree on out of bounds (including
negative) shift count.
The rules used by ubsan are:
C99-C2x ((unsigned) x >> (uprecm1 - y)) != 0 then UB
C++11-C++17 x < 0 || ((unsigned) x >> (uprecm1 - y)) > 1 then UB
C++20 and later everything is well defined
Now, for C++20, I've in the P1236R1 implementation added an early
exit for -Wshift-overflow* warning so that it never warns, but apparently
-Wshift-negative-value remained as is.  As it is well defined in C++20,
the following patch doesn't enable -Wshift-negative-value from -Wextra
anymore for C++20 and later; users who want the warning for compatibility
with C++17 and earlier can still get it by using -Wshift-negative-value
explicitly.
Another thing is -fwrapv, that is an extension to the standards, so it is up
to us how exactly we define that case.  Our ubsan code treats
TYPE_OVERFLOW_WRAPS (type0) and cxx_dialect >= cxx20 the same as only
diagnosing out of bounds shift count and nothing else and IMHO it is most
sensical to treat -fwrapv signed left shifts the same as C++20 treats
them, https://eel.is/c++draft/expr.shift#2
"The value of E1 << E2 is the unique value congruent to E1×2^E2 modulo 2^N,
where N is the width of the type of the result.
[Note 1: E1 is left-shifted E2 bit positions; vacated bits are zero-filled.
— end note]"
with no UB dependent on the E1 values.  The UB is only
"The behavior is undefined if the right operand is negative, or greater
than or equal to the width of the promoted left operand."
Under the hood (except for FEs and ubsan from FEs) GCC middle-end doesn't
consider UB in left shifts dependent on the first operand's value, only
the out of bounds shifts.
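
The three rule sets listed above can be written down as a small predicate.
This is only a sketch of the stated rules for a 32-bit int, not the ubsan
implementation; Dialect and left_shift_is_ub are names made up for the
example.

```cpp
#include <cstdint>

// Hypothetical grouping of the standards mentioned above.
enum class Dialect { C99_to_C2x, Cxx11_to_Cxx17, Cxx20_or_later };

// Returns true if 'x << y' is undefined for a 32-bit signed x under the
// rules quoted in this mail.
bool left_shift_is_ub(std::int32_t x, int y, Dialect d) {
  const int uprecm1 = 31;  // precision of the unsigned type minus one

  // All dialects agree: an out-of-bounds shift count is undefined.
  if (y < 0 || y > uprecm1)
    return true;

  const std::uint32_t ux = static_cast<std::uint32_t>(x);
  const std::uint32_t high = ux >> (uprecm1 - y);

  switch (d) {
    case Dialect::C99_to_C2x:
      return high != 0;           // nothing may reach the sign bit or beyond
    case Dialect::Cxx11_to_Cxx17:
      return x < 0 || high > 1;   // shifting a 1 into the sign bit is allowed
    case Dialect::Cxx20_or_later:
      return false;               // value is defined modulo 2^N
  }
  return false;
}
```

For example, 1 << 31 is flagged only by the C rule, while -1 << 1 is flagged
by both pre-C++20 rules and by neither C++20 nor, with this patch, -fwrapv.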

While this change isn't a regression, I'd think it is useful for GCC 12,
it doesn't add new warnings, but just removes warnings that aren't
appropriate.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2022-03-02  Jakub Jelinek  

PR c/104711
gcc/
* doc/invoke.texi (-Wextra): Document that -Wshift-negative-value
is enabled by it only for C++11 to C++17 rather than for C++03 or
later.
(-Wshift-negative-value): Similarly (except here we stated
that it is enabled for C++11 or later).
gcc/c-family/
* c-opts.cc (c_common_post_options): Don't enable
-Wshift-negative-value from -Wextra for C++20 or later.
* c-ubsan.cc (ubsan_instrument_shift): Adjust comments.
* c-warn.cc (maybe_warn_shift_overflow): Use TYPE_OVERFLOW_WRAPS
instead of TYPE_UNSIGNED.
gcc/c/
* c-fold.cc (c_fully_fold_internal): Don't emit
-Wshift-negative-value warning if TYPE_OVERFLOW_WRAPS.
* c-typeck.cc (build_binary_op): Likewise.
gcc/cp/
* constexpr.cc (cxx_eval_check_shift_p): Use TYPE_OVERFLOW_WRAPS
instead of TYPE_UNSIGNED.
* typeck.cc (cp_build_binary_op): Don't emit
-Wshift-negative-value warning if TYPE_OVERFLOW_WRAPS.
gcc/testsuite/
* c-c++-common/Wshift-negative-value-1.c: Remove
dg-additional-options, instead in target selectors of each diagnostic
check for exact C++ versions where it should be diagnosed.
* c-c++-common/Wshift-negative-value-2.c: Likewise.
* c-c++-common/Wshift-negative-value-3.c: Likewise.
* c-c++-common/Wshift-negative-value-4.c: Likewise.
* c-c++-common/Wshift-negative-value-7.c: New test.
* c-c++-common/Wshift-negative-value-8.c: New test.
* c-c++-common/Wshift-negative-value-9.c: New test.
* c-c++-common/Wshift-negative-value-10.c: New test.
* c-c++-common/Wshift-overflow-1.c: Remove
dg-additional-options, instead in target selectors of each diagnostic
check for exact C++ versions where it should be diagnosed.
* c-c++-common/Wshift-overflow-2.c: Likewise.
* c-c++-common/Wshift-overflow-5.c: Likewise.
* c-c++-common/Wshift-overflow-6.c: Likewise.
* c-c++-common/Wshift-overflow-7.c: Likewise.
* c-c++-common/Wshift-overflow-8.c: New test.
* c-c++-common/Wshift-overflow-9.c: New test.
* c-c++-common/Wshift-overflow-10.c: New test.
* c-c++-common/Wshift-overflow-11.c: New test.
* c-c++-common/Wshift-overflow-12.c: New test.

--- gcc/doc/invoke.texi.jj  2022-02-25 10:46:53.085181500 +0100
+++ gcc/doc/invoke.texi 2022-03-01 09:59:15.040855224 +0100
@@ -5809,7 +5809,7 @@ name is still supported, but the newer n
  -Wredundant-move @r{(only for C++)}  @gol
  -Wtype-limits  @gol
  -Wuninitialized  @gol
--Wshift-negative-value @r{(in C++03 and in C99 and newer)}  @gol
+-Wshift-negative-value @r{(in C++11 to C++17 and in C99 and newer)}  @gol
  -Wunused-parameter @r{(only with} @option{-Wunused} @r{or} 
@option{-Wall}@r{)} @gol
  -Wunused-but-set-parameter @r{(only with} @option{-Wunused} @r{or} 
@option

Re: [Patch] Fortran: Fix gfc_conv_gfc_desc_to_cfi_desc with NULL [PR104126]

2022-03-08 Thread Tobias Burnus

Hi Harald,

On 08.03.22 22:44, Harald Anlauf wrote:

On 07.03.22 at 15:16, Tobias Burnus wrote:

Pre-remark: Related to NULL, there are some accepts-invalid issues, not
addressed in this patch. See https://gcc.gnu.org/PR104819

This patch fixes an ICE (12 regression) with NULL() that has no MOLD
argument.

the patch does fix the ICE.  But given your short pre-remark:
are you saying that the testcase is invalid, and with the patch
we silently accept it now?


Sorry for being confusing. I also believe the testcase of the just
committed patch is valid Fortran.

However, when fixing this PR, I was looking at the spec – and saw that
GCC accepts invalid code using NULL(), which is not diagnosed. Those
issues are orthogonal to this patch, except that the accepts-invalid
issues also are about NULL().

Thanks for the review!

Tobias



[PATCH] vect: fix out-of-bound access in supports_vec_convert_optab_p [PR 104851]

2022-03-08 Thread Xi Ruoyao via Gcc-patches
This should be obvious, OK for trunk?

-- >8 --

Calling VECTOR_MODE_P with MAX_MACHINE_MODE has caused out-of-bound
access.

gcc/

PR tree-optimization/104851
* optabs-query.cc (supports_vec_convert_optab_p): Fix off-by-one
error.
---
 gcc/optabs-query.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc
index 713c098ba4e..68dc679cc6a 100644
--- a/gcc/optabs-query.cc
+++ b/gcc/optabs-query.cc
@@ -720,7 +720,7 @@ static bool
 supports_vec_convert_optab_p (optab op, machine_mode mode)
 {
   int start = mode == VOIDmode ? 0 : mode;
-  int end = mode == VOIDmode ? MAX_MACHINE_MODE : mode;
+  int end = mode == VOIDmode ? MAX_MACHINE_MODE - 1 : mode;
   for (int i = start; i <= end; ++i)
 if (VECTOR_MODE_P ((machine_mode) i))
   for (int j = MIN_MODE_VECTOR_INT; j < MAX_MODE_VECTOR_INT; ++j)
-- 
2.35.1

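The off-by-one comes from combining an inclusive loop bound (i <= end) with a
one-past-the-end constant.  A minimal sketch with a made-up four-entry table
follows; MAX_MODE, is_vector_mode and count_vector_modes are hypothetical
names, not the GCC ones.

```cpp
// Hypothetical table: valid indices are 0 .. MAX_MODE - 1.
constexpr int MAX_MODE = 4;
constexpr bool is_vector_mode[MAX_MODE] = {false, true, false, true};

// Counts vector modes either over the whole table (any == true) or for the
// single index 'mode'.  The loop bound is inclusive, so the last valid
// index is MAX_MODE - 1; using MAX_MODE itself would read past the table.
int count_vector_modes(bool any, int mode) {
  const int start = any ? 0 : mode;
  const int end = any ? MAX_MODE - 1 : mode;
  int n = 0;
  for (int i = start; i <= end; ++i)
    if (is_vector_mode[i])
      ++n;
  return n;
}
```

Keeping the bound inclusive lets the single-mode case use start == end, which
is presumably why the fix adjusts the constant rather than the comparison.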



Re: [PATCH] Check if loading const from mem is faster

2022-03-08 Thread Jiufu Guo via Gcc-patches


Hi!

Richard Biener  writes:

> On Tue, 8 Mar 2022, Jiufu Guo wrote:
>
>> Jiufu Guo  writes:
>> 
>> Hi!
>> 
>> > Hi Segher,
>> >
>> > Segher Boessenkool  writes:
>> >
>> >> On Tue, Mar 01, 2022 at 10:28:57PM +0800, Jiufu Guo wrote:
>> >>> Segher Boessenkool  writes:
>> >>> > No.  insn_cost is only for correct, existing instructions, not for
>> >>> > made-up nonsense.  I created insn_cost precisely to get away from that
>> >>> > aspect of rtx_cost (and some other issues, like, it is incredibly hard
>> >>> > and cumbersome to write a correct rtx_cost).
>> >>> 
>> >>> Thanks! The implementations of the insn_cost hook align with this
>> >>> design: they check the insn's attributes and COSTS_N_INSNS.
>> >>> 
>> >>> One question on the special case:
>> >>> For the instruction "r119:DI=0x100803004101001",
>> >>> would we treat it as a valid instruction?
>> >>
>> >> Currently we do, alternative 6 in *movdi_internal64: we allow any r<-n.
>> >> This is costed as 5 insns (cost=20).
>> >>
>> >> It generally is better to split things into patterns close to the
>> >> eventual machine instructions as early as possible: all the more generic
>> >> optimisations can take advantage of that then.
>> > Get it!
>> >>
>> >>> A patch, which is attached at the end of this mail, accepts
>> >>> "r119:DI=0x100803004101001" as input of insn_cost.
>> >>> In this patch, 
>> >>> - A tmp instruction is generated via make_insn_raw.
>> >>> - A few calls to rtx_cost (in cse_insn) are replaced by insn_cost.
>> >>> - The insn_cost hook checks for the special 'constant' instruction.
>> >>> Do these make sense?
>> >>
>> >> I'll review that patch inline.
>> 
>> I drafted a new patch that replaces rtx_cost with insn_cost for cse.cc.
>> Unlike the previous partial patch, this patch replaces all uses
>> of rtx_cost. It may be better/more aggressive than the previous one.
>
> I think there's no advantage for using insn_cost over rtx_cost for
> the simple SET case.

Thanks for your comments and for raising this concern.

For those targets which do not implement insn_cost, insn_cost calls
rtx_cost through pattern_cost, so insn_cost is equal to rtx_cost.

For those targets which do implement insn_cost, it seems insn_cost would
be better (or, say, more accurate/consistent?) than rtx_cost, since:
- insn_cost recognizes the insn first, and computes the cost from something
(like length/cost attributes from the .md file) for the 'machine insn'.
- rtx_cost estimates the cost by analyzing the 'rtx content'.
An accurate estimate depends on the context.

As a special example, take "%r100 = C": as in a previous patch, by tuning
the target's rtx_cost hook, the cost could be computed according to the
value of C.  insn_cost may just model the cost from the definition of the
machine instruction.

These are my initial thoughts.  Segher may have a better
explanation. :-)

To replace rtx_cost with insn_cost, this patch builds a SET instruction,
"%r = rtx_expr", and then uses the insn_cost of "%r = rtx_expr" to simulate
the rtx_cost of "rtx_expr".


BR,
Jiufu

>
> Richard.
>
>> With this patch, bootstrap passes.
>> From regtest, only the output of fusion-p10-ldcmpi.c is changed, and the
>> change seems to be as expected.
>> 
>> 
>> BR,
>> Jiufu
>> 
>> diff --git a/gcc/cse.cc b/gcc/cse.cc
>> index a18b599d324..e623ad298db 100644
>> --- a/gcc/cse.cc
>> +++ b/gcc/cse.cc
>> @@ -262,6 +262,9 @@ static struct qty_table_elem *qty_table;
>>  static rtx_insn *this_insn;
>>  static bool optimize_this_for_speed_p;
>>  
>> +/* Used for insn_cost. */
>> +static rtx_insn *estimate_insn;
>> +
>>  /* Index by register number, gives the number of the next (or
>> previous) register in the chain of registers sharing the same
>> value.
>> @@ -445,7 +448,7 @@ struct table_elt
>>  /* Compute cost of X, as stored in the `cost' field of a table_elt.  Fixed
>> hard registers and pointers into the frame are the cheapest with a cost
>> of 0.  Next come pseudos with a cost of one and other hard registers with
>> -   a cost of 2.  Aside from these special cases, call `rtx_cost'.  */
>> +   a cost of 2.  Aside from these special cases, call `insn_cost'.  */
>>  
>>  #define CHEAP_REGNO(N)  
>> \
>>(REGNO_PTR_FRAME_P (N)\
>> @@ -698,18 +701,33 @@ preferable (int cost_a, int regcost_a, int cost_b, int 
>> regcost_b)
>> from COST macro to keep it simple.  */
>>  
>>  static int
>> -notreg_cost (rtx x, machine_mode mode, enum rtx_code outer, int opno)
>> +notreg_cost (rtx x, machine_mode mode, enum rtx_code /*outer*/, int 
>> /*opno*/)
>>  {
>>scalar_int_mode int_mode, inner_mode;
>> -  return ((GET_CODE (x) == SUBREG
>> -   && REG_P (SUBREG_REG (x))
>> -   && is_int_mode (mode, &int_mode)
>> -   && is_int_mode (GET_MODE (SUBREG_REG (x)), &inner_mode)
>> -   && GET_MODE_SIZE (int_mode) < GET_MODE_SIZE (inner_mode)
>> -   && subreg_lowpart_p (x)
>> -   && TRULY_NOOP_TRUNCATION_MODES_P (int_mode, i

Re: [PATCH] simplify-rtx: Fix up SUBREG_PROMOTED_SET arguments [PR104839]

2022-03-08 Thread Richard Biener via Gcc-patches
On Tue, 8 Mar 2022, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase is miscompiled on powerpc64le-linux at -O1 and higher
> (except for -Og).  The bug was introduced in r12-3252-gcad36f38576a6a7
> which for SIGN_EXTEND from SUBREG_PROMOTED_SIGNED_P SUBREG used
> SUBREG_PROMOTED_SET (temp, 1) (but that makes temp
> SUBREG_PROMOTED_UNSIGNED_P because SRP_UNSIGNED is 1) and similarly the
> ZERO_EXTEND from SUBREG_PROMOTED_UNSIGNED_P SUBREG used
> SUBREG_PROMOTED_SET (temp, 0) (but that makes temp
> SUBREG_PROMOTED_SIGNED_P because SRP_SIGNED is 0).
> The following patch fixes that (swaps the 0s and 1s), but for better
> readability uses the SRP_* constants.
> rtl.h has:
> /* Valid for subregs which are SUBREG_PROMOTED_VAR_P().  In that case
>this gives the necessary extensions:
>0  - signed (SPR_SIGNED)
>1  - normal unsigned (SPR_UNSIGNED)
>2  - value is both sign and unsign extended for mode
> (SPR_SIGNED_AND_UNSIGNED).
>-1 - pointer unsigned, which most often can be handled like unsigned
> extension, except for generating instructions where we need to
> emit special code (ptr_extend insns) on some architectures
> (SPR_POINTER). */
> The expr.c change in the same commit looks ok to me (passes unsignedp
> to SUBREG_PROMOTED_SET, so 0 for signed, 1 for unsigned).
> 
> Starting bootstrap/regtest on powerpc64{,le}-linux now, ok for trunk?

OK.
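
The mix-up described in the quoted message is easy to make because the
boolean-looking literals are inverted relative to "signed".  A toy sketch
follows, using the numeric values from the rtl.h comment quoted above; the
Subreg struct and helper functions are hypothetical and much simpler than the
real SUBREG_PROMOTED_* macros.

```cpp
// Numeric values as listed in the rtl.h comment quoted above.
constexpr int SRP_POINTER = -1;
constexpr int SRP_SIGNED = 0;
constexpr int SRP_UNSIGNED = 1;
constexpr int SRP_SIGNED_AND_UNSIGNED = 2;

// Toy stand-in for a promoted SUBREG; not the GCC representation.
struct Subreg {
  int promoted;
};

void set_promoted(Subreg &s, int kind) { s.promoted = kind; }

// Simplified predicates for the sketch: exact matches only.
bool promoted_signed_p(const Subreg &s) { return s.promoted == SRP_SIGNED; }
bool promoted_unsigned_p(const Subreg &s) { return s.promoted == SRP_UNSIGNED; }
```

With these values, set_promoted (temp, 1) marks temp as unsigned, which is
the opposite of what the SIGN_EXTEND case intended; spelling the argument as
SRP_SIGNED or SRP_UNSIGNED makes the intent visible at the call site.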

> 2022-03-08  Jakub Jelinek  
> 
>   PR rtl-optimization/104839
>   * simplify-rtx.cc (simplify_unary_operation_1) <case SIGN_EXTEND>:
>   Use SRP_SIGNED instead of incorrect 1 in SUBREG_PROMOTED_SET.
>   (simplify_unary_operation_1) <case ZERO_EXTEND>: Use SRP_UNSIGNED
>   instead of incorrect 0 in SUBREG_PROMOTED_SET.
> 
>   * gcc.c-torture/execute/pr104839.c: New test.
> 
> --- gcc/simplify-rtx.cc.jj2022-02-23 09:17:04.0 +0100
> +++ gcc/simplify-rtx.cc   2022-03-08 16:31:20.823246404 +0100
> @@ -1527,7 +1527,7 @@ simplify_context::simplify_unary_operati
> if (partial_subreg_p (temp))
>   {
> SUBREG_PROMOTED_VAR_P (temp) = 1;
> -   SUBREG_PROMOTED_SET (temp, 1);
> +   SUBREG_PROMOTED_SET (temp, SRP_SIGNED);
>   }
> return temp;
>   }
> @@ -1662,7 +1662,7 @@ simplify_context::simplify_unary_operati
> if (partial_subreg_p (temp))
>   {
> SUBREG_PROMOTED_VAR_P (temp) = 1;
> -   SUBREG_PROMOTED_SET (temp, 0);
> +   SUBREG_PROMOTED_SET (temp, SRP_UNSIGNED);
>   }
> return temp;
>   }
> --- gcc/testsuite/gcc.c-torture/execute/pr104839.c.jj 2022-03-08 
> 16:46:51.418440078 +0100
> +++ gcc/testsuite/gcc.c-torture/execute/pr104839.c2022-03-08 
> 16:46:27.044774203 +0100
> @@ -0,0 +1,37 @@
> +/* PR rtl-optimization/104839 */
> +
> +__attribute__((noipa)) short
> +foo (void)
> +{
> +  return -1;
> +}
> +
> +__attribute__((noipa)) int
> +bar (void)
> +{
> +  short i = foo ();
> +  if (i == -2)
> +return 2;
> +  long k = i;
> +  int j = -1;
> +  volatile long s = 300;
> +  if (k < 0)
> +{
> +  k += s;
> +  if (k < 0)
> + j = 0;
> +}
> +  else if (k >= s)
> +j = 0;
> +  if (j != -1)
> +return 1;
> +  return 0;
> +}
> +
> +int
> +main ()
> +{
> +  if (bar () != 0)
> +__builtin_abort ();
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


Re: [PATCH] x86: Define LIBGCC2_UNWIND_ATTRIBUTE on ia32 [PR104781]

2022-03-08 Thread Richard Biener via Gcc-patches
On Tue, 8 Mar 2022, H.J. Lu wrote:

> On Tue, Mar 8, 2022 at 9:35 AM Jakub Jelinek  wrote:
> >
> > On Tue, Mar 08, 2022 at 08:09:25AM -0800, H.J. Lu wrote:
> > > > Ok.  So, what do you think about replacing the libgcc/ part of your patch
> > > > with that
> > > > /* __builtin_eh_return can't handle stack realignment, so disable SSE in
> > > >32-bit libgcc functions that call it.  */
> > > > #ifndef __x86_64__
> > > > #define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-sse")))
> > > > #endif
> > > > ?
> > >
> > > Yes, it should work.
> >
> > So, how do we move on with this?
> > I can't self-approve my own patch, so can anyone please ack the following
> > provided it passes bootstraps/regtests ({x86_64,i686}-linux) that are
> > currently pending?
> >
> > That can go in independently from your patch, and if it is committed,
> > your V3 patch with the libgcc/ hunks removed is preapproved for trunk.
> >
> > 2022-03-08  Jakub Jelinek  
> >
> > PR target/104781
> > * config/i386/i386.h (LIBGCC2_UNWIND_ATTRIBUTE): Define for ia32.
> >
> > --- gcc/config/i386/i386.h.jj   2022-02-25 12:06:45.535493490 +0100
> > +++ gcc/config/i386/i386.h  2022-03-08 11:20:43.207043370 +0100
> > @@ -2848,6 +2848,12 @@ extern enum attr_cpu ix86_schedule;
> >  #define NUM_X86_64_MS_CLOBBERED_REGS 12
> >  #endif
> >
> > +/* __builtin_eh_return can't handle stack realignment, so disable SSE in
> > +   32-bit libgcc functions that call it.  */
> > +#ifndef __x86_64__
> > +#define LIBGCC2_UNWIND_ATTRIBUTE __attribute__((target ("no-sse")))
> > +#endif
> > +
> >  /*
> >  Local variables:
> >  version-control: t
> >
> >
> > Jakub
> >
> 
> LGTM.

I wonder if this is a good case for general-regs-only instead?  At
least no-sse cannot be functionally equivalent (since then we would
not have needed general-regs-only ...).

Richard.


Re: [PATCH v3] libgo: Don't use pt_regs member in mcontext_t

2022-03-08 Thread Sören Tempel via Gcc-patches
Ian Lance Taylor  wrote:
> Have you tested this in 32-bit mode?  It does not look correct based
> on the glibc definitions.  Looking at glibc it seems that it ought to
> be

As stated in the commit message, I have only tested this on Alpine Linux
ppc64le (which uses musl libc). Unfortunately, I don't have access to a
32-bit PowerPC machine and hence haven't performed any tests with it.

> reg.sigpc = ((ucontext_t*)(context))->uc_mcontext.uc_regs->gregs[32];

While this should work with glibc, it doesn't work with musl. In order
to support both (musl and glibc) on 32-bit PowerPC, we would have to do
something along the lines of:

#ifdef __PPC__
#if defined(__PPC64__)   /* ppc64 glibc & musl */
ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gp_regs[32];
#elif defined(__GLIBC__) /* ppc32 glibc */
ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.uc_regs->gregs[32];
#else                    /* ppc32 musl */
ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gregs[32];
#endif /* __PPC64__ */
#endif /* __PPC__ */

In light of these observations, maybe using asm/ptrace.h and .regs (as
proposed in the v1 patch) is the "better" (i.e. more readable) solution
for now? I agree with Rich that using .regs is certainly a "code smell",
but this gigantic ifdef block also looks pretty smelly to me. That being
said, I can also send a v4 which uses this ifdef block.

Greetings,
Sören

