[PATCH] libstdc++: vxworks: remove stray include

2022-03-04 Thread Rasmus Villemoes
There doesn't seem to be any reason for this TU to include
<iostream>, and it causes errors when the resulting libstdc++ is used
on our VxWorks 5.5 target - presumably because now libstdc++ itself
contains an instance of std::ios_base::Init. Which should be mostly
harmless, but apparently isn't, and from a QoI viewpoint should
probably be avoided anyway.
---
 libstdc++-v3/config/locale/vxworks/ctype_members.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libstdc++-v3/config/locale/vxworks/ctype_members.cc b/libstdc++-v3/config/locale/vxworks/ctype_members.cc
index 82569d075c6..d8ca551078d 100644
--- a/libstdc++-v3/config/locale/vxworks/ctype_members.cc
+++ b/libstdc++-v3/config/locale/vxworks/ctype_members.cc
@@ -33,7 +33,6 @@
 #include 
 #include 
 #include 
-#include 
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
-- 
2.31.1



[PATCH] gimplify: Clear TREE_READONLY on automatic vars being stored into [PR104529]

2022-03-04 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase regressed when SRA started punting on stores to
TREE_READONLY vars.  We document that:
"In a VAR_DECL, PARM_DECL or FIELD_DECL, or any kind of ..._REF node,
nonzero means it may not be the lhs of an assignment."
so the SRA change looks desirable.  On the other hand, at least in this
testcase the TREE_READONLY is set there intentionally from the
PR85873 fix, because gimplify_init_constructor itself uses TREE_READONLY
on the object to determine if it can perform promotion to static const
or not.

So, similarly to other spots in the gimplifier where we also clear
TREE_READONLY when we emit IL that stores into the object, this
does the same in gimplify_init_constructor, but in such a way that
the TREE_READONLY test for the promotion to static const keeps working
and doesn't change anything for notify_temp_creation mode, which doesn't
emit any IL, just tests if it would need a temporary or not.
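
To make the ordering concrete, here is a toy model of the flag handling
described above. It is not GCC code: Decl, Outcome and init_constructor
are invented names, and the real function has many more cases. It only
shows that the promotion test sees the flag first, that query mode
leaves it alone, and that it is cleared once stores are going to be
emitted.

```cpp
// Toy model only: Decl, Outcome and init_constructor are invented names,
// not GCC internals.
struct Decl {
  bool is_var;    // stands in for VAR_P (object)
  bool readonly;  // stands in for TREE_READONLY (object)
};

enum class Outcome { PromotedToStatic, QueryOnly, RuntimeStores };

Outcome init_constructor(Decl &object, bool constant_init,
                         bool notify_temp_creation) {
  // The promotion test reads the flag before anything clears it.
  if (constant_init && object.readonly)
    return Outcome::PromotedToStatic;

  // Query mode emits no stores, so the flag is left alone.
  if (notify_temp_creation)
    return Outcome::QueryOnly;

  // Stores into the object follow, so it can no longer be read-only.
  if (object.is_var)
    object.readonly = false;
  return Outcome::RuntimeStores;
}
```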

This keeps PR85873 testcase working as before and fixes this regression.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-03-04  Jakub Jelinek  

PR middle-end/104529
* gimplify.cc (gimplify_init_constructor): Clear TREE_READONLY
on automatic objects which will be runtime initialized.

* g++.dg/tree-ssa/pr104529.C: New test.

--- gcc/gimplify.cc.jj  2022-03-03 09:13:16.0 +0100
+++ gcc/gimplify.cc 2022-03-03 15:21:02.270198275 +0100
@@ -5120,6 +5120,12 @@ gimplify_init_constructor (tree *expr_p,
  {
if (notify_temp_creation)
  return GS_OK;
+
+   /* The var will be initialized and so appear on lhs of
+  assignment, it can't be TREE_READONLY anymore.  */
+   if (VAR_P (object))
+ TREE_READONLY (object) = 0;
+
is_empty_ctor = true;
break;
  }
@@ -5171,6 +5177,11 @@ gimplify_init_constructor (tree *expr_p,
break;
  }
 
+   /* The var will be initialized and so appear on lhs of
+  assignment, it can't be TREE_READONLY anymore.  */
+   if (VAR_P (object) && !notify_temp_creation)
+ TREE_READONLY (object) = 0;
+
/* If there are "lots" of initialized elements, even discounting
   those that are not address constants (and thus *must* be
   computed at runtime), then partition the constructor into
--- gcc/testsuite/g++.dg/tree-ssa/pr104529.C.jj 2022-03-03 14:57:30.216939375 +0100
+++ gcc/testsuite/g++.dg/tree-ssa/pr104529.C 2022-03-03 14:57:23.002040380 +0100
@@ -0,0 +1,20 @@
+// PR middle-end/104529
+// { dg-do compile { target c++11 } }
+// { dg-options "-O2 -fdump-tree-optimized" }
+// { dg-final { scan-tree-dump-not "MEM\[^\n\r]*MEM" "optimized" } }
+
+#include 
+#include 
+
+struct S {
+  unsigned int a;
+  std::vector b;
+  std::vector c;
+};
+
+std::size_t
+foo ()
+{
+  S test[] = { { 48, { 255, 0, 0, 0, 0, 0 } } };
+  return sizeof (test);
+}

Jakub



Re: [PATCH] libstdc++: vxworks: remove stray include

2022-03-04 Thread Jonathan Wakely via Gcc-patches
On Fri, 4 Mar 2022 at 08:28, Rasmus Villemoes wrote:
>
> There doesn't seem to be any reason for this TU to include
> <iostream>, and it causes errors when the resulting libstdc++ is used
> on our VxWorks 5.5 target - presumably because now libstdc++ itself
> contains an instance of std::ios_base::Init. Which should be mostly
> harmless, but apparently isn't, and from a QoI viewpoint should
> probably be avoided anyway.

100% agreed, thanks.


> ---
>  libstdc++-v3/config/locale/vxworks/ctype_members.cc | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/libstdc++-v3/config/locale/vxworks/ctype_members.cc b/libstdc++-v3/config/locale/vxworks/ctype_members.cc
> index 82569d075c6..d8ca551078d 100644
> --- a/libstdc++-v3/config/locale/vxworks/ctype_members.cc
> +++ b/libstdc++-v3/config/locale/vxworks/ctype_members.cc
> @@ -33,7 +33,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>
>  namespace std _GLIBCXX_VISIBILITY(default)
>  {
> --
> 2.31.1
>



[i386 PATCH] PR 104732: Simplify/fix DI mode logic expansion/splitting on -m32.

2022-03-04 Thread Roger Sayle

This clean-up patch resolves PR testsuite/104732, the failure of the recent
test gcc.target/i386/pr100711-1.c on 32-bit Solaris/x86.  Rather than just
tweak the testcase, the proposed approach is to fix the underlying problem
by removing the "TARGET_STV && TARGET_SSE2" conditionals from the DI mode
logical operation expanders and pre-reload splitters in i386.md, which as
I'll show generate inferior code (even a GCC 12 regression) on !TARGET_64BIT
whenever -mno-stv (as on Solaris) or -msse (but not -msse2) is in effect.

First a little bit of history.  In the beginning, DImode operations on
i386 weren't defined by the machine description, and lowered during RTL
expansion to SI mode operations.  Then with PR 65105 in 2015, -mstv was
added, together with a SWIM1248x mode iterator (later renamed to SWIM1248x)
and several *di3_doubleword post-reload splitters that
made use of register allocation to perform some double word operations
in 64-bit XMM registers.  A short while later in 2016, PR 70322 added
similar support for one_cmpldi2.  All of this logic was dependent upon
"!TARGET_64BIT && TARGET_STV && TARGET_SSE2".  With the passing of time,
these conditions became irrelevant when in 2019, it was decided to split
these double-word patterns before reload.
https://gcc.gnu.org/pipermail/gcc-patches/2019-June/523877.html
https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532236.html
Hence the current situation, where on most modern CPU architectures
(where "TARGET_STV && TARGET_SSE2" is true), RTL is expanded with DI
mode operations, that are then split into two SI mode instructions
before reload, except on Solaris and other odd cases, where the splitting
to two SI mode instructions is done during RTL expansion.  By the
time compilation reaches register allocation both paths in theory
produce identical or similar code, so the vestigial legacy/logic would
appear to be harmless.
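
For readers less familiar with the doubleword lowering discussed above,
here is a rough source-level sketch of what splitting a DI mode logical
operation into two SI mode operations amounts to. The names (Halves,
not_doubleword, and_doubleword) are made up for illustration; the real
splitters operate on RTL, not on C++.

```cpp
#include <cstdint>

// Illustrative only: a 64-bit value viewed as two 32-bit halves.
struct Halves {
  std::uint32_t lo;
  std::uint32_t hi;
};

inline Halves split(std::uint64_t v) {
  return {static_cast<std::uint32_t>(v),
          static_cast<std::uint32_t>(v >> 32)};
}

inline std::uint64_t join(Halves h) {
  return (static_cast<std::uint64_t>(h.hi) << 32) | h.lo;
}

// One 64-bit NOT becomes two independent 32-bit NOTs.
inline std::uint64_t not_doubleword(std::uint64_t v) {
  Halves h = split(v);
  h.lo = ~h.lo;
  h.hi = ~h.hi;
  return join(h);
}

// Likewise for AND: each half is combined separately, with no carry
// between the halves, which is why logical operations split so cleanly.
inline std::uint64_t and_doubleword(std::uint64_t a, std::uint64_t b) {
  Halves x = split(a);
  Halves y = split(b);
  return join({x.lo & y.lo, x.hi & y.hi});
}
```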

Unfortunately, there is one place where this arbitrary choice of how
to lower DI mode doubleword operations is visible to the middle-end:
it controls whether the backend appears to have a suitable optab, and
the presence (or not) of DImode optabs can influence vectorization
cost models and veclower decisions.

The issue (and code quality regression) can be seen in this test case:

typedef long long v2di __attribute__((vector_size (16)));
v2di x;
void foo (long long a)
{
v2di t = {a, a};
x = ~t;
}

which when compiled with "-O2 -m32 -msse -march=pentiumpro" produces:

foo:    subl    $28, %esp
        movl    %ebx, 16(%esp)
        movl    32(%esp), %eax
        movl    %esi, 20(%esp)
        movl    36(%esp), %edx
        movl    %edi, 24(%esp)
        movl    %eax, %esi
        movl    %eax, %edi
        movl    %edx, %ebx
        movl    %edx, %ecx
        notl    %esi
        notl    %ebx
        movl    %esi, (%esp)
        notl    %edi
        notl    %ecx
        movl    %ebx, 4(%esp)
        movl    20(%esp), %esi
        movl    %edi, 8(%esp)
        movl    16(%esp), %ebx
        movl    %ecx, 12(%esp)
        movl    24(%esp), %edi
        movss   8(%esp), %xmm1
        movss   12(%esp), %xmm2
        movss   (%esp), %xmm0
        movss   4(%esp), %xmm3
        unpcklps %xmm2, %xmm1
        unpcklps %xmm3, %xmm0
        movlhps %xmm1, %xmm0
        movaps  %xmm0, x
        addl    $28, %esp
        ret


Importantly notice the four "notl" instructions.  With this patch:

foo:    subl    $28, %esp
        movl    32(%esp), %edx
        movl    36(%esp), %eax
        notl    %edx
        movl    %edx, (%esp)
        notl    %eax
        movl    %eax, 4(%esp)
        movl    %edx, 8(%esp)
        movl    %eax, 12(%esp)
        movaps  (%esp), %xmm1
        movaps  %xmm1, x
        addl    $28, %esp
        ret

Notice only two "notl" instructions.  Checking with Godbolt.org, GCC
generated 4 NOTs in GCC 4.x and 5.x, 2 NOTs between GCC 6.x and 9.x,
and regressed to 4 NOTs since GCC 10.x [which hopefully qualifies
this clean-up as suitable for stage 4].

Most significantly, this patch allows pr100711-1.c to pass with
-mno-stv, allowing pandn to be used with V2DImode on Solaris/x86.
Fingers-crossed this should reduce the number of discrepancies
that Rainer Orth encounters supporting Solaris/x86.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, with/without --target_board='unix{-m32\ -march=
cascadelake}' with no new failures.  Ok for mainline?


2022-03-04  Roger Sayle  

gcc/ChangeLog
PR testsuite/104732
* config/i386/i386.md (SWIM1248s): Include DI mode unconditionally.
(*anddi3_doubleword): Remove && TARGET_STV && TARGET_SSE2 condition,
i.e. always split on !TARGET_64BIT.
(*di3_doubleword): Likewise.
(*one_cmpldi2_doubleword): Likewise.

gcc/testsuite/ChangeLog
PR testsuite/104732
* gcc.target/i386/pr104732.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/confi

Re: [i386 PATCH] PR 104732: Simplify/fix DI mode logic expansion/splitting on -m32.

2022-03-04 Thread Uros Bizjak via Gcc-patches
On Fri, Mar 4, 2022 at 11:30 AM Roger Sayle  wrote:
>
>
> This clean-up patch resolves PR testsuite/104732, the failure of the recent
> test gcc.target/i386/pr100711-1.c on 32-bit Solaris/x86.  Rather than just
> tweak the testcase, the proposed approach is to fix the underlying problem
> by removing the "TARGET_STV && TARGET_SSE2" conditionals from the DI mode
> logical operation expanders and pre-reload splitters in i386.md, which as
> I'll show generate inferior code (even a GCC 12 regression) on !TARGET_64BIT
> whenever -mno-stv (as on Solaris) or -msse (but not -msse2) is in effect.
>
> First a little bit of history.  In the beginning, DImode operations on
> i386 weren't defined by the machine description, and lowered during RTL
> expansion to SI mode operations.  Then with PR 65105 in 2015, -mstv was
> added, together with a SWIM1248x mode iterator (later renamed to SWIM1248x)
> and several *di3_doubleword post-reload splitters that
> made use of register allocation to perform some double word operations
> in 64-bit XMM registers.  A short while later in 2016, PR 70322 added
> similar support for one_cmpldi2.  All of this logic was dependent upon
> "!TARGET_64BIT && TARGET_STV && TARGET_SSE2".  With the passing of time,
> these conditions became irrelevant when in 2019, it was decided to split
> these double-word patterns before reload.
> https://gcc.gnu.org/pipermail/gcc-patches/2019-June/523877.html
> https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532236.html
> Hence the current situation, where on most modern CPU architectures
> (where "TARGET_STV && TARGET_SSE2" is true), RTL is expanded with DI
> mode operations, that are then split into two SI mode instructions
> before reload, except on Solaris and other odd cases, where the splitting
> to two SI mode instructions is done during RTL expansion.  By the
> time compilation reaches register allocation both paths in theory
> produce identical or similar code, so the vestigial legacy/logic would
> appear to be harmless.
>
> Unfortunately, there is one place where this arbitrary choice of how
> to lower DI mode doubleword operations is visible to the middle-end:
> it controls whether the backend appears to have a suitable optab, and
> the presence (or not) of DImode optabs can influence vectorization
> cost models and veclower decisions.
>
> The issue (and code quality regression) can be seen in this test case:
>
> typedef long long v2di __attribute__((vector_size (16)));
> v2di x;
> void foo (long long a)
> {
> v2di t = {a, a};
> x = ~t;
> }
>
> which when compiled with "-O2 -m32 -msse -march=pentiumpro" produces:
>
> foo:    subl    $28, %esp
>         movl    %ebx, 16(%esp)
>         movl    32(%esp), %eax
>         movl    %esi, 20(%esp)
>         movl    36(%esp), %edx
>         movl    %edi, 24(%esp)
>         movl    %eax, %esi
>         movl    %eax, %edi
>         movl    %edx, %ebx
>         movl    %edx, %ecx
>         notl    %esi
>         notl    %ebx
>         movl    %esi, (%esp)
>         notl    %edi
>         notl    %ecx
>         movl    %ebx, 4(%esp)
>         movl    20(%esp), %esi
>         movl    %edi, 8(%esp)
>         movl    16(%esp), %ebx
>         movl    %ecx, 12(%esp)
>         movl    24(%esp), %edi
>         movss   8(%esp), %xmm1
>         movss   12(%esp), %xmm2
>         movss   (%esp), %xmm0
>         movss   4(%esp), %xmm3
>         unpcklps %xmm2, %xmm1
>         unpcklps %xmm3, %xmm0
>         movlhps %xmm1, %xmm0
>         movaps  %xmm0, x
>         addl    $28, %esp
>         ret
>
>
> Importantly notice the four "notl" instructions.  With this patch:
>
> foo:    subl    $28, %esp
>         movl    32(%esp), %edx
>         movl    36(%esp), %eax
>         notl    %edx
>         movl    %edx, (%esp)
>         notl    %eax
>         movl    %eax, 4(%esp)
>         movl    %edx, 8(%esp)
>         movl    %eax, 12(%esp)
>         movaps  (%esp), %xmm1
>         movaps  %xmm1, x
>         addl    $28, %esp
>         ret
>
> Notice only two "notl" instructions.  Checking with Godbolt.org, GCC
> generated 4 NOTs in GCC 4.x and 5.x, 2 NOTs between GCC 6.x and 9.x,
> and regressed to 4 NOTs since GCC 10.x [which hopefully qualifies
> this clean-up as suitable for stage 4].
>
> Most significantly, this patch allows pr100711-1.c to pass with
> -mno-stv, allowing pandn to be used with V2DImode on Solaris/x86.
> Fingers-crossed this should reduce the number of discrepancies
> that Rainer Orth encounters supporting Solaris/x86.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, with/without --target_board='unix{-m32\ -march=
> cascadelake}' with no new failures.  Ok for mainline?

The idea was to leave decomposition of double-word operations to the
generic middle-end, where simplification and propagation of constants
will be handled in a generic way. However, several releases later,
these simplifications were also introduced to STV-enabled pattern

[committed] libstdc++: Fix -Wunused-local-typedefs warning in <compare>

2022-03-04 Thread Jonathan Wakely via Gcc-patches

On 03/03/22 22:38 +, Jonathan Wakely wrote:

Tested x86_64-linux (-m32/-m64), powerpc64-linux (-m32/-m64),
powerpc64le-linux, powerpc-aix (-maix32/-maix64/-mlong-double-128).

Pushed to trunk. I'm inclined to backport this to gcc-11 after some soak
time on trunk (but not gcc-10, because it needs __builtin_bit_cast).

-- >8 --

This removes a FIXME in <compare>, defining the total order for
floating-point types. I originally opened PR96526 to request a new
compiler built-in to implement this, but now that we have std::bit_cast
it can be done entirely in the library.

The implementation is based on the glibc definitions of totalorder,
totalorderf, totalorderl etc.

I think this works for all the types that satisfy std::floating_point
today, and should also work for the types expected to be added by P1467
except for std::bfloat16_t. It also supports some additional types that
don't currently satisfy std::floating_point, such as __float80, but we
probably do want that to satisfy the concept for non-strict modes.

libstdc++-v3/ChangeLog:

PR libstdc++/96526
* libsupc++/compare (strong_order): Add missing support for
floating-point types.
* testsuite/18_support/comparisons/algorithms/strong_order_floats.cc:
New test.


That commit produces a warning due to a typedef that is only needed
for some targets, which caused:

FAIL: g++.dg/warn/Wstringop-overflow-6.C  -std=gnu++20 (test for excess errors)

Here's the fix. Tested x86_64-linux, pushed to trunk.
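
The general shape of the problem, as a minimal sketch: a local alias
that only one conditionally compiled branch uses should be declared
inside that branch. HYPOTHETICAL_NAN_FIXUP and top_bit are invented for
this illustration and do not appear in the library.

```cpp
#include <cstdint>

// HYPOTHETICAL_NAN_FIXUP is an invented macro standing in for a
// target-specific condition; it is not a real GCC or libstdc++ macro.
inline std::uint32_t top_bit(std::uint32_t ix) {
#ifdef HYPOTHETICAL_NAN_FIXUP
  // Declared inside the only branch that uses it, so builds without the
  // macro never see an unused local typedef.
  using Int = decltype(ix);
  return static_cast<Int>(ix >> 31);
#else
  return ix >> 31;
#endif
}
```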



commit 289f65d643e18210433e0f08ccaaf5b08b3d6f39
Author: Jonathan Wakely 
Date:   Fri Mar 4 10:43:29 2022

libstdc++: Fix -Wunused-local-typedefs warning in <compare>

libstdc++-v3/ChangeLog:

* libsupc++/compare (strong_order::_S_fp_cmp): Move typedef
inside #if condition.

diff --git a/libstdc++-v3/libsupc++/compare b/libstdc++-v3/libsupc++/compare
index a8747207b23..050cf7ed20d 100644
--- a/libstdc++-v3/libsupc++/compare
+++ b/libstdc++-v3/libsupc++/compare
@@ -850,8 +850,6 @@ namespace std
 	return strong_ordering::equal; // All bits are equal, we're done.
 
 	  using enum _Fp_fmt;
-	  using _Int = decltype(__ix);
-
 	  constexpr auto __fmt = _S_fp_fmt<_Tp>();
 
 	  if constexpr (__fmt == _Dbldbl) // double-double
@@ -899,6 +897,8 @@ namespace std
 		  // bit to be reversed. Flip that to give desired ordering.
 		  if (__builtin_isnan(__x) && __builtin_isnan(__y))
 		{
+		  using _Int = decltype(__ix);
+
 		  constexpr int __nantype = __fmt == _Binary32  ?  22
 	  : __fmt == _Binary64  ?  51
 	  : __fmt == _Binary128 ? 111


Re: [PATCH] gimplify: Clear TREE_READONLY on automatic vars being stored into [PR104529]

2022-03-04 Thread Richard Biener via Gcc-patches



> On 04.03.2022 at 10:04, Jakub Jelinek via Gcc-patches wrote:
> 
> Hi!
> 
> The following testcase regressed when SRA started punting on stores to
> TREE_READONLY vars.  We document that:
> "In a VAR_DECL, PARM_DECL or FIELD_DECL, or any kind of ..._REF node,
> nonzero means it may not be the lhs of an assignment."
> so the SRA change looks desirable.  On the other side, at least in this
> testcase the TREE_READONLY is set there intentionally from the
> PR85873 fix, because gimplify_init_constructor itself uses TREE_READONLY
> on the object to determine if it can perform promotion to static const
> or not.
> 
> So, similarly to other spots in the gimplifier where we also clear
> TREE_READONLY when we emit IL that stores into the object, this
> does the same in gimplify_init_constructor, but in the way so that
> the TREE_READONLY test for the promotion to static const keeps working
> and doesn't change anything for notify_temp_creation mode, which doesn't
> emit any IL, just tests if it would need a temporary or not.
> 
> This keeps PR85873 testcase working as before and fixes this regression.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard 

> 2022-03-04  Jakub Jelinek  
> 
>PR middle-end/104529
>* gimplify.cc (gimplify_init_constructor): Clear TREE_READONLY
>on automatic objects which will be runtime initialized.
> 
>* g++.dg/tree-ssa/pr104529.C: New test.
> 
> --- gcc/gimplify.cc.jj2022-03-03 09:13:16.0 +0100
> +++ gcc/gimplify.cc2022-03-03 15:21:02.270198275 +0100
> @@ -5120,6 +5120,12 @@ gimplify_init_constructor (tree *expr_p,
>  {
>if (notify_temp_creation)
>  return GS_OK;
> +
> +/* The var will be initialized and so appear on lhs of
> +   assignment, it can't be TREE_READONLY anymore.  */
> +if (VAR_P (object))
> +  TREE_READONLY (object) = 0;
> +
>is_empty_ctor = true;
>break;
>  }
> @@ -5171,6 +5177,11 @@ gimplify_init_constructor (tree *expr_p,
>break;
>  }
> 
> +/* The var will be initialized and so appear on lhs of
> +   assignment, it can't be TREE_READONLY anymore.  */
> +if (VAR_P (object) && !notify_temp_creation)
> +  TREE_READONLY (object) = 0;
> +
>/* If there are "lots" of initialized elements, even discounting
>   those that are not address constants (and thus *must* be
>   computed at runtime), then partition the constructor into
> --- gcc/testsuite/g++.dg/tree-ssa/pr104529.C.jj 2022-03-03 14:57:30.216939375 +0100
> +++ gcc/testsuite/g++.dg/tree-ssa/pr104529.C 2022-03-03 14:57:23.002040380 +0100
> @@ -0,0 +1,20 @@
> +// PR middle-end/104529
> +// { dg-do compile { target c++11 } }
> +// { dg-options "-O2 -fdump-tree-optimized" }
> +// { dg-final { scan-tree-dump-not "MEM\[^\n\r]*MEM" "optimized" } }
> +
> +#include 
> +#include 
> +
> +struct S {
> +  unsigned int a;
> +  std::vector b;
> +  std::vector c;
> +};
> +
> +std::size_t
> +foo ()
> +{
> +  S test[] = { { 48, { 255, 0, 0, 0, 0, 0 } } };
> +  return sizeof (test);
> +}
> 
>Jakub
> 


[PATCH v2] eliminate mutex in fast path of __register_frame

2022-03-04 Thread Thomas Neumann via Gcc-patches

The __register_frame/__deregister_frame functions are used to register
unwinding frames from JITed code in a sorted list. That list itself
is protected by object_mutex, which leads to terrible performance
in multi-threaded code and is somewhat expensive even if single-threaded.
There was already a fast-path that avoided taking the mutex if no
frame was registered at all.

This commit eliminates both the mutex and the sorted list from
the atomic fast path, and replaces it with a btree that uses
optimistic lock coupling during lookup. This allows for fully parallel
unwinding and is essential to scale exception handling to large
core counts.
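
For readers who have not met optimistic locking before, the reader and
writer protocol can be sketched roughly as follows. This is a
simplified illustration written with std::atomic, not the code from the
patch: the names are illustrative, waiting on contention is left out,
and a real reader must also make sure the protected data is read
without data races.

```cpp
#include <atomic>
#include <cstdint>

// Sketch only: the type and function names are illustrative.
struct VersionLock {
  // Bit 0: exclusively locked.  Bit 1: reserved for "threads waiting".
  // Remaining bits: change counter.
  std::atomic<std::uintptr_t> word{0};
};

// Writer side: take the lock if nobody holds it.
inline bool try_lock_exclusive(VersionLock &vl) {
  std::uintptr_t state = vl.word.load();
  if (state & 1)
    return false;
  return vl.word.compare_exchange_strong(state, state | 1);
}

// Writer side: clear the two low bits and advance the counter, so that
// readers who started before the change fail validation.
inline void unlock_exclusive(VersionLock &vl) {
  std::uintptr_t state = vl.word.load();
  vl.word.store((state + 4) & ~static_cast<std::uintptr_t>(3));
}

// Reader side: remember the current version.  Fails while a writer
// holds the lock.
inline bool lock_optimistic(const VersionLock &vl, std::uintptr_t &seen) {
  seen = vl.word.load();
  return !(seen & 1);
}

// Reader side: whatever was read since lock_optimistic may be used only
// if the lock word has not changed in the meantime.
inline bool validate(const VersionLock &vl, std::uintptr_t seen) {
  return vl.word.load() == seen;
}
```

A reader calls lock_optimistic, reads the node, then calls validate and
restarts the lookup if validation fails. Readers never write to the
lock word, which is what lets lookups proceed in parallel.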

Changes since v1:
- eliminate all undefined behavior within the optimistic section
- use a mutex instead of spinning in case of conflicts
- addressed all other reviewer comments except for async-signal-safety.
  The old code was not async-signal-safe either, and that should be fixed
  in a separate commit, as that requires touching other parts of the code.

libgcc/ChangeLog:

* unwind-dw2-fde.c (release_registered_frames): Cleanup at shutdown.
(__register_frame_info_table_bases): Use btree in atomic fast path.
(__deregister_frame_info_bases): Likewise.
(_Unwind_Find_FDE): Likewise.
(base_from_object): Make parameter const.
(get_pc_range_from_fdes): Compute PC range for lookup.
(get_pc_range): Likewise.
* unwind-dw2-fde.h (last_fde): Make parameter const.
* unwind-dw2-btree.h: New file.
---
 libgcc/unwind-dw2-btree.h | 952 ++
 libgcc/unwind-dw2-fde.c   | 194 ++--
 libgcc/unwind-dw2-fde.h   |   2 +-
 3 files changed, 1112 insertions(+), 36 deletions(-)
 create mode 100644 libgcc/unwind-dw2-btree.h

diff --git a/libgcc/unwind-dw2-btree.h b/libgcc/unwind-dw2-btree.h
new file mode 100644
index 000..0b4835a439a
--- /dev/null
+++ b/libgcc/unwind-dw2-btree.h
@@ -0,0 +1,952 @@
+/* Lock-free btree for manually registered unwind frames  */
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Thomas Neumann
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_UNWIND_DW2_BTREE_H
+#define GCC_UNWIND_DW2_BTREE_H
+
+#include 
+
+// Common logic for version locks
+struct version_lock
+{
+  // The lock itself. The lowest bit indicates an exclusive lock,
+  // the second bit indicates waiting threads. All other bits are
+  // used as counter to recognize changes.
+  // Overflows are okay here, we must only prevent overflow to the
+  // same value within one lock_optimistic/validate
+  // range. Even on 32 bit platforms that would require 1 billion
+  // frame registrations within the time span of a few assembler
+  // instructions.
+  uintptr_t version_lock;
+};
+
+#ifdef __GTHREAD_HAS_COND
+// We should never get contention within the tree as it rarely changes.
+// But if we ever do get contention we use these for waiting
+static __gthread_mutex_t version_lock_mutex = __GTHREAD_MUTEX_INIT;
+static __gthread_cond_t version_lock_cond = __GTHREAD_COND_INIT;
+#endif
+
+// Initialize in locked state
+static inline void
+version_lock_initialize_locked_exclusive (struct version_lock *vl)
+{
+  vl->version_lock = 1;
+}
+
+// Try to lock the node exclusive
+static inline bool
+version_lock_try_lock_exclusive (struct version_lock *vl)
+{
+  uintptr_t state = __atomic_load_n (&(vl->version_lock), __ATOMIC_SEQ_CST);
+  if (state & 1)
+return false;
+  return __atomic_compare_exchange_n (&(vl->version_lock), &state, state | 1,
+ false, __ATOMIC_SEQ_CST,
+ __ATOMIC_SEQ_CST);
+}
+
+// Lock the node exclusive, blocking as needed
+static void
+version_lock_lock_exclusive (struct version_lock *vl)
+{
+#ifndef __GTHREAD_HAS_COND
+restart:
+#endif
+
+  // We should virtually never get contention here, as frame
+  // changes are rare
+  uintptr_t state = __atomic_load_n (&(vl->version_lock), __ATOMIC_SEQ_CST);
+  if (!(state & 1))
+{
+  if (__atomic_compare_exchange_n (&(vl->version_lock), &state, state | 1,
+

Re: [PATCH] [i386] Optimize v4si broadcast for noavx512vl.

2022-03-04 Thread Uros Bizjak via Gcc-patches
On Fri, Mar 4, 2022 at 3:28 AM liuhongt  wrote:
>
> This is incremental patch based on [1], it enables optimization as below
>
> -   vbroadcastss.LC1(%rip), %xmm0
> +   movl$-45, %edx
> +   vmovd   %edx, %xmm0
> +   vpshufd $0, %xmm0, %xmm0
>
> According to microbenchmark, it's faster than broadcast from memory.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591162.html.
>
> Bootstrapped and regtest on x86_64-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/104704
> * config/i386/sse.md (*vec_dupv4si): Add alternative $r and
> corresponding post_reload splitter.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr100865-8a.c: Adjust testcase.
> * gcc.target/i386/pr100865-8c.c: Ditto.
> * gcc.target/i386/pr100865-9c.c: Ditto.
> ---
>  gcc/config/i386/sse.md  | 41 -
>  gcc/testsuite/gcc.target/i386/pr100865-8a.c |  2 +-
>  gcc/testsuite/gcc.target/i386/pr100865-8c.c |  2 +-
>  gcc/testsuite/gcc.target/i386/pr100865-9c.c |  2 +-
>  4 files changed, 35 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 3066ea3734a..d124545aa5d 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -25121,20 +25121,43 @@ (define_insn "vec_dupv4sf"
> (set_attr "mode" "V4SF")])
>
>  (define_insn "*vec_dupv4si"
> -  [(set (match_operand:V4SI 0 "register_operand" "=v,v,x")
> +  [(set (match_operand:V4SI 0 "register_operand" "=v,v,x,v")
> (vec_duplicate:V4SI
> - (match_operand:SI 1 "nonimmediate_operand" "Yv,m,0")))]
> + (match_operand:SI 1 "nonimmediate_operand" "Yv,m,0,$r")))]
>"TARGET_SSE"
>"@
> %vpshufd\t{$0, %1, %0|%0, %1, 0}
> vbroadcastss\t{%1, %0|%0, %1}
> -   shufps\t{$0, %0, %0|%0, %0, 0}"
> -  [(set_attr "isa" "sse2,avx,noavx")
> -   (set_attr "type" "sselog1,ssemov,sselog1")
> -   (set_attr "length_immediate" "1,0,1")
> -   (set_attr "prefix_extra" "0,1,*")
> -   (set_attr "prefix" "maybe_vex,maybe_evex,orig")
> -   (set_attr "mode" "TI,V4SF,V4SF")])
> +   shufps\t{$0, %0, %0|%0, %0, 0}
> +   #"
> +  [(set_attr "isa" "sse2,avx,noavx,noavx512vl")
> +   (set_attr "type" "sselog1,ssemov,sselog1,sselog1")
> +   (set_attr "length_immediate" "1,0,1,1")
> +   (set_attr "prefix_extra" "0,1,*,0")
> +   (set_attr "prefix" "maybe_vex,maybe_evex,orig,maybe_vex")
> +   (set_attr "mode" "TI,V4SF,V4SF,TI")
> +   (set (attr "preferred_for_speed")
> + (cond [(eq_attr "alternative" "3")
> + (symbol_ref "TARGET_INTER_UNIT_MOVES_TO_VEC")
> +  ]
> +  (symbol_ref "true")))])

What happens if you set preferred_for_speed to false for alternative 1?

> +(define_split
> +  [(set (match_operand:V4SI 0 "sse_reg_operand")
> +   (vec_duplicate:V4SI
> + (match_operand:SI 1 "general_reg_operand")))]
> +  "TARGET_SSE && reload_completed
> +   /* Disable this splitter if avx512vl_vec_dup_gprv4si insn is
> +  available, because then we can broadcast from GPRs directly.  */

I think avx512vl_vec_dup_gprv4si should be merged with the above
pattern instead.

Uros.

> +   && !TARGET_AVX512VL"
> +  [(const_int 0)]
> +{
> +  emit_insn (gen_vec_setv4si_0 (gen_lowpart (V4SImode, operands[0]),
> +   CONST0_RTX (V4SImode),
> +   gen_lowpart (SImode, operands[1])));
> +  emit_insn (gen_vec_duplicatev4si (operands[0], operands[0]));
> +  DONE;
> +})
>
>  (define_insn "*vec_dupv2di"
>[(set (match_operand:V2DI 0 "register_operand" "=x,v,v,x")
> diff --git a/gcc/testsuite/gcc.target/i386/pr100865-8a.c b/gcc/testsuite/gcc.target/i386/pr100865-8a.c
> index 911b14d4a25..544a14db6f7 100644
> --- a/gcc/testsuite/gcc.target/i386/pr100865-8a.c
> +++ b/gcc/testsuite/gcc.target/i386/pr100865-8a.c
> @@ -20,5 +20,5 @@ foo (void)
>  array[i] = MK_CONST128_BROADCAST_SIGNED (-45);
>  }
>
> -/* { dg-final { scan-assembler-times "(?:vpbroadcastd|vpshufd)\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-times "(?:vpbroadcastd|vpshufd)\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */
>  /* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr100865-8c.c b/gcc/testsuite/gcc.target/i386/pr100865-8c.c
> index 00682edb8c9..efee0488614 100644
> --- a/gcc/testsuite/gcc.target/i386/pr100865-8c.c
> +++ b/gcc/testsuite/gcc.target/i386/pr100865-8c.c
> @@ -3,5 +3,5 @@
>
>  #include "pr100865-8a.c"
>
> -/* { dg-final { scan-assembler-times "vpshufd\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-times "vpshufd\[\\t \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */
>  /* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr100865-9c.c b/gcc/testsuite/gcc.target/i386/pr100865-9c.c
> index 8ffcdc162

Re: [PATCH] opts: fix -gtoggle + optimize attribute

2022-03-04 Thread Martin Liška

On 3/1/22 09:48, Richard Biener wrote:

> I think moving flag_gtoggle handling before the flag_syntax_only handling
> is a good thing.  But I don't quite understand the flag_var_tracking disabling
> or how it worked before.


Well, as you know, the debugging options are a can of worms. During GCC 12
development I moved most of the option logic to finish_options, which is
used both for command-line option processing and for optimize/target
pragma/attribute processing.

That's why we see this problem. OPT_LEVELS_1_PLUS enables flag_var_tracking,
but we have to drop it when debug_info_level == DINFO_LEVEL_NONE.


> At least I think you want to check for
> debug_info_level == NONE, no?  Why should DINFO_LEVEL_TERSE be
> special?


No, sending updated version of the patch.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

From 1d9f77f00f208cca4142abc9c56995b865291229 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 4 Feb 2022 15:50:17 +0100
Subject: [PATCH] opts: fix -gtoggle + optimize attribute

Note -fvar-tracking is enabled automatically with OPT_LEVELS_1_PLUS and
so we need to drop it if we are called from optimize attribute and the
option is unset.
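
The intended ordering can be sketched with a toy model (invented names,
not the GCC option machinery): the toggle is applied first and exactly
once, and only then is var-tracking dropped if debug info ended up
disabled and the user did not set the flag explicitly.

```cpp
// Toy model only: Options and finish_options_model are invented names
// that mirror the ordering in the patch, not real GCC data structures.
enum class DebugLevel { None, Terse, Normal };

struct Options {
  bool gtoggle;           // -gtoggle still pending
  DebugLevel debug_level;
  bool var_tracking;      // enabled by default at -O1 and above
  bool var_tracking_set;  // true if the user set the flag explicitly
};

inline void finish_options_model(Options &o) {
  if (o.gtoggle) {
    // Process the toggle only once, even if this runs again for an
    // optimize attribute.
    o.gtoggle = false;
    o.debug_level = (o.debug_level == DebugLevel::None)
                        ? DebugLevel::Normal
                        : DebugLevel::None;
  }

  // Runs after the toggle, so it sees the final debug level.
  if (o.debug_level == DebugLevel::None && !o.var_tracking_set)
    o.var_tracking = false;
}
```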

	PR middle-end/104381

gcc/ChangeLog:

	* opts.cc (finish_options): If debug info is disabled
	(debug_info_level) and -fvar-tracking is unset, disable it.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr104381.c: New test.
---
 gcc/opts.cc | 49 +++--
 gcc/testsuite/gcc.dg/pr104381.c | 20 ++
 2 files changed, 48 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr104381.c

diff --git a/gcc/opts.cc b/gcc/opts.cc
index 19c68aed065..ef5fe9b11ca 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -1302,6 +1302,34 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
 SET_OPTION_IF_UNSET (opts, opts_set, flag_vect_cost_model,
 			 VECT_COST_MODEL_CHEAP);
 
+  if (flag_gtoggle)
+{
+  /* Make sure to process -gtoggle only once.  */
+  flag_gtoggle = false;
+  if (debug_info_level == DINFO_LEVEL_NONE)
+	{
+	  debug_info_level = DINFO_LEVEL_NORMAL;
+
+	  if (write_symbols == NO_DEBUG)
+	write_symbols = PREFERRED_DEBUGGING_TYPE;
+	}
+  else
+	debug_info_level = DINFO_LEVEL_NONE;
+}
+
+  if (!OPTION_SET_P (debug_nonbind_markers_p))
+debug_nonbind_markers_p
+  = (optimize
+	 && debug_info_level >= DINFO_LEVEL_NORMAL
+	 && dwarf_debuginfo_p ()
+	 && !(flag_selective_scheduling || flag_selective_scheduling2));
+
+  /* Note -fvar-tracking is enabled automatically with OPT_LEVELS_1_PLUS and
+ so we need to drop it if we are called from optimize attribute.  */
+  if (debug_info_level == DINFO_LEVEL_NONE
+  && !OPTION_SET_P (flag_var_tracking))
+flag_var_tracking = false;
+
   /* One could use EnabledBy, but it would lead to a circular dependency.  */
   if (!OPTION_SET_P (flag_var_tracking_uninit))
  flag_var_tracking_uninit = flag_var_tracking;
@@ -1328,27 +1356,6 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
   profile_flag = 0;
 }
 
-  if (flag_gtoggle)
-{
-  /* Make sure to process -gtoggle only once.  */
-  flag_gtoggle = false;
-  if (debug_info_level == DINFO_LEVEL_NONE)
-	{
-	  debug_info_level = DINFO_LEVEL_NORMAL;
-
-	  if (write_symbols == NO_DEBUG)
-	write_symbols = PREFERRED_DEBUGGING_TYPE;
-	}
-  else
-	debug_info_level = DINFO_LEVEL_NONE;
-}
-
-  if (!OPTION_SET_P (debug_nonbind_markers_p))
-debug_nonbind_markers_p
-  = (optimize
-	 && debug_info_level >= DINFO_LEVEL_NORMAL
-	 && dwarf_debuginfo_p ()
-	 && !(flag_selective_scheduling || flag_selective_scheduling2));
 
   diagnose_options (opts, opts_set, loc);
 }
diff --git a/gcc/testsuite/gcc.dg/pr104381.c b/gcc/testsuite/gcc.dg/pr104381.c
new file mode 100644
index 000..a3aec919bee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr104381.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -g -gtoggle -fdump-tree-optimized" } */
+
+int foo (int x)
+{
+  int tem = x + 1;
+  int tem2 = tem - 1;
+  return tem2;
+}
+
+int
+__attribute__((optimize("no-tree-pre")))
+bar (int x)
+{
+  int tem = x + 1;
+  int tem2 = tem - 1;
+  return tem2;
+}
+
+// { dg-final { scan-tree-dump-not "DEBUG " "optimized" } }
-- 
2.35.1



Update 'c-c++-common/goacc/classify-*', 'gfortran.dg/goacc/classify-*'

2022-03-04 Thread Thomas Schwinge
Hi!

Pushed to master branch commit fda0b0eb4f744f012f21c6976c2e42df87c313bb
"Update 'c-c++-common/goacc/classify-*', 'gfortran.dg/goacc/classify-*'",
see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From fda0b0eb4f744f012f21c6976c2e42df87c313bb Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 1 Mar 2022 14:57:38 +0100
Subject: [PATCH] Update 'c-c++-common/goacc/classify-*',
 'gfortran.dg/goacc/classify-*'

... to use 'dg-line', simplifying later changes.  Also some minor miscellaneous
diagnostics scanning maintenance.

	gcc/testsuite/
	* c-c++-common/goacc/classify-kernels-parloops.c: Update.
	* c-c++-common/goacc/classify-kernels-unparallelized-parloops.c:
	Likewise.
	* c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/classify-parallel.c: Likewise.
	* c-c++-common/goacc/classify-routine-nohost.c: Likewise.
	* c-c++-common/goacc/classify-routine.c: Likewise.
	* c-c++-common/goacc/classify-serial.c: Likewise.
	* gfortran.dg/goacc/classify-kernels-parloops.f95: Likewise.
	* gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95:
	Likewise.
	* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Likewise.
	* gfortran.dg/goacc/classify-kernels.f95: Likewise.
	* gfortran.dg/goacc/classify-parallel.f95: Likewise.
	* gfortran.dg/goacc/classify-routine-nohost.f95: Likewise.
	* gfortran.dg/goacc/classify-routine.f95: Likewise.
	* gfortran.dg/goacc/classify-serial.f95: Likewise.
---
 .../c-c++-common/goacc/classify-kernels-parloops.c   | 3 ++-
 .../goacc/classify-kernels-unparallelized-parloops.c | 3 ++-
 .../c-c++-common/goacc/classify-kernels-unparallelized.c | 3 ++-
 gcc/testsuite/c-c++-common/goacc/classify-kernels.c  | 3 ++-
 gcc/testsuite/c-c++-common/goacc/classify-parallel.c | 3 ++-
 .../c-c++-common/goacc/classify-routine-nohost.c | 3 ++-
 gcc/testsuite/c-c++-common/goacc/classify-routine.c  | 3 ++-
 gcc/testsuite/c-c++-common/goacc/classify-serial.c   | 9 +
 .../gfortran.dg/goacc/classify-kernels-parloops.f95  | 3 ++-
 .../goacc/classify-kernels-unparallelized-parloops.f95   | 3 ++-
 .../goacc/classify-kernels-unparallelized.f95| 3 ++-
 gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 | 3 ++-
 gcc/testsuite/gfortran.dg/goacc/classify-parallel.f95| 3 ++-
 .../gfortran.dg/goacc/classify-routine-nohost.f95| 3 ++-
 gcc/testsuite/gfortran.dg/goacc/classify-routine.f95 | 3 ++-
 gcc/testsuite/gfortran.dg/goacc/classify-serial.f95  | 9 +
 16 files changed, 38 insertions(+), 22 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-parloops.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-parloops.c
index f3685f2e8c5..5f470eb86bc 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels-parloops.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-parloops.c
@@ -20,7 +20,8 @@ extern unsigned int *__restrict c;
 
 void KERNELS ()
 {
-#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N]) /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N]) /* { dg-line l_compute1 } */
+  /* { dg-optimized {assigned OpenACC gang loop parallelism} {} { target *-*-* } l_compute1 } */
   for (unsigned int i = 0; i < N; i++)
 c[i] = a[i] + b[i];
 }
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-parloops.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-parloops.c
index 6522caf9135..06c70fb9d9f 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-parloops.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-parloops.c
@@ -24,7 +24,8 @@ extern unsigned int f (unsigned int);
 
 void KERNELS ()
 {
-#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N]) /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N]) /* { dg-line l_compute1 } */
+  /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_compute1 } */
   for (unsigned int i = 0; i < N; i++)
 c[i] = a[f (i)] + b[f (i)];
 }
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
index daa8fcb7662..4ee8e9d5f39 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
@@ -24,7 +24,8 @@ extern unsigned int f (unsigned int);
 
 void KERNELS ()
 {
-#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N]) /* { dg-message "optimized: as

Add 'c-c++-common/goacc/kernels-decompose-pr104132-1.c' [PR104132]

2022-03-04 Thread Thomas Schwinge
Hi!

On 2022-03-01T17:46:20+0100, I wrote:
> On 2022-01-13T10:54:16+0100, I wrote:
>> --- a/gcc/omp-oacc-kernels-decompose.cc
>> +++ b/gcc/omp-oacc-kernels-decompose.cc

>> @@ -842,6 +843,9 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
>> inner_data_clauses = new_clause;
>>
>> prev_mapped_var = v;
>> +
>> +   /* See .  */
>> +   TREE_ADDRESSABLE (v) = 1;
>>   }
>>  }
>
> So, that's too simple.  ;-) [...]

> We're after gimplification, and must not just set 'TREE_ADDRESSABLE',
> because that may easily violate GIMPLE invariants, leading to ICEs later.
> There are a few open PRs

Pushed to master branch commit 741859b390c042755e9379f8061a157e5af378b6
"Add 'c-c++-common/goacc/kernels-decompose-pr104132-1.c' [PR104132]", see
attached.


Grüße
 Thomas


From 741859b390c042755e9379f8061a157e5af378b6 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 19 Jan 2022 22:28:55 +0100
Subject: [PATCH] Add 'c-c++-common/goacc/kernels-decompose-pr104132-1.c'
 [PR104132]

..., currently XFAILed with 'dg-ice'.

	PR middle-end/104132
	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-pr104132-1.c: New file.
---
 .../goacc/kernels-decompose-pr104132-1.c  | 38 +++
 1 file changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104132-1.c

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104132-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104132-1.c
new file mode 100644
index 000..4fbfdd81e15
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104132-1.c
@@ -0,0 +1,38 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fchecking" }
+   { dg-ice TODO }
+   { dg-prune-output {k = 0 \+ \.offset\.[0-9]+;} }
+   { dg-prune-output {k = 0 \+ 2;} }
+   { dg-prune-output {during IPA pass: \*free_lang_data} } */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
+int arr_0;
+
+void
+foo (void)
+{
+#pragma acc kernels /* { dg-line l_compute1 } */
+  /* { dg-note {variable 'k' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute1 } */
+  {
+int k;
+
+/* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop /* { dg-line l_loop_k1 } */
+/* { dg-note {variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_k1 } */
+for (k = 0; k < 2; k++)
+  arr_0 = k;
+
+/* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop /* { dg-line l_loop_k2 } */
+/* { dg-note {variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_k2 } */
+for (k = 0; k < 2; k++)
+  arr_0 = k;
+  }
+}
+/* { dg-bogus {error: non-register as LHS of binary operation} {} { xfail *-*-* } .-1 } */
-- 
2.34.1



Add 'c-c++-common/goacc/kernels-decompose-pr104133-1.c' [PR104133]

2022-03-04 Thread Thomas Schwinge
Hi!

On 2022-03-01T17:46:20+0100, I wrote:
> On 2022-01-13T10:54:16+0100, I wrote:
>> --- a/gcc/omp-oacc-kernels-decompose.cc
>> +++ b/gcc/omp-oacc-kernels-decompose.cc

>> @@ -842,6 +843,9 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
>> inner_data_clauses = new_clause;
>>
>> prev_mapped_var = v;
>> +
>> +   /* See .  */
>> +   TREE_ADDRESSABLE (v) = 1;
>>   }
>>  }
>
> So, that's too simple.  ;-) [...]

> We're after gimplification, and must not just set 'TREE_ADDRESSABLE',
> because that may easily violate GIMPLE invariants, leading to ICEs later.
> There are a few open PRs

Pushed to master branch commit e085900fa10e28b684d656b66557d181247a1a48
"Add 'c-c++-common/goacc/kernels-decompose-pr104133-1.c' [PR104133]", see
attached.


Grüße
 Thomas


From e085900fa10e28b684d656b66557d181247a1a48 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 19 Jan 2022 22:28:55 +0100
Subject: [PATCH] Add 'c-c++-common/goacc/kernels-decompose-pr104133-1.c'
 [PR104133]

..., currently XFAILed with 'dg-ice'.

	PR middle-end/104133
	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-pr104133-1.c: New file.
---
 .../goacc/kernels-decompose-pr104133-1.c  | 40 +++
 1 file changed, 40 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104133-1.c

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104133-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104133-1.c
new file mode 100644
index 000..72dde346dbf
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104133-1.c
@@ -0,0 +1,40 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fchecking" }
+   { dg-ice TODO }
+   { dg-prune-output {D\.[0-9]+ = arr_0\.0 \+ k;} }
+   { dg-prune-output {D\.[0-9]+ = arr_0\.1 \+ k;} }
+   { dg-prune-output {during GIMPLE pass: lower} } */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
+int arr_0;
+
+void
+foo (void)
+{
+#pragma acc kernels /* { dg-line l_compute1 } */
+  /* { dg-note {variable 'k' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute1 } */
+  /* { dg-note {variable 'arr_0\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute1 } */
+  /* { dg-note {variable 'arr_0\.1' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute1 } */
+  {
+int k;
+
+/* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop /* { dg-line l_loop_k1 } */
+/* { dg-note {variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_k1 } */
+for (k = 0; k < 2; k++)
+  arr_0 += k;
+
+/* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop /* { dg-line l_loop_k2 } */
+/* { dg-note {variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_k2 } */
+for (k = 0; k < 2; k++)
+  arr_0 += k;
+  /* { dg-bogus {error: invalid operands in binary operation} {} { xfail *-*-* } .-1 } */
+  }
+}
-- 
2.34.1



Add diagnostic: "note: OpenACC 'kernels' decomposition: variable '[...]' declared in block made addressable" [PR100280]

2022-03-04 Thread Thomas Schwinge
Hi!

On 2022-01-13T10:54:16+0100, I wrote:
> On 2019-05-08T14:51:57+0100, Julian Brown  wrote:
>>  - The "addressable" bit is set during the kernels conversion pass for
>>    variables that have "create" (alloc) clauses created for them in the
>>    synthesised outer data region (instead of in the front-end, etc.,
>>    where it can't be done accurately). Such variables actually have
>>    their address taken during transformations made in a later pass
>>    (omp-low, I think), but there's a phase-ordering problem that means
>>    the flag should be set earlier.
>
> The actual issue is a bit different, but yes, there is a problem.
> The related ICE has also been reported as 
> "ICE in lower_omp_target, at omp-low.c:12287".  (And I'm confused why we
> didn't run into that with the OpenACC 'kernels' decomposition
> originally.)  I've pushed to master branch
> commit 9b32c1669aad5459dd053424f9967011348add83
> "OpenACC 'kernels' decomposition: Mark variables used in synthesized data
> clauses as addressable [PR100280]", see attached.

> --- a/gcc/omp-oacc-kernels-decompose.cc
> +++ b/gcc/omp-oacc-kernels-decompose.cc
> @@ -793,7 +793,8 @@ make_data_region_try_statement (location_t loc, gimple *body)
>
>  /* If INNER_BIND_VARS holds variables, build an OpenACC data region with
> location LOC containing BODY and having 'create (var)' clauses for each
> -   variable.  If INNER_CLEANUP is present, add a try-finally statement with
> +   variable (as a side effect, such variables also get TREE_ADDRESSABLE set).
> +   If INNER_CLEANUP is present, add a try-finally statement with
> this cleanup code in the finally block.  Return the new data region, or
> the original BODY if no data region was needed.  */
>
> @@ -842,6 +843,9 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
> inner_data_clauses = new_clause;
>
> prev_mapped_var = v;
> +
> +   /* See .  */
> +   TREE_ADDRESSABLE (v) = 1;
>   }
>  }

Pushed to master branch commit e5ae22c56152b1a1f4b4e1d7ae04431a9e4710cc
"Add diagnostic: "note: OpenACC 'kernels' decomposition: variable '[...]'
declared in block made addressable" [PR100280]", see attached.


Grüße
 Thomas


From e5ae22c56152b1a1f4b4e1d7ae04431a9e4710cc Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 15 Feb 2022 16:54:30 +0100
Subject: [PATCH] Add diagnostic: "note: OpenACC 'kernels' decomposition:
 variable '[...]' declared in block made addressable" [PR100280]

Follow-up to commit 9b32c1669aad5459dd053424f9967011348add83
"OpenACC 'kernels' decomposition: Mark variables used in
synthesized data clauses as addressable [PR100280]".

	PR middle-end/100280
	gcc/
	* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
	Add diagnostic: "note: OpenACC 'kernels' decomposition: variable
	'[...]' declared in block made addressable".
	gcc/testsuite/
	* c-c++-common/goacc/classify-kernels-unparallelized.c: Add
	'--param=openacc-privatization=noisy'.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-2.c: Adjust.
	* c-c++-common/goacc/kernels-decompose-pr100280-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Adjust.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
---
 gcc/omp-oacc-kernels-decompose.cc | 24 ++-
 .../goacc/classify-kernels-unparallelized.c   |  7 ++
 .../c-c++-common/goacc/classify-kernels.c |  7 ++
 .../c-c++-common/goacc/kernels-decompose-2.c  |  2 ++
 .../goacc/kernels-decompose-pr100280-1.c  |  1 +
 .../goacc/kernels-decompose-pr104061-1-2.c|  1 +
 .../goacc/kernels-decompose-pr104061-1-3.c|  1 +
 .../goacc/kernels-decompose-pr104061-1-4.c|  1 +
 .../goacc/kernels-decompose-pr104132-1.c  |  1 +
 .../goacc/kernels-decompose-pr104133-1.c  |  1 +
 .../libgomp.oacc-c-c++-common/f-asyncwait-1.c |  3 +++
 .../kernels-decompose-1.c |  1 +
 12 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc
index 98eafdbe3a1..5093386f718 100644
--- a/gcc/omp-oacc-kernels-decompose.cc
+++ b/gcc/omp-oacc-kernels-decompose.cc
@@ -845,7 +845,29 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
 	  prev_mapped_var = v;
 
 	  /* See

OpenACC 'kernels' decomposition: Move 'TREE_ADDRESSABLE' setting into OMP lowering [PR100280]

2022-03-04 Thread Thomas Schwinge
Hi!

On 2022-01-13T10:54:16+0100, I wrote:
> On 2019-05-08T14:51:57+0100, Julian Brown  wrote:
>>  - The "addressable" bit is set during the kernels conversion pass for
>>    variables that have "create" (alloc) clauses created for them in the
>>    synthesised outer data region (instead of in the front-end, etc.,
>>    where it can't be done accurately). Such variables actually have
>>    their address taken during transformations made in a later pass
>>    (omp-low, I think), but there's a phase-ordering problem that means
>>    the flag should be set earlier.
>
> The actual issue is a bit different, but yes, there is a problem.
> The related ICE has also been reported as 
> "ICE in lower_omp_target, at omp-low.c:12287".  (And I'm confused why we
> didn't run into that with the OpenACC 'kernels' decomposition
> originally.)  I've pushed to master branch
> commit 9b32c1669aad5459dd053424f9967011348add83
> "OpenACC 'kernels' decomposition: Mark variables used in synthesized data
> clauses as addressable [PR100280]", see attached.

> --- a/gcc/omp-oacc-kernels-decompose.cc
> +++ b/gcc/omp-oacc-kernels-decompose.cc
> @@ -793,7 +793,8 @@ make_data_region_try_statement (location_t loc, gimple *body)
>
>  /* If INNER_BIND_VARS holds variables, build an OpenACC data region with
> location LOC containing BODY and having 'create (var)' clauses for each
> -   variable.  If INNER_CLEANUP is present, add a try-finally statement with
> +   variable (as a side effect, such variables also get TREE_ADDRESSABLE set).
> +   If INNER_CLEANUP is present, add a try-finally statement with
> this cleanup code in the finally block.  Return the new data region, or
> the original BODY if no data region was needed.  */
>
> @@ -842,6 +843,9 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
> inner_data_clauses = new_clause;
>
> prev_mapped_var = v;
> +
> +   /* See .  */
> +   TREE_ADDRESSABLE (v) = 1;
>   }
>  }

Pushed to master branch commit de6e81ea961219d0726db67776d11ce75a4cae1b
"OpenACC 'kernels' decomposition: Move 'TREE_ADDRESSABLE' setting into
OMP lowering [PR100280]", see attached.


Grüße
 Thomas


From de6e81ea961219d0726db67776d11ce75a4cae1b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 15 Feb 2022 23:03:49 +0100
Subject: [PATCH] OpenACC 'kernels' decomposition: Move 'TREE_ADDRESSABLE'
 setting into OMP lowering [PR100280]

... in preparation for later changes.  No functional change.

Follow-up to commit 9b32c1669aad5459dd053424f9967011348add83
"OpenACC 'kernels' decomposition: Mark variables used in
synthesized data clauses as addressable [PR100280]".

	PR middle-end/100280
	gcc/
	* tree.h (OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE): New.
	* tree-core.h: Document it.
	* omp-low.cc (scan_sharing_clauses) : Handle
	'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'.
	* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
	Set 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' instead of
	'TREE_ADDRESSABLE'.
	gcc/testsuite/
	* c-c++-common/goacc/classify-kernels-unparallelized.c: Adjust.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr100280-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Adjust.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
---
 gcc/omp-low.cc| 31 +++
 gcc/omp-oacc-kernels-decompose.cc |  5 +--
 .../goacc/classify-kernels-unparallelized.c   |  3 +-
 .../c-c++-common/goacc/classify-kernels.c |  3 +-
 .../c-c++-common/goacc/kernels-decompose-2.c  |  6 ++--
 .../goacc/kernels-decompose-pr100280-1.c  |  3 +-
 .../goacc/kernels-decompose-pr104061-1-2.c|  3 +-
 .../goacc/kernels-decompose-pr104061-1-3.c|  3 +-
 .../goacc/kernels-decompose-pr104061-1-4.c|  3 +-
 .../goacc/kernels-decompose-pr104132-1.c  |  3 +-
 .../goacc/kernels-decompose-pr104133-1.c  |  3 +-
 gcc/tree-core.h   |  3 ++
 gcc/tree.h|  5 +++
 .../libgomp.oacc-c-c++-common/f-asyncwait-1.c |  9 --
 .../kernels-decompose-1.c |  3 +-
 15 files changed, 70 insertions(+), 16 deletions(-)

diff --git a/gcc/omp-low.cc b/gcc/om

OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs [PR100280, PR104132, PR104133]

2022-03-04 Thread Thomas Schwinge
Hi!

On 2022-03-01T17:46:20+0100, I wrote:
> On 2022-01-13T10:54:16+0100, I wrote:
>> On 2019-05-08T14:51:57+0100, Julian Brown  wrote:
>>>  - The "addressable" bit is set during the kernels conversion pass for
>>>    variables that have "create" (alloc) clauses created for them in the
>>>    synthesised outer data region (instead of in the front-end, etc.,
>>>    where it can't be done accurately). Such variables actually have
>>>    their address taken during transformations made in a later pass
>>>    (omp-low, I think), but there's a phase-ordering problem that means
>>>    the flag should be set earlier.
>>
>> The actual issue is a bit different, but yes, there is a problem.
>> The related ICE has also been reported as 
>> "ICE in lower_omp_target, at omp-low.c:12287".  (And I'm confused why we
>> didn't run into that with the OpenACC 'kernels' decomposition
>> originally.)  I've pushed to master branch
>> commit 9b32c1669aad5459dd053424f9967011348add83
>> "OpenACC 'kernels' decomposition: Mark variables used in synthesized data
>> clauses as addressable [PR100280]"

>> --- a/gcc/omp-oacc-kernels-decompose.cc
>> +++ b/gcc/omp-oacc-kernels-decompose.cc
>> @@ -793,7 +793,8 @@ make_data_region_try_statement (location_t loc, gimple *body)
>>
>>  /* If INNER_BIND_VARS holds variables, build an OpenACC data region with
>> location LOC containing BODY and having 'create (var)' clauses for each
>> -   variable.  If INNER_CLEANUP is present, add a try-finally statement with
>> +   variable (as a side effect, such variables also get TREE_ADDRESSABLE set).
>> +   If INNER_CLEANUP is present, add a try-finally statement with
>> this cleanup code in the finally block.  Return the new data region, or
>> the original BODY if no data region was needed.  */
>>
>> @@ -842,6 +843,9 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
>> inner_data_clauses = new_clause;
>>
>> prev_mapped_var = v;
>> +
>> +   /* See .  */
>> +   TREE_ADDRESSABLE (v) = 1;
>>   }
>>  }
>
> So, that's too simple.  ;-) [...]

> We're after gimplification, and must not just set 'TREE_ADDRESSABLE',
> because that may easily violate GIMPLE invariants, leading to ICEs later.
> There are a few open PRs, which my following changes are addressing.  To
> make "late" 'TREE_ADDRESSABLE' work, we have a precedent in OpenMP's
> 'gcc/omp-low.cc:task_shared_vars' handling, as Jakub had pointed to in
> discussion of .

> I'm thus proposing to generalize 'gcc/omp-low.cc:task_shared_vars' into
> 'make_addressable_vars', plus new 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'
> that we then may use instead of the 'TREE_ADDRESSABLE (v) = 1;' quoted
> above (plus one or two additional ones to be introduced in later
> patches), and wire that up in 'gcc/omp-low.cc:scan_sharing_clauses', for
> 'OMP_CLAUSE_MAP': set 'TREE_ADDRESSABLE' and put into
> 'make_addressable_vars' for later fix-up.
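
As a rough illustration of the deferred scheme described in the quoted text,
here is a minimal standalone sketch. It does not use GCC's tree or bitmap
APIs: all types and function names below are hypothetical miniatures, with
one bool standing in for 'TREE_ADDRESSABLE', another for
'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE', and a small array standing in for
the 'make_addressable_vars' set.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DECLS 16		/* arbitrary bound for this sketch */

/* Hypothetical miniature of a variable declaration.  */
struct decl_sketch
{
  const char *name;
  bool addressable;		/* models TREE_ADDRESSABLE */
};

/* Hypothetical miniature of a 'map' clause.  */
struct map_clause_sketch
{
  struct decl_sketch *decl;
  bool make_addressable;	/* models OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE */
};

/* Models 'make_addressable_vars': DECLs whose uses need a later fix-up.  */
struct fixup_set
{
  struct decl_sketch *decls[MAX_DECLS];
  size_t count;
};

/* Decomposition pass: only record the request on the clause; the DECL
   itself is left untouched at this point.  */
void
request_addressable (struct map_clause_sketch *c)
{
  c->make_addressable = true;
}

/* Clause scanning during lowering: honor the request and remember the
   DECL for later fix-up.  Returns false if the set is full.  */
bool
scan_map_clause (struct map_clause_sketch *c, struct fixup_set *set)
{
  if (!c->make_addressable || c->decl->addressable)
    return true;
  if (set->count == MAX_DECLS)
    return false;
  c->decl->addressable = true;
  set->decls[set->count++] = c->decl;
  return true;
}

/* Later fix-up: does a statement that uses DECL need regimplifying?  */
bool
needs_regimplify (const struct fixup_set *set, const struct decl_sketch *decl)
{
  for (size_t i = 0; i < set->count; i++)
    if (set->decls[i] == decl)
      return true;
  return false;
}
```

The point of the split is that the pass that knows a variable must become
addressable is not the pass that is able to repair the statements using it,
so the request travels on the clause and the repair happens where the
bookkeeping already exists.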

Pushed to master branch commit 8935589b496f755e08cadf26d8ceddf0dd6e0968
"OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs
[PR100280, PR104132, PR104133]", see attached.


Grüße
 Thomas


From 8935589b496f755e08cadf26d8ceddf0dd6e0968 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 15 Feb 2022 23:31:34 +0100
Subject: [PATCH] OMP lowering: Regimplify
 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs [PR100280, PR104132, PR104133]

... by generalizing the existing 'gcc/omp-low.cc:task_shared_vars'.

Fix-up for commit 9b32c1669aad5459dd053424f9967011348add83
"OpenACC 'kernels' decomposition: Mark variables used in
synthesized data clauses as addressable [PR100280]".

	PR middle-end/100280
	PR middle-end/104132
	PR middle-end/104133
	gcc/
	* omp-low.cc (task_shared_vars): Rename to
	'make_addressable_vars'.  Adjust all users.
	(scan_sharing_clauses)  Use it for
	'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs, too.
	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: Adjust.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104132-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-pr104133-1.c: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Extend.
---
 gcc/omp-low.cc| 47 +++---
 .../goacc/kernels-decompose-pr104061-1-3.c| 11 +---
 .../goacc/kernels-decompose-pr104061-1-4.c| 17 ++---
 .../goacc/kernels-decompose-pr104132-1.c  |  9 +--
 .../goacc/kernels-decompose-pr104133-1.c  |  9 +--
 .../kernels-decompose-1.c | 62 +--
 6 files changed, 95 insertio

Re: [PATCH] Improve profile handling in switch lowering.

2022-03-04 Thread Martin Liška

PING^1

On 1/26/22 12:11, Martin Liška wrote:

Hello.

Right now, switch lowering does not update basic_block::count values,
so they are left uninitialized. Moreover, I've updated probability scaling
for when a more complex expansion happens. There are still some situations where
the profile is a bit imprecise, but the patch significantly improves the current
situation.
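
To make the scaling concrete, here is a small standalone sketch of the idea
behind the bit-test part of the patch. It uses plain doubles instead of
profile_probability and profile_count, the function names are hypothetical,
and the extra adjustment for the entry range test is not modeled. Each test
in the chain gets the probability of its target conditional on having
reached that test, and a block's count is its predecessor's count scaled by
the edge probability.

```c
#include <stddef.h>

/* ABS_PROB[k] is the probability, measured from the start of the chain,
   that control ends up in the target of test K; DEFAULT_PROB is the
   probability of falling through all tests.  COND_PROB[k] receives the
   probability of taking the branch of test K given that test K is
   reached.  */
void
chain_probabilities (const double *abs_prob, size_t n, double default_prob,
		     double *cond_prob)
{
  /* Probability mass that still has to be distributed.  */
  double remaining = default_prob;
  for (size_t i = 0; i < n; i++)
    remaining += abs_prob[i];

  for (size_t k = 0; k < n; k++)
    {
      cond_prob[k] = remaining > 0.0 ? abs_prob[k] / remaining : 0.0;
      remaining -= abs_prob[k];
    }
}

/* Count of a block reached through an edge: the predecessor's count
   scaled by the edge probability.  */
double
scaled_count (double src_count, double edge_prob)
{
  return src_count * edge_prob;
}
```

For example, with two tests whose targets have probabilities 0.5 and 0.25
and a default of 0.25, both conditional probabilities come out as 0.5; for
an entry count of 1000 that gives 500 for the first target, 500 for the
fall-through block, and 250 each for the second target and the default.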

The patch bootstraps on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

 PR tree-optimization/101301
 PR tree-optimization/103680

gcc/ChangeLog:

 * tree-switch-conversion.cc (bit_test_cluster::emit):
 Handle correctly remaining probability.
 (switch_decision_tree::try_switch_expansion): Fix BB's count
 where a cluster expansion happens.
 (switch_decision_tree::emit_cmp_and_jump_insns): Fill up also
 BB count.
 (switch_decision_tree::do_jump_if_equal): Likewise.
 (switch_decision_tree::emit_case_nodes): Handle special case
 for BT expansion which can also fallback to a default BB.
 * tree-switch-conversion.h (cluster::cluster): Add
 m_default_prob probability.
---
  gcc/tree-switch-conversion.cc | 51 ---
  gcc/tree-switch-conversion.h  |  8 +-
  2 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/gcc/tree-switch-conversion.cc b/gcc/tree-switch-conversion.cc
index 670397c87e4..d6679e8dee3 100644
--- a/gcc/tree-switch-conversion.cc
+++ b/gcc/tree-switch-conversion.cc
@@ -1538,10 +1538,12 @@ bit_test_cluster::emit (tree index_expr, tree index_type,
    test[k].target_bb = n->m_case_bb;
    test[k].label = n->m_case_label_expr;
    test[k].bits = 0;
+  test[k].prob = profile_probability::never ();
    count++;
  }

    test[k].bits += n->get_range (n->get_low (), n->get_high ());
+  test[k].prob += n->m_prob;

    lo = tree_to_uhwi (int_const_binop (MINUS_EXPR, n->get_low (), minval));
    if (n->get_high () == NULL_TREE)
@@ -1629,6 +1631,11 @@ bit_test_cluster::emit (tree index_expr, tree index_type,
    /*simple=*/true, NULL_TREE,
    /*before=*/true, GSI_SAME_STMT);

+  profile_probability subtree_prob = m_subtree_prob;
+  profile_probability default_prob = m_default_prob;
+  if (!default_prob.initialized_p ())
+    default_prob = m_subtree_prob.invert ();
+
    if (m_handles_entire_switch && entry_test_needed)
  {
    tree range = int_const_binop (MINUS_EXPR, maxval, minval);
@@ -1639,9 +1646,10 @@ bit_test_cluster::emit (tree index_expr, tree index_type,
  /*simple=*/true, NULL_TREE,
  /*before=*/true, GSI_SAME_STMT);
    tmp = fold_build2 (GT_EXPR, boolean_type_node, idx, range);
+  default_prob = default_prob.apply_scale (1, 2);
    basic_block new_bb
  = hoist_edge_and_branch_if_true (&gsi, tmp, default_bb,
- profile_probability::unlikely ());
+ default_prob);
    gsi = gsi_last_bb (new_bb);
  }

@@ -1662,14 +1670,12 @@ bit_test_cluster::emit (tree index_expr, tree index_type,
    else
  csui = tmp;

-  profile_probability prob = profile_probability::always ();
-
    /* for each unique set of cases:
     if (const & csui) goto target  */
    for (k = 0; k < count; k++)
  {
-  prob = profile_probability::always ().apply_scale (test[k].bits,
- bt_range);
+  profile_probability prob = test[k].prob / (subtree_prob + default_prob);
+  subtree_prob -= test[k].prob;
    bt_range -= test[k].bits;
    tmp = wide_int_to_tree (word_type_node, test[k].mask);
    tmp = fold_build2 (BIT_AND_EXPR, word_type_node, csui, tmp);
@@ -1908,9 +1914,13 @@ switch_decision_tree::try_switch_expansion (vec &clusters)
    /* Emit cluster-specific switch handling.  */
    for (unsigned i = 0; i < clusters.length (); i++)
  if (clusters[i]->get_type () != SIMPLE_CASE)
-  clusters[i]->emit (index_expr, index_type,
- gimple_switch_default_label (m_switch),
- m_default_bb, gimple_location (m_switch));
+  {
+    edge e = single_pred_edge (clusters[i]->m_case_bb);
+    e->dest->count = e->src->count.apply_probability (e->probability);
+    clusters[i]->emit (index_expr, index_type,
+   gimple_switch_default_label (m_switch),
+   m_default_bb, gimple_location (m_switch));
+  }
  }

    fix_phi_operands_for_edges ();
@@ -2162,6 +2172,7 @@ switch_decision_tree::emit_cmp_and_jump_insns (basic_block bb, tree op0,
    edge false_edge = split_block (bb, cond);
    false_edge->flags = EDGE_FALSE_VALUE;
    false_edge->probability = prob.invert ();
+  false_edge->dest->count = bb->count.apply_probability (prob.invert ());

    edge true_edge = make_edge (bb, label_bb, EDGE_TRUE_VALUE);
    true_edge->probability = prob;
@@ -2192,6 +2203,7 @@ switch_decision_tree::do_jump_if_equal (basic_block bb, 
tr
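
As an illustration of the arithmetic in the bit_test_cluster::emit hunk above
(a sketch, not GCC code): each bit test appears to get the probability of its
edge being taken given that all earlier tests failed, i.e. its own probability
divided by whatever probability mass is still left (remaining tests plus the
default).  Plain doubles stand in for profile_probability here, and the name
conditional_probs is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Fill cond[k] with the probability that test k succeeds, given that
   tests 0..k-1 have all failed.  test_prob[k] is the unconditional
   probability of test k; default_prob is the probability of falling
   through to the default.  The caller must ensure the remaining mass
   (tests k..count-1 plus default) is nonzero for every k.  */
static void
conditional_probs (const double *test_prob, size_t count,
                   double default_prob, double *cond)
{
  assert (test_prob != NULL && cond != NULL);

  /* Mass of all tests that have not been emitted yet.  */
  double subtree_prob = 0.0;
  for (size_t k = 0; k < count; k++)
    subtree_prob += test_prob[k];

  for (size_t k = 0; k < count; k++)
    {
      cond[k] = test_prob[k] / (subtree_prob + default_prob);
      /* Test k is now behind us; drop its mass from the remainder.  */
      subtree_prob -= test_prob[k];
    }
}
```

With two tests of probability 0.5 and 0.25 and a default of 0.25, both
conditional probabilities come out as 0.5: the second test is half of what
remains once the first has failed.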

Re: [PATCH] call mark_dfs_back_edges() before testing EDGE_DFS_BACK [PR104761]

2022-03-04 Thread Jakub Jelinek via Gcc-patches
On Thu, Mar 03, 2022 at 05:08:30PM -0700, Martin Sebor wrote:
> > 1) shouldn't it give up for EDGE_ABNORMAL too?  I mean, e.g.
> > following a non-local goto forced edge from a noreturn call
> > to a non-local label (if there is just one) doesn't seem
> > right to me
> 
> Possibly yes.  I can add it but I don't have a lot of experience with
> these bits so if you can suggest a test case to exercise this that
> would be helpful.

Something like:
void
foo (void)
{
  __label__ l;
  __attribute__((noreturn)) void bar (int x) { if (x) goto l; __builtin_trap (); }
  bar (0);
l:;
}
shows a single EDGE_ABNORMAL from the bar call.
But it would need tweaking for the ptr use and clobber.

> > 2) if EDGE_DFS_BACK is computed and 1) is done, is there any
> > reason why you need 2 levels of protection, i.e. the EDGE_DFS_BACK
> > check as well as the visited bitmap (and having them use
> > very different answers, if EDGE_DFS_BACK is seen, the function
> > will return false, if visited bitmap has a bb, it will return true)?
> > Can't the visited bitmap go away?
> 
> Possibly.  As I said above, I don't have enough experience with these
> bits to make (and test) the changes quickly, or enough bandwidth to
> come up to speed on them.  Please feel free to make these improvements.

I'll change that if it passes testing.

Jakub
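
For readers less familiar with the term used in this thread: a DFS back edge
is an edge whose destination is still on the depth-first search stack when the
edge is examined.  The following toy sketch marks such edges on a small
adjacency matrix.  It is not GCC's mark_dfs_back_edges and does not use GCC's
CFG data structures; every toy_ name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_NODES 8

enum toy_state { TOY_UNSEEN, TOY_ON_STACK, TOY_DONE };

struct toy_graph
{
  int n;                                     /* number of nodes in use */
  bool edge[TOY_MAX_NODES][TOY_MAX_NODES];   /* edge[u][v]: u -> v exists */
  bool back[TOY_MAX_NODES][TOY_MAX_NODES];   /* back[u][v]: u -> v is a back edge */
};

/* Recursive DFS from U; an edge to a node still on the stack is a back edge.  */
static void
toy_dfs (struct toy_graph *g, int u, enum toy_state *state)
{
  state[u] = TOY_ON_STACK;
  for (int v = 0; v < g->n; v++)
    {
      if (!g->edge[u][v])
        continue;
      if (state[v] == TOY_ON_STACK)
        g->back[u][v] = true;
      else if (state[v] == TOY_UNSEEN)
        toy_dfs (g, v, state);
    }
  state[u] = TOY_DONE;
}

/* Clear all back-edge marks, then recompute them with a DFS from ENTRY.  */
static void
toy_mark_back_edges (struct toy_graph *g, int entry)
{
  enum toy_state state[TOY_MAX_NODES] = { TOY_UNSEEN };

  assert (g->n <= TOY_MAX_NODES && entry >= 0 && entry < g->n);
  for (int u = 0; u < g->n; u++)
    for (int v = 0; v < g->n; v++)
      g->back[u][v] = false;
  toy_dfs (g, entry, state);
}
```

In a graph 0 -> 1 -> 2 -> 3 with an extra edge 2 -> 1, only 2 -> 1 is marked,
because node 1 is still on the stack when that edge is visited.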



[PATCH] arm: Remove unused variable arm_binop_none_none_unone_qualifiers

2022-03-04 Thread Christophe Lyon via Gcc-patches
From: Christophe Lyon 

Commits r12-7342 and r12-7344 made some cleanup, leaving
arm_binop_none_none_unone_qualifiers unused.
This is causing build failures with -Werror (e.g. during bootstrap).

This patch fixes the problem by removing the definition of
arm_binop_none_none_unone_qualifiers and
BINOP_NONE_NONE_UNONE_QUALIFIERS which are now unused.

Tested by bootstrapping on arm-linux-gnueabihf.

2022-03-04  Christophe Lyon  

gcc/
* config/arm/arm-builtins.cc
(arm_binop_none_none_unone_qualifiers): Delete.
(BINOP_NONE_NONE_UNONE_QUALIFIERS): Delete.
---
 gcc/config/arm/arm-builtins.cc | 6 --
 1 file changed, 6 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index a7acc1d71e7..6afca7a82cb 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -432,12 +432,6 @@ arm_binop_unone_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_UNONE_NONE_IMM_QUALIFIERS \
   (arm_binop_unone_none_imm_qualifiers)
 
-static enum arm_type_qualifiers
-arm_binop_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_none, qualifier_unsigned };
-#define BINOP_NONE_NONE_UNONE_QUALIFIERS \
-  (arm_binop_none_none_unone_qualifiers)
-
 static enum arm_type_qualifiers
 arm_binop_pred_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_predicate, qualifier_none, qualifier_none };
-- 
2.25.1



[PATCH] OpenMP, libgomp: Add new runtime routine omp_get_mapped_ptr.

2022-03-04 Thread Marcel Vollweiler

Hi,

This patch adds the OpenMP runtime routine "omp_get_mapped_ptr" which was
introduced in OpenMP 5.1 (specification section 3.8.11):

"The omp_get_mapped_ptr routine returns the device pointer that is associated
with a host pointer for a given device."

"The device_num argument must be greater than or equal to zero and less than or
equal to the result of omp_get_num_devices()."

"A call to this routine for a pointer that is not NULL (or C_NULL_PTR, for
Fortran) and does not have an associated pointer on the given device results in
a NULL pointer."

"The routine returns NULL (or C_NULL_PTR, for Fortran) if unsuccessful.
Otherwise it returns the device pointer, which is ptr if device_num is the value
returned by omp_get_initial_device()."
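
To make the quoted rules concrete, here is a self-contained toy model.  It is
not the libgomp implementation and does not call the OpenMP runtime: a small
table stands in for the device mappings, every toy_ name is hypothetical, and
the initial (host) device number is assumed to equal the number of devices.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NUM_DEVICES 2                    /* offload devices 0 and 1 */
#define TOY_INITIAL_DEVICE TOY_NUM_DEVICES   /* assumption: host device number */
#define TOY_TABLE_SIZE 8

struct toy_mapping
{
  const void *host;
  void *dev;
  int device_num;
};

static struct toy_mapping toy_table[TOY_TABLE_SIZE];
static size_t toy_count;

/* Record that HOST is associated with DEV on DEVICE_NUM.
   Returns 0 on success, -1 if the table is full.  */
static int
toy_associate (const void *host, void *dev, int device_num)
{
  assert (device_num >= 0 && device_num < TOY_NUM_DEVICES);
  if (toy_count == TOY_TABLE_SIZE)
    return -1;
  toy_table[toy_count].host = host;
  toy_table[toy_count].dev = dev;
  toy_table[toy_count].device_num = device_num;
  toy_count++;
  return 0;
}

/* Model of the quoted rules: NULL for an out-of-range device or a host
   pointer with no association, PTR itself for the initial device, and
   the associated device pointer otherwise.  */
static void *
toy_get_mapped_ptr (const void *ptr, int device_num)
{
  if (device_num < 0 || device_num > TOY_NUM_DEVICES)
    return NULL;

  if (device_num == TOY_INITIAL_DEVICE)
    return (void *) ptr;

  for (size_t i = 0; i < toy_count; i++)
    if (toy_table[i].host == ptr && toy_table[i].device_num == device_num)
      return toy_table[i].dev;

  return NULL;
}
```

After associating a host object with a device object on device 0, a lookup on
device 0 yields the device object, a lookup on device 1 yields NULL, and a
lookup on the initial device yields the host pointer unchanged.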

Implementation and tests were added for C/C++ and Fortran.

There is a small inconvenience concerning zero-length arrays as list items of
the "target map" construct: it seems that zero-length arrays are not associated
correctly there, so omp_get_mapped_ptr returns NULL instead of the associated
device pointer.  This contrasts with the situation where a device pointer is
associated with the host pointer via omp_target_associate_ptr.
However, the result of omp_get_mapped_ptr is consistent with
omp_target_is_present (which returns 0, i.e. "not present") in this situation.

The patch was tested on x86_64-linux with nvptx and amdgcn offloading. All with
no regressions.

Marcel
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
Munich; limited liability company (GmbH); Managing Directors: Thomas
Heurung, Frank Thürauf; Registered office: Munich; Court of registration:
Munich, HRB 106955
OpenMP, libgomp: Add new runtime routine omp_get_mapped_ptr.

libgomp/ChangeLog:

* libgomp.map: Added omp_get_mapped_ptr.
* libgomp.texi: Tagged omp_get_mapped_ptr as supported.
* omp.h.in: Added omp_get_mapped_ptr.
* omp_lib.f90.in: Added interface for omp_get_mapped_ptr.
* omp_lib.h.in: Likewise.
* target.c (omp_get_mapped_ptr): Added implementation of
omp_get_mapped_ptr.
* testsuite/libgomp.c-c++-common/get-mapped-ptr-1.c: New test.
* testsuite/libgomp.c-c++-common/get-mapped-ptr-2.c: New test.
* testsuite/libgomp.c-c++-common/get-mapped-ptr-3.c: New test.
* testsuite/libgomp.c-c++-common/get-mapped-ptr-4.c: New test.
* testsuite/libgomp.fortran/get-mapped-ptr-1.f90: New test.
* testsuite/libgomp.fortran/get-mapped-ptr-2.f90: New test.
* testsuite/libgomp.fortran/get-mapped-ptr-3.f90: New test.
* testsuite/libgomp.fortran/get-mapped-ptr-4.f90: New test.

diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 2ac5809..00a4858 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -224,6 +224,7 @@ OMP_5.1 {
omp_set_teams_thread_limit_8_;
omp_get_teams_thread_limit;
omp_get_teams_thread_limit_;
+   omp_get_mapped_ptr;
 } OMP_5.0.2;
 
 GOMP_1.0 {
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 161a423..c163b56 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -314,7 +314,7 @@ The OpenMP 4.5 specification is fully supported.
 @item @code{omp_target_is_accessible} runtime routine @tab N @tab
 @item @code{omp_target_memcpy_async} and @code{omp_target_memcpy_rect_async}
   runtime routines @tab N @tab
-@item @code{omp_get_mapped_ptr} runtime routine @tab N @tab
+@item @code{omp_get_mapped_ptr} runtime routine @tab Y @tab
 @item @code{omp_calloc}, @code{omp_realloc}, @code{omp_aligned_alloc} and
   @code{omp_aligned_calloc} runtime routines @tab Y @tab
 @item @code{omp_alloctrait_key_t} enum: @code{omp_atv_serialized} added,
diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index 89c5d65..18d0152 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -282,6 +282,7 @@ extern int omp_target_memcpy_rect (void *, const void *, __SIZE_TYPE__, int,
 extern int omp_target_associate_ptr (const void *, const void *, __SIZE_TYPE__,
 __SIZE_TYPE__, int) __GOMP_NOTHROW;
 extern int omp_target_disassociate_ptr (const void *, int) __GOMP_NOTHROW;
+extern void *omp_get_mapped_ptr (const void *, int) __GOMP_NOTHROW;
 
 extern void omp_set_affinity_format (const char *) __GOMP_NOTHROW;
 extern __SIZE_TYPE__ omp_get_affinity_format (char *, __SIZE_TYPE__)
diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in
index daf40dc..506f15c 100644
--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
@@ -835,6 +835,15 @@
   end function omp_target_disassociate_ptr
 end interface
 
+interface
+  function omp_get_mapped_ptr (ptr, device_num) bind(c)
+use, intrinsic :: iso_c_binding, only : c_ptr, c_int
+type(c_ptr) :: omp_get_mapped_ptr
+type(c_ptr), value :: ptr
+integer(c_int), value :: device_num
+  end function omp_get_mapped_ptr
+end

Test 'libgomp.oacc-*/kernels-private-vars-*' with '--param=openacc-kernels=decompose' [PR104784]

2022-03-04 Thread Thomas Schwinge
Hi!

On 2022-03-04T14:46:25+0100, I wrote:
> Pushed to master branch commit 8935589b496f755e08cadf26d8ceddf0dd6e0968
> "OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs
> [PR100280, PR104132, PR104133]", see attached.

Pushed to master branch commit e28eb86c18ed765dceb3c56471a848e9f0e120ff
"Test 'libgomp.oacc-*/kernels-private-vars-*' with
'--param=openacc-kernels=decompose' [PR104784]", see attached.


Regards
 Thomas


From e28eb86c18ed765dceb3c56471a848e9f0e120ff Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 16 Feb 2022 22:24:03 +0100
Subject: [PATCH] Test 'libgomp.oacc-*/kernels-private-vars-*' with
 '--param=openacc-kernels=decompose' [PR104784]

Before recent commit 8935589b496f755e08cadf26d8ceddf0dd6e0968
"OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs
[PR100280, PR104132, PR104133]", 'libgomp.oacc-c' testing already worked fine,
but 'libgomp.oacc-c++' testing ICEed.  Via the commit mentioned, the C++
testing ICEs are now resolved, but the underlying issue remains to be looked
into: PR104784 "OpenACC 'kernels' decomposition: C vs. C++ differences".

	PR middle-end/104784
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c:
	Test with '--param=openacc-kernels=decompose'.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-1.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-6.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-7.f90:
	Likewise.
---
 .../kernels-private-vars-local-worker-1.c | 23 +++
 .../kernels-private-vars-local-worker-2.c | 20 
 .../kernels-private-vars-local-worker-3.c | 20 
 .../kernels-private-vars-local-worker-4.c | 20 
 .../kernels-private-vars-local-worker-5.c | 20 
 .../kernels-private-vars-loop-gang-1.c| 11 ++---
 .../kernels-private-vars-loop-gang-2.c| 11 ++---
 .../kernels-private-vars-loop-gang-3.c| 11 ++---
 .../kernels-private-vars-loop-gang-4.c| 10 ++--
 .../kernels-private-vars-loop-gang-5.c| 11 ++---
 .../kernels-private-vars-l

Re: [PATCH] OpenMP, libgomp: Add new runtime routine omp_get_mapped_ptr.

2022-03-04 Thread Jakub Jelinek via Gcc-patches
On Fri, Mar 04, 2022 at 03:47:31PM +0100, Marcel Vollweiler wrote:
> libgomp/ChangeLog:
> 
>   * libgomp.map: Added omp_get_mapped_ptr.
>   * libgomp.texi: Tagged omp_get_mapped_ptr as supported.
>   * omp.h.in: Added omp_get_mapped_ptr.
>   * omp_lib.f90.in: Added interface for omp_get_mapped_ptr.
>   * omp_lib.h.in: Likewise.
>   * target.c (omp_get_mapped_ptr): Added implementation of
>   omp_get_mapped_ptr.
>   * testsuite/libgomp.c-c++-common/get-mapped-ptr-1.c: New test.
>   * testsuite/libgomp.c-c++-common/get-mapped-ptr-2.c: New test.
>   * testsuite/libgomp.c-c++-common/get-mapped-ptr-3.c: New test.
>   * testsuite/libgomp.c-c++-common/get-mapped-ptr-4.c: New test.
>   * testsuite/libgomp.fortran/get-mapped-ptr-1.f90: New test.
>   * testsuite/libgomp.fortran/get-mapped-ptr-2.f90: New test.
>   * testsuite/libgomp.fortran/get-mapped-ptr-3.f90: New test.
>   * testsuite/libgomp.fortran/get-mapped-ptr-4.f90: New test.
> 
> diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
> index 2ac5809..00a4858 100644
> --- a/libgomp/libgomp.map
> +++ b/libgomp/libgomp.map
> @@ -224,6 +224,7 @@ OMP_5.1 {
>   omp_set_teams_thread_limit_8_;
>   omp_get_teams_thread_limit;
>   omp_get_teams_thread_limit_;
> + omp_get_mapped_ptr;
>  } OMP_5.0.2;

I think it is too late for this to be targeted for GCC 12, and
for GCC 13 it will need to go into the OMP_5.1.1 symver.

>  GOMP_1.0 {
> diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
> index 161a423..c163b56 100644
> --- a/libgomp/libgomp.texi
> +++ b/libgomp/libgomp.texi
> @@ -314,7 +314,7 @@ The OpenMP 4.5 specification is fully supported.
>  @item @code{omp_target_is_accessible} runtime routine @tab N @tab
>  @item @code{omp_target_memcpy_async} and @code{omp_target_memcpy_rect_async}
>runtime routines @tab N @tab
> -@item @code{omp_get_mapped_ptr} runtime routine @tab N @tab
> +@item @code{omp_get_mapped_ptr} runtime routine @tab Y @tab
>  @item @code{omp_calloc}, @code{omp_realloc}, @code{omp_aligned_alloc} and
>@code{omp_aligned_calloc} runtime routines @tab Y @tab
>  @item @code{omp_alloctrait_key_t} enum: @code{omp_atv_serialized} added,
> diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
> index 89c5d65..18d0152 100644
> --- a/libgomp/omp.h.in
> +++ b/libgomp/omp.h.in
> @@ -282,6 +282,7 @@ extern int omp_target_memcpy_rect (void *, const void *, __SIZE_TYPE__, int,
>  extern int omp_target_associate_ptr (const void *, const void *, __SIZE_TYPE__,
>__SIZE_TYPE__, int) __GOMP_NOTHROW;
>  extern int omp_target_disassociate_ptr (const void *, int) __GOMP_NOTHROW;
> +extern void *omp_get_mapped_ptr (const void *, int) __GOMP_NOTHROW;
>  
>  extern void omp_set_affinity_format (const char *) __GOMP_NOTHROW;
>  extern __SIZE_TYPE__ omp_get_affinity_format (char *, __SIZE_TYPE__)
> diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in
> index daf40dc..506f15c 100644
> --- a/libgomp/omp_lib.f90.in
> +++ b/libgomp/omp_lib.f90.in
> @@ -835,6 +835,15 @@
>end function omp_target_disassociate_ptr
>  end interface
>  
> +interface
> +  function omp_get_mapped_ptr (ptr, device_num) bind(c)
> +use, intrinsic :: iso_c_binding, only : c_ptr, c_int
> +type(c_ptr) :: omp_get_mapped_ptr
> +type(c_ptr), value :: ptr
> +integer(c_int), value :: device_num
> +  end function omp_get_mapped_ptr
> +end interface
> +
>  #if _OPENMP >= 201811
>  !GCC$ ATTRIBUTES DEPRECATED :: omp_get_nested, omp_set_nested
>  #endif
> diff --git a/libgomp/omp_lib.h.in b/libgomp/omp_lib.h.in
> index ff857a4..0f48510 100644
> --- a/libgomp/omp_lib.h.in
> +++ b/libgomp/omp_lib.h.in
> @@ -416,3 +416,12 @@
>integer(c_int), value :: device_num
>  end function omp_target_disassociate_ptr
>end interface
> +
> +  interface
> +function omp_get_mapped_ptr (ptr, device_num) bind(c)
> +  use, intrinsic :: iso_c_binding, only : c_ptr, c_int
> +  type(c_ptr) :: omp_get_mapped_ptr
> +  type(c_ptr), value :: ptr
> +  integer(c_int), value :: device_num
> +end function omp_get_mapped_ptr
> +  end interface
> diff --git a/libgomp/target.c b/libgomp/target.c
> index 9017458..735d70b 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -3665,6 +3665,49 @@ omp_target_disassociate_ptr (const void *ptr, int device_num)
>return ret;
>  }
>  
> +void *
> +omp_get_mapped_ptr (const void *ptr, int device_num)
> +{
> +  if (device_num < 0 || device_num > omp_get_num_devices ())
> +return NULL;
> +
> +  if (device_num == omp_get_initial_device ())
> +return (void*)ptr;

Space before * and space after )

> +  struct gomp_device_descr *devicep = resolve_device (device_num);
> +  if (devicep == NULL)
> +return NULL;
> +
> +  if (!(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
> + 

[c++] New module mangling ABI

2022-03-04 Thread Nathan Sidwell


This implements a new module mangling ABI as the original one has a
few issues:

a) it was not demangleable (oops)

b) implemented a weak ownership model.

This implements a strong ownership model, so that exported entities
from named modules are mangled to include their module attachment.
This gives more informative linker diagnostics and better module
isolation.  Weak ownership was hoped to allow backwards compatibility
with non-modular code, but in practice was very brittle, and C++20
added new semantics for linkage declarations that cover the needed
functionality.

For the avoidance of doubt, Clang is also moving to this ABI, and
documentation will be added to the Itanium ABI specification.

--
Nathan Sidwell
From 73baba1ae1b8f3618c2d3b674117b8a462e0ca76 Mon Sep 17 00:00:00 2001
From: Nathan Sidwell 
Date: Wed, 2 Mar 2022 19:13:43 -0500
Subject: [PATCH 1/2] c++: New module mangling ABI

This implements a new module mangling ABI as the original one has a
few issues:

a) it was not demangleable (oops)

b) implemented a weak ownership model.

This implements a strong ownership model, so that exported entities
from named modules are mangled to include their module attachment.
This gives more informative linker diagnostics and better module
isolation.  Weak ownership was hoped to allow backwards compatibility
with non-modular code, but in practice was very brittle, and C++20
added new semantics for linkage declarations that cover the needed
functionality.

FAOD Clang is also moving to this ABI and documentation will be added
to the Itanium ABI specification.

gcc/cp/
	* cp-tree.h (mangle_identifier): Replace with ...
	(mangle_module_component): ... this.
	* mangle.cc (dump_substitution_candidates): Adjust.
	(add_substitution): Likewise.
	(find_substitution): Likewise.
	(unmangled_name_p): Likewise.
	(mangle_module_substitution): Reimplement.
	(mangle_module_component): New.
	(write_module, maybe_write_module): Adjust.
	(write_name): Drop modules here.
	(write_unqualified): Do them here instead.
	(mangle_global_init): Adjust.
	* module.cc (module_state::mangle): Adjust.
	(mangle_module): Likewise.
	(get_originating_module): Adjust.

gcc/testsuite/
	* g++.dg/modules/fn-inline-1_b.C: Adjust.
	* g++.dg/modules/fn-inline-1_c.C: Adjust.
	* g++.dg/modules/imp-inline-1_a.C: Adjust.
	* g++.dg/modules/imp-inline-1_b.C: Adjust.
	* g++.dg/modules/init-2_a.C: Adjust.
	* g++.dg/modules/init-2_b.C: Adjust.
	* g++.dg/modules/init-2_c.C: Adjust.
	* g++.dg/modules/member-def-2_d.C: Adjust.
	* g++.dg/modules/mod-sym-1.C: Adjust.
	* g++.dg/modules/mod-sym-2.C: Adjust.
	* g++.dg/modules/mod-sym-3.C: Adjust.
	* g++.dg/modules/sym-subst-1.C: Adjust.
	* g++.dg/modules/sym-subst-2_b.C: Adjust.
	* g++.dg/modules/sym-subst-3_a.C: Adjust.
	* g++.dg/modules/sym-subst-3_b.C: Adjust.
	* g++.dg/modules/sym-subst-4.C: Adjust.
	* g++.dg/modules/sym-subst-5.C: Adjust.
	* g++.dg/modules/sym-subst-6.C: Adjust.
	* g++.dg/modules/tpl-spec-1_a.C: Adjust.
	* g++.dg/modules/tpl-spec-2_b.C: Adjust.
	* g++.dg/modules/tpl-spec-2_d.C: Adjust.
	* g++.dg/modules/tpl-spec-3_a.C: Adjust.
	* g++.dg/modules/virt-1_a.C: Adjust.
	* g++.dg/modules/virt-2_a.C: Adjust.
	* g++.dg/modules/virt-2_b.C: Adjust.
	* g++.dg/modules/virt-2_c.C: Adjust.
	* g++.dg/modules/vtt-1_a.C: Adjust.
	* g++.dg/modules/vtt-1_b.C: Adjust.
---
 gcc/cp/cp-tree.h  |   8 +-
 gcc/cp/mangle.cc  | 124 ++
 gcc/cp/module.cc  |  27 ++--
 gcc/testsuite/g++.dg/modules/fn-inline-1_b.C  |   6 +-
 gcc/testsuite/g++.dg/modules/fn-inline-1_c.C  |   4 +-
 gcc/testsuite/g++.dg/modules/imp-inline-1_a.C |   4 +-
 gcc/testsuite/g++.dg/modules/imp-inline-1_b.C |  12 +-
 gcc/testsuite/g++.dg/modules/init-2_a.C   |   2 +-
 gcc/testsuite/g++.dg/modules/init-2_b.C   |   4 +-
 gcc/testsuite/g++.dg/modules/init-2_c.C   |   4 +-
 gcc/testsuite/g++.dg/modules/member-def-2_d.C |   2 +-
 gcc/testsuite/g++.dg/modules/mod-sym-1.C  |  13 +-
 gcc/testsuite/g++.dg/modules/mod-sym-2.C  |   4 +-
 gcc/testsuite/g++.dg/modules/mod-sym-3.C  |   8 +-
 gcc/testsuite/g++.dg/modules/sym-subst-1.C|   2 +-
 gcc/testsuite/g++.dg/modules/sym-subst-2_b.C  |   4 +-
 gcc/testsuite/g++.dg/modules/sym-subst-3_a.C  |   2 +-
 gcc/testsuite/g++.dg/modules/sym-subst-3_b.C  |   2 +-
 gcc/testsuite/g++.dg/modules/sym-subst-4.C|   2 +-
 gcc/testsuite/g++.dg/modules/sym-subst-5.C|   2 +-
 gcc/testsuite/g++.dg/modules/sym-subst-6.C|   2 +-
 gcc/testsuite/g++.dg/modules/tpl-spec-1_a.C   |   2 +-
 gcc/testsuite/g++.dg/modules/tpl-spec-2_b.C   |   2 +-
 gcc/testsuite/g++.dg/modules/tpl-spec-2_d.C   |   4 +-
 gcc/testsuite/g++.dg/modules/tpl-spec-3_a.C   |   2 +-
 gcc/testsuite/g++.dg/modules/virt-1_a.C   |   7 +-
 gcc/testsuite/g++.dg/modules/virt-2_a.C   |   6 +-
 gcc/testsuite/g++.dg/modules/virt-2_b.C   |   6 +-
 gcc/testsuite/g++.dg/modules/virt-2_c.C   |   6 +-
 gcc/testsuite/g++.dg/modules/vtt-1_a.C|   6 +-

[committed] c++: Add testcase for already fixed PR [PR103443]

2022-03-04 Thread Patrick Palka via Gcc-patches
Fixed by r12-7264.

PR c++/103443

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/consteval29.C: New test.
---
 gcc/testsuite/g++.dg/cpp2a/consteval29.C | 20 
 1 file changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/consteval29.C

diff --git a/gcc/testsuite/g++.dg/cpp2a/consteval29.C b/gcc/testsuite/g++.dg/cpp2a/consteval29.C
new file mode 100644
index 000..61590225bd6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/consteval29.C
@@ -0,0 +1,20 @@
+// PR c++/103443
+// { dg-do compile { target c++20 } }
+
+template
+struct A { };
+
+template
+consteval unsigned index_sequence2mask(A) {
+  if constexpr (sizeof...(Is) == 0u)
+return 0u;
+  else
+return ((1u << Is) | ...);
+}
+
+template{})>
+void use_mask();
+
+int main() {
+  use_mask();
+}
-- 
2.35.1.354.g715d08a9e5



Re: [PATCH] [i386] Optimize v4si broadcast for noavx512vl.

2022-03-04 Thread Richard Biener via Gcc-patches



> On 04.03.2022 at 03:30, Hongtao Liu via Gcc-patches
>  wrote:
> 
> On Fri, Mar 4, 2022 at 10:29 AM liuhongt via Gcc-patches
>  wrote:
>> 
>> This is incremental patch based on [1], it enables optimization as below
>> 
>> -   vbroadcastss.LC1(%rip), %xmm0
>> +   movl$-45, %edx
>> +   vmovd   %edx, %xmm0
>> +   vpshufd $0, %xmm0, %xmm0
>> 
>> According to microbenchmark, it's faster than broadcast from memory

Is that true even on AMD uarchs?

>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591162.html.
>> 
>> Bootstrapped and regtest on x86_64-linux-gnu{-m32,}.
>> Ok for trunk?
>> 
>> gcc/ChangeLog:
>> 
>>PR target/104704
>>* config/i386/sse.md (*vec_dupv4si): Add alternative $r and
>>corresponding post_reload splitter.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>* gcc.target/i386/pr100865-8a.c: Adjust testcase.
>>* gcc.target/i386/pr100865-8c.c: Ditto.
>>* gcc.target/i386/pr100865-9c.c: Ditto.
>> ---
>> gcc/config/i386/sse.md  | 41 -
>> gcc/testsuite/gcc.target/i386/pr100865-8a.c |  2 +-
>> gcc/testsuite/gcc.target/i386/pr100865-8c.c |  2 +-
>> gcc/testsuite/gcc.target/i386/pr100865-9c.c |  2 +-
>> 4 files changed, 35 insertions(+), 12 deletions(-)
>> 
>> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
>> index 3066ea3734a..d124545aa5d 100644
>> --- a/gcc/config/i386/sse.md
>> +++ b/gcc/config/i386/sse.md
>> @@ -25121,20 +25121,43 @@ (define_insn "vec_dupv4sf"
>>(set_attr "mode" "V4SF")])
>> 
>> (define_insn "*vec_dupv4si"
>> -  [(set (match_operand:V4SI 0 "register_operand" "=v,v,x")
>> +  [(set (match_operand:V4SI 0 "register_operand" "=v,v,x,v")
>>(vec_duplicate:V4SI
>> - (match_operand:SI 1 "nonimmediate_operand" "Yv,m,0")))]
>> + (match_operand:SI 1 "nonimmediate_operand" "Yv,m,0,$r")))]
>>   "TARGET_SSE"
>>   "@
>>%vpshufd\t{$0, %1, %0|%0, %1, 0}
>>vbroadcastss\t{%1, %0|%0, %1}
>> -   shufps\t{$0, %0, %0|%0, %0, 0}"
>> -  [(set_attr "isa" "sse2,avx,noavx")
>> -   (set_attr "type" "sselog1,ssemov,sselog1")
>> -   (set_attr "length_immediate" "1,0,1")
>> -   (set_attr "prefix_extra" "0,1,*")
>> -   (set_attr "prefix" "maybe_vex,maybe_evex,orig")
>> -   (set_attr "mode" "TI,V4SF,V4SF")])
>> +   shufps\t{$0, %0, %0|%0, %0, 0}
>> +   #"
>> +  [(set_attr "isa" "sse2,avx,noavx,noavx512vl")
>> +   (set_attr "type" "sselog1,ssemov,sselog1,sselog1")
>> +   (set_attr "length_immediate" "1,0,1,1")
>> +   (set_attr "prefix_extra" "0,1,*,0")
>> +   (set_attr "prefix" "maybe_vex,maybe_evex,orig,maybe_vex")
>> +   (set_attr "mode" "TI,V4SF,V4SF,TI")
>> +   (set (attr "preferred_for_speed")
>> + (cond [(eq_attr "alternative" "3")
>> + (symbol_ref "TARGET_INTER_UNIT_MOVES_TO_VEC")
>> +  ]
>> +  (symbol_ref "true")))])
>> +
>> +(define_split
>> +  [(set (match_operand:V4SI 0 "sse_reg_operand")
>> +   (vec_duplicate:V4SI
>> + (match_operand:SI 1 "general_reg_operand")))]
>> +  "TARGET_SSE && reload_completed
>> +   /* Disable this splitter if avx512vl_vec_dup_gprv4si insn is
>> +  available, because then we can broadcast from GPRs directly.  */
>> +   && !TARGET_AVX512VL"
>> +  [(const_int 0)]
>> +{
>> +  emit_insn (gen_vec_setv4si_0 (gen_lowpart (V4SImode, operands[0]),
>> +   CONST0_RTX (V4SImode),
>> +   gen_lowpart (SImode, operands[1])));
>> +  emit_insn (gen_vec_duplicatev4si (operands[0], operands[0]));
>> +  DONE;
>> +})
>> 
>> (define_insn "*vec_dupv2di"
>>   [(set (match_operand:V2DI 0 "register_operand" "=x,v,v,x")
>> diff --git a/gcc/testsuite/gcc.target/i386/pr100865-8a.c 
>> b/gcc/testsuite/gcc.target/i386/pr100865-8a.c
>> index 911b14d4a25..544a14db6f7 100644
>> --- a/gcc/testsuite/gcc.target/i386/pr100865-8a.c
>> +++ b/gcc/testsuite/gcc.target/i386/pr100865-8a.c
>> @@ -20,5 +20,5 @@ foo (void)
>> array[i] = MK_CONST128_BROADCAST_SIGNED (-45);
>> }
>> 
>> -/* { dg-final { scan-assembler-times "(?:vpbroadcastd|vpshufd)\[\\t 
>> \]+\[^\n\]*, %xmm\[0-9\]+" 1 { xfail *-*-* } } } */
>> +/* { dg-final { scan-assembler-times "(?:vpbroadcastd|vpshufd)\[\\t 
>> \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */
>> /* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } 
>> */
>> diff --git a/gcc/testsuite/gcc.target/i386/pr100865-8c.c 
>> b/gcc/testsuite/gcc.target/i386/pr100865-8c.c
>> index 00682edb8c9..efee0488614 100644
>> --- a/gcc/testsuite/gcc.target/i386/pr100865-8c.c
>> +++ b/gcc/testsuite/gcc.target/i386/pr100865-8c.c
>> @@ -3,5 +3,5 @@
>> 
>> #include "pr100865-8a.c"
>> 
>> -/* { dg-final { scan-assembler-times "vpshufd\[\\t \]+\[^\n\]*, 
>> %xmm\[0-9\]+" 1 { xfail *-*-* } } } */
>> +/* { dg-final { scan-assembler-times "vpshufd\[\\t \]+\[^\n\]*, 
>> %xmm\[0-9\]+" 1 } } */
>> /* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } } 
>> */
>> diff --git a/gcc/testsuite/gcc.target/i386/pr1008
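
For orientation, the register-based sequence discussed in the quoted patch
(move a GPR into the low vector lane, then replicate lane 0) can be written
with SSE2 intrinsics roughly as follows.  This is only a sketch: the helper
name broadcast4 is hypothetical, a scalar fallback is used when SSE2 is not
available, and the compiler remains free to pick different instructions than
the ones named in the comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Store VALUE into all four 32-bit lanes of OUT.  */
static void
broadcast4 (int32_t value, int32_t out[4])
{
  assert (out != NULL);
#if defined(__SSE2__)
  /* Corresponds to movd: integer register to the low lane, rest zeroed.  */
  __m128i v = _mm_cvtsi32_si128 (value);
  /* Corresponds to pshufd $0: copy lane 0 into every lane.  */
  v = _mm_shuffle_epi32 (v, 0);
  _mm_storeu_si128 ((__m128i *) out, v);
#else
  /* Portable fallback for targets without SSE2.  */
  for (int i = 0; i < 4; i++)
    out[i] = value;
#endif
}
```

Whether this is faster than loading the constant from memory depends on the
microarchitecture, which is the point raised in the reply above.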

[pushed] Darwin, libgcc: Fix build errors on powerpc-darwin8.

2022-03-04 Thread Iain Sandoe via Gcc-patches
PowerPC Darwin8 is the last version to use an unwind frame fallback routine.
This had been omitted from the new shared EH library, along with one more
header dependency that only fires there.

tested on x86_64-darwin18, powerpc-darwin9 and cross to powerpc-darwin8
pushed to master, thanks
Iain

Signed-off-by: Iain Sandoe 

libgcc/ChangeLog:

* config/rs6000/t-darwin-ehs: Add darwin-fallback.o.
* config/t-darwin-ehs: Add dependency on unwind.h.
---
 libgcc/config/rs6000/t-darwin-ehs | 4 ++--
 libgcc/config/t-darwin-ehs| 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/libgcc/config/rs6000/t-darwin-ehs b/libgcc/config/rs6000/t-darwin-ehs
index 42f521411af..581344e862a 100644
--- a/libgcc/config/rs6000/t-darwin-ehs
+++ b/libgcc/config/rs6000/t-darwin-ehs
@@ -1,3 +1,3 @@
-# We need the save_world code for the EH library.
+# We need the save_world and anu unwind fallback code for the EH library.
 
-LIBEHSOBJS += darwin-world_s.o
+LIBEHSOBJS += darwin-world_s.o darwin-fallback.o
diff --git a/libgcc/config/t-darwin-ehs b/libgcc/config/t-darwin-ehs
index 95275023dac..df46f8a6529 100644
--- a/libgcc/config/t-darwin-ehs
+++ b/libgcc/config/t-darwin-ehs
@@ -3,5 +3,5 @@
 
 LIBEHSOBJS = unwind-dw2_s.o unwind-dw2-fde-darwin_s.o unwind-c_s.o
 
-unwind-dw2_s.o: gthr-default.h md-unwind-support.h
+unwind-dw2_s.o: gthr-default.h md-unwind-support.h unwind.h
 $(LIBEHSOBJS): libgcc_tm.h
-- 
2.24.3 (Apple Git-128)



[pushed] Darwin: Fix a type mismatch warning for a non-GCC bootstrap compiler.

2022-03-04 Thread Iain Sandoe via Gcc-patches
DECL_MD_FUNCTION_CODE() returns an int; with one particular bootstrap compiler,
the code in darwin_fold_builtin() triggers a type mismatch warning.

Fixed thus.

tested on x86_64-darwin18, powerpc-darwin9, cross to powerpc-darwin8.
pushed to master, thanks
Iain

Signed-off-by: Iain Sandoe 

gcc/ChangeLog:

* config/darwin.cc (darwin_fold_builtin): Make fcode an int to
avoid a mismatch with DECL_MD_FUNCTION_CODE().
---
 gcc/config/darwin.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/darwin.cc b/gcc/config/darwin.cc
index 783fe3cb443..f065a13d73d 100644
--- a/gcc/config/darwin.cc
+++ b/gcc/config/darwin.cc
@@ -3621,7 +3621,7 @@ tree
 darwin_fold_builtin (tree fndecl, int n_args, tree *argp,
 bool ARG_UNUSED (ignore))
 {
-  unsigned int fcode = DECL_MD_FUNCTION_CODE (fndecl);
+  int fcode = DECL_MD_FUNCTION_CODE (fndecl);
 
   if (fcode == darwin_builtin_cfstring)
 {
-- 
2.24.3 (Apple Git-128)



Re: [PATCH] [i386] Optimize v4si broadcast for noavx512vl.

2022-03-04 Thread H.J. Lu via Gcc-patches
On Fri, Mar 4, 2022 at 8:40 AM Richard Biener via Gcc-patches
 wrote:
>
>
>
> > On 04.03.2022 at 03:30, Hongtao Liu via Gcc-patches wrote:
> >
> > On Fri, Mar 4, 2022 at 10:29 AM liuhongt via Gcc-patches
> >  wrote:
> >>
> >> This is incremental patch based on [1], it enables optimization as below
> >>
> >> -   vbroadcastss.LC1(%rip), %xmm0
> >> +   movl$-45, %edx
> >> +   vmovd   %edx, %xmm0
> >> +   vpshufd $0, %xmm0, %xmm0
> >>
> >> According to microbenchmark, it's faster than broadcast from memory
>
> Is that true even on AMD uarchs?

Please check TARGET_INTER_UNIT_MOVES_TO_VEC.

> >> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591162.html.
> >>
> >> Bootstrapped and regtest on x86_64-linux-gnu{-m32,}.
> >> Ok for trunk?
> >>
> >> gcc/ChangeLog:
> >>
> >>PR target/104704
> >>* config/i386/sse.md (*vec_dupv4si): Add alternative $r and
> >>corresponding post_reload splitter.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>* gcc.target/i386/pr100865-8a.c: Adjust testcase.
> >>* gcc.target/i386/pr100865-8c.c: Ditto.
> >>* gcc.target/i386/pr100865-9c.c: Ditto.
> >> ---
> >> gcc/config/i386/sse.md  | 41 -
> >> gcc/testsuite/gcc.target/i386/pr100865-8a.c |  2 +-
> >> gcc/testsuite/gcc.target/i386/pr100865-8c.c |  2 +-
> >> gcc/testsuite/gcc.target/i386/pr100865-9c.c |  2 +-
> >> 4 files changed, 35 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> >> index 3066ea3734a..d124545aa5d 100644
> >> --- a/gcc/config/i386/sse.md
> >> +++ b/gcc/config/i386/sse.md
> >> @@ -25121,20 +25121,43 @@ (define_insn "vec_dupv4sf"
> >>(set_attr "mode" "V4SF")])
> >>
> >> (define_insn "*vec_dupv4si"
> >> -  [(set (match_operand:V4SI 0 "register_operand" "=v,v,x")
> >> +  [(set (match_operand:V4SI 0 "register_operand" "=v,v,x,v")
> >>(vec_duplicate:V4SI
> >> - (match_operand:SI 1 "nonimmediate_operand" "Yv,m,0")))]
> >> + (match_operand:SI 1 "nonimmediate_operand" "Yv,m,0,$r")))]
> >>   "TARGET_SSE"
> >>   "@
> >>%vpshufd\t{$0, %1, %0|%0, %1, 0}
> >>vbroadcastss\t{%1, %0|%0, %1}
> >> -   shufps\t{$0, %0, %0|%0, %0, 0}"
> >> -  [(set_attr "isa" "sse2,avx,noavx")
> >> -   (set_attr "type" "sselog1,ssemov,sselog1")
> >> -   (set_attr "length_immediate" "1,0,1")
> >> -   (set_attr "prefix_extra" "0,1,*")
> >> -   (set_attr "prefix" "maybe_vex,maybe_evex,orig")
> >> -   (set_attr "mode" "TI,V4SF,V4SF")])
> >> +   shufps\t{$0, %0, %0|%0, %0, 0}
> >> +   #"
> >> +  [(set_attr "isa" "sse2,avx,noavx,noavx512vl")
> >> +   (set_attr "type" "sselog1,ssemov,sselog1,sselog1")
> >> +   (set_attr "length_immediate" "1,0,1,1")
> >> +   (set_attr "prefix_extra" "0,1,*,0")
> >> +   (set_attr "prefix" "maybe_vex,maybe_evex,orig,maybe_vex")
> >> +   (set_attr "mode" "TI,V4SF,V4SF,TI")
> >> +   (set (attr "preferred_for_speed")
> >> + (cond [(eq_attr "alternative" "3")
> >> + (symbol_ref "TARGET_INTER_UNIT_MOVES_TO_VEC")
> >> +  ]
> >> +  (symbol_ref "true")))])
> >> +
> >> +(define_split
> >> +  [(set (match_operand:V4SI 0 "sse_reg_operand")
> >> +   (vec_duplicate:V4SI
> >> + (match_operand:SI 1 "general_reg_operand")))]
> >> +  "TARGET_SSE && reload_completed
> >> +   /* Disable this splitter if avx512vl_vec_dup_gprv4si insn is
> >> +  available, because then we can broadcast from GPRs directly.  */
> >> +   && !TARGET_AVX512VL"
> >> +  [(const_int 0)]
> >> +{
> >> +  emit_insn (gen_vec_setv4si_0 (gen_lowpart (V4SImode, operands[0]),
> >> +   CONST0_RTX (V4SImode),
> >> +   gen_lowpart (SImode, operands[1])));
> >> +  emit_insn (gen_vec_duplicatev4si (operands[0], operands[0]));
> >> +  DONE;
> >> +})
> >>
> >> (define_insn "*vec_dupv2di"
> >>   [(set (match_operand:V2DI 0 "register_operand" "=x,v,v,x")
> >> diff --git a/gcc/testsuite/gcc.target/i386/pr100865-8a.c 
> >> b/gcc/testsuite/gcc.target/i386/pr100865-8a.c
> >> index 911b14d4a25..544a14db6f7 100644
> >> --- a/gcc/testsuite/gcc.target/i386/pr100865-8a.c
> >> +++ b/gcc/testsuite/gcc.target/i386/pr100865-8a.c
> >> @@ -20,5 +20,5 @@ foo (void)
> >> array[i] = MK_CONST128_BROADCAST_SIGNED (-45);
> >> }
> >>
> >> -/* { dg-final { scan-assembler-times "(?:vpbroadcastd|vpshufd)\[\\t 
> >> \]+\[^\n\]*, %xmm\[0-9\]+" 1 { xfail *-*-* } } } */
> >> +/* { dg-final { scan-assembler-times "(?:vpbroadcastd|vpshufd)\[\\t 
> >> \]+\[^\n\]*, %xmm\[0-9\]+" 1 } } */
> >> /* { dg-final { scan-assembler-times "vmovdqa\[\\t \]%xmm\[0-9\]+, " 16 } 
> >> } */
> >> diff --git a/gcc/testsuite/gcc.target/i386/pr100865-8c.c 
> >> b/gcc/testsuite/gcc.target/i386/pr100865-8c.c
> >> index 00682edb8c9..efee0488614 100644
> >> --- a/gcc/testsuite/gcc.target/i386/pr100865-8c.c
> >> +++ b/gcc/testsuite/gcc.target/i386/pr100865-8c.c
> >> @@ -3,5 +3,5 @@
> >>
> >> #include "pr100865-8a.c"
> >>
>

[PATCH] rs6000: Improve .machine

2022-03-04 Thread Segher Boessenkool
Hi!

This adds more correct .machine for most older CPUs.  It should be
conservative in the sense that everything we handled before we handle at
least as well now.  This does not yet revamp the server CPU handling, it
is too risky at this point in time.

Tested on powerpc64-linux {-m32,-m64}.  Also manually tested with all
-mcpu=, and the output of that passed through the GNU assembler.

I plan to commit this later today.


Segher


2022-03-04  Segher Boessenkool  

* config/rs6000/rs6000.cc (rs6000_machine_from_flags): Restructure a bit.
Handle most older CPUs.
---
 gcc/config/rs6000/rs6000.cc | 69 +++--
 1 file changed, 48 insertions(+), 21 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 78cc085d7855..f2b977bfe93c 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -5790,33 +5790,60 @@ const char *rs6000_machine;
 const char *
 rs6000_machine_from_flags (void)
 {
-  /* For some CPUs, the machine cannot be determined by ISA flags.  We have to
- check them first.  */
-  switch (rs6000_cpu)
-{
-case PROCESSOR_PPC8540:
-case PROCESSOR_PPC8548:
-  return "e500";
+  /* e300 and e500 */
+  if (rs6000_cpu == PROCESSOR_PPCE300C2 || rs6000_cpu == PROCESSOR_PPCE300C3)
+return "e300";
+  if (rs6000_cpu == PROCESSOR_PPC8540 || rs6000_cpu == PROCESSOR_PPC8548)
+return "e500";
+  if (rs6000_cpu == PROCESSOR_PPCE500MC)
+return "e500mc";
+  if (rs6000_cpu == PROCESSOR_PPCE500MC64)
+return "e500mc64";
+  if (rs6000_cpu == PROCESSOR_PPCE5500)
+return "e5500";
+  if (rs6000_cpu == PROCESSOR_PPCE6500)
+return "e6500";
 
-case PROCESSOR_PPCE300C2:
-case PROCESSOR_PPCE300C3:
-  return "e300";
+  /* 400 series */
+  if (rs6000_cpu == PROCESSOR_PPC403)
+return "\"403\"";
+  if (rs6000_cpu == PROCESSOR_PPC405)
+return "\"405\"";
+  if (rs6000_cpu == PROCESSOR_PPC440)
+return "\"440\"";
+  if (rs6000_cpu == PROCESSOR_PPC476)
+return "\"476\"";
 
-case PROCESSOR_PPCE500MC:
-  return "e500mc";
+  /* A2 */
+  if (rs6000_cpu == PROCESSOR_PPCA2)
+return "a2";
 
-case PROCESSOR_PPCE500MC64:
-  return "e500mc64";
+  /* Cell BE */
+  if (rs6000_cpu == PROCESSOR_CELL)
+return "cell";
 
-case PROCESSOR_PPCE5500:
-  return "e5500";
+  /* Titan */
+  if (rs6000_cpu == PROCESSOR_TITAN)
+return "titan";
 
-case PROCESSOR_PPCE6500:
-  return "e6500";
+  /* 500 series and 800 series */
+  if (rs6000_cpu == PROCESSOR_MPCCORE)
+return "\"821\"";
 
-default:
-  break;
-}
+  /* 600 series and 700 series, "classic" */
+  if (rs6000_cpu == PROCESSOR_PPC601 || rs6000_cpu == PROCESSOR_PPC603
+  || rs6000_cpu == PROCESSOR_PPC604 || rs6000_cpu == PROCESSOR_PPC604e
+  || rs6000_cpu == PROCESSOR_PPC750 || rs6000_cpu == PROCESSOR_POWERPC)
+return "ppc";
+
+  /* Classic with AltiVec, "G4" */
+  if (rs6000_cpu == PROCESSOR_PPC7400 || rs6000_cpu == PROCESSOR_PPC7450)
+return "\"7450\"";
+
+  /* The older 64-bit CPUs */
+  if (rs6000_cpu == PROCESSOR_PPC620 || rs6000_cpu == PROCESSOR_PPC630
+  || rs6000_cpu == PROCESSOR_RS64A || rs6000_cpu == PROCESSOR_POWERPC64)
+return "ppc64";
 
   HOST_WIDE_INT flags = rs6000_isa_flags;
 
-- 
1.8.3.1



Re: [PATCH] libstdc++: vxworks: remove stray include

2022-03-04 Thread Olivier Hainque via Gcc-patches
Good for me, thanks Rasmus.


> On 4 Mar 2022, at 09:27, Rasmus Villemoes  wrote:
> 
> There doesn't seem to be any reason for this TU to include
> , and it causes errors when the resulting libstdc++ is used
> on our VxWorks 5.5 target - presumably because now libstdc++ itself
> contains an instance of std::ios_base::Init. Which should be mostly
> harmless, but apparently isn't, and from a QoI viewpoint should
> probably be avoided anyway.
> ---
> libstdc++-v3/config/locale/vxworks/ctype_members.cc | 1 -
> 1 file changed, 1 deletion(-)
> 
> diff --git a/libstdc++-v3/config/locale/vxworks/ctype_members.cc 
> b/libstdc++-v3/config/locale/vxworks/ctype_members.cc
> index 82569d075c6..d8ca551078d 100644
> --- a/libstdc++-v3/config/locale/vxworks/ctype_members.cc
> +++ b/libstdc++-v3/config/locale/vxworks/ctype_members.cc
> @@ -33,7 +33,6 @@
> #include 
> #include 
> #include 
> -#include 
> 
> namespace std _GLIBCXX_VISIBILITY(default)
> {
> -- 
> 2.31.1
> 



libgo patch committed: Fix AIX build

2022-03-04 Thread Ian Lance Taylor via Gcc-patches
This patch by Clément Chigot fixes the build of libgo on AIX, which
was broken in the update to the Go 1.18 release.  Bootstrapped and ran
Go testsuite on x86_64-unknown-linux-gnu.  Committed to mainline.

Ian
98a7a7b5275b226932f503cc1dcc21fd9a9f8476
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 424bbebfeed..5cf2ace711b 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-45fd14ab8baf5e86012a808426f8ef52c1d77943
+34dece725f9f8826f4abe86209112626867bc716
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/go/internal/syscall/unix/ioctl_aix.go 
b/libgo/go/internal/syscall/unix/ioctl_aix.go
index af105d6158b..1a768048ba8 100644
--- a/libgo/go/internal/syscall/unix/ioctl_aix.go
+++ b/libgo/go/internal/syscall/unix/ioctl_aix.go
@@ -12,7 +12,7 @@ import (
 //extern __go_ioctl_ptr
 func ioctl(int32, int32, unsafe.Pointer) int32
 
-func Ioctl(fd int, cmd int, args uintptr) (err error) {
+func Ioctl(fd int, cmd int, args unsafe.Pointer) (err error) {
if ioctl(int32(fd), int32(cmd), unsafe.Pointer(args)) < 0 {
return syscall.GetErrno()
}
diff --git a/libgo/go/os/user/listgroups_unix.go 
b/libgo/go/os/user/listgroups_unix.go
index b3cf839b3ec..af9b544bcbe 100644
--- a/libgo/go/os/user/listgroups_unix.go
+++ b/libgo/go/os/user/listgroups_unix.go
@@ -14,7 +14,6 @@ import (
"io"
"os"
"strconv"
-   "syscall"
 )
 
 const groupFile = "/etc/group"
diff --git a/libgo/go/runtime/malloc.go b/libgo/go/runtime/malloc.go
index e5ab8dedafa..7c019ee42d3 100644
--- a/libgo/go/runtime/malloc.go
+++ b/libgo/go/runtime/malloc.go
@@ -321,7 +321,7 @@ const (
//
// On other platforms, the user address space is contiguous
// and starts at 0, so no offset is necessary.
-   arenaBaseOffset = 0x8000*goarch.IsAmd64 + 
0x0a00*goos.IsAix
+   arenaBaseOffset = 0x8000*goarch.IsAmd64 + 
0x0a00*goos.IsAix*goarch.IsPpc64
// A typed version of this constant that will make it into DWARF (for 
viewcore).
arenaBaseOffsetUintptr = uintptr(arenaBaseOffset)
 
diff --git a/libgo/go/runtime/os_aix.go b/libgo/go/runtime/os_aix.go
index d43765ab884..943cd2205d1 100644
--- a/libgo/go/runtime/os_aix.go
+++ b/libgo/go/runtime/os_aix.go
@@ -7,7 +7,6 @@
 package runtime
 
 import (
-   "internal/abi"
"unsafe"
 )
 


[PATCH] c++: detecting copy-init context during CTAD [PR102137]

2022-03-04 Thread Patrick Palka via Gcc-patches
Here we're failing to communicate to cp_finish_decl from tsubst_expr
that we're in a copy-initialization context (via the LOOKUP_ONLYCONVERTING
flag), which causes do_class_deduction to always consider explicit
deduction guides when performing CTAD for a templated variable initializer.

We could fix this by passing LOOKUP_ONLYCONVERTING appropriately when
calling cp_finish_decl from tsubst_expr, but it seems do_class_deduction
can determine if we're in a copy-init context by simply inspecting the
initializer, and thus render its flags parameter unnecessary, which is
what this patch implements.  (If we were to fix this in tsubst_expr
instead, I think we'd have to inspect the initializer in the same way
in order to detect a copy-init context?)
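
As a reminder of the language rule involved, here is a small sketch of
how the form of the initializer decides whether an explicit deduction
guide takes part.  It is not the new testcase: 'Box' and the two helper
functions are made up for illustration, and the code is deliberately
non-templated, whereas the PR concerns the same rule when the variable is
instantiated from a template.

```cpp
#include <type_traits>

template <class T>
struct Box {
  Box(T) {}
};

// Explicit deduction guide: a candidate for direct-initialization only.
explicit Box(int) -> Box<long>;

// Direct-initialization: the explicit guide is considered and is preferred
// over the implicit guide generated from Box(T).
inline auto direct_init() { Box b(1); return b; }

// Copy-initialization (not list-init): explicit guides are pruned, so only
// the implicit guide remains and T is deduced as int.
inline auto copy_init() { Box b = 1; return b; }

static_assert(std::is_same_v<decltype(direct_init()), Box<long>>);
static_assert(std::is_same_v<decltype(copy_init()), Box<int>>);
```

The two declarations differ only in the shape of the initializer, which
is the property the patch relies on instead of a flags argument.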

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/102137

gcc/cp/ChangeLog:

* cp-tree.h (do_auto_deduction): Remove flags parameter.
* decl.cc (cp_finish_decl): Adjust call to do_auto_deduction.
* pt.cc (convert_template_argument): Likewise.
(do_class_deduction): Remove flags parameter and instead
determine if we're in a copy-init context by inspecting the
initializer.
(do_auto_deduction): Adjust call to do_class_deduction.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction108.C: New test.
---
 gcc/cp/cp-tree.h  |  3 +-
 gcc/cp/decl.cc|  2 +-
 gcc/cp/pt.cc  | 23 +++--
 .../g++.dg/cpp1z/class-deduction108.C | 32 +++
 4 files changed, 48 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction108.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index ac723901098..c2ef6544389 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7279,8 +7279,7 @@ extern tree do_auto_deduction   (tree, 
tree, tree,
 = tf_warning_or_error,
  auto_deduction_context
 = adc_unspecified,
-tree = NULL_TREE,
-int = LOOKUP_NORMAL);
+tree = NULL_TREE);
 extern tree type_uses_auto (tree);
 extern tree type_uses_auto_or_concept  (tree);
 extern void append_type_to_template_for_access_check (tree, tree, tree,
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 199ac768d43..152f657e9f2 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -8039,7 +8039,7 @@ cp_finish_decl (tree decl, tree init, bool 
init_const_expr_p,
outer_targs = DECL_TI_ARGS (decl);
   type = TREE_TYPE (decl) = do_auto_deduction (type, d_init, auto_node,
   tf_warning_or_error, adc,
-  outer_targs, flags);
+  outer_targs);
   if (type == error_mark_node)
return;
   if (TREE_CODE (type) == FUNCTION_TYPE)
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index d94d4538faa..66fc8cacdc6 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -8567,8 +8567,7 @@ convert_template_argument (tree parm,
   can happen in the context of -fnew-ttp-matching.  */;
   else if (tree a = type_uses_auto (t))
{
- t = do_auto_deduction (t, arg, a, complain, adc_unify, args,
-LOOKUP_IMPLICIT);
+ t = do_auto_deduction (t, arg, a, complain, adc_unify, args);
  if (t == error_mark_node)
return error_mark_node;
}
@@ -29832,8 +29831,7 @@ ctad_template_p (tree tmpl)
type.  */
 
 static tree
-do_class_deduction (tree ptype, tree tmpl, tree init,
-   int flags, tsubst_flags_t complain)
+do_class_deduction (tree ptype, tree tmpl, tree init, tsubst_flags_t complain)
 {
   /* We should have handled this in the caller.  */
   if (DECL_TEMPLATE_TEMPLATE_PARM_P (tmpl))
@@ -29881,6 +29879,13 @@ do_class_deduction (tree ptype, tree tmpl, tree init,
   if (type_dependent_expression_p (init))
 return ptype;
 
+  bool copy_init_p = true;
+  if (!init
+  || TREE_CODE (init) == TREE_LIST
+  || (BRACE_ENCLOSED_INITIALIZER_P (init)
+ && CONSTRUCTOR_IS_DIRECT_INIT (init)))
+copy_init_p = false;
+
   tree type = TREE_TYPE (tmpl);
 
   bool try_list_ctor = false;
@@ -29929,7 +29934,7 @@ do_class_deduction (tree ptype, tree tmpl, tree init,
   /* Prune explicit deduction guides in copy-initialization context (but
  not copy-list-initialization).  */
   bool elided = false;
-  if (!list_init_p && (flags & LOOKUP_ONLYCONVERTING))
+  if (!list_init_p && copy_init_p)
 {
   for (lkp_iterator iter (cands); !elided && iter; ++iter)
if (DECL_NONCONVERTING_P (STRIP_TEMPLATE (*i

libgo patch committed: move semaphore to gotool packages

2022-03-04 Thread Ian Lance Taylor via Gcc-patches
This patch by Clément Chigot moves golang.org/x/sync/semaphore from
the libgo packages to the gotools packages, since it is only used by
gofmt.  Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.
Committed to mainline.

Ian
e71079517f16fee6759bad2be14f574c3548743e
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 5cf2ace711b..7778cd91235 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-34dece725f9f8826f4abe86209112626867bc716
+943b95876ca0f14c3cea7067d33170ba76cf0fab
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/gotool-packages.txt b/libgo/gotool-packages.txt
index 78ce9ba602a..8e105030a7b 100644
--- a/libgo/gotool-packages.txt
+++ b/libgo/gotool-packages.txt
@@ -63,6 +63,7 @@ golang.org/x/mod/sumdb/dirhash
 golang.org/x/mod/sumdb/note
 golang.org/x/mod/sumdb/tlog
 golang.org/x/mod/zip
+golang.org/x/sync/semaphore
 golang.org/x/tools/go/analysis
 golang.org/x/tools/go/analysis/internal/analysisflags
 golang.org/x/tools/go/analysis/internal/facts
diff --git a/libgo/libgo-packages.txt b/libgo/libgo-packages.txt
index cb2f19d61b3..6b722e1c3ec 100644
--- a/libgo/libgo-packages.txt
+++ b/libgo/libgo-packages.txt
@@ -95,7 +95,6 @@ golang.org/x/net/http/httpproxy
 golang.org/x/net/http2/hpack
 golang.org/x/net/idna
 golang.org/x/net/nettest
-golang.org/x/sync/semaphore
 golang.org/x/sys/cpu
 golang.org/x/text/secure/bidirule
 golang.org/x/text/transform


libgo patch committed: Skip _FILE in mkruntimeinc

2022-03-04 Thread Ian Lance Taylor via Gcc-patches
This libgo patch skips the _FILE struct in mkruntimeinc.sh.  We don't
need it, and it breaks uclibc.  This should fix GCC PR 101246.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
59a20b189dcbda8d929503ae1b1f864535a27584
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 7778cd91235..e68d2d967cc 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-943b95876ca0f14c3cea7067d33170ba76cf0fab
+787fd4475f9d9101bc138d0b9763b0f5ecca89a9
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/mkruntimeinc.sh b/libgo/mkruntimeinc.sh
index 61d830af876..5ef3eca25cc 100755
--- a/libgo/mkruntimeinc.sh
+++ b/libgo/mkruntimeinc.sh
@@ -18,13 +18,14 @@ rm -f runtime.inc.tmp2 runtime.inc.tmp3
 # sigset conflicts with system type sigset on AIX, so we need to rename it.
 # boundsError has a field name that is a C keyword, and we don't need it.
 # mSpanInuse is both a constant and a field name, and we don't need it.
+# _FILE has incomplete __lock and __state fields on uclibc-ng.
 
 grep -v "#define _" ${IN} | grep -v "#define [cm][012345] " | grep -v "#define 
empty " | grep -v "#define \\$" | grep -v "#define mSpanInUse " > 
runtime.inc.tmp2
 for pattern in '_[GP][a-z]' _Max _Lock _Sig _Trace _MHeap _Num
 do
   grep "#define $pattern" ${IN} >> runtime.inc.tmp2
 done
-TYPES="_Complex_lock _Reader_lock semt boundsError"
+TYPES="_Complex_lock _Reader_lock semt boundsError _FILE"
 for TYPE in $TYPES
 do
   sed -e '/struct '${TYPE}' {/,/^}/s/^.*$//' runtime.inc.tmp2 > 
runtime.inc.tmp3;


[committed] analyzer: reduce svalue depth limit from 13 to 12 [PR103521]

2022-03-04 Thread David Malcolm via Gcc-patches
PR analyzer/103521 reports that commit 
r12-5585-g132902177138c09803d639e12b1daebf2b9edddc
("analyzer: further false leak fixes due to overzealous state merging 
[PR103217]")
led to failures of gcc.dg/analyzer/pr93032-mztools.c on some targets,
where rather than reporting FILE * leaks, the analyzer would hit
complexity limits and give up.

The cause is that pr93032-mztools.c has some 'unsigned char' values that
are copied to 'char'.  On targets where 'char' defaults to being signed,
this leads to casts, whereas on targets where 'char' defaults to being
unsigned, no casts are needed.
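
To make the target difference concrete, here is a small sketch (not
taken from the testcase; the function name is made up) of copying an
'unsigned char' into a plain 'char'.  It assumes 8-bit chars and needs
C++14.  Where plain 'char' is signed the copy is a value-changing
conversion for bytes above 127; where it is unsigned the value passes
through untouched.

```cpp
#include <limits>

// Copy an 'unsigned char' into a plain 'char', then widen to int so the
// resulting value can be inspected.
constexpr int through_plain_char(unsigned char u) {
  char c = static_cast<char>(u);  // converts only if plain char is signed
  return static_cast<int>(c);
}

constexpr bool char_is_signed = std::numeric_limits<char>::is_signed;

// Values that fit in either flavour of char are unaffected.
static_assert(through_plain_char(65) == 65, "small values are unchanged");
// Values above 127 wrap to a negative number only when plain char is signed.
static_assert(through_plain_char(200) == (char_is_signed ? -56 : 200),
              "large values depend on the signedness of plain char");
```

Building the same file with -fsigned-char and with -funsigned-char
exercises both branches of the second check.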

When the casts occur, various symbolic values within the loop (the
locals 'crc', 'cpsize', and 'uncpsize') become sufficiently complex as
to hit the --param=analyzer-max-svalue-depth= limit, and are treated as
UNKNOWN, allowing the analysis of the loop to quickly terminate, with
much of this state as UNKNOWN (but retaining the FILE * information, and
thus correctly reporting the FILE * leaks).

Without the casts, the symbolic values for these variables don't quite
hit the complexity limit, and the analyzer attempts to track these
values in the loop, leading to the analyzer eventually hitting the
per-program-point limit on the number of states, and giving up on
these execution paths, thus failing to report the FILE * leaks.

This patch tweaks the default value of the param
  --param=analyzer-max-svalue-depth=
from 13 down to 12.  This allows the pr93032-mztools.c testcase to
succeed with both -fsigned-char and -funsigned-char, and thus allows
this integration test to succeed on both styles of target without
requiring extra command-line flags.  The patch duplicates the test so
it runs with both -fsigned-char and -funsigned-char.

My hope is that this will allow similar cases to terminate loop analysis
earlier.  I tried reducing it further, but doing so caused some test
cases to regress.

The tradeoff here is between:
(a) precision of individual states in the analysis, versus
(b) maximizing code-path coverage in the analysis

I can imagine a more nuanced approach that splits the current
per-program-point hard limit into soft and hard limits: on hitting the
soft limit at a program point, go into a less precise mode for states
at that program point, in the hope that we can fully explore execution
paths beyond it without hitting the hard limit, but this seems like
GCC 13 material.

Another possible future fix might be for the analysis plan to make an
attempt to prioritize parts of the code in an enode budget, rather than
setting the same hard limit uniformly across all program points.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Lightly tested by hand with --target=powerpc64le-linux-gnu.
Pushed to trunk as r12-7494-g458ad38ce2bbec85016d88757ec6a35d2c393e2c.
Please let me know if this regresses anything on the various configurations.

gcc/analyzer/ChangeLog:
PR analyzer/103521
* analyzer.opt (-param=analyzer-max-svalue-depth=): Reduce from 13
to 12.

gcc/testsuite/ChangeLog:
PR analyzer/103521
* gcc.dg/analyzer/pr93032-mztools.c: Move to...
* gcc.dg/analyzer/pr93032-mztools-signed-char.c: ...this, adding
-fsigned-char to args, and...
* gcc.dg/analyzer/pr93032-mztools-unsigned-char.c: ...copy to here,
adding -funsigned-char to args.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/analyzer.opt |   2 +-
 ...ztools.c => pr93032-mztools-signed-char.c} |   1 +
 .../analyzer/pr93032-mztools-unsigned-char.c  | 332 ++
 3 files changed, 334 insertions(+), 1 deletion(-)
 rename gcc/testsuite/gcc.dg/analyzer/{pr93032-mztools.c => 
pr93032-mztools-signed-char.c} (99%)
 create mode 100644 
gcc/testsuite/gcc.dg/analyzer/pr93032-mztools-unsigned-char.c

diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index d20ac5c34ee..b9d2ece273c 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -43,7 +43,7 @@ Common Joined UInteger 
Var(param_analyzer_max_recursion_depth) Init(2) Param
 The maximum number of times a callsite can appear in a call stack within the 
analyzer, before terminating analysis of a call that would recurse deeper.
 
 -param=analyzer-max-svalue-depth=
-Common Joined UInteger Var(param_analyzer_max_svalue_depth) Init(13) Param
+Common Joined UInteger Var(param_analyzer_max_svalue_depth) Init(12) Param
 The maximum depth of a symbolic value, before approximating the value as 
unknown.
 
 -param=analyzer-min-snodes-for-call-summary=
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr93032-mztools.c 
b/gcc/testsuite/gcc.dg/analyzer/pr93032-mztools-signed-char.c
similarity index 99%
rename from gcc/testsuite/gcc.dg/analyzer/pr93032-mztools.c
rename to gcc/testsuite/gcc.dg/analyzer/pr93032-mztools-signed-char.c
index 88ab5bf8c18..1f3df7c211f 100644
--- a/gcc/testsuite/gcc.dg/analyzer/pr93032-mztools.c
+++ b/gcc/testsuite/gcc.dg/analyzer/pr93032-mztools-signed-char.c
@@ -4,6 +4,7 @@
 

[no subject]

2022-03-04 Thread Thomas Schwinge
Hi!

On 2022-03-04T14:46:25+0100, I wrote:
> Pushed to master branch commit 8935589b496f755e08cadf26d8ceddf0dd6e0968
> "OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs
> [PR100280, PR104132, PR104133]", see attached.

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
> [...]
> @@ -27,8 +31,12 @@ int main()
>(volatile int *) &a;
>  #define N 123
>int b[N] = { 0 };
> +  unsigned long long f1;
> +  /*TODO See above.  */
> +  (volatile void *) &f1;

Ah, the famous last-minute change just before 'git push'...  To work
around execution failure with GCN offloading, we're explicitly making
'f1' addressable here -- but I didn't realize that this also affects
diagnostics, sorry.

Pushed to master branch commit 14dfbb53594e164fe222476523a68039a8bd5252
"Fix 'libgomp.oacc-c-c++-common/kernels-decompose-1.c' expected
diagnostics", see attached.


Regards
 Thomas


>
>  #pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
> +  /* { dg-note {variable 'g2\.0' declared in block isn't candidate for 
> adjusting OpenACC privatization level: not addressable} {} { target *-*-* } 
> l_compute$c_compute } */
>{
> [...]
> +/* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} 
> {} { target *-*-* } .+1 } */
> +  f1 = 1;
> +  /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 
> 'parloops' for analysis} {} { target *-*-* } .+1 } */
> +#pragma acc loop /* { dg-line l_loop_c[incr c_loop_c] } */
> +  /* { dg-note {variable 'c' in 'private' clause is candidate for 
> adjusting OpenACC privatization level} {} { target *-*-* } l_loop_c$c_loop_c 
> } */
> +  /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target 
> *-*-* } l_loop_c$c_loop_c } */
> +  for (c = 20; c > 0; --c)
> + f1 *= c;
> +
> +  /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} 
> {} { target *-*-* } .+1 } */
> +  if (c != 234)
> + __builtin_abort ();
> +  /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target 
> *-*-* } l_compute$c_compute } */
> +}
>}
> [...]
> +  assert (f1 == 243290200817664ULL);
>
>return 0;
>  }


>From 14dfbb53594e164fe222476523a68039a8bd5252 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 4 Mar 2022 20:34:40 +0100
Subject: [PATCH] Fix 'libgomp.oacc-c-c++-common/kernels-decompose-1.c'
 expected diagnostics

Fix-up for recent commit 8935589b496f755e08cadf26d8ceddf0dd6e0968
"OMP lowering: Regimplify 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' DECLs
[PR100280, PR104132, PR104133]": adjust for a GCN offloading workaround
added just before commit: '(volatile void *) &f1;'.

	PR testsuite/104791
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Fix
	expected diagnostics.
---
 .../testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c   | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
index 049b3a44b03..985a547d381 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
@@ -37,6 +37,8 @@ int main()
 
 #pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
   /* { dg-note {variable 'g2\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
+  /* { dg-note {variable 'f1\.1' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
+  /* { dg-note {variable 'f1\.2' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
   {
 /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
 int c = 234;
-- 
2.25.1



[PATCH] c++: Standard mangling abbreviations & modules

2022-03-04 Thread Nathan Sidwell


The std manglings for things like std::string should not apply if
we're not in the global module.

nathan

--
Nathan Sidwell
From 591d2130348b15ec9158bb69a7fd9442bb81fa3a Mon Sep 17 00:00:00 2001
From: Nathan Sidwell 
Date: Wed, 2 Mar 2022 19:42:23 -0500
Subject: [PATCH] c++: Standard mangling abbreviations & modules

The std manglings for things like std::string should not apply if
we're not in the global module.

gcc/cp/
	* mangle.cc (is_std_substitution): Check global module.
	(is_std_substitution_char): Return bool.
gcc/testsuite/
	* g++.dg/modules/std-subst-2.C: New.
	* g++.dg/modules/std-subst-3.C: New.
	* g++.dg/modules/std-subst-4_a.C: New.
	* g++.dg/modules/std-subst-4_b.C: New.
	* g++.dg/modules/std-subst-4_c.C: New.
---
 gcc/cp/mangle.cc | 32 +++---
 gcc/testsuite/g++.dg/modules/std-subst-2.C   | 13 
 gcc/testsuite/g++.dg/modules/std-subst-3.C   | 34 
 gcc/testsuite/g++.dg/modules/std-subst-4_a.C | 14 
 gcc/testsuite/g++.dg/modules/std-subst-4_b.C | 14 
 gcc/testsuite/g++.dg/modules/std-subst-4_c.C | 16 +
 6 files changed, 112 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/std-subst-2.C
 create mode 100644 gcc/testsuite/g++.dg/modules/std-subst-3.C
 create mode 100644 gcc/testsuite/g++.dg/modules/std-subst-4_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/std-subst-4_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/std-subst-4_c.C

diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc
index 6657ce4d983..dbcec0a55bc 100644
--- a/gcc/cp/mangle.cc
+++ b/gcc/cp/mangle.cc
@@ -180,9 +180,9 @@ static tree maybe_template_info (const tree);
 
 static inline tree canonicalize_for_substitution (tree);
 static void add_substitution (tree);
-static inline int is_std_substitution (const tree,
+static inline bool is_std_substitution (const tree,
    const substitution_identifier_index_t);
-static inline int is_std_substitution_char (const tree,
+static inline bool is_std_substitution_char (const tree,
 	const substitution_identifier_index_t);
 static int find_substitution (tree);
 static void mangle_call_offset (const tree, const tree);
@@ -467,9 +467,10 @@ add_substitution (tree node)
 
 /* Helper function for find_substitution.  Returns nonzero if NODE,
which may be a decl or a CLASS_TYPE, is a template-id with template
-   name of substitution_index[INDEX] in the ::std namespace.  */
+   name of substitution_index[INDEX] in the ::std namespace, with
+   global module attachment.  */
 
-static inline int
+static bool
 is_std_substitution (const tree node,
 		 const substitution_identifier_index_t index)
 {
@@ -488,13 +489,22 @@ is_std_substitution (const tree node,
 }
   else
 /* These are not the droids you're looking for.  */
-return 0;
+return false;
 
-  return (DECL_NAMESPACE_STD_P (CP_DECL_CONTEXT (decl))
-	  && TYPE_LANG_SPECIFIC (type)
-	  && TYPE_TEMPLATE_INFO (type)
-	  && (DECL_NAME (TYPE_TI_TEMPLATE (type))
-	  == subst_identifiers[index]));
+  if (!DECL_NAMESPACE_STD_P (CP_DECL_CONTEXT (decl)))
+return false;
+
+  if (!(TYPE_LANG_SPECIFIC (type) && TYPE_TEMPLATE_INFO (type)))
+return false;
+
+  tree tmpl = TYPE_TI_TEMPLATE (type);
+  if (DECL_NAME (tmpl) != subst_identifiers[index])
+return false;
+
+  if (modules_p () && get_originating_module (tmpl, true) >= 0)
+return false;
+
+  return true;
 }
 
 /* Return the ABI tags (the TREE_VALUE of the "abi_tag" attribute entry) for T,
@@ -526,7 +536,7 @@ get_abi_tags (tree t)
::std::identifier, where identifier is
substitution_index[INDEX].  */
 
-static inline int
+static bool
 is_std_substitution_char (const tree node,
 			  const substitution_identifier_index_t index)
 {
diff --git a/gcc/testsuite/g++.dg/modules/std-subst-2.C b/gcc/testsuite/g++.dg/modules/std-subst-2.C
new file mode 100644
index 000..e7c77063a93
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/std-subst-2.C
@@ -0,0 +1,13 @@
+// { dg-additional-options "-fmodules-ts" }
+export module FOO;
+// { dg-module-cmi FOO }
+namespace Outer {
+class Y;
+class Inner {
+  class X;
+  void Fn (X &, Y &); // #2
+};
+void Inner::Fn (X &, Y &) {}
+}
+
+// { dg-final { scan-assembler {_ZN5OuterW3FOO5Inner2FnERNS1_1XERNS_S0_1YE:} } }
diff --git a/gcc/testsuite/g++.dg/modules/std-subst-3.C b/gcc/testsuite/g++.dg/modules/std-subst-3.C
new file mode 100644
index 000..75b81acf2f6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/std-subst-3.C
@@ -0,0 +1,34 @@
+// { dg-additional-options "-fmodules-ts -Wno-pedantic" }
+
+module;
+# 5 __FILE__ 1
+class Pooh;
+class Piglet;
+# 8 "" 2
+
+export module std; // might happen, you can't say it won't!
+// { dg-module-cmi std }
+
+namespace std {
+export template<typename T> class allocator {
+// just for testing, not real!
+void M (T *);
+template<typename U> U *N (T *);
+};
+
+template<typename T> void allocator<T>::M (T *) {}
+template<typename T> template<typename U> U *allocator<T>::N (T *) {
+return nullptr;
+

Re: [PATCH v8 00/12] Add LoongArch support.

2022-03-04 Thread Xi Ruoyao via Gcc-patches
On Fri, 2022-03-04 at 15:17 +0800, xucheng...@loongson.cn wrote:

> The binutils support has been merged into trunk:
> https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=560b3fe208255ae909b4b1c88ba9c28b09043307
> 
> Note: We split -mabi= into -mabi=lp64d/f/s.  The new options are not yet
> supported by upstream binutils, so this GCC port requires the following
> patch to be applied to binutils in order to build:
> https://github.com/loongson/binutils-gdb/commit/aacb0bf860f02aa5a7dcb76dd0e392bf871c7586
> (it will be submitted upstream once the GCC side is confirmed)

I don't think you need a review of the binutils change here.  You should
get it reviewed and applied in binutils-gdb ASAP.  Then in install.texi
you could add a note like "loongarch64-*-* requires binutils >= 2.39" under
"Target specific installation notes", as an unpatched 2.38 does not
work.

Based on the history of the RISC-V port
(https://gcc.gnu.org/pipermail/gcc/2017-January/222595.html), the process
for a new port seems to be:

1. Get permission from the Steering Committee.
2. Add one or two port maintainers to the MAINTAINERS file.
3. Only then does the technical review of the patch series begin.


I'm not an expert in software engineering (or social interaction :) and
I don't know whether the process has changed in recent years.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH, V2] Optimize signed DImode -> TImode on power10, PR target/104698

2022-03-04 Thread Michael Meissner via Gcc-patches
On Wed, Mar 02, 2022 at 06:47:39PM -0600, Segher Boessenkool wrote:
> On Wed, Mar 02, 2022 at 03:54:29PM -0500, Michael Meissner wrote:
> > Optimize signed DImode -> TImode on power10.
> 
> > On power10, GCC tries to optimize the signed conversion from DImode to
> > TImode by using the vextsd2q instruction.  However, to generate this
> > instruction, it would have to generate 3 direct moves (1 from the GPR
> > registers to the Altivec registers, and 2 from the Altivec registers to
> > the GPR registers).
> > 
> > This patch generates the shift right immediate instruction to do the
> > conversion if the target/source registers are GPR registers, as it does
> > on earlier systems.  If the target/source registers are Altivec registers,
> > it will generate the vextsd2q instruction.
> 
> > PR target/104698
> > * config/rs6000/vsx.md (mtvsrdd_diti_w1): Delete.
> > (extendditi2): Convert from define_expand to
> > define_insn_and_split.  Replace with code to deal with both GPR
> > registers and with altivec registers.
> > 
> > gcc/testsuite/
> > PR target/104698
> > * gcc.target/powerpc/pr104698-1.c: New test.
> > * gcc.target/powerpc/pr104698-2.c: New test.
> 
> > +;; Sign extend DI to TI.  We provide both GPR targets and Altivec targets on
> > +;; power10.  On earlier systems, the machine independent code will generate a
> > +;; shift right to sign extend the 64-bit value to 128-bit.
> > +;;
> > +;; If the register allocator prefers to use GPR registers, we will use a shift
> > +;; right instruction to sign extend the 64-bit value to 128-bit.
> > +;;
> > +;; If the register allocator prefers to use Altivec registers on power10,
> > +;; generate the vextsd2q instruction.
> > +(define_insn_and_split "extendditi2"
> > +  [(set (match_operand:TI 0 "register_operand" "=r,r,v,v,v")
> > +   (sign_extend:TI (match_operand:DI 1 "input_operand" "r,m,r,wa,Z")))
> > +   (clobber (reg:DI CA_REGNO))]
> > +  "TARGET_POWERPC64 && TARGET_POWER10"
> 
> What happens with -m32 -m{no,}-powerpc64?

The __int128_t and __uint128_t types are not defined in 32-bit mode, so you
would never get a DImode to TImode conversion there.

> > +  "#"
> > +  "&& reload_completed"
> > +  [(pc)]
> > +{
> > +  rtx dest = operands[0];
> > +  rtx src = operands[1];
> > +  int dest_regno = reg_or_subregno (dest);
> > +
> > +  /* Handle conversion to GPR registers.  Load up the low part and then do
> > + a sign extension to the upper part.  */
> > +  if (INT_REGNO_P (dest_regno))
> > +{
> > +  rtx dest_hi = gen_highpart (DImode, dest);
> > +  rtx dest_lo = gen_lowpart (DImode, dest);
> > +
> > +  emit_move_insn (dest_lo, src);
> > +  emit_insn (gen_ashrdi3 (dest_hi, dest_lo, GEN_INT (63)));
> 
> Please use src instead of dest_lo.  This always works, because you did
> the low-part move first.

Ok.

> > +  DONE;
> > +}
> > +
> > +  /* For conversion to an Altivec register, generate either a splat operation
> > + or a load rightmost double word instruction.  Both instructions gets the
> > + DImode value into the lower 64 bits, and then do the vextsd2q
> > + instruction.  */
> > + 
> 
> (trailing whitespace)

Ok.

> > +  else if (ALTIVEC_REGNO_P (dest_regno))
> > +{
> > +  if (MEM_P (src))
> > +   emit_insn (gen_vsx_lxvrdx (dest, src));
> > +  else
> > +   {
> > + rtx dest_v2di = gen_rtx_REG (V2DImode, dest_regno);
> > + emit_insn (gen_vsx_splat_v2di (dest_v2di, src));
> > +   }
> > +
> > +  emit_insn (gen_extendditi2_vector (dest, dest));
> > +  DONE;
> > +}
> 
> This patch needs testing on BE (and 32-bit as well of course).

Will do, but for 32-bit, it will be a NOP.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH] c++: detecting copy-init context during CTAD [PR102137]

2022-03-04 Thread Jason Merrill via Gcc-patches

On 3/4/22 14:24, Patrick Palka wrote:

Here we're failing to communicate to cp_finish_decl from tsubst_expr
that we're in a copy-initialization context (via the LOOKUP_ONLYCONVERTING
flag), which causes do_class_deduction to always consider explicit
deduction guides when performing CTAD for a templated variable initializer.

We could fix this by passing LOOKUP_ONLYCONVERTING appropriately when
calling cp_finish_decl from tsubst_expr, but it seems do_class_deduction
can determine if we're in a copy-init context by simply inspecting the
initializer, and thus render its flags parameter unnecessary, which is
what this patch implements.  (If we were to fix this in tsubst_expr
instead, I think we'd have to inspect the initializer in the same way
in order to detect a copy-init context?)
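
For readers following the thread, the rule at stake can be shown with a
small standalone sketch.  The class template Box and its deduction guide
are hypothetical and are not taken from the patch or its testcase.  The
sketch assumes the usual C++17 behaviour: an explicit deduction guide is a
candidate for direct-initialization but is not considered for
copy-initialization.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class template, for illustration only.
template <class T>
struct Box {
  Box(T v) : value(v) {}
  T value;
};

// Hypothetical explicit deduction guide.  Being a non-template, it is
// preferred over the implicit guide generated from Box(T) whenever it is
// a candidate at all.
explicit Box(int) -> Box<long>;

// Direct-initialization: the explicit guide participates and wins.
inline auto deduce_direct() { Box b(1); return b; }

// Copy-initialization: the explicit guide is not a candidate, so the
// implicit guide from the constructor deduces Box<int>.
inline auto deduce_copy() { Box b = 1; return b; }

static_assert(std::is_same_v<decltype(deduce_direct()), Box<long>>);
static_assert(std::is_same_v<decltype(deduce_copy()), Box<int>>);
```

The bug described above amounts to the second case behaving like the first
when the declaration appears in a template and is instantiated through
tsubst_expr.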


Hmm, does this affect conversions as well?

Looks like it does:

struct A
{
  explicit operator int();
};

template <class T> void f()
{
  T t = A();
}

int main()
{
  f<int>(); // wrongly accepted
}

The reverse, initializing via an explicit constructor, is caught by code 
in build_aggr_init much like the code your patch adds to 
do_auto_deduction; perhaps we should move/copy that code to cp_finish_decl?
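
As a rough cross-check of the expected non-template behaviour, the
standard type traits separate the two initialization forms.  This is only
a sketch: the struct mirrors the A from the example above (with a
definition added so it can run), and direct_init is a hypothetical helper.

```cpp
#include <cassert>
#include <type_traits>

struct A
{
  explicit operator int() const { return 42; }
};

// Direct-initialization may use the explicit conversion function ...
static_assert(std::is_constructible_v<int, A>);
// ... but copy-initialization, i.e. implicit conversion, may not.
static_assert(!std::is_convertible_v<A, int>);

// Hypothetical helper showing the accepted, direct-initialization form.
inline int direct_init()
{
  int t(A{});
  return t;
}
```

Outside a template, "int t = A();" is rejected for the same reason the
second static_assert holds, so the templated f above should be rejected
too.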



Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/102137

gcc/cp/ChangeLog:

* cp-tree.h (do_auto_deduction): Remove flags parameter.
* decl.cc (cp_finish_decl): Adjust call to do_auto_deduction.
* pt.cc (convert_template_argument): Likewise.
(do_class_deduction): Remove flags parameter and instead
determine if we're in a copy-init context by inspecting the
initializer.
(do_auto_deduction): Adjust call to do_class_deduction.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction108.C: New test.
---
  gcc/cp/cp-tree.h  |  3 +-
  gcc/cp/decl.cc|  2 +-
  gcc/cp/pt.cc  | 23 +++--
  .../g++.dg/cpp1z/class-deduction108.C | 32 +++
  4 files changed, 48 insertions(+), 12 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction108.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index ac723901098..c2ef6544389 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7279,8 +7279,7 @@ extern tree do_auto_deduction   (tree, 
tree, tree,
 = tf_warning_or_error,
   auto_deduction_context
 = adc_unspecified,
-tree = NULL_TREE,
-int = LOOKUP_NORMAL);
+tree = NULL_TREE);
  extern tree type_uses_auto(tree);
  extern tree type_uses_auto_or_concept (tree);
  extern void append_type_to_template_for_access_check (tree, tree, tree,
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 199ac768d43..152f657e9f2 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -8039,7 +8039,7 @@ cp_finish_decl (tree decl, tree init, bool 
init_const_expr_p,
outer_targs = DECL_TI_ARGS (decl);
type = TREE_TYPE (decl) = do_auto_deduction (type, d_init, auto_node,
   tf_warning_or_error, adc,
-  outer_targs, flags);
+  outer_targs);
if (type == error_mark_node)
return;
if (TREE_CODE (type) == FUNCTION_TYPE)
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index d94d4538faa..66fc8cacdc6 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -8567,8 +8567,7 @@ convert_template_argument (tree parm,
   can happen in the context of -fnew-ttp-matching.  */;
else if (tree a = type_uses_auto (t))
{
- t = do_auto_deduction (t, arg, a, complain, adc_unify, args,
-LOOKUP_IMPLICIT);
+ t = do_auto_deduction (t, arg, a, complain, adc_unify, args);
  if (t == error_mark_node)
return error_mark_node;
}
@@ -29832,8 +29831,7 @@ ctad_template_p (tree tmpl)
 type.  */
  
  static tree

-do_class_deduction (tree ptype, tree tmpl, tree init,
-   int flags, tsubst_flags_t complain)
+do_class_deduction (tree ptype, tree tmpl, tree init, tsubst_flags_t complain)
  {
/* We should have handled this in the caller.  */
if (DECL_TEMPLATE_TEMPLATE_PARM_P (tmpl))
@@ -29881,6 +29879,13 @@ do_class_deduction (tree ptype, tree tmpl, tree init,
if (type_dependent_expression_p (init))
  return ptype;
  
+  bool copy_init_p = true;

+  if (!init
+  || TREE_CODE (init) == TREE_LIST
+  || (BRACE_ENCLOSED_INITIALIZER_P (init)
+ && CONSTRUCTOR_IS_DIRECT_INIT (init)))
+copy_init_p = false;
+
tree type = TREE_TYPE (tmpl);
  

New German PO file for 'gcc' (version 12.1-b20220213)

2022-03-04 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the German team of translators.  The file is available at:

https://translationproject.org/latest/gcc/de.po

(This file, 'gcc-12.1-b20220213.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH, V2] Optimize signed DImode -> TImode on power10, PR target/104698

2022-03-04 Thread Michael Meissner via Gcc-patches
On Wed, Mar 02, 2022 at 06:47:39PM -0600, Segher Boessenkool wrote:
> Please use src instead of dest_lo.  This always works, because you did
> the low-part move first.

That doesn't work in the case where src is a memory operand.  Dest_lo is
guaranteed to be a register, but src isn't.
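
For reference, the GPR path boils down to the following, written as a
plain C++ sketch rather than RTL.  The names Int128Parts and
sign_extend_di_to_ti are made up for illustration.  The sketch assumes
that right-shifting a negative signed value is an arithmetic shift, which
GCC provides and C++20 guarantees.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a TImode value held in two 64-bit GPRs.
struct Int128Parts
{
  std::uint64_t lo;  // low doubleword
  std::uint64_t hi;  // high doubleword
};

// Sign extend a 64-bit value to 128 bits: copy the value into the low
// half, then fill the high half with copies of the sign bit using an
// arithmetic shift right by 63.
inline Int128Parts sign_extend_di_to_ti (std::int64_t src)
{
  Int128Parts r;
  r.lo = static_cast<std::uint64_t> (src);
  // All ones if src is negative, zero otherwise.
  r.hi = static_cast<std::uint64_t> (src >> 63);
  return r;
}
```

The shift can read either src or the freshly written low half, as long as
whichever one is used is in a register, which is the point made above.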

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH v3] configure: Implement --enable-host-pie

2022-03-04 Thread Marek Polacek via Gcc-patches
On Fri, Feb 25, 2022 at 12:04:32AM +, Joseph Myers wrote:
> On Thu, 24 Feb 2022, Marek Polacek via Gcc-patches wrote:
> 
> > gmp/mpfr/mpc/isl are DSOs I believe and therefore always PIC.
> 
> They are *not* DSOs when built in-tree (see the use of --disable-shared in 
> the relevant parts of Makefile.def).

Ah.  You are, of course, right.  I used contrib/download_prerequisites and
then got the expected errors about the object files not having been compiled
with -fPIE.  I've fixed this by the top-level Makefile.def change.
 
> > intl: I have no idea about this; I don't see any binaries in that directory
> > after a bootstrap.
> 
> If you use --with-included-gettext, there should be libintl.a or similar 
> there and it should be linked into host binaries.

That worked!  And again, I got the expected "recompile with -fPIE" error.
Fixed by the intl/configure.ac change.  I also had to tweak intl/Makefile.in
because apparently CFLAGS was "set with a command argument" and therefore the
assignment of CFLAGS was ignored, so PICFLAG wasn't propagated.  Another way
to fix this would be to use override, but I guess that's not the way to go.

Thanks a lot, I hope this version is close to being acceptable.

Bootstrapped on x86_64-pc-linux-gnu with --with-included-gettext
--enable-host-pie, as well as without --enable-host-pie.
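
One way to sanity-check the result of such a build is to look at the ELF
header of the installed executables.  The sketch below is not part of the
patch; the names ElfKind, classify_elf_header and classify_elf_file are
hypothetical.  It relies on e_type being ET_DYN (3) for a PIE and ET_EXEC
(2) for a traditional executable.  ET_DYN is also used by shared objects,
so this is a necessary check rather than a sufficient one.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <ios>
#include <string>

enum class ElfKind { NotElf, Executable, PieOrShared, Other };

// Classify a file from the first 18 bytes of its ELF header: 16 bytes of
// e_ident followed by the 16-bit e_type field.
inline ElfKind classify_elf_header (const std::array<unsigned char, 18> &h)
{
  if (h[0] != 0x7f || h[1] != 'E' || h[2] != 'L' || h[3] != 'F')
    return ElfKind::NotElf;

  // e_ident[EI_DATA] (index 5): 1 = little-endian, 2 = big-endian.
  const bool big_endian = (h[5] == 2);
  const std::uint16_t e_type = big_endian
    ? static_cast<std::uint16_t> ((h[16] << 8) | h[17])
    : static_cast<std::uint16_t> ((h[17] << 8) | h[16]);

  switch (e_type)
    {
    case 2: return ElfKind::Executable;   // ET_EXEC
    case 3: return ElfKind::PieOrShared;  // ET_DYN
    default: return ElfKind::Other;
    }
}

// Read the header bytes from PATH; unreadable files count as NotElf.
inline ElfKind classify_elf_file (const std::string &path)
{
  std::array<unsigned char, 18> h{};
  std::ifstream in (path, std::ios::binary);
  if (!in.read (reinterpret_cast<char *> (h.data ()),
		static_cast<std::streamsize> (h.size ())))
    return ElfKind::NotElf;
  return classify_elf_header (h);
}
```

Calling classify_elf_file on, say, the installed cc1 would be expected to
report PieOrShared for an --enable-host-pie build; the path to use depends
on the install prefix.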

-- >8 --
This patch implements the --enable-host-pie configure option which
makes the compiler executables PIE.  This can be used to enhance
protection against ROP attacks, and can be viewed as part of a wider
trend to harden binaries.

It is similar to the option --enable-host-shared, except that
--enable-host-shared won't add -shared to the linker flags whereas
--enable-host-pie will add -pie.  It is different from
--enable-default-pie because that option just adds an implicit -fPIE/-pie
when the compiler is invoked, but the compiler itself isn't PIE.

Since r12-5768-gfe7c3ecf, PCH works well with PIE, so there are no PCH
regressions.

When building the compiler, the build process may use various in-tree
libraries; these need to be built with -fPIE so that it's possible to
use them when building a PIE.  For instance, when --with-included-gettext
is in effect, intl object files must be compiled with -fPIE.  Similarly,
when building in-tree gmp, isl, mpfr and mpc, they must be compiled with
-fPIE.

I plan to add an option to link with -Wl,-z,now.

ChangeLog:

* Makefile.def: Pass $(PICFLAG) to AM_CFLAGS for gmp, mpfr, mpc, and
isl.
* Makefile.in: Regenerate.
* Makefile.tpl: Set PICFLAG.
* configure.ac (--enable-host-pie): New check.  Set PICFLAG after this
check.
* configure: Regenerate.

c++tools/ChangeLog:

* Makefile.in: Rename PIEFLAG to PICFLAG.  Set LD_PICFLAG.  Use it.
Use pic/libiberty.a if PICFLAG is set.
* configure.ac (--enable-default-pie): Set PICFLAG instead of PIEFLAG.
(--enable-host-pie): New check.
* configure: Regenerate.

fixincludes/ChangeLog:

* Makefile.in: Set and use PICFLAG and LD_PICFLAG.  Use the "pic"
build of libiberty if PICFLAG is set.
* configure.ac:
* configure: Regenerate.

gcc/ChangeLog:

* Makefile.in: Set LD_PICFLAG.  Use it.  Set enable_host_pie.
Remove NO_PIE_CFLAGS and NO_PIE_FLAG.  Pass LD_PICFLAG to
ALL_LINKERFLAGS.  Use the "pic" build of libiberty if --enable-host-pie.
* configure.ac (--enable-host-shared): Don't set PICFLAG here.
(--enable-host-pie): New check.  Set PICFLAG and LD_PICFLAG after this
check.
* configure: Regenerate.
* doc/install.texi: Document --enable-host-pie.

gcc/d/ChangeLog:

* Make-lang.in: Remove NO_PIE_CFLAGS.

intl/ChangeLog:

* Makefile.in: Use @PICFLAG@ in COMPILE as well.
* configure.ac (--enable-host-shared): Don't set PICFLAG here.
(--enable-host-pie): New check.  Set PICFLAG after this check.
* configure: Regenerate.

libcody/ChangeLog:

* Makefile.in: Pass LD_PICFLAG to LDFLAGS.
* configure.ac (--enable-host-shared): Don't set PICFLAG here.
(--enable-host-pie): New check.  Set PICFLAG and LD_PICFLAG after this
check.
* configure: Regenerate.

libcpp/ChangeLog:

* configure.ac (--enable-host-shared): Don't set PICFLAG here.
(--enable-host-pie): New check.  Set PICFLAG after this check.
* configure: Regenerate.

libdecnumber/ChangeLog:

* configure.ac (--enable-host-shared): Don't set PICFLAG here.
(--enable-host-pie): New check.  Set PICFLAG after this check.
* configure: Regenerate.

libiberty/ChangeLog:

* configure.ac: Also set shared when enable_host_pie.
* configure: Regenerate.

zlib/ChangeLog:

* configure.ac (--enable-host-shared): Don't set PICFLAG here.
(--enable-host-pie): New check.  Set PICFLAG after this check.
* configure: Regenerate.
---
 Makefile.def  |   7 +-

Re: [PATCH v8 00/12] Add LoongArch support.

2022-03-04 Thread Paul Hua via Gcc-patches
>
> Based on the history of the RISC-V port
> (https://gcc.gnu.org/pipermail/gcc/2017-January/222595.html), the process
> for a new port seems to be:
>
> 1. Get permission from the Steering Committee.
> 2. Add one or two port maintainers to the MAINTAINERS file.
> 3. Only then does the technical review of the patch series begin.
>

Hi Ruoyao,
Thanks for your advice.  But I don't know how to contact the GCC
Steering Committee.

Hi David,
Any suggestions?


Re: [PATCH v8 00/12] Add LoongArch support.

2022-03-04 Thread Paul Hua via Gcc-patches
> > Based on the history of the RISC-V port
> > (https://gcc.gnu.org/pipermail/gcc/2017-January/222595.html), the process
> > for a new port seems to be:
> >
> > 1. Get permission from the Steering Committee.
> > 2. Add one or two port maintainers to the MAINTAINERS file.
> > 3. Only then does the technical review of the patch series begin.
> >
>
> Hi Ruoyao,
> Thanks for your advice.  But I don't know how to contact the GCC
> Steering Committee.
>
> Hi David,
> Any suggestions?
Sorry, CCed David Edelsohn.


[COMMITTED] Optimize signed DImode -> TImode on power10.

2022-03-04 Thread Michael Meissner via Gcc-patches
Here is the patch that I committed to the trunk:

Optimize signed DImode -> TImode on power10.

On power10, GCC tries to optimize the signed conversion from DImode to
TImode by using the vextsd2q instruction.  However, to generate this
instruction, it would have to generate 3 direct moves (1 from the GPR
registers to the Altivec registers, and 2 from the Altivec registers to
the GPR registers).

This patch generates the shift right immediate instruction to do the
conversion if the target/source registers are GPR registers, as it does
on earlier systems.  If the target/source registers are Altivec registers,
it will generate the vextsd2q instruction.

2022-03-05   Michael Meissner  

gcc/
PR target/104698
* config/rs6000/vsx.md (UNSPEC_MTVSRD_DITI_W1): Delete.
(mtvsrdd_diti_w1): Delete.
(extendditi2): Convert from define_expand to
define_insn_and_split.  Replace with code to deal with both GPR
registers and with altivec registers.

gcc/testsuite/
PR target/104698
* gcc.target/powerpc/pr104698-1.c: New test.
* gcc.target/powerpc/pr104698-2.c: New test.
---
 gcc/config/rs6000/vsx.md  | 83 ++-
 gcc/testsuite/gcc.target/powerpc/pr104698-1.c | 30 +++
 gcc/testsuite/gcc.target/powerpc/pr104698-2.c | 33 
 3 files changed, 124 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr104698-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr104698-2.c

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index b53de103872..d0fb92f5985 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -360,7 +360,6 @@ (define_c_enum "unspec"
UNSPEC_XXGENPCV
UNSPEC_MTVSBM
UNSPEC_EXTENDDITI2
-   UNSPEC_MTVSRD_DITI_W1
UNSPEC_VCNTMB
UNSPEC_VEXPAND
UNSPEC_VEXTRACT
@@ -5023,15 +5022,67 @@ (define_expand "vsignextend_si_v2di"
   DONE;
 })
 
-;; ISA 3.1 vector sign extend
-;; Move DI value from GPR to TI mode in VSX register, word 1.
-(define_insn "mtvsrdd_diti_w1"
-  [(set (match_operand:TI 0 "register_operand" "=wa")
-   (unspec:TI [(match_operand:DI 1 "register_operand" "r")]
-UNSPEC_MTVSRD_DITI_W1))]
-  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
-  "mtvsrdd %x0,0,%1"
-  [(set_attr "type" "vecmove")])
+;; Sign extend DI to TI.  We provide both GPR targets and Altivec targets on
+;; power10.  On earlier systems, the machine independent code will generate a
+;; shift right to sign extend the 64-bit value to 128-bit.
+;;
+;; If the register allocator prefers to use GPR registers, we will use a shift
+;; right instruction to sign extend the 64-bit value to 128-bit.
+;;
+;; If the register allocator prefers to use Altivec registers on power10,
+;; generate the vextsd2q instruction.
+(define_insn_and_split "extendditi2"
+  [(set (match_operand:TI 0 "register_operand" "=r,r,v,v,v")
+   (sign_extend:TI (match_operand:DI 1 "input_operand" "r,m,r,wa,Z")))
+   (clobber (reg:DI CA_REGNO))]
+  "TARGET_POWERPC64 && TARGET_POWER10"
+  "#"
+  "&& reload_completed"
+  [(pc)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  int dest_regno = reg_or_subregno (dest);
+
+  /* Handle conversion to GPR registers.  Load up the low part and then do
+ a sign extension to the upper part.  */
+  if (INT_REGNO_P (dest_regno))
+{
+  rtx dest_hi = gen_highpart (DImode, dest);
+  rtx dest_lo = gen_lowpart (DImode, dest);
+
+  emit_move_insn (dest_lo, src);
+  /* In case src is a MEM, we have to use the destination, which is a
+ register, instead of re-using the source.  */
+  rtx src2 = (REG_P (src) || SUBREG_P (src)) ? src : dest_lo;
+  emit_insn (gen_ashrdi3 (dest_hi, src2, GEN_INT (63)));
+  DONE;
+}
+
+  /* For conversion to an Altivec register, generate either a splat operation
+ or a load rightmost double word instruction.  Both instructions gets the
+ DImode value into the lower 64 bits, and then do the vextsd2q
+ instruction.  */
+
+  else if (ALTIVEC_REGNO_P (dest_regno))
+{
+  if (MEM_P (src))
+   emit_insn (gen_vsx_lxvrdx (dest, src));
+  else
+   {
+ rtx dest_v2di = gen_rtx_REG (V2DImode, dest_regno);
+ emit_insn (gen_vsx_splat_v2di (dest_v2di, src));
+   }
+
+  emit_insn (gen_extendditi2_vector (dest, dest));
+  DONE;
+}
+
+  else
+gcc_unreachable ();
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "shift,load,vecmove,vecperm,load")])
 
 ;; Sign extend 64-bit value in TI reg, word 1, to 128-bit value in TI reg
 (define_insn "extendditi2_vector"
@@ -5042,18 +5093,6 @@ (define_insn "extendditi2_vector"
   "vextsd2q %0,%1"
   [(set_attr "type" "vecexts")])
 
-(define_expand "extendditi2"
-  [(set (match_operand:TI 0 "gpc_reg_operand")
-   (sign_extend:DI (match_operand:DI 1 "gpc_reg_operand")))]
-  "TARGET_POWER10"
-  {
-/* Move 64-bit src from GPR to vector reg and sign extend to 128-bits.