date:20220222

[committed][nvptx, testsuite] Remove mptx settings in gcc.target/nvptx tests

2022-02-22 Thread Tom de Vries via Gcc-patches

Hi,

Some test-cases in gcc/testsuite/gcc.target/nvptx contain -mptx
settings, which are paired with -misa settings so that the selected PTX
ISA version supports the requested -misa version.

Since commit decde11183bd ("[nvptx] Choose -mptx default based on -misa"),
this is no longer necessary.

Remove the mptx settings.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx, testsuite] Remove mptx settings in gcc.target/nvptx tests

gcc/testsuite/ChangeLog:

2022-02-20  Tom de Vries  

* gcc.target/nvptx/float16-1.c: Drop -mptx setting.
* gcc.target/nvptx/float16-2.c: Same.
* gcc.target/nvptx/float16-3.c: Same.
* gcc.target/nvptx/float16-4.c: Same.
* gcc.target/nvptx/float16-5.c: Same.
* gcc.target/nvptx/float16-6.c: Same.
* gcc.target/nvptx/tanh-1.c: Same.

---
 gcc/testsuite/gcc.target/nvptx/float16-1.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/float16-2.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/float16-3.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/float16-4.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/float16-5.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/float16-6.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/tanh-1.c| 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.target/nvptx/float16-1.c b/gcc/testsuite/gcc.target/nvptx/float16-1.c
index 3a0324d1652..9c3f8fe8f9d 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-1.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -misa=sm_53 -mptx=6.3 -ffast-math" } */
+/* { dg-options "-O2 -misa=sm_53 -ffast-math" } */
 
 _Float16 var;
 
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-2.c b/gcc/testsuite/gcc.target/nvptx/float16-2.c
index 5748a9c7a97..2d1dc1aafb5 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-2.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ffast-math -misa=sm_80 -mptx=7.0" } */
+/* { dg-options "-O2 -ffast-math -misa=sm_80" } */
 
 _Float16 x;
 _Float16 y;
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-3.c b/gcc/testsuite/gcc.target/nvptx/float16-3.c
index 914282aa1c3..3abcec39a8a 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-3.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -misa=sm_53 -mptx=6.3" } */
+/* { dg-options "-O2 -misa=sm_53" } */
 
 _Float16 var;
 
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-4.c b/gcc/testsuite/gcc.target/nvptx/float16-4.c
index b11f17a43ce..173f9600ac7 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-4.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -misa=sm_53 -mptx=6.3 -ffast-math" } */
+/* { dg-options "-O2 -misa=sm_53 -ffast-math" } */
 
 _Float16 var;
 
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-5.c b/gcc/testsuite/gcc.target/nvptx/float16-5.c
index 5fe15ecdf7e..700b3159a97 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-5.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -misa=sm_53 -mptx=6.3 -ffast-math" } */
+/* { dg-options "-O2 -misa=sm_53 -ffast-math" } */
 
 _Float16 a;
 _Float16 b;
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-6.c b/gcc/testsuite/gcc.target/nvptx/float16-6.c
index 8fe4fa3051f..4889577f7f6 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-6.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -misa=sm_53 -mptx=6.3" } */
+/* { dg-options "-O2 -misa=sm_53" } */
 
 _Float16 x;
 _Float16 y;
diff --git a/gcc/testsuite/gcc.target/nvptx/tanh-1.c b/gcc/testsuite/gcc.target/nvptx/tanh-1.c
index 56a0e5a8578..946b8c1ad4b 100644
--- a/gcc/testsuite/gcc.target/nvptx/tanh-1.c
+++ b/gcc/testsuite/gcc.target/nvptx/tanh-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ffast-math -misa=sm_75 -mptx=7.0" } */
+/* { dg-options "-O2 -ffast-math -misa=sm_75" } */
 
 float foo(float x)
 {

[committed][nvptx] Xfail sibcall execution tests

2022-02-22 Thread Tom de Vries via Gcc-patches

Hi,

On nvptx I see the following FAIL:
...
FAIL: gcc.dg/sibcall-3.c execution test
...

The test-case states that "this test is xfailed on targets without sibcall
patterns".

The nvptx port doesn't have a sibcall pattern, so add an xfail.  Likewise in
two similar test-cases.
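
For readers unfamiliar with the term, a sibling call is a call in tail
position that the compiler can turn into a jump, reusing the caller's stack
frame.  A minimal sketch follows; the function names are hypothetical and
are not taken from the sibcall-*.c tests:

```c
/* Hypothetical illustration only; not part of gcc.dg/sibcall-3.c.  */

/* noinline keeps the call visible so it is a sibcall candidate
   rather than being inlined away.  */
__attribute__ ((noinline)) int
callee (int x)
{
  return x + 1;
}

/* The call to callee is the last action and its result is returned
   unchanged, so with -O2 -foptimize-sibling-calls a target that has a
   sibcall pattern may emit a jump here instead of a call.  */
__attribute__ ((noinline)) int
caller (int x)
{
  return callee (x * 2);
}
```

On a target without a sibcall pattern, such as nvptx according to the
description above, the same source is compiled as an ordinary call, which is
why the execution tests are expected to fail there.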

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Xfail sibcall execution tests

gcc/testsuite/ChangeLog:

2022-02-20  Tom de Vries  

* gcc.dg/sibcall-10.c: Xfail execution test for nvptx.
* gcc.dg/sibcall-3.c: Same.
* gcc.dg/sibcall-4.c: Same.

---
 gcc/testsuite/gcc.dg/sibcall-10.c | 2 +-
 gcc/testsuite/gcc.dg/sibcall-3.c  | 2 +-
 gcc/testsuite/gcc.dg/sibcall-4.c  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/sibcall-10.c b/gcc/testsuite/gcc.dg/sibcall-10.c
index dcb3e6a5ba2..e78d88fc8fc 100644
--- a/gcc/testsuite/gcc.dg/sibcall-10.c
+++ b/gcc/testsuite/gcc.dg/sibcall-10.c
@@ -5,7 +5,7 @@
Copyright (C) 2002 Free Software Foundation Inc.
Contributed by Hans-Peter Nilsson*/
 
-/* { dg-do run { xfail { { amdgcn*-*-* cris-*-* csky-*-* h8300-*-* hppa*64*-*-* m32r-*-* mcore-*-* mn10300-*-* msp430*-*-* nds32*-*-* xstormy16-*-* v850*-*-* vax-*-* xtensa*-*-* } || { arm*-*-* && { ! arm32 } } } } } */
+/* { dg-do run { xfail { { amdgcn*-*-* cris-*-* csky-*-* h8300-*-* hppa*64*-*-* m32r-*-* mcore-*-* mn10300-*-* msp430*-*-* nds32*-*-* xstormy16-*-* v850*-*-* vax-*-* xtensa*-*-* nvptx*-*-* } || { arm*-*-* && { ! arm32 } } } } } */
 /* -mlongcall disables sibcall patterns.  */
 /* { dg-skip-if "" { powerpc*-*-* } { "-mlongcall" } { "" } } */
 /* -msave-restore disables sibcall patterns.  */
diff --git a/gcc/testsuite/gcc.dg/sibcall-3.c b/gcc/testsuite/gcc.dg/sibcall-3.c
index 80555cf0640..82ad8c79809 100644
--- a/gcc/testsuite/gcc.dg/sibcall-3.c
+++ b/gcc/testsuite/gcc.dg/sibcall-3.c
@@ -5,7 +5,7 @@
Copyright (C) 2002 Free Software Foundation Inc.
Contributed by Hans-Peter Nilsson*/
 
-/* { dg-do run { xfail { { cris-*-* h8300-*-* hppa*64*-*-* m32r-*-* mcore-*-* mn10300-*-* msp430*-*-* nds32*-*-* xstormy16-*-* v850*-*-* vax-*-* xtensa*-*-* } || { arm*-*-* && { ! arm32 } } } } } */
+/* { dg-do run { xfail { { cris-*-* h8300-*-* hppa*64*-*-* m32r-*-* mcore-*-* mn10300-*-* msp430*-*-* nds32*-*-* xstormy16-*-* v850*-*-* vax-*-* xtensa*-*-* nvptx*-*-* } || { arm*-*-* && { ! arm32 } } } } } */
 /* -mlongcall disables sibcall patterns.  */
 /* { dg-skip-if "" { powerpc*-*-* } { "-mlongcall" } { "" } } */
 /* { dg-options "-O2 -foptimize-sibling-calls" } */
diff --git a/gcc/testsuite/gcc.dg/sibcall-4.c b/gcc/testsuite/gcc.dg/sibcall-4.c
index 97086bb5106..5dcff3f8d43 100644
--- a/gcc/testsuite/gcc.dg/sibcall-4.c
+++ b/gcc/testsuite/gcc.dg/sibcall-4.c
@@ -5,7 +5,7 @@
Copyright (C) 2002 Free Software Foundation Inc.
Contributed by Hans-Peter Nilsson*/
 
-/* { dg-do run { xfail { { cris-*-* h8300-*-* hppa*64*-*-* m32r-*-* mcore-*-* mn10300-*-* msp430*-*-* nds32*-*-* xstormy16-*-* v850*-*-* vax-*-* xtensa*-*-* } || { arm*-*-* && { ! arm32 } } } } } */
+/* { dg-do run { xfail { { cris-*-* h8300-*-* hppa*64*-*-* m32r-*-* mcore-*-* mn10300-*-* msp430*-*-* nds32*-*-* xstormy16-*-* v850*-*-* vax-*-* xtensa*-*-* nvptx*-*-* } || { arm*-*-* && { ! arm32 } } } } } */
 /* -mlongcall disables sibcall patterns.  */
 /* { dg-skip-if "" { powerpc*-*-* } { "-mlongcall" } { "" } } */
 /* { dg-options "-O2 -foptimize-sibling-calls" } */

[committed][libgomp, testsuite, nvptx] Fix pr96390.c without CUDA

2022-02-22 Thread Tom de Vries via Gcc-patches

Hi,

When running the libgomp testsuite on x86_64 with nvptx accelerator, we
run into:
...
XPASS: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
FAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c execution test
...

The problem is that we're expecting the following ptxas error:
...
XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
Excess errors:
ptxas /tmp/ccZYDw8N.o, line 90; error   : Call to 'baz' requires call prototype
ptxas /tmp/ccZYDw8N.o, line 90; error   : Unknown symbol 'baz'
...

But it's not triggered because ptxas is not in the path, so nvptx-none-as
defaults to --no-verify.

So instead, we run into the same error at execution time.

Fix this by forcing verification using:
...
/* { dg-additional-options "-foffload=-Wa,--verify" \
 { target offload_target_nvptx } } */
...
such that we run into the xfail in this way instead:
...
XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
Excess errors:
nvptx-as: error trying to exec 'ptxas': execvp: No such file or directory
nvptx-as: ptxas returned 255 exit status
...

Tested on x86_64-linux with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[libgomp, testsuite, nvptx] Fix pr96390.c without CUDA

libgomp/ChangeLog:

2022-02-21  Tom de Vries  

PR testsuite/104146
* testsuite/libgomp.c++/pr96390.C: Add additional-option
-foffload=-Wa,--verify for nvptx.
* testsuite/libgomp.c-c++-common/pr96390.c: Same.

---
 libgomp/testsuite/libgomp.c++/pr96390.C  | 1 +
 libgomp/testsuite/libgomp.c-c++-common/pr96390.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/libgomp/testsuite/libgomp.c++/pr96390.C b/libgomp/testsuite/libgomp.c++/pr96390.C
index 8c770ecb80c..1f3c3e05661 100644
--- a/libgomp/testsuite/libgomp.c++/pr96390.C
+++ b/libgomp/testsuite/libgomp.c++/pr96390.C
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O0 -fdump-tree-omplower" } */
+/* { dg-additional-options "-foffload=-Wa,--verify" { target offload_target_nvptx } } */
 /* { dg-xfail-if "PR 97106/PR 97102 - .alias not (yet) supported for nvptx" { offload_target_nvptx } } */
 
 #include 
diff --git a/libgomp/testsuite/libgomp.c-c++-common/pr96390.c b/libgomp/testsuite/libgomp.c-c++-common/pr96390.c
index 4fe09cebb5d..b89f934811a 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/pr96390.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/pr96390.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O0 -fdump-tree-omplower" } */
+/* { dg-additional-options "-foffload=-Wa,--verify" { target offload_target_nvptx } } */
 /* { dg-require-alias "" } */
 /* { dg-xfail-if "PR 97102/PR 97106 - .alias not (yet) supported for nvptx" { offload_target_nvptx } } */

[PATCH 1/2] wwwdocs: Group sanitiser changes together

2022-02-22 Thread Richard Sandiford via Gcc-patches

Group the ThreadSanitizer and HardwareAssistedAddressSanitizer
changes under a single top-level bullet point.  This makes it
easier to add a third sanitiser-related change.

No (intended) change to the actual text or wording.  (TBH I don't
understand the ThreadSanitizer bit: is it describing three changes
(KCSAN + two new options), four changes (other environments),
or one big inter-related change?)

OK to install?

Richard

---
 htdocs/gcc-11/changes.html | 63 ++
 1 file changed, 36 insertions(+), 27 deletions(-)

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index 8e6d4ec8..cc3ae989 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -69,18 +69,6 @@ You may also want to check out our
 General Improvements
 
 
-  
-https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual";>
-ThreadSanitizer improvements to support alternative runtimes and
-environments. The https://www.kernel.org/doc/html/latest/dev-tools/kcsan.html";>
-Linux Kernel Concurrency Sanitizer (KCSAN) is now supported.
-
-  Add --param tsan-distinguish-volatile to optionally emit
-  instrumentation distinguishing volatile accesses.
-  Add --param tsan-instrument-func-entry-exit to 
optionally
-  control if function entries and exits should be instrumented.
-
-  
   
 
   In previous releases of GCC, the "column numbers" emitted in diagnostics
@@ -121,22 +109,43 @@ You may also want to check out our
 
   
   
-
-Introduce https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html";>
-  Hardware-assisted AddressSanitizer support.  This sanitizer currently
-only works for the AArch64 target.  It helps debug address problems
-similarly to
-https://github.com/google/sanitizers/wiki/AddressSanitizer";>
-  AddressSanitizer but is based on partial hardware assistance and
-provides probabilistic protection to use less RAM at run time.
-https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html";>
-  Hardware-assisted AddressSanitizer is not production-ready for user
-space, and is provided mainly for use compiling the Linux Kernel.
-
-To use this sanitizer the command line arguments are:
+Sanitizer improvements:
 
-  -fsanitize=hwaddress to instrument userspace code.
-  -fsanitize=kernel-hwaddress to instrument kernel 
code.
+  
+   https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual";>
+   ThreadSanitizer improvements to support alternative runtimes
+   and environments.  The
+   https://www.kernel.org/doc/html/latest/dev-tools/kcsan.html";>
+   Linux Kernel Concurrency Sanitizer (KCSAN) is now supported.
+   
+ Add --param tsan-distinguish-volatile to optionally
+ emit instrumentation distinguishing volatile accesses.
+ Add --param tsan-instrument-func-entry-exit to
+ optionally control if function entries and exits should be
+ instrumented.
+   
+  
+  
+   
+ Introduce https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html";>
+ Hardware-assisted AddressSanitizer support.  This sanitizer 
currently
+ only works for the AArch64 target.  It helps debug address problems
+ similarly to
+ https://github.com/google/sanitizers/wiki/AddressSanitizer";>
+ AddressSanitizer but is based on partial hardware assistance and
+ provides probabilistic protection to use less RAM at run time.
+ https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html";>
+ Hardware-assisted AddressSanitizer is not production-ready for 
user
+ space, and is provided mainly for use compiling the Linux Kernel.
+   
+   
+ To use this sanitizer the command line arguments are:
+ 
+   -fsanitize=hwaddress to instrument userspace 
code.
+   -fsanitize=kernel-hwaddress to instrument kernel 
code.
+ 
+   
+  
 
   
   
-- 
2.25.1

Re: [PATCH 1/2] wwwdocs: Group sanitiser changes together

2022-02-22 Thread Richard Sandiford via Gcc-patches

Richard Sandiford  writes:
> Group the ThreadSanitizer and HardwareAssistedAddressSanitizer
> changes under a single top-level bullet point.  This makes it
> easier to add a third sanitiser-related change.
>
> No (intended) change to the actual text or wording.  (TBH I don't
> understand the ThreadSanitizer bit: is it describing three changes
> (KCSAN + two new options), four changes (other environments),
> or one big inter-related change?)
>
> OK to install?

Err, scratch that.  Clearly I've not had tea this morning, and forgot
which version we're about to release :-)

Richard

>
> Richard
>
> ---
>  htdocs/gcc-11/changes.html | 63 ++
>  1 file changed, 36 insertions(+), 27 deletions(-)
>
> diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
> index 8e6d4ec8..cc3ae989 100644
> --- a/htdocs/gcc-11/changes.html
> +++ b/htdocs/gcc-11/changes.html
> @@ -69,18 +69,6 @@ You may also want to check out our
>  General Improvements
>  
>  
> -  
> - href="https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual";>
> -ThreadSanitizer improvements to support alternative runtimes and
> -environments. The  href="https://www.kernel.org/doc/html/latest/dev-tools/kcsan.html";>
> -Linux Kernel Concurrency Sanitizer (KCSAN) is now supported.
> -
> -  Add --param tsan-distinguish-volatile to optionally 
> emit
> -  instrumentation distinguishing volatile accesses.
> -  Add --param tsan-instrument-func-entry-exit to 
> optionally
> -  control if function entries and exits should be instrumented.
> -
> -  
>
>  
>In previous releases of GCC, the "column numbers" emitted in 
> diagnostics
> @@ -121,22 +109,43 @@ You may also want to check out our
>  
>
>
> -
> -Introduce  href="https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html";>
> -  Hardware-assisted AddressSanitizer support.  This sanitizer 
> currently
> -only works for the AArch64 target.  It helps debug address problems
> -similarly to
> -https://github.com/google/sanitizers/wiki/AddressSanitizer";>
> -  AddressSanitizer but is based on partial hardware assistance and
> -provides probabilistic protection to use less RAM at run time.
> - href="https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html";>
> -  Hardware-assisted AddressSanitizer is not production-ready for user
> -space, and is provided mainly for use compiling the Linux Kernel.
> -
> -To use this sanitizer the command line arguments are:
> +Sanitizer improvements:
>  
> -  -fsanitize=hwaddress to instrument userspace 
> code.
> -  -fsanitize=kernel-hwaddress to instrument kernel 
> code.
> +  
> +  href="https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual";>
> + ThreadSanitizer improvements to support alternative runtimes
> + and environments.  The
> + https://www.kernel.org/doc/html/latest/dev-tools/kcsan.html";>
> + Linux Kernel Concurrency Sanitizer (KCSAN) is now supported.
> + 
> +   Add --param tsan-distinguish-volatile to optionally
> +   emit instrumentation distinguishing volatile accesses.
> +   Add --param tsan-instrument-func-entry-exit to
> +   optionally control if function entries and exits should be
> +   instrumented.
> + 
> +  
> +  
> + 
> +   Introduce  href="https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html";>
> +   Hardware-assisted AddressSanitizer support.  This sanitizer 
> currently
> +   only works for the AArch64 target.  It helps debug address problems
> +   similarly to
> +   https://github.com/google/sanitizers/wiki/AddressSanitizer";>
> +   AddressSanitizer but is based on partial hardware assistance and
> +   provides probabilistic protection to use less RAM at run time.
> +href="https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html";>
> +   Hardware-assisted AddressSanitizer is not production-ready for 
> user
> +   space, and is provided mainly for use compiling the Linux Kernel.
> + 
> + 
> +   To use this sanitizer the command line arguments are:
> +   
> + -fsanitize=hwaddress to instrument userspace 
> code.
> + -fsanitize=kernel-hwaddress to instrument kernel 
> code.
> +   
> + 
> +  
>  
>
>

Re: [PATCH 3/3] target/99881 - x86 vector cost of CTOR from integer regs

2022-02-22 Thread Richard Biener via Gcc-patches

On Tue, 22 Feb 2022, Richard Biener wrote:

> On Tue, 22 Feb 2022, Hongtao Liu wrote:
> 
> > On Mon, Feb 21, 2022 at 5:10 PM Richard Biener  wrote:
> > >
> > > On Mon, 21 Feb 2022, Hongtao Liu wrote:
> > >
> > > > On Fri, Feb 18, 2022 at 10:01 PM Richard Biener via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > This uses the now passed SLP node to the vectorizer costing hook
> > > > > to adjust vector construction costs for the cost of moving an
> > > > > integer component from a GPR to a vector register when that's
> > > > > > required for building a vector from components.  A crucial difference
> > > > > here is whether the component is loaded from memory or extracted
> > > > > from a vector register as in those cases no intermediate GPR is 
> > > > > involved.
> > > > >
> > > > > The pr99881.c testcase can be Un-XFAILed with this patch, the
> > > > > pr91446.c testcase now produces scalar code which looks superior
> > > > > to me so I've adjusted it as well.
> > > > >
> > > > > I'm currently re-bootstrapping and testing on x86_64-unknown-linux-gnu
> > > > > after adding the BIT_FIELD_REF vector extracting special casing.
> > > > Does the patch handle PR101929?
> > >
> > > The patch will regress the testcase posted in PR101929 again:
> > >
> > >  _255 1 times scalar_store costs 12 in body
> > >  _261 1 times scalar_store costs 12 in body
> > >  _258 1 times scalar_store costs 12 in body
> > >  _264 1 times scalar_store costs 12 in body
> > >  t0_247 + t2_251 1 times scalar_stmt costs 4 in body
> > >  t1_472 + t3_444 1 times scalar_stmt costs 4 in body
> > >  t0_406 - t2_451 1 times scalar_stmt costs 4 in body
> > >  t1_472 - t3_444 1 times scalar_stmt costs 4 in body
> > > -node 0x4182f48 1 times vec_construct costs 16 in prologue
> > > -node 0x41882b0 1 times vec_construct costs 16 in prologue
> > > +node 0x4182f48 1 times vec_construct costs 28 in prologue
> > > +node 0x41882b0 1 times vec_construct costs 28 in prologue
> > >  t0_406 + t2_451 1 times vector_stmt costs 4 in body
> > >  t1_472 - t3_444 1 times vector_stmt costs 4 in body
> > >  node 0x41829f8 1 times vec_perm costs 4 in body
> > >  _436 1 times vector_store costs 16 in body
> > >  t.c:37:9: note: Cost model analysis for part in loop 0:
> > > -  Vector cost: 60
> > > +  Vector cost: 84
> > >Scalar cost: 64
> > > +t.c:37:9: missed: not vectorized: vectorization is not profitable.
> > >
> > > We're constructing V4SI from patterns like { _407, _480, _407, _480 }
> > > where the components are results of integer adds (so the result is
> > > definitely in a GPR).  We are costing the construction as
> > > 4 * sse_op + 2 * sse_to_integer which with skylake cost is
> > > 4 * COSTS_N_INSNS (1) + 2 * 6.
> > >
> > > Whether the vectorization itself is profitable is likely questionable
> > > but then it's true that the construction of V4SI is more costly
> > > in terms of uops than a construction of V4SF.
> > >
> > > > Now, we can - for the first time - see the actual construction
> > > pattern and ideal construction might be two GPR->xmm moves
> > > + two splats + one unpack or maybe two GPR->xmm moves + one
> > > unpack + splat of DI (or other means of duplicating the lowpart).
> > Yes, the patch is technically right. I'm ok with the patch.
> 
> Thanks, I've pushed it now.  I've also tested the suggested adjustment
> doing
> 
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index b2bf90576d5..acf2cc977b4 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22595,7 +22595,7 @@ ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
>case vec_construct:
> {
>   /* N element inserts into SSE vectors.  */
> - int cost = TYPE_VECTOR_SUBPARTS (vectype) * ix86_cost->sse_op;
> + int cost = (TYPE_VECTOR_SUBPARTS (vectype) - 1) * ix86_cost->sse_op;
>   /* One vinserti128 for combining two SSE vectors for AVX256.  */
>   if (GET_MODE_BITSIZE (mode) == 256)
> cost += ix86_vec_cost (mode, ix86_cost->addss);
> 
> successfully (with no effect on the PR101929 case as expected), I
> will queue that for stage1 since it isn't known to fix any
> regression (but I will keep it as option in case something pops up).
> 
> I'll also have a more detailed look into the x264_r case to see
> if there's something we can do about the regression that will now
> show up (and I'll watch autotesters).
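
The vec_construct figures in the quoted dump can be reproduced from the
formula given there, 4 * sse_op + 2 * sse_to_integer.  The sketch below is
only a check of that arithmetic: vec_construct_cost is a hypothetical helper,
not a function in i386.cc, COSTS_N_INSNS is written out as GCC's rtl.h
defines it (N * 4), and the value 6 for sse_to_integer is the Skylake figure
quoted in the thread.

```c
/* As defined in GCC's rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical helper: cost of building a vector from NELTS components
   when GPR_MOVES of them first have to be moved from a GPR to a vector
   register.  */
static int
vec_construct_cost (int nelts, int sse_op, int gpr_moves, int sse_to_integer)
{
  return nelts * sse_op + gpr_moves * sse_to_integer;
}

/* Vector cost total from the dump: two vec_construct nodes, two
   vector_stmt (4 each), one vec_perm (4), one vector_store (16).  */
static int
total_vector_cost (int construct_cost)
{
  return 2 * construct_cost + 2 * 4 + 4 + 16;
}
```

With no GPR moves counted, vec_construct_cost (4, COSTS_N_INSNS (1), 0, 6)
gives the old per-node cost of 16 and a total of 60; with two GPR moves it
gives 28 and a total of 84, which exceeds the scalar cost of 64 and matches
the "not profitable" verdict in the dump.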

I found a way to get back the vectorization doing

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 9188d727e33..7f1f12fb6c6 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2374,7 +2375,7 @@ fail:
n_vector_builds++;
}
}
-  if (all_uniform_p
+  if ((all_uniform_p && !two_operators)
  || n_vector_builds > 1
  || (n_vector_builds == children.length ()
  && is_a  (stmt_info->stmt)))

which in itself is reasonable since the result of the operation
is a no

[PATCH] libiberty: Fix up debug.temp.o creation if *.o has 64K+ sections [PR104617]

2022-02-22 Thread Jakub Jelinek via Gcc-patches

Hi!

On
#define A(n) int foo1##n(void) { return 1##n; }
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
#define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325)
B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642)
testcase with
./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto  -O0 -o foo1.o foo1.c -ffunction-sections
./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o
/tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized
(testcase too slow to be included into testsuite).
The problem is clearly reported by readelf:
readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323
readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section.
because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and
sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE
inclusive.  Not adjusting those is incorrect though: the SHN_{LO,HI}RESERVE
range is only relevant to the 16-bit fields.  That is mainly st_shndx in
ElfNN_Sym, where if one needs a section number >= SHN_LORESERVE, SHN_XINDEX
should be used instead and the .symtab_shndx section should contain the real
section index, and the e_shnum and e_shstrndx fields in ElfNN_Ehdr, where if
a value >= SHN_LORESERVE is needed it should be put into
Shdr[0].sh_{size,link}.  But sh_{link,info} are 32-bit fields which can
contain any section index.
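
The difference between the old and the new remapping can be sketched in
isolation as follows.  This is an illustration under stated assumptions, not
the libiberty code: remap_old and remap_new are hypothetical names, sh_map is
assumed to be a plain array indexed by the old section index, and the two
SHN_* constants are given their standard ELF values.

```c
#include <stdint.h>

/* Standard ELF values for the reserved 16-bit index range.  */
#define SHN_LORESERVE 0xff00u
#define SHN_HIRESERVE 0xffffu

/* Hypothetical sketch of the behaviour before the patch: indexes inside
   the reserved range are passed through unmapped, although sh_link and
   sh_info are 32-bit fields for which that range has no special meaning.  */
static uint32_t
remap_old (const uint32_t *sh_map, uint32_t idx)
{
  if (idx < SHN_LORESERVE || idx > SHN_HIRESERVE)
    return sh_map[idx];
  return idx;
}

/* Hypothetical sketch of the behaviour after the patch: every index is
   remapped.  */
static uint32_t
remap_new (const uint32_t *sh_map, uint32_t idx)
{
  return sh_map[idx];
}
```

For an old index of 65321 (0xff29, inside the reserved range), remap_old
returns 65321 unchanged, which is the out-of-range sh_link value readelf
complains about, while remap_new returns whatever sh_map[65321] holds.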

Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before
2011) used to mishandle the > 63.75K sections case and assumed there is a
hole in between the sections.  What
simple_object_elf_copy_lto_debug_sections does wouldn't help in that case
for the debug temp object creation, though; we'd need to detect the case
also in that routine and take it into account in the remapping etc.  I think
it is not worth it given that it was over 10 years ago; if somebody needs
63.75K or more sections, they had better use more recent binutils.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-02-22  Jakub Jelinek  

PR lto/104617
* simple-object-elf.c (simple_object_elf_match): Fix up URL
in comment.
(simple_object_elf_copy_lto_debug_sections): Remap sh_info and
sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE
range (inclusive).

--- libiberty/simple-object-elf.c.jj2022-01-11 23:11:23.967267993 +0100
+++ libiberty/simple-object-elf.c   2022-02-21 20:37:12.815202845 +0100
@@ -528,7 +528,7 @@ simple_object_elf_match (unsigned char h
 not handle objects with more than SHN_LORESERVE sections
 correctly.  All large section indexes were offset by
 0x100.  There is more information at
-http://sourceware.org/bugzilla/show_bug.cgi?id-5900 .
+https://sourceware.org/PR5900 .
 Fortunately these object files are easy to detect, as the
 GNU binutils always put the section header string table
 near the end of the list of sections.  Thus if the
@@ -1559,17 +1559,13 @@ simple_object_elf_copy_lto_debug_section
  {
sh_info = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
   shdr, sh_info, Elf_Word);
-   if (sh_info < SHN_LORESERVE
-   || sh_info > SHN_HIRESERVE)
- sh_info = sh_map[sh_info];
+   sh_info = sh_map[sh_info];
ELF_SET_FIELD (type_functions, ei_class, Shdr,
   shdr, sh_info, Elf_Word, sh_info);
  }
sh_link = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
   shdr, sh_link, Elf_Word);
-   if (sh_link < SHN_LORESERVE
-   || sh_link > SHN_HIRESERVE)
- sh_link = sh_map[sh_link];
+   sh_link = sh_map[sh_link];
ELF_SET_FIELD (type_functions, ei_class, Shdr,
   shdr, sh_link, Elf_Word, sh_link);
   }

Jakub

[PATCH] wwwdocs: Document ShadowCallStack support

2022-02-22 Thread Richard Sandiford via Gcc-patches

Document ShadowCallStack support.  The option link doesn't work yet
of course, but I checked that it works with gcc-12.1.0/ removed.

OK to install?

Thanks,
Richard


---
 htdocs/gcc-12/changes.html | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index b6341fda..216ee0b6 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -96,6 +96,17 @@ a work-in-progress.
   Note that default vectorizer cost model has been changed which used to 
behave
   as -fvect-cost-model=cheap were specified.
   
+  
+GCC now supports the
+https://clang.llvm.org/docs/ShadowCallStack.html";>
+ShadowCallStack sanitizer, which can be enabled using the
+command-line option
+https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Instrumentation-Options.html#index-fsanitize_003dshadow-call-stack";>
+-fshadow-call-stack.  This sanitizer currently
+only works on AArch64 targets and it requires an environment in
+which all code has been compiled with -ffixed-r18.
+Its primary initial user is the Linux kernel.
+  
 
 
 
-- 
2.25.1

Re: [PATCH] libiberty: Fix up debug.temp.o creation if *.o has 64K+ sections [PR104617]

2022-02-22 Thread Richard Biener via Gcc-patches

On Tue, 22 Feb 2022, Jakub Jelinek wrote:

> Hi!
> 
> On
> #define A(n) int foo1##n(void) { return 1##n; }
> #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) 
> A(n##8) A(n##9)
> #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) 
> B(n##8) B(n##9)
> #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) 
> C(n##8) C(n##9)
> #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) 
> D(n##8) D(n##9)
> E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325)
> B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642)
> testcase with
> ./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto  -O0 -o foo1.o foo1.c 
> -ffunction-sections
> ./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o
> /tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized
> (testcase too slow to be included into testsuite).
> The problem is clearly reported by readelf:
> readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link 
> value of 65321
> readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link 
> value of 65321
> readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link 
> value of 65323
> readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index 
> a symtab section.
> readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index 
> a symtab section.
> readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index 
> a string section.
> because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and
> sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE
> inclusive.  Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE
> range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym
> where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be
> used instead and .symtab_shndx section should contain the real section
> index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >=
> SHN_LORESERVE value is needed it should put those into
> Shdr[0].sh_{size,link}.  But, sh_{link,info} are 32-bit fields which can
> contain any section index.
> 
> Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before
> 2011) used to mishandle the > 63.75K sections case and assumed there is a
> hole in between the sections, but what
> simple_object_elf_copy_lto_debug_sections does wouldn't help in that case
> for the debug temp object creation, we'd need to detect the case also in
> that routine and take it into account in the remapping etc.  I think
> it is not worth it given that it is over 10 years, if somebody needs
> 63.75K or more sections, better use more recent binutils.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.  I suppose this also qualifies for backports?

Thanks,
Richard.

> 2022-02-22  Jakub Jelinek  
> 
>   PR lto/104617
>   * simple-object-elf.c (simple_object_elf_match): Fix up URL
>   in comment.
>   (simple_object_elf_copy_lto_debug_sections): Remap sh_info and
>   sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE
>   range (inclusive).
> 
> --- libiberty/simple-object-elf.c.jj  2022-01-11 23:11:23.967267993 +0100
> +++ libiberty/simple-object-elf.c 2022-02-21 20:37:12.815202845 +0100
> @@ -528,7 +528,7 @@ simple_object_elf_match (unsigned char h
>not handle objects with more than SHN_LORESERVE sections
>correctly.  All large section indexes were offset by
>0x100.  There is more information at
> -  http://sourceware.org/bugzilla/show_bug.cgi?id-5900 .
> +  https://sourceware.org/PR5900 .
>Fortunately these object files are easy to detect, as the
>GNU binutils always put the section header string table
>near the end of the list of sections.  Thus if the
> @@ -1559,17 +1559,13 @@ simple_object_elf_copy_lto_debug_section
> {
>   sh_info = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
>  shdr, sh_info, Elf_Word);
> - if (sh_info < SHN_LORESERVE
> - || sh_info > SHN_HIRESERVE)
> -   sh_info = sh_map[sh_info];
> + sh_info = sh_map[sh_info];
>   ELF_SET_FIELD (type_functions, ei_class, Shdr,
>  shdr, sh_info, Elf_Word, sh_info);
> }
>   sh_link = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
>  shdr, sh_link, Elf_Word);
> - if (sh_link < SHN_LORESERVE
> - || sh_link > SHN_HIRESERVE)
> -   sh_link = sh_map[sh_link];
> + sh_link = sh_map[sh_link];
>   ELF_SET_FIELD (type_functions, ei_class, Shdr,
>  shdr, sh_link, Elf_Word, sh_link);
>}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
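
The distinction drawn in the patch description between 16-bit and 32-bit section index fields can be sketched as below. This is a standalone illustration, not the libiberty code: `remap_sh_link` and `encode_st_shndx` are hypothetical helpers, and the constants are written out by hand with the values the ELF specification assigns to them. Note that 65321, the bad sh_link value in the readelf warnings, is 0xff29 and so falls inside the reserved range that the old code refused to remap.

```c
#include <assert.h>
#include <stdint.h>

/* Reserved range for 16-bit section index fields (ELF gABI values).  */
#define SHN_LORESERVE 0xff00u
#define SHN_HIRESERVE 0xffffu
#define SHN_XINDEX    0xffffu

/* sh_link (and sh_info, where it holds a section index) is a 32-bit
   field, so every value is a real section index and is always
   remapped.  Hypothetical helper.  */
static uint32_t
remap_sh_link (uint32_t sh_link, const uint32_t *sh_map)
{
  return sh_map[sh_link];
}

/* st_shndx is a 16-bit field: an index >= SHN_LORESERVE does not fit,
   so store SHN_XINDEX there and put the real index into *XINDEX (the
   matching .symtab_shndx entry).  Hypothetical helper.  */
static uint16_t
encode_st_shndx (uint32_t index, uint32_t *xindex)
{
  if (index >= SHN_LORESERVE)
    {
      *xindex = index;
      return (uint16_t) SHN_XINDEX;
    }
  *xindex = 0;
  return (uint16_t) index;
}
```

Only the 16-bit encoder needs to know about the reserved range; the 32-bit remap has no special cases, which is what the patch restores.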

Re: [PATCH] wwwdocs: Document ShadowCallStack support

2022-02-22 Thread Jakub Jelinek via Gcc-patches

On Tue, Feb 22, 2022 at 10:11:06AM +, Richard Sandiford via Gcc-patches 
wrote:
> Document ShadowCallStack support.  The option link doesn't work yet
> of course, but I checked that it works with gcc-12.1.0/ removed.
> 
> OK to install?
> 
> Thanks,
> Richard
> 
> 
> ---
>  htdocs/gcc-12/changes.html | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
> index b6341fda..216ee0b6 100644
> --- a/htdocs/gcc-12/changes.html
> +++ b/htdocs/gcc-12/changes.html
> @@ -96,6 +96,17 @@ a work-in-progress.
>Note that default vectorizer cost model has been changed which used to 
> behave
>as -fvect-cost-model=cheap were specified.
>
> +  
> +GCC now supports the
> +https://clang.llvm.org/docs/ShadowCallStack.html";>
> +ShadowCallStack sanitizer, which can be enabled using the
> +command-line option
> + href="https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Instrumentation-Options.html#index-fsanitize_003dshadow-call-stack";>
> +-fshadow-call-stack.  This sanitizer currently

The option is -fsanitize=shadow-call-stack , no?

> +only works on AArch64 targets and it requires an environment in
> +which all code has been compiled with -ffixed-r18.
> +Its primary initial user is the Linux kernel.
> +  
>  
>  

Jakub

Re: [PATCH 1/2] wwwdocs: Group sanitiser changes together

2022-02-22 Thread Gerald Pfeifer

On Tue, 22 Feb 2022, Richard Sandiford wrote:
> Err, scratch that.  Clearly I've not had tea this morning, and forgot 
> which version we're about to release :-)

No worries!  (And it's not even 13 yet. ;-)

For the record, I for one am happy for you to make such changes
as you see fit (where applicable), i.e., happy to provide a second
pair of eyes, but that's an offer, not a requirement.

Gerald

Re: [PATCH] wwwdocs: Document ShadowCallStack support

2022-02-22 Thread Richard Sandiford via Gcc-patches

Jakub Jelinek  writes:
> On Tue, Feb 22, 2022 at 10:11:06AM +, Richard Sandiford via Gcc-patches 
> wrote:
>> Document ShadowCallStack support.  The option link doesn't work yet
>> of course, but I checked that it works with gcc-12.1.0/ removed.
>> 
>> OK to install?
>> 
>> Thanks,
>> Richard
>> 
>> 
>> ---
>>  htdocs/gcc-12/changes.html | 11 +++
>>  1 file changed, 11 insertions(+)
>> 
>> diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
>> index b6341fda..216ee0b6 100644
>> --- a/htdocs/gcc-12/changes.html
>> +++ b/htdocs/gcc-12/changes.html
>> @@ -96,6 +96,17 @@ a work-in-progress.
>>Note that default vectorizer cost model has been changed which used 
>> to behave
>>as -fvect-cost-model=cheap were specified.
>>
>> +  
>> +GCC now supports the
>> +https://clang.llvm.org/docs/ShadowCallStack.html";>
>> +ShadowCallStack sanitizer, which can be enabled using the
>> +command-line option
>> +> href="https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Instrumentation-Options.html#index-fsanitize_003dshadow-call-stack";>
>> +-fshadow-call-stack.  This sanitizer currently
>
> The option is -fsanitize=shadow-call-stack , no?

Gah, thanks.  Clearly one of those days :-(

Richard


diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index b6341fda..9c2d9ea8 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -96,6 +96,17 @@ a work-in-progress.
   Note that default vectorizer cost model has been changed which used to 
behave
   as -fvect-cost-model=cheap were specified.
   
+  
+GCC now supports the
+https://clang.llvm.org/docs/ShadowCallStack.html";>
+ShadowCallStack sanitizer, which can be enabled using the
+command-line option
+https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Instrumentation-Options.html#index-fsanitize_003dshadow-call-stack";>
+-fsanitize=shadow-call-stack.  This sanitizer currently
+only works on AArch64 targets and it requires an environment in
+which all code has been compiled with -ffixed-r18.
+Its primary initial user is the Linux kernel.
+  
 
 
 
-- 
2.25.1

Re: [PATCH] x86: Update Intel architectures ISA support in documentation.

2022-02-22 Thread Uros Bizjak via Gcc-patches

On Tue, Feb 22, 2022 at 7:39 AM Cui,Lili  wrote:
>
> Hi Uros,
>
> This patch updates the Intel architectures ISA support in the documentation.
> The ISAs that the documentation lists for Intel architectures are
> inconsistent with what is actually supported, so update all of them.
>
> OK for master?

OK.

Thanks,
Uros.

>
>
> gcc/Changelog:
>
>   * gcc/doc/invoke.texi: Update documents for Intel architectures.
> ---
>  gcc/doc/invoke.texi | 185 +++-
>  1 file changed, 98 insertions(+), 87 deletions(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 635c5f79278..60472a21255 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -31086,66 +31086,69 @@ instruction set is used, so the code runs on all 
> i686 family chips.
>  When used with @option{-mtune}, it has the same meaning as @samp{generic}.
>
>  @item pentium2
> -Intel Pentium II CPU, based on Pentium Pro core with MMX instruction set
> -support.
> +Intel Pentium II CPU, based on Pentium Pro core with MMX and FXSR instruction
> +set support.
>
>  @item pentium3
>  @itemx pentium3m
> -Intel Pentium III CPU, based on Pentium Pro core with MMX and SSE instruction
> -set support.
> +Intel Pentium III CPU, based on Pentium Pro core with MMX, FXSR and SSE
> +instruction set support.
>
>  @item pentium-m
>  Intel Pentium M; low-power version of Intel Pentium III CPU
> -with MMX, SSE and SSE2 instruction set support.  Used by Centrino notebooks.
> +with MMX, SSE, SSE2 and FXSR instruction set support.  Used by Centrino
> +notebooks.
>
>  @item pentium4
>  @itemx pentium4m
> -Intel Pentium 4 CPU with MMX, SSE and SSE2 instruction set support.
> +Intel Pentium 4 CPU with MMX, SSE, SSE2 and FXSR instruction set support.
>
>  @item prescott
> -Improved version of Intel Pentium 4 CPU with MMX, SSE, SSE2 and SSE3 
> instruction
> -set support.
> +Improved version of Intel Pentium 4 CPU with MMX, SSE, SSE2, SSE3 and FXSR
> +instruction set support.
>
>  @item nocona
>  Improved version of Intel Pentium 4 CPU with 64-bit extensions, MMX, SSE,
> -SSE2 and SSE3 instruction set support.
> +SSE2, SSE3 and FXSR instruction set support.
>
>  @item core2
> -Intel Core 2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3
> -instruction set support.
> +Intel Core 2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, CX16,
> +SAHF and FXSR instruction set support.
>
>  @item nehalem
>  Intel Nehalem CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
> -SSE4.1, SSE4.2 and POPCNT instruction set support.
> +SSE4.1, SSE4.2, POPCNT, CX16, SAHF and FXSR instruction set support.
>
>  @item westmere
>  Intel Westmere CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
> -SSE4.1, SSE4.2, POPCNT, AES and PCLMUL instruction set support.
> +SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR and PCLMUL instruction set support.
>
>  @item sandybridge
>  Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
> -SSE4.1, SSE4.2, POPCNT, AVX, AES and PCLMUL instruction set support.
> +SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE and PCLMUL instruction 
> set
> +support.
>
>  @item ivybridge
>  Intel Ivy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3,
> -SSE4.1, SSE4.2, POPCNT, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C
> -instruction set support.
> +SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, RDRND
> +and F16C instruction set support.
>
>  @item haswell
>  Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
> -SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
> -BMI, BMI2 and F16C instruction set support.
> +SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, 
> RDRND,
> +F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE and HLE instruction set support.
>
>  @item broadwell
>  Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, 
> SSSE3,
> -SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, 
> BMI2,
> -F16C, RDSEED ADCX and PREFETCHW instruction set support.
> +SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, 
> RDRND,
> +F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE, HLE, RDSEED, ADCX and PREFETCHW
> +instruction set support.
>
>  @item skylake
>  Intel Skylake CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
> -SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA,
> -BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC and XSAVES
> -instruction set support.
> +SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, 
> RDRND,
> +F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE, HLE, RDSEED, ADCX, PREFETCHW, AES,
> +CLFLUSHOPT, XSAVEC, XSAVES and SGX instruction set support.
>
>  @item bonnell
>  Intel Bonnell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3 and 
> SSSE3
> @@ -31153,113 +31156,121 @@ instruction set support.
>
>  @item silvermont
>  Intel Silvermont CPU with 64-bit ext

[C++ PATCH] PR c++/96442: Another improved error recovery in enumerations.

2022-02-22 Thread Roger Sayle


This patch resolves PR c++/96442, another ICE-after-error regression.
In this case, invalid code attempts to use a non-integral type as the
underlying type for an enumeration (a record_type in the example given
in the bugzilla PR), for which the parser emits an error message but
allows the inappropriate type to leak to downstream code.  The minimal
safe fix is to double check that the enumeration's underlying type
EUTYPE satisfies INTEGRAL_TYPE_P before calling int_fits_type_p in
build_enumerator.  This is a one line fix, but correcting indentation
and storing a common subexpression in a variable makes the change look
a little bigger.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new (unexpected) failures.  Ok for mainline?


2022-02-22  Roger Sayle  

gcc/cp/ChangeLog
PR c++/96442
* decl.cc (build_enumerator): Check ENUM_UNDERLYING_TYPE is
INTEGRAL_TYPE_P before calling int_fits_type_p.

gcc/testsuite/ChangeLog
PR c++/96442
* g++.dg/pr96442.C: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 7b48b56..c430f78 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -16542,19 +16542,21 @@ incremented enumerator value is too large for 
%"));
   STRIP_TYPE_NOPS (value);
 
   /* If the underlying type of the enum is fixed, check whether
- the enumerator values fits in the underlying type.  If it
- does not fit, the program is ill-formed [C++0x dcl.enum].  */
-  if (ENUM_UNDERLYING_TYPE (enumtype)
-  && value
-  && TREE_CODE (value) == INTEGER_CST)
-{
- if (!int_fits_type_p (value, ENUM_UNDERLYING_TYPE (enumtype)))
+the enumerator values fits in the underlying type.  If it
+does not fit, the program is ill-formed [C++0x dcl.enum].  */
+  tree eutype = ENUM_UNDERLYING_TYPE (enumtype);
+  if (eutype
+ && value
+ && INTEGRAL_TYPE_P (eutype)
+ && TREE_CODE (value) == INTEGER_CST)
+   {
+ if (!int_fits_type_p (value, eutype))
error ("enumerator value %qE is outside the range of underlying "
-  "type %qT", value, ENUM_UNDERLYING_TYPE (enumtype));
+  "type %qT", value, eutype);
 
-  /* Convert the value to the appropriate type.  */
-  value = fold_convert (ENUM_UNDERLYING_TYPE (enumtype), value);
-}
+ /* Convert the value to the appropriate type.  */
+ value = fold_convert (eutype, value);
+   }
 }
 
   /* C++ associates enums with global, function, or class declarations.  */
diff --git a/gcc/testsuite/g++.dg/pr96442.C b/gcc/testsuite/g++.dg/pr96442.C
new file mode 100644
index 000..235bb11
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr96442.C
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+enum struct a : struct {};
+template  enum class a : class c{};
+enum struct a {b};
+// { dg-excess-errors "" }

Re: libgo patch committed: Update to Go1.18rc1 release

2022-02-22 Thread Rainer Orth

Hi Ian,

> On Sun, Feb 20, 2022 at 2:13 PM Rainer Orth  
> wrote:
>>
>> > This patch updates libgo to the Go1.18rc1 release.  Bootstrapped and
>> > ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.
>>
>> this broke Solaris bootstrap:
>>
>> ld: fatal: file runtime/internal/.libs/syscall.o: open failed: No such
>> file or directory
>> collect2: error: ld returned 1 exit status
>>
>> Creating a dummy syscall_solaris.go worked around that for now.
>
> Sorry about that.  I committed this patch which should fix the problem.

great, thanks.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

[PATCH] Dump def that we use for a splat

2022-02-22 Thread Richard Biener via Gcc-patches

This makes the SLP vectorizer dump the def we use for a splat to
aid debugging.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2022-02-22  Richard Biener  

* tree-vect-slp.cc (vect_build_slp_tree_2): Dump the def used
for a splat.
---
 gcc/tree-vect-slp.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 9188d727e33..341bd5220a5 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2202,7 +2202,8 @@ out:
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
-"Using a splat of the uniform operand\n");
+"Using a splat of the uniform operand %G",
+first_def->stmt);
  oprnd_info->first_dt = vect_external_def;
}
}
-- 
2.34.1

[PATCH] Restore bootstrap on x86_64-pc-linux-gnu

2022-02-22 Thread Roger Sayle

 

This patch resolves the bootstrap failure on x86_64-pc-linux-gnu.
Is this sufficiently "obvious" in stage4, or should I wait for the bootstrap
and regression testing to complete?


2022-02-22  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_cmpxchg_loop): Restore
bootstrap.


Cheers,
Roger
--

 

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 7f7055b..faa0191 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -23287,11 +23287,11 @@ void ix86_expand_cmpxchg_loop (rtx *ptarget_bool, rtx 
target_val,
 
   switch (mode)
 {
-case TImode:
+case E_TImode:
   gendw = gen_atomic_compare_and_swapti_doubleword;
   hmode = DImode;
   break;
-case DImode:
+case E_DImode:
   if (doubleword)
{
  gendw = gen_atomic_compare_and_swapdi_doubleword;
@@ -23300,12 +23300,15 @@ void ix86_expand_cmpxchg_loop (rtx *ptarget_bool, rtx 
target_val,
   else
gen = gen_atomic_compare_and_swapdi_1;
   break;
-case SImode:
-  gen = gen_atomic_compare_and_swapsi_1; break;
-case HImode:
-  gen = gen_atomic_compare_and_swaphi_1; break;
-case QImode:
-  gen = gen_atomic_compare_and_swapqi_1; break;
+case E_SImode:
+  gen = gen_atomic_compare_and_swapsi_1;
+  break;
+case E_HImode:
+  gen = gen_atomic_compare_and_swaphi_1;
+  break;
+case E_QImode:
+  gen = gen_atomic_compare_and_swapqi_1;
+  break;
 default:
   gcc_unreachable ();
 }

[committed][nvptx] Add -mptx-comment

2022-02-22 Thread Tom de Vries via Gcc-patches

Hi,

Add functionality that indicates which insns are added by -minit-regs, such
that for instance we have for pr53465.s:
...
// #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
// Start: Added by -minit-regs=3:
// #NO_APP
mov.u32 %r26, 0;
// #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
// End: Added by -minit-regs=3:
// #NO_APP
...

Can be switched off using -mno-ptx-comment.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Add -mptx-comment

gcc/ChangeLog:

2022-02-21  Tom de Vries  

* config/nvptx/nvptx.cc (gen_comment): New function.
(workaround_uninit_method_1, workaround_uninit_method_2)
(workaround_uninit_method_3): Use gen_comment.
* config/nvptx/nvptx.opt (mptx-comment): New option.

---
 gcc/config/nvptx/nvptx.cc  | 42 ++
 gcc/config/nvptx/nvptx.opt |  3 +++
 2 files changed, 45 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index a37a6c78b41..981b91f7095 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -5372,6 +5372,17 @@ workaround_barsyncs (void)
 }
 #endif
 
+static rtx
+gen_comment (const char *s)
+{
+  const char *sep = " ";
+  size_t len = strlen (ASM_COMMENT_START) + strlen (sep) + strlen (s) + 1;
+  char *comment = (char *) alloca (len);
+  snprintf (comment, len, "%s%s%s", ASM_COMMENT_START, sep, s);
+  return gen_rtx_ASM_INPUT_loc (VOIDmode, ggc_strdup (comment),
+   cfun->function_start_locus);
+}
+
 /* Initialize all declared regs at function entry.
Advantage   : Fool-proof.
Disadvantage: Potentially creates a lot of long live ranges and adds a lot
@@ -5394,6 +5405,8 @@ workaround_uninit_method_1 (void)
   gcc_assert (CONST0_RTX (GET_MODE (reg)));
 
   start_sequence ();
+  if (nvptx_comment && first != NULL)
+   emit_insn (gen_comment ("Start: Added by -minit-regs=1"));
   emit_move_insn (reg, CONST0_RTX (GET_MODE (reg)));
   rtx_insn *inits = get_insns ();
   end_sequence ();
@@ -5411,6 +5424,9 @@ workaround_uninit_method_1 (void)
   else
insert_here = emit_insn_after (inits, insert_here);
 }
+
+  if (nvptx_comment && insert_here != NULL)
+emit_insn_after (gen_comment ("End: Added by -minit-regs=1"), insert_here);
 }
 
 /* Find uses of regs that are not defined on all incoming paths, and insert a
@@ -5446,6 +5462,8 @@ workaround_uninit_method_2 (void)
   gcc_assert (CONST0_RTX (GET_MODE (reg)));
 
   start_sequence ();
+  if (nvptx_comment && first != NULL)
+   emit_insn (gen_comment ("Start: Added by -minit-regs=2:"));
   emit_move_insn (reg, CONST0_RTX (GET_MODE (reg)));
   rtx_insn *inits = get_insns ();
   end_sequence ();
@@ -5463,6 +5481,9 @@ workaround_uninit_method_2 (void)
   else
insert_here = emit_insn_after (inits, insert_here);
 }
+
+  if (nvptx_comment && insert_here != NULL)
+emit_insn_after (gen_comment ("End: Added by -minit-regs=2"), insert_here);
 }
 
 /* Find uses of regs that are not defined on all incoming paths, and insert a
@@ -5531,6 +5552,27 @@ workaround_uninit_method_3 (void)
}
 }
 
+  if (nvptx_comment)
+FOR_EACH_BB_FN (bb, cfun)
+  {
+   if (single_pred_p (bb))
+ continue;
+
+   edge e;
+   edge_iterator ei;
+   FOR_EACH_EDGE (e, ei, bb->preds)
+ {
+   if (e->insns.r == NULL_RTX)
+ continue;
+   start_sequence ();
+   emit_insn (gen_comment ("Start: Added by -minit-regs=3:"));
+   emit_insn (e->insns.r);
+   emit_insn (gen_comment ("End: Added by -minit-regs=3:"));
+   e->insns.r = get_insns ();
+   end_sequence ();
+ }
+  }
+
   commit_edge_insertions ();
 }
 
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 08580071731..e56ec9288da 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -95,3 +95,6 @@ Specify the version of the ptx version to use.
 minit-regs=
 Target Var(nvptx_init_regs) IntegerRange(0, 3) Joined UInteger Init(3)
 Initialize ptx registers.
+
+mptx-comment
+Target Var(nvptx_comment) Init(1) Undocumented
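
The string handling in gen_comment can be tried outside the compiler with a sketch like the one below. It is a minimal stand-in under stated assumptions: `COMMENT_START` replaces the target's ASM_COMMENT_START (the nvptx output quoted above uses "//"), `make_comment` is a hypothetical name, and heap allocation replaces the alloca/ggc_strdup pair, so the caller has to free the result.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed stand-in for the target's ASM_COMMENT_START.  */
#define COMMENT_START "//"

/* Build "<comment-start> <text>" in freshly allocated memory.
   Hypothetical helper; returns NULL if allocation fails, otherwise
   the caller frees the result.  */
static char *
make_comment (const char *text)
{
  const char *sep = " ";
  /* One extra byte for the terminating NUL.  */
  size_t len = strlen (COMMENT_START) + strlen (sep) + strlen (text) + 1;
  char *comment = (char *) malloc (len);
  if (comment == NULL)
    return NULL;
  snprintf (comment, len, "%s%s%s", COMMENT_START, sep, text);
  return comment;
}
```

Calling `make_comment ("Start: Added by -minit-regs=3:")` yields the text of the marker line that appears in the pr53465.s excerpt.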

[PATCH][final] Handle compiler-generated asm insn

2022-02-22 Thread Tom de Vries via Gcc-patches

Hi,

For the nvptx port, with -mptx-comment we have in pr53465.s:
...
// #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
// Start: Added by -minit-regs=3:
// #NO_APP
mov.u32 %r26, 0;
// #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
// End: Added by -minit-regs=3:
// #NO_APP
...

The comments were generated using the compiler-generated equivalent of:
...
  asm ("// Comment");
...
but both the printed location and the NO_APP/APP are unnecessary for a
compiler-generated asm insn.

Fix this by handling ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION in
final_scan_insn_1, such that we simply get:
...
// Start: Added by -minit-regs=3:
mov.u32 %r26, 0;
// End: Added by -minit-regs=3:
...

Tested on nvptx.

OK for trunk?

Thanks,
- Tom

[final] Handle compiler-generated asm insn

gcc/ChangeLog:

2022-02-21  Tom de Vries  

PR rtl-optimization/104596
* config/nvptx/nvptx.cc (gen_comment): Use gen_rtx_ASM_INPUT instead
of gen_rtx_ASM_INPUT_loc.
* final.cc (final_scan_insn_1): Handle
ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION.

---
 gcc/config/nvptx/nvptx.cc |  3 +--
 gcc/final.cc  | 17 +++--
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 858789e6df7..4124c597f24 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -5381,8 +5381,7 @@ gen_comment (const char *s)
   size_t len = strlen (ASM_COMMENT_START) + strlen (sep) + strlen (s) + 1;
   char *comment = (char *) alloca (len);
   snprintf (comment, len, "%s%s%s", ASM_COMMENT_START, sep, s);
-  return gen_rtx_ASM_INPUT_loc (VOIDmode, ggc_strdup (comment),
-   cfun->function_start_locus);
+  return gen_rtx_ASM_INPUT (VOIDmode, ggc_strdup (comment));
 }
 
 /* Initialize all declared regs at function entry.
diff --git a/gcc/final.cc b/gcc/final.cc
index a9868861bd2..e6443ef7a4f 100644
--- a/gcc/final.cc
+++ b/gcc/final.cc
@@ -2642,15 +2642,20 @@ final_scan_insn_1 (rtx_insn *insn, FILE *file, int 
optimize_p ATTRIBUTE_UNUSED,
if (string[0])
  {
expanded_location loc;
+   bool unknown_loc_p
+ = ASM_INPUT_SOURCE_LOCATION (body) == UNKNOWN_LOCATION;
 
-   app_enable ();
-   loc = expand_location (ASM_INPUT_SOURCE_LOCATION (body));
-   if (*loc.file && loc.line)
- fprintf (asm_out_file, "%s %i \"%s\" 1\n",
-  ASM_COMMENT_START, loc.line, loc.file);
+   if (!unknown_loc_p)
+ {
+   app_enable ();
+   loc = expand_location (ASM_INPUT_SOURCE_LOCATION (body));
+   if (*loc.file && loc.line)
+ fprintf (asm_out_file, "%s %i \"%s\" 1\n",
+  ASM_COMMENT_START, loc.line, loc.file);
+ }
fprintf (asm_out_file, "\t%s\n", string);
 #if HAVE_AS_LINE_ZERO
-   if (*loc.file && loc.line)
+   if (!unknown_loc_p && loc.file && *loc.file && loc.line)
  fprintf (asm_out_file, "%s 0 \"\" 2\n", ASM_COMMENT_START);
 #endif
  }
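
The effect of the final.cc change on the output can be modelled with a small standalone function. This is only a rough model, not the GCC code: `format_asm_string` is a hypothetical name, a NULL file or zero line stands in for UNKNOWN_LOCATION, "//" is assumed as the comment start, and the matching #NO_APP marker (which the real compiler prints later, before the next ordinary insn) is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one asm string into BUF.  With a known location, print the
   #APP marker and the location comment first; for a compiler-generated
   insn (FILE is NULL or LINE is 0) print only the string itself.
   Hypothetical model; returns what snprintf returns.  */
static int
format_asm_string (char *buf, size_t size, const char *string,
                   const char *file, int line)
{
  if (file != NULL && line != 0)
    return snprintf (buf, size, "// #APP\n// %d \"%s\" 1\n\t%s\n",
                     line, file, string);
  return snprintf (buf, size, "\t%s\n", string);
}
```

With an unknown location, only the tab-indented comment line is produced, which corresponds to the shortened pr53465.s excerpt above.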

Re: [PATCH v2] x86: Add TARGET_OMIT_VZEROUPPER_AFTER_AVX_READ_ZERO

2022-02-22 Thread H.J. Lu via Gcc-patches

On Mon, Feb 21, 2022 at 6:43 PM Hongtao Liu  wrote:
>
> On Tue, Feb 22, 2022 at 2:35 AM H.J. Lu  wrote:
> >
> > On Sun, Feb 20, 2022 at 6:01 PM Hongtao Liu  wrote:
> > >
> > > On Thu, Feb 17, 2022 at 9:56 PM H.J. Lu  wrote:
> > > >
> > > > On Thu, Feb 17, 2022 at 08:51:31AM +0100, Uros Bizjak wrote:
> > > > > On Thu, Feb 17, 2022 at 6:25 AM Hongtao Liu via Gcc-patches
> > > > >  wrote:
> > > > > >
> > > > > > On Thu, Feb 17, 2022 at 12:26 PM H.J. Lu via Gcc-patches
> > > > > >  wrote:
> > > > > > >
> > > > > > > > Reading YMM registers with all zero bits needs VZEROUPPER on
> > > > > > > > Sandy Bridge,
> > > > > > > Ivy Bridge, Haswell, Broadwell and Alder Lake to avoid SSE <-> AVX
> > > > > > > transition penalty.  Add TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER 
> > > > > > > to
> > > > > > > > generate vzeroupper instruction after loading all-zero YMM/ZMM
> > > > > > > > registers
> > > > > > > and enable it by default.
> > > > > > > Shouldn't TARGET_READ_ZERO_YMM_ZMM_NONEED_VZEROUPPER sound a bit
> > > > > > > smoother?
> > > > > > Because originally we needed to add vzeroupper to all avx<->sse 
> > > > > > cases,
> > > > > > now it's a tune to indicate that we don't need to add it in some
> > > > >
> > > > > Perhaps we should go from the other side and use
> > > > > X86_TUNE_OPTIMIZE_AVX_READ for new processors?
> > > > >
> > > >
> > > > Here is the v2 patch to add TARGET_OMIT_VZEROUPPER_AFTER_AVX_READ_ZERO.
> > > >
> > > The patch LGTM in general, but please rebase against
> > > https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590541.html
> > > and resend the patch, also wait a couple days in case Uros(and others)
> > > have any comments.
> >
> > I am dropping my patch since it causes the compile-time regression.
> I think only vextractif128 part is reverted, but we still have
> vmovdqu(below) which should also cause penalty?

commit fe79d652c96b53384ddfa43e312cb0010251391b
Author: Richard Biener 
Date:   Thu Feb 17 14:40:16 2022 +0100

target/104581 - compile-time regression in mode-switching

has

diff --git a/gcc/testsuite/gcc.target/i386/pr101456-1.c
b/gcc/testsuite/gcc.target/i386/pr101456-1.c
index 803fc6e0207..7fb3a3f055c 100644
--- a/gcc/testsuite/gcc.target/i386/pr101456-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr101456-1.c
@@ -30,4 +30,5 @@ foo3 (void)
   bar ();
 }

-/* { dg-final { scan-assembler-not "vzeroupper" } } */
+/* See PR104581 for the XFAIL reason.  */
+/* { dg-final { scan-assembler-not "vzeroupper" { xfail *-*-* } } } */

and I checked in:

commit 1931cbad498e625b1e24452dcfffe02539b12224
Author: H.J. Lu 
Date:   Fri Feb 18 10:36:53 2022 -0800

pieces-memset-21.c: Expect vzeroupper for ia32

Update gcc.target/i386/pieces-memset-21.c to expect vzeroupper for ia32
caused by

commit fe79d652c96b53384ddfa43e312cb0010251391b
Author: Richard Biener 
Date:   Thu Feb 17 14:40:16 2022 +0100

target/104581 - compile-time regression in mode-switching

PR target/104581
* gcc.target/i386/pieces-memset-21.c: Expect vzeroupper for ia32.

I believe that vmovdqu is also covered.

-- 
H.J.

Re: [Patch] nvptx: Add -mptx=6.0 + -misa=sm_70

2022-02-22 Thread Tom de Vries via Gcc-patches


On 2/17/22 18:24, Tobias Burnus wrote:

PTX version (-mptx=)
[patch adds -mptx=6.0 as option]

* Currently supported internally are 3.1 (CUDA 5.0, used by GCC <= 11),
   6.0 (CUDA 9.0, current GCC 12 default), 6.3 (CUDA 10.0), 7.0 (CUDA 11.0)
* -mptx= supports 3.1, 6.3, 7.0 – but not the internal default 6.0



I tend not to think in terms of CUDA versions, but supported driver 
versions.


In the end, drivers are used to translate ptx to SASS for execution, 
CUDA is just used for build time verification (or not, if it's not in 
the path).


And a driver may or may not be supported.  F.i. 390.x still may receive 
updates from nvidia, but there are JIT bugs that we've reported that 
they've decided not to fix, so from that point of view 390.x is unsupported.



I think it makes sense to expose the 6.0 value to the user and not
only use it internally behind the scenes. As it is already used internally,
the change is tiny but user visible. 


Sure, I've committed this (with a somewhat shorter commit log).


Thus, it has to stay when we will
bump the default in later GCC versions; on the other hand, if we bump
the default, it might be also a good reason to have it to permit the
user to have a backward compatible PTX output for linking libraries.



FWIW, I think that it's possible to link different versions of ptx isa 
together (though perhaps there are specific scenarios where that's not 
possible, I'm not sure).  But mixing versions restricts the range of 
drivers you can use, so it may make sense to just use one version.


Thanks,
- Tom

nvptx: Add -mptx=6.0

Currently supported internally are 3.1, 6.0, 6.3 and 7.0.

However, -mptx= supports 3.1, 6.3, 7.0 – but not the internal default 6.0.

Add -mptx=6.0 for consistency.

Tested on nvptx.

gcc/ChangeLog:

	* config/nvptx/nvptx.opt (mptx): Add 6.0 alias PTX_VERSION_6_0.
	* doc/invoke.texi (-mptx): Update for new values and defaults.

Co-Authored-By: Tom de Vries 

---
 gcc/config/nvptx/nvptx.opt | 3 +++
 gcc/doc/invoke.texi| 7 ---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index e56ec9288da..97e127cc4fb 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -82,6 +82,9 @@ Known PTX versions (for use with the -mptx= option):
 EnumValue
 Enum(ptx_version) String(3.1) Value(PTX_VERSION_3_1)
 
+EnumValue
+Enum(ptx_version) String(6.0) Value(PTX_VERSION_6_0)
+
 EnumValue
 Enum(ptx_version) String(6.3) Value(PTX_VERSION_6_3)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 635c5f79278..56f3a01de44 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -27286,9 +27286,10 @@ strings must be lower-case.  Valid ISA strings include @samp{sm_30} and
 
 @item -mptx=@var{version-string}
 @opindex mptx
-Generate code for given the specified PTX version (e.g.@: @samp{6.3}).
-Valid version strings include @samp{3.1} and @samp{6.3}.  The default PTX
-version is 3.1.
+Generate code for given the specified PTX version (e.g.@: @samp{7.0}).
+Valid version strings include @samp{3.1}, @samp{6.0}, @samp{6.3}, and
+@samp{7.0}.  The default PTX version is 6.0, unless a higher minimal
+version is required for specified PTX ISA via option @option{-misa=}.
 
 @item -mmainkernel
 @opindex mmainkernel

Re: [Patch] nvptx: Add -mptx=6.0 + -misa=sm_70

2022-02-22 Thread Tom de Vries via Gcc-patches


On 2/17/22 18:24, Tobias Burnus wrote:

SM version (-misa=)
[Patch adds -misa=sm_70]

* The compiler supports internally: SM_30, SM_35, SM_53, SM_70, SM_75, 
SM_80.


I'd formulate it like: it uses SM_70 internally to accurately formulate 
when certain insns can be used.



I think it makes sense to have sm_70 in addition:
* The current code actually does generate different code for >= sm_70
   already.


Agreed.

I've committed this (with a somewhat shorter commit log), and a 
test-case update.


Thanks,
- Tom

nvptx: Add -misa=sm_70

Add -misa=sm_70, and use it to specify the misa value in test-case
gcc.target/nvptx/atomic-store-2.c.

Tested on nvptx.

gcc/ChangeLog:

	* config/nvptx/nvptx-c.cc (nvptx_cpu_cpp_builtins): Handle SM70.
	* config/nvptx/nvptx.cc (first_ptx_version_supporting_sm):
	Likewise.
	* config/nvptx/nvptx.opt (misa): Add sm_70 alias PTX_ISA_SM70.

gcc/testsuite/ChangeLog:

2022-02-22  Tom de Vries  

	* gcc.target/nvptx/atomic-store-2.c: Use -misa=sm_70.
	* gcc.target/nvptx/uniform-simt-3.c: Same.

Co-Authored-By: Tom de Vries 

---
 gcc/config/nvptx/nvptx-c.cc | 2 ++
 gcc/config/nvptx/nvptx.cc   | 2 ++
 gcc/config/nvptx/nvptx.opt  | 3 +++
 gcc/testsuite/gcc.target/nvptx/atomic-store-2.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/uniform-simt-3.c | 2 +-
 5 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/config/nvptx/nvptx-c.cc b/gcc/config/nvptx/nvptx-c.cc
index d68b9910d7e..b2375fb5b16 100644
--- a/gcc/config/nvptx/nvptx-c.cc
+++ b/gcc/config/nvptx/nvptx-c.cc
@@ -43,6 +43,8 @@ nvptx_cpu_cpp_builtins (void)
 cpp_define (parse_in, "__PTX_SM__=800");
   else if (TARGET_SM75)
 cpp_define (parse_in, "__PTX_SM__=750");
+  else if (TARGET_SM70)
+cpp_define (parse_in, "__PTX_SM__=700");
   else if (TARGET_SM53)
 cpp_define (parse_in, "__PTX_SM__=530");
   else if (TARGET_SM35)
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 981b91f7095..858789e6df7 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -217,6 +217,8 @@ first_ptx_version_supporting_sm (enum ptx_isa sm)
   return PTX_VERSION_3_1;
 case PTX_ISA_SM53:
   return PTX_VERSION_4_2;
+case PTX_ISA_SM70:
+  return PTX_VERSION_6_0;
 case PTX_ISA_SM75:
   return PTX_VERSION_6_3;
 case PTX_ISA_SM80:
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 97e127cc4fb..9776c3b9a1f 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -64,6 +64,9 @@ Enum(ptx_isa) String(sm_35) Value(PTX_ISA_SM35)
 EnumValue
 Enum(ptx_isa) String(sm_53) Value(PTX_ISA_SM53)
 
+EnumValue
+Enum(ptx_isa) String(sm_70) Value(PTX_ISA_SM70)
+
 EnumValue
 Enum(ptx_isa) String(sm_75) Value(PTX_ISA_SM75)
 
diff --git a/gcc/testsuite/gcc.target/nvptx/atomic-store-2.c b/gcc/testsuite/gcc.target/nvptx/atomic-store-2.c
index cd5e4c38267..b58f33f2abd 100644
--- a/gcc/testsuite/gcc.target/nvptx/atomic-store-2.c
+++ b/gcc/testsuite/gcc.target/nvptx/atomic-store-2.c
@@ -2,7 +2,7 @@
shared state space.  */
 
 /* { dg-do compile } */
-/* { dg-options "-misa=sm_75" } */
+/* { dg-options "-misa=sm_70" } */
 
 enum memmodel
 {
diff --git a/gcc/testsuite/gcc.target/nvptx/uniform-simt-3.c b/gcc/testsuite/gcc.target/nvptx/uniform-simt-3.c
index 532fa825161..b61b8ba9d5b 100644
--- a/gcc/testsuite/gcc.target/nvptx/uniform-simt-3.c
+++ b/gcc/testsuite/gcc.target/nvptx/uniform-simt-3.c
@@ -1,4 +1,4 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -muniform-simt -misa=sm_75" } */
+/* { dg-options "-O2 -muniform-simt -misa=sm_70" } */
 
 #include "atomic-store-2.c"
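
The interaction between -misa= and the default -mptx= value can be sketched in a few lines of C.  This is only an illustration, not the code in nvptx.cc: the enums are cut-down, hypothetical stand-ins, only the three SM cases visible in the hunk above are modelled, and default_ptx_version is an invented helper that applies the documented rule (6.0 unless the selected ISA needs a higher minimum).

```c
/* Hypothetical, cut-down stand-ins for the enums used in the nvptx port.
   The enumerators are ordered so that a numerically larger value means
   a newer PTX version.  */
enum ptx_isa { PTX_ISA_SM53, PTX_ISA_SM70, PTX_ISA_SM75 };
enum ptx_version { PTX_VERSION_4_2, PTX_VERSION_6_0, PTX_VERSION_6_3 };

/* Lowest PTX version that supports the given SM level, following the
   case labels shown in the patch.  */
enum ptx_version
first_ptx_version_supporting_sm (enum ptx_isa sm)
{
  switch (sm)
    {
    case PTX_ISA_SM53:
      return PTX_VERSION_4_2;
    case PTX_ISA_SM70:
      return PTX_VERSION_6_0;
    case PTX_ISA_SM75:
      return PTX_VERSION_6_3;
    }
  return PTX_VERSION_6_0;
}

/* Invented helper: the default is 6.0, raised when the ISA requires a
   higher minimal version.  */
enum ptx_version
default_ptx_version (enum ptx_isa sm)
{
  enum ptx_version first = first_ptx_version_supporting_sm (sm);
  return first > PTX_VERSION_6_0 ? first : PTX_VERSION_6_0;
}
```

Under these assumptions sm_53 and sm_70 both end up with PTX 6.0, while sm_75 is raised to 6.3, which is why the test-cases no longer need an explicit -mptx setting.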

Re: [Patch] nvptx: Add -mptx=6.0 + -misa=sm_70

2022-02-22 Thread Tom de Vries via Gcc-patches


On 2/17/22 18:24, Tobias Burnus wrote:

diff --git a/gcc/config/nvptx/t-omp-device b/gcc/config/nvptx/t-omp-device
index 8765d9f1881..4228218a424 100644
--- a/gcc/config/nvptx/t-omp-device
+++ b/gcc/config/nvptx/t-omp-device
@@ -1,4 +1,4 @@
 omp-device-properties-nvptx: $(srcdir)/config/nvptx/nvptx.cc
echo kind: gpu > $@
echo arch: nvptx >> $@
-   echo isa: sm_30 sm_35 >> $@
+   echo isa: sm_30 sm_35 sm_53 sm_70 sm_75 sm_80 >> $@


I'm not sure I understand how this is used.  Is this user-visible?  Is 
there a libgomp test-case where we can observe a difference?


Thanks,
- Tom

Re: [pushed] LRA, rs6000, Darwin: Amend lo_sum use for forced constants [PR104117].

2022-02-22 Thread Vladimir Makarov via Gcc-patches




On 2022-02-20 12:34, Iain Sandoe wrote:


^^^ this is mostly for my education - the stuff below is a potential solution 
to leaving lra-constraints unchanged and fixing the Darwin bug….

I'd be really glad if you do manage to fix this w/o changing LRA. 
Richard has a legitimate point that my proposed change in LRA 
prohibiting `...;reg=low_sum; ...mem[reg]` might force LRA to generate 
less optimized code or even might make LRA generate unrecognized 
insns `reg = original addr` for some ports requiring further fixes in 
machine-dependent code of the ports.

Re: [PING][PATCH][libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end

2022-02-22 Thread Tom de Vries via Gcc-patches


On 5/19/21 16:52, Tom de Vries wrote:

On 4/23/21 6:48 PM, Tom de Vries wrote:

On 4/23/21 5:45 PM, Alexander Monakov wrote:

On Thu, 22 Apr 2021, Tom de Vries wrote:


Ah, I see, agreed, that makes sense.  I was afraid there was some
fundamental problem that I overlooked.

Here's an updated version.  I've tried to make it clear that the
futex_wait/wake are locally used versions, not generic functionality.

Could you please regenerate the patch passing appropriate flags to
'git format-patch' so it presents a rewrite properly (see documentation
for --patience and --break-rewrites options). The attached patch was mostly
unreadable, I'm afraid.

Sure.  I did notice that the patch was not readable, but I didn't know
there were options to improve that, so thanks for pointing that out.



Ping.  Any comments?


I've hardcoded do_spin to 1, and tested on:
- turing, pascal, maxwell (510.x driver)
- kepler (470.x driver)

Committed.

Thanks,
- Tom

[PATCH v4 03/12] arm: Add support for VPR_REG in arm_class_likely_spilled_p

2022-02-22 Thread Christophe Lyon via Gcc-patches

From: Christophe Lyon 

VPR_REG is the only register in its class, so it should be handled by
TARGET_CLASS_LIKELY_SPILLED_P, which is achieved by calling
default_class_likely_spilled_p.  No test fails without this patch, but
it seems it should be implemented.
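
As far as I understand the default hook, it reports a class as likely spilled when the class contains exactly one hard register.  A minimal sketch of that rule, modelling a register class as a plain bitmask (the bitmask model and both function names are illustrative assumptions, not GCC's data structures):

```c
#include <stdbool.h>
#include <stdint.h>

/* Count the hard registers in a class modelled as a 64-bit mask.  */
static unsigned
class_size (uint64_t class_regs)
{
  unsigned n = 0;
  for (; class_regs != 0; class_regs &= class_regs - 1)
    n++;
  return n;
}

/* Single-register classes are the ones the allocator is most likely to
   have to spill, so report exactly those.  */
bool
class_likely_spilled_p (uint64_t class_regs)
{
  return class_size (class_regs) == 1;
}
```

With this model a class holding only VPR yields true, while a multi-register class such as the general registers yields false.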

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  

gcc/
* config/arm/arm.cc (arm_class_likely_spilled_p): Handle VPR_REG.

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 9c19589186f..8d7f095b59b 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -29369,7 +29369,7 @@ arm_class_likely_spilled_p (reg_class_t rclass)
   || rclass  == CC_REG)
 return true;
 
-  return false;
+  return default_class_likely_spilled_p (rclass);
 }
 
 /* Implements target hook small_register_classes_for_mode_p.  */
-- 
2.25.1

[PATCH v4 02/12] arm: Add GENERAL_AND_VPR_REGS regclass

2022-02-22 Thread Christophe Lyon via Gcc-patches

From: Christophe Lyon 

At some point during the development of this patch series, it appeared
that in some cases the register allocator wants “VPR or general”
rather than “VPR or general or FP” (which is the same thing as
ALL_REGS).  The series does not seem to require this anymore, but it
seems to be a good thing to do anyway, to give the register allocator
more freedom.

CLASS_MAX_NREGS and arm_hard_regno_nregs need adjustment to avoid a
regression in gcc.dg/stack-usage-1.c when compiled with -mthumb
-mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp.
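
The adjustment, as it appears in the diff further down, counts VPR in 2-byte units.  A small sketch of that arithmetic, assuming the mode size is given in bytes (vpr_nregs is a hypothetical name used only for illustration):

```c
/* Round-up integer division, in the style of GCC's CEIL macro.  */
#define CEIL(x, y) (((x) + (y) - 1) / (y))

/* Number of VPR-sized (2-byte) units needed to hold a value of the
   given size in bytes.  */
unsigned
vpr_nregs (unsigned mode_size_bytes)
{
  return CEIL (mode_size_bytes, 2);
}
```

A 2-byte mode such as HImode therefore needs one unit, and a 4-byte mode needs two.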

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  

gcc/
* config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(CLASS_MAX_NREGS): Handle VPR.
* config/arm/arm.cc (arm_hard_regno_nregs): Handle VPR.

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 663f4595050..9c19589186f 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -25339,6 +25339,9 @@ thumb2_asm_output_opcode (FILE * stream)
 static unsigned int
 arm_hard_regno_nregs (unsigned int regno, machine_mode mode)
 {
+  if (IS_VPR_REGNUM (regno))
+return CEIL (GET_MODE_SIZE (mode), 2);
+
   if (TARGET_32BIT
   && regno > PC_REGNUM
   && regno != FRAME_POINTER_REGNUM
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index f52724d01ad..61c02218b78 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1287,6 +1287,7 @@ enum reg_class
   SFP_REG,
   AFP_REG,
   VPR_REG,
+  GENERAL_AND_VPR_REGS,
   ALL_REGS,
   LIM_REG_CLASSES
 };
@@ -1316,6 +1317,7 @@ enum reg_class
   "SFP_REG",   \
   "AFP_REG",   \
   "VPR_REG",   \
+  "GENERAL_AND_VPR_REGS", \
   "ALL_REGS"   \
 }
 
@@ -1344,6 +1346,7 @@ enum reg_class
   { 0x, 0x, 0x, 0x0040 }, /* SFP_REG */\
   { 0x, 0x, 0x, 0x0080 }, /* AFP_REG */\
   { 0x, 0x, 0x, 0x0400 }, /* VPR_REG.  */  \
+  { 0x5FFF, 0x, 0x, 0x0400 }, /* GENERAL_AND_VPR_REGS.  */ \
   { 0x7FFF, 0x, 0x, 0x000F }  /* ALL_REGS.  */ \
 }
 
@@ -1453,7 +1456,9 @@ extern const char *fp_sysreg_names[NB_FP_SYSREGS];
ARM regs are UNITS_PER_WORD bits.  
FIXME: Is this true for iWMMX?  */
 #define CLASS_MAX_NREGS(CLASS, MODE)  \
-  (ARM_NUM_REGS (MODE))
+  (CLASS == VPR_REG) \
+  ? CEIL (GET_MODE_SIZE (MODE), 2)\
+  : (ARM_NUM_REGS (MODE))
 
 /* If defined, gives a class of registers that cannot be used as the
operand of a SUBREG that changes the mode of the object illegally.  */
-- 
2.25.1

[PATCH v4 04/12] arm: Fix mve_vmvnq_n_ argument mode

2022-02-22 Thread Christophe Lyon via Gcc-patches

From: Christophe Lyon 

The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use the
<V_elem> iterator instead of HI in mve_vmvnq_n_.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  

gcc/
* config/arm/mve.md (mve_vmvnq_n_): Use V_elem mode
for operand 1.

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 171dd384133..5c3b34dce3a 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_"
 (define_insn "mve_vmvnq_n_"
   [
(set (match_operand:MVE_5 0 "s_register_operand" "=w")
-   (unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
+   (unspec:MVE_5 [(match_operand:<V_elem> 1 "immediate_operand" "i")]
 VMVNQ_N))
   ]
   "TARGET_HAVE_MVE"
-- 
2.25.1

[PATCH v4 01/12] arm: Add new tests for comparison vectorization with Neon and MVE

2022-02-22 Thread Christophe Lyon via Gcc-patches

From: Christophe Lyon 

This patch mainly adds Neon tests similar to existing MVE ones,
to make sure we do not break Neon when fixing MVE.

mve-vcmp-f32-2.c is similar to mve-vcmp-f32.c but uses a conditional
with 2.0f and 3.0f constants to help scan-assembler-times.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon 

gcc/testsuite/
* gcc.target/arm/simd/mve-vcmp-f32-2.c: New.
* gcc.target/arm/simd/neon-compare-1.c: New.
* gcc.target/arm/simd/neon-compare-2.c: New.
* gcc.target/arm/simd/neon-compare-3.c: New.
* gcc.target/arm/simd/neon-compare-scalar-1.c: New.
* gcc.target/arm/simd/neon-vcmp-f16.c: New.
* gcc.target/arm/simd/neon-vcmp-f32-2.c: New.
* gcc.target/arm/simd/neon-vcmp-f32-3.c: New.
* gcc.target/arm/simd/neon-vcmp-f32.c: New.
* gcc.target/arm/simd/neon-vcmp.c: New.

diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c 
b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
new file mode 100644
index 000..917a95bf141
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
@@ -0,0 +1,32 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+
+#include 
+
+#define NB 4
+
+#define FUNC(OP, NAME) \
+  void test_ ## NAME ##_f (float * __restrict__ dest, float *a, float *b) { \
+int i; \
+for (i=0; i, vcmpgt)
+FUNC(>=, vcmpge)
+
+/* { dg-final { scan-assembler-times {\tvcmp.f32\teq, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tne, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tlt, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tle, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tgt, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tge, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 24 } } */ /* Constant 2.0f.  */
+/* { dg-final { scan-assembler-times {\t.word\t1077936128\n} 24 } } */ /* Constant 3.0f.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c 
b/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c
new file mode 100644
index 000..2e0222a71f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c
@@ -0,0 +1,78 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-add-options arm_neon } */
+/* { dg-additional-options "-O3" } */
+
+#include "mve-compare-1.c"
+
+/* 64-bit vectors.  */
+/* vmvn is used by 'ne' comparisons: 3 sizes * 2 (signed/unsigned) * 2
+   (register/zero) = 12.  */
+/* { dg-final { scan-assembler-times {\tvmvn\td[0-9]+, d[0-9]+\n} 12 } } */
+
+/* { 8 bits } x { eq, ne, lt, le, gt, ge }. */
+/* ne uses eq, lt/le only apply to comparison with zero, they use gt/ge
+   otherwise.  */
+/* { dg-final { scan-assembler-times {\tvceq.i8\td[0-9]+, d[0-9]+, d[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i8\td[0-9]+, d[0-9]+, #0\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvclt.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcle.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s8\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s8\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+
+/* { 16 bits } x { eq, ne, lt, le, gt, ge }. */
+/* { dg-final { scan-assembler-times {\tvceq.i16\td[0-9]+, d[0-9]+, d[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i16\td[0-9]+, d[0-9]+, #0\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvclt.s16\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcle.s16\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s16\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s16\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s16\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s16\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+
+/* { 32 bits } x { eq, ne, lt, le, gt, ge }. */
+/* { dg-final { scan-assembler-times {\tvceq.i32\td[0-9]+, d[0-9]+, d[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i32\td[0-9]+, d[0-9]+, #0\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvclt.s32\td[0-9]+, d[0-9]+, #0\n} 1 } } */
+/* { dg-final { scan-assembler-times {

[PATCH v4 00/12] ARM/MVE use vectors of boolean for predicates

2022-02-22 Thread Christophe Lyon via Gcc-patches

From: Christophe Lyon 

This is v4 of this patch series, fixing issues I discovered before
committing v2 (which had been approved).  I am posting it for the
record of what I am going to commit after I implemented all the requested
changes to v3.

Thanks a lot to Richard Sandiford for his help.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

The changes v3 -> v4 are:

Patch 5: Use build_truth_vector_type_for_mode to construct the boolean
types. Also fix the definition of B2Imode etc in init_emit_once.

Patches 6 and 7: Squash code change and testcases as requested during
the review.

Original text (patch numbers no longer match because of the squashes):

This patch series addresses PR 100757 and 101325 by representing
vectors of predicates (MVE VPR.P0 register) as vectors of booleans
rather than using HImode.

As this implies a lot of mostly mechanical changes, I have tried to
split the patches in a way that should help reviewers, but the split
is a bit artificial.

Patches 1-3 add new tests.

Patches 4-6 are small independent improvements.

Patch 7 implements the predicate qualifier, but does not change any
builtin yet.

Patch 8 is the first of the two main patches, and uses the new
qualifier to describe the vcmp and vpsel builtins that are useful for
auto-vectorization of comparisons.

Patch 9 is the second main patch, which fixes the vcond_mask expander.

Patches 10-13 convert almost all the remaining builtins with HI
operands to use the predicate qualifier.  After these, there are still
a few builtins with HI operands left, about which I am not sure: vctp,
vpnot, load-gather and store-scatter with v2di operands.  In fact,
patches 11/12 update some STR/LDR qualifiers in a way that breaks
these v2di builtins although existing tests still pass.

Christophe Lyon (12):
  arm: Add new tests for comparison vectorization with Neon and MVE
  arm: Add GENERAL_AND_VPR_REGS regclass
  arm: Add support for VPR_REG in arm_class_likely_spilled_p
  arm: Fix mve_vmvnq_n_ argument mode
  arm: Implement MVE predicates as vectors of booleans
  arm: Implement auto-vectorized MVE comparisons with vectors of boolean
predicates
  arm: Fix vcond_mask expander for MVE (PR target/100757)
  arm: Convert remaining MVE vcmp builtins to predicate qualifiers
  arm: Convert more MVE builtins to predicate qualifiers
  arm: Convert more load/store MVE builtins to predicate qualifiers
  arm: Convert more MVE/CDE builtins to predicate qualifiers
  arm: Add VPR_REG to ALL_REGS

 gcc/config/aarch64/aarch64-modes.def  |   8 +-
 gcc/config/arm/arm-builtins.cc| 239 --
 gcc/config/arm/arm-builtins.h |   4 +-
 gcc/config/arm/arm-modes.def  |   8 +
 gcc/config/arm/arm-protos.h   |   4 +-
 gcc/config/arm/arm-simd-builtin-types.def |   4 +
 gcc/config/arm/arm.cc | 166 ++--
 gcc/config/arm/arm.h  |   9 +-
 gcc/config/arm/arm_mve_builtins.def   | 746 
 gcc/config/arm/constraints.md |   6 +
 gcc/config/arm/iterators.md   |   6 +
 gcc/config/arm/mve.md | 795 ++
 gcc/config/arm/neon.md|  39 +
 gcc/config/arm/vec-common.md  |  52 --
 gcc/config/arm/vfp.md |  34 +-
 gcc/doc/sourcebuild.texi  |   4 +
 gcc/emit-rtl.cc   |  28 +-
 gcc/genmodes.cc   |  71 +-
 gcc/machmode.def  |  11 +-
 gcc/rtx-vector-builder.cc |   4 +-
 gcc/simplify-rtx.cc   |  34 +-
 gcc/testsuite/gcc.dg/rtl/arm/mve-vxbi.c   |  89 ++
 gcc/testsuite/gcc.dg/signbit-2.c  |   1 +
 .../gcc.target/arm/simd/mve-vcmp-f32-2.c  |  32 +
 .../gcc.target/arm/simd/neon-compare-1.c  |  78 ++
 .../gcc.target/arm/simd/neon-compare-2.c  |  13 +
 .../gcc.target/arm/simd/neon-compare-3.c  |  14 +
 .../arm/simd/neon-compare-scalar-1.c  |  57 ++
 .../gcc.target/arm/simd/neon-vcmp-f16.c   |  12 +
 .../gcc.target/arm/simd/neon-vcmp-f32-2.c |  15 +
 .../gcc.target/arm/simd/neon-vcmp-f32-3.c |  12 +
 .../gcc.target/arm/simd/neon-vcmp-f32.c   |  12 +
 gcc/testsuite/gcc.target/arm/simd/neon-vcmp.c |  22 +
 .../gcc.target/arm/simd/pr100757-2.c  |  20 +
 .../gcc.target/arm/simd/pr100757-3.c  |  20 +
 .../gcc.target/arm/simd/pr100757-4.c  |  19 +
 gcc/testsuite/gcc.target/arm/simd/pr100757.c  |  19 +
 .../gcc.target/arm/simd/pr101325-2.c  |  19 +
 gcc/testsuite/gcc.target/arm/simd/pr101325.c  |  14 +
 gcc/testsuite/lib/target-supports.exp |  15 +-
 gcc/varasm.cc |   7 +-
 41 files changed, 1738 insertions(+), 1024 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/rtl/arm/mve-vxbi.c
 create mo

[PATCH v4 05/12] arm: Implement MVE predicates as vectors of booleans

2022-02-22 Thread Christophe Lyon via Gcc-patches

From: Christophe Lyon 

This patch implements support for vectors of booleans to support MVE
predicates, instead of HImode.  Since the ABI mandates pred16_t (aka
uint16_t) to represent predicates in intrinsics prototypes, we
introduce a new "predicate" type qualifier so that we can map relevant
builtins HImode arguments and return value to the appropriate vector
of booleans (VxBI).

We have to update test_vector_ops_duplicate, because it iterates using
an offset in bytes, where we would need to iterate in bits: we stop
iterating when we reach the end of the vector of booleans.

In addition, we have to fix the underlying definition of vectors of
booleans because ARM/MVE needs a different representation than
AArch64/SVE. With ARM/MVE the 'true' bit is duplicated over the
element size, so that a true element of V4BI is represented by
'0b1111'.  This patch updates the aarch64 definition of VNx*BI as
needed.
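
To make the MVE layout concrete, here is a small sketch that expands one boolean per lane into a 16-bit predicate in which each true lane sets every bit of its slot.  It assumes a 16-bit predicate split evenly between 4, 8 or 16 lanes; lanes_to_pred16 is a hypothetical helper written for illustration and is not part of the patch.

```c
#include <stdint.h>

/* Expand per-lane booleans (bit i of LANE_MASK is lane i) into a 16-bit
   predicate.  NLANES must be 4, 8 or 16, so each lane owns 4, 2 or 1
   bits, and a true lane sets all of its bits.  */
uint16_t
lanes_to_pred16 (unsigned lane_mask, unsigned nlanes)
{
  unsigned bits_per_lane = 16 / nlanes;
  unsigned lane_ones = (1u << bits_per_lane) - 1;
  unsigned pred = 0;

  for (unsigned i = 0; i < nlanes; i++)
    if (lane_mask & (1u << i))
      pred |= lane_ones << (i * bits_per_lane);

  return (uint16_t) pred;
}
```

For a four-lane vector, setting lanes 0 and 2 gives 0x0F0F, whereas a representation that keeps a single bit per element would give 0x0101 for the same lanes.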

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  
Richard Sandiford  

gcc/
PR target/100757
PR target/101325
* config/aarch64/aarch64-modes.def (VNx16BI, VNx8BI, VNx4BI,
VNx2BI): Update definition.
* config/arm/arm-builtins.cc (arm_init_simd_builtin_types): Add new
simd types.
(arm_init_builtin): Map predicate vectors arguments to HImode.
(arm_expand_builtin_args): Move HImode predicate arguments to VxBI
rtx. Move return value to HImode rtx.
* config/arm/arm-builtins.h (arm_type_qualifiers): Add 
qualifier_predicate.
* config/arm/arm-modes.def (B2I, B4I, V16BI, V8BI, V4BI): New modes.
* config/arm/arm-simd-builtin-types.def (Pred1x16_t,
Pred2x8_t,Pred4x4_t): New.
* emit-rtl.cc (init_emit_once): Handle all boolean modes.
* genmodes.cc (mode_data): Add boolean field.
(blank_mode): Initialize it.
(make_complex_modes): Fix handling of boolean modes.
(make_vector_modes): Likewise.
(VECTOR_BOOL_MODE): Use new COMPONENT parameter.
(make_vector_bool_mode): Likewise.
(BOOL_MODE): New.
(make_bool_mode): New.
(emit_insn_modes_h): Fix generation of boolean modes.
(emit_class_narrowest_mode): Likewise.
* machmode.def: (VECTOR_BOOL_MODE): Document new COMPONENT
parameter.  Use new BOOL_MODE instead of FRACTIONAL_INT_MODE to
define BImode.
* rtx-vector-builder.cc (rtx_vector_builder::find_cached_value):
Fix handling of constm1_rtx for VECTOR_BOOL.
* simplify-rtx.cc (native_encode_rtx): Fix support for VECTOR_BOOL.
(native_decode_vector_rtx): Likewise.
(test_vector_ops_duplicate): Skip vec_merge test
with vectors of booleans.
* varasm.cc (output_constant_pool_2): Likewise.

diff --git a/gcc/config/aarch64/aarch64-modes.def 
b/gcc/config/aarch64/aarch64-modes.def
index 976bf9b42be..8f399225a80 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -47,10 +47,10 @@ ADJUST_FLOAT_FORMAT (HF, &ieee_half_format);
 
 /* Vector modes.  */
 
-VECTOR_BOOL_MODE (VNx16BI, 16, 2);
-VECTOR_BOOL_MODE (VNx8BI, 8, 2);
-VECTOR_BOOL_MODE (VNx4BI, 4, 2);
-VECTOR_BOOL_MODE (VNx2BI, 2, 2);
+VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2);
+VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2);
+VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2);
+VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2);
 
 ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
 ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index e6bbda23e3e..993a2f7b082 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -1553,11 +1553,28 @@ arm_init_simd_builtin_types (void)
   tree eltype = arm_simd_types[i].eltype;
   machine_mode mode = arm_simd_types[i].mode;
 
-  if (eltype == NULL)
+  if (eltype == NULL
+ /* VECTOR_BOOL is not supported unless MVE is activated,
+this would make build_truth_vector_type_for_mode
+crash.  */
+ && ((GET_MODE_CLASS (mode) != MODE_VECTOR_BOOL)
+ || !TARGET_HAVE_MVE))
continue;
   if (arm_simd_types[i].itype == NULL)
{
- tree type = build_vector_type (eltype, GET_MODE_NUNITS (mode));
+ tree type;
+ if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
+   {
+ /* Handle MVE predicates: they are internally stored as
+16 bits, but are used as vectors of 1, 2 or 4-bit
+elements.  */
+ type = build_truth_vector_type_for_mode (GET_MODE_NUNITS (mode),
+  mode);
+ eltype = TREE_TYPE (type);
+   }
+ else
+   type = build_vector_type (eltype, GET_MODE_NUNITS (mode));
+
  type = build_distinct_type_copy (type);
  SET_TYPE_STRUCTU

[PATCH v4 06/12] arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates

2022-02-22 Thread Christophe Lyon via Gcc-patches

From: Christophe Lyon 

We make use of qualifier_predicate to describe MVE builtins
prototypes, restricting to auto-vectorizable vcmp* and vpsel builtins,
as they are exercised by the tests added earlier in the series.

Special handling is needed for mve_vpselq because it has a v2di
variant, which has no natural VPR.P0 representation: we keep HImode
for it.

The vector_compare expansion code is updated to use the right VxBI
mode instead of HI for the result.

We extend the existing thumb2_movhi_vfp and thumb2_movhi_fp16 patterns
to use the new MVE_7_HI iterator which covers HI and the new VxBI
modes, in conjunction with the new DB constraint for a constant vector
of booleans.

This patch also adds tests derived from the one provided in PR
target/101325: there is a compile-only test because I did not have
access to anything that could execute MVE code until recently.  I have
been able to add an executable test since QEMU supports MVE.

Instead of adding arm_v8_1m_mve_hw, I update arm_mve_hw so that it
uses add_options_for_arm_v8_1m_mve_fp, like arm_neon_hw does.  This
ensures arm_mve_hw passes even if the toolchain does not generate MVE
code by default.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon 
Richard Sandiford  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.cc (BINOP_PRED_UNONE_UNONE_QUALIFIERS)
(BINOP_PRED_NONE_NONE_QUALIFIERS)
(TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS)
(TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New.
* config/arm/arm-protos.h (mve_const_bool_vec_to_hi): New.
* config/arm/arm.cc (arm_hard_regno_mode_ok): Handle new VxBI
modes.
(arm_mode_to_pred_mode): New.
(arm_expand_vector_compare): Use the right VxBI mode instead of
HI.
(arm_expand_vcond): Likewise.
(simd_valid_immediate): Handle MODE_VECTOR_BOOL.
(mve_const_bool_vec_to_hi): New.
(neon_make_constant): Call mve_const_bool_vec_to_hi when needed.
* config/arm/arm_mve_builtins.def (vcmpneq_, vcmphiq_, vcmpcsq_)
(vcmpltq_, vcmpleq_, vcmpgtq_, vcmpgeq_, vcmpeqq_, vcmpneq_f)
(vcmpltq_f, vcmpleq_f, vcmpgtq_f, vcmpgeq_f, vcmpeqq_f, vpselq_u)
(vpselq_s, vpselq_f): Use new predicated qualifiers.
* config/arm/constraints.md (DB): New.
* config/arm/iterators.md (MVE_7, MVE_7_HI): New mode iterators.
(MVE_VPRED, MVE_vpred): New attribute iterators.
* config/arm/mve.md (@mve_vcmpq_)
(@mve_vcmpq_f, @mve_vpselq_)
(@mve_vpselq_f): Use MVE_VPRED instead of HI.
(@mve_vpselq_v2di): Define separately.
(mov): New expander for VxBI modes.
* config/arm/vfp.md (thumb2_movhi_vfp, thumb2_movhi_fp16): Use
MVE_7_HI iterator and add support for DB constraint.

gcc/testsuite/
PR target/100757
PR target/101325
* gcc.dg/rtl/arm/mve-vxbi.c: New test.
* gcc.target/arm/simd/pr101325.c: New.
* gcc.target/arm/simd/pr101325-2.c: New.
* lib/target-supports.exp (check_effective_target_arm_mve_hw): Use
add_options_for_arm_v8_1m_mve_fp.

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index 993a2f7b082..1c6b9c986ee 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -420,6 +420,12 @@ 
arm_binop_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_UNONE_UNONE_UNONE_QUALIFIERS \
   (arm_binop_unone_unone_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_binop_pred_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_unsigned, qualifier_unsigned };
+#define BINOP_PRED_UNONE_UNONE_QUALIFIERS \
+  (arm_binop_pred_unone_unone_qualifiers)
+
 static enum arm_type_qualifiers
 arm_binop_unone_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_none, qualifier_immediate };
@@ -438,6 +444,12 @@ arm_binop_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_UNONE_NONE_NONE_QUALIFIERS \
   (arm_binop_unone_none_none_qualifiers)
 
+static enum arm_type_qualifiers
+arm_binop_pred_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_none, qualifier_none };
+#define BINOP_PRED_NONE_NONE_QUALIFIERS \
+  (arm_binop_pred_none_none_qualifiers)
+
 static enum arm_type_qualifiers
 arm_binop_unone_unone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_none };
@@ -509,6 +521,12 @@ 
arm_ternop_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS \
   (arm_ternop_none_none_none_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ternop_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none, qualifier_predicate };
+#define TERNOP_NONE

[PATCH v4 07/12] arm: Fix vcond_mask expander for MVE (PR target/100757)

2022-02-22 Thread Christophe Lyon via Gcc-patches

From: Christophe Lyon 

The problem in this PR is that we call VPSEL with a mask of vector
type instead of HImode. This happens because operand 3 in vcond_mask
is the pre-computed vector comparison and has vector type.

This patch fixes it by implementing TARGET_VECTORIZE_GET_MASK_MODE,
returning the appropriate VxBI mode when targeting MVE.  In turn, this
implies implementing vec_cmp,
vec_cmpu and vcond_mask_, and we can
move vec_cmp, vec_cmpu and
vcond_mask_ back to neon.md since they are not
used by MVE anymore.  The new * patterns listed above are
implemented in mve.md since they are only valid for MVE. However this
may make maintenance/comparison more painful than having all of them
in vec-common.md.

In the process, we can get rid of the recently added vcond_mve
parameter of arm_expand_vector_compare.

Compared to neon.md's vcond_mask_ before my "arm:
Auto-vectorization for MVE: vcmp" patch (r12-834), it keeps the VDQWH
iterator added in r12-835 (to have V4HF/V8HF support), as well as the
(! || flag_unsafe_math_optimizations) condition which
was not present before r12-834 although SF modes were enabled by VDQW
(I think this was a bug).

Using TARGET_VECTORIZE_GET_MASK_MODE has the advantage that we no
longer need to generate vpsel with vectors of 0 and 1: the masks are
now merged via scalar 'ands' instructions operating on 16-bit masks
after converting the boolean vectors.
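
A rough scalar model of this, assuming four 32-bit lanes that each own four bits of a 16-bit predicate.  It is deliberately simplified: the select inspects only the lowest bit of each lane's group, and merge_preds and vpsel_f32x4 are hypothetical names, not intrinsics or GCC functions.

```c
#include <stdint.h>

/* Merging two comparison results is a plain AND on the 16-bit masks.  */
uint16_t
merge_preds (uint16_t p0, uint16_t p1)
{
  return (uint16_t) (p0 & p1);
}

/* Simplified per-lane select: lane i takes A[i] when the lowest bit of
   its 4-bit group in PRED is set, and B[i] otherwise.  */
void
vpsel_f32x4 (float out[4], const float a[4], const float b[4],
	     uint16_t pred)
{
  for (int i = 0; i < 4; i++)
    out[i] = ((pred >> (4 * i)) & 1) ? a[i] : b[i];
}
```

The point of the sketch is that the two comparison results never need to be materialized as vectors of 0 and 1: they are combined as integers and only the final mask drives the select.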

In addition, this patch fixes a problem in arm_expand_vcond() where
the result would be a vector of 0 or 1 instead of operand 1 or 2.

Since we want to skip gcc.dg/signbit-2.c for MVE, we also add a new
arm_mve effective target.

Reducing the number of iterations in pr100757-3.c from 32 to 8, we
generate the code below:

float a[32];
float fn1(int d) {
  float c = 4.0f;
  for (int b = 0; b < 8; b++)
    if (a[b] != 2.0f)
      c = 5.0f;
  return c;
}

fn1:
ldr r3, .L3+48
vldr.64 d4, .L3  // q2=(2.0,2.0,2.0,2.0)
vldr.64 d5, .L3+8
vldrw.32 q0, [r3] // q0=a(0..3)
adds r3, r3, #16
vcmp.f32 eq, q0, q2   // cmp a(0..3) == (2.0,2.0,2.0,2.0)
vldrw.32 q1, [r3] // q1=a(4..7)
vmrs r3, P0
vcmp.f32 eq, q1, q2   // cmp a(4..7) == (2.0,2.0,2.0,2.0)
vmrs r2, P0  @ movhi
ands r3, r3, r2   // r3=select(a(0..3)) & select(a(4..7))
vldr.64 d4, .L3+16   // q2=(5.0,5.0,5.0,5.0)
vldr.64 d5, .L3+24
vmsr P0, r3
vldr.64 d6, .L3+32   // q3=(4.0,4.0,4.0,4.0)
vldr.64 d7, .L3+40
vpsel q3, q3, q2 // q3=vcond_mask(4.0,5.0)
vmov.32 r2, q3[1]// keep the scalar max
vmov.32 r0, q3[3]
vmov.32 r3, q3[2]
vmov.f32 s11, s12
vmov s15, r2
vmov s14, r3
vmaxnm.f32  s15, s11, s15
vmaxnm.f32  s15, s15, s14
vmov s14, r0
vmaxnm.f32  s15, s15, s14
vmov r0, s15
bx  lr
.L4:
.align  3
.L3:
.word   1073741824  // 2.0f
.word   1073741824
.word   1073741824
.word   1073741824
.word   1084227584  // 5.0f
.word   1084227584
.word   1084227584
.word   1084227584
.word   1082130432  // 4.0f
.word   1082130432
.word   1082130432
.word   1082130432
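
The .word values in the listing are the bit patterns of the float constants printed as decimal integers.  A short sketch for checking them, assuming float is an IEEE-754 binary32 type on the host (float_bits is a hypothetical helper):

```c
#include <stdint.h>
#include <string.h>

/* Return the object representation of F as a 32-bit integer.  memcpy is
   used to avoid aliasing problems; this assumes float is 32 bits wide.  */
uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}
```

Under that assumption 2.0f is 0x40000000 (1073741824), 4.0f is 0x40800000 (1082130432) and 5.0f is 0x40A00000 (1084227584), matching the constant pool above.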

This patch adds tests that trigger an ICE without this fix.

The pr100757*.c testcases are derived from
gcc.c-torture/compile/20160205-1.c, forcing the use of MVE, and using
various types and return values different from 0 and 1 to avoid
commonalization with boolean masks.  In addition, since we should not
need these masks, the tests make sure they are not present.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  

PR target/100757
gcc/
* config/arm/arm-protos.h (arm_get_mask_mode): New prototype.
(arm_expand_vector_compare): Update prototype.
* config/arm/arm.cc (TARGET_VECTORIZE_GET_MASK_MODE): New.
(arm_vector_mode_supported_p): Add support for VxBI modes.
(arm_expand_vector_compare): Remove useless generation of vpsel.
(arm_expand_vcond): Fix select operands.
(arm_get_mask_mode): New.
* config/arm/mve.md (vec_cmp): New.
(vec_cmpu): New.
(vcond_mask_): New.
* config/arm/vec-common.md (vec_cmp)
(vec_cmpu): Move to ...
* config/arm/neon.md (vec_cmp)
(vec_cmpu): ... here
and disable for MVE.
* doc/sourcebuild.texi (arm_mve): Document new effective-target.

gcc/testsuite/
PR target/100757
* gcc.target/arm/simd/pr100757-2.c: New.
* gcc.target/arm/simd/pr100757-3.c: New.
* gcc.target/arm/simd/pr100757-4.c: New.
* gcc.target/arm/simd/pr100757.c: New.

[PATCH v4 08/12] arm: Convert remaining MVE vcmp builtins to predicate qualifiers

2022-02-22 Thread Christophe Lyon via Gcc-patches

From: Christophe Lyon 

This is mostly a mechanical change, only tested by the intrinsics
expansion tests.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.cc (BINOP_UNONE_NONE_NONE_QUALIFIERS):
Delete.
(TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS): Change to ...
(TERNOP_PRED_NONE_NONE_PRED_QUALIFIERS): ... this.
(TERNOP_PRED_UNONE_UNONE_PRED_QUALIFIERS): New.
* config/arm/arm_mve_builtins.def (vcmp*q_n_, vcmp*q_m_f): Use new
predicated qualifiers.
* config/arm/mve.md (mve_vcmpq_n_)
(mve_vcmp*q_m_f): Use MVE_VPRED instead of HI.

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index 1c6b9c986ee..02411c61098 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -438,12 +438,6 @@ arm_binop_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_NONE_NONE_UNONE_QUALIFIERS \
   (arm_binop_none_none_unone_qualifiers)
 
-static enum arm_type_qualifiers
-arm_binop_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_none, qualifier_none };
-#define BINOP_UNONE_NONE_NONE_QUALIFIERS \
-  (arm_binop_unone_none_none_qualifiers)
-
 static enum arm_type_qualifiers
 arm_binop_pred_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_predicate, qualifier_none, qualifier_none };
@@ -504,10 +498,10 @@ arm_ternop_unone_unone_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   (arm_ternop_unone_unone_imm_unone_qualifiers)
 
 static enum arm_type_qualifiers
-arm_ternop_unone_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_none, qualifier_none, qualifier_unsigned };
-#define TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS \
-  (arm_ternop_unone_none_none_unone_qualifiers)
+arm_ternop_pred_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_none, qualifier_none, qualifier_predicate };
+#define TERNOP_PRED_NONE_NONE_PRED_QUALIFIERS \
+  (arm_ternop_pred_none_none_pred_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -553,6 +547,13 @@ arm_ternop_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS \
   (arm_ternop_unone_unone_unone_pred_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ternop_pred_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_unsigned, qualifier_unsigned,
+qualifier_predicate };
+#define TERNOP_PRED_UNONE_UNONE_PRED_QUALIFIERS \
+  (arm_ternop_pred_unone_unone_pred_qualifiers)
+
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_none };
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 44b41eab4c5..b7ebbcab87f 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -118,9 +118,9 @@ VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, veorq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_UNONE_UNONE, vcmphiq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_UNONE_UNONE, vcmphiq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_UNONE_UNONE, vcmphiq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_UNONE_UNONE, vcmpcsq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpcsq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_UNONE_UNONE, vcmpcsq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vbicq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vandq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vaddvq_p_u, v16qi, v8hi, v4si)
@@ -142,17 +142,17 @@ VAR3 (BINOP_UNONE_UNONE_NONE, vbrsrq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vshlq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vrshrq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vqshlq_n_u, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpneq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpneq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpltq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpltq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpltq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpleq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpleq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpleq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpgtq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpgtq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpgtq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpgeq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpgeq_n_, v16qi, v8hi, v4si)
+VAR3

[PATCH v4 12/12] arm: Add VPR_REG to ALL_REGS

2022-02-22 Thread Christophe Lyon via Gcc-patches

From: Christophe Lyon 

VPR_REG should be part of ALL_REGS, this patch fixes this omission.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  

gcc/
* config/arm/arm.h (REG_CLASS_CONTENTS): Add VPR_REG to ALL_REGS.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 61c02218b78..ef7b66f34ae 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1347,7 +1347,7 @@ enum reg_class
   { 0x, 0x, 0x, 0x0080 }, /* AFP_REG */\
   { 0x, 0x, 0x, 0x0400 }, /* VPR_REG.  */  \
   { 0x5FFF, 0x, 0x, 0x0400 }, /* GENERAL_AND_VPR_REGS.  */ \
-  { 0x7FFF, 0x, 0x, 0x000F }  /* ALL_REGS.  */ \
+  { 0x7FFF, 0x, 0x, 0x040F }  /* ALL_REGS.  */ \
 }
 
 #define FP_SYSREGS \
-- 
2.25.1
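
To illustrate what the one-word change means, here is a minimal C sketch of a register-class containment check.  It models only the last word of each `REG_CLASS_CONTENTS` entry, using the values as they appear in the diff above; the macro and function names are hypothetical and do not exist in GCC:

```c
#include <stdbool.h>
#include <stdint.h>

/* Last word of the REG_CLASS_CONTENTS entries, as shown in the diff.
   These macro names are illustrative only.  */
#define VPR_REG_LAST_WORD       0x0400u
#define ALL_REGS_LAST_WORD_OLD  0x000Fu
#define ALL_REGS_LAST_WORD_NEW  0x040Fu

/* Return true if every register bit set in SUB is also set in SUPER,
   i.e. the class described by SUB is contained in SUPER.  */
static bool
word_is_subset (uint32_t sub, uint32_t super)
{
  return (sub & ~super) == 0;
}
```

Under this model, `word_is_subset (VPR_REG_LAST_WORD, ALL_REGS_LAST_WORD_OLD)` is false and the same check against `ALL_REGS_LAST_WORD_NEW` is true, which is the omission the patch fixes.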

[PATCH v4 10/12] arm: Convert more load/store MVE builtins to predicate qualifiers

2022-02-22 Thread Christophe Lyon via Gcc-patches

From: Christophe Lyon 

This patch covers a few builtins where we do not use the 
iterator and thus we cannot use .

For v2di instructions, we keep the HI mode for predicates.

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.cc (STRSBS_P_QUALIFIERS): Use predicate
qualifier.
(STRSBU_P_QUALIFIERS): Likewise.
(LDRGBS_Z_QUALIFIERS): Likewise.
(LDRGBU_Z_QUALIFIERS): Likewise.
(LDRGBWBXU_Z_QUALIFIERS): Likewise.
(LDRGBWBS_Z_QUALIFIERS): Likewise.
(LDRGBWBU_Z_QUALIFIERS): Likewise.
(STRSBWBS_P_QUALIFIERS): Likewise.
(STRSBWBU_P_QUALIFIERS): Likewise.
* config/arm/mve.md: Use VxBI instead of HI.

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index a9536b2f7f8..5d582f182b9 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -689,13 +689,13 @@ arm_strss_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_strsbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_unsigned, qualifier_immediate,
-  qualifier_none, qualifier_unsigned};
+  qualifier_none, qualifier_predicate};
 #define STRSBS_P_QUALIFIERS (arm_strsbs_p_qualifiers)
 
 static enum arm_type_qualifiers
 arm_strsbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned, qualifier_unsigned};
+  qualifier_unsigned, qualifier_predicate};
 #define STRSBU_P_QUALIFIERS (arm_strsbu_p_qualifiers)
 
 static enum arm_type_qualifiers
@@ -731,13 +731,13 @@ arm_ldrgbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_ldrgbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBS_Z_QUALIFIERS (arm_ldrgbs_z_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ldrgbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBU_Z_QUALIFIERS (arm_ldrgbu_z_qualifiers)
 
 static enum arm_type_qualifiers
@@ -777,7 +777,7 @@ arm_ldrgbwbxu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_ldrgbwbxu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBWBXU_Z_QUALIFIERS (arm_ldrgbwbxu_z_qualifiers)
 
 static enum arm_type_qualifiers
@@ -793,13 +793,13 @@ arm_ldrgbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_ldrgbwbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBWBS_Z_QUALIFIERS (arm_ldrgbwbs_z_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ldrgbwbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBWBU_Z_QUALIFIERS (arm_ldrgbwbu_z_qualifiers)
 
 static enum arm_type_qualifiers
@@ -815,13 +815,13 @@ arm_strsbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_strsbwbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_const,
-  qualifier_none, qualifier_unsigned};
+  qualifier_none, qualifier_predicate};
 #define STRSBWBS_P_QUALIFIERS (arm_strsbwbs_p_qualifiers)
 
 static enum arm_type_qualifiers
 arm_strsbwbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_const,
-  qualifier_unsigned, qualifier_unsigned};
+  qualifier_unsigned, qualifier_predicate};
 #define STRSBWBU_P_QUALIFIERS (arm_strsbwbu_p_qualifiers)
 
 static enum arm_type_qualifiers
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index a8087815c22..9633b7187f6 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -7282,7 +7282,7 @@ (define_insn "mve_vstrwq_scatter_base_p_v4si"
[(match_operand:V4SI 0 "s_register_operand" "w")
 (match_operand:SI 1 "immediate_operand" "i")
 (match_operand:V4SI 2 "s_register_operand" "w")
-(match_operand:HI 3 "vpr_register_operand" "Up")]
+(match_operand:V4BI 3 "vpr_register_operand" "Up")]
 VSTRWSBQ))
   ]
   "TARGET_HAVE_MVE"
@@ -7371,7 +7371,7 @@ (define_insn "mve_vldrwq_gather_base_z_v4si"
   [(set (match_operand:V4SI 0 "s_register_operand" "=&w")
(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
  (match_operand:SI 2 "immediate_operand" "i")
- (match_operand:HI 3 "vpr_reg

[PATCH v4 11/12] arm: Convert more MVE/CDE builtins to predicate qualifiers

2022-02-22 Thread Christophe Lyon via Gcc-patches

From: Christophe Lyon 

This patch covers a few non-load/store builtins where we do not use
the  iterator and thus we cannot use .

Most of the work of this patch series was carried out while I was
working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.cc (CX_UNARY_UNONE_QUALIFIERS): Use
predicate.
(CX_BINARY_UNONE_QUALIFIERS): Likewise.
(CX_TERNARY_UNONE_QUALIFIERS): Likewise.
(TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
(QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
(QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Delete.
* config/arm/arm_mve_builtins.def: Use predicated qualifiers.
* config/arm/mve.md: Use VxBI instead of HI.

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index 5d582f182b9..a7acc1d71e7 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -295,7 +295,7 @@ static enum arm_type_qualifiers
 arm_cx_unary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_immediate, qualifier_none,
   qualifier_unsigned_immediate,
-  qualifier_unsigned };
+  qualifier_predicate };
 #define CX_UNARY_UNONE_QUALIFIERS (arm_cx_unary_unone_qualifiers)
 
 /* T (immediate, T, T, unsigned immediate).  */
@@ -304,7 +304,7 @@ arm_cx_binary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_immediate,
   qualifier_none, qualifier_none,
   qualifier_unsigned_immediate,
-  qualifier_unsigned };
+  qualifier_predicate };
 #define CX_BINARY_UNONE_QUALIFIERS (arm_cx_binary_unone_qualifiers)
 
 /* T (immediate, T, T, T, unsigned immediate).  */
@@ -313,7 +313,7 @@ arm_cx_ternary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_immediate,
   qualifier_none, qualifier_none, qualifier_none,
   qualifier_unsigned_immediate,
-  qualifier_unsigned };
+  qualifier_predicate };
 #define CX_TERNARY_UNONE_QUALIFIERS (arm_cx_ternary_unone_qualifiers)
 
 /* The first argument (return type) of a store should be void type,
@@ -509,12 +509,6 @@ arm_ternop_none_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_NONE_NONE_NONE_IMM_QUALIFIERS \
   (arm_ternop_none_none_none_imm_qualifiers)
 
-static enum arm_type_qualifiers
-arm_ternop_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_none, qualifier_none, qualifier_unsigned };
-#define TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS \
-  (arm_ternop_none_none_none_unone_qualifiers)
-
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_predicate };
@@ -567,13 +561,6 @@ arm_quadop_unone_unone_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define QUADOP_UNONE_UNONE_NONE_NONE_PRED_QUALIFIERS \
   (arm_quadop_unone_unone_none_none_pred_qualifiers)
 
-static enum arm_type_qualifiers
-arm_quadop_none_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_none, qualifier_none, qualifier_none,
-qualifier_unsigned };
-#define QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS \
-  (arm_quadop_none_none_none_none_unone_qualifiers)
-
 static enum arm_type_qualifiers
 arm_quadop_none_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_none,
@@ -588,13 +575,6 @@ arm_quadop_none_none_none_imm_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define QUADOP_NONE_NONE_NONE_IMM_PRED_QUALIFIERS \
   (arm_quadop_none_none_none_imm_pred_qualifiers)
 
-static enum arm_type_qualifiers
-arm_quadop_unone_unone_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
-qualifier_unsigned, qualifier_unsigned };
-#define QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS \
-  (arm_quadop_unone_unone_unone_unone_unone_qualifiers)
-
 static enum arm_type_qualifiers
 arm_quadop_unone_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 7db6d47867e..1c8ee34f5cb 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -87,8 +87,8 @@ VAR4 (BINOP_UNONE_UNONE_UNONE, vcreateq_u, v16qi, v8hi, v4si, v2di)
 VAR4 (BINOP_NONE_UNONE_UNONE, vcreateq_s, v16qi, v8hi, v4si, v2di)
 VAR3 (BINOP_UNONE_UNONE_IMM, vshrq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_IMM, vshrq_n_s, v16qi, v8hi, v4si)
-VAR1 (BINOP_NONE_NONE_UNONE, vaddlvq_p_s, v4si)
-VAR1 (BINOP_UNONE_UNONE_UNONE, vaddlvq_p_u, v4si)
+VAR1 (BINOP_NONE_NONE_PRED, vaddlvq_p_s, v4si)
+VAR1 (BINOP_UNONE_UNONE_PRED, vaddlvq_p_u, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpneq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE

Re: [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-02-22 Thread Tom de Vries via Gcc-patches


On 2/9/22 21:12, Roger Sayle wrote:


This patch adds middle-end support for target ABIs that pass/return
floating point values in integer registers with precision wider than
the original FP mode.  An example is the nvptx backend, where 16-bit
HFmode registers are passed/returned as (promoted to) SImode registers.
Unfortunately, this currently falls foul of the various (recent?) sanity
checks that (very sensibly) prevent creating paradoxical SUBREGs of
floating point registers.  The approach below is to explicitly perform the
conversion/promotion in two steps, via an integer mode of same precision
as the floating point value.  So on nvptx, 16-bit HFmode is initially
converted to 16-bit HImode (using SUBREG), then zero-extended to SImode,
and likewise when going the other way, parameters truncated to HImode
then converted to HFmode (using SUBREG).  These changes are localized
to expand_value_return and expanding DECL_RTL to support strange ABIs,
rather than inside convert_modes or gen_lowpart, as mismatched
precision integer/FP conversions should be explicit in the RTL,
and these semantics are not generally visible/implicit in user code.
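
The two-step conversion described above can be sketched at the bit level in plain C.  This is only an illustration under stated assumptions, not the RTL the patch generates: the 16-bit FP value is represented by its raw bits in a `uint16_t` (to avoid depending on `_Float16` support), and the function names are hypothetical:

```c
#include <stdint.h>

/* Outgoing direction: view the 16-bit FP value as a 16-bit integer
   (the same-precision step), then zero-extend to 32 bits.  */
static uint32_t
promote_half_bits (uint16_t hf_bits)
{
  uint16_t hi = hf_bits;   /* HFmode -> HImode: bits unchanged.  */
  return (uint32_t) hi;    /* HImode -> SImode: zero extension.  */
}

/* Incoming direction: truncate the 32-bit value to 16 bits, then
   view those bits as the 16-bit FP value again.  */
static uint16_t
demote_to_half_bits (uint32_t si)
{
  uint16_t hi = (uint16_t) si;   /* SImode -> HImode: truncation.  */
  return hi;                     /* HImode -> HFmode: bits unchanged.  */
}
```

Because the extension is a zero extension, a value with the sign bit set (for example 0xC000, which is -2.0 in binary16) ends up in the low half of the 32-bit register with the upper half clear, and the round trip returns the original bits.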



Hi Roger,

I cannot comment on the patch, but I do wonder (after your "strange ABI" 
comment): did we actively decide on (or align to) a register passing ABI 
for HFmode, or has it merely been decided by the implementation of 
promote_arg:

...
static machine_mode
promote_arg (machine_mode mode, bool prototyped)
{
  if (!prototyped && mode == SFmode)
/* K&R float promotion for unprototyped functions.  */
mode = DFmode;
  else if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode))
mode = SImode;

  return mode;
}
...

There may be a rationale why it's good to pass a HF as SI, but it's not 
documented there.
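
As a self-contained model of the rule quoted above, the following C sketch replaces `machine_mode` and `GET_MODE_SIZE` with a small hypothetical enum and size table (none of these names are GCC's).  It only shows that the size test does not distinguish integer from floating-point modes, so a 2-byte HF value is promoted to SI just like a 2-byte HI value:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the machine modes involved.  */
enum model_mode { M_HImode, M_SImode, M_DImode, M_HFmode, M_SFmode, M_DFmode };

/* Size in bytes of each stand-in mode.  */
static int
model_mode_size (enum model_mode m)
{
  switch (m)
    {
    case M_HImode:
    case M_HFmode:
      return 2;
    case M_SImode:
    case M_SFmode:
      return 4;
    default:
      return 8;
    }
}

/* Same decision structure as the promote_arg function quoted above.  */
static enum model_mode
promote_arg_model (enum model_mode m, bool prototyped)
{
  if (!prototyped && m == M_SFmode)
    /* K&R float promotion for unprototyped functions.  */
    return M_DFmode;
  if (model_mode_size (m) < model_mode_size (M_SImode))
    return M_SImode;
  return m;
}
```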


Anyway, I checked what cuda does for HF, and it passes a byte array:
...
.param .align 2 .b8 _Z5helloPj6__halfs_param_1[2],
...

So, I guess what I'm saying is I'd like to understand why we're having 
the HF -> SI promotion.


Thanks,
- Tom

Re: [PATCH] nvptx: Back-end portion of a fix for PR target/104489.

2022-02-22 Thread Tom de Vries via Gcc-patches


On 2/11/22 11:38, Roger Sayle wrote:

This one line fix/tweak is the back-end specific change for a fix for

PR target/104489, that allows the ISA for GCC's nvptx backend to be bumped

to sm_53.  The machine-independent middle-end pieces were posted here:

https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html

This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu,

together with the above middle-end patch and changes identical to those

described by Tom de Vries in the PR, with make and make -k check, where

the build now completes, and there are no regressions in the testsuite.

Ok for mainline?

2022-02-11  Roger Sayle  

gcc/ChangeLog

PR target/104489

* config/nvptx/nvptx.md (*movhf_insn): Add subregs_ok attribute.



LGTM.

Thanks,
- Tom

Re: [Patch] nvptx: Add -mptx=6.0 + -misa=sm_70

2022-02-22 Thread Tobias Burnus


Hi Tom,

On 22.02.22 15:43, Tom de Vries wrote:

On 2/17/22 18:24, Tobias Burnus wrote:

--- a/gcc/config/nvptx/t-omp-device
+++ b/gcc/config/nvptx/t-omp-device
@@ -1,4 +1,4 @@
 echo kind: gpu > $@
 echo arch: nvptx >> $@
-echo isa: sm_30 sm_35 >> $@
+echo isa: sm_30 sm_35 sm_53 sm_70 sm_75 sm_80 >> $@


I'm not sure I understand how this is used.  Is this user-visible?  Is
there a libgomp test-case where we can observe a difference?


That's used for OpenMP context selectors; that way, one can generate,
e.g., one variant for nvptx and one for gcn, as with:

#pragma omp declare variant (on_nvptx) match(construct={target},device={arch(nvptx)})
#pragma omp declare variant (on_gcn) match(construct={target},device={arch(gcn)})
...
  #pragma omp target map(from:v)
  v = on ();
which then either calls 'on' or 'on_nvptx' or 'on_gcn'
(from libgomp/testsuite/libgomp.c/target-42.c)
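
For readers who want to try the fragment above, here is a compilable sketch modelled on it.  The function `offload_kind` and the specific return values 0, 1 and 2 are assumptions made for illustration; build with `-fopenmp` for the pragmas to take effect (without it they are ignored and the base function `on` is always called):

```c
/* Variant called when the target region runs on an nvptx device.  */
int
on_nvptx (void)
{
  return 1;
}

/* Variant called when the target region runs on a gcn device.  */
int
on_gcn (void)
{
  return 2;
}

#pragma omp declare variant (on_nvptx) match(construct={target},device={arch(nvptx)})
#pragma omp declare variant (on_gcn) match(construct={target},device={arch(gcn)})
/* Base function, used when no variant's context selector matches.  */
int
on (void)
{
  return 0;
}

/* Run 'on' inside a target region and report which version ran.  */
int
offload_kind (void)
{
  int v;
  #pragma omp target map(from:v)
  v = on ();
  return v;
}
```

Which value comes back depends on how the compiler was configured for offloading and on the devices available at run time, so the sketch makes no claim about the result on any particular system.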


The following testcases use 'arch(nvptx)':

libgomp/testsuite/libgomp.c-c++-common/on_device_arch.h
libgomp/testsuite/libgomp.c/target-42.c
libgomp/testsuite/libgomp.c/usleep.h
libgomp/testsuite/libgomp.fortran/declare-variant-1.f90

For ISA, there is only one run-time test:

libgomp/testsuite/libgomp.c/declare-variant-1.c

but only for x86-64: match (device={isa("avx512f")})

The sm_35 also appears, but only in the compile-time tests:
gcc/testsuite/{c-c++-common,gfortran.dg}/gomp/declare-variant-{9,10}.*

Tobias

-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company; Managing Directors: Thomas Heurung,
Frank Thürauf; Registered office: München; Commercial register: München,
HRB 106955

RE: [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-02-22 Thread Roger Sayle

Hi Tom,

I'll admit that I'd not myself considered the ABI issues when I initially
proposed experimental HFmode support for the nvptx backend, and was surprised
when I finally tracked down the source of the problem you'd reported: that
libgcc spots HFmode support exists and immediately starts passing/returning
values in this type.

The one precedent that I can point to is that LLVM's nvptx backend passes
HFmode values in SImode regs; see https://reviews.llvm.org/D28540
Their motivation is that not all PTX ISAs support fp16, so for compatibility
with say sm_30/sm_35, fp16 values are treated like b16, i.e. HImode.
At this point, the nvptx ABI states that HImode values are passed as SImode,
so we end up with the interesting mismatch of HFmode<->SImode.
I guess the same thing affects host code, where an i386/x86 host that
doesn't support 16-bit floating point can pass "unsigned short" values
to and from the accelerator, and likewise this HImode locally gets passed
in a wider (often WORD_MODE) integer type on most x86 ABIs.

My guess is that passing SFmode in DImode may have been supported
in older versions of GCC, before handling of SUBREGs was tightened up,
so this might be considered a regression.

Cheers,
Roger
--

> -Original Message-
> From: Tom de Vries 
> Sent: 22 February 2022 15:43
> To: Roger Sayle ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] middle-end: Support ABIs that pass FP values as wider
> integers.
> 
> On 2/9/22 21:12, Roger Sayle wrote:
> >
> > This patch adds middle-end support for target ABIs that pass/return
> > floating point values in integer registers with precision wider than
> > the original FP mode.  An example is the nvptx backend, where 16-bit
> > HFmode registers are passed/returned as (promoted to) SImode registers.
> > Unfortunately, this currently falls foul of the various (recent?)
> > sanity checks that (very sensibly) prevent creating paradoxical
> > SUBREGs of floating point registers.  The approach below is to
> > explicitly perform the conversion/promotion in two steps, via an
> > integer mode of same precision as the floating point value.  So on
> > nvptx, 16-bit HFmode is initially converted to 16-bit HImode (using
> > SUBREG), then zero-extended to SImode, and likewise when going the
> > other way, parameters truncated to HImode then converted to HFmode
> > (using SUBREG).  These changes are localized to expand_value_return
> > and expanding DECL_RTL to support strange ABIs, rather than inside
> > convert_modes or gen_lowpart, as mismatched precision integer/FP
> > conversions should be explicit in the RTL, and these semantics are not
> > generally visible/implicit in user code.
> >
> 
> Hi Roger,
> 
> I cannot comment on the patch, but I do wonder (after your "strange ABI"
> comment): did we actively decide on (or align to) a register passing ABI for
> HFmode, or has it merely been decided by the implementation of
> promote_arg:
> ...
> static machine_mode
> promote_arg (machine_mode mode, bool prototyped)
> {
>if (!prototyped && mode == SFmode)
>  /* K&R float promotion for unprototyped functions.  */
>  mode = DFmode;
>else if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode))
>  mode = SImode;
> 
>return mode;
> }
> ...
> 
> There may be a rationale why it's good to pass a HF as SI, but it's not
> documented there.
> 
> Anyway, I checked what cuda does for HF, and it passes a byte array:
> ...
> .param .align 2 .b8 _Z5helloPj6__halfs_param_1[2],
> ...
> 
> So, I guess what I'm saying is I'd like to understand why we're having the
> HF -> SI promotion.
> 
> Thanks,
> - Tom

[PATCH 0/2] tree-optimization/104530 - proposed re-evaluation.

2022-02-22 Thread Andrew MacLeod via Gcc-patches

 I'd like to get clarification on some subtle terminology. I find I am 
conflating calls that don't return with calls that may throw, and I 
think they have different considerations.


My experiments with calls that can throw indicate that they always end a 
basic block.  This makes sense to me as there is the outgoing fall-thru 
edge and an outgoing EH edge.  Are there any conditions under which this 
is not the case? (other than non-call exceptions)


If that supposition is true, that leaves us with calls in the middle of 
the block which may not return.  This prevents us from allowing later 
calculations from impacting anything which happens before the call.


I believe the following 2 small patches could then resolve this.
 1 - Export global names to SSA_NAME_RANGE_INFO during the statement 
walk instead of at the end of the pass
 2 - Use the existing lazy recomputation machinery to recompute any 
globals which are defined in the block where a dependent value becomes 
non-null.


More details in each patch.  Neither is very large.  We could add this 
to this release or wait for stage 1.


Andrew

Fix OpenACC gang-redundant execution in 'libgomp.oacc-fortran/privatized-ref-2.f90' (was: Add 'libgomp.oacc-fortran/privatized-ref-2.f90')

2022-02-22 Thread Thomas Schwinge

Hi!

On 2021-05-21T16:28:57+0200, I wrote:
> This came into existence internally, when the og10 branch was set up.
>
> On 2020-06-03T17:23:51+0200, Tobias Burnus  wrote:
>> This fixes [...] on OG10 (og10_prerelease); it will be
>> later applied to gcn/… to fix the issue. (Upstream is unaffected.)
>> [...]
>
> However, that means that your testcase does work on master branch (and
> would regress if certain commits got pushed there).  As the testcase has
> got a property useful for a thing I'm currently working on, I've pushed
> to master branch "Add 'libgomp.oacc-fortran/privatized-ref-2.f90'" in
> commit 61796dc03befa9b7426d5bc7c336cca585944143

After commit a78b1ab1df9ca44acc5638e8f9d0ae2e62bd65ed
"amdgcn: Tune default OpenMP/OpenACC GPU utilization", we'd seen this
test case regress (only) on our AMD GPU amd-instinct1/'-march=gfx908'
system:

{+WARNING: program timed out.+}
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O0  execution test

Same for other optimization levels.  Nothing more in 'libgomp.log'.

I have determined this is a latent problem in the original test case,
which contains a few instances of code as follows:

!$acc parallel copyout(array)
array = [(-i, i = 1, nn)]
!$acc loop gang private(array)
do i = 1, 10
  array(i) = i
end do
if (any (array /= [(-i, i = 1, nn)])) error stop 1
!$acc end parallel

Given the '!$acc loop gang', the whole containing '!$acc parallel' region
is launched with gang parallelism.  The '!$acc loop gang' executes in
gang-partitioned mode, but the 'array' assignment before and checks after
don't execute in a (hypothetical) gang-single mode, but instead in
gang-redundant mode, meaning that each gang executes these concurrently,
giving rise to data races and other mischief.  Thus, we have to make sure
that we're not executing non-parallelized code in gang-redundant mode, by
putting these parts into their own 'parallel' constructs, which then
default to 'num_gangs(1)'.  Pushed to master branch
commit f8187b5c0d22723c8e0a3d13d0ea5dd7ecfeff75 "Fix OpenACC
gang-redundant execution in 'libgomp.oacc-fortran/privatized-ref-2.f90'",
see attached.
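
The restructuring described above can be sketched in C with OpenACC pragmas (the test case itself is Fortran; this is a simplified analogue, not the committed change).  The function name, the array size and the doubling loop are assumptions made for illustration, and the `private(array)` clause of the original is deliberately left out so that the result is the same with or without `-fopenacc`:

```c
#define NN 10

/* Returns 0 if the array holds the expected values at the end.  */
int
run_split_regions (void)
{
  int a[NN];
  int bad = 0;

  /* Region 1: initialization only.  It contains no gang-partitioned
     loop, so per the explanation above it is expected to run with a
     single gang rather than redundantly in every gang.  */
  #pragma acc parallel copyout(a)
  {
    for (int i = 0; i < NN; i++)
      a[i] = -(i + 1);
  }

  /* Region 2: nothing but the gang-partitioned loop.  */
  #pragma acc parallel copy(a)
  {
    #pragma acc loop gang
    for (int i = 0; i < NN; i++)
      a[i] = 2 * a[i];
  }

  /* Region 3: the check, again outside of any gang-partitioned region.  */
  #pragma acc parallel copyin(a) copy(bad)
  {
    for (int i = 0; i < NN; i++)
      if (a[i] != -2 * (i + 1))
        bad = 1;
  }

  return bad;
}
```

Had the initialization and the check shared one 'parallel' region with the gang loop, every gang would have repeated them concurrently with other gangs' loop iterations, which is the race the fix avoids.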


Grüße
 Thomas


> I confirm that "FIXME: Fails due to PR middle-end/95499" is still a
> problem.
>
> And, GCC '-O' reports:
>
> [...]/libgomp.oacc-fortran/privatized-ref-2.f90:147:21:
>
>   147 |   subroutine foobar15 (scalar)
>   | ^
> Warning: ‘foobar15’ defined but not used [-Wunused-function]
> [...]/libgomp.oacc-fortran/privatized-ref-2.f90: In function ‘MAIN__’:
> [...]/libgomp.oacc-fortran/privatized-ref-2.f90:31:22: warning: ‘a.offset’ is used uninitialized [-Wuninitialized]
>31 |   A = [(3*j, j=1, 10)]
>   |  ^
> [...]/libgomp.oacc-fortran/privatized-ref-2.f90:27:30: note: ‘a’ declared here
>27 |   integer, allocatable :: A(:)
>   |  ^
> [...]/libgomp.oacc-fortran/privatized-ref-2.f90:31:22: warning: ‘a.dim[0].lbound’ is used uninitialized [-Wuninitialized]
>31 |   A = [(3*j, j=1, 10)]
>   |  ^
> [...]/libgomp.oacc-fortran/privatized-ref-2.f90:27:30: note: ‘a’ declared here
>27 |   integer, allocatable :: A(:)
>   |  ^
> [...]/libgomp.oacc-fortran/privatized-ref-2.f90:31:22: warning: ‘a.dim[0].ubound’ is used uninitialized [-Wuninitialized]
>31 |   A = [(3*j, j=1, 10)]
>   |  ^
> [...]/libgomp.oacc-fortran/privatized-ref-2.f90:27:30: note: ‘a’ declared here
>27 |   integer, allocatable :: A(:)
>   |  ^
>
> I haven't looked into these.
>
>
> Grüße
>  Thomas


From f8187b5c0d22723c8e0a3d13d0ea5dd7ecfeff75 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 21 Jan 2022 14:58:23 +0100
Subject: [PATCH] Fix OpenACC gang-redundant execution in
 'libgomp.oacc-fortran/privatized-ref-2.f90'

This was a latent problem, and this commit here now resolves a regression that
after recent commit a78b1ab1df9ca44acc5638e8f9d0ae2e62bd65ed
"amdgcn: Tune default OpenMP/OpenACC GPU utilization" we had (only) seen on a
GCN offloading '-march=gfx908' system:

{+WARNING: program timed out.+}
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O0  execution test

Same for other optimization levels.

Make sure that we're not executing non-parallelized code in gang-redundant
mode, by putting these parts into their own 'parallel' constructs, which then

[PATCH 1/2] tree-optimization/104530 - Export global ranges during the VRP block walk.

2022-02-22 Thread Andrew MacLeod via Gcc-patches

Ranger currently waits until the end of the VRP pass, then calls 
export_global_ranges ().


This method walks the list of ssa-names looking for names which it 
thinks should have SSA_NAME_RANGE_INFO updated, and is an artifact of 
the on-demand mechanism where there isn't an obvious time to finalize a 
name.


The changes for 104288 introduced the register_side_effects method and 
do provide a final place where stmt's are processed during the DOMWALK.


This patch exports the global range calculated by the statement (before 
processing side effects), and avoids the need for calling the export 
method.  This is generally better all round I think.


Bootstraps on x86_64-pc-linux-gnu with no regressions. Re-running to 
ensure...


OK for trunk? or defer to stage 1?

Andrew
From 60ba59b5d57236ce4bab28ecdcb790c21c733904 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 16 Feb 2022 19:59:34 -0500
Subject: [PATCH 1/2] Export global ranges during the VRP block walk.

VRP currently searches the ssa_name list for globals to export after it
finishes running.  Recent changes have VRP calling a side-effect routine for
each stmt during the walk.  This change simply exports globals as they are
calculated the final time during the walk.

	* gimple-range-cache.cc (ranger_cache::update_to_nonnull): Set the
	global value in the def block, remove the on-entry cache hack.
	* gimple-range.cc (gimple_ranger::register_side_effects): First check
	if the DEF should be exported as a global.
	* tree-vrp.cc (rvrp_folder::pre_fold_bb): Process PHI side effects,
	which will export globals.
	(execute_ranger_vrp): Remove call to export_global_ranges.
---
 gcc/gimple-range.cc | 22 ++
 gcc/tree-vrp.cc |  4 +++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 04075a98a80..3d1843670a5 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -454,6 +454,28 @@ gimple_ranger::fold_stmt (gimple_stmt_iterator *gsi, tree (*valueize) (tree))
 void
 gimple_ranger::register_side_effects (gimple *s)
 {
+  // First, export the LHS if it is a new global range.
+  tree lhs = gimple_get_lhs (s);
+  if (lhs)
+{
+  int_range_max tmp;
+  if (range_of_stmt (tmp, s, lhs) && !tmp.varying_p ()
+	  && update_global_range (tmp, lhs) && dump_file)
+	{
+	  value_range vr = tmp;
+	  fprintf (dump_file, "Global Exported: ");
+	  print_generic_expr (dump_file, lhs, TDF_SLIM);
+	  fprintf (dump_file, " = ");
+	  vr.dump (dump_file);
+	  int_range_max same = vr;
+	  if (same != tmp)
+	{
+	  fprintf (dump_file, " ...  irange was : ");
+	  tmp.dump (dump_file);
+	}
+	  fputc ('\n', dump_file);
+	}
+}
   m_cache.block_apply_nonnull (s);
 }
 
diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
index e9f19d0c8b9..1ad099b9ba3 100644
--- a/gcc/tree-vrp.cc
+++ b/gcc/tree-vrp.cc
@@ -4295,6 +4295,9 @@ public:
   void pre_fold_bb (basic_block bb) OVERRIDE
   {
 m_pta->enter (bb);
+for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
+	 gsi_next (&gsi))
+  m_ranger->register_side_effects (gsi.phi ());
   }
 
   void post_fold_bb (basic_block bb) OVERRIDE
@@ -4338,7 +4341,6 @@ execute_ranger_vrp (struct function *fun, bool warn_array_bounds_p)
   gimple_ranger *ranger = enable_ranger (fun);
   rvrp_folder folder (ranger);
   folder.substitute_and_fold ();
-  ranger->export_global_ranges ();
   if (dump_file && (dump_flags & TDF_DETAILS))
 ranger->dump (dump_file);
 
-- 
2.17.2

[PATCH 2/2] tree-optimization/104530 - Mark defs dependent on non-null stale.

2022-02-22 Thread Andrew MacLeod via Gcc-patches

This patch simply leverages the existing computation machinery to 
re-evaluate values dependent on a newly found non-null value.


Ranger associates a monotonically increasing temporal value with every 
def as it is defined.  When that value is used, we check if any of the 
values used in the definition have been updated, making the current 
cached global value stale.  This makes the evaluation lazy: if there are 
no more uses, we will never re-evaluate.


When an ssa-name is marked non-null it does not change the global value, 
and thus will not invalidate any global values.  This patch marks any 
definitions in the block which are dependent on the non-null value as 
stale.  This will cause them to be re-evaluated when they are next used.
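
As a rough illustration of the timestamp scheme described above, here is a minimal sketch in C++.  It is not GCC's temporal_cache: the class name toy_temporal_cache, the use of small integers in place of SSA names, and the single-dependency current_p are all assumptions made for the example.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for the timestamp scheme; NOT GCC's temporal_cache.
// Names are small integers rather than SSA names (an assumption).
class toy_temporal_cache
{
public:
  // Record that NAME was (re)computed now.
  void set_timestamp (unsigned name)
  {
    if (name >= m_timestamp.size ())
      m_timestamp.resize (name + 1, 0);
    m_timestamp[name] = ++m_current_time;
  }

  // Make NAME look as old as possible, unless it was never registered
  // or is "always current" (timestamp 0).
  void set_stale (unsigned name)
  {
    if (name >= m_timestamp.size () || m_timestamp[name] == 0)
      return;
    m_timestamp[name] = 1;
  }

  // NAME is current only if it is strictly newer than DEP.  An equal
  // timestamp counts as stale, which is what lets set_stale work even
  // when DEP itself carries the oldest possible timestamp.
  bool current_p (unsigned name, unsigned dep) const
  {
    unsigned ts = value (name);
    if (ts == 0)
      return true;
    return ts > value (dep);
  }

private:
  unsigned value (unsigned name) const
  { return name < m_timestamp.size () ? m_timestamp[name] : 0; }

  std::vector<unsigned> m_timestamp;
  unsigned m_current_time = 0;
};

// Walk through the scenario: b is defined, _2 is computed from b,
// b is later found to be non-null, and _2 is recomputed on its next use.
bool demo_stale ()
{
  toy_temporal_cache cache;
  const unsigned b = 0, n2 = 1;
  cache.set_timestamp (b);                   // b gets time 1
  cache.set_timestamp (n2);                  // _2 gets time 2
  bool before = cache.current_p (n2, b);     // 2 > 1: current
  cache.set_stale (n2);                      // _2 drops to time 1
  bool after = cache.current_p (n2, b);      // 1 > 1 fails: stale
  cache.set_timestamp (n2);                  // recomputed, time 3
  bool recomputed = cache.current_p (n2, b); // 3 > 1: current again
  return before && !after && recomputed;
}
```

Calling demo_stale () returns true.  Note that nothing is recomputed at the moment set_stale is called; the work only happens if the name is used again.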


Imports: b.0_1  d.3_7
Exports: b.0_1  _2  _3  d.3_7  _8
 _2 : b.0_1(I)
 _3 : b.0_1(I)  _2
 _8 : b.0_1(I)  _2  _3  d.3_7(I)

   b.0_1 = b;
    _2 = b.0_1 == 0B;
    _3 = (int) _2;
    c = _3;
    _5 = *b.0_1;    <<-- from this point b.0_1 is [+1, +INF]
    a = _5;
    d.3_7 = d;
    _8 = _3 % d.3_7;
    if (_8 != 0)

When _5 is defined and b.0_1 becomes non-null, we mark the dependent 
names that are exports and defined in this block as stale: _2, _3 
and _8.


When _8 is being calculated, _3 is stale, which causes it to be 
recomputed.  It is dependent on _2, which is also stale, so that is also 
recomputed, and we end up with


  _2 == [0, 0]
  _3 == [0, 0]
and _8 == [0, 0]
And then we can fold away the condition.

The side effect is that _2 and _3 are globally changed to be [0, 0], but 
this is OK because it is the definition block, so it dominates all other 
uses of these names, and they should be [0,0] upon exit anyway.  The 
previous patch ensures that the global value written to 
SSA_NAME_RANGE_INFO is the correct [0,1] for both _2 and _3.
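
For readers who prefer source over GIMPLE, the following is a hypothetical C++ reconstruction of code that would produce a dump like the one above.  It is inferred from the dump only and is not claimed to be the pr104530.c testcase; the counter branch_taken and the driver demo_fold are additions for the example.

```cpp
#include <cassert>

// Globals named after the variables in the GIMPLE dump.
int *b;
int a, c, d;
int branch_taken;   // hypothetical: counts executions of the branch body

void f ()
{
  c = (b == nullptr);   // _2 = b.0_1 == 0B;  _3 = (int) _2;  c = _3;
  a = *b;               // _5 = *b.0_1;  b must be non-null from here on
  if (c % d != 0)       // _8 = _3 % d.3_7;  if (_8 != 0)
    ++branch_taken;
}

// Drive f with a valid (non-null) pointer and several non-zero divisors.
int demo_fold ()
{
  static int storage = 7;
  b = &storage;
  branch_taken = 0;
  const int divisors[] = { 1, 2, -3, 1000 };
  for (int div : divisors)
    {
      d = div;
      f ();
    }
  return branch_taken;
}
```

In any execution that reaches the dereference with a valid pointer, c is 0, so c % d is 0 and the branch is never taken; demo_fold () returns 0.  That is the fact the re-evaluation lets the compiler discover.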


The patch would have been even smaller if I already had a mark_stale 
method.   I thought there was one, but I guess it never made it in from 
lack of need at the time.   The only other tweak was to make the value 
stale if the dependent value was the same as the definitions.


This bootstraps on x86_64-pc-linux-gnu with no regressions. Re-running 
to ensure.


OK for trunk? or defer to stage 1?
Andrew
From a7e4e5f04899817cacc3ebe5cc3ff2d489489309 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 22 Feb 2022 09:58:00 -0500
Subject: [PATCH 2/2] Mark defs dependent on non-null stale.

When a name is marked as non-null, find all exports from the block, and mark their timestamp as stale. Any following use of the name will trigger a recomputation using the new non-null range.

	PR tree-optimization/104530
	gcc/
	* gimple-range-cache.cc (temporal_cache::set_stale): New.
	(temporal_cache::current_p): Identical timestamp is not current.
	(ranger_cache::update_to_nonnull): Mark any export defined in this
	block stale if it is dependent on this name.

	gcc/testsuite/
	* gcc.dg/pr104530.c: New.
---
 gcc/gimple-range-cache.cc   | 26 --
 gcc/testsuite/gcc.dg/pr104530.c | 17 +
 2 files changed, 41 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr104530.c

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 613135266a4..debc93767a9 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -696,6 +696,7 @@ public:
   bool current_p (tree name, tree dep1, tree dep2) const;
   void set_timestamp (tree name);
   void set_always_current (tree name);
+  void set_stale (tree name);
 private:
   unsigned temporal_value (unsigned ssa) const;
 
@@ -740,9 +741,9 @@ temporal_cache::current_p (tree name, tree dep1, tree dep2) const
   // Any non-registered dependencies will have a value of 0 and thus be older.
   // Return true if time is newer than either dependent.
 
-  if (dep1 && ts < temporal_value (SSA_NAME_VERSION (dep1)))
+  if (dep1 && ts <= temporal_value (SSA_NAME_VERSION (dep1)))
 return false;
-  if (dep2 && ts < temporal_value (SSA_NAME_VERSION (dep2)))
+  if (dep2 && ts <= temporal_value (SSA_NAME_VERSION (dep2)))
 return false;
 
   return true;
@@ -759,6 +760,18 @@ temporal_cache::set_timestamp (tree name)
   m_timestamp[v] = ++m_current_time;
 }
 
+// Mark a NAME as stale by marking the timestamp as oldest, unless it is
+// already "always current".
+
+inline void
+temporal_cache::set_stale (tree name)
+{
+  unsigned v = SSA_NAME_VERSION (name);
+  if (v >= m_timestamp.length () || m_timestamp[v] == 0)
+return;
+  m_timestamp[v] = 1;
+}
+
 // Set the timestamp to 0, marking it as "always up to date".
 
 inline void
@@ -1475,6 +1488,15 @@ ranger_cache::update_to_nonnull (basic_block bb, tree name)
 	{
 	  r.set_nonzero (type);
 	  m_on_entry.set_bb_range (name, bb, r);
+	  // Mark consumers of name stale so they can be recomputed.
+	  if (m_gori.is_import_p (name, bb) || m_gori.is_export_p (name, bb))
+	{
+	  tree x;
+	  FOR_EACH_GORI_EXPORT_NAME (m_gori, bb, x)
+		if (m_gori.in_chai

Further simplify 'gcc/omp-oacc-neuter-broadcast.cc:record_field_map_t' (was: [PATCH 1/4] openacc: Middle-end worker-partitioning support)

2022-02-22 Thread Thomas Schwinge

Hi!

On 2021-08-16T12:34:09+0200, I wrote:
> On 2021-08-06T09:49:58+0100, Julian Brown  wrote:
>> On Wed, 4 Aug 2021 15:13:30 +0200
>> Thomas Schwinge  wrote:
>>
>>> 'oacc_do_neutering' is the 'execute' function of the pass, so that
>>> means every time this executes, a fresh 'field_map' is set up, no
>>> state persists across runs (assuming I'm understanding that
>>> correctly).  Why don't we simply use standard (non-GC) memory
>>> management for that?  "For convenience" shall be fine as an answer
>>> ;-) -- but maybe instead of figuring out the right GC annotations,
>>> changing the memory management will be easier?  (Or, of course, maybe
>>> I completely misunderstood that?)
>>
>> I suspect you're right, and there's no need for this to be GC-allocated
>> memory. If non-standard memory allocation will work out fine, we should
>
> ("non-GC", I suppose.)
>
>> probably use that instead.
>
> Pushed "Avoid 'GTY' use for 'gcc/omp-oacc-neuter-broadcast.cc:field_map'"
> to master branch in commit 049eda8274b7394523238b17ab12c3e2889f253e

In commit 0fe9176f410accc767e0abab010aec843b2e7ea6 I've now pushed
"Further simplify 'gcc/omp-oacc-neuter-broadcast.cc:record_field_map_t'"
to master branch, see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From 0fe9176f410accc767e0abab010aec843b2e7ea6 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 13 Aug 2021 21:17:55 +0200
Subject: [PATCH] Further simplify
 'gcc/omp-oacc-neuter-broadcast.cc:record_field_map_t'

Now that I've resolved GCC 'hash_map' issues (a while ago already), we may
further simplify this after commit 049eda8274b7394523238b17ab12c3e2889f253e
"Avoid 'GTY' use for 'gcc/omp-oacc-neuter-broadcast.cc:field_map'": as
'hash_map' Value, directly store 'field_map_t' objects, not pointers to
manually allocated 'field_map_t' objects.
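
The difference between the two storage schemes can be sketched as follows.  This is an illustration only: std::unordered_map stands in for GCC's 'hash_map', the string keys and the function names by_pointer and by_value are made up, and emplace is used as a rough analogue of 'get_or_insert'.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-in for 'field_map_t'; key and value types are illustrative.
typedef std::unordered_map<std::string, int> field_map;

// Before: the outer map stores pointers to manually allocated maps.
int by_pointer ()
{
  std::unordered_map<std::string, field_map *> records;
  field_map *fields = new field_map;
  records["rec"] = fields;
  (*fields)["x"] = 1;

  // Lookup yields a pointer to a pointer, so two steps are needed.
  auto it = records.find ("rec");
  field_map *found = it != records.end () ? it->second : nullptr;
  int n = found ? static_cast<int> (found->size ()) : 0;

  // The owner has to free every value by hand.
  for (auto &kv : records)
    delete kv.second;
  return n;
}

// After: the outer map stores the inner maps directly.
int by_value ()
{
  std::unordered_map<std::string, field_map> records;
  auto res = records.emplace (std::string ("rec"), field_map ());
  assert (res.second);                // the entry did not exist before
  field_map *fields = &res.first->second;
  (*fields)["x"] = 1;

  // Lookup yields the inner map itself; no cleanup loop is required.
  auto it = records.find ("rec");
  field_map *found = it != records.end () ? &it->second : nullptr;
  return found ? static_cast<int> (found->size ()) : 0;
}
```

Both functions return 1; the second needs neither 'new' nor a 'delete' loop, which mirrors the lines the patch removes.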

	gcc/
	* omp-oacc-neuter-broadcast.cc (record_field_map_t): Further
	simplify.  Adjust all users.
---
 gcc/omp-oacc-neuter-broadcast.cc | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/gcc/omp-oacc-neuter-broadcast.cc b/gcc/omp-oacc-neuter-broadcast.cc
index 7fb691d7155..314161e38f5 100644
--- a/gcc/omp-oacc-neuter-broadcast.cc
+++ b/gcc/omp-oacc-neuter-broadcast.cc
@@ -538,7 +538,7 @@ typedef hash_map field_map_t;
to propagate, to the field in the record type that should be used for
transmission and reception.  */
 
-typedef hash_map<tree, field_map_t *> record_field_map_t;
+typedef hash_map<tree, field_map_t> record_field_map_t;
 
 static void
 install_var_field (tree var, tree record_type, field_map_t *fields)
@@ -1168,8 +1168,7 @@ worker_single_copy (basic_block from, basic_block to,
 	gcc_assert (TREE_CODE (var) == VAR_DECL);
 
   /* If we had no record type, we will have no fields map.  */
-  field_map_t **fields_p = record_field_map->get (record_type);
-  field_map_t *fields = fields_p ? *fields_p : NULL;
+  field_map_t *fields = record_field_map->get (record_type);
 
   if (worker_partitioned_uses->contains (var)
 	  && fields
@@ -1684,10 +1683,9 @@ oacc_do_neutering (unsigned HOST_WIDE_INT bounds_lo,
 
 	  field_vec.qsort (sort_by_size_then_ssa_version_or_uid);
 
-	  field_map_t *fields = new field_map_t;
-
 	  bool existed;
-	  existed = record_field_map.put (record_type, fields);
+	  field_map_t *fields
+	= &record_field_map.get_or_insert (record_type, &existed);
 	  gcc_checking_assert (!existed);
 
 	  /* Insert var fields in reverse order, so the last inserted element
@@ -1818,8 +1816,6 @@ oacc_do_neutering (unsigned HOST_WIDE_INT bounds_lo,
 			&partitioned_var_uses, &record_field_map,
 			&blk_offset_map, writes_gang_private);
 
-  for (auto it : record_field_map)
-delete it.second;
   record_field_map.empty ();
 
   /* These are supposed to have been 'delete'd by 'neuter_worker_single'.  */
-- 
2.34.1

Re: [PATCH 0/2] tree-optimization/104530 - proposed re-evaluation.

2022-02-22 Thread Jakub Jelinek via Gcc-patches

On Tue, Feb 22, 2022 at 11:39:41AM -0500, Andrew MacLeod wrote:
>  I'd like to get clarification on some subtle terminology. I find I am
> conflating calls that don't return with calls that may throw, and I think
> they have different considerations.
> 
> My experiments with calls that can throw indicate that they always end a
> basic block.  This makes sense to me as there is the outgoing fall-thru edge
> and an outgoing EH edge.  Are there any conditions under which this is not
> the case? (other than non-call exceptions)

Generally, there are 2 kinds of calls that can throw, those that can throw
internally and those that can throw externally (e.g. there are
stmt_could_throw_{in,ex}ternal predicates).

Consider e.g.

void foo ();
struct S { S (); ~S (); };
void bar () { foo (); foo (); }
void baz () { S s; foo (); foo (); }
void qux () { try { foo (); } catch (...) {} }

the calls to foo in bar throw externally, if they throw, execution doesn't
continue anywhere in bar but in some bar's caller, or could just terminate
if nothing catches it at all.  Such calls don't terminate a bb.
In baz, the s variable needs destruction if either of the foo calls throw,
so those calls do terminate bb and there are normal fallthru edges from
those bbs and eh edges to an EH pad which will destruct s and continue
propagating the exception.
In qux, there is explicit try/catch, so again, foo throws internally, ends
bb, has an EH edge to EH landing pad which will do what catch does.
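
The three cases can be observed at run time with a small sketch.  The definition of foo (always throwing), the global trace string and the helper run are assumptions added for the example; only bar, baz, qux and S follow the snippet above.

```cpp
#include <cassert>
#include <string>

std::string trace;   // records what ran, in order

// Assumption for the demo: foo always throws.
void foo () { trace += "foo "; throw 1; }

struct S
{
  S () { trace += "S "; }
  ~S () { trace += "~S "; }
};

void bar () { foo (); foo (); }                    // throws externally
void baz () { S s; foo (); foo (); }               // cleanup of s on the EH path
void qux () { try { foo (); } catch (...) { trace += "caught "; } }

// Call FN and note whether an exception escaped from it.
std::string run (void (*fn) ())
{
  trace.clear ();
  try { fn (); } catch (...) { trace += "escaped "; }
  return trace;
}
```

With these definitions, run (bar) yields "foo escaped ", run (baz) yields "S foo ~S escaped " and run (qux) yields "foo caught ".  Only baz and qux have work to do inside the function when foo throws, which is why their calls need an EH edge.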

That is EH, then there are calls that might not return because they leave
in some other way (e.g. longjmp), or might loop forever, might exit, might
abort, trap etc.

I must say I don't know if we have any call flags that would guarantee
the function will always return (exactly once) if called.
Perhaps ECF_CONST/ECF_PURE without ECF_LOOPING_CONST_OR_PURE do?

Jakub

Get rid of 'gcc/omp-oacc-neuter-broadcast.cc:oacc_build_component_ref' (was: Re-unify 'omp_build_component_ref' and 'oacc_build_component_ref')

2022-02-22 Thread Thomas Schwinge

Hi!

On 2021-08-09T16:16:51+0200, I wrote:
> This concerns a class of ICEs seen as of og10 branch with the
> "openacc: Middle-end worker-partitioning support" and "amdgcn:
> Enable OpenACC worker partitioning for AMD GCN" changes applied:

I've determined that as of commit 2a3f9f6532bb21d8ab6f16fbe9ee603f6b1405f2
"openacc: Shared memory layout optimisation", we're no longer running
into the vectorizer ICEs for '!ADDR_SPACE_GENERIC_P'.  I have not
researched if they've just gone latent (again), or whether that commit
really changed something to avoid those (bug fix).  Anyway: pushed to
master branch commit 54f745023276e5025e34b2cc22530c78423a93cb
"Get rid of 'gcc/omp-oacc-neuter-broadcast.cc:oacc_build_component_ref'",
see attached.


Grüße
 Thomas


> On 2020-06-06T16:07:36+0100, Kwok Cheung Yeung  wrote:
>> On 01/06/2020 8:48 pm, Kwok Cheung Yeung wrote:
>>> On 21/05/2020 10:23 pm, Kwok Cheung Yeung wrote:
>>>> These all have the same failure mode:
>>>>
>>>> during RTL pass: expand
>>>> [...]/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90: In function 'MAIN__._omp_fn.1':
>>>> [...]/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90:86: internal compiler error: in convert_memory_address_addr_space_1, at explow.c:302
>>>> 0xc29f20 convert_memory_address_addr_space_1(scalar_int_mode, rtx_def*, unsigned char, bool, bool)
>>>>  [...]/gcc/explow.c:302
>>>> 0xc29f57 convert_memory_address_addr_space(scalar_int_mode, rtx_def*, unsigned char)
>>>>  [...]/gcc/explow.c:404
>>>> [...]
>
>>>> This occurs if the -ftree-slp-vectorize flag is specified (default at -O3).
>
>>> The problematic bit of Gimple code is this:
>>>
>>>.oacc_worker_o.44._120 = gangs_min_472;
>>>.oacc_worker_o.44._122 = workers_min_473;
>>>.oacc_worker_o.44._124 = vectors_min_474;
>>>.oacc_worker_o.44._126 = gangs_max_475;
>>>.oacc_worker_o.44._128 = workers_max_476;
>>>.oacc_worker_o.44._130 = vectors_max_477;
>>>.oacc_worker_o.44._132 = 0;
>>>
>>> With SLP vectorization enabled, it becomes this:
>>>
>>>_40 = {gangs_min_472, workers_min_473, vectors_min_474, gangs_max_475};
>>>...
>>>MEM  [(int *)&.oacc_worker_o.44] = _40;
>>>.oacc_worker_o.44._128 = workers_max_476;
>>>.oacc_worker_o.44._130 = vectors_max_477;
>>>.oacc_worker_o.44._132 = 0;
>>>
>>> The optimization is trying to transform 4 separate assignments into a single
>>> memory operation. The trouble is that &.oacc_worker_o is an SImode pointer in
>>> AS4 (LDS), while the memory expression appears to be in the default memory
>>> space. The 'to' expression of the assignment is:
>>>
>>>   >>  type >>  type >>  size 
>>>  unit-size 
>>>  align:32 warn_if_not_align:0 symtab:0 alias-set 1 
>>> canonical-type 0x773195e8 precision:32 min >> -2147483648> max 
>>>  pointer_to_this  
>>> reference_to_this >
>>>  TI
>>>  size 
>>>  unit-size 
>>>  align:128 warn_if_not_align:0 symtab:0 alias-set 1 
>>> structural-equality nunits:4
>>>  pointer_to_this >
>>>
>>>  arg:0 >>  type >> 0x773195e8 int>
>>>  public unsigned DI
>>>  size 
>>>  unit-size 
>>>  align:64 warn_if_not_align:0 symtab:0 alias-set 2 
>>> structural-equality>
>>>  constant
>>>  arg:0 >> 0x773eb888 .oacc_ws_data_s.21 address-space-4>
>>>  addressable used static ignored BLK 
>>> [...]/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90:86:0
>>>
>>>  size 
>>>  unit-size 
>>>  align:128 warn_if_not_align:0
>>>  (mem/c:BLK (symbol_ref:SI (".oacc_worker_o.44.14") [flags 0x2] 
>>> ) [9 .oacc_worker_o.44+0 S28 
>>> A128 AS4])>>
>>>  arg:1  
>>> constant 0>>
>>>
>>> In convert_memory_address_addr_space_1:
>>>
>>> #ifndef POINTERS_EXTEND_UNSIGNED
>>>gcc_assert (GET_MODE (x) == to_mode || GET_MODE (x) == VOIDmode);
>>>return x;
>>> #else /* defined(POINTERS_EXTEND_UNSIGNED) */
>>>
>>> POINTERS_EXTEND_UNSIGNED is not defined, so it hits the assert. The expected
>>> to_mode is DI_mode, but x is SI_mode, so the assert fires.
>
>> I now have a fix for this.
>>
>>  >MEM  [(int *)&.oacc_worker_o.44] = _40;
>>
>> The ICE occurs because the SLP vectorization pass creates the new statement
>> using the type of the expression '&.oacc_worker_o.44', which is a pointer to a
>> component ref in the default address space. The expand pass gets confused
>> because it is handed an SImode pointer (for LDS) when it is expecting a DImode
>> pointer (for flat/global space).
>>
>> The underlying problem is that although .oacc_worker_o is in the correct address
>> space, the component ref .oacc_worker_o is not. I fixed this by propagating the
>> address space of .oacc_worker_o when the component ref is created.
>
>>  static tree
>>  oacc_build_component_ref (tree obj, tr

[PATCH][middle-end/104550]Suppress uninitialized warnings for new created uses from __builtin_clear_padding folding

2022-02-22 Thread Qing Zhao

__builtin_clear_padding(&object) clears all the padding bits of the object.
It does not actually involve any use of a user variable, so users do not
expect any uninitialized warning from it.  It is reasonable to suppress
uninitialized warnings for all newly created uses from __builtin_clear_padding
folding.
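
For context, here is a small sketch of what the builtin does at run time.  It is not one of the new tests.  The struct padded, the helper names and the fallback branch (used when the compiler does not advertise the builtin through __has_builtin) are assumptions for the example, and the layout with padding between 'c' and 'i' is typical rather than guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#if defined(__has_builtin)
# if __has_builtin(__builtin_clear_padding)
#  define HAVE_CLEAR_PADDING 1
# endif
#endif

// On common ABIs there are 3 padding bytes between 'c' and 'i'.
struct padded { char c; int i; };

static void clear_padding (padded *p)
{
#ifdef HAVE_CLEAR_PADDING
  __builtin_clear_padding (p);
#else
  // Fallback for compilers without the builtin: zero the bytes
  // between the two members by hand.
  unsigned char *bytes = reinterpret_cast<unsigned char *> (p);
  for (std::size_t k = offsetof (padded, c) + 1; k < offsetof (padded, i); ++k)
    bytes[k] = 0;
#endif
}

// Returns the number of non-zero padding bytes left after clearing.
int nonzero_padding_after_clear ()
{
  padded p;
  std::memset (&p, 0xff, sizeof p);   // dirty every byte, padding included
  p.c = 'a';
  p.i = 42;
  clear_padding (&p);

  unsigned char bytes[sizeof p];
  std::memcpy (bytes, &p, sizeof p);
  assert (p.c == 'a' && p.i == 42);   // member values are left alone

  int n = 0;
  for (std::size_t k = offsetof (padded, c) + 1; k < offsetof (padded, i); ++k)
    n += bytes[k] != 0;
  return n;
}
```

nonzero_padding_after_clear () returns 0.  The stores that zero the padding are generated by the compiler when it folds the builtin, so a warning about them would point at code the user never wrote.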

The patch has been bootstrapped and regress tested on both x86 and aarch64.

Okay for trunk?

Thanks.

Qing

==
From cf6620005f55d4a1f782332809445c270d22cf86 Mon Sep 17 00:00:00 2001
From: qing zhao 
Date: Mon, 21 Feb 2022 16:38:31 +
Subject: [PATCH] Suppress uninitialized warnings for new created uses from
 __builtin_clear_padding folding [PR104550]

__builtin_clear_padding(&object) clears all the padding bits of the object.
It does not actually involve any use of a user variable, so users do not
expect any uninitialized warning from it.  It is reasonable to suppress
uninitialized warnings for all newly created uses from __builtin_clear_padding
folding.

PR middle-end/104550

gcc/ChangeLog:

* gimple-fold.cc (clear_padding_flush): Suppress warnings for new
created uses.
(clear_padding_emit_loop): Likewise.
(clear_padding_type): Likewise.
(gimple_fold_builtin_clear_padding): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/auto-init-pr104550-1.c: New test.
* gcc.dg/auto-init-pr104550-2.c: New test.
* gcc.dg/auto-init-pr104550-3.c: New test.
---
 gcc/gimple-fold.cc  | 31 +++--
 gcc/testsuite/gcc.dg/auto-init-pr104550-1.c | 10 +++
 gcc/testsuite/gcc.dg/auto-init-pr104550-2.c | 11 
 gcc/testsuite/gcc.dg/auto-init-pr104550-3.c | 11 
 4 files changed, 55 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/auto-init-pr104550-1.c
 create mode 100644 gcc/testsuite/gcc.dg/auto-init-pr104550-2.c
 create mode 100644 gcc/testsuite/gcc.dg/auto-init-pr104550-3.c

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 16f02c2d098..1e18ba3465a 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -4296,6 +4296,7 @@ clear_padding_flush (clear_padding_struct *buf, bool full)
 build_int_cst (buf->alias_type,
buf->off + padding_end
- padding_bytes));
+ suppress_warning (dst, OPT_Wuninitialized);
  gimple *g = gimple_build_assign (dst, src);
  gimple_set_location (g, buf->loc);
  gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
@@ -4341,6 +4342,7 @@ clear_padding_flush (clear_padding_struct *buf, bool full)
  tree dst = build2_loc (buf->loc, MEM_REF, atype,
 buf->base,
 build_int_cst (buf->alias_type, off));
+ suppress_warning (dst, OPT_Wuninitialized);
  gimple *g = gimple_build_assign (dst, src);
  gimple_set_location (g, buf->loc);
  gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
@@ -4370,6 +4372,7 @@ clear_padding_flush (clear_padding_struct *buf, bool full)
atype = build_aligned_type (type, buf->align);
  tree dst = build2_loc (buf->loc, MEM_REF, atype, buf->base,
 build_int_cst (buf->alias_type, off));
+ suppress_warning (dst, OPT_Wuninitialized);
  tree src;
  gimple *g;
  if (all_ones
@@ -4420,6 +4423,7 @@ clear_padding_flush (clear_padding_struct *buf, bool full)
 build_int_cst (buf->alias_type,
buf->off + end
- padding_bytes));
+ suppress_warning (dst, OPT_Wuninitialized);
  gimple *g = gimple_build_assign (dst, src);
  gimple_set_location (g, buf->loc);
  gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
@@ -4620,14 +4624,18 @@ clear_padding_emit_loop (clear_padding_struct *buf, tree type,
   gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
   clear_padding_type (buf, type, buf->sz, for_auto_init);
   clear_padding_flush (buf, true);
-  g = gimple_build_assign (buf->base, POINTER_PLUS_EXPR, buf->base,
-  size_int (buf->sz));
+  tree rhs = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (buf->base),
+ buf->base, size_int (buf->sz));
+  suppress_warning (rhs, OPT_Wuninitialized);
+  g = gimple_build_assign (buf->base, rhs);
   gimple_set_location (g, buf->loc);
   gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
   g = gimple_build_label (l2);
   gimple_set_location (g, buf->loc);
   gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
-  g = gimple_build_cond (NE_EXPR, buf->base, end, l1, l3);
+  tree cond_expr = fold_build2 (NE_EXPR, boolean_type_node, buf->base, end);
+  suppress_warn

Re: [PATCH] Check if loading const from mem is faster

2022-02-22 Thread Segher Boessenkool

Hi Jiu Fu,

On Tue, Feb 22, 2022 at 02:53:13PM +0800, Jiufu Guo wrote:
>  static bool
>  rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>  {
> -  if (GET_CODE (x) == HIGH
> -  && GET_CODE (XEXP (x, 0)) == UNSPEC)
> +  if (GET_CODE (x) == HIGH)
>  return true;

This isn't explained anywhere.  "Update" is not enough ;-)

CSE is the pass that is most ancient and still causing problems left and
right.  It should be rewritten sooner rather than later.

The problem with that is that the pass does so much more than just CSE,
and we don't want to lose all those other things.  So it will be a slow
arduous affair of peeling off bits into separate passes, I think :-(

Doing actual CSE without all the restrictive restrictions our pass has
historically had isn't the hard part!

Segher

Re: [PATCH 0/2] tree-optimization/104530 - proposed re-evaluation.

2022-02-22 Thread Andrew MacLeod via Gcc-patches


On 2/22/22 11:56, Jakub Jelinek wrote:
> On Tue, Feb 22, 2022 at 11:39:41AM -0500, Andrew MacLeod wrote:
>>  I'd like to get clarification on some subtle terminology. I find I am
>> conflating calls that don't return with calls that may throw, and I think
>> they have different considerations.
>>
>> My experiments with calls that can throw indicate that they always end a
>> basic block.  This makes sense to me as there is the outgoing fall-thru edge
>> and an outgoing EH edge.  Are there any conditions under which this is not
>> the case? (other than non-call exceptions)
>
> Generally, there are 2 kinds of calls that can throw, those that can throw
> internally and those that can throw externally (e.g. there are
> stmt_could_throw_{in,ex}ternal predicates).
>
> Consider e.g.
>
> void foo ();
> struct S { S (); ~S (); };
> void bar () { foo (); foo (); }
> void baz () { S s; foo (); foo (); }
> void qux () { try { foo (); } catch (...) {} }
>
> the calls to foo in bar throw externally, if they throw, execution doesn't
> continue anywhere in bar but in some bar's caller, or could just terminate
> if nothing catches it at all.  Such calls don't terminate a bb.

This is not a problem.

> In baz, the s variable needs destruction if either of the foo calls throw,
> so those calls do terminate bb and there are normal fallthru edges from
> those bbs and eh edges to an EH pad which will destruct s and continue
> propagating the exception.
> In qux, there is explicit try/catch, so again, foo throws internally, ends
> bb, has an EH edge to EH landing pad which will do what catch does.

Those also are not a problem, everything should flow fine in these 
situations as well now that we make non-null adjustments on edges, and 
don't for EH edges.

As far as these patches go, any block which has a call at the exit point 
will not have any import or exports as there is no range stmt at the end 
of the block, so we will not be marking anything in those blocks as stale.

> That is EH, then there are calls that might not return because they leave
> in some other way (e.g. longjmp), or might loop forever, might exit, might
> abort, trap etc.

Generally speaking, calls which do not return should not now be a 
problem... as long as they do not transfer control to somewhere else in 
the current function.

> I must say I don't know if we have any call flags that would guarantee
> the function will always return (exactly once) if called.
> Perhaps ECF_CONST/ECF_PURE without ECF_LOOPING_CONST_OR_PURE do?

I don't think I actually need that.


Andrew

Re: [PATCH] Restore bootstrap on x86_64-pc-linux-gnu

2022-02-22 Thread Uros Bizjak via Gcc-patches

On Tue, Feb 22, 2022 at 2:40 PM Roger Sayle  wrote:
>
>
>
> This patch resolves the bootstrap failure on x86_64-pc-linux-gnu.
>
> Is this sufficiently "obvious" in stage4, or should I wait for the bootstrap
>
> and regression testing to complete?

Please just bootstrap the compiler.

>
>
> 2022-02-22  Roger Sayle  
>
>
>
> gcc/ChangeLog
>
> * config/i386/i386-expand.cc (ix86_expand_cmpxchg_loop): Restore
>
> bootstrap.

OK.

Thanks,
Uros.

>
>
>
> Cheers,
>
> Roger
>
> --
>
>
>

Re: [PATCH 0/2] tree-optimization/104530 - proposed re-evaluation.

2022-02-22 Thread Jakub Jelinek via Gcc-patches

On Tue, Feb 22, 2022 at 12:39:28PM -0500, Andrew MacLeod wrote:
> > That is EH, then there are calls that might not return because they leave
> > in some other way (e.g. longjmp), or might loop forever, might exit, might
> > abort, trap etc.
> Generally speaking, calls which do not return should not now be a problem...
> as long as they do not transfer control to somewhere else in the current
> function.

I thought all of those cases are very relevant to PR104530.
If we have:
  _1 = ptr_2(D) == 0;
  // unrelated code in the same bb
  _3 = *ptr_2(D);
then in light of PR104288, we can optimize ptr_2(D) == 0 into false only if
there are no calls inside of "// unrelated code in the same bb"
or if all calls in "// unrelated code in the same bb" are guaranteed to
return exactly once.  Because, if there is a call in there which could
exit (that is the PR104288 testcase), or abort, or trap, or loop forever,
or throw externally, or longjmp or in any other non-UB way
cause the _1 = ptr_2(D) == 0; stmt to be invoked at runtime but
_3 = *ptr_2(D) not being invoked, then we can't optimize the earlier
comparison because ptr_2(D) could be NULL in a valid program.
While if there are no calls (and no problematic inline asms) and no trapping
insns in between, we can and PR104530 is asking that we continue to optimize
that.
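
A minimal sketch of why an intervening call matters, using an exception as the "does not return normally" case.  The names last_compare, unrelated, f and demo_null are hypothetical and chosen only to mirror the three statements above.

```cpp
#include <cassert>

bool last_compare;   // stands in for _1

// Hypothetical "unrelated code": may leave by throwing instead of returning.
void unrelated (bool bail)
{
  if (bail)
    throw 0;
}

int f (int *ptr, bool bail)
{
  last_compare = (ptr == nullptr);   // _1 = ptr_2(D) == 0;
  unrelated (bail);                  // call that might not return
  return *ptr;                       // _3 = *ptr_2(D);
}

// A valid program: the dereference is never reached when ptr is null.
bool demo_null ()
{
  try
    {
      f (nullptr, true);
    }
  catch (...)
    {
    }
  return last_compare;
}
```

demo_null () returns true without invoking undefined behaviour, so the comparison cannot be folded to false based on the later dereference while such a call sits between them.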

Jakub

Re: [PATCH 0/2] tree-optimization/104530 - proposed re-evaluation.

2022-02-22 Thread Jeff Law via Gcc-patches





On 2/22/2022 10:57 AM, Jakub Jelinek via Gcc-patches wrote:
> On Tue, Feb 22, 2022 at 12:39:28PM -0500, Andrew MacLeod wrote:
>>> That is EH, then there are calls that might not return because they leave
>>> in some other way (e.g. longjmp), or might loop forever, might exit, might
>>> abort, trap etc.
>> Generally speaking, calls which do not return should not now be a problem...
>> as long as they do not transfer control to somewhere else in the current
>> function.
>
> I thought all of those cases are very relevant to PR104530.
> If we have:
>    _1 = ptr_2(D) == 0;
>    // unrelated code in the same bb
>    _3 = *ptr_2(D);
> then in light of PR104288, we can optimize ptr_2(D) == 0 into false only if
> there are no calls inside of "// unrelated code in the same bb"
> or if all calls in "// unrelated code in the same bb" are guaranteed to
> return exactly once.  Because, if there is a call in there which could
> exit (that is the PR104288 testcase), or abort, or trap, or loop forever,
> or throw externally, or longjmp or in any other non-UB way
> cause the _1 = ptr_2(D) == 0; stmt to be invoked at runtime but
> _3 = *ptr_2(D) not being invoked, then we can't optimize the earlier
> comparison because ptr_2(D) could be NULL in a valid program.
> While if there are no calls (and no problematic inline asms) and no trapping
> insns in between, we can and PR104530 is asking that we continue to optimize
> that.

Right.  This is similar to some of the restrictions we deal with in the 
path isolation pass.  Essentially we have a path which, when traversed, 
would result in a *0.  We would like to be able to find the edge upon 
which the *0 is control dependent and optimize the test so that it always 
went to the valid path rather than the *0 path.

The problem is there may be observable side effects on the *0 path 
between the test and the actual *0 -- including calls to nonreturning 
functions, setjmp/longjmp, things that could trap, etc.  This case is 
similar.  We can't back-propagate the non-null status through any 
statements with observable side effects.


Jeff

Re: [PATCH 1/3] rs6000: Move g++.dg/ext powerpc tests to g++.target

2022-02-22 Thread Segher Boessenkool

Hi!

On Mon, Feb 21, 2022 at 03:17:45PM -0600, Paul A. Clarke wrote:
> Also adjust DejaGnu directives, as specifically requiring "powerpc*-*-*" is no
> longer required.
> 
> 2021-02-21  Paul A. Clarke  
> 
> gcc/testsuite
>   * g++.dg/ext/altivec-1.C: Move to g++.target/powerpc, adjust dg
>   directives.
>   * g++.dg/ext/altivec-2.C: Likewise.
>   * g++.dg/ext/altivec-3.C: Likewise.
>   * g++.dg/ext/altivec-4.C: Likewise.
>   * g++.dg/ext/altivec-5.C: Likewise.
>   * g++.dg/ext/altivec-6.C: Likewise.
>   * g++.dg/ext/altivec-7.C: Likewise.
>   * g++.dg/ext/altivec-8.C: Likewise.
>   * g++.dg/ext/altivec-9.C: Likewise.
>   * g++.dg/ext/altivec-10.C: Likewise.
>   * g++.dg/ext/altivec-11.C: Likewise.
>   * g++.dg/ext/altivec-12.C: Likewise.
>   * g++.dg/ext/altivec-13.C: Likewise.
>   * g++.dg/ext/altivec-14.C: Likewise.
>   * g++.dg/ext/altivec-15.C: Likewise.
>   * g++.dg/ext/altivec-16.C: Likewise.
>   * g++.dg/ext/altivec-17.C: Likewise.
>   * g++.dg/ext/altivec-18.C: Likewise.
>   * g++.dg/ext/altivec-cell-1.C: Likewise.
>   * g++.dg/ext/altivec-cell-2.C: Likewise.
>   * g++.dg/ext/altivec-cell-3.C: Likewise.
>   * g++.dg/ext/altivec-cell-4.C: Likewise.
>   * g++.dg/ext/altivec-cell-5.C: Likewise.
>   * g++.dg/ext/altivec-types-1.C: Likewise.
>   * g++.dg/ext/altivec-types-2.C: Likewise.
>   * g++.dg/ext/altivec-types-3.C: Likewise.
>   * g++.dg/ext/altivec-types-4.C: Likewise.
>   * g++.dg/ext/undef-bool-1.C: Likewise.

Okay for trunk.  Thanks!


Segher

Re: [PATCH 0/3] rs6000: Move g++.dg powerpc tests to g++.target

2022-02-22 Thread Segher Boessenkool

On Mon, Feb 21, 2022 at 03:17:44PM -0600, Paul A. Clarke wrote:
> Some tests in g++.dg are target-specific for powerpc. Move those to
> g++.target/powerpc. Update the DejaGnu directives as needed, since
> the target restriction is perhaps no longer needed when residing in the
> target-specific powerpc subdirectory.

Not "perhaps" :-)  More specifically, powerpc.exp has

# Exit immediately if this isn't a PowerPC target.
if {![istarget powerpc*-*-*] } then {
  return
}

so anything run from that driver does not have to test for powerpc
separately anymore.

Segher

Re: [PATCH] PR fortran/104619 - [10/11/12 Regression] ICE on list comprehension with default derived type constructor

2022-02-22 Thread Thomas Koenig via Gcc-patches


Hi Harald,


a recently introduced shape validation for an array constructor
against the declared shape of a DT component failed to punt if
the shape of the constructor cannot be determined at compile time.

Suggested solution: skip the shape check in those cases.

Regtested on x86_64-pc-linux-gnu.  OK for mainline / affected branches?


Looks good to me.

Thanks for the patch!

Best regards

Thomas

Re: [PATCH 0/2] tree-optimization/104530 - proposed re-evaluation.

2022-02-22 Thread Andrew MacLeod via Gcc-patches


On 2/22/22 13:07, Jeff Law wrote:



On 2/22/2022 10:57 AM, Jakub Jelinek via Gcc-patches wrote:

On Tue, Feb 22, 2022 at 12:39:28PM -0500, Andrew MacLeod wrote:
That is EH, then there are calls that might not return because they leave
in some other way (e.g. longjmp), or might loop forever, might exit, might
abort, trap etc.

Generally speaking, calls which do not return should not now be a problem...
as long as they do not transfer control to somewhere else in the current
function.

I thought all of those cases are very relevant to PR104530.
If we have:
   _1 = ptr_2(D) == 0;
   // unrelated code in the same bb
   _3 = *ptr_2(D);
then in light of PR104288, we can optimize ptr_2(D) == 0 into false only if
there are no calls inside of "// unrelated code in the same bb"
or if all calls in "// unrelated code in the same bb" are guaranteed to
return exactly once.  Because, if there is a call in there which could
exit (that is the PR104288 testcase), or abort, or trap, or loop forever,
or throw externally, or longjmp or in any other non-UB way
cause the _1 = ptr_2(D) == 0; stmt to be invoked at runtime but
_3 = *ptr_2(D) not being invoked, then we can't optimize the earlier
comparison because ptr_2(D) could be NULL in a valid program.
While if there are no calls (and no problematic inline asms) and no trapping
insns in between, we can and PR104530 is asking that we continue to optimize
that.
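
To make the concern above concrete, here is a small C++ sketch of the same shape: a comparison, then a call that may not return, then the dereference. All names are made up for illustration and nothing here comes from the PR testcases. With a callee that throws, the null case is a valid program because the dereference is never reached, so the comparison must still be observed as true.

```cpp
#include <cassert>
#include <stdexcept>

// Illustration only; none of these names come from GCC or the PRs.
static bool seen_null = false;  // observable result of the early comparison
static int loaded = 0;          // result of the later dereference

// Same shape as "_1 = ptr == 0; <unrelated code>; _3 = *ptr;".
void compare_call_deref(int *ptr, void (*unrelated)()) {
  seen_null = (ptr == nullptr);  // _1 = ptr_2(D) == 0
  unrelated();                   // may throw, exit, longjmp, never return...
  loaded = *ptr;                 // _3 = *ptr_2(D), reached only if the call returns
}

void returns_normally() {}
void leaves_early() { throw std::runtime_error("does not return normally"); }

void demo_observable_null() {
  // Valid program: the call throws, so the null dereference never executes,
  // yet the comparison result is observable and has to be "true".
  try {
    compare_call_deref(nullptr, leaves_early);
  } catch (const std::runtime_error &) {
  }
  assert(seen_null);

  // With a call that returns, the pointer really is non-null at the load.
  int x = 5;
  compare_call_deref(&x, returns_normally);
  assert(!seen_null && loaded == 5);
}
```

Folding the comparison to a constant is only safe in the second situation, where every call in between is known to return.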
Right.  This is similar to some of the restrictions we deal with in 
the path isolation pass.  Essentially we have a path, when traversed, 
would result in a *0.  We would like to be able to find the edge 
upon-which the *0 is control dependent and optimize the test so that 
it always went to the valid path rather than the *0 path.


The problem is there may be observable side effects on the *0 path 
between the test and the actual *0 -- including calls to nonreturning 
functions, setjmp/longjmp, things that could trap, etc.  This case is 
similar.  We can't back-propagate the non-null status through any 
statements with observable side effects.


Jeff

We can't back propagate, but we can alter our forward view.  Any 
ssa-name defined before the observable side effect can be recalculated 
using the updated values, and all uses of those names after the 
side-effect would then appear to be "up-to-date"


This does not actually change anything before the side-effect statement, 
but the lazy re-evaluation ranger employs makes it appear as if we do a 
new computation when _1 is used afterwards, i.e.:


   _1 = ptr_2(D) == 0;
   // unrelated code in the same bb
   _3 = *ptr_2(D);
   _4 = ptr_2(D) == 0;  // ptr_2 is known to be [+1, +INF] now.
And we use _4 everywhere _1 was used.   This is the effect.

so we do not actually change anything in the unrelated code, just 
observable effects afterwards.  We already do these recalculations on 
outgoing edges in other blocks, just not within the definition block 
because non-null wasn't visible within the def block.
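
As a rough picture of what "lazy re-evaluation" means here, the following toy C++ model keeps a timestamp per name and recomputes a cached value only when it is next used and one of its inputs is newer. This is a sketch under my own assumptions: the class, its members and its policy are hypothetical and are not ranger's actual data structures.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model only: every name below is hypothetical, not taken from ranger.
class LazyCache {
  struct Entry {
    int value = 0;
    unsigned stamp = 0;                    // time of the last update
    std::vector<std::string> deps;         // names used in the definition
    std::function<int(LazyCache &)> eval;  // empty for plain inputs
  };
  std::map<std::string, Entry> entries_;
  unsigned clock_ = 0;

 public:
  int recomputations = 0;

  // Record a new value for an input and give it a fresh timestamp.
  void set_input(const std::string &name, int value) {
    Entry &e = entries_[name];
    e.value = value;
    e.stamp = ++clock_;
  }

  // Define a computed name and cache its initial value.
  void define(const std::string &name, std::vector<std::string> deps,
              std::function<int(LazyCache &)> eval) {
    Entry &e = entries_[name];
    e.deps = std::move(deps);
    e.eval = std::move(eval);
    e.value = e.eval(*this);
    e.stamp = ++clock_;
  }

  // On use: refresh dependencies first, then recompute only if one is newer.
  int get(const std::string &name) {
    Entry &e = entries_.at(name);
    if (e.eval) {
      bool stale = false;
      for (const std::string &d : e.deps) {
        get(d);
        if (entries_.at(d).stamp > e.stamp) stale = true;
      }
      if (stale) {
        e.value = e.eval(*this);
        e.stamp = ++clock_;
        ++recomputations;
      }
    }
    return e.value;
  }
};

void demo_lazy_cache() {
  LazyCache c;
  c.set_input("a", 1);
  c.define("b", {"a"}, [](LazyCache &self) { return self.get("a") + 1; });
  c.define("d", {"b"}, [](LazyCache &self) { return self.get("b") * 10; });
  assert(c.get("d") == 20);
  c.set_input("a", 5);
  assert(c.recomputations == 0);  // nothing is recomputed until a use
  assert(c.get("d") == 60);       // the use refreshes b, then d
  assert(c.recomputations == 2);
}
```

If nothing uses "d" after the input changes, no work is done at all, which is the property the lazy scheme is after.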


Additionally, in the testcase, there is a store to C before the side 
effects.
These patches get rid of the branch and thus the call in the testcase as 
requested, but we still have to compute _3 in order to store it into 
global C since it occurs pre side-effect.


    b.0_1 = b;
    _2 = b.0_1 == 0B;
    _3 = (int) _2;
    c = _3;
    _5 = *b.0_1;

No matter how you look at it, you are going to need to process a block 
twice in order to handle any code pre-side-effect.  Whether it be 
assigning stmt uids, or what have you.


VRP could pre-process the block, and if it gets to the end of the block, 
and it had at least one statement with a side effect and no calls which 
may not return you could process the block with all the side effects 
already active.   I'm not sure if that buys as much as the cost, but it 
would change the value written to C to be 1, and it would change the 
global values exported for _2 and _3.


Another option would be flag the ssa-names instead of/as well as marking 
them as stale.  If we get to the end of the block and there were no 
non-returning functions or EH edges, then re-calculate and export those 
ssa_names using the latest values..   That would export [0,0] for _2 and _3.


This would have no tangible impact during the first VRP pass, but the 
*next* VRP pass, (or any other ranger pass) would pick up the new global 
ranges, and do all the right things...  so we basically let a subsequent 
pass pick up the info and do the dirty work.


Andrew

Re: [PATCH 0/3] rs6000: Move g++.dg powerpc tests to g++.target

2022-02-22 Thread Paul A. Clarke via Gcc-patches

On Tue, Feb 22, 2022 at 12:28:56PM -0600, Segher Boessenkool wrote:
> On Mon, Feb 21, 2022 at 03:17:44PM -0600, Paul A. Clarke wrote:
> > Some tests in g++.dg are target-specific for powerpc. Move those to
> > g++.target/powerpc. Update the DejaGnu directives as needed, since
> > the target restriction is perhaps no longer needed when residing in the
> > target-specific powerpc subdirectory.
> 
> Not "perhaps" :-)  More specifically, powerpc.exp has
> 
> # Exit immediately if this isn't a PowerPC target.
> if {![istarget powerpc*-*-*] } then {
>   return
> }
> 
> so anything run from that driver does not have to test for powerpc
> separately anymore.

The context for "perhaps" is for cases like:
// { dg-do compile { target powerpc*-*-darwin* } }
and
// { dg-do compile { target { powerpc*-*-linux* } } }

where the target is still needed, albeit without the "powerpc"
restriction itself.

PC

[PATCH] c++: ->template and implicit typedef [PR104608]

2022-02-22 Thread Marek Polacek via Gcc-patches

Here we have a forward declaration of Parameter for which we create
an implicit typedef, which is a TYPE_DECL.  Then, when looking it up
at template definition time, cp_parser_template_id gets (since r12-6754)
this TYPE_DECL which it can't handle.

This patch defers lookup for implicit typedefs, a la r12-6879.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
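
For readers less familiar with the construct in the subject line, here is a minimal, self-contained sketch of where "->template" is needed: calling a member template through a dependent base. It is not the PR testcase; the class and member names are invented for illustration.

```cpp
#include <cassert>

// Hypothetical base providing a member template.
struct Scaler {
  template <typename U>
  U scale(int v) const { return static_cast<U>(v) * 2; }
};

template <typename R>
struct Function : R {
  long run(int v) const {
    // R is a dependent base, so at definition time the parser cannot know
    // that "scale" names a template; the "template" keyword says so, and
    // the real lookup happens at instantiation time.
    return this->template scale<long>(v);
  }
};

void demo_template_keyword() {
  Function<Scaler> f;
  assert(f.run(21) == 42L);
}
```

The PR concerns the case where the name after "->template" also happens to match a forward-declared class at namespace scope.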

PR c++/104608

gcc/cp/ChangeLog:

* parser.cc (cp_parser_template_name): Repeat lookup of implicit
typedef.

gcc/testsuite/ChangeLog:

* g++.dg/parse/template-keyword3.C: New test.
---
 gcc/cp/parser.cc   |  3 ++-
 gcc/testsuite/g++.dg/parse/template-keyword3.C | 12 
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/parse/template-keyword3.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 03d99aba13e..5e89e3737b0 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -18681,7 +18681,8 @@ cp_parser_template_name (cp_parser* parser,
  return error_mark_node;
}
   else if ((!DECL_P (decl) && !is_overloaded_fn (decl))
-  || TREE_CODE (decl) == USING_DECL)
+  || TREE_CODE (decl) == USING_DECL
+  || DECL_IMPLICIT_TYPEDEF_P (decl))
/* Repeat the lookup at instantiation time.  */
decl = identifier;
 }
diff --git a/gcc/testsuite/g++.dg/parse/template-keyword3.C 
b/gcc/testsuite/g++.dg/parse/template-keyword3.C
new file mode 100644
index 000..59fe0fc180b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/template-keyword3.C
@@ -0,0 +1,12 @@
+// PR c++/104608
+
+class Parameter;
+template  class Function 
+: public R  
+{
+Function();
+};
+template 
+Function::Function() {
+this->template Parameter();
+}

base-commit: bc66b471d16ef2fd8cb66fd1131b41f80ecb9961
-- 
2.35.1

Re: [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-02-22 Thread Tom de Vries via Gcc-patches


On 2/22/22 17:08, Roger Sayle wrote:


Hi Tom,

I'll admit that I'd not myself considered the ABI issues when I initially
proposed experimental HFmode support for the nvptx backend, and was surprised
when I finally tracked down the source of the problem you'd reported: that
libgcc spots HFmode support exists and immediately starts passing/returning
values in this type.

The one precedent that I can point to is that LLVM's nvptx backend passes
HFmode values in SImode regs,   see https://reviews.llvm.org/D28540


Interesting, thanks for the link.


Their motivation is that not all PTX ISAs support fp16, so for compatibility
with say sm_30/sm_35, fp16 values are treated like b16, i.e. HImode.
At this point, the nvptx ABI states that HImode values are passed as SImode,
so we end up with the interesting mismatch of HFmode<->SImode.


Indeed, that sounds plausible.

And IIUC, that also means that this leaves the door open for us to 
implement fp16 support for pre-sm_53 using b16 in a compatible way.


Then I think the current solution is OK, thanks for digging this up.

Thanks,
-Tom


I guess the same thing affects host code, where an i386/x86 host that
doesn't support 16-bit floating point, can pass "unsigned short" values
to and from the accelerator, and likewise this HImode locally gets passed
in a wider (often WORD_MODE) integer types on most x86 ABIs.

My guess is that passing SFmode in DImode may have been supported
in older versions of GCC, before handling of SUBREGs was tightened up,
so this might be considered a regression.

Cheers,
Roger
--


-Original Message-
From: Tom de Vries 
Sent: 22 February 2022 15:43
To: Roger Sayle ; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] middle-end: Support ABIs that pass FP values as wider
integers.

On 2/9/22 21:12, Roger Sayle wrote:


This patch adds middle-end support for target ABIs that pass/return
floating point values in integer registers with precision wider than
the original FP mode.  An example, is the nvptx backend where 16-bit
HFmode registers are passed/returned as (promoted to) SImode registers.
Unfortunately, this currently falls foul of the various (recent?)
sanity checks that (very sensibly) prevent creating paradoxical
SUBREGs of floating point registers.  The approach below is to
explicitly perform the conversion/promotion in two steps, via an
integer mode of same precision as the floating point value.  So on
nvptx, 16-bit HFmode is initially converted to 16-bit HImode (using
SUBREG), then zero-extended to SImode, and likewise when going the
other way, parameters truncated to HImode then converted to HFmode
(using SUBREG).  These changes are localized to expand_value_return
and expanding DECL_RTL to support strange ABIs, rather than inside
convert_modes or gen_lowpart, as mismatched precision integer/FP
conversions should be explicit in the RTL, and these semantics not generally

visible/implicit in user code.
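
The two-step conversion described above can be pictured in plain C++ on raw bit patterns. This is only an analogy under stated assumptions: the Half struct stands in for a 16-bit FP register, memcpy plays the role of the same-size SUBREG, and none of the function names exist in GCC.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-in for a 16-bit floating-point value; only the raw bits matter here.
struct Half { std::uint16_t bits; };
static_assert(sizeof(Half) == 2, "Half must be exactly 16 bits");

// Step 1 (the same-size SUBREG analogue): view the 16 FP bits as an integer.
std::uint16_t half_to_hi(Half h) {
  std::uint16_t hi;
  std::memcpy(&hi, &h, sizeof hi);
  return hi;
}

// Step 2: zero-extend the 16-bit integer to the 32-bit register the ABI uses.
std::uint32_t hi_to_si(std::uint16_t hi) { return hi; }

// Reverse direction: truncate to 16 bits, then view the bits as FP again.
Half si_to_half(std::uint32_t si) {
  std::uint16_t hi = static_cast<std::uint16_t>(si);
  Half h;
  std::memcpy(&h, &hi, sizeof hi);
  return h;
}

void demo_half_promotion() {
  Half minus_two{0xC000};  // IEEE binary16 bit pattern for -2.0
  std::uint32_t wide = hi_to_si(half_to_hi(minus_two));
  assert(wide == 0x0000C000u);  // zero-extended, not sign-extended
  assert(si_to_half(wide).bits == 0xC000);
}
```

Keeping the integer step explicit is what avoids ever asking for a 32-bit view of a 16-bit FP register directly.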




Hi Roger,

I cannot comment on the patch, but I do wonder (after your "strange ABI"
comment): did we actively decide on (or align to) a register passing ABI for
HFmode, or has it merely been decided by the implementation of
promote_arg:
...
static machine_mode
promote_arg (machine_mode mode, bool prototyped) {
if (!prototyped && mode == SFmode)
  /* K&R float promotion for unprototyped functions.  */
  mode = DFmode;
else if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode))
  mode = SImode;

return mode;
}
...

There may be a rationale why it's good to pass a HF as SI, but it's not
documented there.
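
To see why HF ends up as SI under that function, here is a standalone C++ rendering of the same decision structure. The Mode enum and mode_size are simplified stand-ins I made up for machine_mode and GET_MODE_SIZE; only the branching mirrors the quoted promote_arg.

```cpp
#include <cassert>

// Hypothetical stand-ins for a few machine modes and their sizes in bytes.
enum class Mode { HI, HF, SI, SF, DI, DF };

int mode_size(Mode m) {
  switch (m) {
    case Mode::HI: case Mode::HF: return 2;
    case Mode::SI: case Mode::SF: return 4;
    case Mode::DI: case Mode::DF: return 8;
  }
  return 0;  // not reached for the enumerators above
}

// Same decision structure as the quoted promote_arg.
Mode promote_arg(Mode mode, bool prototyped) {
  if (!prototyped && mode == Mode::SF)
    return Mode::DF;  // K&R float promotion for unprototyped functions
  if (mode_size(mode) < mode_size(Mode::SI))
    return Mode::SI;  // anything narrower than SImode is widened
  return mode;
}

void demo_promote_arg() {
  assert(promote_arg(Mode::HF, true) == Mode::SI);  // decided purely by size
  assert(promote_arg(Mode::HI, true) == Mode::SI);
  assert(promote_arg(Mode::SF, true) == Mode::SF);
  assert(promote_arg(Mode::SF, false) == Mode::DF);
}
```

Nothing in the size test distinguishes float modes from integer modes, so the HF case falls out of the rule written for narrow integers.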

Anyway, I checked what cuda does for HF, and it passes a byte array:
...
.param .align 2 .b8 _Z5helloPj6__halfs_param_1[2], ...

So, I guess what I'm saying is I'd like to understand why we're having the
HF -> SI promotion.

Thanks,
- Tom

Re: [PATCH v4] Make `-Werror` optional in libatomic/libbacktrace/libgomp/libitm/libsanitizer

2022-02-22 Thread Ian Lance Taylor via Gcc-patches

On Thu, Feb 3, 2022 at 6:07 AM David Seifert via Gcc-patches
 wrote:
>
> * `-Werror` can cause issues when a more recent version of GCC compiles
>   an older version:
>   - https://bugs.gentoo.org/229059
>   - https://bugs.gentoo.org/475350
>   - https://bugs.gentoo.org/667104
>
> Bootstrapped/regtested x86_64-linux, tested without --disable-werror and
> with ./configure --disable-werror, the latter removing -Werror as expected.
>
> libgo/ChangeLog:
>
> * libgo/configure.ac: Support --disable-werror.
> * libgo/configure: Regenerate.

Just a note that the libgo directory is copied from upstream sources,
and should not be changed directly in the GCC repo.  See
libgo/README.gcc.  I'll take care of this discrepancy.  Thanks.

Ian

RE: [PATCH] middle-end: Support ABIs that pass FP values as wider integers.

2022-02-22 Thread Roger Sayle



>> Anyway, I checked what cuda does for HF, and it passes a byte array:
>>> .param .align 2 .b8 _Z5helloPj6__halfs_param_1[2], ...
> >
> > The one precedent that I can point to is that LLVM's nvptx backend passes
> > HFmode values in SImode regs,   see https://reviews.llvm.org/D28540
> 
> Interesting, thanks for the link.

In theory, GCC could also support -mfloat-abi=nvcc and -mfloat-abi=llvm
(much like other targets have -mfloat-abi=soft vs. -mfloat-abi=hard).
At this point getting any ABI supporting HFmode would be an improvement.

Roger
--

libgo patch committed: make -Werror optional

2022-02-22 Thread Ian Lance Taylor via Gcc-patches

I committed this libgo patch to make -Werror optional.  This patch is
already in the GCC sources, where it was erroneously applied before
the upstream patch.  This is the upstream patch.

Ian
diff --git a/libgo/configure.ac b/libgo/configure.ac
index 3cadc6d20..7e2b98ba6 100644
--- a/libgo/configure.ac
+++ b/libgo/configure.ac
@@ -62,11 +62,10 @@ AC_PROG_AWK
 WARN_FLAGS='-Wall -Wextra -Wwrite-strings -Wcast-qual'
 AC_SUBST(WARN_FLAGS)
 
-AC_ARG_ENABLE(werror, [AS_HELP_STRING([--enable-werror],
-  [turns on -Werror @<:@default=yes@:>@])])
-if test "x$enable_werror" != "xno"; then
-  WERROR="-Werror"
-fi
+AC_ARG_ENABLE([werror],
+  [AS_HELP_STRING([--disable-werror], [disable building with -Werror])])
+AS_IF([test "x$enable_werror" != "xno" && test "x$GCC" = "xyes"],
+  [WERROR="-Werror"])
 AC_SUBST(WERROR)
 
 glibgo_toolexecdir=no

libgo patch committed: Update README.gcc

2022-02-22 Thread Ian Lance Taylor via Gcc-patches

I committed this libgo patch to update the README.gcc file.

Ian
0f16f4ad82cb47bc444688822cc142d80192c284
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 7455d01c179..424bbebfeed 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-aee8eddbfc3ef1b03353a060e79e7d668fb229e2
+45fd14ab8baf5e86012a808426f8ef52c1d77943
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/README.gcc b/libgo/README.gcc
index d5aabb0f9c2..3c56ec7be17 100644
--- a/libgo/README.gcc
+++ b/libgo/README.gcc
@@ -1,7 +1,6 @@
 The files in this directory are mirrored from the gofrontend project
-hosted at http://code.google.com/p/gofrontend.  These files are the
+hosted at https://go.googlesource.com/gofrontend/ and mirrored at
+https://github.com/golang/gofrontend.  These files are the
 ones in the libgo subdirectory of that project.
 
-By default, the networking tests are not run.  In order to run all the
-libgo tests, you need to define the environment variable
-GCCGO_RUN_ALL_TESTS to a non-empty string.
+To change these files, see https://go.dev/doc/gccgo_contribute.

Re: [PATCH 2/3] rs6000: Move g++.dg powerpc PR tests to g++.target

2022-02-22 Thread Segher Boessenkool

On Mon, Feb 21, 2022 at 03:17:46PM -0600, Paul A. Clarke wrote:
> Also adjust DejaGnu directives, as specifically requiring "powerpc*-*-*" is no
> longer required.
> 
> 2021-02-21  Paul A. Clarke  
> 
> gcc/testsuite
>   * g++.dg/pr65240.h: Move to g++.target/powerpc.
>   * g++.dg/pr93974.C: Likewise.
>   * g++.dg/pr65240-1.C: Move to g++.target/powerpc, adjust dg directives.
>   * g++.dg/pr65240-2.C: Likewise.
>   * g++.dg/pr65240-3.C: Likewise.
>   * g++.dg/pr65240-4.C: Likewise.
>   * g++.dg/pr65242.C: Likewise.
>   * g++.dg/pr67211.C: Likewise.
>   * g++.dg/pr69667.C: Likewise.
>   * g++.dg/pr71294.C: Likewise.
>   * g++.dg/pr84264.C: Likewise.
>   * g++.dg/pr84279.C: Likewise.
>   * g++.dg/pr85657.C: Likewise.

Okay for trunk.  Thanks!

That said...

> -/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
> -/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-skip-if "" { *-*-darwin* } } */

That skip-if is most likely cargo cult, and it's not clear why lp64
would be needed either (there is no comment what it is needed for, for
example).

> --- a/gcc/testsuite/g++.dg/pr85657.C
> +++ b/gcc/testsuite/g++.target/powerpc/pr85657.C
> @@ -1,4 +1,4 @@
> -// { dg-do compile { target { powerpc*-*-linux* } } }
> +// { dg-do compile { target { *-*-linux* } } }

A comment here would help as well.  All of that is pre-existing of
course.


Segher

Re: [PATCH 2/3] rs6000: Move g++.dg powerpc PR tests to g++.target

2022-02-22 Thread Paul A. Clarke via Gcc-patches

On Tue, Feb 22, 2022 at 06:41:45PM -0600, Segher Boessenkool wrote:
> On Mon, Feb 21, 2022 at 03:17:46PM -0600, Paul A. Clarke wrote:
> > Also adjust DejaGnu directives, as specifically requiring "powerpc*-*-*" is
> > no longer required.
> > 
> > 2021-02-21  Paul A. Clarke  
> > 
> > gcc/testsuite
> > * g++.dg/pr65240.h: Move to g++.target/powerpc.
> > * g++.dg/pr93974.C: Likewise.
> > * g++.dg/pr65240-1.C: Move to g++.target/powerpc, adjust dg directives.
> > * g++.dg/pr65240-2.C: Likewise.
> > * g++.dg/pr65240-3.C: Likewise.
> > * g++.dg/pr65240-4.C: Likewise.
> > * g++.dg/pr65242.C: Likewise.
> > * g++.dg/pr67211.C: Likewise.
> > * g++.dg/pr69667.C: Likewise.
> > * g++.dg/pr71294.C: Likewise.
> > * g++.dg/pr84264.C: Likewise.
> > * g++.dg/pr84279.C: Likewise.
> > * g++.dg/pr85657.C: Likewise.
> 
> Okay for trunk.  Thanks!

Thanks for the review! More below...

> That said...
> 
> > -/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
> > -/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> > +/* { dg-do compile { target lp64 } } */
> > +/* { dg-skip-if "" { *-*-darwin* } } */
> 
> That skip-if is most likely cargo cult, and it's not clear why lp64
> would be needed either (there is no comment what it is needed for, for
> example).

I can't speak to darwin, nor have an easy way of testing on it.

As for lp64, these tests fail on -m32 with:
  cc1plus: error: '-mcmodel' not supported in this configuration
- g++.dg/pr65240-1.C
- g++.dg/pr65240-2.C
- g++.dg/pr65240-3.C

'-mcmodel' is in the dg-options line for the above tests.

The rest PASSed.  Shall I remove the 'lp64' restriction for those that PASS?

> > +++ b/gcc/testsuite/g++.target/powerpc/pr85657.C
> > @@ -1,4 +1,4 @@
> > -// { dg-do compile { target { powerpc*-*-linux* } } }
> > +// { dg-do compile { target { *-*-linux* } } }
> 
> A comment here would help as well.  All of that is pre-existing of
> course.

I'm not sure what such a comment would say. I suspect it was a testing issue
(only tested on Linux), but I have similar limitations, so I'm also reluctant
to enable the test for what would be untested (by me) platforms.

PC

[PATCH][RFC] c++/96765: warn when casting Base* to Derived* in Base ctor/dtor

2022-02-22 Thread Zhao Wei Liew via Gcc-patches

Hi!

This patch aims to add a warning when casting "this" in a base class
constructor to a derived class type. It works on the test cases
provided, but I'm still running regression tests.

However, I have a few doubts:
1. Am I missing out any cases? Right now, I'm identifying the casts by
checking that TREE_CODE (expr) == NOP_EXPR && is_this_parameter
(TREE_OPERAND (expr, 0)). It seems fine to me but perhaps there is a
function that I can use to express this more concisely?
2. -Wcast-qual doesn't seem to be the right flag for this warning.
However, I can't seem to find an appropriate flag. Maybe I should
place it under -Wextra or -Wall?

Appreciate any feedback on the aforementioned doubts or otherwise.
Thanks, and have a great day!
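
For context, a minimal sketch of the pattern and of one way to avoid it, assuming a design where the downcast can be deferred until the complete object exists. The bind() member and the other names are invented for illustration and are not part of the patch or its testcase.

```cpp
#include <cassert>

struct Derived;

struct Base {
  Derived *self = nullptr;
  // Writing static_cast<Derived *>(this) inside this constructor is the
  // pattern the proposed warning targets; this sketch deliberately avoids it.
  Base() = default;
  void bind();  // hypothetical: call only once the Derived object is complete
};

struct Derived : Base {
  int value = 42;
};

// Derived is a complete type here, and bind() is only meant to be called on
// an object that really is a fully constructed Derived.
void Base::bind() { self = static_cast<Derived *>(this); }

void demo_bind() {
  Derived d;
  d.bind();
  assert(d.self == &d);
  assert(d.self->value == 42);
}
```

Calling bind() on a plain Base would be just as invalid as the constructor-time cast, so this only moves the cast to a point where the dynamic type is known.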
From 8a1f352f3db06faf264bc823387714a4a9e638b6 Mon Sep 17 00:00:00 2001
From: Zhao Wei Liew 
Date: Tue, 22 Feb 2022 16:03:17 +0800
Subject: [PATCH] c++: warn on Base* to Derived* cast in Base ctor/dtor
 [PR96765]

Casting "this" in a base class constructor to a derived class type is
undefined behaviour, but there is no warning when doing so.

Add a warning for this.

Signed-off-by: Zhao Wei Liew 

PR c++/96765

gcc/cp/ChangeLog:

* typeck.cc (build_static_cast_1): Add a warning when casting
  Base * to Derived * in Base constructor and destructor.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wcast-qual3.C: New test.
---
 gcc/cp/typeck.cc|  8 ++
 gcc/testsuite/g++.dg/warn/Wcast-qual3.C | 33 +
 2 files changed, 41 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wcast-qual3.C

diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index f796337f73c..bbc40b25547 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -8080,6 +8080,14 @@ build_static_cast_1 (location_t loc, tree type, tree 
expr, bool c_cast_p,
 {
   tree base;
 
+  if ((DECL_CONSTRUCTOR_P (current_function_decl)
+|| DECL_DESTRUCTOR_P (current_function_decl))
+   && TREE_CODE (expr) == NOP_EXPR
+   && is_this_parameter (TREE_OPERAND (expr, 0)))
+warning_at(loc, OPT_Wcast_qual,
+   "invalid %<static_cast%> from type %qT to type %qT before the latter is constructed",
+   intype, type);
+
   if (processing_template_decl)
return expr;
 
diff --git a/gcc/testsuite/g++.dg/warn/Wcast-qual3.C 
b/gcc/testsuite/g++.dg/warn/Wcast-qual3.C
new file mode 100644
index 000..8c44a23bd68
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wcast-qual3.C
@@ -0,0 +1,33 @@
+// PR c++/96765
+// { dg-options "-Wcast-qual" }
+
+struct Derived;
+struct Base {
+  Derived *x;
+  Derived *y;
+  Base();
+  ~Base();
+};
+
+struct Derived : Base {};
+
+Base::Base()
+: x(static_cast<Derived *>(this)), // { dg-warning "invalid 'static_cast'" }
+  y((Derived *)this) // { dg-warning "invalid 'static_cast'" }
+{
+  static_cast<Derived *>(this); // { dg-warning "invalid 'static_cast'" }
+  (Derived *)this; // { dg-warning "invalid 'static_cast'" }
+}
+
+Base::~Base() {
+  static_cast<Derived *>(this); // { dg-warning "invalid 'static_cast'" }
+  (Derived *)this; // { dg-warning "invalid 'static_cast'" }
+}
+
+struct Other {
+  Other() {
+Base b;
+static_cast<Derived *>(&b);
+(Derived *)(&b);
+  }
+};
-- 
2.35.1

Re: [PATCH] Check if loading const from mem is faster

2022-02-22 Thread guojiufu via Gcc-patches


On 2022-02-23 01:30, Segher Boessenkool wrote:

Hi Jiu Fu,

On Tue, Feb 22, 2022 at 02:53:13PM +0800, Jiufu Guo wrote:

 static bool
 rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
 {
-  if (GET_CODE (x) == HIGH
-  && GET_CODE (XEXP (x, 0)) == UNSPEC)
+  if (GET_CODE (x) == HIGH)
 return true;



Hi Segher,


This isn't explained anywhere.  "Update" is not enough ;-)
Thanks! I will add explanations for it.  This excludes every 'x' whose
code is 'HIGH', just as the function "rs6000_emit_move" also checks whether
the code is 'HIGH'.

And on P10, I also encounter cases like:
 (high:DI (symbol_ref:DI ("var_1") [flags 0xc0] var_1>))
which fail to be stored into .rodata.




CSE is the pass that is most ancient and still causing problems left and
right.  It should be rewritten sooner rather than later.

The problem with that is that the pass does so much more than just CSE,
and we don't want to lose all those other things.  So it will be a slow
arduous affair of peeling off bits into separate passes, I think :-(


Yes, it does a lot of work.  One of its additional jobs is checking
'folding out constants and putting constants in memory'.

BR,
Jiufu



Doing actual CSE without all the restrictive restrictions our pass has
historically had isn't the hard part!


Segher

Re: [PATCH] wwwdocs: Document ShadowCallStack support

2022-02-22 Thread Gerald Pfeifer

On Tue, 22 Feb 2022, Richard Sandiford wrote:
> Gah, thanks.  Clearly one of those days :-(

Looks good to me, thanks.

Gerald

Re: [PATCH 1/2] tree-optimization/104530 - Export global ranges during the VRP block walk.

2022-02-22 Thread Richard Biener via Gcc-patches

On Tue, Feb 22, 2022 at 5:42 PM Andrew MacLeod via Gcc-patches
 wrote:
>
> Ranger currently waits until the end of the VRP pass, then calls
> export_global_ranges ().
>
> This method walks the list of ssa-names looking for names which it
> thinks should have SSA_NAME_RANGE_INFO updated, and is an artifact of
> the on-demand mechanism where there isn't an obvious time to finalize a
> name.
>
> The changes for 104288 introduced the register_side_effects method and
> do provide a final place where stmt's are processed during the DOMWALK.
>
> This patch exports the global range calculated by the statement (before
> processing side effects), and avoids the need for calling the export
> method.  This is generally better all round I think.
>
> Bootstraps on x86_64-pc-linux-gnu with no regressions. Re-running to
> ensure...
>
> OK for trunk? or defer to stage 1?

I'm getting a bit nervous so lets defer to stage 1 unless a P1 fix
requires this.

Thanks,
Richard.

>
> Andrew

Re: [PATCH 2/2] tree-optimization/104530 - Mark defs dependent on non-null stale.

2022-02-22 Thread Richard Biener via Gcc-patches

On Tue, Feb 22, 2022 at 5:42 PM Andrew MacLeod via Gcc-patches
 wrote:
>
> This patch simply leverages the existing computation machinery to
> re-evaluate values dependent on a newly found non-null value
>
> Ranger associates a monotonically increasing temporal value with every
> def as it is defined.  When that value is used, we check if any of the
> values used in the definition have been updated, making the current
> cached global value stale.  This makes the evaluation lazy, if there are
> no more uses, we will never re-evaluate.
>
> When an ssa-name is marked non-null it does not change the global value,
> and thus will not invalidate any global values.  This patch marks any
> definitions in the block which are dependent on the non-null value as
> stale.  This will cause them to be re-evaluated when they are next used.
>
> Imports: b.0_1  d.3_7
> Exports: b.0_1  _2  _3  d.3_7  _8
>   _2 : b.0_1(I)
>   _3 : b.0_1(I)  _2
>   _8 : b.0_1(I)  _2  _3  d.3_7(I)
>
> b.0_1 = b;
>  _2 = b.0_1 == 0B;
>  _3 = (int) _2;
>  c = _3;
>  _5 = *b.0_1;<<-- from this point b.0_1 is [+1, +INF]
>  a = _5;
>  d.3_7 = d;
>  _8 = _3 % d.3_7;
>  if (_8 != 0)
>
> when _5 is defined, and b.0_1 becomes non-null, we mark the dependent
> names that are exports and defined in this block as stale, so _2, _3
> and _8.
>
> When _8 is being calculated, _3 is stale, and causes it to be
> recomputed.  It is dependent on _2, also stale, so it is also
> recomputed, and we end up with
>
>_2 == [0, 0]
>_3 == [0 ,0]
> and _8 = [0, 0]
> And then we can fold away the condition.
>
> The side effect is that _2 and _3 are globally changed to be [0, 0], but
> this is OK because it is the definition block, so it dominates all other
> uses of these names, and they should be [0,0] upon exit anyway.  The
> previous patch ensures that the global value written to
> SSA_NAME_RANGE_INFO is the correct [0,1] for both _2 and _3.
>
> The patch would have been even smaller if I already had a mark_stale
> method.   I thought there was one, but I guess it never made it in from
> lack of need at the time.   The only other tweak was to make the value
> stale if the dependent value was the same as the definitions.
>
> This bootstraps on x86_64-pc-linux-gnu with no regressions. Re-running
> to ensure.

@@ -1475,6 +1488,15 @@ ranger_cache::update_to_nonnull (basic_block
bb, tree name)
{
  r.set_nonzero (type);
  m_on_entry.set_bb_range (name, bb, r);
+ // Mark consumers of name stale so they can be recomputed.
+ if (m_gori.is_import_p (name, bb) || m_gori.is_export_p (name, bb))
+   {
+ tree x;
+ FOR_EACH_GORI_EXPORT_NAME (m_gori, bb, x)
+   if (m_gori.in_chain_p (name, x)
+   && gimple_bb (SSA_NAME_DEF_STMT (x)) == bb)
+ m_temporal->set_stale (x);
+   }
}

so if we have a BB that exports N names and each of those is updated to nonnull
this is going to be quadratic?  It also looks like the gimple_bb check is
cheaper than the bitmap test done in in_chain_p.  What comes to my mind is
why we need to mark "consumers"?  Can't consumers check the defs of their
uses when they look at their timestamp?  This whole set_stale thing doesn't
seem to be transitive anyway, consider:

   _1 = ...


   _2 = _1 + ..;


  _3 = _2 + ...;

so when _1 is updated to non-null we mark _2 as stale but _3 should
also be stale, no?
When we visit _3 before eventually getting to _2 (to see whether it
updates, and thus know more precisely whether it makes _3 stale) we won't
re-evaluate it?

That said, the change looks somewhat ad-hoc to get to 1-level deep second-level
opportunities?

Richard.

>
> OK for trunk? or defer to stage 1?
> Andrew

Re: [PATCH][middle-end/104550]Suppress uninitialized warnings for new created uses from __builtin_clear_padding folding

2022-02-22 Thread Richard Biener via Gcc-patches

On Tue, 22 Feb 2022, Qing Zhao wrote:

> __builtin_clear_padding(&object) will clear all the padding bits of the
> object.  Actually, it doesn't involve any use of a user variable.
> Therefore, users do
> not expect any uninitialized warning from it. It's reasonable to suppress
> uninitialized warnings for all new created uses from __builtin_clear_padding
> folding.
> 
> The patch has been bootstrapped and regress tested on both x86 and aarch64.
> 
> Okay for trunk?
> 
> Thanks.
> 
> Qing
> 
> ==
> From cf6620005f55d4a1f782332809445c270d22cf86 Mon Sep 17 00:00:00 2001
> From: qing zhao 
> Date: Mon, 21 Feb 2022 16:38:31 +
> Subject: [PATCH] Suppress uninitialized warnings for new created uses from
>  __builtin_clear_padding folding [PR104550]
> 
> __builtin_clear_padding(&object) will clear all the padding bits of the
> object.  Actually, it doesn't involve any use of a user variable.
> Therefore, users do
> not expect any uninitialized warning from it. It's reasonable to suppress
> uninitialized warnings for all new created uses from __builtin_clear_padding
> folding.
> 
>   PR middle-end/104550
> 
> gcc/ChangeLog:
> 
>   * gimple-fold.cc (clear_padding_flush): Suppress warnings for new
>   created uses.
>   (clear_padding_emit_loop): Likewise.
>   (clear_padding_type): Likewise.
>   (gimple_fold_builtin_clear_padding): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/auto-init-pr104550-1.c: New test.
>   * gcc.dg/auto-init-pr104550-2.c: New test.
>   * gcc.dg/auto-init-pr104550-3.c: New test.
> ---
>  gcc/gimple-fold.cc  | 31 +++--
>  gcc/testsuite/gcc.dg/auto-init-pr104550-1.c | 10 +++
>  gcc/testsuite/gcc.dg/auto-init-pr104550-2.c | 11 
>  gcc/testsuite/gcc.dg/auto-init-pr104550-3.c | 11 
>  4 files changed, 55 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/auto-init-pr104550-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/auto-init-pr104550-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/auto-init-pr104550-3.c
> 
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index 16f02c2d098..1e18ba3465a 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -4296,6 +4296,7 @@ clear_padding_flush (clear_padding_struct *buf, bool 
> full)
>build_int_cst (buf->alias_type,
>   buf->off + padding_end
>   - padding_bytes));
> +   suppress_warning (dst, OPT_Wuninitialized);
> gimple *g = gimple_build_assign (dst, src);
> gimple_set_location (g, buf->loc);
> gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
> @@ -4341,6 +4342,7 @@ clear_padding_flush (clear_padding_struct *buf, bool full)
> tree dst = build2_loc (buf->loc, MEM_REF, atype,
>buf->base,
>build_int_cst (buf->alias_type, off));
> +   suppress_warning (dst, OPT_Wuninitialized);
> gimple *g = gimple_build_assign (dst, src);
> gimple_set_location (g, buf->loc);
> gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
> @@ -4370,6 +4372,7 @@ clear_padding_flush (clear_padding_struct *buf, bool full)
>   atype = build_aligned_type (type, buf->align);
> tree dst = build2_loc (buf->loc, MEM_REF, atype, buf->base,
>build_int_cst (buf->alias_type, off));
> +   suppress_warning (dst, OPT_Wuninitialized);
> tree src;
> gimple *g;
> if (all_ones
> @@ -4420,6 +4423,7 @@ clear_padding_flush (clear_padding_struct *buf, bool full)
>build_int_cst (buf->alias_type,
>   buf->off + end
>   - padding_bytes));
> +   suppress_warning (dst, OPT_Wuninitialized);
> gimple *g = gimple_build_assign (dst, src);
> gimple_set_location (g, buf->loc);
> gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
> @@ -4620,14 +4624,18 @@ clear_padding_emit_loop (clear_padding_struct *buf, tree type,
>gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
>clear_padding_type (buf, type, buf->sz, for_auto_init);
>clear_padding_flush (buf, true);
> -  g = gimple_build_assign (buf->base, POINTER_PLUS_EXPR, buf->base,
> -size_int (buf->sz));
> +  tree rhs = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (buf->base),
> +   buf->base, size_int (buf->sz));
> +  suppress_warning (rhs, OPT_Wuninitialized);
> +  g = gimple_build_assign (buf->base, rhs);

why do we need to suppress warnings on a POINTER_PLUS_EXPR?  The
use of fold_build2 here is a step backwards btw, I'm not sure
whether suppress_warning is properly preserved here.  If nee

Re: [PATCH 0/2] tree-optimization/104530 - proposed re-evaluation.

2022-02-22 Thread Richard Biener via Gcc-patches

On Tue, Feb 22, 2022 at 8:19 PM Andrew MacLeod via Gcc-patches
 wrote:
>
> On 2/22/22 13:07, Jeff Law wrote:
> >
> >
> > On 2/22/2022 10:57 AM, Jakub Jelinek via Gcc-patches wrote:
> >> On Tue, Feb 22, 2022 at 12:39:28PM -0500, Andrew MacLeod wrote:
>  That is EH, then there are calls that might not return because they
>  leave
>  in some other way (e.g. longjmp), or might loop forever, might
>  exit, might
>  abort, trap etc.
> >>> Generally speaking, calls which do not return should not now be a
> >>> problem...
> >>> as long as they do not transfer control to somewhere else in the
> >>> current
> >>> function.
> >> I thought all of those cases are very relevant to PR104530.
> >> If we have:
> >>_1 = ptr_2(D) == 0;
> >>// unrelated code in the same bb
> >>_3 = *ptr_2(D);
> >> then in light of PR104288, we can optimize ptr_2(D) == 0 into true
> >> only if
> >> there are no calls inside of "// unrelated code in the same bb"
> >> or if all calls in "// unrelated code in the same bb" are guaranteed to
> >> return exactly once.  Because, if there is a call in there which could
> >> exit (that is the PR104288 testcase), or abort, or trap, or loop
> >> forever,
> >> or throw externally, or longjmp or in any other non-UB way
> >> cause the _1 = ptr_2(D) == 0; stmt to be invoked at runtime but
> >> _3 = *ptr_2(D) not being invoked, then we can't optimize the earlier
> >> comparison because ptr_2(D) could be NULL in a valid program.
> >> While if there are no calls (and no problematic inline asms) and no
> >> trapping
> >> insns in between, we can and PR104530 is asking that we continue to
> >> optimize
> >> that.
> > Right.  This is similar to some of the restrictions we deal with in
> > the path isolation pass.  Essentially we have a path, when traversed,
> > would result in a *0.  We would like to be able to find the edge
> > upon-which the *0 is control dependent and optimize the test so that
> > it always went to the valid path rather than the *0 path.
> >
> > The problem is there may be observable side effects on the *0 path
> > between the test and the actual *0 -- including calls to nonreturning
> > functions, setjmp/longjmp, things that could trap, etc.  This case is
> > similar.  We can't back-propagate the non-null status through any
> > statements with observable side effects.
> >
> > Jeff
> >
> We can't back propagate, but we can alter our forward view.  Any
> ssa-name defined before the observable side effect can be recalculated
> using the updated values, and all uses of those names after the
> side-effect would then appear to be "up-to-date"
>
> This does not actually change anything before the side-effect statement,
> but the lazy re-evaluation ranger employs makes it appear as if we do a
> new computation when _1 is used afterwards. ie:
>
> _1 = ptr_2(D) == 0;
> // unrelated code in the same bb
> _3 = *ptr_2(D);
> _4 = ptr_2(D) == 0;  // ptr_2 is known to be [+1, +INF] now.
> And we use _4 everywhere _1 was used.   This is the effect.
>
> so we do not actually change anything in the unrelated code, just
> observable effects afterwards.  We already do these recalculations on
> outgoing edges in other blocks, just not within the definition block
> because non-null wasn't visible within the def block.
>
> Additionally, in the testcase, there is a store to C before the side
> effects. These patches get rid of the branch and thus the call in the
> testcase as requested, but we still have to compute _3 in order to store
> it into global C since it occurs pre side-effect.
>
>  b.0_1 = b;
>  _2 = b.0_1 == 0B;
>  _3 = (int) _2;
>  c = _3;
>  _5 = *b.0_1;
>
> No matter how you look at it, you are going to need to process a block
> twice in order to handle any code pre-side-effect.  Whether it be
> assigning stmt uids, or what have you.

Yes.  I thought that is what ranger already does when it discovers new
ranges from edges.  Say we have

  _1 = 10 / _2;
  if (_2 == 1)
    {
      _3 = _1 + 1;

then when evaluating _1 + 1 we re-evaluate 10 / _2 using _2 == 1 and
can compute _3 to [11, 11]?
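
To make the arithmetic concrete, here is a minimal sketch in plain C of that
re-evaluation. It is not ranger's API: `struct range`, `range_const_div`,
`range_add_const` and `eval_3` are hypothetical names invented for this
illustration, and the division helper only handles a non-negative constant
over a strictly positive divisor range.

```c
#include <assert.h>

/* Hypothetical closed integer interval [lo, hi]; not ranger's data type. */
struct range { long lo; long hi; };

/* Range of (c / x) for x in r, assuming c >= 0 and r strictly positive.
   Under those assumptions c / x is non-increasing in x.  */
static struct range range_const_div(long c, struct range r)
{
    assert(c >= 0 && r.lo > 0 && r.lo <= r.hi);
    struct range out = { c / r.hi, c / r.lo };
    return out;
}

/* Range of (x + c) for x in r; overflow is ignored in this sketch.  */
static struct range range_add_const(struct range r, long c)
{
    struct range out = { r.lo + c, r.hi + c };
    return out;
}

/* Range of _3 = (10 / _2) + 1, recomputed from whatever range of _2
   is known at the point of use.  */
static struct range eval_3(struct range r2)
{
    return range_add_const(range_const_div(10, r2), 1);
}
```

With `_2` known only to lie in [1, 10], `eval_3` gives [2, 11]; once the edge
`_2 == 1` narrows `_2` to [1, 1], recomputing from the same definition gives
[11, 11], without touching the statement that defines `_1`.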

That obviously extends to any stmt-level ranges we discover for uses
(not defs because defs are never used upthread).  And doing that is
_not_ affected by any function/BB terminating calls or EH or whatnot
as long as the updated ranges are only affecting stmts dominating the
current one.
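
By contrast, the backward direction discussed upthread is sensitive to such
calls. A minimal C sketch of that situation, assuming a helper that may leave
via longjmp (`maybe_leave`, `test`, `run`, `seen_null` and `demo` are
hypothetical names for illustration, not taken from the PR testcases):

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf env;
static int seen_null;   /* observable result of the early comparison */

/* A call that may not return to its caller.  */
static void maybe_leave(int leave)
{
    if (leave)
        longjmp(env, 1);
}

static int test(int *p, int leave)
{
    seen_null = (p == NULL);   /* like _1 = ptr_2(D) == 0 */
    maybe_leave(leave);        /* "unrelated code in the same bb" */
    return *p;                 /* like _3 = *ptr_2(D) */
}

/* Returns *p, or -1 if the helper left before the dereference.  */
static int run(int *p, int leave)
{
    if (setjmp(env))
        return -1;
    return test(p, leave);
}

static void demo(void)
{
    int x = 7;
    assert(run(&x, 0) == 7 && seen_null == 0);
    /* Valid program: the dereference is never reached, so the earlier
       comparison must still observe the null pointer.  */
    assert(run(NULL, 1) == -1 && seen_null == 1);
}
```

The second call in `demo` is well defined, so the store of the comparison
result before the call cannot be rewritten from the later dereference. Only
uses after the dereference may see the non-null range.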

What complicates all this reasoning is that it is straight-forward when
you work with a traditional IL walking pass but it gets hard (and possibly
easy to get wrong) with on-demand processing and caching because
everything you cache will now be context dependent (valid only
starting after stmt X and for stmts dominated by it).

> VRP could pre-process the block, and if it gets to the end of the block,
> and it had at least one statement with a side effect and no calls which
> may not return you could process the block with all the side effects
> already active.
