[committed][nvptx, testsuite] Remove mptx settings in gcc.target/nvptx tests
Hi,

Some test-cases in gcc/testsuite/gcc.target/nvptx contain mptx settings,
which are paired with misa settings, in order to have the mptx version
support the misa version.

Since commit decde11183bd ("[nvptx] Choose -mptx default based on -misa"),
this is no longer necessary.

Remove the mptx settings.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx, testsuite] Remove mptx settings in gcc.target/nvptx tests

gcc/testsuite/ChangeLog:

2022-02-20  Tom de Vries

	* gcc.target/nvptx/float16-1.c: Drop -mptx setting.
	* gcc.target/nvptx/float16-2.c: Same.
	* gcc.target/nvptx/float16-3.c: Same.
	* gcc.target/nvptx/float16-4.c: Same.
	* gcc.target/nvptx/float16-5.c: Same.
	* gcc.target/nvptx/float16-6.c: Same.
	* gcc.target/nvptx/tanh-1.c: Same.

---
 gcc/testsuite/gcc.target/nvptx/float16-1.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/float16-2.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/float16-3.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/float16-4.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/float16-5.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/float16-6.c | 2 +-
 gcc/testsuite/gcc.target/nvptx/tanh-1.c    | 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.target/nvptx/float16-1.c b/gcc/testsuite/gcc.target/nvptx/float16-1.c
index 3a0324d1652..9c3f8fe8f9d 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-1.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -misa=sm_53 -mptx=6.3 -ffast-math" } */
+/* { dg-options "-O2 -misa=sm_53 -ffast-math" } */

 _Float16 var;

diff --git a/gcc/testsuite/gcc.target/nvptx/float16-2.c b/gcc/testsuite/gcc.target/nvptx/float16-2.c
index 5748a9c7a97..2d1dc1aafb5 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-2.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ffast-math -misa=sm_80 -mptx=7.0" } */
+/* { dg-options "-O2 -ffast-math -misa=sm_80" } */

 _Float16 x;
 _Float16 y;
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-3.c b/gcc/testsuite/gcc.target/nvptx/float16-3.c
index 914282aa1c3..3abcec39a8a 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-3.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -misa=sm_53 -mptx=6.3" } */
+/* { dg-options "-O2 -misa=sm_53" } */

 _Float16 var;

diff --git a/gcc/testsuite/gcc.target/nvptx/float16-4.c b/gcc/testsuite/gcc.target/nvptx/float16-4.c
index b11f17a43ce..173f9600ac7 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-4.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -misa=sm_53 -mptx=6.3 -ffast-math" } */
+/* { dg-options "-O2 -misa=sm_53 -ffast-math" } */

 _Float16 var;

diff --git a/gcc/testsuite/gcc.target/nvptx/float16-5.c b/gcc/testsuite/gcc.target/nvptx/float16-5.c
index 5fe15ecdf7e..700b3159a97 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-5.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -misa=sm_53 -mptx=6.3 -ffast-math" } */
+/* { dg-options "-O2 -misa=sm_53 -ffast-math" } */

 _Float16 a;
 _Float16 b;
diff --git a/gcc/testsuite/gcc.target/nvptx/float16-6.c b/gcc/testsuite/gcc.target/nvptx/float16-6.c
index 8fe4fa3051f..4889577f7f6 100644
--- a/gcc/testsuite/gcc.target/nvptx/float16-6.c
+++ b/gcc/testsuite/gcc.target/nvptx/float16-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -misa=sm_53 -mptx=6.3" } */
+/* { dg-options "-O2 -misa=sm_53" } */

 _Float16 x;
 _Float16 y;
diff --git a/gcc/testsuite/gcc.target/nvptx/tanh-1.c b/gcc/testsuite/gcc.target/nvptx/tanh-1.c
index 56a0e5a8578..946b8c1ad4b 100644
--- a/gcc/testsuite/gcc.target/nvptx/tanh-1.c
+++ b/gcc/testsuite/gcc.target/nvptx/tanh-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ffast-math -misa=sm_75 -mptx=7.0" } */
+/* { dg-options "-O2 -ffast-math -misa=sm_75" } */

 float foo(float x)
 {
[committed][nvptx] Xfail sibcall execution tests
Hi,

On nvptx I see the following FAIL:
...
FAIL: gcc.dg/sibcall-3.c execution test
...

The test-case states that "this test is xfailed on targets without sibcall
patterns".

The nvptx port doesn't have a sibcall pattern, so add an xfail.

Likewise in two similar test-cases.

Tested on nvptx.

Committed to trunk.

Thanks,
- Tom

[nvptx] Xfail sibcall execution tests

gcc/testsuite/ChangeLog:

2022-02-20  Tom de Vries

	* gcc.dg/sibcall-10.c: Xfail execution test for nvptx.
	* gcc.dg/sibcall-3.c: Same.
	* gcc.dg/sibcall-4.c: Same.

---
 gcc/testsuite/gcc.dg/sibcall-10.c | 2 +-
 gcc/testsuite/gcc.dg/sibcall-3.c  | 2 +-
 gcc/testsuite/gcc.dg/sibcall-4.c  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/sibcall-10.c b/gcc/testsuite/gcc.dg/sibcall-10.c
index dcb3e6a5ba2..e78d88fc8fc 100644
--- a/gcc/testsuite/gcc.dg/sibcall-10.c
+++ b/gcc/testsuite/gcc.dg/sibcall-10.c
@@ -5,7 +5,7 @@
    Copyright (C) 2002 Free Software Foundation Inc.
    Contributed by Hans-Peter Nilsson*/

-/* { dg-do run { xfail { { amdgcn*-*-* cris-*-* csky-*-* h8300-*-* hppa*64*-*-* m32r-*-* mcore-*-* mn10300-*-* msp430*-*-* nds32*-*-* xstormy16-*-* v850*-*-* vax-*-* xtensa*-*-* } || { arm*-*-* && { ! arm32 } } } } } */
+/* { dg-do run { xfail { { amdgcn*-*-* cris-*-* csky-*-* h8300-*-* hppa*64*-*-* m32r-*-* mcore-*-* mn10300-*-* msp430*-*-* nds32*-*-* xstormy16-*-* v850*-*-* vax-*-* xtensa*-*-* nvptx*-*-* } || { arm*-*-* && { ! arm32 } } } } } */
 /* -mlongcall disables sibcall patterns.  */
 /* { dg-skip-if "" { powerpc*-*-* } { "-mlongcall" } { "" } } */
 /* -msave-restore disables sibcall patterns.  */
diff --git a/gcc/testsuite/gcc.dg/sibcall-3.c b/gcc/testsuite/gcc.dg/sibcall-3.c
index 80555cf0640..82ad8c79809 100644
--- a/gcc/testsuite/gcc.dg/sibcall-3.c
+++ b/gcc/testsuite/gcc.dg/sibcall-3.c
@@ -5,7 +5,7 @@
    Copyright (C) 2002 Free Software Foundation Inc.
    Contributed by Hans-Peter Nilsson*/

-/* { dg-do run { xfail { { cris-*-* h8300-*-* hppa*64*-*-* m32r-*-* mcore-*-* mn10300-*-* msp430*-*-* nds32*-*-* xstormy16-*-* v850*-*-* vax-*-* xtensa*-*-* } || { arm*-*-* && { ! arm32 } } } } } */
+/* { dg-do run { xfail { { cris-*-* h8300-*-* hppa*64*-*-* m32r-*-* mcore-*-* mn10300-*-* msp430*-*-* nds32*-*-* xstormy16-*-* v850*-*-* vax-*-* xtensa*-*-* nvptx*-*-* } || { arm*-*-* && { ! arm32 } } } } } */
 /* -mlongcall disables sibcall patterns.  */
 /* { dg-skip-if "" { powerpc*-*-* } { "-mlongcall" } { "" } } */
 /* { dg-options "-O2 -foptimize-sibling-calls" } */
diff --git a/gcc/testsuite/gcc.dg/sibcall-4.c b/gcc/testsuite/gcc.dg/sibcall-4.c
index 97086bb5106..5dcff3f8d43 100644
--- a/gcc/testsuite/gcc.dg/sibcall-4.c
+++ b/gcc/testsuite/gcc.dg/sibcall-4.c
@@ -5,7 +5,7 @@
    Copyright (C) 2002 Free Software Foundation Inc.
    Contributed by Hans-Peter Nilsson*/

-/* { dg-do run { xfail { { cris-*-* h8300-*-* hppa*64*-*-* m32r-*-* mcore-*-* mn10300-*-* msp430*-*-* nds32*-*-* xstormy16-*-* v850*-*-* vax-*-* xtensa*-*-* } || { arm*-*-* && { ! arm32 } } } } } */
+/* { dg-do run { xfail { { cris-*-* h8300-*-* hppa*64*-*-* m32r-*-* mcore-*-* mn10300-*-* msp430*-*-* nds32*-*-* xstormy16-*-* v850*-*-* vax-*-* xtensa*-*-* nvptx*-*-* } || { arm*-*-* && { ! arm32 } } } } } */
 /* -mlongcall disables sibcall patterns.  */
 /* { dg-skip-if "" { powerpc*-*-* } { "-mlongcall" } { "" } } */
 /* { dg-options "-O2 -foptimize-sibling-calls" } */
[committed][libgomp, testsuite, nvptx] Fix pr96390.c without CUDA
Hi,

When running the libgomp testsuite on x86_64 with nvptx accelerator, we run
into:
...
XPASS: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
FAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c execution test
...

The problem is that we're expecting the following ptxas error:
...
XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
Excess errors:
ptxas /tmp/ccZYDw8N.o, line 90; error : Call to 'baz' requires call prototype
ptxas /tmp/ccZYDw8N.o, line 90; error : Unknown symbol 'baz'
...
But it's not triggered because ptxas is not in the path, so nvptx-none-as
defaults to --no-verify.

So instead, we run into the same error at execution time.

Fix this by forcing verification using:
...
/* { dg-additional-options "-foffload=-Wa,--verify" \
   { target offload_target_nvptx } } */
...
such that we run into the xfail in this way instead:
...
XFAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
Excess errors:
nvptx-as: error trying to exec 'ptxas': execvp: No such file or directory
nvptx-as: ptxas returned 255 exit status
...

Tested on x86_64-linux with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[libgomp, testsuite, nvptx] Fix pr96390.c without CUDA

libgomp/ChangeLog:

2022-02-21  Tom de Vries

	PR testsuite/104146
	* testsuite/libgomp.c++/pr96390.C: Add additional-option
	-foffload=-Wa,--verify for nvptx.
	* testsuite/libgomp.c-c++-common/pr96390.c: Same.

---
 libgomp/testsuite/libgomp.c++/pr96390.C          | 1 +
 libgomp/testsuite/libgomp.c-c++-common/pr96390.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/libgomp/testsuite/libgomp.c++/pr96390.C b/libgomp/testsuite/libgomp.c++/pr96390.C
index 8c770ecb80c..1f3c3e05661 100644
--- a/libgomp/testsuite/libgomp.c++/pr96390.C
+++ b/libgomp/testsuite/libgomp.c++/pr96390.C
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O0 -fdump-tree-omplower" } */
+/* { dg-additional-options "-foffload=-Wa,--verify" { target offload_target_nvptx } } */
 /* { dg-xfail-if "PR 97106/PR 97102 - .alias not (yet) supported for nvptx" { offload_target_nvptx } } */

 #include
diff --git a/libgomp/testsuite/libgomp.c-c++-common/pr96390.c b/libgomp/testsuite/libgomp.c-c++-common/pr96390.c
index 4fe09cebb5d..b89f934811a 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/pr96390.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/pr96390.c
@@ -1,4 +1,5 @@
 /* { dg-additional-options "-O0 -fdump-tree-omplower" } */
+/* { dg-additional-options "-foffload=-Wa,--verify" { target offload_target_nvptx } } */
 /* { dg-require-alias "" } */
 /* { dg-xfail-if "PR 97102/PR 97106 - .alias not (yet) supported for nvptx" { offload_target_nvptx } } */
[PATCH 1/2] wwwdocs: Group sanitiser changes together
Group the ThreadSanitizer and HardwareAssistedAddressSanitizer changes under a single top-level bullet point. This makes it easier to add a third sanitiser-related change. No (intended) change to the actual text or wording. (TBH I don't understand the ThreadSanitizer bit: is it describing three changes (KCSAN + two new options), four changes (other environments), or one big inter-related change?) OK to install? Richard --- htdocs/gcc-11/changes.html | 63 ++ 1 file changed, 36 insertions(+), 27 deletions(-) diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html index 8e6d4ec8..cc3ae989 100644 --- a/htdocs/gcc-11/changes.html +++ b/htdocs/gcc-11/changes.html @@ -69,18 +69,6 @@ You may also want to check out our General Improvements - -https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual";> -ThreadSanitizer improvements to support alternative runtimes and -environments. The https://www.kernel.org/doc/html/latest/dev-tools/kcsan.html";> -Linux Kernel Concurrency Sanitizer (KCSAN) is now supported. - - Add --param tsan-distinguish-volatile to optionally emit - instrumentation distinguishing volatile accesses. - Add --param tsan-instrument-func-entry-exit to optionally - control if function entries and exits should be instrumented. - - In previous releases of GCC, the "column numbers" emitted in diagnostics @@ -121,22 +109,43 @@ You may also want to check out our - -Introduce https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html";> - Hardware-assisted AddressSanitizer support. This sanitizer currently -only works for the AArch64 target. It helps debug address problems -similarly to -https://github.com/google/sanitizers/wiki/AddressSanitizer";> - AddressSanitizer but is based on partial hardware assistance and -provides probabilistic protection to use less RAM at run time. 
-https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html";> - Hardware-assisted AddressSanitizer is not production-ready for user -space, and is provided mainly for use compiling the Linux Kernel. - -To use this sanitizer the command line arguments are: +Sanitizer improvements: - -fsanitize=hwaddress to instrument userspace code. - -fsanitize=kernel-hwaddress to instrument kernel code. + + https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual";> + ThreadSanitizer improvements to support alternative runtimes + and environments. The + https://www.kernel.org/doc/html/latest/dev-tools/kcsan.html";> + Linux Kernel Concurrency Sanitizer (KCSAN) is now supported. + + Add --param tsan-distinguish-volatile to optionally + emit instrumentation distinguishing volatile accesses. + Add --param tsan-instrument-func-entry-exit to + optionally control if function entries and exits should be + instrumented. + + + + + Introduce https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html";> + Hardware-assisted AddressSanitizer support. This sanitizer currently + only works for the AArch64 target. It helps debug address problems + similarly to + https://github.com/google/sanitizers/wiki/AddressSanitizer";> + AddressSanitizer but is based on partial hardware assistance and + provides probabilistic protection to use less RAM at run time. + https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html";> + Hardware-assisted AddressSanitizer is not production-ready for user + space, and is provided mainly for use compiling the Linux Kernel. + + + To use this sanitizer the command line arguments are: + + -fsanitize=hwaddress to instrument userspace code. + -fsanitize=kernel-hwaddress to instrument kernel code. + + + -- 2.25.1
Re: [PATCH 1/2] wwwdocs: Group sanitiser changes together
Richard Sandiford writes: > Group the ThreadSanitizer and HardwareAssistedAddressSanitizer > changes under a single top-level bullet point. This makes it > easier to add a third sanitiser-related change. > > No (intended) change to the actual text or wording. (TBH I don't > understand the ThreadSanitizer bit: is it describing three changes > (KCSAN + two new options), four changes (other environments), > or one big inter-related change?) > > OK to install? Err, scratch that. Clearly I've not had tea this morning, and forgot which version we're about to release :-) Richard > > Richard > > --- > htdocs/gcc-11/changes.html | 63 ++ > 1 file changed, 36 insertions(+), 27 deletions(-) > > diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html > index 8e6d4ec8..cc3ae989 100644 > --- a/htdocs/gcc-11/changes.html > +++ b/htdocs/gcc-11/changes.html > @@ -69,18 +69,6 @@ You may also want to check out our > General Improvements > > > - > - href="https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual";> > -ThreadSanitizer improvements to support alternative runtimes and > -environments. The href="https://www.kernel.org/doc/html/latest/dev-tools/kcsan.html";> > -Linux Kernel Concurrency Sanitizer (KCSAN) is now supported. > - > - Add --param tsan-distinguish-volatile to optionally > emit > - instrumentation distinguishing volatile accesses. > - Add --param tsan-instrument-func-entry-exit to > optionally > - control if function entries and exits should be instrumented. > - > - > > >In previous releases of GCC, the "column numbers" emitted in > diagnostics > @@ -121,22 +109,43 @@ You may also want to check out our > > > > - > -Introduce href="https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html";> > - Hardware-assisted AddressSanitizer support. This sanitizer > currently > -only works for the AArch64 target. 
It helps debug address problems > -similarly to > -https://github.com/google/sanitizers/wiki/AddressSanitizer";> > - AddressSanitizer but is based on partial hardware assistance and > -provides probabilistic protection to use less RAM at run time. > - href="https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html";> > - Hardware-assisted AddressSanitizer is not production-ready for user > -space, and is provided mainly for use compiling the Linux Kernel. > - > -To use this sanitizer the command line arguments are: > +Sanitizer improvements: > > - -fsanitize=hwaddress to instrument userspace > code. > - -fsanitize=kernel-hwaddress to instrument kernel > code. > + > + href="https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual";> > + ThreadSanitizer improvements to support alternative runtimes > + and environments. The > + https://www.kernel.org/doc/html/latest/dev-tools/kcsan.html";> > + Linux Kernel Concurrency Sanitizer (KCSAN) is now supported. > + > + Add --param tsan-distinguish-volatile to optionally > + emit instrumentation distinguishing volatile accesses. > + Add --param tsan-instrument-func-entry-exit to > + optionally control if function entries and exits should be > + instrumented. > + > + > + > + > + Introduce href="https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html";> > + Hardware-assisted AddressSanitizer support. This sanitizer > currently > + only works for the AArch64 target. It helps debug address problems > + similarly to > + https://github.com/google/sanitizers/wiki/AddressSanitizer";> > + AddressSanitizer but is based on partial hardware assistance and > + provides probabilistic protection to use less RAM at run time. > +href="https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html";> > + Hardware-assisted AddressSanitizer is not production-ready for > user > + space, and is provided mainly for use compiling the Linux Kernel. 
> + > + > + To use this sanitizer the command line arguments are: > + > + -fsanitize=hwaddress to instrument userspace > code. > + -fsanitize=kernel-hwaddress to instrument kernel > code. > + > + > + > > >
Re: [PATCH 3/3] target/99881 - x86 vector cost of CTOR from integer regs
On Tue, 22 Feb 2022, Richard Biener wrote:

> On Tue, 22 Feb 2022, Hongtao Liu wrote:
>
> > On Mon, Feb 21, 2022 at 5:10 PM Richard Biener wrote:
> > >
> > > On Mon, 21 Feb 2022, Hongtao Liu wrote:
> > >
> > > > On Fri, Feb 18, 2022 at 10:01 PM Richard Biener via Gcc-patches
> > > > wrote:
> > > > >
> > > > > This uses the now passed SLP node to the vectorizer costing hook
> > > > > to adjust vector construction costs for the cost of moving an
> > > > > integer component from a GPR to a vector register when that's
> > > > > required for building a vector from components.  A crucial difference
> > > > > here is whether the component is loaded from memory or extracted
> > > > > from a vector register as in those cases no intermediate GPR is
> > > > > involved.
> > > > >
> > > > > The pr99881.c testcase can be Un-XFAILed with this patch, the
> > > > > pr91446.c testcase now produces scalar code which looks superior
> > > > > to me so I've adjusted it as well.
> > > > >
> > > > > I'm currently re-bootstrapping and testing on x86_64-unknown-linux-gnu
> > > > > after adding the BIT_FIELD_REF vector extracting special casing.
> > > > Does the patch handle PR101929?
> > >
> > > The patch will regress the testcase posted in PR101929 again:
> > >
> > > _255 1 times scalar_store costs 12 in body
> > > _261 1 times scalar_store costs 12 in body
> > > _258 1 times scalar_store costs 12 in body
> > > _264 1 times scalar_store costs 12 in body
> > > t0_247 + t2_251 1 times scalar_stmt costs 4 in body
> > > t1_472 + t3_444 1 times scalar_stmt costs 4 in body
> > > t0_406 - t2_451 1 times scalar_stmt costs 4 in body
> > > t1_472 - t3_444 1 times scalar_stmt costs 4 in body
> > > -node 0x4182f48 1 times vec_construct costs 16 in prologue
> > > -node 0x41882b0 1 times vec_construct costs 16 in prologue
> > > +node 0x4182f48 1 times vec_construct costs 28 in prologue
> > > +node 0x41882b0 1 times vec_construct costs 28 in prologue
> > > t0_406 + t2_451 1 times vector_stmt costs 4 in body
> > > t1_472 - t3_444 1 times vector_stmt costs 4 in body
> > > node 0x41829f8 1 times vec_perm costs 4 in body
> > > _436 1 times vector_store costs 16 in body
> > > t.c:37:9: note: Cost model analysis for part in loop 0:
> > > - Vector cost: 60
> > > + Vector cost: 84
> > >   Scalar cost: 64
> > > +t.c:37:9: missed: not vectorized: vectorization is not profitable.
> > >
> > > We're constructing V4SI from patterns like { _407, _480, _407, _480 }
> > > where the components are results of integer adds (so the result is
> > > definitely in a GPR).  We are costing the construction as
> > > 4 * sse_op + 2 * sse_to_integer which with skylake cost is
> > > 4 * COSTS_N_INSNS (1) + 2 * 6.
> > >
> > > Whether the vectorization itself is profitable is likely questionable
> > > but then it's true that the construction of V4SI is more costly
> > > in terms of uops than a construction of V4SF.
> > >
> > > Now, we can - for the first time - now see the actual construction
> > > pattern and ideal construction might be two GPR->xmm moves
> > > + two splats + one unpack or maybe two GPR->xmm moves + one
> > > unpack + splat of DI (or other means of duplicating the lowpart).
> > Yes, the patch is technically right. I'm ok with the patch.
>
> Thanks, I've pushed it now.  I've also tested the suggested adjustment
> doing
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index b2bf90576d5..acf2cc977b4 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22595,7 +22595,7 @@ ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
>        case vec_construct:
>         {
>           /* N element inserts into SSE vectors.  */
> -         int cost = TYPE_VECTOR_SUBPARTS (vectype) * ix86_cost->sse_op;
> +         int cost = (TYPE_VECTOR_SUBPARTS (vectype) - 1) * ix86_cost->sse_op;
>           /* One vinserti128 for combining two SSE vectors for AVX256.  */
>           if (GET_MODE_BITSIZE (mode) == 256)
>             cost += ix86_vec_cost (mode, ix86_cost->addss);
>
> successfully (with no effect on the PR101929 case as expected), I
> will queue that for stage1 since it isn't known to fix any
> regression (but I will keep it as option in case something pops up).
>
> I'll also have a more detailed look into the x264_r case to see
> if there's something we can do about the regression that will now
> show up (and I'll watch autotesters).

I found a way to get back the vectorization doing

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 9188d727e33..7f1f12fb6c6 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2374,7 +2375,7 @@ fail:
               n_vector_builds++;
             }
         }
-      if (all_uniform_p
+      if ((all_uniform_p && !two_operators)
           || n_vector_builds > 1
           || (n_vector_builds == children.length ()
               && is_a (stmt_info->stmt)))

which in itself is reasonable since the result of the operation is a no
[PATCH] libiberty: Fix up debug.temp.o creation if *.o has 64K+ sections [PR104617]
Hi!

On
#define A(n) int foo1##n(void) { return 1##n; }
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
#define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325)
B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642)
testcase with
./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto -O0 -o foo1.o foo1.c -ffunction-sections
./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o
/tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized
(testcase too slow to be included into testsuite).
The problem is clearly reported by readelf:
readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323
readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section.
because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and
sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE
inclusive.  Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE
range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym
where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be
used instead and .symtab_shndx section should contain the real section
index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >=
SHN_LORESERVE value is needed it should put those into
Shdr[0].sh_{size,link}.  But, sh_{link,info} are 32-bit fields which can
contain any section index.

Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before
2011) used to mishandle the > 63.75K sections case and assumed there is a
hole in between the sections, but what
simple_object_elf_copy_lto_debug_sections does wouldn't help in that case
for the debug temp object creation, we'd need to detect the case also in
that routine and take it into account in the remapping etc.  I think
it is not worth it given that it is over 10 years, if somebody needs
63.75K or more sections, better use more recent binutils.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-02-22  Jakub Jelinek

	PR lto/104617
	* simple-object-elf.c (simple_object_elf_match): Fix up URL
	in comment.
	(simple_object_elf_copy_lto_debug_sections): Remap sh_info and
	sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE
	range (inclusive).

--- libiberty/simple-object-elf.c.jj	2022-01-11 23:11:23.967267993 +0100
+++ libiberty/simple-object-elf.c	2022-02-21 20:37:12.815202845 +0100
@@ -528,7 +528,7 @@ simple_object_elf_match (unsigned char h
 	 not handle objects with more than SHN_LORESERVE sections
 	 correctly.  All large section indexes were offset by
 	 0x100.  There is more information at
-	 http://sourceware.org/bugzilla/show_bug.cgi?id-5900 .
+	 https://sourceware.org/PR5900 .
 	 Fortunately these object files are easy to detect, as the
 	 GNU binutils always put the section header string table
 	 near the end of the list of sections.  Thus if the
@@ -1559,17 +1559,13 @@ simple_object_elf_copy_lto_debug_section
 	{
 	  sh_info = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
 				     shdr, sh_info, Elf_Word);
-	  if (sh_info < SHN_LORESERVE
-	      || sh_info > SHN_HIRESERVE)
-	    sh_info = sh_map[sh_info];
+	  sh_info = sh_map[sh_info];
 	  ELF_SET_FIELD (type_functions, ei_class, Shdr,
 			 shdr, sh_info, Elf_Word, sh_info);
 	}
       sh_link = ELF_FETCH_FIELD (type_functions, ei_class, Shdr,
 				 shdr, sh_link, Elf_Word);
-      if (sh_link < SHN_LORESERVE
-	  || sh_link > SHN_HIRESERVE)
-	sh_link = sh_map[sh_link];
+      sh_link = sh_map[sh_link];
       ELF_SET_FIELD (type_functions, ei_class, Shdr,
 		     shdr, sh_link, Elf_Word, sh_link);
     }

	Jakub
[PATCH] wwwdocs: Document ShadowCallStack support
Document ShadowCallStack support. The option link doesn't work yet of course, but I checked that it works with gcc-12.1.0/ removed. OK to install? Thanks, Richard --- htdocs/gcc-12/changes.html | 11 +++ 1 file changed, 11 insertions(+) diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html index b6341fda..216ee0b6 100644 --- a/htdocs/gcc-12/changes.html +++ b/htdocs/gcc-12/changes.html @@ -96,6 +96,17 @@ a work-in-progress. Note that default vectorizer cost model has been changed which used to behave as -fvect-cost-model=cheap were specified. + +GCC now supports the +https://clang.llvm.org/docs/ShadowCallStack.html";> +ShadowCallStack sanitizer, which can be enabled using the +command-line option +https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Instrumentation-Options.html#index-fsanitize_003dshadow-call-stack";> +-fshadow-call-stack. This sanitizer currently +only works on AArch64 targets and it requires an environment in +which all code has been compiled with -ffixed-r18. +Its primary initial user is the Linux kernel. + -- 2.25.1
Re: [PATCH] libiberty: Fix up debug.temp.o creation if *.o has 64K+ sections [PR104617]
On Tue, 22 Feb 2022, Jakub Jelinek wrote: > Hi! > > On > #define A(n) int foo1##n(void) { return 1##n; } > #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) > A(n##8) A(n##9) > #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) > B(n##8) B(n##9) > #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) > C(n##8) C(n##9) > #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) > D(n##8) D(n##9) > E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325) > B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642) > testcase with > ./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto -O0 -o foo1.o foo1.c > -ffunction-sections > ./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o > /tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized > (testcase too slow to be included into testsuite). > The problem is clearly reported by readelf: > readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link > value of 65321 > readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link > value of 65321 > readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link > value of 65323 > readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index > a symtab section. > readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index > a symtab section. > readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index > a string section. > because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and > sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE > inclusive. 
Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE > range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym > where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be > used instead and .symtab_shndx section should contain the real section > index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >= > SHN_LORESERVE value is needed it should put those into > Shdr[0].sh_{size,link}. But, sh_{link,info} are 32-bit fields which can > contain any section index. > > Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before > 2011) used to mishandle the > 63.75K sections case and assumed there is a > hole in between the sections, but what > simple_object_elf_copy_lto_debug_sections does wouldn't help in that case > for the debug temp object creation, we'd need to detect the case also in > that routine and take it into account in the remapping etc. I think > it is not worth it given that it is over 10 years, if somebody needs > 63.75K or more sections, better use more recent binutils. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. I suppose this also qualifies for backports? Thanks, Richard. > 2022-02-22 Jakub Jelinek > > PR lto/104617 > * simple-object-elf.c (simple_object_elf_match): Fix up URL > in comment. > (simple_object_elf_copy_lto_debug_sections): Remap sh_info and > sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE > range (inclusive). > > --- libiberty/simple-object-elf.c.jj 2022-01-11 23:11:23.967267993 +0100 > +++ libiberty/simple-object-elf.c 2022-02-21 20:37:12.815202845 +0100 > @@ -528,7 +528,7 @@ simple_object_elf_match (unsigned char h >not handle objects with more than SHN_LORESERVE sections >correctly. All large section indexes were offset by >0x100. There is more information at > - http://sourceware.org/bugzilla/show_bug.cgi?id-5900 . > + https://sourceware.org/PR5900 . 
>Fortunately these object files are easy to detect, as the >GNU binutils always put the section header string table >near the end of the list of sections. Thus if the > @@ -1559,17 +1559,13 @@ simple_object_elf_copy_lto_debug_section > { > sh_info = ELF_FETCH_FIELD (type_functions, ei_class, Shdr, > shdr, sh_info, Elf_Word); > - if (sh_info < SHN_LORESERVE > - || sh_info > SHN_HIRESERVE) > - sh_info = sh_map[sh_info]; > + sh_info = sh_map[sh_info]; > ELF_SET_FIELD (type_functions, ei_class, Shdr, > shdr, sh_info, Elf_Word, sh_info); > } > sh_link = ELF_FETCH_FIELD (type_functions, ei_class, Shdr, > shdr, sh_link, Elf_Word); > - if (sh_link < SHN_LORESERVE > - || sh_link > SHN_HIRESERVE) > - sh_link = sh_map[sh_link]; > + sh_link = sh_map[sh_link]; > ELF_SET_FIELD (type_functions, ei_class, Shdr, > shdr, sh_link, Elf_Word, sh_link); >} > > Jakub > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
Re: [PATCH] wwwdocs: Document ShadowCallStack support
On Tue, Feb 22, 2022 at 10:11:06AM +, Richard Sandiford via Gcc-patches wrote: > Document ShadowCallStack support. The option link doesn't work yet > of course, but I checked that it works with gcc-12.1.0/ removed. > > OK to install? > > Thanks, > Richard > > > --- > htdocs/gcc-12/changes.html | 11 +++ > 1 file changed, 11 insertions(+) > > diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html > index b6341fda..216ee0b6 100644 > --- a/htdocs/gcc-12/changes.html > +++ b/htdocs/gcc-12/changes.html > @@ -96,6 +96,17 @@ a work-in-progress. >Note that default vectorizer cost model has been changed which used to > behave >as -fvect-cost-model=cheap were specified. > > + > +GCC now supports the > +https://clang.llvm.org/docs/ShadowCallStack.html";> > +ShadowCallStack sanitizer, which can be enabled using the > +command-line option > + href="https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Instrumentation-Options.html#index-fsanitize_003dshadow-call-stack";> > +-fshadow-call-stack. This sanitizer currently The option is -fsanitize=shadow-call-stack , no? > +only works on AArch64 targets and it requires an environment in > +which all code has been compiled with -ffixed-r18. > +Its primary initial user is the Linux kernel. > + > > Jakub
Re: [PATCH 1/2] wwwdocs: Group sanitiser changes together
On Tue, 22 Feb 2022, Richard Sandiford wrote: > Err, scratch that. Clearly I've not had tea this morning, and forgot > which version we're about to release :-) No worries! (And it's not even 13 yet. ;-) For the record, I for one am happy for you to make such changes as you see fit (where applicable), i.e., happy to provide a second pair of eyes, but that's an offer, not a requirement. Gerald
Re: [PATCH] wwwdocs: Document ShadowCallStack support
Jakub Jelinek writes: > On Tue, Feb 22, 2022 at 10:11:06AM +, Richard Sandiford via Gcc-patches > wrote: >> Document ShadowCallStack support. The option link doesn't work yet >> of course, but I checked that it works with gcc-12.1.0/ removed. >> >> OK to install? >> >> Thanks, >> Richard >> >> >> --- >> htdocs/gcc-12/changes.html | 11 +++ >> 1 file changed, 11 insertions(+) >> >> diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html >> index b6341fda..216ee0b6 100644 >> --- a/htdocs/gcc-12/changes.html >> +++ b/htdocs/gcc-12/changes.html >> @@ -96,6 +96,17 @@ a work-in-progress. >>Note that default vectorizer cost model has been changed which used >> to behave >>as -fvect-cost-model=cheap were specified. >> >> + >> +GCC now supports the >> +https://clang.llvm.org/docs/ShadowCallStack.html";> >> +ShadowCallStack sanitizer, which can be enabled using the >> +command-line option >> +> href="https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Instrumentation-Options.html#index-fsanitize_003dshadow-call-stack";> >> +-fshadow-call-stack. This sanitizer currently > > The option is -fsanitize=shadow-call-stack , no? Gah, thanks. Clearly one of those days :-( Richard diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html index b6341fda..9c2d9ea8 100644 --- a/htdocs/gcc-12/changes.html +++ b/htdocs/gcc-12/changes.html @@ -96,6 +96,17 @@ a work-in-progress. Note that default vectorizer cost model has been changed which used to behave as -fvect-cost-model=cheap were specified. + +GCC now supports the +https://clang.llvm.org/docs/ShadowCallStack.html";> +ShadowCallStack sanitizer, which can be enabled using the +command-line option +https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Instrumentation-Options.html#index-fsanitize_003dshadow-call-stack";> +-fsanitize=shadow-call-stack. This sanitizer currently +only works on AArch64 targets and it requires an environment in +which all code has been compiled with -ffixed-r18. 
+Its primary initial user is the Linux kernel. + -- 2.25.1
Re: [PATCH] x86: Update Intel architectures ISA support in documentation.
On Tue, Feb 22, 2022 at 7:39 AM Cui,Lili wrote: > > Hi Uros, > > This patch is to update Intel architectures ISA support in documentation. > Since the ISA supported by Intel architectures in the documentation > are inconsistent with the actual, modify them all. > > OK for master? OK. Thanks, Uros. > > > gcc/Changelog: > > * gcc/doc/invoke.texi: Update documents for Intel architectures. > --- > gcc/doc/invoke.texi | 185 +++- > 1 file changed, 98 insertions(+), 87 deletions(-) > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 635c5f79278..60472a21255 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -31086,66 +31086,69 @@ instruction set is used, so the code runs on all > i686 family chips. > When used with @option{-mtune}, it has the same meaning as @samp{generic}. > > @item pentium2 > -Intel Pentium II CPU, based on Pentium Pro core with MMX instruction set > -support. > +Intel Pentium II CPU, based on Pentium Pro core with MMX and FXSR instruction > +set support. > > @item pentium3 > @itemx pentium3m > -Intel Pentium III CPU, based on Pentium Pro core with MMX and SSE instruction > -set support. > +Intel Pentium III CPU, based on Pentium Pro core with MMX, FXSR and SSE > +instruction set support. > > @item pentium-m > Intel Pentium M; low-power version of Intel Pentium III CPU > -with MMX, SSE and SSE2 instruction set support. Used by Centrino notebooks. > +with MMX, SSE, SSE2 and FXSR instruction set support. Used by Centrino > +notebooks. > > @item pentium4 > @itemx pentium4m > -Intel Pentium 4 CPU with MMX, SSE and SSE2 instruction set support. > +Intel Pentium 4 CPU with MMX, SSE, SSE2 and FXSR instruction set support. > > @item prescott > -Improved version of Intel Pentium 4 CPU with MMX, SSE, SSE2 and SSE3 > instruction > -set support. > +Improved version of Intel Pentium 4 CPU with MMX, SSE, SSE2, SSE3 and FXSR > +instruction set support. 
> > @item nocona > Improved version of Intel Pentium 4 CPU with 64-bit extensions, MMX, SSE, > -SSE2 and SSE3 instruction set support. > +SSE2, SSE3 and FXSR instruction set support. > > @item core2 > -Intel Core 2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3 > -instruction set support. > +Intel Core 2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, CX16, > +SAHF and FXSR instruction set support. > > @item nehalem > Intel Nehalem CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, > -SSE4.1, SSE4.2 and POPCNT instruction set support. > +SSE4.1, SSE4.2, POPCNT, CX16, SAHF and FXSR instruction set support. > > @item westmere > Intel Westmere CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, > -SSE4.1, SSE4.2, POPCNT, AES and PCLMUL instruction set support. > +SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR and PCLMUL instruction set support. > > @item sandybridge > Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, > -SSE4.1, SSE4.2, POPCNT, AVX, AES and PCLMUL instruction set support. > +SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE and PCLMUL instruction > set > +support. > > @item ivybridge > Intel Ivy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, > -SSE4.1, SSE4.2, POPCNT, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C > -instruction set support. > +SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, RDRND > +and F16C instruction set support. > > @item haswell > Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, > -SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, > -BMI, BMI2 and F16C instruction set support. > +SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, > RDRND, > +F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE and HLE instruction set support. 
> > @item broadwell > Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, > SSSE3, > -SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, > BMI2, > -F16C, RDSEED ADCX and PREFETCHW instruction set support. > +SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, > RDRND, > +F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE, HLE, RDSEED, ADCX and PREFETCHW > +instruction set support. > > @item skylake > Intel Skylake CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, > -SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, > -BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC and XSAVES > -instruction set support. > +SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, > RDRND, > +F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE, HLE, RDSEED, ADCX, PREFETCHW, AES, > +CLFLUSHOPT, XSAVEC, XSAVES and SGX instruction set support. > > @item bonnell > Intel Bonnell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3 and > SSSE3 > @@ -31153,113 +31156,121 @@ instruction set support. > > @item silvermont > Intel Silvermont CPU with 64-bit ext
[C++ PATCH] PR c++/96442: Another improved error recovery in enumerations.
This patch resolves PR c++/96442, another ICE-after-error regression. In this case, invalid code attempts to use a non-integral type as the underlying type for an enumeration (a record_type in the example given in the bugzilla PR), for which the parser emits an error message but allows the inappropriate type to leak to downstream code. The minimal safe fix is to double check that the enumeration's underlying type EUTYPE satisfies INTEGRAL_TYPE_P before calling int_fits_type_p in build_enumerator. This is a one line fix, but correcting indentation and storing a common subexpression in a variable makes the change look a little bigger. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check with no new (unexpected) failures. Ok for mainline? 2022-02-22 Roger Sayle gcc/cp/ChangeLog PR c++/96442 * decl.cc (build_enumerator): Check ENUM_UNDERLYING_TYPE is INTEGRAL_TYPE_P before calling int_fits_type_p. gcc/testsuite/ChangeLog PR c++/96442 * g++.dg/pr96442.C: New test case. Thanks in advance, Roger -- diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc index 7b48b56..c430f78 100644 --- a/gcc/cp/decl.cc +++ b/gcc/cp/decl.cc @@ -16542,19 +16542,21 @@ incremented enumerator value is too large for %")); STRIP_TYPE_NOPS (value); /* If the underlying type of the enum is fixed, check whether - the enumerator values fits in the underlying type. If it - does not fit, the program is ill-formed [C++0x dcl.enum]. */ - if (ENUM_UNDERLYING_TYPE (enumtype) - && value - && TREE_CODE (value) == INTEGER_CST) -{ - if (!int_fits_type_p (value, ENUM_UNDERLYING_TYPE (enumtype))) +the enumerator values fits in the underlying type. If it +does not fit, the program is ill-formed [C++0x dcl.enum]. 
*/ + tree eutype = ENUM_UNDERLYING_TYPE (enumtype); + if (eutype + && value + && INTEGRAL_TYPE_P (eutype) + && TREE_CODE (value) == INTEGER_CST) + { + if (!int_fits_type_p (value, eutype)) error ("enumerator value %qE is outside the range of underlying " - "type %qT", value, ENUM_UNDERLYING_TYPE (enumtype)); + "type %qT", value, eutype); - /* Convert the value to the appropriate type. */ - value = fold_convert (ENUM_UNDERLYING_TYPE (enumtype), value); -} + /* Convert the value to the appropriate type. */ + value = fold_convert (eutype, value); + } } /* C++ associates enums with global, function, or class declarations. */ diff --git a/gcc/testsuite/g++.dg/pr96442.C b/gcc/testsuite/g++.dg/pr96442.C new file mode 100644 index 000..235bb11 --- /dev/null +++ b/gcc/testsuite/g++.dg/pr96442.C @@ -0,0 +1,6 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +enum struct a : struct {}; +template enum class a : class c{}; +enum struct a {b}; +// { dg-excess-errors "" }
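As an aside, the shape of the guard in the patch above can be modelled outside the compiler. The sketch below is plain C and does not use GCC's tree API: struct type_model, enum type_kind and enumerator_out_of_range are hypothetical stand-ins for the tree type, INTEGRAL_TYPE_P and the int_fits_type_p check. It only illustrates the ordering: confirm the type is integral first, and only then ask whether the value fits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a type: either an integer type with a value range,
   or a record type that leaked through after an earlier error.  */
enum type_kind { KIND_INTEGER, KIND_RECORD };

struct type_model
{
  enum type_kind kind;
  int64_t min, max;	/* Only meaningful when kind == KIND_INTEGER.  */
};

/* Return true if a range diagnostic should be issued for VALUE.  The range
   test runs only when an underlying type exists and is integral, mirroring
   the extra INTEGRAL_TYPE_P condition added by the patch; a non-integral
   type has already been diagnosed and is skipped here.  */
static bool
enumerator_out_of_range (const struct type_model *underlying, int64_t value)
{
  if (underlying == NULL || underlying->kind != KIND_INTEGER)
    return false;
  return value < underlying->min || value > underlying->max;
}
```

In this model a record type never reaches the range comparison, which corresponds to int_fits_type_p no longer being called on a non-integral type.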
Re: libgo patch committed: Update to Go1.18rc1 release
Hi Ian, > On Sun, Feb 20, 2022 at 2:13 PM Rainer Orth > wrote: >> >> > This patch updates libgo to the Go1.18rc1 release. Bootstrapped and >> > ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. >> >> this broke Solaris bootstrap: >> >> ld: fatal: file runtime/internal/.libs/syscall.o: open failed: No such >> file or directory >> collect2: error: ld returned 1 exit status >> >> Creating a dummy syscall_solaris.go worked around that for now. > > Sorry about that. I committed this patch which should fix the problem. great, thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[PATCH] Dump def that we use for a splat
This makes the SLP vectorizer dump the def we use for a splat to aid debugging. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. 2022-02-22 Richard Biener * tree-vect-slp.cc (vect_build_slp_tree_2): Dump the def used for a splat. --- gcc/tree-vect-slp.cc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index 9188d727e33..341bd5220a5 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -2202,7 +2202,8 @@ out: { if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, -"Using a splat of the uniform operand\n"); +"Using a splat of the uniform operand %G", +first_def->stmt); oprnd_info->first_dt = vect_external_def; } } -- 2.34.1
[PATCH] Restore bootstrap on x86_64-pc-linux-gnu
This patch resolves the bootstrap failure on x86_64-pc-linux-gnu. Is this sufficiently "obvious" in stage4, or should I wait for the bootstrap and regression testing to complete? 2022-02-22 Roger Sayle gcc/ChangeLog * config/i386/i386-expand.cc (ix86_expand_cmpxchg_loop): Restore bootstrap. Cheers, Roger -- diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index 7f7055b..faa0191 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -23287,11 +23287,11 @@ void ix86_expand_cmpxchg_loop (rtx *ptarget_bool, rtx target_val, switch (mode) { -case TImode: +case E_TImode: gendw = gen_atomic_compare_and_swapti_doubleword; hmode = DImode; break; -case DImode: +case E_DImode: if (doubleword) { gendw = gen_atomic_compare_and_swapdi_doubleword; @@ -23300,12 +23300,15 @@ void ix86_expand_cmpxchg_loop (rtx *ptarget_bool, rtx target_val, else gen = gen_atomic_compare_and_swapdi_1; break; -case SImode: - gen = gen_atomic_compare_and_swapsi_1; break; -case HImode: - gen = gen_atomic_compare_and_swaphi_1; break; -case QImode: - gen = gen_atomic_compare_and_swapqi_1; break; +case E_SImode: + gen = gen_atomic_compare_and_swapsi_1; + break; +case E_HImode: + gen = gen_atomic_compare_and_swaphi_1; + break; +case E_QImode: + gen = gen_atomic_compare_and_swapqi_1; + break; default: gcc_unreachable (); }
[committed][nvptx] Add -mptx-comment
Hi, Add functionality that indicates which insns are added by -minit-regs, such that for instance we have for pr53465.s: ... // #APP // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1 // Start: Added by -minit-regs=3: // #NO_APP mov.u32 %r26, 0; // #APP // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1 // End: Added by -minit-regs=3: // #NO_APP ... Can be switched off using -mno-ptx-comment. Tested on nvptx. Committed to trunk. Thanks, - Tom [nvptx] Add -mptx-comment gcc/ChangeLog: 2022-02-21 Tom de Vries * config/nvptx/nvptx.cc (gen_comment): New function. (workaround_uninit_method_1, workaround_uninit_method_2) (workaround_uninit_method_3): : Use gen_comment. * config/nvptx/nvptx.opt (mptx-comment): New option. --- gcc/config/nvptx/nvptx.cc | 42 ++ gcc/config/nvptx/nvptx.opt | 3 +++ 2 files changed, 45 insertions(+) diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc index a37a6c78b41..981b91f7095 100644 --- a/gcc/config/nvptx/nvptx.cc +++ b/gcc/config/nvptx/nvptx.cc @@ -5372,6 +5372,17 @@ workaround_barsyncs (void) } #endif +static rtx +gen_comment (const char *s) +{ + const char *sep = " "; + size_t len = strlen (ASM_COMMENT_START) + strlen (sep) + strlen (s) + 1; + char *comment = (char *) alloca (len); + snprintf (comment, len, "%s%s%s", ASM_COMMENT_START, sep, s); + return gen_rtx_ASM_INPUT_loc (VOIDmode, ggc_strdup (comment), + cfun->function_start_locus); +} + /* Initialize all declared regs at function entry. Advantage : Fool-proof. 
Disadvantage: Potentially creates a lot of long live ranges and adds a lot @@ -5394,6 +5405,8 @@ workaround_uninit_method_1 (void) gcc_assert (CONST0_RTX (GET_MODE (reg))); start_sequence (); + if (nvptx_comment && first != NULL) + emit_insn (gen_comment ("Start: Added by -minit-regs=1")); emit_move_insn (reg, CONST0_RTX (GET_MODE (reg))); rtx_insn *inits = get_insns (); end_sequence (); @@ -5411,6 +5424,9 @@ workaround_uninit_method_1 (void) else insert_here = emit_insn_after (inits, insert_here); } + + if (nvptx_comment && insert_here != NULL) +emit_insn_after (gen_comment ("End: Added by -minit-regs=1"), insert_here); } /* Find uses of regs that are not defined on all incoming paths, and insert a @@ -5446,6 +5462,8 @@ workaround_uninit_method_2 (void) gcc_assert (CONST0_RTX (GET_MODE (reg))); start_sequence (); + if (nvptx_comment && first != NULL) + emit_insn (gen_comment ("Start: Added by -minit-regs=2:")); emit_move_insn (reg, CONST0_RTX (GET_MODE (reg))); rtx_insn *inits = get_insns (); end_sequence (); @@ -5463,6 +5481,9 @@ workaround_uninit_method_2 (void) else insert_here = emit_insn_after (inits, insert_here); } + + if (nvptx_comment && insert_here != NULL) +emit_insn_after (gen_comment ("End: Added by -minit-regs=2"), insert_here); } /* Find uses of regs that are not defined on all incoming paths, and insert a @@ -5531,6 +5552,27 @@ workaround_uninit_method_3 (void) } } + if (nvptx_comment) +FOR_EACH_BB_FN (bb, cfun) + { + if (single_pred_p (bb)) + continue; + + edge e; + edge_iterator ei; + FOR_EACH_EDGE (e, ei, bb->preds) + { + if (e->insns.r == NULL_RTX) + continue; + start_sequence (); + emit_insn (gen_comment ("Start: Added by -minit-regs=3:")); + emit_insn (e->insns.r); + emit_insn (gen_comment ("End: Added by -minit-regs=3:")); + e->insns.r = get_insns (); + end_sequence (); + } + } + commit_edge_insertions (); } diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt index 08580071731..e56ec9288da 100644 --- 
a/gcc/config/nvptx/nvptx.opt +++ b/gcc/config/nvptx/nvptx.opt @@ -95,3 +95,6 @@ Specify the version of the ptx version to use. minit-regs= Target Var(nvptx_init_regs) IntegerRange(0, 3) Joined UInteger Init(3) Initialize ptx registers. + +mptx-comment +Target Var(nvptx_comment) Init(1) Undocumented
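To make the string handling in gen_comment easier to follow, here is a standalone C sketch of the same length computation and snprintf call. It is not the nvptx code: COMMENT_START is a stand-in for ASM_COMMENT_START (assumed to be "//", as in the PTX output quoted above), make_comment is a hypothetical name, and malloc replaces the alloca plus ggc_strdup pair so that the example works without GCC's allocator.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the target's ASM_COMMENT_START; "//" is an assumption taken
   from the PTX output shown in the message.  */
#define COMMENT_START "//"

/* Return a freshly allocated "<comment-start> <text>" string, or NULL if
   allocation fails.  The caller owns the result and must free it.  */
static char *
make_comment (const char *s)
{
  const char *sep = " ";
  /* One extra byte for the terminating NUL.  */
  size_t len = strlen (COMMENT_START) + strlen (sep) + strlen (s) + 1;
  char *comment = (char *) malloc (len);
  if (comment == NULL)
    return NULL;
  snprintf (comment, len, "%s%s%s", COMMENT_START, sep, s);
  return comment;
}
```

Calling make_comment ("Start: Added by -minit-regs=3:") yields the comment text that appears in the pr53465.s excerpt.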
[PATCH][final] Handle compiler-generated asm insn
Hi, For the nvptx port, with -mptx-comment we have in pr53465.s: ... // #APP // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1 // Start: Added by -minit-regs=3: // #NO_APP mov.u32 %r26, 0; // #APP // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1 // End: Added by -minit-regs=3: // #NO_APP ... The comments were generated using the compiler-generated equivalent of: ... asm ("// Comment"); ... but both the printed location and the NO_APP/APP are unnecessary for a compiler-generated asm insn. Fix this by handling ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION in final_scan_insn_1, such that we simply get: ... // Start: Added by -minit-regs=3: mov.u32 %r26, 0; // End: Added by -minit-regs=3: ... Tested on nvptx. OK for trunk? Thanks, - Tom [final] Handle compiler-generated asm insn gcc/ChangeLog: 2022-02-21 Tom de Vries PR rtl-optimization/104596 * config/nvptx/nvptx.cc (gen_comment): Use gen_rtx_ASM_INPUT instead of gen_rtx_ASM_INPUT_loc. * final.cc (final_scan_insn_1): Handle ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION. --- gcc/config/nvptx/nvptx.cc | 3 +-- gcc/final.cc | 17 +++-- 2 files changed, 12 insertions(+), 8 deletions(-) diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc index 858789e6df7..4124c597f24 100644 --- a/gcc/config/nvptx/nvptx.cc +++ b/gcc/config/nvptx/nvptx.cc @@ -5381,8 +5381,7 @@ gen_comment (const char *s) size_t len = strlen (ASM_COMMENT_START) + strlen (sep) + strlen (s) + 1; char *comment = (char *) alloca (len); snprintf (comment, len, "%s%s%s", ASM_COMMENT_START, sep, s); - return gen_rtx_ASM_INPUT_loc (VOIDmode, ggc_strdup (comment), - cfun->function_start_locus); + return gen_rtx_ASM_INPUT (VOIDmode, ggc_strdup (comment)); } /* Initialize all declared regs at function entry. 
diff --git a/gcc/final.cc b/gcc/final.cc index a9868861bd2..e6443ef7a4f 100644 --- a/gcc/final.cc +++ b/gcc/final.cc @@ -2642,15 +2642,20 @@ final_scan_insn_1 (rtx_insn *insn, FILE *file, int optimize_p ATTRIBUTE_UNUSED, if (string[0]) { expanded_location loc; + bool unknown_loc_p + = ASM_INPUT_SOURCE_LOCATION (body) == UNKNOWN_LOCATION; - app_enable (); - loc = expand_location (ASM_INPUT_SOURCE_LOCATION (body)); - if (*loc.file && loc.line) - fprintf (asm_out_file, "%s %i \"%s\" 1\n", - ASM_COMMENT_START, loc.line, loc.file); + if (!unknown_loc_p) + { + app_enable (); + loc = expand_location (ASM_INPUT_SOURCE_LOCATION (body)); + if (*loc.file && loc.line) + fprintf (asm_out_file, "%s %i \"%s\" 1\n", + ASM_COMMENT_START, loc.line, loc.file); + } fprintf (asm_out_file, "\t%s\n", string); #if HAVE_AS_LINE_ZERO - if (*loc.file && loc.line) + if (!unknown_loc_p && loc.file && *loc.file && loc.line) fprintf (asm_out_file, "%s 0 \"\" 2\n", ASM_COMMENT_START); #endif }
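The effect of the final.cc change can be sketched as a small standalone C function. This is a simplified model and not the code in final_scan_insn_1: format_asm_input is a hypothetical name, a NULL file pointer stands in for UNKNOWN_LOCATION, and the "// #APP" and "// #NO_APP" markers are written directly here instead of going through app_enable and the later app_disable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Format the assembler text for one asm string into BUF (at most SIZE
   bytes) and return what snprintf returns.  FILE_NAME == NULL models an
   asm insn with no source location, i.e. a compiler-generated one.  */
static int
format_asm_input (char *buf, size_t size, const char *string,
		  const char *file_name, int line)
{
  bool unknown_loc_p = (file_name == NULL);

  /* Compiler-generated asm: no APP/NO_APP bracket, no location line.  */
  if (unknown_loc_p)
    return snprintf (buf, size, "\t%s\n", string);

  /* User-written asm: bracket it and print where it came from.  */
  return snprintf (buf, size, "// #APP\n// %d \"%s\" 1\n\t%s\n// #NO_APP\n",
		   line, file_name, string);
}
```

With a NULL file name, only the tab-indented string is produced, matching the shortened output Tom shows in the message.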
Re: [PATCH v2] x86: Add TARGET_OMIT_VZEROUPPER_AFTER_AVX_READ_ZERO
On Mon, Feb 21, 2022 at 6:43 PM Hongtao Liu wrote: > > On Tue, Feb 22, 2022 at 2:35 AM H.J. Lu wrote: > > > > On Sun, Feb 20, 2022 at 6:01 PM Hongtao Liu wrote: > > > > > > On Thu, Feb 17, 2022 at 9:56 PM H.J. Lu wrote: > > > > > > > > On Thu, Feb 17, 2022 at 08:51:31AM +0100, Uros Bizjak wrote: > > > > > On Thu, Feb 17, 2022 at 6:25 AM Hongtao Liu via Gcc-patches > > > > > wrote: > > > > > > > > > > > > On Thu, Feb 17, 2022 at 12:26 PM H.J. Lu via Gcc-patches > > > > > > wrote: > > > > > > > > > > > > > > Reading YMM registers with all zero bits needs VZEROUPPER on > > > > > > > Sandy Bride, > > > > > > > Ivy Bridge, Haswell, Broadwell and Alder Lake to avoid SSE <-> AVX > > > > > > > transition penalty. Add TARGET_READ_ZERO_YMM_ZMM_NEED_VZEROUPPER > > > > > > > to > > > > > > > generate vzeroupper instruction after loading all-zero YMM/YMM > > > > > > > registers > > > > > > > and enable it by default. > > > > > > Shouldn't TARGET_READ_ZERO_YMM_ZMM_NONEED_VZEROUPPER sounds a bit > > > > > > smoother? > > > > > > Because originally we needed to add vzeroupper to all avx<->sse > > > > > > cases, > > > > > > now it's a tune to indicate that we don't need to add it in some > > > > > > > > > > Perhaps we should go from the other side and use > > > > > X86_TUNE_OPTIMIZE_AVX_READ for new processors? > > > > > > > > > > > > > Here is the v2 patch to add TARGET_OMIT_VZEROUPPER_AFTER_AVX_READ_ZERO. > > > > > > > The patch LGTM in general, but please rebase against > > > https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590541.html > > > and resend the patch, also wait a couple days in case Uros(and others) > > > have any comments. > > > > I am dropping my patch since it causes the compile-time regression. > I think only vextractif128 part is reverted, but we still have > vmovdqu(below) which should also cause penalty? 
commit fe79d652c96b53384ddfa43e312cb0010251391b Author: Richard Biener Date: Thu Feb 17 14:40:16 2022 +0100 target/104581 - compile-time regression in mode-switching has diff --git a/gcc/testsuite/gcc.target/i386/pr101456-1.c b/gcc/testsuite/gcc.target/i386/pr101456-1.c index 803fc6e0207..7fb3a3f055c 100644 --- a/gcc/testsuite/gcc.target/i386/pr101456-1.c +++ b/gcc/testsuite/gcc.target/i386/pr101456-1.c @@ -30,4 +30,5 @@ foo3 (void) bar (); } -/* { dg-final { scan-assembler-not "vzeroupper" } } */ +/* See PR104581 for the XFAIL reason. */ +/* { dg-final { scan-assembler-not "vzeroupper" { xfail *-*-* } } } */ and I checked in: commit 1931cbad498e625b1e24452dcfffe02539b12224 Author: H.J. Lu Date: Fri Feb 18 10:36:53 2022 -0800 pieces-memset-21.c: Expect vzeroupper for ia32 Update gcc.target/i386/pieces-memset-21.c to expect vzeroupper for ia32 caused by commit fe79d652c96b53384ddfa43e312cb0010251391b Author: Richard Biener Date: Thu Feb 17 14:40:16 2022 +0100 target/104581 - compile-time regression in mode-switching PR target/104581 * gcc.target/i386/pieces-memset-21.c: Expect vzeroupper for ia32. I believe that vmovdqu is also covered. -- H.J.
Re: [Patch] nvptx: Add -mptx=6.0 + -misa=sm_70
On 2/17/22 18:24, Tobias Burnus wrote:
> PTX version (-mptx=) [patch adds -mptx=6.0 as option]
> * Currently supported internally are 3.1 (CUDA 5.0, used by GCC <= 11), 6.0 (CUDA 9.0, current GCC 12 default), 6.3 (CUDA 10.0), 7.0 (CUDA 11.0)
> * -mptx= supports 3.1, 6.3, 7.0, but not the internal default 6.0

I tend not to think in terms of CUDA versions, but supported driver versions. In the end, drivers are used to translate ptx to SASS for execution, CUDA is just used for build time verification (or not, if it's not in the path). And a driver may or may not be supported. F.i. 390.x still may receive updates from nvidia, but there are JIT bugs that we've reported that they've decided not to fix, so from that point of view 390.x is unsupported.

> I think it makes sense to expose the 6.0 value to the user and not only use it internally behind the scenes. As it is already used internally, the change is tiny but user visible.

Sure, I've committed this (with a somewhat shorter commit log).

> Thus, it has to stay when we will bump the default in later GCC versions; on the other hand, if we bump the default, it might be also a good reason to have it to permit the user to have a backward compatible PTX output for linking libraries.

FWIW, I think that it's possible to link different versions of ptx isa together (though perhaps there are specific scenarios where that's not possible, I'm not sure). But mixing versions restricts the range of drivers you can use, so it may make sense to just use one version.

Thanks,
- Tom

nvptx: Add -mptx=6.0 Currently supported internally are 3.1, 6.0, 6.3 and 7.0. However, -mptx= supports 3.1, 6.3, 7.0, but not the internal default 6.0. Add -mptx=6.0 for consistency. Tested on nvptx. gcc/ChangeLog: * config/nvptx/nvptx.opt (mptx): Add 6.0 alias PTX_VERSION_6_0. * doc/invoke.texi (-mptx): Update for new values and defaults. 
Co-Authored-By: Tom de Vries --- gcc/config/nvptx/nvptx.opt | 3 +++ gcc/doc/invoke.texi| 7 --- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt index e56ec9288da..97e127cc4fb 100644 --- a/gcc/config/nvptx/nvptx.opt +++ b/gcc/config/nvptx/nvptx.opt @@ -82,6 +82,9 @@ Known PTX versions (for use with the -mptx= option): EnumValue Enum(ptx_version) String(3.1) Value(PTX_VERSION_3_1) +EnumValue +Enum(ptx_version) String(6.0) Value(PTX_VERSION_6_0) + EnumValue Enum(ptx_version) String(6.3) Value(PTX_VERSION_6_3) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 635c5f79278..56f3a01de44 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -27286,9 +27286,10 @@ strings must be lower-case. Valid ISA strings include @samp{sm_30} and @item -mptx=@var{version-string} @opindex mptx -Generate code for given the specified PTX version (e.g.@: @samp{6.3}). -Valid version strings include @samp{3.1} and @samp{6.3}. The default PTX -version is 3.1. +Generate code for given the specified PTX version (e.g.@: @samp{7.0}). +Valid version strings include @samp{3.1}, @samp{6.0}, @samp{6.3}, and +@samp{7.0}. The default PTX version is 6.0, unless a higher minimal +version is required for specified PTX ISA via option @option{-misa=}. @item -mmainkernel @opindex mmainkernel
Re: [Patch] nvptx: Add -mptx=6.0 + -misa=sm_70
On 2/17/22 18:24, Tobias Burnus wrote:
> SM version (-misa=) [Patch adds -misa=sm_70]
> * The compiler supports internally: SM_30, SM_35, SM_53, SM_70, SM_75, SM_80.

I'd formulate it like: it uses SM_70 internally to accurately formulate when certain insns can be used.

> I think it makes sense to have sm_70 in addition:
> * The current code actually does generate different code for >= sm_70 already.

Agreed. I've committed this (with a somewhat shorter commit log), and a test-case update.

Thanks,
- Tom

nvptx: Add -misa=sm_70 Add -misa=sm_70, and use it to specify the misa value in test-case gcc.target/nvptx/atomic-store-2.c. Tested on nvptx. gcc/ChangeLog: * config/nvptx/nvptx-c.cc (nvptx_cpu_cpp_builtins): Handle SM70. * config/nvptx/nvptx.cc (first_ptx_version_supporting_sm): Likewise. * config/nvptx/nvptx.opt (misa): Add sm_70 alias PTX_ISA_SM70. gcc/testsuite/ChangeLog: 2022-02-22 Tom de Vries * gcc.target/nvptx/atomic-store-2.c: Use -misa=sm_70. * gcc.target/nvptx/uniform-simt-3.c: Same. Co-Authored-By: Tom de Vries --- gcc/config/nvptx/nvptx-c.cc | 2 ++ gcc/config/nvptx/nvptx.cc | 2 ++ gcc/config/nvptx/nvptx.opt | 3 +++ gcc/testsuite/gcc.target/nvptx/atomic-store-2.c | 2 +- gcc/testsuite/gcc.target/nvptx/uniform-simt-3.c | 2 +- 5 files changed, 9 insertions(+), 2 deletions(-) diff --git a/gcc/config/nvptx/nvptx-c.cc b/gcc/config/nvptx/nvptx-c.cc index d68b9910d7e..b2375fb5b16 100644 --- a/gcc/config/nvptx/nvptx-c.cc +++ b/gcc/config/nvptx/nvptx-c.cc @@ -43,6 +43,8 @@ nvptx_cpu_cpp_builtins (void) cpp_define (parse_in, "__PTX_SM__=800"); else if (TARGET_SM75) cpp_define (parse_in, "__PTX_SM__=750"); + else if (TARGET_SM70) +cpp_define (parse_in, "__PTX_SM__=700"); else if (TARGET_SM53) cpp_define (parse_in, "__PTX_SM__=530"); else if (TARGET_SM35) diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc index 981b91f7095..858789e6df7 100644 --- a/gcc/config/nvptx/nvptx.cc +++ b/gcc/config/nvptx/nvptx.cc @@ -217,6 +217,8 @@ first_ptx_version_supporting_sm (enum 
ptx_isa sm) return PTX_VERSION_3_1; case PTX_ISA_SM53: return PTX_VERSION_4_2; +case PTX_ISA_SM70: + return PTX_VERSION_6_0; case PTX_ISA_SM75: return PTX_VERSION_6_3; case PTX_ISA_SM80: diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt index 97e127cc4fb..9776c3b9a1f 100644 --- a/gcc/config/nvptx/nvptx.opt +++ b/gcc/config/nvptx/nvptx.opt @@ -64,6 +64,9 @@ Enum(ptx_isa) String(sm_35) Value(PTX_ISA_SM35) EnumValue Enum(ptx_isa) String(sm_53) Value(PTX_ISA_SM53) +EnumValue +Enum(ptx_isa) String(sm_70) Value(PTX_ISA_SM70) + EnumValue Enum(ptx_isa) String(sm_75) Value(PTX_ISA_SM75) diff --git a/gcc/testsuite/gcc.target/nvptx/atomic-store-2.c b/gcc/testsuite/gcc.target/nvptx/atomic-store-2.c index cd5e4c38267..b58f33f2abd 100644 --- a/gcc/testsuite/gcc.target/nvptx/atomic-store-2.c +++ b/gcc/testsuite/gcc.target/nvptx/atomic-store-2.c @@ -2,7 +2,7 @@ shared state space. */ /* { dg-do compile } */ -/* { dg-options "-misa=sm_75" } */ +/* { dg-options "-misa=sm_70" } */ enum memmodel { diff --git a/gcc/testsuite/gcc.target/nvptx/uniform-simt-3.c b/gcc/testsuite/gcc.target/nvptx/uniform-simt-3.c index 532fa825161..b61b8ba9d5b 100644 --- a/gcc/testsuite/gcc.target/nvptx/uniform-simt-3.c +++ b/gcc/testsuite/gcc.target/nvptx/uniform-simt-3.c @@ -1,4 +1,4 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -muniform-simt -misa=sm_75" } */ +/* { dg-options "-O2 -muniform-simt -misa=sm_70" } */ #include "atomic-store-2.c"
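The relationship between -misa and the default -mptx can be summarised in a short C sketch. This is a simplified model under stated assumptions, not the nvptx back end: the enum and function names are hypothetical, the sm_30 and sm_35 entries and the sm_80 to 7.0 entry are assumptions (the latter taken from the -misa=sm_80 -mptx=7.0 pairing in the earlier test-case cleanup), and the "at least 6.0" rule follows the invoke.texi wording quoted in this thread.

```c
/* Hypothetical enums; the order of ptx_version matters because versions
   are compared with '>'.  */
enum ptx_isa { SM30, SM35, SM53, SM70, SM75, SM80 };
enum ptx_version { PTX_3_1, PTX_4_2, PTX_6_0, PTX_6_3, PTX_7_0 };

/* Lowest PTX version that can target SM.  The SM53, SM70 and SM75 entries
   follow the diff above; the others are assumptions.  */
static enum ptx_version
first_version_supporting (enum ptx_isa sm)
{
  switch (sm)
    {
    case SM30:
    case SM35:
      return PTX_3_1;
    case SM53:
      return PTX_4_2;
    case SM70:
      return PTX_6_0;
    case SM75:
      return PTX_6_3;
    case SM80:
      return PTX_7_0;
    }
  return PTX_7_0;
}

/* Default PTX version: 6.0, unless the selected ISA needs a higher one.  */
static enum ptx_version
default_version (enum ptx_isa sm)
{
  enum ptx_version first = first_version_supporting (sm);
  return first > PTX_6_0 ? first : PTX_6_0;
}
```

Under this model sm_53 and sm_70 both default to PTX 6.0, while sm_75 and sm_80 raise the default to 6.3 and 7.0 respectively, which is why the explicit -mptx settings in the tests became redundant.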
Re: [Patch] nvptx: Add -mptx=6.0 + -misa=sm_70
On 2/17/22 18:24, Tobias Burnus wrote: diff --git a/gcc/config/nvptx/t-omp-device b/gcc/config/nvptx/t-omp-device index 8765d9f1881..4228218a424 100644 --- a/gcc/config/nvptx/t-omp-device +++ b/gcc/config/nvptx/t-omp-device @@ -1,4 +1,4 @@ omp-device-properties-nvptx: $(srcdir)/config/nvptx/nvptx.cc echo kind: gpu > $@ echo arch: nvptx >> $@ - echo isa: sm_30 sm_35 >> $@ + echo isa: sm_30 sm_35 sm_53 sm_70 sm_75 sm_80 >> $@ I'm not sure I understand how this is used. Is this user-visible? Is there a libgomp test-case where we can observe a difference? Thanks, - Tom
Re: [pushed] LRA, rs6000, Darwin: Amend lo_sum use for forced constants [PR104117].
On 2022-02-20 12:34, Iain Sandoe wrote:
> ^^^ this is mostly for my education - the stuff below is a potential solution to leaving lra-constraints unchanged and fixing the Darwin bug….

I'd be really glad if you do manage to fix this w/o changing LRA. Richard has a legitimate point that my proposed change in LRA prohibiting `...;reg=lo_sum; ...mem[reg]` might force LRA to generate less optimized code, or might even make LRA generate unrecognized insns `reg = original addr` for some ports, requiring further fixes in the machine-dependent code of those ports.
Re: [PING][PATCH][libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end
On 5/19/21 16:52, Tom de Vries wrote: On 4/23/21 6:48 PM, Tom de Vries wrote: On 4/23/21 5:45 PM, Alexander Monakov wrote: On Thu, 22 Apr 2021, Tom de Vries wrote: Ah, I see, agreed, that makes sense. I was afraid there was some fundamental problem that I overlooked. Here's an updated version. I've tried to make it clear that the futex_wait/wake are locally used versions, not generic functionality. Could you please regenerate the patch passing appropriate flags to 'git format-patch' so it presents a rewrite properly (see documentation for --patience and --break-rewrites options). The attached patch was mostly unreadable, I'm afraid. Sure. I did notice that the patch was not readable, but I didn't know there were options to improve that, so thanks for pointing that out. Ping. Any comments? I've hardcoded do_spin to 1, and tested on: - turing, pascal, maxwell (510.x driver) - kepler (470.x driver) Committed. Thanks, - Tom
[PATCH v4 03/12] arm: Add support for VPR_REG in arm_class_likely_spilled_p
From: Christophe Lyon VPR_REG is the only register in its class, so it should be handled by TARGET_CLASS_LIKELY_SPILLED_P, which is achieved by calling default_class_likely_spilled_p. No test fails without this patch, but it seems it should be implemented. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon gcc/ * config/arm/arm.cc (arm_class_likely_spilled_p): Handle VPR_REG. diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc index 9c19589186f..8d7f095b59b 100644 --- a/gcc/config/arm/arm.cc +++ b/gcc/config/arm/arm.cc @@ -29369,7 +29369,7 @@ arm_class_likely_spilled_p (reg_class_t rclass) || rclass == CC_REG) return true; - return false; + return default_class_likely_spilled_p (rclass); } /* Implements target hook small_register_classes_for_mode_p. */ -- 2.25.1
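The reasoning in this message (a class that holds a single register should be reported as likely spilled) can be illustrated with a small model. This is a sketch under the assumption that the default hook answers "yes" exactly for one-register classes, as the commit message implies; the toy_ names and the bit-set representation of a class are hypothetical and not GCC's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: a register class is a bit set over at most 64 hard registers.  */
static unsigned
toy_class_size (uint64_t regs)
{
  unsigned n = 0;
  /* Clear the lowest set bit until none remain.  */
  for (; regs != 0; regs &= regs - 1)
    n++;
  return n;
}

/* Assumed default policy: a class with exactly one register is likely
   to be spilled, because any two live values compete for it.  */
static bool
toy_class_likely_spilled_p (uint64_t regs)
{
  return toy_class_size (regs) == 1;
}

/* Usage: a VPR_REG-like class versus a class with several registers.  */
static void
toy_likely_spilled_demo (void)
{
  assert (toy_class_likely_spilled_p ((uint64_t) 1 << 42));
  assert (!toy_class_likely_spilled_p (0x5fff));
}
```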
[PATCH v4 02/12] arm: Add GENERAL_AND_VPR_REGS regclass
From: Christophe Lyon At some point during the development of this patch series, it appeared that in some cases the register allocator wants “VPR or general” rather than “VPR or general or FP” (which is the same thing as ALL_REGS). The series does not seem to require this anymore, but it seems to be a good thing to do anyway, to give the register allocator more freedom. CLASS_MAX_NREGS and arm_hard_regno_nregs need adjustment to avoid a regression in gcc.dg/stack-usage-1.c when compiled with -mthumb -mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon gcc/ * config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS. (REG_CLASS_NAMES): Likewise. (REG_CLASS_CONTENTS): Likewise. (CLASS_MAX_NREGS): Handle VPR. * config/arm/arm.cc (arm_hard_regno_nregs): Handle VPR. diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc index 663f4595050..9c19589186f 100644 --- a/gcc/config/arm/arm.cc +++ b/gcc/config/arm/arm.cc @@ -25339,6 +25339,9 @@ thumb2_asm_output_opcode (FILE * stream) static unsigned int arm_hard_regno_nregs (unsigned int regno, machine_mode mode) { + if (IS_VPR_REGNUM (regno)) +return CEIL (GET_MODE_SIZE (mode), 2); + if (TARGET_32BIT && regno > PC_REGNUM && regno != FRAME_POINTER_REGNUM diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index f52724d01ad..61c02218b78 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -1287,6 +1287,7 @@ enum reg_class SFP_REG, AFP_REG, VPR_REG, + GENERAL_AND_VPR_REGS, ALL_REGS, LIM_REG_CLASSES }; @@ -1316,6 +1317,7 @@ enum reg_class "SFP_REG", \ "AFP_REG", \ "VPR_REG", \ + "GENERAL_AND_VPR_REGS", \ "ALL_REGS" \ } @@ -1344,6 +1346,7 @@ enum reg_class { 0x, 0x, 0x, 0x0040 }, /* SFP_REG */\ { 0x, 0x, 0x, 0x0080 }, /* AFP_REG */\ { 0x, 0x, 0x, 0x0400 }, /* VPR_REG. */ \ + { 0x5FFF, 0x, 0x, 0x0400 }, /* GENERAL_AND_VPR_REGS. */ \ { 0x7FFF, 0x, 0x, 0x000F } /* ALL_REGS. 
*/ \ } @@ -1453,7 +1456,9 @@ extern const char *fp_sysreg_names[NB_FP_SYSREGS]; ARM regs are UNITS_PER_WORD bits. FIXME: Is this true for iWMMX? */ #define CLASS_MAX_NREGS(CLASS, MODE) \ - (ARM_NUM_REGS (MODE)) + (CLASS == VPR_REG) \ + ? CEIL (GET_MODE_SIZE (MODE), 2)\ + : (ARM_NUM_REGS (MODE)) /* If defined, gives a class of registers that cannot be used as the operand of a SUBREG that changes the mode of the object illegally. */ -- 2.25.1
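The CEIL (GET_MODE_SIZE (mode), 2) computation used for VPR in both hunks can be checked with plain integers. This is a minimal sketch: TOY_CEIL mirrors the usual round-up division idiom, the divisor 2 reflects VPR holding 16 bits, and the toy_ names are hypothetical.

```c
#include <assert.h>

/* Round-up integer division.  */
#define TOY_CEIL(x, y) (((x) + (y) - 1) / (y))

/* Size in bytes of the predicate register in this model (16 bits).  */
#define TOY_VPR_BYTES 2u

/* Number of VPR-sized registers needed to hold a value of the given size.  */
static unsigned
toy_vpr_nregs (unsigned mode_size_bytes)
{
  assert (mode_size_bytes > 0);
  return TOY_CEIL (mode_size_bytes, TOY_VPR_BYTES);
}
```

A 2-byte mode such as HImode therefore needs one register in this model, while dividing by the 4-byte word size used for core registers would describe a 16-bit register incorrectly.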
[PATCH v4 04/12] arm: Fix mve_vmvnq_n_ argument mode
From: Christophe Lyon The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use the <V_elem> iterator instead of HI in mve_vmvnq_n_. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon gcc/ * config/arm/mve.md (mve_vmvnq_n_): Use V_elem mode for operand 1. diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md index 171dd384133..5c3b34dce3a 100644 --- a/gcc/config/arm/mve.md +++ b/gcc/config/arm/mve.md @@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_" (define_insn "mve_vmvnq_n_" [ (set (match_operand:MVE_5 0 "s_register_operand" "=w") - (unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")] + (unspec:MVE_5 [(match_operand:<V_elem> 1 "immediate_operand" "i")] VMVNQ_N)) ] "TARGET_HAVE_MVE" -- 2.25.1
[PATCH v4 01/12] arm: Add new tests for comparison vectorization with Neon and MVE
From: Christophe Lyon This patch mainly adds Neon tests similar to existing MVE ones, to make sure we do not break Neon when fixing MVE. mve-vcmp-f32-2.c is similar to mve-vcmp-f32.c but uses a conditional with 2.0f and 3.0f constants to help scan-assembler-times. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon gcc/testsuite/ * gcc.target/arm/simd/mve-vcmp-f32-2.c: New. * gcc.target/arm/simd/neon-compare-1.c: New. * gcc.target/arm/simd/neon-compare-2.c: New. * gcc.target/arm/simd/neon-compare-3.c: New. * gcc.target/arm/simd/neon-compare-scalar-1.c: New. * gcc.target/arm/simd/neon-vcmp-f16.c: New. * gcc.target/arm/simd/neon-vcmp-f32-2.c: New. * gcc.target/arm/simd/neon-vcmp-f32-3.c: New. * gcc.target/arm/simd/neon-vcmp-f32.c: New. * gcc.target/arm/simd/neon-vcmp.c: New. diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c new file mode 100644 index 000..917a95bf141 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c @@ -0,0 +1,32 @@ +/* { dg-do assemble } */ +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ +/* { dg-add-options arm_v8_1m_mve_fp } */ +/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */ + +#include + +#define NB 4 + +#define FUNC(OP, NAME) \ + void test_ ## NAME ##_f (float * __restrict__ dest, float *a, float *b) { \ +int i; \ +for (i=0; i, vcmpgt) +FUNC(>=, vcmpge) + +/* { dg-final { scan-assembler-times {\tvcmp.f32\teq, q[0-9]+, q[0-9]+\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tvcmp.f32\tne, q[0-9]+, q[0-9]+\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tvcmp.f32\tlt, q[0-9]+, q[0-9]+\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tvcmp.f32\tle, q[0-9]+, q[0-9]+\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tvcmp.f32\tgt, q[0-9]+, q[0-9]+\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tvcmp.f32\tge, q[0-9]+, 
q[0-9]+\n} 1 } } */ +/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 24 } } */ /* Constant 2.0f. */ +/* { dg-final { scan-assembler-times {\t.word\t1077936128\n} 24 } } */ /* Constant 3.0f. */ diff --git a/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c b/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c new file mode 100644 index 000..2e0222a71f2 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c @@ -0,0 +1,78 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-add-options arm_neon } */ +/* { dg-additional-options "-O3" } */ + +#include "mve-compare-1.c" + +/* 64-bit vectors. */ +/* vmvn is used by 'ne' comparisons: 3 sizes * 2 (signed/unsigned) * 2 + (register/zero) = 12. */ +/* { dg-final { scan-assembler-times {\tvmvn\td[0-9]+, d[0-9]+\n} 12 } } */ + +/* { 8 bits } x { eq, ne, lt, le, gt, ge }. */ +/* ne uses eq, lt/le only apply to comparison with zero, they use gt/ge + otherwise. */ +/* { dg-final { scan-assembler-times {\tvceq.i8\td[0-9]+, d[0-9]+, d[0-9]+\n} 4 } } */ +/* { dg-final { scan-assembler-times {\tvceq.i8\td[0-9]+, d[0-9]+, #0\n} 4 } } */ +/* { dg-final { scan-assembler-times {\tvclt.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tvcle.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tvcgt.s8\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */ +/* { dg-final { scan-assembler-times {\tvcgt.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tvcge.s8\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */ +/* { dg-final { scan-assembler-times {\tvcge.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } */ + +/* { 16 bits } x { eq, ne, lt, le, gt, ge }. 
*/ +/* { dg-final { scan-assembler-times {\tvceq.i16\td[0-9]+, d[0-9]+, d[0-9]+\n} 4 } } */ +/* { dg-final { scan-assembler-times {\tvceq.i16\td[0-9]+, d[0-9]+, #0\n} 4 } } */ +/* { dg-final { scan-assembler-times {\tvclt.s16\td[0-9]+, d[0-9]+, #0\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tvcle.s16\td[0-9]+, d[0-9]+, #0\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tvcgt.s16\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */ +/* { dg-final { scan-assembler-times {\tvcgt.s16\td[0-9]+, d[0-9]+, #0\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tvcge.s16\td[0-9]+, d[0-9]+, d[0-9]+\n} 2 } } */ +/* { dg-final { scan-assembler-times {\tvcge.s16\td[0-9]+, d[0-9]+, #0\n} 1 } } */ + +/* { 32 bits } x { eq, ne, lt, le, gt, ge }. */ +/* { dg-final { scan-assembler-times {\tvceq.i32\td[0-9]+, d[0-9]+, d[0-9]+\n} 4 } } */ +/* { dg-final { scan-assembler-times {\tvceq.i32\td[0-9]+, d[0-9]+, #0\n} 4 } } */ +/* { dg-final { scan-assembler-times {\tvclt.s32\td[0-9]+, d[0-9]+, #0\n} 1 } } */ +/* { dg-final { scan-assembler-times {
[PATCH v4 00/12] ARM/MVE use vectors of boolean for predicates
From: Christophe Lyon

This is v4 of this patch series, fixing issues I discovered before committing v2 (which had been approved). I am posting it for the record of what I am going to commit after I implemented all the requested changes to v3. Thanks a lot to Richard Sandiford for his help. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee.

The changes v3 -> v4 are:
Patch 5: Use build_truth_vector_type_for_mode to construct the boolean types. Also fix the definition of B2Imode etc in init_emit_once.
Patches 6 and 7: Squash code change and testcases as requested during the review.

Original text (patch numbers no longer match because of the squashes):

This patch series addresses PR 100757 and 101325 by representing vectors of predicates (MVE VPR.P0 register) as vectors of booleans rather than using HImode. As this implies a lot of mostly mechanical changes, I have tried to split the patches in a way that should help reviewers, but the split is a bit artificial.

Patches 1-3 add new tests. Patches 4-6 are small independent improvements. Patch 7 implements the predicate qualifier, but does not change any builtin yet. Patch 8 is the first of the two main patches, and uses the new qualifier to describe the vcmp and vpsel builtins that are useful for auto-vectorization of comparisons. Patch 9 is the second main patch, which fixes the vcond_mask expander. Patches 10-13 convert almost all the remaining builtins with HI operands to use the predicate qualifier. After these, there are still a few builtins with HI operands left, about which I am not sure: vctp, vpnot, load-gather and store-scatter with v2di operands. In fact, patches 11/12 update some STR/LDR qualifiers in a way that breaks these v2di builtins although existing tests still pass.
Christophe Lyon (12): arm: Add new tests for comparison vectorization with Neon and MVE arm: Add GENERAL_AND_VPR_REGS regclass arm: Add support for VPR_REG in arm_class_likely_spilled_p arm: Fix mve_vmvnq_n_ argument mode arm: Implement MVE predicates as vectors of booleans arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates arm: Fix vcond_mask expander for MVE (PR target/100757) arm: Convert remaining MVE vcmp builtins to predicate qualifiers arm: Convert more MVE builtins to predicate qualifiers arm: Convert more load/store MVE builtins to predicate qualifiers arm: Convert more MVE/CDE builtins to predicate qualifiers arm: Add VPR_REG to ALL_REGS gcc/config/aarch64/aarch64-modes.def | 8 +- gcc/config/arm/arm-builtins.cc| 239 -- gcc/config/arm/arm-builtins.h | 4 +- gcc/config/arm/arm-modes.def | 8 + gcc/config/arm/arm-protos.h | 4 +- gcc/config/arm/arm-simd-builtin-types.def | 4 + gcc/config/arm/arm.cc | 166 ++-- gcc/config/arm/arm.h | 9 +- gcc/config/arm/arm_mve_builtins.def | 746 gcc/config/arm/constraints.md | 6 + gcc/config/arm/iterators.md | 6 + gcc/config/arm/mve.md | 795 ++ gcc/config/arm/neon.md| 39 + gcc/config/arm/vec-common.md | 52 -- gcc/config/arm/vfp.md | 34 +- gcc/doc/sourcebuild.texi | 4 + gcc/emit-rtl.cc | 28 +- gcc/genmodes.cc | 71 +- gcc/machmode.def | 11 +- gcc/rtx-vector-builder.cc | 4 +- gcc/simplify-rtx.cc | 34 +- gcc/testsuite/gcc.dg/rtl/arm/mve-vxbi.c | 89 ++ gcc/testsuite/gcc.dg/signbit-2.c | 1 + .../gcc.target/arm/simd/mve-vcmp-f32-2.c | 32 + .../gcc.target/arm/simd/neon-compare-1.c | 78 ++ .../gcc.target/arm/simd/neon-compare-2.c | 13 + .../gcc.target/arm/simd/neon-compare-3.c | 14 + .../arm/simd/neon-compare-scalar-1.c | 57 ++ .../gcc.target/arm/simd/neon-vcmp-f16.c | 12 + .../gcc.target/arm/simd/neon-vcmp-f32-2.c | 15 + .../gcc.target/arm/simd/neon-vcmp-f32-3.c | 12 + .../gcc.target/arm/simd/neon-vcmp-f32.c | 12 + gcc/testsuite/gcc.target/arm/simd/neon-vcmp.c | 22 + .../gcc.target/arm/simd/pr100757-2.c | 
20 + .../gcc.target/arm/simd/pr100757-3.c | 20 + .../gcc.target/arm/simd/pr100757-4.c | 19 + gcc/testsuite/gcc.target/arm/simd/pr100757.c | 19 + .../gcc.target/arm/simd/pr101325-2.c | 19 + gcc/testsuite/gcc.target/arm/simd/pr101325.c | 14 + gcc/testsuite/lib/target-supports.exp | 15 +- gcc/varasm.cc | 7 +- 41 files changed, 1738 insertions(+), 1024 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/rtl/arm/mve-vxbi.c create mo
[PATCH v4 05/12] arm: Implement MVE predicates as vectors of booleans
From: Christophe Lyon This patch implements support for vectors of booleans to support MVE predicates, instead of HImode. Since the ABI mandates pred16_t (aka uint16_t) to represent predicates in intrinsics prototypes, we introduce a new "predicate" type qualifier so that we can map relevant builtins HImode arguments and return value to the appropriate vector of booleans (VxBI). We have to update test_vector_ops_duplicate, because it iterates using an offset in bytes, where we would need to iterate in bits: we stop iterating when we reach the end of the vector of booleans. In addition, we have to fix the underlying definition of vectors of booleans because ARM/MVE needs a different representation than AArch64/SVE. With ARM/MVE the 'true' bit is duplicated over the element size, so that a true element of V4BI is represented by '0b'. This patch updates the aarch64 definition of VNx*BI as needed. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon Richard Sandiford gcc/ PR target/100757 PR target/101325 * config/aarch64/aarch64-modes.def (VNx16BI, VNx8BI, VNx4BI, VNx2BI): Update definition. * config/arm/arm-builtins.cc (arm_init_simd_builtin_types): Add new simd types. (arm_init_builtin): Map predicate vectors arguments to HImode. (arm_expand_builtin_args): Move HImode predicate arguments to VxBI rtx. Move return value to HImode rtx. * config/arm/arm-builtins.h (arm_type_qualifiers): Add qualifier_predicate. * config/arm/arm-modes.def (B2I, B4I, V16BI, V8BI, V4BI): New modes. * config/arm/arm-simd-builtin-types.def (Pred1x16_t, Pred2x8_t,Pred4x4_t): New. * emit-rtl.cc (init_emit_once): Handle all boolean modes. * genmodes.cc (mode_data): Add boolean field. (blank_mode): Initialize it. (make_complex_modes): Fix handling of boolean modes. (make_vector_modes): Likewise. (VECTOR_BOOL_MODE): Use new COMPONENT parameter. (make_vector_bool_mode): Likewise. (BOOL_MODE): New. 
(make_bool_mode): New. (emit_insn_modes_h): Fix generation of boolean modes. (emit_class_narrowest_mode): Likewise. * machmode.def: (VECTOR_BOOL_MODE): Document new COMPONENT parameter. Use new BOOL_MODE instead of FRACTIONAL_INT_MODE to define BImode. * rtx-vector-builder.cc (rtx_vector_builder::find_cached_value): Fix handling of constm1_rtx for VECTOR_BOOL. * simplify-rtx.cc (native_encode_rtx): Fix support for VECTOR_BOOL. (native_decode_vector_rtx): Likewise. (test_vector_ops_duplicate): Skip vec_merge test with vectors of booleans. * varasm.cc (output_constant_pool_2): Likewise. diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def index 976bf9b42be..8f399225a80 100644 --- a/gcc/config/aarch64/aarch64-modes.def +++ b/gcc/config/aarch64/aarch64-modes.def @@ -47,10 +47,10 @@ ADJUST_FLOAT_FORMAT (HF, &ieee_half_format); /* Vector modes. */ -VECTOR_BOOL_MODE (VNx16BI, 16, 2); -VECTOR_BOOL_MODE (VNx8BI, 8, 2); -VECTOR_BOOL_MODE (VNx4BI, 4, 2); -VECTOR_BOOL_MODE (VNx2BI, 2, 2); +VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2); +VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2); +VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2); +VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2); ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8); ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4); diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc index e6bbda23e3e..993a2f7b082 100644 --- a/gcc/config/arm/arm-builtins.cc +++ b/gcc/config/arm/arm-builtins.cc @@ -1553,11 +1553,28 @@ arm_init_simd_builtin_types (void) tree eltype = arm_simd_types[i].eltype; machine_mode mode = arm_simd_types[i].mode; - if (eltype == NULL) + if (eltype == NULL + /* VECTOR_BOOL is not supported unless MVE is activated, +this would make build_truth_vector_type_for_mode +crash. 
*/ + && ((GET_MODE_CLASS (mode) != MODE_VECTOR_BOOL) + || !TARGET_HAVE_MVE)) continue; if (arm_simd_types[i].itype == NULL) { - tree type = build_vector_type (eltype, GET_MODE_NUNITS (mode)); + tree type; + if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL) + { + /* Handle MVE predicates: they are internally stored as +16 bits, but are used as vectors of 1, 2 or 4-bit +elements. */ + type = build_truth_vector_type_for_mode (GET_MODE_NUNITS (mode), + mode); + eltype = TREE_TYPE (type); + } + else + type = build_vector_type (eltype, GET_MODE_NUNITS (mode)); + type = build_distinct_type_copy (type); SET_TYPE_STRUCTU
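The predicate layout described in this message (16 bits in total, with the 'true' bit duplicated over 1, 2 or 4 bits per element) can be sketched as a small encoder. This is an illustrative model only: the function name is hypothetical, and placing lane 0 in the least significant bits is an assumption of the sketch rather than something stated in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a boolean vector as a 16-bit predicate.  Bit I of LANE_BITS says
   whether lane I is true; NLANES is 16, 8 or 4 (V16BI, V8BI, V4BI in this
   model).  Each true lane sets all 16 / NLANES bits that belong to it.  */
static uint16_t
toy_mve_pred_encode (unsigned lane_bits, unsigned nlanes)
{
  assert (nlanes == 4 || nlanes == 8 || nlanes == 16);
  unsigned bits_per_lane = 16 / nlanes;
  unsigned lane_ones = (1u << bits_per_lane) - 1;
  unsigned mask = 0;

  for (unsigned i = 0; i < nlanes; i++)
    if (lane_bits & (1u << i))
      mask |= lane_ones << (i * bits_per_lane);

  return (uint16_t) mask;
}
```

For instance, a 4-lane vector whose lanes 0, 2 and 3 are true encodes to 0xFF0F here, whereas a 16-lane vector maps one lane to one bit.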
[PATCH v4 06/12] arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates
From: Christophe Lyon We make use of qualifier_predicate to describe MVE builtins prototypes, restricting to auto-vectorizable vcmp* and vpsel builtins, as they are exercised by the tests added earlier in the series. Special handling is needed for mve_vpselq because it has a v2di variant, which has no natural VPR.P0 representation: we keep HImode for it. The vector_compare expansion code is updated to use the right VxBI mode instead of HI for the result. We extend the existing thumb2_movhi_vfp and thumb2_movhi_fp16 patterns to use the new MVE_7_HI iterator which covers HI and the new VxBI modes, in conjunction with the new DB constraint for a constant vector of booleans. This patch also adds tests derived from the one provided in PR target/101325: there is a compile-only test because I did not have access to anything that could execute MVE code until recently. I have been able to add an executable test since QEMU supports MVE. Instead of adding arm_v8_1m_mve_hw, I update arm_mve_hw so that it uses add_options_for_arm_v8_1m_mve_fp, like arm_neon_hw does. This ensures arm_mve_hw passes even if the toolchain does not generate MVE code by default. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon Richard Sandiford gcc/ PR target/100757 PR target/101325 * config/arm/arm-builtins.cc (BINOP_PRED_UNONE_UNONE_QUALIFIERS) (BINOP_PRED_NONE_NONE_QUALIFIERS) (TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS) (TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New. * config/arm/arm-protos.h (mve_const_bool_vec_to_hi): New. * config/arm/arm.cc (arm_hard_regno_mode_ok): Handle new VxBI modes. (arm_mode_to_pred_mode): New. (arm_expand_vector_compare): Use the right VxBI mode instead of HI. (arm_expand_vcond): Likewise. (simd_valid_immediate): Handle MODE_VECTOR_BOOL. (mve_const_bool_vec_to_hi): New. (neon_make_constant): Call mve_const_bool_vec_to_hi when needed. 
* config/arm/arm_mve_builtins.def (vcmpneq_, vcmphiq_, vcmpcsq_) (vcmpltq_, vcmpleq_, vcmpgtq_, vcmpgeq_, vcmpeqq_, vcmpneq_f) (vcmpltq_f, vcmpleq_f, vcmpgtq_f, vcmpgeq_f, vcmpeqq_f, vpselq_u) (vpselq_s, vpselq_f): Use new predicated qualifiers. * config/arm/constraints.md (DB): New. * config/arm/iterators.md (MVE_7, MVE_7_HI): New mode iterators. (MVE_VPRED, MVE_vpred): New attribute iterators. * config/arm/mve.md (@mve_vcmpq_) (@mve_vcmpq_f, @mve_vpselq_) (@mve_vpselq_f): Use MVE_VPRED instead of HI. (@mve_vpselq_v2di): Define separately. (mov): New expander for VxBI modes. * config/arm/vfp.md (thumb2_movhi_vfp, thumb2_movhi_fp16): Use MVE_7_HI iterator and add support for DB constraint. gcc/testsuite/ PR target/100757 PR target/101325 * gcc.dg/rtl/arm/mve-vxbi.c: New test. * gcc.target/arm/simd/pr101325.c: New. * gcc.target/arm/simd/pr101325-2.c: New. * lib/target-supports.exp (check_effective_target_arm_mve_hw): Use add_options_for_arm_v8_1m_mve_fp. diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc index 993a2f7b082..1c6b9c986ee 100644 --- a/gcc/config/arm/arm-builtins.cc +++ b/gcc/config/arm/arm-builtins.cc @@ -420,6 +420,12 @@ arm_binop_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define BINOP_UNONE_UNONE_UNONE_QUALIFIERS \ (arm_binop_unone_unone_unone_qualifiers) +static enum arm_type_qualifiers +arm_binop_pred_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_predicate, qualifier_unsigned, qualifier_unsigned }; +#define BINOP_PRED_UNONE_UNONE_QUALIFIERS \ + (arm_binop_pred_unone_unone_qualifiers) + static enum arm_type_qualifiers arm_binop_unone_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_unsigned, qualifier_none, qualifier_immediate }; @@ -438,6 +444,12 @@ arm_binop_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define BINOP_UNONE_NONE_NONE_QUALIFIERS \ (arm_binop_unone_none_none_qualifiers) +static enum arm_type_qualifiers +arm_binop_pred_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { 
qualifier_predicate, qualifier_none, qualifier_none }; +#define BINOP_PRED_NONE_NONE_QUALIFIERS \ + (arm_binop_pred_none_none_qualifiers) + static enum arm_type_qualifiers arm_binop_unone_unone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_unsigned, qualifier_unsigned, qualifier_none }; @@ -509,6 +521,12 @@ arm_ternop_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS \ (arm_ternop_none_none_none_unone_qualifiers) +static enum arm_type_qualifiers +arm_ternop_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_none, qualifier_predicate }; +#define TERNOP_NONE
[PATCH v4 07/12] arm: Fix vcond_mask expander for MVE (PR target/100757)
From: Christophe Lyon The problem in this PR is that we call VPSEL with a mask of vector type instead of HImode. This happens because operand 3 in vcond_mask is the pre-computed vector comparison and has vector type. This patch fixes it by implementing TARGET_VECTORIZE_GET_MASK_MODE, returning the appropriate VxBI mode when targeting MVE. In turn, this implies implementing vec_cmp, vec_cmpu and vcond_mask_, and we can move vec_cmp, vec_cmpu and vcond_mask_ back to neon.md since they are not used by MVE anymore. The new * patterns listed above are implemented in mve.md since they are only valid for MVE. However this may make maintenance/comparison more painful than having all of them in vec-common.md. In the process, we can get rid of the recently added vcond_mve parameter of arm_expand_vector_compare. Compared to neon.md's vcond_mask_ before my "arm: Auto-vectorization for MVE: vcmp" patch (r12-834), it keeps the VDQWH iterator added in r12-835 (to have V4HF/V8HF support), as well as the (! || flag_unsafe_math_optimizations) condition which was not present before r12-834 although SF modes were enabled by VDQW (I think this was a bug). Using TARGET_VECTORIZE_GET_MASK_MODE has the advantage that we no longer need to generate vpsel with vectors of 0 and 1: the masks are now merged via scalar 'ands' instructions operating on 16-bit masks after converting the boolean vectors. In addition, this patch fixes a problem in arm_expand_vcond() where the result would be a vector of 0 or 1 instead of operand 1 or 2. Since we want to skip gcc.dg/signbit-2.c for MVE, we also add a new arm_mve effective target. 
Reducing the number of iterations in pr100757-3.c from 32 to 8, we generate the code below:

float a[32];
float fn1(int d) {
  float c = 4.0f;
  for (int b = 0; b < 8; b++)
    if (a[b] != 2.0f)
      c = 5.0f;
  return c;
}

fn1:
        ldr     r3, .L3+48
        vldr.64 d4, .L3              // q2=(2.0,2.0,2.0,2.0)
        vldr.64 d5, .L3+8
        vldrw.32        q0, [r3]     // q0=a(0..3)
        adds    r3, r3, #16
        vcmp.f32        eq, q0, q2   // cmp a(0..3) == (2.0,2.0,2.0,2.0)
        vldrw.32        q1, [r3]     // q1=a(4..7)
        vmrs    r3, P0
        vcmp.f32        eq, q1, q2   // cmp a(4..7) == (2.0,2.0,2.0,2.0)
        vmrs    r2, P0  @ movhi
        ands    r3, r3, r2           // r3=select(a(0..3)) & select(a(4..7))
        vldr.64 d4, .L3+16           // q2=(5.0,5.0,5.0,5.0)
        vldr.64 d5, .L3+24
        vmsr    P0, r3
        vldr.64 d6, .L3+32           // q3=(4.0,4.0,4.0,4.0)
        vldr.64 d7, .L3+40
        vpsel   q3, q3, q2           // q3=vcond_mask(4.0,5.0)
        vmov.32 r2, q3[1]            // keep the scalar max
        vmov.32 r0, q3[3]
        vmov.32 r3, q3[2]
        vmov.f32        s11, s12
        vmov    s15, r2
        vmov    s14, r3
        vmaxnm.f32      s15, s11, s15
        vmaxnm.f32      s15, s15, s14
        vmov    s14, r0
        vmaxnm.f32      s15, s15, s14
        vmov    r0, s15
        bx      lr
.L4:
        .align  3
.L3:
        .word   1073741824      // 2.0f
        .word   1073741824
        .word   1073741824
        .word   1073741824
        .word   1084227584      // 5.0f
        .word   1084227584
        .word   1084227584
        .word   1084227584
        .word   1082130432      // 4.0f
        .word   1082130432
        .word   1082130432
        .word   1082130432

This patch adds tests that trigger an ICE without this fix. The pr100757*.c testcases are derived from gcc.c-torture/compile/20160205-1.c, forcing the use of MVE, and using various types and return values different from 0 and 1 to avoid commonalization with boolean masks. In addition, since we should not need these masks, the tests make sure they are not present. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon PR target/100757 gcc/ * config/arm/arm-protos.h (arm_get_mask_mode): New prototype. (arm_expand_vector_compare): Update prototype. * config/arm/arm.cc (TARGET_VECTORIZE_GET_MASK_MODE): New. (arm_vector_mode_supported_p): Add support for VxBI modes.
(arm_expand_vector_compare): Remove useless generation of vpsel. (arm_expand_vcond): Fix select operands. (arm_get_mask_mode): New. * config/arm/mve.md (vec_cmp): New. (vec_cmpu): New. (vcond_mask_): New. * config/arm/vec-common.md (vec_cmp) (vec_cmpu): Move to ... * config/arm/neon.md (vec_cmp) (vec_cmpu): ... here and disable for MVE. * doc/sourcebuild.texi (arm_mve): Document new effective-target. gcc/testsuite/ PR target/100757 * gcc.target/arm/simd/pr100757-2.c: New. * gcc.target/arm/simd/pr100757-3.c: New. * gcc.target/arm/simd/pr100757-4.c: New. * gcc.target/arm/simd/pr100757.c: New.
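The mask handling this patch describes (two vector compares whose 16-bit predicates are merged with a scalar AND, followed by a per-lane select and a max reduction) can be modelled in scalar C. This is a sketch that mimics the shape of the generated code for fn1, not the compiler's output: the toy_ helpers are hypothetical, and each 32-bit lane is assumed to own 4 predicate bits with lane 0 in the low bits.

```c
#include <assert.h>
#include <stdint.h>

/* Compare four floats against K; each true lane sets its 4 predicate bits.  */
static uint16_t
toy_vcmp_eq_f32 (const float *a, float k)
{
  unsigned pred = 0;
  for (unsigned i = 0; i < 4; i++)
    if (a[i] == k)
      pred |= 0xFu << (4 * i);
  return (uint16_t) pred;
}

/* Per-lane select: lane I comes from T when all of its predicate bits
   are set, otherwise from F.  */
static void
toy_vpsel_f32 (float *dst, const float *t, const float *f, uint16_t pred)
{
  for (unsigned i = 0; i < 4; i++)
    dst[i] = ((pred >> (4 * i)) & 0xFu) == 0xFu ? t[i] : f[i];
}

/* Model of the 8-iteration fn1: 4.0 if every element equals 2.0, else 5.0.  */
static float
toy_fn1 (const float a[8])
{
  static const float four[4] = { 4.0f, 4.0f, 4.0f, 4.0f };
  static const float five[4] = { 5.0f, 5.0f, 5.0f, 5.0f };
  float sel[4];

  /* The two compare results are merged with a plain scalar AND.  */
  uint16_t pred = toy_vcmp_eq_f32 (a, 2.0f) & toy_vcmp_eq_f32 (a + 4, 2.0f);
  toy_vpsel_f32 (sel, four, five, pred);

  /* Reduce the selected lanes to their maximum.  */
  float max = sel[0];
  for (unsigned i = 1; i < 4; i++)
    if (sel[i] > max)
      max = sel[i];
  return max;
}

/* Sample inputs for trying the model.  */
static const float toy_all_two[8] = { 2.0f, 2.0f, 2.0f, 2.0f, 2.0f, 2.0f, 2.0f, 2.0f };
static const float toy_one_off[8] = { 2.0f, 2.0f, 2.0f, 2.0f, 2.0f, 3.0f, 2.0f, 2.0f };
```

In this model no vector of 0 and 1 values is ever materialized: the only mask data is the 16-bit integer, which matches the motivation given above for using a dedicated mask mode.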
[PATCH v4 08/12] arm: Convert remaining MVE vcmp builtins to predicate qualifiers
From: Christophe Lyon This is mostly a mechanical change, only tested by the intrinsics expansion tests. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon gcc/ PR target/100757 PR target/101325 * config/arm/arm-builtins.cc (BINOP_UNONE_NONE_NONE_QUALIFIERS): Delete. (TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS): Change to ... (TERNOP_PRED_NONE_NONE_PRED_QUALIFIERS): ... this. (TERNOP_PRED_UNONE_UNONE_PRED_QUALIFIERS): New. * config/arm/arm_mve_builtins.def (vcmp*q_n_, vcmp*q_m_f): Use new predicated qualifiers. * config/arm/mve.md (mve_vcmpq_n_) (mve_vcmp*q_m_f): Use MVE_VPRED instead of HI. diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc index 1c6b9c986ee..02411c61098 100644 --- a/gcc/config/arm/arm-builtins.cc +++ b/gcc/config/arm/arm-builtins.cc @@ -438,12 +438,6 @@ arm_binop_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define BINOP_NONE_NONE_UNONE_QUALIFIERS \ (arm_binop_none_none_unone_qualifiers) -static enum arm_type_qualifiers -arm_binop_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS] - = { qualifier_unsigned, qualifier_none, qualifier_none }; -#define BINOP_UNONE_NONE_NONE_QUALIFIERS \ - (arm_binop_unone_none_none_qualifiers) - static enum arm_type_qualifiers arm_binop_pred_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_predicate, qualifier_none, qualifier_none }; @@ -504,10 +498,10 @@ arm_ternop_unone_unone_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS] (arm_ternop_unone_unone_imm_unone_qualifiers) static enum arm_type_qualifiers -arm_ternop_unone_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS] - = { qualifier_unsigned, qualifier_none, qualifier_none, qualifier_unsigned }; -#define TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS \ - (arm_ternop_unone_none_none_unone_qualifiers) +arm_ternop_pred_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_predicate, qualifier_none, qualifier_none, qualifier_predicate }; 
+#define TERNOP_PRED_NONE_NONE_PRED_QUALIFIERS \ + (arm_ternop_pred_none_none_pred_qualifiers) static enum arm_type_qualifiers arm_ternop_none_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] @@ -553,6 +547,13 @@ arm_ternop_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS \ (arm_ternop_unone_unone_unone_pred_qualifiers) +static enum arm_type_qualifiers +arm_ternop_pred_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_predicate, qualifier_unsigned, qualifier_unsigned, +qualifier_predicate }; +#define TERNOP_PRED_UNONE_UNONE_PRED_QUALIFIERS \ + (arm_ternop_pred_unone_unone_pred_qualifiers) + static enum arm_type_qualifiers arm_ternop_none_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_none, qualifier_none, qualifier_none }; diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def index 44b41eab4c5..b7ebbcab87f 100644 --- a/gcc/config/arm/arm_mve_builtins.def +++ b/gcc/config/arm/arm_mve_builtins.def @@ -118,9 +118,9 @@ VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_u, v16qi, v8hi, v4si) VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_n_u, v16qi, v8hi, v4si) VAR3 (BINOP_UNONE_UNONE_UNONE, veorq_u, v16qi, v8hi, v4si) VAR3 (BINOP_PRED_UNONE_UNONE, vcmphiq_, v16qi, v8hi, v4si) -VAR3 (BINOP_UNONE_UNONE_UNONE, vcmphiq_n_, v16qi, v8hi, v4si) +VAR3 (BINOP_PRED_UNONE_UNONE, vcmphiq_n_, v16qi, v8hi, v4si) VAR3 (BINOP_PRED_UNONE_UNONE, vcmpcsq_, v16qi, v8hi, v4si) -VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpcsq_n_, v16qi, v8hi, v4si) +VAR3 (BINOP_PRED_UNONE_UNONE, vcmpcsq_n_, v16qi, v8hi, v4si) VAR3 (BINOP_UNONE_UNONE_UNONE, vbicq_u, v16qi, v8hi, v4si) VAR3 (BINOP_UNONE_UNONE_UNONE, vandq_u, v16qi, v8hi, v4si) VAR3 (BINOP_UNONE_UNONE_UNONE, vaddvq_p_u, v16qi, v8hi, v4si) @@ -142,17 +142,17 @@ VAR3 (BINOP_UNONE_UNONE_NONE, vbrsrq_n_u, v16qi, v8hi, v4si) VAR3 (BINOP_UNONE_UNONE_IMM, vshlq_n_u, v16qi, v8hi, v4si) VAR3 (BINOP_UNONE_UNONE_IMM, vrshrq_n_u, v16qi, v8hi, v4si) 
VAR3 (BINOP_UNONE_UNONE_IMM, vqshlq_n_u, v16qi, v8hi, v4si) -VAR3 (BINOP_UNONE_NONE_NONE, vcmpneq_n_, v16qi, v8hi, v4si) +VAR3 (BINOP_PRED_NONE_NONE, vcmpneq_n_, v16qi, v8hi, v4si) VAR3 (BINOP_PRED_NONE_NONE, vcmpltq_, v16qi, v8hi, v4si) -VAR3 (BINOP_UNONE_NONE_NONE, vcmpltq_n_, v16qi, v8hi, v4si) +VAR3 (BINOP_PRED_NONE_NONE, vcmpltq_n_, v16qi, v8hi, v4si) VAR3 (BINOP_PRED_NONE_NONE, vcmpleq_, v16qi, v8hi, v4si) -VAR3 (BINOP_UNONE_NONE_NONE, vcmpleq_n_, v16qi, v8hi, v4si) +VAR3 (BINOP_PRED_NONE_NONE, vcmpleq_n_, v16qi, v8hi, v4si) VAR3 (BINOP_PRED_NONE_NONE, vcmpgtq_, v16qi, v8hi, v4si) -VAR3 (BINOP_UNONE_NONE_NONE, vcmpgtq_n_, v16qi, v8hi, v4si) +VAR3 (BINOP_PRED_NONE_NONE, vcmpgtq_n_, v16qi, v8hi, v4si) VAR3 (BINOP_PRED_NONE_NONE, vcmpgeq_, v16qi, v8hi, v4si) -VAR3 (BINOP_UNONE_NONE_NONE, vcmpgeq_n_, v16qi, v8hi, v4si) +VAR3
[PATCH v4 12/12] arm: Add VPR_REG to ALL_REGS
From: Christophe Lyon

VPR_REG should be part of ALL_REGS; this patch fixes this omission.

Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee.

2022-02-22  Christophe Lyon

	gcc/
	* config/arm/arm.h (REG_CLASS_CONTENTS): Add VPR_REG to ALL_REGS.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 61c02218b78..ef7b66f34ae 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1347,7 +1347,7 @@ enum reg_class
   { 0x, 0x, 0x, 0x0080 }, /* AFP_REG */ \
   { 0x, 0x, 0x, 0x0400 }, /* VPR_REG.  */ \
   { 0x5FFF, 0x, 0x, 0x0400 }, /* GENERAL_AND_VPR_REGS.  */ \
-  { 0x7FFF, 0x, 0x, 0x000F }  /* ALL_REGS.  */ \
+  { 0x7FFF, 0x, 0x, 0x040F }  /* ALL_REGS.  */ \
 }

 #define FP_SYSREGS \
--
2.25.1
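The effect of the one-word change can be checked with a small bitmask test. The sketch below is only an illustration, not code from arm.h: it uses just the fourth mask word of each class, since that is the only word that is fully legible in the archived diff, and the names `kVprRegWord3`, `kAllRegsOldWord3`, `kAllRegsNewWord3` and `contains` are hypothetical.

```cpp
#include <cstdint>

// Fourth 32-bit word of the register-class masks, as shown in the diff.
// (The other three words are not legible in the archive and are omitted.)
constexpr std::uint32_t kVprRegWord3     = 0x0400; // VPR_REG
constexpr std::uint32_t kAllRegsOldWord3 = 0x000F; // ALL_REGS before the patch
constexpr std::uint32_t kAllRegsNewWord3 = 0x040F; // ALL_REGS after the patch

// A class `super` contains class `sub` if every bit set in `sub`
// is also set in `super`.
constexpr bool contains(std::uint32_t super, std::uint32_t sub) {
    return (super & sub) == sub;
}

static_assert(!contains(kAllRegsOldWord3, kVprRegWord3),
              "before the patch the VPR bit is missing from ALL_REGS");
static_assert(contains(kAllRegsNewWord3, kVprRegWord3),
              "after the patch ALL_REGS covers VPR_REG");
```

The same subset test, applied word by word, is what one would expect to hold between ALL_REGS and every other register class.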
[PATCH v4 10/12] arm: Convert more load/store MVE builtins to predicate qualifiers
From: Christophe Lyon This patch covers a few builtins where we do not use the iterator and thus we cannot use . For v2di instructions, we keep the HI mode for predicates. Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon gcc/ PR target/100757 PR target/101325 * config/arm/arm-builtins.cc (STRSBS_P_QUALIFIERS): Use predicate qualifier. (STRSBU_P_QUALIFIERS): Likewise. (LDRGBS_Z_QUALIFIERS): Likewise. (LDRGBU_Z_QUALIFIERS): Likewise. (LDRGBWBXU_Z_QUALIFIERS): Likewise. (LDRGBWBS_Z_QUALIFIERS): Likewise. (LDRGBWBU_Z_QUALIFIERS): Likewise. (STRSBWBS_P_QUALIFIERS): Likewise. (STRSBWBU_P_QUALIFIERS): Likewise. * config/arm/mve.md: Use VxBI instead of HI. diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc index a9536b2f7f8..5d582f182b9 100644 --- a/gcc/config/arm/arm-builtins.cc +++ b/gcc/config/arm/arm-builtins.cc @@ -689,13 +689,13 @@ arm_strss_p_qualifiers[SIMD_MAX_BUILTIN_ARGS] static enum arm_type_qualifiers arm_strsbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_void, qualifier_unsigned, qualifier_immediate, - qualifier_none, qualifier_unsigned}; + qualifier_none, qualifier_predicate}; #define STRSBS_P_QUALIFIERS (arm_strsbs_p_qualifiers) static enum arm_type_qualifiers arm_strsbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_void, qualifier_unsigned, qualifier_immediate, - qualifier_unsigned, qualifier_unsigned}; + qualifier_unsigned, qualifier_predicate}; #define STRSBU_P_QUALIFIERS (arm_strsbu_p_qualifiers) static enum arm_type_qualifiers @@ -731,13 +731,13 @@ arm_ldrgbu_qualifiers[SIMD_MAX_BUILTIN_ARGS] static enum arm_type_qualifiers arm_ldrgbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_unsigned, qualifier_immediate, - qualifier_unsigned}; + qualifier_predicate}; #define LDRGBS_Z_QUALIFIERS (arm_ldrgbs_z_qualifiers) static enum arm_type_qualifiers arm_ldrgbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { 
qualifier_unsigned, qualifier_unsigned, qualifier_immediate, - qualifier_unsigned}; + qualifier_predicate}; #define LDRGBU_Z_QUALIFIERS (arm_ldrgbu_z_qualifiers) static enum arm_type_qualifiers @@ -777,7 +777,7 @@ arm_ldrgbwbxu_qualifiers[SIMD_MAX_BUILTIN_ARGS] static enum arm_type_qualifiers arm_ldrgbwbxu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate, - qualifier_unsigned}; + qualifier_predicate}; #define LDRGBWBXU_Z_QUALIFIERS (arm_ldrgbwbxu_z_qualifiers) static enum arm_type_qualifiers @@ -793,13 +793,13 @@ arm_ldrgbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS] static enum arm_type_qualifiers arm_ldrgbwbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_unsigned, qualifier_immediate, - qualifier_unsigned}; + qualifier_predicate}; #define LDRGBWBS_Z_QUALIFIERS (arm_ldrgbwbs_z_qualifiers) static enum arm_type_qualifiers arm_ldrgbwbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate, - qualifier_unsigned}; + qualifier_predicate}; #define LDRGBWBU_Z_QUALIFIERS (arm_ldrgbwbu_z_qualifiers) static enum arm_type_qualifiers @@ -815,13 +815,13 @@ arm_strsbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS] static enum arm_type_qualifiers arm_strsbwbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_unsigned, qualifier_unsigned, qualifier_const, - qualifier_none, qualifier_unsigned}; + qualifier_none, qualifier_predicate}; #define STRSBWBS_P_QUALIFIERS (arm_strsbwbs_p_qualifiers) static enum arm_type_qualifiers arm_strsbwbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_unsigned, qualifier_unsigned, qualifier_const, - qualifier_unsigned, qualifier_unsigned}; + qualifier_unsigned, qualifier_predicate}; #define STRSBWBU_P_QUALIFIERS (arm_strsbwbu_p_qualifiers) static enum arm_type_qualifiers diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md index a8087815c22..9633b7187f6 100644 --- a/gcc/config/arm/mve.md +++ b/gcc/config/arm/mve.md @@ -7282,7 +7282,7 @@ 
(define_insn "mve_vstrwq_scatter_base_p_v4si" [(match_operand:V4SI 0 "s_register_operand" "w") (match_operand:SI 1 "immediate_operand" "i") (match_operand:V4SI 2 "s_register_operand" "w") -(match_operand:HI 3 "vpr_register_operand" "Up")] +(match_operand:V4BI 3 "vpr_register_operand" "Up")] VSTRWSBQ)) ] "TARGET_HAVE_MVE" @@ -7371,7 +7371,7 @@ (define_insn "mve_vldrwq_gather_base_z_v4si" [(set (match_operand:V4SI 0 "s_register_operand" "=&w") (unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w") (match_operand:SI 2 "immediate_operand" "i") - (match_operand:HI 3 "vpr_reg
[PATCH v4 11/12] arm: Convert more MVE/CDE builtins to predicate qualifiers
From: Christophe Lyon This patch covers a few non-load/store builtins where we do not use the iterator and thus we cannot use . Most of the work of this patch series was carried out while I was working at STMicroelectronics as a Linaro assignee. 2022-02-22 Christophe Lyon gcc/ PR target/100757 PR target/101325 * config/arm/arm-builtins.cc (CX_UNARY_UNONE_QUALIFIERS): Use predicate. (CX_BINARY_UNONE_QUALIFIERS): Likewise. (CX_TERNARY_UNONE_QUALIFIERS): Likewise. (TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete. (QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete. (QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Delete. * config/arm/arm_mve_builtins.def: Use predicated qualifiers. * config/arm/mve.md: Use VxBI instead of HI. diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc index 5d582f182b9..a7acc1d71e7 100644 --- a/gcc/config/arm/arm-builtins.cc +++ b/gcc/config/arm/arm-builtins.cc @@ -295,7 +295,7 @@ static enum arm_type_qualifiers arm_cx_unary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_immediate, qualifier_none, qualifier_unsigned_immediate, - qualifier_unsigned }; + qualifier_predicate }; #define CX_UNARY_UNONE_QUALIFIERS (arm_cx_unary_unone_qualifiers) /* T (immediate, T, T, unsigned immediate). */ @@ -304,7 +304,7 @@ arm_cx_binary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_immediate, qualifier_none, qualifier_none, qualifier_unsigned_immediate, - qualifier_unsigned }; + qualifier_predicate }; #define CX_BINARY_UNONE_QUALIFIERS (arm_cx_binary_unone_qualifiers) /* T (immediate, T, T, T, unsigned immediate). 
*/ @@ -313,7 +313,7 @@ arm_cx_ternary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_immediate, qualifier_none, qualifier_none, qualifier_none, qualifier_unsigned_immediate, - qualifier_unsigned }; + qualifier_predicate }; #define CX_TERNARY_UNONE_QUALIFIERS (arm_cx_ternary_unone_qualifiers) /* The first argument (return type) of a store should be void type, @@ -509,12 +509,6 @@ arm_ternop_none_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define TERNOP_NONE_NONE_NONE_IMM_QUALIFIERS \ (arm_ternop_none_none_none_imm_qualifiers) -static enum arm_type_qualifiers -arm_ternop_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS] - = { qualifier_none, qualifier_none, qualifier_none, qualifier_unsigned }; -#define TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS \ - (arm_ternop_none_none_none_unone_qualifiers) - static enum arm_type_qualifiers arm_ternop_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_none, qualifier_none, qualifier_predicate }; @@ -567,13 +561,6 @@ arm_quadop_unone_unone_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define QUADOP_UNONE_UNONE_NONE_NONE_PRED_QUALIFIERS \ (arm_quadop_unone_unone_none_none_pred_qualifiers) -static enum arm_type_qualifiers -arm_quadop_none_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS] - = { qualifier_none, qualifier_none, qualifier_none, qualifier_none, -qualifier_unsigned }; -#define QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS \ - (arm_quadop_none_none_none_none_unone_qualifiers) - static enum arm_type_qualifiers arm_quadop_none_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_none, qualifier_none, qualifier_none, @@ -588,13 +575,6 @@ arm_quadop_none_none_none_imm_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define QUADOP_NONE_NONE_NONE_IMM_PRED_QUALIFIERS \ (arm_quadop_none_none_none_imm_pred_qualifiers) -static enum arm_type_qualifiers -arm_quadop_unone_unone_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS] - = { 
qualifier_unsigned, qualifier_unsigned, qualifier_unsigned, -qualifier_unsigned, qualifier_unsigned }; -#define QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS \ - (arm_quadop_unone_unone_unone_unone_unone_qualifiers) - static enum arm_type_qualifiers arm_quadop_unone_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned, diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def index 7db6d47867e..1c8ee34f5cb 100644 --- a/gcc/config/arm/arm_mve_builtins.def +++ b/gcc/config/arm/arm_mve_builtins.def @@ -87,8 +87,8 @@ VAR4 (BINOP_UNONE_UNONE_UNONE, vcreateq_u, v16qi, v8hi, v4si, v2di) VAR4 (BINOP_NONE_UNONE_UNONE, vcreateq_s, v16qi, v8hi, v4si, v2di) VAR3 (BINOP_UNONE_UNONE_IMM, vshrq_n_u, v16qi, v8hi, v4si) VAR3 (BINOP_NONE_NONE_IMM, vshrq_n_s, v16qi, v8hi, v4si) -VAR1 (BINOP_NONE_NONE_UNONE, vaddlvq_p_s, v4si) -VAR1 (BINOP_UNONE_UNONE_UNONE, vaddlvq_p_u, v4si) +VAR1 (BINOP_NONE_NONE_PRED, vaddlvq_p_s, v4si) +VAR1 (BINOP_UNONE_UNONE_PRED, vaddlvq_p_u, v4si) VAR3 (BINOP_PRED_NONE_NONE, vcmpneq_, v16qi, v8hi, v4si) VAR3 (BINOP_NONE
Re: [PATCH] middle-end: Support ABIs that pass FP values as wider integers.
On 2/9/22 21:12, Roger Sayle wrote:
> This patch adds middle-end support for target ABIs that pass/return
> floating point values in integer registers with precision wider than
> the original FP mode.  An example, is the nvptx backend where 16-bit
> HFmode registers are passed/returned as (promoted to) SImode registers.
> Unfortunately, this currently falls foul of the various (recent?)
> sanity checks that (very sensibly) prevent creating paradoxical
> SUBREGs of floating point registers.  The approach below is to
> explicitly perform the conversion/promotion in two steps, via an
> integer mode of same precision as the floating point value.  So on
> nvptx, 16-bit HFmode is initially converted to 16-bit HImode (using
> SUBREG), then zero-extended to SImode, and likewise when going the
> other way, parameters truncated to HImode then converted to HFmode
> (using SUBREG).  These changes are localized to expand_value_return
> and expanding DECL_RTL to support strange ABIs, rather than inside
> convert_modes or gen_lowpart, as mismatched precision integer/FP
> conversions should be explicit in the RTL, and these semantics not
> generally visible/implicit in user code.

Hi Roger,

I cannot comment on the patch, but I do wonder (after your "strange ABI" comment): did we actively decide on (or align to) a register passing ABI for HFmode, or has it merely been decided by the implementation of promote_arg:
...
static machine_mode
promote_arg (machine_mode mode, bool prototyped)
{
  if (!prototyped && mode == SFmode)
    /* K&R float promotion for unprototyped functions.  */
    mode = DFmode;
  else if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode))
    mode = SImode;

  return mode;
}
...

There may be a rationale why it's good to pass a HF as SI, but it's not documented there.

Anyway, I checked what cuda does for HF, and it passes a byte array:
...
.param .align 2 .b8 _Z5helloPj6__halfs_param_1[2],
...

So, I guess what I'm saying is I'd like to understand why we're having the HF -> SI promotion.

Thanks,
- Tom
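To make the promotion rule in promote_arg concrete, here is a minimal standalone model of it. This is a sketch under stated assumptions, not the nvptx back-end code: the `Mode` enum and `mode_size` helper are hypothetical stand-ins for GCC's machine_mode and GET_MODE_SIZE, with sizes given in bytes.

```cpp
// Hypothetical stand-in for GCC's machine_mode, reduced to the modes
// discussed in this thread.
enum class Mode { HF, HI, SI, SF, DI, DF };

// Hypothetical stand-in for GET_MODE_SIZE: size of a mode in bytes.
inline int mode_size(Mode m) {
    switch (m) {
        case Mode::HF: case Mode::HI: return 2;
        case Mode::SI: case Mode::SF: return 4;
        case Mode::DI: case Mode::DF: return 8;
    }
    return 0;
}

// Same decision logic as the quoted promote_arg.
inline Mode promote_arg(Mode mode, bool prototyped) {
    if (!prototyped && mode == Mode::SF)
        mode = Mode::DF;   // K&R float promotion for unprototyped functions
    else if (mode_size(mode) < mode_size(Mode::SI))
        mode = Mode::SI;   // anything narrower than SImode becomes SImode
    return mode;
}
```

In this model the second branch tests only the size and never checks whether the mode is an integer or a floating-point one, so `promote_arg(Mode::HF, true)` yields `Mode::SI`. That is consistent with the HF -> SI promotion being a side effect of the implementation rather than a documented ABI decision.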
Re: [PATCH] nvptx: Back-end portion of a fix for PR target/104489.
On 2/11/22 11:38, Roger Sayle wrote:
> This one line fix/tweak is the back-end specific change for a fix for
> PR target/104489, that allows the ISA for GCC's nvptx backend to be
> bumped to sm_53.  The machine-independent middle-end pieces were posted
> here: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html
>
> This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu,
> together with the above middle-end patch and changes identical to those
> described by Tom de Vries in the PR, with make and make -k check, where
> the build now completes, and there are no regressions in the testsuite.
> Ok for mainline?
>
> 2022-02-11  Roger Sayle
>
> gcc/ChangeLog
> 	PR target/104489
> 	* config/nvptx/nvptx.md (*movhf_insn): Add subregs_ok attribute.

LGTM.

Thanks,
- Tom
Re: [Patch] nvptx: Add -mptx=6.0 + -misa=sm_70
Hi Tom,

On 22.02.22 15:43, Tom de Vries wrote:
> On 2/17/22 18:24, Tobias Burnus wrote:
>> --- a/gcc/config/nvptx/t-omp-device
>> +++ b/gcc/config/nvptx/t-omp-device
>> @@ -1,4 +1,4 @@
>>  echo kind: gpu > $@
>>  echo arch: nvptx >> $@
>> -echo isa: sm_30 sm_35 >> $@
>> +echo isa: sm_30 sm_35 sm_53 sm_70 sm_75 sm_80 >> $@
>
> I'm not sure I understand how this is used.  Is this user-visible?  Is
> there a libgomp test-case where we can observe a difference?

That's used for OpenMP context selectors; that way, one can generate, e.g., one piece of code used with nvptx and one with gcn, as with:

  #pragma omp declare variant (on_nvptx) match(construct={target},device={arch(nvptx)})
  #pragma omp declare variant (on_gcn) match(construct={target},device={arch(gcn)})
  ...
  #pragma omp target map(from:v)
  v = on ();

which then either calls 'on' or 'on_nvptx' or 'on_gcn' (from libgomp/testsuite/libgomp.c/target-42.c).

The following testcases use 'arch(nvptx)':
  libgomp/testsuite/libgomp.c-c++-common/on_device_arch.h
  libgomp/testsuite/libgomp.c/target-42.c
  libgomp/testsuite/libgomp.c/usleep.h
  libgomp/testsuite/libgomp.fortran/declare-variant-1.f90

For ISA, there is only one run-time test:
  libgomp/testsuite/libgomp.c/declare-variant-1.c
but only for x86-64:
  match (device={isa("avx512f")})

The sm_35 also appears, but only in the compile-time tests:
  gcc/testsuite/{c-c++-common,gfortran.dg}/gomp/declare-variant-{9,10}.*

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
RE: [PATCH] middle-end: Support ABIs that pass FP values as wider integers.
Hi Tom,

I'll admit that I'd not myself considered the ABI issues when I initially proposed experimental HFmode support for the nvptx backend, and was surprised when I finally tracked down the source of the problem you'd reported: that libgcc spots that HFmode support exists and immediately starts passing/returning values in this type.

The one precedent that I can point to is that LLVM's nvptx backend passes HFmode values in SImode regs, see https://reviews.llvm.org/D28540

Their motivation is that not all PTX ISAs support fp16, so for compatibility with say sm_30/sm_35, fp16 values are treated like b16, i.e. HImode.  At this point, the nvptx ABI states that HImode values are passed as SImode, so we end up with the interesting mismatch of HFmode<->SImode.

I guess the same thing affects host code, where an i386/x86 host that doesn't support 16-bit floating point can pass "unsigned short" values to and from the accelerator, and likewise this HImode value locally gets passed in a wider (often WORD_MODE) integer type on most x86 ABIs.

My guess is that passing SFmode in DImode may have been supported in older versions of GCC, before handling of SUBREGs was tightened up, so this might be considered a regression.

Cheers,
Roger
--

> -Original Message-
> From: Tom de Vries
> Sent: 22 February 2022 15:43
> To: Roger Sayle ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] middle-end: Support ABIs that pass FP values as wider integers.
>
> [...]
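The two-step conversion discussed in this thread (HFmode to HImode as a same-size reinterpretation, then HImode to SImode as a zero-extension, and the reverse on the receiving side) can be modelled on plain integers. The following is only an illustrative sketch: it represents a half-precision value by its 16-bit pattern, and the names `hf_bits`, `pass_hf_as_si` and `receive_hf_from_si` are hypothetical, not GCC functions.

```cpp
#include <cstdint>

// A half-precision value, represented here by its raw 16-bit pattern.
using hf_bits = std::uint16_t;

// Caller side: HF -> HI (bits unchanged), then HI -> SI (zero-extend).
inline std::uint32_t pass_hf_as_si(hf_bits value) {
    std::uint16_t as_hi = value;               // same-precision integer view
    return static_cast<std::uint32_t>(as_hi);  // upper 16 bits become zero
}

// Callee side: SI -> HI (truncate), then HI -> HF (bits unchanged).
inline hf_bits receive_hf_from_si(std::uint32_t arg) {
    std::uint16_t as_hi = static_cast<std::uint16_t>(arg);  // drop upper bits
    return as_hi;
}
```

Each step changes either the interpretation or the width, never both at once, which is the property the thread says paradoxical SUBREGs of floating-point registers would violate.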
[PATCH 0/2] tree-optimization/104530 - proposed re-evaluation.
I'd like to get clarification on some subtle terminology. I find I am conflating calls that don't return with calls that may throw, and I think they have different considerations.

My experiments with calls that can throw indicate that they always end a basic block. This makes sense to me, as there is the outgoing fall-through edge and an outgoing EH edge. Are there any conditions under which this is not the case (other than non-call exceptions)?

If that supposition is true, that leaves us with calls in the middle of the block which may not return. This prevents us from allowing later calculations to impact anything which happens before the call.

I believe the following 2 small patches could then resolve this.

1 - Export global names to SSA_NAME_RANGE_INFO during the statement walk instead of at the end of the pass.
2 - Use the existing lazy recomputation machinery to recompute any globals which are defined in the block where a dependent value becomes non-null.

More details in each patch. Neither is very large. We could add this to this release or wait for stage 1.

Andrew
Fix OpenACC gang-redundant execution in 'libgomp.oacc-fortran/privatized-ref-2.f90' (was: Add 'libgomp.oacc-fortran/privatized-ref-2.f90')
Hi!

On 2021-05-21T16:28:57+0200, I wrote:
> This came into existence internally, when the og10 branch was set up.
>
> On 2020-06-03T17:23:51+0200, Tobias Burnus wrote:
>> This fixes [...] on OG10 (og10_prerelease); it will be
>> later applied to gcn/… to fix the issue.  (Upstream is unaffected.)
>> [...]
>
> However, that means that your testcase does work on master branch (and
> would regress if certain commits got pushed there).  As the testcase has
> got a property useful for a thing I'm currently working on, I've pushed
> to master branch "Add 'libgomp.oacc-fortran/privatized-ref-2.f90'" in
> commit 61796dc03befa9b7426d5bc7c336cca585944143

After commit a78b1ab1df9ca44acc5638e8f9d0ae2e62bd65ed "amdgcn: Tune default OpenMP/OpenACC GPU utilization", we'd seen this test case regress (only) on our AMD GPU amd-instinct1/'-march=gfx908' system:

    {+WARNING: program timed out.+}
    [-PASS:-]{+FAIL:+} libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O0 execution test

Same for other optimization levels.  Nothing more in 'libgomp.log'.

I have determined this is a latent problem in the original test case, which contains a few instances of code as follows:

    !$acc parallel copyout(array)
    array = [(-i, i = 1, nn)]
    !$acc loop gang private(array)
    do i = 1, 10
      array(i) = i
    end do
    if (any (array /= [(-i, i = 1, nn)])) error stop 1
    !$acc end parallel

Given the '!$acc loop gang', the whole containing '!$acc parallel' region is launched with gang parallelism.  The '!$acc loop gang' executes in gang-partitioned mode, but the 'array' assignment before and checks after don't execute in a (hypothetical) gang-single mode, but instead in gang-redundant mode, meaning that each gang executes these concurrently, giving rise to data races and other mischief.
Thus, we have to make sure that we're not executing non-parallelized code in gang-redundant mode, by putting these parts into their own 'parallel' constructs, which then default to 'num_gangs(1)'. Pushed to master branch commit f8187b5c0d22723c8e0a3d13d0ea5dd7ecfeff75 "Fix OpenACC gang-redundant execution in 'libgomp.oacc-fortran/privatized-ref-2.f90'", see attached. Grüße Thomas > I confirm that "FIXME: Fails due to PR middle-end/95499" is still a > problem. > > And, GCC '-O' reports: > > [...]/libgomp.oacc-fortran/privatized-ref-2.f90:147:21: > > 147 | subroutine foobar15 (scalar) > | ^ > Warning: ‘foobar15’ defined but not used [-Wunused-function] > [...]/libgomp.oacc-fortran/privatized-ref-2.f90: In function ‘MAIN__’: > [...]/libgomp.oacc-fortran/privatized-ref-2.f90:31:22: warning: > ‘a.offset’ is used uninitialized [-Wuninitialized] >31 | A = [(3*j, j=1, 10)] > | ^ > [...]/libgomp.oacc-fortran/privatized-ref-2.f90:27:30: note: ‘a’ declared > here >27 | integer, allocatable :: A(:) > | ^ > [...]/libgomp.oacc-fortran/privatized-ref-2.f90:31:22: warning: > ‘a.dim[0].lbound’ is used uninitialized [-Wuninitialized] >31 | A = [(3*j, j=1, 10)] > | ^ > [...]/libgomp.oacc-fortran/privatized-ref-2.f90:27:30: note: ‘a’ declared > here >27 | integer, allocatable :: A(:) > | ^ > [...]/libgomp.oacc-fortran/privatized-ref-2.f90:31:22: warning: > ‘a.dim[0].ubound’ is used uninitialized [-Wuninitialized] >31 | A = [(3*j, j=1, 10)] > | ^ > [...]/libgomp.oacc-fortran/privatized-ref-2.f90:27:30: note: ‘a’ declared > here >27 | integer, allocatable :: A(:) > | ^ > > I haven't looked into these. 
> > > Grüße > Thomas - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 >From f8187b5c0d22723c8e0a3d13d0ea5dd7ecfeff75 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Fri, 21 Jan 2022 14:58:23 +0100 Subject: [PATCH] Fix OpenACC gang-redundant execution in 'libgomp.oacc-fortran/privatized-ref-2.f90' This was a latent problem, and this commit here now resolves a regression that after recent commit a78b1ab1df9ca44acc5638e8f9d0ae2e62bd65ed "amdgcn: Tune default OpenMP/OpenACC GPU utilization" we had (only) seen on a GCN offloading '-march=gfx908' system: {+WARNING: program timed out.+} [-PASS:-]{+FAIL:+} libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O0 execution test Same for other optimization levels. Make sure that we're not executing non-parallelized code in gang-redundant mode, by putting these parts into their own 'parallel' constructs, which then
[PATCH 1/2] tree-optimization/104530 - Export global ranges during the VRP block walk.
Ranger currently waits until the end of the VRP pass, then calls export_global_ranges (). This method walks the list of ssa-names looking for names which it thinks should have SSA_NAME_RANGE_INFO updated, and is an artifact of the on-demand mechanism where there isn't an obvious time to finalize a name.

The changes for 104288 introduced the register_side_effects method and do provide a final place where stmts are processed during the DOMWALK. This patch exports the global range calculated by the statement (before processing side effects), and avoids the need for calling the export method. This is generally better all round, I think.

Bootstraps on x86_64-pc-linux-gnu with no regressions. Re-running to ensure...

OK for trunk? Or defer to stage 1?

Andrew

From 60ba59b5d57236ce4bab28ecdcb790c21c733904 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod
Date: Wed, 16 Feb 2022 19:59:34 -0500
Subject: [PATCH 1/2] Export global ranges during the VRP block walk.

VRP currently searches the ssa_name list for globals to be exported after it finishes running. Recent changes have VRP calling a side-effect routine for each stmt during the walk. This change simply exports globals as they are calculated the final time during the walk.

	* gimple-range-cache.cc (ranger_cache::update_to_nonnull): Set the global value in the def block, remove the on-entry cache hack.
	* gimple-range.cc (gimple_ranger::register_side_effects): First check if the DEF should be exported as a global.
	* tree-vrp.cc (rvrp_folder::pre_fold_bb): Process PHI side effects, which will export globals.
	(execute_ranger_vrp): Remove call to export_global_ranges.
--- gcc/gimple-range.cc | 22 ++ gcc/tree-vrp.cc | 4 +++- 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc index 04075a98a80..3d1843670a5 100644 --- a/gcc/gimple-range.cc +++ b/gcc/gimple-range.cc @@ -454,6 +454,28 @@ gimple_ranger::fold_stmt (gimple_stmt_iterator *gsi, tree (*valueize) (tree)) void gimple_ranger::register_side_effects (gimple *s) { + // First, export the LHS if it is a new global range. + tree lhs = gimple_get_lhs (s); + if (lhs) +{ + int_range_max tmp; + if (range_of_stmt (tmp, s, lhs) && !tmp.varying_p () + && update_global_range (tmp, lhs) && dump_file) + { + value_range vr = tmp; + fprintf (dump_file, "Global Exported: "); + print_generic_expr (dump_file, lhs, TDF_SLIM); + fprintf (dump_file, " = "); + vr.dump (dump_file); + int_range_max same = vr; + if (same != tmp) + { + fprintf (dump_file, " ... irange was : "); + tmp.dump (dump_file); + } + fputc ('\n', dump_file); + } +} m_cache.block_apply_nonnull (s); } diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc index e9f19d0c8b9..1ad099b9ba3 100644 --- a/gcc/tree-vrp.cc +++ b/gcc/tree-vrp.cc @@ -4295,6 +4295,9 @@ public: void pre_fold_bb (basic_block bb) OVERRIDE { m_pta->enter (bb); +for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi); + gsi_next (&gsi)) + m_ranger->register_side_effects (gsi.phi ()); } void post_fold_bb (basic_block bb) OVERRIDE @@ -4338,7 +4341,6 @@ execute_ranger_vrp (struct function *fun, bool warn_array_bounds_p) gimple_ranger *ranger = enable_ranger (fun); rvrp_folder folder (ranger); folder.substitute_and_fold (); - ranger->export_global_ranges (); if (dump_file && (dump_flags & TDF_DETAILS)) ranger->dump (dump_file); -- 2.17.2
[PATCH 2/2] tree-optimization/104530 - Mark defs dependent on non-null stale.
This patch simply leverages the existing computation machinery to re-evaluate values dependent on a newly found non-null value.

Ranger associates a monotonically increasing temporal value with every def as it is defined. When that value is used, we check whether any of the values used in the definition have been updated, making the current cached global value stale. This makes the evaluation lazy: if there are no more uses, we will never re-evaluate.

When an ssa-name is marked non-null it does not change the global value, and thus will not invalidate any global values. This patch marks any definitions in the block which are dependent on the non-null value as stale. This will cause them to be re-evaluated when they are next used.

    Imports: b.0_1  d.3_7
    Exports: b.0_1  _2  _3  d.3_7  _8
             _2 : b.0_1(I)
             _3 : b.0_1(I)  _2
             _8 : b.0_1(I)  _2  _3  d.3_7(I)

    b.0_1 = b;
    _2 = b.0_1 == 0B;
    _3 = (int) _2;
    c = _3;
    _5 = *b.0_1;      <<-- from this point b.0_1 is [+1, +INF]
    a = _5;
    d.3_7 = d;
    _8 = _3 % d.3_7;
    if (_8 != 0)

When _5 is defined, and b.0_1 becomes non-null, we mark the dependent names that are exports and defined in this block as stale: _2, _3 and _8.

When _8 is being calculated, _3 is stale, which causes it to be recomputed. It is dependent on _2, also stale, so that is also recomputed, and we end up with

    _2 == [0, 0]
    _3 == [0, 0]
    _8 == [0, 0]

And then we can fold away the condition.

The side effect is that _2 and _3 are globally changed to be [0, 0], but this is OK because it is the definition block, so it dominates all other uses of these names, and they should be [0, 0] upon exit anyway. The previous patch ensures that the global value written to SSA_NAME_RANGE_INFO is the correct [0, 1] for both _2 and _3.

The patch would have been even smaller if I already had a mark_stale method. I thought there was one, but I guess it never made it in from lack of need at the time. The only other tweak was to make the value stale if the dependent value's timestamp was the same as the definition's.
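The timestamp scheme described above can be sketched as a small standalone class. This is a simplified model, not the ranger's temporal_cache: names are plain integer indices rather than SSA names, `kNoDep` is a hypothetical marker for an absent dependency, and both the starting clock value of 1 and the "timestamp 0 is always current" shortcut in `current_p` are assumptions made for the illustration.

```cpp
#include <cstddef>
#include <vector>

constexpr int kNoDep = -1;  // hypothetical marker: "no dependency"

class TemporalCache {
public:
    // All names start with timestamp 0, treated as "always current".
    explicit TemporalCache(std::size_t names) : stamp_(names, 0u), now_(1u) {}

    // Record that NAME was (re)computed just now.
    void set_timestamp(int name) { stamp_[name] = ++now_; }

    // Mark NAME stale by giving it the oldest possible timestamp,
    // unless it is "always current".
    void set_stale(int name) {
        if (stamp_[name] != 0)
            stamp_[name] = 1;
    }

    // NAME is current only if it is strictly newer than its dependencies.
    // With "<=" an identical timestamp counts as not current, so two names
    // that were both marked stale do not vouch for each other.
    bool current_p(int name, int dep1, int dep2) const {
        unsigned ts = stamp_[name];
        if (ts == 0)
            return true;
        if (dep1 != kNoDep && ts <= stamp_[dep1])
            return false;
        if (dep2 != kNoDep && ts <= stamp_[dep2])
            return false;
        return true;
    }

private:
    std::vector<unsigned> stamp_;
    unsigned now_;
};
```

For example, with indices 0, 1, 2 standing for b.0_1, _2 and _3 and timestamps set in that order, `current_p(2, 1, kNoDep)` is true. After `set_stale(1)` and `set_stale(2)` both names carry timestamp 1, and `current_p(2, 1, kNoDep)` is false only because the comparison is `<=`; with a strict `<` the stale _3 would still be treated as current.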
This bootstraps on x86_64-pc-linux-gnu with no regressions. Re-running to ensure. OK for trunk? or defer to stage 1? Andrew From a7e4e5f04899817cacc3ebe5cc3ff2d489489309 Mon Sep 17 00:00:00 2001 From: Andrew MacLeod Date: Tue, 22 Feb 2022 09:58:00 -0500 Subject: [PATCH 2/2] Mark defs dependent on non-null stale. When a name is marked as non-null, find all exports from the block, and mark their timestamp as stale. Any following use of the name will trigger a recomputaion using the new non-null range. PR tree-optimization/104530 gcc/ * gimple-range-cache.cc (temporal_cache::set_stale): New. (temporal_cache::current_p): Identical timestamp is not current. (ranger_cache::update_to_nonnull): Mark any export defined in this block stale if it is dependent on this name. gcc/testsuite/ * gcc.dg/pr104530.c: New. --- gcc/gimple-range-cache.cc | 26 -- gcc/testsuite/gcc.dg/pr104530.c | 17 + 2 files changed, 41 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/pr104530.c diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc index 613135266a4..debc93767a9 100644 --- a/gcc/gimple-range-cache.cc +++ b/gcc/gimple-range-cache.cc @@ -696,6 +696,7 @@ public: bool current_p (tree name, tree dep1, tree dep2) const; void set_timestamp (tree name); void set_always_current (tree name); + void set_stale (tree name); private: unsigned temporal_value (unsigned ssa) const; @@ -740,9 +741,9 @@ temporal_cache::current_p (tree name, tree dep1, tree dep2) const // Any non-registered dependencies will have a value of 0 and thus be older. // Return true if time is newer than either dependent. 
- if (dep1 && ts < temporal_value (SSA_NAME_VERSION (dep1))) + if (dep1 && ts <= temporal_value (SSA_NAME_VERSION (dep1))) return false; - if (dep2 && ts < temporal_value (SSA_NAME_VERSION (dep2))) + if (dep2 && ts <= temporal_value (SSA_NAME_VERSION (dep2))) return false; return true; @@ -759,6 +760,18 @@ temporal_cache::set_timestamp (tree name) m_timestamp[v] = ++m_current_time; } +// Mark a NAME as stale by marking the timestamp as oldest, unless it is +// already "always current". + +inline void +temporal_cache::set_stale (tree name) +{ + unsigned v = SSA_NAME_VERSION (name); + if (v >= m_timestamp.length () || m_timestamp[v] == 0) +return; + m_timestamp[v] = 1; +} + // Set the timestamp to 0, marking it as "always up to date". inline void @@ -1475,6 +1488,15 @@ ranger_cache::update_to_nonnull (basic_block bb, tree name) { r.set_nonzero (type); m_on_entry.set_bb_range (name, bb, r); + // Mark consumers of name stale so they can be recomputed. + if (m_gori.is_import_p (name, bb) || m_gori.is_export_p (name, bb)) + { + tree x; + FOR_EACH_GORI_EXPORT_NAME (m_gori, bb, x) + if (m_gori.in_chai
Further simplify 'gcc/omp-oacc-neuter-broadcast.cc:record_field_map_t' (was: [PATCH 1/4] openacc: Middle-end worker-partitioning support)
Hi!

On 2021-08-16T12:34:09+0200, I wrote:
> On 2021-08-06T09:49:58+0100, Julian Brown wrote:
>> On Wed, 4 Aug 2021 15:13:30 +0200
>> Thomas Schwinge wrote:
>>
>>> 'oacc_do_neutering' is the 'execute' function of the pass, so that
>>> means every time this executes, a fresh 'field_map' is set up, no
>>> state persists across runs (assuming I'm understanding that
>>> correctly).  Why don't we simply use standard (non-GC) memory
>>> management for that?  "For convenience" shall be fine as an answer
>>> ;-) -- but maybe instead of figuring out the right GC annotations,
>>> changing the memory management will be easier?  (Or, of course, maybe
>>> I completely misunderstood that?)
>>
>> I suspect you're right, and there's no need for this to be GC-allocated
>> memory.  If non-standard memory allocation will work out fine, we should
>
> ("non-GC", I suppose.)
>
>> probably use that instead.
>
> Pushed "Avoid 'GTY' use for 'gcc/omp-oacc-neuter-broadcast.cc:field_map'"
> to master branch in commit 049eda8274b7394523238b17ab12c3e2889f253e

In commit 0fe9176f410accc767e0abab010aec843b2e7ea6 I've now pushed "Further
simplify 'gcc/omp-oacc-neuter-broadcast.cc:record_field_map_t'" to master
branch, see attached.
Regards
 Thomas

-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company (GmbH); Managing directors: Thomas Heurung, Frank Thürauf; Registered office: München; Commercial register: München, HRB 106955

From 0fe9176f410accc767e0abab010aec843b2e7ea6 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Fri, 13 Aug 2021 21:17:55 +0200
Subject: [PATCH] Further simplify
 'gcc/omp-oacc-neuter-broadcast.cc:record_field_map_t'

Now that I've resolved GCC 'hash_map' issues (a while ago already), we may
further simplify this after commit 049eda8274b7394523238b17ab12c3e2889f253e
"Avoid 'GTY' use for 'gcc/omp-oacc-neuter-broadcast.cc:field_map'": as
'hash_map' Value, directly store 'field_map_t' objects, not pointers to
manually allocated 'field_map_t' objects.

	gcc/
	* omp-oacc-neuter-broadcast.cc (record_field_map_t): Further
	simplify.  Adjust all users.
---
 gcc/omp-oacc-neuter-broadcast.cc | 12
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/gcc/omp-oacc-neuter-broadcast.cc b/gcc/omp-oacc-neuter-broadcast.cc
index 7fb691d7155..314161e38f5 100644
--- a/gcc/omp-oacc-neuter-broadcast.cc
+++ b/gcc/omp-oacc-neuter-broadcast.cc
@@ -538,7 +538,7 @@ typedef hash_map<tree, tree> field_map_t;
    to propagate, to the field in the record type that should be used for
    transmission and reception.  */
-typedef hash_map<tree, field_map_t *> record_field_map_t;
+typedef hash_map<tree, field_map_t> record_field_map_t;
 static void
 install_var_field (tree var, tree record_type, field_map_t *fields)
@@ -1168,8 +1168,7 @@ worker_single_copy (basic_block from, basic_block to,
       gcc_assert (TREE_CODE (var) == VAR_DECL);
       /* If we had no record type, we will have no fields map.  */
-      field_map_t **fields_p = record_field_map->get (record_type);
-      field_map_t *fields = fields_p ? *fields_p : NULL;
+      field_map_t *fields = record_field_map->get (record_type);
       if (worker_partitioned_uses->contains (var)
           && fields
@@ -1684,10 +1683,9 @@ oacc_do_neutering (unsigned HOST_WIDE_INT bounds_lo,
       field_vec.qsort (sort_by_size_then_ssa_version_or_uid);
-      field_map_t *fields = new field_map_t;
-
       bool existed;
-      existed = record_field_map.put (record_type, fields);
+      field_map_t *fields
+        = &record_field_map.get_or_insert (record_type, &existed);
       gcc_checking_assert (!existed);
       /* Insert var fields in reverse order, so the last inserted element
@@ -1818,8 +1816,6 @@ oacc_do_neutering (unsigned HOST_WIDE_INT bounds_lo,
                         &partitioned_var_uses, &record_field_map,
                         &blk_offset_map, writes_gang_private);
-  for (auto it : record_field_map)
-    delete it.second;
   record_field_map.empty ();
   /* These are supposed to have been 'delete'd by 'neuter_worker_single'.  */
-- 
2.34.1
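The ownership change in the patch above (map values held by value rather than as manually allocated pointers) can be illustrated with the standard library. This is only an analogy under stated assumptions: `std::unordered_map` stands in for GCC's `hash_map`, string keys stand in for `tree`, and the free functions `get_or_insert` and `get` are hypothetical helpers that mimic the shape of the `hash_map` calls seen in the diff. They are not GCC APIs.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-ins for field_map_t and record_field_map_t; keys are strings here.
using field_map = std::unordered_map<std::string, int>;
using record_field_map = std::unordered_map<std::string, field_map>;

// Hypothetical analogue of get_or_insert: return a pointer to the mapped
// object, default-constructing it inside the map if KEY was absent.
field_map *
get_or_insert (record_field_map &m, const std::string &key, bool *existed)
{
  auto res = m.emplace (key, field_map ());
  *existed = !res.second;
  return &res.first->second;
}

// Hypothetical analogue of get: pointer to the mapped object, or null.
field_map *
get (record_field_map &m, const std::string &key)
{
  auto it = m.find (key);
  return it == m.end () ? nullptr : &it->second;
}
```

Because the inner maps live inside the outer map, clearing the outer map destroys them, and no separate loop deleting each value is needed. That mirrors the loop removed in the last hunk.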
Re: [PATCH 0/2] tree-optimization/104530 - proposed re-evaluation.
On Tue, Feb 22, 2022 at 11:39:41AM -0500, Andrew MacLeod wrote:
> I'd like to get clarification on some subtle terminology.  I find I am
> conflating calls that don't return with calls that may throw, and I think
> they have different considerations.
>
> My experiments with calls that can throw indicate that they always end a
> basic block.  This makes sense to me as there is the outgoing fall-thru edge
> and an outgoing EH edge.  Are there any conditions under which this is not
> the case?  (other than non-call exceptions)

Generally, there are 2 kinds of calls that can throw, those that can throw
internally and those that can throw externally (e.g. there are
stmt_can_throw_{in,ex}ternal predicates).
Consider e.g.

void foo ();
struct S { S (); ~S (); };
void bar () { foo (); foo (); }
void baz () { S s; foo (); foo (); }
void qux () { try { foo (); } catch (...) {} }

the calls to foo in bar throw externally, if they throw, execution doesn't
continue anywhere in bar but in some bar's caller, or could just terminate
if nothing catches it at all.  Such calls don't terminate a bb.
In baz, the s variable needs destruction if either of the foo calls throw,
so those calls do terminate bb and there are normal fallthru edges from
those bbs and eh edges to an EH pad which will destruct s and continue
propagating the exception.
In qux, there is explicit try/catch, so again, foo throws internally, ends
bb, has an EH edge to EH landing pad which will do what catch does.

That is EH, then there are calls that might not return because they leave
in some other way (e.g. longjmp), or might loop forever, might exit, might
abort, trap etc.
I must say I don't know if we have any call flags that would guarantee
the function will always return (exactly once) if called.
Perhaps ECF_CONST/ECF_PURE without ECF_LOOPING_CONST_OR_PURE do?

	Jakub
Get rid of 'gcc/omp-oacc-neuter-broadcast.cc:oacc_build_component_ref' (was: Re-unify 'omp_build_component_ref' and 'oacc_build_component_ref')
Hi!

On 2021-08-09T16:16:51+0200, I wrote:
> This concerns a class of ICEs seen as of og10 branch with the
> "openacc: Middle-end worker-partitioning support" and "amdgcn:
> Enable OpenACC worker partitioning for AMD GCN" changes applied:

I've determined that as of commit 2a3f9f6532bb21d8ab6f16fbe9ee603f6b1405f2
"openacc: Shared memory layout optimisation", we're no longer running into
the vectorizer ICEs for '!ADDR_SPACE_GENERIC_P'.  I have not researched
whether they've just gone latent (again), or whether that commit really
changed something to avoid those (bug fix).

Anyway: pushed to master branch commit
54f745023276e5025e34b2cc22530c78423a93cb "Get rid of
'gcc/omp-oacc-neuter-broadcast.cc:oacc_build_component_ref'", see attached.

Regards
 Thomas

> On 2020-06-06T16:07:36+0100, Kwok Cheung Yeung wrote:
>> On 01/06/2020 8:48 pm, Kwok Cheung Yeung wrote:
>>> On 21/05/2020 10:23 pm, Kwok Cheung Yeung wrote:
These all have the same failure mode:

during RTL pass: expand
[...]/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90: In function 'MAIN__._omp_fn.1':
[...]/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90:86: internal compiler error: in convert_memory_address_addr_space_1, at explow.c:302
0xc29f20 convert_memory_address_addr_space_1(scalar_int_mode, rtx_def*, unsigned char, bool, bool)
        [...]/gcc/explow.c:302
0xc29f57 convert_memory_address_addr_space(scalar_int_mode, rtx_def*, unsigned char)
        [...]/gcc/explow.c:404
[...]
> This occurs if the -ftree-slp-vectorize flag is specified (default at -O3).
> >>> The problematic bit of Gimple code is this: >>> >>>.oacc_worker_o.44._120 = gangs_min_472; >>>.oacc_worker_o.44._122 = workers_min_473; >>>.oacc_worker_o.44._124 = vectors_min_474; >>>.oacc_worker_o.44._126 = gangs_max_475; >>>.oacc_worker_o.44._128 = workers_max_476; >>>.oacc_worker_o.44._130 = vectors_max_477; >>>.oacc_worker_o.44._132 = 0; >>> >>> With SLP vectorization enabled, it becomes this: >>> >>>_40 = {gangs_min_472, workers_min_473, vectors_min_474, gangs_max_475}; >>>... >>>MEM [(int *)&.oacc_worker_o.44] = _40; >>>.oacc_worker_o.44._128 = workers_max_476; >>>.oacc_worker_o.44._130 = vectors_max_477; >>>.oacc_worker_o.44._132 = 0; >>> >>> The optimization is trying to transform 4 separate assignments into a single >>> memory operation. The trouble is that &o.acc_worker_o is an SImode pointer >>> in >>> AS4 (LDS), while the memory expression appears to be in the default memory >>> space. The 'to' expression of the assignment is: >>> >>> >> type >> type >> size >>> unit-size >>> align:32 warn_if_not_align:0 symtab:0 alias-set 1 >>> canonical-type 0x773195e8 precision:32 min >> -2147483648> max >>> pointer_to_this >>> reference_to_this > >>> TI >>> size >>> unit-size >>> align:128 warn_if_not_align:0 symtab:0 alias-set 1 >>> structural-equality nunits:4 >>> pointer_to_this > >>> >>> arg:0 >> type >> 0x773195e8 int> >>> public unsigned DI >>> size >>> unit-size >>> align:64 warn_if_not_align:0 symtab:0 alias-set 2 >>> structural-equality> >>> constant >>> arg:0 >> 0x773eb888 .oacc_ws_data_s.21 address-space-4> >>> addressable used static ignored BLK >>> [...]/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90:86:0 >>> >>> size >>> unit-size >>> align:128 warn_if_not_align:0 >>> (mem/c:BLK (symbol_ref:SI (".oacc_worker_o.44.14") [flags 0x2] >>> ) [9 .oacc_worker_o.44+0 S28 >>> A128 AS4])>> >>> arg:1 >>> constant 0>> >>> >>> In convert_memory_address_addr_space_1: >>> >>> #ifndef POINTERS_EXTEND_UNSIGNED >>>gcc_assert (GET_MODE (x) == to_mode || 
GET_MODE (x) == VOIDmode); >>>return x; >>> #else /* defined(POINTERS_EXTEND_UNSIGNED) */ >>> >>> POINTERS_EXTEND_UNSIGNED is not defined, so it hits the assert. The expected >>> to_mode is DI_mode, but x is SI_mode, so the assert fires. > >> I now have a fix for this. >> >> >MEM [(int *)&.oacc_worker_o.44] = _40; >> >> The ICE occurs because the SLP vectorization pass creates the new statement >> using the type of the expression '&.oacc_worker_o.44', which is a pointer to >> a >> component ref in the default address space. The expand pass gets confused >> because it is handed an SImode pointer (for LDS) when it is expecting a >> DImode >> pointer (for flat/global space). >> >> The underlying problem is that although .oacc_worker_o is in the correct >> address >> space, the component ref .oacc_worker_o is not. I fixed this by propagating >> the >> address space of .oacc_worker_o when the component ref is created. > >> static tree >> oacc_build_component_ref (tree obj, tr
[PATCH][middle-end/104550]Suppress uninitialized warnings for new created uses from __builtin_clear_padding folding
__builtin_clear_padding(&object) will clear all the padding bits of the
object.  It does not actually involve any use of a user variable, so users
do not expect any uninitialized warning from it.  It is therefore reasonable
to suppress uninitialized warnings for all newly created uses from
__builtin_clear_padding folding.

The patch has been bootstrapped and regression tested on both x86 and
aarch64.

Okay for trunk?

Thanks.

Qing

==

From cf6620005f55d4a1f782332809445c270d22cf86 Mon Sep 17 00:00:00 2001
From: qing zhao
Date: Mon, 21 Feb 2022 16:38:31 +
Subject: [PATCH] Suppress uninitialized warnings for new created uses from
 __builtin_clear_padding folding [PR104550]

__builtin_clear_padding(&object) will clear all the padding bits of the
object.  It does not actually involve any use of a user variable, so users
do not expect any uninitialized warning from it.  It is therefore reasonable
to suppress uninitialized warnings for all newly created uses from
__builtin_clear_padding folding.

	PR middle-end/104550

gcc/ChangeLog:

	* gimple-fold.cc (clear_padding_flush): Suppress warnings for new
	created uses.
	(clear_padding_emit_loop): Likewise.
	(clear_padding_type): Likewise.
	(gimple_fold_builtin_clear_padding): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.dg/auto-init-pr104550-1.c: New test.
	* gcc.dg/auto-init-pr104550-2.c: New test.
	* gcc.dg/auto-init-pr104550-3.c: New test.
---
 gcc/gimple-fold.cc                          | 31 +++--
 gcc/testsuite/gcc.dg/auto-init-pr104550-1.c | 10 +++
 gcc/testsuite/gcc.dg/auto-init-pr104550-2.c | 11
 gcc/testsuite/gcc.dg/auto-init-pr104550-3.c | 11
 4 files changed, 55 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/auto-init-pr104550-1.c
 create mode 100644 gcc/testsuite/gcc.dg/auto-init-pr104550-2.c
 create mode 100644 gcc/testsuite/gcc.dg/auto-init-pr104550-3.c

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 16f02c2d098..1e18ba3465a 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -4296,6 +4296,7 @@ clear_padding_flush (clear_padding_struct *buf, bool full)
                                  build_int_cst (buf->alias_type,
                                                 buf->off + padding_end
                                                 - padding_bytes));
+              suppress_warning (dst, OPT_Wuninitialized);
               gimple *g = gimple_build_assign (dst, src);
               gimple_set_location (g, buf->loc);
               gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
@@ -4341,6 +4342,7 @@ clear_padding_flush (clear_padding_struct *buf, bool full)
           tree dst = build2_loc (buf->loc, MEM_REF, atype,
                                  buf->base,
                                  build_int_cst (buf->alias_type, off));
+          suppress_warning (dst, OPT_Wuninitialized);
           gimple *g = gimple_build_assign (dst, src);
           gimple_set_location (g, buf->loc);
           gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
@@ -4370,6 +4372,7 @@ clear_padding_flush (clear_padding_struct *buf, bool full)
           atype = build_aligned_type (type, buf->align);
           tree dst = build2_loc (buf->loc, MEM_REF, atype, buf->base,
                                  build_int_cst (buf->alias_type, off));
+          suppress_warning (dst, OPT_Wuninitialized);
           tree src;
           gimple *g;
           if (all_ones
@@ -4420,6 +4423,7 @@ clear_padding_flush (clear_padding_struct *buf, bool full)
                                  build_int_cst (buf->alias_type,
                                                 buf->off + end
                                                 - padding_bytes));
+              suppress_warning (dst, OPT_Wuninitialized);
               gimple *g = gimple_build_assign (dst, src);
               gimple_set_location (g, buf->loc);
               gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
@@ -4620,14 +4624,18 @@ clear_padding_emit_loop (clear_padding_struct *buf, tree type,
   gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
   clear_padding_type (buf, type, buf->sz, for_auto_init);
   clear_padding_flush (buf, true);
-  g = gimple_build_assign (buf->base, POINTER_PLUS_EXPR, buf->base,
-                           size_int (buf->sz));
+  tree rhs = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (buf->base),
+                          buf->base, size_int (buf->sz));
+  suppress_warning (rhs, OPT_Wuninitialized);
+  g = gimple_build_assign (buf->base, rhs);
   gimple_set_location (g, buf->loc);
   gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
   g = gimple_build_label (l2);
   gimple_set_location (g, buf->loc);
   gsi_insert_before (buf->gsi, g, GSI_SAME_STMT);
-  g = gimple_build_cond (NE_EXPR, buf->base, end, l1, l3);
+  tree cond_expr = fold_build2 (NE_EXPR, boolean_type_node, buf->base, end);
+  suppress_warn
Re: [PATCH] Check if loading const from mem is faster
Hi Jiu Fu,

On Tue, Feb 22, 2022 at 02:53:13PM +0800, Jiufu Guo wrote:
>  static bool
>  rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>  {
> -  if (GET_CODE (x) == HIGH
> -      && GET_CODE (XEXP (x, 0)) == UNSPEC)
> +  if (GET_CODE (x) == HIGH)
>      return true;

This isn't explained anywhere.  "Update" is not enough ;-)

CSE is the pass that is most ancient and still causing problems left and
right.  It should be rewritten sooner rather than later.

The problem with that is that the pass does so much more than just CSE,
and we don't want to lose all those other things.  So it will be a slow
arduous affair of peeling off bits into separate passes, I think :-(

Doing actual CSE without all the restrictive restrictions our pass has
historically had isn't the hard part!


Segher
Re: [PATCH 0/2] tree-optimization/104530 - proposed re-evaluation.
On 2/22/22 11:56, Jakub Jelinek wrote:
> On Tue, Feb 22, 2022 at 11:39:41AM -0500, Andrew MacLeod wrote:
>> I'd like to get clarification on some subtle terminology.  I find I am
>> conflating calls that don't return with calls that may throw, and I think
>> they have different considerations.
>>
>> My experiments with calls that can throw indicate that they always end a
>> basic block.  This makes sense to me as there is the outgoing fall-thru edge
>> and an outgoing EH edge.  Are there any conditions under which this is not
>> the case?  (other than non-call exceptions)
> Generally, there are 2 kinds of calls that can throw, those that can throw
> internally and those that can throw externally (e.g. there are
> stmt_can_throw_{in,ex}ternal predicates).
> Consider e.g.
>
> void foo ();
> struct S { S (); ~S (); };
> void bar () { foo (); foo (); }
> void baz () { S s; foo (); foo (); }
> void qux () { try { foo (); } catch (...) {} }
>
> the calls to foo in bar throw externally, if they throw, execution doesn't
> continue anywhere in bar but in some bar's caller, or could just terminate
> if nothing catches it at all.  Such calls don't terminate a bb.

This is not a problem.

> In baz, the s variable needs destruction if either of the foo calls throw,
> so those calls do terminate bb and there are normal fallthru edges from
> those bbs and eh edges to an EH pad which will destruct s and continue
> propagating the exception.
> In qux, there is explicit try/catch, so again, foo throws internally, ends
> bb, has an EH edge to EH landing pad which will do what catch does.

Those also are not a problem, everything should flow fine in these
situations as well now that we make non-null adjustments on edges, and
don't for EH edges.

As far as these patches go, any block which has a call at the exit point
will not have any imports or exports as there is no range stmt at the end
of the block, so we will not be marking anything in those blocks as stale.

> That is EH, then there are calls that might not return because they leave
> in some other way (e.g. longjmp), or might loop forever, might exit, might
> abort, trap etc.

Generally speaking, calls which do not return should not now be a
problem... as long as they do not transfer control to somewhere else in
the current function.

> I must say I don't know if we have any call flags that would guarantee
> the function will always return (exactly once) if called.
> Perhaps ECF_CONST/ECF_PURE without ECF_LOOPING_CONST_OR_PURE do?

I don't think I actually need that.

Andrew
Re: [PATCH] Restore bootstrap on x86_64-pc-linux-gnu
On Tue, Feb 22, 2022 at 2:40 PM Roger Sayle wrote:
>
> This patch resolves the bootstrap failure on x86_64-pc-linux-gnu.
> Is this sufficiently "obvious" in stage4, or should I wait for the bootstrap
> and regression testing to complete?

Please just bootstrap the compiler.

> 2022-02-22  Roger Sayle
>
> gcc/ChangeLog
>         * config/i386/i386-expand.cc (ix86_expand_cmpxchg_loop): Restore
>         bootstrap.

OK.

Thanks,
Uros.

> Cheers,
> Roger
> --
Re: [PATCH 0/2] tree-optimization/104530 - proposed re-evaluation.
On Tue, Feb 22, 2022 at 12:39:28PM -0500, Andrew MacLeod wrote:
> > That is EH, then there are calls that might not return because they leave
> > in some other way (e.g. longjmp), or might loop forever, might exit, might
> > abort, trap etc.
> Generally speaking, calls which do not return should not now be a problem...
> as long as they do not transfer control to somewhere else in the current
> function.

I thought all of those cases are very relevant to PR104530.
If we have:

  _1 = ptr_2(D) == 0;
  // unrelated code in the same bb
  _3 = *ptr_2(D);

then in light of PR104288, we can optimize ptr_2(D) == 0 into true only if
there are no calls inside of "// unrelated code in the same bb"
or if all calls in "// unrelated code in the same bb" are guaranteed to
return exactly once.  Because, if there is a call in there which could
exit (that is the PR104288 testcase), or abort, or trap, or loop forever,
or throw externally, or longjmp or in any other non-UB way
cause the _1 = ptr_2(D) == 0; stmt to be invoked at runtime but
_3 = *ptr_2(D) not being invoked, then we can't optimize the earlier
comparison because ptr_2(D) could be NULL in a valid program.
While if there are no calls (and no problematic inline asms) and no trapping
insns in between, we can and PR104530 is asking that we continue to optimize
that.

	Jakub
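To make the constraint above concrete, here is a small source-level sketch of the same shape. It is not the PR104288 testcase: `load`, `unrelated` and `last_compare` are made-up names, and a thrown exception stands in for any call that leaves without returning (exit, abort, longjmp and so on).

```cpp
#include <cassert>
#include <stdexcept>

// Records the result of the comparison so that it stays observable even
// when the dereference below is never reached.
bool last_compare = false;

// Stands in for "unrelated code in the same bb": when BAIL is set it leaves
// by throwing, so control never comes back to the caller's next statement.
void
unrelated (bool bail)
{
  if (bail)
    throw std::runtime_error ("left before the dereference");
}

int
load (int *ptr)
{
  last_compare = (ptr == nullptr);  // _1 = ptr_2(D) == 0;
  unrelated (last_compare);         // may not return
  return *ptr;                      // _3 = *ptr_2(D);
}
```

Calling `load (nullptr)` is valid here because the dereference is never executed, yet the comparison result is observable afterwards. Folding the comparison on the strength of the later dereference would therefore change the behaviour of a valid program.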
Re: [PATCH 0/2] tree-optimization/104530 - proposed re-evaluation.
On 2/22/2022 10:57 AM, Jakub Jelinek via Gcc-patches wrote:
> On Tue, Feb 22, 2022 at 12:39:28PM -0500, Andrew MacLeod wrote:
>>> That is EH, then there are calls that might not return because they leave
>>> in some other way (e.g. longjmp), or might loop forever, might exit, might
>>> abort, trap etc.
>> Generally speaking, calls which do not return should not now be a problem...
>> as long as they do not transfer control to somewhere else in the current
>> function.
> I thought all of those cases are very relevant to PR104530.
> If we have:
>
>   _1 = ptr_2(D) == 0;
>   // unrelated code in the same bb
>   _3 = *ptr_2(D);
>
> then in light of PR104288, we can optimize ptr_2(D) == 0 into true only if
> there are no calls inside of "// unrelated code in the same bb"
> or if all calls in "// unrelated code in the same bb" are guaranteed to
> return exactly once.  Because, if there is a call in there which could
> exit (that is the PR104288 testcase), or abort, or trap, or loop forever,
> or throw externally, or longjmp or in any other non-UB way
> cause the _1 = ptr_2(D) == 0; stmt to be invoked at runtime but
> _3 = *ptr_2(D) not being invoked, then we can't optimize the earlier
> comparison because ptr_2(D) could be NULL in a valid program.
> While if there are no calls (and no problematic inline asms) and no trapping
> insns in between, we can and PR104530 is asking that we continue to optimize
> that.

Right.  This is similar to some of the restrictions we deal with in the
path isolation pass.  Essentially we have a path which, when traversed,
would result in a *0.  We would like to be able to find the edge upon which
the *0 is control dependent and optimize the test so that it always went to
the valid path rather than the *0 path.

The problem is there may be observable side effects on the *0 path between
the test and the actual *0 -- including calls to nonreturning functions,
setjmp/longjmp, things that could trap, etc.  This case is similar.  We
can't back-propagate the non-null status through any statements with
observable side effects.
Jeff
Re: [PATCH 1/3] rs6000: Move g++.dg/ext powerpc tests to g++.target
Hi!

On Mon, Feb 21, 2022 at 03:17:45PM -0600, Paul A. Clarke wrote:
> Also adjust DejaGnu directives, as specifically requiring "powerpc*-*-*" is no
> longer required.
>
> 2021-02-21  Paul A. Clarke
>
> gcc/testsuite
> 	* g++.dg/ext/altivec-1.C: Move to g++.target/powerpc, adjust dg
> 	directives.
> 	* g++.dg/ext/altivec-2.C: Likewise.
> 	* g++.dg/ext/altivec-3.C: Likewise.
> 	* g++.dg/ext/altivec-4.C: Likewise.
> 	* g++.dg/ext/altivec-5.C: Likewise.
> 	* g++.dg/ext/altivec-6.C: Likewise.
> 	* g++.dg/ext/altivec-7.C: Likewise.
> 	* g++.dg/ext/altivec-8.C: Likewise.
> 	* g++.dg/ext/altivec-9.C: Likewise.
> 	* g++.dg/ext/altivec-10.C: Likewise.
> 	* g++.dg/ext/altivec-11.C: Likewise.
> 	* g++.dg/ext/altivec-12.C: Likewise.
> 	* g++.dg/ext/altivec-13.C: Likewise.
> 	* g++.dg/ext/altivec-14.C: Likewise.
> 	* g++.dg/ext/altivec-15.C: Likewise.
> 	* g++.dg/ext/altivec-16.C: Likewise.
> 	* g++.dg/ext/altivec-17.C: Likewise.
> 	* g++.dg/ext/altivec-18.C: Likewise.
> 	* g++.dg/ext/altivec-cell-1.C: Likewise.
> 	* g++.dg/ext/altivec-cell-2.C: Likewise.
> 	* g++.dg/ext/altivec-cell-3.C: Likewise.
> 	* g++.dg/ext/altivec-cell-4.C: Likewise.
> 	* g++.dg/ext/altivec-cell-5.C: Likewise.
> 	* g++.dg/ext/altivec-types-1.C: Likewise.
> 	* g++.dg/ext/altivec-types-2.C: Likewise.
> 	* g++.dg/ext/altivec-types-3.C: Likewise.
> 	* g++.dg/ext/altivec-types-4.C: Likewise.
> 	* g++.dg/ext/undef-bool-1.C: Likewise.

Okay for trunk.  Thanks!


Segher
Re: [PATCH 0/3] rs6000: Move g++.dg powerpc tests to g++.target
On Mon, Feb 21, 2022 at 03:17:44PM -0600, Paul A. Clarke wrote:
> Some tests in g++.dg are target-specific for powerpc.  Move those to
> g++.target/powerpc.  Update the DejaGnu directives as needed, since
> the target restriction is perhaps no longer needed when residing in the
> target-specific powerpc subdirectory.

Not "perhaps" :-)  More specifically, powerpc.exp has

# Exit immediately if this isn't a PowerPC target.
if {![istarget powerpc*-*-*] } then {
  return
}

so anything run from that driver does not have to test for powerpc
separately anymore.


Segher
Re: [PATCH] PR fortran/104619 - [10/11/12 Regression] ICE on list comprehension with default derived type constructor
Hi Harald,

> a recently introduced shape validation for an array constructor
> against the declared shape of a DT component failed to punt
> if the shape of the constructor cannot be determined at compile time.
>
> Suggested solution: skip the shape check in those cases.
>
> Regtested on x86_64-pc-linux-gnu.  OK for mainline / affected branches?

Looks good to me.

Thanks for the patch!

Best regards

	Thomas
Re: [PATCH 0/2] tree-optimization/104530 - proposed re-evaluation.
On 2/22/22 13:07, Jeff Law wrote:
> On 2/22/2022 10:57 AM, Jakub Jelinek via Gcc-patches wrote:
>> On Tue, Feb 22, 2022 at 12:39:28PM -0500, Andrew MacLeod wrote:
>>>> That is EH, then there are calls that might not return because they leave
>>>> in some other way (e.g. longjmp), or might loop forever, might exit, might
>>>> abort, trap etc.
>>> Generally speaking, calls which do not return should not now be a problem...
>>> as long as they do not transfer control to somewhere else in the current
>>> function.
>> I thought all of those cases are very relevant to PR104530.
>> If we have:
>>
>>   _1 = ptr_2(D) == 0;
>>   // unrelated code in the same bb
>>   _3 = *ptr_2(D);
>>
>> then in light of PR104288, we can optimize ptr_2(D) == 0 into true only if
>> there are no calls inside of "// unrelated code in the same bb"
>> or if all calls in "// unrelated code in the same bb" are guaranteed to
>> return exactly once.  Because, if there is a call in there which could
>> exit (that is the PR104288 testcase), or abort, or trap, or loop forever,
>> or throw externally, or longjmp or in any other non-UB way
>> cause the _1 = ptr_2(D) == 0; stmt to be invoked at runtime but
>> _3 = *ptr_2(D) not being invoked, then we can't optimize the earlier
>> comparison because ptr_2(D) could be NULL in a valid program.
>> While if there are no calls (and no problematic inline asms) and no trapping
>> insns in between, we can and PR104530 is asking that we continue to optimize
>> that.
> Right.  This is similar to some of the restrictions we deal with in the
> path isolation pass.  Essentially we have a path which, when traversed,
> would result in a *0.  We would like to be able to find the edge upon which
> the *0 is control dependent and optimize the test so that it always went to
> the valid path rather than the *0 path.
>
> The problem is there may be observable side effects on the *0 path between
> the test and the actual *0 -- including calls to nonreturning functions,
> setjmp/longjmp, things that could trap, etc.  This case is similar.  We
> can't back-propagate the non-null status through any statements with
> observable side effects.
>
> Jeff

We can't back propagate, but we can alter our forward view.  Any ssa-name
defined before the observable side effect can be recalculated using the
updated values, and all uses of those names after the side-effect would
then appear to be "up-to-date".

This does not actually change anything before the side-effect statement,
but the lazy re-evaluation ranger employs makes it appear as if we do a
new computation when _1 is used afterwards.  ie:

   _1 = ptr_2(D) == 0;
   // unrelated code in the same bb
   _3 = *ptr_2(D);
   _4 = ptr_2(D) == 0;      // ptr_2 is known to be [+1, +INF] now.

And we use _4 everywhere _1 was used.  This is the effect.

So we do not actually change anything in the unrelated code, just the
observable effects afterwards.  We already do these recalculations on
outgoing edges in other blocks, just not within the definition block
because non-null wasn't visible within the def block.

Additionally, in the testcase, there is a store to C before the side
effects.  These patches get rid of the branch and thus the call in the
testcase as requested, but we still have to compute _3 in order to store
it into global C since it occurs pre side-effect.

    b.0_1 = b;
    _2 = b.0_1 == 0B;
    _3 = (int) _2;
    c = _3;
    _5 = *b.0_1;

No matter how you look at it, you are going to need to process a block
twice in order to handle any code pre-side-effect, whether it be assigning
stmt uids, or what have you.

VRP could pre-process the block, and if it gets to the end of the block,
and it had at least one statement with a side effect and no calls which
may not return, you could process the block with all the side effects
already active.  I'm not sure if that buys as much as the cost, but it
would change the value written to C to be 1, and it would change the
global values exported for _2 and _3.

Another option would be to flag the ssa-names instead of/as well as
marking them as stale.  If we get to the end of the block and there were
no non-returning functions or EH edges, then re-calculate and export those
ssa_names using the latest values.  That would export [0,0] for _2 and _3.

This would have no tangible impact during the first VRP pass, but the
*next* VRP pass (or any other ranger pass) would pick up the new global
ranges, and do all the right things... so we basically let a subsequent
pass pick up the info and do the dirty work.

Andrew
Re: [PATCH 0/3] rs6000: Move g++.dg powerpc tests to g++.target
On Tue, Feb 22, 2022 at 12:28:56PM -0600, Segher Boessenkool wrote:
> On Mon, Feb 21, 2022 at 03:17:44PM -0600, Paul A. Clarke wrote:
> > Some tests in g++.dg are target-specific for powerpc.  Move those to
> > g++.target/powerpc.  Update the DejaGnu directives as needed, since
> > the target restriction is perhaps no longer needed when residing in the
> > target-specific powerpc subdirectory.
>
> Not "perhaps" :-)  More specifically, powerpc.exp has
>
> # Exit immediately if this isn't a PowerPC target.
> if {![istarget powerpc*-*-*] } then {
>   return
> }
>
> so anything run from that driver does not have to test for powerpc
> separately anymore.

The context for "perhaps" is for cases like:

// { dg-do compile { target powerpc*-*-darwin* } }

and

// { dg-do compile { target { powerpc*-*-linux* } } }

where the target is still needed, albeit without the "powerpc" restriction
itself.

PC
[PATCH] c++: ->template and implicit typedef [PR104608]
Here we have a forward declaration of Parameter for which we create
an implicit typedef, which is a TYPE_DECL.  Then, when looking it up
at template definition time, cp_parser_template_id gets (since r12-6754)
this TYPE_DECL which it can't handle.

This patch defers lookup for implicit typedefs, a la r12-6879.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

	PR c++/104608

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_template_name): Repeat lookup of implicit
	typedef.

gcc/testsuite/ChangeLog:

	* g++.dg/parse/template-keyword3.C: New test.
---
 gcc/cp/parser.cc                               |  3 ++-
 gcc/testsuite/g++.dg/parse/template-keyword3.C | 12
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/parse/template-keyword3.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 03d99aba13e..5e89e3737b0 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -18681,7 +18681,8 @@ cp_parser_template_name (cp_parser* parser,
 	  return error_mark_node;
 	}
       else if ((!DECL_P (decl) && !is_overloaded_fn (decl))
-	       || TREE_CODE (decl) == USING_DECL)
+	       || TREE_CODE (decl) == USING_DECL
+	       || DECL_IMPLICIT_TYPEDEF_P (decl))
 	/* Repeat the lookup at instantiation time.  */
 	decl = identifier;
     }
diff --git a/gcc/testsuite/g++.dg/parse/template-keyword3.C b/gcc/testsuite/g++.dg/parse/template-keyword3.C
new file mode 100644
index 000..59fe0fc180b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/template-keyword3.C
@@ -0,0 +1,12 @@
+// PR c++/104608
+
+class Parameter;
+template class Function
+: public R
+{
+Function();
+};
+template
+Function::Function() {
+this->template Parameter();
+}

base-commit: bc66b471d16ef2fd8cb66fd1131b41f80ecb9961
-- 
2.35.1
Re: [PATCH] middle-end: Support ABIs that pass FP values as wider integers.
On 2/22/22 17:08, Roger Sayle wrote: Hi Tom, I'll admit that I'd not myself considered the ABI issues when I initially proposed experimental HFmode support for the nvptx backend, and was surprised when I finally tracked down the source of the problem you'd reported: that libgcc spots HFmode support exists and immediately starts passing/returning values in this type. The one precedent that I can point to is that LLVM's nvptx backend passes HFmode values in SImode regs, see https://reviews.llvm.org/D28540 Interesting, thanks for the link. Their motivation is that not all PTX ISAs support fp16, so for compatibility with say sm_30/sm_35, fp16 values are treated like b16, i.e. HImode. At this point, the nvptx ABI states that HImode values are passed as SImode, so we end up with the interesting mismatch of HFmode<->SImode. Indeed, that sounds plausible. And IIUC, that also means that this leaves the door open for us to implement fp16 support for pre-sm_53 using b16 in a compatible way. Then I think the current solution is OK, thanks for digging this up. Thanks, -Tom I guess the same thing affects host code, where an i386/x86 host that doesn't support 16-bit floating point, can pass "unsigned short" values to and from the accelerator, and likewise this HImode locally gets passed in a wider (often WORD_MODE) integer types on most x86 ABIs. My guess is that passing SFmode in DImode may have been supported in older versions of GCC, before handling of SUBREGs was tightened up, so this might be considered a regression. Cheers, Roger -- -Original Message- From: Tom de Vries Sent: 22 February 2022 15:43 To: Roger Sayle ; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] middle-end: Support ABIs that pass FP values as wider integers. On 2/9/22 21:12, Roger Sayle wrote: This patch adds middle-end support for target ABIs that pass/return floating point values in integer registers with precision wider than the original FP mode. 
An example is the nvptx backend, where 16-bit HFmode registers are passed/returned as (promoted to) SImode registers. Unfortunately, this currently falls foul of the various (recent?) sanity checks that (very sensibly) prevent creating paradoxical SUBREGs of floating point registers. The approach below is to explicitly perform the conversion/promotion in two steps, via an integer mode of the same precision as the floating point value. So on nvptx, 16-bit HFmode is initially converted to 16-bit HImode (using SUBREG), then zero-extended to SImode, and likewise when going the other way, parameters are truncated to HImode and then converted to HFmode (using SUBREG).

These changes are localized to expand_value_return and expanding DECL_RTL to support strange ABIs, rather than inside convert_modes or gen_lowpart, as mismatched precision integer/FP conversions should be explicit in the RTL, and these semantics are not generally visible/implicit in user code.

Hi Roger,

I cannot comment on the patch, but I do wonder (after your "strange ABI" comment): did we actively decide on (or align to) a register passing ABI for HFmode, or has it merely been decided by the implementation of promote_arg:
...
static machine_mode
promote_arg (machine_mode mode, bool prototyped)
{
  if (!prototyped && mode == SFmode)
    /* K&R float promotion for unprototyped functions.  */
    mode = DFmode;
  else if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode))
    mode = SImode;

  return mode;
}
...

There may be a rationale why it's good to pass a HF as SI, but it's not documented there.

Anyway, I checked what cuda does for HF, and it passes a byte array:
...
  .param .align 2 .b8 _Z5helloPj6__halfs_param_1[2],
...

So, I guess what I'm saying is I'd like to understand why we're having the HF -> SI promotion.

Thanks,
- Tom
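As a rough illustration of the two-step promotion described in the patch summary above, the following sketch models it on plain integers. It is not the middle-end code: the helper names are hypothetical, and a 16-bit unsigned integer stands in for HImode, holding the raw bit pattern of an HFmode value.

```cpp
#include <cstdint>

// Hypothetical helpers, for illustration only.  A std::uint16_t plays the
// role of HImode and carries the raw bits of a 16-bit float (HFmode).

// Outgoing direction: the HFmode bits are viewed as HImode (the SUBREG step),
// then zero-extended to the 32-bit SImode register the ABI uses.
constexpr std::uint32_t promote_hf_bits(std::uint16_t hf_bits) {
  return static_cast<std::uint32_t>(hf_bits);
}

// Incoming direction: the SImode value is truncated to HImode, and the result
// is then reinterpreted as HFmode bits (again the SUBREG step).
constexpr std::uint16_t demote_to_hf_bits(std::uint32_t si_value) {
  return static_cast<std::uint16_t>(si_value & 0xFFFFu);
}
```

The point of keeping the two steps separate is that the integer/FP reinterpretation always happens at equal width; only the integer-to-integer step changes precision.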
Re: [PATCH v4] Make `-Werror` optional in libatomic/libbacktrace/libgomp/libitm/libsanitizer
On Thu, Feb 3, 2022 at 6:07 AM David Seifert via Gcc-patches wrote: > > * `-Werror` can cause issues when a more recent version of GCC compiles > an older version: > - https://bugs.gentoo.org/229059 > - https://bugs.gentoo.org/475350 > - https://bugs.gentoo.org/667104 > > Bootstrapped/regtested x86_64-linux, tested without --disable-werror and > with ./configure --disable-werror, the latter removing -Werror as expected. > > libgo/ChangeLog: > > * libgo/configure.ac: Support --disable-werror. > * libgo/configure: Regenerate. Just a note that the libgo directory is copied from upstream sources, and should not be changed directly in the GCC repo. See libgo/README.gcc. I'll take care of this discrepancy. Thanks. Ian
RE: [PATCH] middle-end: Support ABIs that pass FP values as wider integers.
>> Anyway, I checked what cuda does for HF, and it passes a byte array: >>> .param .align 2 .b8 _Z5helloPj6__halfs_param_1[2], ... > > > > The one precedent that I can point to is that LLVM's nvptx backend passes > > HFmode values in SImode regs, see https://reviews.llvm.org/D28540 > > Interesting, thanks for the link. In theory, GCC could also support -mfloat-abi=nvcc and -mfloat-abi=llvm (much like other targets have -mfloat-abi=soft vs. -mfloat-abi=hard). At this point getting any ABI supporting HFmode would be an improvement. Roger --
libgo patch committed: make -Werror optional
I committed this libgo patch to make -Werror optional. This patch is already in the GCC sources, where it was erroneously applied before the upstream patch. This is the upstream patch.

Ian

diff --git a/libgo/configure.ac b/libgo/configure.ac
index 3cadc6d20..7e2b98ba6 100644
--- a/libgo/configure.ac
+++ b/libgo/configure.ac
@@ -62,11 +62,10 @@ AC_PROG_AWK
 WARN_FLAGS='-Wall -Wextra -Wwrite-strings -Wcast-qual'
 AC_SUBST(WARN_FLAGS)
 
-AC_ARG_ENABLE(werror, [AS_HELP_STRING([--enable-werror],
-                                      [turns on -Werror @<:@default=yes@:>@])])
-if test "x$enable_werror" != "xno"; then
-  WERROR="-Werror"
-fi
+AC_ARG_ENABLE([werror],
+  [AS_HELP_STRING([--disable-werror], [disable building with -Werror])])
+AS_IF([test "x$enable_werror" != "xno" && test "x$GCC" = "xyes"],
+  [WERROR="-Werror"])
 AC_SUBST(WERROR)
 
 glibgo_toolexecdir=no
libgo patch committed: Update README.gcc
I committed this libgo patch to update the README.gcc file.

Ian

0f16f4ad82cb47bc444688822cc142d80192c284
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 7455d01c179..424bbebfeed 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-aee8eddbfc3ef1b03353a060e79e7d668fb229e2
+45fd14ab8baf5e86012a808426f8ef52c1d77943
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/README.gcc b/libgo/README.gcc
index d5aabb0f9c2..3c56ec7be17 100644
--- a/libgo/README.gcc
+++ b/libgo/README.gcc
@@ -1,7 +1,6 @@
 The files in this directory are mirrored from the gofrontend project
-hosted at http://code.google.com/p/gofrontend. These files are the
+hosted at https://go.googlesource.com/gofrontend/ and mirrored at
+https://github.com/golang/gofrontend. These files are the
 ones in the libgo subdirectory of that project.
 
-By default, the networking tests are not run. In order to run all the
-libgo tests, you need to define the environment variable
-GCCGO_RUN_ALL_TESTS to a non-empty string.
+To change these files, see https://go.dev/doc/gccgo_contribute.
Re: [PATCH 2/3] rs6000: Move g++.dg powerpc PR tests to g++.target
On Mon, Feb 21, 2022 at 03:17:46PM -0600, Paul A. Clarke wrote: > Also adjust DejaGnu directives, as specifically requiring "powerpc*-*-*" is no > longer required. > > 2021-02-21 Paul A. Clarke > > gcc/testsuite > * g++.dg/pr65240.h: Move to g++.target/powerpc. > * g++.dg/pr93974.C: Likewise. > * g++.dg/pr65240-1.C: Move to g++.target/powerpc, adjust dg directives. > * g++.dg/pr65240-2.C: Likewise. > * g++.dg/pr65240-3.C: Likewise. > * g++.dg/pr65240-4.C: Likewise. > * g++.dg/pr65242.C: Likewise. > * g++.dg/pr67211.C: Likewise. > * g++.dg/pr69667.C: Likewise. > * g++.dg/pr71294.C: Likewise. > * g++.dg/pr84264.C: Likewise. > * g++.dg/pr84279.C: Likewise. > * g++.dg/pr85657.C: Likewise. Okay for trunk. Thanks! That said... > -/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */ > -/* { dg-skip-if "" { powerpc*-*-darwin* } } */ > +/* { dg-do compile { target lp64 } } */ > +/* { dg-skip-if "" { *-*-darwin* } } */ That skip-if is most likely cargo cult, and it's not clear why lp64 would be needed either (there is no comment what it is needed for, for example). > --- a/gcc/testsuite/g++.dg/pr85657.C > +++ b/gcc/testsuite/g++.target/powerpc/pr85657.C > @@ -1,4 +1,4 @@ > -// { dg-do compile { target { powerpc*-*-linux* } } } > +// { dg-do compile { target { *-*-linux* } } } A comment here would help as well. All of that is pre-existing of course. Segher
Re: [PATCH 2/3] rs6000: Move g++.dg powerpc PR tests to g++.target
On Tue, Feb 22, 2022 at 06:41:45PM -0600, Segher Boessenkool wrote: > On Mon, Feb 21, 2022 at 03:17:46PM -0600, Paul A. Clarke wrote: > > Also adjust DejaGnu directives, as specifically requiring "powerpc*-*-*" is > > no > > longer required. > > > > 2021-02-21 Paul A. Clarke > > > > gcc/testsuite > > * g++.dg/pr65240.h: Move to g++.target/powerpc. > > * g++.dg/pr93974.C: Likewise. > > * g++.dg/pr65240-1.C: Move to g++.target/powerpc, adjust dg directives. > > * g++.dg/pr65240-2.C: Likewise. > > * g++.dg/pr65240-3.C: Likewise. > > * g++.dg/pr65240-4.C: Likewise. > > * g++.dg/pr65242.C: Likewise. > > * g++.dg/pr67211.C: Likewise. > > * g++.dg/pr69667.C: Likewise. > > * g++.dg/pr71294.C: Likewise. > > * g++.dg/pr84264.C: Likewise. > > * g++.dg/pr84279.C: Likewise. > > * g++.dg/pr85657.C: Likewise. > > Okay for trunk. Thanks! Thanks for the review! More below... > That said... > > > -/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */ > > -/* { dg-skip-if "" { powerpc*-*-darwin* } } */ > > +/* { dg-do compile { target lp64 } } */ > > +/* { dg-skip-if "" { *-*-darwin* } } */ > > That skip-if is most likely cargo cult, and it's not clear why lp64 > would be needed either (there is no comment what it is needed for, for > example). I can't speak to darwin, nor have an easy way of testing on it. As for lp64, these tests fail on -m32 with: cc1plus: error: '-mcmodel' not supported in this configuration - g++.dg/pr65240-1.C - g++.dg/pr65240-2.C - g++.dg/pr65240-3.C '-mcmodel' is in the dg-options line for the above tests. The rest PASSed. Shall I remove the 'lp64' restriction for those that PASS? > > +++ b/gcc/testsuite/g++.target/powerpc/pr85657.C > > @@ -1,4 +1,4 @@ > > -// { dg-do compile { target { powerpc*-*-linux* } } } > > +// { dg-do compile { target { *-*-linux* } } } > > A comment here would help as well. All of that is pre-existing of > course. I'm not sure what such a comment would say. 
I suspect it was a testing issue (only tested on Linux), but I have similar limitations, so I'm also reluctant to enable the test for what would be untested (by me) platforms. PC
[PATCH][RFC] c++/96765: warn when casting Base* to Derived* in Base ctor/dtor
Hi! This patch aims to add a warning when casting "this" in a base class constructor to a derived class type. It works on the test cases provided, but I'm still running regression tests. However, I have a few doubts: 1. Am I missing out any cases? Right now, I'm identifying the casts by checking that TREE_CODE (expr) == NOP_EXPR && is_this_parameter (TREE_OPERAND (expr, 0)). It seems fine to me but perhaps there is a function that I can use to express this more concisely? 2. -Wcast-qual doesn't seem to be the right flag for this warning. However, I can't seem to find an appropriate flag. Maybe I should place it under -Wextra or -Wall? Appreciate any feedback on the aforementioned doubts or otherwise. Thanks, and have a great day! From 8a1f352f3db06faf264bc823387714a4a9e638b6 Mon Sep 17 00:00:00 2001 From: Zhao Wei Liew Date: Tue, 22 Feb 2022 16:03:17 +0800 Subject: [PATCH] c++: warn on Base* to Derived* cast in Base ctor/dtor [PR96765] Casting "this" in a base class constructor to a derived class type is undefined behaviour, but there is no warning when doing so. Add a warning for this. Signed-off-by: Zhao Wei Liew PR c++/96765 gcc/cp/ChangeLog: * typeck.cc (build_static_cast_1): Add a warning when casting Base * to Derived * in Base constructor and destructor. gcc/testsuite/ChangeLog: * g++.dg/warn/Wcast-qual3.C: New test. 
---
 gcc/cp/typeck.cc                        |  8 ++
 gcc/testsuite/g++.dg/warn/Wcast-qual3.C | 33 +
 2 files changed, 41 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wcast-qual3.C

diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index f796337f73c..bbc40b25547 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -8080,6 +8080,14 @@ build_static_cast_1 (location_t loc, tree type, tree expr, bool c_cast_p,
     {
       tree base;
 
+      if ((DECL_CONSTRUCTOR_P (current_function_decl)
+           || DECL_DESTRUCTOR_P (current_function_decl))
+          && TREE_CODE (expr) == NOP_EXPR
+          && is_this_parameter (TREE_OPERAND (expr, 0)))
+        warning_at(loc, OPT_Wcast_qual,
+                   "invalid %<static_cast%> from type %qT to type %qT before the latter is constructed",
+                   intype, type);
+
       if (processing_template_decl)
         return expr;
 
diff --git a/gcc/testsuite/g++.dg/warn/Wcast-qual3.C b/gcc/testsuite/g++.dg/warn/Wcast-qual3.C
new file mode 100644
index 000..8c44a23bd68
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wcast-qual3.C
@@ -0,0 +1,33 @@
+// PR c++/96765
+// { dg-options "-Wcast-qual" }
+
+struct Derived;
+struct Base {
+  Derived *x;
+  Derived *y;
+  Base();
+  ~Base();
+};
+
+struct Derived : Base {};
+
+Base::Base()
+    : x(static_cast<Derived *>(this)), // { dg-warning "invalid 'static_cast'" }
+      y((Derived *)this) // { dg-warning "invalid 'static_cast'" }
+{
+  static_cast<Derived *>(this); // { dg-warning "invalid 'static_cast'" }
+  (Derived *)this; // { dg-warning "invalid 'static_cast'" }
+}
+
+Base::~Base() {
+  static_cast<Derived *>(this); // { dg-warning "invalid 'static_cast'" }
+  (Derived *)this; // { dg-warning "invalid 'static_cast'" }
+}
+
+struct Other {
+  Other() {
+    Base b;
+    static_cast<Derived *>(&b);
+    (Derived *)(&b);
+  }
+};
--
2.35.1
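For readers unfamiliar with the pattern the proposed warning targets, here is a minimal sketch of the problem and one way to avoid it. It is not part of the patch; the member `self` and the function `bind` are hypothetical names chosen for illustration. The problematic cast is left commented out so the example itself stays well-defined.

```cpp
struct Derived;

struct Base {
  Derived *self = nullptr;

  Base() {
    // The pattern the proposed warning is aimed at: at this point only the
    // Base subobject is being constructed, so no Derived object exists yet.
    // self = static_cast<Derived *>(this);
  }

  // Hypothetical alternative: perform the downcast only once the complete
  // Derived object has been constructed.
  void bind();
};

struct Derived : Base {
  int value = 42;
};

// Defined after Derived is complete, so the static_cast is well-formed.
void Base::bind() { self = static_cast<Derived *>(this); }
```

A caller would construct a `Derived` first and then call `bind()` on it, after which `self` points at the fully constructed object and `self->value` can be read safely.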
Re: [PATCH] Check if loading const from mem is faster
On 2022-02-23 01:30, Segher Boessenkool wrote:
> Hi Jiu Fu,
>
> On Tue, Feb 22, 2022 at 02:53:13PM +0800, Jiufu Guo wrote:
>> static bool
>> rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>> {
>> -  if (GET_CODE (x) == HIGH
>> -      && GET_CODE (XEXP (x, 0)) == UNSPEC)
>> +  if (GET_CODE (x) == HIGH)
>>      return true;

Hi Segher,

> This isn't explained anywhere. "Update" is not enough ;-)

Thanks! I will add explanations for it. This excludes every 'x' whose code is 'HIGH', just as function "rs6000_emit_move" also checks whether the code is 'HIGH'. On P10, I also encountered a case like:
(high:DI (symbol_ref:DI ("var_1") [flags 0xc0] var_1>))
which fails to be stored into .rodata.

> CSE is the pass that is most ancient and still causing problems left and
> right. It should be rewritten sooner rather than later.
>
> The problem with that is that the pass does so much more than just CSE,
> and we don't want to lose all those other things. So it will be a slow
> arduous affair of peeling off bits into separate passes, I think :-(

Yes, it does a lot of work. One of those additional tasks is 'folding out constants and putting constants in memory'.

BR,
Jiufu

> Doing actual CSE without all the restrictive restrictions our pass has
> historically had isn't the hard part!
>
> Segher
Re: [PATCH] wwwdocs: Document ShadowCallStack support
On Tue, 22 Feb 2022, Richard Sandiford wrote: > Gah, thanks. Clearly one of those days :-( Looks good to me, thanks. Gerald
Re: [PATCH 1/2] tree-optimization/104530 - Export global ranges during the VRP block walk.
On Tue, Feb 22, 2022 at 5:42 PM Andrew MacLeod via Gcc-patches wrote:
>
> Ranger currently waits until the end of the VRP pass, then calls
> export_global_ranges ().
>
> This method walks the list of ssa-names looking for names which it
> thinks should have SSA_NAME_RANGE_INFO updated, and is an artifact of
> the on-demand mechanism where there isn't an obvious time to finalize a
> name.
>
> The changes for 104288 introduced the register_side_effects method and
> do provide a final place where stmt's are processed during the DOMWALK.
>
> This patch exports the global range calculated by the statement (before
> processing side effects), and avoids the need for calling the export
> method. This is generally better all round I think.
>
> Bootstraps on x86_64-pc-linux-gnu with no regressions. Re-running to
> ensure...
>
> OK for trunk? or defer to stage 1?

I'm getting a bit nervous so let's defer to stage 1 unless a P1 fix requires this.

Thanks,
Richard.

>
> Andrew
Re: [PATCH 2/2] tree-optimization/104530 - Mark defs dependent on non-null stale.
On Tue, Feb 22, 2022 at 5:42 PM Andrew MacLeod via Gcc-patches wrote:
>
> This patch simply leverages the existing computation machinery to
> re-evaluate values dependent on a newly found non-null value.
>
> Ranger associates a monotonically increasing temporal value with every
> def as it is defined. When that value is used, we check if any of the
> values used in the definition have been updated, making the current
> cached global value stale. This makes the evaluation lazy; if there are
> no more uses, we will never re-evaluate.
>
> When an ssa-name is marked non-null it does not change the global value,
> and thus will not invalidate any global values. This patch marks any
> definitions in the block which are dependent on the non-null value as
> stale. This will cause them to be re-evaluated when they are next used.
>
> Imports: b.0_1  d.3_7
> Exports: b.0_1  _2  _3  d.3_7  _8
>          _2 : b.0_1(I)
>          _3 : b.0_1(I)  _2
>          _8 : b.0_1(I)  _2  _3  d.3_7(I)
>
>     b.0_1 = b;
>     _2 = b.0_1 == 0B;
>     _3 = (int) _2;
>     c = _3;
>     _5 = *b.0_1;        <<-- from this point b.0_1 is [+1, +INF]
>     a = _5;
>     d.3_7 = d;
>     _8 = _3 % d.3_7;
>     if (_8 != 0)
>
> When _5 is defined, and b.0_1 becomes non-null, we mark the dependent
> names that are exports and defined in this block as stale, so _2, _3
> and _8.
>
> When _8 is being calculated, _3 is stale, and causes it to be
> recomputed. It is dependent on _2, also stale, so it is also
> recomputed, and we end up with
>
>    _2 == [0, 0]
>    _3 == [0, 0]
> and _8 = [0, 0]
> And then we can fold away the condition.
>
> The side effect is that _2 and _3 are globally changed to be [0, 0], but
> this is OK because it is the definition block, so it dominates all other
> uses of these names, and they should be [0,0] upon exit anyway. The
> previous patch ensures that the global value written to
> SSA_NAME_RANGE_INFO is the correct [0,1] for both _2 and _3.
>
> The patch would have been even smaller if I already had a mark_stale
> method. I thought there was one, but I guess it never made it in from
> lack of need at the time. The only other tweak was to make the value
> stale if the dependent value was the same as the definitions.
>
> This bootstraps on x86_64-pc-linux-gnu with no regressions. Re-running
> to ensure.

@@ -1475,6 +1488,15 @@ ranger_cache::update_to_nonnull (basic_block bb, tree name)
        {
          r.set_nonzero (type);
          m_on_entry.set_bb_range (name, bb, r);
+         // Mark consumers of name stale so they can be recomputed.
+         if (m_gori.is_import_p (name, bb) || m_gori.is_export_p (name, bb))
+           {
+             tree x;
+             FOR_EACH_GORI_EXPORT_NAME (m_gori, bb, x)
+               if (m_gori.in_chain_p (name, x)
+                   && gimple_bb (SSA_NAME_DEF_STMT (x)) == bb)
+                 m_temporal->set_stale (x);
+           }
        }

so if we have a BB that exports N names and each of those is updated to nonnull this is going to be quadratic? It also looks like the gimple_bb check is cheaper than the bitmap test done in in_chain_p.

What comes to my mind is why we need to mark "consumers"? Can't consumers check their uses' defs when they look at their timestamp? This whole set_stale thing doesn't seem to be transitive anyway, consider:

  _1 = ...
  _2 = _1 + ..;
  _3 = _2 + ...;

so when _1 is updated to non-null we mark _2 as stale but _3 should also be stale, no? When we visit _3 before eventually getting to _2 (to see whether it updates and thus we more precisely know if it makes _3 stale) we won't re-evaluate it?

That said, the change looks somewhat ad-hoc to get to 1-level deep second-level opportunities?

Richard.

>
> OK for trunk? or defer to stage 1?
> Andrew
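To make the transitivity question above concrete, here is a toy model of a timestamped value cache in which a consumer checks the definitions of its operands when it looks at its own timestamp. This is only a sketch under stated assumptions: `TemporalCache` and its members are hypothetical names, not ranger's actual data structures, and the dependency graph is assumed to be acyclic (no PHI cycles).

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical miniature of a temporal cache; names are small integers.
struct TemporalCache {
  std::vector<std::uint64_t> stamp;    // time each name was last (re)computed
  std::vector<std::vector<int>> deps;  // names used in each definition
  std::uint64_t clock = 0;

  explicit TemporalCache(int n) : stamp(n, 0), deps(n) {}

  void set_deps(int name, std::vector<int> d) { deps[name] = std::move(d); }

  // Record that NAME was just (re)computed or otherwise updated.
  void mark_updated(int name) { stamp[name] = ++clock; }

  // A cached value is current only if no operand was updated after it and
  // every operand is itself current.  The recursive check is what makes
  // staleness propagate through chains such as _1 -> _2 -> _3.
  bool current_p(int name) const {
    for (int d : deps[name])
      if (stamp[d] > stamp[name] || !current_p(d))
        return false;
    return true;
  }
};
```

With names 0, 1 and 2 standing for _1, _2 and _3, updating name 0 after all three were computed makes `current_p(1)` false directly and `current_p(2)` false through the recursive check, even though nothing marked name 2 explicitly. Whether such a walk is affordable in practice is exactly the cost question raised in the review.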
Re: [PATCH][middle-end/104550]Suppress uninitialized warnings for new created uses from __builtin_clear_padding folding
On Tue, 22 Feb 2022, Qing Zhao wrote: > __builtin_clear_padding(&object) will clear all the padding bits of the > object. > actually, it doesn't involve any use of an user variable. Therefore, users do > not expect any uninitialized warning from it. It's reasonable to suppress > uninitialized warnings for all new created uses from __builtin_clear_padding > folding. > > The patch has been bootstrapped and regress tested on both x86 and aarch64. > > Okay for trunk? > > Thanks. > > Qing > > == > From cf6620005f55d4a1f782332809445c270d22cf86 Mon Sep 17 00:00:00 2001 > From: qing zhao > Date: Mon, 21 Feb 2022 16:38:31 + > Subject: [PATCH] Suppress uninitialized warnings for new created uses from > __builtin_clear_padding folding [PR104550] > > __builtin_clear_padding(&object) will clear all the padding bits of the > object. > actually, it doesn't involve any use of an user variable. Therefore, users do > not expect any uninitialized warning from it. It's reasonable to suppress > uninitialized warnings for all new created uses from __builtin_clear_padding > folding. > > PR middle-end/104550 > > gcc/ChangeLog: > > * gimple-fold.cc (clear_padding_flush): Suppress warnings for new > created uses. > (clear_padding_emit_loop): Likewise. > (clear_padding_type): Likewise. > (gimple_fold_builtin_clear_padding): Likewise. > > gcc/testsuite/ChangeLog: > > * gcc.dg/auto-init-pr104550-1.c: New test. > * gcc.dg/auto-init-pr104550-2.c: New test. > * gcc.dg/auto-init-pr104550-3.c: New test. 
> --- > gcc/gimple-fold.cc | 31 +++-- > gcc/testsuite/gcc.dg/auto-init-pr104550-1.c | 10 +++ > gcc/testsuite/gcc.dg/auto-init-pr104550-2.c | 11 > gcc/testsuite/gcc.dg/auto-init-pr104550-3.c | 11 > 4 files changed, 55 insertions(+), 8 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/auto-init-pr104550-1.c > create mode 100644 gcc/testsuite/gcc.dg/auto-init-pr104550-2.c > create mode 100644 gcc/testsuite/gcc.dg/auto-init-pr104550-3.c > > diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc > index 16f02c2d098..1e18ba3465a 100644 > --- a/gcc/gimple-fold.cc > +++ b/gcc/gimple-fold.cc > @@ -4296,6 +4296,7 @@ clear_padding_flush (clear_padding_struct *buf, bool > full) >build_int_cst (buf->alias_type, > buf->off + padding_end > - padding_bytes)); > + suppress_warning (dst, OPT_Wuninitialized); > gimple *g = gimple_build_assign (dst, src); > gimple_set_location (g, buf->loc); > gsi_insert_before (buf->gsi, g, GSI_SAME_STMT); > @@ -4341,6 +4342,7 @@ clear_padding_flush (clear_padding_struct *buf, bool > full) > tree dst = build2_loc (buf->loc, MEM_REF, atype, >buf->base, >build_int_cst (buf->alias_type, off)); > + suppress_warning (dst, OPT_Wuninitialized); > gimple *g = gimple_build_assign (dst, src); > gimple_set_location (g, buf->loc); > gsi_insert_before (buf->gsi, g, GSI_SAME_STMT); > @@ -4370,6 +4372,7 @@ clear_padding_flush (clear_padding_struct *buf, bool > full) > atype = build_aligned_type (type, buf->align); > tree dst = build2_loc (buf->loc, MEM_REF, atype, buf->base, >build_int_cst (buf->alias_type, off)); > + suppress_warning (dst, OPT_Wuninitialized); > tree src; > gimple *g; > if (all_ones > @@ -4420,6 +4423,7 @@ clear_padding_flush (clear_padding_struct *buf, bool > full) >build_int_cst (buf->alias_type, > buf->off + end > - padding_bytes)); > + suppress_warning (dst, OPT_Wuninitialized); > gimple *g = gimple_build_assign (dst, src); > gimple_set_location (g, buf->loc); > gsi_insert_before (buf->gsi, g, GSI_SAME_STMT); > @@ -4620,14 +4624,18 @@ 
clear_padding_emit_loop (clear_padding_struct *buf, > tree type, >gsi_insert_before (buf->gsi, g, GSI_SAME_STMT); >clear_padding_type (buf, type, buf->sz, for_auto_init); >clear_padding_flush (buf, true); > - g = gimple_build_assign (buf->base, POINTER_PLUS_EXPR, buf->base, > -size_int (buf->sz)); > + tree rhs = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (buf->base), > + buf->base, size_int (buf->sz)); > + suppress_warning (rhs, OPT_Wuninitialized); > + g = gimple_build_assign (buf->base, rhs); why do we need to suppress warnings on a POINTER_PLUS_EXPR? The use of fold_build2 here is a step backwards btw, I'm not sure whether suppress_warning is properly preserved here. If nee
Re: [PATCH 0/2] tree-optimization/104530 - proposed re-evaluation.
On Tue, Feb 22, 2022 at 8:19 PM Andrew MacLeod via Gcc-patches wrote: > > On 2/22/22 13:07, Jeff Law wrote: > > > > > > On 2/22/2022 10:57 AM, Jakub Jelinek via Gcc-patches wrote: > >> On Tue, Feb 22, 2022 at 12:39:28PM -0500, Andrew MacLeod wrote: > That is EH, then there are calls that might not return because they > leave > in some other way (e.g. longjmp), or might loop forever, might > exit, might > abort, trap etc. > >>> Generally speaking, calls which do not return should not now be a > >>> problem... > >>> as long as they do not transfer control to somewhere else in the > >>> current > >>> function. > >> I thought all of those cases are very relevant to PR104530. > >> If we have: > >>_1 = ptr_2(D) == 0; > >>// unrelated code in the same bb > >>_3 = *ptr_2(D); > >> then in light of PR104288, we can optimize ptr_2(D) == 0 into true > >> only if > >> there are no calls inside of "// unrelated code in the same bb" > >> or if all calls in "// unrelated code in the same bb" are guaranteed to > >> return exactly once. Because, if there is a call in there which could > >> exit (that is the PR104288 testcase), or abort, or trap, or loop > >> forever, > >> or throw externally, or longjmp or in any other non-UB way > >> cause the _1 = ptr_2(D) == 0; stmt to be invoked at runtime but > >> _3 = *ptr_2(D) not being invoked, then we can't optimize the earlier > >> comparison because ptr_2(D) could be NULL in a valid program. > >> While if there are no calls (and no problematic inline asms) and no > >> trapping > >> insns in between, we can and PR104530 is asking that we continue to > >> optimize > >> that. > > Right. This is similar to some of the restrictions we deal with in > > the path isolation pass. Essentially we have a path, when traversed, > > would result in a *0. We would like to be able to find the edge > > upon-which the *0 is control dependent and optimize the test so that > > it always went to the valid path rather than the *0 path. 
> > > > The problem is there may be observable side effects on the *0 path > > between the test and the actual *0 -- including calls to nonreturning > > functions, setjmp/longjmp, things that could trap, etc. This case is > > similar. We can't back-propagate the non-null status through any > > statements with observable side effects. > > > > Jeff > > > We can't back propagate, but we can alter our forward view. Any > ssa-name defined before the observable side effect can be recalculated > using the updated values, and all uses of those names after the > side-effect would then appear to be "up-to-date" > > This does not actually change anything before the side-effect statement, > but the lazy re-evalaution ranger employs makes it appear as if we do a > new computation when _1 is used afterwards. ie: > > _1 = ptr_2(D) == 0; > // unrelated code in the same bb > _3 = *ptr_2(D); > _4 = ptr_2(D) == 0; // ptr_2 is known to be [+1, +INF] now. > And we use _4 everywhere _1 was used. This is the effect. > > so we do not actually change anything in the unrelated code, just > observable effects afterwards. We already do these recalculations on > outgoing edges in other blocks, just not within the definition block > because non-null wasn't visible within the def block. > > Additionally, In the testcase, there is a store to C before the side > effects. > these patches get rid of the branch and thus the call in the testcase as > requested, but we still have to compute _3 in order to store it into > global C since it occurs pre side-effect. > > b.0_1 = b; > _2 = b.0_1 == 0B; > _3 = (int) _2; > c = _3; > _5 = *b.0_1; > > No matter how you look at it, you are going to need to process a block > twice in order to handle any code pre-side-effect. Whether it be > assigning stmt uids, or what have you. Yes. I thought that is what ranger already does when it discovers new ranges from edges. 
Say we have _1 = 10 / _2; if (_2 == 1) { _3 = _1 + 1; then when evaluating _1 + 1 we re-evaluate 10 / _2 using _2 == 1 and can compute _3 to [11, 11]? That obviously extends to any stmt-level ranges we discover for uses (not defs because defs are never used upthread). And doing that is _not_ affected by any function/BB terminating calls or EH or whatnot as long as the updated ranges are only affecting stmts dominating the current one. What complicates all this reasoning is that it is straight-forward when you work with a traditional IL walking pass but it gets hard (and possibly easy to get wrong) with on-demand processing and caching because everything you cache will now be context dependent (valid only starting after stmt X and for stmts dominated by it). > VRP could pre-process the block, and if it gets to the end of the block, > and it had at least one statement with a side effect and no calls which > may not return you could process the block with all the side effects > already active.