[gcc r15-1004] testsuite: analyzer: Skip tests with non-numeric macros on Solaris [PR107750]

2024-06-04 Rainer Orth via Gcc-cvs
https://gcc.gnu.org/g:09ae36461ed34f343f2d8299bad7e394cccf996e

commit r15-1004-g09ae36461ed34f343f2d8299bad7e394cccf996e
Author: Rainer Orth 
Date:   Tue Jun 4 09:04:25 2024 +0200

testsuite: analyzer: Skip tests with non-numeric macros on Solaris [PR107750]

A couple of gcc.dg/analyzer/fd-*.c tests still FAIL on Solaris.  The
reason is always the same: they use macros that don't expand to simple
numbers, something which c/c-parser.cc
(ana::c_translation_unit::consider_macro) cannot handle:

* :

* :

To avoid the resulting noise, this patch skips the affected tests.
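
The distinction drawn above can be sketched in C. The macro names and
values below are invented for illustration (they are not the Solaris
definitions), and is_simple_number is a hypothetical helper, not the logic
used by consider_macro.  The sketch only shows that a macro can evaluate to
an ordinary constant while its expansion text is not a plain number:

```c
#include <stdlib.h>

/* Hypothetical stand-ins for header macros; names and values are invented.  */
#define EX_MODE_SIMPLE 0x3
#define EX_FLAG_A 0x400000
#define EX_FLAG_B 0x800000
#define EX_MODE_COMPOSITE (EX_FLAG_A | EX_FLAG_B | 0x3)

/* Two-level stringification: yields the fully expanded text of a macro.  */
#define EX_STR_(x) #x
#define EX_EXPANSION_TEXT(x) EX_STR_ (x)

/* Return 1 if TEXT is nothing but one integer literal (decimal, octal or
   hex, no suffix), 0 otherwise.  */
int
is_simple_number (const char *text)
{
  char *end;
  (void) strtol (text, &end, 0);
  /* At least one character must be consumed and nothing may be left over.  */
  return end != text && *end == '\0';
}
```

Both macros are valid integer constant expressions for the compiler proper;
only the composite one has an expansion that begins with a parenthesis
rather than a digit.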

Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and
x86_64-pc-linux-gnu.

2024-06-03  Rainer Orth  

gcc/testsuite:
PR analyzer/107750
* gcc.dg/analyzer/fd-accept.c: Skip on *-*-solaris2*.
* gcc.dg/analyzer/fd-access-mode-target-headers.c: Likewise.
* gcc.dg/analyzer/fd-connect.c: Likewise.
* gcc.dg/analyzer/fd-datagram-socket.c: Likewise.
* gcc.dg/analyzer/fd-listen.c: Likewise.
* gcc.dg/analyzer/fd-socket-misuse.c: Likewise.
* gcc.dg/analyzer/fd-stream-socket-active-open.c: Likewise.
* gcc.dg/analyzer/fd-stream-socket-passive-open.c: Likewise.
* gcc.dg/analyzer/fd-stream-socket.c: Likewise.

Diff:
---
 gcc/testsuite/gcc.dg/analyzer/fd-accept.c | 1 +
 gcc/testsuite/gcc.dg/analyzer/fd-access-mode-target-headers.c | 1 +
 gcc/testsuite/gcc.dg/analyzer/fd-connect.c| 1 +
 gcc/testsuite/gcc.dg/analyzer/fd-datagram-socket.c| 1 +
 gcc/testsuite/gcc.dg/analyzer/fd-listen.c | 1 +
 gcc/testsuite/gcc.dg/analyzer/fd-socket-misuse.c  | 1 +
 gcc/testsuite/gcc.dg/analyzer/fd-stream-socket-active-open.c  | 1 +
 gcc/testsuite/gcc.dg/analyzer/fd-stream-socket-passive-open.c | 1 +
 gcc/testsuite/gcc.dg/analyzer/fd-stream-socket.c  | 1 +
 9 files changed, 9 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-accept.c b/gcc/testsuite/gcc.dg/analyzer/fd-accept.c
index d07ab154d0f..5724a389e2e 100644
--- a/gcc/testsuite/gcc.dg/analyzer/fd-accept.c
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-accept.c
@@ -1,5 +1,6 @@
 /* { dg-require-effective-target sockets } */
 /* { dg-skip-if "" { powerpc*-*-aix* } } */
+/* { dg-skip-if "PR analyzer/107750" { *-*-solaris2* } } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-access-mode-target-headers.c b/gcc/testsuite/gcc.dg/analyzer/fd-access-mode-target-headers.c
index 9fc32638a3d..1386ac2de1e 100644
--- a/gcc/testsuite/gcc.dg/analyzer/fd-access-mode-target-headers.c
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-access-mode-target-headers.c
@@ -1,4 +1,5 @@
 /* { dg-skip-if "" { { powerpc*-*-aix* avr-*-* *-*-vxworks* } || newlib } } */
+/* { dg-skip-if "PR analyzer/107750" { *-*-solaris2* } } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-connect.c b/gcc/testsuite/gcc.dg/analyzer/fd-connect.c
index 43e435eaf12..3fe99d9530c 100644
--- a/gcc/testsuite/gcc.dg/analyzer/fd-connect.c
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-connect.c
@@ -1,5 +1,6 @@
 /* { dg-require-effective-target sockets } */
 /* { dg-skip-if "" { powerpc*-*-aix* } } */
+/* { dg-skip-if "PR analyzer/107750" { *-*-solaris2* } } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-datagram-socket.c b/gcc/testsuite/gcc.dg/analyzer/fd-datagram-socket.c
index 59e80c831e3..8d32e858111 100644
--- a/gcc/testsuite/gcc.dg/analyzer/fd-datagram-socket.c
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-datagram-socket.c
@@ -1,5 +1,6 @@
 /* { dg-require-effective-target sockets } */
 /* { dg-skip-if "" { powerpc*-*-aix* } } */
+/* { dg-skip-if "PR analyzer/107750" { *-*-solaris2* } } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-listen.c b/gcc/testsuite/gcc.dg/analyzer/fd-listen.c
index 3ac7a990042..1444af72e3a 100644
--- a/gcc/testsuite/gcc.dg/analyzer/fd-listen.c
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-listen.c
@@ -1,5 +1,6 @@
 /* { dg-require-effective-target sockets } */
 /* { dg-skip-if "" { powerpc*-*-aix* } } */
+/* { dg-skip-if "PR analyzer/107750" { *-*-solaris2* } } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-socket-misuse.c b/gcc/testsuite/gcc.dg/analyzer/fd-socket-misuse.c
index 914948644bb..8771c0cbe03 100644
--- a/gcc/testsuite/gcc.dg/analyzer/fd-socket-misuse.c
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-socket-misuse.c
@@ -2,6 +2,7 @@
 
 /* { dg-require-effective-target sockets } */
 /* { dg-skip-if "" { powerpc*-*-aix* } } */
+/* { dg-skip-if "PR analyzer/107750" { *-*-solaris2* } } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-stream-socket-active-open.c b/gcc/testsuite/gcc.dg/analyzer/fd-stream-socket-active-open.c
index b39dbf85c3d..e8b01dd2985 100644
--- a/gcc/testsuite/gcc.dg/analyzer/fd-stream-socket-active-open.c
++

[gcc r14-10274] libstdc++: Build libbacktrace and 19_diagnostics/stacktrace with -funwind-tables [PR111641]

2024-06-04 Rainer Orth via Gcc-cvs
https://gcc.gnu.org/g:d92b508dd19daffedfc0fb02e5bfa710f2c397b0

commit r14-10274-gd92b508dd19daffedfc0fb02e5bfa710f2c397b0
Author: Rainer Orth 
Date:   Wed May 29 10:08:07 2024 +0200

libstdc++: Build libbacktrace and 19_diagnostics/stacktrace with -funwind-tables [PR111641]

Several of the 19_diagnostics/stacktrace tests FAIL on Solaris/SPARC (32
and 64-bit), Solaris/x86 (32-bit only), and several other targets:

FAIL: 19_diagnostics/stacktrace/current.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/current.cc  -std=gnu++26 execution test
FAIL: 19_diagnostics/stacktrace/entry.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/entry.cc  -std=gnu++26 execution test
FAIL: 19_diagnostics/stacktrace/output.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/output.cc  -std=gnu++26 execution test
FAIL: 19_diagnostics/stacktrace/stacktrace.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/stacktrace.cc  -std=gnu++26 execution test

As it turns out, both the copy of libbacktrace in libstdc++ and the
testcases proper need to be compiled with -funwind-tables, as is done for
libbacktrace itself.

This isn't an issue on Linux/x86_64 and Solaris/amd64 since 64-bit x86
always defaults to -funwind-tables.  32-bit x86 does, too, when
-fomit-frame-pointer is enabled as on Linux/i686, but unlike
Solaris/i386.

So this patch always enables the option both for the libbacktrace copy
and the testcases.

Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and
x86_64-pc-linux-gnu.

2024-05-23  Rainer Orth  

libstdc++-v3:
PR libstdc++/111641
* src/libbacktrace/Makefile.am (AM_CFLAGS): Add -funwind-tables.
* src/libbacktrace/Makefile.in: Regenerate.

* testsuite/19_diagnostics/stacktrace/current.cc (dg-options): Add
-funwind-tables.
* testsuite/19_diagnostics/stacktrace/entry.cc: Likewise.
* testsuite/19_diagnostics/stacktrace/hash.cc: Likewise.
* testsuite/19_diagnostics/stacktrace/output.cc: Likewise.
* testsuite/19_diagnostics/stacktrace/stacktrace.cc: Likewise.

(cherry picked from commit a99ebb88f8f25e76ebed5afc22e64fa77a2f0d3f)

Diff:
---
 libstdc++-v3/src/libbacktrace/Makefile.am  | 2 +-
 libstdc++-v3/src/libbacktrace/Makefile.in  | 2 +-
 libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc| 2 +-
 libstdc++-v3/testsuite/19_diagnostics/stacktrace/entry.cc  | 2 +-
 libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc   | 2 +-
 libstdc++-v3/testsuite/19_diagnostics/stacktrace/output.cc | 2 +-
 libstdc++-v3/testsuite/19_diagnostics/stacktrace/stacktrace.cc | 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/src/libbacktrace/Makefile.am b/libstdc++-v3/src/libbacktrace/Makefile.am
index a2e78671259..82205db46de 100644
--- a/libstdc++-v3/src/libbacktrace/Makefile.am
+++ b/libstdc++-v3/src/libbacktrace/Makefile.am
@@ -51,7 +51,7 @@ C_WARN_FLAGS = $(WARN_FLAGS) -Wstrict-prototypes -Wmissing-prototypes -Wold-styl
 CXX_WARN_FLAGS = $(WARN_FLAGS) -Wno-unused-parameter
 AM_CFLAGS = \
$(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \
-   $(C_WARN_FLAGS)
+   $(C_WARN_FLAGS) -funwind-tables
 AM_CFLAGS += $(EXTRA_CFLAGS)
 AM_CXXFLAGS = \
$(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \
diff --git a/libstdc++-v3/src/libbacktrace/Makefile.in b/libstdc++-v3/src/libbacktrace/Makefile.in
index b5713b0c616..51c8092335a 100644
--- a/libstdc++-v3/src/libbacktrace/Makefile.in
+++ b/libstdc++-v3/src/libbacktrace/Makefile.in
@@ -473,7 +473,7 @@ libstdc___libbacktrace_la_CPPFLAGS = \
 C_WARN_FLAGS = $(WARN_FLAGS) -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -Wno-unused-but-set-variable
 CXX_WARN_FLAGS = $(WARN_FLAGS) -Wno-unused-parameter
 AM_CFLAGS = $(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \
-   $(C_WARN_FLAGS) $(EXTRA_CFLAGS)
+   $(C_WARN_FLAGS) -funwind-tables $(EXTRA_CFLAGS)
 AM_CXXFLAGS = $(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \
$(CXX_WARN_FLAGS) -fno-rtti -fno-exceptions $(EXTRA_CXXFLAGS)
 obj_prefix = std_stacktrace
diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc
index b1af5f74fb2..cdebd5f1daa 100644
--- a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc
@@ -1,4 +1,4 @@
-// { dg-options "-lstdc++exp" }
+// { dg-options "-funwind-tables -lstdc++exp" }
 // { dg-do run { target c++23 } }
 // { dg-require-cpp-feature-test __cpp_lib_stacktrace }
 
diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/entry.cc b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/entry.cc
i

[gcc r14-10275] testsuite: gm2: Remove timeout overrides [PR114886]

2024-06-04 Rainer Orth via Gcc-cvs
https://gcc.gnu.org/g:e80523288c9967a5fa6d6e27609cc4b1f1aef8d4

commit r14-10275-ge80523288c9967a5fa6d6e27609cc4b1f1aef8d4
Author: Rainer Orth 
Date:   Tue Apr 30 13:49:28 2024 +0200

testsuite: gm2: Remove timeout overrides [PR114886]

A large number of gm2 tests are timing out even on current Solaris/SPARC
systems.  As detailed in the PR, the problem is that the gm2 testsuite
artificially lowers many timeouts way below the DejaGnu default of 300
seconds, often as short as 10 seconds.  The problem lies both in the
values (they may be appropriate for some targets, but too low for
others, especially under high load) and the fact that it uses absolute
values, overriding e.g. settings from a build-wide site.exp.

Therefore this patch removes all those overrides, restoring the
defaults.

Tested on sparc-sun-solaris2.11 (where all the previous timeouts are
gone) and i386-pc-solaris2.11.

2024-04-29  Rainer Orth  

gcc/testsuite:
PR modula2/114886
* lib/gm2.exp: Don't load timeout-dg.exp.
Don't set gm2_previous_timeout.
Don't call dg-timeout.
(gm2_push_timeout, gm2_pop_timeout): Remove.
(gm2_init): Don't call dg-timeout.
* lib/gm2-torture.exp: Don't load timeout-dg.exp.
Don't set gm2_previous_timeout.
Don't call dg-timeout.
(gm2_push_timeout, gm2_pop_timeout): Remove.

* gm2/coroutines/pim/run/pass/coroutines-pim-run-pass.exp: Don't
load timeout-dg.exp.
Don't call gm2_push_timeout, gm2_pop_timeout.
* gm2/examples/map/pass/examples-map-pass.exp: Don't call
gm2_push_timeout, gm2_pop_timeout.
* gm2/iso/run/pass/iso-run-pass.exp: Don't load timeout-dg.exp.
Don't call gm2_push_timeout, gm2_pop_timeout.
* gm2/pimlib/base/run/pass/pimlib-base-run-pass.exp: Don't load
timeout-dg.exp.
Don't call gm2_push_timeout, gm2_pop_timeout.
* gm2/projects/iso/run/pass/halma/projects-iso-run-pass-halma.exp:
Don't call gm2_push_timeout, gm2_pop_timeout.
* gm2/switches/whole-program/pass/run/switches-whole-program-pass-run.exp:
Don't load timeout-dg.exp.
Don't call gm2_push_timeout, gm2_pop_timeout.

(cherry picked from commit aff63ac11099d100b6891f3bcc3dc6cbc4fad654)

Diff:
---
 .../pim/run/pass/coroutines-pim-run-pass.exp   |  7 -
 .../gm2/examples/map/pass/examples-map-pass.exp|  5 
 gcc/testsuite/gm2/iso/run/pass/iso-run-pass.exp|  6 
 .../pimlib/base/run/pass/pimlib-base-run-pass.exp  |  6 
 .../run/pass/halma/projects-iso-run-pass-halma.exp |  7 -
 .../pass/run/switches-whole-program-pass-run.exp   |  4 ---
 gcc/testsuite/lib/gm2-torture.exp  | 28 --
 gcc/testsuite/lib/gm2.exp  | 34 --
 8 files changed, 97 deletions(-)

diff --git a/gcc/testsuite/gm2/coroutines/pim/run/pass/coroutines-pim-run-pass.exp b/gcc/testsuite/gm2/coroutines/pim/run/pass/coroutines-pim-run-pass.exp
index 6b3a8ebefe2..db2ba6314c1 100644
--- a/gcc/testsuite/gm2/coroutines/pim/run/pass/coroutines-pim-run-pass.exp
+++ b/gcc/testsuite/gm2/coroutines/pim/run/pass/coroutines-pim-run-pass.exp
@@ -24,16 +24,11 @@ if $tracelevel then {
 
 # load support procs
 load_lib gm2-torture.exp
-load_lib timeout-dg.exp
 
 set gm2src ${srcdir}/../gm2
 
 gm2_init_cor ""
 
-# We should be able to compile, link or run in 20 seconds.
-gm2_push_timeout 20
-
-
 foreach testcase [lsort [glob -nocomplain $srcdir/$subdir/*.mod]] {
 # If we're only testing specific files and this isn't one of them, skip it.
 if ![runtest_file_p $runtests $testcase] then {
@@ -42,5 +37,3 @@ foreach testcase [lsort [glob -nocomplain $srcdir/$subdir/*.mod]] {
 
 gm2-torture-execute $testcase "" "pass"
 }
-
-gm2_pop_timeout
diff --git a/gcc/testsuite/gm2/examples/map/pass/examples-map-pass.exp b/gcc/testsuite/gm2/examples/map/pass/examples-map-pass.exp
index 432518d7133..fabcf96f6d1 100644
--- a/gcc/testsuite/gm2/examples/map/pass/examples-map-pass.exp
+++ b/gcc/testsuite/gm2/examples/map/pass/examples-map-pass.exp
@@ -27,9 +27,6 @@ load_lib gm2-torture.exp
 
 gm2_init_pim "${srcdir}/${subdir}"
 
-# We should be able to compile, link or run in 30 seconds.
-gm2_push_timeout 30
-
 foreach testcase [lsort [glob -nocomplain $srcdir/$subdir/*.mod]] {
 # If we're only testing specific files and this isn't one of them, skip it.
 if ![runtest_file_p $runtests $testcase] then {
@@ -38,5 +35,3 @@ foreach testcase [lsort [glob -nocomplain $srcdir/$subdir/*.mod]] {
 
 gm2-torture $testcase
 }
-
-gm2_pop_timeout
diff --git a/gcc/testsuite/gm2/iso/run/pass/iso-run-pass.exp b/gcc/testsuite/gm2/iso/run/pass/iso-run-pass.exp
index af387e54b24..2c79b69ab6a 100644
--- a/gcc/testsuite/gm2/iso/run/pass

[gcc r12-10488] vect: Fix access size alignment assumption [PR115192]

2024-06-04 Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:f510e59db482456160b8a63dc083c78b0c1f6c09

commit r12-10488-gf510e59db482456160b8a63dc083c78b0c1f6c09
Author: Richard Sandiford 
Date:   Tue Jun 4 08:47:47 2024 +0100

vect: Fix access size alignment assumption [PR115192]

create_intersect_range_checks checks whether two access ranges
a and b are alias-free using something equivalent to:

  end_a <= start_b || end_b <= start_a

It has two ways of doing this: a "vanilla" way that calculates
the exact exclusive end pointers, and another way that uses the
last inclusive aligned pointers (and changes the comparisons
accordingly).  The comment for the latter is:

  /* Calculate the minimum alignment shared by all four pointers,
 then arrange for this alignment to be subtracted from the
 exclusive maximum values to get inclusive maximum values.
 This "- min_align" is cumulative with a "+ access_size"
 in the calculation of the maximum values.  In the best
 (and common) case, the two cancel each other out, leaving
 us with an inclusive bound based only on seg_len.  In the
 worst case we're simply adding a smaller number than before.

The problem is that the associated code implicitly assumed that the
access size was a multiple of the pointer alignment, and so the
alignment could be carried over to the exclusive end pointer.
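
The failure mode can be reproduced with a simplified C model.  This is only
a sketch under stated assumptions: both ranges share one seg_len and one
access_size, addresses are plain integers, and lowest_set_bit is a
hypothetical stand-in for taking the known alignment of a constant access
size.  It is not the code in create_intersect_range_checks:

```c
#include <stdint.h>

/* Largest power of two that divides X (X must be nonzero).  */
uint64_t
lowest_set_bit (uint64_t x)
{
  return x & (~x + 1);
}

uint64_t
min_u64 (uint64_t a, uint64_t b)
{
  return a < b ? a : b;
}

/* "Vanilla" check: exact exclusive end pointers.  */
int
no_alias_exclusive (uint64_t start_a, uint64_t start_b,
                    uint64_t seg_len, uint64_t access_size)
{
  uint64_t end_a = start_a + seg_len + access_size;
  uint64_t end_b = start_b + seg_len + access_size;
  return end_a <= start_b || end_b <= start_a;
}

/* Inclusive check: subtract MIN_ALIGN from the exclusive ends and use
   strict comparisons.  Only valid if the ends are multiples of MIN_ALIGN.  */
int
no_alias_inclusive (uint64_t start_a, uint64_t start_b,
                    uint64_t seg_len, uint64_t access_size,
                    uint64_t min_align)
{
  uint64_t max_a = start_a + seg_len + access_size - min_align;
  uint64_t max_b = start_b + seg_len + access_size - min_align;
  return max_a < start_b || max_b < start_a;
}
```

With 16-byte-aligned starts 0 and 16 and a 24-byte access, the exclusive
check reports an overlap, while the inclusive check with min_align = 16
wrongly reports none.  Capping min_align at lowest_set_bit (24) = 8 makes
the two checks agree again.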

The testcase started failing after g:9fa5b473b5b8e289b6542
because that commit improved the alignment information for
the accesses.

gcc/
PR tree-optimization/115192
* tree-data-ref.cc (create_intersect_range_checks): Take the
alignment of the access sizes into account.

gcc/testsuite/
PR tree-optimization/115192
* gcc.dg/vect/pr115192.c: New test.

(cherry picked from commit a0fe4fb1c8d7804515845dd5d2a814b3c7a1ccba)

Diff:
---
 gcc/testsuite/gcc.dg/vect/pr115192.c | 28 
 gcc/tree-data-ref.cc |  5 -
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr115192.c b/gcc/testsuite/gcc.dg/vect/pr115192.c
new file mode 100644
index 000..923d377c1bb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115192.c
@@ -0,0 +1,28 @@
+#include "tree-vect.h"
+
+int data[4 * 16 * 16] __attribute__((aligned(16)));
+
+__attribute__((noipa)) void
+foo (__SIZE_TYPE__ n)
+{
+  for (__SIZE_TYPE__ i = 1; i < n; ++i)
+{
+  data[i * n * 4] = data[(i - 1) * n * 4] + 1;
+  data[i * n * 4 + 1] = data[(i - 1) * n * 4 + 1] + 2;
+}
+}
+
+int
+main ()
+{
+  check_vect ();
+
+  data[0] = 10;
+  data[1] = 20;
+
+  foo (3);
+
+  if (data[24] != 12 || data[25] != 24)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
index 0df4a3525f4..706a49f226e 100644
--- a/gcc/tree-data-ref.cc
+++ b/gcc/tree-data-ref.cc
@@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
 
 */
 
+#define INCLUDE_ALGORITHM
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -2627,7 +2628,9 @@ create_intersect_range_checks (class loop *loop, tree *cond_expr,
 Because the maximum values are inclusive, there is an alias
 if the maximum value of one segment is equal to the minimum
 value of the other.  */
-  min_align = MIN (dr_a.align, dr_b.align);
+  min_align = std::min (dr_a.align, dr_b.align);
+  min_align = std::min (min_align, known_alignment (dr_a.access_size));
+  min_align = std::min (min_align, known_alignment (dr_b.access_size));
   cmp_code = LT_EXPR;
 }


[gcc r12-10489] vect: Tighten vect_determine_precisions_from_range [PR113281]

2024-06-04 Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:dfaa13455d67646805bc611aa4373728a460a37d

commit r12-10489-gdfaa13455d67646805bc611aa4373728a460a37d
Author: Richard Sandiford 
Date:   Tue Jun 4 08:47:48 2024 +0100

vect: Tighten vect_determine_precisions_from_range [PR113281]

This was another PR caused by the way that
vect_determine_precisions_from_range handles shifts.  We tried to
narrow 32768 >> x to a 16-bit shift based on range information for
the inputs and outputs, with vect_recog_over_widening_pattern
(after PR110828) adjusting the shift amount.  But this doesn't
work for the case where x is in [16, 31], since then 32-bit
32768 >> x is a well-defined zero, whereas no well-defined
16-bit 32768 >> y will produce 0.

We could perhaps generate x < 16 ? 32768 >> x : 0 instead,
but since vect_determine_precisions_from_range was never really
supposed to rely on fix-ups, it seems better to fix that instead.
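
The arithmetic behind this can be checked with a small C sketch.  The
function names are made up for illustration and the code models the shifts
with fixed-width unsigned integers; it is not taken from the vectorizer:

```c
#include <stdint.h>

/* 32-bit shift: well defined for 0 <= x <= 31, and zero once x >= 16.  */
uint32_t
shift_wide (unsigned x)
{
  return UINT32_C (32768) >> x;
}

/* 16-bit shift: only amounts 0..15 are meaningful in 16 bits.  */
uint16_t
shift_narrow (unsigned y)
{
  return (uint16_t) (0x8000u >> y);
}

/* Return 1 if some valid 16-bit shift amount yields zero, else 0.  */
int
narrow_can_be_zero (void)
{
  for (unsigned y = 0; y < 16; ++y)
    if (shift_narrow (y) == 0)
      return 1;
  return 0;
}

/* The guarded form mentioned above: x < 16 ? 32768 >> x : 0.  */
uint16_t
shift_guarded (unsigned x)
{
  return x < 16 ? shift_narrow (x) : 0;
}

/* Return 1 if the guarded 16-bit form matches the 32-bit shift for
   every amount in [0, 31].  */
int
guarded_matches_wide (void)
{
  for (unsigned x = 0; x < 32; ++x)
    if ((uint16_t) shift_wide (x) != shift_guarded (x))
      return 0;
  return 1;
}
```

No 16-bit shift amount turns 0x8000 into zero, so an unguarded narrowed
shift cannot reproduce the 32-bit result for x in [16, 31]; the guarded
form can.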

The patch also makes the code more selective about which codes
can be narrowed based on input and output ranges.  This showed
that vect_truncatable_operation_p was missing cases for
BIT_NOT_EXPR (equivalent to BIT_XOR_EXPR of -1) and NEGATE_EXPR
(equivalent to BIT_NOT_EXPR followed by a PLUS_EXPR of 1).
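
The two equivalences, and the reason they make the operations safe to
truncate, can be illustrated in C.  This is a sketch with invented function
names that uses unsigned fixed-width arithmetic as a model; it does not
reproduce vect_truncatable_operation_p:

```c
#include <stdint.h>

/* Bitwise NOT written as XOR with all-ones (the unsigned image of -1).  */
uint32_t
not_as_xor (uint32_t x)
{
  return x ^ UINT32_MAX;
}

/* Negation written as bitwise NOT followed by adding 1.  */
uint32_t
neg_as_not_plus_one (uint32_t x)
{
  return ~x + 1u;
}

/* Return 1 if performing NOT and NEGATE on the low 16 bits of X gives the
   low 16 bits of the 32-bit results, i.e. truncation commutes with both.  */
int
narrowing_is_safe (uint32_t x)
{
  uint16_t lo = (uint16_t) x;
  int not_ok = (uint16_t) ~x == (uint16_t) ~(uint32_t) lo;
  int neg_ok = (uint16_t) (0u - x) == (uint16_t) (0u - lo);
  return not_ok && neg_ok;
}
```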

pr113281-1.c is the original testcase.  pr113281-[23].c failed
before the patch due to overly optimistic narrowing.  pr113281-[45].c
previously passed and are meant to protect against accidental
optimisation regressions.

gcc/
PR target/113281
* tree-vect-patterns.cc (vect_recog_over_widening_pattern): Remove
workaround for right shifts.
(vect_truncatable_operation_p): Handle NEGATE_EXPR and BIT_NOT_EXPR.
(vect_determine_precisions_from_range): Be more selective about
which codes can be narrowed based on their input and output ranges.
For shifts, require at least one more bit of precision than the
maximum shift amount.

gcc/testsuite/
PR target/113281
* gcc.dg/vect/pr113281-1.c: New test.
* gcc.dg/vect/pr113281-2.c: Likewise.
* gcc.dg/vect/pr113281-3.c: Likewise.
* gcc.dg/vect/pr113281-4.c: Likewise.
* gcc.dg/vect/pr113281-5.c: Likewise.

(cherry picked from commit 1a8261e047f7a2c2b0afb95716f7615cba718cd1)

Diff:
---
 gcc/testsuite/gcc.dg/vect/pr113281-1.c |  17 ++
 gcc/testsuite/gcc.dg/vect/pr113281-2.c |  50 +++
 gcc/testsuite/gcc.dg/vect/pr113281-3.c |  39 
 gcc/testsuite/gcc.dg/vect/pr113281-4.c |  55 +
 gcc/testsuite/gcc.dg/vect/pr113281-5.c |  66 
 gcc/tree-vect-patterns.cc  | 107 -
 6 files changed, 305 insertions(+), 29 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr113281-1.c b/gcc/testsuite/gcc.dg/vect/pr113281-1.c
new file mode 100644
index 000..6df4231cb5f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr113281-1.c
@@ -0,0 +1,17 @@
+#include "tree-vect.h"
+
+unsigned char a;
+
+int main() {
+  check_vect ();
+
+  short b = a = 0;
+  for (; a != 19; a++)
+if (a)
+  b = 32872 >> a;
+
+  if (b == 0)
+return 0;
+  else
+return 1;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr113281-2.c b/gcc/testsuite/gcc.dg/vect/pr113281-2.c
new file mode 100644
index 000..3a1170c28b6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr113281-2.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+
+#define N 128
+
+short x[N];
+short y[N];
+
+void
+f1 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= y[i];
+}
+
+void
+f2 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= (y[i] < 32 ? y[i] : 32);
+}
+
+void
+f3 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= (y[i] < 31 ? y[i] : 31);
+}
+
+void
+f4 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= (y[i] & 31);
+}
+
+void
+f5 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= 0x8000 >> y[i];
+}
+
+void
+f6 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= 0x8000 >> (y[i] & 31);
+}
+
+/* { dg-final { scan-tree-dump-not {can narrow[^\n]+>>} "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr113281-3.c b/gcc/testsuite/gcc.dg/vect/pr113281-3.c
new file mode 100644
index 000..5982dd2d16f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr113281-3.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+
+#define N 128
+
+short x[N];
+short y[N];
+
+void
+f1 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= (y[i] < 30 ? y[i] : 30);
+}
+
+void
+f2 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= ((y[i] & 15) + 2);
+}
+
+void
+f3 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= (y[i] < 16 ? y[i] : 16);
+}
+
+void
+f4 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] = 32768 >> ((y[i] & 15) + 3);
+}
+
+/* { dg-final { scan-tree-dump {can narrow to signed:31 without loss [^\n]+>>} "vect" } } */
+/* { dg-final { scan-tree-dump {can n

[gcc r12-10490] tree-optimization/113910 - huge compile time during PTA

2024-06-04 Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:db0f236aa1c30f703ff564960bd9f3dbd747ea7b

commit r12-10490-gdb0f236aa1c30f703ff564960bd9f3dbd747ea7b
Author: Richard Biener 
Date:   Wed Feb 14 12:33:13 2024 +0100

tree-optimization/113910 - huge compile time during PTA

For the testcase in PR113910 we spend a lot of time in PTA comparing
bitmaps for looking up equivalence class members.  This points to
the very weak bitmap_hash function which effectively hashes set
and a subset of not set bits.

The major problem with it is that it simply truncates the
BITMAP_WORD sized intermediate hash to hashval_t which is
unsigned int, effectively not hashing half of the bits.

Mixing the full intermediate hash into the result reduces the compile
time for the testcase from tens of minutes to 42 seconds and PTA time
from 99% to 46%.
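
The effect of the truncation is easy to see in a C sketch, assuming a
64-bit intermediate hash and a 32-bit result.  hash_fold below is a
deliberately simple stand-in chosen for illustration; it is not
iterative_hash and has weaknesses of its own:

```c
#include <stdint.h>

/* Truncating hash: the upper 32 bits of WORD never reach the result.  */
uint32_t
hash_truncate (uint64_t word)
{
  return (uint32_t) word;
}

/* Stand-in mixing step: fold the upper half into the lower half so that
   every input bit can influence the result.  */
uint32_t
hash_fold (uint64_t word)
{
  return (uint32_t) (word ^ (word >> 32));
}
```

Two words that differ only in their upper halves, such as 0x100000000 and
0x200000000, collide under hash_truncate but not under hash_fold.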

PR tree-optimization/113910
* bitmap.cc (bitmap_hash): Mix the full element "hash" to
the hashval_t hash.

(cherry picked from commit ad7a365aaccecd23ea287c7faaab9c7bd50b944a)

Diff:
---
 gcc/bitmap.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/bitmap.cc b/gcc/bitmap.cc
index 88c329f9325..601c04e2e13 100644
--- a/gcc/bitmap.cc
+++ b/gcc/bitmap.cc
@@ -2673,7 +2673,7 @@ bitmap_hash (const_bitmap head)
   for (ix = 0; ix != BITMAP_ELEMENT_WORDS; ix++)
hash ^= ptr->bits[ix];
 }
-  return (hashval_t)hash;
+  return iterative_hash (&hash, sizeof (hash), 0);
 }


[gcc r12-10491] tree-optimization/110381 - preserve SLP permutation with in-order reductions

2024-06-04 Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:8f6d889a8e609710ecfd555778fbff602b2c7d74

commit r12-10491-g8f6d889a8e609710ecfd555778fbff602b2c7d74
Author: Richard Biener 
Date:   Mon Jun 26 12:51:37 2023 +0200

tree-optimization/110381 - preserve SLP permutation with in-order reductions

The following fixes a bug that manifests itself during fold-left
reduction transform in picking not the last scalar def to replace
and thus double-counting some elements.  But the underlying issue
is that we merge a load permutation into the in-order reduction
which is of course wrong.

Now, reduction analysis has not yet been performed when optimizing
permutations so we have to resort to check that ourselves.
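
Why an in-order (fold-left) reduction must not have its inputs permuted can
be shown with plain scalar C, using the same values as the testcase in this
commit.  The helper names are invented for illustration, and the sketch
assumes default floating-point semantics (no -ffast-math):

```c
#include <float.h>

/* Strict left-to-right floating-point sum of N values.  */
double
sum_in_order (const double *v, int n)
{
  double sum = 0;
  for (int i = 0; i < n; ++i)
    sum += v[i];
  return sum;
}

/* Source order a, c, b: the two huge values cancel before 5 is added.  */
double
sum_source_order (void)
{
  const double v[3] = { DBL_MAX, -DBL_MAX, 5 };
  return sum_in_order (v, 3);
}

/* Permuted order a, b, c: 5 is absorbed by DBL_MAX and lost.  */
double
sum_permuted_order (void)
{
  const double v[3] = { DBL_MAX, 5, -DBL_MAX };
  return sum_in_order (v, 3);
}
```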

PR tree-optimization/110381
* tree-vect-slp.cc (vect_optimize_slp_pass::start_choosing_layouts):
Materialize permutes before fold-left reductions.

* gcc.dg/vect/pr110381.c: New testcase.

(cherry picked from commit 53d6f57c1b20c6da52aefce737fb7d5263686ba3)

Diff:
---
 gcc/testsuite/gcc.dg/vect/pr110381.c | 44 
 gcc/tree-vect-slp.cc | 19 +---
 2 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr110381.c b/gcc/testsuite/gcc.dg/vect/pr110381.c
new file mode 100644
index 000..278f4426c29
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr110381.c
@@ -0,0 +1,44 @@
+/* { dg-require-effective-target vect_float_strict } */
+
+#include "tree-vect.h"
+
+struct FOO {
+   double a;
+   double b;
+   double c;
+};
+
+double __attribute__((noipa))
+sum_8_foos(const struct FOO* foos)
+{
+  double sum = 0;
+
+  for (int i = 0; i < 8; ++i)
+{
+  struct FOO foo = foos[i];
+
+  /* Need to use an in-order reduction here, preserving
+ the load permutation.  */
+  sum += foo.a;
+  sum += foo.c;
+  sum += foo.b;
+}
+
+  return sum;
+}
+
+int main()
+{
+  struct FOO foos[8];
+
+  check_vect ();
+
+  __builtin_memset (foos, 0, sizeof (foos));
+  foos[0].a = __DBL_MAX__;
+  foos[0].b = 5;
+  foos[0].c = -__DBL_MAX__;
+
+  if (sum_8_foos (foos) != 5)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 54e6a9e4224..19cab93761c 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3733,9 +3733,8 @@ vect_optimize_slp (vec_info *vinfo)
   vertices[idx].perm_out = perms.length () - 1;
 }
 
-  /* In addition to the above we have to mark outgoing permutes facing
- non-reduction graph entries that are not represented as to be
- materialized.  */
+  /* We have to mark outgoing permutations facing non-associating-reduction
+ graph entries that are not represented as to be materialized.  */
   for (slp_instance instance : vinfo->slp_instances)
 if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_ctor)
   {
@@ -3744,6 +3743,20 @@ vect_optimize_slp (vec_info *vinfo)
vertices[SLP_INSTANCE_TREE (instance)->vertex].perm_in = 0;
vertices[SLP_INSTANCE_TREE (instance)->vertex].perm_out = 0;
   }
+else if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_reduc_chain)
+  {
+   stmt_vec_info stmt_info
+ = SLP_TREE_REPRESENTATIVE (SLP_INSTANCE_TREE (instance));
+   stmt_vec_info reduc_info = info_for_reduction (vinfo, stmt_info);
+   if (needs_fold_left_reduction_p (TREE_TYPE
+  (gimple_get_lhs (stmt_info->stmt)),
+STMT_VINFO_REDUC_CODE (reduc_info)))
+ {
+   unsigned int node_i = SLP_INSTANCE_TREE (instance)->vertex;
+   vertices[node_i].perm_in = 0;
+   vertices[node_i].perm_out = 0;
+ }
+  }
 
   /* Propagate permutes along the graph and compute materialization points.  */
   bool changed;


[gcc r12-10492] middle-end/112732 - stray TYPE_ALIAS_SET in type variant

2024-06-04 Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:b46486ef0316240eb3c173bda062b52333507e03

commit r12-10492-gb46486ef0316240eb3c173bda062b52333507e03
Author: Richard Biener 
Date:   Tue Nov 28 12:36:21 2023 +0100

middle-end/112732 - stray TYPE_ALIAS_SET in type variant

The following fixes a stray TYPE_ALIAS_SET in a type variant built
by build_opaque_vector_type which is diagnosed by type checking
enabled with -flto.

PR middle-end/112732
* tree.cc (build_opaque_vector_type): Reset TYPE_ALIAS_SET
of the newly built type.

(cherry picked from commit f26d68d5d128c86faaceeb81b1e8f22254ad53df)

Diff:
---
 gcc/tree.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/tree.cc b/gcc/tree.cc
index ead4c1421cd..6b28eb9f10d 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -10124,6 +10124,8 @@ build_opaque_vector_type (tree innertype, poly_int64 nunits)
   TYPE_NEXT_VARIANT (cand) = TYPE_NEXT_VARIANT (t);
   TYPE_NEXT_VARIANT (t) = cand;
   TYPE_MAIN_VARIANT (cand) = TYPE_MAIN_VARIANT (t);
+  /* Type variants have no alias set defined.  */
+  TYPE_ALIAS_SET (cand) = -1;
   return cand;
 }


[gcc r12-10493] c++: Add testcase for this PR [PR97990]

2024-06-04 Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:c7627054b9ee2ded8a22340a6a09bf9786afcafa

commit r12-10493-gc7627054b9ee2ded8a22340a6a09bf9786afcafa
Author: Andrew Pinski 
Date:   Fri Feb 16 10:55:43 2024 -0800

c++: Add testcase for this PR [PR97990]

This testcase was fixed by r14-5934-gf26d68d5d128c8 but we should add
one to make sure it does not regress again.

Committed as obvious after a quick test on the testcase.

PR c++/97990

gcc/testsuite/ChangeLog:

* g++.dg/torture/vector-struct-1.C: New test.

Signed-off-by: Andrew Pinski 
(cherry picked from commit 5f1438db419c9eb8901d1d1d7f98fb69082aec8e)

Diff:
---
 gcc/testsuite/g++.dg/torture/vector-struct-1.C | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/gcc/testsuite/g++.dg/torture/vector-struct-1.C b/gcc/testsuite/g++.dg/torture/vector-struct-1.C
new file mode 100644
index 000..e2747417e2d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/vector-struct-1.C
@@ -0,0 +1,18 @@
+/* PR c++/97990 */
+/* This used to crash with lto and strict aliasing enabled as the
+   vector type variant still had TYPE_ALIAS_SET set on it. */
+
+typedef __attribute__((__vector_size__(sizeof(short short TSimd;
+TSimd hh(int);
+struct y6
+{
+  TSimd VALUE;
+  ~y6();
+};
+template 
+auto f2(T1 p1, T2){
+  return hh(p1) <= 0;
+}
+void f1(){
+  f2(0, y6{});
+}


[gcc r15-1005] Avoid inserting after a GIMPLE_COND with SLP and early break

2024-06-04 Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:0592000aeed84d47040946a125154b3c46d7c84f

commit r15-1005-g0592000aeed84d47040946a125154b3c46d7c84f
Author: Richard Biener 
Date:   Mon May 27 14:40:27 2024 +0200

Avoid inserting after a GIMPLE_COND with SLP and early break

When vectorizing an early break loop with LENs (do we miss some
check here to disallow this?) we can end up deciding to insert
stmts after a GIMPLE_COND when doing SLP scheduling and trying
to be conservative with placing of stmts only dependent on
the implicit loop mask/len.  The following avoids this, I guess
it's not perfect but it does the job fixing some observed
RISC-V regression.

* tree-vect-slp.cc (vect_schedule_slp_node): For mask/len
loops make sure to not advance the insertion iterator
beyond a GIMPLE_COND.

Diff:
---
 gcc/tree-vect-slp.cc | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index bf1f467f53f..11ec82086fc 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -9650,7 +9650,12 @@ vect_schedule_slp_node (vec_info *vinfo,
   else
{
  si = gsi_for_stmt (last_stmt);
- gsi_next (&si);
+ /* When we're getting gsi_after_labels from the starting
+condition of a fully masked/len loop avoid insertion
+after a GIMPLE_COND that can appear as the only header
+stmt with early break vectorization.  */
+ if (gimple_code (last_stmt) != GIMPLE_COND)
+   gsi_next (&si);
}
 }


[gcc r15-1006] Do single-lane SLP discovery for reductions

2024-06-04 Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:d93353e6423ecaaae9fa47d0935caafd9abfe4de

commit r15-1006-gd93353e6423ecaaae9fa47d0935caafd9abfe4de
Author: Richard Biener 
Date:   Fri Feb 23 11:45:50 2024 +0100

Do single-lane SLP discovery for reductions

The following performs single-lane SLP discovery for reductions.
It requires a fixup for outer loop vectorization where a check
for multiple types needs adjustments as otherwise bogus pointer
IV increments happen when there are multiple copies of vector stmts
in the inner loop.

For the reduction epilog handling this extends the optimized path
to cover the trivial single-lane SLP reduction case.

The fix for PR65518 implemented in vect_grouped_load_supported for
non-SLP needs a SLP counterpart that I put in get_group_load_store_type.

I've decided to adjust three testcases for appearing single-lane
SLP instances instead of not dumping "vectorizing stmts using SLP"
for single-lane instances as that also requires testsuite adjustments.

* tree-vect-slp.cc (vect_build_slp_tree_2): Only multi-lane
discoveries are reduction chains and need special backedge
treatment.
(vect_analyze_slp): Fall back to single-lane SLP discovery
for reductions.  Make sure to try single-lane SLP reduction
for all reductions as fallback.
(vectorizable_load): Avoid outer loop SLP vectorization with
multi-copy vector stmts in the inner loop.
(vectorizable_store): Likewise.
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Allow
direct opcode and shift reduction also for SLP reductions
with a single lane.
* tree-vect-stmts.cc (get_group_load_store_type): For SLP also
check for the PR65518 single-element interleaving case as done in
vect_grouped_load_supported.

* gcc.dg/vect/slp-24.c: Expect another SLP instance for the
reduction.
* gcc.dg/vect/slp-24-big-array.c: Likewise.
* gcc.dg/vect/slp-reduc-6.c: Remove scan for zero SLP instances.

Diff:
---
 gcc/testsuite/gcc.dg/vect/slp-24-big-array.c |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-24.c   |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-reduc-6.c  |  1 -
 gcc/tree-vect-loop.cc|  4 +-
 gcc/tree-vect-slp.cc | 71 +---
 gcc/tree-vect-stmts.cc   | 24 +-
 6 files changed, 80 insertions(+), 24 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c
index 5eaea9600ac..63f744338a1 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c
@@ -92,4 +92,4 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { vect_no_align && ilp32 } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { vect_no_align && ilp32 } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { xfail { vect_no_align && ilp32 } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-24.c b/gcc/testsuite/gcc.dg/vect/slp-24.c
index 59178f2c0f2..7814d7c324e 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-24.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-24.c
@@ -78,4 +78,4 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { vect_no_align && ilp32 } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { vect_no_align && ilp32 } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { xfail { vect_no_align && ilp32 } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-6.c b/gcc/testsuite/gcc.dg/vect/slp-reduc-6.c
index 1fd15aa3c87..5566705a704 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-reduc-6.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-reduc-6.c
@@ -45,6 +45,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail { vect_no_int_add || { ! { vect_unpack || vect_strided2 } } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
 /* { dg-final { scan-tree-dump-times "different interleaving chains in one node" 1 "vect" { target { ! vect_no_int_add } } } } */
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a08357acc11..06292ed8bbe 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6504,7 +6504,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
   /* 2.3 Create the reduction code, using one of the three schemes described
  above. In SLP we simply need to extract all the elements from the 
  vector (without reducing them), so we use scalar shifts.  */
-  else if (reduc_fn != IFN_LAST && !slp_reduc)
+  else if (redu

[gcc r15-1007] libstdc++: Fix simd conversion for -fno-signed-char for Clang

2024-06-04 Thread Matthias Kretz via Libstdc++-cvs
https://gcc.gnu.org/g:8e36cf4c5c9140915d001db132a900b48037

commit r15-1007-g8e36cf4c5c9140915d001db132a900b48037
Author: Matthias Kretz 
Date:   Mon Jun 3 12:02:07 2024 +0200

libstdc++: Fix simd conversion for -fno-signed-char for Clang

The special case for Clang in the trait producing a signed integer type
led to the trait returning 'char' where it should have been 'signed
char'. This workaround was introduced because on Clang the return type
of vector compares was not convertible to '_SimdWrapper<
__int_for_sizeof_t<...' unless '__int_for_sizeof_t' was an alias
for 'char'. In order to not rewrite the complete mask type code (there
is code scattered around the implementation assuming signed integers),
this needs to be 'signed char'; so the special case for Clang needs to
be removed.
The conversion issue is now solved in _SimdWrapper, which now
additionally allows conversion from vector types with compatible
integral type.
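
As a rough illustration of why the trait has to name 'signed char'
rather than 'char': in C++ the two are distinct types, and the
signedness of plain 'char' depends on the target and on options such
as -fno-signed-char.  The sketch below uses a hypothetical trait name
(signed_int_for_size); it is not the library's __int_for_sizeof code,
only a model of the one-byte case.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a "signed integer of N bytes" trait.  The name
// signed_int_for_size is an assumption made for this sketch only.
template <std::size_t Bytes>
struct signed_int_for_size;

// The one-byte case names 'signed char' explicitly.  Plain 'char' is a
// distinct type whose signedness is implementation-defined.
template <>
struct signed_int_for_size<1>
{
  using type = signed char;
};

// Holds with either -fsigned-char or -fno-signed-char.
constexpr bool
one_byte_type_is_signed ()
{
  return std::is_signed<signed_int_for_size<1>::type>::value;
}

// 'char' is never the same type as 'signed char', even where both are signed.
constexpr bool
char_is_distinct_from_signed_char ()
{
  return !std::is_same<char, signed char>::value;
}
```

Code that assumes a signed element type for masks can rely on
one_byte_type_is_signed () regardless of how plain 'char' behaves.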

Signed-off-by: Matthias Kretz 

libstdc++-v3/ChangeLog:

PR libstdc++/115308
* include/experimental/bits/simd.h (__int_for_sizeof): Remove
special cases for __clang__.
(_SimdWrapper): Change constructor overload set to allow
conversion from vector types with integral conversions via bit
reinterpretation.

Diff:
---
 libstdc++-v3/include/experimental/bits/simd.h | 45 ---
 1 file changed, 27 insertions(+), 18 deletions(-)

diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h
index 7c524625719..cb1f13d8ba6 100644
--- a/libstdc++-v3/include/experimental/bits/simd.h
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -606,19 +606,12 @@ template 
 static_assert(_Bytes > 0);
 if constexpr (_Bytes == sizeof(int))
   return int();
-  #ifdef __clang__
-else if constexpr (_Bytes == sizeof(char))
-  return char();
-  #else
 else if constexpr (_Bytes == sizeof(_SChar))
   return _SChar();
-  #endif
 else if constexpr (_Bytes == sizeof(short))
   return short();
-  #ifndef __clang__
 else if constexpr (_Bytes == sizeof(long))
   return long();
-  #endif
 else if constexpr (_Bytes == sizeof(_LLong))
   return _LLong();
   #ifdef __SIZEOF_INT128__
@@ -2747,6 +2740,8 @@ template 
 
 // }}}
 // _SimdWrapper{{{
+struct _DisabledSimdWrapper;
+
 template 
   struct _SimdWrapper<
 _Tp, _Width,
@@ -2756,16 +2751,17 @@ template 
  == sizeof(__vector_type_t<_Tp, _Width>),
   __vector_type_t<_Tp, _Width>>
   {
-using _Base
-  = _SimdWrapperBase<__has_iec559_behavior<__signaling_NaN, _Tp>::value
-  && sizeof(_Tp) * _Width
-   == sizeof(__vector_type_t<_Tp, _Width>),
-__vector_type_t<_Tp, _Width>>;
+static constexpr bool _S_need_default_init
+  = __has_iec559_behavior<__signaling_NaN, _Tp>::value
+ and sizeof(_Tp) * _Width == sizeof(__vector_type_t<_Tp, _Width>);
+
+using _BuiltinType = __vector_type_t<_Tp, _Width>;
+
+using _Base = _SimdWrapperBase<_S_need_default_init, _BuiltinType>;
 
 static_assert(__is_vectorizable_v<_Tp>);
 static_assert(_Width >= 2); // 1 doesn't make sense, use _Tp directly then
 
-using _BuiltinType = __vector_type_t<_Tp, _Width>;
 using value_type = _Tp;
 
 static inline constexpr size_t _S_full_size
@@ -2801,13 +2797,26 @@ template 
 _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper&
 operator=(_SimdWrapper&&) = default;
 
-template >,
-is_same<_V, __intrinsic_type_t<_Tp, _Width>
+// Convert from exactly matching __vector_type_t
+using _SimdWrapperBase<_S_need_default_init, _BuiltinType>::_SimdWrapperBase;
+
+// Convert from __intrinsic_type_t if __intrinsic_type_t and __vector_type_t differ, otherwise
+// this ctor should not exist. Making the argument type unusable is our next best solution.
+_GLIBCXX_SIMD_INTRINSIC constexpr
+_SimdWrapper(conditional_t>,
+  _DisabledSimdWrapper, __intrinsic_type_t<_Tp, _Width>> __x)
+: _Base(__vector_bitcast<_Tp, _Width>(__x)) {}
+
+// Convert from different __vector_type_t, but only if bit reinterpretation is a correct
+// conversion of the value_type
+template ,
+ typename = enable_if_t
+  and is_integral_v>>
   _GLIBCXX_SIMD_INTRINSIC constexpr
   _SimdWrapper(_V __x)
-  // __vector_bitcast can convert e.g. __m128 to __vector(2) float
-  : _Base(__vector_bitcast<_Tp, _Width>(__x)) {}
+  : _Base(reinterpret_cast<_BuiltinType>(__x)) {}
 
 template  && ...)


[gcc r15-1008] invoke.texi: Clarify -march=lujiazui

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:09b4ab53155ea16e1fb12c2afcd9b6fe29a31c74

commit r15-1008-g09b4ab53155ea16e1fb12c2afcd9b6fe29a31c74
Author: Jakub Jelinek 
Date:   Tue Jun 4 12:20:13 2024 +0200

invoke.texi: Clarify -march=lujiazui

I was recently searching which exact CPUs are affected by the PR114576
wrong-code issue and went from the PTA_* bitmasks in GCC, so arrived
at the goldmont, goldmont-plus, tremont and lujiazui CPUs (as -march=
cases which do enable -maes and don't enable -mavx).
But when double-checking that against the invoke.texi documentation,
that was true for the first 3, but lujiazui said it supported AVX.
I was really confused by that, until I found the
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604407.html
explanation.  So, seems the CPUs do have AVX and F16C but -march=lujiazui
doesn't enable those and even actively attempts to filter those out from
the announced CPUID features, in glibc as well as e.g. in libgcc.

Thus, I think we should document what actually happens, otherwise
users could assume that
gcc -march=lujiazui predefines __AVX__ and __F16C__, which it doesn't.
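
A small sketch of how user code could observe this.  The helper names
are hypothetical; the only thing relied upon is that GCC predefines
__AVX__ and __F16C__ when the corresponding extension is enabled for
the translation unit.

```cpp
// True only if the translation unit was compiled with AVX enabled,
// e.g. via -mavx or a -march= value that turns it on.
constexpr bool
avx_enabled_at_compile_time ()
{
#ifdef __AVX__
  return true;
#else
  return false;
#endif
}

// Likewise for F16C.
constexpr bool
f16c_enabled_at_compile_time ()
{
#ifdef __F16C__
  return true;
#else
  return false;
#endif
}
```

Given the behaviour described above, both helpers would be expected to
return false when built with -march=lujiazui alone, even though the
hardware implements the instructions.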

2024-06-04  Jakub Jelinek  

* doc/invoke.texi (lujiazui): Clarify that while the CPUs do support
AVX and F16C, -march=lujiazui actually doesn't enable those.

Diff:
---
 gcc/doc/invoke.texi | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 45115b5fbed..4e8967fd8ab 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -34808,8 +34808,10 @@ instruction set support.
 
 @item lujiazui
 ZHAOXIN lujiazui CPU with x86-64, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1,
-SSE4.2, AVX, POPCNT, AES, PCLMUL, RDRND, XSAVE, XSAVEOPT, FSGSBASE, CX16,
-ABM, BMI, BMI2, F16C, FXSR, RDSEED instruction set support.
+SSE4.2, POPCNT, AES, PCLMUL, RDRND, XSAVE, XSAVEOPT, FSGSBASE, CX16,
+ABM, BMI, BMI2, FXSR, RDSEED instruction set support.  While the CPUs
+do support AVX and F16C, these aren't enabled by @code{-march=lujiazui}
+for performance reasons.
 
 @item yongfeng
 ZHAOXIN yongfeng CPU with x86-64, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1,


[gcc r15-1009] builtins: Force SAVE_EXPR for __builtin_{add,sub,mul}_overflow and __builtin_{add,sub}c [PR108789]

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:b8e28381cb5c0cddfe5201faf799d8b27f5d7d6c

commit r15-1009-gb8e28381cb5c0cddfe5201faf799d8b27f5d7d6c
Author: Jakub Jelinek 
Date:   Tue Jun 4 12:28:01 2024 +0200

builtins: Force SAVE_EXPR for __builtin_{add,sub,mul}_overflow and __builtin_{add,sub}c [PR108789]

The following testcase is miscompiled, because we use save_expr
on the .{ADD,SUB,MUL}_OVERFLOW call we are creating, but if the first
two operands are not INTEGER_CSTs (in that case we just fold it right away)
but are TREE_READONLY/!TREE_SIDE_EFFECTS, save_expr doesn't actually
create a SAVE_EXPR at all and so we lower it to
*arg2 = REALPART_EXPR (.ADD_OVERFLOW (arg0, arg1)), \
IMAGPART_EXPR (.ADD_OVERFLOW (arg0, arg1))
which evaluates the ifn twice and just hopes it will be CSEd back.
As *arg2 aliases *arg0, that is not the case.
The builtins are really never const/pure as they store into what
the third argument points to, so after handling the INTEGER_CST+INTEGER_CST
case, I think we should just always use SAVE_EXPR.  Just building SAVE_EXPR
by hand and setting TREE_SIDE_EFFECTS on it doesn't work, because
c_fully_fold optimizes it away again, so the following patch marks the
ifn calls as TREE_SIDE_EFFECTS (but doesn't do it for the
__builtin_{add,sub,mul}_overflow_p cases, which were designed for use
especially in constant expressions and don't really evaluate the
realpart side, so we don't really need a SAVE_EXPR in that case).
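
To make the failure mode concrete, here is a plain C++ model of the two
lowerings.  It deliberately does not call the builtins; add_with_overflow
and both wrappers are hypothetical helpers that mimic "evaluate the call
twice" versus "evaluate once and reuse the result" (what SAVE_EXPR
guarantees).  It assumes a 32-bit unsigned int.

```cpp
// Result pair, standing in for the complex value returned by .ADD_OVERFLOW.
struct add_result
{
  unsigned sum;
  bool overflow;
};

// Wrapping unsigned addition plus an overflow flag.
add_result
add_with_overflow (unsigned a, unsigned b)
{
  unsigned s = a + b;
  add_result res = { s, s < a };
  return res;
}

// Models the problematic lowering: the operation is evaluated once for the
// stored result and again for the flag, re-reading *a and *b after the store.
bool
add_evaluated_twice (unsigned *r, const unsigned *a, const unsigned *b)
{
  *r = add_with_overflow (*a, *b).sum;
  return add_with_overflow (*a, *b).overflow;
}

// Models the SAVE_EXPR behaviour: one evaluation, both parts taken from it.
bool
add_evaluated_once (unsigned *r, const unsigned *a, const unsigned *b)
{
  add_result res = add_with_overflow (*a, *b);
  *r = res.sum;
  return res.overflow;
}
```

With x = 0x40000000 and all three pointers referring to x, the
single-evaluation version reports no overflow, while the double-evaluation
version re-adds the already doubled value and wrongly reports one.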

2024-06-04  Jakub Jelinek  

PR middle-end/108789
* builtins.cc (fold_builtin_arith_overflow): For ovf_only,
don't call save_expr and don't build REALPART_EXPR, otherwise
set TREE_SIDE_EFFECTS on call before calling save_expr.
(fold_builtin_addc_subc): Set TREE_SIDE_EFFECTS on call before
calling save_expr.

* gcc.c-torture/execute/pr108789.c: New test.

Diff:
---
 gcc/builtins.cc| 22 ++-
 gcc/testsuite/gcc.c-torture/execute/pr108789.c | 39 ++
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 00ee9eb2925..5b5307c67b8 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -10042,7 +10042,21 @@ fold_builtin_arith_overflow (location_t loc, enum built_in_function fcode,
   tree ctype = build_complex_type (type);
   tree call = build_call_expr_internal_loc (loc, ifn, ctype, 2,
arg0, arg1);
-  tree tgt = save_expr (call);
+  tree tgt;
+  if (ovf_only)
+   {
+ tgt = call;
+ intres = NULL_TREE;
+   }
+  else
+   {
+ /* Force SAVE_EXPR even for calls which satisfy tree_invariant_p_1,
+as while the call itself is const, the REALPART_EXPR store is
+certainly not.  And in any case, we want just one call,
+not multiple and trying to CSE them later.  */
+ TREE_SIDE_EFFECTS (call) = 1;
+ tgt = save_expr (call);
+   }
   intres = build1_loc (loc, REALPART_EXPR, type, tgt);
   ovfres = build1_loc (loc, IMAGPART_EXPR, type, tgt);
   ovfres = fold_convert_loc (loc, boolean_type_node, ovfres);
@@ -10354,11 +10368,17 @@ fold_builtin_addc_subc (location_t loc, enum built_in_function fcode,
   tree ctype = build_complex_type (type);
   tree call = build_call_expr_internal_loc (loc, ifn, ctype, 2,
args[0], args[1]);
+  /* Force SAVE_EXPR even for calls which satisfy tree_invariant_p_1,
+ as while the call itself is const, the REALPART_EXPR store is
+ certainly not.  And in any case, we want just one call,
+ not multiple and trying to CSE them later.  */
+  TREE_SIDE_EFFECTS (call) = 1;
   tree tgt = save_expr (call);
   tree intres = build1_loc (loc, REALPART_EXPR, type, tgt);
   tree ovfres = build1_loc (loc, IMAGPART_EXPR, type, tgt);
   call = build_call_expr_internal_loc (loc, ifn, ctype, 2,
   intres, args[2]);
+  TREE_SIDE_EFFECTS (call) = 1;
   tgt = save_expr (call);
   intres = build1_loc (loc, REALPART_EXPR, type, tgt);
   tree ovfres2 = build1_loc (loc, IMAGPART_EXPR, type, tgt);
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr108789.c b/gcc/testsuite/gcc.c-torture/execute/pr108789.c
new file mode 100644
index 000..32ee19be1c4
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr108789.c
@@ -0,0 +1,39 @@
+/* PR middle-end/108789 */
+
+int
+add (unsigned *r, const unsigned *a, const unsigned *b)
+{
+  return __builtin_add_overflow (*a, *b, r);
+}
+
+int
+mul (unsigned *r, const unsigned *a, const unsigned *b)
+{
+  return __builtin_mul_overflow (*a, *b, r);
+}
+
+int
+main ()
+{
+  unsigned x;
+
+  /* 1073741824U + 1073741824U should not overflow.  */
+  x = (__INT_MAX__ + 1U) / 2;
+  if (add (&x, &x, &x))
+__builtin_abort ();
+
+  /*

[gcc r15-1010] testsuite: i386: Require ifunc support in gcc.target/i386/avx10_1-25.c etc.

2024-06-04 Thread Rainer Orth via Gcc-cvs
https://gcc.gnu.org/g:00fb385a25a7fbaa9c7060ddd5f41a8c3b1548d1

commit r15-1010-g00fb385a25a7fbaa9c7060ddd5f41a8c3b1548d1
Author: Rainer Orth 
Date:   Tue Jun 4 13:33:46 2024 +0200

testsuite: i386: Require ifunc support in gcc.target/i386/avx10_1-25.c etc.

Two new AVX10.1 tests FAIL on Solaris/x86:

FAIL: gcc.target/i386/avx10_1-25.c (test for excess errors)
FAIL: gcc.target/i386/avx10_1-26.c (test for excess errors)

Excess errors:

/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.target/i386/avx10_1-25.c:6:9: error: the call requires 'ifunc', which is not supported by this target

Fixed by requiring ifunc support.

Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.

2024-06-04  Rainer Orth  

gcc/testsuite:
* gcc.target/i386/avx10_1-25.c: Require ifunc support.
* gcc.target/i386/avx10_1-26.c: Likewise.

Diff:
---
 gcc/testsuite/gcc.target/i386/avx10_1-25.c | 1 +
 gcc/testsuite/gcc.target/i386/avx10_1-26.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-25.c b/gcc/testsuite/gcc.target/i386/avx10_1-25.c
index 73f1b724560..5bd2b88fb08 100644
--- a/gcc/testsuite/gcc.target/i386/avx10_1-25.c
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-25.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx" } */
+/* { dg-require-ifunc "" } */
 
 #include 
 __attribute__((target_clones ("default","avx10.1-256")))
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-26.c b/gcc/testsuite/gcc.target/i386/avx10_1-26.c
index 514ab57a406..cf8c976e21f 100644
--- a/gcc/testsuite/gcc.target/i386/avx10_1-26.c
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-26.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx512f" } */
+/* { dg-require-ifunc "" } */
 
 #include 
 __attribute__((target_clones ("default","avx10.1-512")))


[gcc r11-11465] vect: Fix access size alignment assumption [PR115192]

2024-06-04 Thread Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:741ea10418987ac02eb8e680f2946a6e5928eb23

commit r11-11465-g741ea10418987ac02eb8e680f2946a6e5928eb23
Author: Richard Sandiford 
Date:   Tue Jun 4 13:47:34 2024 +0100

vect: Fix access size alignment assumption [PR115192]

create_intersect_range_checks checks whether two access ranges
a and b are alias-free using something equivalent to:

  end_a <= start_b || end_b <= start_a

It has two ways of doing this: a "vanilla" way that calculates
the exact exclusive end pointers, and another way that uses the
last inclusive aligned pointers (and changes the comparisons
accordingly).  The comment for the latter is:

  /* Calculate the minimum alignment shared by all four pointers,
 then arrange for this alignment to be subtracted from the
 exclusive maximum values to get inclusive maximum values.
 This "- min_align" is cumulative with a "+ access_size"
 in the calculation of the maximum values.  In the best
 (and common) case, the two cancel each other out, leaving
 us with an inclusive bound based only on seg_len.  In the
 worst case we're simply adding a smaller number than before.

The problem is that the associated code implicitly assumed that the
access size was a multiple of the pointer alignment, and so the
alignment could be carried over to the exclusive end pointer.

The testcase started failing after g:9fa5b473b5b8e289b6542
because that commit improved the alignment information for
the accesses.
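
The effect can be sketched with plain integers standing in for
addresses.  Everything below is a simplified model, not the code in
create_intersect_range_checks: access_range, lowest_set_bit and both
check functions are hypothetical names, and lowest_set_bit only
approximates what known_alignment does for a constant size.

```cpp
#include <algorithm>

// Simplified description of one access range (all values in bytes).
struct access_range
{
  long start;        // address of the first access
  long seg_len;      // distance from the first to the last access
  long access_size;  // bytes touched by each access
  long align;        // known alignment of START
};

// Largest power of two dividing VALUE (VALUE must be positive).
long
lowest_set_bit (long value)
{
  return value & -value;
}

// "Vanilla" form: compare exact exclusive end pointers.
bool
no_alias_exclusive (const access_range &a, const access_range &b)
{
  long end_a = a.start + a.seg_len + a.access_size;
  long end_b = b.start + b.seg_len + b.access_size;
  return end_a <= b.start || end_b <= a.start;
}

// Inclusive form: subtract the shared alignment and use a strict compare.
// INCLUDE_ACCESS_SIZE selects whether the access sizes limit the alignment.
bool
no_alias_inclusive (const access_range &a, const access_range &b,
                    bool include_access_size)
{
  long min_align = std::min (a.align, b.align);
  if (include_access_size)
    {
      min_align = std::min (min_align, lowest_set_bit (a.access_size));
      min_align = std::min (min_align, lowest_set_bit (b.access_size));
    }
  long max_a = a.start + a.seg_len + a.access_size - min_align;
  long max_b = b.start + b.seg_len + b.access_size - min_align;
  return max_a < b.start || max_b < a.start;
}
```

For two identical 8-byte accesses at a 16-byte-aligned address, the
exclusive form correctly reports a possible alias.  The inclusive form
claims independence unless the access size is folded into min_align,
because subtracting 16 from an end pointer that is only 8-byte aligned
moves it below the start.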

gcc/
PR tree-optimization/115192
* tree-data-ref.c (create_intersect_range_checks): Take the
alignment of the access sizes into account.

gcc/testsuite/
PR tree-optimization/115192
* gcc.dg/vect/pr115192.c: New test.

(cherry picked from commit a0fe4fb1c8d7804515845dd5d2a814b3c7a1ccba)

Diff:
---
 gcc/testsuite/gcc.dg/vect/pr115192.c | 28 
 gcc/tree-data-ref.c  |  5 -
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr115192.c b/gcc/testsuite/gcc.dg/vect/pr115192.c
new file mode 100644
index 000..923d377c1bb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115192.c
@@ -0,0 +1,28 @@
+#include "tree-vect.h"
+
+int data[4 * 16 * 16] __attribute__((aligned(16)));
+
+__attribute__((noipa)) void
+foo (__SIZE_TYPE__ n)
+{
+  for (__SIZE_TYPE__ i = 1; i < n; ++i)
+{
+  data[i * n * 4] = data[(i - 1) * n * 4] + 1;
+  data[i * n * 4 + 1] = data[(i - 1) * n * 4 + 1] + 2;
+}
+}
+
+int
+main ()
+{
+  check_vect ();
+
+  data[0] = 10;
+  data[1] = 20;
+
+  foo (3);
+
+  if (data[24] != 12 || data[25] != 24)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index b3dd2f0ca41..d127aba8792 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
 
 */
 
+#define INCLUDE_ALGORITHM
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -2629,7 +2630,9 @@ create_intersect_range_checks (class loop *loop, tree *cond_expr,
 Because the maximum values are inclusive, there is an alias
 if the maximum value of one segment is equal to the minimum
 value of the other.  */
-  min_align = MIN (dr_a.align, dr_b.align);
+  min_align = std::min (dr_a.align, dr_b.align);
+  min_align = std::min (min_align, known_alignment (dr_a.access_size));
+  min_align = std::min (min_align, known_alignment (dr_b.access_size));
   cmp_code = LT_EXPR;
 }


[gcc r11-11466] vect: Tighten vect_determine_precisions_from_range [PR113281]

2024-06-04 Thread Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:95e4252f53bc0e5b66a200c611fd2c9f6f7f2a62

commit r11-11466-g95e4252f53bc0e5b66a200c611fd2c9f6f7f2a62
Author: Richard Sandiford 
Date:   Tue Jun 4 13:47:35 2024 +0100

vect: Tighten vect_determine_precisions_from_range [PR113281]

This was another PR caused by the way that
vect_determine_precisions_from_range handles shifts.  We tried to
narrow 32768 >> x to a 16-bit shift based on range information for
the inputs and outputs, with vect_recog_over_widening_pattern
(after PR110828) adjusting the shift amount.  But this doesn't
work for the case where x is in [16, 31], since then 32-bit
32768 >> x is a well-defined zero, whereas no well-defined
16-bit 32768 >> y will produce 0.

We could perhaps generate x < 16 ? 32768 >> x : 0 instead,
but since vect_determine_precisions_from_range was never really
supposed to rely on fix-ups, it seems better to fix that instead.
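
The arithmetic behind this can be checked directly.  The helpers below
are hypothetical and only model the scalar semantics: a 32-bit shift of
32768 by 16..31 yields zero, whereas a 16-bit shift is only well
defined for amounts 0..15 and never yields zero for this operand.

```cpp
#include <cstdint>

// 32-bit logical right shift; well defined for amounts 0..31.
std::uint32_t
shift32 (unsigned amount)
{
  return std::uint32_t (32768) >> amount;
}

// Model of a 16-bit shift; only amounts 0..15 are meaningful here.
std::uint16_t
shift16 (unsigned amount)
{
  return std::uint16_t (std::uint16_t (32768) >> amount);
}

// Every shift amount in [16, 31] gives a well-defined zero in 32 bits.
bool
wide_shift_is_zero_for_16_to_31 ()
{
  for (unsigned x = 16; x <= 31; ++x)
    if (shift32 (x) != 0)
      return false;
  return true;
}

// No valid 16-bit shift amount can reproduce that zero.
bool
narrow_shift_is_never_zero ()
{
  for (unsigned y = 0; y <= 15; ++y)
    if (shift16 (y) == 0)
      return false;
  return true;
}
```

Since both functions return true, no remapping of the shift amount
alone can make the narrowed operation agree with the original one.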

The patch also makes the code more selective about which codes
can be narrowed based on input and output ranges.  This showed
that vect_truncatable_operation_p was missing cases for
BIT_NOT_EXPR (equivalent to BIT_XOR_EXPR of -1) and NEGATE_EXPR
(equivalent to BIT_NOT_EXPR followed by a PLUS_EXPR of 1).

pr113281-1.c is the original testcase.  pr113281-[23].c failed
before the patch due to overly optimistic narrowing.  pr113281-[45].c
previously passed and are meant to protect against accidental
optimisation regressions.

gcc/
PR target/113281
* tree-vect-patterns.c (vect_recog_over_widening_pattern): Remove
workaround for right shifts.
(vect_truncatable_operation_p): Handle NEGATE_EXPR and BIT_NOT_EXPR.
(vect_determine_precisions_from_range): Be more selective about
which codes can be narrowed based on their input and output ranges.
For shifts, require at least one more bit of precision than the
maximum shift amount.

gcc/testsuite/
PR target/113281
* gcc.dg/vect/pr113281-1.c: New test.
* gcc.dg/vect/pr113281-2.c: Likewise.
* gcc.dg/vect/pr113281-3.c: Likewise.
* gcc.dg/vect/pr113281-4.c: Likewise.
* gcc.dg/vect/pr113281-5.c: Likewise.

(cherry picked from commit 1a8261e047f7a2c2b0afb95716f7615cba718cd1)

Diff:
---
 gcc/testsuite/gcc.dg/vect/pr113281-1.c |  17 ++
 gcc/testsuite/gcc.dg/vect/pr113281-2.c |  50 +++
 gcc/testsuite/gcc.dg/vect/pr113281-3.c |  39 
 gcc/testsuite/gcc.dg/vect/pr113281-4.c |  55 +
 gcc/testsuite/gcc.dg/vect/pr113281-5.c |  66 
 gcc/tree-vect-patterns.c   | 107 -
 6 files changed, 305 insertions(+), 29 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr113281-1.c b/gcc/testsuite/gcc.dg/vect/pr113281-1.c
new file mode 100644
index 000..6df4231cb5f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr113281-1.c
@@ -0,0 +1,17 @@
+#include "tree-vect.h"
+
+unsigned char a;
+
+int main() {
+  check_vect ();
+
+  short b = a = 0;
+  for (; a != 19; a++)
+if (a)
+  b = 32872 >> a;
+
+  if (b == 0)
+return 0;
+  else
+return 1;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr113281-2.c b/gcc/testsuite/gcc.dg/vect/pr113281-2.c
new file mode 100644
index 000..3a1170c28b6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr113281-2.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+
+#define N 128
+
+short x[N];
+short y[N];
+
+void
+f1 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= y[i];
+}
+
+void
+f2 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= (y[i] < 32 ? y[i] : 32);
+}
+
+void
+f3 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= (y[i] < 31 ? y[i] : 31);
+}
+
+void
+f4 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= (y[i] & 31);
+}
+
+void
+f5 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= 0x8000 >> y[i];
+}
+
+void
+f6 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= 0x8000 >> (y[i] & 31);
+}
+
+/* { dg-final { scan-tree-dump-not {can narrow[^\n]+>>} "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr113281-3.c b/gcc/testsuite/gcc.dg/vect/pr113281-3.c
new file mode 100644
index 000..5982dd2d16f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr113281-3.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+
+#define N 128
+
+short x[N];
+short y[N];
+
+void
+f1 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= (y[i] < 30 ? y[i] : 30);
+}
+
+void
+f2 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= ((y[i] & 15) + 2);
+}
+
+void
+f3 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] >>= (y[i] < 16 ? y[i] : 16);
+}
+
+void
+f4 (void)
+{
+  for (int i = 0; i < N; ++i)
+x[i] = 32768 >> ((y[i] & 15) + 3);
+}
+
+/* { dg-final { scan-tree-dump {can narrow to signed:31 without loss [^\n]+>>} "vect" } } */
+/* { dg-final { scan-tree-dump {can na

[gcc r11-11467] rtl-ssa: Extend m_num_defs to a full unsigned int [PR108086]

2024-06-04 Thread Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:66d01cc3f4a248ccc471a978f0bfe3615c3f3a30

commit r11-11467-g66d01cc3f4a248ccc471a978f0bfe3615c3f3a30
Author: Richard Sandiford 
Date:   Tue Jun 4 13:47:35 2024 +0100

rtl-ssa: Extend m_num_defs to a full unsigned int [PR108086]

insn_info tried to save space by storing the number of
definitions in a 16-bit bitfield.  The justification was:

  // ...  FIRST_PSEUDO_REGISTER + 1
  // is the maximum number of accesses to hard registers and memory, and
  // MAX_RECOG_OPERANDS is the maximum number of pseudos that can be
  // defined by an instruction, so the number of definitions should fit
  // easily in 16 bits.

But while that reasoning holds (I think) for real instructions,
it doesn't hold for artificial instructions.  I don't think there's
any sensible higher limit we can use, so this patch goes for a full
unsigned int.
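
A minimal sketch of what goes wrong when a count exceeds the bitfield
width.  The structs and helpers are hypothetical models, not the real
insn_info layout; they only show that storing into a 16-bit unsigned
bitfield keeps the value modulo 65536.

```cpp
// Model of the old layout: the definition count is a 16-bit bitfield.
struct narrow_counts
{
  unsigned int num_uses;
  unsigned int num_defs : 16;
  unsigned int spare : 16;
};

// Model of the new layout: the definition count is a full unsigned int.
struct wide_counts
{
  unsigned int num_uses;
  unsigned int num_defs;
};

// Store N in the 16-bit field and read it back.
unsigned int
store_narrow (unsigned int n)
{
  narrow_counts c = {};
  c.num_defs = n;   // silently reduced modulo 2^16
  return c.num_defs;
}

// Store N in the full-width field and read it back.
unsigned int
store_wide (unsigned int n)
{
  wide_counts c = {};
  c.num_defs = n;
  return c.num_defs;
}
```

An artificial instruction with 70000 definitions would be recorded as
having 4464 in the narrow layout, while the wide layout preserves the
count.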

gcc/
PR rtl-optimization/108086
* rtl-ssa/insns.h (insn_info): Make m_num_defs a full unsigned int.
Adjust size-related commentary accordingly.

(cherry picked from commit cd41085a37b8288dbdfe0f81027ce04b978578f1)

Diff:
---
 gcc/rtl-ssa/insns.h | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/gcc/rtl-ssa/insns.h b/gcc/rtl-ssa/insns.h
index e4aa6d1d5ce..ab715adc151 100644
--- a/gcc/rtl-ssa/insns.h
+++ b/gcc/rtl-ssa/insns.h
@@ -141,7 +141,7 @@ using insn_call_clobbers_tree = default_splay_tree;
 // of "notes", a bit like REG_NOTES for the underlying RTL insns.
 class insn_info
 {
-  // Size: 8 LP64 words.
+  // Size: 9 LP64 words.
   friend class ebb_info;
   friend class function_info;
 
@@ -401,10 +401,11 @@ private:
   // The number of definitions and the number uses.  FIRST_PSEUDO_REGISTER + 1
   // is the maximum number of accesses to hard registers and memory, and
   // MAX_RECOG_OPERANDS is the maximum number of pseudos that can be
-  // defined by an instruction, so the number of definitions should fit
-  // easily in 16 bits.
+  // defined by an instruction, so the number of definitions in a real
+  // instruction should fit easily in 16 bits.  However, there are no
+  // limits on the number of definitions in artifical instructions.
   unsigned int m_num_uses;
-  unsigned int m_num_defs : 16;
+  unsigned int m_num_defs;
 
   // Flags returned by the accessors above.
   unsigned int m_is_debug_insn : 1;
@@ -414,7 +415,7 @@ private:
   unsigned int m_has_volatile_refs : 1;
 
   // For future expansion.
-  unsigned int m_spare : 11;
+  unsigned int m_spare : 27;
 
   // The program point at which the instruction occurs.
   //
@@ -431,6 +432,9 @@ private:
   // instruction.
   mutable int m_cost_or_uid;
 
+  // On LP64 systems, there's a gap here that could be used for future
+  // expansion.
+
   // The list of notes that have been attached to the instruction.
   insn_note *m_first_note;
 };


[gcc r11-11468] rtl-ssa: Fix -fcompare-debug failure [PR100303]

2024-06-04 Thread Richard Sandiford via Gcc-cvs
https://gcc.gnu.org/g:a1fb76e041740e7dd8cdf71dff3ae7aa31b3ea9b

commit r11-11468-ga1fb76e041740e7dd8cdf71dff3ae7aa31b3ea9b
Author: Richard Sandiford 
Date:   Tue Jun 4 13:47:36 2024 +0100

rtl-ssa: Fix -fcompare-debug failure [PR100303]

This patch fixes an oversight in the handling of debug instructions
in rtl-ssa.  At the moment (and whether this is a good idea or not
remains to be seen), we maintain a linear RPO sequence of definitions
and non-debug uses.  If a register is defined more than once, we use
a degenerate phi to reestablish a previous definition where necessary.

However, debug instructions shouldn't of course affect codegen,
so we can't create a new definition just for them.  In those situations
we instead hang the debug use off the real definition (meaning that
debug uses do not follow a linear order wrt definitions).  Again,
it remains to be seen whether that's a good idea.

The problem in the PR was that we weren't taking this into account
when increasing (or potentially increasing) the live range of an
existing definition.  We'd create the phi even if it would only
be used by debug instructions.

The patch goes for the simple but inelegant approach of passing
a bool to say whether the use is a debug use or not.  I imagine
this area will need some tweaking based on experience in future.

gcc/
PR rtl-optimization/100303
* rtl-ssa/accesses.cc (function_info::make_use_available): Take a
boolean that indicates whether the use will only be used in
debug instructions.  Treat it in the same way that existing
cross-EBB debug references would be handled if so.
(function_info::make_uses_available): Likewise.
* rtl-ssa/functions.h (function_info::make_uses_available): Update
prototype accordingly.
(function_info::make_use_available): Likewise.
* fwprop.c (try_fwprop_subst): Update call accordingly.

(cherry picked from commit c97351c0cf4872cc0e99e73ed17fb16659fd38b3)

Diff:
---
 gcc/fwprop.c|   3 +-
 gcc/rtl-ssa/accesses.cc |  15 +++--
 gcc/rtl-ssa/functions.h |   7 +-
 gcc/testsuite/g++.dg/torture/pr100303.C | 112 
 4 files changed, 129 insertions(+), 8 deletions(-)

diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index d7203672886..73284a7ae3e 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -606,7 +606,8 @@ try_fwprop_subst (use_info *use, set_info *def,
   if (def_insn->bb () != use_insn->bb ())
 {
   src_uses = crtl->ssa->make_uses_available (attempt, src_uses,
-use_insn->bb ());
+use_insn->bb (),
+use_insn->is_debug_insn ());
   if (!src_uses.is_valid ())
return false;
 }
diff --git a/gcc/rtl-ssa/accesses.cc b/gcc/rtl-ssa/accesses.cc
index af7b568fa98..0621ea22880 100644
--- a/gcc/rtl-ssa/accesses.cc
+++ b/gcc/rtl-ssa/accesses.cc
@@ -1290,7 +1290,10 @@ function_info::insert_temp_clobber (obstack_watermark &watermark,
 }
 
 // A subroutine of make_uses_available.  Try to make USE's definition
-// available at the head of BB.  On success:
+// available at the head of BB.  WILL_BE_DEBUG_USE is true if the
+// definition will be used only in debug instructions.
+//
+// On success:
 //
 // - If the use would have the same def () as USE, return USE.
 //
@@ -1302,7 +1305,8 @@ function_info::insert_temp_clobber (obstack_watermark &watermark,
 //
 // Return null on failure.
 use_info *
-function_info::make_use_available (use_info *use, bb_info *bb)
+function_info::make_use_available (use_info *use, bb_info *bb,
+  bool will_be_debug_use)
 {
   set_info *def = use->def ();
   if (!def)
@@ -1318,7 +1322,7 @@ function_info::make_use_available (use_info *use, bb_info 
*bb)
   && single_pred (cfg_bb) == use_bb->cfg_bb ()
   && remains_available_on_exit (def, use_bb))
 {
-  if (def->ebb () == bb->ebb ())
+  if (def->ebb () == bb->ebb () || will_be_debug_use)
return use;
 
   resource_info resource = use->resource ();
@@ -1362,7 +1366,8 @@ function_info::make_use_available (use_info *use, bb_info 
*bb)
 // See the comment above the declaration.
 use_array
 function_info::make_uses_available (obstack_watermark &watermark,
-   use_array uses, bb_info *bb)
+   use_array uses, bb_info *bb,
+   bool will_be_debug_uses)
 {
   unsigned int num_uses = uses.size ();
   if (num_uses == 0)
@@ -1371,7 +1376,7 @@ function_info::make_uses_available (obstack_watermark 
&watermark,
   auto **new_uses = XOBNEWVEC (watermark, access_info *, num_uses);
   for (unsigned int i = 0; i < num_uses; ++i)
  

[gcc r15-1011] fold-const: Fix up CLZ handling in tree_call_nonnegative_warnv_p [PR115337]

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:b82a816000791e7a286c7836b3a473ec0e2a577b

commit r15-1011-gb82a816000791e7a286c7836b3a473ec0e2a577b
Author: Jakub Jelinek 
Date:   Tue Jun 4 15:49:41 2024 +0200

fold-const: Fix up CLZ handling in tree_call_nonnegative_warnv_p [PR115337]

The function currently incorrectly assumes all the __builtin_clz* and .CLZ
calls have non-negative result.  That is the case of the former which is UB
on zero and has [0, prec-1] return value otherwise, and is the case of the
single argument .CLZ as well (again, UB on zero), but for two argument
.CLZ is the case only if the second argument is also nonnegative (or if we
know the argument can't be zero, but let's do that just in the ranger IMHO).

The following patch does that.
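
To make the distinction concrete, here is a small stand-alone model of the two-argument form. It is only a sketch: model_clz2 and model_clz2_nonnegative_p are hypothetical helper names, not GCC internals, and the internal function .CLZ itself cannot be called from user code.

```c
#include <limits.h>

/* Hypothetical model of the two-argument internal function
   .CLZ (x, at_zero): it is well defined for x == 0, where it
   yields AT_ZERO instead of invoking undefined behavior.  */
static int
model_clz2 (unsigned int x, int at_zero)
{
  if (x == 0)
    return at_zero;
  /* For nonzero x the result lies in [0, prec-1].  */
  return __builtin_clz (x);
}

/* Without knowing anything about x, the result is non-negative
   for every input only if AT_ZERO itself is non-negative.  */
static int
model_clz2_nonnegative_p (int at_zero)
{
  return at_zero >= 0;
}
```

With at_zero equal to -1 the model returns -1 for a zero input, which is exactly the case where claiming "always non-negative" would be wrong.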

2024-06-04  Jakub Jelinek  

PR tree-optimization/115337
* fold-const.cc (tree_call_nonnegative_warnv_p) <CASE_CFN_CLZ>:
If arg1 is non-NULL, RECURSE on it, otherwise return true.

* gcc.dg/bitint-106.c: New test.

Diff:
---
 gcc/fold-const.cc |  6 +-
 gcc/testsuite/gcc.dg/bitint-106.c | 29 +
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 117a816fec6..65ce03d572f 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -15241,7 +15241,6 @@ tree_call_nonnegative_warnv_p (tree type, combined_fn 
fn, tree arg0, tree arg1,
 CASE_CFN_FFS:
 CASE_CFN_PARITY:
 CASE_CFN_POPCOUNT:
-CASE_CFN_CLZ:
 CASE_CFN_CLRSB:
 case CFN_BUILT_IN_BSWAP16:
 case CFN_BUILT_IN_BSWAP32:
@@ -15250,6 +15249,11 @@ tree_call_nonnegative_warnv_p (tree type, combined_fn 
fn, tree arg0, tree arg1,
   /* Always true.  */
   return true;
 
+CASE_CFN_CLZ:
+  if (arg1)
+   return RECURSE (arg1);
+  return true;
+
 CASE_CFN_SQRT:
 CASE_CFN_SQRT_FN:
   /* sqrt(-0.0) is -0.0.  */
diff --git a/gcc/testsuite/gcc.dg/bitint-106.c 
b/gcc/testsuite/gcc.dg/bitint-106.c
new file mode 100644
index 00000000000..a36e8836690
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/bitint-106.c
@@ -0,0 +1,29 @@
+/* PR tree-optimization/115337 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-O2" } */
+
+#if __BITINT_MAXWIDTH__ >= 129
+#define N 128
+#else
+#define N 63
+#endif
+
+_BitInt (N) g;
+int c;
+
+void
+foo (unsigned _BitInt (N + 1) z, _BitInt (N) *ret)
+{
+  c = __builtin_stdc_first_leading_one (z << N);
+  _BitInt (N) y = *(_BitInt (N) *) __builtin_memset (&g, c, 5);
+  *ret = y;
+}
+
+int
+main ()
+{
+  _BitInt (N) x;
+  foo (0, &x);
+  if (c || g || x)
+__builtin_abort ();
+}


[gcc r15-1012] fold-const, gimple-fold: Some formatting cleanups

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:7be37a9bd40862e6a4686105cacf22d393258848

commit r15-1012-g7be37a9bd40862e6a4686105cacf22d393258848
Author: Jakub Jelinek 
Date:   Tue Jun 4 15:51:31 2024 +0200

fold-const, gimple-fold: Some formatting cleanups

While looking into PR115337, I've spotted some badly formatted code,
which the following patch fixes.

2024-06-04  Jakub Jelinek  

* fold-const.cc (tree_call_nonnegative_warnv_p): Formatting fixes.
(tree_invalid_nonnegative_warnv_p): Likewise.
* gimple-fold.cc (gimple_call_nonnegative_warnv_p): Likewise.

Diff:
---
 gcc/fold-const.cc  | 8 
 gcc/gimple-fold.cc | 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 65ce03d572f..048c654c848 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -15331,8 +15331,8 @@ tree_call_nonnegative_warnv_p (tree type, combined_fn 
fn, tree arg0, tree arg1,
 non-negative if both operands are non-negative.  In the presence
 of qNaNs, we're non-negative if either operand is non-negative
 and can't be a qNaN, or if both operands are non-negative.  */
-  if (tree_expr_maybe_signaling_nan_p (arg0) ||
- tree_expr_maybe_signaling_nan_p (arg1))
+  if (tree_expr_maybe_signaling_nan_p (arg0)
+ || tree_expr_maybe_signaling_nan_p (arg1))
 return RECURSE (arg0) && RECURSE (arg1);
   return RECURSE (arg0) ? (!tree_expr_maybe_nan_p (arg0)
   || RECURSE (arg1))
@@ -15431,8 +15431,8 @@ tree_invalid_nonnegative_warnv_p (tree t, bool 
*strict_overflow_p, int depth)
 
 case CALL_EXPR:
   {
-   tree arg0 = call_expr_nargs (t) > 0 ?  CALL_EXPR_ARG (t, 0) : NULL_TREE;
-   tree arg1 = call_expr_nargs (t) > 1 ?  CALL_EXPR_ARG (t, 1) : NULL_TREE;
+   tree arg0 = call_expr_nargs (t) > 0 ? CALL_EXPR_ARG (t, 0) : NULL_TREE;
+   tree arg1 = call_expr_nargs (t) > 1 ? CALL_EXPR_ARG (t, 1) : NULL_TREE;
 
return tree_call_nonnegative_warnv_p (TREE_TYPE (t),
  get_call_combined_fn (t),
diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index c33583cf3ee..7c534d56bf1 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -9334,10 +9334,10 @@ static bool
 gimple_call_nonnegative_warnv_p (gimple *stmt, bool *strict_overflow_p,
 int depth)
 {
-  tree arg0 = gimple_call_num_args (stmt) > 0 ?
-gimple_call_arg (stmt, 0) : NULL_TREE;
-  tree arg1 = gimple_call_num_args (stmt) > 1 ?
-gimple_call_arg (stmt, 1) : NULL_TREE;
+  tree arg0
+= gimple_call_num_args (stmt) > 0 ? gimple_call_arg (stmt, 0) : NULL_TREE;
+  tree arg1
+= gimple_call_num_args (stmt) > 1 ? gimple_call_arg (stmt, 1) : NULL_TREE;
   tree lhs = gimple_call_lhs (stmt);
   return (lhs
  && tree_call_nonnegative_warnv_p (TREE_TYPE (lhs),


[gcc r15-1013] fold-const: Handle CTZ like CLZ in tree_call_nonnegative_warnv_p [PR115337]

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:181861b072ff1ef650c1a9d0290a4a672b9e747c

commit r15-1013-g181861b072ff1ef650c1a9d0290a4a672b9e747c
Author: Jakub Jelinek 
Date:   Tue Jun 4 15:52:09 2024 +0200

fold-const: Handle CTZ like CLZ in tree_call_nonnegative_warnv_p [PR115337]

I think we can handle CTZ exactly like CLZ in tree_call_nonnegative_warnv_p.
Like CLZ, if it is UB at zero, the result range is [0, prec-1] and if it is
well defined at zero, the second argument provides the value at zero.

2024-06-04  Jakub Jelinek  

PR tree-optimization/115337
* fold-const.cc (tree_call_nonnegative_warnv_p): Handle
CASE_CFN_CTZ like CASE_CFN_CLZ.

Diff:
---
 gcc/fold-const.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 048c654c848..92b048c307e 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -15250,6 +15250,7 @@ tree_call_nonnegative_warnv_p (tree type, combined_fn 
fn, tree arg0, tree arg1,
   return true;
 
 CASE_CFN_CLZ:
+CASE_CFN_CTZ:
   if (arg1)
return RECURSE (arg1);
   return true;


[gcc r15-1014] ranger: Improve CLZ fold_range [PR115337]

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:591d30c5c97e757f63ce0d99ae9a3dbe8c75a50a

commit r15-1014-g591d30c5c97e757f63ce0d99ae9a3dbe8c75a50a
Author: Jakub Jelinek 
Date:   Tue Jun 4 16:16:49 2024 +0200

ranger: Improve CLZ fold_range [PR115337]

cfn_ctz::fold_range includes special cases for the case where .CTZ has
two arguments and so is well defined at zero, and the second argument is
equal to prec or -1, but cfn_clz::fold_range does that only for the prec
case.  -1 is fairly common as well though, because the <stdbit.h> builtins
do use it now, so I think it is worth special casing that.
If we don't know anything about the argument, the difference for
.CLZ (arg, -1) is that previously the result was varying, now it will be
[-1, prec-1].  If we knew arg can't be zero, it used to be optimized before
as well into e.g. [0, prec-1] or similar.
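
As a rough illustration of the ranges described above, the following sketch models the result range of .CLZ (arg, at_zero) when nothing is known about arg. The struct and function names are hypothetical and this is not the ranger API; it only mirrors the three outcomes mentioned in the message.

```c
/* Closed integer interval [lo, hi].  */
struct int_range
{
  int lo;
  int hi;
};

/* Model of the result range of .CLZ (arg, at_zero) for a PREC-bit
   arg about which nothing is known.  Returns 1 and fills *R when a
   useful range exists, 0 when the result is treated as varying.  */
static int
model_clz_range (int prec, int at_zero, struct int_range *r)
{
  if (at_zero == -1)
    {
      /* New special case: zero maps to -1, everything else to
	 [0, prec-1].  */
      r->lo = -1;
      r->hi = prec - 1;
      return 1;
    }
  if (at_zero == prec)
    {
      /* Previously handled case: zero maps to prec.  */
      r->lo = 0;
      r->hi = prec;
      return 1;
    }
  /* Any other value at zero: give up.  */
  return 0;
}
```

For a 32-bit argument this gives [-1, 31] for a second argument of -1 and [0, 32] for a second argument of 32.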

2024-06-04  Jakub Jelinek  

PR tree-optimization/115337
* gimple-range-op.cc (cfn_clz::fold_range): For
m_gimple_call_internal_p handle as a special case also second 
argument
of -1 next to prec.

Diff:
---
 gcc/gimple-range-op.cc | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index aec3f39ec0e..1b9a84708b9 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -941,8 +941,10 @@ cfn_clz::fold_range (irange &r, tree type, const irange 
&lh,
   int maxi = prec - 1;
   if (m_gimple_call_internal_p)
 {
-  // Only handle the single common value.
-  if (rh.lower_bound () == prec)
+  // Handle only the two common values.
+  if (rh.lower_bound () == -1)
+   mini = -1;
+  else if (rh.lower_bound () == prec)
maxi = prec;
   else
// Magic value to give up, unless we can prove arg is non-zero.
@@ -953,7 +955,7 @@ cfn_clz::fold_range (irange &r, tree type, const irange &lh,
   if (wi::gt_p (lh.lower_bound (), 0, TYPE_SIGN (lh.type ())))
 {
   maxi = prec - 1 - wi::floor_log2 (lh.lower_bound ());
-  if (mini == -2)
+  if (mini < 0)
mini = 0;
 }
   else if (!range_includes_zero_p (lh))
@@ -969,11 +971,11 @@ cfn_clz::fold_range (irange &r, tree type, const irange 
&lh,
   if (max == 0)
 {
   // If CLZ_DEFINED_VALUE_AT_ZERO is 2 with VALUE of prec,
-  // return [prec, prec], otherwise ignore the range.
-  if (maxi == prec)
-   mini = prec;
+  // return [prec, prec] or [-1, -1], otherwise ignore the range.
+  if (maxi == prec || mini == -1)
+   mini = maxi;
 }
-  else
+  else if (mini >= 0)
 mini = newmini;
 
   if (mini == -2)


[gcc r14-10276] combine: Fix up simplify_compare_const [PR115092]

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:14a7296d04474055bfe1d7f130dceac6dabf390d

commit r14-10276-g14a7296d04474055bfe1d7f130dceac6dabf390d
Author: Jakub Jelinek 
Date:   Wed May 15 18:37:17 2024 +0200

combine: Fix up simplify_compare_const [PR115092]

The following testcases are miscompiled (with tons of GIMPLE
optimization disabled) because combine sees GE comparison of
1-bit sign_extract (i.e. something with [-1, 0] value range)
with (const_int -1) (which is always true) and optimizes it into
NE comparison of 1-bit zero_extract ([0, 1] value range) against
(const_int 0).
The reason is that simplify_compare_const first (correctly)
simplifies the comparison to
GE (ashift:SI something (const_int 31)) (const_int -2147483648)
and then an optimization for when the second operand is power of 2
triggers.  That optimization is fine for power of 2s which aren't
the signed minimum of the mode, or if it is NE, EQ, GEU or LTU
against the signed minimum of the mode, but for GE or LT optimizing
it into NE (or EQ) against const0_rtx is wrong, those cases
are always true or always false (but the function doesn't have
a standardized way to tell callers the comparison is now unconditional).

The following patch just disables the optimization in that case.
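
The difference between the signed and unsigned cases can be checked with a few lines of plain C. This is only an illustrative sketch with hypothetical helper names; it assumes, as the optimization does, that the first operand can only be zero or have the sign bit set.

```c
#include <limits.h>

/* The only bit that may be set in the first operand.  */
#define SIGN_BIT ((unsigned int) INT_MAX + 1u)

/* Signed and unsigned forms of the original comparison.  */
static int
signed_ge (int op0, int bound)
{
  return op0 >= bound;
}

static int
unsigned_ge (unsigned int op0, unsigned int bound)
{
  return op0 >= bound;
}

/* The rewritten form: compare against zero instead.  */
static int
ne_zero (unsigned int op0)
{
  return op0 != 0;
}
```

For op0 equal to zero, unsigned_ge (0, SIGN_BIT) and ne_zero (0) are both false, so the GEU rewrite is valid. In contrast, signed_ge (0, INT_MIN) is true while ne_zero (0) is false, which is the mismatch behind the miscompilation.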

2024-05-15  Jakub Jelinek  

PR rtl-optimization/114902
PR rtl-optimization/115092
* combine.cc (simplify_compare_const): Don't optimize
GE op0 SIGNED_MIN or LT op0 SIGNED_MIN into NE op0 const0_rtx or
EQ op0 const0_rtx.

* gcc.dg/pr114902.c: New test.
* gcc.dg/pr115092.c: New test.

(cherry picked from commit 0b93a0ae153ef70a82ff63e67926a01fdab9956b)

Diff:
---
 gcc/combine.cc  |  6 --
 gcc/testsuite/gcc.dg/pr114902.c | 23 +++
 gcc/testsuite/gcc.dg/pr115092.c | 16 
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 92b8d98e6c1..60afe043578 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -11841,8 +11841,10 @@ simplify_compare_const (enum rtx_code code, 
machine_mode mode,
  `and'ed with that bit), we can replace this with a comparison
  with zero.  */
   if (const_op
-  && (code == EQ || code == NE || code == GE || code == GEU
- || code == LT || code == LTU)
+  && (code == EQ || code == NE || code == GEU || code == LTU
+ /* This optimization is incorrect for signed >= INT_MIN or
+< INT_MIN, those are always true or always false.  */
+ || ((code == GE || code == LT) && const_op > 0))
   && is_a <scalar_int_mode> (mode, &int_mode)
   && GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT
   && pow2p_hwi (const_op & GET_MODE_MASK (int_mode))
diff --git a/gcc/testsuite/gcc.dg/pr114902.c b/gcc/testsuite/gcc.dg/pr114902.c
new file mode 100644
index 00000000000..60684faa25d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114902.c
@@ -0,0 +1,23 @@
+/* PR rtl-optimization/114902 */
+/* { dg-do run } */
+/* { dg-options "-O1 -fno-tree-fre -fno-tree-forwprop -fno-tree-ccp 
-fno-tree-dominator-opts" } */
+
+__attribute__((noipa))
+int foo (int x)
+{
+  int a = ~x;
+  int t = a & 1;
+  int e = -t;
+  int b = e >= -1;
+  if (b)
+return 0;
+  __builtin_trap ();
+}
+
+int
+main ()
+{
+  foo (-1);
+  foo (0);
+  foo (1);
+}
diff --git a/gcc/testsuite/gcc.dg/pr115092.c b/gcc/testsuite/gcc.dg/pr115092.c
new file mode 100644
index 00000000000..c9047f4d321
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr115092.c
@@ -0,0 +1,16 @@
+/* PR rtl-optimization/115092 */
+/* { dg-do run } */
+/* { dg-options "-O1 -fgcse -ftree-pre -fno-tree-dominator-opts -fno-tree-fre 
-fno-guess-branch-probability" } */
+
+int a, b, c = 1, d, e;
+
+int
+main ()
+{
+  int f, g = a;
+  b = -2;
+  f = -(1 >> ((c && b) & ~a));
+  if (f <= b)
+d = g / e;
+  return 0;
+}


[gcc r14-10277] rs6000: Fix up PCH in --enable-host-pie builds [PR115324]

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:a7dd44c02ec1047166b4bacc3faa6255c816da2a

commit r14-10277-ga7dd44c02ec1047166b4bacc3faa6255c816da2a
Author: Jakub Jelinek 
Date:   Mon Jun 3 23:11:06 2024 +0200

rs6000: Fix up PCH in --enable-host-pie builds [PR115324]

PCH doesn't work properly in --enable-host-pie configurations on
powerpc*-linux*.
The problem is that the rs6000_builtin_info and rs6000_instance_info
arrays mix pointers to .rodata/.data (bifname and attr_string point
to string literals in .rodata section, and the next member is either NULL
or &rs6000_instance_info[XXX]) and GC member (tree fntype).
Now, for normal GC this works just fine, we emit
  {
&rs6000_instance_info[0].fntype,
1 * (RS6000_INST_MAX),
sizeof (rs6000_instance_info[0]),
&gt_ggc_mx_tree_node,
&gt_pch_nx_tree_node
  },
  {
&rs6000_builtin_info[0].fntype,
1 * (RS6000_BIF_MAX),
sizeof (rs6000_builtin_info[0]),
&gt_ggc_mx_tree_node,
&gt_pch_nx_tree_node
  },
GC roots which are strided and thus cover only the fntype members of all
the elements of the two arrays.
For PCH though it actually results in saving those huge arrays (one is
130832 bytes, another 81568 bytes) into the .gch files and loading them back
in full.  While the bifname and attr_string and next pointers are marked as
GTY((skip)), they are actually saved to point to the .rodata and .data
sections of the process which writes the PCH, but because cc1/cc1plus etc.
are position independent executables with --enable-host-pie, when they are
loaded back from the PCH file, they can point to completely different addresses
where nothing is mapped at all or where some unrelated data happens to live.
While gengtype supports the callback option, that one is meant for
relocatable function pointers and doesn't work in the case of GTY arrays
inside of .data section anyway.

So, either we'd need to add some further GTY extensions, or the following
patch instead reworks it such that the fntype members which were the only
reason for PCH in those arrays are moved to separate arrays.
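
The shape of that rework can be sketched in a few lines. Everything below is a simplified, hypothetical stand-in (the demo_* names, the three-entry table and the -1 end marker are assumptions, not the generated rs6000 code): the table keeps only plain data and links entries by index, while the GC-managed members live in a separate parallel array.

```c
#include <stddef.h>

/* Stand-in for GCC's tree type.  */
typedef struct tree_node *tree;

/* After the split: the table holds no GC pointers and refers to
   other entries by index rather than by address, so it never has
   to be written to or restored from a PCH file.  */
struct demo_instance
{
  const char *name;	/* Points into .rodata.  */
  int next;		/* Index of next instance; -1 is an assumed
			   end marker for this sketch.  */
};

static const struct demo_instance demo_instances[] = {
  { "a", 1 }, { "b", 2 }, { "c", -1 }
};

/* Parallel array holding only the GC-managed members; in GCC this
   is the part that would carry the GTY markup.  */
static tree demo_instance_fntype[3];

/* Look up the function type belonging to entry I.  */
static tree
demo_fntype_for (int i)
{
  return demo_instance_fntype[i];
}

/* Walk a chain of instances by index and count its length.  */
static int
demo_chain_length (int first)
{
  int n = 0;
  for (int i = first; i >= 0; i = demo_instances[i].next)
    n++;
  return n;
}
```

Because the links are indices, they stay valid no matter where the executable is mapped, which is the property the pointer-based layout lacked under --enable-host-pie.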

Size-wise in .data sections it is (in bytes):

                              vanilla  patched
rs6000_builtin_info            130832   110704
rs6000_instance_info            81568    40784
rs6000_overload_info             7392     7392
rs6000_builtin_info_fntype          0    10064
rs6000_instance_info_fntype         0    20392
sum                            219792   189336

where previously we saved/restored for PCH those 130832+81568 bytes, now we
save/restore just 10064+20392 bytes, so this change is beneficial for the
data section size.

Unfortunately, it grows the size of the rs6000_init_generated_builtins
function, vanilla had 218328 bytes, patched has 228668.

When I applied
 void
 rs6000_init_generated_builtins ()
 {
+  bifdata *rs6000_builtin_info_p;
+  tree *rs6000_builtin_info_fntype_p;
+  ovlddata *rs6000_instance_info_p;
+  tree *rs6000_instance_info_fntype_p;
+  ovldrecord *rs6000_overload_info_p;
+  __asm ("" : "=r" (rs6000_builtin_info_p) : "0" (rs6000_builtin_info));
+  __asm ("" : "=r" (rs6000_builtin_info_fntype_p) : "0" 
(rs6000_builtin_info_fntype));
+  __asm ("" : "=r" (rs6000_instance_info_p) : "0" (rs6000_instance_info));
+  __asm ("" : "=r" (rs6000_instance_info_fntype_p) : "0" 
(rs6000_instance_info_fntype));
+  __asm ("" : "=r" (rs6000_overload_info_p) : "0" (rs6000_overload_info));
+  #define rs6000_builtin_info rs6000_builtin_info_p
+  #define rs6000_builtin_info_fntype rs6000_builtin_info_fntype_p
+  #define rs6000_instance_info rs6000_instance_info_p
+  #define rs6000_instance_info_fntype rs6000_instance_info_fntype_p
+  #define rs6000_overload_info rs6000_overload_info_p
+
hack by hand, the size of the function is 209700 though, so if really
wanted, we could add __attribute__((__noipa__)) to the function when
building with recent enough GCC and pass pointers to the first elements
of the 5 arrays to the function as arguments.  If you want such a change,
could that be done incrementally?

2024-06-03  Jakub Jelinek  

PR target/115324
* config/rs6000/rs6000-gen-builtins.cc (write_decls): Remove
GTY markup from struct bifdata and struct ovlddata and remove their
fntype members.  Change next member in struct ovlddata and
first_instance member of struct ovldrecord to have int type rather
than struct ovlddata *.  Remove GTY markup from rs6000_builtin_info
and rs6000_instance_info arrays, declare new
rs6000_builtin_info_fntype and rs6000_instance_info_fntype arrays,
which have GTY markup.
(write_bif_static_init): Adjust for the a

[gcc r14-10279] builtins: Force SAVE_EXPR for __builtin_{add, sub, mul}_overflow and __builtin{add, sub}c [PR108789]

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:f9af4a05e027a8b797628f1a2c39ef0b28dc36d9

commit r14-10279-gf9af4a05e027a8b797628f1a2c39ef0b28dc36d9
Author: Jakub Jelinek 
Date:   Tue Jun 4 12:28:01 2024 +0200

builtins: Force SAVE_EXPR for __builtin_{add,sub,mul}_overflow and 
__builtin{add,sub}c [PR108789]

The following testcase is miscompiled, because we use save_expr
on the .{ADD,SUB,MUL}_OVERFLOW call we are creating, but if the first
two operands are not INTEGER_CSTs (in that case we just fold it right away)
but are TREE_READONLY/!TREE_SIDE_EFFECTS, save_expr doesn't actually
create a SAVE_EXPR at all and so we lower it to
*arg2 = REALPART_EXPR (.ADD_OVERFLOW (arg0, arg1)), \
IMAGPART_EXPR (.ADD_OVERFLOW (arg0, arg1))
which evaluates the ifn twice and just hopes the two calls will be CSEd back.
As *arg2 aliases *arg0, that is not the case.
The builtins are really never const/pure as they store into what
the third argument points to, so after handling the INTEGER_CST+INTEGER_CST
case, I think we should just always use SAVE_EXPR.  Just building SAVE_EXPR
by hand and setting TREE_SIDE_EFFECTS on it doesn't work, because
c_fully_fold optimizes it away again, so the following patch marks the
ifn calls as TREE_SIDE_EFFECTS (but doesn't do it for the
__builtin_{add,sub,mul}_overflow_p cases, which were designed for use
especially in constant expressions and don't really evaluate the
realpart side, so we don't really need a SAVE_EXPR in that case).
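
The effect of evaluating the operation twice can be reproduced by hand. The sketch below is an assumption-laden model, not the compiler's lowering: add_once and add_twice are hypothetical helpers, and add_twice deliberately re-reads the operands after the store to mimic the double evaluation described above.

```c
#include <stdint.h>

/* Intended semantics: the addition is evaluated exactly once and
   both the sum and the overflow flag come from that evaluation.  */
static int
add_once (uint32_t *r, const uint32_t *a, const uint32_t *b)
{
  uint32_t sum;
  int ovf = __builtin_add_overflow (*a, *b, &sum);
  *r = sum;
  return ovf;
}

/* Hand-written model of the faulty lowering: the sum is stored
   first, then the addition is evaluated a second time to obtain
   the overflow flag.  If R aliases A or B, the second evaluation
   sees the already updated value.  */
static int
add_twice (uint32_t *r, const uint32_t *a, const uint32_t *b)
{
  uint32_t tmp;
  (void) __builtin_add_overflow (*a, *b, &tmp);
  *r = tmp;
  return __builtin_add_overflow (*a, *b, &tmp);
}
```

With x set to 1073741824 and all three pointers referring to x, add_once reports no overflow, whereas add_twice stores 2147483648 and then reports an overflow from adding that value to itself.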

2024-06-04  Jakub Jelinek  

PR middle-end/108789
* builtins.cc (fold_builtin_arith_overflow): For ovf_only,
don't call save_expr and don't build REALPART_EXPR, otherwise
set TREE_SIDE_EFFECTS on call before calling save_expr.
(fold_builtin_addc_subc): Set TREE_SIDE_EFFECTS on call before
calling save_expr.

* gcc.c-torture/execute/pr108789.c: New test.

(cherry picked from commit b8e28381cb5c0cddfe5201faf799d8b27f5d7d6c)

Diff:
---
 gcc/builtins.cc| 22 ++-
 gcc/testsuite/gcc.c-torture/execute/pr108789.c | 39 ++
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index f8d94c4b435..7c1497561f7 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -10042,7 +10042,21 @@ fold_builtin_arith_overflow (location_t loc, enum 
built_in_function fcode,
   tree ctype = build_complex_type (type);
   tree call = build_call_expr_internal_loc (loc, ifn, ctype, 2,
arg0, arg1);
-  tree tgt = save_expr (call);
+  tree tgt;
+  if (ovf_only)
+   {
+ tgt = call;
+ intres = NULL_TREE;
+   }
+  else
+   {
+ /* Force SAVE_EXPR even for calls which satisfy tree_invariant_p_1,
+as while the call itself is const, the REALPART_EXPR store is
+certainly not.  And in any case, we want just one call,
+not multiple and trying to CSE them later.  */
+ TREE_SIDE_EFFECTS (call) = 1;
+ tgt = save_expr (call);
+   }
   intres = build1_loc (loc, REALPART_EXPR, type, tgt);
   ovfres = build1_loc (loc, IMAGPART_EXPR, type, tgt);
   ovfres = fold_convert_loc (loc, boolean_type_node, ovfres);
@@ -10354,11 +10368,17 @@ fold_builtin_addc_subc (location_t loc, enum 
built_in_function fcode,
   tree ctype = build_complex_type (type);
   tree call = build_call_expr_internal_loc (loc, ifn, ctype, 2,
args[0], args[1]);
+  /* Force SAVE_EXPR even for calls which satisfy tree_invariant_p_1,
+ as while the call itself is const, the REALPART_EXPR store is
+ certainly not.  And in any case, we want just one call,
+ not multiple and trying to CSE them later.  */
+  TREE_SIDE_EFFECTS (call) = 1;
   tree tgt = save_expr (call);
   tree intres = build1_loc (loc, REALPART_EXPR, type, tgt);
   tree ovfres = build1_loc (loc, IMAGPART_EXPR, type, tgt);
   call = build_call_expr_internal_loc (loc, ifn, ctype, 2,
   intres, args[2]);
+  TREE_SIDE_EFFECTS (call) = 1;
   tgt = save_expr (call);
   intres = build1_loc (loc, REALPART_EXPR, type, tgt);
   tree ovfres2 = build1_loc (loc, IMAGPART_EXPR, type, tgt);
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr108789.c 
b/gcc/testsuite/gcc.c-torture/execute/pr108789.c
new file mode 100644
index 00000000000..32ee19be1c4
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr108789.c
@@ -0,0 +1,39 @@
+/* PR middle-end/108789 */
+
+int
+add (unsigned *r, const unsigned *a, const unsigned *b)
+{
+  return __builtin_add_overflow (*a, *b, r);
+}
+
+int
+mul (unsigned *r, const unsigned *a, const unsigned *b)
+{
+  return __builtin_mul_overflow (*a, *b, r);
+}
+
+int
+main ()
+{
+  unsigned x;
+
+  /* 1073741824U + 1073741824U should not overflow.  */
+  x = (_

[gcc r14-10278] invoke.texi: Clarify -march=lujiazui

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:1c1bc2553f6cb6d104f1f1b749aac0f39c4a3959

commit r14-10278-g1c1bc2553f6cb6d104f1f1b749aac0f39c4a3959
Author: Jakub Jelinek 
Date:   Tue Jun 4 12:20:13 2024 +0200

invoke.texi: Clarify -march=lujiazui

I was recently searching which exact CPUs are affected by the PR114576
wrong-code issue and went from the PTA_* bitmasks in GCC, so arrived
at the goldmont, goldmont-plus, tremont and lujiazui CPUs (as -march=
cases which do enable -maes and don't enable -mavx).
But when double-checking that against the invoke.texi documentation,
that was true for the first 3, but lujiazui said it supported AVX.
I was really confused by that, until I found the
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604407.html
explanation.  So it seems the CPUs do have AVX and F16C, but -march=lujiazui
doesn't enable those and even actively attempts to filter them out from
the announced CPUID features, in glibc as well as e.g. in libgcc.

Thus, I think we should document what actually happens, otherwise
users could assume that
gcc -march=lujiazui predefines __AVX__ and __F16C__, which it doesn't.
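
A quick way to see which of these macros a given set of options predefines is a tiny probe like the one below. It is only a sketch with hypothetical function names; it reports what the compiler enabled at build time, not what the CPU supports at run time.

```c
/* Returns 1 if this translation unit was compiled with AVX
   enabled (the compiler predefines __AVX__), 0 otherwise.  */
static int
built_with_avx (void)
{
#ifdef __AVX__
  return 1;
#else
  return 0;
#endif
}

/* Likewise for F16C and the __F16C__ macro.  */
static int
built_with_f16c (void)
{
#ifdef __F16C__
  return 1;
#else
  return 0;
#endif
}
```

According to the message above, both functions would be expected to return 0 when the file is built with -march=lujiazui alone.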

2024-06-04  Jakub Jelinek  

* doc/invoke.texi (lujiazui): Clarify that while the CPUs do support
AVX and F16C, -march=lujiazui actually doesn't enable those.

(cherry picked from commit 09b4ab53155ea16e1fb12c2afcd9b6fe29a31c74)

Diff:
---
 gcc/doc/invoke.texi | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9456ced468a..a916d618960 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -34732,8 +34732,10 @@ instruction set support.
 
 @item lujiazui
 ZHAOXIN lujiazui CPU with x86-64, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1,
-SSE4.2, AVX, POPCNT, AES, PCLMUL, RDRND, XSAVE, XSAVEOPT, FSGSBASE, CX16,
-ABM, BMI, BMI2, F16C, FXSR, RDSEED instruction set support.
+SSE4.2, POPCNT, AES, PCLMUL, RDRND, XSAVE, XSAVEOPT, FSGSBASE, CX16,
+ABM, BMI, BMI2, FXSR, RDSEED instruction set support.  While the CPUs
+do support AVX and F16C, these aren't enabled by @code{-march=lujiazui}
+for performance reasons.
 
 @item yongfeng
 ZHAOXIN yongfeng CPU with x86-64, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1,


[gcc r14-10280] fold-const: Fix up CLZ handling in tree_call_nonnegative_warnv_p [PR115337]

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:a88e13bd7e0f50011e7f7f6e05c6f5e2a031143c

commit r14-10280-ga88e13bd7e0f50011e7f7f6e05c6f5e2a031143c
Author: Jakub Jelinek 
Date:   Tue Jun 4 15:49:41 2024 +0200

fold-const: Fix up CLZ handling in tree_call_nonnegative_warnv_p [PR115337]

The function currently incorrectly assumes all the __builtin_clz* and .CLZ
calls have non-negative result.  That is the case of the former which is UB
on zero and has [0, prec-1] return value otherwise, and is the case of the
single argument .CLZ as well (again, UB on zero), but for two argument
.CLZ is the case only if the second argument is also nonnegative (or if we
know the argument can't be zero, but let's do that just in the ranger IMHO).

The following patch does that.

2024-06-04  Jakub Jelinek  

PR tree-optimization/115337
* fold-const.cc (tree_call_nonnegative_warnv_p) <CASE_CFN_CLZ>:
If arg1 is non-NULL, RECURSE on it, otherwise return true.

* gcc.dg/bitint-106.c: New test.

(cherry picked from commit b82a816000791e7a286c7836b3a473ec0e2a577b)

Diff:
---
 gcc/fold-const.cc |  6 +-
 gcc/testsuite/gcc.dg/bitint-106.c | 29 +
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 7b268964acc..f496b3436df 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -15241,7 +15241,6 @@ tree_call_nonnegative_warnv_p (tree type, combined_fn 
fn, tree arg0, tree arg1,
 CASE_CFN_FFS:
 CASE_CFN_PARITY:
 CASE_CFN_POPCOUNT:
-CASE_CFN_CLZ:
 CASE_CFN_CLRSB:
 case CFN_BUILT_IN_BSWAP16:
 case CFN_BUILT_IN_BSWAP32:
@@ -15250,6 +15249,11 @@ tree_call_nonnegative_warnv_p (tree type, combined_fn 
fn, tree arg0, tree arg1,
   /* Always true.  */
   return true;
 
+CASE_CFN_CLZ:
+  if (arg1)
+   return RECURSE (arg1);
+  return true;
+
 CASE_CFN_SQRT:
 CASE_CFN_SQRT_FN:
   /* sqrt(-0.0) is -0.0.  */
diff --git a/gcc/testsuite/gcc.dg/bitint-106.c 
b/gcc/testsuite/gcc.dg/bitint-106.c
new file mode 100644
index 00000000000..a36e8836690
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/bitint-106.c
@@ -0,0 +1,29 @@
+/* PR tree-optimization/115337 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-O2" } */
+
+#if __BITINT_MAXWIDTH__ >= 129
+#define N 128
+#else
+#define N 63
+#endif
+
+_BitInt (N) g;
+int c;
+
+void
+foo (unsigned _BitInt (N + 1) z, _BitInt (N) *ret)
+{
+  c = __builtin_stdc_first_leading_one (z << N);
+  _BitInt (N) y = *(_BitInt (N) *) __builtin_memset (&g, c, 5);
+  *ret = y;
+}
+
+int
+main ()
+{
+  _BitInt (N) x;
+  foo (0, &x);
+  if (c || g || x)
+__builtin_abort ();
+}


[gcc r15-1015] libstdc++: Only define std::span::at for C++26 [PR115335]

2024-06-04 Thread Jonathan Wakely via Libstdc++-cvs
https://gcc.gnu.org/g:2197814011eec75022aa8550f10621409b69d4a1

commit r15-1015-g2197814011eec75022aa8550f10621409b69d4a1
Author: Jonathan Wakely 
Date:   Tue Jun 4 15:06:44 2024 +0100

libstdc++: Only define std::span::at for C++26 [PR115335]

In r14-5689-g1fa85dcf656e2f I added std::span::at and made the correct
changes to the __cpp_lib_span macro (with tests for the correct value in
C++20/23/26). But I didn't make the declaration of std::span::at
actually depend on the macro, so it was defined for C++20 and C++23, not
only for C++26. This fixes that oversight.

libstdc++-v3/ChangeLog:

PR libstdc++/115335
* include/std/span (span::at): Guard with feature test macro.

Diff:
---
 libstdc++-v3/include/std/span | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libstdc++-v3/include/std/span b/libstdc++-v3/include/std/span
index 43e9cf82a54..00fc5279152 100644
--- a/libstdc++-v3/include/std/span
+++ b/libstdc++-v3/include/std/span
@@ -287,6 +287,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return *(this->_M_ptr + __idx);
   }
 
+#if __cpp_lib_span >= 202311L // >= C++26
   [[nodiscard]]
   constexpr reference
   at(size_type __idx) const
@@ -296,6 +297,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   "of size %zu"), __idx, this->size());
return *(this->_M_ptr + __idx);
   }
+#endif
 
   [[nodiscard]]
   constexpr pointer


[gcc r14-10281] libstdc++: Only define std::span::at for C++26 [PR115335]

2024-06-04 Thread Jonathan Wakely via Libstdc++-cvs
https://gcc.gnu.org/g:c6e6258ea43299399074f8d5f48697b5bc26064e

commit r14-10281-gc6e6258ea43299399074f8d5f48697b5bc26064e
Author: Jonathan Wakely 
Date:   Tue Jun 4 15:06:44 2024 +0100

libstdc++: Only define std::span::at for C++26 [PR115335]

In r14-5689-g1fa85dcf656e2f I added std::span::at and made the correct
changes to the __cpp_lib_span macro (with tests for the correct value in
C++20/23/26). But I didn't make the declaration of std::span::at
actually depend on the macro, so it was defined for C++20 and C++23, not
only for C++26. This fixes that oversight.

libstdc++-v3/ChangeLog:

PR libstdc++/115335
* include/std/span (span::at): Guard with feature test macro.

(cherry picked from commit 2197814011eec75022aa8550f10621409b69d4a1)

Diff:
---
 libstdc++-v3/include/std/span | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libstdc++-v3/include/std/span b/libstdc++-v3/include/std/span
index 43e9cf82a54..00fc5279152 100644
--- a/libstdc++-v3/include/std/span
+++ b/libstdc++-v3/include/std/span
@@ -287,6 +287,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return *(this->_M_ptr + __idx);
   }
 
+#if __cpp_lib_span >= 202311L // >= C++26
   [[nodiscard]]
   constexpr reference
   at(size_type __idx) const
@@ -296,6 +297,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   "of size %zu"), __idx, this->size());
return *(this->_M_ptr + __idx);
   }
+#endif
 
   [[nodiscard]]
   constexpr pointer


[gcc r15-1016] Fix PR c++/111106: missing ; causes internal compiler error

2024-06-04 Thread Simon Martin via Gcc-cvs
https://gcc.gnu.org/g:cfbd8735359d84a2d716549415eac70e885167bf

commit r15-1016-gcfbd8735359d84a2d716549415eac70e885167bf
Author: Simon Martin 
Date:   Fri May 24 17:00:17 2024 +0200

Fix PR c++/111106: missing ; causes internal compiler error

We currently fail upon the following because an assert in dependent_type_p
fails for f's parameter

=== cut here ===
consteval int id (int i) { return i; }
constexpr int
f (auto i) requires requires { id (i) } { return i; }
void g () { f (42); }
=== cut here ===

This patch fixes this by relaxing the assert to pass during error recovery.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/111106

gcc/cp/ChangeLog:

* pt.cc (dependent_type_p): Don't fail assert during error recovery.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/consteval37.C: New test.

Diff:
---
 gcc/cp/pt.cc |  3 ++-
 gcc/testsuite/g++.dg/cpp2a/consteval37.C | 16 
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index dfce1b3c359..edb94a000ea 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -28019,7 +28019,8 @@ dependent_type_p (tree type)
   /* If we are not processing a template, then nobody should be
 providing us with a dependent type.  */
   gcc_assert (type);
-  gcc_assert (TREE_CODE (type) != TEMPLATE_TYPE_PARM || is_auto (type));
+  gcc_assert (TREE_CODE (type) != TEMPLATE_TYPE_PARM || is_auto (type)
+ || seen_error());
   return false;
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/consteval37.C b/gcc/testsuite/g++.dg/cpp2a/consteval37.C
new file mode 100644
index 00000000000..519d83d9bf8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/consteval37.C
@@ -0,0 +1,16 @@
+// PR c++/111106
+// { dg-do compile { target c++20 } }
+
+consteval int id (int i) { return i; }
+
+constexpr int f (auto i)
+  requires requires { id (i) } // { dg-error "expected" }
+{
+  return i;
+}
+
+void g () {
+  f (42);
+}
+
+// { dg-excess-errors "" }


[gcc r15-1017] Add missing space after seen_error in gcc/cp/pt.cc

2024-06-04 Thread Simon Martin via Gcc-cvs
https://gcc.gnu.org/g:54e5cbcd82e36f5aa8205b56880821eea25701ae

commit r15-1017-g54e5cbcd82e36f5aa8205b56880821eea25701ae
Author: Simon Martin 
Date:   Tue Jun 4 17:27:25 2024 +0200

Add missing space after seen_error in gcc/cp/pt.cc

I realized that I committed a change with a missing space after seen_error.
This fixes it, as well as another occurrence in the same file.

Apologies for the mistake - I'll commit this as obvious.

gcc/cp/ChangeLog:

* pt.cc (tsubst_expr): Add missing space after seen_error.
(dependent_type_p): Likewise.

Diff:
---
 gcc/cp/pt.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index edb94a000ea..8cbcf7cdf7a 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -20918,7 +20918,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
   be using lambdas anyway, so it's ok to be
   stricter.  Be strict with C++20 template-id ADL too.
   And be strict if we're already failing anyway.  */
-   bool strict = in_lambda || template_id_p || seen_error();
+   bool strict = in_lambda || template_id_p || seen_error ();
bool diag = true;
if (strict)
  error_at (cp_expr_loc_or_input_loc (t),
@@ -28020,7 +28020,7 @@ dependent_type_p (tree type)
 providing us with a dependent type.  */
   gcc_assert (type);
   gcc_assert (TREE_CODE (type) != TEMPLATE_TYPE_PARM || is_auto (type)
- || seen_error());
+ || seen_error ());
   return false;
 }


[gcc r15-1018] Fortran: fix ALLOCATE with SOURCE=, zero-length character [PR83865]

2024-06-04 Thread Harald Anlauf via Gcc-cvs
https://gcc.gnu.org/g:7f21aee0d4ef95eee7d9f7f42e9a056715836648

commit r15-1018-g7f21aee0d4ef95eee7d9f7f42e9a056715836648
Author: Harald Anlauf 
Date:   Mon Jun 3 22:02:06 2024 +0200

Fortran: fix ALLOCATE with SOURCE=, zero-length character [PR83865]

gcc/fortran/ChangeLog:

PR fortran/83865
* trans-stmt.cc (gfc_trans_allocate): Restrict special case for
source-expression with zero-length character to rank 0, so that
the array shape is not discarded.

gcc/testsuite/ChangeLog:

PR fortran/83865
* gfortran.dg/allocate_with_source_32.f90: New test.

Diff:
---
 gcc/fortran/trans-stmt.cc  |  3 +-
 .../gfortran.dg/allocate_with_source_32.f90| 33 ++
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-stmt.cc b/gcc/fortran/trans-stmt.cc
index 9b497d6bdc6..93b633e212e 100644
--- a/gcc/fortran/trans-stmt.cc
+++ b/gcc/fortran/trans-stmt.cc
@@ -6449,8 +6449,9 @@ gfc_trans_allocate (gfc_code * code, gfc_omp_namelist *omp_allocate)
   else
gfc_add_block_to_block (&post, &se.post);
 
-  /* Special case when string in expr3 is zero.  */
+  /* Special case when string in expr3 is scalar and has length zero.  */
   if (code->expr3->ts.type == BT_CHARACTER
+ && code->expr3->rank == 0
  && integer_zerop (se.string_length))
{
  gfc_init_se (&se, NULL);
diff --git a/gcc/testsuite/gfortran.dg/allocate_with_source_32.f90 b/gcc/testsuite/gfortran.dg/allocate_with_source_32.f90
new file mode 100644
index 00000000000..4a9bd46da4d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/allocate_with_source_32.f90
@@ -0,0 +1,33 @@
+! { dg-do run }
+!
+! PR fortran/83865
+!
+! Test ALLOCATE with SOURCE= of deferred length character, where
+! the source-expression is an array of character with length 0.
+
+program p
+  implicit none
+  character(:), allocatable :: z(:)
+  character(1) :: cc(4) = ""
+  allocate (z, source=[''])
+  if (len (z) /= 0 .or. size (z) /= 1) stop 1
+  deallocate (z)
+  allocate (z, source=['',''])
+  if (len (z) /= 0 .or. size (z) /= 2) stop 2
+  deallocate (z)
+  allocate (z, source=[ character(0) :: 'a','b','c'])
+  if (len (z) /= 0 .or. size (z) /= 3) stop 3
+  deallocate (z)
+  allocate (z, source=[ character(0) :: cc ])
+  if (len (z) /= 0 .or. size (z) /= 4) stop 4
+  deallocate (z)
+  associate (x => f())
+if (len (x) /= 0 .or. size (x) /= 1) stop 5
+if (x(1) /= '') stop 6
+  end associate
+contains
+  function f() result(z)
+character(:), allocatable :: z(:)
+allocate (z, source=[''])
+  end function f
+end


[gcc r15-1019] c++: Add testcase for PR103338

2024-06-04 Thread Simon Martin via Gcc-cvs
https://gcc.gnu.org/g:126ccf8ffc46865accec22a2789f09abd98c1d85

commit r15-1019-g126ccf8ffc46865accec22a2789f09abd98c1d85
Author: Simon Martin 
Date:   Tue Jun 4 11:59:31 2024 +0200

c++: Add testcase for PR103338

The case in that PR used to ICE until commit f04dc89. This patch simply adds
the case to the testsuite.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/103338

gcc/testsuite/ChangeLog:

* g++.dg/parse/crash73.C: New test.

Diff:
---
 gcc/testsuite/g++.dg/parse/crash73.C | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/gcc/testsuite/g++.dg/parse/crash73.C b/gcc/testsuite/g++.dg/parse/crash73.C
new file mode 100644
index 00000000000..97b8b5e8325
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/crash73.C
@@ -0,0 +1,19 @@
+// PR c++/103338
+// { dg-do compile { target c++11 } }
+
+template
+struct zip_view {
+  struct Iterator;
+};
+
+template
+struct zip_transform_view;
+
+template
+struct zip_view::Iterator { // { dg-error "no class template" }
+  template
+  template
+  friend class zip_transform_view::Iterator;
+};
+
+zip_view<>::Iterator iter;


[gcc(refs/users/meissner/heads/work168-tar)] Restrict SPR to appropriate integer modes.

2024-06-04 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:8f96a3df132b456f535b8c94bb436dca44eecc39

commit 8f96a3df132b456f535b8c94bb436dca44eecc39
Author: Michael Meissner 
Date:   Tue Jun 4 13:44:01 2024 -0400

Restrict SPR to appropriate integer modes.

In preparation for the patches to add support for the TAR register, I restricted
the modes that special purpose registers (SPRs) could hold to be appropriate
sized scalar integers.  I have discovered occasionally when GCC has run out of
registers, it will use the SPRs to hold values instead of spilling them to the
stack.  The LR/CTR registers can hold 8/16/32-bit values and on 64-bit systems,
they can also hold 64-bit values.  The VRSAVE and VSCR registers can only hold
32-bit values.
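
The rule described above can be sketched outside of GCC as follows. Reg, Mode,
max_bytes and spr_mode_ok are hypothetical names invented for illustration,
and word_bytes plays the role of UNITS_PER_WORD (4 on 32-bit, 8 on 64-bit
targets); the real check lives in rs6000_hard_regno_mode_ok_uncached.

```cpp
#include <cstddef>

namespace spr_sketch {

// Hypothetical model of the four special purpose registers discussed here.
enum class Reg { LR, CTR, VRSAVE, VSCR };

// Hypothetical, reduced description of a machine mode.
struct Mode {
  std::size_t size_bytes;  // size of the value
  bool scalar_int;         // true for scalar integer modes
  bool complex_part;       // true if the original mode was complex
};

// VRSAVE and VSCR are limited to 32 bits; LR and CTR hold a full word.
constexpr std::size_t max_bytes(Reg r, std::size_t word_bytes) {
  return (r == Reg::VRSAVE || r == Reg::VSCR) ? 4 : word_bytes;
}

// Accept only scalar integers that fit; reject complex and non-integer modes.
constexpr bool spr_mode_ok(Reg r, Mode m, std::size_t word_bytes) {
  return !m.complex_part && m.scalar_int
         && m.size_bytes <= max_bytes(r, word_bytes);
}

}  // namespace spr_sketch
```

Under these assumptions a 64-bit integer is accepted in LR on a 64-bit target
but rejected in VRSAVE, and a 4-byte floating-point value is rejected
everywhere.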

2024-06-04  Michael Meissner  

gcc/

* config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached): Restrict
SPR registers to only hold scalar integer modes of an appropriate size.
* config/rs6000/rs6000.md (movcc_): Remove alternatives that move
values to/from the SPRs.
(movsf_hardfloat): Likewise.
(movsd_hardfloat): Likewise.
(mov_softfloat): Likewise.
(mov_softfloat32): Likewise.
(mov_hardfloat64): Likewise.
(*mov_softfloat64): Likewise.

Diff:
---
 gcc/config/rs6000/rs6000.cc |  29 ++-
 gcc/config/rs6000/rs6000.md | 117 +++-
 2 files changed, 77 insertions(+), 69 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index c5c4191127e..c2f8096beec 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1851,9 +1851,13 @@ static int
 rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
 {
   int last_regno = regno + rs6000_hard_regno_nregs[mode][regno] - 1;
+  bool orig_complex_p = false;
 
   if (COMPLEX_MODE_P (mode))
-mode = GET_MODE_INNER (mode);
+{
+  mode = GET_MODE_INNER (mode);
+  orig_complex_p = true;
+}
 
   /* Vector pair modes need even/odd VSX register pairs.  Only allow vector
  registers.  */
@@ -1935,6 +1939,29 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
   if (CA_REGNO_P (regno))
 return mode == Pmode || mode == SImode;
 
+  /* Restrict SPR registers to only hold an appropriate sized integer mode.  In
+ particular, do not allow condition codes, complex values, or floating
+ point.  VRSAVE and VSCR can only hold 32-bit values.  */
+  switch (regno)
+{
+case VRSAVE_REGNO:
+case VSCR_REGNO:
+case LR_REGNO:
+case CTR_REGNO:
+  {
+   unsigned reg_size = ((regno == VRSAVE_REGNO || regno == VSCR_REGNO)
+? 4
+: UNITS_PER_WORD);
+
+   return (!orig_complex_p
+   && GET_MODE_SIZE (mode) <= reg_size
+   && SCALAR_INT_MODE_P (mode));
+  }
+
+default:
+  break;
+}
+
   /* AltiVec only in AldyVec registers.  */
   if (ALTIVEC_REGNO_P (regno))
 return (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 44d38df56f1..e5d3cb286cb 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -8119,9 +8119,9 @@
 
 (define_insn "*movcc_"
   [(set (match_operand:CC_any 0 "nonimmediate_operand"
-   "=y,x,?y,y,r,r,r,r, r,*c*l,r,m")
+   "=y,x,?y,y,r,r,r,r,r,m")
(match_operand:CC_any 1 "general_operand"
-   " y,r, r,O,x,y,r,I,*h,   r,m,r"))]
+   " y,r, r,O,x,y,r,I,m,r"))]
   "register_operand (operands[0], mode)
|| register_operand (operands[1], mode)"
   "@
@@ -8133,8 +8133,6 @@
mfcr %0%Q1\;rlwinm %0,%0,%f1,0xf000
mr %0,%1
li %0,%1
-   mf%1 %0
-   mt%0 %1
lwz%U1%X1 %0,%1
stw%U0%X0 %1,%0"
   [(set_attr_alternative "type"
@@ -8148,11 +8146,9 @@
(const_string "mfcrf") (const_string "mfcr"))
   (const_string "integer")
   (const_string "integer")
-  (const_string "mfjmpr")
-  (const_string "mtjmpr")
   (const_string "load")
   (const_string "store")])
-   (set_attr "length" "*,*,12,*,*,8,*,*,*,*,*,*")])
+   (set_attr "length" "*,*,12,*,*,8,*,*,*,*")])
 
 ;; For floating-point, we normally deal with the floating-point registers
 ;; unless -msoft-float is used.  The sole exception is that parameter passing
@@ -8203,17 +8199,17 @@
 ;;
 ;; LWZ  LFSLXSSP   LXSSPX STFS   STXSSP
 ;; STXSSPX  STWXXLXOR  LI FMRXSCPSGNDP
-;; MR   MT  MF   NOPXXSPLTIDP
+;; MR   XXSPLTIDP
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 "=!r,   f, v,  wa,m, wY,
  Z, m, wa, !r, 

[gcc(refs/users/meissner/heads/work168-tar)] Add support for the TAR register.

2024-06-04 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:f5177d70ff8945e98334be05bc706e133ec83cd3

commit f5177d70ff8945e98334be05bc706e133ec83cd3
Author: Michael Meissner 
Date:   Tue Jun 4 14:25:29 2024 -0400

Add support for the TAR register.

2024-06-04  Michael Meissner  

gcc/

* config/rs6000/constraints.md (h constraint): Add TAR register to the
documentation.
(wt constraint): New constraint.
* config/rs6000/rs6000-cpus.def (ISA_3_0_MASKS_SERVER): Document that we
do not explicitly add -mtar for power9.
(OTHER_POWER10_MASKS): Add -mtar.
(POWERPC_MASKS): Likewise.
* config/rs6000/rs6000.cc (rs6000_reg_names): Add TAR register support.
(alt_reg_names): Likewise.
(rs6000_hard_regno_mode_ok_uncached): Likewise.
(rs6000_debug_reg_global): Print the register class that wt maps to.
(rs6000_init_hard_regno_mode_ok): Add TAR register support.
(rs6000_option_override_internal): Restrict -mtar to power9 and above.
(rs6000_conditional_register_usage): Add TAR register support.
(print_operand): Likewise.
(rs6000_debugger_regno): Likewise.
(rs6000_opt_masks): Add support for -mtar.
* config/rs6000/rs6000.h (FIRST_PSEUDO_REGISTER): Add TAR register
support.
(FIXED_REGISTERS): Likewise.
(CALL_REALLY_USED_REGISTERS): Likewise.
(REG_ALLOC_ORDER): Likewise.
(enum reg_class): Likewise.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(enum r6000_reg_class_enum): Add support for the wt constraint.
* config/rs6000/rs6000.md (TAR_REGNO): New constant.
(mov_internal): Add TAR register support.
(call_indirect_nonlocal_sysv): Likewise.
(call_value_indirect_nonlocal_sysv): Likewise.
(call_indirect_aix): Likewise.
(call_value_indirect_aix): Likewise.
(call_indirect_elfv2): Likewise.
(call_indirect_pcrel): Likewise.
(call_value_indirect_elfv2): Likewise.
(call_value_indirect_pcrel): Likewise.
(*sibcall_indirect_nonlocal_sysv): Likewise.
(sibcall_value_indirect_nonlocal_sysv): Likewise.
(indirect_jump): Likewise.
(@indirect_jump_nospec): Likewise.
(@tablejump_insn_normal): Likewise.
(@tablejump_insn_nospec): Likewise.
* config/rs6000/rs6000.opt (-mtar): New option.

gcc/testsuite/

* gcc.target/powerpc/ppc-switch-1.c: Update test for the TAR register.
* gcc.target/powerpc/pr51513.c: Likewise.
* gcc.target/powerpc/safe-indirect-jump-2.c: Likewise.
* gcc.target/powerpc/safe-indirect-jump-3.c: Likewise.
* gcc.target/powerpc/tar-register.c: New test.

Diff:
---
 gcc/config/rs6000/constraints.md   |  5 ++-
 gcc/config/rs6000/rs6000-cpus.def  |  7 ++--
 gcc/config/rs6000/rs6000.cc| 42 ++
 gcc/config/rs6000/rs6000.h | 31 +---
 gcc/config/rs6000/rs6000.md| 35 +-
 gcc/config/rs6000/rs6000.opt   |  4 +++
 gcc/testsuite/gcc.target/powerpc/ppc-switch-1.c|  4 +--
 gcc/testsuite/gcc.target/powerpc/pr51513.c |  4 +--
 .../gcc.target/powerpc/safe-indirect-jump-2.c  |  2 +-
 .../gcc.target/powerpc/safe-indirect-jump-3.c  |  2 +-
 gcc/testsuite/gcc.target/powerpc/tar-register.c| 34 ++
 11 files changed, 126 insertions(+), 44 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index 369a7b75042..14f0465d7ae 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -57,7 +57,7 @@
   "@internal A compatibility alias for @code{wa}.")
 
 (define_register_constraint "h" "SPECIAL_REGS"
-  "@internal A special register (@code{vrsave}, @code{ctr}, or @code{lr}).")
+  "@internal A special register (@code{vrsave}, @code{ctr}, @code{lr} or @code{tar}).")
 
 (define_register_constraint "c" "CTR_REGS"
   "The count register, @code{ctr}.")
@@ -91,6 +91,9 @@
   "@internal Like @code{r}, if @option{-mpowerpc64} is used; otherwise,
@code{NO_REGS}.")
 
+(define_register_constraint "wt" "rs6000_constraints[RS6000_CONSTRAINT_wt]"
+  "The tar register, @code{tar}.")
+
 (define_register_constraint "wx" "rs6000_constraints[RS6000_CONSTRAINT_wx]"
   "@internal Like @code{d}, if @option{-mpowerpc-gfxopt} is used; otherwise,
@code{NO_REGS}.")
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index d625dbeb91f..37366d5e056 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -59,7 +59,8 @@
 | OPTION

[gcc(refs/users/meissner/heads/work168-tar)] Update ChangeLog.*

2024-06-04 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:42f6f1cdec43877fd0532acd297deba0aec5c3c2

commit 42f6f1cdec43877fd0532acd297deba0aec5c3c2
Author: Michael Meissner 
Date:   Tue Jun 4 14:29:19 2024 -0400

Update ChangeLog.*

Diff:
---
 gcc/ChangeLog.tar | 248 +-
 1 file changed, 247 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.tar b/gcc/ChangeLog.tar
index c512209738a..a69b0f59eac 100644
--- a/gcc/ChangeLog.tar
+++ b/gcc/ChangeLog.tar
@@ -1,6 +1,252 @@
+ Branch work168-tar, patch #201 
+
+Add support for the TAR register.
+
+2024-06-04  Michael Meissner  
+
+gcc/
+
+   * config/rs6000/constraints.md (h constraint): Add TAR register to the
+   documentation.
+   (wt constraint): New constraint.
+   * config/rs6000/rs6000-cpus.def (ISA_3_0_MASKS_SERVER): Document that we
+   do not explicitly add -mtar for power9.
+   (OTHER_POWER10_MASKS): Add -mtar.
+   (POWERPC_MASKS): Likewise.
+   * config/rs6000/rs6000.cc (rs6000_reg_names): Add TAR register support.
+   (alt_reg_names): Likewise.
+   (rs6000_hard_regno_mode_ok_uncached): Likewise.
+   (rs6000_debug_reg_global): Print the register class that wt maps too.
+   (rs6000_init_hard_regno_mode_ok): Add TAR register support.
+   (rs6000_option_override_internal): Restrict -mtar to power9 and above.
+   (rs6000_conditional_register_usage): Add TAR register support.
+   (print_operand): Likewise.
+   (rs6000_debugger_regno): Likewise.
+   (rs6000_opt_masks): Add support for -mtar.
+   * config/rs6000/rs6000.h (FIRST_PSEUDO_REGISTER): Add TAR register
+   support.
+   (FIXED_REGISTERS): Likewise.
+   (CALL_REALLY_USED_REGISTERS): Likewise.
+   (REG_ALLOC_ORDER): Likewise.
+   (enum reg_class): Likewise.
+   (REG_CLASS_NAMES): Likewise.
+   (REG_CLASS_CONTENTS): Likewise.
+   (enum r6000_reg_class_enum): Add support for the wt constraint.
+   * config/rs6000/rs6000.md (TAR_REGNO): New constant.
+   (mov_internal): Add TAR register support.
+   (call_indirect_nonlocal_sysv): Likewise.
+   (call_value_indirect_nonlocal_sysv): Likewise.
+   (call_indirect_aix): Likewise.
+   (call_value_indirect_aix): Likewise.
+   (call_indirect_elfv2): Likewise.
+   (call_indirect_pcrel): Likewise.
+   (call_value_indirect_elfv2): Likewise.
+   (call_value_indirect_pcrel): Likewise.
+   (*sibcall_indirect_nonlocal_sysv): Likewise.
+   (sibcall_value_indirect_nonlocal_sysv): Likewise.
+   (indirect_jump): Likewise.
+   (@indirect_jump_nospec): Likewise.
+   (@tablejump_insn_normal): Likewise.
+   (@tablejump_insn_nospec): Likewise.
+   * config/rs6000/rs6000.opt (-mtar): New option.
+
+gcc/testsuite/
+
+   * gcc.target/powerpc/ppc-switch-1.c: Update test for the TAR register.
+   * gcc.target/powerpc/pr51513.c: Likewise.
+   * gcc.target/powerpc/safe-indirect-jump-2.c: Likewise.
+   * gcc.target/powerpc/safe-indirect-jump-3.c: Likewise.
+   * gcc.target/powerpc/tar-register.c: New test.
+
+ Branch work168-tar, patch #200 
+
+Restrict SPR to appropriate integer modes.
+
+In preparation for the patches to add support for the TAR register, I restricted
+the modes that special purpose registers (SPRs) could hold to be appropriate
+sized scalar integers.  I have discovered occasionally when GCC has run out of
+registers, it will use the SPRs to hold values instead of spilling them to the
+stack.  The LR/CTR registers can hold 8/16/32-bit values and on 64-bit systems,
+they can also hold 64-bit values.  The VRSAVE and VSCR registers can only hold
+32-bit values.
+
+2024-06-04  Michael Meissner  
+
+gcc/
+
+   * config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached): Restrict
+   SPR registers to only hold scalar integer modes of an appropriate size.
+   * config/rs6000/rs6000.md (movcc_): Remove alternatives that move
+   values to/from the SPRs.
+   (movsf_hardfloat): Likewise.
+   (movsd_hardfloat): Likewise.
+   (mov_softfloat): Likewise.
+   (mov_softfloat32): Likewise.
+   (mov_hardfloat64): Likewise.
+   (*mov_softfloat64): Likewise.
+
+ Branch work168-tar, patch #11 from work168 branch 

+
+Add -mcpu=future tuning support.
+
+This patch makes -mtune=future use the same tuning decision as -mtune=power11.
+
+2024-06-03  Michael Meissner  
+
+gcc/
+
+   * config/rs6000/power10.md (all reservations): Add future as an
+   alternative to power10 and power11.
+
+ Branch work168-tar, patch #10 from work168 branch 

+
+Add -mcpu=future support.
+
+This patch adds the future option to the -mcpu= and -mtune= switches.
+
+This patch treats the future like a power11 in terms of costs and reassociation
+width.
+
+This patch issues a ".machine future" to the 

[gcc r15-1021] RISC-V: Add Zfbfmin extension

2024-06-04 Thread xiao via Gcc-cvs
https://gcc.gnu.org/g:4638e508aa814d4aa2e204c3ab041c6a56aad2bd

commit r15-1021-g4638e508aa814d4aa2e204c3ab041c6a56aad2bd
Author: Xiao Zeng 
Date:   Wed May 15 13:56:42 2024 +0800

RISC-V: Add Zfbfmin extension

1 In the previous patch, the libcall for BF16 was implemented:



2 Riscv provides Zfbfmin extension, which completes the "Scalar BF16 Converts":



3 Implemented replacing libcall with Zfbfmin extension instruction.

4 Reused previous testcases in:


gcc/ChangeLog:

* config/riscv/iterators.md: Add mode_iterator between
floating-point modes and BFmode.
* config/riscv/riscv.cc (riscv_output_move): Handle BFmode move
for zfbfmin.
* config/riscv/riscv.md (truncbf2): New pattern for BFmode.
(extendbfsf2): Ditto.
(*movhf_hardfloat): Add BFmode.
(*mov_hardfloat): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfbfmin-bf16_arithmetic.c: New test.
* gcc.target/riscv/zfbfmin-bf16_comparison.c: New test.
* gcc.target/riscv/zfbfmin-bf16_float_libcall_convert.c: New test.
* gcc.target/riscv/zfbfmin-bf16_integer_libcall_convert.c: New test.

Diff:
---
 gcc/config/riscv/iterators.md  |  6 +-
 gcc/config/riscv/riscv.cc  |  4 +-
 gcc/config/riscv/riscv.md  | 49 +---
 .../gcc.target/riscv/zfbfmin-bf16_arithmetic.c | 35 
 .../gcc.target/riscv/zfbfmin-bf16_comparison.c | 33 +++
 .../riscv/zfbfmin-bf16_float_libcall_convert.c | 45 +++
 .../riscv/zfbfmin-bf16_integer_libcall_convert.c   | 66 ++
 7 files changed, 228 insertions(+), 10 deletions(-)

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 3c139bc2e30..1e37e843023 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -78,9 +78,13 @@
 ;; Iterator for floating-point modes that can be loaded into X registers.
 (define_mode_iterator SOFTF [SF (DF "TARGET_64BIT") (HF "TARGET_ZFHMIN")])
 
-;; Iterator for floating-point modes of BF16
+;; Iterator for floating-point modes of BF16.
 (define_mode_iterator HFBF [HF BF])
 
+;; Conversion between floating-point modes and BF16.
+;; SF to BF16 have hardware instructions.
+(define_mode_iterator FBF [HF DF TF])
+
 ;; ---
 ;; Mode attributes
 ;; ---
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 10af38a5a81..c5c4c777349 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4310,7 +4310,7 @@ riscv_output_move (rtx dest, rtx src)
switch (width)
  {
  case 2:
-   if (TARGET_ZFHMIN)
+   if (TARGET_ZFHMIN || TARGET_ZFBFMIN)
  return "fmv.x.h\t%0,%1";
/* Using fmv.x.s + sign-extend to emulate fmv.x.h.  */
return "fmv.x.s\t%0,%1;slli\t%0,%0,16;srai\t%0,%0,16";
@@ -4366,7 +4366,7 @@ riscv_output_move (rtx dest, rtx src)
switch (width)
  {
  case 2:
-   if (TARGET_ZFHMIN)
+   if (TARGET_ZFHMIN || TARGET_ZFBFMIN)
  return "fmv.h.x\t%0,%z1";
/* High 16 bits should be all-1, otherwise HW will treated
   as a n-bit canonical NaN, but isn't matter for softfloat.  */
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 25d341ec987..e57bfcf616a 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1763,6 +1763,31 @@
   [(set_attr "type" "fcvt")
(set_attr "mode" "HF")])
 
+(define_insn "truncsfbf2"
+  [(set (match_operand:BF0 "register_operand" "=f")
+   (float_truncate:BF
+  (match_operand:SF 1 "register_operand" " f")))]
+  "TARGET_ZFBFMIN"
+  "fcvt.bf16.s\t%0,%1"
+  [(set_attr "type" "fcvt")
+   (set_attr "mode" "BF")])
+
+;; The conversion of HF/DF/TF to BF needs to be done with SF if there is a
+;; chance to generate at least one instruction, otherwise just using
+;; libfunc __trunc[h|d|t]fbf2.
+(define_expand "truncbf2"
+  [(set (match_operand:BF  0 "register_operand" "=f")
+   (float_truncate:BF
+  (match_operand:FBF   1 "register_operand" " f")))]
+  "TARGET_ZFBFMIN"
+  {
+convert_move (operands[0],
+ convert_modes (SFmode, mode, operands[1], 0), 0);
+DONE;
+  }
+  [(set_attr "type" "fcvt")
+   (set_attr "mode" "BF")])
+
 ;;
 ;;  
 ;;
@@ -1907,6 +1932,15 @@
   [(set_attr "type" "fcvt")
(set_att

[gcc r14-10283] testsuite: i386: Require ifunc support in gcc.target/i386/avx10_1-25.c etc.

2024-06-04 Thread Haochen Jiang via Gcc-cvs
https://gcc.gnu.org/g:e11a42b8c7ac32f8a1e307f99719a0f9c63813e8

commit r14-10283-ge11a42b8c7ac32f8a1e307f99719a0f9c63813e8
Author: Rainer Orth 
Date:   Tue Jun 4 13:33:46 2024 +0200

testsuite: i386: Require ifunc support in gcc.target/i386/avx10_1-25.c etc.

Two new AVX10.1 tests FAIL on Solaris/x86:

FAIL: gcc.target/i386/avx10_1-25.c (test for excess errors)
FAIL: gcc.target/i386/avx10_1-26.c (test for excess errors)

Excess errors:

/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.target/i386/avx10_1-25.c:6:9: error: the call requires 'ifunc', which is not supported by this target

Fixed by requiring ifunc support.

Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.

2024-06-04  Rainer Orth  

gcc/testsuite:
* gcc.target/i386/avx10_1-25.c: Require ifunc support.
* gcc.target/i386/avx10_1-26.c: Likewise.

Diff:
---
 gcc/testsuite/gcc.target/i386/avx10_1-25.c | 1 +
 gcc/testsuite/gcc.target/i386/avx10_1-26.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-25.c b/gcc/testsuite/gcc.target/i386/avx10_1-25.c
index 73f1b724560..5bd2b88fb08 100644
--- a/gcc/testsuite/gcc.target/i386/avx10_1-25.c
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-25.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx" } */
+/* { dg-require-ifunc "" } */
 
 #include 
 __attribute__((target_clones ("default","avx10.1-256")))
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-26.c b/gcc/testsuite/gcc.target/i386/avx10_1-26.c
index 514ab57a406..cf8c976e21f 100644
--- a/gcc/testsuite/gcc.target/i386/avx10_1-26.c
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-26.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx512f" } */
+/* { dg-require-ifunc "" } */
 
 #include 
 __attribute__((target_clones ("default","avx10.1-512")))


[gcc r15-1022] Don't simplify NAN/INF or out-of-range constant for FIX/UNSIGNED_FIX.

2024-06-04 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:b05288d1f1e4b632eddf8830b4369d4659f6c2ff

commit r15-1022-gb05288d1f1e4b632eddf8830b4369d4659f6c2ff
Author: liuhongt 
Date:   Tue May 21 16:57:17 2024 +0800

Don't simplify NAN/INF or out-of-range constant for FIX/UNSIGNED_FIX.

According to the IEEE standard, for conversions from floating point to
integer: when a NaN or infinite operand cannot be represented in the
destination format and this cannot otherwise be indicated, the invalid
operation exception shall be signaled. When a numeric operand would
convert to an integer outside the range of the destination format, the
invalid operation exception shall be signaled if this situation cannot
otherwise be indicated.

The patch prevents simplification of the conversion from floating point
to integer for NAN/INF/out-of-range constants when flag_trapping_math is set.
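
A minimal standalone sketch of that folding decision follows. It is not the
GCC implementation: fold_fix_int32 and its parameters are hypothetical, the
target type is fixed to a 32-bit signed integer, and the saturating results
on the non-trapping path are an assumption made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

namespace fix_sketch {

// Returns true and stores the folded value in *out when the conversion may
// be evaluated at compile time; returns false when it has to be left for
// run time so that the invalid operation exception can still be raised.
bool fold_fix_int32(double x, bool trapping_math, std::int32_t* out) {
  assert(out != nullptr);
  const std::int32_t imin = std::numeric_limits<std::int32_t>::min();
  const std::int32_t imax = std::numeric_limits<std::int32_t>::max();

  if (std::isnan(x)) {
    if (trapping_math) return false;  // keep the conversion, do not fold
    *out = 0;                         // assumed non-trapping result for NaN
    return true;
  }

  // Conversion rounds toward zero; an infinite input stays infinite here
  // and is caught by the range check below.
  const double t = std::trunc(x);
  if (t < imin || t > imax) {
    if (trapping_math) return false;  // Inf or out of range: do not fold
    *out = (t < imin) ? imin : imax;  // assumed saturating result
    return true;
  }

  *out = static_cast<std::int32_t>(t);  // in range: always safe to fold
  return true;
}

}  // namespace fix_sketch
```

In-range constants fold regardless of the flag; only the three problematic
classes (NaN, infinity, out of range) depend on whether trapping math is
enabled.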

gcc/ChangeLog:

PR rtl-optimization/100927
PR rtl-optimization/115161
PR rtl-optimization/115115
* simplify-rtx.cc (simplify_const_unary_operation): Prevent
simplification of FIX/UNSIGNED_FIX for NAN/INF/out-of-range
constant when flag_trapping_math.
* fold-const.cc (fold_convert_const_int_from_real): Don't fold
for overflow value when flag_trapping_math.

gcc/testsuite/ChangeLog:

* gcc.dg/pr100927.c: New test.
* c-c++-common/Wconversion-1.c: Add -fno-trapping-math.
* c-c++-common/dfp/convert-int-saturate.c: Ditto.
* g++.dg/ubsan/pr63956.C: Ditto.
* g++.dg/warn/Wconversion-real-integer.C: Ditto.
* gcc.c-torture/execute/20031003-1.c: Ditto.
* gcc.dg/Wconversion-complex-c99.c: Ditto.
* gcc.dg/Wconversion-real-integer.c: Ditto.
* gcc.dg/c90-const-expr-11.c: Ditto.
* gcc.dg/overflow-warn-8.c: Ditto.

Diff:
---
 gcc/fold-const.cc  | 13 -
 gcc/simplify-rtx.cc| 23 +---
 gcc/testsuite/c-c++-common/Wconversion-1.c |  2 +-
 .../c-c++-common/dfp/convert-int-saturate.c|  1 +
 gcc/testsuite/g++.dg/ubsan/pr63956.C   |  7 -
 .../g++.dg/warn/Wconversion-real-integer.C |  2 +-
 gcc/testsuite/gcc.c-torture/execute/20031003-1.c   |  2 ++
 gcc/testsuite/gcc.dg/Wconversion-complex-c99.c |  2 +-
 gcc/testsuite/gcc.dg/Wconversion-real-integer.c|  2 +-
 gcc/testsuite/gcc.dg/c90-const-expr-11.c   |  2 +-
 gcc/testsuite/gcc.dg/overflow-warn-8.c |  1 +
 gcc/testsuite/gcc.dg/pr100927.c| 31 ++
 12 files changed, 77 insertions(+), 11 deletions(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 92b048c307e..710d697c021 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -2246,7 +2246,18 @@ fold_convert_const_int_from_real (enum tree_code code, tree type, const_tree arg
   if (! overflow)
 val = real_to_integer (&r, &overflow, TYPE_PRECISION (type));
 
-  t = force_fit_type (type, val, -1, overflow | TREE_OVERFLOW (arg1));
+  /* According to IEEE standard, for conversions from floating point to
+ integer. When a NaN or infinite operand cannot be represented in the
+ destination format and this cannot otherwise be indicated, the invalid
+ operation exception shall be signaled. When a numeric operand would
+ convert to an integer outside the range of the destination format, the
+ invalid operation exception shall be signaled if this situation cannot
+ otherwise be indicated.  */
+  if (!flag_trapping_math || !overflow)
+t = force_fit_type (type, val, -1, overflow | TREE_OVERFLOW (arg1));
+  else
+t = NULL_TREE;
+
   return t;
 }
 
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 5caf1dfd957..f6b4d73b593 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -2256,14 +2256,25 @@ simplify_const_unary_operation (enum rtx_code code, machine_mode mode,
   switch (code)
{
case FIX:
+ /* According to IEEE standard, for conversions from floating point to
+integer. When a NaN or infinite operand cannot be represented in
+the destination format and this cannot otherwise be indicated, the
+invalid operation exception shall be signaled. When a numeric
+operand would convert to an integer outside the range of the
+destination format, the invalid operation exception shall be
+signaled if this situation cannot otherwise be indicated.  */
  if (REAL_VALUE_ISNAN (*x))
-   return const0_rtx;
+   return flag_trapping_math ? NULL_RTX : const0_rtx;
+
+ if (REAL_VALUE_ISINF (*x) && flag_trapping_math)
+   return NULL_RTX;
 
  /* Test against the signed upper bound.  */
  wmax = wi::max_value (width, SIGNED);
  real_from_integer (&t, VOIDmode, wmax

[gcc r15-1023] libstdc++: Update gcc.gnu.org links in FAQ to https

2024-06-04 Thread Gerald Pfeifer via Gcc-cvs
https://gcc.gnu.org/g:35e453d9e17c299d58d5d2c9f44b4b4eec9867b6

commit r15-1023-g35e453d9e17c299d58d5d2c9f44b4b4eec9867b6
Author: Gerald Pfeifer 
Date:   Wed Jun 5 07:59:47 2024 +0200

libstdc++: Update gcc.gnu.org links in FAQ to https

libstdc++-v3:
* doc/xml/faq.xml: Move gcc.gnu.org to https.
* doc/html/faq.html: Regenerate.

Diff:
---
 libstdc++-v3/doc/html/faq.html | 10 +-
 libstdc++-v3/doc/xml/faq.xml   | 10 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/doc/html/faq.html b/libstdc++-v3/doc/html/faq.html
index e84e455c4e9..dcb94ba67dc 100644
--- a/libstdc++-v3/doc/html/faq.html
+++ b/libstdc++-v3/doc/html/faq.html
@@ -268,7 +268,7 @@
 Libstdc++ comes with its own validation testsuite, which includes
 conformance testing, regression testing, ABI testing, and
 performance testing. Please consult the
-http://gcc.gnu.org/install/test.html"; 
target="_top">testing
+https://gcc.gnu.org/install/test.html"; 
target="_top">testing
 documentation for GCC and
 Testing in the 
libstdc++
 manual for more details.
@@ -458,14 +458,14 @@
  g++ -E -dM -x c++ 
/dev/null to display
  a list of predefined macros for any particular installation.
   This has been discussed on the mailing lists
- http://gcc.gnu.org/cgi-bin/htsearch?method=and&format=builtin-long&sort=score&words=_XOPEN_SOURCE+Solaris";
 target="_top">quite a bit.
+ https://gcc.gnu.org/cgi-bin/htsearch?method=and&format=builtin-long&sort=score&words=_XOPEN_SOURCE+Solaris";
 target="_top">quite a bit.
   This method is something of a wart.  We'd like to find a cleaner
  solution, but nobody yet has contributed the time.
   4.4.
   Mac OS X ctype.h is broken! How can I fix 
it?
 NoteThis answer is old and probably no longer be 
relevant.
  This was a long-standing bug in the OS X support.  Fortunately, the
- http://gcc.gnu.org/ml/gcc/2002-03/msg00817.html"; target="_top">patch
+ https://gcc.gnu.org/ml/gcc/2002-03/msg00817.html"; target="_top">patch
 was quite simple, and well-known.
   4.5.
   Threading is broken on i386?
@@ -636,7 +636,7 @@
  header),
 then you will suddenly be faced with huge numbers of ambiguity
 errors.  This was discussed on the mailing list; Nathan Myers
-http://gcc.gnu.org/ml/libstdc++/2001-01/msg00247.html" target="_top">sums
+https://gcc.gnu.org/ml/libstdc++/2001-01/msg00247.html" target="_top">sums
   things up here.  The collisions with vector/string iterator
 types have been fixed for 3.1.
 6.4.
@@ -729,7 +729,7 @@
 
 If you have found a bug in the library and you think you have
 a working fix, then send it in!  The main GCC site has a page
-on http://gcc.gnu.org/contribute.html" target="_top">submitting
+on https://gcc.gnu.org/contribute.html" target="_top">submitting
 patches that covers the procedure, but for libstdc++ you
 should also send the patch to our mailing list in addition to
 the GCC patches mailing list.  The libstdc++
diff --git a/libstdc++-v3/doc/xml/faq.xml b/libstdc++-v3/doc/xml/faq.xml
index 79edb02bec4..4888fa93ae9 100644
--- a/libstdc++-v3/doc/xml/faq.xml
+++ b/libstdc++-v3/doc/xml/faq.xml
@@ -313,7 +313,7 @@
 Libstdc++ comes with its own validation testsuite, which includes
 conformance testing, regression testing, ABI testing, and
 performance testing. Please consult the
-http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/install/test.html">testing
+http://www.w3.org/1999/xlink" xlink:href="https://gcc.gnu.org/install/test.html">testing
 documentation for GCC and
 Testing in the libstdc++
 manual for more details.
@@ -583,7 +583,7 @@
  a list of predefined macros for any particular installation.
   
   This has been discussed on the mailing lists
- http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/cgi-bin/htsearch?method=and&format=builtin-long&sort=score&words=_XOPEN_SOURCE+Solaris">quite a bit.
+ http://www.w3.org/1999/xlink" xlink:href="https://gcc.gnu.org/cgi-bin/htsearch?method=and&format=builtin-long&sort=score&words=_XOPEN_SOURCE+Solaris">quite a bit.
   
   This method is something of a wart.  We'd like to find a cleaner
  solution, but nobody yet has contributed the time.
@@ -604,7 +604,7 @@
   
   
  This was a long-standing bug in the OS X support.  Fortunately, the
- http://www.w3.org/1999/xlink" xlink:href="http://gcc.gnu.org/ml/gcc/2002-03/msg00817.html">patch
+ http://www.w3.org/1999/xlink" xlink:href="https://gcc.gnu.org/ml/gcc/2002-03/msg00817.html">patch
 was quite simple, and well-known.
   
 
@@ -885,7 +885,7 @@
  header),
 then you will suddenly be faced with huge numbers of ambiguity
 errors.  This was discus