Re: [PATCH] libquadmath: add quad support for trig-pi functions

2025-07-03 Thread Jakub Jelinek
On Thu, Jul 03, 2025 at 02:43:43PM +0200, Michael Matz wrote:
> Hello,
> 
> On Thu, 3 Jul 2025, Yuao Ma wrote:
> 
> > This patch adds the required function for Fortran trigonometric functions to
> > work with glibc versions prior to 2.26. It's based on glibc source commit
> > 632d895f3e5d98162f77b9c3c1da4ec19968b671.
> > 
> > I've built it successfully on my end. Documentation is also included.
> > 
> > Please take a look when you have a moment.
> 
> +__float128
> +cospiq (__float128 x)
> +{
> ...
> +  if (__builtin_islessequal (x, 0.25Q))
> +    return cosq (M_PIq * x);
> 
> Isn't the whole raison d'etre for the trig-pi functions that the internal 
> argument reduction against multiples of pi becomes trivial and hence (a) 
> performant, and (b) doesn't introduce rounding artifacts?  Expressing the 
> trig-pi functions in terms of their counterparts completely defeats this 
> purpose.  The other way around would be more sensible for the cases where 
> it works, but the above doesn't seem very attractive.

glibc has
FLOAT
M_DECL_FUNC (__cospi) (FLOAT x)
{
  if (isless (M_FABS (x), M_EPSILON))
    return M_LIT (1.0);
  if (__glibc_unlikely (isinf (x)))
    __set_errno (EDOM);
  x = M_FABS (x - M_LIT (2.0) * M_SUF (round) (M_LIT (0.5) * x));
  if (islessequal (x, M_LIT (0.25)))
    return M_SUF (__cos) (M_MLIT (M_PI) * x);
  else if (x == M_LIT (0.5))
    return M_LIT (0.0);
  else if (islessequal (x, M_LIT (0.75)))
    return M_SUF (__sin) (M_MLIT (M_PI) * (M_LIT (0.5) - x));
  else
    return -M_SUF (__cos) (M_MLIT (M_PI) * (M_LIT (1.0) - x));
}
for this case, so if it is incorrect, has too bad precision
or there are better ways to do it, then perhaps it should be changed
on the glibc side first and then ported to libquadmath.

Jakub



[PATCH v2 0/1] contrib: add bpf-vmtest-tool to test BPF programs

2025-07-03 Thread Piyush Raj
This patch adds an initial version of the bpf-vmtest-tool script, which tests
BPF programs on a live kernel.

For now, the tool is standalone, but it is intended to be integrated with the
DejaGnu testsuite to run BPF testcases in future patches.

Current Limitations:
- Only x86_64 is supported. Support for additional architectures will be added 
soon.
- When testing BPF programs with --bpf-src or --bpf-obj, only the host's root
  directory can be used as the VM root filesystem. This will also be improved
  in future updates.

Changes since v1:
- Added support for Python 3.9
- Introduced BPF_CFLAGS and other environment flags (see README)
- Added "bpf" directory prefix
- Removed dependency on the uv binary and uv-specific files
- Fixed typo in README

Thank you,
Piyush Raj

Piyush Raj (1):
  contrib: add bpf-vmtest-tool to test BPF programs

 contrib/bpf-vmtest-tool/.gitignore|  23 +++
 contrib/bpf-vmtest-tool/README|  78 
 contrib/bpf-vmtest-tool/bpf.py| 189 +++
 contrib/bpf-vmtest-tool/config.py |  18 ++
 contrib/bpf-vmtest-tool/kernel.py | 209 ++
 contrib/bpf-vmtest-tool/main.py   | 101 +++
 contrib/bpf-vmtest-tool/pyproject.toml|  36 
 contrib/bpf-vmtest-tool/tests/test_cli.py | 167 +
 contrib/bpf-vmtest-tool/utils.py  |  27 +++
 contrib/bpf-vmtest-tool/vm.py | 154 
 10 files changed, 1002 insertions(+)
 create mode 100644 contrib/bpf-vmtest-tool/.gitignore
 create mode 100644 contrib/bpf-vmtest-tool/README
 create mode 100644 contrib/bpf-vmtest-tool/bpf.py
 create mode 100644 contrib/bpf-vmtest-tool/config.py
 create mode 100644 contrib/bpf-vmtest-tool/kernel.py
 create mode 100644 contrib/bpf-vmtest-tool/main.py
 create mode 100644 contrib/bpf-vmtest-tool/pyproject.toml
 create mode 100644 contrib/bpf-vmtest-tool/tests/test_cli.py
 create mode 100644 contrib/bpf-vmtest-tool/utils.py
 create mode 100644 contrib/bpf-vmtest-tool/vm.py

-- 
2.50.0



[PATCH v2 1/1] contrib: add bpf-vmtest-tool to test BPF programs

2025-07-03 Thread Piyush Raj
This patch adds the bpf-vmtest-tool subdirectory under contrib which tests
BPF programs under a live kernel using a QEMU VM.  It automatically
builds the specified kernel version with eBPF support enabled
and stores it under "~/.bpf-vmtest-tool", which is reused for future
invocations.

It can also compile BPF C source files or BPF bytecode objects and
test them against the kernel verifier for errors.  When a BPF program
is rejected by the kernel verifier, the verifier logs are displayed.

$ python3 main.py -k 6.15 --bpf-src assets/ebpf-programs/fail.c
BPF program failed to load
Verifier logs:
btf_vmlinux is malformed
0: R1=ctx() R10=fp0
0: (81) r0 = *(s32 *)(r10 +4)
invalid read from stack R10 off=4 size=4
processed 1 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

See the README for more examples.

The script uses vmtest (https://github.com/danobi/vmtest) to boot
the VM and run the program.  By default, it uses the host's root
("/") as the VM rootfs via the 9p filesystem, so only the kernel is
replaced during testing.

Tested with Python 3.9 and above.

contrib/ChangeLog:

* bpf-vmtest-tool/.gitignore: New file.
* bpf-vmtest-tool/README: New file.
* bpf-vmtest-tool/bpf.py: New file.
* bpf-vmtest-tool/config.py: New file.
* bpf-vmtest-tool/kernel.py: New file.
* bpf-vmtest-tool/main.py: New file.
* bpf-vmtest-tool/pyproject.toml: New file.
* bpf-vmtest-tool/tests/test_cli.py: New file.
* bpf-vmtest-tool/utils.py: New file.
* bpf-vmtest-tool/vm.py: New file.

Signed-off-by: Piyush Raj 
---
 contrib/bpf-vmtest-tool/.gitignore|  23 +++
 contrib/bpf-vmtest-tool/README|  78 
 contrib/bpf-vmtest-tool/bpf.py| 189 +++
 contrib/bpf-vmtest-tool/config.py |  18 ++
 contrib/bpf-vmtest-tool/kernel.py | 209 ++
 contrib/bpf-vmtest-tool/main.py   | 101 +++
 contrib/bpf-vmtest-tool/pyproject.toml|  36 
 contrib/bpf-vmtest-tool/tests/test_cli.py | 167 +
 contrib/bpf-vmtest-tool/utils.py  |  27 +++
 contrib/bpf-vmtest-tool/vm.py | 154 
 10 files changed, 1002 insertions(+)
 create mode 100644 contrib/bpf-vmtest-tool/.gitignore
 create mode 100644 contrib/bpf-vmtest-tool/README
 create mode 100644 contrib/bpf-vmtest-tool/bpf.py
 create mode 100644 contrib/bpf-vmtest-tool/config.py
 create mode 100644 contrib/bpf-vmtest-tool/kernel.py
 create mode 100644 contrib/bpf-vmtest-tool/main.py
 create mode 100644 contrib/bpf-vmtest-tool/pyproject.toml
 create mode 100644 contrib/bpf-vmtest-tool/tests/test_cli.py
 create mode 100644 contrib/bpf-vmtest-tool/utils.py
 create mode 100644 contrib/bpf-vmtest-tool/vm.py

diff --git a/contrib/bpf-vmtest-tool/.gitignore 
b/contrib/bpf-vmtest-tool/.gitignore
new file mode 100644
index 00000000000..723dfe1d0f4
--- /dev/null
+++ b/contrib/bpf-vmtest-tool/.gitignore
@@ -0,0 +1,23 @@
+.gitignore_local
+.python-version
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[codz]
+*$py.class
+
+# Unit test / coverage reports
+.pytest_cache/
+
+
+# Environments
+.env
+.envrc
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Ruff stuff:
+.ruff_cache/
diff --git a/contrib/bpf-vmtest-tool/README b/contrib/bpf-vmtest-tool/README
new file mode 100644
index 00000000000..52e1e1b9253
--- /dev/null
+++ b/contrib/bpf-vmtest-tool/README
@@ -0,0 +1,78 @@
+This directory contains a Python script to run BPF programs or shell commands
+under a live Linux kernel using QEMU virtual machines.
+
+USAGE
+=====
+
+To run a shell command inside a live kernel VM:
+
+python main.py -k 6.15 -r / -c "uname -a"
+
+To run a BPF source file in the VM:
+
+python main.py -k 6.15  --bpf-src fail.c
+
+To run a precompiled BPF object file:
+
+python main.py -k 6.15 --bpf-obj fail.bpf.o
+
+The tool will download and build the specified kernel version from:
+
+https://www.kernel.org/pub/linux/kernel
+
+A prebuilt `bzImage` can be supplied using the `--kernel-image` flag.
+
+NOTE
+====
+- Only x86_64 is supported
+- Only "/" (the root filesystem) is currently supported as the VM rootfs when
+running or testing BPF programs using `--bpf-src` or `--bpf-obj`.
+
+DEPENDENCIES
+============
+
+- Python >= 3.9
+- vmtest >= v0.18.0 (https://github.com/danobi/vmtest)
+- QEMU
+- qemu-guest-agent
+
+For compiling kernel
+- https://docs.kernel.org/process/changes.html#current-minimal-requirements
+For compiling and loading BPF programs:
+
+- libbpf
+- bpftool
+- gcc-bpf-unknown-none
+   (https://gcc.gnu.org/wiki/BPFBackEnd#Where_to_find_GCC_BPF)
+- vmlinux.h
+Can be generated using:
+
+bpftool btf dump file /sys/kernel/btf/vmlinux format c > \
+/usr/local/include/vmlinux.h
+
+Or downloaded from https://github.com/libbpf/vmlinux.h/tree/main
+
+BUILD FLAGS
+===========
+You can

Re: [PATCH] c++: -fno-delete-null-pointer-checks constexpr addr comparison [PR71962]

2025-07-03 Thread Jason Merrill

On 7/2/25 7:58 PM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk?

-- >8 --

Here the flag -fno-delete-null-pointer-checks causes the trivial address
comparison in

   inline int a, b;
   static_assert(&a != &b);

to be rejected as non-constant because with the flag we can't assume
such weak symbols are non-NULL, which causes symtab/fold-const.cc to
punt on such comparisons.  Note this also affects -fsanitize=undefined
since it implies -fno-delete-null-pointer-checks.


Right, the underlying problem is that we use the one flag to mean two 
things:


1) a static storage duration decl can live at address 0
2) do more careful checking for null pointers/lvalues (i.e. in 
gimple_call_nonnull_result_p)


Both cases are related to checking for null, but they are different 
situations and really shouldn't depend on the same flag.


Your patch seems wrong for #1 targets; on such a target 'a' might end up 
allocated at address 0, so "&a != nullptr" is not decidable at compile time.


OTOH such targets are a small minority, and I suspect they already have 
other C++ issues with e.g. a conversion to base not adjusting a null 
pointer.


Jakub, what do you think?  It's been 9 years since you proposed a better 
fix in the PR, but that hasn't happened yet.



This issue seems conceptually the same as PR96862 which was about
-frounding-math breaking some constexpr floating point arithmetic,
and we fixed that PR by disabling -frounding-math during manifestly
constant evaluation.  This patch proposes to do the same for
-fno-delete-null-pointer-checks, disabling it during manifestly constant
evaluation.  I opted to disable it narrowly around the relevant
fold_binary call which seems to address all reported constexpr failures,
but we could consider disabling it more broadly as well.

PR c++/71962

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_binary_expression): Set
flag_delete_null_pointer_checks alongside folding_cxx_constexpr
during manifestly constant evaluation.

gcc/testsuite/ChangeLog:

* g++.dg/ext/constexpr-pr71962.C: New test.
* g++.dg/ubsan/pr71962.C: New test.
---
  gcc/cp/constexpr.cc  |  2 ++
  gcc/testsuite/g++.dg/ext/constexpr-pr71962.C | 18 ++
  gcc/testsuite/g++.dg/ubsan/pr71962.C |  5 +
  3 files changed, 25 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/ext/constexpr-pr71962.C
  create mode 100644 gcc/testsuite/g++.dg/ubsan/pr71962.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 704d936f2ec3..e8426b40c543 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -4068,6 +4068,8 @@ cxx_eval_binary_expression (const constexpr_ctx *ctx, 
tree t,
  || TREE_CODE (type) != REAL_TYPE))
{
  auto ofcc = make_temp_override (folding_cxx_constexpr, true);
+ auto odnpc = make_temp_override (flag_delete_null_pointer_checks,
+  true);
  r = fold_binary_initializer_loc (loc, code, type, lhs, rhs);
}
else
diff --git a/gcc/testsuite/g++.dg/ext/constexpr-pr71962.C 
b/gcc/testsuite/g++.dg/ext/constexpr-pr71962.C
new file mode 100644
index 000000000000..57cb14ac804e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/constexpr-pr71962.C
@@ -0,0 +1,18 @@
+// PR c++/71962
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-fno-delete-null-pointer-checks" }
+
+struct A { void f(); };
+static_assert(&A::f != nullptr, "");
+
+#if __cpp_inline_variables
+inline int a, b;
+static_assert(&a != &b, "");
+static_assert(&a != nullptr, "");
+#endif
+
+int main() {
+  static int x, y;
+  static_assert(&x != &y, "");
+  static_assert(&x != nullptr, "");
+}
diff --git a/gcc/testsuite/g++.dg/ubsan/pr71962.C 
b/gcc/testsuite/g++.dg/ubsan/pr71962.C
new file mode 100644
index 000000000000..f17c825da449
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ubsan/pr71962.C
@@ -0,0 +1,5 @@
+// PR c++/71962
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-fsanitize=undefined" }
+
+#include "../ext/constexpr-pr71962.C"




Re: [PATCH] libquadmath: add quad support for trig-pi functions

2025-07-03 Thread Joseph Myers
On Thu, 3 Jul 2025, Jakub Jelinek wrote:

> > Isn't the whole raison d'etre for the trig-pi functions that the internal 
> > argument reduction against multiples of pi becomes trivial and hence (a) 
> > performant, and (b) doesn't introduce rounding artifacts?  Expressing the 
> > trig-pi functions in terms of their counterparts completely defeats this 
> > purpose.  The other way around would be more sensible for the cases where 
> > it works, but the above doesn't seem very attractive.

>   x = M_FABS (x - M_LIT (2.0) * M_SUF (round) (M_LIT (0.5) * x));

In particular, this is what trivial range reduction looks like: no need to 
do multiple-precision multiplication with the relevant bits of a 
multiple-precision value of 1/pi, just round to the nearest integer 
(typically a single instruction).

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] libquadmath: add quad support for trig-pi functions

2025-07-03 Thread Yuao Ma

Hi all,

On 7/3/2025 9:21 PM, Jakub Jelinek wrote:
> On Thu, Jul 03, 2025 at 02:43:43PM +0200, Michael Matz wrote:
> > Hello,
> >
> > On Thu, 3 Jul 2025, Yuao Ma wrote:
> >
> > > This patch adds the required function for Fortran trigonometric functions to
> > > work with glibc versions prior to 2.26. It's based on glibc source commit
> > > 632d895f3e5d98162f77b9c3c1da4ec19968b671.
> > >
> > > I've built it successfully on my end. Documentation is also included.
> > >
> > > Please take a look when you have a moment.
> >
> > +__float128
> > +cospiq (__float128 x)
> > +{
> > ...
> > +  if (__builtin_islessequal (x, 0.25Q))
> > +    return cosq (M_PIq * x);
> >
> > Isn't the whole raison d'etre for the trig-pi functions that the internal
> > argument reduction against multiples of pi becomes trivial and hence (a)
> > performant, and (b) doesn't introduce rounding artifacts?  Expressing the
> > trig-pi functions in terms of their counterparts completely defeats this
> > purpose.  The other way around would be more sensible for the cases where
> > it works, but the above doesn't seem very attractive.
>
> glibc has
> FLOAT
> M_DECL_FUNC (__cospi) (FLOAT x)
> {
>   if (isless (M_FABS (x), M_EPSILON))
>     return M_LIT (1.0);
>   if (__glibc_unlikely (isinf (x)))
>     __set_errno (EDOM);
>   x = M_FABS (x - M_LIT (2.0) * M_SUF (round) (M_LIT (0.5) * x));
>   if (islessequal (x, M_LIT (0.25)))
>     return M_SUF (__cos) (M_MLIT (M_PI) * x);
>   else if (x == M_LIT (0.5))
>     return M_LIT (0.0);
>   else if (islessequal (x, M_LIT (0.75)))
>     return M_SUF (__sin) (M_MLIT (M_PI) * (M_LIT (0.5) - x));
>   else
>     return -M_SUF (__cos) (M_MLIT (M_PI) * (M_LIT (1.0) - x));
> }
> for this case, so if it is incorrect, has too bad precision
> or there are better ways to do it, then perhaps it should be changed
> on the glibc side first and then ported to libquadmath.



That's exactly what I want to say. I only touched the generation script; 
the other main changes are borrowed from glibc. Therefore, it should be 
at least as precise as the glibc generic implementation. If the current 
implementation isn't sufficient, we can update glibc first and then 
mirror the changes back. This is only used for systems with glibc 
versions prior to 2.26. Modern systems will use the implementation 
directly from glibc, which should avoid this potential issue.


Regarding math function implementation, I think LLVM-libc provides a 
great example. While it may not be as performant as its glibc 
equivalent, its structure is very clear. Even for a newcomer like me, 
who isn't very familiar with computational mathematics, I can grasp the 
basic ideas of implementation and testing, such as nan/inf handling, 
polynomial curve fitting, exception values, and range reduction. I'd 
love to learn from and contribute to the glibc generic implementation as 
Joseph suggested, but it seems it will take more time to get familiar 
with the glibc source code. Hopefully, I'll be able to do that one day.


Regarding the patch itself, the previous version was missing the 
declaration in the header file. The newly attached version fixes this. 
Is this patch good to go, or does it need further amendment? Looking 
forward to your further review comments.


Thanks,
Yuao
From 719886689fd21555c9ba1bd63a44111408c0f01d Mon Sep 17 00:00:00 2001
From: Yuao Ma 
Date: Thu, 3 Jul 2025 23:01:38 +0800
Subject: [PATCH] libquadmath: add quad support for trig-pi functions

This function is required for Fortran trigonometric functions with glibc <2.26.
Use glibc commit 632d895f3e5d98162f77b9c3c1da4ec19968b671.

libquadmath/ChangeLog:

* Makefile.am: Add sources to makefile.
* Makefile.in: Regen makefile.
* libquadmath.texi: Add doc for trig-pi funcs.
* quadmath.h (acospiq, asinpiq, atanpiq, atan2piq, cospiq, sinpiq,
tanpiq): New.
* update-quadmath.py: Update generation script.
* math/acospiq.c: New file.
* math/asinpiq.c: New file.
* math/atan2piq.c: New file.
* math/atanpiq.c: New file.
* math/cospiq.c: New file.
* math/sinpiq.c: New file.
* math/tanpiq.c: New file.

Signed-off-by: Yuao Ma 
---
 libquadmath/Makefile.am|  2 ++
 libquadmath/Makefile.in| 26 --
 libquadmath/libquadmath.texi   |  7 
 libquadmath/math/acospiq.c | 33 ++
 libquadmath/math/asinpiq.c | 40 ++
 libquadmath/math/atan2piq.c| 36 
 libquadmath/math/atanpiq.c | 35 +++
 libquadmath/math/cospiq.c  | 37 
 libquadmath/math/sinpiq.c  | 44 
 libquadmath/math/tanpiq.c  | 62 ++
 libquadmath/quadmath.h |  7 
 libquadmath/update-quadmath.py | 48 ++
 12 files changed, 359 insertions(+), 18 deletions(-)
 create mode 100644 libquadmath/math/acospiq.c
 create mode 100644 libquadmath/math/asinpiq.c
 create mode 100644 libquadmath/math/atan2piq.c
 create mode 100644 libquadmath/math/atanpiq.c
 create mode 1

Re: [PATCH] x86-64: Add --enable-x86-64-mfentry

2025-07-03 Thread Uros Bizjak
On Thu, Jul 3, 2025 at 12:01 PM H.J. Lu  wrote:
>
> When profiling is enabled with shrink wrapping, the mcount call may not
> be placed at the function entry after
>
> pushq %rbp
> movq %rsp,%rbp
>
> As a result, the profile data may be skewed, which makes PGO less
> effective.
>
> Add --enable-x86-64-mfentry to enable -mfentry by default to use
> __fentry__, added to glibc in 2010 by:
>
> commit d22e4cc9397ed41534c9422d0b0ffef8c77bfa53
> Author: Andi Kleen 
> Date:   Sat Aug 7 21:24:05 2010 -0700
>
> x86: Add support for frame pointer less mcount
>
> instead of mcount, which is placed before the prologue so that -pg can
> be used with -fshrink-wrap-separate enabled at -O1.  This option is
> 64-bit only because __fentry__ doesn't support PIC in 32-bit mode.
>
> Also warn about -pg without -mfentry when shrink wrapping is enabled.  The
> warning is disabled for PIC in 32-bit mode.
>
> gcc/
>
> PR target/120881
> * config.in: Regenerated.
> * configure: Likewise.
> * configure.ac: Add --enable-x86-64-mfentry.
> * config/i386/i386-options.cc (ix86_option_override_internal):
> Enable __fentry__ in 64-bit mode if ENABLE_X86_64_MFENTRY is set
> to 1.  Warn -pg without -mfentry with shrink wrapping enabled.
> * doc/install.texi: Document --enable-x86-64-mfentry.
>
> gcc/testsuite/
>
> PR target/120881
> * gcc.target/i386/pr120881-1a.c: New test.
> * gcc.target/i386/pr120881-1b.c: Likewise.
> * gcc.target/i386/pr120881-1c.c: Likewise.
> * gcc.target/i386/pr120881-1d.c: Likewise.
> * gcc.target/i386/pr120881-2a.c: Likewise.
> * gcc.target/i386/pr120881-2b.c: Likewise.
> * lib/target-supports.exp (check_effective_target_fentry): New.
>
> OK for master?

OK in principle, but please allow some time for distro maintainers
(CC'd) to voice their opinion.

Thanks,
Uros.


[PATCH] i386: Fix vect-pragma-target-[12].c testcase for -march=XYZ [PR120643]

2025-07-03 Thread Andrew Pinski
These 2 testcases were originally designed for the default -march= of
x86_64, so if you pass -march=native (on a target with AVX512 enabled),
they will fail. To fix this, we add `-mno-sse3 -mprefer-vector-width=512`
to the options to force a specific arch for the testcases.

Tested on a skylake-avx512 machine with -march=native.

PR testsuite/120643
gcc/testsuite/ChangeLog:

* gcc.target/i386/vect-pragma-target-1.c: Add `-mno-sse3 
-mprefer-vector-width=512`
to the options.
* gcc.target/i386/vect-pragma-target-2.c: Likewise.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.target/i386/vect-pragma-target-1.c | 2 +-
 gcc/testsuite/gcc.target/i386/vect-pragma-target-2.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/vect-pragma-target-1.c 
b/gcc/testsuite/gcc.target/i386/vect-pragma-target-1.c
index f5e71e453ec..8f7d3d9b63d 100644
--- a/gcc/testsuite/gcc.target/i386/vect-pragma-target-1.c
+++ b/gcc/testsuite/gcc.target/i386/vect-pragma-target-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
-/* { dg-options "-O0" } */
+/* { dg-options "-O0 -mno-sse3 -mprefer-vector-width=512" } */
 /* { dg-final { scan-assembler-times "paddd.+xmm\[0-9]+" 1 } }   */
 /* { dg-final { scan-assembler-times "vfmadd132ps.+ymm\[0-9]+"  1 } }   */
 /* { dg-final { scan-assembler-times "vpaddw.+zmm\[0-9]+"   1 } }   */
diff --git a/gcc/testsuite/gcc.target/i386/vect-pragma-target-2.c 
b/gcc/testsuite/gcc.target/i386/vect-pragma-target-2.c
index 349680453a4..925d662b02b 100644
--- a/gcc/testsuite/gcc.target/i386/vect-pragma-target-2.c
+++ b/gcc/testsuite/gcc.target/i386/vect-pragma-target-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
-/* { dg-options "-O0" } */
+/* { dg-options "-O0 -mno-sse3 -mprefer-vector-width=512" } */
 /* { dg-final { scan-assembler-times "paddd.+xmm\[0-9]+" 1 } }   */
 /* { dg-final { scan-assembler-times "vfmadd132ps.+ymm\[0-9]+"  1 } }   */
 /* { dg-final { scan-assembler-times "vpaddw.+zmm\[0-9]+"   1 } }   */
-- 
2.43.0



Re: [PATCH] Update alignment for argument on stack

2025-07-03 Thread Richard Sandiford
"H.J. Lu"  writes:
> Since a backend may ignore user type alignment for arguments passed on
> stack, update alignment for arguments passed on stack when copying MEM's
> memory attributes.
>
> gcc/
>
> PR target/120839
> * emit-rtl.cc (set_mem_attrs): Update alignment for argument on
> stack.
>
> gcc/testsuite/
>
> PR target/120839
> * gcc.target/i386/pr120839-1.c: New test.
> * gcc.target/i386/pr120839-2.c: Likewise.
>
>
> -- 
> H.J.
>
> From 3f8a9bfb4beae47bfc0da20b517a5b3b06a1cbcc Mon Sep 17 00:00:00 2001
> From: "H.J. Lu" 
> Date: Sat, 28 Jun 2025 06:27:25 +0800
> Subject: [PATCH] Update alignment for argument on stack
>
> Since a backend may ignore user type alignment for arguments passed on
> stack, update alignment for arguments passed on stack when copying MEM's
> memory attributes.
>
> gcc/
>
>   PR target/120839
>   * emit-rtl.cc (set_mem_attrs): Update alignment for argument on
>   stack.
>
> gcc/testsuite/
>
>   PR target/120839
>   * gcc.target/i386/pr120839-1.c: New test.
>   * gcc.target/i386/pr120839-2.c: Likewise.
>
> Signed-off-by: H.J. Lu 
> ---
>  gcc/emit-rtl.cc| 14 ++
>  gcc/testsuite/gcc.target/i386/pr120839-1.c | 14 ++
>  gcc/testsuite/gcc.target/i386/pr120839-2.c | 19 +++
>  3 files changed, 47 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr120839-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr120839-2.c
>
> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
> index f4fc92bb37a..0d1616361ca 100644
> --- a/gcc/emit-rtl.cc
> +++ b/gcc/emit-rtl.cc
> @@ -389,6 +389,20 @@ set_mem_attrs (rtx mem, mem_attrs *attrs)
>  {
>MEM_ATTRS (mem) = ggc_alloc ();
>memcpy (MEM_ATTRS (mem), attrs, sizeof (mem_attrs));
> +  if (MEM_EXPR (mem))
> + {
> +   tree base_address = get_base_address (MEM_EXPR (mem));
> +   if (base_address && TREE_CODE (base_address) == PARM_DECL)
> + {
> +   /* User alignment on type may be ignored for parameter
> +  passed on stack.  */
> +   tree type = TREE_TYPE (base_address);
> +   unsigned int alignment
> + = targetm.calls.function_arg_boundary (TYPE_MODE (type),
> +type);
> +   set_mem_align (mem, alignment);
> + }
> + }
>  }
>  }

This doesn't feel like the right place to address this.  set_mem_attrs
is just supposed to install the attributes that it has been given,
without second-guessing the contents.

Where does the incorrect alignment ultimately come from?  (As in,
which piece of code creates the MEM and fails to give it the correct
alignment?)

Richard

>  
> diff --git a/gcc/testsuite/gcc.target/i386/pr120839-1.c 
> b/gcc/testsuite/gcc.target/i386/pr120839-1.c
> new file mode 100644
> index 00000000000..74fbf876330
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr120839-1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +typedef struct
> +{
> +  long double a;
> +  long double b;
> +} c __attribute__((aligned(32)));
> +extern double d;
> +void
> +bar (c f)
> +{
> +  d = f.a;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr120839-2.c 
> b/gcc/testsuite/gcc.target/i386/pr120839-2.c
> new file mode 100644
> index 00000000000..e5b711c966f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr120839-2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not "and\[lq\]?\[\\t \]*\\$-32,\[\\t 
> \]*%\[re\]?sp" } } */
> +
> +typedef struct
> +{
> +  long double a;
> +  long double b;
> +} c __attribute__((aligned(32)));
> +extern c x;
> +extern double d;
> +extern void bar (c);
> +void
> +foo (void)
> +{
> +  x.a = d;
> +  x.b = d;
> +  bar (x);
> +}


[PATCH] Add myself as an aarch64 port reviewer

2025-07-03 Thread Andrew Pinski
As mentioned in 
https://inbox.sourceware.org/gcc/ea828262-8f8f-4362-9ca8-312f7c20e...@nvidia.com/T/#m6e7e8e11656189598c759157d5d49cbd0ac9ba7c.
Adding myself as an aarch64 port reviewer.

ChangeLog:

* MAINTAINERS: Add myself as an aarch64 port reviewer.

Signed-off-by: Andrew Pinski 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6f663742fcc..5d5cfaa4e86 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -274,6 +274,7 @@ check in changes outside of the parts of the compiler they 
maintain.
 
 Reviewers
 
+aarch64 portAndrew Pinski  
 arm port (MVE)  Christophe Lyon 
 callgraph   Martin Jambor   
 C front end Marek Polacek   
-- 
2.43.0



Re: [PATCH] libstdc++: Members missing in std::numeric_limits

2025-07-03 Thread Mateusz Zych
Hello!

I've prepared a patch, which adds all members missing from
std::numeric_limits<> specializations for integer-class types.

Jonathan, please let me know whether you like these changes
and do not see any bugs or issues with them. From my side, I just want to
say that:

   - Since the std::numeric_limits<> specializations for integral types
   defined in //libstdc++-v3/include/std/limits don't inherit from a base
   class providing common data members and member functions, I also didn't
   introduce such a base class in
   //libstdc++-v3/include/bits/max_size_type.h.
   Such an implementation has quite a bit of code duplication, but it's
   like that on purpose, right?

   - I didn't test the traps static data member, because I don't know how to
   accurately predict when this compile-time constant should be true and
   when it should be false.
   Moreover, I saw that the unit test verifying the correctness of the traps
   constant from the std::numeric_limits<> specializations for integral
   types (//libstdc++-v3/testsuite/18_support/numeric_limits/traps.cc) also
   doesn't verify its value.

   - In the unit tests for integer-class types I've defined the variable
   template verify_numeric_limits_values_not_meaningful_for<> to avoid code
   duplication and keep the code clear and readable. I hope this is OK.

Thanks, Mateusz Zych

On Wed, Jul 2, 2025 at 7:30 PM Jonathan Wakely  wrote:

> On Wed, 2 Jul 2025 at 17:15, Mateusz Zych wrote:
> >
> > OK, then I’ll prepare appropriate patch with tests and send it when I’m
> done implementing it.
>
> That would be great, thanks. I won't push the initial patch, we can
> wait for you to prepare the complete fix.
>
> Please note that for a more significant change, we have some legal
> prerequisites for contributions, as documented at:
> https://gcc.gnu.org/contribute.html#legal
>
> If you want to contribute under the DCO terms, please read
> https://gcc.gnu.org/dco.html so that you understand exactly what the
> Signed-off-by: trailer means.
>
> Thanks!
>
>
From d7d20d31e549e001f7644ee53899fa8494d0700f Mon Sep 17 00:00:00 2001
From: Mateusz Zych 
Date: Wed, 2 Jul 2025 01:51:40 +0300
Subject: [PATCH] libstdc++: Added missing members to numeric_limits
 specializations for integer-class types.

25.3.4.4 Concept weakly_incrementable  [iterator.concept.winc]

  (5) For every integer-class type I,
  let B(I) be a unique hypothetical extended integer type
  of the same signedness with the same width as I.

  [Note 2: The corresponding
   hypothetical specialization numeric_limits<B(I)>
   meets the requirements on
   numeric_limits specializations for integral types.]

 (11) For every (possibly cv-qualified) integer-class type I,
  numeric_limits<I> is specialized such that:

  - each static data member m
    has the same value as numeric_limits<B(I)>::m, and

  - each static member function f
    returns I(numeric_limits<B(I)>::f()).

17.3.5.3 numeric_limits specializations  [numeric.special]

  (1) All members shall be provided for all specializations.
  However, many values are only required to be meaningful
  under certain conditions (for example,
  epsilon() is only meaningful if is_integer is false).
  Any value that is not meaningful shall be set to 0 or false.

libstdc++-v3/ChangeLog:

	* include/bits/max_size_type.h
	(numeric_limits<__max_size_type>): New static data members.
	(numeric_limits<__max_diff_type>): Likewise.

Signed-off-by: Mateusz Zych 
---
 libstdc++-v3/include/bits/max_size_type.h | 83 +++
 .../std/ranges/iota/max_size_type.cc  | 31 +++
 2 files changed, 114 insertions(+)

diff --git a/libstdc++-v3/include/bits/max_size_type.h b/libstdc++-v3/include/bits/max_size_type.h
index 73a6d141d5b..30c5b124767 100644
--- a/libstdc++-v3/include/bits/max_size_type.h
+++ b/libstdc++-v3/include/bits/max_size_type.h
@@ -38,6 +38,7 @@
 #include 
 #include  // __bit_width
 #include 
+#include  // __glibcxx_integral_traps
 
 // This header implements unsigned and signed integer-class types (as per
 // [iterator.concept.winc]) that are one bit wider than the widest supported
@@ -775,10 +776,27 @@ namespace ranges
   static constexpr bool is_signed = false;
   static constexpr bool is_integer = true;
   static constexpr bool is_exact = true;
+  static constexpr bool is_bounded = true;
+  static constexpr bool is_modulo = true;
+  static constexpr bool traps = __glibcxx_integral_traps;
+  static constexpr int radix = 2;
   static constexpr int digits
 	= __gnu_cxx::__int_traits<_Sp::__rep>::__digits + 1;
   static constexpr int digits10
 	= static_cast(digits * numbers::ln2 / numbers::ln10);
+  static constexpr int max_digits10 = 0;
+  static constexpr int min_exponent = 0;
+  static constexpr int min_exponent10 = 0;
+  static constexpr int max_exponent = 0;
+  static constexpr int max_exponent10 = 0;
+  s

Re: [PATCH] asf: Fix calling of emit_move_insn on registers of different modes [PR119884]

2025-07-03 Thread Konstantinos Eleftheriou
On Wed, May 7, 2025 at 11:29 AM Richard Sandiford
 wrote:
> But I thought the code was allowing multiple stores to be forwarded to
> a single (wider) load.  E.g. 4 individual byte stores at address X, X+1,
> X+2 and X+3 could be forwarded to a 4-byte load at address X.  And the code
> I mentioned is handling the least significant byte by zero-extending it.
>
> For big-endian targets, the least significant byte should come from
> address X+3 rather than address X.  The byte at address X (i.e. the
> byte with the equal offset) should instead go in the most significant
> byte, typically using a shift left.
Hi, I'm attaching a patch that we prepared for this. It would be of
great help if someone could test it on a big-endian target, preferably
one with BITS_BIG_ENDIAN == 0 as we were having issues with that in
the past.
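
To make the byte-order point from the quoted review concrete, here is a small sketch of what a 4-byte load at address X should observe after four single-byte stores at X..X+3. The byte values and function names are made up for illustration; this is not the asf pass code.

```cpp
#include <cstdint>

// Four single-byte stores at addresses X, X+1, X+2, X+3 (made-up values).
constexpr std::uint8_t stored[4] = {0x11, 0x22, 0x33, 0x44};

// Little-endian: the byte at X (equal offset) is the least significant
// byte, so it can simply be zero-extended; later bytes are shifted left.
constexpr std::uint32_t load_little_endian(const std::uint8_t (&b)[4]) {
    return static_cast<std::uint32_t>(b[0])
         | static_cast<std::uint32_t>(b[1]) << 8
         | static_cast<std::uint32_t>(b[2]) << 16
         | static_cast<std::uint32_t>(b[3]) << 24;
}

// Big-endian: the least significant byte comes from X+3, and the byte
// at X must be shifted left into the most significant position.
constexpr std::uint32_t load_big_endian(const std::uint8_t (&b)[4]) {
    return static_cast<std::uint32_t>(b[0]) << 24
         | static_cast<std::uint32_t>(b[1]) << 16
         | static_cast<std::uint32_t>(b[2]) << 8
         | static_cast<std::uint32_t>(b[3]);
}
```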

Thanks,
Konstantinos


0001-asf-Fix-offset-check-in-base-reg-initialization-for-.patch
Description: Binary data


Re: [PATCH] testsuite: Restore dg-do run on pr116906 and pr78185 tests

2025-07-03 Thread Christophe Lyon
ping^2 ?

On Wed, 18 Jun 2025 at 12:11, Christophe Lyon
 wrote:
>
> ping?
>
> On Mon, 26 May 2025 at 17:26, Christophe Lyon
>  wrote:
> >
> > On Mon, 26 May 2025 at 17:14, Christophe Lyon
> >  wrote:
> > >
> > > Commit r15-7152-g57b706d141b87c removed
> > > > /* { dg-do run { target *-*-linux* *-*-gnu* *-*-uclinux* } } */
> > >
> > > from these tests, turning them into 'compile' only tests, even when
> > > they could be executed.
> > >
> > > This patch adds
> > > /* { dg-do run } */
> > >
> > > which is OK since the tests are correctly skipped if needed thanks to
> > > the following effective-targets (alarm and signal).
> > >
> > > With this patch we have again two entries for these tests on linux 
> > > targets:
> > > * compile (test for excess errors)
> > > * execution test
> >
> > Gasp I forgot to add a ChangeLog entry, but it would be an obvious:
> > Add 'dg-do run' :-)
> >
> >
> > > ---
> > >  gcc/testsuite/gcc.dg/pr116906-1.c | 1 +
> > >  gcc/testsuite/gcc.dg/pr116906-2.c | 1 +
> > >  gcc/testsuite/gcc.dg/pr78185.c| 1 +
> > >  3 files changed, 3 insertions(+)
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/pr116906-1.c 
> > > b/gcc/testsuite/gcc.dg/pr116906-1.c
> > > index 7187507a60d..ee60ad67e93 100644
> > > --- a/gcc/testsuite/gcc.dg/pr116906-1.c
> > > +++ b/gcc/testsuite/gcc.dg/pr116906-1.c
> > > @@ -1,3 +1,4 @@
> > > +/* { dg-do run } */
> > >  /* { dg-require-effective-target alarm } */
> > >  /* { dg-require-effective-target signal } */
> > >  /* { dg-options "-O2" } */
> > > diff --git a/gcc/testsuite/gcc.dg/pr116906-2.c 
> > > b/gcc/testsuite/gcc.dg/pr116906-2.c
> > > index 41a352bf837..4172ec3644a 100644
> > > --- a/gcc/testsuite/gcc.dg/pr116906-2.c
> > > +++ b/gcc/testsuite/gcc.dg/pr116906-2.c
> > > @@ -1,3 +1,4 @@
> > > +/* { dg-do run } */
> > >  /* { dg-require-effective-target alarm } */
> > >  /* { dg-require-effective-target signal } */
> > >  /* { dg-options "-O2 -fno-tree-ch" } */
> > > diff --git a/gcc/testsuite/gcc.dg/pr78185.c 
> > > b/gcc/testsuite/gcc.dg/pr78185.c
> > > index ada8b1b9f90..4c3af4f2890 100644
> > > --- a/gcc/testsuite/gcc.dg/pr78185.c
> > > +++ b/gcc/testsuite/gcc.dg/pr78185.c
> > > @@ -1,3 +1,4 @@
> > > +/* { dg-do run } */
> > >  /* { dg-require-effective-target alarm } */
> > >  /* { dg-require-effective-target signal } */
> > >  /* { dg-options "-O" } */
> > > --
> > > 2.34.1
> > >


Re: [PATCH] testsuite: Skip check-function-bodies sometimes

2025-07-03 Thread Jakub Jelinek
On Thu, Jul 03, 2025 at 02:55:37PM +0200, Stefan Schulze Frielinghaus wrote:
> Ok for mainline?

ChangeLog is missing.
And I think I'd appreciate another pair of eyes, Rainer/Mike, what do you
think about this?

> If a check-function-bodies test is compiled using -fstack-protector*,
> -fhardened, -fstack-check*, or -fstack-clash-protection, but the test is
> not asking explicitly for those via dg-options or
> dg-additional-options, then mark the test as unsupported.  Since these
> features influence prologue/epilogue it is rarely useful to check the
> full function body, if the test is not explicitly prepared for those.
> This might happen when the testsuite is passed additional options as
> e.g. via --target_board='unix{-fstack-protector-all}'.
> 
> Co-Authored-By: Jakub Jelinek 
> ---
>  gcc/doc/sourcebuild.texi  |  9 +
>  gcc/testsuite/lib/scanasm.exp | 17 +
>  2 files changed, 26 insertions(+)
> 
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 6c5586e4b03..2980b04cb0e 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -3621,6 +3621,15 @@ command line.  This can help if a source file is 
> compiled both with
>  and without optimization, since it is rarely useful to check the full
>  function body for unoptimized code.
>  
> +Similarly, if a check-function-bodies test is compiled using
> +@samp{-fstack-protector*}, @samp{-fhardened}, @samp{-fstack-check*}, or
> +@samp{-fstack-clash-protection}, but the test is not asking explicitly for
> +those via @samp{dg-options} or @samp{dg-additional-options}, then mark the
> +test as unsupported.  Since these features influence prologue/epilogue it is
> +rarely useful to check the full function body, if the test is not explicitly
> +prepared for those.  This might happen when the testsuite is passed 
> additional
> +options as e.g.@: via @samp{--target_board='unix@{-fstack-protector-all@}'}.
> +
>  The first line of the expected output for a function @var{fn} has the form:
>  
>  @smallexample
> diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
> index 97935cb23c3..814843306ad 100644
> --- a/gcc/testsuite/lib/scanasm.exp
> +++ b/gcc/testsuite/lib/scanasm.exp
> @@ -1042,6 +1042,23 @@ proc check-function-bodies { args } {
>  # The name might include a list of options; extract the file name.
>  set filename [lindex $testcase 0]
>  
> +# The set of options passed to gcc
> +global compiler_flags
> +# The set of options specified in the individual test case
> +# including dg-options and dg-additional-options
> +set current_compiler_flags [current_compiler_flags]
> +if { ( [lsearch -glob $compiler_flags "-fstack-protector*"] >= 0
> +&& [lsearch -regex $current_compiler_flags 
> "-f(no-)?stack-protector"] == -1 )
> +  || ( [lsearch -exact $compiler_flags "-fhardened"] >= 0
> +   && [lsearch -regex $current_compiler_flags "-f(no-)?hardened"] == 
> -1 )
> +  || ( [lsearch -glob $compiler_flags "-fstack-check*"] >= 0
> +   && [lsearch -regex $current_compiler_flags 
> "-f(no-)?stack-check.*"] == -1 )
> +  || ( [lsearch -exact $compiler_flags "-fstack-clash-protection"] >= 0
> +   && [lsearch -regex $current_compiler_flags 
> "-f(no-)?stack-clash-protection"] == -1 ) } {
> + unsupported "$testcase: skip check-function-bodies due to implicit 
> prologue/epilogue changes e.g. by stack protector"
> + return
> +}
> +
>  global srcdir
>  set input_filename "$srcdir/$filename"
>  set output_filename "[file rootname [file tail $filename]]"
> -- 
> 2.49.0

Jakub



[PUSHED] OpenMP: Add omp_get_initial_device/omp_get_num_devices builtins: Fix test cases

2025-07-03 Thread Thomas Schwinge
With this fix-up for commit 387209938d2c476a67966c6ddbdbf817626f24a2
"OpenMP: Add omp_get_initial_device/omp_get_num_devices builtins", we progress:

 PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c (test for 
excess errors)
 PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c 
scan-tree-dump-not optimized "abort"
-FAIL: c-c++-common/gomp/omp_get_num_devices_initial_device.c 
scan-tree-dump-times optimized "omp_get_num_devices;" 1
+PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c 
scan-tree-dump-times optimized "omp_get_num_devices" 1
 PASS: c-c++-common/gomp/omp_get_num_devices_initial_device.c 
scan-tree-dump optimized "_1 = __builtin_omp_get_num_devices \\(\\);[\\r\\n]+[ 
]+return _1;"

... etc. for offloading configurations.

gcc/testsuite/
* c-c++-common/gomp/omp_get_num_devices_initial_device.c: Fix.
* gfortran.dg/gomp/omp_get_num_devices_initial_device.f90: Likewise.
---
 .../c-c++-common/gomp/omp_get_num_devices_initial_device.c| 4 ++--
 .../gfortran.dg/gomp/omp_get_num_devices_initial_device.f90   | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git 
a/gcc/testsuite/c-c++-common/gomp/omp_get_num_devices_initial_device.c 
b/gcc/testsuite/c-c++-common/gomp/omp_get_num_devices_initial_device.c
index 4b17143c282..6e2c1a8d98d 100644
--- a/gcc/testsuite/c-c++-common/gomp/omp_get_num_devices_initial_device.c
+++ b/gcc/testsuite/c-c++-common/gomp/omp_get_num_devices_initial_device.c
@@ -25,8 +25,8 @@ int f()
 
 /* { dg-final { scan-tree-dump-not "abort" "optimized" } }  */
 
-/* { dg-final { scan-tree-dump-not "omp_get_num_devices;" "optimized" { target 
{ ! offloading_enabled } } } }  */
+/* { dg-final { scan-tree-dump-not "omp_get_num_devices" "optimized" { target 
{ ! offloading_enabled } } } }  */
 /* { dg-final { scan-tree-dump "return 0;" "optimized" { target { ! 
offloading_enabled } } } }  */
 
-/* { dg-final { scan-tree-dump-times "omp_get_num_devices;" 1 "optimized" { 
target offloading_enabled } } }  */
+/* { dg-final { scan-tree-dump-times "omp_get_num_devices" 1 "optimized" { 
target offloading_enabled } } }  */
 /* { dg-final { scan-tree-dump "_1 = __builtin_omp_get_num_devices 
\\(\\);\[\\r\\n\]+\[ \]+return _1;" "optimized" { target offloading_enabled } } 
}  */
diff --git 
a/gcc/testsuite/gfortran.dg/gomp/omp_get_num_devices_initial_device.f90 
b/gcc/testsuite/gfortran.dg/gomp/omp_get_num_devices_initial_device.f90
index 5409f12f464..279656bdd84 100644
--- a/gcc/testsuite/gfortran.dg/gomp/omp_get_num_devices_initial_device.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/omp_get_num_devices_initial_device.f90
@@ -17,8 +17,8 @@ end
 
 ! { dg-final { scan-tree-dump-not "error_stop" "optimized" } }
 
-! { dg-final { scan-tree-dump-not "omp_get_num_devices;" "optimized" { target 
{ ! offloading_enabled } } } }
+! { dg-final { scan-tree-dump-not "omp_get_num_devices" "optimized" { target { 
! offloading_enabled } } } }
 ! { dg-final { scan-tree-dump "return 0;" "optimized" { target { ! 
offloading_enabled } } } }
 
-! { dg-final { scan-tree-dump-times "omp_get_num_devices;" 1 "optimized" { 
target offloading_enabled } } }
+! { dg-final { scan-tree-dump-times "omp_get_num_devices" 1 "optimized" { 
target offloading_enabled } } }
 ! { dg-final { scan-tree-dump "_1 = __builtin_omp_get_num_devices 
\\(\\);\[\\r\\n\]+\[ \]+return _1;" "optimized" { target offloading_enabled } } 
}
-- 
2.34.1



[PATCH] c-family: Tweak ptr +- (expr +- cst) FE optimization [PR120837]

2025-07-03 Thread Jakub Jelinek
Hi!

The following testcase is miscompiled with -fsanitize=undefined but we
introduce UB into the IL even without that flag.

The optimization ptr +- (expr +- cst) when expr/cst have undefined
overflow into (ptr +- cst) +- expr is sometimes simply not valid,
without careful analysis on what ptr points to we don't know if it
is valid to do (ptr +- cst) pointer arithmetics.
E.g. on the testcase, ptr points to start of an array (actually
conditionally one or another) and cst is -1, so ptr - 1 is invalid
pointer arithmetics, while ptr + (expr - 1) can be valid if expr
is at runtime always > 1 and smaller than size of the array ptr points
to + 1.
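
A minimal sketch of the two forms, not taken from the testcase in the PR: table and both function names are made up. It relies on the fact that constant evaluation has to reject undefined behavior, so the invalid pointer shows up as a hard error there.

```cpp
constexpr int table[4] = {10, 20, 30, 40};

// ptr + (expr - 1): the index is formed first, so for 1 <= expr <= 4
// no pointer before the start of the array is ever created.
constexpr int element_index_first(const int *ptr, int expr) {
    return *(ptr + (expr - 1));
}

// (ptr - 1) + expr: the reassociated form.  With ptr == table the
// intermediate ptr - 1 is already invalid, whatever expr is, and
// element_pointer_first (table, 1) is not a constant expression.
constexpr int element_pointer_first(const int *ptr, int expr) {
    return *((ptr - 1) + expr);
}

// Fine: every intermediate pointer stays inside the array.
static_assert(element_index_first(table, 1) == 10, "first element");
// Also fine, but only because ptr does not point at the array start.
static_assert(element_pointer_first(table + 1, 1) == 20, "second element");
```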

Unfortunately, removing this 1992-ish optimization altogether causes
FAIL: c-c++-common/restrict-2.c  -Wc++-compat   scan-tree-dump-times lim2 
"Moving statement" 11
FAIL: gcc.dg/tree-ssa/copy-headers-5.c scan-tree-dump ch2 "is now do-while loop"
FAIL: gcc.dg/tree-ssa/copy-headers-5.c scan-tree-dump-times ch2 "  if " 3
FAIL: gcc.dg/vect/pr57558-2.c scan-tree-dump vect "vectorized 1 loops"
FAIL: gcc.dg/vect/pr57558-2.c -flto -ffat-lto-objects  scan-tree-dump vect 
"vectorized 1 loops"
regressions (restrict-2.c also for C++ in all std modes).  I've been thinking
about some match.pd optimization for signed integer addition/subtraction of
constant followed by widening integral conversion followed by multiplication
or left shift, but that wouldn't help 32-bit arches.

So, instead at least for now, the following patch keeps doing the
optimization, just doesn't perform it in pointer arithmetics.
pointer_int_sum itself actually adds the multiplication by size_exp,
so ptr + expr is turned into ptr p+ expr * size_exp,
so this patch will try to optimize
ptr + (expr +- cst)
into
ptr p+ ((sizetype)expr * size_exp +- (sizetype)cst * size_exp)
and
ptr - (expr +- cst)
into
ptr p+ -((sizetype)expr * size_exp +- (sizetype)cst * size_exp)
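
As a rough illustration of the ptr + (expr - cst) case, the sketch below compares the combined byte offset with the split one. It assumes std::size_t plays the role of sizetype and long the role of the intop type; the function names are hypothetical and this is not the pointer_int_sum code.

```cpp
#include <cstddef>

// (sizetype)(expr - cst) * size_exp: offset computed from the whole index.
constexpr std::size_t offset_combined(long expr, long cst, std::size_t size) {
    return static_cast<std::size_t>(expr - cst) * size;
}

// (sizetype)expr * size_exp - (sizetype)cst * size_exp: each operand is
// extended and scaled separately, then subtracted in unsigned
// arithmetic, which wraps instead of overflowing.
constexpr std::size_t offset_split(long expr, long cst, std::size_t size) {
    return static_cast<std::size_t>(expr) * size
         - static_cast<std::size_t>(cst) * size;
}
```

Both forms yield the same byte offset, and neither forms an intermediate ptr +- cst pointer, which is the part that was invalid.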

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-07-03  Jakub Jelinek  

PR c/120837
* c-common.cc (pointer_int_sum): Rewrite the intop PLUS_EXPR or
MINUS_EXPR optimization into extension of both intop operands,
their separate multiplication and then addition/subtraction followed
by rest of pointer_int_sum handling after the multiplication.

* gcc.dg/ubsan/pr120837.c: New test.

--- gcc/c-family/c-common.cc.jj 2025-07-01 09:36:43.115908270 +0200
+++ gcc/c-family/c-common.cc2025-07-03 12:31:12.789367448 +0200
@@ -3438,20 +3438,41 @@ pointer_int_sum (location_t loc, enum tr
 an overflow error if the constant is negative but INTOP is not.  */
   && (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (intop))
  || (TYPE_PRECISION (TREE_TYPE (intop))
- == TYPE_PRECISION (TREE_TYPE (ptrop)
+ == TYPE_PRECISION (TREE_TYPE (ptrop
+  && TYPE_PRECISION (TREE_TYPE (intop)) <= TYPE_PRECISION (sizetype))
 {
-  enum tree_code subcode = resultcode;
-  tree int_type = TREE_TYPE (intop);
-  if (TREE_CODE (intop) == MINUS_EXPR)
-   subcode = (subcode == PLUS_EXPR ? MINUS_EXPR : PLUS_EXPR);
-  /* Convert both subexpression types to the type of intop,
-because weird cases involving pointer arithmetic
-can result in a sum or difference with different type args.  */
-  ptrop = build_binary_op (EXPR_LOCATION (TREE_OPERAND (intop, 1)),
-  subcode, ptrop,
-  convert (int_type, TREE_OPERAND (intop, 1)),
-  true);
-  intop = convert (int_type, TREE_OPERAND (intop, 0));
+  tree intop0 = TREE_OPERAND (intop, 0);
+  tree intop1 = TREE_OPERAND (intop, 1);
+  if (TYPE_PRECISION (TREE_TYPE (intop)) != TYPE_PRECISION (sizetype)
+ || TYPE_UNSIGNED (TREE_TYPE (intop)) != TYPE_UNSIGNED (sizetype))
+   {
+ tree optype = c_common_type_for_size (TYPE_PRECISION (sizetype),
+   TYPE_UNSIGNED (sizetype));
+ intop0 = convert (optype, intop0);
+ intop1 = convert (optype, intop1);
+   }
+  tree t = fold_build2_loc (loc, MULT_EXPR, TREE_TYPE (intop0), intop0,
+   convert (TREE_TYPE (intop0), size_exp));
+  intop0 = convert (sizetype, t);
+  if (TREE_OVERFLOW_P (intop0) && !TREE_OVERFLOW (t))
+   intop0 = wide_int_to_tree (TREE_TYPE (intop0), wi::to_wide (intop0));
+  t = fold_build2_loc (loc, MULT_EXPR, TREE_TYPE (intop1), intop1,
+  convert (TREE_TYPE (intop1), size_exp));
+  intop1 = convert (sizetype, t);
+  if (TREE_OVERFLOW_P (intop1) && !TREE_OVERFLOW (t))
+   intop1 = wide_int_to_tree (TREE_TYPE (intop1), wi::to_wide (intop1));
+  intop = build_binary_op (EXPR_LOCATION (intop), TREE_CODE (intop),
+  intop0, intop1, true);
+
+  /* Create the sum or difference.  */
+  if (resultcode == MINUS_EXPR)
+   intop = fold_build1_loc (lo

Re: [PATCH] s390: More vec-perm-const cases.

2025-07-03 Thread Juergen Christ
> On 6/27/25 8:09 PM, Juergen Christ wrote:
> > s390 missed constant vector permutation cases based on the vector pack
> > instruction or changing the size of the vector elements during vector
> > merge.  This enables some more patterns that do not need to load a
> > constant vector for permutation.
> > 
> > Bootstrapped and regtested on s390.  Okay for trunk?
> > 
> > gcc/ChangeLog:
> > 
> > * config/s390/s390.cc (expand_perm_with_merge): Add size change cases.
> > (expand_perm_with_pack): New function.
> > (vectorize_vec_perm_const_1): Wire up new function.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/s390/vector/vec-perm-merge-1.c: New test.
> > * gcc.target/s390/vector/vec-perm-pack-1.c: New test.
> > 
> > Signed-off-by: Juergen Christ 
> 
> Ok. Thanks!
> 
> 
> Andreas

I guess after the recent change set from Jakub I should add
-fno-stack-protector to the new test files.  Still okay with this
change?
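
For readers of the quoted patch, the merge-high byte patterns (hi_perm_qi_di, hi_perm_qi_si, hi_perm_qi_hi) all follow one rule, sketched below. merge_high_bytes is a made-up helper, not part of s390.cc; it only reproduces the index tables for 16-byte vectors.

```cpp
#include <array>
#include <cstddef>

// Builds the byte permutation for a merge-high of two 16-byte vectors
// whose elements are elem_bytes wide: element i of operand 0, then
// element i of operand 1 (indices 16..31), until 16 bytes are produced.
constexpr std::array<unsigned char, 16> merge_high_bytes(std::size_t elem_bytes) {
    std::array<unsigned char, 16> perm{};
    std::size_t out = 0;
    for (std::size_t elem = 0; out < 16; ++elem) {
        for (std::size_t b = 0; b < elem_bytes; ++b)
            perm[out++] = static_cast<unsigned char>(elem * elem_bytes + b);
        for (std::size_t b = 0; b < elem_bytes; ++b)
            perm[out++] = static_cast<unsigned char>(16 + elem * elem_bytes + b);
    }
    return perm;
}
```

With elem_bytes set to 8, 4 and 2 this gives the V2DI, V4SI and V8HI tables respectively.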

> 
> 
> > ---
> >   gcc/config/s390/s390.cc   | 169 +++-
> >   .../gcc.target/s390/vector/vec-perm-merge-1.c | 242 ++
> >   .../gcc.target/s390/vector/vec-perm-pack-1.c  | 133 ++
> >   3 files changed, 542 insertions(+), 2 deletions(-)
> >   create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-perm-merge-1.c
> >   create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-perm-pack-1.c
> > 
> > diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
> > index 38267202f668..de9c15c7bd42 100644
> > --- a/gcc/config/s390/s390.cc
> > +++ b/gcc/config/s390/s390.cc
> > @@ -18041,9 +18041,34 @@ expand_perm_with_merge (const struct 
> > expand_vec_perm_d &d)
> > static const unsigned char lo_perm_qi_swap[16]
> >   = {17, 1, 19, 3, 21, 5, 23, 7, 25, 9, 27, 11, 29, 13, 31, 15};
> > +  static const unsigned char hi_perm_qi_di[16]
> > += {0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23};
> > +  static const unsigned char hi_perm_qi_si[16]
> > += {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23};
> > +  static const unsigned char hi_perm_qi_hi[16]
> > += {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23};
> > +
> > +  static const unsigned char lo_perm_qi_di[16]
> > += {8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31};
> > +  static const unsigned char lo_perm_qi_si[16]
> > += {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31};
> > +  static const unsigned char lo_perm_qi_hi[16]
> > += {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31};
> > +
> > +  static const unsigned char hi_perm_hi_si[8] = {0, 1, 8, 9, 2, 3, 10, 11};
> > +  static const unsigned char hi_perm_hi_di[8] = {0, 1, 2, 3, 8, 9, 10, 11};
> > +
> > +  static const unsigned char lo_perm_hi_si[8] = {4, 5, 12, 13, 6, 7, 14, 
> > 15};
> > +  static const unsigned char lo_perm_hi_di[8] = {4, 5, 6, 7, 12, 13, 14, 
> > 15};
> > +
> > +  static const unsigned char hi_perm_si_di[4] = {0, 1, 4, 5};
> > +
> > +  static const unsigned char lo_perm_si_di[4] = {2, 3, 6, 7};
> > +
> > bool merge_lo_p = false;
> > bool merge_hi_p = false;
> > bool swap_operands_p = false;
> > +  machine_mode mergemode = d.vmode;
> > if ((d.nelt == 2 && memcmp (d.perm, hi_perm_di, 2) == 0)
> > || (d.nelt == 4 && memcmp (d.perm, hi_perm_si, 4) == 0)
> > @@ -18075,6 +18100,75 @@ expand_perm_with_merge (const struct 
> > expand_vec_perm_d &d)
> > merge_lo_p = true;
> > swap_operands_p = true;
> >   }
> > +  else if (d.nelt == 16)
> > +{
> > +  if (memcmp (d.perm, hi_perm_qi_di, 16) == 0)
> > +   {
> > + merge_hi_p = true;
> > + mergemode = E_V2DImode;
> > +   }
> > +  else if (memcmp (d.perm, hi_perm_qi_si, 16) == 0)
> > +   {
> > + merge_hi_p = true;
> > + mergemode = E_V4SImode;
> > +   }
> > +  else if (memcmp (d.perm, hi_perm_qi_hi, 16) == 0)
> > +   {
> > + merge_hi_p = true;
> > + mergemode = E_V8HImode;
> > +   }
> > +  else if (memcmp (d.perm, lo_perm_qi_di, 16) == 0)
> > +   {
> > + merge_lo_p = true;
> > + mergemode = E_V2DImode;
> > +   }
> > +  else if (memcmp (d.perm, lo_perm_qi_si, 16) == 0)
> > +   {
> > + merge_lo_p = true;
> > + mergemode = E_V4SImode;
> > +   }
> > +  else if (memcmp (d.perm, lo_perm_qi_hi, 16) == 0)
> > +   {
> > + merge_lo_p = true;
> > + mergemode = E_V8HImode;
> > +   }
> > +}
> > +  else if (d.nelt == 8)
> > +{
> > +  if (memcmp (d.perm, hi_perm_hi_di, 8) == 0)
> > +   {
> > + merge_hi_p = true;
> > + mergemode = E_V2DImode;
> > +   }
> > +  else if (memcmp (d.perm, hi_perm_hi_si, 8) == 0)
> > +   {
> > + merge_hi_p = true;
> > + mergemode = E_V4SImode;
> > +   }
> > +  else if (memcmp (d.perm, lo_perm_hi_di, 8) == 0)
> > +   {
> > + merge_lo_p = true;
> > + mergemode = E_V2DImode;
> > +   }
> > +  else if (memcmp (d.perm, lo_perm_hi_si, 8) == 0)
> > +   {
> > + merge_lo_p = true;
> > + mergemod

Re: [PATCH] c++: -fno-delete-null-pointer-checks constexpr addr comparison [PR71962]

2025-07-03 Thread Patrick Palka
On Thu, 3 Jul 2025, Jason Merrill wrote:

> On 7/2/25 7:58 PM, Patrick Palka wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
> > for trunk?
> > 
> > -- >8 --
> > 
> > Here the flag -fno-delete-null-pointer-checks causes the trivial address
> > comparison in
> > 
> >inline int a, b;
> >static_assert(&a != &b);
> > 
> > to be rejected as non-constant because with the flag we can't assume
> > such weak symbols are non-NULL, which causes symtab/fold-const.cc to
> > punt on such comparisons.  Note this also affects -fsanitize=undefined
> > since it implies -fno-delete-null-pointer-checks.
> 
> Right, the underlying problem is that we use the one flag to mean two things:
> 
> 1) a static storage duration decl can live at address 0
> 2) do more careful checking for null pointers/lvalues (i.e. in
> gimple_call_nonnull_result_p)
> 
> Both cases are related to checking for null, but they are different situations
> and really shouldn't depend on the same flag.
> 
> Your patch seems wrong for #1 targets; on such a target 'a' might end up
> allocated at address 0, so "&a != nullptr" is not decidable at compile time.
> 
> OTOH such targets are a small minority, and I suspect they already have other
> C++ issues with e.g. a conversion to base not adjusting a null pointer.

Yep, and normally I would not be so bold as to propose making such a
trade-off, but this seems to be exactly the trade-off we made in
PR96862 for -frounding-math?  The flag makes lossy floating-point
operations depend on the run-time rounding mode and so not decidable at
compile time, so we ended up disabling the flag during constexpr
evaluation and using the default rounding mode.  I don't immediately
see why -frounding-math may be special.

On a related note, I notice we accept

  [[gnu::weak]] inline int a, b;
  static_assert(&a != &b);

with the default -fdelete-null-pointer-checks, which seems wrong;
we should probably reject address comparisons of weak symbols as
non-constant by default.

> 
> Jakub, what do you think?  It's been 9 years since you proposed a better fix
> in the PR, but that hasn't happened yet.
> 
> > This issue seems conceptually the same as PR96862 which was about
> > -frounding-math breaking some constexpr floating point arithmetic,
> > and we fixed that PR by disabling -frounding-math during manifestly
> > constant evaluation.  This patch proposes to do the same for
> > -fno-delete-null-pointer-checks, disabling it during manifestly constant
> > evaluation.  I opted to disable it narrowly around the relevant
> > fold_binary call which seems to address all reported constexpr failures,
> > but we could consider disabling it more broadly as well.
> > 
> > PR c++/71962
> > 
> > gcc/cp/ChangeLog:
> > 
> > * constexpr.cc (cxx_eval_binary_expression): Set
> > flag_delete_null_pointer_checks alongside folding_cxx_constexpr
> > during manifestly constant evaluation.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/ext/constexpr-pr71962.C: New test.
> > * g++.dg/ubsan/pr71962.C: New test.
> > ---
> >   gcc/cp/constexpr.cc  |  2 ++
> >   gcc/testsuite/g++.dg/ext/constexpr-pr71962.C | 18 ++
> >   gcc/testsuite/g++.dg/ubsan/pr71962.C |  5 +
> >   3 files changed, 25 insertions(+)
> >   create mode 100644 gcc/testsuite/g++.dg/ext/constexpr-pr71962.C
> >   create mode 100644 gcc/testsuite/g++.dg/ubsan/pr71962.C
> > 
> > diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> > index 704d936f2ec3..e8426b40c543 100644
> > --- a/gcc/cp/constexpr.cc
> > +++ b/gcc/cp/constexpr.cc
> > @@ -4068,6 +4068,8 @@ cxx_eval_binary_expression (const constexpr_ctx *ctx,
> > tree t,
> >   || TREE_CODE (type) != REAL_TYPE))
> > {
> >   auto ofcc = make_temp_override (folding_cxx_constexpr, true);
> > + auto odnpc = make_temp_override (flag_delete_null_pointer_checks,
> > +  true);
> >   r = fold_binary_initializer_loc (loc, code, type, lhs, rhs);
> > }
> > else
> > diff --git a/gcc/testsuite/g++.dg/ext/constexpr-pr71962.C
> > b/gcc/testsuite/g++.dg/ext/constexpr-pr71962.C
> > new file mode 100644
> > index ..57cb14ac804e
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/ext/constexpr-pr71962.C
> > @@ -0,0 +1,18 @@
> > +// PR c++/71962
> > +// { dg-do compile { target c++11 } }
> > +// { dg-additional-options "-fno-delete-null-pointer-checks" }
> > +
> > +struct A { void f(); };
> > +static_assert(&A::f != nullptr, "");
> > +
> > +#if __cpp_inline_variables
> > +inline int a, b;
> > +static_assert(&a != &b, "");
> > +static_assert(&a != nullptr, "");
> > +#endif
> > +
> > +int main() {
> > +  static int x, y;
> > +  static_assert(&x != &y, "");
> > +  static_assert(&x != nullptr, "");
> > +}
> > diff --git a/gcc/testsuite/g++.dg/ubsan/pr71962.C
> > b/gcc/testsuite/g++.dg/ubsan/pr71962.C
> > new file mode 100644
> > index ..f17c825da449
> > --- /dev

[PATCH v1 1/2] Add TARGET_ARG_EXTENDED_ON_STACK

2025-07-03 Thread Palmer Dabbelt
We currently handle arguments that are split between the stack and
registers by storing the registers to the stack and then treating the
argument as if it was entirely passed on the stack.  Allow targets to
override this behavior and instead treat the argument as if it was
passed entirely in registers.

gcc/ChangeLog:

* doc/tm.texi: Add TARGET_ARG_EXTENDED_ON_STACK.
* doc/tm.texi.in: Likewise.
* function.cc (struct assign_parm_data_one): Add extended.
(assign_parm_find_entry_rtl): Set and use extended.
(assign_parm_is_stack_parm): Use extended.
(assign_parm_adjust_entry_rtl): Likewise.
(assign_parm_setup_reg): Likewise.
(assign_parm_setup_stack): Likewise.
* target.def: Add TARGET_ARG_EXTENDED_ON_STACK.
* targhooks.cc (hook_int_CUMULATIVE_ARGS_arg_info_1): New hook.
* targhooks.h: Likewise.
---
 gcc/doc/tm.texi|  9 +
 gcc/doc/tm.texi.in |  2 ++
 gcc/function.cc| 39 ---
 gcc/target.def | 11 +++
 gcc/targhooks.cc   |  7 +++
 gcc/targhooks.h|  2 ++
 6 files changed, 63 insertions(+), 7 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 5e305643b3a..356522d53c3 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4320,6 +4320,15 @@ register to be used by the caller for this argument; 
likewise
 @code{TARGET_FUNCTION_INCOMING_ARG}, for the called function.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_ARG_EXTENDED_ON_STACK (cumulative_args_t 
@var{cum}, const function_arg_info @var{&arg})
+When arguments are split between the registers and the stack, it
+is usually profitable to construct the argument in place on the stack
+and then load it into a register.  These sequences may cause issues for
+some systems, for example by generating misaligned accesses.  This target
+hook determines if arguments should be constructed in place on the stack, or
+the whole value should be reconstructed.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_PASS_BY_REFERENCE (cumulative_args_t 
@var{cum}, const function_arg_info @var{&arg})
 This target hook should return @code{true} if argument @var{arg} at the
 position indicated by @var{cum} should be passed by reference.  This
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index eccc4d88493..925d5efd835 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3338,6 +3338,8 @@ the stack.
 
 @hook TARGET_ARG_PARTIAL_BYTES
 
+@hook TARGET_ARG_EXTENDED_ON_STACK
+
 @hook TARGET_PASS_BY_REFERENCE
 
 @hook TARGET_CALLEE_COPIES
diff --git a/gcc/function.cc b/gcc/function.cc
index 48167b0c207..7aa389f77bf 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -2297,6 +2297,7 @@ struct assign_parm_data_one
   machine_mode passed_mode;
   struct locate_and_pad_arg_data locate;
   int partial;
+  int extended;
 };
 
 /* A subroutine of assign_parms.  Initialize ALL.  */
@@ -2566,18 +2567,23 @@ assign_parm_find_entry_rtl (struct assign_parm_data_all 
*all,
 
   if (entry_parm)
 {
-  int partial;
+  int partial, extended;
 
   partial = targetm.calls.arg_partial_bytes (all->args_so_far, data->arg);
   data->partial = partial;
 
+  extended = targetm.calls.arg_extended_on_stack (all->args_so_far, 
data->arg);
+  data->extended = extended;
+
   /* The caller might already have allocated stack space for the
 register parameters.  */
-  if (partial != 0 && all->reg_parm_stack_space == 0)
+  if (partial != 0 && all->reg_parm_stack_space == 0 && extended)
{
  /* Part of this argument is passed in registers and part
 is passed on the stack.  Ask the prologue code to extend
-the stack part so that we can recreate the full value.
+the stack part so that we can recreate the full value, unless for
+some reason the backend doesn't like that in which case we'll
+synthesize the argument via subword moves later.
 
 PRETEND_BYTES is the size of the registers we need to store.
 CURRENT_FUNCTION_PRETEND_ARGS_SIZE is the amount of extra
@@ -2631,8 +2637,9 @@ assign_parm_is_stack_parm (struct assign_parm_data_all 
*all,
   /* Trivially true if we've no incoming register.  */
   if (data->entry_parm == NULL)
 ;
-  /* Also true if we're partially in registers and partially not,
- since we've arranged to drop the entire argument on the stack.  */
+  /* Also true if we're partially in registers and partially not, as we'll
+ either extend the argument in-place on the stack or fix it up later
+ depending on what the target wants.  */
   else if (data->partial != 0)
 ;
   /* Also true if the target says that it's passed in both registers
@@ -2749,7 +2756,7 @@ assign_parm_adjust_entry_rtl (struct assign_parm_data_one 
*data)
  In the special case of a DImode or DFmode that is split, we could put
  it together in a pseudoreg directly, but for now that

[PATCH v1 0/2] Allow targets to avoid materializing split parameters via stack extension [PR/82106]

2025-07-03 Thread Palmer Dabbelt
This is really Jim's code, but it's been sitting around in Bugzilla for a while
so I've picked it up.  All I really did here is add a target hook and mangle
some comments, but I think I understand enough about what's going on to try and
get things moving forward.  So I'm writing up a pretty big cover letter to try
and summarize what I think is going on here, as it's definitely not something I
fully understand yet.

We've got a quirk in the RISC-V ABI where DF arguments on rv32 get split into
an X register and a 32-bit aligned stack slot.  The middle-end prologue code
just stores out the X register and treats the argument as if it was entirely
passed on the stack.  This can result in a misaligned load, and those are still
slow on a bunch of RISC-V systems.

This patch set adds a target hook that essentially biases the middle-end the
other way: load the stack part of the argument and then merge it with the
register part via subword moves.  That's essentially handling these via
register-register operations, but for the specific case that trips up as a
misaligned access bug on RISC-V the generated code ends up with more memory
ops.
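
In C++ terms the subword merge amounts to something like the sketch below. It assumes a little-endian rv32 target where the register carries the low 32 bits and the stack slot the high 32 bits; merge_split_argument is a made-up name, not anything in function.cc.

```cpp
#include <cstdint>

// Rebuilds a 64-bit argument from its two halves without first storing
// the register half next to the stack half.
// Assumption: little-endian layout, so the register half is the low word.
constexpr std::uint64_t merge_split_argument(std::uint32_t reg_low,
                                             std::uint32_t stack_high) {
    return (static_cast<std::uint64_t>(stack_high) << 32) | reg_low;
}
```

For an integer argument the result can stay in a register pair; for a double on rv32 the merged value still has to go through memory to reach a floating-point register.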

More specifically, the included test case is essentially

double foo(..., double split) { return split; }

with the arguments set up so "split" has 32 bits in a7 (an integer register
used for arguments) and 32 bits on the stack.  The return goes into a
floating-point register, as they're 64 bits on rv32ifd (even when integer
registers are only 32 bits).

Without this patch (and with this patch on targets with fast misaligned
accesses) that generates

sw  a7,12(sp)
fld fa0,12(sp)

and with this patch (on a subtarget with slow misaligned access) ends up as

lw  a5,16(sp)
sw  a7,8(sp)
sw  a5,12(sp)
fld fa0,8(sp)

That looks a little odd, but I think it's actually good code -- the only way to
get a double into a register on rv32 is to load it from memory, so without
misaligned loads we're sort of just stuck there.

While playing around writing this cover letter I came up with another case
that's essentially

long long foo(..., long long split) { return split; }

that used to generate 

sw  a7,12(sp)
lw  a0,12(sp)
lw  a1,16(sp)

and now generates

lw  a1,0(sp)
mv  a0,a7

so I do think we've at least got some room for new optimizations here, maybe
even on other targets.

The target hook will need some adjustment, but ultimately I'm not even sure if
a target hook is the way to go here.  It was just an easy way to flip the
behavior so I could play around with some of Jim's code.  It kind of feels like
the load/subword merge version would result in better code in general, but I'm
not sure on that one.
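
For reference, the decision the hook in 2/2 makes can be sketched in plain C
roughly as follows.  prefer_stack_extension and its parameters are
hypothetical stand-ins (for riscv_slow_unaligned_access_p, the argument's
offset from the arg pointer, and the mode size), and the sketch carries the
same assumption as the patch that the arg pointer is aligned to the type size.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the hook's return value, not the GCC
   implementation.  Returns true when the argument should keep being
   extended on the stack, false when the load/subword-merge path should be
   used instead.  */
static bool
prefer_stack_extension (bool slow_unaligned_access, unsigned int ap_offset,
                        unsigned int mode_size)
{
  assert (mode_size != 0);

  /* With fast unaligned accesses, handling the argument in place wins.  */
  if (!slow_unaligned_access)
    return true;

  /* Otherwise only keep the stack version when the slot is aligned to the
     size of the mode.  */
  return (ap_offset % mode_size) == 0;
}
```

With slow unaligned accesses, an 8-byte argument at offset 4 returns false
here, which corresponds to the case in the test above.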

That said, I figured I'd just send it out so others could see this.  It's very
much out of my wheelhouse, so I'd be shocked if this doesn't cause any
failures...




[PATCH v1 2/2] RISC-V: Implement TARGET_ARG_EXTENDED_ON_STACK.

2025-07-03 Thread Palmer Dabbelt
When we split an argument between the stack and registers we might end
up with a misaligned access, so use this newly implemented hook to
bias the codegen towards the registers rather than the stack.

PR/82106

gcc/ChangeLog:

* config/riscv/riscv.cc (struct riscv_arg_info): Add ap_offset.
(riscv_get_arg_info): Set ap_offset.
(riscv_arg_extended_on_stack): New hook.
(TARGET_ARG_EXTENDED_ON_STACK): Likewise.
---
 gcc/config/riscv/riscv.cc| 30 
 gcc/testsuite/gcc.target/riscv/pr82106.c | 12 ++
 2 files changed, 42 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr82106.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index bbc7547d385..4c922f8fca3 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -247,6 +247,9 @@ struct riscv_arg_info {
 
   /* The offset of the first register used, provided num_mrs is nonzero.  */
   unsigned int mr_offset;
+
+  /* The offset from the (virtual) arg pointer of this argument, if it is on the stack.  */
+  unsigned int ap_offset;
 };
 
 /* One stage in a constant building sequence.  These sequences have
@@ -6335,6 +6338,7 @@ riscv_get_arg_info (struct riscv_arg_info *info, const CUMULATIVE_ARGS *cum,
   info->num_fprs = 0;
   info->num_gprs = MIN (num_words, MAX_ARGS_IN_REGISTERS - info->gpr_offset);
   info->stack_p = (num_words - info->num_gprs) != 0;
+  info->ap_offset = (num_words - info->num_gprs) * UNITS_PER_WORD;
 
   if (info->num_gprs || return_p)
 return gen_rtx_REG (mode, gpr_base + info->gpr_offset);
@@ -6405,6 +6409,30 @@ riscv_arg_partial_bytes (cumulative_args_t cum,
   return arg.stack_p ? arg.num_gprs * UNITS_PER_WORD : 0;
 }
 
+/* Implement TARGET_ARG_EXTENDED_ON_STACK.  */
+
+static int
+riscv_arg_extended_on_stack (cumulative_args_t cum,
+const function_arg_info &generic_arg)
+{
+  struct riscv_arg_info arg;
+  poly_int64 mode_size;
+
+  /* For machines with fast unaligned accesses we'll always be better off
+   * mangling the access in place.  */
+  if (! riscv_slow_unaligned_access_p)
+return 1;
+
+  riscv_get_arg_info (&arg, get_cumulative_args (cum), generic_arg.mode,
+ generic_arg.type, generic_arg.named, false);
+
+  mode_size = GET_MODE_SIZE (generic_arg.mode);
+  gcc_assert (mode_size.is_constant ());
+  /* This assumes the arg pointer is aligned to the type size.  IIRC this isn't
+   * true for the 32-bit embedded ABI, but I don't remember if we implemented that.  */
+  return (arg.ap_offset % mode_size.to_constant ()) == 0;
+}
+
 /* Implement FUNCTION_VALUE and LIBCALL_VALUE.  For normal calls,
VALTYPE is the return type and MODE is VOIDmode.  For libcalls,
VALTYPE is null and MODE is the mode of the return value.  */
@@ -14881,6 +14909,8 @@ synthesize_and (rtx operands[3])
 #define TARGET_PASS_BY_REFERENCE riscv_pass_by_reference
 #undef TARGET_ARG_PARTIAL_BYTES
 #define TARGET_ARG_PARTIAL_BYTES riscv_arg_partial_bytes
+#undef TARGET_ARG_EXTENDED_ON_STACK
+#define TARGET_ARG_EXTENDED_ON_STACK riscv_arg_extended_on_stack
 #undef TARGET_FUNCTION_ARG
 #define TARGET_FUNCTION_ARG riscv_function_arg
 #undef TARGET_FUNCTION_ARG_ADVANCE
diff --git a/gcc/testsuite/gcc.target/riscv/pr82106.c b/gcc/testsuite/gcc.target/riscv/pr82106.c
new file mode 100644
index 000..7bcfbaf8723
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr82106.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32ifd -mabi=ilp32d -O2 -mtune=generic" } */
+
+double mla(float fa0, float fa1, float fa2, float fa3, float fa4, float fa5, 
+float fa6, float fa7, int a0, int a1, int a2, int a3, int a4, int a5, int 
+a6, double a7_s0, double unused)
+{
+  return a7_s0;
+}
+
+/* { dg-final { scan-assembler-not "fld\tfa0,12(sp)" } } */
+/* { dg-final { scan-assembler-times "fld\tfa0,8(sp)" 1 } } */
-- 
2.39.5 (Apple Git-154)



[COMMITTED 10/42] ada: Correct documentation of policy_identifiers for Assertion_Policy

2025-07-03 Thread Marc Poulhiès
From: Bob Duff 

Follow-on to gnat-945.

Change Ignore to Disable; Ignore is defined by the language,
Disable is the implementation-defined one.

Also minor code cleanup.

gcc/ada/ChangeLog:

* doc/gnat_rm/implementation_defined_characteristics.rst:
Change Ignore to Disable.
* sem_ch13.ads (Analyze_Aspect_Specifications):
Minor: Remove incorrect comment; there is no need to check
Has_Aspects (N) at the call site.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/doc/gnat_rm/implementation_defined_characteristics.rst | 2 +-
 gcc/ada/gnat_rm.texi   | 2 +-
 gcc/ada/gnat_ugn.texi  | 2 +-
 gcc/ada/sem_ch13.ads   | 3 +--
 4 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/gcc/ada/doc/gnat_rm/implementation_defined_characteristics.rst b/gcc/ada/doc/gnat_rm/implementation_defined_characteristics.rst
index 563f62a04f9..f7746c8e72f 100644
--- a/gcc/ada/doc/gnat_rm/implementation_defined_characteristics.rst
+++ b/gcc/ada/doc/gnat_rm/implementation_defined_characteristics.rst
@@ -463,7 +463,7 @@ Implementation-defined assertion_aspect_marks include Assert_And_Cut,
 Assume, Contract_Cases, Debug, Ghost, Initial_Condition, Loop_Invariant,
 Loop_Variant, Postcondition, Precondition, Predicate, Refined_Post,
 Statement_Assertions, and Subprogram_Variant. Implementation-defined
-policy_identifiers include Ignore and Suppressible.
+policy_identifiers include Disable and Suppressible.
 
 *
   "The default assertion policy.  See 11.4.2(10)."
diff --git a/gcc/ada/gnat_rm.texi b/gcc/ada/gnat_rm.texi
index 6e95e34359a..79fb225a555 100644
--- a/gcc/ada/gnat_rm.texi
+++ b/gcc/ada/gnat_rm.texi
@@ -16894,7 +16894,7 @@ Implementation-defined assertion_aspect_marks include Assert_And_Cut,
 Assume, Contract_Cases, Debug, Ghost, Initial_Condition, Loop_Invariant,
 Loop_Variant, Postcondition, Precondition, Predicate, Refined_Post,
 Statement_Assertions, and Subprogram_Variant. Implementation-defined
-policy_identifiers include Ignore and Suppressible.
+policy_identifiers include Disable and Suppressible.
 
 
 @itemize *
diff --git a/gcc/ada/gnat_ugn.texi b/gcc/ada/gnat_ugn.texi
index 6cd0bed8d67..7b3175e3d27 100644
--- a/gcc/ada/gnat_ugn.texi
+++ b/gcc/ada/gnat_ugn.texi
@@ -30297,8 +30297,8 @@ to permit their use in free software.
 
 @printindex ge
 
-@anchor{gnat_ugn/gnat_utility_programs switches-related-to-project-files}@w{   
   }
 @anchor{d2}@w{  }
+@anchor{gnat_ugn/gnat_utility_programs switches-related-to-project-files}@w{   
   }
 
 @c %**end of body
 @bye
diff --git a/gcc/ada/sem_ch13.ads b/gcc/ada/sem_ch13.ads
index 9bf1ce310c5..f2c5f706200 100644
--- a/gcc/ada/sem_ch13.ads
+++ b/gcc/ada/sem_ch13.ads
@@ -43,8 +43,7 @@ package Sem_Ch13 is
 
procedure Analyze_Aspect_Specifications (N : Node_Id; E : Entity_Id);
--  This procedure is called to analyze aspect specifications for node N. E
-   --  is the corresponding entity declared by the declaration node N. Callers
-   --  should check that Has_Aspects (N) is True before calling this routine.
+   --  is the corresponding entity declared by the declaration node N.
 
procedure Analyze_Aspects_On_Subprogram_Body_Or_Stub (N : Node_Id);
--  Analyze the aspect specifications of [generic] subprogram body or stub
-- 
2.43.0



[COMMITTED 04/42] ada: Fix wrong conversion of controlled array with representation change

2025-07-03 Thread Marc Poulhiès
From: Eric Botcazou 

The problem is that a temporary is created for the conversion because of the
representation change, and it is finalized without having been initialized.

gcc/ada/ChangeLog:

* exp_ch4.adb (Handle_Changed_Representation): Alphabetize local
variables.  Set the No_Finalize_Actions flag on the assignment.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch4.adb | 30 ++
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
index b4270021faf..a845982d690 100644
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -11285,11 +11285,12 @@ package body Exp_Ch4 is
   ---
 
   procedure Handle_Changed_Representation is
- Temp : Entity_Id;
- Decl : Node_Id;
- Odef : Node_Id;
- N_Ix : Node_Id;
  Cons : List_Id;
+ Decl : Node_Id;
+ N_Ix : Node_Id;
+ Odef : Node_Id;
+ Stmt : Node_Id;
+ Temp : Entity_Id;
 
   begin
  --  Nothing else to do if no change of representation
@@ -11432,19 +11433,24 @@ package body Exp_Ch4 is
 Defining_Identifier => Temp,
 Object_Definition   => Odef);
 
-Set_No_Initialization (Decl, True);
+--  The temporary need not be initialized
+
+Set_No_Initialization (Decl);
+
+Stmt :=
+  Make_Assignment_Statement (Loc,
+Name   => New_Occurrence_Of (Temp, Loc),
+Expression => Relocate_Node (N));
+
+--  And, therefore, cannot be finalized
+
+Set_No_Finalize_Actions (Stmt);
 
 --  Insert required actions. It is essential to suppress checks
 --  since we have suppressed default initialization, which means
 --  that the variable we create may have no discriminants.
 
-Insert_Actions (N,
-  New_List (
-Decl,
-Make_Assignment_Statement (Loc,
-  Name   => New_Occurrence_Of (Temp, Loc),
-  Expression => Relocate_Node (N))),
-Suppress => All_Checks);
+Insert_Actions (N, New_List (Decl, Stmt), Suppress => All_Checks);
 
 Rewrite (N, New_Occurrence_Of (Temp, Loc));
 return;
-- 
2.43.0



[COMMITTED 02/42] ada: Fix ALI elaboration flags for ghost compilation units (cont.)

2025-07-03 Thread Marc Poulhiès
From: Piotr Trojanek 

When GNAT was compiling a ghost unit, the ALI file wrongly suggested that this
unit required elaboration counters, which caused linking errors to non-existing
objects.

gcc/ada/ChangeLog:

* sem_ch10.adb (Analyze_Compilation_Unit): Ignored ghost units need no
elaboration checks.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch10.adb | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/ada/sem_ch10.adb b/gcc/ada/sem_ch10.adb
index 45aabadf21f..3a44149aeff 100644
--- a/gcc/ada/sem_ch10.adb
+++ b/gcc/ada/sem_ch10.adb
@@ -1491,6 +1491,10 @@ package body Sem_Ch10 is
 --  No checks required if no separate spec
 
 or else Acts_As_Spec (N)
+
+--  No checked needed for ignored ghost units
+
+or else Is_Ignored_Ghost_Entity (Spec_Id)
   )
 then
--  This is a case where we only need the entity for checking to
-- 
2.43.0



[COMMITTED 07/42] ada: Fix SPARK context discovery from within subunits

2025-07-03 Thread Marc Poulhiès
From: Piotr Trojanek 

When navigating the AST to find the enclosing subprogram we must traverse
from subunits to the corresponding stub.

gcc/ada/ChangeLog:

* lib-xref-spark_specific.adb
(Enclosing_Subprogram_Or_Library_Package): Traverse subunits and body
stubs.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/lib-xref-spark_specific.adb | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/ada/lib-xref-spark_specific.adb b/gcc/ada/lib-xref-spark_specific.adb
index d77d6aa4dd0..03693a96bae 100644
--- a/gcc/ada/lib-xref-spark_specific.adb
+++ b/gcc/ada/lib-xref-spark_specific.adb
@@ -258,6 +258,13 @@ package body SPARK_Specific is
Context := Defining_Entity (Context);
exit;
 
+when N_Subunit =>
+   Context := Corresponding_Stub (Context);
+
+when N_Body_Stub =>
+   Context := Corresponding_Spec_Of_Stub (Context);
+   exit;
+
 when others =>
Context := Parent (Context);
  end case;
-- 
2.43.0



[COMMITTED 21/42] ada: Refine subtypes in task-counting code

2025-07-03 Thread Marc Poulhiès
From: Piotr Trojanek 

Code cleanup; semantics is unaffected.

gcc/ada/ChangeLog:

* exp_ch3.adb (Count_Default_Sized_Task_Stacks): Refine subtypes of
parameters; same for callsites.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch3.adb | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
index f5173936943..5a47a5a5132 100644
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -6908,8 +6908,8 @@ package body Exp_Ch3 is
 
   procedure Count_Default_Sized_Task_Stacks
 (Typ : Entity_Id;
- Pri_Stacks  : out Int;
- Sec_Stacks  : out Int);
+ Pri_Stacks  : out Nat;
+ Sec_Stacks  : out Nat);
   --  Count the number of default-sized primary and secondary task stacks
   --  required for task objects contained within type Typ. If the number of
   --  task objects contained within the type is not known at compile time
@@ -7186,8 +7186,8 @@ package body Exp_Ch3 is
 
   procedure Count_Default_Sized_Task_Stacks
 (Typ : Entity_Id;
- Pri_Stacks  : out Int;
- Sec_Stacks  : out Int)
+ Pri_Stacks  : out Nat;
+ Sec_Stacks  : out Nat)
   is
  Component : Entity_Id;
 
@@ -7259,8 +7259,8 @@ package body Exp_Ch3 is
 
while Present (Component) loop
   declare
- P : Int;
- S : Int;
+ P : Nat;
+ S : Nat;
 
   begin
  Count_Default_Sized_Task_Stacks (Etype (Component), P, S);
@@ -7678,7 +7678,7 @@ package body Exp_Ch3 is
 and then not (Is_Array_Type (Typ) and then Has_Init_Expression (N))
   then
  declare
-PS_Count, SS_Count : Int;
+PS_Count, SS_Count : Nat;
  begin
 Count_Default_Sized_Task_Stacks (Typ, PS_Count, SS_Count);
 Increment_Primary_Stack_Count (PS_Count);
-- 
2.43.0



[COMMITTED 11/42] ada: Remove unnecessary "return;" statements

2025-07-03 Thread Marc Poulhiès
From: Bob Duff 

A "return;" at the end of a procedure is unnecessary and
misleading. This patch removes them.

gcc/ada/ChangeLog:

* checks.adb: Remove unnecessary "return;" statements.
* eval_fat.adb: Likewise.
* exp_aggr.adb: Likewise.
* exp_attr.adb: Likewise.
* exp_ch3.adb: Likewise.
* exp_ch4.adb: Likewise.
* exp_ch5.adb: Likewise.
* exp_ch6.adb: Likewise.
* exp_unst.adb: Likewise.
* krunch.adb: Likewise.
* layout.adb: Likewise.
* libgnat/s-excdeb.adb: Likewise.
* libgnat/s-trasym__dwarf.adb: Likewise.
* par-endh.adb: Likewise.
* par-tchk.adb: Likewise.
* sem.adb: Likewise.
* sem_attr.adb: Likewise.
* sem_ch6.adb: Likewise.
* sem_elim.adb: Likewise.
* sem_eval.adb: Likewise.
* sfn_scan.adb: Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/checks.adb  | 10 ++
 gcc/ada/eval_fat.adb|  2 --
 gcc/ada/exp_aggr.adb|  4 ++--
 gcc/ada/exp_attr.adb|  2 +-
 gcc/ada/exp_ch3.adb |  6 +++---
 gcc/ada/exp_ch4.adb | 12 ++--
 gcc/ada/exp_ch5.adb |  4 ++--
 gcc/ada/exp_ch6.adb |  2 +-
 gcc/ada/exp_unst.adb|  2 --
 gcc/ada/krunch.adb  |  2 --
 gcc/ada/layout.adb  |  4 +---
 gcc/ada/libgnat/s-excdeb.adb|  2 +-
 gcc/ada/libgnat/s-trasym__dwarf.adb |  2 +-
 gcc/ada/par-endh.adb|  2 --
 gcc/ada/par-tchk.adb|  1 -
 gcc/ada/sem.adb |  3 ---
 gcc/ada/sem_attr.adb|  1 -
 gcc/ada/sem_ch6.adb |  1 -
 gcc/ada/sem_elim.adb|  2 --
 gcc/ada/sem_eval.adb|  1 -
 gcc/ada/sfn_scan.adb|  2 --
 21 files changed, 20 insertions(+), 47 deletions(-)

diff --git a/gcc/ada/checks.adb b/gcc/ada/checks.adb
index 6a98292d1cc..0b3ae02259e 100644
--- a/gcc/ada/checks.adb
+++ b/gcc/ada/checks.adb
@@ -750,7 +750,7 @@ package body Checks is
   --  mode then just skip the check (it is not required in any case).
 
   when RE_Not_Available =>
- return;
+ null;
end Apply_Address_Clause_Check;
 
-
@@ -1078,7 +1078,7 @@ package body Checks is
 
   exception
  when RE_Not_Available =>
-return;
+null;
   end;
end Apply_Arithmetic_Overflow_Strict;
 
@@ -6437,8 +6437,6 @@ package body Checks is
  if Debug_Flag_CC then
 w ("  exception occurred, overflow flag set");
  end if;
-
- return;
end Enable_Overflow_Check;
 

@@ -6686,8 +6684,6 @@ package body Checks is
  if Debug_Flag_CC then
 w ("  exception occurred, range flag set");
  end if;
-
- return;
end Enable_Range_Check;
 
--
@@ -7091,8 +7087,6 @@ package body Checks is
   end loop;
 
   --  If we fall through entry was not found
-
-  return;
end Find_Check;
 
-
diff --git a/gcc/ada/eval_fat.adb b/gcc/ada/eval_fat.adb
index 09a5b3fa1b7..5a2e43ef597 100644
--- a/gcc/ada/eval_fat.adb
+++ b/gcc/ada/eval_fat.adb
@@ -146,8 +146,6 @@ package body Eval_Fat is
   if UR_Is_Negative (X) then
  Fraction := -Fraction;
   end if;
-
-  return;
end Decompose;
 
---
diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index e3734a2d8c9..fcf57bf9c31 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -6633,7 +6633,7 @@ package body Exp_Aggr is
 
exception
   when RE_Not_Available =>
- return;
+ null;
end Expand_N_Aggregate;
 
---
@@ -7957,7 +7957,7 @@ package body Exp_Aggr is
 
exception
   when RE_Not_Available =>
- return;
+ null;
end Expand_N_Extension_Aggregate;
 
-
diff --git a/gcc/ada/exp_attr.adb b/gcc/ada/exp_attr.adb
index 0f09ba587ac..4f9f16cfa55 100644
--- a/gcc/ada/exp_attr.adb
+++ b/gcc/ada/exp_attr.adb
@@ -8776,7 +8776,7 @@ package body Exp_Attr is
 
exception
   when RE_Not_Available =>
- return;
+ null;
end Expand_N_Attribute_Reference;
 

diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
index 7c18f81cb07..2372a9f11df 100644
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -5956,7 +5956,7 @@ package body Exp_Ch3 is
 
exception
   when RE_Not_Available =>
- return;
+ null;
end Expand_Freeze_Enumeration_Type;
 
---
@@ -9239,7 +9239,7 @@ package body Exp_Ch3 is
 
exception
   when RE_Not_Available =>
- return;
+ null;
end Expand_N_Object_Declaration;
 

[COMMITTED 05/42] ada: Fix index bounds check in Super_Delete functions and procedures

2025-07-03 Thread Marc Poulhiès
From: Aleksandra Pasek 

gcc/ada/ChangeLog:

* libgnat/a-strsup.adb (Super_Delete): Fix index check.
* libgnat/a-stwisu.adb (Super_Delete): Likewise.
* libgnat/a-stzsup.adb (Super_Delete): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/a-strsup.adb | 4 ++--
 gcc/ada/libgnat/a-stwisu.adb | 4 ++--
 gcc/ada/libgnat/a-stzsup.adb | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/ada/libgnat/a-strsup.adb b/gcc/ada/libgnat/a-strsup.adb
index 8afde718581..3ac1a5ac724 100644
--- a/gcc/ada/libgnat/a-strsup.adb
+++ b/gcc/ada/libgnat/a-strsup.adb
@@ -755,7 +755,7 @@ package body Ada.Strings.Superbounded with SPARK_Mode is
   if Num_Delete <= 0 then
  return Source;
 
-  elsif From - 1 > Slen then
+  elsif From > Slen then
  raise Ada.Strings.Index_Error;
 
   elsif Through >= Slen then
@@ -784,7 +784,7 @@ package body Ada.Strings.Superbounded with SPARK_Mode is
   if Num_Delete <= 0 then
  return;
 
-  elsif From - 1 > Slen then
+  elsif From > Slen then
  raise Ada.Strings.Index_Error;
 
   elsif Through >= Slen then
diff --git a/gcc/ada/libgnat/a-stwisu.adb b/gcc/ada/libgnat/a-stwisu.adb
index e7e6b1f75c1..28ae887cc5a 100644
--- a/gcc/ada/libgnat/a-stwisu.adb
+++ b/gcc/ada/libgnat/a-stwisu.adb
@@ -753,7 +753,7 @@ package body Ada.Strings.Wide_Superbounded is
   if Num_Delete <= 0 then
  return Source;
 
-  elsif From > Slen + 1 then
+  elsif From > Slen then
  raise Ada.Strings.Index_Error;
 
   elsif Through >= Slen then
@@ -782,7 +782,7 @@ package body Ada.Strings.Wide_Superbounded is
   if Num_Delete <= 0 then
  return;
 
-  elsif From > Slen + 1 then
+  elsif From > Slen then
  raise Ada.Strings.Index_Error;
 
   elsif Through >= Slen then
diff --git a/gcc/ada/libgnat/a-stzsup.adb b/gcc/ada/libgnat/a-stzsup.adb
index fb1baf6c62c..5dcbadf3c03 100644
--- a/gcc/ada/libgnat/a-stzsup.adb
+++ b/gcc/ada/libgnat/a-stzsup.adb
@@ -754,7 +754,7 @@ package body Ada.Strings.Wide_Wide_Superbounded is
   if Num_Delete <= 0 then
  return Source;
 
-  elsif From > Slen + 1 then
+  elsif From > Slen then
  raise Ada.Strings.Index_Error;
 
   elsif Through >= Slen then
@@ -783,7 +783,7 @@ package body Ada.Strings.Wide_Wide_Superbounded is
   if Num_Delete <= 0 then
  return;
 
-  elsif From > Slen + 1 then
+  elsif From > Slen then
  raise Ada.Strings.Index_Error;
 
   elsif Through >= Slen then
-- 
2.43.0



[COMMITTED 12/42] ada: Fix assertion failure on finalizable aggregate

2025-07-03 Thread Marc Poulhiès
From: Ronan Desplanques 

The Finalizable aspect makes it possible that
Insert_Actions_In_Scope_Around is entered with an empty list of after
actions. This patch fixes a condition that was not quite right in this
case.

gcc/ada/ChangeLog:

* exp_ch7.adb (Insert_Actions_In_Scope_Around): Fix condition.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch7.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
index 4d2b8348048..381294b05d6 100644
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -5460,7 +5460,7 @@ package body Exp_Ch7 is
 
   --  Finalization calls are inserted after the target
 
-  if Present (Act_After) then
+  if Is_Non_Empty_List (Act_After) then
  Last_Obj := Last (Act_After);
  Insert_List_After (Target, Act_After);
   else
-- 
2.43.0



[COMMITTED 08/42] ada: Call Semantics when analyzing a renamed package

2025-07-03 Thread Marc Poulhiès
From: Viljar Indus 

Calling Semantics here will additionally update Current_Sem_Unit to refer
to the renamed unit, so that we will not receive bogus visibility errors
when checking for self-referential with-s.

gcc/ada/ChangeLog:

* sem_ch10.adb (Analyze_With_Clause): Call Semantics instead
of Analyze to bring Current_Sem_Unit up to date.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch10.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/sem_ch10.adb b/gcc/ada/sem_ch10.adb
index 3a44149aeff..f58513d115f 100644
--- a/gcc/ada/sem_ch10.adb
+++ b/gcc/ada/sem_ch10.adb
@@ -3299,7 +3299,7 @@ package body Sem_Ch10 is
 --  the renamed unit, and the renaming declaration itself has not
 --  been analyzed.
 
-Analyze (Parent (Parent (Entity (Pref))));
+Semantics (Parent (Parent (Entity (Pref))));
 pragma Assert (Renamed_Entity (Entity (Pref)) = Par_Name);
 Par_Name := Entity (Pref);
  end if;
-- 
2.43.0



[COMMITTED 25/42] ada: Fix minor fallout of latest change

2025-07-03 Thread Marc Poulhiès
From: Eric Botcazou 

This adjusts the header of the renamed files and adds missing blank lines.

gcc/ada/ChangeLog:

* errid.ads: Adjust header to renaming and fix copyright line.
* errid.adb: Adjust header to renaming and add blank line.
* erroutc-pretty_emitter.ads: Adjust header to renaming.
* erroutc-pretty_emitter.adb: Likewise.
* erroutc-sarif_emitter.ads: Likewise.
* erroutc-sarif_emitter.adb: Likewise.
* errsw.ads: Adjust header to renaming and add blank line.
* errsw.adb: Likewise.
* json_utils.ads: Likewise.
* json_utils.adb: Adjust header to renaming.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/errid.adb  | 3 ++-
 gcc/ada/errid.ads  | 5 +++--
 gcc/ada/erroutc-pretty_emitter.adb | 2 +-
 gcc/ada/erroutc-pretty_emitter.ads | 2 +-
 gcc/ada/erroutc-sarif_emitter.adb  | 2 +-
 gcc/ada/erroutc-sarif_emitter.ads  | 3 +--
 gcc/ada/errsw.adb  | 3 ++-
 gcc/ada/errsw.ads  | 3 ++-
 gcc/ada/json_utils.adb | 2 +-
 gcc/ada/json_utils.ads | 3 ++-
 10 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/gcc/ada/errid.adb b/gcc/ada/errid.adb
index a661fcf1e0b..46d319e2d54 100644
--- a/gcc/ada/errid.adb
+++ b/gcc/ada/errid.adb
@@ -2,7 +2,7 @@
 --  --
 -- GNAT COMPILER COMPONENTS --
 --  --
---   D I A G N O S T I C S . R E P O S I T O R Y--
+--E R R I D --
 --  --
 -- B o d y  --
 --  --
@@ -22,6 +22,7 @@
 -- Extensive contributions were provided by Ada Core Technologies Inc.  --
 --  --
 --
+
 with JSON_Utils; use JSON_Utils;
 with Output; use Output;
 
diff --git a/gcc/ada/errid.ads b/gcc/ada/errid.ads
index acabc3b1dfe..4d56d73cdf5 100644
--- a/gcc/ada/errid.ads
+++ b/gcc/ada/errid.ads
@@ -2,11 +2,11 @@
 --  --
 -- GNAT COMPILER COMPONENTS --
 --  --
---   D I A G N O S T I C S . R E P O S I T O R Y--
+--E R R I D --
 --  --
 -- S p e c  --
 --  --
---  Copyright (C) 19925, Free Software Foundation, Inc. --
+--  Copyright (C) 1992-2025, Free Software Foundation, Inc. --
 --  --
 -- GNAT is free software;  you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -22,6 +22,7 @@
 -- Extensive contributions were provided by Ada Core Technologies Inc.  --
 --  --
 --
+
 with Types; use Types;
 with Errsw; use Errsw;
 
diff --git a/gcc/ada/erroutc-pretty_emitter.adb b/gcc/ada/erroutc-pretty_emitter.adb
index d9bf560dd8d..72cc03fadb5 100644
--- a/gcc/ada/erroutc-pretty_emitter.adb
+++ b/gcc/ada/erroutc-pretty_emitter.adb
@@ -2,7 +2,7 @@
 --  --
 -- GNAT COMPILER COMPONENTS --
 --  --
--- D I A G N O S T I C S . P R E T T Y _ E M I T T E R  --
+-- E R R O U T C . P R E T T Y _ E M I T T E R  --
 --  --
 -- B o d y  --
 --  --
diff --git a/gcc/ada/erroutc-pretty_emitter.ads b/gcc/ada/erroutc-pretty_emitter.ads
index 3ff0109db63..a4521a26d17 100644
--- a/gcc/ada/erroutc-pretty_emitter.ads
+++ b/gcc/ada/erroutc-pretty_emitter.ads
@@ -2,7 +2,7 @@
 --  --
 --

[COMMITTED 09/42] ada: Remove Empty_Or_Error

2025-07-03 Thread Marc Poulhiès
From: Bob Duff 

Minor stylistic improvement: Remove Empty_Or_Error, and replace
comparisons with Empty_Or_Error with "[not] in Empty | Error".
(Found while working on VAST.)

gcc/ada/ChangeLog:

* types.ads (Empty_Or_Error): Remove.
* atree.adb: Remove reference to Empty_Or_Error.
* par-endh.adb: Likewise.
* sem_ch12.adb: Likewise.
* sem_ch3.adb: Likewise.
* sem_util.adb: Likewise.
* treepr.adb: Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/atree.adb|  7 +++
 gcc/ada/par-endh.adb |  4 ++--
 gcc/ada/sem_ch12.adb |  2 +-
 gcc/ada/sem_ch3.adb  |  5 ++---
 gcc/ada/sem_util.adb |  4 ++--
 gcc/ada/treepr.adb   | 13 ++---
 gcc/ada/types.ads|  5 -
 7 files changed, 16 insertions(+), 24 deletions(-)

diff --git a/gcc/ada/atree.adb b/gcc/ada/atree.adb
index 17538de8954..20ca189ad8c 100644
--- a/gcc/ada/atree.adb
+++ b/gcc/ada/atree.adb
@@ -1296,8 +1296,7 @@ package body Atree is
 Node_Offsets.Table (Node_Offsets.First .. Node_Offsets.Last);
 
begin
-  --  Empty_Or_Error use as described in types.ads
-  if Destination <= Empty_Or_Error or No (Source) then
+  if Destination in Empty | Error or else No (Source) then
  pragma Assert (Serious_Errors_Detected > 0);
  return;
   end if;
@@ -1458,7 +1457,7 @@ package body Atree is
--  Start of processing for Copy_Separate_Tree
 
begin
-  if Source <= Empty_Or_Error then
+  if Source in Empty | Error then
  return Source;
 
   elsif Is_Entity (Source) then
@@ -1841,7 +1840,7 @@ package body Atree is
   pragma Debug (Validate_Node (Source));
   S_Size : constant Slot_Count := Size_In_Slots_To_Alloc (Source);
begin
-  if Source <= Empty_Or_Error then
+  if Source in Empty | Error then
  return Source;
   end if;
 
diff --git a/gcc/ada/par-endh.adb b/gcc/ada/par-endh.adb
index 12baed455d7..b045d74bd0e 100644
--- a/gcc/ada/par-endh.adb
+++ b/gcc/ada/par-endh.adb
@@ -300,7 +300,7 @@ package body Endh is
 else
End_Labl := Scopes (Scope.Last).Labl;
 
-   if End_Labl > Empty_Or_Error then
+   if End_Labl not in Empty | Error then
 
   --  The task here is to construct a designator from the
   --  opening label, with the components all marked as not
@@ -921,7 +921,7 @@ package body Endh is
 
   --  Suppress message if error was posted on opening label
 
-  if Error_Msg_Node_1 > Empty_Or_Error
+  if Error_Msg_Node_1 not in Empty | Error
 and then Error_Posted (Error_Msg_Node_1)
   then
  return;
diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb
index f492b236857..7ebf145d783 100644
--- a/gcc/ada/sem_ch12.adb
+++ b/gcc/ada/sem_ch12.adb
@@ -3171,7 +3171,7 @@ package body Sem_Ch12 is
  end if;
   end if;
 
-  if Subtype_Mark (Def) <= Empty_Or_Error then
+  if Subtype_Mark (Def) in Empty | Error then
  pragma Assert (Serious_Errors_Detected > 0);
  --  avoid passing bad argument to Entity
  return;
diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb
index 45b28bf96a4..b4342af134e 100644
--- a/gcc/ada/sem_ch3.adb
+++ b/gcc/ada/sem_ch3.adb
@@ -19159,8 +19159,7 @@ package body Sem_Ch3 is
   --  Otherwise we have a subtype mark without a constraint
 
   elsif Error_Posted (S) then
- --  Don't rewrite if S is Empty or Error
- if S > Empty_Or_Error then
+ if S not in Empty | Error then
 Rewrite (S, New_Occurrence_Of (Any_Id, Sloc (S)));
  end if;
  return Any_Type;
@@ -21094,7 +21093,7 @@ package body Sem_Ch3 is
 
   --  If no range was given, set a dummy range
 
-  if RRS <= Empty_Or_Error then
+  if RRS in Empty | Error then
  Low_Val  := -Small_Val;
  High_Val := Small_Val;
 
diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index b61f3bbad5e..ed8f054fc63 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -24112,7 +24112,7 @@ package body Sem_Util is
 
  Result := N;
 
- if N > Empty_Or_Error then
+ if N not in Empty | Error then
 pragma Assert (Nkind (N) not in N_Entity);
 
 Result := New_Copy (N);
@@ -24193,7 +24193,7 @@ package body Sem_Util is
 
  Result := Id;
 
- if Id > Empty_Or_Error then
+ if Id not in Empty | Error then
 pragma Assert (Nkind (Id) in N_Entity);
 
 --  Determine whether the entity has a corresponding new entity
diff --git a/gcc/ada/treepr.adb b/gcc/ada/treepr.adb
index d58f3ceb36f..375608d2ba6 100644
--- a/gcc/ada/treepr.adb
+++ b/gcc/ada/treepr.adb
@@ -2015,17 +2015,16 @@ package body Treepr is
  --  Case of descendant is a node
 
  if D in Node_Range then
-
---  Don't bother about Empty or Error descendants
-
-if D <= Union_Id (Empty_Or_Error) the

[COMMITTED 17/42] ada: Enforce visibility of unit used as a parent instance of a child instance

2025-07-03 Thread Marc Poulhiès
From: Gary Dismukes 

In cases involving instantiation of a generic child unit, the visibility
of the parent unit was mishandled, allowing the parent to be referenced
in another compilation unit that has visibility of the child instance
but no with_clause for the parent of the instance.

gcc/ada/ChangeLog:

* sem_ch12.adb (Install_Spec): Remove "not Is_Generic_Instance (Par)"
in test for setting Instance_Parent_Unit. Revise comment to no longer
say "noninstance", plus remove "???".
(Remove_Parent): Restructure if_statement to allow for both "elsif"
parts to be executed (by changing them to be separate if_statements
within an "else" part).

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch12.adb | 72 ++--
 1 file changed, 36 insertions(+), 36 deletions(-)

diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb
index 7ebf145d783..e80aea5fe76 100644
--- a/gcc/ada/sem_ch12.adb
+++ b/gcc/ada/sem_ch12.adb
@@ -11131,13 +11131,9 @@ package body Sem_Ch12 is
   begin
  --  If this parent of the child instance is a top-level unit,
  --  then record the unit and its visibility for later resetting in
- --  Remove_Parent. We exclude units that are generic instances, as we
- --  only want to record this information for the ultimate top-level
- --  noninstance parent (is that always correct???).
+ --  Remove_Parent.
 
- if Scope (Par) = Standard_Standard
-   and then not Is_Generic_Instance (Par)
- then
+ if Scope (Par) = Standard_Standard then
 Parent_Unit_Visible := Is_Immediately_Visible (Par);
 Instance_Parent_Unit := Par;
  end if;
@@ -16338,39 +16334,43 @@ package body Sem_Ch12 is
   Install_Private_Declarations (P);
end if;
 
---  If the ultimate parent is a top-level unit recorded in
---  Instance_Parent_Unit, then reset its visibility to what it was
---  before instantiation. (It's not clear what the purpose is of
---  testing whether Scope (P) is In_Open_Scopes, but that test was
---  present before the ultimate parent test was added.???)
+else
+   --  If the ultimate parent is a top-level unit recorded in
+   --  Instance_Parent_Unit, then reset its visibility to what
+   --  it was before instantiation. (It's not clear what the
+   --  purpose is of testing whether Scope (P) is In_Open_Scopes,
+   --  but that test was present before the ultimate parent test
+   --  was added.???)
 
-elsif not In_Open_Scopes (Scope (P))
-  or else (P = Instance_Parent_Unit
-and then not Parent_Unit_Visible)
-then
-   Set_Is_Immediately_Visible (P, False);
+   if not In_Open_Scopes (Scope (P))
+ or else (P = Instance_Parent_Unit
+   and then not Parent_Unit_Visible)
+   then
+  Set_Is_Immediately_Visible (P, False);
+   end if;
 
---  If the current scope is itself an instantiation of a generic
---  nested within P, and we are in the private part of body of this
---  instantiation, restore the full views of P, that were removed
---  in End_Package_Scope above. This obscure case can occur when a
---  subunit of a generic contains an instance of a child unit of
---  its generic parent unit.
+   --  If the current scope is itself an instantiation of a generic
+   --  nested within P, and we are in the private part of body of
+   --  the instantiation, restore the full views of P, which were
+   --  removed in End_Package_Scope above. This obscure case can
+   --  occur when a subunit of a generic contains an instance of
+   --  a child unit of its generic parent unit.
 
-elsif S = Current_Scope and then Is_Generic_Instance (S)
-  and then (In_Package_Body (S) or else In_Private_Part (S))
-then
-   declare
-  Par : constant Entity_Id :=
-  Generic_Parent (Package_Specification (S));
-   begin
-  if Present (Par)
-and then P = Scope (Par)
-  then
- Set_In_Private_Part (P);
- Install_Private_Declarations (P);
-  end if;
-   end;
+   if S = Current_Scope and then Is_Generic_Instance (S)
+ and then (In_Package_Body (S) or else In_Private_Part (S))
+   then
+  declare
+ Par : constant Entity_Id :=
+ Generic_Parent (Pack

[COMMITTED 16/42] ada: Fix comment

2025-07-03 Thread Marc Poulhiès
From: Ronan Desplanques 

This patch fixes a misnaming of Make_Predefined_Primitive_Specs in a
comment.

gcc/ada/ChangeLog:

* exp_ch3.adb (Predefined_Primitive_Bodies): Fix comment.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch3.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
index eec276ccd04..f5173936943 100644
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -401,7 +401,7 @@ package body Exp_Ch3 is
  (Tag_Typ: Entity_Id;
   Renamed_Eq : Entity_Id) return List_Id;
--  Create the bodies of the predefined primitives that are described in
-   --  Predefined_Primitive_Specs. When not empty, Renamed_Eq must denote
+   --  Make_Predefined_Primitive_Specs. When not empty, Renamed_Eq must denote
--  the defining unit name of the type's predefined equality as returned
--  by Make_Predefined_Primitive_Specs.
 
-- 
2.43.0



[COMMITTED 22/42] ada: Fix crash with Finalizable in corner case

2025-07-03 Thread Marc Poulhiès
From: Ronan Desplanques 

The Finalizable aspect introduced controlled types for which not all the
finalization primitives exist. This patch makes Make_Deep_Record_Body
handle this case correctly.

gcc/ada/ChangeLog:

* exp_ch7.adb (Make_Deep_Record_Body): Fix case of absent Initialize
primitive.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch7.adb | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
index 95a790e5cee..e4daf4bc7a3 100644
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -7830,13 +7830,23 @@ package body Exp_Ch7 is
 
  when Initialize_Case =>
 if Is_Controlled (Typ) then
-   return New_List (
- Make_Procedure_Call_Statement (Loc,
-   Name   =>
- New_Occurrence_Of
-   (Find_Controlled_Prim_Op (Typ, Name_Initialize), Loc),
-   Parameter_Associations => New_List (
- Make_Identifier (Loc, Name_V))));
+   declare
+  Intlz : constant Entity_Id :=
+Find_Controlled_Prim_Op (Typ, Name_Initialize);
+   begin
+  if Present (Intlz) then
+ return
+   New_List
+ (Make_Procedure_Call_Statement
+(Loc,
+ Name   =>
+   New_Occurrence_Of (Intlz, Loc),
+ Parameter_Associations =>
+   New_List (Make_Identifier (Loc, Name_V))));
+  else
+ return Empty_List;
+  end if;
+   end;
 else
return Empty_List;
 end if;
-- 
2.43.0



[COMMITTED 13/42] ada: Fix comment

2025-07-03 Thread Marc Poulhiès
From: Ronan Desplanques 

This patch fixes a comment that wrongly stated that no dispatch entry
for deep finalize was created for limited tagged types.

gcc/ada/ChangeLog:

* exp_ch3.adb (Make_Predefined_Primitive_Specs): Fix comment.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch3.adb | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
index 2372a9f11df..eec276ccd04 100644
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -342,9 +342,9 @@ package body Exp_Ch3 is
-- typSO  provides result of 'Output attribute
-- typPI  provides result of 'Put_Image attribute
--
-   --  The following entries are additionally present for non-limited tagged
-   --  types, and implement additional dispatching operations for predefined
-   --  operations:
+   --  The following entries implement additional dispatching operations for
+   --  predefined operations. Deep finalization is present on all tagged types;
+   --  the others only on nonlimited tagged types:
--
-- _equality  implements "=" operator
-- _assignimplements assignment operation
-- 
2.43.0



[COMMITTED 31/42] ada: Improve retrieval of nominal unconstrained type in extended return

2025-07-03 Thread Marc Poulhiès
From: Piotr Trojanek 

To reliably retrieve the nominal unconstrained type of an object declared in
an extended return statement, we need to rely on the Original_Node.

gcc/ada/ChangeLog:

* sem_ch3.adb (Check_Return_Subtype_Indication): Use Original_Node.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch3.adb | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb
index f25941d72a8..5354d82bd7d 100644
--- a/gcc/ada/sem_ch3.adb
+++ b/gcc/ada/sem_ch3.adb
@@ -4204,11 +4204,7 @@ package body Sem_Ch3 is
  --  to recover the nominal unconstrained type.
 
  if Is_Constr_Subt_For_U_Nominal (Obj_Typ) then
-if Nkind (Object_Definition (Obj_Decl)) = N_Subtype_Indication then
-   Obj_Typ := Entity (Subtype_Mark (Object_Definition (Obj_Decl)));
-else
-   Obj_Typ := Etype (Obj_Typ);
-end if;
+Obj_Typ := Entity (Original_Node (Object_Definition (Obj_Decl)));
 pragma Assert (not Is_Constrained (Obj_Typ));
  end if;
 
-- 
2.43.0



[COMMITTED 19/42] ada: Fix crash with Finalizable in corner case

2025-07-03 Thread Marc Poulhiès
From: Ronan Desplanques 

Since the introduction of the Finalizable aspect, there can be types
for which Is_Controlled returns True but that don't have all three
finalization primitives. Before this patch, Generate_Finalization_Actions
raised an exception in that case; it no longer assumes that the Initialize
primitive exists.

gcc/ada/ChangeLog:

* exp_aggr.adb (Generate_Finalization_Actions): Stop assuming that
initialize primitive exists.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)
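
Note (not part of the patch): the fix guards the generated call with
"if Present (Intlz)". A minimal C++ sketch of the same guard is below,
assuming a hypothetical ControlledType model in which any finalization
primitive may be absent; none of these names come from the GNAT sources.

```cpp
#include <functional>
#include <optional>
#include <vector>

using Primitive = std::function<void()>;

// Hypothetical model of a controlled type. With the Finalizable aspect,
// any of the three primitives may be missing.
struct ControlledType {
  std::optional<Primitive> initialize;
  std::optional<Primitive> adjust;
  std::optional<Primitive> finalize;
};

// Append a call to Initialize only when the primitive exists, instead of
// assuming that the lookup always succeeds.
void append_initialize_call(const ControlledType& typ,
                            std::vector<Primitive>& actions) {
  if (typ.initialize)
    actions.push_back(*typ.initialize);
}
```

A type without an Initialize primitive simply contributes no action, which
corresponds to the new code appending nothing to the list.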

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index fcf57bf9c31..9ff69ec8130 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -2570,12 +2570,21 @@ package body Exp_Aggr is
 Ref := Convert_To (Init_Typ, New_Copy_Tree (Target));
 Set_Assignment_OK (Ref);
 
-Append_To (L,
-  Make_Procedure_Call_Statement (Loc,
-Name   =>
-  New_Occurrence_Of
-(Find_Controlled_Prim_Op (Init_Typ, Name_Initialize), Loc),
-Parameter_Associations => New_List (New_Copy_Tree (Ref))));
+declare
+   Intlz : constant Entity_Id :=
+ Find_Controlled_Prim_Op (Init_Typ, Name_Initialize);
+begin
+   if Present (Intlz) then
+  Append_To
+(L,
+ Make_Procedure_Call_Statement
+   (Loc,
+Name   =>
+  New_Occurrence_Of (Intlz, Loc),
+Parameter_Associations =>
+  New_List (New_Copy_Tree (Ref))));
+   end if;
+end;
  end if;
   end Generate_Finalization_Actions;
 
-- 
2.43.0



[COMMITTED 14/42] ada: Fix spurious Constraint_Error raised by 'Value of fixed-point types

2025-07-03 Thread Marc Poulhiès
From: Eric Botcazou 

This happens for very large Smalls with regard to the size of the mantissa,
because the prerequisites of the implementation used in this case are not
met, although they are documented in the head comment of Integer_To_Fixed.

This change documents them at the beginning of the body of System.Value_F
and adjusts the compiler interface accordingly.

gcc/ada/ChangeLog:

* libgnat/s-valuef.adb: Document the prerequisites more precisely.
* libgnat/a-tifiio.adb (OK_Get_32): Adjust to the prerequisites.
(OK_Get_64): Likewise.
* libgnat/a-tifiio__128.adb (OK_Get_32): Likewise.
(OK_Get_64): Likewise.
(OK_Get_128): Likewise.
* libgnat/a-wtfiio.adb (OK_Get_32): Likewise.
(OK_Get_64): Likewise.
* libgnat/a-wtfiio__128.adb (OK_Get_32): Likewise.
(OK_Get_64): Likewise.
(OK_Get_128): Likewise.
* libgnat/a-ztfiio.adb (OK_Get_32): Likewise.
(OK_Get_64): Likewise.
* libgnat/a-ztfiio__128.adb (OK_Get_32): Likewise.
(OK_Get_64): Likewise.
(OK_Get_128): Likewise.
* exp_imgv.adb (Expand_Value_Attribute): Adjust the conditions under
which the RE_Value_Fixed{32,64,128} routines are called for ordinary
fixed-point types.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_imgv.adb  |  7 +++
 gcc/ada/libgnat/a-tifiio.adb  |  6 --
 gcc/ada/libgnat/a-tifiio__128.adb |  9 -
 gcc/ada/libgnat/a-wtfiio.adb  |  6 --
 gcc/ada/libgnat/a-wtfiio__128.adb |  9 -
 gcc/ada/libgnat/a-ztfiio.adb  |  6 --
 gcc/ada/libgnat/a-ztfiio__128.adb |  9 -
 gcc/ada/libgnat/s-valuef.adb  | 16 ++--
 8 files changed, 13 insertions(+), 55 deletions(-)
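
Note (not part of the patch): the adjusted test in Expand_Value_Attribute
can be read as a small selection function. The C++ sketch below mirrors only
the 32-bit and 64-bit branches of the new condition (the 128-bit branch is
omitted so that plain 64-bit integers suffice). Routine and
select_value_routine are hypothetical names used for illustration.

```cpp
#include <algorithm>
#include <cstdint>

enum class Routine { Fixed32, Fixed64, Other };

// num and den play the role of the normalized numerator and denominator of
// the Small; size_bits plays the role of the Esize of the type.
Routine select_value_routine(unsigned size_bits, std::uint64_t num,
                             std::uint64_t den) {
  const std::uint64_t max = std::max(num, den);
  const auto pow2 = [](unsigned n) { return std::uint64_t{1} << n; };

  // New condition: the relaxed bound applies only when the numerator is 1
  // (the old code tested whether the smaller of num and den was 1).
  if (size_bits <= 32 && max <= pow2(31) && (num == 1 || max <= pow2(27)))
    return Routine::Fixed32;
  if (size_bits <= 64 && max <= pow2(63) && (num == 1 || max <= pow2(59)))
    return Routine::Fixed64;
  return Routine::Other;
}
```

Under this reading, a 32-bit type whose Small is 1/2**30 still selects the
32-bit routine, whereas a Small of 2**30/1 no longer does and falls through
to the 64-bit test.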

diff --git a/gcc/ada/exp_imgv.adb b/gcc/ada/exp_imgv.adb
index c7cf06ba444..6c2b940736b 100644
--- a/gcc/ada/exp_imgv.adb
+++ b/gcc/ada/exp_imgv.adb
@@ -1640,23 +1640,22 @@ package body Exp_Imgv is
 Num : constant Uint := Norm_Num (Small_Value (Rtyp));
 Den : constant Uint := Norm_Den (Small_Value (Rtyp));
 Max : constant Uint := UI_Max (Num, Den);
-Min : constant Uint := UI_Min (Num, Den);
 Siz : constant Uint := Esize (Rtyp);
 
  begin
 if Siz <= 32
   and then Max <= Uint_2 ** 31
-  and then (Min = Uint_1 or else Max <= Uint_2 ** 27)
+  and then (Num = Uint_1 or else Max <= Uint_2 ** 27)
 then
Vid := RE_Value_Fixed32;
 elsif Siz <= 64
   and then Max <= Uint_2 ** 63
-  and then (Min = Uint_1 or else Max <= Uint_2 ** 59)
+  and then (Num = Uint_1 or else Max <= Uint_2 ** 59)
 then
Vid := RE_Value_Fixed64;
 elsif System_Max_Integer_Size = 128
   and then Max <= Uint_2 ** 127
-  and then (Min = Uint_1 or else Max <= Uint_2 ** 123)
+  and then (Num = Uint_1 or else Max <= Uint_2 ** 123)
 then
Vid := RE_Value_Fixed128;
 else
diff --git a/gcc/ada/libgnat/a-tifiio.adb b/gcc/ada/libgnat/a-tifiio.adb
index 735859c3f15..26f04ed726e 100644
--- a/gcc/ada/libgnat/a-tifiio.adb
+++ b/gcc/ada/libgnat/a-tifiio.adb
@@ -194,9 +194,6 @@ package body Ada.Text_IO.Fixed_IO with SPARK_Mode => Off is
  ((Num'Base'Small_Numerator = 1
 and then Num'Base'Small_Denominator <= 2**31)
or else
-  (Num'Base'Small_Denominator = 1
-and then Num'Base'Small_Numerator <= 2**31)
-   or else
   (Num'Base'Small_Numerator <= 2**27
 and then Num'Base'Small_Denominator <= 2**27));
--  These conditions are derived from the prerequisites of System.Value_F
@@ -223,9 +220,6 @@ package body Ada.Text_IO.Fixed_IO with SPARK_Mode => Off is
  ((Num'Base'Small_Numerator = 1
 and then Num'Base'Small_Denominator <= 2**63)
or else
-  (Num'Base'Small_Denominator = 1
-and then Num'Base'Small_Numerator <= 2**63)
-   or else
   (Num'Base'Small_Numerator <= 2**59
 and then Num'Base'Small_Denominator <= 2**59));
--  These conditions are derived from the prerequisites of System.Value_F
diff --git a/gcc/ada/libgnat/a-tifiio__128.adb 
b/gcc/ada/libgnat/a-tifiio__128.adb
index 7424346fe3d..78c25f29bc8 100644
--- a/gcc/ada/libgnat/a-tifiio__128.adb
+++ b/gcc/ada/libgnat/a-tifiio__128.adb
@@ -201,9 +201,6 @@ package body Ada.Text_IO.Fixed_IO with SPARK_Mode => Off is
  ((Num'Base'Small_Numerator = 1
 and then Num'Base'Small_Denominator <= 2**31)
or else
-  (Num'Base'Small_Denominator = 1
-and then Num'Base'Small_Numerator <= 2**31)
-   or else
   (Num'Base'Small_Numerator <= 2**27
 and then Num'Base'Small_Denominator <= 2**27));
--  These conditions are derived f

[COMMITTED 15/42] ada: Cleanup in type support subprograms code

2025-07-03 Thread Marc Poulhiès
From: Piotr Trojanek 

Code cleanup; semantics is unaffected.

gcc/ada/ChangeLog:

* exp_tss.adb (TSS): Refactor IF condition to make code smaller.
* lib.adb (Increment_Serial_Number, Synchronize_Serial_Number):
Use type of renamed object when creating renaming.
* lib.ads (Unit_Record): Refine subtype of dependency number.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_tss.adb | 10 +++---
 gcc/ada/lib.adb |  4 ++--
 gcc/ada/lib.ads |  2 +-
 3 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/gcc/ada/exp_tss.adb b/gcc/ada/exp_tss.adb
index 89bcd29c5ad..89af166a654 100644
--- a/gcc/ada/exp_tss.adb
+++ b/gcc/ada/exp_tss.adb
@@ -504,13 +504,9 @@ package body Exp_Tss is
   Subp : Entity_Id;
 
begin
-  if No (FN) then
- return Empty;
-
-  elsif No (TSS_Elist (FN)) then
- return Empty;
-
-  else
+  if Present (FN)
+and then Present (TSS_Elist (FN))
+  then
  Elmt := First_Elmt (TSS_Elist (FN));
  while Present (Elmt) loop
 if Is_TSS (Node (Elmt), Nam) then
diff --git a/gcc/ada/lib.adb b/gcc/ada/lib.adb
index a727f48c611..3fd9540acb3 100644
--- a/gcc/ada/lib.adb
+++ b/gcc/ada/lib.adb
@@ -1062,7 +1062,7 @@ package body Lib is
-
 
function Increment_Serial_Number return Nat is
-  TSN : Int renames Units.Table (Current_Sem_Unit).Serial_Number;
+  TSN : Nat renames Units.Table (Current_Sem_Unit).Serial_Number;
begin
   TSN := TSN + 1;
   return TSN;
@@ -1223,7 +1223,7 @@ package body Lib is
---
 
procedure Synchronize_Serial_Number (SN : Nat) is
-  TSN : Int renames Units.Table (Current_Sem_Unit).Serial_Number;
+  TSN : Nat renames Units.Table (Current_Sem_Unit).Serial_Number;
begin
   --  We should not be trying to synchronize downward
 
diff --git a/gcc/ada/lib.ads b/gcc/ada/lib.ads
index a085aa7f19f..928f6f840c8 100644
--- a/gcc/ada/lib.ads
+++ b/gcc/ada/lib.ads
@@ -852,7 +852,7 @@ private
   Source_Index   : Source_File_Index;
   Cunit  : Node_Id;
   Cunit_Entity   : Entity_Id;
-  Dependency_Num : Int;
+  Dependency_Num : Nat;
   Ident_String   : Node_Id;
   Main_Priority  : Int;
   Main_CPU   : Int;
-- 
2.43.0



[COMMITTED 37/42] ada: Fix check for elaboration order on subprogram body stubs

2025-07-03 Thread Marc Poulhiès
From: Piotr Trojanek 

Fix an assertion failure occurring when elaboration checks were applied to
a subprogram with a separate body.

gcc/ada/ChangeLog:

* sem_elab.adb (Check_Overriding_Primitive): Find early call region
of the subprogram body declaration, not of the subprogram body stub.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_elab.adb | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/sem_elab.adb b/gcc/ada/sem_elab.adb
index 6547813a8e2..77b1e120b80 100644
--- a/gcc/ada/sem_elab.adb
+++ b/gcc/ada/sem_elab.adb
@@ -15236,7 +15236,15 @@ package body Sem_Elab is
 end if;
 
 Body_Decl := Unit_Declaration_Node (Body_Id);
-Region:= Find_Early_Call_Region (Body_Decl);
+
+--  For subprogram bodies in subunits we check where the subprogram
+--  body stub is declared.
+
+if Nkind (Parent (Body_Decl)) = N_Subunit then
+   Body_Decl := Corresponding_Stub (Parent (Body_Decl));
+end if;
+
+Region := Find_Early_Call_Region (Body_Decl);
 
 --  The freeze node appears prior to the early call region of the
 --  primitive body.
-- 
2.43.0



[COMMITTED 18/42] ada: Fix typo in comment

2025-07-03 Thread Marc Poulhiès
From: Ronan Desplanques 

gcc/ada/ChangeLog:

* exp_ch7.adb (Build_Record_Deep_Procs): Fix typo in comment.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch7.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
index 381294b05d6..95a790e5cee 100644
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -436,7 +436,7 @@ package body Exp_Ch7 is
 
procedure Build_Record_Deep_Procs (Typ : Entity_Id);
--  Build the deep Initialize/Adjust/Finalize for a record Typ with
-   --  Has_Component_Component set and store them using the TSS mechanism.
+   --  Has_Controlled_Component set and store them using the TSS mechanism.
 

-- Transient Scope Management --
-- 
2.43.0



[COMMITTED 35/42] ada: Refine sanity check in Insert_Actions

2025-07-03 Thread Marc Poulhiès
From: Ronan Desplanques 

Insert_Actions performs a sanity check when it goes through an
expression with actions while going up the tree. That check was not
quite right before this patch and spuriously failed when inserting
range checks in some situations. This patch makes the check more robust.

gcc/ada/ChangeLog:

* exp_util.adb (Insert_Actions): Fix check.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_util.adb | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb
index 90778910e99..4f987790405 100644
--- a/gcc/ada/exp_util.adb
+++ b/gcc/ada/exp_util.adb
@@ -8167,21 +8167,21 @@ package body Exp_Util is
 --  never climb up as far as the N_Expression_With_Actions itself.
 
 when N_Expression_With_Actions =>
-   if N = Expression (P) then
-  if Is_Empty_List (Actions (P)) then
- Append_List_To (Actions (P), Ins_Actions);
- Analyze_List (Actions (P));
-  else
- Insert_List_After_And_Analyze
-   (Last (Actions (P)), Ins_Actions);
-  end if;
-
-  return;
-
-   else
+   if Is_List_Member (N) and then List_Containing (N) = Actions (P)
+   then
   raise Program_Error;
end if;
 
+   if Is_Empty_List (Actions (P)) then
+  Append_List_To (Actions (P), Ins_Actions);
+  Analyze_List (Actions (P));
+   else
+  Insert_List_After_And_Analyze
+(Last (Actions (P)), Ins_Actions);
+   end if;
+
+   return;
+
 --  Case of appearing in the condition of a while expression or
 --  elsif. We insert the actions into the Condition_Actions field.
 --  They will be moved further out when the while loop or elsif
-- 
2.43.0



[COMMITTED 20/42] ada: Remove a couple of redundant calls to Set_Etype

2025-07-03 Thread Marc Poulhiès
From: Eric Botcazou 

The OK_Convert_To function already sets the Etype of its result.

gcc/ada/ChangeLog:

* exp_imgv.adb (Expand_Value_Attribute): Do not call Set_Etype on N
after rewriting it by means of OK_Convert_To.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_imgv.adb | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/ada/exp_imgv.adb b/gcc/ada/exp_imgv.adb
index 6c2b940736b..3fef6fabe64 100644
--- a/gcc/ada/exp_imgv.adb
+++ b/gcc/ada/exp_imgv.adb
@@ -1631,7 +1631,6 @@ package body Exp_Imgv is
Name => New_Occurrence_Of (RTE (Vid), Loc),
Parameter_Associations => Args)));
 
- Set_Etype (N, Btyp);
  Analyze_And_Resolve (N, Btyp);
  return;
 
@@ -1675,7 +1674,6 @@ package body Exp_Imgv is
  Name => New_Occurrence_Of (RTE (Vid), Loc),
  Parameter_Associations => Args)));
 
-   Set_Etype (N, Btyp);
Analyze_And_Resolve (N, Btyp);
return;
 end if;
-- 
2.43.0



[COMMITTED 41/42] ada: Enforce alignment constraint for large Object_Size clauses

2025-07-03 Thread Marc Poulhiès
From: Eric Botcazou 

The constraint is that the Object_Size must be a multiple of the alignment
in bits.  But it's enforced only when the value of the clause is lower than
the Value_Size rounded up to the alignment in bits, not for larger values.

gcc/ada/ChangeLog:

* gcc-interface/decl.cc (gnat_to_gnu_entity): Use default messages
for errors reported for Object_Size clauses.
(validate_size): Give an error for stand-alone objects of composite
types if the specified size is not a multiple of the alignment.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/decl.cc | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)
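
Note (not part of the patch): the constraint itself is plain modular
arithmetic on sizes expressed in bits. A minimal C++ sketch follows; the
helper names round_up and object_size_ok are hypothetical, and the sketch
does not model how gigi decides which objects the check applies to.

```cpp
#include <cstdint>

// Round value up to the next multiple of align (both in bits, align > 0).
constexpr std::uint64_t round_up(std::uint64_t value, std::uint64_t align) {
  return (value + align - 1) / align * align;
}

// The constraint: the Object_Size must be a multiple of the alignment in
// bits, whether or not it exceeds the rounded-up Value_Size.
constexpr bool object_size_ok(std::uint64_t size_bits,
                              std::uint64_t align_bits) {
  return size_bits % align_bits == 0;
}
```

For example, with a Value_Size of 40 bits and an alignment of 32 bits, the
rounded-up Value_Size is 64. A clause of 72 bits is larger than that, so
according to the description above it escaped the old check, yet 72 is not
a multiple of 32; a clause of 96 bits satisfies the constraint.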

diff --git a/gcc/ada/gcc-interface/decl.cc b/gcc/ada/gcc-interface/decl.cc
index 1d9832d69ad..27d2cea1f3d 100644
--- a/gcc/ada/gcc-interface/decl.cc
+++ b/gcc/ada/gcc-interface/decl.cc
@@ -4502,7 +4502,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, 
bool definition)
  if (Known_Esize (gnat_entity))
gnu_size
  = validate_size (Esize (gnat_entity), gnu_type, gnat_entity,
-  VAR_DECL, false, false, size_s, type_s);
+  VAR_DECL, false, false, NULL, NULL);
 
  /* ??? The test on Has_Size_Clause must be removed when "unknown" is
 no longer represented as Uint_0 (i.e. Use_New_Unknown_Rep).  */
@@ -9696,6 +9696,20 @@ validate_size (Uint uint_size, tree gnu_type, Entity_Id 
gnat_object,
   return NULL_TREE;
 }
 
+  /* The size of stand-alone objects is always a multiple of the alignment,
+ but that's already enforced for elementary types by the front-end.  */
+  if (kind == VAR_DECL
+  && !component_p
+  && RECORD_OR_UNION_TYPE_P (gnu_type)
+  && !TYPE_FAT_POINTER_P (gnu_type)
+  && !integer_zerop (size_binop (TRUNC_MOD_EXPR, size,
+bitsize_int (TYPE_ALIGN (gnu_type)))))
+{
+  post_error_ne_num ("size for& must be multiple of alignment ^",
+gnat_error_node, gnat_object, TYPE_ALIGN (gnu_type));
+  return NULL_TREE;
+}
+
   return size;
 }
 
-- 
2.43.0



[COMMITTED 24/42] ada: Turn diagnostic object from variable to constant

2025-07-03 Thread Marc Poulhiès
From: Piotr Trojanek 

Diagnostic entries are not supposed to be modified while compiling the code.
Code cleanup; behavior is unaffected.

gcc/ada/ChangeLog:

* errid.ads (Diagnostic_Entries): Now a constant.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/errid.ads | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/errid.ads b/gcc/ada/errid.ads
index 56516d028bc..acabc3b1dfe 100644
--- a/gcc/ada/errid.ads
+++ b/gcc/ada/errid.ads
@@ -76,7 +76,7 @@ package Errid is
--- Optionally additional information
--TODO: the mandatory fields for the documentation file could be changed
 
-   Diagnostic_Entries : Diagnostics_Registry_Type :=
+   Diagnostic_Entries : constant Diagnostics_Registry_Type :=
  (No_Diagnostic_Id => <>,
   GNAT0001 =>
 (Status=> Active,
-- 
2.43.0



[PATCH] x86: Emit label only for __mcount_loc section

2025-07-03 Thread H.J. Lu
commit ecc81e33123d7ac9c11742161e128858d844b99d (HEAD)
Author: Andi Kleen 
Date:   Fri Sep 26 04:06:40 2014 +

Add direct support for Linux kernel __fentry__ patching

emitted a label, 1, for __mcount_loc section:

1: call mcount
.section __mcount_loc, "a",@progbits
.quad 1b
.previous

If __mcount_loc wasn't used, we got an unused label.  Update
x86_function_profiler to emit label only when __mcount_loc section
is used.

gcc/

PR target/120936
* config/i386/i386.cc (x86_print_call_or_nop): Add a label
argument and use it to print label.
(x86_function_profiler): Emit label only when __mcount_loc
section is used.

gcc/testsuite/

PR target/120936
* gcc.target/i386/pr120936-1.c: New test.
* gcc.target/i386/pr120936-2.c: Likewise.
* gcc.target/i386/pr120936-3.c: Likewise.
* gcc.target/i386/pr120936-4.c: Likewise.
* gcc.target/i386/pr120936-5.c: Likewise.
* gcc.target/i386/pr120936-6.c: Likewise.
* gcc.target/i386/pr120936-7.c: Likewise.
* gcc.target/i386/pr120936-8.c: Likewise.
* gcc.target/i386/pr120936-9.c: Likewise.
* gcc.target/i386/pr120936-10.c: Likewise.
* gcc.target/i386/pr120936-11.c: Likewise.
* gcc.target/i386/pr120936-12.c: Likewise.
* gcc.target/i386/pr93492-3.c: Updated.
* gcc.target/i386/pr93492-5.c: Likewise.

OK for master?

Thanks.

-- 
H.J.
From dee88812fc0ca372107224fb3460e133efe8822d Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 3 Jul 2025 10:13:48 +0800
Subject: [PATCH] x86: Emit label only for __mcount_loc section

commit ecc81e33123d7ac9c11742161e128858d844b99d (HEAD)
Author: Andi Kleen 
Date:   Fri Sep 26 04:06:40 2014 +

Add direct support for Linux kernel __fentry__ patching

emitted a label, 1, for __mcount_loc section:

1:	call	mcount
	.section __mcount_loc, "a",@progbits
	.quad 1b
	.previous

If __mcount_loc wasn't used, we got an unused label.  Update
x86_function_profiler to emit label only when __mcount_loc section
is used.

gcc/

	PR target/120936
	* config/i386/i386.cc (x86_print_call_or_nop): Add a label
	argument and use it to print label.
	(x86_function_profiler): Emit label only when __mcount_loc
	section is used.

gcc/testsuite/

	PR target/120936
	* gcc.target/i386/pr120936-1.c: New test.
	* gcc.target/i386/pr120936-2.c: Likewise.
	* gcc.target/i386/pr120936-3.c: Likewise.
	* gcc.target/i386/pr120936-4.c: Likewise.
	* gcc.target/i386/pr120936-5.c: Likewise.
	* gcc.target/i386/pr120936-6.c: Likewise.
	* gcc.target/i386/pr120936-7.c: Likewise.
	* gcc.target/i386/pr120936-8.c: Likewise.
	* gcc.target/i386/pr120936-9.c: Likewise.
	* gcc.target/i386/pr120936-10.c: Likewise.
	* gcc.target/i386/pr120936-11.c: Likewise.
	* gcc.target/i386/pr120936-12.c: Likewise.
	* gcc.target/i386/pr93492-3.c: Updated.
	* gcc.target/i386/pr93492-5.c: Likewise.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386.cc | 55 +
 gcc/testsuite/gcc.target/i386/pr120936-1.c  | 18 +++
 gcc/testsuite/gcc.target/i386/pr120936-10.c | 23 +
 gcc/testsuite/gcc.target/i386/pr120936-11.c | 19 +++
 gcc/testsuite/gcc.target/i386/pr120936-12.c | 23 +
 gcc/testsuite/gcc.target/i386/pr120936-2.c  | 18 +++
 gcc/testsuite/gcc.target/i386/pr120936-3.c  | 18 +++
 gcc/testsuite/gcc.target/i386/pr120936-4.c  | 18 +++
 gcc/testsuite/gcc.target/i386/pr120936-5.c  | 18 +++
 gcc/testsuite/gcc.target/i386/pr120936-6.c  | 18 +++
 gcc/testsuite/gcc.target/i386/pr120936-7.c  | 18 +++
 gcc/testsuite/gcc.target/i386/pr120936-8.c  | 18 +++
 gcc/testsuite/gcc.target/i386/pr120936-9.c  | 19 +++
 gcc/testsuite/gcc.target/i386/pr93492-3.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr93492-5.c   |  2 +-
 15 files changed, 264 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120936-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120936-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120936-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120936-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120936-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120936-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120936-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120936-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120936-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120936-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120936-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120936-9.c
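
Note (not part of the patch): the intended output can be sketched as a small
C++ function that builds the assembly text. The function profiler_asm and
its record_mcount parameter are hypothetical; record_mcount merely stands
for "the __mcount_loc section is used", and the real logic lives in
x86_function_profiler and x86_print_call_or_nop.

```cpp
#include <string>

// Build the profiling sequence. The local label "1" is emitted only when
// the __mcount_loc section, which refers back to it via "1b", is emitted.
std::string profiler_asm(bool record_mcount) {
  std::string out;
  if (record_mcount)
    out += "1:";
  out += "\tcall\tmcount\n";
  if (record_mcount) {
    out += "\t.section __mcount_loc, \"a\",@progbits\n";
    out += "\t.quad 1b\n";
    out += "\t.previous\n";
  }
  return out;
}
```

Without the section, the result is just the call with no label; with it,
the label and the ".quad 1b" reference appear together.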

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 9657c6ae31f..5c888b52c1c 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23685,19 +23685,21 @@ x86_field_alignment (tree type, int computed)
 /* Print call to TARGET to FILE.  */
 
 static void
-x86_print_call_or_nop (FILE *file, const char *target)
+x86_print_call_or_nop (FILE *file, const char *target,
+		   const char *label)
 {
   if (flag_nop_mcount || !strcmp (target, "nop"))
 /* 5 byte nop: nopl 0(%[re]ax,%[re]ax,1) */
-

Re: [PATCH v8 0/9] AArch64: CMPBR support

2025-07-03 Thread Karl Meakin



On 02/07/2025 18:45, Karl Meakin wrote:

This patch series adds support for the CMPBR extension. It includes the
new `+cmpbr` option and rules to generate the new instructions when
lowering conditional branches.

Changelog:
* v8:
   - Support far branches for the `CBB` and `CBH` instructions, and add tests 
for them.
   - Mark the branch in the far branch tests likely, so that the optimizer does
 not invert the condition.
   - Use regex captures for register and label names so that the tests are less 
fragile.
   - Minor formatting fixes.
* v7:
   - Support far branches and add a test for them.
   - Replace `aarch64_cb_short_operand` with `aarch64_reg_or_zero_operand`.
   - Delete the new predicates that aren't needed anymore.
   - Minor formatting and comment fixes.
* v6:
   - Correct the constraint string for immediate operands.
   - Drop the commit for adding `%j` format specifiers. The suffix for
 the `cb` instruction is now calculated by the `cmp_op` code
 attribute.
* v5:
   - Moved patch 10/10 (adding %j ...) before patch 8/10 (rules for
 CMPBR...). Every commit in the series should now produce a correct
 compiler.
   - Reduce excessive diff context by not passing `--function-context` to
 `git format-patch`.
* v4:
   - Added a commit to use HS/LO instead of CS/CC mnemonics.
   - Rewrite the range checks for immediate RHSes in aarch64.cc: CBGE,
 CBHS, CBLE and CBLS have different ranges of allowed immediates than
 the other comparisons.

Karl Meakin (9):
   AArch64: place branch instruction rules together
   AArch64: reformat branch instruction rules
   AArch64: rename branch instruction rules
   AArch64: add constants for branch displacements
   AArch64: make `far_branch` attribute a boolean
   AArch64: recognize `+cmpbr` option
   AArch64: precommit test for CMPBR instructions
   AArch64: rules for CMPBR instructions
   AArch64: make rules for CBZ/TBZ higher priority

  .../aarch64/aarch64-option-extensions.def |2 +
  gcc/config/aarch64/aarch64-protos.h   |2 +
  gcc/config/aarch64/aarch64-simd.md|2 +-
  gcc/config/aarch64/aarch64-sme.md |2 +-
  gcc/config/aarch64/aarch64.cc |   39 +-
  gcc/config/aarch64/aarch64.h  |3 +
  gcc/config/aarch64/aarch64.md |  570 --
  gcc/config/aarch64/constraints.md |   18 +
  gcc/config/aarch64/iterators.md   |   30 +
  gcc/doc/invoke.texi   |3 +
  gcc/testsuite/gcc.target/aarch64/cmpbr-far.c  |   52 +
  gcc/testsuite/gcc.target/aarch64/cmpbr.c  | 1824 +
  gcc/testsuite/gcc.target/aarch64/cmpbr.h  |   16 +
  .../gcc.target/aarch64/sve/mask_store.c   |   28 +
  gcc/testsuite/gcc.target/aarch64/sve/sqlite.c |  205 ++
  gcc/testsuite/lib/target-supports.exp |   14 +-
  16 files changed, 2586 insertions(+), 224 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr-far.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.h
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/mask_store.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/sqlite.c

--
2.48.1


I accidentally committed some WIP files that shouldn't have been
included (cmpbr-far.c, cmpbr.h, mask_store.c, sqlite.c). Please ignore
them; I have removed them in the next revision (v9).



Re: [PATCH v2 4/5] libstdc++: Implement mdspan and tests.

2025-07-03 Thread Luc Grosheintz

Thank you for the nice review! I've locally implemented everything and
I'll send a v3 later today or tomorrow, after squashing the commits
correctly and retesting everything.

Meanwhile a couple of comments below.

On 7/1/25 16:42, Tomasz Kaminski wrote:

On Fri, Jun 27, 2025 at 11:37 AM Luc Grosheintz 
wrote:


Implements the class mdspan as described in N4950, i.e. without P3029.
It also adds tests for mdspan.

libstdc++-v3/ChangeLog:

 * include/std/mdspan (mdspan): New class.
 * src/c++23/std.cc.in: Add std::mdspan.
 * testsuite/23_containers/mdspan/class_mandate_neg.cc: New test.
 * testsuite/23_containers/mdspan/mdspan.cc: New test.
 * testsuite/23_containers/mdspan/layout_like.h: Add class
 LayoutLike which models a user-defined layout.

Signed-off-by: Luc Grosheintz 
---


As usual, a really solid implementation. A few additional comments:
* use () to value-initialize in ctor initializer list
* redundant parentheses in requires clauses
* suggestion to add __mdspan::__size
* a few suggestions for tests

  libstdc++-v3/include/std/mdspan   | 282 +

  libstdc++-v3/src/c++23/std.cc.in  |   3 +-
  .../23_containers/mdspan/class_mandate_neg.cc |  58 ++
  .../23_containers/mdspan/layout_like.h|  63 ++
  .../testsuite/23_containers/mdspan/mdspan.cc  | 540 ++
  5 files changed, 945 insertions(+), 1 deletion(-)
  create mode 100644
libstdc++-v3/testsuite/23_containers/mdspan/class_mandate_neg.cc
  create mode 100644
libstdc++-v3/testsuite/23_containers/mdspan/layout_like.h
  create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/mdspan.cc

diff --git a/libstdc++-v3/include/std/mdspan
b/libstdc++-v3/include/std/mdspan
index e198d65bba3..852f881971e 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -1052,6 +1052,288 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{ return __p + __i; }
  };

+  namespace __mdspan
+  {
+template
+  constexpr bool
+  __is_multi_index(const _Extents& __exts, span<_IndexType, _Nm>
__indices)
+  {
+   static_assert(__exts.rank() == _Nm);
+   for (size_t __i = 0; __i < __exts.rank(); ++__i)
+ if (__indices[__i] >= __exts.extent(__i))
+   return false;
+   return true;
+  }
+  }
+
+  template>
+class mdspan
+{
+  static_assert(!is_array_v<_ElementType>,
+   "ElementType must not be an array type");
+  static_assert(!is_abstract_v<_ElementType>,
+   "ElementType must not be an abstract class type");
+  static_assert(__mdspan::__is_extents<_Extents>,
+   "Extents must be a specialization of std::extents");
+  static_assert(is_same_v<_ElementType,
+ typename _AccessorPolicy::element_type>);
+
+public:
+  using extents_type = _Extents;
+  using layout_type = _LayoutPolicy;
+  using accessor_type = _AccessorPolicy;
+  using mapping_type = typename layout_type::template
mapping;
+  using element_type = _ElementType;
+  using value_type = remove_cv_t;
+  using index_type = typename extents_type::index_type;
+  using size_type = typename extents_type::size_type;
+  using rank_type = typename extents_type::rank_type;
+  using data_handle_type = typename accessor_type::data_handle_type;
+  using reference = typename accessor_type::reference;
+
+  static constexpr rank_type
+  rank() noexcept { return extents_type::rank(); }
+
+  static constexpr rank_type
+  rank_dynamic() noexcept { return extents_type::rank_dynamic(); }
+
+  static constexpr size_t
+  static_extent(rank_type __r) noexcept
+  { return extents_type::static_extent(__r); }
+
+  constexpr index_type
+  extent(rank_type __r) const noexcept { return
extents().extent(__r); }
+
+  constexpr
+  mdspan()
+  requires (rank_dynamic() > 0 &&
+ is_default_constructible_v &&
+ is_default_constructible_v &&
+ is_default_constructible_v)
+  : _M_accessor{}, _M_mapping{}, _M_handle{}


Here and in every other constructor, please use () to value-initialize the
field,
for example:
  _M_accessor(), _M_mapping(), _M_handle()


+  { }
+
+  constexpr
+  mdspan(const mdspan& __other) = default;
+
+  constexpr
+  mdspan(mdspan&& __other) = default;
+
+  template<__mdspan::__valid_index_type... _OIndexTypes>
+   requires ((sizeof...(_OIndexTypes) == rank()
+  || sizeof...(_OIndexTypes) == rank_dynamic())
+   && is_constructible_v
+   && is_default_constructible_v)


Here, and in the whole file, the outermost parentheses are not required in
"requires".
Only the one pair of parentheses around the rank check should remain.


Do you have a way of remembering when one does or doesn't need the extra
set of parens?




+   constexpr explicit
+   mdspan(data_handle_type __handle, _OIndexTypes... __exts)
+   : _M_accessor{},
+
  _M_mapping

[PATCH v4 1/2] tree-simplify: unify simple_comparison ops in vec_cond for bit and/or/xor [PR119196]

2025-07-03 Thread Icen Zeyada
Merge simple_comparison patterns under a single vec_cond_expr for bit_and,
bit_ior, and bit_xor in the simplify pass.

Ensure that when both operands of a bit_and, bit_ior, or bit_xor are
simple_comparison results, they reside within the same vec_cond_expr rather
than separate ones.
This prepares the AST so that subsequent transformations (e.g., folding the
comparisons if possible) can take effect.
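
For illustration only (not part of the patch): a scalar, per-lane model of
the rewrite, assuming each vec_cond lane is all-ones when its condition holds
and zero otherwise, as the integer_minus_onep/integer_zerop operands in the
pattern require.  The names lane_select, before and after are hypothetical.

```c
/* Per-lane model of
     (c0 ? -1 : 0) lop (c1 ? -1 : 0)  -->  (c0 lop c1) ? -1 : 0
   for lop in { &, |, ^ }.  All names here are illustrative only.  */

enum lop { LOP_AND, LOP_IOR, LOP_XOR };

/* One lane of a vec_cond selecting between -1 and 0.  */
static int
lane_select (int cond)
{
  return cond ? -1 : 0;
}

/* Apply the bitwise operation to two integers.  */
static int
apply (enum lop op, int a, int b)
{
  switch (op)
    {
    case LOP_AND: return a & b;
    case LOP_IOR: return a | b;
    default:      return a ^ b;
    }
}

/* Shape before the rewrite: two selects, then the bitwise operation.  */
static int
before (enum lop op, int c0, int c1)
{
  return apply (op, lane_select (c0), lane_select (c1));
}

/* Shape after the rewrite: combine the conditions, then one select.
   c0 and c1 are assumed to be 0 or 1.  */
static int
after (enum lop op, int c0, int c1)
{
  return lane_select (apply (op, c0, c1));
}
```

Both shapes agree for every combination of c0 and c1 in {0, 1}, which is
what makes the merge safe lane by lane.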

PR tree-optimization/119196

gcc/ChangeLog:

* match.pd: Merge multiple vec_cond_expr in a single one for
  bit_and, bit_ior and bit_xor.

Signed-off-by: Icen Zeyada 
---
 gcc/match.pd | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index f4416d9172c..36317b9128f 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5939,6 +5939,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && !expand_vec_cond_expr_p (TREE_TYPE (@1), TREE_TYPE (@0)
(vec_cond @0 (op! @1 @3) (op! @2 @4
 
+/* (@0 ? @2 : @3) lop (@1 ? @2 : @3)  -->  (@0 lop @1) ? @2 : @3.  */
+(for lop (bit_and bit_ior bit_xor)
+   (simplify
+   (lop
+  (vec_cond @0 integer_minus_onep@2 integer_zerop@3)
+  (vec_cond @1 @2 @3))
+   (vec_cond (lop @0 @1) @2 @3)))
+
 /* (c ? a : b) op d  -->  c ? (a op d) : (b op d) */
  (simplify
   (op (vec_cond:s @0 @1 @2) @3)
-- 
2.43.0



[PATCH v4 0/2] tree-optimization: extend scalar comparison folding to vectors [PR119196]

2025-07-03 Thread Icen Zeyada


New in V4:
Check whether the vector is of boolean type in specific comparisons.
If it is, determine whether the operation can be expanded using the selected
expression. If so, proceed with the optimization; otherwise, skip it.

---

This patch generalizes existing scalar bitwise comparison simplifications
to vector types by matching patterns of the form

```
(cmp x y) bit_and (cmp x y)
(cmp x y) bit_ior (cmp x y)
(cmp x y) bit_xor (cmp x y)
```

Icen Zeyada (2):
  tree-simplify: unify simple_comparison ops in vec_cond for bit
and/or/xor [PR119196]
  gimple-fold: extend vector simplification to match scalar bitwise
optimizations [PR119196]

 gcc/match.pd  | 65 +++---
 .../gcc.target/aarch64/vector-compare-5.c | 67 +++
 2 files changed, 121 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vector-compare-5.c

-- 
2.43.0



[PATCH v4 2/2] gimple-fold: extend vector simplification to match scalar bitwise optimizations [PR119196]

2025-07-03 Thread Icen Zeyada
Generalize existing scalar gimple_fold rules to apply the same
bitwise comparison simplifications to vector types.  Previously, an
expression like

(x < y) && (x > y)

would fold to `false` if x and y are scalars, but equivalent vector
comparisons were left untouched.  This patch enables folding of
patterns of the form

(cmp x y) bit_and (cmp x y)
(cmp x y) bit_ior (cmp x y)
(cmp x y) bit_xor (cmp x y)

for vector operands as well, ensuring consistent optimization across
all data types.
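
As a rough source-level illustration (not taken from the testcase in this
patch), the patterns in question can be written with the GNU C vector
extension.  The type name v4si and the function names are hypothetical, and
the sketch only shows which values the expressions must produce; whether a
given compiler and target actually fold them is a separate matter.

```c
/* Four 32-bit lanes, GNU C vector extension.  Comparisons on such
   vectors yield -1 in lanes where they hold and 0 elsewhere.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* (x < y) & (x > y): no lane can satisfy both, so every lane is 0.  */
static v4si
lt_and_gt (v4si x, v4si y)
{
  return (x < y) & (x > y);
}

/* (x != y) & (x >= y): lane-wise the same as x > y.  */
static v4si
ne_and_ge (v4si x, v4si y)
{
  return (x != y) & (x >= y);
}

/* (x == y) | (x > y): lane-wise the same as x >= y.  */
static v4si
eq_or_gt (v4si x, v4si y)
{
  return (x == y) | (x > y);
}
```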

PR tree-optimization/119196

gcc/ChangeLog:

  * match.pd: Allow scalar optimizations with bitwise AND/OR/XOR to apply 
to vectors.

gcc/testsuite/ChangeLog:

  * gcc.target/aarch64/vector-compare-5.c: Add new test for vector compare 
simplification.

Signed-off-by: Icen Zeyada 
---
 gcc/match.pd  | 57 +---
 .../gcc.target/aarch64/vector-compare-5.c | 67 +++
 2 files changed, 113 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vector-compare-5.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 36317b9128f..80c02a0ab02 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3674,6 +3674,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(if ((TREE_CODE (@1) == INTEGER_CST
 && TREE_CODE (@2) == INTEGER_CST)
|| ((INTEGRAL_TYPE_P (TREE_TYPE (@1))
+   || (VECTOR_TYPE_P (TREE_TYPE (@1))
+   && expand_vec_cmp_expr_p (TREE_TYPE (@1), type, code2))
 || POINTER_TYPE_P (TREE_TYPE (@1)))
&& bitwise_equal_p (@1, @2)))
 (with
@@ -3712,27 +3714,39 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (code1 == EQ_EXPR && val) @3)
   (if (code1 == EQ_EXPR && !val) { constant_boolean_node (false, type); })
   (if (code1 == NE_EXPR && !val && allbits) @4)
-  (if (code1 == NE_EXPR
+  (if ((code1 == NE_EXPR
&& code2 == GE_EXPR
   && cmp == 0
   && allbits)
+  && ((VECTOR_BOOLEAN_TYPE_P (type)
+  && expand_vec_cmp_expr_p (TREE_TYPE (@1), type, GT_EXPR))
+  || !VECTOR_TYPE_P (TREE_TYPE (@1
(gt @c0 (convert @1)))
-  (if (code1 == NE_EXPR
+  (if ((code1 == NE_EXPR
&& code2 == LE_EXPR
   && cmp == 0
   && allbits)
+  && ((VECTOR_BOOLEAN_TYPE_P (type)
+  && expand_vec_cmp_expr_p (TREE_TYPE (@1), type, LT_EXPR))
+  || !VECTOR_TYPE_P (TREE_TYPE (@1
(lt @c0 (convert @1)))
   /* (a != (b+1)) & (a > b) -> a > (b+1) */
-  (if (code1 == NE_EXPR
+  (if ((code1 == NE_EXPR
&& code2 == GT_EXPR
   && one_after
   && allbits)
+  && ((VECTOR_BOOLEAN_TYPE_P (type)
+  && expand_vec_cmp_expr_p (TREE_TYPE (@1), type, GT_EXPR))
+  || !VECTOR_TYPE_P (TREE_TYPE (@1
(gt @c0 (convert @1)))
   /* (a != (b-1)) & (a < b) -> a < (b-1) */
-  (if (code1 == NE_EXPR
+  (if ((code1 == NE_EXPR
&& code2 == LT_EXPR
   && one_before
   && allbits)
+  && ((VECTOR_BOOLEAN_TYPE_P (type)
+  && expand_vec_cmp_expr_p (TREE_TYPE (@1), type, LT_EXPR))
+  || !VECTOR_TYPE_P (TREE_TYPE (@1
(lt @c0 (convert @1)))
  )
 )
@@ -3751,6 +3765,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if ((TREE_CODE (@1) == INTEGER_CST
&& TREE_CODE (@2) == INTEGER_CST)
|| ((INTEGRAL_TYPE_P (TREE_TYPE (@1))
+   || (VECTOR_TYPE_P (TREE_TYPE (@1))
+   && expand_vec_cmp_expr_p (TREE_TYPE (@1), type, code2))
|| POINTER_TYPE_P (TREE_TYPE (@1)))
   && operand_equal_p (@1, @2)))
(with
@@ -3801,6 +3817,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(if ((TREE_CODE (@1) == INTEGER_CST
 && TREE_CODE (@2) == INTEGER_CST)
|| ((INTEGRAL_TYPE_P (TREE_TYPE (@1))
+   || (VECTOR_TYPE_P (TREE_TYPE (@1)))
|| POINTER_TYPE_P (TREE_TYPE (@1)))
&& bitwise_equal_p (@1, @2)))
 (with
@@ -3842,24 +3859,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (code1 == EQ_EXPR
&& code2 == GT_EXPR
   && cmp == 0
-  && allbits)
+  && allbits
+  && ((VECTOR_BOOLEAN_TYPE_P (type)
+  && expand_vec_cmp_expr_p (TREE_TYPE (@1), type, GE_EXPR))
+  || !VECTOR_TYPE_P (TREE_TYPE (@1
(ge @c0 @2))
   (if (code1 == EQ_EXPR
&& code2 == LT_EXPR
   && cmp == 0
-  && allbits)
+  && allbits
+  && ((VECTOR_BOOLEAN_TYPE_P (type)
+  && expand_vec_cmp_expr_p (TREE_TYPE (@1), type, LE_EXPR))
+  || !VECTOR_TYPE_P (TREE_TYPE (@1
(le @c0 @2))
   /* (a == (b-1)) | (a >= b) -> a >= (b-1) */
   (if (code1 == EQ_EXPR
&& code2 == GE_EXPR
   && one_before
-  && allbits)
+  && allbits
+  && ((VECTOR_BOOLEAN_TYPE_P (type)
+  && expand_vec_cmp_expr_p (TREE_TYPE (@1), type, GE_EXPR))
+  || !VECTOR_TYPE_P (TREE_TYPE (@1
(ge @c0 (convert @1)

Re: [PATCH] Add string_slice class.

2025-07-03 Thread Richard Sandiford
Alfie Richards  writes:
> +/* string_slice inherits from array_slice, specifically to refer to a 
> substring
> +   of a character array.
> +   It includes some string like helpers.  */
> +class string_slice : public array_slice
> +{
> +public:
> +  string_slice () : array_slice () {}
> +  string_slice (const char *str) : array_slice (str, strlen (str)) {}
> +  explicit string_slice (const char *str, size_t len) :
> +array_slice (str, len) {}
> +  explicit string_slice (const char *start, const char *end) :
> +array_slice (start, end - start) {}

Sorry for the formatting nits, but I think the usual style is to put
the : on the next line:

  explicit string_slice (const char *str, size_t len)
: array_slice (str, len) {}
  explicit string_slice (const char *start, const char *end)
: array_slice (start, end - start) {}

> +
> +  friend bool operator== (const string_slice &lhs, const string_slice &rhs)
> +  {
> +if (!lhs.is_valid () || !rhs.is_valid ())
> +  return false;
> +if (lhs.size () != rhs.size ())
> +  return false;
> +return memcmp (lhs.begin (), rhs.begin (), lhs.size ()) == 0;
> +  }
> +
> +  friend bool operator!= (const string_slice &lhs, const string_slice &rhs)
> +  {
> +return !(lhs == rhs);
> +  }
> +
> +  /* Returns an invalid string_slice.  */
> +  static string_slice invalid ()
> +  {
> +return string_slice (nullptr, ~0U);
> +  }
> +
> +  /* tokenize is used to split a string by some deliminator into
> + strtok_slice's.  Similarly to the posix strtok_r.but without modifying 
> the

string_slices

OK with those changes, thanks.

Richard

> + input string, and returning all tokens which may be empty in the case
> + of an empty input string of consecutive deliminators.  */
> +  static string_slice tokenize (string_slice *str, string_slice delims);
> +
> +  /* Removes white space from the front and back of the string_slice.  */
> +  string_slice strip ();
> +
> +  /* Compares two string_slices in lexographical ordering.  */
> +  static int strcmp (string_slice str1, string_slice str2);
> +};
> +
>  #endif // GCC_VEC_H
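
For readers of the archive, the behaviour described in the quoted tokenize
comment (split on delimiters without modifying the input, keeping empty
tokens) can be sketched in plain C.  This is only an illustration under
assumptions: struct slice and next_token are hypothetical names, not the
string_slice interface from the patch, and a null ptr stands in for an
invalid slice.

```c
#include <stddef.h>
#include <string.h>

/* A (pointer, length) view of characters; ptr == NULL means invalid.  */
struct slice
{
  const char *ptr;
  size_t len;
};

/* Return the next token of *REST and advance *REST past it and past one
   delimiter.  The input characters are never written to.  Consecutive
   delimiters produce empty tokens; once the input is exhausted, *REST
   becomes invalid and the following call returns an invalid slice.  */
static struct slice
next_token (struct slice *rest, const char *delims)
{
  struct slice tok = { NULL, 0 };
  if (rest->ptr == NULL)
    return tok;

  size_t ndelims = strlen (delims);
  size_t i = 0;
  while (i < rest->len && memchr (delims, rest->ptr[i], ndelims) == NULL)
    i++;

  tok.ptr = rest->ptr;
  tok.len = i;
  if (i < rest->len)
    {
      /* Skip the token and the single delimiter that ended it.  */
      rest->ptr += i + 1;
      rest->len -= i + 1;
    }
  else
    {
      rest->ptr = NULL;
      rest->len = 0;
    }
  return tok;
}
```

Calling next_token repeatedly on "a,,b" with "," as the delimiter set gives
"a", an empty token, "b", and then an invalid slice.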


Re: [PATCH v6 1/3][Middle-end] Provide more contexts for -Warray-bounds, -Wstringop-*warning messages due to code movements from compiler transformation (Part 1) [PR109071,PR85788,PR88771,PR106762,PR1

2025-07-03 Thread Qing Zhao
Another update on this:

> On Jun 30, 2025, at 11:51, Qing Zhao  wrote:
>> 
>>>   For each single predecessor block, locate the conditional statement
>>>   in the end of the block. determine whether the STMT is on the taken
>>>   path of the condition. Add these two information to each event of
>>>   the path.  */
>>> 
>>> 
>>> The good news is:  With the above simple heuristic and a simple back 
>>> tracing of  the CFG, all the
>>> current testing cases for the following PRs passed without any issue:
>>> 
>>> PR109071
>>> PR88771
>>> PR85788
>>> PR108770
>>> PR106762
>>> PR115274
>>> PR117179
>> 
>> Nice.  An incremental improvement would be to instead of handling
>> single predecessors, skip to the immediate dominator - this way
>> uninteresting intermediate CFG is skipped.  This might be confusing
>> and the next "step" could be very far away, not sure how to best
>> handle this.  But it would handle intermediate conditional code, like
>> 
>> if (i < 10)
>>   {
>>  if (dollar)
>>printf ("dollar");
>>  else
>>printf ("euro");
>>  printf ("%d", amout[i]);
>>   }
>> 
>> where you'd not able to go up to if (i < 10).
> 
> Yes, immediate dominator should work for such cases. 
>> Maybe a simple heuristic
>> on number of line to not skip too far works, or simply ignore this
>> for now, or simply only use immediate dominators for the first
>> condition to be printed?
> 
> I will experiment a little bit on this.  Do you have any more complicate 
> testing cases? That will be very helpful.

  /* For the following code:

  if (i == -1)
    {
      if (is_dollar)
        printf ("dollar");
      else
        printf ("euro");
      a[i] = -1;
    }
  else
    a[i] = i;

  it has the following CFG:

          B2
         /  \
        V    \
       B3     \
      /  \     \
     V    V     V
    B4    B5    B7
      \  /
       V
       B6

  the STMT is in B6, and the interesting condition for the diagnostic
  is in the edge B2->B3.
  We should locate the immediate dominator of B6, i.e., B3, then check
  whether it has a single predecessor.
  */

If the current block doesn't have a single predecessor, we can get the
immediate dominator of the current block and then backtrace to its single
predecessor to locate the interesting condition.

With this improvement, all the old test cases and these new test cases
work well.
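
A stand-alone sketch of that walk, for the CFG in the comment above only.
It is not the code from the patch: the block numbering, the preds table and
the function names are made up for illustration, and the dominators are
computed with a simple bitmask data-flow rather than GCC's dominance
routines.

```c
/* Blocks of the example CFG; B2 is the entry.  */
enum { B2, B3, B4, B5, B6, B7, NBLOCKS };

/* Predecessor lists, each terminated by -1.  */
static const int preds[NBLOCKS][3] = {
  [B2] = { -1 },
  [B3] = { B2, -1 },
  [B4] = { B3, -1 },
  [B5] = { B3, -1 },
  [B6] = { B4, B5, -1 },
  [B7] = { B2, -1 },
};

static int
num_preds (int b)
{
  int n = 0;
  while (preds[b][n] != -1)
    n++;
  return n;
}

static int
popcount (unsigned v)
{
  int n = 0;
  for (; v; v &= v - 1)
    n++;
  return n;
}

/* dom[b] becomes the set of blocks dominating b, as a bitmask.  */
static void
compute_dom (unsigned dom[NBLOCKS])
{
  const unsigned all = (1u << NBLOCKS) - 1;
  for (int b = 0; b < NBLOCKS; b++)
    dom[b] = all;
  dom[B2] = 1u << B2;
  for (int changed = 1; changed;)
    {
      changed = 0;
      for (int b = 0; b < NBLOCKS; b++)
        {
          if (b == B2)
            continue;
          unsigned d = all;
          for (int i = 0; preds[b][i] != -1; i++)
            d &= dom[preds[b][i]];
          d |= 1u << b;
          if (d != dom[b])
            {
              dom[b] = d;
              changed = 1;
            }
        }
    }
}

/* The strict dominator of B with the largest dominator set is the
   closest one, i.e. the immediate dominator; -1 for the entry.  */
static int
immediate_dominator (int b)
{
  unsigned dom[NBLOCKS];
  compute_dom (dom);
  int best = -1;
  for (int d = 0; d < NBLOCKS; d++)
    if (d != b && (dom[b] & (1u << d))
        && (best == -1 || popcount (dom[d]) > popcount (dom[best])))
      best = d;
  return best;
}

/* Walk up via immediate dominators until a block with a single
   predecessor is found; that predecessor ends in the condition.  */
static int
condition_block (int b)
{
  while (b != -1 && num_preds (b) != 1)
    b = immediate_dominator (b);
  return b == -1 ? -1 : preds[b][0];
}
```

For B6 the walk goes to its immediate dominator B3, whose single
predecessor B2 holds the condition on the edge B2->B3.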

Let me know if you have any more comments and suggestions.

Thanks a lot.

Qing




Re: [PATCH] libquadmath: add quad support for trig-pi functions

2025-07-03 Thread Steve Kargl
On Thu, Jul 03, 2025 at 02:43:43PM +0200, Michael Matz wrote:
> Hello,
> 
> On Thu, 3 Jul 2025, Yuao Ma wrote:
> 
> > This patch adds the required function for Fortran trigonometric functions to
> > work with glibc versions prior to 2.26. It's based on glibc source commit
> > 632d895f3e5d98162f77b9c3c1da4ec19968b671.
> > 
> > I've built it successfully on my end. Documentation is also included.
> > 
> > Please take a look when you have a moment.
> 
> +__float128
> +cospiq (__float128 x)
> +{
> ...
> +  if (__builtin_islessequal (x, 0.25Q))
> +return cosq (M_PIq * x);
> 
> Isn't the whole raison d'etre for the trig-pi functions that the internal 
> argument reduction against multiples of pi becomes trivial and hence (a) 
> performant, and (b) doesn't introduce rounding artifacts?  Expressing the 
> trig-pi functions in terms of their counterparts completely defeats this 
> purpose.  The other way around would be more sensible for the cases where 
> it works, but the above doesn't seem very attractive.
> 

It's more than just easier range reduction.  There are special
cases that give exact results.  For example, if x is integer,
then sinpi(x) = +-0, exactly.
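
A small sketch of why the integer case can be decided exactly, using the
same reduction x - 2*round(x/2) that the glibc code quoted in this thread
uses.  Everything named here is hypothetical, and round_half_away is a
stand-in for round() that assumes the value fits in a long long, chosen
only to keep the example free of libm calls.

```c
/* Round to the nearest integer, halves away from zero.  Assumes V fits
   in a long long (a simplification for this sketch).  */
static double
round_half_away (double v)
{
  double t = (double) (long long) v;   /* truncate toward zero */
  double frac = v - t;                 /* fractional part, computed exactly */
  if (frac >= 0.5)
    return t + 1.0;
  if (frac <= -0.5)
    return t - 1.0;
  return t;
}

/* Reduce X modulo 2 into [-1, 1].  No multiplication by pi is involved,
   so the reduction itself introduces no rounding error.  */
static double
reduce_mod2 (double x)
{
  return x - 2.0 * round_half_away (0.5 * x);
}

/* sinpi(x) is an exact zero when the reduced argument is 0 or +-1,
   that is, when x is an integer.  */
static int
sinpi_is_exact_zero (double x)
{
  double r = reduce_mod2 (x);
  return r == 0.0 || r == 1.0 || r == -1.0;
}
```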

It's also telling when looking at reported accuracy
https://members.loria.fr/PZimmermann/papers/accuracy.pdf

-- 
Steve


[PATCH] fortran: Add the preliminary code of MOVE_ALLOC arguments

2025-07-03 Thread Mikael Morin
From: Mikael Morin 

Regression-tested on aarch64-unknown-linux-gnu.
OK for master?

-- >8 --

Add the preliminary code produced for the evaluation of the FROM and TO
arguments of the MOVE_ALLOC intrinsic before using their values.
Before this change, the preliminary code was ignored and dropped,
limiting the validity of the implementation of MOVE_ALLOC to simple
cases without preliminary code.

This change also adds the cleanup code of the same arguments.  It
doesn't make any difference on the testcase though.  Because of the
limited set of arguments that are allowed (variables or components
without subreference), it is possible that the cleanup code is actually
guaranteed to be empty.  At least adding the cleanup code makes the
array case consistent with the scalar case.

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (conv_intrinsic_move_alloc): Add pre and
post code for the FROM and TO arguments.

gcc/testsuite/ChangeLog:

* gfortran.dg/move_alloc_20.f03: New test.
---
 gcc/fortran/trans-intrinsic.cc  |   5 +
 gcc/testsuite/gfortran.dg/move_alloc_20.f03 | 151 
 2 files changed, 156 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/move_alloc_20.f03

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index f1bfd3eee51..be984271d6a 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -13101,6 +13101,8 @@ conv_intrinsic_move_alloc (gfc_code *code)
 }
   gfc_conv_expr_descriptor (&to_se, to_expr);
   gfc_conv_expr_descriptor (&from_se, from_expr);
+  gfc_add_block_to_block (&block, &to_se.pre);
+  gfc_add_block_to_block (&block, &from_se.pre);
 
   /* For coarrays, call SYNC ALL if TO is already deallocated as MOVE_ALLOC
  is an image control "statement", cf. IR F08/0040 in 12-006A.  */
@@ -13174,6 +13176,9 @@ conv_intrinsic_move_alloc (gfc_code *code)
   if (fin_label)
 gfc_add_expr_to_block (&block, build1_v (LABEL_EXPR, fin_label));
 
+  gfc_add_block_to_block (&block, &to_se.post);
+  gfc_add_block_to_block (&block, &from_se.post);
+
   return gfc_finish_block (&block);
 }
 
diff --git a/gcc/testsuite/gfortran.dg/move_alloc_20.f03 
b/gcc/testsuite/gfortran.dg/move_alloc_20.f03
new file mode 100644
index 000..20403c30028
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/move_alloc_20.f03
@@ -0,0 +1,151 @@
+! { dg-do run }
+!
+! Check the presence of the pre and post code of the FROM and TO arguments
+! of the MOVE_ALLOC intrinsic subroutine.
+
+module m
+  implicit none
+  type :: t
+integer, allocatable :: a(:)
+  end type
+end module 
+
+module pre
+  use m
+  implicit none
+  private
+  public :: check_pre
+
+contains
+
+  subroutine check_pre
+integer, parameter :: n = 5
+type(t) :: x(n)
+integer, allocatable :: tmp(:)
+integer :: array(4) = [ -1, 0, 1, 2 ]
+integer :: i
+
+if (allocated(tmp)) error stop 1
+
+tmp = [17]
+
+if (.not. allocated(tmp)) error stop 11
+if (any(shape(tmp) /= [1])) error stop 12
+if (any(tmp /= [17])) error stop 13
+do i=1,n
+  if (allocated(x(i)%a)) error stop 14
+end do
+
+! Check that the index of X is properly computed for the evaluation of TO.
+call move_alloc(tmp, x(sum(array))%a)
+
+do i=1,n
+  if (i == 2) cycle
+  if (allocated(x(i)%a)) error stop 21
+end do
+if (.not. allocated(x(2)%a)) error stop 22
+if (any(shape(x(2)%a) /= [1])) error stop 23
+if (any(x(2)%a /= [17])) error stop 24
+if (allocated(tmp)) error stop 25
+
+! Check that the index of X is properly computed for the evaluation of FROM.
+call move_alloc(x(sum(array))%a, tmp)
+
+if (.not. allocated(tmp)) error stop 31
+if (any(shape(tmp) /= [1])) error stop 32
+if (any(tmp /= [17])) error stop 33
+do i=1,n
+  if (allocated(x(i)%a)) error stop 34
+end do
+  end subroutine
+
+end module
+
+module post
+  use m
+  implicit none
+  private
+  public :: check_post
+  integer, parameter :: n = 5
+  type(t), target :: x(n)
+  type :: u
+integer :: a
+  contains
+final :: finalize
+  end type
+  integer :: finalization_count = 0
+
+contains
+
+  function idx(arg)
+type(u) :: arg
+integer :: idx
+idx = mod(arg%a, n)
+  end function
+
+  subroutine check_post
+type(u) :: y
+integer, allocatable :: tmp(:)
+integer, target :: array(4) = [ -1, 0, 1, 2 ]
+integer :: i
+
+y%a = 12
+
+if (allocated(tmp)) error stop 1
+
+tmp = [37]
+
+if (.not. allocated(tmp)) error stop 11
+if (any(shape(tmp) /= [1])) error stop 12
+if (any(tmp /= [37])) error stop 13
+if (finalization_count /= 0) error stop 14
+do i=1,n
+  if (allocated(x(i)%a)) error stop 15
+end do
+
+! Check that the cleanup code for the evaluation of TO is properly
+! executed after MOVE_ALLOC: the result of GET_U should be finalized.
+call move_alloc(tmp, x(idx(get_u(y)))%a)
+
+do i=1,n
+  if (i == 2) cycle
+ 

Re: [PATCH] fortran: Add the preliminary code of MOVE_ALLOC arguments

2025-07-03 Thread Steve Kargl
On Thu, Jul 03, 2025 at 10:12:52PM +0200, Mikael Morin wrote:
> From: Mikael Morin 
> 
> Regression-tested on aarch64-unknown-linux-gnu.
> OK for master?
> 

Yes.  Almost looks obvious once someone finds and fixes the issue.

Thanks for the patch.

-- 
Steve


[committed] c++: Fix a pasto in the PR120471 fix [PR120940]

2025-07-03 Thread Jakub Jelinek
Hi!

No idea how this slipped in, I'm terribly sorry.
Strangely nothing in the testsuite has caught this, so I've added
a new test for that.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk
and for 15.2, 14.4, 13.4 and 12.5.

2025-07-03  Jakub Jelinek  

PR c++/120940
* typeck.cc (cp_build_array_ref): Fix a pasto.

* g++.dg/parse/pr120940.C: New test.
* g++.dg/warn/Wduplicated-branches9.C: New test.

--- gcc/cp/typeck.cc.jj 2025-07-03 12:44:48.361162801 +0200
+++ gcc/cp/typeck.cc	2025-07-03 19:32:04.155912353 +0200
@@ -4004,7 +4004,7 @@ cp_build_array_ref (location_t loc, tree
   tree op0, op1, op2;
   op0 = TREE_OPERAND (array, 0);
   op1 = TREE_OPERAND (array, 1);
-  op2 = TREE_OPERAND (array, 1);
+  op2 = TREE_OPERAND (array, 2);
   if (TREE_SIDE_EFFECTS (idx) || !tree_invariant_p (idx))
{
  /* If idx could possibly have some SAVE_EXPRs, turning
--- gcc/testsuite/g++.dg/parse/pr120940.C.jj	2025-07-03 19:39:26.808149189 +0200
+++ gcc/testsuite/g++.dg/parse/pr120940.C	2025-07-03 19:42:27.499903370 +0200
@@ -0,0 +1,18 @@
+// PR c++/120940
+// { dg-do run }
+
+int a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
+int b[8] = { 9, 10, 11, 12, 13, 14, 15, 16 };
+
+__attribute__((noipa)) int
+foo (int x, int y)
+{
+  return (x ? a : b)[y];
+}
+
+int
+main ()
+{
+  if (foo (1, 4) != 5 || foo (0, 6) != 15)
+__builtin_abort ();
+}
--- gcc/testsuite/g++.dg/warn/Wduplicated-branches9.C.jj	2025-07-03 19:35:53.383915748 +0200
+++ gcc/testsuite/g++.dg/warn/Wduplicated-branches9.C	2025-07-03 19:35:47.132997460 +0200
@@ -0,0 +1,11 @@
+// PR c++/120940
+// { dg-do compile }
+// { dg-options "-Wduplicated-branches" }
+
+static char a[16][8], b[16][8];
+
+char *
+foo (int x, int y)
+{
+  return (x ? a : b)[y];
+}

Jakub



Re: [PATCH] middle-end: Fix complex lowering of cabs with no LHS [PR120369]

2025-07-03 Thread Andrew Pinski
On Tue, May 20, 2025 at 6:44 PM Andrew Pinski  wrote:
>
> This was introduced by r15-1797-gd8fe4f05ef448e . I had missed that
> the LHS of the cabs call could be NULL. This seems to only happen at -O0,
> I tried to produce one that happens at -O1 but needed many different
> options to prevent the removal of the call.
> Anyways the fix is just keep around the call if the LHS is null.
>
> Bootstrapped and tested on x86_64-linux-gnu.

Backported to GCC 15 also.

>
> PR middle-end/120369
>
> gcc/ChangeLog:
>
> * tree-complex.cc (gimple_expand_builtin_cabs): Return early
> if the LHS of cabs is null.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/torture/pr120369-1.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/testsuite/gcc.dg/torture/pr120369-1.c | 9 +
>  gcc/tree-complex.cc   | 4 
>  2 files changed, 13 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr120369-1.c
>
> diff --git a/gcc/testsuite/gcc.dg/torture/pr120369-1.c 
> b/gcc/testsuite/gcc.dg/torture/pr120369-1.c
> new file mode 100644
> index 000..4c20fb0932f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr120369-1.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* PR middle-end/120369 */
> +
> +/* Make sure cabs without a lhs does not cause an ICE. */
> +void f()
> +{
> +  double _Complex z = 1.0;
> +  __builtin_cabs(z);
> +}
> diff --git a/gcc/tree-complex.cc b/gcc/tree-complex.cc
> index 8a812d4bf9b..e339b3a5b37 100644
> --- a/gcc/tree-complex.cc
> +++ b/gcc/tree-complex.cc
> @@ -1715,6 +1715,10 @@ gimple_expand_builtin_cabs (gimple_stmt_iterator *gsi, 
> gimple *old_stmt)
>
>tree lhs = gimple_call_lhs (old_stmt);
>
> +  /* If there is not a LHS, then just keep the statement around.  */
> +  if (!lhs)
> +return;
> +
>real_part = extract_component (gsi, arg, false, true);
>imag_part = extract_component (gsi, arg, true, true);
>location_t loc = gimple_location (old_stmt);
> --
> 2.43.0
>


Re: [PATCH] libquadmath: add quad support for trig-pi functions

2025-07-03 Thread Michael Matz
Hello,

On Thu, 3 Jul 2025, Joseph Myers wrote:

> > > Isn't the whole raison d'etre for the trig-pi functions that the internal 
> > > argument reduction against multiples of pi becomes trivial and hence (a) 
> > > performant, and (b) doesn't introduce rounding artifacts?  Expressing the 
> > > trig-pi functions in terms of their counterparts completely defeats this 
> > > purpose.  The other way around would be more sensible for the cases where 
> > > it works, but the above doesn't seem very attractive.
> 
> >   x = M_FABS (x - M_LIT (2.0) * M_SUF (round) (M_LIT (0.5) * x));
> 
> In particular, this is what trivial range reduction looks like: no need to 
> do multiple-precision multiplication with the relevant bits of a 
> multiple-precision value of 1/pi, just round to the nearest integer 
> (typically a single instruction).

Yes.  And then the above is multiplied by PI, passed to cos/sin and that 
one then tries to figure out the multiple of PI (i.e. the 'x' above) again 
via range reduction (not a _terribly_ slow one anymore in a good 
implementation, because of the limited input range, but still).  That's 
backwards.


Ciao,
Michael.


Re: [PATCH] libquadmath: add quad support for trig-pi functions

2025-07-03 Thread Joseph Myers
On Thu, 3 Jul 2025, Michael Matz wrote:

> Yes.  And then the above is multiplied by PI, passed to cos/sin and that 
> one then tries to figure out the multiple of PI (i.e. the 'x' above) again 
> via range reduction (not a _terribly_ slow one anymore in a good 
> implementation, because of the limited input range, but still).  That's 
> backwards.

People are encouraged to contribute format-specific implementations of 
these functions to glibc (for example, from CORE-MATH), after adding 
inputs to the glibc benchmarks so it can be verified if new 
implementations are a performance improvement.  Indeed, that's already 
been done for various functions for binary32 format.  The point of 
type-generic implementations is not to be optimal for performance or 
accuracy but to get the API supported across all glibc configurations 
within a reasonable time (including formats such as IBM long double that 
probably no-one cares enough about to write specific implementations for 
now), so that further improvements can be made incrementally.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] libquadmath: add quad support for trig-pi functions

2025-07-03 Thread Andreas Schwab
On Jul 03 2025, Michael Matz wrote:

> Yes.  And then the above is multiplied by PI, passed to cos/sin and that 
> one then tries to figure out the multiple of PI (i.e. the 'x' above) again 
> via range reduction (not a _terribly_ slow one anymore in a good 
> implementation, because of the limited input range, but still).

That "range reduction" consists of a simple compare against the point
where range reduction is required, which is cheap.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


[Ada] Remove left-overs of front-end exception mechanism

2025-07-03 Thread Eric Botcazou
Tested on x86-64/Linux, applied on the mainline.


2025-07-03  Eric Botcazou  

* gcc-interface/Makefile.in (gnatlib-sjlj): Delete.
(gnatlib-zcx): Do not modify Frontend_Exceptions constant.
* libgnat/system-linux-loongarch.ads (Frontend_Exceptions): Delete.

-- 
Eric Botcazou

diff --git a/gcc/ada/gcc-interface/Makefile.in b/gcc/ada/gcc-interface/Makefile.in
index 3557b46c64d..8615b598623 100644
--- a/gcc/ada/gcc-interface/Makefile.in
+++ b/gcc/ada/gcc-interface/Makefile.in
@@ -840,35 +840,6 @@ gnatlib-shared:
 	 PICFLAG_FOR_TARGET="$(PICFLAG_FOR_TARGET)" \
 	 $(GNATLIB_SHARED)
 
-# When building a SJLJ runtime for VxWorks, we need to ensure that the extra
-# linker options needed for ZCX are not passed to prevent the inclusion of
-# useless objects and potential troubles from the presence of extra symbols
-# and references in some configurations.  The inhibition is performed by
-# commenting the pragma instead of deleting the line, as the latter might
-# result in getting multiple blank lines, hence possible style check errors.
-gnatlib-sjlj:
-	$(MAKE) $(FLAGS_TO_PASS) \
-	 EH_MECHANISM="" \
-	 MULTISUBDIR="$(MULTISUBDIR)" \
-	 THREAD_KIND="$(THREAD_KIND)" \
-	 LN_S="$(LN_S)" \
-	 ../stamp-gnatlib1-$(RTSDIR)
-	sed \
-	  -e 's/Frontend_Exceptions.*/Frontend_Exceptions   : constant Boolean := True;/' \
-	  -e 's/ZCX_By_Default.*/ZCX_By_Default: constant Boolean := False;/' \
-	  $(RTSDIR)/system.ads > $(RTSDIR)/s.ads
-	$(MV) $(RTSDIR)/s.ads $(RTSDIR)/system.ads
-	$(MAKE) $(FLAGS_TO_PASS) \
-	 EH_MECHANISM="" \
-	 GNATLIBFLAGS="$(GNATLIBFLAGS)" \
-	 GNATLIBCFLAGS="$(GNATLIBCFLAGS)" \
-	 GNATLIBCFLAGS_FOR_C="$(GNATLIBCFLAGS_FOR_C)" \
-	 FORCE_DEBUG_ADAFLAGS="$(FORCE_DEBUG_ADAFLAGS)" \
-	 MULTISUBDIR="$(MULTISUBDIR)" \
-	 THREAD_KIND="$(THREAD_KIND)" \
-	 LN_S="$(LN_S)" \
-	 gnatlib
-
 gnatlib-zcx:
 	$(MAKE) $(FLAGS_TO_PASS) \
 	 EH_MECHANISM="-gcc" \
@@ -877,7 +848,6 @@ gnatlib-zcx:
 	 LN_S="$(LN_S)" \
 	 ../stamp-gnatlib1-$(RTSDIR)
 	sed \
-	  -e 's/Frontend_Exceptions.*/Frontend_Exceptions   : constant Boolean := False;/' \
 	  -e 's/ZCX_By_Default.*/ZCX_By_Default: constant Boolean := True;/' \
 	  $(RTSDIR)/system.ads > $(RTSDIR)/s.ads
 	$(MV) $(RTSDIR)/s.ads $(RTSDIR)/system.ads
diff --git a/gcc/ada/libgnat/system-linux-loongarch.ads b/gcc/ada/libgnat/system-linux-loongarch.ads
index 77a21396255..683b7a44155 100644
--- a/gcc/ada/libgnat/system-linux-loongarch.ads
+++ b/gcc/ada/libgnat/system-linux-loongarch.ads
@@ -139,7 +139,6 @@ private
Always_Compatible_Rep : constant Boolean := False;
Suppress_Standard_Library : constant Boolean := False;
Use_Ada_Main_Program_Name : constant Boolean := False;
-   Frontend_Exceptions   : constant Boolean := False;
ZCX_By_Default: constant Boolean := True;
 
 end System;


Re: [PATCH] c-family: Tweak ptr +- (expr +- cst) FE optimization [PR120837]

2025-07-03 Thread Richard Biener



> On 03.07.2025 at 16:11, Jakub Jelinek wrote:
> 
> Hi!
> 
> The following testcase is miscompiled with -fsanitize=undefined but we
> introduce UB into the IL even without that flag.
> 
> The optimization ptr +- (expr +- cst) when expr/cst have undefined
> overflow into (ptr +- cst) +- expr is sometimes simply not valid,
> without careful analysis on what ptr points to we don't know if it
> is valid to do (ptr +- cst) pointer arithmetics.
> E.g. on the testcase, ptr points to start of an array (actually
> conditionally one or another) and cst is -1, so ptr - 1 is invalid
> pointer arithmetics, while ptr + (expr - 1) can be valid if expr
> is at runtime always > 1 and smaller than size of the array ptr points
> to + 1.
> 
> Unfortunately, removing this 1992-ish optimization altogether causes
> FAIL: c-c++-common/restrict-2.c  -Wc++-compat   scan-tree-dump-times lim2 "Moving statement" 11
> FAIL: gcc.dg/tree-ssa/copy-headers-5.c scan-tree-dump ch2 "is now do-while loop"
> FAIL: gcc.dg/tree-ssa/copy-headers-5.c scan-tree-dump-times ch2 "  if " 3
> FAIL: gcc.dg/vect/pr57558-2.c scan-tree-dump vect "vectorized 1 loops"
> FAIL: gcc.dg/vect/pr57558-2.c -flto -ffat-lto-objects  scan-tree-dump vect "vectorized 1 loops"
> regressions (restrict-2.c also for C++ in all std modes).  I've been thinking
> about some match.pd optimization for signed integer addition/subtraction of
> constant followed by widening integral conversion followed by multiplication
> or left shift, but that wouldn't help 32-bit arches.
> 
> So, instead at least for now, the following patch keeps doing the
> optimization, just doesn't perform it in pointer arithmetics.
> pointer_int_sum itself actually adds the multiplication by size_exp,
> so ptr + expr is turned into ptr p+ expr * size_exp,
> so this patch will try to optimize
> ptr + (expr +- cst)
> into
> ptr p+ ((sizetype)expr * size_exp +- (sizetype)cst * size_exp)
> and
> ptr - (expr +- cst)
> into
> ptr p+ -((sizetype)expr * size_exp +- (sizetype)cst * size_exp)

So the important part is the distribution of the multiplication, not the 
(invalid) association?

OK then.

Thanks,
Richard 

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2025-07-03  Jakub Jelinek  
> 
>PR c/120837
>* c-common.cc (pointer_int_sum): Rewrite the intop PLUS_EXPR or
>MINUS_EXPR optimization into extension of both intop operands,
>their separate multiplication and then addition/subtraction followed
>by rest of pointer_int_sum handling after the multiplication.
> 
>* gcc.dg/ubsan/pr120837.c: New test.
> 
> --- gcc/c-family/c-common.cc.jj	2025-07-01 09:36:43.115908270 +0200
> +++ gcc/c-family/c-common.cc	2025-07-03 12:31:12.789367448 +0200
> @@ -3438,20 +3438,41 @@ pointer_int_sum (location_t loc, enum tr
> an overflow error if the constant is negative but INTOP is not.  */
>   && (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (intop))
>  || (TYPE_PRECISION (TREE_TYPE (intop))
> -  == TYPE_PRECISION (TREE_TYPE (ptrop)
> +  == TYPE_PRECISION (TREE_TYPE (ptrop
> +  && TYPE_PRECISION (TREE_TYPE (intop)) <= TYPE_PRECISION (sizetype))
> {
> -  enum tree_code subcode = resultcode;
> -  tree int_type = TREE_TYPE (intop);
> -  if (TREE_CODE (intop) == MINUS_EXPR)
> -subcode = (subcode == PLUS_EXPR ? MINUS_EXPR : PLUS_EXPR);
> -  /* Convert both subexpression types to the type of intop,
> - because weird cases involving pointer arithmetic
> - can result in a sum or difference with different type args.  */
> -  ptrop = build_binary_op (EXPR_LOCATION (TREE_OPERAND (intop, 1)),
> -   subcode, ptrop,
> -   convert (int_type, TREE_OPERAND (intop, 1)),
> -   true);
> -  intop = convert (int_type, TREE_OPERAND (intop, 0));
> +  tree intop0 = TREE_OPERAND (intop, 0);
> +  tree intop1 = TREE_OPERAND (intop, 1);
> +  if (TYPE_PRECISION (TREE_TYPE (intop)) != TYPE_PRECISION (sizetype)
> +  || TYPE_UNSIGNED (TREE_TYPE (intop)) != TYPE_UNSIGNED (sizetype))
> +{
> +  tree optype = c_common_type_for_size (TYPE_PRECISION (sizetype),
> +TYPE_UNSIGNED (sizetype));
> +  intop0 = convert (optype, intop0);
> +  intop1 = convert (optype, intop1);
> +}
> +  tree t = fold_build2_loc (loc, MULT_EXPR, TREE_TYPE (intop0), intop0,
> +convert (TREE_TYPE (intop0), size_exp));
> +  intop0 = convert (sizetype, t);
> +  if (TREE_OVERFLOW_P (intop0) && !TREE_OVERFLOW (t))
> +intop0 = wide_int_to_tree (TREE_TYPE (intop0), wi::to_wide (intop0));
> +  t = fold_build2_loc (loc, MULT_EXPR, TREE_TYPE (intop1), intop1,
> +   convert (TREE_TYPE (intop1), size_exp));
> +  intop1 = convert (sizetype, t);
> +  if (TREE_OVERFLOW_P (intop1) && !TREE_OVERFLOW (t))
> +intop1 = wide_int_to_tree (TREE_TYPE (intop1), wi::to_wide (intop1));
> +  intop = build_binary_op 

Re: [PATCH] c-family: Tweak ptr +- (expr +- cst) FE optimization [PR120837]

2025-07-03 Thread Jakub Jelinek
On Thu, Jul 03, 2025 at 08:12:22PM +0200, Richard Biener wrote:
> > So, instead at least for now, the following patch keeps doing the
> > optimization, just doesn't perform it in pointer arithmetics.
> > pointer_int_sum itself actually adds the multiplication by size_exp,
> > so ptr + expr is turned into ptr p+ expr * size_exp,
> > so this patch will try to optimize
> > ptr + (expr +- cst)
> > into
> > ptr p+ ((sizetype)expr * size_exp +- (sizetype)cst * size_exp)
> > and
> > ptr - (expr +- cst)
> > into
> > ptr p+ -((sizetype)expr * size_exp +- (sizetype)cst * size_exp)
> 
> So the important part is the distribution of the multiplication, not the 
> (invalid) association?

For the regressions yes.
Consider even trivial cases like
int a[1024];

int *
foo (int x)
{
  return a + (x - 1);
}

Vanilla trunk, as well as trunk with this patch, compiles this on x86_64 at -O2
to
	movslq	%edi, %rdi
	leaq	a-4(,%rdi,4), %rax
	ret
With just the c-common.cc opt removed, I get
	subl	$1, %edi
	movslq	%edi, %rdi
	leaq	a(,%rdi,4), %rax
	ret
instead.  I admit I haven't checked other targets, just ia32, and there is
no change whatsoever (expectedly, sizetype is 32-bit, so RTL opts can fold
it into leal just fine).  But my hope is that this patch keeps code similar
to before more often, even though clearly that will not always be the case.
E.g. for the (x ? a : b)[y - 1] case when a and b are arrays,
we used to do (x ? &a - 4 : &b - 4) p+ y * 4, so the subtraction was done
on each of the branches; now it will just be added later on.  But at least
on sizetype and not on the narrower type.

Jakub
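
To make the transformation discussed above concrete, here is a minimal C
sketch.  It is not GCC code: it spells out by hand the byte-offset
arithmetic that the patched pointer_int_sum is described as producing for
a + (x - 1).  The names foo_direct and foo_distributed are hypothetical,
and size_t stands in for GCC's internal sizetype.

```c
#include <stddef.h>

int a[1024];

/* The source-level form: ptr + (expr - cst).  */
int *
foo_direct (int x)
{
  return a + (x - 1);
}

/* Hand-written equivalent of
     ptr p+ ((sizetype)expr * size_exp - (sizetype)cst * size_exp)
   Both products are formed in an unsigned type of pointer width, so the
   subtraction is plain modular arithmetic and the pointer a - 1 is never
   formed as an intermediate value.  */
int *
foo_distributed (int x)
{
  size_t off = (size_t) x * sizeof (int) - (size_t) 1 * sizeof (int);
  return (int *) ((char *) a + off);
}
```

For any x for which a + (x - 1) is valid in the first place, both functions
should return the same address; the constant part of the offset is what the
back end can then fold into the leaq displacement.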



Re: [PATCH] tree-optimization/120927 - 510.parest_r segfault with masked epilog

2025-07-03 Thread Richard Sandiford
Richard Biener  writes:
> The following fixes bad alignment computation for epilog vectorization
> when as in this case for 510.parest_r and masked epilog vectorization
> with AVX512 we end up choosing AVX to vectorize the main loop and
> masked AVX512 (sic!) to vectorize the epilog.  In that case alignment
> analysis for the epilog tries to force alignment of the base to 64,
> but that cannot possibly help the epilog when the main loop had used
> a vector mode with smaller alignment requirement.
>
> There's another issue, that the check whether the step preserves
> alignment needs to consider possibly previously involved VFs
> (here, the main loop's smaller VF) as well.
>
> These might not be the only case with problems for such a mode mix
> but at least there it seems wise to never use DR alignment forcing
> when analyzing an epilog.
>
> We get to choose this mode setup because the iteration over epilog
> modes doesn't prevent this, the maybe_ge (cached_vf_per_mode[0],
> first_vinfo_vf) skip is conditional on !supports_partial_vectors
> and it is also conditional on having a cached VF.  Further nothing
> in vect_analyze_loop_1 rejects this setup - it might be conceivable
> that a target can do masking only for larger modes.  There is a
> second reason we end up with this mode setup, which is that
> vect_need_peeling_or_partial_vectors_p says we do not need
> peeling or partial vectors when analyzing the main loop with
> AVX512 (if it would say so we'd have chosen a masked AVX512
> epilog-only vectorization).  It does that because it looks at
> LOOP_VINFO_COST_MODEL_THRESHOLD (which is not yet computed, so
> always zero at this point), and compares max_niter (5) against
> the VF (8), but not with equality as the comment says but with
> greater.  This also needs looking at, PR120939.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
>   PR tree-optimization/120927
>   * tree-vect-data-refs.cc (vect_compute_data_ref_alignment):
>   Do not force a DRs base alignment when analyzing an
>   epilog loop.  Check whether the step preserves alignment
>   for all VFs possibly involved so far.
>
>   * gcc.dg/vect/vect-pr120927.c: New testcase.

LGTM FWIW.

Richard
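
As a rough illustration of the second fix described above, the following C
sketch restates the new check outside the vectorizer.  It assumes constant
(non-poly) vectorization factors, and step_preserves_misalignment is a
hypothetical helper name.  The numbers in the note after the code (8-byte
step alignment, VF 4 for an AVX main loop, VF 8 and 64-byte target alignment
for an AVX512 epilog) are illustrative assumptions for a loop over doubles,
not values taken from a dump.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if, for every vectorization factor in VFS (the loop being
   analyzed plus the loops that ran before it), one vector iteration
   advances the address by a multiple of ALIGN bytes.  */
bool
step_preserves_misalignment (unsigned step_alignment,
			     const unsigned *vfs, size_t n_vfs,
			     unsigned align)
{
  bool ok = true;
  for (size_t i = 0; i < n_vfs; i++)
    ok &= (step_alignment * vfs[i]) % align == 0;
  return ok;
}
```

Checking only the epilog's VF of 8 gives 8 * 8 = 64, a multiple of 64, so the
step looks alignment-preserving; adding the main loop's VF of 4 gives
8 * 4 = 32, which is not, so the combined check fails.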

> ---
>  gcc/testsuite/gcc.dg/vect/vect-pr120927.c | 24 +++
>  gcc/tree-vect-data-refs.cc| 16 +++
>  2 files changed, 36 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-pr120927.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-pr120927.c b/gcc/testsuite/gcc.dg/vect/vect-pr120927.c
> new file mode 100644
> index 000..793593f758f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-pr120927.c
> @@ -0,0 +1,24 @@
> +/* { dg-additional-options "--param vect-partial-vector-usage=1" } */
> +/* { dg-additional-options "-mavx512bw -mavx512vl" { target avx512f_runtime } } */
> +
> +#include "tree-vect.h"
> +
> +static const double a[] = { 1., 2., 3., 4., 5. };
> +
> +void __attribute__((noipa))
> +foo (double *b, double *bp, double c, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +b[i] = bp[i] = a[i] * c;
> +}
> +
> +int main()
> +{
> +  double b[6], bp[6];
> +  b[5] = bp[5] = 13.;
> +  check_vect ();
> +  foo (b, bp, 3., 5);
> +  if (b[5] != 13. || bp[5] != 13.)
> +abort ();
> +  return 0;
> +}
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index ee040eb9888..c84cd29116e 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -1501,10 +1501,17 @@ vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
>/* We can only use base and misalignment information relative to
>an innermost loop if the misalignment stays the same throughout the
>execution of the loop.  As above, this is the case if the stride of
> -  the dataref evenly divides by the alignment.  */
> -  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> -  step_preserves_misalignment_p
> - = multiple_p (drb->step_alignment * vf, vect_align_c);
> +  the dataref evenly divides by the alignment.  Make sure to check
> +  previous epilogues and the main loop.  */
> +  step_preserves_misalignment_p = true;
> +  auto lvinfo = loop_vinfo;
> +  while (lvinfo)
> + {
> +   poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (lvinfo);
> +   step_preserves_misalignment_p
> + &= multiple_p (drb->step_alignment * vf, vect_align_c);
> +   lvinfo = LOOP_VINFO_ORIG_LOOP_INFO (lvinfo);
> + }
>  
>if (!step_preserves_misalignment_p && dump_enabled_p ())
>   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -1571,6 +1578,7 @@ vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
>unsigned int max_alignment;
>tree base = get_base_for_alignment (drb->base_address, &max_alignment);
>if (max_alignment < vect_align_c
> +   || (loop_vinfo && LOOP_VINFO_EPILOGUE_P (loop_vinf

Re: [PATCH v4 1/6] c-family: add btf_type_tag and btf_decl_tag attributes

2025-07-03 Thread David Faust



On 7/2/25 00:42, Richard Biener wrote:
> On Tue, Jun 10, 2025 at 11:40 PM David Faust  wrote:
>>
>> Add two new c-family attributes, "btf_type_tag" and "btf_decl_tag"
>> along with a simple shared handler for them.
>>
>> gcc/c-family/
>> * c-attribs.cc (c_common_attribute_table): Add btf_decl_tag and
>> btf_type_tag attributes.
>> (handle_btf_tag_attribute): New handler for both new attributes.
>> ---
>>  gcc/c-family/c-attribs.cc | 25 -
>>  1 file changed, 24 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
>> index 5a0e3d328ba..cc1efaeaaec 100644
>> --- a/gcc/c-family/c-attribs.cc
>> +++ b/gcc/c-family/c-attribs.cc
>> @@ -189,6 +189,8 @@ static tree handle_fd_arg_attribute (tree *, tree, tree, int, bool *);
>>  static tree handle_flag_enum_attribute (tree *, tree, tree, int, bool *);
>>  static tree handle_null_terminated_string_arg_attribute (tree *, tree, tree, int, bool *);
>>
>> +static tree handle_btf_tag_attribute (tree *, tree, tree, int, bool *);
>> +
>>  /* Helper to define attribute exclusions.  */
>>  #define ATTR_EXCL(name, function, type, variable)  \
>>{ name, function, type, variable }
>> @@ -640,7 +642,11 @@ const struct attribute_spec c_common_gnu_attributes[] =
>>{ "flag_enum", 0, 0, false, true, false, false,
>>   handle_flag_enum_attribute, NULL },
>>{ "null_terminated_string_arg", 1, 1, false, true, true, false,
>> - handle_null_terminated_string_arg_attribute, NULL}
>> + handle_null_terminated_string_arg_attribute, NULL},
>> +  { "btf_type_tag",  1, 1, false, true, false, false,
>> + handle_btf_tag_attribute, NULL},
>> +  { "btf_decl_tag",  1, 1, true, false, false, false,
>> + handle_btf_tag_attribute, NULL}
>>  };
>>
>>  const struct scoped_attribute_specs c_common_gnu_attribute_table =
>> @@ -5101,6 +5107,23 @@ handle_null_terminated_string_arg_attribute (tree *node, tree name, tree args,
>>return NULL_TREE;
>>  }
>>
>> +/* Handle the "btf_decl_tag" and "btf_type_tag" attributes.  */
>> +
>> +static tree
>> +handle_btf_tag_attribute (tree * ARG_UNUSED (node), tree name, tree args,
>> + int ARG_UNUSED (flags), bool *no_add_attrs)
>> +{
>> +  if (!args)
>> +*no_add_attrs = true;
>> +  else if (TREE_CODE (TREE_VALUE (args)) != STRING_CST)
>> +{
>> +  error ("%qE attribute requires a string", name);
>> +  *no_add_attrs = true;
>> +}
>> +
> 
> So with respect to the dwarf2out patch discussion I think attribute
> handling should be similar to how we handle the aligned attribute, which
> makes sure to build a new type variant to apply the attribute to if not
> ATTR_FLAG_TYPE_IN_PLACE.

Thanks for the pointer to aligned. This part is more clear to me now,
after seeing how the type variants are handled there.

I think you're right, we should do something similar for type_tag.  I will
try it and include it in the next version.

> 
> Richard.
> 
>> +  return NULL_TREE;
>> +}
>> +
>>  /* Handle the "nonstring" variable attribute.  */
>>
>>  static tree
>> --
>> 2.47.2
>>



[to-be-committed][RISC-V] Add basic instrumentation to fusion detection

2025-07-03 Thread Jeff Law
This is primarily Shreya's work from a few months back.  I just fixed 
the formatting and cobbled together the cover letter/ChangeLog.



We were looking to evaluate some changes from Artemiy that improve GCC's 
ability to discover fusible instruction pairs.  There was no good way to 
get any static data out of the compiler about what kinds of fusions were 
happening.  Yea, you could grub around the .sched dumps looking for the 
magic '+' annotation, then look around at the slim RTL representation 
and make an educated guess about what fused.  But boy that was inconvenient.


All we really needed was a quick note in the dump file that the target 
hook found a fusion pair and what kind was discovered.  That made it 
easy to spot invalid fusions, evaluate the effectiveness of Artemiy's 
work, write/discover testcases for existing fusions and implement new 
fusions.


So from a codegen standpoint this is NFC, it only affects dump file output.

It's gone through the usual testing and I'll wait for pre-commit CI to 
churn through it before moving forward.


Jeff
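
The instrumentation pattern described above amounts to the following, shown
here as a stand-alone C sketch rather than as the RISC-V hook itself.
dump_stream is a stand-in for GCC's dump_file, and fusion_pair_p with its two
integer arguments is a hypothetical, much simplified predicate loosely
modeled on the ZEXTH case.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for GCC's global dump_file; NULL means dumping is disabled.  */
static FILE *dump_stream;

/* Hypothetical predicate: reports a fusible pair when both shift amounts
   are 48, and notes the kind of fusion in the dump if one is open.  */
bool
fusion_pair_p (int prev_shift, int curr_shift)
{
  if (prev_shift == 48 && curr_shift == 48)
    {
      if (dump_stream)
	fprintf (dump_stream, "RISCV_FUSE_ZEXTH\n");
      return true;
    }
  return false;
}
```

The return value is the same whether or not a dump stream is open, which is
why a change of this shape does not affect code generation.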

gcc/
* config/riscv/riscv.cc (riscv_macro_fusion_pair_p): Add basic
instrumentation to all cases where fusion is detected.  Fix
minor formatting goofs found in the process.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8fa1082f7c13..167e78d41ef4 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10253,11 +10253,15 @@ riscv_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
  && CONST_INT_P (XEXP (SET_SRC (prev_set), 1))
  && CONST_INT_P (XEXP (SET_SRC (curr_set), 1))
  && INTVAL (XEXP (SET_SRC (prev_set), 1)) == 32
- && (( INTVAL (XEXP (SET_SRC (curr_set), 1)) == 32
-   && riscv_fusion_enabled_p(RISCV_FUSE_ZEXTW) )
- || ( INTVAL (XEXP (SET_SRC (curr_set), 1)) < 32
-  && riscv_fusion_enabled_p(RISCV_FUSE_ZEXTWS
-   return true;
+ && ((INTVAL (XEXP (SET_SRC (curr_set), 1)) == 32
+  && riscv_fusion_enabled_p (RISCV_FUSE_ZEXTW) )
+ || (INTVAL (XEXP (SET_SRC (curr_set), 1)) < 32
+ && riscv_fusion_enabled_p (RISCV_FUSE_ZEXTWS
+   {
+ if (dump_file)
+   fprintf (dump_file, "RISCV_FUSE_ZEXTWS\n");
+ return true;
+   }
 }
 
   if (simple_sets_p && riscv_fusion_enabled_p (RISCV_FUSE_ZEXTH)
@@ -10278,7 +10282,11 @@ riscv_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
  && CONST_INT_P (XEXP (SET_SRC (curr_set), 1))
  && INTVAL (XEXP (SET_SRC (prev_set), 1)) == 48
  && INTVAL (XEXP (SET_SRC (curr_set), 1)) == 48)
-   return true;
+   {
+ if (dump_file)
+   fprintf (dump_file,"RISCV_FUSE_ZEXTH\n");
+ return true;
+   }
 }
 
   if (simple_sets_p && riscv_fusion_enabled_p (RISCV_FUSE_LDINDEXED)
@@ -10297,7 +10305,11 @@ riscv_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
  && GET_CODE (SET_SRC (prev_set)) == PLUS
  && REG_P (XEXP (SET_SRC (prev_set), 0))
  && REG_P (XEXP (SET_SRC (prev_set), 1)))
-   return true;
+   {
+ if (dump_file)
+   fprintf (dump_file, "RISCV_FUSE_LDINDEXED\n");
+ return true;
+   }
 
   /* We are trying to match the following:
   prev (add) == (set (reg:DI rD)
@@ -10313,7 +10325,11 @@ riscv_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
  && GET_CODE (SET_SRC (prev_set)) == PLUS
  && REG_P (XEXP (SET_SRC (prev_set), 0))
  && REG_P (XEXP (SET_SRC (prev_set), 1)))
-   return true;
+   {
+ if (dump_file)
+   fprintf (dump_file, "RISCV_FUSE_LDINDEXED\n");
+ return true;
+   }
 }
 
   if (simple_sets_p && riscv_fusion_enabled_p (RISCV_FUSE_LDPREINCREMENT)
@@ -10332,7 +10348,11 @@ riscv_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
  && GET_CODE (SET_SRC (prev_set)) == PLUS
  && REG_P (XEXP (SET_SRC (prev_set), 0))
  && CONST_INT_P (XEXP (SET_SRC (prev_set), 1)))
-   return true;
+   {
+ if (dump_file)
+   fprintf (dump_file, "RISCV_FUSE_LDPREINCREMENT\n");
+ return true;
+   }
 }
 
   if (simple_sets_p && riscv_fusion_enabled_p (RISCV_FUSE_LUI_ADDI)
@@ -10350,7 +10370,11 @@ riscv_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
  && (GET_CODE (SET_SRC (prev_set)) == HIGH
  || (CONST_INT_P (SET_SRC (prev_set))
  && LUI_OPERAND (INTVAL (SET_SRC (prev_set))
-   return true;
+   {
+ if (dump_file)
+   fprintf (dump_file, "RISCV_FUSE_LUI_ADDI\n");
+ return true;
+   }
 }
 
   if (simple_sets_p && riscv_fusion_enabled_p (RISCV_FUSE_AUIPC_ADDI)
@@ -10372,7 +10396,11 @@ riscv_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
  && CONST_INT_P (XEXP (SET_SRC (curr_set), 1))
  && SMALL_OPERAND (INTVAL (XEXP (SET_S

Re: [PATCH] libquadmath: add quad support for trig-pi functions

2025-07-03 Thread Michael Matz
Hello,

On Thu, 3 Jul 2025, Yuao Ma wrote:

> This patch adds the required function for Fortran trigonometric functions to
> work with glibc versions prior to 2.26. It's based on glibc source commit
> 632d895f3e5d98162f77b9c3c1da4ec19968b671.
> 
> I've built it successfully on my end. Documentation is also included.
> 
> Please take a look when you have a moment.

+__float128
+cospiq (__float128 x)
+{
...
+  if (__builtin_islessequal (x, 0.25Q))
+return cosq (M_PIq * x);

Isn't the whole raison d'etre for the trig-pi functions that the internal 
argument reduction against multiples of pi becomes trivial and hence (a) 
performant, and (b) doesn't introduce rounding artifacts?  Expressing the 
trig-pi functions in terms of their counterparts completely defeats this 
purpose.  The other way around would be more sensible for the cases where 
it works, but the above doesn't seem very attractive.


Ciao,
Michael.
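
The point about argument reduction becoming trivial can be illustrated with
a small C sketch.  It is not the libquadmath or glibc code: it works on
double, the names reduce_half_turns and cospi_branch_for are hypothetical,
and rounding is done with an integer cast, so it is only meaningful while
x / 2 fits in a long long.  It shows only the reduction to [0, 1] and which
of the four branches of the glibc-derived routine would then be taken.

```c
/* Which expression a glibc-style cospi would evaluate after reduction.  */
enum cospi_branch { USE_COS, EXACT_ZERO, USE_SIN, USE_NEG_COS };

/* Reduce x to r = |x - 2 * round (x / 2)|, which lies in [0, 1].
   No multiplication by an approximation of pi is involved here.  */
double
reduce_half_turns (double x)
{
  double h = 0.5 * x;
  /* Round half away from zero via truncating conversion.  */
  long long k = (long long) (h < 0 ? h - 0.5 : h + 0.5);
  double r = x - 2.0 * (double) k;
  return r < 0 ? -r : r;
}

enum cospi_branch
cospi_branch_for (double x)
{
  double r = reduce_half_turns (x);
  if (r <= 0.25)
    return USE_COS;		/* cos (pi * r) */
  else if (r == 0.5)
    return EXACT_ZERO;		/* exactly 0 at half-integers */
  else if (r <= 0.75)
    return USE_SIN;		/* sin (pi * (0.5 - r)) */
  else
    return USE_NEG_COS;		/* -cos (pi * (1 - r)) */
}
```

The multiplication by pi only happens afterwards, on an argument of magnitude
at most 0.25, which is the part the objection above is concerned with.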


Re: [PATCH] x86-64: Add RDI clobber to tls_local_dynamic_64 patterns

2025-07-03 Thread H.J. Lu
On Thu, Jul 3, 2025 at 2:39 PM Uros Bizjak  wrote:
>
> On Thu, Jul 3, 2025 at 6:32 AM H.J. Lu  wrote:
> >
> > *tls_local_dynamic_64_ uses RDI as the __tls_get_addr argument.
> > Add RDI clobber to tls_local_dynamic_64 patterns to show it.
> >
> > PR target/120908
> > * config/i386/i386.cc (legitimize_tls_address): Pass RDI to
> > gen_tls_local_dynamic_64.
> > * config/i386/i386.md (*tls_local_dynamic_64_): Add RDI
> > clobber and use it to generate LEA.
> > (@tls_local_dynamic_64_): Add a clobber.
>
> *tls_local_dynamic_base_64_largepic needs the same treatment.
>
> > OK for master?
>
> OK with *tls_local_dynamic_base_64_largepic also fixed.
>
> Thanks,
> Uros.

This is the patch I am checking in with a test.  I added an RDI clobber
to *tls_global_dynamic_64_largepic and *tls_local_dynamic_base_64_largepic
to avoid an unrecognizable insn in the test.

Thanks.

-- 
H.J.
From 17c40e81f2fabb5896222b8b78690a375696d0f5 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 3 Jul 2025 10:54:39 +0800
Subject: [PATCH v2] x86-64: Add RDI clobber to 64-bit dynamic TLS patterns

*tls_global_dynamic_64_largepic, *tls_local_dynamic_64_ and
*tls_local_dynamic_base_64_largepic use RDI as the __tls_get_addr
argument.  Add RDI clobber to these patterns to show it.

gcc/

	PR target/120908
	* config/i386/i386.cc (legitimize_tls_address): Pass RDI to
	gen_tls_local_dynamic_64.
	* config/i386/i386.md (*tls_global_dynamic_64_largepic): Add
	RDI clobber and use it to generate LEA.
	(*tls_local_dynamic_64_): Likewise.
	(*tls_local_dynamic_base_64_largepic): Likewise.
	(@tls_local_dynamic_64_): Add a clobber.

gcc/testsuite/

	PR target/120908
	* gcc.target/i386/pr120908.c: New test.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386.cc  |  3 ++-
 gcc/config/i386/i386.md  | 18 +++---
 gcc/testsuite/gcc.target/i386/pr120908.c | 16 
 3 files changed, 29 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120908.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 5c888b52c1c..eb5b2eb6a86 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -12616,12 +12616,13 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 	  if (TARGET_64BIT)
 	{
 	  rtx rax = gen_rtx_REG (Pmode, AX_REG);
+	  rtx rdi = gen_rtx_REG (Pmode, DI_REG);
 	  rtx_insn *insns;
 	  rtx eqv;
 
 	  start_sequence ();
 	  emit_call_insn
-		(gen_tls_local_dynamic_base_64 (Pmode, rax, caddr));
+		(gen_tls_local_dynamic_base_64 (Pmode, rax, caddr, rdi));
 	  insns = end_sequence ();
 
 	  /* Attach a unique REG_EQUAL, to allow the RTL optimizers to
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 370e79bb511..21b9f5ccd7a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -23243,14 +23243,15 @@ (define_insn "*tls_global_dynamic_64_largepic"
 	 (match_operand 4)))
(unspec:DI [(match_operand 1 "tls_symbolic_operand")
 	   (reg:DI SP_REG)]
-	  UNSPEC_TLS_GD)]
+	  UNSPEC_TLS_GD)
+   (clobber (match_operand:DI 5 "register_operand" "=D"))]
   "TARGET_64BIT && ix86_cmodel == CM_LARGE_PIC && !TARGET_PECOFF
&& GET_CODE (operands[3]) == CONST
&& GET_CODE (XEXP (operands[3], 0)) == UNSPEC
&& XINT (XEXP (operands[3], 0), 1) == UNSPEC_PLTOFF"
 {
   output_asm_insn
-("lea{q}\t{%E1@tlsgd(%%rip), %%rdi|rdi, %E1@tlsgd[rip]}", operands);
+("lea{q}\t{%E1@tlsgd(%%rip), %5|%5, %E1@tlsgd[rip]}", operands);
   output_asm_insn ("movabs{q}\t{%3, %%rax|rax, %3}", operands);
   output_asm_insn ("add{q}\t{%2, %%rax|rax, %2}", operands);
   return "call\t{*%%rax|rax}";
@@ -23318,11 +23319,12 @@ (define_insn "*tls_local_dynamic_base_64_"
 	(call:P
 	 (mem:QI (match_operand 1 "constant_call_address_operand" "Bz"))
 	 (match_operand 2)))
-   (unspec:P [(reg:P SP_REG)] UNSPEC_TLS_LD_BASE)]
+   (unspec:P [(reg:P SP_REG)] UNSPEC_TLS_LD_BASE)
+   (clobber (match_operand:P 3 "register_operand" "=D"))]
   "TARGET_64BIT"
 {
   output_asm_insn
-("lea{q}\t{%&@tlsld(%%rip), %%rdi|rdi, %&@tlsld[rip]}", operands);
+("lea{q}\t{%&@tlsld(%%rip), %q3|%q3, %&@tlsld[rip]}", operands);
   if (TARGET_SUN_TLS)
 return "call\t%p1@plt";
   if (flag_plt || !HAVE_AS_IX86_TLS_GET_ADDR_GOT)
@@ -23338,14 +23340,15 @@ (define_insn "*tls_local_dynamic_base_64_largepic"
 	 (mem:QI (plus:DI (match_operand:DI 1 "register_operand" "b")
 			  (match_operand:DI 2 "immediate_operand" "i")))
 	 (match_operand 3)))
-   (unspec:DI [(reg:DI SP_REG)] UNSPEC_TLS_LD_BASE)]
+   (unspec:DI [(reg:DI SP_REG)] UNSPEC_TLS_LD_BASE)
+   (clobber (match_operand:DI 4 "register_operand" "=D"))]
   "TARGET_64BIT && ix86_cmodel == CM_LARGE_PIC && !TARGET_PECOFF
&& GET_CODE (operands[2]) == CONST
&& GET_CODE (XEXP (operands[2], 0)) == UNSPEC
&& XINT (XEXP (operands[2], 0), 1) == UNSPEC_PLTOFF"
 {
   output_asm_insn
-("lea{q}\t{%&@tlsld(%%rip), %%rdi|rdi, %&@tlsld[rip]}", operands);
+("lea{q

Re: [PATCH v3] tree-optimization/120780: Support object size for containing objects

2025-07-03 Thread Jakub Jelinek
On Thu, Jul 03, 2025 at 08:33:45AM +0200, Richard Biener wrote:
> On Wed, Jul 2, 2025 at 11:32 PM Siddhesh Poyarekar wrote:
> >
> > MEM_REF cast of a subobject to its containing object has negative
> > offsets, which objsz sees as an invalid access.  Support this use case
> > by peeking into the structure to validate that the containing object
> > indeed contains a type of the subobject at that offset and if present,
> > adjust the wholesize for the object to allow the negative offset.
> 
> This variant works for me.
> 
> > gcc/ChangeLog:
> >
> > PR tree-optimization/120780
> > * tree-object-size.cc (inner_at_offset,
> > get_wholesize_for_memref): New functions.
> > (addr_object_size): Call GET_WHOLESIZE_FOR_MEMREF.

Please no caps in function names in the ChangeLog; caps are sometimes
used for parameter names, but I have never seen them for functions.

Jakub



[COMMITTED 01/42] ada: Use consistent truncation of 'Value for decimal fixed-point types

2025-07-03 Thread Marc Poulhiès
From: Eric Botcazou 

This uses truncation for all bases instead of for base 10 only.

gcc/ada/ChangeLog:

* libgnat/s-valued.adb (Integer_to_Decimal): Use truncation for the
scaled divide operation performed for bases other than 10.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-valued.adb | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/libgnat/s-valued.adb b/gcc/ada/libgnat/s-valued.adb
index 4f2e1020466..b7982b6046f 100644
--- a/gcc/ada/libgnat/s-valued.adb
+++ b/gcc/ada/libgnat/s-valued.adb
@@ -228,9 +228,9 @@ package body System.Value_D is
raise Program_Error;
 end if;
 
---  Perform a scaled divide operation with rounding to match 'Image
+--  Perform a scaled divide operation with truncation
 
-Scaled_Divide (To_Signed (V), Y, Z, Q, R, Round => True);
+Scaled_Divide (To_Signed (V), Y, Z, Q, R, Round => False);
 
 return Q;
  end;
-- 
2.43.0



[COMMITTED 06/42] ada: Document restriction on array length

2025-07-03 Thread Marc Poulhiès
From: Tonu Naks 

gcc/ada/ChangeLog:

* libgnat/i-cstrin.ads (Value): add documentation

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/i-cstrin.ads | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/ada/libgnat/i-cstrin.ads b/gcc/ada/libgnat/i-cstrin.ads
index 0d057d074e5..5939fe041a4 100644
--- a/gcc/ada/libgnat/i-cstrin.ads
+++ b/gcc/ada/libgnat/i-cstrin.ads
@@ -100,6 +100,17 @@ is
 
--  The Value functions copy the contents of a chars_ptr object
--  into a char_array/String.
+   --  There is a guard for a storage error on an object declaration for
+   --  an array type with a modular index type with the size of
+   --  Long_Long_Integer. The special processing is needed in this case
+   --  to compute reliably the size of the object, and eventually, to
+   --  raise Storage_Error, when wrap-around arithmetic might compute
+   --  a meaningless size for the object.
+   --
+   --  The guard raises Storage_Error when
+   --
+   --(Arr'Last / 2 - Arr'First / 2) > (2 ** 30)
+   --
function Value (Item : chars_ptr) return char_array with
  Pre=> Item /= Null_Ptr,
  Global => (Input => C_Memory);
-- 
2.43.0
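
To see why the documented guard divides before subtracting, consider the
same computation in C with a 64-bit modular index.  This is only a sketch:
naive_length and guard_trips are hypothetical names, uint64_t stands in for
the modular index type, and the 2 ** 30 threshold is copied from the comment
in the patch above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Last - First + 1 in modular arithmetic: wraps to 0 for the full range.  */
uint64_t
naive_length (uint64_t first, uint64_t last)
{
  return last - first + 1;
}

/* The documented guard: both bounds are halved first, so for
   last >= first the difference cannot wrap around.  */
bool
guard_trips (uint64_t first, uint64_t last)
{
  return (last / 2 - first / 2) > ((uint64_t) 1 << 30);
}
```

For the full index range the naive length wraps to zero, while the halved
difference is still a huge value and trips the guard.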



Re: [PATCH] s390: More vec-perm-const cases.

2025-07-03 Thread Andreas Krebbel

On 6/27/25 8:09 PM, Juergen Christ wrote:

s390 missed constant vector permutation cases based on the vector pack
instruction or changing the size of the vector elements during vector
merge.  This enables some more patterns that do not need to load a
constant vector for permutation.

Bootstrapped and regtested on s390.  Okay for trunk?

gcc/ChangeLog:

* config/s390/s390.cc (expand_perm_with_merge): Add size change cases.
(expand_perm_with_pack): New function.
(vectorize_vec_perm_const_1): Wire up new function.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vec-perm-merge-1.c: New test.
* gcc.target/s390/vector/vec-perm-pack-1.c: New test.

Signed-off-by: Juergen Christ 


Ok. Thanks!


Andreas
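
For readers unfamiliar with the tables in the patch, here is a small C
sketch of what the "size change" cases appear to express.  vec_perm_bytes
and merge_high_halfwords are hypothetical helpers, not s390 back-end code;
the selector is copied from hi_perm_qi_hi in the diff below.  The sketch
checks that this byte-level selector picks the same bytes as interleaving
the first four 2-byte elements of the two inputs.

```c
#include <string.h>

/* Byte permutation of two 16-byte inputs: index i < 16 selects a[i],
   index i >= 16 selects b[i - 16].  */
void
vec_perm_bytes (const unsigned char *a, const unsigned char *b,
		const unsigned char *perm, unsigned char *out)
{
  for (int i = 0; i < 16; i++)
    out[i] = perm[i] < 16 ? a[perm[i]] : b[perm[i] - 16];
}

/* Interleave the first four 2-byte elements of a and b.  */
void
merge_high_halfwords (const unsigned char *a, const unsigned char *b,
		      unsigned char *out)
{
  for (int e = 0; e < 4; e++)
    {
      memcpy (out + 4 * e, a + 2 * e, 2);
      memcpy (out + 4 * e + 2, b + 2 * e, 2);
    }
}

/* Selector copied from the patch.  */
const unsigned char hi_perm_qi_hi[16]
  = {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23};
```

If both routines agree, a byte-element permutation with this selector can be
carried out as a merge in the wider element mode, so no permutation constant
has to be loaded.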



---
  gcc/config/s390/s390.cc   | 169 +++-
  .../gcc.target/s390/vector/vec-perm-merge-1.c | 242 ++
  .../gcc.target/s390/vector/vec-perm-pack-1.c  | 133 ++
  3 files changed, 542 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-perm-merge-1.c
  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-perm-pack-1.c

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 38267202f668..de9c15c7bd42 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -18041,9 +18041,34 @@ expand_perm_with_merge (const struct expand_vec_perm_d &d)
static const unsigned char lo_perm_qi_swap[16]
  = {17, 1, 19, 3, 21, 5, 23, 7, 25, 9, 27, 11, 29, 13, 31, 15};
  
+  static const unsigned char hi_perm_qi_di[16]
+= {0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23};
+  static const unsigned char hi_perm_qi_si[16]
+= {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23};
+  static const unsigned char hi_perm_qi_hi[16]
+= {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23};
+
+  static const unsigned char lo_perm_qi_di[16]
+= {8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31};
+  static const unsigned char lo_perm_qi_si[16]
+= {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31};
+  static const unsigned char lo_perm_qi_hi[16]
+= {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31};
+
+  static const unsigned char hi_perm_hi_si[8] = {0, 1, 8, 9, 2, 3, 10, 11};
+  static const unsigned char hi_perm_hi_di[8] = {0, 1, 2, 3, 8, 9, 10, 11};
+
+  static const unsigned char lo_perm_hi_si[8] = {4, 5, 12, 13, 6, 7, 14, 15};
+  static const unsigned char lo_perm_hi_di[8] = {4, 5, 6, 7, 12, 13, 14, 15};
+
+  static const unsigned char hi_perm_si_di[4] = {0, 1, 4, 5};
+
+  static const unsigned char lo_perm_si_di[4] = {2, 3, 6, 7};
+
bool merge_lo_p = false;
bool merge_hi_p = false;
bool swap_operands_p = false;
+  machine_mode mergemode = d.vmode;
  
if ((d.nelt == 2 && memcmp (d.perm, hi_perm_di, 2) == 0)

|| (d.nelt == 4 && memcmp (d.perm, hi_perm_si, 4) == 0)
@@ -18075,6 +18100,75 @@ expand_perm_with_merge (const struct expand_vec_perm_d &d)
merge_lo_p = true;
swap_operands_p = true;
  }
+  else if (d.nelt == 16)
+{
+  if (memcmp (d.perm, hi_perm_qi_di, 16) == 0)
+   {
+ merge_hi_p = true;
+ mergemode = E_V2DImode;
+   }
+  else if (memcmp (d.perm, hi_perm_qi_si, 16) == 0)
+   {
+ merge_hi_p = true;
+ mergemode = E_V4SImode;
+   }
+  else if (memcmp (d.perm, hi_perm_qi_hi, 16) == 0)
+   {
+ merge_hi_p = true;
+ mergemode = E_V8HImode;
+   }
+  else if (memcmp (d.perm, lo_perm_qi_di, 16) == 0)
+   {
+ merge_lo_p = true;
+ mergemode = E_V2DImode;
+   }
+  else if (memcmp (d.perm, lo_perm_qi_si, 16) == 0)
+   {
+ merge_lo_p = true;
+ mergemode = E_V4SImode;
+   }
+  else if (memcmp (d.perm, lo_perm_qi_hi, 16) == 0)
+   {
+ merge_lo_p = true;
+ mergemode = E_V8HImode;
+   }
+}
+  else if (d.nelt == 8)
+{
+  if (memcmp (d.perm, hi_perm_hi_di, 8) == 0)
+   {
+ merge_hi_p = true;
+ mergemode = E_V2DImode;
+   }
+  else if (memcmp (d.perm, hi_perm_hi_si, 8) == 0)
+   {
+ merge_hi_p = true;
+ mergemode = E_V4SImode;
+   }
+  else if (memcmp (d.perm, lo_perm_hi_di, 8) == 0)
+   {
+ merge_lo_p = true;
+ mergemode = E_V2DImode;
+   }
+  else if (memcmp (d.perm, lo_perm_hi_si, 8) == 0)
+   {
+ merge_lo_p = true;
+ mergemode = E_V4SImode;
+   }
+}
+  else if (d.nelt == 4)
+{
+  if (memcmp (d.perm, hi_perm_si_di, 4) == 0)
+   {
+ merge_hi_p = true;
+ mergemode = E_V2DImode;
+   }
+  else if (memcmp (d.perm, lo_perm_si_di, 4) == 0)
+   {
+ merge_lo_p = true;
+ mergemode = E_V2DImode;
+   }
+}
  
if (!merge_lo_p && !merge_hi_p)

  return false;
@@ -18082,7 +18176,7 @@ expand_perm_with_merge (const struct expa

[COMMITTED] testsuite: Fix gcc.dg/ipa/pr120295.c on Solaris

2025-07-03 Thread Rainer Orth
gcc.dg/ipa/pr120295.c FAILs on Solaris:

FAIL: gcc.dg/ipa/pr120295.c (test for excess errors)

Excess errors:
ld: warning: symbol 'glob' has differing types:
(file /var/tmp//ccsDR59c.o type=OBJT; file /lib/libc.so type=FUNC);
/var/tmp//ccsDR59c.o definition taken

Fixed by renaming the glob variable to glob_ to avoid the conflict.

Tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu.

Committed.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


gcc/testsuite:
* gcc.dg/ipa/pr120295.c (glob): Rename to glob_.

diff --git a/gcc/testsuite/gcc.dg/ipa/pr120295.c b/gcc/testsuite/gcc.dg/ipa/pr120295.c
--- a/gcc/testsuite/gcc.dg/ipa/pr120295.c
+++ b/gcc/testsuite/gcc.dg/ipa/pr120295.c
@@ -9,10 +9,10 @@ char c, k, g, e;
 short d[2] = {0};
 int *i = &j;
 
-volatile int glob;
+volatile int glob_;
 void __attribute__((noipa)) sth (const char *, int a)
 {
-  glob = a;
+  glob_ = a;
   return;
 }
 


[PATCH] testsuite: Skip check-function-bodies sometimes

2025-07-03 Thread Stefan Schulze Frielinghaus
From: Stefan Schulze Frielinghaus 

My understanding is that during check_compile compiler_flags contains all
the options passed to gcc and current_compiler_flags contains options
passed via dg-options and dg-additional-options.  I did a couple of
experiments and printf-style debugging which confirmed this.
Nevertheless, it would be great if someone with tcl experience could ack
this.

Another small experiment shows for gcc.target/s390/foo.c with

/* { dg-do run } */
/* { dg-options "-O2 -save-temps" } */
/* { dg-final { check-function-bodies "**" "" "" } } */

/*
** main:
**	lghi	%r2,0
**	br	%r14
*/

int main (void) { }

and using --target_board='unix{-fstack-protector-all}' we have

PASS: gcc.target/s390/foo.c (test for excess errors)
PASS: gcc.target/s390/foo.c execution test
UNSUPPORTED: gcc.target/s390/foo.c: skip check-function-bodies due to implicit prologue/epilogue changes e.g. by stack protector

Whereas without --target_board='unix{-fstack-protector-all}' we have

PASS: gcc.target/s390/foo.c (test for excess errors)
PASS: gcc.target/s390/foo.c execution test
PASS: gcc.target/s390/foo.c check-function-bodies main

For gcc.target/s390/bar.c with (note the option -fstack-protector)

/* { dg-do run } */
/* { dg-options "-O2 -save-temps -fstack-protector" } */
/* { dg-final { check-function-bodies "**" "" "" } } */

/*
** main:
**	lghi	%r2,0
**	br	%r14
*/

int main (void) { }

we get with and without --target_board='unix{-fstack-protector-all}'

PASS: gcc.target/s390/bar.c (test for excess errors)
PASS: gcc.target/s390/bar.c execution test
PASS: gcc.target/s390/bar.c check-function-bodies main
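
The behaviour seen in the foo.c and bar.c experiments can be sketched as
follows.  This is only an illustrative C++ model, not the Tcl code in
scanasm.exp: the function names are hypothetical, prefix matching is an
assumption, and the option list is the one from the commit message
(-fstack-protector*, -fhardened, -fstack-check*, -fstack-clash-protection).

```cpp
#include <array>
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Option prefixes that change prologue/epilogue code (from the commit
// message).  Matching by prefix is an assumption of this sketch.
constexpr std::array<std::string_view, 4> kPrologueFlags = {
    "-fstack-protector", "-fhardened", "-fstack-check",
    "-fstack-clash-protection"};

// True if a single option starts with one of the prefixes above.
inline bool affects_prologue(std::string_view flag) {
  for (std::string_view prefix : kPrologueFlags)
    if (flag.substr(0, prefix.size()) == prefix)
      return true;
  return false;
}

// True if any option in the list affects the prologue/epilogue.
inline bool any_affects_prologue(const std::vector<std::string>& flags) {
  for (const std::string& f : flags)
    if (affects_prologue(f))
      return true;
  return false;
}

// all_flags models compiler_flags (everything passed to gcc);
// explicit_flags models current_compiler_flags (dg-options and
// dg-additional-options).  Skip only when such an option is present
// but the test did not ask for one itself.
inline bool should_skip_function_bodies(
    const std::vector<std::string>& all_flags,
    const std::vector<std::string>& explicit_flags) {
  return any_affects_prologue(all_flags) &&
         !any_affects_prologue(explicit_flags);
}
```

With foo.c's options plus -fstack-protector-all from the target board the
model reports a skip; with bar.c's explicit -fstack-protector it does not,
matching the results above.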

Bootstrapped and regtested on x86_64.  Running the testsuite on x86_64
using --target_board='unix{-fstack-protector-all}' "resolves" the
following failures if the patch is applied:

FAIL: g++.target/i386/memset-pr101366-1.C   check-function-bodies _Z4TestPc
FAIL: g++.target/i386/memset-pr101366-2.C   check-function-bodies _Z4TestPc
FAIL: g++.target/i386/memset-pr108585-1a.C   check-function-bodies _Z6squarei
FAIL: g++.target/i386/memset-pr108585-1b.C   check-function-bodies _Z6squarei
FAIL: g++.target/i386/memset-pr118276-1a.C   check-function-bodies _Z22makeDefaultConstructedv
FAIL: g++.target/i386/memset-pr118276-1b.C   check-function-bodies _Z22makeDefaultConstructedv
FAIL: g++.target/i386/memset-pr118276-1c.C   check-function-bodies _Z22makeDefaultConstructedv
FAIL: gcc.target/i386/memset-pr70308-1a.c check-function-bodies foo
FAIL: gcc.target/i386/memset-pr70308-1b.c check-function-bodies foo
FAIL: gcc.target/i386/memset-strategy-25.c check-function-bodies foo
FAIL: gcc.target/i386/memset-strategy-28.c check-function-bodies foo
FAIL: gcc.target/i386/memset-strategy-29.c check-function-bodies foo
FAIL: gcc.target/i386/memset-strategy-30.c check-function-bodies foo
FAIL: gcc.target/i386/memset-strategy-31.c check-function-bodies foo
FAIL: gcc.target/i386/pr111673.c check-function-bodies advance
FAIL: gcc.target/i386/pr116174.c check-function-bodies foo
FAIL: gcc.target/i386/pr119784a.c check-function-bodies start
FAIL: gcc.target/i386/pr82142a.c check-function-bodies assignzero
FAIL: gcc.target/i386/pr92080-17.c check-function-bodies foo
FAIL: gcc.target/i386/pr93492-1.c check-function-bodies f10_endbr
FAIL: gcc.target/i386/pr93492-1.c check-function-bodies f10_none
FAIL: gcc.target/i386/pr93492-1.c check-function-bodies f11_endbr
FAIL: gcc.target/i386/pr93492-1.c check-function-bodies f11_none
FAIL: gcc.target/i386/pr93492-1.c check-function-bodies f21_endbr
FAIL: gcc.target/i386/pr93492-1.c check-function-bodies f21_none
FAIL: gcc.target/i386/preserve-none-25.c check-function-bodies entry
FAIL: gcc.target/i386/preserve-none-26.c check-function-bodies entry
FAIL: gcc.target/i386/preserve-none-27.c check-function-bodies entry
FAIL: gcc.target/i386/preserve-none-30a.c check-function-bodies entry
FAIL: gcc.target/i386/preserve-none-30b.c check-function-bodies entry

Ok for mainline?

-- 8< --

If a check-function-bodies test is compiled using -fstack-protector*,
-fhardened, -fstack-check*, or -fstack-clash-protection, but the test is
not asking explicitly for those via dg-options or
dg-additional-options, then mark the test as unsupported.  Since these
features influence prologue/epilogue it is rarely useful to check the
full function body, if the test is not explicitly prepared for those.
This might happen when the testsuite is passed additional options as
e.g. via --target_board='unix{-fstack-protector-all}'.

Co-Authored-By: Jakub Jelinek 
---
 gcc/doc/sourcebuild.texi  |  9 +
 gcc/testsuite/lib/scanasm.exp | 17 +
 2 files changed, 26 insertions(+)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 6c5586e4b03..2980b04cb0e 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -3621,6 +3621,15 @@ command line.  This can help if a source file is 
compiled both with
 and without optimization, since

Re: [PATCH v3] tree-optimization/120780: Support object size for containing objects

2025-07-03 Thread Siddhesh Poyarekar

On 2025-07-03 03:13, Jakub Jelinek wrote:

On Thu, Jul 03, 2025 at 08:33:45AM +0200, Richard Biener wrote:

On Wed, Jul 2, 2025 at 11:32 PM Siddhesh Poyarekar  wrote:


MEM_REF cast of a subobject to its containing object has negative
offsets, which objsz sees as an invalid access.  Support this use case
by peeking into the structure to validate that the containing object
indeed contains a type of the subobject at that offset and if present,
adjust the wholesize for the object to allow the negative offset.


This variant works for me.


gcc/ChangeLog:

 PR tree-optimization/120780
 * tree-object-size.cc (inner_at_offset,
 get_wholesize_for_memref): New functions.
 (addr_object_size): Call GET_WHOLESIZE_FOR_MEMREF.


Please no caps in function names in the ChangeLog, sometimes it
is used for parameter names, but never saw it for functions.


Ack, I'll fix it up before committing.  I'll also backport to the gcc15 
branch in a couple of days, giving post-commit CIs a chance to identify 
any potential issues before I do that.


Thanks both!

Sid
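
For readers unfamiliar with the pattern, the containing-object case from the
commit message can be sketched as below.  This is a minimal illustration, not
the testcase from the PR: the types Outer and Inner and both helper functions
are hypothetical.  Stepping back from a member to its containing object is
what produces the negative offset that objsz previously treated as invalid.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types: Inner is a subobject of Outer at a non-zero offset.
struct Inner { int value; };
struct Outer { long header; Inner inner; };

// Recover the containing object from a pointer to its member by stepping
// back offsetof (Outer, inner) bytes, i.e. applying a negative offset.
inline Outer* outer_from_inner(Inner* p) {
  return reinterpret_cast<Outer*>(reinterpret_cast<char*>(p) -
                                  offsetof(Outer, inner));
}

// Ask the compiler how much storage it believes is reachable through the
// recovered pointer.  The result depends on the compiler version and the
// optimization level, so nothing is asserted about it here.
inline std::size_t reachable_size(Inner* p) {
  return __builtin_object_size(outer_from_inner(p), 0);
}
```

Given `Outer o`, `outer_from_inner (&o.inner)` yields `&o` again.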


Re: [PATCH v2 1/5] libstdc++: Check prerequisites of layout_*::operator().

2025-07-03 Thread Luc Grosheintz



On 7/3/25 12:45, Jonathan Wakely wrote:

On Thu, 3 Jul 2025 at 11:12, Tomasz Kaminski  wrote:

On Thu, Jul 3, 2025 at 12:08 PM Luc Grosheintz  wrote:

The reasoning for this approach was:

1. The mapping::operator() and mdspan::operator[] have the same
precondition; and mdspan::operator[] calls mapping::operator().


Yes, although a user-defined mapping might not bother to check
preconditions. So in order for us to implement the required check in
operator[] we really need to check it there.

We could check it in *both* places, and assume that the compiler will
see that the second check is entirely redundant.

We could also check in mapping::operator() and then in
mdspan::operator[] do something like:

if constexpr (!__is_std_mapping)
   __glibcxx_assert(...);

So if we know there's a check in mapping::operator() then don't bother
to check in operator[] as well.


This is an excellent description of how it's currently implemented. We
only skip the check in mdspan::operator[] if we know that the equivalent
is performed in mapping::operator().


2. The place I chose to check the precondition is where we already
have both the index and the extent in L1 and almost certainly in a
register. The hope was that together with branch prediction, this
will be a reasonably cheap place to put the check.

3. The layouts are highly valuable on their own. I've implemented
that piece of logic numerous times in different contexts; and it's
wonderful that soon we can convert `i, j, k` to a linear index easily
using the standard library.
Therefore, I didn't want to skip them in mapping::operator() because
they're a guard against out of bounds accesses, e.g. in a user-defined
dynamically allocated, owning, multi-dimensional array.


I think such types would have their own bounds checks, contracts, preconditions.


Not if it's just something wrapping a unique_ptr, for example.


Very valid points. I'd like to add that out of bounds accesses when
iterating over multi-dimensional arrays is a reasonably frequent bug
while developing scientific codes, e.g. write (i, i) instead of (i, j)
or get an N and M mixed up. If unchecked it manifests itself in cryptic
ways; if checked it's usually trivial to fix.

To me checking in mapping::operator() seems like a good idea, I'm sure
it would have caught bugs in code I wrote.



There's a few paths forwards:

1. Remove the check from mapping::operator() and unconditionally
check in mdspan::operator[].


I would go for option 1.


My original preference was option 2, but I've convinced myself that we
need the checks in operator[]. Rather than repeat them in both places,
I think I'm OK with option 1 too.


I'll downgrade them to _GLIBCXX_ASSERT_DEBUG and make the check in
mdspan::operator[] unconditional.




2. Leave it as is and return when we do optimization or hardening.

3. Start measuring to figure out the cost of these checks; and then
decide.

I'm open to all three.
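
To make the trade-off concrete, here is a minimal sketch of the "check in
the mapping, skip the duplicate in the view" arrangement discussed above.  It
does not use <mdspan> or the libstdc++ internals: RowMajor2D, UncheckedMapping,
View2D and the checks_bounds marker are hypothetical stand-ins for a standard
layout mapping, a user-defined mapping, mdspan and __is_std_mapping.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for a standard layout mapping: converts (i, j) to a linear
// index and checks its own precondition.
struct RowMajor2D {
  std::size_t rows, cols;
  static constexpr bool checks_bounds = true;  // hypothetical marker
  bool in_bounds(std::size_t i, std::size_t j) const {
    return i < rows && j < cols;
  }
  std::size_t operator()(std::size_t i, std::size_t j) const {
    assert(in_bounds(i, j));
    return i * cols + j;
  }
};

// Stand-in for a user-defined mapping that does not bother to check.
struct UncheckedMapping {
  std::size_t rows, cols;
  static constexpr bool checks_bounds = false;
  bool in_bounds(std::size_t i, std::size_t j) const {
    return i < rows && j < cols;
  }
  std::size_t operator()(std::size_t i, std::size_t j) const {
    return i * cols + j;
  }
};

// Stand-in for mdspan: only repeats the check when the mapping is not
// known to perform it.
template <class Mapping>
struct View2D {
  int* data;
  Mapping map;
  int& at(std::size_t i, std::size_t j) const {
    if constexpr (!Mapping::checks_bounds)
      assert(map.in_bounds(i, j));
    return data[map(i, j)];
  }
};
```

Option 1 above corresponds to dropping the `if constexpr` and always
asserting in `at`, regardless of what the mapping does.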






Re: [PATCH v9 0/9] AArch64: CMPBR support

2025-07-03 Thread Richard Sandiford
Karl Meakin  writes:
> This patch series adds support for the CMPBR extension. It includes the
> new `+cmpbr` option and rules to generate the new instructions when
> lowering conditional branches.

Thanks for the update, LGTM.  I've pushed the series to trunk.

Richard

> Changelog:
> * v9:
>   - Mark the non-far branches unlikely, so that the branch is consistently 
> generated as:
>   ```asm
> branch-if-true .L123
> b  not_taken
> .L123:
> b  taken
>   ```
> * v8:
>   - Support far branches for the `CBB` and `CBH` instructions, and add tests 
> for them.
>   - Mark the branch in the far branch tests likely, so that the optimizer does
> not invert the condition.
>   - Use regex captures for register and label names so that the tests are 
> less fragile.
>   - Minor formatting fixes.
> * v7:
>   - Support far branches and add a test for them.
>   - Replace `aarch64_cb_short_operand` with `aarch64_reg_or_zero_operand`.
>   - Delete the new predicates that aren't needed anymore.
>   - Minor formatting and comment fixes.
> * v6:
>   - Correct the constraint string for immediate operands.
>   - Drop the commit for adding `%j` format specifiers. The suffix for
> the `cb` instruction is now calculated by the `cmp_op` code
> attribute.
> * v5:
>   - Moved patch 10/10 (adding %j ...) before patch 8/10 (rules for
> CMPBR...). Every commit in the series should now produce a correct
> compiler.
>   - Reduce excessive diff context by not passing `--function-context` to
> `git format-patch`.
> * v4:
>   - Added a commit to use HS/LO instead of CS/CC mnemonics.
>   - Rewrite the range checks for immediate RHSes in aarch64.cc: CBGE,
> CBHS, CBLE and CBLS have different ranges of allowed immediates than
> the other comparisons.
>
> Karl Meakin (9):
>   AArch64: place branch instruction rules together
>   AArch64: reformat branch instruction rules
>   AArch64: rename branch instruction rules
>   AArch64: add constants for branch displacements
>   AArch64: make `far_branch` attribute a boolean
>   AArch64: recognize `+cmpbr` option
>   AArch64: precommit test for CMPBR instructions
>   AArch64: rules for CMPBR instructions
>   AArch64: make rules for CBZ/TBZ higher priority
>
>  .../aarch64/aarch64-option-extensions.def |2 +
>  gcc/config/aarch64/aarch64-protos.h   |2 +
>  gcc/config/aarch64/aarch64-simd.md|2 +-
>  gcc/config/aarch64/aarch64-sme.md |2 +-
>  gcc/config/aarch64/aarch64.cc |   39 +-
>  gcc/config/aarch64/aarch64.h  |3 +
>  gcc/config/aarch64/aarch64.md |  570 --
>  gcc/config/aarch64/constraints.md |   18 +
>  gcc/config/aarch64/iterators.md   |   30 +
>  gcc/doc/invoke.texi   |3 +
>  gcc/testsuite/gcc.target/aarch64/cmpbr.c  | 1824 +
>  gcc/testsuite/lib/target-supports.exp |   14 +-
>  12 files changed, 2285 insertions(+), 224 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/cmpbr.c
>
> --
> 2.48.1


[COMMITTED 36/42] ada: More Tbuild cleanup

2025-07-03 Thread Marc Poulhiès
From: Bob Duff 

Remove "Nmake_Assert => ..." on N_Unchecked_Type_Conversion at
gen_il-gen-gen_nodes.adb:473 (was disabled).

This was left over from commit 82a794419a00ea98b68d69b64363ae6746710de9
"Tbuild cleanup".

In addition, the checks for "Is_Composite_Type" in
Tbuild.Unchecked_Convert_To are narrowed to "not Is_Scalar_Type";
that way, useless duplicate unchecked conversions of access types will
be removed as for composite types.

gcc/ada/ChangeLog:

* gen_il-gen-gen_nodes.adb (N_Unchecked_Type_Conversion):
Remove useless Nmake_Assert.
* tbuild.adb (Unchecked_Convert_To):
Narrow the bitfield-related conditions.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gen_il-gen-gen_nodes.adb | 7 +--
 gcc/ada/tbuild.adb   | 6 +++---
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/gcc/ada/gen_il-gen-gen_nodes.adb b/gcc/ada/gen_il-gen-gen_nodes.adb
index debc66b0fcd..f4e79173502 100644
--- a/gcc/ada/gen_il-gen-gen_nodes.adb
+++ b/gcc/ada/gen_il-gen-gen_nodes.adb
@@ -469,12 +469,7 @@ begin -- Gen_IL.Gen.Gen_Nodes
(Sy (Subtype_Mark, Node_Id, Default_Empty),
 Sy (Expression, Node_Id, Default_Empty),
 Sm (Kill_Range_Check, Flag),
-Sm (No_Truncation, Flag)),
-   Nmake_Assert => "True or else Nkind (Expression) /= 
N_Unchecked_Type_Conversion");
---   Nmake_Assert => "Nkind (Expression) /= N_Unchecked_Type_Conversion");
-   --  Assert that we don't have unchecked conversions of unchecked
-   --  conversions; if Expression might be an unchecked conversion,
-   --  then Tbuild.Unchecked_Convert_To should be used.
+Sm (No_Truncation, Flag)));
 
Cc (N_Subtype_Indication, N_Has_Etype,
(Sy (Subtype_Mark, Node_Id, Default_Empty),
diff --git a/gcc/ada/tbuild.adb b/gcc/ada/tbuild.adb
index 52fdbfc2163..b89c40851bc 100644
--- a/gcc/ada/tbuild.adb
+++ b/gcc/ada/tbuild.adb
@@ -926,11 +926,11 @@ package body Tbuild is
   --  conversion of an unchecked conversion. Extra unchecked conversions
   --  make the .dg output less readable. We can't do this in cases
   --  involving bitfields, because the sizes might not match. The
-  --  Is_Composite_Type checks avoid such cases.
+  --  "not Is_Scalar_Type" checks avoid such cases.
 
   elsif Nkind (Expr) = N_Unchecked_Type_Conversion
-and then Is_Composite_Type (Etype (Expr))
-and then Is_Composite_Type (Typ)
+and then not Is_Scalar_Type (Etype (Expr))
+and then not Is_Scalar_Type (Typ)
   then
  Set_Subtype_Mark (Expr, New_Occurrence_Of (Typ, Loc));
  Result := Relocate_Node (Expr);
-- 
2.43.0



[COMMITTED 27/42] ada: Fix strange holes for type with variant part reported by -gnatRh

2025-07-03 Thread Marc Poulhiès
From: Eric Botcazou 

The problem is that the sorting algorithm mixes components of variants.

gcc/ada/ChangeLog:

* repinfo.adb (First_Comp_Or_Discr.Is_Placed_Before): Return True
only if the components are in the same component list.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/repinfo.adb | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/repinfo.adb b/gcc/ada/repinfo.adb
index 1d616db71f3..bbf92a77800 100644
--- a/gcc/ada/repinfo.adb
+++ b/gcc/ada/repinfo.adb
@@ -1239,15 +1239,25 @@ package body Repinfo is
  function First_Comp_Or_Discr (Ent : Entity_Id) return Entity_Id is
 
 function Is_Placed_Before (C1, C2 : Entity_Id) return Boolean;
---  Return True if component C1 is placed before component C2
+--  Return True if components C1 and C2 are in the same component
+--  list and component C1 is placed before component C2 in there.
 
 --
 -- Is_Placed_Before --
 --
 
 function Is_Placed_Before (C1, C2 : Entity_Id) return Boolean is
+   L1 : constant Node_Id := Parent (Parent (C1));
+   L2 : constant Node_Id := Parent (Parent (C2));
+
 begin
-   return Known_Static_Component_Bit_Offset (C1)
+   --  Discriminants and top-level components are considered to be
+   --  in the same list, although this is not syntactically true.
+
+   return (L1 = L2
+or else (Nkind (Parent (L1)) /= N_Variant
+  and then Nkind (Parent (L2)) /= N_Variant))
+ and then Known_Static_Component_Bit_Offset (C1)
  and then Known_Static_Component_Bit_Offset (C2)
  and then
Component_Bit_Offset (C1) < Component_Bit_Offset (C2);
-- 
2.43.0



[COMMITTED 42/42] ada: Fix poor code generated for return of Out parameter with access type

2025-07-03 Thread Marc Poulhiès
From: Eric Botcazou 

The record type of the return object is unnecessarily given BLKmode.

gcc/ada/ChangeLog:

* gcc-interface/decl.cc (type_contains_only_integral_data): Do not
return false only because the type contains pointer data.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/decl.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/gcc-interface/decl.cc b/gcc/ada/gcc-interface/decl.cc
index 27d2cea1f3d..903ec844b96 100644
--- a/gcc/ada/gcc-interface/decl.cc
+++ b/gcc/ada/gcc-interface/decl.cc
@@ -6022,7 +6022,8 @@ gnat_to_gnu_profile_type (Entity_Id gnat_type)
   return gnu_type;
 }
 
-/* Return true if TYPE contains only integral data, recursively if need be.  */
+/* Return true if TYPE contains only integral data, recursively if need be.
+   (integral data is to be understood as not floating-point data here).  */
 
 static bool
 type_contains_only_integral_data (tree type)
@@ -6042,7 +6043,7 @@ type_contains_only_integral_data (tree type)
   return type_contains_only_integral_data (TREE_TYPE (type));
 
 default:
-  return INTEGRAL_TYPE_P (type);
+  return INTEGRAL_TYPE_P (type) || POINTER_TYPE_P (type);
 }
 
   gcc_unreachable ();
-- 
2.43.0



[COMMITTED 29/42] ada: Adjust message about statically compatible result subtype

2025-07-03 Thread Marc Poulhiès
From: Piotr Trojanek 

Ada RM 6.5(5.3/5) is about "result SUBTYPE of the function", while the error
message was saying "result TYPE of the function". Now use the exact RM wording
in the error message for this rule.

gcc/ada/ChangeLog:

* sem_ch3.adb (Check_Return_Subtype_Indication): Adjust error message
to match the RM wording.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch3.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb
index 0afc65da52c..98a8fa56391 100644
--- a/gcc/ada/sem_ch3.adb
+++ b/gcc/ada/sem_ch3.adb
@@ -4297,7 +4297,7 @@ package body Sem_Ch3 is
 then
Error_Msg_N
  ("result subtype must be statically compatible with the " &
-  "function result type", Indic);
+  "function result subtype", Indic);
 
if not Predicates_Compatible (Obj_Typ, R_Typ) then
   Error_Msg_NE
-- 
2.43.0



[COMMITTED 38/42] ada: Fix missing error on too large Component_Size not multiple of storage unit

2025-07-03 Thread Marc Poulhiès
From: Eric Botcazou 

This is a small regression introduced a few years ago.

gcc/ada/ChangeLog:

* gcc-interface/decl.cc (gnat_to_gnu_component_type): Validate the
Component_Size like the size of a type only if the component type
is actually packed.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/decl.cc | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/gcc/ada/gcc-interface/decl.cc b/gcc/ada/gcc-interface/decl.cc
index 972607a917b..1d9832d69ad 100644
--- a/gcc/ada/gcc-interface/decl.cc
+++ b/gcc/ada/gcc-interface/decl.cc
@@ -5444,7 +5444,7 @@ gnat_to_gnu_component_type (Entity_Id gnat_array, bool 
definition,
   const bool is_bit_packed = Is_Bit_Packed_Array (gnat_array);
   tree gnu_type = gnat_to_gnu_type (gnat_type);
   tree gnu_comp_size;
-  bool has_packed_components;
+  bool has_packed_component;
   unsigned int max_align;
 
   /* If an alignment is specified, use it as a cap on the component type
@@ -5465,16 +5465,22 @@ gnat_to_gnu_component_type (Entity_Id gnat_array, bool 
definition,
   && !TYPE_FAT_POINTER_P (gnu_type)
   && tree_fits_uhwi_p (TYPE_SIZE (gnu_type)))
 {
-  gnu_type = make_packable_type (gnu_type, false, max_align);
-  has_packed_components = true;
+  tree gnu_packable_type = make_packable_type (gnu_type, false, max_align);
+  if (gnu_packable_type != gnu_type)
+   {
+ gnu_type = gnu_packable_type;
+ has_packed_component = true;
+   }
+  else
+   has_packed_component = false;
 }
   else
-has_packed_components = is_bit_packed;
+has_packed_component = is_bit_packed;
 
   /* Get and validate any specified Component_Size.  */
   gnu_comp_size
 = validate_size (Component_Size (gnat_array), gnu_type, gnat_array,
-has_packed_components ? TYPE_DECL : VAR_DECL, true,
+has_packed_component ? TYPE_DECL : VAR_DECL, true,
 Has_Component_Size_Clause (gnat_array), NULL, NULL);
 
   /* If the component type is a RECORD_TYPE that has a self-referential size,
-- 
2.43.0



[COMMITTED 39/42] ada: Fix wrong finalization of constrained subtype of unconstrained array type

2025-07-03 Thread Marc Poulhiès
From: Eric Botcazou 

This implements the Is_Constr_Array_Subt_With_Bounds flag for allocators.

gcc/ada/ChangeLog:

* gcc-interface/trans.cc (gnat_to_gnu) : Allocate the
bounds alongside the data if the Is_Constr_Array_Subt_With_Bounds
flag is set on the designated type.
: Take into account the allocated bounds if the
Is_Constr_Array_Subt_With_Bounds flag is set on the designated type.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/trans.cc | 38 --
 1 file changed, 32 insertions(+), 6 deletions(-)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index 23fc814f9de..7549b8e37bf 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -7590,6 +7590,10 @@ gnat_to_gnu (Node_Id gnat_node)
 
 case N_Allocator:
   {
+   const Entity_Id gnat_desig_type
+ = Designated_Type (Underlying_Type (Etype (gnat_node)));
+   const Entity_Id gnat_pool = Storage_Pool (gnat_node);
+
tree gnu_type, gnu_init;
bool ignore_init_type;
 
@@ -7608,9 +7612,6 @@ gnat_to_gnu (Node_Id gnat_node)
 
else if (Nkind (gnat_temp) == N_Qualified_Expression)
  {
-   const Entity_Id gnat_desig_type
- = Designated_Type (Underlying_Type (Etype (gnat_node)));
-
ignore_init_type = Has_Constrained_Partial_View (gnat_desig_type);
 
gnu_init = gnat_to_gnu (Expression (gnat_temp));
@@ -7637,11 +7638,24 @@ gnat_to_gnu (Node_Id gnat_node)
else
  gcc_unreachable ();
 
-   gnu_result_type = get_unpadded_type (Etype (gnat_node));
+   /* If this is an array allocated with its bounds, use the thin pointer
+  as the result type to trigger the machinery in build_allocator, but
+  make sure not to do it for allocations on the return and secondary
+  stacks (see build_call_alloc_dealloc_proc for more details).  */
+if (Is_Constr_Array_Subt_With_Bounds (gnat_desig_type)
+   && Is_Record_Type (Underlying_Type (Etype (gnat_pool)))
+   && !type_annotate_only)
+ {
+   tree gnu_array = gnat_to_gnu_type (Base_Type (gnat_desig_type));
+   gnu_result_type
+ = build_pointer_type (TYPE_OBJECT_RECORD_TYPE (gnu_array));
+ }
+   else
+ gnu_result_type = get_unpadded_type (Etype (gnat_node));
+
return build_allocator (gnu_type, gnu_init, gnu_result_type,
Procedure_To_Call (gnat_node),
-   Storage_Pool (gnat_node), gnat_node,
-   ignore_init_type);
+   gnat_pool, gnat_node, ignore_init_type);
   }
   break;
 
@@ -8577,6 +8591,18 @@ gnat_to_gnu (Node_Id gnat_node)
  (void) gnat_to_gnu_entity (gnat_desig_type, NULL_TREE, false);
 
  gnu_ptr = gnat_to_gnu (gnat_temp);
+
+ /* If this is an array allocated with its bounds, first convert to
+the thin pointer to trigger the special machinery below.  */
+ if (Is_Constr_Array_Subt_With_Bounds (gnat_desig_type))
+   {
+ tree gnu_array = gnat_to_gnu_type (Base_Type (gnat_desig_type));
+ gnu_ptr
+   = convert (build_pointer_type
+  (TYPE_OBJECT_RECORD_TYPE (gnu_array)),
+  gnu_ptr);
+   }
+
  gnu_ptr_type = TREE_TYPE (gnu_ptr);
 
  /* If this is a thin pointer, we must first dereference it to create
-- 
2.43.0



[PATCH] x86-64: Remove redundant TLS calls

2025-07-03 Thread H.J. Lu
For TLS calls:

1. UNSPEC_TLS_GD:

  (parallel [
(set (reg:DI 0 ax)
 (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
  (const_int 0 [0])))
(unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
(reg/f:DI 7 sp)] UNSPEC_TLS_GD)
(clobber (reg:DI 5 di))])

2. UNSPEC_TLS_LD_BASE:

  (parallel [
(set (reg:DI 0 ax)
 (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
  (const_int 0 [0])))
(unspec:DI [(reg/f:DI 7 sp)] UNSPEC_TLS_LD_BASE)])

3. UNSPEC_TLSDESC:

  (parallel [
 (set (reg/f:DI 104)
   (plus:DI (unspec:DI [
   (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10])
   (reg:DI 114)
   (reg/f:DI 7 sp)] UNSPEC_TLSDESC)
(const:DI (unspec:DI [
 (symbol_ref:DI ("e") [flags 0x1a])
  ] UNSPEC_DTPOFF
 (clobber (reg:CC 17 flags))])

  (parallel [
(set (reg:DI 101)
 (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
 (reg:DI 112)
 (reg/f:DI 7 sp)] UNSPEC_TLSDESC))
(clobber (reg:CC 17 flags))])

they return the same value for the same input value.  But multiple calls
with the same input value may be generated for simple programs like:

void a(long *);
int b(void);
void c(void);
static __thread long e;
long
d(void)
{
  a(&e);
  if (b())
c();
  return e;
}

When compiled with -O2 -fPIC -mtls-dialect=gnu2, the following code is
generated:

	.type	d, @function
d:
.LFB0:
	.cfi_startproc
	pushq	%rbx
	.cfi_def_cfa_offset 16
	.cfi_offset 3, -16
	leaq	e@TLSDESC(%rip), %rbx
	movq	%rbx, %rax
	call	*e@TLSCALL(%rax)
	addq	%fs:0, %rax
	movq	%rax, %rdi
	call	a@PLT
	call	b@PLT
	testl	%eax, %eax
	jne	.L8
	movq	%rbx, %rax
	call	*e@TLSCALL(%rax)
	popq	%rbx
	.cfi_remember_state
	.cfi_def_cfa_offset 8
	movq	%fs:(%rax), %rax
	ret
	.p2align 4,,10
	.p2align 3
.L8:
	.cfi_restore_state
	call	c@PLT
	movq	%rbx, %rax
	call	*e@TLSCALL(%rax)
	popq	%rbx
	.cfi_def_cfa_offset 8
	movq	%fs:(%rax), %rax
	ret
	.cfi_endproc

There are 3 "call *e@TLSCALL(%rax)".  They all return the same value.
Rename the remove_redundant_vector_load pass to the x86_cse pass and, for
64-bit, extend it to also remove redundant TLS calls, generating:

d:
.LFB0:
	.cfi_startproc
	pushq	%rbx
	.cfi_def_cfa_offset 16
	.cfi_offset 3, -16
	leaq	e@TLSDESC(%rip), %rax
	movq	%fs:0, %rdi
	call	*e@TLSCALL(%rax)
	addq	%rax, %rdi
	movq	%rax, %rbx
	call	a@PLT
	call	b@PLT
	testl	%eax, %eax
	jne	.L8
	movq	%fs:(%rbx), %rax
	popq	%rbx
	.cfi_remember_state
	.cfi_def_cfa_offset 8
	ret
	.p2align 4,,10
	.p2align 3
.L8:
	.cfi_restore_state
	call	c@PLT
	movq	%fs:(%rbx), %rax
	popq	%rbx
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc

with only one "call *e@TLSCALL(%rax)".  This reduces the number of
__tls_get_addr calls in libgcc.a by 72%:

__tls_get_addr calls    before  after
libgcc.a                868     243

gcc/

PR target/81501
* config/i386/i386-features.cc (x86_cse_kind): Add X86_CSE_TLS_GD,
X86_CSE_TLS_LD_BASE and X86_CSE_TLSDESC.
(redundant_load): Renamed to ...
(redundant_pattern): This.
(replace_tls_call): New.
(ix86_place_single_tls_call): Likewise.
(remove_redundant_vector_load): Renamed to ...
(x86_cse): This.  Extend to remove redundant TLS calls.
(pass_remove_redundant_vector_load): Renamed to ...
(pass_x86_cse): This.
(make_pass_remove_redundant_vector_load): Renamed to ...
(make_pass_x86_cse): This.
* config/i386/i386-passes.def: Replace
pass_remove_redundant_vector_load with pass_x86_cse.
* config/i386/i386-protos.h (ix86_tls_get_addr): New.
(make_pass_remove_redundant_vector_load): Renamed to ...
(make_pass_x86_cse): This.
* config/i386/i386.cc (ix86_tls_get_addr): Remove static.
* config/i386/i386.h (machine_function): Add
tls_descriptor_call_multiple_p.
* config/i386/i386.md (@tls_global_dynamic_64_): Set
tls_descriptor_call_multiple_p.
(@tls_local_dynamic_base_64_): Likewise.
(@tls_dynamic_gnu2_64_): Likewise.
(*tls_dynamic_gnu2_lea_64_): Renamed to ...
(tls_dynamic_gnu2_lea_64_): This.
(*tls_dynamic_gnu2_call_64_): Renamed to ...
(tls_dynamic_gnu2_call_64_): This.
(*tls_dynamic_gnu2_combine_64_): Renamed to ...
(tls_dynamic_gnu2_combi

Re: [PATCH] x86: Emit label only for __mcount_loc section

2025-07-03 Thread H.J. Lu
On Thu, Jul 3, 2025 at 6:07 PM Uros Bizjak  wrote:
>
> On Thu, Jul 3, 2025 at 11:54 AM H.J. Lu  wrote:
> >
> > commit ecc81e33123d7ac9c11742161e128858d844b99d (HEAD)
> > Author: Andi Kleen 
> > Date:   Fri Sep 26 04:06:40 2014 +
> >
> > Add direct support for Linux kernel __fentry__ patching
> >
> > emitted a label, 1, for __mcount_loc section:
> >
> > 1: call mcount
> > .section __mcount_loc, "a",@progbits
> > .quad 1b
> > .previous
> >
> > If __mcount_loc wasn't used, we got an unused label.  Update
> > x86_function_profiler to emit label only when __mcount_loc section
> > is used.
> >
> > gcc/
> >
> > PR target/120936
> > * config/i386/i386.cc (x86_print_call_or_nop): Add a label
> > argument and use it to print label.
> > (x86_function_profiler): Emit label only when __mcount_loc
> > section is used.
> >
> > gcc/testsuite/
> >
> > PR target/120936
> > * gcc.target/i386/pr120936-1.c: New test
> > * gcc.target/i386/pr120936-2.c: Likewise.
> > * gcc.target/i386/pr120936-3.c: Likewise.
> > * gcc.target/i386/pr120936-4.c: Likewise.
> > * gcc.target/i386/pr120936-5.c: Likewise.
> > * gcc.target/i386/pr120936-6.c: Likewise.
> > * gcc.target/i386/pr120936-7.c: Likewise.
> > * gcc.target/i386/pr120936-8.c: Likewise.
> > * gcc.target/i386/pr120936-9.c: Likewise.
> > * gcc.target/i386/pr120936-10.c: Likewise.
> > * gcc.target/i386/pr120936-11.c: Likewise.
> > * gcc.target/i386/pr120936-12.c: Likewise.
> > * gcc.target/i386/pr93492-3.c: Updated.
> > * gcc.target/i386/pr93492-5.c: Likewise.
> >
> > OK for master?
> >
> > Thanks.
>
> +  bool fentry_section_p
> += (flag_record_mcount
> +   || lookup_attribute ("fentry_section",
> +DECL_ATTRIBUTES (current_function_decl)));
> +  const char *label;
> +  if (fentry_section_p)
> +label = "1:";
> +  else
> +label = "";
>
> Just write this part as:
>
> const char *label = fentry_section_p ? "1:" : "";
>
> and using one vertical space before declaration.

Fixed in the v2 patch.  This is what I am checking in.

Thanks.

> Otherwise OK.
>
> Thanks,
> Uros.



-- 
H.J.
From 60aa01a85a62001e2299c50fd3ac89aae7db5e68 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 3 Jul 2025 10:13:48 +0800
Subject: [PATCH v2] x86: Emit label only for __mcount_loc section

commit ecc81e33123d7ac9c11742161e128858d844b99d
Author: Andi Kleen 
Date:   Fri Sep 26 04:06:40 2014 +

Add direct support for Linux kernel __fentry__ patching

emitted a label, 1, for __mcount_loc section:

1:	call	mcount
	.section __mcount_loc, "a",@progbits
	.quad 1b
	.previous

If __mcount_loc wasn't used, we got an unused label.  Update
x86_function_profiler to emit label only when __mcount_loc section
is used.

gcc/

	PR target/120936
	* config/i386/i386.cc (x86_print_call_or_nop): Add a label
	argument and use it to print label.
	(x86_function_profiler): Emit label only when __mcount_loc
	section is used.

gcc/testsuite/

	PR target/120936
	* gcc.target/i386/pr120936-1.c: New test
	* gcc.target/i386/pr120936-2.c: Likewise.
	* gcc.target/i386/pr120936-3.c: Likewise.
	* gcc.target/i386/pr120936-4.c: Likewise.
	* gcc.target/i386/pr120936-5.c: Likewise.
	* gcc.target/i386/pr120936-6.c: Likewise.
	* gcc.target/i386/pr120936-7.c: Likewise.
	* gcc.target/i386/pr120936-8.c: Likewise.
	* gcc.target/i386/pr120936-9.c: Likewise.
	* gcc.target/i386/pr120936-10.c: Likewise.
	* gcc.target/i386/pr120936-11.c: Likewise.
	* gcc.target/i386/pr120936-12.c: Likewise.
	* gcc.target/i386/pr93492-3.c: Updated.
	* gcc.target/i386/pr93492-5.c: Likewise.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386.cc | 52 -
 gcc/testsuite/gcc.target/i386/pr120936-1.c  | 18 +++
 gcc/testsuite/gcc.target/i386/pr120936-10.c | 23 +
 gcc/testsuite/gcc.target/i386/pr120936-11.c | 19 
 gcc/testsuite/gcc.target/i386/pr120936-12.c | 23 +
 gcc/testsuite/gcc.target/i386/pr120936-2.c  | 18 +++
 gcc/testsuite/gcc.target/i386/pr120936-3.c  | 18 +++
 gcc/testsuite/gcc.target/i386/pr120936-4.c  | 18 +++
 gcc/testsuite/gcc.target/i386/pr120936-5.c  | 18 +++
 gcc/testsuite/gcc.target/i386/pr120936-6.c  | 18 +++
 gcc/testsuite/gcc.target/i386/pr120936-7.c  | 18 +++
 gcc/testsuite/gcc.target/i386/pr120936-8.c  | 18 +++
 gcc/testsuite/gcc.target/i386/pr120936-9.c  | 19 
 gcc/testsuite/gcc.target/i386/pr93492-3.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr93492-5.c   |  2 +-
 15 files changed, 261 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120936-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120936-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120936-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120936-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120936-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120936-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120936-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120936-5.c

[PATCH v1 2/3] libstdc++: Prepare test code for default_accessor for reuse.

2025-07-03 Thread Luc Grosheintz
All test code of default_accessor can be reused. This commit moves
the reusable code into a file generic.cc and prepares the tests for
reuse with aligned_accessor.

The AccessorTrait creates a unified interface for creating both
default_accessor and aligned_accessor typenames.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/mdspan/accessors/default.cc: Delete.
* testsuite/23_containers/mdspan/accessors/generic.cc: Slightly
generalize the test code previously in default.cc.
---
 .../23_containers/mdspan/accessors/default.cc |  99 
 .../23_containers/mdspan/accessors/generic.cc | 141 ++
 2 files changed, 141 insertions(+), 99 deletions(-)
 delete mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/accessors/generic.cc

diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
deleted file mode 100644
index c036f8ad10f..000
--- a/libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
+++ /dev/null
@@ -1,99 +0,0 @@
-// { dg-do run { target c++23 } }
-#include 
-
-#include 
-
-constexpr size_t dyn = std::dynamic_extent;
-
-template
-  constexpr void
-  test_accessor_policy()
-  {
-static_assert(std::copyable);
-static_assert(std::is_nothrow_move_constructible_v);
-static_assert(std::is_nothrow_move_assignable_v);
-static_assert(std::is_nothrow_swappable_v);
-  }
-
-constexpr bool
-test_access()
-{
-  std::default_accessor accessor;
-  std::array a{10, 11, 12, 13, 14};
-  VERIFY(accessor.access(a.data(), 0) == 10);
-  VERIFY(accessor.access(a.data(), 4) == 14);
-  return true;
-}
-
-constexpr bool
-test_offset()
-{
-  std::default_accessor accessor;
-  std::array a{10, 11, 12, 13, 14};
-  VERIFY(accessor.offset(a.data(), 0) == a.data());
-  VERIFY(accessor.offset(a.data(), 4) == a.data() + 4);
-  return true;
-}
-
-class Base
-{ };
-
-class Derived : public Base
-{ };
-
-constexpr void
-test_ctor()
-{
-  // T -> T
-  static_assert(std::is_nothrow_constructible_v,
-   std::default_accessor>);
-  static_assert(std::is_convertible_v,
- std::default_accessor>);
-
-  // T -> const T
-  static_assert(std::is_convertible_v,
- std::default_accessor>);
-  static_assert(std::is_convertible_v,
- std::default_accessor>);
-
-  // const T -> T
-  static_assert(!std::is_constructible_v,
-std::default_accessor>);
-  static_assert(!std::is_constructible_v,
-std::default_accessor>);
-
-  // T <-> volatile T
-  static_assert(std::is_convertible_v,
- std::default_accessor>);
-  static_assert(!std::is_constructible_v,
-std::default_accessor>);
-
-  // size difference
-  static_assert(!std::is_constructible_v,
-std::default_accessor>);
-
-  // signedness
-  static_assert(!std::is_constructible_v,
-std::default_accessor>);
-  static_assert(!std::is_constructible_v,
-std::default_accessor>);
-
-  // Derived <-> Base
-  static_assert(!std::is_constructible_v,
-std::default_accessor>);
-  static_assert(!std::is_constructible_v,
-std::default_accessor>);
-
-}
-
-int
-main()
-{
-  test_accessor_policy>();
-  test_access();
-  static_assert(test_access());
-  test_offset();
-  static_assert(test_offset());
-  test_ctor();
-  return 0;
-}
diff --git a/libstdc++-v3/testsuite/23_containers/mdspan/accessors/generic.cc 
b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/generic.cc
new file mode 100644
index 000..600d152b690
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/mdspan/accessors/generic.cc
@@ -0,0 +1,141 @@
+// { dg-do run { target c++23 } }
+#include 
+
+#include 
+
+template
+  constexpr bool
+  test_class_properties()
+  {
+static_assert(std::is_trivially_copyable_v);
+static_assert(std::semiregular);
+return true;
+  }
+
+template
+  constexpr bool
+  test_accessor_policy()
+  {
+static_assert(std::copyable);
+static_assert(std::is_nothrow_move_constructible_v);
+static_assert(std::is_nothrow_move_assignable_v);
+static_assert(std::is_nothrow_swappable_v);
+return true;
+  }
+
+class Base
+{ };
+
+class Derived : public Base
+{ };
+
+template
+  constexpr bool
+  test_ctor()
+  {
+// T -> T
+static_assert(std::is_nothrow_constructible_v<
+   typename AccessorTrait::type,
+   typename AccessorTrait::type>);
+static_assert(std::is_convertible_v<
+   typename AccessorTrait::type,
+   typenam

[PATCH v1 0/3] Implement aligned_accessor [P2897R7].

2025-07-03 Thread Luc Grosheintz
This patch series implements the aligned_accessor paper P2897R7 in three
parts:

  - Implement `is_sufficiently_aligned` which is part of <memory>.
  - Prepare the accessor tests for reuse.
  - Implement aligned_accessor.

A couple of remarks:

  - The paper P2897R7 and spec N5008 don't specify that the alignment
  for is_sufficiently_aligned must be a power of two.

  - The reasoning for why is_sufficiently_aligned isn't constexpr is
  nicely described in the paper.

  - Use of `class` in is_sufficiently_aligned is for consistency within
  that file.

  - The tests create new unsupported tests and expected failures. The
  testsuite doesn't have all that many of those; so there's likely a
  strategy to avoid this. However, I don't know how.

  - These changes are independent of mdspan, but due to the precise
  location of the code it might conflict with the mdspan patch series.

  - I skipped updating `cxxapi-data.csv` for is_sufficiently_aligned
  due to consistency with the rest of the mdspan patches, i.e. there
  will be a bulk update of the file later.

  - Each commit was tested with/without PCH and with/without
  _GLIBCXX_DEBUG filtered by 20_util/is_sufficiently_aligned
  and 23_containers/mdspan.
  The last commit was tested fully with/without PCH. All tests on
  x86_64-linux.

As always I'm happy to reorganize into different commits, if the
grouping doesn't make sense.
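
For readers who want to try the alignment check in isolation, here is a minimal sketch of the idea behind is_sufficiently_aligned. It is not the libstdc++ code: the names sketch_is_sufficiently_aligned and count_aligned_offsets are made up for illustration, and std::uintptr_t is assumed to be available. It tests the address modulo the alignment, so it gives an answer for any non-zero alignment, which is relevant to the power-of-two remark above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative stand-in, not the libstdc++ implementation: reports whether
// ptr lies on an Align-byte boundary by testing the address modulo Align.
template<std::size_t Align, class T>
  bool
  sketch_is_sufficiently_aligned(T* ptr)
  {
    static_assert(Align != 0, "alignment must be non-zero");
    return reinterpret_cast<std::uintptr_t>(ptr) % Align == 0;
  }

// Hypothetical helper: counts how many of the first n byte offsets of an
// 8-byte-aligned buffer lie on an Align-byte boundary.
template<std::size_t Align>
  std::size_t
  count_aligned_offsets(std::size_t n)
  {
    alignas(8) static unsigned char buffer[16] = {};
    assert(n <= sizeof(buffer));
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i)
      if (sketch_is_sufficiently_aligned<Align>(buffer + i))
        ++count;
    return count;
  }
```

With the buffer aligned to 8 bytes, offsets 0, 4 and 8 are the only 4-byte-aligned ones among the first nine, so the count for Align = 4 and n = 9 is 3.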

Luc Grosheintz (3):
  libstdc++: Implement is_sufficiently_aligned.
  libstdc++: Prepare test code for default_accessor for reuse.
  libstdc++: Implement aligned_accessor from mdspan.

 libstdc++-v3/include/bits/align.h |  16 ++
 libstdc++-v3/include/bits/version.def |  18 ++
 libstdc++-v3/include/bits/version.h   |  20 +++
 libstdc++-v3/include/std/mdspan   |  72 
 libstdc++-v3/include/std/memory   |   1 +
 libstdc++-v3/src/c++23/std.cc.in  |   4 +-
 .../20_util/is_sufficiently_aligned/1.cc  |  31 
 .../20_util/is_sufficiently_aligned/2.cc  |   7 +
 .../23_containers/mdspan/accessors/aligned.cc |  43 +
 .../mdspan/accessors/aligned_ftm.cc   |   6 +
 .../mdspan/accessors/aligned_neg.cc   |  33 
 .../accessors/debug/aligned_access_neg.cc |  23 +++
 .../accessors/debug/aligned_offset_neg.cc |  23 +++
 .../23_containers/mdspan/accessors/default.cc |  99 ---
 .../23_containers/mdspan/accessors/generic.cc | 168 ++
 15 files changed, 464 insertions(+), 100 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/is_sufficiently_aligned/1.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/is_sufficiently_aligned/2.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/accessors/aligned.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/accessors/aligned_ftm.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/accessors/aligned_neg.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/accessors/debug/aligned_access_neg.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/accessors/debug/aligned_offset_neg.cc
 delete mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/accessors/default.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/accessors/generic.cc

-- 
2.49.0



[PATCH v1 1/3] libstdc++: Implement is_sufficiently_aligned.

2025-07-03 Thread Luc Grosheintz
This commit implements and tests the function is_sufficiently_aligned
from P2897R7.

libstdc++-v3/ChangeLog:

* include/bits/align.h (is_sufficiently_aligned): New function.
* include/bits/version.def (is_sufficiently_aligned): Add.
* include/bits/version.h: Regenerate.
* include/std/memory: Add __glibcxx_want_is_sufficiently_aligned.
* src/c++23/std.cc.in (is_sufficiently_aligned): Add.
* testsuite/20_util/is_sufficiently_aligned/1.cc: New test.
* testsuite/20_util/is_sufficiently_aligned/2.cc: New test.
---
 libstdc++-v3/include/bits/align.h | 16 ++
 libstdc++-v3/include/bits/version.def |  8 +
 libstdc++-v3/include/bits/version.h   | 10 ++
 libstdc++-v3/include/std/memory   |  1 +
 libstdc++-v3/src/c++23/std.cc.in  |  1 +
 .../20_util/is_sufficiently_aligned/1.cc  | 31 +++
 .../20_util/is_sufficiently_aligned/2.cc  |  7 +
 7 files changed, 74 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/20_util/is_sufficiently_aligned/1.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/is_sufficiently_aligned/2.cc

diff --git a/libstdc++-v3/include/bits/align.h 
b/libstdc++-v3/include/bits/align.h
index 2b40c37e033..fbbe9cb1f9c 100644
--- a/libstdc++-v3/include/bits/align.h
+++ b/libstdc++-v3/include/bits/align.h
@@ -102,6 +102,22 @@ align(size_t __align, size_t __size, void*& __ptr, size_t& 
__space) noexcept
 }
 #endif // __glibcxx_assume_aligned
 
+#ifdef __glibcxx_is_sufficiently_aligned // C++ >= 26
+  /** @brief Is @a __ptr aligned to an _Align byte boundary?
+   *
+   *  @tparam _Align An alignment value
+   *  @tparam _Tp    An object type
+   *
+   *  C++26 20.2.5 [ptr.align]
+   *
+   *  @ingroup memory
+   */
+  template
+bool
+is_sufficiently_aligned(_Tp* __ptr)
+{ return reinterpret_cast<__UINTPTR_TYPE__>(__ptr) % _Align == 0; }
+#endif // __glibcxx_is_sufficiently_aligned
+
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
 
diff --git a/libstdc++-v3/include/bits/version.def 
b/libstdc++-v3/include/bits/version.def
index f4ba501c403..a2695e67716 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -732,6 +732,14 @@ ftms = {
   };
 };
 
+ftms = {
+  name = is_sufficiently_aligned;
+  values = {
+v = 202411;
+cxxmin = 26;
+  };
+};
+
 ftms = {
   name = atomic_flag_test;
   values = {
diff --git a/libstdc++-v3/include/bits/version.h 
b/libstdc++-v3/include/bits/version.h
index dc8ac07be16..1b17a965239 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -815,6 +815,16 @@
 #endif /* !defined(__cpp_lib_assume_aligned) && 
defined(__glibcxx_want_assume_aligned) */
 #undef __glibcxx_want_assume_aligned
 
+#if !defined(__cpp_lib_is_sufficiently_aligned)
+# if (__cplusplus >  202302L)
+#  define __glibcxx_is_sufficiently_aligned 202411L
+#  if defined(__glibcxx_want_all) || 
defined(__glibcxx_want_is_sufficiently_aligned)
+#   define __cpp_lib_is_sufficiently_aligned 202411L
+#  endif
+# endif
+#endif /* !defined(__cpp_lib_is_sufficiently_aligned) && 
defined(__glibcxx_want_is_sufficiently_aligned) */
+#undef __glibcxx_want_is_sufficiently_aligned
+
 #if !defined(__cpp_lib_atomic_flag_test)
 # if (__cplusplus >= 202002L)
 #  define __glibcxx_atomic_flag_test 201907L
diff --git a/libstdc++-v3/include/std/memory b/libstdc++-v3/include/std/memory
index 1da03b3ea6a..ff342ff35f3 100644
--- a/libstdc++-v3/include/std/memory
+++ b/libstdc++-v3/include/std/memory
@@ -110,6 +110,7 @@
 #define __glibcxx_want_constexpr_memory
 #define __glibcxx_want_enable_shared_from_this
 #define __glibcxx_want_indirect
+#define __glibcxx_want_is_sufficiently_aligned
 #define __glibcxx_want_make_unique
 #define __glibcxx_want_out_ptr
 #define __glibcxx_want_parallel_algorithm
diff --git a/libstdc++-v3/src/c++23/std.cc.in b/libstdc++-v3/src/c++23/std.cc.in
index e692caaa5f9..6f4214ed3a7 100644
--- a/libstdc++-v3/src/c++23/std.cc.in
+++ b/libstdc++-v3/src/c++23/std.cc.in
@@ -1864,6 +1864,7 @@ export namespace std
   using std::allocator_arg_t;
   using std::allocator_traits;
   using std::assume_aligned;
+  using std::is_sufficiently_aligned;
   using std::make_obj_using_allocator;
   using std::pointer_traits;
   using std::to_address;
diff --git a/libstdc++-v3/testsuite/20_util/is_sufficiently_aligned/1.cc 
b/libstdc++-v3/testsuite/20_util/is_sufficiently_aligned/1.cc
new file mode 100644
index 000..4c2738b57db
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/is_sufficiently_aligned/1.cc
@@ -0,0 +1,31 @@
+// { dg-do run { target c++26 } }
+
+#include 
+#include 
+#include 
+
+void
+test01()
+{
+  constexpr size_t N = 4;
+  constexpr size_t M = 2*N + 1;
+  alignas(N) std::array buffer{};
+
+  auto* ptr = buffer.data();
+  VERIFY(std::is_sufficiently_aligned<1>(ptr+0));
+  VERIFY(std::is_sufficiently_aligned<1>(ptr+1));
+
+  VERIFY(std::is_sufficiently_aligned<

[PATCH v1 3/3] libstdc++: Implement aligned_accessor from mdspan.

2025-07-03 Thread Luc Grosheintz
This commit completes the implementation of P2897R7 by implementing and
testing the template class aligned_accessor.

libstdc++-v3/ChangeLog:

* include/bits/version.def (aligned_accessor): Add.
* include/bits/version.h: Regenerate.
* include/std/mdspan (aligned_accessor): New class.
* src/c++23/std.cc.in (aligned_accessor): Add.
* testsuite/23_containers/mdspan/accessors/generic.cc: Add tests
for aligned_accessor.
* testsuite/23_containers/mdspan/accessors/aligned.cc: New test.
* testsuite/23_containers/mdspan/accessors/aligned_ftm.cc: New test.
* testsuite/23_containers/mdspan/accessors/aligned_neg.cc: New test.
---
 libstdc++-v3/include/bits/version.def | 10 +++
 libstdc++-v3/include/bits/version.h   | 10 +++
 libstdc++-v3/include/std/mdspan   | 72 +++
 libstdc++-v3/src/c++23/std.cc.in  |  3 +-
 .../23_containers/mdspan/accessors/aligned.cc | 43 +++
 .../mdspan/accessors/aligned_ftm.cc   |  6 ++
 .../mdspan/accessors/aligned_neg.cc   | 33 +
 .../accessors/debug/aligned_access_neg.cc | 23 ++
 .../accessors/debug/aligned_offset_neg.cc | 23 ++
 .../23_containers/mdspan/accessors/generic.cc | 27 +++
 10 files changed, 249 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/accessors/aligned.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/accessors/aligned_ftm.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/accessors/aligned_neg.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/accessors/debug/aligned_access_neg.cc
 create mode 100644 libstdc++-v3/testsuite/23_containers/mdspan/accessors/debug/aligned_offset_neg.cc

diff --git a/libstdc++-v3/include/bits/version.def 
b/libstdc++-v3/include/bits/version.def
index a2695e67716..42445165b91 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -1022,6 +1022,16 @@ ftms = {
   };
 };
 
+ftms = {
+  name = aligned_accessor;
+  values = {
+v = 202411;
+cxxmin = 26;
+extra_cond = "__glibcxx_assume_aligned "
+"&& __glibcxx_is_sufficiently_aligned";
+  };
+};
+
 ftms = {
   name = ssize;
   values = {
diff --git a/libstdc++-v3/include/bits/version.h 
b/libstdc++-v3/include/bits/version.h
index 1b17a965239..3efa7b1baae 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -1143,6 +1143,16 @@
 #endif /* !defined(__cpp_lib_mdspan) && defined(__glibcxx_want_mdspan) */
 #undef __glibcxx_want_mdspan
 
+#if !defined(__cpp_lib_aligned_accessor)
+# if (__cplusplus >  202302L) && (__glibcxx_assume_aligned && 
__glibcxx_is_sufficiently_aligned)
+#  define __glibcxx_aligned_accessor 202411L
+#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_aligned_accessor)
+#   define __cpp_lib_aligned_accessor 202411L
+#  endif
+# endif
+#endif /* !defined(__cpp_lib_aligned_accessor) && 
defined(__glibcxx_want_aligned_accessor) */
+#undef __glibcxx_want_aligned_accessor
+
 #if !defined(__cpp_lib_ssize)
 # if (__cplusplus >= 202002L)
 #  define __glibcxx_ssize 201902L
diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
index c72a64094b7..6eb804bf9a0 100644
--- a/libstdc++-v3/include/std/mdspan
+++ b/libstdc++-v3/include/std/mdspan
@@ -39,7 +39,12 @@
 #include 
 #include 
 
+#if __cplusplus > 202302L
+#include 
+#endif
+
 #define __glibcxx_want_mdspan
+#define __glibcxx_want_aligned_accessor
 #include 
 
 #ifdef __glibcxx_mdspan
@@ -1035,6 +1040,73 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return __p + __i; }
 };
 
+#ifdef __glibcxx_aligned_accessor
+  template
+struct aligned_accessor
+{
+  static_assert(has_single_bit(_ByteAlignment),
+   "ByteAlignment must be a power of two");
+  static_assert(_ByteAlignment >= alignof(_ElementType),
+   "ByteAlignment is too small for ElementType");
+  static_assert(!is_array_v<_ElementType>,
+   "ElementType must not be an array type");
+  static_assert(!is_abstract_v<_ElementType>,
+   "ElementType must not be an abstract class type");
+
+  using offset_policy = default_accessor<_ElementType>;
+  using element_type = _ElementType;
+  using reference = element_type&;
+  using data_handle_type = element_type*;
+
+  static constexpr size_t byte_alignment = _ByteAlignment;
+
+  constexpr
+  aligned_accessor() noexcept = default;
+
+  template
+   requires (is_convertible_v<_OElementType(*)[], element_type(*)[]>
+   && _OByteAlignment >= byte_alignment)
+   constexpr
+   aligned_accessor(aligned_accessor<_OElementType, _OByteAlignment>)
+   noexcept
+   { }
+
+  template
+   requires is_convertible_v<_OElementType(*)[], element_type(*)[]>
+   constexpr explicit
+   aligned_accessor(default_accessor<_OElementType>) noexcept
+  

Re: [PATCH v2 1/5] libstdc++: Check prerequisites of layout_*::operator().

2025-07-03 Thread Jonathan Wakely
On Thu, 3 Jul 2025 at 11:12, Tomasz Kaminski  wrote:
>
>
>
> On Thu, Jul 3, 2025 at 12:08 PM Luc Grosheintz  
> wrote:
>>
>>
>>
>> On 7/1/25 22:56, Jonathan Wakely wrote:
>> > On Tue, 1 Jul 2025 at 11:32, Tomasz Kaminski  wrote:
>> >>
>> >> Hi,
>> >> More of the review will be later, but I have noticed that you have added 
>> >> precondition checks
>> >> to the layouts, and then avoid checking them inside the operator[] of the 
>> >> mdspan. This in general
>> >> sounds good.
>> >>
>> >> However, the precondition on mdspan::operator[] is now hardened: 
>> >> https://eel.is/c++draft/views.multidim#mdspan.mdspan.members-3.
>> >> This implies that breaking this precondition is a contract violation, and 
>> >> thus the user may expect a contract violation handler to be invoked
>> >> for it. Amongst the information provided to the handler via 
>> >> contract_violation object 
>> >> (https://eel.is/c++draft/support.contract.violation)
>> >> is source_location, that includes the name.
>> >
>> > Even without contracts, our __glibcxx_assert macro includes the
>> > __PRETTY_FUNCTION__ string in the abort message.
>> >
>> >> Given that, I think we want to always check the extents in operator[] of 
>> >> mdspan, and thus remove
>> >> the checks from layout (to avoid duplication).
>> >
>> > I think it's probably QoI whether the contract checks happen directly
>> > in the operator[] function or in something that it calls. If doing the
>> > checks in operator[] has no extra runtime cost (e.g. due to them
>> > getting duplicated), then I think it is better to do them there, so
>> > that errors are reported from the "correct" place. But if it hurts
>> > performance, I don't think it's essential to check them there.
>>
>> The reasoning for this approach was:
>>
>>1. The mapping::operator() and mdspan::operator[] have the same
>>precondition; and mdspan::operator[] calls mapping::operator().

Yes, although a user-defined mapping might not bother to check
preconditions. So in order for us to implement the required check in
operator[] we really need to check it there.

We could check it in *both* places, and assume that the compiler will
see that the second check is entirely redundant.

We could also check in mapping::operator() and then in
mdspan::operator[] do something like:

if constexpr (!__is_std_mapping)
  __glibcxx_assert(...);

So if we know there's a check in mapping::operator() then don't bother
to check in operator[] as well.
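
A minimal sketch of that last idea, assuming a one-dimensional toy view rather than the real mdspan. All names here (checked_mapping, user_mapping, mapping_checks_itself, span1d) are hypothetical, and plain assert stands in for __glibcxx_assert:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical mapping that checks its own precondition, standing in for
// a standard layout mapping.
struct checked_mapping
{
  std::size_t extent;

  std::size_t
  operator()(std::size_t i) const
  {
    assert(i < extent);
    return i;
  }
};

// Hypothetical user-defined mapping that does not bother to check.
struct user_mapping
{
  std::size_t extent;

  std::size_t
  operator()(std::size_t i) const
  { return i; }
};

// Trait playing the role of __is_std_mapping above: true only for
// mappings known to check their own preconditions.
template<class Mapping>
  struct mapping_checks_itself : std::false_type { };

template<>
  struct mapping_checks_itself<checked_mapping> : std::true_type { };

// Toy one-dimensional view: checks the index itself only when the mapping
// is not known to do so.
template<class Mapping>
  struct span1d
  {
    int* data;
    Mapping map;

    int&
    operator[](std::size_t i) const
    {
      if constexpr (!mapping_checks_itself<Mapping>::value)
        assert(i < map.extent);
      return data[map(i)];
    }
  };
```

The trade-off is the one discussed above: with this arrangement a failed check for a standard mapping is reported from the mapping's operator(), not from operator[] of the view.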


>>2. The place I chose to check the precondition is where we already
>>have both the index and the extent in L1 and almost certainly in a
>>register. The hope was that together with branch prediction, this
>>will be a reasonably cheap place to put the check.
>>
>>3. The layouts are highly valuable on their own. I've implemented
>>that piece of logic numerous times in different contexts; and it's
>>wonderful that soon we can convert `i, j, k` to a linear index easily
>>using the standard library.
>>Therefore, I didn't want to skip them in mapping::operator() because
>>they're a guard against out of bounds accesses, e.g. in a user-defined
>>dynamically allocated, owning, multi-dimensional array.
>
> I think such types would have their own bounds checks, contracts, 
> preconditions.

Not if it's just something wrapping a unique_ptr, for example.

>>
>>
>>4. Finally, my assumption was that for performance critical code
>>one would be forced to turn off bounds checks. Hence, any place that
>>doesn't duplicate the check would be acceptable (until measured
>>otherwise).
>>In HPC workloads it wouldn't be uncommon to have 1M elements and 1M
>>iterations and 10s - 100s of different accesses per element per
>>iteration; per CPU core. In this loop often there's little more than
>>a few additions and multiplications.
>>
>> There's a few paths forwards:
>>
>>1. Remove the check from mapping::operator() and unconditionally
>>check in mdspan::operator[].
>
> I would go for option 1.

My original preference was option 2, but I've convinced myself that we
need the checks in operator[]. Rather than repeat them in both places,
I think I'm OK with option 1 too.



>>2. Leave it as is and return when we do optimization or hardening.
>>
>>3. Start measuring to figure out the cost of these checks; and then
>>decide.
>>
>> I'm open to all three.



Enable ipa-cp cloning for cold wrappers of hot functions

2025-07-03 Thread Jan Hubicka
Hi,
ipa-cp cloning disables itself for all functions for which
node->optimize_for_size_p () is true, which also disables it for cold
wrappers of hot functions where we want to propagate.  Since we later
require the time saved to be considered hot, we do not need to make this
early test based on the profile.

The patch also fixes a few other places where AFDO 0 disables ipa-cp.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* ipa-cp.cc (cs_interesting_for_ipcp_p): Handle
correctly GLOBAL0 afdo counts.
(ipcp_cloning_candidate_p): Do not rule out nodes
!node->optimize_for_size_p ().
(good_cloning_opportunity_p): Handle afdo counts
as non-zero.
(update_profiling_info):

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 75ea94f2ad8..480cf48786c 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -554,6 +554,7 @@ cs_interesting_for_ipcp_p (cgraph_edge *e)
   /* If we have zero IPA profile, still consider edge for cloning
  in case we do partial training.  */
   if (e->count.ipa ().initialized_p ()
+  && e->count.ipa ().quality () != AFDO
   && !opt_for_fn (e->callee->decl,flag_profile_partial_training))
 return false;
   return true;
@@ -617,7 +618,9 @@ ipcp_cloning_candidate_p (struct cgraph_node *node)
   return false;
 }
 
-  if (node->optimize_for_size_p ())
+  /* Do not use profile here since cold wrapper wrap
+ hot function.  */
+  if (opt_for_fn (node->decl, optimize_size))
 {
   if (dump_file)
fprintf (dump_file, "Not considering %s for cloning; "
@@ -3391,9 +3394,10 @@ good_cloning_opportunity_p (struct cgraph_node *node, 
sreal time_benefit,
int size_cost, bool called_without_ipa_profile)
 {
   gcc_assert (count_sum.ipa () == count_sum);
+  if (count_sum.quality () == AFDO)
+count_sum = count_sum.force_nonzero ();
   if (time_benefit == 0
   || !opt_for_fn (node->decl, flag_ipa_cp_clone)
-  || node->optimize_for_size_p ()
   /* If there is no call which was executed in profiling or where
 profile is missing, we do not want to clone.  */
   || (!called_without_ipa_profile && !count_sum.nonzero_p ()))


Fix overflow in ipa-cp heuristics

2025-07-03 Thread Jan Hubicka
Hi,
ipa-cp converts sreal times to int, while the point of sreal is to accommodate
very large values that can happen for loops with a large number of iterations
and also when the profile is inconsistent.  This happens with afdo in the
testsuite, where the loop preheader is estimated to have 0 executions while the
loop body has a large number of executions.
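
To illustrate the general hazard (this is not GCC's sreal, and the helper names scaled, exceeds_threshold_narrow and exceeds_threshold_wide are made up for this sketch), the following uses 64-bit and 32-bit unsigned integers as stand-ins for a wide and a narrow type. Narrowing before the comparison drops the high bits, so a very large evaluation can look small:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Multiply in the wide type; the assert guards against 64-bit wrap-around.
std::uint64_t
scaled(std::uint64_t count, std::uint64_t per_item)
{
  assert(per_item == 0
	 || count <= std::numeric_limits<std::uint64_t>::max() / per_item);
  return count * per_item;
}

// Narrow first, then compare: conversion to a 32-bit unsigned type is
// modular, so everything above the low 32 bits is discarded.
bool
exceeds_threshold_narrow(std::uint64_t evaluation, std::uint32_t threshold)
{ return static_cast<std::uint32_t>(evaluation) >= threshold; }

// Compare in the wide type: the threshold is widened instead.
bool
exceeds_threshold_wide(std::uint64_t evaluation, std::uint32_t threshold)
{ return evaluation >= threshold; }
```

For an evaluation of 2^32 + 5 and a threshold of 500, the narrow comparison sees 5 and rejects, while the wide comparison accepts.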

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* ipa-cp.cc (hint_time_bonus): Return sreal and avoid
conversions to integer.
(good_cloning_opportunity_p): Avoid sreal to integer
conversions
(perform_estimation_of_a_value): Update.

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 3e073af662a..75ea94f2ad8 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -3341,10 +3341,10 @@ devirtualization_time_bonus (struct cgraph_node *node,
 
 /* Return time bonus incurred because of hints stored in ESTIMATES.  */
 
-static int
+static sreal
 hint_time_bonus (cgraph_node *node, const ipa_call_estimates &estimates)
 {
-  int result = 0;
+  sreal result = 0;
   ipa_hints hints = estimates.hints;
   if (hints & (INLINE_HINT_loop_iterations | INLINE_HINT_loop_stride))
 result += opt_for_fn (node->decl, param_ipa_cp_loop_hint_bonus);
@@ -3352,10 +3352,10 @@ hint_time_bonus (cgraph_node *node, const 
ipa_call_estimates &estimates)
   sreal bonus_for_one = opt_for_fn (node->decl, param_ipa_cp_loop_hint_bonus);
 
   if (hints & INLINE_HINT_loop_iterations)
-result += (estimates.loops_with_known_iterations * bonus_for_one).to_int 
();
+result += estimates.loops_with_known_iterations * bonus_for_one;
 
   if (hints & INLINE_HINT_loop_stride)
-result += (estimates.loops_with_known_strides * bonus_for_one).to_int ();
+result += estimates.loops_with_known_strides * bonus_for_one;
 
   return result;
 }
@@ -3436,7 +3436,7 @@ good_cloning_opportunity_p (struct cgraph_node *node, 
sreal time_benefit,
 introduced.  This is likely almost always going to be true, since we
 already checked that time saved is large enough to be considered
 hot.  */
-  else if (evaluation.to_int () >= eval_threshold)
+  else if (evaluation >= (sreal)eval_threshold)
return true;
   /* If all call sites have profile known; we know we do not want t clone.
 If there are calls with unknown profile; try local heuristics.  */
@@ -3457,7 +3457,7 @@ good_cloning_opportunity_p (struct cgraph_node *node, 
sreal time_benefit,
 info->node_calling_single_call ? ", single_call" : "",
 evaluation.to_double (), eval_threshold);
 
-  return evaluation.to_int () >= eval_threshold;
+  return evaluation >= eval_threshold;
 }
 
 /* Grow vectors in AVALS and fill them with information about values of
@@ -3543,8 +3543,8 @@ perform_estimation_of_a_value (cgraph_node *node,
 time_benefit = 0;
   else
 time_benefit = (estimates.nonspecialized_time - estimates.time)
+  + hint_time_bonus (node, estimates)
   + (devirtualization_time_bonus (node, avals)
-+ hint_time_bonus (node, estimates)
 + removable_params_cost + est_move_cost);
 
   int size = estimates.size;


Re: [PATCH] x86: Emit label only for __mcount_loc section

2025-07-03 Thread Uros Bizjak
On Thu, Jul 3, 2025 at 11:54 AM H.J. Lu  wrote:
>
> commit ecc81e33123d7ac9c11742161e128858d844b99d (HEAD)
> Author: Andi Kleen 
> Date:   Fri Sep 26 04:06:40 2014 +
>
> Add direct support for Linux kernel __fentry__ patching
>
> emitted a label, 1, for __mcount_loc section:
>
> 1: call mcount
> .section __mcount_loc, "a",@progbits
> .quad 1b
> .previous
>
> If __mcount_loc wasn't used, we got an unused label.  Update
> x86_function_profiler to emit label only when __mcount_loc section
> is used.
>
> gcc/
>
> PR target/120936
> * config/i386/i386.cc (x86_print_call_or_nop): Add a label
> argument and use it to print label.
> (x86_function_profiler): Emit label only when __mcount_loc
> section is used.
>
> gcc/testsuite/
>
> PR target/120936
> * gcc.target/i386/pr120936-1.c: New test
> * gcc.target/i386/pr120936-2.c: Likewise.
> * gcc.target/i386/pr120936-3.c: Likewise.
> * gcc.target/i386/pr120936-4.c: Likewise.
> * gcc.target/i386/pr120936-5.c: Likewise.
> * gcc.target/i386/pr120936-6.c: Likewise.
> * gcc.target/i386/pr120936-7.c: Likewise.
> * gcc.target/i386/pr120936-8.c: Likewise.
> * gcc.target/i386/pr120936-9.c: Likewise.
> * gcc.target/i386/pr120936-10.c: Likewise.
> * gcc.target/i386/pr120936-11.c: Likewise.
> * gcc.target/i386/pr120936-12.c: Likewise.
> * gcc.target/i386/pr93492-3.c: Updated.
> * gcc.target/i386/pr93492-5.c: Likewise.
>
> OK for master?
>
> Thanks.

+  bool fentry_section_p
+= (flag_record_mcount
+   || lookup_attribute ("fentry_section",
+DECL_ATTRIBUTES (current_function_decl)));
+  const char *label;
+  if (fentry_section_p)
+label = "1:";
+  else
+label = "";

Just write this part as:

const char *label = fentry_section_p ? "1:" : "";

and using one vertical space before declaration.

Otherwise OK.

Thanks,
Uros.


Re: [PATCH v2 1/5] libstdc++: Check prerequisites of layout_*::operator().

2025-07-03 Thread Luc Grosheintz




On 7/1/25 22:56, Jonathan Wakely wrote:

On Tue, 1 Jul 2025 at 11:32, Tomasz Kaminski  wrote:


Hi,
More of the review will be later, but I have noticed that you have added 
precondition checks
to the layouts, and then avoid checking them inside the operator[] of the 
mdspan. This in general
sounds good.

However, the precondition on mdspan::operator[] is now hardened: 
https://eel.is/c++draft/views.multidim#mdspan.mdspan.members-3.
This implies that breaking this precondition is a contract violation, and thus 
the user may expect a contract violation handler to be invoked
for it. Amongst the information provided to the handler via contract_violation 
object (https://eel.is/c++draft/support.contract.violation)
is source_location, that includes the name.


Even without contracts, our __glibcxx_assert macro includes the
__PRETTY_FUNCTION__ string in the abort message.


Given that, I think we want to always check the extents in operator[] of mdspan, 
and thus remove
the checks from layout (to avoid duplication).


I think it's probably QoI whether the contract checks happen directly
in the operator[] function or in something that it calls. If doing the
checks in operator[] has no extra runtime cost (e.g. due to them
getting duplicated), then I think it is better to do them there, so
that errors are reported from the "correct" place. But if it hurts
performance, I don't think it's essential to check them there.


The reasoning for this approach was:

  1. The mapping::operator() and mdspan::operator[] have the same
  precondition; and mdspan::operator[] calls mapping::operator().

  2. The place I chose to check the precondition is where we already
  have both the index and the extent in L1 and almost certainly in a
  register. The hope was that together with branch prediction, this
  will be a reasonably cheap place to put the check.

  3. The layouts are highly valuable on their own. I've implemented
  that piece of logic numerous times in different contexts; and it's
  wonderful that soon we can convert `i, j, k` to a linear index easily
  using the standard library.
  Therefore, I didn't want to skip them in mapping::operator() because
  they're a guard against out of bounds accesses, e.g. in a user-defined
  dynamically allocated, owning, multi-dimensional array.

  4. Finally, my assumption was that for performance critical code
  one would be forced to turn off bounds checks. Hence, any place that
  doesn't duplicate the check would be acceptable (until measured
  otherwise).
  In HPC workloads it wouldn't be uncommon to have 1M elements and 1M
  iterations and 10s - 100s of different accesses per element per
  iteration; per CPU core. In this loop often there's little more than
  a few additions and multiplications.

There are a few paths forward:

  1. Remove the check from mapping::operator() and unconditionally
  check in mdspan::operator[].

  2. Leave it as is and return to it when we do optimization or hardening.

  3. Start measuring to figure out the cost of these checks; and then
  decide.

I'm open to all three.






>> I hope the above makes sense.
>>
>> Regards,
>> Tomasz
>>
>> On Fri, Jun 27, 2025 at 11:12 AM Luc Grosheintz wrote:
>>>
>>> Previously the prerequisite that the arguments passed to operator() are
>>> a multi-dimensional index (of extents()) was not checked.
>>>
>>> This commit adds the __glibcxx_asserts and the required tests.
>>>
>>> libstdc++-v3/ChangeLog:
>>>
>>>  * include/std/mdspan: Check prerequisites of
>>>  layout_*::operator().
>>>  * testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc:
>>>  Add tests for prerequisites.
>>>
>>> Signed-off-by: Luc Grosheintz
>>> ---
>>>   libstdc++-v3/include/std/mdspan   |  4 +++
>>>   .../mdspan/layouts/class_mandate_neg.cc   | 26 +++
>>>   2 files changed, 30 insertions(+)
>>>
>>> diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
>>> index c72a64094b7..39d02ac08df 100644
>>> --- a/libstdc++-v3/include/std/mdspan
>>> +++ b/libstdc++-v3/include/std/mdspan
>>> @@ -441,6 +441,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>  _IndexType __mult = 1;
>>>  auto __update = [&, __pos = 0u](_IndexType __idx) mutable
>>>    {
>>> +   __glibcxx_assert(cmp_less(__idx, __exts.extent(__pos)));
>>>  __res += __idx * __mult;
>>>  __mult *= __exts.extent(__pos);
>>>  ++__pos;
>>> @@ -651,6 +652,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>  auto __update = [&, __pos = __exts.rank()](_IndexType) mutable
>>>    {
>>>  --__pos;
>>> +   __glibcxx_assert(cmp_less(__ind_arr[__pos],
>>> + __exts.extent(__pos)));
>>>  __res += __ind_arr[__pos] * __mult;
>>>  __mult *= __exts.extent(__pos);
>>>    };
>>> @@ -822,6 +825,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>    {
>>>  auto __update = [&, __pos = 0u](_IndexType __idx) mutable
>>>    {
>>> +   __glibcxx_assert(cmp_less(__idx, __m.exte

Re: [PATCH v2 1/5] libstdc++: Check prerequisites of layout_*::operator().

2025-07-03 Thread Tomasz Kaminski
On Thu, Jul 3, 2025 at 12:08 PM Luc Grosheintz wrote:

>
>
> On 7/1/25 22:56, Jonathan Wakely wrote:
> > On Tue, 1 Jul 2025 at 11:32, Tomasz Kaminski wrote:
> >>
> >> Hi,
> >> More of the review will come later, but I have noticed that you have
> >> added precondition checks to the layouts, and then avoid checking them
> >> inside the operator[] of the mdspan. This in general sounds good.
> >>
> >> However, the precondition on mdspan::operator[] is now hardened:
> >> https://eel.is/c++draft/views.multidim#mdspan.mdspan.members-3.
> >> This implies that breaking this precondition is a contract violation,
> >> and thus the user may expect a contract violation handler to be invoked
> >> for it. Amongst the information provided to the handler via the
> >> contract_violation object
> >> (https://eel.is/c++draft/support.contract.violation)
> >> is source_location, which includes the function name.
> >
> > Even without contracts, our __glibcxx_assert macro includes the
> > __PRETTY_FUNCTION__ string in the abort message.
> >
> >> Given that, I think we want to always check the extents in operator[] of
> >> mdspan, and thus remove the checks from the layouts (to avoid duplication).
> >
> > I think it's probably QoI whether the contract checks happen directly
> > in the operator[] function or in something that it calls. If doing the
> > checks in operator[] has no extra runtime cost (e.g. due to them
> > getting duplicated), then I think it is better to do them there, so
> > that errors are reported from the "correct" place. But if it hurts
> > performance, I don't think it's essential to check them there.
>
> The reasoning for this approach was:
>
>   1. The mapping::operator() and mdspan::operator[] have the same
>   precondition; and mdspan::operator[] calls mapping::operator().
>
>   2. The place I chose to check the precondition is where we already
>   have both the index and the extent in L1 and almost certainly in a
>   register. The hope was that together with branch prediction, this
>   will be a reasonably cheap place to put the check.
>
>   3. The layouts are highly valuable on their own. I've implemented
>   that piece of logic numerous times in different contexts; and it's
>   wonderful that soon we can convert `i, j, k` to a linear index easily
>   using the standard library.
>   Therefore, I didn't want to skip them in mapping::operator() because
>   they're a guard against out of bounds accesses, e.g. in a user-defined
>   dynamically allocated, owning, multi-dimensional array.
>
I think such types would have their own bounds checks, contracts,
preconditions.

>
>   4. Finally, my assumption was that for performance critical code
>   one would be forced to turn off bounds checks. Hence, any place that
>   doesn't duplicate the check would be acceptable (until measured
>   otherwise).
>   In HPC workloads it wouldn't be uncommon to have 1M elements and 1M
>   iterations and 10s - 100s of different accesses per element per
>   iteration; per CPU core. In this loop often there's little more than
>   a few additions and multiplications.
>
> There are a few paths forward:
>
>   1. Remove the check from mapping::operator() and unconditionally
>   check in mdspan::operator[].
>
I would go for option 1.

>
>   2. Leave it as is and return to it when we do optimization or hardening.
>
>   3. Start measuring to figure out the cost of these checks; and then
>   decide.
>
> I'm open to all three.
>
> >
> >
> >>
> >> I hope the above makes sense.
> >>
> >> Regards,
> >> Tomasz
> >>
> >> On Fri, Jun 27, 2025 at 11:12 AM Luc Grosheintz <luc.groshei...@gmail.com> wrote:
> >>>
> >>> Previously the prerequisite that the arguments passed to operator() are
> >>> a multi-dimensional index (of extents()) was not checked.
> >>>
> >>> This commit adds the __glibcxx_asserts and the required tests.
> >>>
> >>> libstdc++-v3/ChangeLog:
> >>>
> >>>  * include/std/mdspan: Check prerequisites of
> >>>  layout_*::operator().
> >>>  * testsuite/23_containers/mdspan/layouts/class_mandate_neg.cc:
> >>>  Add tests for prerequisites.
> >>>
> >>> Signed-off-by: Luc Grosheintz 
> >>> ---
> >>>   libstdc++-v3/include/std/mdspan   |  4 +++
> >>>   .../mdspan/layouts/class_mandate_neg.cc   | 26 +++
> >>>   2 files changed, 30 insertions(+)
> >>>
> >>> diff --git a/libstdc++-v3/include/std/mdspan b/libstdc++-v3/include/std/mdspan
> >>> index c72a64094b7..39d02ac08df 100644
> >>> --- a/libstdc++-v3/include/std/mdspan
> >>> +++ b/libstdc++-v3/include/std/mdspan
> >>> @@ -441,6 +441,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>>  _IndexType __mult = 1;
> >>>  auto __update = [&, __pos = 0u](_IndexType __idx) mutable
> >>>{
> >>> +   __glibcxx_assert(cmp_less(__idx, __exts.extent(__pos)));
> >>>  __res += __idx * __mult;
> >>>  __mult *= __exts.extent(__pos);
> >>>  ++__pos;
> >>> @

[PATCH v1 1/1] libiberty: add common methods for type-sensitive doubly linked lists

2025-07-03 Thread Matthieu Longo
The implementation of these methods relies on duck typing at compile
time.
The structure corresponding to the node of a doubly linked list needs
to define attributes 'prev' and 'next', which are pointers to the node
type.
The structure wrapping the nodes and other metadata (first, last, size)
needs to define pointers 'first_' and 'last_' of the node's type, and
an integer type for 'size'.

Mutative methods can be bundled together and declared at once via a
single macro, or can be declared separately. The merge sort is bundled
separately.
There are 3 types of macros:
1. for the declaration of prototypes: to use in a header file for a
   public declaration, or as a forward declaration in the source file
   for private declaration.
2. for the declaration of the implementation: to use always in a
   source file.
3. for the invocation of the functions.

The methods can be declared either public or private via the second
argument of the declaration macros.

List of currently implemented methods:
- LINKED_LIST_*:
    - APPEND: insert a node at the end of the list.
    - PREPEND: insert a node at the beginning of the list.
    - INSERT_BEFORE: insert a node before the given node.
    - POP_FRONT: remove the first node of the list.
    - POP_BACK: remove the last node of the list.
    - REMOVE: remove the given node from the list.
    - SWAP: swap the two given nodes in the list.
- LINKED_LIST_MERGE_SORT: a merge sort implementation.
---
 include/doubly-linked-list.h  | 440 ++
 libiberty/Makefile.in |   1 +
 libiberty/testsuite/Makefile.in   |  12 +-
 libiberty/testsuite/test-doubly-linked-list.c | 253 ++
 4 files changed, 705 insertions(+), 1 deletion(-)
 create mode 100644 include/doubly-linked-list.h
 create mode 100644 libiberty/testsuite/test-doubly-linked-list.c

diff --git a/include/doubly-linked-list.h b/include/doubly-linked-list.h
new file mode 100644
index 000..3b3ce1ee6b9
--- /dev/null
+++ b/include/doubly-linked-list.h
@@ -0,0 +1,440 @@
+/* Copyright (C) 2025 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see .  */
+
+
+#ifndef _DOUBLY_LINKED_LIST_H
+#define _DOUBLY_LINKED_LIST_H
+
+#include 
+
+/* Doubly linked list implementation enforcing typing.
+
+   This implementation of doubly linked list tries to achieve the enforcement of
+   typing similarly to C++ templates, but without encapsulation.
+
+   All the functions are prefixed with the type of the value: "AType_xxx".
+   Some functions are prefixed with "_AType_xxx" and are not part of the public
+   API, so should not be used, except for _##LTYPE##_merge_sort with a caveat
+   (see note above its definition).
+
+   Each function (### is a placeholder for method name) has a macro for:
+   (1) its invocation LINKED_LIST_###(LTYPE).
+   (2) its prototype LINKED_LIST_DECL_###(A, A2, scope). To add in a header
+   file, or a source file for forward declaration. 'scope' should be set
+   respectively to 'extern', or 'static'.
+   (3) its definition LINKED_LIST_DEFN_###(A, A2, scope). To add in a source
+   file with the 'scope' set respectively to nothing, or 'static' depending
+   on (2).
+
+   Data structures requirements:
+   - LTYPE corresponds to the node of a doubly linked list. It needs to define
+ attributes 'prev' and 'next' which are pointers on the type of a node.
+ For instance:
+   struct my_list_node
+   {
+T value;
+struct my_list_node *prev;
+struct my_list_node *next;
+   };
+   - LWRAPPERTYPE is a structure wrapping the nodes and other metadata (first_,
+ last_, size).
+ */
+
+
+/* Mutative operations:
+- append
+- prepend
+- insert_before
+- pop_front
+- pop_back
+- remove
+- swap
+   The header and body of each of those operations can be declared individually,
+   or as a whole via LINKED_LIST_MUTATIVE_OPS_PROTOTYPE for the prototypes, and
+   LINKED_LIST_MUTATIVE_OPS_DECL for the implementations.  */
+
+/* Append the given node new_ to the existing list.  */
+#define LINKED_LIST_APPEND(LTYPE)  LTYPE##_append
+
+#define LINKED_LIST_DECL_APPEND(LWRAPPERTYPE, LTYPE, EXPORT)   \
+  EXPORT void  \
+  LTYPE##_append (LWRAPPERTYPE *wrapper, LTYPE *new_)
+
+#defi

[PATCH v1 0/1] libiberty: add common methods for type-sensitive doubly linked lists

2025-07-03 Thread Matthieu Longo
This patch was originally part of [1]. Merging it into GCC is a prerequisite
for merging it into binutils.

The implementation of these methods relies on duck typing at compile time. The
structure corresponding to the node of a doubly linked list needs to define
attributes 'prev' and 'next', which are pointers to the node type.
The structure wrapping the nodes and other metadata (first, last, size) needs
to define pointers 'first_' and 'last_' of the node's type, and an integer
type for 'size'.

Mutative methods can be bundled together and declared at once via a single
macro, or can be declared separately. The merge sort is bundled separately.
There are 3 types of macros:
1. for the declaration of prototypes: to use in a header file for a public
   declaration, or as a forward declaration in the source file for private
   declaration.
2. for the declaration of the implementation: to use always in a source file.
3. for the invocation of the functions.

The methods can be declared either public or private via the second
argument of the declaration macros.

List of currently implemented methods:
- LINKED_LIST_*:
    - APPEND: insert a node at the end of the list.
    - PREPEND: insert a node at the beginning of the list.
    - INSERT_BEFORE: insert a node before the given node.
    - POP_FRONT: remove the first node of the list.
    - POP_BACK: remove the last node of the list.
    - REMOVE: remove the given node from the list.
    - SWAP: swap the two given nodes in the list.
- LINKED_LIST_MERGE_SORT: a merge sort implementation.
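
For readers unfamiliar with the pattern, the three kinds of macros (prototype, definition, invocation) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `SKETCH_LIST_*` macros, `int_node` and `int_list` are hypothetical names, and the body of the append function is a guess at what such an operation typically does, not the code from include/doubly-linked-list.h. It is written so that it compiles as C++, although the header itself targets C.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macros that mimic the three-macro pattern described above.
// (3) Invocation: expands to the name of the generated function.
#define SKETCH_LIST_APPEND(LTYPE) LTYPE##_append

// (1) Prototype: EXPORT is the storage class, e.g. 'extern' or 'static'.
#define SKETCH_LIST_DECL_APPEND(LWRAPPERTYPE, LTYPE, EXPORT) \
  EXPORT void LTYPE##_append (LWRAPPERTYPE *wrapper, LTYPE *new_)

// (2) Definition: relies only on the members prev, next, first_, last_ and
// size being present (duck typing at compile time).
#define SKETCH_LIST_DEFN_APPEND(LWRAPPERTYPE, LTYPE, EXPORT) \
  EXPORT void                                                \
  LTYPE##_append (LWRAPPERTYPE *wrapper, LTYPE *new_)        \
  {                                                          \
    new_->prev = wrapper->last_;                             \
    new_->next = nullptr;                                    \
    if (wrapper->last_ != nullptr)                           \
      wrapper->last_->next = new_;                           \
    else                                                     \
      wrapper->first_ = new_;                                \
    wrapper->last_ = new_;                                   \
    ++wrapper->size;                                         \
  }

// Hypothetical node type: any struct with 'prev' and 'next' qualifies.
struct int_node
{
  int value;
  int_node *prev;
  int_node *next;
};

// Hypothetical wrapper type: needs 'first_', 'last_' and 'size'.
struct int_list
{
  int_node *first_;
  int_node *last_;
  std::size_t size;
};

// Private (static) forward declaration followed by the definition.
SKETCH_LIST_DECL_APPEND (int_list, int_node, static);
SKETCH_LIST_DEFN_APPEND (int_list, int_node, static)
```

A call site would then read `SKETCH_LIST_APPEND (int_node) (&list, &node);`, and a node type lacking `prev` or `next` would be rejected by the compiler when the definition macro is expanded.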

Regression tested on aarch64-unknown-linux-gnu. No failure found.

[1]: https://inbox.sourceware.org/binutils/20250509151319.88725-10-matthieu.lo...@arm.com/

Regards,
Matthieu


Matthieu Longo (1):
  libiberty: add common methods for type-sensitive doubly linked lists

 include/doubly-linked-list.h  | 440 ++
 libiberty/Makefile.in |   1 +
 libiberty/testsuite/Makefile.in   |  12 +-
 libiberty/testsuite/test-doubly-linked-list.c | 253 ++
 4 files changed, 705 insertions(+), 1 deletion(-)
 create mode 100644 include/doubly-linked-list.h
 create mode 100644 libiberty/testsuite/test-doubly-linked-list.c

-- 
2.50.0



[COMMITTED 40/42] ada: Fix alignment violation for mix of aligned and misaligned composite types

2025-07-03 Thread Marc Poulhiès
From: Eric Botcazou 

This happens when the chain of initialization procedures is called on the
subcomponents and causes the creation of temporaries along the way out of
alignment considerations.  Now these temporaries are not necessary in the
context and were not created until recently, so this gets rid of them.

gcc/ada/ChangeLog:

* gcc-interface/trans.cc (addressable_p): Add COMPG third parameter.
: Do not return true out of alignment considerations
for non-strict-alignment targets if COMPG is set.
(Call_to_gnu): Pass true as COMPG in the call to the addressable_p
predicate if the called subprogram is an initialization procedure.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/trans.cc | 41 +++---
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index 7549b8e37bf..e02804b75af 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -257,7 +257,7 @@ static tree emit_check (tree, tree, int, Node_Id);
 static tree build_unary_op_trapv (enum tree_code, tree, tree, Node_Id);
 static tree build_binary_op_trapv (enum tree_code, tree, tree, tree, Node_Id);
 static tree convert_with_check (Entity_Id, tree, bool, bool, Node_Id);
-static bool addressable_p (tree, tree);
+static bool addressable_p (tree, tree, bool);
 static bool aliasable_p (tree, tree);
 static tree assoc_to_constructor (Entity_Id, Node_Id, tree);
 static tree pos_to_constructor (Node_Id, tree);
@@ -4876,6 +4876,8 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
   tree gnu_formal = present_gnu_tree (gnat_formal)
? get_gnu_tree (gnat_formal) : NULL_TREE;
   tree gnu_actual_type = gnat_to_gnu_type (Etype (gnat_actual));
+  const bool is_init_proc
+   = Is_Entity_Name (gnat_subprog) && Is_Init_Proc (Entity (gnat_subprog));
   const bool in_param = (Ekind (gnat_formal) == E_In_Parameter);
   const bool is_true_formal_parm
= gnu_formal && TREE_CODE (gnu_formal) == PARM_DECL;
@@ -4925,7 +4927,7 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
 copy to avoid breaking strict aliasing rules.  */
   if (is_by_ref_formal_parm
  && (gnu_name_type = gnat_to_gnu_type (Etype (gnat_name)))
- && (!addressable_p (gnu_name, gnu_name_type)
+ && (!addressable_p (gnu_name, gnu_name_type, is_init_proc)
  || (node_is_type_conversion (gnat_actual)
  && (aliasing = !aliasable_p (gnu_name, gnu_actual_type)
{
@@ -5051,9 +5053,7 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
 
  /* Do not initialize it for the _Init parameter of an initialization
 procedure since no data is meant to be passed in.  */
- if (Ekind (gnat_formal) == E_Out_Parameter
- && Is_Entity_Name (gnat_subprog)
- && Is_Init_Proc (Entity (gnat_subprog)))
+ if (Ekind (gnat_formal) == E_Out_Parameter && is_init_proc)
gnu_name = gnu_temp = create_temporary ("A", TREE_TYPE (gnu_name));
 
  /* Initialize it on the fly like for an implicit temporary in the
@@ -10379,7 +10379,8 @@ convert_with_check (Entity_Id gnat_type, tree gnu_expr, bool overflow_p,
unless it is an expression involving computation or if it involves a
reference to a bitfield or to an object not sufficiently aligned for
its type.  If GNU_TYPE is non-null, return true only if GNU_EXPR can
-   be directly addressed as an object of this type.
+   be directly addressed as an object of this type.  COMPG is true when
+   the predicate is invoked for compiler-generated code.
 
*** Notes on addressability issues in the Ada compiler ***
 
@@ -10436,7 +10437,7 @@ convert_with_check (Entity_Id gnat_type, tree gnu_expr, bool overflow_p,
generated to connect everything together.  */
 
 static bool
-addressable_p (tree gnu_expr, tree gnu_type)
+addressable_p (tree gnu_expr, tree gnu_type, bool compg)
 {
   /* For an integral type, the size of the actual type of the object may not
  be greater than that of the expected type, otherwise an indirect access
@@ -10497,13 +10498,13 @@ addressable_p (tree gnu_expr, tree gnu_type)
 
 case COMPOUND_EXPR:
   /* The address of a compound expression is that of its 2nd operand.  */
-  return addressable_p (TREE_OPERAND (gnu_expr, 1), gnu_type);
+  return addressable_p (TREE_OPERAND (gnu_expr, 1), gnu_type, compg);
 
 case COND_EXPR:
   /* We accept &COND_EXPR as soon as both operands are addressable and
 expect the outcome to be the address of the selected operand.  */
-  return (addressable_p (TREE_OPERAND (gnu_expr, 1), NULL_TREE)
- && addressable_p (TREE_OPERAND (gnu_expr, 2), NULL_TREE));
+  return (addressable_p (TREE_OPERAND (gnu_expr, 1), N
