Re: 'gcc/config/nvptx/gen-multilib-matches.sh': Support '--selftest' (was: 'gcc/config/nvptx/t-nvptx': Don't use the 'shell' function of 'make' (was: nvptx: Allow '--with-arch' to override the default

2024-12-06 Thread Sam James
Hi!

The script has #!/bin/sh shebang (and hence must have POSIX shell
compatibility), but the patch introduces uses of the 'local' keyword
which isn't in POSIX.

While many shells do have the 'local' keyword, its behaviour isn't
portable across those either, which is why it's likely it'll never
be added to POSIX :(

thanks,
sam


Re: 'gcc/config/nvptx/gen-multilib-matches.sh': Support '--selftest' (was: 'gcc/config/nvptx/t-nvptx': Don't use the 'shell' function of 'make' (was: nvptx: Allow '--with-arch' to override the default

2024-12-06 Thread Thomas Schwinge
Hi Sam!

On 2024-12-06T09:34:32+, Sam James  wrote:
> The script has #!/bin/sh shebang (and hence must have POSIX shell
> compatibility), but the patch introduces uses of the 'local' keyword
> which isn't in POSIX.
>
> While many shells do have the 'local' keyword, its behaviour isn't
> portable across those either, which is why it's likely it'll never
> be added to POSIX :(

Right, but I intentionally picked the form that I thought was supported
by all reasonable '/bin/sh's: 'local [name]', without any further
adornment.  For example, per :

| 'local' is mandated by the LSB and Debian policy specifications, though only
| the 'local varname' (not 'local var=value') syntax is specified.

Portable, reliable shell programming is a nice idea, but then, reality
check...

(Don't ask me how much time I already spent on this simple script, to get
it into its current form -- and I'd consider myself well-versed in shell
programming...)

I was inclined to just rewrite it in Python, what do you think?  In my
opinion, a GCC-build-time Python dependency is not a problem for
'--target=nvptx-none', as that one's not in the bootstrapping chain?


Grüße
 Thomas
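
For illustration only: one way to get function-scoped variables without
'local' is to wrap the function body in a subshell, which POSIX does
specify.  This is a minimal sketch, not code from
'gen-multilib-matches.sh'; the function name 'first_word' and the
variable 'tmp' are made up for the example.

```sh
#!/bin/sh
# Hypothetical example; 'first_word' and 'tmp' are illustrative names only.

tmp=outer

# The body is wrapped in ( ... ) instead of { ... }, so it runs in a
# subshell and its assignment to 'tmp' cannot leak into the caller.
first_word() (
    tmp=$1
    printf '%s\n' "${tmp%% *}"
)

first_word "sm_30 sm_35 sm_53"
printf '%s\n' "$tmp"
```

The trade-off is that such a function can only hand results back through
its output or exit status, and each call costs a subshell, which may or
may not matter for a small build-time script.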


[PATCH] arm, testsuite: Add -mtune=cortex-m55 to dlstp-compile-asm-1.c test.

2024-12-06 Thread Christophe Lyon
This test would fail if GCC is configured with non-default options,
such as -mtune=cortex-a9.

This 'unexpected' scheduling makes the DLSTP optimization generate
subslr, #16
bhi .L4
lctp
pop {r4, r5, pc}
.L4:
sub ip, ip, #16
b  

instead of the expected
sub ip, ip, #16
letp lr, 

Although GCC still optimizes all 144 loops, only 96 use letp, 48
others use lctp.

The patch simply forces -mtune=cortex-m55 to avoid this unexpected
issue.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/dlstp-compile-asm-1.c: Add -mtune=cortex-m55
---
 gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-1.c b/gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-1.c
index 6e6da3d3d59..7b7f1da6435 100644
--- a/gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-1.c
+++ b/gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
-/* { dg-options "-O3 -save-temps" } */
+/* { dg-options "-O3 -save-temps -mtune=cortex-m55" } */
 /* { dg-add-options arm_v8_1m_mve } */
 
 #include 
-- 
2.34.1



Re: [PATCH] arm, testsuite: Add -mtune=cortex-m55 to dlstp-compile-asm-1.c test.

2024-12-06 Thread Richard Earnshaw (lists)
On 06/12/2024 10:02, Christophe Lyon wrote:
> This test would fail if GCC is configured with non-default options,
> such as -mtune=cortex-a9.
> 
> This 'unexpected' scheduling makes the DLSTP optimization generate
>   subslr, #16
>   bhi .L4
>   lctp
>   pop {r4, r5, pc}
> .L4:
>   sub ip, ip, #16
>   b  
> 
> instead of the expected
> sub ip, ip, #16
> letp lr, 
> 
> Although GCC still optimizes all 144 loops, only 96 use letp, 48
> others use lctp.
> 
> The patch simply forces -mtune=cortex-m55 to avoid this unexpected
> issue.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/dlstp-compile-asm-1.c: Add -mtune=cortex-m55
> ---
>  gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-1.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-1.c b/gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-1.c
> index 6e6da3d3d59..7b7f1da6435 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-1.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/dlstp-compile-asm-1.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
> -/* { dg-options "-O3 -save-temps" } */
> +/* { dg-options "-O3 -save-temps -mtune=cortex-m55" } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  
>  #include 

OK
R.


Re: [PATCH] config: nvptx: fix bashisms with gen-copyright.sh use

2024-12-06 Thread Thomas Schwinge
Hi Sam and Tom!

On 2024-12-06T09:13:40+, Sam James  wrote:
> Providing parameters to `.` when sourcing is a bashism and not supported
> by POSIX shell which causes a build failure when compiling a toolchain
> for nvptx-none with dash as /bin/sh.

Hmm, something must be wrong in that statement, as I'm regularly building
GCC/nvptx on a system with '/bin/sh -> dash'.

> gen-copyright.sh takes a parameter for the format of copyright notice
> required. Switch that to using an environment variable `NVPTX_GEN_COPYRIGHT`,
> although this could be changed to a function if desired (just more churn
> in gen-copyright.sh then).
>
> gcc/ChangeLog:
>   PR target/117854
>
>   * config/nvptx/gen-copyright.sh: Read NVPTX_GEN_COPYRIGHT envvar.
>   * config/nvptx/gen-h.sh: Set NVPTX_GEN_COPYRIGHT.
>   * config/nvptx/gen-opt.sh: Ditto.
> ---
> Testing it now with a build for nvptx-none. Is this approach OK or
> would you prefer the function approach (which will make the diff larger
> because of reformatting)?

First: Tom, what was your original intention why we'd keep the generated
files in the sources?  (..., instead of just generating them at build
time, like 'gcc/config/nvptx/t-omp-device' does for
'omp-device-properties-nvptx', for example.  In that case, we could just
skip adding these copyright/licensing headers.)

Second: why not just invoke 'gen-copyright.sh' instead of sourcing it?


Grüße
 Thomas


>  gcc/config/nvptx/gen-copyright.sh | 2 +-
>  gcc/config/nvptx/gen-h.sh | 2 +-
>  gcc/config/nvptx/gen-opt.sh   | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/nvptx/gen-copyright.sh b/gcc/config/nvptx/gen-copyright.sh
> index d0a86acb832c..b140de0eb76d 100644
> --- a/gcc/config/nvptx/gen-copyright.sh
> +++ b/gcc/config/nvptx/gen-copyright.sh
> @@ -18,7 +18,7 @@
>  # along with GCC; see the file COPYING3.  If not see
>  # .
>  
> -style="$1"
> +style="${1:-${NVPTX_GEN_COPYRIGHT}}"
>  case $style in
>  opt)
>  ;;
> diff --git a/gcc/config/nvptx/gen-h.sh b/gcc/config/nvptx/gen-h.sh
> index ea75e127cdeb..beafd5a9d2c4 100644
> --- a/gcc/config/nvptx/gen-h.sh
> +++ b/gcc/config/nvptx/gen-h.sh
> @@ -32,7 +32,7 @@ EOF
>  # Separator.
>  echo
>  
> -. $gen_copyright_sh c
> +NVPTX_GEN_COPYRIGHT=c . $gen_copyright_sh
>  
>  # Separator.
>  echo
> diff --git a/gcc/config/nvptx/gen-opt.sh b/gcc/config/nvptx/gen-opt.sh
> index 6022f51f8975..267a5005f66b 100644
> --- a/gcc/config/nvptx/gen-opt.sh
> +++ b/gcc/config/nvptx/gen-opt.sh
> @@ -36,7 +36,7 @@ EOF
>  # Separator.
>  echo
>  
> -. $gen_copyright_sh opt
> +NVPTX_GEN_COPYRIGHT=opt . $gen_copyright_sh
>  
>  # Not emitting the following here (in addition to having it in 'nvptx.opt'), as
>  # we'll otherwise run into:
> -- 
> 2.47.1


nvptx: Enhance '-march=[...]' test cases (was: [committed][nvptx, testsuite] Add gcc.target/nvptx/sm*.c)

2024-12-06 Thread Thomas Schwinge
Hi!

On 2022-03-01T09:00:45+0100, Tom de Vries via Gcc-patches 
 wrote:
> Add a few test-cases that test passing each -misa=sm_xx version and verify
> that the proper __PTX_SM__ is defined.

Pushed to trunk branch commit ed96ce81b19b76ba6a5edfe68dd86d8ea319c6d9
"nvptx: Enhance '-march=[...]' test cases", see attached.


Grüße
 Thomas


From ed96ce81b19b76ba6a5edfe68dd86d8ea319c6d9 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Sun, 10 Nov 2024 20:09:42 +0100
Subject: [PATCH] nvptx: Enhance '-march=[...]' test cases

This expands upon the test cases added in
commit 4706670cd3b06bb024da0683776bf86c79d55940
"[nvptx, testsuite] Add gcc.target/nvptx/sm*.c".

	gcc/testsuite/
	* gcc.target/nvptx/sm30.c: Remove; expanded into...
	* gcc.target/nvptx/march=sm_30.c: ... this.
	* gcc.target/nvptx/sm35.c: Remove; expanded into...
	* gcc.target/nvptx/march=sm_35.c: ... this.
	* gcc.target/nvptx/sm53.c: Remove; expanded into...
	* gcc.target/nvptx/march=sm_53.c: ... this.
	* gcc.target/nvptx/sm70.c: Remove; expanded into...
	* gcc.target/nvptx/march=sm_70.c: ... this.
	* gcc.target/nvptx/sm75.c: Remove; expanded into...
	* gcc.target/nvptx/march=sm_75.c: ... this.
	* gcc.target/nvptx/sm80.c: Remove; expanded into...
	* gcc.target/nvptx/march=sm_80.c: ... this.
	* gcc.target/nvptx/march.c: Remove.
---
 gcc/testsuite/gcc.target/nvptx/march.c   |  5 -
 gcc/testsuite/gcc.target/nvptx/march=sm_30.c | 19 +++
 gcc/testsuite/gcc.target/nvptx/march=sm_35.c | 19 +++
 gcc/testsuite/gcc.target/nvptx/march=sm_53.c | 19 +++
 gcc/testsuite/gcc.target/nvptx/march=sm_70.c | 19 +++
 gcc/testsuite/gcc.target/nvptx/march=sm_75.c | 19 +++
 gcc/testsuite/gcc.target/nvptx/march=sm_80.c | 19 +++
 gcc/testsuite/gcc.target/nvptx/sm30.c|  6 --
 gcc/testsuite/gcc.target/nvptx/sm35.c|  6 --
 gcc/testsuite/gcc.target/nvptx/sm53.c|  6 --
 gcc/testsuite/gcc.target/nvptx/sm70.c|  6 --
 gcc/testsuite/gcc.target/nvptx/sm75.c|  6 --
 gcc/testsuite/gcc.target/nvptx/sm80.c|  6 --
 13 files changed, 114 insertions(+), 41 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/nvptx/march.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march=sm_30.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march=sm_35.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march=sm_53.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march=sm_70.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march=sm_75.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march=sm_80.c
 delete mode 100644 gcc/testsuite/gcc.target/nvptx/sm30.c
 delete mode 100644 gcc/testsuite/gcc.target/nvptx/sm35.c
 delete mode 100644 gcc/testsuite/gcc.target/nvptx/sm53.c
 delete mode 100644 gcc/testsuite/gcc.target/nvptx/sm70.c
 delete mode 100644 gcc/testsuite/gcc.target/nvptx/sm75.c
 delete mode 100644 gcc/testsuite/gcc.target/nvptx/sm80.c

diff --git a/gcc/testsuite/gcc.target/nvptx/march.c b/gcc/testsuite/gcc.target/nvptx/march.c
deleted file mode 100644
index d1dd715798c4..
--- a/gcc/testsuite/gcc.target/nvptx/march.c
+++ /dev/null
@@ -1,5 +0,0 @@
-/* { dg-options "-march=sm_30" } */
-
-#include "main.c"
-
-/* { dg-final { scan-assembler-times "\\.target\tsm_30" 1 } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/march=sm_30.c b/gcc/testsuite/gcc.target/nvptx/march=sm_30.c
new file mode 100644
index ..a362935f3827
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/march=sm_30.c
@@ -0,0 +1,19 @@
+/* { dg-do assemble } */
+/* { dg-options {-march=sm_30 -mptx=_} } */
+/* { dg-additional-options -save-temps } */
+/* { dg-final { scan-assembler-times {(?n)^	\.version	6\.0$} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)^	\.target	sm_30$} 1 } } */
+
+#if __PTX_ISA_VERSION_MAJOR__ != 6
+#error wrong value for __PTX_ISA_VERSION_MAJOR__
+#endif
+
+#if __PTX_ISA_VERSION_MINOR__ != 0
+#error wrong value for __PTX_ISA_VERSION_MINOR__
+#endif
+
+#if __PTX_SM__ != 300
+#error wrong value for __PTX_SM__
+#endif
+
+int dummy;
diff --git a/gcc/testsuite/gcc.target/nvptx/march=sm_35.c b/gcc/testsuite/gcc.target/nvptx/march=sm_35.c
new file mode 100644
index ..c9e92261b0e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/march=sm_35.c
@@ -0,0 +1,19 @@
+/* { dg-do assemble } */
+/* { dg-options {-march=sm_35 -mptx=_} } */
+/* { dg-additional-options -save-temps } */
+/* { dg-final { scan-assembler-times {(?n)^	\.version	6\.0$} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)^	\.target	sm_35$} 1 } } */
+
+#if __PTX_ISA_VERSION_MAJOR__ != 6
+#error wrong value for __PTX_ISA_VERSION_MAJOR__
+#endif
+
+#if __PTX_ISA_VERSION_MINOR__ != 0
+#error wrong value for __PTX_ISA_VERSION_MINOR__
+#endif
+
+#if __PTX_SM__ != 350
+#error wrong value for __PTX_SM__
+#endif
+
+int dummy;
diff --git a/gcc/testsuite/gcc.target/nvptx/march=sm_53.c b/gcc/testsuite/gcc.target/nvptx/march=sm_

nvptx: Enhance '-march-map=[...]' test cases (was: [committed][nvptx] Add march-map)

2024-12-06 Thread Thomas Schwinge
Hi!

On 2022-03-29T14:03:22+0200, Tom de Vries via Gcc-patches 
 wrote:
> [...]
>
> gcc/testsuite/ChangeLog:
>
> 2022-03-29  Tom de Vries  
>
>   PR target/104714
>   * gcc.target/nvptx/march-map.c: New test.

Pushed to trunk branch commit ee6711ead30876daf2a8a66f8647cad95470fe79
"nvptx: Enhance '-march-map=[...]' test cases", see attached.


Grüße
 Thomas


From ee6711ead30876daf2a8a66f8647cad95470fe79 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Sun, 10 Nov 2024 18:29:25 +0100
Subject: [PATCH] nvptx: Enhance '-march-map=[...]' test cases

This expands upon the one test case added in
commit de0ef04419e90eacf0d1ddb265552a1b08c18d4b "[nvptx] Add march-map".

	gcc/testsuite/
	* gcc.target/nvptx/march-map.c: Remove; expanded into...
	* gcc.target/nvptx/march-map=sm_50.c: ... this.
	* gcc.target/nvptx/march-map=sm_30.c: New.
	* gcc.target/nvptx/march-map=sm_32.c: Likewise.
	* gcc.target/nvptx/march-map=sm_35.c: Likewise.
	* gcc.target/nvptx/march-map=sm_37.c: Likewise.
	* gcc.target/nvptx/march-map=sm_52.c: Likewise.
	* gcc.target/nvptx/march-map=sm_53.c: Likewise.
	* gcc.target/nvptx/march-map=sm_60.c: Likewise.
	* gcc.target/nvptx/march-map=sm_61.c: Likewise.
	* gcc.target/nvptx/march-map=sm_62.c: Likewise.
	* gcc.target/nvptx/march-map=sm_70.c: Likewise.
	* gcc.target/nvptx/march-map=sm_72.c: Likewise.
	* gcc.target/nvptx/march-map=sm_75.c: Likewise.
	* gcc.target/nvptx/march-map=sm_80.c: Likewise.
	* gcc.target/nvptx/march-map=sm_86.c: Likewise.
	* gcc.target/nvptx/march-map=sm_87.c: Likewise.
	* gcc.target/nvptx/march-map=sm_89.c: Likewise.
	* gcc.target/nvptx/march-map=sm_90.c: Likewise.
	* gcc.target/nvptx/march-map=sm_90a.c: Likewise.
	* gcc.target/nvptx/main.c: Remove.
---
 gcc/testsuite/gcc.target/nvptx/main.c |  7 ---
 gcc/testsuite/gcc.target/nvptx/march-map.c|  5 -
 .../gcc.target/nvptx/march-map=sm_30.c| 19 +++
 .../gcc.target/nvptx/march-map=sm_32.c| 19 +++
 .../gcc.target/nvptx/march-map=sm_35.c| 19 +++
 .../gcc.target/nvptx/march-map=sm_37.c| 19 +++
 .../gcc.target/nvptx/march-map=sm_50.c| 19 +++
 .../gcc.target/nvptx/march-map=sm_52.c| 19 +++
 .../gcc.target/nvptx/march-map=sm_53.c| 19 +++
 .../gcc.target/nvptx/march-map=sm_60.c| 19 +++
 .../gcc.target/nvptx/march-map=sm_61.c| 19 +++
 .../gcc.target/nvptx/march-map=sm_62.c| 19 +++
 .../gcc.target/nvptx/march-map=sm_70.c| 19 +++
 .../gcc.target/nvptx/march-map=sm_72.c| 19 +++
 .../gcc.target/nvptx/march-map=sm_75.c| 19 +++
 .../gcc.target/nvptx/march-map=sm_80.c| 19 +++
 .../gcc.target/nvptx/march-map=sm_86.c| 19 +++
 .../gcc.target/nvptx/march-map=sm_87.c| 19 +++
 .../gcc.target/nvptx/march-map=sm_89.c| 19 +++
 .../gcc.target/nvptx/march-map=sm_90.c| 19 +++
 .../gcc.target/nvptx/march-map=sm_90a.c   | 19 +++
 21 files changed, 361 insertions(+), 12 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/nvptx/main.c
 delete mode 100644 gcc/testsuite/gcc.target/nvptx/march-map.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march-map=sm_30.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march-map=sm_32.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march-map=sm_35.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march-map=sm_37.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march-map=sm_50.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march-map=sm_52.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march-map=sm_53.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march-map=sm_60.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march-map=sm_61.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march-map=sm_62.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march-map=sm_70.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march-map=sm_72.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march-map=sm_75.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march-map=sm_80.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march-map=sm_86.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march-map=sm_87.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march-map=sm_89.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march-map=sm_90.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march-map=sm_90a.c

diff --git a/gcc/testsuite/gcc.target/nvptx/main.c b/gcc/testsuite/gcc.target/nvptx/main.c
deleted file mode 100644
index 3af2b5758424..
--- a/gcc/testsuite/gcc.target/nvptx/main.c
+++ /dev/null
@@ -1,7 +0,0 @@
-/* { dg-do link } */
-
-int
-main (void)
-{
-  return 0;
-}
diff --git a/gcc

Re: [PATCH] config: nvptx: fix bashisms with gen-copyright.sh use

2024-12-06 Thread Sam James
Thomas Schwinge  writes:

> Hi Sam and Tom!

Hi!

>
> On 2024-12-06T09:13:40+, Sam James  wrote:
>> Providing parameters to `.` when sourcing is a bashism and not supported
>> by POSIX shell which causes a build failure when compiling a toolchain
>> for nvptx-none with dash as /bin/sh.
>
> Hmm, something must be wrong in that statement, as I'm regularly building
> GCC/nvptx on a system with '/bin/sh -> dash'.

Hmm. Do you perchance override CONFIG_SHELL (or SHELL)? Or does
configure, for some reason, determine that SHELL on your system is bash?

Note that at least on Debian, I believe dash is built without lineno
support, so configure will disqualify it as a shell for some purposes
(the story behind that is a bit odd but whatever).

>
>> gen-copyright.sh takes a parameter for the format of copyright notice
>> required. Switch that to using an environment variable `NVPTX_GEN_COPYRIGHT`,
>> although this could be changed to a function if desired (just more churn
>> in gen-copyright.sh then).
>>
>> gcc/ChangeLog:
>>  PR target/117854
>>
>>  * config/nvptx/gen-copyright.sh: Read NVPTX_GEN_COPYRIGHT envvar.
>>  * config/nvptx/gen-h.sh: Set NVPTX_GEN_COPYRIGHT.
>>  * config/nvptx/gen-opt.sh: Ditto.
>> ---
>> Testing it now with a build for nvptx-none. Is this approach OK or
>> would you prefer the function approach (which will make the diff larger
>> because of reformatting)?
>
> First: Tom, what was your original intention why we'd keep the generated
> files in the sources?  (..., instead of just generating them at build
> time, like 'gcc/config/nvptx/t-omp-device' does for
> 'omp-device-properties-nvptx', for example.  In that case, we could just
> skip adding these copyright/licensing headers.)
>
> Second: why not just invoke 'gen-copyright.sh' instead of sourcing it?

I had wondered about the latter as well.

>
>
> Grüße
>  Thomas

thanks,
sam

> [...]


nvptx: Enhance '-mptx=[...]' test cases (was: [committed][nvptx] Add __PTX_ISA_VERSION_{MAJOR,MINOR}__)

2024-12-06 Thread Thomas Schwinge
Hi!

On 2022-03-29T16:17:43+0200, Tom de Vries via Gcc-patches 
 wrote:
> [...]
>
> gcc/testsuite/ChangeLog:
>
> 2022-03-29  Tom de Vries  
>
>   PR target/104857
>   * gcc.target/nvptx/ptx31.c: New test.
>   * gcc.target/nvptx/ptx60.c: New test.
>   * gcc.target/nvptx/ptx63.c: New test.
>   * gcc.target/nvptx/ptx70.c: New test.

Pushed to trunk branch commit b7abc7cabdbcc889a74cde1cdc1ffb27cf965128
"nvptx: Enhance '-mptx=[...]' test cases", see attached.


Grüße
 Thomas


From b7abc7cabdbcc889a74cde1cdc1ffb27cf965128 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Sun, 10 Nov 2024 20:01:58 +0100
Subject: [PATCH] nvptx: Enhance '-mptx=[...]' test cases

This expands upon the test cases added in
commit a2eacdbd4c4a698b3b6f27ef5e1f8dd3d836b2e5
"[nvptx] Add __PTX_ISA_VERSION_{MAJOR,MINOR}__".

	gcc/testsuite/
	* gcc.target/nvptx/ptx31.c: Remove; expanded into...
	* gcc.target/nvptx/mptx=3.1.c: ... this.
	* gcc.target/nvptx/ptx60.c: Remove; expanded into...
	* gcc.target/nvptx/mptx=6.0.c: ... this.
	* gcc.target/nvptx/ptx63.c: Remove; expanded into...
	* gcc.target/nvptx/mptx=6.3.c: ... this.
	* gcc.target/nvptx/ptx70.c: Remove; expanded into...
	* gcc.target/nvptx/mptx=7.0.c: ... this.
	* gcc.target/nvptx/mptx=_.c: New.
---
 gcc/testsuite/gcc.target/nvptx/mptx=3.1.c | 19 +++
 gcc/testsuite/gcc.target/nvptx/mptx=6.0.c | 19 +++
 gcc/testsuite/gcc.target/nvptx/mptx=6.3.c | 19 +++
 gcc/testsuite/gcc.target/nvptx/mptx=7.0.c | 19 +++
 gcc/testsuite/gcc.target/nvptx/mptx=_.c   | 19 +++
 gcc/testsuite/gcc.target/nvptx/ptx31.c| 10 --
 gcc/testsuite/gcc.target/nvptx/ptx60.c| 10 --
 gcc/testsuite/gcc.target/nvptx/ptx63.c| 10 --
 gcc/testsuite/gcc.target/nvptx/ptx70.c| 10 --
 9 files changed, 95 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/mptx=3.1.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/mptx=6.0.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/mptx=6.3.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/mptx=7.0.c
 create mode 100644 gcc/testsuite/gcc.target/nvptx/mptx=_.c
 delete mode 100644 gcc/testsuite/gcc.target/nvptx/ptx31.c
 delete mode 100644 gcc/testsuite/gcc.target/nvptx/ptx60.c
 delete mode 100644 gcc/testsuite/gcc.target/nvptx/ptx63.c
 delete mode 100644 gcc/testsuite/gcc.target/nvptx/ptx70.c

diff --git a/gcc/testsuite/gcc.target/nvptx/mptx=3.1.c b/gcc/testsuite/gcc.target/nvptx/mptx=3.1.c
new file mode 100644
index ..30e244761ac7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/mptx=3.1.c
@@ -0,0 +1,19 @@
+/* { dg-do assemble } */
+/* { dg-options {-march=sm_30 -mptx=3.1} } */
+/* { dg-additional-options -save-temps } */
+/* { dg-final { scan-assembler-times {(?n)^	\.version	3\.1$} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)^	\.target	sm_30$} 1 } } */
+
+#if __PTX_ISA_VERSION_MAJOR__ != 3
+#error wrong value for __PTX_ISA_VERSION_MAJOR__
+#endif
+
+#if __PTX_ISA_VERSION_MINOR__ != 1
+#error wrong value for __PTX_ISA_VERSION_MINOR__
+#endif
+
+#if __PTX_SM__ != 300
+#error wrong value for __PTX_SM__
+#endif
+
+int dummy;
diff --git a/gcc/testsuite/gcc.target/nvptx/mptx=6.0.c b/gcc/testsuite/gcc.target/nvptx/mptx=6.0.c
new file mode 100644
index ..6877438fe653
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/mptx=6.0.c
@@ -0,0 +1,19 @@
+/* { dg-do assemble } */
+/* { dg-options {-march=sm_30 -mptx=6.0} } */
+/* { dg-additional-options -save-temps } */
+/* { dg-final { scan-assembler-times {(?n)^	\.version	6\.0$} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)^	\.target	sm_30$} 1 } } */
+
+#if __PTX_ISA_VERSION_MAJOR__ != 6
+#error wrong value for __PTX_ISA_VERSION_MAJOR__
+#endif
+
+#if __PTX_ISA_VERSION_MINOR__ != 0
+#error wrong value for __PTX_ISA_VERSION_MINOR__
+#endif
+
+#if __PTX_SM__ != 300
+#error wrong value for __PTX_SM__
+#endif
+
+int dummy;
diff --git a/gcc/testsuite/gcc.target/nvptx/mptx=6.3.c b/gcc/testsuite/gcc.target/nvptx/mptx=6.3.c
new file mode 100644
index ..e997840eabd0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/mptx=6.3.c
@@ -0,0 +1,19 @@
+/* { dg-do assemble } */
+/* { dg-options {-march=sm_30 -mptx=6.3} } */
+/* { dg-additional-options -save-temps } */
+/* { dg-final { scan-assembler-times {(?n)^	\.version	6\.3$} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)^	\.target	sm_30$} 1 } } */
+
+#if __PTX_ISA_VERSION_MAJOR__ != 6
+#error wrong value for __PTX_ISA_VERSION_MAJOR__
+#endif
+
+#if __PTX_ISA_VERSION_MINOR__ != 3
+#error wrong value for __PTX_ISA_VERSION_MINOR__
+#endif
+
+#if __PTX_SM__ != 300
+#error wrong value for __PTX_SM__
+#endif
+
+int dummy;
diff --git a/gcc/testsuite/gcc.target/nvptx/mptx=7.0.c b/gcc/testsuite/gcc.target/nvptx/mptx=7.0.c
new file mode 100644
index ..c14c03e21866
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/mptx=7.0.c
@@ -0,0 +1,19 @@
+/* { dg-do assemble } */
+/

[patch,avr] Disable CRC lookup tables

2024-12-06 Thread Georg-Johann Lay

This patch disables CRC lookup tables which consume quite some RAM.

Ok for trunk?

Johann

--

AVR: Disable generation of CRC lookup tables.

With -foptimize-crc, large lookup tables may be generated which
are placed in .rodata (RAM).  This patch disables such tables.

gcc/
* common/config/avr/avr-common.cc
(avr_option_optimization_table): Default to -fno-optimize-crc.

diff --git a/gcc/common/config/avr/avr-common.cc b/gcc/common/config/avr/avr-common.cc
index 9059e7d2b48..7a05e19a8be 100644
--- a/gcc/common/config/avr/avr-common.cc
+++ b/gcc/common/config/avr/avr-common.cc
@@ -32,6 +32,8 @@ static const struct default_options avr_option_optimization_table[] =
 // The only effect of -fcaller-saves might be that it triggers
 // a frame without need when it tries to be smart around calls.
 { OPT_LEVELS_ALL, OPT_fcaller_saves, NULL, 0 },
+// Avoid large lookup tables in RAM from -foptimize-crc.
+{ OPT_LEVELS_ALL, OPT_foptimize_crc, NULL, 0 },
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_mgas_isr_prologues, NULL, 1 },
 { OPT_LEVELS_1_PLUS, OPT_mmain_is_OS_task, NULL, 1 },
 { OPT_LEVELS_1_PLUS, OPT_mfuse_add_, NULL, 1 },


Re: [PATCH v3] arm: [MVE intrinsics] Fix support for predicate constants [PR target/114801]

2024-12-06 Thread Richard Earnshaw (lists)
On 04/12/2024 20:56, Christophe Lyon wrote:
> On Wed, 4 Dec 2024 at 12:39, Richard Earnshaw (lists)
>  wrote:
>>
>> On 25/11/2024 20:08, Christophe Lyon wrote:
>>> In this PR, we have to handle a case where MVE predicates are supplied
>>> as a const_int, where individual predicates have illegal boolean
>>> values (such as 0xc for a 4-bit boolean predicate).  To avoid the ICE,
>>> fix the constant (any non-zero value is converted to all 1s) and emit
>>> a warning.
>>>
>>> On MVE, V8BI and V4BI multi-bit masks are interpreted byte-by-byte at
>>> instruction level, but end-users should describe lanes rather than
>>> bytes (so all bytes of a true-predicated lane should be '1'), see
>>> https://developer.arm.com/documentation/101028/0012/14--M-profile-Vector-Extension--MVE--intrinsics.
>>>
>>> Since gen_lowpart can ICE on a subreg, we force predicates in a subreg
>>> into a reg, after removing subreg of the same size as the target
>>> (HImode) which would be made redundant by gen_lowpart and confuse the
>>> DLSTP optimization.
>>>
>>> 2024-11-20  Christophe Lyon  
>>>   Jakub Jelinek  
>>>
>>>   PR target/114801
>>>   gcc/
>>>   * config/arm/arm-mve-builtins.cc
>>>   (function_expander::add_input_operand): Handle CONST_INT
>>>   predicates.
>>>
>>>   gcc/testsuite/
>>>   * gcc.target/arm/mve/pr108443.c: Update predicate constant.
>>>   * gcc.target/arm/mve/pr114801.c: New test.
>>> ---
>>>  gcc/config/arm/arm-mve-builtins.cc  | 37 ++-
>>>  gcc/testsuite/gcc.target/arm/mve/pr108443.c |  4 +--
>>>  gcc/testsuite/gcc.target/arm/mve/pr114801.c | 39 +
>>>  3 files changed, 77 insertions(+), 3 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/pr114801.c
>>>
>>> diff --git a/gcc/config/arm/arm-mve-builtins.cc b/gcc/config/arm/arm-mve-builtins.cc
>>> index 255aed25600..5ff32ce06b7 100644
>>> --- a/gcc/config/arm/arm-mve-builtins.cc
>>> +++ b/gcc/config/arm/arm-mve-builtins.cc
>>> @@ -2352,7 +2352,42 @@ function_expander::add_input_operand (insn_code icode, rtx x)
>>>mode = GET_MODE (x);
>>>  }
>>>else if (VALID_MVE_PRED_MODE (mode))
>>> -x = gen_lowpart (mode, x);
>>> +{
>>> +  if (CONST_INT_P (x) && (mode == V8BImode || mode == V4BImode))
>>> + {
>>> +   /* In V8BI or V4BI each element has 2 or 4 bits, if those bits aren't
>>> +  all the same, gen_lowpart might ICE.  Canonicalize all the 2 or 4
>>> +  bits to all ones if any of them is non-zero.  V8BI and V4BI
>>> +  multi-bit masks are interpreted byte-by-byte at instruction level,
>>> +  but such constants should describe lanes, rather than bytes.  See
>>> +  https://developer.arm.com/documentation/101028/0012/14--M-profile-Vector-Extension--MVE--intrinsics.  */
>>
>> Apart from being an overly long line, deep links like this are generally not 
>> very stable.  I suggest we just say something like "See the section on MVE 
>> intrinsics in the Arm ACLE specification".
> 
> Right, I was wondering what was the best practice, I think I've seen
> such links recently, not sure where.
> I'll update the comment, and the commit message.
> 
>>
>>> +   unsigned HOST_WIDE_INT xi = UINTVAL (x);
>>> +   xi |= ((xi & 0x) << 1) | ((xi & 0x) >> 1);
>>> +   if (mode == V4BImode)
>>> + xi |= ((xi & 0x) << 2) | ((xi & 0x) >> 2);
>>> +   if (xi != UINTVAL (x))
>>> + inform (location, "constant predicate argument %d (%wx) does"
>>> + " not map to %d lane numbers, converted to %wx",
>>> + opno, UINTVAL (x) & 0x, mode == V8BImode ? 8 : 4,
>>> + xi & 0x);
>>
>> I think this should be a warning (so that werror can work with it).  
>> Otherwise such messages can't be faulted.
> OK, I will change this.
> 
>>
>>> +
>>> +   x = gen_int_mode (xi, HImode);
>>> + }
>>> +  else if (SUBREG_P (x))
>>> + {
>>> +   /* Already of the right size, drop the subreg which will be made
>>> +  redundant by gen_lowpart below.  */
>>> +   if (GET_MODE_SIZE (GET_MODE (x)) == GET_MODE_SIZE (HImode)
>>> +   || SUBREG_BYTE (x) == 0)
>>> + x = SUBREG_REG (x);
>>> +
>>> +   /* gen_lowpart on a SUBREG can ICE.  */
>>> +   if (gen_lowpart_common (mode, x) == 0)
>>> + x = force_reg (GET_MODE (x), x);
>>> + }
>>> +
>>> +  x = gen_lowpart (mode, x);
>>
>> I wonder if this is overly complex.  Wouldn't it be better to just write 
>> here:
>>
>>   else if (!REG_P (x))
>> x = force_reg (GET_MODE (x), x);
>>
>> and then let the optimizers clean things up?
>>
> The !REG_P(x) condition is not right, because force_reg crashes if x
> == (const_int 1) for instance (we have mode=VOID in
> int_mode_for_mode).
> 
> The first 'if', which looks at the mode sizes, is there to avoid regressions
> in the DLSTP transform, despite recent improvements there.
> Without this,
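
The canonicalization step in the quoted patch can be sketched outside of
GCC.  The hexadecimal masks did not survive in the text above, so the
values used here (0x5555, 0xaaaa, 0x3333, 0xcccc, 0xffff) are assumptions
chosen to match the description "any non-zero value is converted to all
1s"; 'canon_pred' is a made-up name, and shell arithmetic is used only to
keep the sketch runnable on its own.

```sh
#!/bin/sh
# Illustration only; the masks and the name 'canon_pred' are assumptions.

# canon_pred VALUE BITS_PER_LANE
#   VALUE:         16-bit predicate constant
#   BITS_PER_LANE: 2 (as in V8BI) or 4 (as in V4BI)
canon_pred() {
    xi=$(( $1 & 0xffff ))
    # Spread any set bit across its 2-bit pair.
    xi=$(( xi | ((xi & 0x5555) << 1) | ((xi & 0xaaaa) >> 1) ))
    if [ "$2" -eq 4 ]; then
        # Spread any set pair across its 4-bit group.
        xi=$(( xi | ((xi & 0x3333) << 2) | ((xi & 0xcccc) >> 2) ))
    fi
    printf '0x%04x\n' "$xi"
}

canon_pred 0xc 4
canon_pred 0x1 2
```

With these assumed masks, the 0xc example from the commit message becomes
0x000f for a 4-bit lane, and 0x1 becomes 0x0003 for a 2-bit lane; a value
whose lanes are already all-zeros or all-ones is left unchanged.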

[PUSHED] nvptx: Support '-march=sm_37'

2024-12-06 Thread Thomas Schwinge
gcc/
* config/nvptx/nvptx-sm.def: Add '37'.
* config/nvptx/nvptx-gen.h: Regenerate.
* config/nvptx/nvptx-gen.opt: Likewise.
* config/nvptx/nvptx.cc (first_ptx_version_supporting_sm): Adjust.
* config/nvptx/nvptx.opt (-march-map=sm_37, -march-map=sm_50):
Likewise.
* config.gcc: Likewise.
* doc/invoke.texi (Nvidia PTX Options): Document '-march=sm_37'.
* config/nvptx/gen-multilib-matches-tests: Extend.
gcc/testsuite/
* gcc.target/nvptx/march-map=sm_37.c: Adjust.
* gcc.target/nvptx/march-map=sm_50.c: Likewise.
* gcc.target/nvptx/march-map=sm_52.c: Likewise.
* gcc.target/nvptx/march=sm_37.c: New.
libgomp/
* testsuite/libgomp.c/declare-variant-3-sm37.c: New.
* testsuite/libgomp.c/declare-variant-3.h: Adjust.
---
 gcc/config.gcc|  2 +-
 gcc/config/nvptx/gen-multilib-matches-tests   | 50 ---
 gcc/config/nvptx/nvptx-gen.h  |  1 +
 gcc/config/nvptx/nvptx-gen.opt|  3 ++
 gcc/config/nvptx/nvptx-sm.def |  1 +
 gcc/config/nvptx/nvptx.cc |  2 +
 gcc/config/nvptx/nvptx.opt|  6 +--
 gcc/doc/invoke.texi   |  2 +-
 .../gcc.target/nvptx/march-map=sm_37.c|  4 +-
 .../gcc.target/nvptx/march-map=sm_50.c|  4 +-
 .../gcc.target/nvptx/march-map=sm_52.c|  4 +-
 gcc/testsuite/gcc.target/nvptx/march=sm_37.c  | 19 +++
 .../libgomp.c/declare-variant-3-sm37.c|  8 +++
 .../testsuite/libgomp.c/declare-variant-3.h   |  8 +++
 14 files changed, 97 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march=sm_37.c
 create mode 100644 libgomp/testsuite/libgomp.c/declare-variant-3-sm37.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 6381a5793194..b68ede921ec9 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -5642,7 +5642,7 @@ case "${target}" in
for nvptx_multilib in $nvptx_multilibs; do
case $nvptx_multilib in
#TODO 'sm_[...]' list per 'nvptx-sm.def'.
-   sm_30 | sm_35 \
+   sm_30 | sm_35 | sm_37 \
| sm_53 \
| sm_70 | sm_75 \
| sm_80 )
diff --git a/gcc/config/nvptx/gen-multilib-matches-tests b/gcc/config/nvptx/gen-multilib-matches-tests
index b93369149465..87045040b11a 100644
--- a/gcc/config/nvptx/gen-multilib-matches-tests
+++ b/gcc/config/nvptx/gen-multilib-matches-tests
@@ -15,6 +15,7 @@ SMOID sm_30
 SMOIL sm_30
 AEMM .=misa?sm_30
 AEMM .=misa?sm_35
+AEMM .=misa?sm_37
 AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
@@ -26,14 +27,15 @@ SMOID sm_30
 SMOIL sm_30 sm_80
 AEMM .=misa?sm_30
 AEMM .=misa?sm_35
+AEMM .=misa?sm_37
 AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
 CMMC
 
-BEGIN '--with-arch=sm_30', '--with-multilib-list=sm_30,sm_35,sm_53,sm_70,sm_75,sm_80'
+BEGIN '--with-arch=sm_30', '--with-multilib-list=sm_30,sm_35,sm_37,sm_53,sm_70,sm_75,sm_80'
 SMOID sm_30
-SMOIL sm_30 sm_35 sm_53 sm_70 sm_75 sm_80
+SMOIL sm_30 sm_35 sm_37 sm_53 sm_70 sm_75 sm_80
 AEMM .=misa?sm_30
 CMMC
 
@@ -43,6 +45,7 @@ SMOID sm_35
 SMOIL sm_35
 AEMM .=misa?sm_30
 AEMM .=misa?sm_35
+AEMM .=misa?sm_37
 AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
@@ -53,6 +56,19 @@ BEGIN '--with-arch=sm_35', '--with-multilib-list=sm_35,sm_30'
 SMOID sm_35
 SMOIL sm_35 sm_30
 AEMM .=misa?sm_35
+AEMM .=misa?sm_37
+AEMM .=misa?sm_53
+AEMM .=misa?sm_70
+AEMM .=misa?sm_75
+AEMM .=misa?sm_80
+CMMC
+
+
+BEGIN '--with-arch=sm_37', '--with-multilib-list=sm_37,sm_30'
+SMOID sm_37
+SMOIL sm_37 sm_30
+AEMM misa?sm_30=misa?sm_35
+AEMM .=misa?sm_37
 AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
@@ -64,15 +80,27 @@ BEGIN '--with-arch=sm_53', '--with-multilib-list=sm_53,sm_30'
 SMOID sm_53
 SMOIL sm_53 sm_30
 AEMM misa?sm_30=misa?sm_35
+AEMM misa?sm_30=misa?sm_37
+AEMM .=misa?sm_53
+AEMM .=misa?sm_70
+AEMM .=misa?sm_75
+AEMM .=misa?sm_80
+CMMC
+
+BEGIN '--with-arch=sm_53', '--with-multilib-list=sm_53,sm_37'
+SMOID sm_53
+SMOIL sm_53 sm_37
+AEMM misa?sm_37=misa?sm_30
+AEMM misa?sm_37=misa?sm_35
 AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
 AEMM .=misa?sm_80
 CMMC
 
-BEGIN '--with-arch=sm_53', '--with-multilib-list=sm_53,sm_30,sm_35,sm_70,sm_75,sm_80'
+BEGIN '--with-arch=sm_53', '--with-multilib-list=sm_53,sm_30,sm_35,sm_37,sm_70,sm_75,sm_80'
 SMOID sm_53
-SMOIL sm_53 sm_30 sm_35 sm_70 sm_75 sm_80
+SMOIL sm_53 sm_30 sm_35 sm_37 sm_70 sm_75 sm_80
 AEMM .=misa?sm_53
 CMMC
 
@@ -82,6 +110,7 @@ SMOID sm_70
 SMOIL sm_70
 AEMM .=misa?sm_30
 AEMM .=misa?sm_35
+AEMM .=misa?sm_37
 AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
@@ -92,6 +121,7 @@ BEGIN '--with-arch=sm_70', '--with-multilib-list=sm_70,sm_30'
 SMOID sm_70
 SMOIL sm_70 sm_30
 AEMM misa?sm_30=misa?sm_35
+AEMM misa?sm_30=misa?sm_37
 A

[PUSHED] nvptx: Support '-mptx=4.1'

2024-12-06 Thread Thomas Schwinge
gcc/
* config/nvptx/nvptx-opts.h (enum ptx_version): Add
'PTX_VERSION_4_1'.
* config/nvptx/nvptx.cc (ptx_version_to_string)
(ptx_version_to_number): Adjust.
* config/nvptx/nvptx.h (TARGET_PTX_4_1): New.
* config/nvptx/nvptx.opt (Enum(ptx_version)): Add 'EnumValue'
'4.1' for 'PTX_VERSION_4_1'.
* doc/invoke.texi (Nvidia PTX Options): Document '-mptx=4.1'.
gcc/testsuite/
* gcc.target/nvptx/mptx=4.1.c: New.
---
 gcc/config/nvptx/nvptx-opts.h             |  1 +
 gcc/config/nvptx/nvptx.cc                 |  4
 gcc/config/nvptx/nvptx.h                  |  1 +
 gcc/config/nvptx/nvptx.opt                |  3 +++
 gcc/doc/invoke.texi                       |  2 +-
 gcc/testsuite/gcc.target/nvptx/mptx=4.1.c | 19 +++
 6 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/mptx=4.1.c

diff --git a/gcc/config/nvptx/nvptx-opts.h b/gcc/config/nvptx/nvptx-opts.h
index d0b47f0aeeff..1277f2130896 100644
--- a/gcc/config/nvptx/nvptx-opts.h
+++ b/gcc/config/nvptx/nvptx-opts.h
@@ -38,6 +38,7 @@ enum ptx_version
   PTX_VERSION_unset,
   PTX_VERSION_default = PTX_VERSION_unset,
   PTX_VERSION_3_1,
+  PTX_VERSION_4_1,
   PTX_VERSION_4_2,
   PTX_VERSION_6_0,
   PTX_VERSION_6_3,
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index fb5a45a18e3c..85c6f5d75912 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -252,6 +252,8 @@ ptx_version_to_string (enum ptx_version v)
 {
 case PTX_VERSION_3_1:
   return "3.1";
+case PTX_VERSION_4_1:
+  return "4.1";
 case PTX_VERSION_4_2:
   return "4.2";
 case PTX_VERSION_6_0:
@@ -272,6 +274,8 @@ ptx_version_to_number (enum ptx_version v, bool major_p)
 {
 case PTX_VERSION_3_1:
   return major_p ? 3 : 1;
+case PTX_VERSION_4_1:
+  return major_p ? 4 : 1;
 case PTX_VERSION_4_2:
   return major_p ? 4 : 2;
 case PTX_VERSION_6_0:
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index d9a5e541257d..7b7e172f7878 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -90,6 +90,7 @@
 
 /* There are no 'TARGET_PTX_3_1' and smaller conditionals: our baseline is
PTX ISA Version 3.1.  */
+#define TARGET_PTX_4_1 (ptx_version_option >= PTX_VERSION_4_1)
 #define TARGET_PTX_4_2 (ptx_version_option >= PTX_VERSION_4_2)
 #define TARGET_PTX_6_0 (ptx_version_option >= PTX_VERSION_6_0)
 #define TARGET_PTX_6_3 (ptx_version_option >= PTX_VERSION_6_3)
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 408c88354446..12f96d0885b6 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -127,6 +127,9 @@ Known PTX ISA versions (for use with the -mptx= option):
 EnumValue
 Enum(ptx_version) String(3.1) Value(PTX_VERSION_3_1)
 
+EnumValue
+Enum(ptx_version) String(4.1) Value(PTX_VERSION_4_1)
+
 EnumValue
 Enum(ptx_version) String(4.2) Value(PTX_VERSION_4_2)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a2234725e671..6ec68347967d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -30073,7 +30073,7 @@ capable.  For instance, for @option{-march-map=sm_50} select
 Generate code for the specified PTX ISA version.
 Valid version strings are
 @samp{3.1},
-@samp{4.2},
+@samp{4.1}, @samp{4.2},
 @samp{6.0}, @samp{6.3},
 and @samp{7.0}.
 The default PTX ISA version is 6.0, unless a higher
diff --git a/gcc/testsuite/gcc.target/nvptx/mptx=4.1.c b/gcc/testsuite/gcc.target/nvptx/mptx=4.1.c
new file mode 100644
index 000000000000..57d050c990ec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/mptx=4.1.c
@@ -0,0 +1,19 @@
+/* { dg-do assemble } */
+/* { dg-options {-march=sm_30 -mptx=4.1} } */
+/* { dg-additional-options -save-temps } */
+/* { dg-final { scan-assembler-times {(?n)^\.version   4\.1$} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)^\.targetsm_30$} 1 } } */
+
+#if __PTX_ISA_VERSION_MAJOR__ != 4
+#error wrong value for __PTX_ISA_VERSION_MAJOR__
+#endif
+
+#if __PTX_ISA_VERSION_MINOR__ != 1
+#error wrong value for __PTX_ISA_VERSION_MINOR__
+#endif
+
+#if __PTX_SM__ != 300
+#error wrong value for __PTX_SM__
+#endif
+
+int dummy;
-- 
2.34.1
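
For orientation, here is a minimal standalone sketch of the pattern this
patch extends.  It is not the GCC source: the enum copies only the version
enumerators visible in the diff (omitting 'PTX_VERSION_unset' and
'PTX_VERSION_default'), and 'ptx_version_at_least' is a hypothetical helper
standing in for the 'TARGET_PTX_*' macros, which compare against the global
'ptx_version_option'.  What it is meant to show is that a new version has to
be inserted in ascending order, because the feature checks rely on the
ordering of the enumerators.

```c
#include <stdbool.h>

/* Reduced copy of the enumerators shown in the diff.  The order matters:
   feature checks compare enumerators with '>='.  */
enum ptx_version
{
  PTX_VERSION_3_1,
  PTX_VERSION_4_1,	/* Newly added, between 3.1 and 4.2.  */
  PTX_VERSION_4_2,
  PTX_VERSION_6_0,
  PTX_VERSION_6_3,
  PTX_VERSION_7_0
};

/* Return the major (MAJOR_P true) or minor (MAJOR_P false) component
   of V, or -1 for a value outside the enumeration.  */
int
ptx_version_to_number (enum ptx_version v, bool major_p)
{
  switch (v)
    {
    case PTX_VERSION_3_1:
      return major_p ? 3 : 1;
    case PTX_VERSION_4_1:
      return major_p ? 4 : 1;
    case PTX_VERSION_4_2:
      return major_p ? 4 : 2;
    case PTX_VERSION_6_0:
      return major_p ? 6 : 0;
    case PTX_VERSION_6_3:
      return major_p ? 6 : 3;
    case PTX_VERSION_7_0:
      return major_p ? 7 : 0;
    }
  return -1;
}

/* Hypothetical stand-in for the 'TARGET_PTX_4_1'-style macros: true if
   the SELECTED version is at least the WANTED one.  */
bool
ptx_version_at_least (enum ptx_version selected, enum ptx_version wanted)
{
  return selected >= wanted;
}
```

With this ordering, a selected version of 4.2 also satisfies a 4.1 check,
while the 3.1 baseline does not.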



[PUSHED] nvptx: Support '-mptx=7.8'

2024-12-06 Thread Thomas Schwinge
gcc/
* config/nvptx/nvptx-opts.h (enum ptx_version): Add
'PTX_VERSION_7_8'.
* config/nvptx/nvptx.cc (ptx_version_to_string)
(ptx_version_to_number): Adjust.
* config/nvptx/nvptx.h (TARGET_PTX_7_8): New.
* config/nvptx/nvptx.opt (Enum(ptx_version)): Add 'EnumValue'
'7.8' for 'PTX_VERSION_7_8'.
* doc/invoke.texi (Nvidia PTX Options): Document '-mptx=7.8'.
gcc/testsuite/
* gcc.target/nvptx/mptx=7.8.c: New.
---
 gcc/config/nvptx/nvptx-opts.h             |  3 ++-
 gcc/config/nvptx/nvptx.cc                 |  4
 gcc/config/nvptx/nvptx.h                  |  1 +
 gcc/config/nvptx/nvptx.opt                |  3 +++
 gcc/doc/invoke.texi                       |  2 +-
 gcc/testsuite/gcc.target/nvptx/mptx=7.8.c | 19 +++
 6 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/mptx=7.8.c

diff --git a/gcc/config/nvptx/nvptx-opts.h b/gcc/config/nvptx/nvptx-opts.h
index 1277f2130896..7b55086081ab 100644
--- a/gcc/config/nvptx/nvptx-opts.h
+++ b/gcc/config/nvptx/nvptx-opts.h
@@ -42,7 +42,8 @@ enum ptx_version
   PTX_VERSION_4_2,
   PTX_VERSION_6_0,
   PTX_VERSION_6_3,
-  PTX_VERSION_7_0
+  PTX_VERSION_7_0,
+  PTX_VERSION_7_8
 };
 
 #endif
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 93402bf516b9..65d339b36ede 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -265,6 +265,8 @@ ptx_version_to_string (enum ptx_version v)
   return "6.3";
 case PTX_VERSION_7_0:
   return "7.0";
+case PTX_VERSION_7_8:
+  return "7.8";
 default:
   gcc_unreachable ();
 }
@@ -287,6 +289,8 @@ ptx_version_to_number (enum ptx_version v, bool major_p)
   return major_p ? 6 : 3;
 case PTX_VERSION_7_0:
   return major_p ? 7 : 0;
+case PTX_VERSION_7_8:
+  return major_p ? 7 : 8;
 default:
   gcc_unreachable ();
 }
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index 7b7e172f7878..5d914c45e090 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -95,6 +95,7 @@
 #define TARGET_PTX_6_0 (ptx_version_option >= PTX_VERSION_6_0)
 #define TARGET_PTX_6_3 (ptx_version_option >= PTX_VERSION_6_3)
 #define TARGET_PTX_7_0 (ptx_version_option >= PTX_VERSION_7_0)
+#define TARGET_PTX_7_8 (ptx_version_option >= PTX_VERSION_7_8)
 
 /* Registers.  Since ptx is a virtual target, we just define a few
hard registers for special purposes and leave pseudos unallocated.
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 4d14eda76991..842cbbbedeee 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -142,6 +142,9 @@ Enum(ptx_version) String(6.3) Value(PTX_VERSION_6_3)
 EnumValue
 Enum(ptx_version) String(7.0) Value(PTX_VERSION_7_0)
 
+EnumValue
+Enum(ptx_version) String(7.8) Value(PTX_VERSION_7_8)
+
 EnumValue
 Enum(ptx_version) String(_) Value(PTX_VERSION_default)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 92728a7f6ce3..8271e6474817 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -30075,7 +30075,7 @@ Valid version strings are
 @samp{3.1},
 @samp{4.1}, @samp{4.2},
 @samp{6.0}, @samp{6.3},
-and @samp{7.0}.
+@samp{7.0}, and @samp{7.8}.
 The default PTX ISA version is 6.0, unless a higher
 version is required for specified PTX ISA target architecture via
 option @option{-march=}.
diff --git a/gcc/testsuite/gcc.target/nvptx/mptx=7.8.c b/gcc/testsuite/gcc.target/nvptx/mptx=7.8.c
new file mode 100644
index 000000000000..d80bdbaa83a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/mptx=7.8.c
@@ -0,0 +1,19 @@
+/* { dg-do assemble } */
+/* { dg-options {-march=sm_30 -mptx=7.8} } */
+/* { dg-additional-options -save-temps } */
+/* { dg-final { scan-assembler-times {(?n)^\.version   7\.8$} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)^\.targetsm_30$} 1 } } */
+
+#if __PTX_ISA_VERSION_MAJOR__ != 7
+#error wrong value for __PTX_ISA_VERSION_MAJOR__
+#endif
+
+#if __PTX_ISA_VERSION_MINOR__ != 8
+#error wrong value for __PTX_ISA_VERSION_MINOR__
+#endif
+
+#if __PTX_SM__ != 300
+#error wrong value for __PTX_SM__
+#endif
+
+int dummy;
-- 
2.34.1



[PUSHED] nvptx: Support '-march=sm_89'

2024-12-06 Thread Thomas Schwinge
gcc/
* config/nvptx/nvptx-sm.def: Add '89'.
* config/nvptx/nvptx-gen.h: Regenerate.
* config/nvptx/nvptx-gen.opt: Likewise.
* config/nvptx/nvptx.cc (first_ptx_version_supporting_sm): Adjust.
* config/nvptx/nvptx.opt (-march-map=sm_89, -march-map=sm_90)
(march-map=sm_90a): Likewise.
* config.gcc: Likewise.
* doc/invoke.texi (Nvidia PTX Options): Document '-march=sm_89'.
* config/nvptx/gen-multilib-matches-tests: Extend.
gcc/testsuite/
* gcc.target/nvptx/march-map=sm_89.c: Adjust.
* gcc.target/nvptx/march-map=sm_90.c: Likewise.
* gcc.target/nvptx/march-map=sm_90a.c: Likewise.
* gcc.target/nvptx/march=sm_89.c: New.
libgomp/
* testsuite/libgomp.c/declare-variant-3-sm89.c: New.
* testsuite/libgomp.c/declare-variant-3.h: Adjust.
---
 gcc/config.gcc                                |  2 +-
 gcc/config/nvptx/gen-multilib-matches-tests   | 65 ---
 gcc/config/nvptx/nvptx-gen.h                  |  1 +
 gcc/config/nvptx/nvptx-gen.opt                |  3 +
 gcc/config/nvptx/nvptx-sm.def                 |  3 +-
 gcc/config/nvptx/nvptx.cc                     |  2 +
 gcc/config/nvptx/nvptx.opt                    |  6 +-
 gcc/doc/invoke.texi                           |  2 +-
 .../gcc.target/nvptx/march-map=sm_89.c        |  8 +--
 .../gcc.target/nvptx/march-map=sm_90.c        |  8 +--
 .../gcc.target/nvptx/march-map=sm_90a.c       |  8 +--
 gcc/testsuite/gcc.target/nvptx/march=sm_89.c  | 19 ++
 .../libgomp.c/declare-variant-3-sm89.c        |  8 +++
 .../testsuite/libgomp.c/declare-variant-3.h   |  8 +++
 14 files changed, 116 insertions(+), 27 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march=sm_89.c
 create mode 100644 libgomp/testsuite/libgomp.c/declare-variant-3-sm89.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 21f3dcd9d009..a2d21b5f3436 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -5645,7 +5645,7 @@ case "${target}" in
sm_30 | sm_35 | sm_37 \
| sm_52 | sm_53 \
| sm_70 | sm_75 \
-   | sm_80 )
+   | sm_80 | sm_89 )
TM_MULTILIB_CONFIG="$TM_MULTILIB_CONFIG $nvptx_multilib"
;;
$with_arch )
diff --git a/gcc/config/nvptx/gen-multilib-matches-tests b/gcc/config/nvptx/gen-multilib-matches-tests
index 13b1c5b9d018..a07f19adbdb1 100644
--- a/gcc/config/nvptx/gen-multilib-matches-tests
+++ b/gcc/config/nvptx/gen-multilib-matches-tests
@@ -21,11 +21,12 @@ AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
 AEMM .=misa?sm_80
+AEMM .=misa?sm_89
 CMMC
 
-BEGIN '--with-arch=sm_30', '--with-multilib-list=sm_30,sm_80'
+BEGIN '--with-arch=sm_30', '--with-multilib-list=sm_30,sm_89'
 SMOID sm_30
-SMOIL sm_30 sm_80
+SMOIL sm_30 sm_89
 AEMM .=misa?sm_30
 AEMM .=misa?sm_35
 AEMM .=misa?sm_37
@@ -33,11 +34,12 @@ AEMM .=misa?sm_52
 AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
+AEMM .=misa?sm_80
 CMMC
 
-BEGIN '--with-arch=sm_30', '--with-multilib-list=sm_30,sm_35,sm_37,sm_52,sm_53,sm_70,sm_75,sm_80'
+BEGIN '--with-arch=sm_30', '--with-multilib-list=sm_30,sm_35,sm_37,sm_52,sm_53,sm_70,sm_75,sm_80,sm_89'
 SMOID sm_30
-SMOIL sm_30 sm_35 sm_37 sm_52 sm_53 sm_70 sm_75 sm_80
+SMOIL sm_30 sm_35 sm_37 sm_52 sm_53 sm_70 sm_75 sm_80 sm_89
 AEMM .=misa?sm_30
 CMMC
 
@@ -53,6 +55,7 @@ AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
 AEMM .=misa?sm_80
+AEMM .=misa?sm_89
 CMMC
 
 BEGIN '--with-arch=sm_35', '--with-multilib-list=sm_35,sm_30'
@@ -65,6 +68,7 @@ AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
 AEMM .=misa?sm_80
+AEMM .=misa?sm_89
 CMMC
 
 
@@ -78,6 +82,7 @@ AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
 AEMM .=misa?sm_80
+AEMM .=misa?sm_89
 CMMC
 
 
@@ -90,6 +95,7 @@ AEMM .=misa?sm_52
 AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM misa?sm_75=misa?sm_80
+AEMM misa?sm_75=misa?sm_89
 CMMC
 
 
@@ -103,6 +109,7 @@ AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
 AEMM .=misa?sm_80
+AEMM .=misa?sm_89
 CMMC
 
 BEGIN '--with-arch=sm_53', '--with-multilib-list=sm_53,sm_37'
@@ -115,11 +122,12 @@ AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
 AEMM .=misa?sm_80
+AEMM .=misa?sm_89
 CMMC
 
-BEGIN '--with-arch=sm_53', '--with-multilib-list=sm_53=sm_30,sm_35,sm_37,sm_52,sm_70,sm_75,sm_80'
+BEGIN '--with-arch=sm_53', '--with-multilib-list=sm_53=sm_30,sm_35,sm_37,sm_52,sm_70,sm_75,sm_80,sm_89'
 SMOID sm_53
-SMOIL sm_53 sm_30 sm_35 sm_37 sm_52 sm_70 sm_75 sm_80
+SMOIL sm_53 sm_30 sm_35 sm_37 sm_52 sm_70 sm_75 sm_80 sm_89
 AEMM .=misa?sm_53
 CMMC
 
@@ -135,6 +143,7 @@ AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
 AEMM .=misa?sm_80
+AEMM .=misa?sm_89
 CMMC
 
 BEGIN '--with-arch=sm_70', '--with-multilib-list=sm_70,sm_30'
@@ -147,6 +156,7 @@ AEMM misa?sm_30=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
 

Re: [PATCH] avoid-store-forwarding: Reject changes when an instruction may throw [PR117816]

2024-12-06 Thread Philipp Tomsich
Applied to master. Thanks!
--Philipp.


On Fri, 6 Dec 2024 at 06:03, Jeff Law  wrote:
>
>
>
> On 12/5/24 6:18 AM, Konstantinos Eleftheriou wrote:
> > From: kelefth 
> >
> > Avoid-store-forwarding doesn't handle the case where an instruction in the
> > store-load sequence contains a REG_EH_REGION note, leading to the insertion
> > of instructions after it, while it should be the last instruction in the
> > basic block. This causes an ICE when compiling using
> > `-O -fnon-call-exceptions -favoid-store-forwarding -fno-forward-propagate -finstrument-functions`.
> >
> > This patch rejects the transformation when there are instructions in the
> > sequence that may throw an exception.
> >
> >   PR 117816
> >
> > gcc/ChangeLog:
> >
> >   * avoid-store-forwarding.cc
> >   (store_forwarding_analyzer::avoid_store_forwarding): Reject the
> >   transformation when having instructions that may throw exceptions
> >   in the sequence.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/pr117816.c: New test.
> I didn't see any note about testing, so I went ahead and
> bootstrap/regression tested this on x86, which passed with no issues.
>
> OK for the trunk.
>
> jeff
>
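
As a rough illustration of the approach described in the quoted patch
(reject the whole transformation as soon as any instruction in the
store-load sequence may throw), here is a toy model in C.  It does not use
GCC's RTL API: 'struct toy_insn', its 'may_throw' flag and
'sequence_is_transformable' are hypothetical names invented for this sketch,
whereas the real pass inspects RTL insns.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an instruction: only the
   property relevant to the check is modelled.  */
struct toy_insn
{
  const char *name;
  bool may_throw;	/* Models an insn carrying a REG_EH_REGION note.  */
};

/* Return true if none of the N instructions in SEQ may throw, i.e. if
   it is safe to insert new instructions after any of them.  An empty
   sequence is trivially fine.  */
bool
sequence_is_transformable (const struct toy_insn *seq, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (seq[i].may_throw)
      return false;
  return true;
}
```

A throwing instruction has to stay last in its basic block, so in this model
a single 'may_throw' entry is enough to give up on the sequence.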


Re: [PATCH] AIX Build failure with default -std=gnu23.

2024-12-06 Thread swamy sangamesh
Dear Community,

Please let me know if the attached patch is fine.

Thanks,
Sangamesh

On Tue, Dec 3, 2024 at 11:19 PM swamy sangamesh 
wrote:

> Hi Eric,
>
> Thanks for the review.
>
> I too think removing the define is the better approach; it seems these
> won't be needed.
> From the comment, it looks like these were added long ago, and the
> conflicting declarations were there, unnoticed, until the C23 standard
> uncovered them.
>
> If removing the define is fine, then I can send a final patch.
>
> Thanks,
> Sangamesh
>
>
>
>
>
> On Tue, Dec 3, 2024 at 9:11 AM Eric Gallager  wrote:
>
>> On Mon, Dec 2, 2024 at 1:01 PM swamy sangamesh
>>  wrote:
>> >
>> > Dear Community,
>> >
>> > Please let me know your comment.
>> > Or is it more appropriate to guard the define like this?
>> >
>>
>> I personally think it's better to just remove the define, but if
>> you're going to leave it in and guard it with a macro instead, I'd use
>> something a bit more specific than just "_AIX".
>>
>> > --- a/libiberty/getopt.c
>> > +++ b/libiberty/getopt.c
>> > @@ -25,9 +25,11 @@
>> >  ^L
>> >  /* This tells Alpha OSF/1 not to define a getopt prototype in <stdio.h>.
>> > Ditto for AIX 3.2 and <stdlib.h>.  */
>> > +#ifndef _AIX
>> >  #ifndef _NO_PROTO
>> >  # define _NO_PROTO
>> >  #endif
>> > +#endif
>> >
>> >  #ifdef HAVE_CONFIG_H
>> >  # include <config.h>
>> >
>> >
>> > Thanks,
>> > Sangamesh
>> >
>> >
>> > On Thu, Nov 28, 2024 at 11:09 AM Sangamesh Mallayya <
>> swamy.sangam...@gmail.com> wrote:
>> >>
>> >>  libiberty/getopt.c file is defining _NO_PROTO which causes conflicting
>> >>  declarations for the functions in AIX header files like stdio.h & stdlib.h.
>> >>  These declarations are being considered as errors in C23 which wasn't
>> >>  the case with C17.
>> >>
>> >> Here is the error we get.
>> >>
>> >> /gcc_build/./prev-gcc/xgcc -B/gcc_build/./prev-gcc/
>> >> -B/home/sangam/install/GCC/powerpc-ibm-aix7.3.3.0/bin/
>> >> -B/home/sangam/install/GCC/powerpc-ibm-aix7.3.3.0/bin/
>> >> -B/home/sangam/install/GCC/powerpc-ibm-aix7.3.3.0/lib/
>> >> -isystem /home/sangam/install/GCC/powerpc-ibm-aix7.3.3.0/include
>> >> -isystem /home/sangam/install/GCC/powerpc-ibm-aix7.3.3.0/sys-include
>> >> -fno-checking -c -DHAVE_CONFIG_H -g -O2 -fno-checking -I.
>> >> -I/opt/freeware/src/packages/BUILD/gcc/libiberty/../include
>> >> -W -Wall -Wwrite-strings -Wc++-compat -Wstrict-prototypes -Wshadow=local
>> >> -pedantic -D_GNU_SOURCE
>> >> /opt/freeware/src/packages/BUILD/gcc/libiberty/getopt.c -o getopt.o
>> >>
>> >>
>> >> In file included from /opt/freeware/src/packages/BUILD/gcc/libiberty/getopt.c:45:
>> >> /gcc_build/prev-gcc/include-fixed/stdio.h:593:12: error: conflicting types for 'fgetpos64'; have 'int(FILE *, fpos64_t *)' {aka 'int(FILE *, long long int *)'}
>> >>   593 | extern int fgetpos64(FILE *, fpos64_t *);
>> >>   |^
>> >> /gcc_build/prev-gcc/include-fixed/stdio.h:298:17: note: previous declaration of 'fgetpos64' with type 'int(void)'
>> >>   298 | extern int  fgetpos();
>> >>   | ^~~
>> >> /gcc_build/prev-gcc/include-fixed/stdio.h:594:14: error: conflicting types for 'fopen64'; have 'FILE *(const char *, const char *)'
>> >>   594 | extern FILE *fopen64(const char *, const char *);
>> >>   |  ^~~
>> >>
>> >> /gcc_build/prev-gcc/include-fixed/stdio.h:259:17: note: previous declaration of 'fopen64' with type 'FILE *(void)'
>> >>   259 | extern FILE *   fopen();
>> >>   | ^
>> >> /gcc_build/prev-gcc/include-fixed/stdio.h:595:14: error: conflicting types for 'freopen64'; have 'FILE *(const char *, const char *, FILE *)'
>> >>   595 | extern FILE *freopen64(const char *, const char *, FILE *);
>> >>   |  ^
>> >> /gcc_build/prev-gcc/include-fixed/stdio.h:260:17: note: previous declaration of 'freopen64' with type 'FILE *(void)'
>> >>   260 | extern FILE *   freopen();
>> >>   | ^~~
>> >> /gcc_build/prev-gcc/include-fixed/stdio.h:597:12: error: conflicting types for 'fsetpos64'; have 'int(FILE *, const fpos64_t *)' {aka 'int(FILE *, const long long int *)'}
>> >>   597 | extern int fsetpos64(FILE *, const fpos64_t *);
>> >>   |^
>> >> /gcc_build/prev-gcc/include-fixed/stdio.h:300:17: note: previous declaration of 'fsetpos64' with type 'int(void)'
>> >>   300 | extern int  fsetpos();
>> >>   | ^~~
>> >> In file included from /opt/freeware/src/packages/BUILD/gcc/libiberty/getopt.c:216:
>> >> /gcc_build/prev-gcc/include-fixed/stdlib.h: In function 'strtold':
>> >> /gcc_build/prev-gcc/include-fixed/stdlib.h:233:30: error: too many arguments to function 'strtod'
>> >>
>> >>
>> >> Compiled with this patch on RHEL8.10 ppc64le as well.
>> >>
>> >> ---
>> >>  libiberty/getopt.c | 6 --
>> >>  1 file changed, 6 deletions(-)
>> >>
>> >> diff --git a/libiberty/getopt.c b/libiberty/getopt.c
>> >> index 2f7086cc0c8..48736

Re: [PATCH] AIX Build failure with default -std=gnu23.

2024-12-06 Thread Sam James
swamy sangamesh  writes:

> Dear Community,
>
> Please let me know if the attached patch is fine.

For such patches, I recommend CCing the maintainers of relevant
components. In this case, that's David Edelsohn, being the AIX
maintainer (done it for you here).

I can't approve it but I imagine the patch is fine given GCC dropped
support for such old AIX a long time ago.
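
Background for readers who have not hit this yet, as a minimal sketch rather
than anything taken from the AIX headers: C23 changed the meaning of a
declaration with empty parentheses, which is the form seen in the error log
('extern int fgetpos();').  The name 'legacy_answer' is made up for
illustration.

```c
/* Pre-ANSI style declaration with empty parentheses.  Up to C17 this
   says nothing about the parameters; in C23 it means exactly
   'int legacy_answer (void)'.  */
int legacy_answer ();

/* This definition is compatible with the declaration above in every
   C standard, so the file builds both before and after C23.  */
int
legacy_answer (void)
{
  return 42;
}

/* By contrast, declaring 'int other ();' and then 'int other (int);'
   is accepted up to C17, but in C23 the first declaration already
   means 'int other (void)', so the second one is diagnosed as
   "conflicting types".  That is the class of error in the log.  */
```

This is why headers that fall back to unprototyped declarations under
'_NO_PROTO' start to clash with their own prototyped declarations once the
default becomes '-std=gnu23'.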

>
> Thanks,
> Sangamesh
>
> On Tue, Dec 3, 2024 at 11:19 PM swamy sangamesh  
> wrote:
>
>  Hi Eric,
>
>  Thanks for the review.
>
>  I too think removing the define is the better approach; it seems these
>  won't be needed.
>  From the comment, it looks like these were added long ago, and the
>  conflicting declarations were there, unnoticed, until the C23 standard
>  uncovered them.
>
>  If removing the define is fine, then I can send a final patch.
>
>  Thanks,
>  Sangamesh  
>
>  On Tue, Dec 3, 2024 at 9:11 AM Eric Gallager  wrote:
>
>  On Mon, Dec 2, 2024 at 1:01 PM swamy sangamesh
>   wrote:
>  >
>  > Dear Community,
>  >
>  > Please let me know your comment.
>  > Or is it more appropriate to guard the define like this?
>  >
>
>  I personally think it's better to just remove the define, but if
>  you're going to leave it in and guard it with a macro instead, I'd use
>  something a bit more specific than just "_AIX".
>
>  > --- a/libiberty/getopt.c
>  > +++ b/libiberty/getopt.c
>  > @@ -25,9 +25,11 @@
>  >  ^L
>  >  /* This tells Alpha OSF/1 not to define a getopt prototype in <stdio.h>.
>  > Ditto for AIX 3.2 and <stdlib.h>.  */
>  > +#ifndef _AIX
>  >  #ifndef _NO_PROTO
>  >  # define _NO_PROTO
>  >  #endif
>  > +#endif
>  >
>  >  #ifdef HAVE_CONFIG_H
>  >  # include <config.h>
>  >
>  >
>  > Thanks,
>  > Sangamesh
>  >
>  >
>  > On Thu, Nov 28, 2024 at 11:09 AM Sangamesh Mallayya 
>  wrote:
>  >>
>  >>  libiberty/getopt.c file is defining _NO_PROTO which causes conflicting
>  >>  declarations for the functions in AIX header files like stdio.h & stdlib.h.
>  >>  These declarations are being considered as errors in C23 which wasn't
>  >>  the case with C17.
>  >>
>  >> Here is the error we get.
>  >>
>  >> /gcc_build/./prev-gcc/xgcc -B/gcc_build/./prev-gcc/
>  >> -B/home/sangam/install/GCC/powerpc-ibm-aix7.3.3.0/bin/
>  >> -B/home/sangam/install/GCC/powerpc-ibm-aix7.3.3.0/bin/
>  >> -B/home/sangam/install/GCC/powerpc-ibm-aix7.3.3.0/lib/
>  >> -isystem /home/sangam/install/GCC/powerpc-ibm-aix7.3.3.0/include
>  >> -isystem /home/sangam/install/GCC/powerpc-ibm-aix7.3.3.0/sys-include
>  >> -fno-checking -c -DHAVE_CONFIG_H -g -O2 -fno-checking -I.
>  >> -I/opt/freeware/src/packages/BUILD/gcc/libiberty/../include
>  >> -W -Wall -Wwrite-strings -Wc++-compat -Wstrict-prototypes -Wshadow=local
>  >> -pedantic -D_GNU_SOURCE
>  >> /opt/freeware/src/packages/BUILD/gcc/libiberty/getopt.c -o getopt.o
>  >>
>  >>
>  >> In file included from /opt/freeware/src/packages/BUILD/gcc/libiberty/getopt.c:45:
>  >> /gcc_build/prev-gcc/include-fixed/stdio.h:593:12: error: conflicting types for 'fgetpos64'; have 'int(FILE *, fpos64_t *)' {aka 'int(FILE *, long long int *)'}
>  >>   593 | extern int fgetpos64(FILE *, fpos64_t *);
>  >>   |^
>  >> /gcc_build/prev-gcc/include-fixed/stdio.h:298:17: note: previous declaration of 'fgetpos64' with type 'int(void)'
>  >>   298 | extern int  fgetpos();
>  >>   | ^~~
>  >> /gcc_build/prev-gcc/include-fixed/stdio.h:594:14: error: conflicting types for 'fopen64'; have 'FILE *(const char *, const char *)'
>  >>   594 | extern FILE *fopen64(const char *, const char *);
>  >>   |  ^~~
>  >>
>  >> /gcc_build/prev-gcc/include-fixed/stdio.h:259:17: note: previous declaration of 'fopen64' with type 'FILE *(void)'
>  >>   259 | extern FILE *   fopen();
>  >>   | ^
>  >> /gcc_build/prev-gcc/include-fixed/stdio.h:595:14: error: conflicting types for 'freopen64'; have 'FILE *(const char *, const char *, FILE *)'
>  >>   595 | extern FILE *freopen64(const char *, const char *, FILE *);
>  >>   |  ^
>  >> /gcc_build/prev-gcc/include-fixed/stdio.h:260:17: note: previous declaration of 'freopen64' with type 'FILE *(void)'
>  >>   260 | extern FILE *   freopen();
>  >>   | ^~~
>  >> /gcc_build/prev-gcc/include-fixed/stdio.h:597:12: error: conflicting types for 'fsetpos64'; have 'int(FILE *, const fpos64_t *)' {aka 'int(FILE *, const long long int *)'}
>  >>   597 | extern int fsetpos64(FILE *, const fpos64_t *);
>  >>   |^
>  >> /gcc_build/prev-gcc/include-fixed/stdio.h:300:17: note: previous declaration of 'fsetpos64' with type 'int(void)'
>  >>   300 | extern int  fsetpos();
>  >>   | ^~~
>  >> In file included from /opt/freeware/src/packages/BUILD/gcc/libiberty/getopt.c:216:
>  >> /gcc_build/prev-gcc/include-fixed/stdlib.h: In function 'strtold':
>  >> /gcc_build/prev-gcc/incl

Re: [patch,avr] Disable CRC lookup tables

2024-12-06 Thread Sam James
Georg-Johann Lay  writes:

> This patch disables CRC lookup tables which consume quite some RAM.

Given that -foptimize-crc is new, it may be useful to CC the pass
authors in case they have input.

>
> Ok for trunk?
>
> Johann


[PUSHED] nvptx: Support '-march=sm_52'

2024-12-06 Thread Thomas Schwinge
gcc/
* config/nvptx/nvptx-sm.def: Add '52'.
* config/nvptx/nvptx-gen.h: Regenerate.
* config/nvptx/nvptx-gen.opt: Likewise.
* config/nvptx/nvptx.cc (first_ptx_version_supporting_sm): Adjust.
* config/nvptx/nvptx.opt (-march-map=sm_52): Likewise.
* config.gcc: Likewise.
* doc/invoke.texi (Nvidia PTX Options): Document '-march=sm_52'.
* config/nvptx/gen-multilib-matches-tests: Extend.
gcc/testsuite/
* gcc.target/nvptx/march-map=sm_52.c: Adjust.
* gcc.target/nvptx/march=sm_52.c: New.
libgomp/
* testsuite/libgomp.c/declare-variant-3-sm52.c: New.
* testsuite/libgomp.c/declare-variant-3.h: Adjust.
---
 gcc/config.gcc                                |  2 +-
 gcc/config/nvptx/gen-multilib-matches-tests   | 41 ---
 gcc/config/nvptx/nvptx-gen.h                  |  1 +
 gcc/config/nvptx/nvptx-gen.opt                |  3 ++
 gcc/config/nvptx/nvptx-sm.def                 |  1 +
 gcc/config/nvptx/nvptx.cc                     |  1 +
 gcc/config/nvptx/nvptx.opt                    |  2 +-
 gcc/doc/invoke.texi                           |  2 +-
 .../gcc.target/nvptx/march-map=sm_52.c        |  4 +-
 gcc/testsuite/gcc.target/nvptx/march=sm_52.c  | 19 +
 .../libgomp.c/declare-variant-3-sm52.c        |  8
 .../testsuite/libgomp.c/declare-variant-3.h   |  8
 12 files changed, 81 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/march=sm_52.c
 create mode 100644 libgomp/testsuite/libgomp.c/declare-variant-3-sm52.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index b68ede921ec9..21f3dcd9d009 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -5643,7 +5643,7 @@ case "${target}" in
case $nvptx_multilib in
#TODO 'sm_[...]' list per 'nvptx-sm.def'.
sm_30 | sm_35 | sm_37 \
-   | sm_53 \
+   | sm_52 | sm_53 \
| sm_70 | sm_75 \
| sm_80 )
TM_MULTILIB_CONFIG="$TM_MULTILIB_CONFIG $nvptx_multilib"
diff --git a/gcc/config/nvptx/gen-multilib-matches-tests b/gcc/config/nvptx/gen-multilib-matches-tests
index 87045040b11a..13b1c5b9d018 100644
--- a/gcc/config/nvptx/gen-multilib-matches-tests
+++ b/gcc/config/nvptx/gen-multilib-matches-tests
@@ -16,6 +16,7 @@ SMOIL sm_30
 AEMM .=misa?sm_30
 AEMM .=misa?sm_35
 AEMM .=misa?sm_37
+AEMM .=misa?sm_52
 AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
@@ -28,14 +29,15 @@ SMOIL sm_30 sm_80
 AEMM .=misa?sm_30
 AEMM .=misa?sm_35
 AEMM .=misa?sm_37
+AEMM .=misa?sm_52
 AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
 CMMC
 
-BEGIN '--with-arch=sm_30', '--with-multilib-list=sm_30,sm_35,sm_37,sm_53,sm_70,sm_75,sm_80'
+BEGIN '--with-arch=sm_30', '--with-multilib-list=sm_30,sm_35,sm_37,sm_52,sm_53,sm_70,sm_75,sm_80'
 SMOID sm_30
-SMOIL sm_30 sm_35 sm_37 sm_53 sm_70 sm_75 sm_80
+SMOIL sm_30 sm_35 sm_37 sm_52 sm_53 sm_70 sm_75 sm_80
 AEMM .=misa?sm_30
 CMMC
 
@@ -46,6 +48,7 @@ SMOIL sm_35
 AEMM .=misa?sm_30
 AEMM .=misa?sm_35
 AEMM .=misa?sm_37
+AEMM .=misa?sm_52
 AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
@@ -57,6 +60,7 @@ SMOID sm_35
 SMOIL sm_35 sm_30
 AEMM .=misa?sm_35
 AEMM .=misa?sm_37
+AEMM .=misa?sm_52
 AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
@@ -69,6 +73,7 @@ SMOID sm_37
 SMOIL sm_37 sm_30
 AEMM misa?sm_30=misa?sm_35
 AEMM .=misa?sm_37
+AEMM .=misa?sm_52
 AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
@@ -76,11 +81,24 @@ AEMM .=misa?sm_80
 CMMC
 
 
+BEGIN '--with-arch=sm_52', '--with-multilib-list=sm_52,sm_75,sm_35'
+SMOID sm_52
+SMOIL sm_52 sm_75 sm_35
+AEMM misa?sm_35=misa?sm_30
+AEMM misa?sm_35=misa?sm_37
+AEMM .=misa?sm_52
+AEMM .=misa?sm_53
+AEMM .=misa?sm_70
+AEMM misa?sm_75=misa?sm_80
+CMMC
+
+
 BEGIN '--with-arch=sm_53', '--with-multilib-list=sm_53,sm_30'
 SMOID sm_53
 SMOIL sm_53 sm_30
 AEMM misa?sm_30=misa?sm_35
 AEMM misa?sm_30=misa?sm_37
+AEMM misa?sm_30=misa?sm_52
 AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
@@ -92,15 +110,16 @@ SMOID sm_53
 SMOIL sm_53 sm_37
 AEMM misa?sm_37=misa?sm_30
 AEMM misa?sm_37=misa?sm_35
+AEMM misa?sm_37=misa?sm_52
 AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
 AEMM .=misa?sm_80
 CMMC
 
-BEGIN '--with-arch=sm_53', '--with-multilib-list=sm_53,sm_30,sm_35,sm_37,sm_70,sm_75,sm_80'
+BEGIN '--with-arch=sm_53', '--with-multilib-list=sm_53=sm_30,sm_35,sm_37,sm_52,sm_70,sm_75,sm_80'
 SMOID sm_53
-SMOIL sm_53 sm_30 sm_35 sm_37 sm_70 sm_75 sm_80
+SMOIL sm_53 sm_30 sm_35 sm_37 sm_52 sm_70 sm_75 sm_80
 AEMM .=misa?sm_53
 CMMC
 
@@ -111,6 +130,7 @@ SMOIL sm_70
 AEMM .=misa?sm_30
 AEMM .=misa?sm_35
 AEMM .=misa?sm_37
+AEMM .=misa?sm_52
 AEMM .=misa?sm_53
 AEMM .=misa?sm_70
 AEMM .=misa?sm_75
@@ -122,6 +142,7 @@ SMOID sm_70
 SMOIL sm_70 sm_30
 AEMM misa?sm_30=misa?sm_35
 AEMM misa?sm_30=misa?sm_37
+AEMM misa?sm_30=misa?sm_52
 AEM

Re: [patch,lra] PR116778 we need a full live range info after rematerialization

2024-12-06 Thread Sam James
Denis Chertykov  writes:

> The fix for PR116778:
>

Added Vlad to CC.

> [...]
>
> diff --git a/gcc/lra-lives.cc b/gcc/lra-lives.cc
> index 49134ade713..510f7d927ab 100644
> --- a/gcc/lra-lives.cc
> +++ b/gcc/lra-lives.cc
> @@ -62,9 +62,10 @@ int lra_hard_reg_usage[FIRST_PSEUDO_REGISTER];
>  /* A global flag whose true value says to build live ranges for all
> pseudos, otherwise the live ranges only for pseudos got memory is
> build.  True value means also building copies and setting up hard
> -   register preferences.  The complete info is necessary only for the
> -   assignment pass.  The complete info is not needed for the
> -   coalescing and spill passes.   */
> +   register preferences.  The complete info is necessary for
> +   assignment, rematerialization and spill to register passes.  The
> +   complete info is not needed for the coalescing and spill to memory
> +   passes.  */
>  static bool complete_info_p;
>/* Pseudos live at current point in the RTL scan.  */
> diff --git a/gcc/lra.cc b/gcc/lra.cc
> index bc46f56cf20..a38df0e9b7a 100644
> --- a/gcc/lra.cc
> +++ b/gcc/lra.cc
> @@ -2552,7 +2552,7 @@ lra (FILE *f, int verbose)
>if (lra_remat ())
>   {
> /* We need full live info -- see the comment above.  */
> -   lra_create_live_ranges (lra_reg_spill_p, true);
> +   lra_create_live_ranges (true, true);
> live_p = true;
> if (! lra_need_for_spills_p ())
>   {


Re: [PATCH] Fix incorrect line numbers in large files bug#108900

2024-12-06 Thread Sam James
Jeremy Bettis  writes:

> Patch to fix known bug from
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108900
>
> diff -ur gcc-clean/gcc-14.2.0/libcpp/files.cc gcc-14.2.0/libcpp/files.cc
> --- gcc-clean/gcc-14.2.0/libcpp/files.cc 2024-08-01 08:17:17.0 +
> +++ gcc-14.2.0/libcpp/files.cc 2024-10-18 18:42:42.293245597 +

Please ideally use git-send-email and see
https://gcc.gnu.org/contribute.html#patches wrt ChangeLog format and so on.

> @@ -1005,6 +1005,11 @@
>  && type < IT_DIRECTIVE_HWM
>  && (pfile->line_table->highest_location
>   != LINE_MAP_MAX_LOCATION - 1));
> +  if (decrement && LINEMAPS_ORDINARY_USED (pfile->line_table)) {
> +const line_map_ordinary *map = LINEMAPS_LAST_ORDINARY_MAP
> (pfile->line_table);
> +if (map && map->start_location == pfile->line_table->highest_location)
> +  decrement = false;
> +  }
>if (decrement)
>  pfile->line_table->highest_location--;

Note that I suspect this may be fixed by the 64-bit location_t work that
is ongoing for trunk but it may still be desirable for 14 anyway.


Re: [PATCH v6 1/7] Honor TARGET_PROMOTE_PROTOTYPES during RTL expand

2024-12-06 Thread Richard Biener
On Wed, Dec 4, 2024 at 9:48 PM H.J. Lu  wrote:
>
> Promote integer arguments smaller than int if TARGET_PROMOTE_PROTOTYPES
> returns true.

This is OK provided 2/7 gets no negative comments and Jeff has no
further input here.

(I think 1/7+2/7 are a good improvement on their own given the latent bugfixing)

Also CCed Eric, who might know calls.cc quite well.

Richard.

> PR middle-end/112877
> * calls.c (initialize_argument_information): Promote small integer
> arguments if TARGET_PROMOTE_PROTOTYPES returns true.
>
> gcc/testsuite/
>
> PR middle-end/112877
> * gfortran.dg/pr112877-1.f90: New test.
>
> Signed-off-by: H.J. Lu 
> ---
>  gcc/calls.cc |  9 +
>  gcc/testsuite/gfortran.dg/pr112877-1.f90 | 17 +
>  2 files changed, 26 insertions(+)
>  create mode 100644 gcc/testsuite/gfortran.dg/pr112877-1.f90
>
> diff --git a/gcc/calls.cc b/gcc/calls.cc
> index 8cf0f29b42c..78ead6fd4ed 100644
> --- a/gcc/calls.cc
> +++ b/gcc/calls.cc
> @@ -1374,6 +1374,11 @@ initialize_argument_information (int num_actuals 
> ATTRIBUTE_UNUSED,
>}
>}
>
> +  bool promote_p
> += targetm.calls.promote_prototypes (fndecl
> +   ? TREE_TYPE (fndecl)
> +   : fntype);
> +
>/* I counts args in order (to be) pushed; ARGPOS counts in order written.  
> */
>for (argpos = 0; argpos < num_actuals; i--, argpos++)
>  {
> @@ -1383,6 +1388,10 @@ initialize_argument_information (int num_actuals 
> ATTRIBUTE_UNUSED,
>/* Replace erroneous argument with constant zero.  */
>if (type == error_mark_node || !COMPLETE_TYPE_P (type))
> args[i].tree_value = integer_zero_node, type = integer_type_node;
> +  else if (promote_p
> +  && INTEGRAL_TYPE_P (type)
> +  && TYPE_PRECISION (type) < TYPE_PRECISION (integer_type_node))
> +   type = integer_type_node;
>
>/* If TYPE is a transparent union or record, pass things the way
>  we would pass the first field of the union or record.  We have
> diff --git a/gcc/testsuite/gfortran.dg/pr112877-1.f90 
> b/gcc/testsuite/gfortran.dg/pr112877-1.f90
> new file mode 100644
> index 000..f5596f0d0ad
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/pr112877-1.f90
> @@ -0,0 +1,17 @@
> +! { dg-do compile }
> +! { dg-options "-Os" }
> +
> +program test
> +use iso_c_binding, only: c_short
> +interface
> +  subroutine foo(a) bind(c)
> +import c_short
> +integer(kind=c_short), intent(in), value :: a
> +  end subroutine foo
> +end interface
> +integer(kind=c_short) a(5);
> +call foo (a(3))
> +end
> +
> +! { dg-final { scan-assembler "movswl\t10\\(%rsp\\), %edi" { target { { 
> i?86-*-linux* i?86-*-gnu* x86_64-*-linux* x86_64-*-gnu* } && { ! ia32 } } } } 
> }
> +! { dg-final { scan-assembler "movswl\t-14\\(%ebp\\), %eax" { target { { 
> i?86-*-linux* i?86-*-gnu* x86_64-*-linux* x86_64-*-gnu* } && { ia32 } } } } }
> --
> 2.47.1
>


Re: [PATCH 3/3] dwarf: lto: Stabilize external die references.

2024-12-06 Thread Michal Jireš




On 11/27/24 3:12 PM, Richard Biener wrote:

> I wonder why you could not always do this for a subset of symbols,
> namely those exported from the current TU and building a symbol
> based on the symbols assembler name?
>
> That is, I dislike relying on a new flag_lto_debuginfo_assume_unique_filepaths
> flag.


Most of these symbols are exported from the TU, this is the strictest
subset I found. They might not be used later, but in most cases we will
find out too late (in WPA).

Assembler names are not unique for static or weak symbols.
The only alternative might be hashing the DIE subtree, which, at least
for DW_TAG_namespace, can lead to divergence similar to hashing the
entire file if the entire file is wrapped in the namespace.
It is also unclear how to hash references to outside of the subtree
without essentially hashing the entire file.


> I'd also really like to see a way to get rid of those symbols at link time :/
> Or at least make them smaller?  For example by hashing the assembler
> name?


For comparison cc1 size:
346708592 without the flag
373766424 with the flag
365183720 with 16 hex digits hash instead of assembler name

The main problem is the number of symbols:
269739 added symbols
 71536 other symbols

So we might be able to halve the size of added symbols, but I would
prefer to focus on removing them entirely.


> The BFD linker has .gnu_lto_* special-casing for sections to discard,
> maybe we can add a special .note section, .note.gnu.discard_syms with
> a list of symbols to discard after link editing?


The alternative could be adding a special-cased .gnu.lto_* prefix,
which could be more easily stripped out after linking with other
linkers.

Though I like the .note idea more if we can special case it.
I will try to implement it in the BFD linker.


Re: [PATCH v6 2/7] Drop targetm.promote_prototypes from C, C++ and Ada frontends

2024-12-06 Thread Richard Biener
On Wed, Dec 4, 2024 at 9:48 PM H.J. Lu  wrote:
>
> Remove the targetm.calls.promote_prototypes call from C, C++ and Ada
> frontends.

I'm conditionally approving this unless FE maintainers complain before holidays
(the effect of the hook is re-instantiated during RTL expansion in 1/7).

I've added the FE maintainers to CC

Richard.

> gcc/
>
> PR c/48274
> PR middle-end/14907
> PR middle-end/112877
> * gimple.cc (gimple_builtin_call_types_compatible_p): Remove the
> targetm.calls.promote_prototypes call.
> * tree.cc (tree_builtin_call_types_compatible_p): Likewise.
>
> gcc/ada/
>
> PR middle-end/14907
> PR middle-end/112877
> * gcc-interface/utils.cc (create_param_decl): Remove the
> targetm.calls.promote_prototypes call.
>
> gcc/c/
>
> PR c/48274
> PR middle-end/14907
> PR middle-end/112877
> * c-decl.cc (start_decl): Remove the
> targetm.calls.promote_prototypes call.
> (store_parm_decls_oldstyle): Likewise.
> (finish_function): Likewise.
> * c-typeck.cc (convert_argument): Likewise.
> (c_safe_arg_type_equiv_p): Likewise.
>
> gcc/cp/
>
> PR middle-end/14907
> PR middle-end/112877
> * call.cc (type_passed_as): Remove the
> targetm.calls.promote_prototypes call.
> (convert_for_arg_passing): Likewise.
> * typeck.cc (cxx_safe_arg_type_equiv_p): Likewise.
>
> Signed-off-by: H.J. Lu 
> ---
>  gcc/ada/gcc-interface/utils.cc | 24 
>  gcc/c/c-decl.cc| 40 --
>  gcc/c/c-typeck.cc  | 19 
>  gcc/cp/call.cc | 10 -
>  gcc/cp/typeck.cc   | 13 ---
>  gcc/gimple.cc  | 10 +
>  gcc/tree.cc| 14 
>  7 files changed, 9 insertions(+), 121 deletions(-)
>
> diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
> index 8e8cf55ae12..cbbac5160d2 100644
> --- a/gcc/ada/gcc-interface/utils.cc
> +++ b/gcc/ada/gcc-interface/utils.cc
> @@ -3282,30 +3282,6 @@ tree
>  create_param_decl (tree name, tree type)
>  {
>tree param_decl = build_decl (input_location, PARM_DECL, name, type);
> -
> -  /* Honor TARGET_PROMOTE_PROTOTYPES like the C compiler, as not doing so
> - can lead to various ABI violations.  */
> -  if (targetm.calls.promote_prototypes (NULL_TREE)
> -  && INTEGRAL_TYPE_P (type)
> -  && TYPE_PRECISION (type) < TYPE_PRECISION (integer_type_node))
> -{
> -  /* We have to be careful about biased types here.  Make a subtype
> -of integer_type_node with the proper biasing.  */
> -  if (TREE_CODE (type) == INTEGER_TYPE
> - && TYPE_BIASED_REPRESENTATION_P (type))
> -   {
> - tree subtype
> -   = make_unsigned_type (TYPE_PRECISION (integer_type_node));
> - TREE_TYPE (subtype) = integer_type_node;
> - TYPE_BIASED_REPRESENTATION_P (subtype) = 1;
> - SET_TYPE_RM_MIN_VALUE (subtype, TYPE_MIN_VALUE (type));
> - SET_TYPE_RM_MAX_VALUE (subtype, TYPE_MAX_VALUE (type));
> - type = subtype;
> -   }
> -  else
> -   type = integer_type_node;
> -}
> -
>DECL_ARG_TYPE (param_decl) = type;
>return param_decl;
>  }
> diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
> index 1c11c216bd6..9642257c961 100644
> --- a/gcc/c/c-decl.cc
> +++ b/gcc/c/c-decl.cc
> @@ -5719,26 +5719,6 @@ start_decl (struct c_declarator *declarator, struct 
> c_declspecs *declspecs,
> DECL_EXTERNAL (decl) = !DECL_EXTERNAL (decl);
>  }
>
> -  if (TREE_CODE (decl) == FUNCTION_DECL
> -  && targetm.calls.promote_prototypes (TREE_TYPE (decl)))
> -{
> -  struct c_declarator *ce = declarator;
> -
> -  if (ce->kind == cdk_pointer)
> -   ce = declarator->declarator;
> -  if (ce->kind == cdk_function)
> -   {
> - tree args = ce->u.arg_info->parms;
> - for (; args; args = DECL_CHAIN (args))
> -   {
> - tree type = TREE_TYPE (args);
> - if (type && INTEGRAL_TYPE_P (type)
> - && TYPE_PRECISION (type) < TYPE_PRECISION 
> (integer_type_node))
> -   DECL_ARG_TYPE (args) = c_type_promotes_to (type);
> -   }
> -   }
> -}
> -
>if (TREE_CODE (decl) == FUNCTION_DECL
>&& DECL_DECLARED_INLINE_P (decl)
>&& DECL_UNINLINABLE (decl)
> @@ -11172,13 +11152,6 @@ store_parm_decls_oldstyle (tree fndecl, const struct 
> c_arg_info *arg_info)
>  useful for argument types like uid_t.  */
>   DECL_ARG_TYPE (parm) = TREE_TYPE (parm);
>
> - if (targetm.calls.promote_prototypes (TREE_TYPE 
> (current_function_decl))
> - && INTEGRAL_TYPE_P (TREE_TYPE (parm))
> - && (TYPE_PRECISION (TREE_TYPE (parm))
> - < TYPE_PRECISION (inte

Re: [PATCH v6 2/7] Drop targetm.promote_prototypes from C, C++ and Ada frontends

2024-12-06 Thread Eric Botcazou
> I'm conditionally approving this unless FE maintainers complain before
> holidays (the effect of the hook is re-instantiated during RTL expansion in
> 1/7).

FWIW I'm all for removing this piece of code from FEs.

-- 
Eric Botcazou




'gcc/config/nvptx/gen-multilib-matches.sh': Support '--selftest' (was: 'gcc/config/nvptx/t-nvptx': Don't use the 'shell' function of 'make' (was: nvptx: Allow '--with-arch' to override the default '-m

2024-12-06 Thread Thomas Schwinge
Hi!

On 2024-12-06T10:01:22+0100, I wrote:
> I recently learned that the exit status of the command invoked in a
> 'Makefile' via '$(shell [...])' effectively gets discarded (unless
> explicitly checking the GNU Make 4.2+ '.SHELLSTATUS' variable or jumping
> through other hoops).  I was under the assumption that an error in a
> 'shell' function would cause 'make' to error out, similarly to how it
> does in 'Makefile' rules...
>
> I learned this The Hard Way here:
>
> On 2022-06-15T23:18:10+0200, I wrote:
>> --- a/gcc/config/nvptx/t-nvptx
>> +++ b/gcc/config/nvptx/t-nvptx
>
>> +multilib_matches := $(shell $(srcdir)/config/nvptx/gen-multilib-matches.sh 
>> $(srcdir)/config/nvptx $(multilib_options_isa_default) 
>> "$(multilib_options_isa_list)")
>
> When recently working on changing nvptx multilib things, and for that
> enhancing nvptx' 'gen-multilib-matches.sh', I made an error in there, and
> then got confusing behavior in that I could still successfully 'make'
> GCC, and my changes "mostly appeared to work as expected", but not quite.
> This was due to garbage in 'MULTILIB_MATCHES', caused by a shell syntax
> error in 'gen-multilib-matches.sh' -- which '$(shell [...])' swept under
> the table.
>
> Pushed to trunk branch commit 490443357668a87e3c322f218873a7649a2552df
> "'gcc/config/nvptx/t-nvptx': Don't use the 'shell' function of 'make'",
> see attached.

To further improve reliability of nvptx 'gen-multilib-matches.sh', in
"Re: [PATCH] v2: Run selftests for C++ as well as C", I recently
mentioned the idea of adding selftesting to this script and attaching a
new 's-selftest-nvptx_gen-multilib-matches' rule to the existing GCC
selftest framework rules.  Instead, in a simpler way, let's just invoke
'gen-multilib-matches.sh --selftest' before actual use here:

> --- a/gcc/config/nvptx/t-nvptx
> +++ b/gcc/config/nvptx/t-nvptx
> @@ -43,12 +43,24 @@ MULTILIB_OPTIONS += mgomp
>  multilib_options_isa_list := $(TM_MULTILIB_CONFIG)
>  multilib_options_isa_default := $(word 1,$(multilib_options_isa_list))
>  multilib_options_misa_list := $(addprefix misa=,$(multilib_options_isa_list))
> +
> +t-nvptx-gen-multilib-matches: $(srcdir)/config/nvptx/gen-multilib-matches.sh 
> \
> +  $(srcdir)/config/nvptx/t-nvptx \
> +  Makefile \
> +  $(srcdir)/config/nvptx/nvptx-sm.def
> + $(SHELL) $< \
> +   $(dir $<) \
> +   $(multilib_options_isa_default) \
> +   '$(multilib_options_isa_list)' \
> +   > $@
> +
> +include t-nvptx-gen-multilib-matches

Pushed to trunk branch commit ccd6ec23177f7a4ed69fabad8e79d5d4da419fb2
"'gcc/config/nvptx/gen-multilib-matches.sh': Support '--selftest'", see
attached.
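
For readers following along: the test-case file added by this commit
drives the selftest through a handful of one-line directives ('BEGIN',
'SSMS', 'SMOID', 'SMOIL', 'AEMM', 'CMMC').  A minimal sketch of such a
driver, written in Python rather than the script's POSIX shell, might
look like the following.  The names 'run_selftest', 'compute' and
'all_default' are hypothetical, and the 'compute' callback only stands
in for the real multilib-matching logic, which is not reproduced here.

```python
def run_selftest(lines, compute, default_sms):
    """Run the test cases in 'lines'; return a list of (name, passed)."""
    results = []
    name, sms = None, list(default_sms)
    default_isa, isa_list, expected = None, None, []
    for raw in lines:
        line = raw.strip()
        # Blank lines and comment lines are ignored.
        if not line or line.startswith("#"):
            continue
        directive, _, rest = line.partition(" ")
        if directive == "BEGIN":
            # Clear state, begin a new named test.
            name, sms = rest, list(default_sms)
            default_isa, isa_list, expected = None, None, []
        elif directive == "SSMS":
            sms = rest.split()
        elif directive == "SMOID":
            default_isa = rest
        elif directive == "SMOIL":
            isa_list = rest.split()
        elif directive == "AEMM":
            # Append one entry to the expected "multilib matches".
            expected.append(rest)
        elif directive == "CMMC":
            # Compute per the current settings, compare to the expected.
            actual = compute(sms, default_isa, isa_list)
            results.append((name, actual == expected))
        else:
            raise ValueError(f"unknown directive: {directive!r}")
    return results


def all_default(sms, default_isa, isa_list):
    # Hypothetical stand-in: remap every variant to the default multilib.
    return [f".=misa?sm_{sm}" for sm in sms]


case = """
# A tiny test case in the same directive language.
BEGIN '--with-arch=sm_30'
SSMS 30 35
SMOID sm_30
SMOIL sm_30
AEMM .=misa?sm_30
AEMM .=misa?sm_35
CMMC
"""
print(run_selftest(case.splitlines(), all_default, ["30"]))
```

The point of the design is that a failing 'CMMC' comparison is visible
to the caller, so it can be turned into a non-zero exit status before
the generated matches are ever used.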


Grüße
 Thomas


From ccd6ec23177f7a4ed69fabad8e79d5d4da419fb2 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 2 Dec 2024 16:50:16 +0100
Subject: [PATCH] 'gcc/config/nvptx/gen-multilib-matches.sh': Support
 '--selftest'

..., and invoke that before actual use.

	gcc/
	* config/nvptx/gen-multilib-matches.sh: Support '--selftest'.
	* config/nvptx/t-nvptx (t-nvptx-gen-multilib-matches:): Invoke it.
	* config/nvptx/gen-multilib-matches-tests: New.
---
 gcc/config/nvptx/gen-multilib-matches-tests | 77 +++
 gcc/config/nvptx/gen-multilib-matches.sh| 82 -
 gcc/config/nvptx/t-nvptx|  2 +
 3 files changed, 159 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/nvptx/gen-multilib-matches-tests

diff --git a/gcc/config/nvptx/gen-multilib-matches-tests b/gcc/config/nvptx/gen-multilib-matches-tests
new file mode 100644
index ..c2775f268354
--- /dev/null
+++ b/gcc/config/nvptx/gen-multilib-matches-tests
@@ -0,0 +1,77 @@
+# Test cases for 'gen-multilib-matches.sh'.
+
+# Blank lines and lines beginning with '#' are ignored.
+
+# 'BEGIN [name]': clear state, begin test [name].
+# 'SSMS 30 35 53': set 'sms' to '30 35 53'.  Default: per 'nvptx-sm.def'.
+# 'SMOID sm_30': set 'multilib_options_isa_default' to 'sm_30'.  Default: unset.
+# 'SMOIL sm_35 sm_30': set 'multilib_options_isa_list' to 'sm_35 sm_30'.  Default: unset.
+# 'AEMM .=misa?sm_30': append '.=misa?sm_30' to expected "multilib matches".  Default: unset.
+# 'CMMC': compute "multilib matches" per the current settings, and compare to the expected.
+
+
+BEGIN '--with-arch=sm_30'
+SMOID sm_30
+SMOIL sm_30
+AEMM .=misa?sm_30
+AEMM .=misa?sm_35
+AEMM .=misa?sm_53
+AEMM .=misa?sm_70
+AEMM .=misa?sm_75
+AEMM .=misa?sm_80
+CMMC
+
+
+BEGIN '--with-arch=sm_35'
+SMOID sm_35
+SMOIL sm_35 sm_30
+AEMM .=misa?sm_35
+AEMM .=misa?sm_53
+AEMM .=misa?sm_70
+AEMM .=misa?sm_75
+AEMM .=misa?sm_80
+CMMC
+
+
+BEGIN '--with-arch=sm_53'
+SMOID sm_53
+SMOIL sm_53 sm_30
+AEMM misa?sm_30=misa?sm_35
+AEMM .=misa?sm_53
+AEMM .=misa?sm_70
+AEMM .=misa?sm_75
+AEMM .=misa?sm_80
+CMMC
+
+
+BEGIN '--with-arch=sm_70'
+SMOID sm_70
+SMOIL sm_70 sm_30
+AEMM misa?sm_30=misa?sm_35
+AEMM misa?sm_30=misa?sm_53
+AEMM .=misa?sm_70
+AEMM .=misa?sm_75
+AEMM 

Re: [PATCH v3] RISC-V: Add --with-cmodel configure option

2024-12-06 Thread Kito Cheng
Committed to trunk.
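
As a quick illustration of what the new 'config.gcc' case statement in
the patch quoted below does, here is a sketch in Python (the function
and table names are made up for this example; the real logic is the
shell code in the patch):

```python
# Accepted '--with-cmodel' values and the macro each one selects.
CMODEL_TO_MACRO = {
    "": "CM_MEDLOW",        # option not given: medlow stays the default
    "medlow": "CM_MEDLOW",
    "medany": "CM_MEDANY",
    "large": "CM_LARGE",
}


def default_cmodel_define(with_cmodel=""):
    """Return the 'tm_defines' entry for a given '--with-cmodel' value."""
    try:
        macro = CMODEL_TO_MACRO[with_cmodel]
    except KeyError:
        # Mirrors the patch's error path, which exits with status 1.
        raise ValueError(
            f"invalid option for --with-cmodel: '{with_cmodel}', "
            "available values are 'medlow' 'medany' 'large'"
        ) from None
    return f"TARGET_DEFAULT_CMODEL={macro}"


print(default_cmodel_define("medany"))
```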

On Thu, Dec 5, 2024 at 2:59 PM Kito Cheng  wrote:
>
> From: Hau Hsu 
>
> Sometimes we want to use default cmodel other than medlow. Add a GCC
> configure option for that.
>
> gcc/ChangeLog:
>
> * config.gcc (riscv*-*-*): Add support for --with-cmodel configure 
> option.
> (all_defaults): Add cmodel.
> * config/riscv/riscv.h (TARGET_DEFAULT_CMODEL): Remove.
> * doc/install.texi: Document --with-cmodel configure option.
> * doc/invoke.texi (-mcmodel): Mention --with-cmodel configure option.
>
> Co-authored-by: Kito Cheng 
>
> ---
>
> v3:
>
> I've confirmed the v2 of this patch will break all other target's build,
> and I've fixed the issue in this version, tested on AArch64, passed 3 stage
> bootstrap.
>
> Also clean up this patch to remove unnecessary changes.
>
> Will commit after CI pass.
>
>
> ---
>  gcc/config.gcc   | 23 +--
>  gcc/config/riscv/riscv.h |  2 --
>  gcc/doc/install.texi |  4 
>  gcc/doc/invoke.texi  |  6 --
>  4 files changed, 29 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index afa78453197..f4ae14c6db2 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -4711,7 +4711,7 @@ case "${target}" in
> ;;
>
> riscv*-*-*)
> -   supported_defaults="abi arch tune riscv_attribute isa_spec 
> tls"
> +   supported_defaults="abi arch tune riscv_attribute isa_spec 
> tls cmodel"
>
> case "${target}" in
> riscv-* | riscv32*) xlen=32 ;;
> @@ -4867,6 +4867,25 @@ case "${target}" in
> exit 1
> esac
> fi
> +
> +   # Handle --with-cmodel.
> +   # Make sure --with-cmodel is valid.  If it was not specified,
> +   # use medlow as the default value.
> +   case "${with_cmodel}" in
> +   "" | medlow)
> +   tm_defines="${tm_defines} 
> TARGET_DEFAULT_CMODEL=CM_MEDLOW"
> +   ;;
> +   medany)
> +   tm_defines="${tm_defines} 
> TARGET_DEFAULT_CMODEL=CM_MEDANY"
> +   ;;
> +   large)
> +   tm_defines="${tm_defines} 
> TARGET_DEFAULT_CMODEL=CM_LARGE"
> +   ;;
> +   *)
> +   echo "invalid option for --with-cmodel: 
> '${with_cmodel}', available values are 'medlow' 'medany' 'large'" 1>&2
> +   exit 1
> +   ;;
> +   esac
> ;;
>
> mips*-*-*)
> @@ -6046,7 +6065,7 @@ case ${target} in
>  esac
>
>  t=
> -all_defaults="abi cpu cpu_32 cpu_64 arch arch_32 arch_64 tune tune_32 
> tune_64 schedule float mode fpu nan fp_32 odd_spreg_32 divide llsc mips-plt 
> synci tls lxc1-sxc1 madd4 isa_spec compact-branches msa"
> +all_defaults="abi cpu cpu_32 cpu_64 arch arch_32 arch_64 tune tune_32 
> tune_64 schedule float mode fpu nan fp_32 odd_spreg_32 divide llsc mips-plt 
> synci tls lxc1-sxc1 madd4 isa_spec compact-branches msa cmodel"
>  for option in $all_defaults
>  do
> eval "val=\$with_"`echo $option | sed s/-/_/g`
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index 8a8b08b6b51..09de74667a9 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -119,8 +119,6 @@ ASM_MISA_SPEC
>  "%{march=*:%:riscv_expand_arch(%*)} "  \
>  "%{!march=*:%{mcpu=*:%:riscv_expand_arch_from_cpu(%*)}} "
>
> -#define TARGET_DEFAULT_CMODEL CM_MEDLOW
> -
>  #define LOCAL_LABEL_PREFIX "."
>  #define USER_LABEL_PREFIX  ""
>
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> index 97d9aaffa69..4107697f10c 100644
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -1537,6 +1537,10 @@ Use big endian by default.  Provide a multilib for 
> little endian.
>  Use little endian by default.  Provide a multilib for big endian.
>  @end table
>
> +@item --with-cmodel=@var{cmodel}
> +Specify what code model to use by default.
> +Currently only implemented for riscv*-*-*.
> +
>  @item --enable-threads
>  Specify that the target
>  supports threads.  This affects the Objective-C compiler and runtime
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index d2409a41d50..0107da32b64 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -31092,8 +31092,10 @@ element-misaligned vector memory access.
>  @item -mcmodel=medlow
>  Generate code for the medium-low code model. The program and its statically
>  defined symbols must lie within a single 2 GiB address range and must lie
> -between absolute addresses @minus{}2 GiB and +2 GiB. Programs can be
> -statically or dynamically linked. This is the default code model.
> +between absolute addresses @minus{}2 GiB and +2 GiB. Programs can be 
> statically
> +or dynamically linked. This is the default code model unles

[PATCH] Use new RAW_DATA_{U,S}CHAR_ELT macros in the middle-end and C FE

2024-12-06 Thread Jakub Jelinek
Hi!

During the patch review of the C++ #embed optimization, Jason asked for
a macro for the common
((const unsigned char *) RAW_DATA_POINTER (value))[i]
and ditto with signed char patterns which appear in a lot of places.
In the just committed patch I've added
+#define RAW_DATA_UCHAR_ELT(NODE, I) \
+  (((const unsigned char *) RAW_DATA_POINTER (NODE))[I])
+#define RAW_DATA_SCHAR_ELT(NODE, I) \
+  (((const signed char *) RAW_DATA_POINTER (NODE))[I])
macros for that in tree.h.

The following patch is just a cleanup to use those macros where appropriate.
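
To make the difference between the two macros concrete, here is a small
model in Python.  It is only an analogy: the helper names are invented
for this note, 'bytes' stands in for the RAW_DATA_CST payload, and it
assumes 8-bit two's-complement 'signed char'.

```python
def raw_data_uchar_elt(data: bytes, i: int) -> int:
    # Models ((const unsigned char *) ptr)[i]: a value in 0..255.
    return data[i]


def raw_data_schar_elt(data: bytes, i: int) -> int:
    # Models ((const signed char *) ptr)[i]: a value in -128..127,
    # assuming 8-bit two's-complement chars.
    value = data[i]
    return value - 256 if value >= 128 else value


payload = bytes([0x7F, 0x80, 0xFF])
for i in range(len(payload)):
    print(raw_data_uchar_elt(payload, i), raw_data_schar_elt(payload, i))
```

The two readings agree for bytes below 0x80 and differ above, which is
why callers such as 'dump_generic_node' pick one macro or the other
depending on the signedness of the element type.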

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-12-06  Jakub Jelinek  

gcc/
* gimplify.cc (gimplify_init_ctor_eval): Use RAW_DATA_UCHAR_ELT
macro.
* gimple-fold.cc (fold_array_ctor_reference): Likewise.
* tree-pretty-print.cc (dump_generic_node): Use RAW_DATA_UCHAR_ELT
and RAW_DATA_SCHAR_ELT macros.
* fold-const.cc (fold): Use RAW_DATA_UCHAR_ELT macro.
gcc/c/
* c-parser.cc (c_parser_get_builtin_args, c_parser_expression,
c_parser_expr_list): Use RAW_DATA_UCHAR_ELT macro.
* c-typeck.cc (digest_init): Use RAW_DATA_UCHAR_ELT and
RAW_DATA_SCHAR_ELT macros.
(add_pending_init, maybe_split_raw_data): Use RAW_DATA_UCHAR_ELT
macro.

--- gcc/gimplify.cc.jj  2024-12-02 11:08:52.077134419 +0100
+++ gcc/gimplify.cc 2024-12-05 19:05:53.921040446 +0100
@@ -5533,8 +5533,7 @@ gimplify_init_ctor_eval (tree object, ve
tree init
  = build2 (INIT_EXPR, TREE_TYPE (cref), cref,
build_int_cst (TREE_TYPE (value),
-  ((const unsigned char *)
-   RAW_DATA_POINTER (value))[i]));
+  RAW_DATA_UCHAR_ELT (value, i)));
gimplify_and_add (init, pre_p);
ggc_free (init);
  }
--- gcc/gimple-fold.cc.jj   2024-11-30 01:47:37.210351120 +0100
+++ gcc/gimple-fold.cc  2024-12-05 19:05:23.126471679 +0100
@@ -8389,9 +8389,7 @@ fold_array_ctor_reference (tree type, tr
return NULL_TREE;
  *suboff += access_index.to_uhwi () * BITS_PER_UNIT;
  unsigned o = (access_index - wi::to_offset (elt->index)).to_uhwi ();
- return build_int_cst (TREE_TYPE (val),
-   ((const unsigned char *)
-RAW_DATA_POINTER (val))[o]);
+ return build_int_cst (TREE_TYPE (val), RAW_DATA_UCHAR_ELT (val, o));
}
   if (!size && TREE_CODE (val) != CONSTRUCTOR)
{
--- gcc/tree-pretty-print.cc.jj 2024-11-23 13:00:31.386984106 +0100
+++ gcc/tree-pretty-print.cc2024-12-05 19:06:56.434165047 +0100
@@ -2625,11 +2625,9 @@ dump_generic_node (pretty_printer *pp, t
{
  if (TYPE_UNSIGNED (TREE_TYPE (node))
  || TYPE_PRECISION (TREE_TYPE (node)) > CHAR_BIT)
-   pp_decimal_int (pp, ((const unsigned char *)
-RAW_DATA_POINTER (node))[i]);
+   pp_decimal_int (pp, RAW_DATA_UCHAR_ELT (node, i));
  else
-   pp_decimal_int (pp, ((const signed char *)
-RAW_DATA_POINTER (node))[i]);
+   pp_decimal_int (pp, RAW_DATA_SCHAR_ELT (node, i));
  if (i == RAW_DATA_LENGTH (node) - 1U)
break;
  else if (i == 9 && RAW_DATA_LENGTH (node) > 20)
--- gcc/fold-const.cc.jj2024-11-11 09:51:41.418810377 +0100
+++ gcc/fold-const.cc   2024-12-05 19:04:50.200932753 +0100
@@ -14001,8 +14001,7 @@ fold (tree expr)
 - wi::to_offset (CONSTRUCTOR_ELT (op0, idx)->index));
gcc_checking_assert (o < RAW_DATA_LENGTH (val));
return build_int_cst (TREE_TYPE (val),
- ((const unsigned char *)
-  RAW_DATA_POINTER (val))[o.to_uhwi ()]);
+ RAW_DATA_UCHAR_ELT (val, o.to_uhwi ()));
  }
  }
 
--- gcc/c/c-parser.cc.jj2024-12-05 12:57:34.582801585 +0100
+++ gcc/c/c-parser.cc   2024-12-05 22:01:46.537894285 +0100
@@ -10811,8 +10811,7 @@ c_parser_get_builtin_args (c_parser *par
  for (unsigned int i = 0; i < (unsigned) RAW_DATA_LENGTH (value); i++)
{
  expr.value = build_int_cst (integer_type_node,
- ((const unsigned char *)
-  RAW_DATA_POINTER (value))[i]);
+ RAW_DATA_UCHAR_ELT (value, i));
  vec_safe_push (cexpr_list, expr);
}
  c_parser_consume_token (parser);
@@ -13751,8 +13750,7 @@ c_parser_expression (c_parser *parser)
  tree val = embed->value;
  unsigned last = RAW_DATA_LENGTH (val) - 1;
  next.value = build_int_cst (TREE_TYPE (val),
- 

nvptx: Clarify that our baseline is PTX ISA Version 3.1 (was: [committed][nvptx] Choose -mptx default based on -misa)

2024-12-06 Thread Thomas Schwinge
Hi!

On 2022-02-08T13:57:53+0100, Tom de Vries via Gcc-patches 
 wrote:
> --- a/gcc/config/nvptx/nvptx-opts.h
> +++ b/gcc/config/nvptx/nvptx-opts.h
> @@ -31,7 +31,9 @@ enum ptx_isa
>  
>  enum ptx_version
>  {
> +  PTX_VERSION_3_0,
>PTX_VERSION_3_1,
> +  PTX_VERSION_4_2,
>PTX_VERSION_6_0,
>PTX_VERSION_6_3,
>PTX_VERSION_7_0

Pushed to trunk branch commit 380ceb23b130a2b9ec541607a3eb1ffd0387c576
"nvptx: Clarify that our baseline is PTX ISA Version 3.1", see attached.


Grüße
 Thomas
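
PS: The "no change in behavior" argument of the attached commit can be
checked with a small model.  The sketch below is Python, not the actual
'nvptx.cc' code; the table and function names are made up, versions are
modelled as (major, minor) tuples, and only the two 'sm' entries that
are visible in the diff are included.

```python
# First PTX version reported for each sm, before and after the commit.
OLD_FIRST = {"sm_30": (3, 0), "sm_35": (3, 1)}
NEW_FIRST = {"sm_30": (3, 1), "sm_35": (3, 1)}


def old_default_ptx_version(sm):
    res = OLD_FIRST[sm]
    # Pick at least 3.1 (the clamp that the commit removes).
    res = max(res, (3, 1))
    # Pick at least 6.0.
    res = max(res, (6, 0))
    return res


def new_default_ptx_version(sm):
    res = NEW_FIRST[sm]
    # Pick at least 6.0; the 3.1 clamp is now implied by the table.
    res = max(res, (6, 0))
    return res


for sm in OLD_FIRST:
    print(sm, old_default_ptx_version(sm), new_default_ptx_version(sm))
```

Under these assumptions both variants yield the same default for every
listed 'sm', because 3.0 was always raised to 3.1 before anything else
looked at it.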


From 380ceb23b130a2b9ec541607a3eb1ffd0387c576 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Sun, 10 Nov 2024 17:32:55 +0100
Subject: [PATCH] nvptx: Clarify that our baseline is PTX ISA Version 3.1

Added in commit decde11183bdccc46587d6614b75f3d56a2f2e4a
"[nvptx] Choose -mptx default based on -misa", 'PTX_VERSION_3_0' was added for
'first_ptx_version_supporting_sm' to return it for 'PTX_ISA_SM30' (as
documented by Nvidia).  It's however then immediately overridden to 3.1, which
in GCC/nvptx "has been the smallest version historically", and also '-mptx=3.0'
isn't exposed to the user.  As we also elsewhere (machine description etc.)
assume that our baseline is PTX ISA Version 3.1, there's no real value added in
maintaining 'PTX_VERSION_3_0' for purposes of 'first_ptx_version_supporting_sm'
only.

No change in behavior intended.

	gcc/
	* config/nvptx/nvptx-opts.h (enum ptx_version): Remove
	'PTX_VERSION_3_0'.
	* config/nvptx/nvptx.cc (first_ptx_version_supporting_sm)
	(default_ptx_version_option, ptx_version_to_string)
	(ptx_version_to_number): Adjust.
	* config/nvptx/nvptx.h: Comment.
---
 gcc/config/nvptx/nvptx-opts.h | 4 +++-
 gcc/config/nvptx/nvptx.cc | 9 +
 gcc/config/nvptx/nvptx.h  | 2 ++
 3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/gcc/config/nvptx/nvptx-opts.h b/gcc/config/nvptx/nvptx-opts.h
index fb5147c143e9..d0b47f0aeeff 100644
--- a/gcc/config/nvptx/nvptx-opts.h
+++ b/gcc/config/nvptx/nvptx-opts.h
@@ -30,11 +30,13 @@ enum ptx_isa
 #undef NVPTX_SM
 };
 
+/* 'PTX_VERSION_[...]'s smaller than 'PTX_VERSION_3_1' are not listed here:
+   our baseline is PTX ISA Version 3.1.  */
+
 enum ptx_version
 {
   PTX_VERSION_unset,
   PTX_VERSION_default = PTX_VERSION_unset,
-  PTX_VERSION_3_0,
   PTX_VERSION_3_1,
   PTX_VERSION_4_2,
   PTX_VERSION_6_0,
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 924399f7a35e..fb5a45a18e3c 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -212,7 +212,7 @@ first_ptx_version_supporting_sm (enum ptx_isa sm)
   switch (sm)
 {
 case PTX_ISA_SM30:
-  return PTX_VERSION_3_0;
+  return /* PTX_VERSION_3_0 not defined */ PTX_VERSION_3_1;
 case PTX_ISA_SM35:
   return PTX_VERSION_3_1;
 case PTX_ISA_SM53:
@@ -236,9 +236,6 @@ default_ptx_version_option (void)
   /* Pick a version that supports the sm.  */
   enum ptx_version res = first;
 
-  /* Pick at least 3.1.  This has been the smallest version historically.  */
-  res = MAX (res, PTX_VERSION_3_1);
-
   /* Pick at least 6.0, to enable using bar.warp.sync to have a way to force
  warp convergence.  */
   res = MAX (res, PTX_VERSION_6_0);
@@ -253,8 +250,6 @@ ptx_version_to_string (enum ptx_version v)
 {
   switch (v)
 {
-case PTX_VERSION_3_0:
-  return "3.0";
 case PTX_VERSION_3_1:
   return "3.1";
 case PTX_VERSION_4_2:
@@ -275,8 +270,6 @@ ptx_version_to_number (enum ptx_version v, bool major_p)
 {
   switch (v)
 {
-case PTX_VERSION_3_0:
-  return major_p ? 3 : 0;
 case PTX_VERSION_3_1:
   return major_p ? 3 : 1;
 case PTX_VERSION_4_2:
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index 68ab011c5a16..792da4901d22 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -88,6 +88,8 @@
 
 #include "nvptx-gen.h"
 
+/* There are no 'TARGET_PTX_3_1' and smaller conditionals: our baseline is
+   PTX ISA Version 3.1.  */
 #define TARGET_PTX_6_0 (ptx_version_option >= PTX_VERSION_6_0)
 #define TARGET_PTX_6_3 (ptx_version_option >= PTX_VERSION_6_3)
 #define TARGET_PTX_7_0 (ptx_version_option >= PTX_VERSION_7_0)
-- 
2.34.1



nvptx: Support '--with-multilib-list' (was: Raise nvptx code generation to default PTX ISA 7.3, sm_52, therefore CUDA 11.3 (released 2021-04))

2024-12-06 Thread Thomas Schwinge
Hi!

On 2024-09-24T18:09:48+0200, I wrote:
> [...] build sm_30 multilib variants (either
> (3a) by default or (3b) upon 'configure'-time request via an additional
> option: '--with-multilib-list=default,sm_30' or similar), and the user
> builds with '-foffload-options=nvptx-none=-march=sm_30': [...]
>
> [...] I'll be happy to
> implement (3b) if people think that's still helpful.
>
> In fact, (3b) can then generally support 'configure'-time selection of
> further multilib variants to be built (for example,
> '--with-multilib-list=default,sm_30,sm_89') -- but not use them as
> default '-march=[...]', in contrast to what '--with-arch=sm_30' or
> '--with-arch=sm_89' does, for example.  I'll look into that.

Pushed to trunk branch commit 86b3a7532d56f74fcd1c362f2da7f95e8cc4e4a6
"nvptx: Support '--with-multilib-list'", see attached.


Grüße
 Thomas
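
PS: For those who would rather not read the 'config.gcc' shell code in
the attached patch, the list processing boils down to four steps:
expand 'default', put the '--with-arch' variant first, drop duplicates,
and verify each entry.  A rough Python model follows; the function and
constant names are invented for this note, and the set of known
variants is the one hard-coded in the patch at this commit.

```python
KNOWN_SMS = ("sm_30", "sm_35", "sm_53", "sm_70", "sm_75", "sm_80")
MULTILIBS_DEFAULT = "sm_30"


def tm_multilib_config(with_arch, with_multilib_list=""):
    """Model how the patch builds 'TM_MULTILIB_CONFIG'."""
    if with_multilib_list in ("", "no"):
        items = []
    elif with_multilib_list in ("default", "yes"):
        items = ["default"]
    else:
        items = with_multilib_list.replace(",", " ").split()
    # Expand 'default'.
    expanded = [MULTILIBS_DEFAULT if m == "default" else m for m in items]
    # The '--with-arch=[...]' one comes first.
    ordered = [with_arch] + expanded
    # Filter out any duplicates, keeping the first occurrence.
    filtered = []
    for m in ordered:
        if m not in filtered:
            filtered.append(m)
    # Verify.
    for m in filtered:
        if m not in KNOWN_SMS:
            raise ValueError(f"unknown arch: {m}")
    return filtered


print(tm_multilib_config("sm_53", "default,sm_53,sm_80"))
```

Keeping the '--with-arch' variant first matters because 't-nvptx' takes
the first word of the list as 'multilib_options_isa_default'.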


From 86b3a7532d56f74fcd1c362f2da7f95e8cc4e4a6 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 27 Sep 2024 17:44:16 +0200
Subject: [PATCH] nvptx: Support '--with-multilib-list'

No change in behavior unless specifying it.

	gcc/
	* config.gcc: nvptx: Support '--with-multilib-list'.
	* config/nvptx/gen-multilib-matches.sh: Adjust.
	* configure.ac: Likewise.
	* configure: Regenerate.
	* doc/install.texi: Update.
	* doc/invoke.texi: Align.
	* config/nvptx/gen-multilib-matches-tests: Extend.
---
 gcc/config.gcc  |  69 +--
 gcc/config/nvptx/gen-multilib-matches-tests | 121 +++-
 gcc/config/nvptx/gen-multilib-matches.sh|  29 -
 gcc/configure   |   4 +-
 gcc/configure.ac|   2 +-
 gcc/doc/install.texi|  42 ++-
 gcc/doc/invoke.texi |  10 +-
 7 files changed, 242 insertions(+), 35 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index f4ae14c6db2a..6381a5793194 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -5595,21 +5595,70 @@ case "${target}" in
 		;;
 	nvptx-*)
 		supported_defaults=arch
-		TM_MULTILIB_CONFIG=$with_arch
-		#TODO 'sm_[...]' list per 'nvptx-sm.def'.
-		case $with_arch in
-			sm_30 )
-# OK; default.
+
+		nvptx_multilibs_default=sm_30
+
+		case "x${with_multilib_list}" in
+		x | xno)
+			nvptx_multilibs=
+			;;
+		xdefault | xyes)
+			nvptx_multilibs=default
+			;;
+		*)
+			nvptx_multilibs=$with_multilib_list
+			;;
+		esac
+		nvptx_multilibs=`echo $nvptx_multilibs | sed -e 's/,/ /g'`
+		# Expand 'default'.
+		nvptx_multilibs_expanded=
+		for nvptx_multilib in $nvptx_multilibs; do
+			case $nvptx_multilib in
+			default )
+nvptx_multilibs_expanded="$nvptx_multilibs_expanded $nvptx_multilibs_default"
+;;
+			* )
+nvptx_multilibs_expanded="$nvptx_multilibs_expanded $nvptx_multilib"
 ;;
-			sm_35 | sm_53 | sm_70 | sm_75 | sm_80 )
-# OK, but we'd like 'sm_30', too.
-TM_MULTILIB_CONFIG="$TM_MULTILIB_CONFIG sm_30"
+			esac
+		done
+		# The '--with-arch=[...]' one comes first.
+		nvptx_multilibs=$with_arch$nvptx_multilibs_expanded
+		# Filter out any duplicates.
+		nvptx_multilibs_filtered=
+		for nvptx_multilib in $nvptx_multilibs; do
+			case " $nvptx_multilibs_filtered " in
+			*" $nvptx_multilib "* )
+:
+;;
+			* )
+nvptx_multilibs_filtered="$nvptx_multilibs_filtered $nvptx_multilib"
+;;
+			esac
+		done
+		nvptx_multilibs=$nvptx_multilibs_filtered
+		# Verify, and build 'TM_MULTILIB_CONFIG'.
+		TM_MULTILIB_CONFIG=
+		for nvptx_multilib in $nvptx_multilibs; do
+			case $nvptx_multilib in
+			#TODO 'sm_[...]' list per 'nvptx-sm.def'.
+			sm_30 | sm_35 \
+			| sm_53 \
+			| sm_70 | sm_75 \
+			| sm_80 )
+TM_MULTILIB_CONFIG="$TM_MULTILIB_CONFIG $nvptx_multilib"
+;;
+			$with_arch )
+echo "Unknown arch used in --with-arch=$nvptx_multilib" 1>&2
+exit 1
 ;;
 			* )
-echo "Unknown arch used in --with-arch=$with_arch" 1>&2
+echo "Unknown arch used in --with-multilib-list: $nvptx_multilib" 1>&2
 exit 1
 ;;
-		esac
+			esac
+		done
+		TM_MULTILIB_CONFIG=`echo $TM_MULTILIB_CONFIG | sed 's/^ //'`
 		;;
 
 	powerpc*-*-* | rs6000-*-*)
diff --git a/gcc/config/nvptx/gen-multilib-matches-tests b/gcc/config/nvptx/gen-multilib-matches-tests
index c2775f268354..b93369149465 100644
--- a/gcc/config/nvptx/gen-multilib-matches-tests
+++ b/gcc/config/nvptx/gen-multilib-matches-tests
@@ -10,7 +10,7 @@
 # 'CMMC': compute "multilib matches" per the current settings, and compare to the expected.
 
 
-BEGIN '--with-arch=sm_30'
+BEGIN '--with-arch=sm_30', '--with-multilib-list=sm_30'
 SMOID sm_30
 SMOIL sm_30
 AEMM .=misa?sm_30
@@ -21,8 +21,35 @@ AEMM .=misa?sm_75
 AEMM .=misa?sm_80
 CMMC
 
+BEGIN '--with-arch=sm_30', '--with-multilib-list=sm_30,sm_80'
+SMOID sm_30
+SMOIL sm_30 sm_80
+AEMM .=misa?sm_30
+AEMM .=misa?sm_35
+AEMM .=misa?sm_53
+AEMM .=misa?sm_70
+AEMM .=misa?sm_75
+CMMC
+
+BEGIN '--with-arch=sm_30', '--with-multilib-list=sm_30,sm_35,sm_53,sm_70,sm_75,sm_80'
+SMOID sm_30
+SMOIL sm_30 sm_35 sm_53 sm_70 sm_75 sm

nvptx: Expose '-mptx=4.2' (was: [committed][nvptx] Choose -mptx default based on -misa)

2024-12-06 Thread Thomas Schwinge
Hi!

On 2022-02-08T14:56:40+0100, Tom de Vries via Gcc-patches wrote:
> On 2/8/22 14:24, Tobias Burnus wrote:
>> if I understand the patch correctly, -misa=sm_53 -mptx=3.1 will ...

> $ ./gcc.sh ~/hello.c -misa=sm_53 -mptx=3.1
> cc1: error: PTX version (-mptx) needs to be at least 4.2 to support 
> selected -misa (sm_53)

>> I think that's okay but -mptx only supports the values 3.1, 6.3, and 7.0.
>
> I know.  I'm sort of hoping that the new default setting will make using 
> -mptx unnecessary.

>> it only occurs when both are specified in
>> the way shown above. Thus, we can live with that. (Misleading message
>> for odd corner case, only. In particular, I am not sure we really want
>> to add another PTX version...)
>
> Agreed, it's misleading, but I'm hoping people will just specify the sm 
> version.

But there's no reason not to expose '-mptx=4.2' to the user?
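
For illustration, the diagnostic quoted above boils down to a major.minor
comparison between the requested '-mptx' value and the minimum that the
selected '-misa' needs.  A minimal shell sketch follows; 'ptx_version_at_least'
is a hypothetical helper (not GCC code), and the only pairing taken from this
thread is that sm_53 needs PTX 4.2.

```sh
#!/bin/sh
# Hypothetical helper, not part of GCC: succeed if version $1 >= version $2,
# where both are 'MAJOR.MINOR' strings such as '3.1' or '4.2'.
ptx_version_at_least() (
    have_major=${1%%.*}
    have_minor=${1#*.}
    need_major=${2%%.*}
    need_minor=${2#*.}
    if [ "$have_major" -ne "$need_major" ]; then
        [ "$have_major" -gt "$need_major" ]
    else
        [ "$have_minor" -ge "$need_minor" ]
    fi
)

# Per the diagnostic quoted in this thread, sm_53 needs at least PTX 4.2.
for ptx in 3.1 4.2 6.3; do
    if ptx_version_at_least "$ptx" 4.2; then
        echo "-misa=sm_53 -mptx=$ptx: accepted"
    else
        echo "-misa=sm_53 -mptx=$ptx: rejected"
    fi
done
```

The function body runs in a '( ... )' subshell so that the helper variables
don't leak into the caller.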

Going forward, we'll need more PTX ISA versions added at least for
internal use (to correctly describe PTX instructions etc.), and I don't
want to spend time to decide which of these to expose to the user and
which not to.  Therefore, let's just expose all of them.  I've pushed
to trunk branch commit 1af83aa09979e5f2ca36f844d56ccd629268057d
"nvptx: Expose '-mptx=4.2'", see attached.


Grüße
 Thomas


From 1af83aa09979e5f2ca36f844d56ccd629268057d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Sun, 10 Nov 2024 17:35:07 +0100
Subject: [PATCH] nvptx: Expose '-mptx=4.2'

'PTX_VERSION_4_2' was added in commit decde11183bdccc46587d6614b75f3d56a2f2e4a
"[nvptx] Choose -mptx default based on -misa" for use with '-march=sm_53'
('first_ptx_version_supporting_sm', 'PTX_ISA_SM53'), as documented by Nvidia.
However, '-mptx=4.2' wasn't exposed to the user, but there's no reason not to.

	gcc/
	* config/nvptx/nvptx.h (TARGET_PTX_4_2): New.
	* config/nvptx/nvptx.opt (Enum(ptx_version)): Add 'EnumValue'
	'4.2' for 'PTX_VERSION_4_2'.
	* doc/invoke.texi (Nvidia PTX Options): Document '-mptx=4.2'.
	gcc/testsuite/
	* gcc.target/nvptx/mptx=4.2.c: New.
---
 gcc/config/nvptx/nvptx.h  |  1 +
 gcc/config/nvptx/nvptx.opt|  3 +++
 gcc/doc/invoke.texi   | 10 +++---
 gcc/testsuite/gcc.target/nvptx/mptx=4.2.c | 19 +++
 4 files changed, 30 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/mptx=4.2.c

diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index 792da4901d22..d9a5e541257d 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -90,6 +90,7 @@
 
 /* There are no 'TARGET_PTX_3_1' and smaller conditionals: our baseline is
PTX ISA Version 3.1.  */
+#define TARGET_PTX_4_2 (ptx_version_option >= PTX_VERSION_4_2)
 #define TARGET_PTX_6_0 (ptx_version_option >= PTX_VERSION_6_0)
 #define TARGET_PTX_6_3 (ptx_version_option >= PTX_VERSION_6_3)
 #define TARGET_PTX_7_0 (ptx_version_option >= PTX_VERSION_7_0)
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 53ddf451836e..408c88354446 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -127,6 +127,9 @@ Known PTX ISA versions (for use with the -mptx= option):
 EnumValue
 Enum(ptx_version) String(3.1) Value(PTX_VERSION_3_1)
 
+EnumValue
+Enum(ptx_version) String(4.2) Value(PTX_VERSION_4_2)
+
 EnumValue
 Enum(ptx_version) String(6.0) Value(PTX_VERSION_6_0)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 33a1b6b7983a..a2234725e671 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -30070,9 +30070,13 @@ capable.  For instance, for @option{-march-map=sm_50} select
 
 @opindex mptx
 @item -mptx=@var{version-string}
-Generate code for the specified PTX ISA version (e.g.@: @samp{7.0}).
-Valid version strings include @samp{3.1}, @samp{6.0}, @samp{6.3}, and
-@samp{7.0}.  The default PTX ISA version is 6.0, unless a higher
+Generate code for the specified PTX ISA version.
+Valid version strings are
+@samp{3.1},
+@samp{4.2},
+@samp{6.0}, @samp{6.3},
+and @samp{7.0}.
+The default PTX ISA version is 6.0, unless a higher
 version is required for specified PTX ISA target architecture via
 option @option{-march=}.
 
diff --git a/gcc/testsuite/gcc.target/nvptx/mptx=4.2.c b/gcc/testsuite/gcc.target/nvptx/mptx=4.2.c
new file mode 100644
index ..e17ee1babf96
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/mptx=4.2.c
@@ -0,0 +1,19 @@
+/* { dg-do assemble } */
+/* { dg-options {-march=sm_30 -mptx=4.2} } */
+/* { dg-additional-options -save-temps } */
+/* { dg-final { scan-assembler-times {(?n)^	\.version	4\.2$} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)^	\.target	sm_30$} 1 } } */
+
+#if __PTX_ISA_VERSION_MAJOR__ != 4
+#error wrong value for __PTX_ISA_VERSION_MAJOR__
+#endif
+
+#if __PTX_ISA_VERSION_MINOR__ != 2
+#error wrong value for __PTX_ISA_VERSION_MINOR__
+#endif
+
+#if __PTX_SM__ != 300
+#error wrong value for __PTX_SM__
+#endif
+
+int dummy;
-- 
2.34.1



'gcc/config/nvptx/gen-multilib-matches.sh': Encapsulate main logic (was: nvptx: Allow '--with-arch' to override the default '-misa' (was: nvptx multilib setup))

2024-12-06 Thread Thomas Schwinge
Hi!

On 2022-06-15T23:18:10+0200, I wrote:
> --- /dev/null
> +++ b/gcc/config/nvptx/gen-multilib-matches.sh
> @@ -0,0 +1,60 @@
> +[...]

Pushed to trunk branch commit b352f89d81bb30dbeb406ff7e4d148e2fb640975
"'gcc/config/nvptx/gen-multilib-matches.sh': Encapsulate main logic",
see attached.
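
For readers following along, the remapping rule that the script implements
(see the comment block in the attached patch) can be sketched in POSIX shell
as below.  This is a hedged restatement, not the committed code:
'print_multilib_matches_sketch' is a made-up name, the function body runs in a
'( ... )' subshell to keep variables from leaking (instead of using 'local'),
and the list of SM numbers is passed in by the caller rather than read from
'nvptx-sm.def'.

```sh
#!/bin/sh
# Sketch only.  Arguments: list of SM numbers (ascending), the default ISA,
# and the list of ISAs that get built as multilibs.
print_multilib_matches_sketch() (
    sms=${1?}
    isa_default=${2?}
    isa_list=${3?}

    matches=
    unset sm_next_lower

    for sm in $sms; do
        if [ "sm_$sm" = "$isa_default" ]; then
            # Remapped to the default variant, which is always built.
            sm_map=.
        else
            case " $isa_list " in
            *" sm_$sm "*)
                # Built as a non-default variant; not remapped.
                sm_map=
                ;;
            *)
                # Remapped to the "next lower" variant; errors out if none.
                sm_map=${sm_next_lower?}
                ;;
            esac
        fi

        if [ -z "$sm_map" ]; then
            sm_next_lower=$sm
        else
            # Output format as required for 'MULTILIB_MATCHES'.
            if [ "$sm_map" = . ]; then
                matches="$matches .=misa?sm_$sm"
            else
                matches="$matches misa?sm_$sm_map=misa?sm_$sm"
            fi
            sm_next_lower=$sm_map
        fi
    done

    echo "$matches"
)

print_multilib_matches_sketch "30 35 53 70 75 80" sm_30 "sm_30 sm_80"
```

With a default of sm_30 and the list 'sm_30 sm_80', every SM below sm_80 is
remapped to '.', and sm_80 is not remapped because it gets built itself.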


Grüße
 Thomas


From b352f89d81bb30dbeb406ff7e4d148e2fb640975 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 2 Dec 2024 16:34:03 +0100
Subject: [PATCH] 'gcc/config/nvptx/gen-multilib-matches.sh': Encapsulate main
 logic

Refactoring for later extension.  No change in behavior intended.

	gcc/
	* config/nvptx/gen-multilib-matches.sh: Encapsulate main logic.
---
 gcc/config/nvptx/gen-multilib-matches.sh | 71 
 1 file changed, 47 insertions(+), 24 deletions(-)

diff --git a/gcc/config/nvptx/gen-multilib-matches.sh b/gcc/config/nvptx/gen-multilib-matches.sh
index a39baee5cd24..e52d57130476 100755
--- a/gcc/config/nvptx/gen-multilib-matches.sh
+++ b/gcc/config/nvptx/gen-multilib-matches.sh
@@ -23,8 +23,7 @@
 set -e
 
 nvptx_sm_def="$1/nvptx-sm.def"
-multilib_options_isa_default=$2
-multilib_options_isa_list=$3
+shift
 
 sms=$(grep ^NVPTX_SM $nvptx_sm_def | sed 's/.*(//;s/,.*//')
 
@@ -33,33 +32,57 @@ sms=$(grep ^NVPTX_SM $nvptx_sm_def | sed 's/.*(//;s/,.*//')
 # ('misa=sm_SM'; thus not remapped), or has to be remapped to the "next lower"
 # variant that does get built.
 
-multilib_matches=
+print_multilib_matches() {
+local sms
+sms=${1?}
+shift
+local multilib_options_isa_default
+multilib_options_isa_default=${1?}
+shift
+local multilib_options_isa_list
+multilib_options_isa_list=${1?}
+shift
+[ $# = 0 ]
 
-# The "lowest" variant has to be built.
-sm_next_lower=INVALID
+local multilib_matches
+multilib_matches=
 
-for sm in $sms; do
-if [ x"sm_$sm" = x"$multilib_options_isa_default" ]; then
-	sm_map=.
-elif expr " $multilib_options_isa_list " : ".* sm_$sm " > /dev/null; then
-	sm_map=
-else
-	sm_map=$sm_next_lower
-fi
+local sm_next_lower
+unset sm_next_lower
 
-if [ x"$sm_map" = x ]; then
-	sm_next_lower=$sm
-else
-	# Output format as required for 'MULTILIB_MATCHES'.
-	if [ x"$sm_map" = x. ]; then
-	multilib_matches_sm=".=misa?sm_$sm"
+local sm
+for sm in $sms; do
+	local sm_map
+	unset sm_map
+	if [ x"sm_$sm" = x"$multilib_options_isa_default" ]; then
+	sm_map=.
+	elif expr " $multilib_options_isa_list " : ".* sm_$sm " > /dev/null; then
+	sm_map=
 	else
-	multilib_matches_sm="misa?sm_$sm_map=misa?sm_$sm"
+	# Assert here that a "next lower" variant is available; the
+	# "lowest" variant always does get built.
+	sm_map=${sm_next_lower?}
 	fi
-	multilib_matches="$multilib_matches $multilib_matches_sm"
 
-	sm_next_lower=$sm_map
-fi
-done
+	if [ x"${sm_map?}" = x ]; then
+	sm_next_lower=$sm
+	else
+	local multilib_matches_sm
+	unset multilib_matches_sm
+	# Output format as required for 'MULTILIB_MATCHES'.
+	if [ x"$sm_map" = x. ]; then
+		multilib_matches_sm=".=misa?sm_$sm"
+	else
+		multilib_matches_sm="misa?sm_$sm_map=misa?sm_$sm"
+	fi
+	multilib_matches="$multilib_matches ${multilib_matches_sm?}"
+
+	sm_next_lower=$sm_map
+	fi
+done
+
+echo "$multilib_matches"
+}
 
+multilib_matches=$(print_multilib_matches "$sms" "$@")
 echo "multilib_matches := $multilib_matches"
-- 
2.34.1



[PATCH] config: nvptx: fix bashisms with gen-copyright.sh use

2024-12-06 Thread Sam James
Passing arguments to `.` when sourcing a file is a bashism that POSIX shell
does not support, which causes a build failure when building a toolchain
for nvptx-none with dash as /bin/sh.

gen-copyright.sh takes a parameter for the format of the copyright notice
required. Switch that to using an environment variable `NVPTX_GEN_COPYRIGHT`,
although this could be changed to a function if desired (just more churn
in gen-copyright.sh then).
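
A minimal sketch of the environment-variable approach, for illustration only:
`demo-copyright.sh` and `DEMO_STYLE` are hypothetical stand-ins for
gen-copyright.sh and `NVPTX_GEN_COPYRIGHT`, and the two wrapper functions exist
only to make the behavior easy to observe.

```sh
#!/bin/sh
# Sketch only: 'demo-copyright.sh' and DEMO_STYLE are hypothetical stand-ins
# for 'gen-copyright.sh' and NVPTX_GEN_COPYRIGHT.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT

cat > "$demo_dir/demo-copyright.sh" <<'EOF'
# Use '$1' if the sourcing shell has one, else fall back to the environment.
style="${1:-${DEMO_STYLE}}"
echo "style=$style"
EOF

# Portable form: nothing follows the file name; the value is passed as a
# variable assignment in front of the '.' command.
source_without_args() (
    set --
    DEMO_STYLE=opt . "$demo_dir/demo-copyright.sh"
)

# Caveat: a dot script sees the positional parameters of whoever sources it,
# so a caller that has its own '$1' shadows the environment fallback.
source_with_caller_args() (
    set -- some/dir
    DEMO_STYLE=opt . "$demo_dir/demo-copyright.sh"
)

source_without_args        # prints: style=opt
source_with_caller_args    # prints: style=some/dir
```

The second function shows why a `${1:-...}` fallback needs care: if the
sourcing script was itself invoked with an argument, the sourced file picks
that up instead of the environment variable.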

gcc/ChangeLog:
PR target/117854

* config/nvptx/gen-copyright.sh: Read NVPTX_GEN_COPYRIGHT envvar.
* config/nvptx/gen-h.sh: Set NVPTX_GEN_COPYRIGHT.
* config/nvptx/gen-opt.sh: Ditto.
---
Testing it now with a build for nvptx-none. Is this approach OK or
would you prefer the function approach (which will make the diff larger
because of reformatting)?

 gcc/config/nvptx/gen-copyright.sh | 2 +-
 gcc/config/nvptx/gen-h.sh | 2 +-
 gcc/config/nvptx/gen-opt.sh   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/nvptx/gen-copyright.sh b/gcc/config/nvptx/gen-copyright.sh
index d0a86acb832c..b140de0eb76d 100644
--- a/gcc/config/nvptx/gen-copyright.sh
+++ b/gcc/config/nvptx/gen-copyright.sh
@@ -18,7 +18,7 @@
 # along with GCC; see the file COPYING3.  If not see
 # .
 
-style="$1"
+style="${1:-${NVPTX_GEN_COPYRIGHT}}"
 case $style in
 opt)
 ;;
diff --git a/gcc/config/nvptx/gen-h.sh b/gcc/config/nvptx/gen-h.sh
index ea75e127cdeb..beafd5a9d2c4 100644
--- a/gcc/config/nvptx/gen-h.sh
+++ b/gcc/config/nvptx/gen-h.sh
@@ -32,7 +32,7 @@ EOF
 # Separator.
 echo
 
-. $gen_copyright_sh c
+NVPTX_GEN_COPYRIGHT=c . $gen_copyright_sh
 
 # Separator.
 echo
diff --git a/gcc/config/nvptx/gen-opt.sh b/gcc/config/nvptx/gen-opt.sh
index 6022f51f8975..267a5005f66b 100644
--- a/gcc/config/nvptx/gen-opt.sh
+++ b/gcc/config/nvptx/gen-opt.sh
@@ -36,7 +36,7 @@ EOF
 # Separator.
 echo
 
-. $gen_copyright_sh opt
+NVPTX_GEN_COPYRIGHT=opt . $gen_copyright_sh
 
 # Not emitting the following here (in addition to having it in 'nvptx.opt'), as
 # we'll otherwise run into:
-- 
2.47.1



Re: [PATCH] Fix inaccuracy in cunroll/cunrolli when considering what's innermost loop.

2024-12-06 Thread Richard Biener
On Thu, 5 Dec 2024, liuhongt wrote:

> r15-919-gef27b91b62c3aa removed 1 / 3 size reduction for innermost
> loop, but it doesn't accurately remember what's "innermost" for 2
> testcases in PR117888.
> 
> 1) For pass_cunroll, the "innermost" loop could be an originally outer
> loop with inner loop completely unrolled by cunrolli. The patch moves
> local variable cunrolli to parameter of tree_unroll_loops_completely
> and passes it directly from execute of the pass.
> 
> 2) For pass_cunrolli, cunrolli is set to false when the sibling loop
> of an innermost loop is completely unrolled, and it inaccurately
> takes the innermost loop as an "outer" loop. The patch adds another
> parameter visited_innermost to help recognize the innermost loop.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/117888
>   * tree-ssa-loop-ivcanon.cc (try_unroll_loop_completely): Add
>   a new parameter visited_innermost to avoid inaccurately taking
>   innermost as an outer loop after its sibling loop is
>   completely unrolled.
>   (canonicalize_loop_induction_variables): Ditto.
>   (canonicalize_induction_variables): Ditto.
>   (tree_unroll_loops_completely_1):  Ditto.
>   (tree_unroll_loops_completely): Move local variable cunrolli
>   to parameter to indicate it's from pass cunrolli.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr117888-2.c: New test.
>   * gcc.dg/vect/pr117888-1.c: Ditto.
> ---
>  gcc/testsuite/gcc.dg/pr117888-2.c  | 38 ++
>  gcc/testsuite/gcc.dg/vect/pr117888-1.c | 71 ++
>  gcc/tree-ssa-loop-ivcanon.cc   | 50 +-
>  3 files changed, 145 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr117888-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr117888-1.c
> 
> diff --git a/gcc/testsuite/gcc.dg/pr117888-2.c b/gcc/testsuite/gcc.dg/pr117888-2.c
> new file mode 100644
> index 000..1749f1509a6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr117888-2.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -funroll-loops -fno-tree-vectorize -fdump-tree-cunroll-details" } */
> +/* { dg-additional-options "--param max-completely-peeled-insns=200" { target powerpc64*-*-* } } */
> +
> +typedef struct {
> +  double real;
> +  double imag;
> +} complex;
> +
> +typedef struct { complex e[3][3]; } su3_matrix;
> +
> +void mult_su3_nn( su3_matrix *a, su3_matrix *b, su3_matrix *c )
> +{
> +  int i,j;
> +  double t,ar,ai,br,bi,cr,ci;
> +  for(i=0;i<3;i++)
> +for(j=0;j<3;j++){
> +
> +  ar=a->e[i][0].real; ai=a->e[i][0].imag;
> +  br=b->e[0][j].real; bi=b->e[0][j].imag;
> +  cr=ar*br; t=ai*bi; cr -= t;
> +  ci=ar*bi; t=ai*br; ci += t;
> +
> +  ar=a->e[i][1].real; ai=a->e[i][1].imag;
> +  br=b->e[1][j].real; bi=b->e[1][j].imag;
> +  t=ar*br; cr += t; t=ai*bi; cr -= t;
> +  t=ar*bi; ci += t; t=ai*br; ci += t;
> +
> +  ar=a->e[i][2].real; ai=a->e[i][2].imag;
> +  br=b->e[2][j].real; bi=b->e[2][j].imag;
> +  t=ar*br; cr += t; t=ai*bi; cr -= t;
> +  t=ar*bi; ci += t; t=ai*br; ci += t;
> +
> +  c->e[i][j].real=cr;
> +  c->e[i][j].imag=ci;
> +}
> +}
> +/* { dg-final { scan-tree-dump-times "optimized: loop with 2 iterations completely unrolled" 1 "cunroll" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/pr117888-1.c b/gcc/testsuite/gcc.dg/vect/pr117888-1.c
> new file mode 100644
> index 000..4796a7c83c1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr117888-1.c
> @@ -0,0 +1,71 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -funroll-loops -fdump-tree-vect-details" } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target vect_shift } */
> +/* { dg-additional-options "-mavx2" { target x86_64-*-* i?86-*-* } } */
> +/* { dg-additional-options "--param max-completely-peeled-insns=200" { target powerpc64*-*-* } } */
> +
> +typedef unsigned short ggml_fp16_t;
> +static float table_f32_f16[1 << 16];
> +
> +inline static float ggml_lookup_fp16_to_fp32(ggml_fp16_t f) {
> +  unsigned short s;
> +  __builtin_memcpy(&s, &f, sizeof(unsigned short));
> +  return table_f32_f16[s];
> +}
> +
> +typedef struct {
> +  ggml_fp16_t d;
> +  ggml_fp16_t m;
> +  unsigned char qh[4];
> +  unsigned char qs[32 / 2];
> +} block_q5_1;
> +
> +typedef struct {
> +  float d;
> +  float s;
> +  char qs[32];
> +} block_q8_1;
> +
> +void ggml_vec_dot_q5_1_q8_1(const int n, float * restrict s, const void * restrict vx, const void * restrict vy) {
> +  const int qk = 32;
> +  const int nb = n / qk;
> +
> +  const block_q5_1 * restrict x = vx;
> +  const block_q8_1 * restrict y = vy;
> +
> +  float sumf = 0.0;
> +
> +  for (int i = 0; i < nb; i++) {
> +unsigned qh;
> +__builtin_memcpy(&qh, x[i].qh, sizeof(qh));
> +
> +int sumi = 0;
> +
> +if (qh) {
> +  for (int j = 0; j < qk/2; ++j) {
>

[PUSHED] 'gcc/config/nvptx/gen-*.sh': Simplify interface

2024-12-06 Thread Thomas Schwinge
What we currently pass in as '$1' is simply 'dirname "$0"'.
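
A minimal sketch of computing the directory locally; `script_dir_of` is a
hypothetical wrapper used here only so the behavior can be shown for explicit
paths, whereas the scripts themselves just call 'dirname "$0"' directly.

```sh
#!/bin/sh
# Sketch only: 'script_dir_of' is a made-up helper name.
# '--' guards against a path that starts with '-'.
script_dir_of() {
    dirname -- "$1"
}

# In a script run as '$(SHELL) path/to/gen-h.sh', '$0' is that path.
nvptx_dir=$(script_dir_of "$0")
echo "would read: $nvptx_dir/nvptx-sm.def"
```

This only works for scripts that are executed, where '$0' names the script
itself; in a file that is sourced with '.', '$0' still names the sourcing
script.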

gcc/
* config/nvptx/gen-h.sh: Don't pass in '$1'; compute it locally.
* config/nvptx/gen-multilib-matches.sh: Likewise.
* config/nvptx/gen-omp-device-properties.sh: Likewise.
* config/nvptx/gen-opt.sh: Likewise.
* config/nvptx/t-nvptx (s-nvptx-gen-h:, s-nvptx-gen-opt:)
(t-nvptx-gen-multilib-matches:): Adjust.
* config/nvptx/t-omp-device (omp-device-properties-nvptx):
Likewise.
---
 gcc/config/nvptx/gen-h.sh | 8 ++--
 gcc/config/nvptx/gen-multilib-matches.sh  | 8 ++--
 gcc/config/nvptx/gen-omp-device-properties.sh | 6 +-
 gcc/config/nvptx/gen-opt.sh   | 8 ++--
 gcc/config/nvptx/t-nvptx  | 5 ++---
 gcc/config/nvptx/t-omp-device | 2 +-
 6 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/gcc/config/nvptx/gen-h.sh b/gcc/config/nvptx/gen-h.sh
index ea75e127cdeb..bc4ce9af1e2a 100644
--- a/gcc/config/nvptx/gen-h.sh
+++ b/gcc/config/nvptx/gen-h.sh
@@ -18,8 +18,12 @@
 # along with GCC; see the file COPYING3.  If not see
 # .
 
-nvptx_sm_def="$1/nvptx-sm.def"
-gen_copyright_sh="$1/gen-copyright.sh"
+
+nvptx_dir=$(dirname "$0")
+
+
+nvptx_sm_def="$nvptx_dir/nvptx-sm.def"
+gen_copyright_sh="$nvptx_dir/gen-copyright.sh"
 
 sms=$(grep ^NVPTX_SM $nvptx_sm_def | sed 's/.*(//;s/,.*//')
 
diff --git a/gcc/config/nvptx/gen-multilib-matches.sh b/gcc/config/nvptx/gen-multilib-matches.sh
index e52d57130476..09761a9e6907 100755
--- a/gcc/config/nvptx/gen-multilib-matches.sh
+++ b/gcc/config/nvptx/gen-multilib-matches.sh
@@ -22,11 +22,15 @@
 
 set -e
 
-nvptx_sm_def="$1/nvptx-sm.def"
-shift
+
+nvptx_dir=$(dirname "$0")
+
+
+nvptx_sm_def="$nvptx_dir/nvptx-sm.def"
 
 sms=$(grep ^NVPTX_SM $nvptx_sm_def | sed 's/.*(//;s/,.*//')
 
+
 # Every variant in 'sms' has to either be remapped to the default variant
 # ('.', which is always built), or does get built as non-default variant
 # ('misa=sm_SM'; thus not remapped), or has to be remapped to the "next lower"
diff --git a/gcc/config/nvptx/gen-omp-device-properties.sh b/gcc/config/nvptx/gen-omp-device-properties.sh
index 3666f9746d1a..5995d49ed72c 100644
--- a/gcc/config/nvptx/gen-omp-device-properties.sh
+++ b/gcc/config/nvptx/gen-omp-device-properties.sh
@@ -18,7 +18,11 @@
 # along with GCC; see the file COPYING3.  If not see
 # .
 
-nvptx_sm_def="$1/nvptx-sm.def"
+
+nvptx_dir=$(dirname "$0")
+
+
+nvptx_sm_def="$nvptx_dir/nvptx-sm.def"
 
 sms=$(grep ^NVPTX_SM $nvptx_sm_def | sed 's/.*(//;s/,.*//')
 
diff --git a/gcc/config/nvptx/gen-opt.sh b/gcc/config/nvptx/gen-opt.sh
index 6022f51f8975..103bcddc02b5 100644
--- a/gcc/config/nvptx/gen-opt.sh
+++ b/gcc/config/nvptx/gen-opt.sh
@@ -18,8 +18,12 @@
 # along with GCC; see the file COPYING3.  If not see
 # .
 
-nvptx_sm_def="$1/nvptx-sm.def"
-gen_copyright_sh="$1/gen-copyright.sh"
+
+nvptx_dir=$(dirname "$0")
+
+
+nvptx_sm_def="$nvptx_dir/nvptx-sm.def"
+gen_copyright_sh="$nvptx_dir/gen-copyright.sh"
 
 sms=$(grep ^NVPTX_SM $nvptx_sm_def | sed 's/.*(//;s/,.*//')
 
diff --git a/gcc/config/nvptx/t-nvptx b/gcc/config/nvptx/t-nvptx
index 6c6a6329f0f8..00a7b15496e0 100644
--- a/gcc/config/nvptx/t-nvptx
+++ b/gcc/config/nvptx/t-nvptx
@@ -16,7 +16,7 @@ mkoffload$(exeext): mkoffload.o collect-utils.o libcommon-target.a $(LIBIBERTY)
 $(srcdir)/config/nvptx/nvptx.h: $(srcdir)/config/nvptx/nvptx-gen.h
 $(srcdir)/config/nvptx/nvptx-gen.h: s-nvptx-gen-h; @true
 s-nvptx-gen-h: $(srcdir)/config/nvptx/nvptx-sm.def
-   $(SHELL) $(srcdir)/config/nvptx/gen-h.sh "$(srcdir)/config/nvptx" \
+   $(SHELL) $(srcdir)/config/nvptx/gen-h.sh \
  > tmp-nvptx-gen.h
$(SHELL) $(srcdir)/../move-if-change \
  tmp-nvptx-gen.h $(srcdir)/config/nvptx/nvptx-gen.h
@@ -25,7 +25,7 @@ s-nvptx-gen-h: $(srcdir)/config/nvptx/nvptx-sm.def
 $(srcdir)/config/nvptx/nvptx-gen.opt: s-nvptx-gen-opt; @true
 s-nvptx-gen-opt: $(srcdir)/config/nvptx/nvptx-sm.def \
   $(srcdir)/config/nvptx/gen-opt.sh
-   $(SHELL) $(srcdir)/config/nvptx/gen-opt.sh "$(srcdir)/config/nvptx" \
+   $(SHELL) $(srcdir)/config/nvptx/gen-opt.sh \
  > tmp-nvptx-gen.opt
$(SHELL) $(srcdir)/../move-if-change \
  tmp-nvptx-gen.opt $(srcdir)/config/nvptx/nvptx-gen.opt
@@ -49,7 +49,6 @@ t-nvptx-gen-multilib-matches: $(srcdir)/config/nvptx/gen-multilib-matches.sh \
   Makefile \
   $(srcdir)/config/nvptx/nvptx-sm.def
$(SHELL) $< \
- $(dir $<) \
  $(multilib_options_isa_default) \
  '$(multilib_options_isa_list)' \
  > $@
diff --git a/gcc/config/nvptx/t-omp-device b/gcc/config/nvptx/t-omp-device
index c2b28a41ee41..67852f4b 100644
--- a/gcc/config/nvptx/t-omp-device
+++ b/gcc/config/nvptx/t-omp-device
@@ -1,3 +1,3 @@
 omp-device-properties-nvptx: $(srcdir)/config/nvptx/nvptx-sm.def
$(SHELL) $(s

Re: [PATCH] Use new RAW_DATA_{U,S}CHAR_ELT macros in the middle-end and C FE

2024-12-06 Thread Richard Biener
On Fri, 6 Dec 2024, Jakub Jelinek wrote:

> Hi!
> 
> During the patch review of the C++ #embed optimization, Jason asked for
> a macro for the common
> ((const unsigned char *) RAW_DATA_POINTER (value))[i]
> and ditto with signed char patterns which appear in a lot of places.
> In the just committed patch I've added
> +#define RAW_DATA_UCHAR_ELT(NODE, I) \
> +  (((const unsigned char *) RAW_DATA_POINTER (NODE))[I])
> +#define RAW_DATA_SCHAR_ELT(NODE, I) \
> +  (((const signed char *) RAW_DATA_POINTER (NODE))[I])
> macros for that in tree.h.
> 
> The following patch is just a cleanup to use those macros where appropriate.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2024-12-06  Jakub Jelinek  
> 
> gcc/
>   * gimplify.cc (gimplify_init_ctor_eval): Use RAW_DATA_UCHAR_ELT
>   macro.
>   * gimple-fold.cc (fold_array_ctor_reference): Likewise.
>   * tree-pretty-print.cc (dump_generic_node): Use RAW_DATA_UCHAR_ELT
>   and RAW_DATA_SCHAR_ELT macros.
>   * fold-const.cc (fold): Use RAW_DATA_UCHAR_ELT macro.
> gcc/c/
>   * c-parser.cc (c_parser_get_builtin_args, c_parser_expression,
>   c_parser_expr_list): Use RAW_DATA_UCHAR_ELT macro.
>   * c-typeck.cc (digest_init): Use RAW_DATA_UCHAR_ELT and
>   RAW_DATA_SCHAR_ELT macros.
>   (add_pending_init, maybe_split_raw_data): Use RAW_DATA_UCHAR_ELT
>   macro.
> 
> --- gcc/gimplify.cc.jj2024-12-02 11:08:52.077134419 +0100
> +++ gcc/gimplify.cc   2024-12-05 19:05:53.921040446 +0100
> @@ -5533,8 +5533,7 @@ gimplify_init_ctor_eval (tree object, ve
>   tree init
> = build2 (INIT_EXPR, TREE_TYPE (cref), cref,
>   build_int_cst (TREE_TYPE (value),
> -((const unsigned char *)
> - RAW_DATA_POINTER (value))[i]));
> +RAW_DATA_UCHAR_ELT (value, i)));
>   gimplify_and_add (init, pre_p);
>   ggc_free (init);
> }
> --- gcc/gimple-fold.cc.jj 2024-11-30 01:47:37.210351120 +0100
> +++ gcc/gimple-fold.cc2024-12-05 19:05:23.126471679 +0100
> @@ -8389,9 +8389,7 @@ fold_array_ctor_reference (tree type, tr
>   return NULL_TREE;
> *suboff += access_index.to_uhwi () * BITS_PER_UNIT;
> unsigned o = (access_index - wi::to_offset (elt->index)).to_uhwi ();
> -   return build_int_cst (TREE_TYPE (val),
> - ((const unsigned char *)
> -  RAW_DATA_POINTER (val))[o]);
> +   return build_int_cst (TREE_TYPE (val), RAW_DATA_UCHAR_ELT (val, o));
>   }
>if (!size && TREE_CODE (val) != CONSTRUCTOR)
>   {
> --- gcc/tree-pretty-print.cc.jj   2024-11-23 13:00:31.386984106 +0100
> +++ gcc/tree-pretty-print.cc  2024-12-05 19:06:56.434165047 +0100
> @@ -2625,11 +2625,9 @@ dump_generic_node (pretty_printer *pp, t
>   {
> if (TYPE_UNSIGNED (TREE_TYPE (node))
> || TYPE_PRECISION (TREE_TYPE (node)) > CHAR_BIT)
> - pp_decimal_int (pp, ((const unsigned char *)
> -  RAW_DATA_POINTER (node))[i]);
> + pp_decimal_int (pp, RAW_DATA_UCHAR_ELT (node, i));
> else
> - pp_decimal_int (pp, ((const signed char *)
> -  RAW_DATA_POINTER (node))[i]);
> + pp_decimal_int (pp, RAW_DATA_SCHAR_ELT (node, i));
> if (i == RAW_DATA_LENGTH (node) - 1U)
>   break;
> else if (i == 9 && RAW_DATA_LENGTH (node) > 20)
> --- gcc/fold-const.cc.jj  2024-11-11 09:51:41.418810377 +0100
> +++ gcc/fold-const.cc 2024-12-05 19:04:50.200932753 +0100
> @@ -14001,8 +14001,7 @@ fold (tree expr)
>- wi::to_offset (CONSTRUCTOR_ELT (op0, idx)->index));
>   gcc_checking_assert (o < RAW_DATA_LENGTH (val));
>   return build_int_cst (TREE_TYPE (val),
> -   ((const unsigned char *)
> -RAW_DATA_POINTER (val))[o.to_uhwi ()]);
> +   RAW_DATA_UCHAR_ELT (val, o.to_uhwi ()));
> }
> }
>  
> --- gcc/c/c-parser.cc.jj  2024-12-05 12:57:34.582801585 +0100
> +++ gcc/c/c-parser.cc 2024-12-05 22:01:46.537894285 +0100
> @@ -10811,8 +10811,7 @@ c_parser_get_builtin_args (c_parser *par
> for (unsigned int i = 0; i < (unsigned) RAW_DATA_LENGTH (value); i++)
>   {
> expr.value = build_int_cst (integer_type_node,
> -   ((const unsigned char *)
> -RAW_DATA_POINTER (value))[i]);
> +   RAW_DATA_UCHAR_ELT (value, i));
> vec_safe_push (cexpr_list, expr);
>   }
> c_parser_consume_token (parser);
> @@ -13751,8 +13750,7 @@ c_parser_expression (c_parser *parser)
> 

[PATCH] testsuite/117714 - gcc.dg/vect/slp-reduc-4.c FAILs on 32-bit SPARC

2024-12-06 Thread Richard Biener
The testcase tries to ensure we can elide all permutations when
vectorizing a MAX reduction.  For SPARC the issue is that the
MAX reduction isn't supported and since we're trying to fall back
to single-lane SLP the dumps contain VEC_PERM_EXPR for the
interleaving permute lowering.  Before all-SLP that wouldn't
be in the dumps when doing non-SLP, but eventually we'd fail to
vectorize so no VEC_PERM_EXPRs would be in the dumps either.

The following adds vect_no_int_min_max to the set of xfails for
this particular scan as well, like the existing check for vectorizing.

Pushed.

PR testsuite/117714
* gcc.dg/vect/slp-reduc-4.c: Add vect_no_int_min_max to the
XFAIL for the VEC_PERM_EXPR scan.
---
 gcc/testsuite/gcc.dg/vect/slp-reduc-4.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-4.c b/gcc/testsuite/gcc.dg/vect/slp-reduc-4.c
index e2fe01bb13d..23c1a7373d7 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-reduc-4.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-reduc-4.c
@@ -61,5 +61,5 @@ int main (void)
reduction exceeds the number of elements in a 128-bit granule.  */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { ! vect_multiple_sizes } xfail { vect_no_int_min_max || { aarch64_sve && vect_variable_length } } } } } */
 /* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { vect_multiple_sizes && { ! { vect_load_lanes && vect_strided8 } } } } } } */
-/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "vect" { xfail { aarch64_sve && vect_variable_length } } } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "vect" { xfail { { aarch64_sve && vect_variable_length } || vect_no_int_min_max } } } } */
 
-- 
2.43.0


[PUSHED] nvptx: Tag '-misa=[...]', '-mptx=[...]' as 'Negative' of themselves [PR117916]

2024-12-06 Thread Thomas Schwinge
This issue is similar to what a year ago I resolved for GCN in PR112669
"GCN: wrong 'LIBRARY_PATH' in presence of several different '-march=[...]' flags".

Given the current standard nvptx configuration, we get:

$ build-gcc-offload-nvptx-none/gcc/xgcc -print-multi-directory -mptx=6.3
.
$ build-gcc-offload-nvptx-none/gcc/xgcc -print-multi-directory -mptx=3.1
mptx-3.1

... as expected.  The following, however, is not:

$ build-gcc-offload-nvptx-none/gcc/xgcc -print-multi-directory -mptx=3.1 -mptx=6.3
mptx-3.1

This should print '.'.

Or, in a '--with-arch=sm_70' configuration:

$ build-gcc-offload-nvptx-none/gcc/xgcc -print-multi-directory -misa=sm_70
.
$ build-gcc-offload-nvptx-none/gcc/xgcc -print-multi-directory -misa=sm_30
misa-sm_30

... as expected.  The following, however, are not:

$ build-gcc-offload-nvptx-none/gcc/xgcc -print-multi-directory -misa=sm_30 -misa=sm_70
misa-sm_30
$ build-gcc-offload-nvptx-none/gcc/xgcc -print-multi-directory -misa=sm_30 -march=sm_70
misa-sm_30
$ build-gcc-offload-nvptx-none/gcc/xgcc -print-multi-directory -march=sm_30 -march=sm_70
misa-sm_30
$ build-gcc-offload-nvptx-none/gcc/xgcc -print-multi-directory -march=sm_30 -misa=sm_70
misa-sm_30

These should all print '.'.

Even worse:

$ build-gcc-offload-nvptx-none/gcc/xgcc -print-multi-directory -mgomp -mptx=3.1 -mptx=_
.

This should print 'mgomp'.  Otherwise, for OpenMP offloading compilation
the wrong (non-'mgomp') multilib is linked in ('.'), and linking fails
due to 'unresolved symbol __nvptx_uni'.
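
The outcome expected in the examples above is that, among repeated
occurrences of the same option, only the last one counts for multilib
selection.  The following shell sketch illustrates just that expectation; it
is not how the GCC driver is implemented, 'last_option_with_prefix' is a
made-up helper, and it ignores that '-march=' and '-misa=' are treated
alike in the examples above.

```sh
#!/bin/sh
# Illustration only, not how the GCC driver is implemented: print the last
# argument that starts with the given prefix, or nothing if there is none.
last_option_with_prefix() (
    prefix=${1?}
    shift
    last=
    for arg in "$@"; do
        case $arg in
        "$prefix"*) last=$arg ;;
        esac
    done
    [ -z "$last" ] || printf '%s\n' "$last"
)

last_option_with_prefix -mptx= -mgomp -mptx=3.1 -mptx=_
```

For the 'Even worse' command line above, the sketch keeps '-mptx=_' and
leaves '-mgomp' untouched, which matches the expectation that the 'mgomp'
multilib is selected.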

PR target/117916
gcc/
* config/nvptx/nvptx.opt (misa=, mptx=): Tag as 'Negative' of
themselves.
---
 gcc/config/nvptx/nvptx.opt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index c04074052286..53ddf451836e 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -56,7 +56,7 @@ Target Mask(GOMP)
 Generate code for OpenMP offloading: enables -msoft-stack and -muniform-simt.
 
 misa=
-Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) Init(PTX_ISA_unset)
+Target RejectNegative Negative(misa=) ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) Init(PTX_ISA_unset)
 Specify the PTX ISA target architecture to use.
 
 march=
@@ -140,7 +140,7 @@ EnumValue
 Enum(ptx_version) String(_) Value(PTX_VERSION_default)
 
 mptx=
-Target RejectNegative ToLower Joined Enum(ptx_version) Var(ptx_version_option) Init(PTX_VERSION_unset)
+Target RejectNegative Negative(mptx=) ToLower Joined Enum(ptx_version) Var(ptx_version_option) Init(PTX_VERSION_unset)
 Specify the PTX ISA version to use.
 
 minit-regs=
-- 
2.34.1



'gcc/config/nvptx/t-nvptx': Don't use the 'shell' function of 'make' (was: nvptx: Allow '--with-arch' to override the default '-misa' (was: nvptx multilib setup))

2024-12-06 Thread Thomas Schwinge
Hi!

I recently learned that the exit status of the command invoked in a
'Makefile' via '$(shell [...])' effectively gets discarded (unless
explicitly checking the GNU Make 4.2+ '.SHELLSTATUS' variable or jumping
through other hoops).  I was under the assumption that an error in a
'shell' function would cause 'make' to error out, similarly to how it
does in 'Makefile' rules...

I learned this The Hard Way here:

On 2022-06-15T23:18:10+0200, I wrote:
> --- a/gcc/config/nvptx/t-nvptx
> +++ b/gcc/config/nvptx/t-nvptx

> +multilib_matches := $(shell $(srcdir)/config/nvptx/gen-multilib-matches.sh $(srcdir)/config/nvptx $(multilib_options_isa_default) "$(multilib_options_isa_list)")

When recently working on changing nvptx multilib things, and for that
enhancing nvptx' 'gen-multilib-matches.sh', I made an error in there, and
then got confusing behavior in that I could still successfully 'make'
GCC, and my changes "mostly appeared to work as expected", but not quite.
This was due to garbage in 'MULTILIB_MATCHES', caused by a shell syntax
error in 'gen-multilib-matches.sh' -- which '$(shell [...])' swept under
the table.

Pushed to trunk branch commit 490443357668a87e3c322f218873a7649a2552df
"'gcc/config/nvptx/t-nvptx': Don't use the 'shell' function of 'make'",
see attached.


Regards
 Thomas


From 490443357668a87e3c322f218873a7649a2552df Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 2 Dec 2024 15:06:58 +0100
Subject: [PATCH] 'gcc/config/nvptx/t-nvptx': Don't use the 'shell' function of
 'make'

The exit status of the command invoked in a 'Makefile' via '$(shell [...])'
effectively gets discarded (unless explicitly checking the GNU Make 4.2+
'.SHELLSTATUS' variable or jumping through other hoops).  In order to be able
to catch errors in what the 'shell' function invokes, let's make things
explicit: similar to how 'gcc/config/avr/t-avr' is doing with 't-multilib-avr',
for example.

	gcc/
	* config/nvptx/t-nvptx (multilib_matches): Don't use the 'shell'
	function of 'make'.
	* config/nvptx/gen-multilib-matches.sh: Adjust.
---
 gcc/config/nvptx/gen-multilib-matches.sh |  9 +++--
 gcc/config/nvptx/t-nvptx | 14 +-
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/gcc/config/nvptx/gen-multilib-matches.sh b/gcc/config/nvptx/gen-multilib-matches.sh
index 44c758c3b1bf..a39baee5cd24 100755
--- a/gcc/config/nvptx/gen-multilib-matches.sh
+++ b/gcc/config/nvptx/gen-multilib-matches.sh
@@ -33,6 +33,8 @@ sms=$(grep ^NVPTX_SM $nvptx_sm_def | sed 's/.*(//;s/,.*//')
 # ('misa=sm_SM'; thus not remapped), or has to be remapped to the "next lower"
 # variant that does get built.
 
+multilib_matches=
+
 # The "lowest" variant has to be built.
 sm_next_lower=INVALID
 
@@ -50,11 +52,14 @@ for sm in $sms; do
 else
 	# Output format as required for 'MULTILIB_MATCHES'.
 	if [ x"$sm_map" = x. ]; then
-	echo ".=misa?sm_$sm"
+	multilib_matches_sm=".=misa?sm_$sm"
 	else
-	echo "misa?sm_$sm_map=misa?sm_$sm"
+	multilib_matches_sm="misa?sm_$sm_map=misa?sm_$sm"
 	fi
+	multilib_matches="$multilib_matches $multilib_matches_sm"
 
 	sm_next_lower=$sm_map
 fi
 done
+
+echo "multilib_matches := $multilib_matches"
diff --git a/gcc/config/nvptx/t-nvptx b/gcc/config/nvptx/t-nvptx
index 9c5cbda00707..6c6a6329f0f8 100644
--- a/gcc/config/nvptx/t-nvptx
+++ b/gcc/config/nvptx/t-nvptx
@@ -43,12 +43,24 @@ MULTILIB_OPTIONS += mgomp
 multilib_options_isa_list := $(TM_MULTILIB_CONFIG)
 multilib_options_isa_default := $(word 1,$(multilib_options_isa_list))
 multilib_options_misa_list := $(addprefix misa=,$(multilib_options_isa_list))
+
+t-nvptx-gen-multilib-matches: $(srcdir)/config/nvptx/gen-multilib-matches.sh \
+  $(srcdir)/config/nvptx/t-nvptx \
+  Makefile \
+  $(srcdir)/config/nvptx/nvptx-sm.def
+	$(SHELL) $< \
+	  $(dir $<) \
+	  $(multilib_options_isa_default) \
+	  '$(multilib_options_isa_list)' \
+	  > $@
+
+include t-nvptx-gen-multilib-matches
+
 # Add the requested '-misa' variants as a multilib option ('misa=VAR1/misa=VAR2/misa=VAR3' etc.):
 empty :=
 space := $(empty) $(empty)
 MULTILIB_OPTIONS += $(subst $(space),/,$(multilib_options_misa_list))
 # ..., and remap '-misa' variants as appropriate:
-multilib_matches := $(shell $(srcdir)/config/nvptx/gen-multilib-matches.sh $(srcdir)/config/nvptx $(multilib_options_isa_default) "$(multilib_options_isa_list)")
 MULTILIB_MATCHES += $(multilib_matches)
 # ..., and don't actually build what's the default '-misa':
 MULTILIB_EXCEPTIONS += *misa=$(multilib_options_isa_default)*
-- 
2.34.1



Re: 'gcc/config/nvptx/gen-multilib-matches.sh': Support '--selftest'

2024-12-06 Thread Sam James
Sam James  writes:

> Hi!
>
> The script has #!/bin/sh shebang (and hence must have POSIX shell
> compatibility), but the patch introduces uses of the 'local' keyword
> which isn't in POSIX.
>
> While many shells do have the 'local' keyword, its behaviour isn't
> portable across those either, which is why it's likely it'll never
> be added to POSIX :(

BTW, shellcheck catches this, but unfortunately, checkbashisms does not.

>
> thanks,
> sam


Re: [patch,avr] Disable CRC lookup tables

2024-12-06 Thread Georg-Johann Lay

On 06.12.24 at 13:23, Sam James wrote:
> Georg-Johann Lay  writes:
>
>> This patch disables CRC lookup tables which consume quite some RAM.
>
> Given that -foptimize-crc is new, it may be useful to CC the pass
> authors in case they have input.

CCing Mariam Arutunian

>> Ok for trunk?
>>
>> Johann


The problem is not in the new CRC pass, but because AVR is a very
limited hardware.  Because AVR has no linear address space, .rodata
has to be placed in RAM, and most devices have just a few KiB of RAM,
or even less.

An extension PR49857 to put such lookup tables in flash / program memory
has been rejected by the global maintainers as "too specific", ignoring
most of the constraints imposed by the requirement of using named
address-spaces.

The point is that you cannot just put data in flash, one must also use
the correct instructions to access them, which is achieved by means
of avr specific named address-spaces like __flash.

This would require a new target hook like proposed in PR49857, which
could put lookup-tables into a non-generic address-space provided:

*) All respective data is put in the preferred address-space, and

*) All accesses have to use the same address-space as of 1),
independent of what the rest of the code may look like.

To date, only 3 lookup tables generated by GCC meet these criteria:

1) Lookup tables from tree-switch conversion.

2) Lookup tables from the current CRC work

3) vtables.  Though g++ does not, and probably never will, support
named address spaces.

Johann






Re: [PATCH] Fix incorrect line numbers in large files bug#108900

2024-12-06 Thread Lewis Hyatt
On Fri, Dec 6, 2024 at 7:27 AM Sam James  wrote:
>
> Jeremy Bettis  writes:
>
> > Patch to fix known bug from
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108900
> >
> > diff -ur gcc-clean/gcc-14.2.0/libcpp/files.cc gcc-14.2.0/libcpp/files.cc
> > --- gcc-clean/gcc-14.2.0/libcpp/files.cc 2024-08-01 08:17:17.0 +
> > +++ gcc-14.2.0/libcpp/files.cc 2024-10-18 18:42:42.293245597 +
>
> Please ideally use git-send-email and see
> https://gcc.gnu.org/contribute.html#patches wrt ChangeLog format and so on.
>
> > @@ -1005,6 +1005,11 @@
> >  && type < IT_DIRECTIVE_HWM
> >  && (pfile->line_table->highest_location
> >   != LINE_MAP_MAX_LOCATION - 1));
> > +  if (decrement && LINEMAPS_ORDINARY_USED (pfile->line_table)) {
> > +const line_map_ordinary *map = LINEMAPS_LAST_ORDINARY_MAP
> > (pfile->line_table);
> > +if (map && map->start_location == pfile->line_table->highest_location)
> > +  decrement = false;
> > +  }
> >if (decrement)
> >  pfile->line_table->highest_location--;
>
> Note that I suspect this may be fixed by the 64-bit location_t work that
> is ongoing for trunk but it may still be desirable for 14 anyway.

The 64-bit location_t will be merged this weekend BTW. I think that
Jeremy's patch is still good to have on top of that for GCC 15,
though, since technically we will still have the same issue in theory,
just it will apply only to impractically large sources. There is a
plugin-based test in the testsuite that can be used to still test this
change with 64-bit location_t too. (location_overflow_plugin.cc).

-Lewis


[PATCH] RISC-V: optimization on checking certain bits set ((x & mask) == val)

2024-12-06 Thread Oliver Kozul
The patch optimizes code generation for comparisons of the form
(X & C1) == C2 by converting them to (X | ~C1) == (C2 | ~C1).
This helps when C1 is a constant that requires li and addi to be loaded,
while ~C1 requires only a single lui instruction.
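
As a sanity check of the rewrite described above, here is a minimal C sketch
(not part of the patch) that compares both forms.  The function names and the
constants in the usage note are made up for illustration.  It assumes that C2
has no bits set outside C1, which is what the pattern's insn condition appears
to require; without that assumption the two forms can disagree.

```c
#include <assert.h>
#include <stdint.h>

/* Original comparison: (x & c1) == c2.  */
static int
and_form (uint32_t x, uint32_t c1, uint32_t c2)
{
  return (x & c1) == c2;
}

/* Rewritten comparison: (x | ~c1) == (c2 | ~c1).  */
static int
or_form (uint32_t x, uint32_t c1, uint32_t c2)
{
  return (x | ~c1) == (c2 | ~c1);
}

/* Count the x values in [0, 2^20) on which the two forms disagree.
   The range is an arbitrary choice for this sketch.  */
static unsigned
count_mismatches (uint32_t c1, uint32_t c2)
{
  unsigned mismatches = 0;
  for (uint32_t x = 0; x < (UINT32_C (1) << 20); x++)
    if (and_form (x, c1, c2) != or_form (x, c1, c2))
      mismatches++;
  return mismatches;
}

/* Self-check: when c2 has no bits outside c1, the forms must agree.  */
static void
check_rewrite (uint32_t c1, uint32_t c2)
{
  assert ((c2 & ~c1) == 0);
  assert (count_mismatches (c1, c2) == 0);
}
```

For example, check_rewrite (0x5FFF, 0x1DEF) passes, while
count_mismatches (0x5FFF, 0x2000) is nonzero because bit 13 of C2 lies
outside C1: the original comparison is then always false, but the rewritten
one is true for x == 0.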

2024-12-06  Oliver Kozul  

  PR target/114087

gcc/ChangeLog:

  * config/riscv/riscv.md (*lui_constraint_and_to_or): New pattern.

gcc/testsuite/ChangeLog:

  * gcc.target/riscv/pr114087-1.c: New test.



---
 gcc/config/riscv/riscv.md   | 21 +
 gcc/testsuite/gcc.target/riscv/pr114087-1.c | 10 ++
 2 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr114087-1.c

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 3a4cd1d93a0..add31bbf51c 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -858,6 +858,27 @@
   [(set_attr "type" "arith")
(set_attr "mode" "SI")])
 
+(define_insn_and_split "*lui_constraint_and_to_or"
+  [(set (match_operand:ANYI 0 "register_operand" "=r")
+(plus:ANYI (and:ANYI (match_operand:ANYI 1 "register_operand" "r")
+(match_operand 2 "const_int_operand"))
+(match_operand 3 "const_int_operand")))
+(clobber (match_scratch:X 4 "=&r"))]
+  "LUI_OPERAND (INTVAL (operands[2]) + 1)
+  && (INTVAL (operands[2]) & (-INTVAL (operands[3])))
+  == (-INTVAL (operands[3]))"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 4) (match_dup 5))
+   (set (match_dup 0) (ior:X (match_dup 1) (match_dup 4)))
+   (set (match_dup 4) (match_dup 6))
+   (set (match_dup 0) (minus:X (match_dup 0) (match_dup 4)))]
+  {
+operands[5] = GEN_INT (~INTVAL (operands[2]));
+operands[6] = GEN_INT ((~INTVAL (operands[2])) | (-INTVAL (operands[3])));
+  }
+  [(set_attr "type" "arith")])
+
 ;;
 ;;  
 ;;
diff --git a/gcc/testsuite/gcc.target/riscv/pr114087-1.c b/gcc/testsuite/gcc.target/riscv/pr114087-1.c
new file mode 100644
index 000..5e40b5f7b5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr114087-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" } } */
+/* { dg-options "-march=rv64gc -mabi=lp64d" } */
+
+int pred1a(int x) {
+  return ((x & 0x5FFF) == 0x14501DEF);
+}
+
+/* { dg-final { scan-assembler  {or\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+} } } */
-- 
2.43.0


Re: [patch,avr] Disable CRC lookup tables

2024-12-06 Thread Jeff Law




On 12/6/24 5:23 AM, Sam James wrote:
> Georg-Johann Lay  writes:
>
>> This patch disables CRC lookup tables which consume quite some RAM.
>
> Given that -foptimize-crc is new, it may be useful to CC the pass
> authors in case they have input.

I think this is trivially OK for the AVR.  The bigger question is whether
we should do something more general for -Os.


CRC generation through table lookups is going to take more data space. 
You need a 256 byte table for each unique CRC (sizes & polynomial), and 
the code to compute the index into the table can be (from a code size 
standpoint) relatively expensive as well, particularly on the 
micro-controllers if the crc is to be computed in a mode wider than a 
word on the target.
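
To make the data-size cost concrete, here is a rough sketch of a table-driven
CRC next to a bit-serial one.  This is not GCC's generated code; the
polynomial 0x07, the zero initial value, and all names are assumptions chosen
for the example.  For an 8-bit CRC the table is 256 one-byte entries; wider
CRCs need wider entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC8_POLY 0x07u  /* Assumed polynomial, for illustration only.  */

/* Bit-serial CRC-8: no table, eight shift/xor steps per input byte.  */
static uint8_t
crc8_bitwise (const uint8_t *data, size_t len)
{
  uint8_t crc = 0;
  for (size_t i = 0; i < len; i++)
    {
      crc ^= data[i];
      for (int bit = 0; bit < 8; bit++)
        crc = (crc & 0x80u) ? (uint8_t) ((crc << 1) ^ CRC8_POLY)
                            : (uint8_t) (crc << 1);
    }
  return crc;
}

/* The lookup table: 256 bytes of data for this one CRC variant.  */
static uint8_t crc8_table[256];
static int crc8_table_ready;

/* Fill the table with the CRC of every possible single byte.  */
static void
crc8_init_table (void)
{
  for (unsigned v = 0; v < 256; v++)
    {
      uint8_t byte = (uint8_t) v;
      crc8_table[v] = crc8_bitwise (&byte, 1);
    }
  crc8_table_ready = 1;
}

/* Table-driven CRC-8: one load per input byte, at the cost of the table.  */
static uint8_t
crc8_lookup (const uint8_t *data, size_t len)
{
  assert (crc8_table_ready);
  uint8_t crc = 0;
  for (size_t i = 0; i < len; i++)
    crc = crc8_table[crc ^ data[i]];
  return crc;
}
```

After calling crc8_init_table (), both functions return the same value for
any input; the trade-off is the 256 bytes of table data, which on AVR would
end up in RAM as discussed in this thread.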


So I would actually even support a more general "don't optimize CRCs by 
default for -Os".



Jeff


[PATCH 1/2] Refactor final_value_replacement_loop [PR90594]

2024-12-06 Thread Feng Xue OS
This patch refactors final_value_replacement_loop in tree-scalar-evolution.cc,
decomposing it into several relatively independent utility functions so that
part of its functionality can be exported to other modules.

Thanks,
Feng
---
gcc/
PR tree-optimization/90594
* tree-scalar-evolution.cc (simple_scev_final_value): New function.
(apply_scev_final_value_replacement): Likewise.
(final_value_replacement_loop): Call new functions.
---
 gcc/tree-scalar-evolution.cc | 288 ---
 1 file changed, 165 insertions(+), 123 deletions(-)

diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index abb2bad7773..5733632aa78 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -3775,13 +3775,17 @@ analyze_and_compute_bitop_with_inv_effect (class loop* loop, tree phidef,
   return fold_build2 (code1, type, inv, match_op[0]);
 }
 
-/* Do final value replacement for LOOP, return true if we did anything.  */
+/* For induction VALUE of LOOP, return true if its SCEV is simple enough that
+   its final value at loop exit could be directly calculated based on the
+   initial value and loop niter, and this value is recorded in FINAL_VALUE,
+   also set REWRITE_OVERFLOW to true in the case that we need to rewrite the
+   final value to avoid overflow UB when replacement would really happen
+   later.  */
 
-bool
-final_value_replacement_loop (class loop *loop)
+static bool
+simple_scev_final_value (class loop *loop, tree value, tree *final_value,
+bool *rewrite_overflow)
 {
-  /* If we do not know exact number of iterations of the loop, we cannot
- replace the final value.  */
   edge exit = single_exit (loop);
   if (!exit)
 return false;
@@ -3790,100 +3794,170 @@ final_value_replacement_loop (class loop *loop)
   if (niter == chrec_dont_know)
 return false;
 
-  /* Ensure that it is possible to insert new statements somewhere.  */
-  if (!single_pred_p (exit->dest))
-split_loop_exit_edge (exit);
-
-  /* Set stmt insertion pointer.  All stmts are inserted before this point.  */
+  /* TODO: allow float value for fast math.  */
+  if (!POINTER_TYPE_P (TREE_TYPE (value))
+   && !INTEGRAL_TYPE_P (TREE_TYPE (value)))
+return false;
 
   class loop *ex_loop
-= superloop_at_depth (loop,
- loop_depth (exit->dest->loop_father) + 1);
+= superloop_at_depth (loop, loop_depth (exit->dest->loop_father) + 1);
 
-  bool any = false;
-  gphi_iterator psi;
-  for (psi = gsi_start_phis (exit->dest); !gsi_end_p (psi); )
+  bool folded_casts;
+  tree def = analyze_scalar_evolution_in_loop (ex_loop, loop, value,
+  &folded_casts);
+  tree bitinv_def, bit_def;
+  unsigned HOST_WIDE_INT niter_num;
+
+  if (def != chrec_dont_know)
+def = compute_overall_effect_of_inner_loop (ex_loop, def);
+
+  /* Handle bitop with invariant induction expression.
+
+ .i.e
+ for (int i =0 ;i < 32; i++)
+   tmp &= bit2;
+ if bit2 is an invariant in loop which could simple to tmp &= bit2.  */
+  else if ((bitinv_def
+   = analyze_and_compute_bitop_with_inv_effect (loop,
+value, niter)))
+def = bitinv_def;
+
+  /* Handle bitwise induction expression.
+
+ .i.e.
+ for (int i = 0; i != 64; i+=3)
+   res &= ~(1UL << i);
+
+ RES can't be analyzed out by SCEV because it is not polynomially
+ expressible, but in fact final value of RES can be replaced by
+ RES & CONSTANT where CONSTANT all ones with bit {0,3,6,9,... ,63}
+ being cleared, similar for BIT_IOR_EXPR/BIT_XOR_EXPR.  */
+  else if (tree_fits_uhwi_p (niter)
+  && (niter_num = tree_to_uhwi (niter)) != 0
+  && niter_num < TYPE_PRECISION (TREE_TYPE (value))
+  && (bit_def
+  = analyze_and_compute_bitwise_induction_effect (loop, value,
+  niter_num)))
+def = bit_def;
+
+  bool cond_overflow_p;
+  if (!tree_does_not_contain_chrecs (def)
+  || chrec_contains_symbols_defined_in_loop (def, ex_loop->num)
+  /* Moving the computation from the loop may prolong life range
+of some ssa names, which may cause problems if they appear
+on abnormal edges.  */
+  || contains_abnormal_ssa_name_p (def)
+  /* Do not emit expensive expressions.  The rationale is that
+when someone writes a code like
+
+while (n > 45) n -= 45;
+
+he probably knows that n is not large, and does not want it
+to be turned into n %= 45.  */
+  || expression_expensive_p (def, &cond_overflow_p))
+return false;
+
+  *final_value = def;
+
+  if ((folded_casts
+   && ANY_INTEGRAL_TYPE_P (TREE_TYPE (def))
+   && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (def)))
+  || cond_overflow_p)
+*rewrite_overflow = true;
+  else
+*rewrite_overflow = false;
+

Re: GCN: Fix 'real_from_integer' usage (was: [committed, amdgcn] Zero-initialise masked load destinations)

2024-12-06 Thread Thomas Schwinge
Hi Andrew!

On 2024-12-05T15:14:45+0100, I wrote:
> On 2020-01-31T11:20:14+, Andrew Stubbs  wrote:
>> This is one of those things I don't know why we didn't notice sooner. 
>
> ..., and here's another thing I don't know why we didn't notice sooner.
> ;-P (Category: "don't we all love C++?!")
>
>> [...]
>> I also needed a convenient way to create 0.0 vector constants without 
>> uglifying the machine description code, so extending gcn_vec_constant 
>> seemed like a useful place to do it.
>
>> --- a/gcc/config/gcn/gcn.c
>> +++ b/gcc/config/gcn/gcn.c
>> @@ -992,9 +992,19 @@ gcn_vec_constant (machine_mode mode, int a)
>>  return CONST2_RTX (mode);*/
>>  
>>int units = GET_MODE_NUNITS (mode);
>> -  rtx tem = gen_int_mode (a, GET_MODE_INNER (mode));
>> -  rtvec v = rtvec_alloc (units);
>> +  machine_mode innermode = GET_MODE_INNER (mode);
>> +
>> +  rtx tem;
>> +  if (FLOAT_MODE_P (innermode))
>> +{
>> +  REAL_VALUE_TYPE rv;
>> +  real_from_integer (&rv, NULL, a, SIGNED);
>> +  tem = const_double_from_real_value (rv, innermode);
>> +}
>> +  else
>> +tem = gen_int_mode (a, innermode);
>>  
>> +  rtvec v = rtvec_alloc (units);
>>for (int i = 0; i < units; ++i)
>>  RTVEC_ELT (v, i) = tem;
>
> That's apparently not the proper way to use 'real_from_integer'.  Its
> second argument is a 'format_helper', which is a class defined in
> 'gcc/real.h', which has a templated constructor that is meant to receive
> a mode, so instead of 'NULL', this should pass in 'VOIDmode' (correct?).
> Anyway: until recently, this appeared to work (fine?) -- but broke with
> Andrew Pinski's recent commit b3f1b9e2aa079f8ec73e3cb48143a16645c49566
> "build: Remove INCLUDE_MEMORY [PR117737]":
>
> [...]
> In file included from ../../source-gcc/gcc/coretypes.h:507:0,
>  from ../../source-gcc/gcc/config/gcn/gcn.cc:24:
> ../../source-gcc/gcc/real.h: In instantiation of 
> ‘format_helper::format_helper(const T&) [with T = std::nullptr_t]’:
> ../../source-gcc/gcc/config/gcn/gcn.cc:1178:46:   required from here
> ../../source-gcc/gcc/real.h:233:17: error: no match for ‘operator==’ 
> (operand types are ‘std::nullptr_t’ and ‘machine_mode’)
>: m_format (m == VOIDmode ? 0 : REAL_MODE_FORMAT (m))
>  ^
> [...]
>
> Andrew P.'s commit doesn't touch 'gcc/config/gcn/gcn.cc'; the only part
> relevant here -- per my understanding -- should be:
>
> --- gcc/system.h
> +++ gcc/system.h
> @@ -224,0 +225 @@ extern int fprintf_unlocked (FILE *, const char *, ...);
> +# include <memory>
> @@ -761,7 +761,0 @@ private:
> -/* Some of the headers included by <memory> can use "abort" within a
> -   namespace, e.g. "_VSTD::abort();", which fails after we use the
> -   preprocessor to redefine "abort" as "fancy_abort" below.  */
> -
> -#ifdef INCLUDE_MEMORY
> -# include <memory>
> -#endif
>
> In other words, (unconditional) '#include <memory>' appears to preclude
> ability to convert 'NULL' into a mode?  (Or, I'm off-track, of course...)
>
> Either way: OK to push the attached "GCN: Fix 'real_from_integer' usage"
> after testing completes?

No issues found in testing.


Regards
 Thomas


From dfc2a5398979738ae25eb0258bbf64b82df621a5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 5 Dec 2024 14:28:26 +0100
Subject: [PATCH] GCN: Fix 'real_from_integer' usage
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The recent commit b3f1b9e2aa079f8ec73e3cb48143a16645c49566
"build: Remove INCLUDE_MEMORY [PR117737]" exposed an issue in code added in
2020 GCN back end commit 95607c12363712c39345e1d97f2c1aee8025e188
"Zero-initialise masked load destinations"; compilation now fails:

[...]
In file included from ../../source-gcc/gcc/coretypes.h:507:0,
 from ../../source-gcc/gcc/config/gcn/gcn.cc:24:
../../source-gcc/gcc/real.h: In instantiation of ‘format_helper::format_helper(const T&) [with T = std::nullptr_t]’:
../../source-gcc/gcc/config/gcn/gcn.cc:1178:46:   required from here
../../source-gcc/gcc/real.h:233:17: error: no match for ‘operator==’ (operand types are ‘std::nullptr_t’ and ‘machine_mode’)
   : m_format (m == VOIDmode ? 0 : REAL_MODE_FORMAT (m))
 ^
[...]

That's with 'g++ (GCC) 5.5.0', and seen similarly with
'g++ (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0', for example.

	gcc/
	* config/gcn/gcn.cc (gcn_vec_constant): Fix 'real_from_integer'
	usage.
---
 gcc/config/gcn/gcn.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index d078392eeaf1..b60835d8df48 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -1175,7 +1175,7 @@ gcn_vec_constant (machine_mode mode, int a)
   if (FLOAT_MODE_P (innermode))
 {
   REAL_VALUE_TYPE rv;
-  real_from_integer (&rv, NULL, a, SIGNED);
+  real_from_integer (&rv, VOIDmode, a, SIGNED);
   tem = const_doub

Re: [patch,avr] Disable CRC lookup tables

2024-12-06 Thread Sam James
Georg-Johann Lay  writes:

> On 06.12.24 at 13:23, Sam James wrote:
>> Georg-Johann Lay  writes:
>> 
>>> This patch disables CRC lookup tables which consume quite some RAM.
>> Given that -foptimize-crc is new, it may be useful to CC the pass
>> authors in case they have input.
>
> CCing Mariam Arutunian
>
>>> Ok for trunk?
>>>
>>> Johann
>
> The problem is not in the new CRC pass, but because AVR is a very
> limited hardware.  Because AVR has no linear address space, .rodata
> has to be placed in RAM, and most devices have just a few KiB of RAM,
> or even less.

(To be clear, I do accept that -- it's more that it's: a) useful as a
heads-up to people; b) might affect their future testing; c) might mean
we end up talking about whether the patch has issues on other targets).

>
> An extension PR49857 to put such lookup tables in flash / program memory
> has been rejected by the global maintainers as "too specific", ignoring
> most of the constraints imposed by the requirement of using named
> address-spaces.
>
> The point is that you cannot just put data in flash, one must also use
> the correct instructions to access them, which is achieved by means
> of avr specific named address-spaces like __flash.
>
> This would require a new target hook like proposed in PR49857, which
> could put lookup-tables into a non-generic address-space provided:
>
> *) All respective data is put in the preferred address-space, and
>
> *) All accesses have to use the same address-space as of 1),
> independent of what the rest of the code may look like.
>
> To date, only 3 lookup tables generated by GCC meet these criteria:
>
> 1) Lookup tables from tree-switch conversion.
>
> 2) Lookup tables from the current CRC work
>
> 3) vtables.  Though g++ does not, and probably never will, support
> named address spaces.

Thanks for the extra context!

>
> Johann

sam


Re: [PATCH 2/2] Integrate scev-cprop into DCE [PR90594]

2024-12-06 Thread Feng Xue OS
I forgot to attach the patch file.


From: Feng Xue OS 
Sent: Friday, December 6, 2024 9:57 PM
To: gcc-patches@gcc.gnu.org; Richard Biener
Subject: [PATCH 2/2] Integrate scev-cprop into DCE [PR90594]

Currently, whenever it can, scev-cprop unconditionally replaces a loop-closed
SSA name with an expression built from the loop's initial value and niter,
which may cause redundant code generation when all interior computations
related to the IV inside the loop are also necessary.  As an example, for the
case below:

p = init_addr;

for (i = 0; i < N; i++)
  {
p++;
*p = ...;
  }

. = p;

Then scev-cprop would end up with code:

p = init_addr;

for (i = 0; i < N; i++)
  {
p++;
*p = ...;
  }

. = init_addr + N; // Redundant computation

For a bitmask-manipulation loop, it may result in more and costlier
re-evaluation, such as popcount.  To address the issue, we need a mechanism
like the statement necessity propagation used in DCE to figure out whether the
impacted IVs are really needed.  As pointed out by Richard, we could wire
scev-cprop into DCE, which is what this patch does.
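
The redundancy described above can be illustrated with a small C sketch
(hypothetical helper names, not code from the patch): when the loop must run
anyway for its stores, it already yields the final pointer, so recomputing
that pointer as init_addr + N adds work without removing any.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_N = 64 };  /* Arbitrary bound for this sketch.  */

/* Loop form: the final pointer falls out of the increments that the
   stores need anyway.  Writes init_addr[1] .. init_addr[n].  */
static int *
final_by_loop (int *init_addr, size_t n)
{
  int *p = init_addr;
  for (size_t i = 0; i < n; i++)
    {
      p++;
      *p = (int) i;
    }
  return p;
}

/* Closed form: what final value replacement would compute after the
   loop, in addition to the increments above.  */
static int *
final_by_closed_form (int *init_addr, size_t n)
{
  return init_addr + n;
}

/* Return nonzero if both forms yield the same pointer for N iterations.  */
static int
finals_agree (size_t n)
{
  int buf[MAX_N + 1];
  assert (n <= MAX_N);
  return final_by_loop (buf, n) == final_by_closed_form (buf, n);
}
```

Both forms agree for every N within the bound, which is why the replacement
is valid; whether it is profitable depends on whether the in-loop increments
can be removed afterwards.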

One difference is that we retain the scev-cprop pass and extend its option
flag to support both the new behavior (-ftree-scev-cprop[=1], the default) and
the original one (-ftree-scev-cprop=2).  In practice, I think the new way gets
us more compact and faster code in most cases; however, the original handling
may sometimes be better, because replacement can affect folding of statements
following the loop, for example:

p = init_addr;

for (i = 0; i < N; i++)
  {
p++;
*p = ...;
  }

p1 = p;

...

a = p1 - init_addr;  // a = (init_addr + N) - init_addr = N
b = p1 - N;  // b = (init_addr + N) - N = init_addr

It is hard to account for this in a cost model, since a global check for
folding opportunities might be time-consuming, and the above case is not that
common.  Therefore, as a fallback, we keep the original behavior available, so
that users can enable it when a case matches this scenario.

Thanks,
Feng
---
gcc/
PR tree-optimization/90594
* common.opt (ftree-scev-cprop=): New option.
(ftree-scev-cprop): Change it to be alias of ftree-scev-cprop=.
* tree-scalar-evolution.cc (simple_scev_final_value): Make it be
global function.
(apply_scev_final_value_replacement): Likewise.
* tree-scalar-evolution.h (scev_const_prop): Remove declaration.
(simple_scev_final_value): Add new declaration.
(apply_scev_final_value_replacement): Likewise.
* tree-ssa-dce.cc (stmt_stats): Add new field sccp_replaced_phis.
(scev_cprop_entry): New struct.
(scev_cprop_level): New static variable.
(scev_cprop_map): Likewise.
(mark_expr_operand_necessary): New function.
(get_loop_closed_phi_scev_replacement): Likewise.
(propagate_necessity): Change necessity propagation for loop closed
phi when scev-cprop is enabled.
(fold_scev_cprop_entry): New function.
(remove_dead_phis): Rename to replace_or_remove_phis. And do scev
final value replacement for loop closed phi.
(eliminate_unnecessary_stmts): Changed to call replace_or_remove_phis.
(print_stats): Print stats for replaced phi.
(tree_dce_init): Initialize scev_cprop_map.
(tree_dce_done): Delete scev_cprop_map.
(perform_tree_ssa_dce): Make it be global function. Add scev-cprop
specific handling.
* tree-ssa-dce.h (perform_tree_ssa_dce): Add new declaration.
* tree-ssa-loop.cc (pass_scev_cprop::execute): Changed to call
perform_tree_ssa_dce.
---
 gcc/common.opt   |   9 +-
 gcc/tree-scalar-evolution.cc |   6 +-
 gcc/tree-scalar-evolution.h  |   3 +-
 gcc/tree-ssa-dce.cc  | 251 ---
 gcc/tree-ssa-dce.h   |   1 +
 gcc/tree-ssa-loop.cc |   8 +-
 6 files changed, 249 insertions(+), 29 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index a42537c5f1e..98210ed72fd 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3425,8 +3425,15 @@ ftree-vect-loop-version
 Common Ignore
 Does nothing. Preserved for backward compatibility.

+; If this option is 1, only perform scev cprop when all statements to evaluate
+; related IV inside loop could be eliminated, if it is 2, perform scev cprop
+; unconditionally.
+ftree-scev-cprop=
+Common Joined RejectNegative UInteger Var(flag_tree_scev_cprop) Init(1) Optimization IntegerRange(0, 2)
+Enable copy propagation of scalar-evolution information.
+
 ftree-scev-cprop
-Common Var(flag_tree_scev_cprop) Init(1) Optimization
+Common Alias(ftree-scev-cprop=,1,0)
 Enable copy propagation of scalar-evolution information.

 ftrivial-auto-var-init=
diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index 5733632aa78..9e51b18b23

Re: [patch,avr] Disable CRC lookup tables

2024-12-06 Thread Richard Biener
On Fri, Dec 6, 2024 at 2:17 PM Georg-Johann Lay  wrote:
>
> On 06.12.24 at 13:23, Sam James wrote:
> > Georg-Johann Lay  writes:
> >
> >> This patch disables CRC lookup tables which consume quite some RAM.
> >
> > Given that -foptimize-crc is new, it may be useful to CC the pass
> > authors in case they have input.
>
> CCing Mariam Arutunian
>
> >> Ok for trunk?
> >>
> >> Johann
>
> The problem is not in the new CRC pass, but because AVR is a very
> limited hardware.  Because AVR has no linear address space, .rodata
> has to be placed in RAM, and most devices have just a few KiB of RAM,
> or even less.
>
> An extension PR49857 to put such lookup tables in flash / program memory
> has been rejected by the global maintainers as "too specific", ignoring
> most of the constraints imposed by the requirement of using named
> address-spaces.
>
> The point is that you cannot just put data in flash, one must also use
> the correct instructions to access them, which is achieved by means
> of avr specific named address-spaces like __flash.
>
> This would require a new target hook like proposed in PR49857, which
> could put lookup-tables into a non-generic address-space provided:
>
> *) All respective data is put in the preferred address-space, and
>
> *) All accesses have to use the same address-space as of 1),
> independent of what the rest of the code may look like.
>
> To date, only 3 lookup tables generated by GCC meet these criteria:
>
> 1) Lookup tables from tree-switch conversion.
>
> 2) Lookup tables from the current CRC work
>
> 3) vtables.  Though g++ does not, and probably never will, support
> named address spaces.

4) const global data

I think generally this might be a sound optimization - but as you said
it requires generating correct accesses in the first place which also
means once anything takes the address of the data all pointer uses
have to know beforehand.  Thus application is likely quite limited for
4) at least.

Maybe it's possible to perform half of the task by the linker via
relaxation?   Though I can easily guess it's somewhat difficult for AVR.

Richard.

> Johann
>
>
>
>


[PATCH 2/2] Integrate scev-cprop into DCE [PR90594]

2024-12-06 Thread Feng Xue OS
Currently, whenever it can, scev-cprop unconditionally replaces a loop-closed
SSA name with an expression built from the loop's initial value and niter,
which may cause redundant code generation when all interior computations
related to the IV inside the loop are also necessary.  As an example, for the
case below:

p = init_addr;

for (i = 0; i < N; i++)
  {
p++;
*p = ...;
  }

. = p;

Then scev-cprop would end up with code:

p = init_addr;

for (i = 0; i < N; i++)
  {
p++;
*p = ...;
  }

. = init_addr + N; // Redundant computation

For a bitmask-manipulation loop, it may result in more and costlier
re-evaluation, such as popcount.  To address the issue, we need a mechanism
like the statement necessity propagation used in DCE to figure out whether the
impacted IVs are really needed.  As pointed out by Richard, we could wire
scev-cprop into DCE, which is what this patch does.

One difference is that we retain the scev-cprop pass and extend its option
flag to support both the new behavior (-ftree-scev-cprop[=1], the default) and
the original one (-ftree-scev-cprop=2).  In practice, I think the new way gets
us more compact and faster code in most cases; however, the original handling
may sometimes be better, because replacement can affect folding of statements
following the loop, for example:

p = init_addr;

for (i = 0; i < N; i++)
  {
p++;
*p = ...;
  }

p1 = p;

...

a = p1 - init_addr;  // a = (init_addr + N) - init_addr = N
b = p1 - N;  // b = (init_addr + N) - N = init_addr

It is hard to account for this in a cost model, since a global check for
folding opportunities might be time-consuming, and the above case is not that
common.  Therefore, as a fallback, we keep the original behavior available, so
that users can enable it when a case matches this scenario.

Thanks,
Feng
---
gcc/
PR tree-optimization/90594
* common.opt (ftree-scev-cprop=): New option.
(ftree-scev-cprop): Change it to be alias of ftree-scev-cprop=.
* tree-scalar-evolution.cc (simple_scev_final_value): Make it be
global function.
(apply_scev_final_value_replacement): Likewise.
* tree-scalar-evolution.h (scev_const_prop): Remove declaration.
(simple_scev_final_value): Add new declaration.
(apply_scev_final_value_replacement): Likewise.
* tree-ssa-dce.cc (stmt_stats): Add new field sccp_replaced_phis.
(scev_cprop_entry): New struct.
(scev_cprop_level): New static variable.
(scev_cprop_map): Likewise.
(mark_expr_operand_necessary): New function.
(get_loop_closed_phi_scev_replacement): Likewise.
(propagate_necessity): Change necessity propagation for loop closed
phi when scev-cprop is enabled.
(fold_scev_cprop_entry): New function.
(remove_dead_phis): Rename to replace_or_remove_phis. And do scev
final value replacement for loop closed phi.
(eliminate_unnecessary_stmts): Changed to call replace_or_remove_phis.
(print_stats): Print stats for replaced phi.
(tree_dce_init): Initialize scev_cprop_map.
(tree_dce_done): Delete scev_cprop_map.
(perform_tree_ssa_dce): Make it be global function. Add scev-cprop
specific handling.
* tree-ssa-dce.h (perform_tree_ssa_dce): Add new declaration.
* tree-ssa-loop.cc (pass_scev_cprop::execute): Changed to call
perform_tree_ssa_dce.
---
 gcc/common.opt   |   9 +-
 gcc/tree-scalar-evolution.cc |   6 +-
 gcc/tree-scalar-evolution.h  |   3 +-
 gcc/tree-ssa-dce.cc  | 251 ---
 gcc/tree-ssa-dce.h   |   1 +
 gcc/tree-ssa-loop.cc |   8 +-
 6 files changed, 249 insertions(+), 29 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index a42537c5f1e..98210ed72fd 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3425,8 +3425,15 @@ ftree-vect-loop-version
 Common Ignore
 Does nothing. Preserved for backward compatibility.
 
+; If this option is 1, only perform scev cprop when all statements to evaluate
+; related IV inside loop could be eliminated, if it is 2, perform scev cprop
+; unconditionally.
+ftree-scev-cprop=
+Common Joined RejectNegative UInteger Var(flag_tree_scev_cprop) Init(1) Optimization IntegerRange(0, 2)
+Enable copy propagation of scalar-evolution information.
+
 ftree-scev-cprop
-Common Var(flag_tree_scev_cprop) Init(1) Optimization
+Common Alias(ftree-scev-cprop=,1,0)
 Enable copy propagation of scalar-evolution information.
 
 ftrivial-auto-var-init=
diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index 5733632aa78..9e51b18b237 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -3381,7 +3381,7 @@ scev_finalize (void)
 }
 
 /* Returns true if the expression EXPR is considered to be too expensive
-   for scev_const_prop.  Sets *COND_OV

Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?

2024-12-06 Thread Qing Zhao


> On Dec 5, 2024, at 17:31, Martin Uecker  wrote:
> 
> Am Donnerstag, dem 05.12.2024 um 21:09 + schrieb Qing Zhao:
>> 
>>> On Dec 3, 2024, at 10:29, Qing Zhao  wrote:
> 
> 
> 
 
>> 
 
 It would be clearer if you used the syntax ".n", which resembles
 the syntax for designated initializers that is already used
 in initializers to refer to struct members.
 
 constexpr int n;
 struct foo {
 char (*p)[n] __attribute__ ((counted_by (.n)));
 int n;
 };
 
>>> Yes, I agree.
 
> 
> 
>>> 
>>> There is one important additional requirement:
>>> 
>>> x->n, x->p can ONLY be changed by changing the whole structure at 
>>> the same time. 
>>> Otherwise, x->n might not be consistent with x->p.
>> 
>> By itself, this would still not fix the issue I pointed out.
>> 
>> struct foo x;
>> x = .. ; // set the whole structure
>> char *p = x->p;
>> x = ... ; // set the whole structure
>> 
>> What is the bound for 'p' ?  
> 
> Since p was set to the pointer field of the old structure, then the 
> bound of it should be the old bound.
>> With current rules it would be the old bound.
> 
> I thought that this should be the correct behavior, isn’t it?
 
 Yes, sorry, what I meant was "with the current rules it would be
 the *new* bound”.
>>> 
>>> struct foo x;
>>> x=… ;  // set the whole structure 1
>>> char *p = x->p;
>>> x=… ;  // set the whole structure 2
>>> 
>>> In the above, when “set the whole structure 1” happens, x1, x1->n and
>>> x1->p are set at the same time;
>>> After “char *p = x->p;”, the pointer “p” is pointing to “x1->p”, and its
>>> bound is “x1->n”;
>> 
>> I agree.
>>> 
>>> Then, when “set the whole structure 2” happens, x2 is different from x1,
>>> and x2->n and x2->p are set at the same time. The pointer ‘p’ still points
>>> to “x1->p”, therefore its bound should be “x1->n”.
>>> 
>>> So, as long as the whole structure is set at the same time, it should be
>>> fine.
>>> 
>>> Am I missing anything here?
>> 
>> I was talking about the pointer "p" which was obtained before setting the
>> struct the second time in
>> 
>> char *p = x->p;
>> 
>> This pointer is still set to x1->p but the bound refers to x.n which is 
>> now set to x2->n.
> 
> You mean:
> 
> struct foo x;
> x=… ;  // set the whole structure 1
> char *p = x->p;
> x=… ;  // set the whole structure 2
> p[index] = 10;   // at this point, p’s bound is x2->n, not x1->n? 
> 
> Yes, you are right here. 
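
The concern can be modelled in plain C without any attribute. This is only a sketch of the scenario being discussed, not of how GCC implements counted_by; make_foo and bound_via_struct are made-up helpers. The point is that a bound looked up through x at the time of the access reflects the second assignment, while p still refers to the first object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Plain C stand-in for a struct whose pointer field would carry
   counted_by (n); no attribute is used here.  */
struct foo
{
  size_t n;
  char *p;
};

/* Hypothetical helper: build a whole struct value in one step.  */
static struct foo
make_foo (size_t n)
{
  struct foo f = { n, calloc (n, 1) };
  return f;
}

/* Models a bound that is looked up through the struct at the time of
   the access rather than when the pointer was copied.  */
static size_t
bound_via_struct (const struct foo *x)
{
  return x->n;
}

/* Returns 1 if the old pointer ends up paired with the new count.  */
static int
stale_pointer_sees_new_bound (void)
{
  struct foo x = make_foo (10);   /* set the whole structure 1 */
  char *p = x.p;                  /* p refers to the 10-byte object */
  struct foo old = x;
  x = make_foo (20);              /* set the whole structure 2 */

  /* p is unchanged, but a lookup through x now reports 20.  */
  int mismatch = (p == old.p) && bound_via_struct (&x) == 20;

  free (old.p);
  free (x.p);
  return mismatch;
}
```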
> 
> So, is there similar problem with the corresponding language extension? 
> 
 
 The language extension does not exist yet, so there is no problem.
>>> Yeah, I should mention this as “corresponding future language extension” -:)
 
 But I hope we will get it and then specify it so that this works
 correctly without this footgun.
 
 Of course, if GCC gets the "counted_by" attribute wrong, there will
 be arguments later in WG14 why the feature is then different to it.
>>> 
>>> I think that we need to resolve this issue first in the design of 
>>> “counted_by” for pointer fields. 
>>> I guess that we might need to come up with some additional limitations for 
>>> using the “counted_by”
>>> attribute for pointer fields at the source code level in order to avoid 
>>> such potential error.  But not
>>> sure what exactly the additional limitation should be at this moment.
>>> 
>>> Need some study here.
>> 
>> Actually, I found out that this is really not a problem with the current 
>> design, for the following new testing case I added for my current 
>> implementation of the counted_by for pointer field:
>> 
>> [ gcc.dg]$ cat pointer-counted-by-7.c
>> /* Test the attribute counted_by for pointer field and its usage in
>> * __builtin_dynamic_object_size.  */ 
>> /* { dg-do run } */
>> /* { dg-options "-O2" } */
>> 
>> #include "builtin-object-size-common.h"
>> 
>> struct annotated {
>>  int b;
>>  int *c __attribute__ ((counted_by (b)));
>> };
>> 
>> struct annotated *__attribute__((__noinline__)) setup (int attr_count)
>> {
>>  struct annotated *p_array_annotated
>>    = (struct annotated *) malloc (sizeof (struct annotated));
>>  p_array_annotated->c = (int *) malloc (sizeof (int) * attr_count);
>>  p_array_annotated->b = attr_count;
>> 
>>  return p_array_annotated;
>> }
>> 
>> 
>> int main(int argc, char *argv[])
>> {
>>  struct annotated *x = setup (10); 
>>  int *p = x->c;
>>  x = setup (20);
>>  EXPECT(__builtin_dynamic_object_size (p, 1), 10 * sizeof (int));
>>  EXPECT(__builtin_dynamic_object_size (x->c, 1), 20 * sizeof (int));
>>  DONE ();
>> }
>> 
>> This test case pa

Re: [patch,avr] Disable CRC lookup tables

2024-12-06 Thread Georg-Johann Lay

Am 06.12.24 um 14:53 schrieb Richard Biener:

On Fri, Dec 6, 2024 at 2:17 PM Georg-Johann Lay  wrote:


Am 06.12.24 um 13:23 schrieb Sam James:

Georg-Johann Lay  writes:


This patch disables CRC lookup tables which consume quite some RAM.


Given that -foptimize-crc is new, it may be useful to CC the pass
authors in case they have input.


CCing Mariam Arutunian


Ok for trunk?

Johann


The problem is not in the new CRC pass, but that AVR is very limited
hardware.  Because AVR has no linear address space, .rodata
has to be placed in RAM, and most devices have just a few KiB of RAM,
or even less.

An extension PR49857 to put such lookup tables in flash / program memory
has been rejected by the global maintainers as "too specific", ignoring
most of the constraints imposed by the requirement of using named
address-spaces.

The point is that you cannot just put data in flash, one must also use
the correct instructions to access them, which is achieved by means
of avr specific named address-spaces like __flash.

This would require a new target hook like proposed in PR49857, which
could put lookup-tables into a non-generic address-space provided:

*) All respective data is put in the preferred address-space, and

*) All accesses have to use the same address-space as of 1),
independent of what the rest of the code may look like.

To date, only 3 lookup tables generated by GCC meet these criteria:

1) Lookup tables from tree-switch conversion.

2) Lookup tables from the current CRC work

3) vtables.  Though g++ does not, and probably never will, support
named address spaces.


4) const global data


No. That won't work. Suppose:

foo.c:

const int ii = 1;
int inc (const int*);

int func (void)
{
return inc (&ii);
}

bar.c:

int inc (const int *p)
{
return 1 + *p;
}

This works when ii is in generic address-space.  But when you
put ii in a different AS (like e.g. __flash), then inc() will
read from the wrong AS.  You'd need a version of inc() that is

int inc_flash (const __flash int *p)
{
return 1 + *p;
}

TL;DR You don't have control over *all* accesses, but in foo.c
the pointer to ii may escape.
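
The effect can be simulated on any host with two arrays standing in for RAM and flash. This is only a sketch: the arrays, their contents and the load_* helpers are made up, and real AVR code would use __flash-qualified pointers instead. It shows that the same numeric address gives different values depending on which access is used, so a callee compiled for generic pointers reads the wrong data.

```c
#include <assert.h>

/* Two separate memories that share the same numeric addresses, as a
   stand-in for AVR RAM and flash.  The contents are made up.  */
static const unsigned char ram[8]   = { 0, 0, 0, 0, 11, 0, 0, 0 };
static const unsigned char flash[8] = { 0, 0, 0, 0, 22, 0, 0, 0 };

/* Models a load through a generic pointer: always reads RAM.  */
static int
load_generic (unsigned addr)
{
  return ram[addr];
}

/* Models a load through a __flash-qualified pointer.  */
static int
load_flash (unsigned addr)
{
  return flash[addr];
}

/* Like inc () in bar.c: compiled knowing only generic pointers.  */
static int
inc (unsigned addr)
{
  return 1 + load_generic (addr);
}

/* Like inc_flash (): the variant that uses the flash access.  */
static int
inc_flash (unsigned addr)
{
  return 1 + load_flash (addr);
}
```

If the object lives at flash address 4 (value 22), inc (4) still returns 12 from RAM, while inc_flash (4) returns the intended 23.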

Notice that in, say, tree-switch-conversion this is different
because that pass knows *all* accesses to the table, and it
could attach an AS from a target hook to all accesses.

Once that pass has finished, there's no more way to retroactively
optimize this, because addresses may escape to other modules
or to inline asm, or there may be copies of the tree var.
You'd have to find all tree vars and ssa_names.


I think generally this might be a sound optimization - but as you said
it requires generating correct accesses in the first place which also
means once anything takes the address of the data all pointer uses
have to know beforehand.  Thus application is likely quite limited for
4) at least.

Maybe it's possible to perform half of the task by the linker via
relaxation?   Though I can easily guess it's somewhat difficult for AVR.

Richard.


No.  I don't see what the linker has to do with it.  Different ASes
even have different instructions to access them and support different
addressing modes and need different address registers.

Addressing modes are generated by the compiler and
TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P.  The linker cannot fix that.

Johann


Ping [PATCH v3 0/3] Match: support additional cases of unsigned scalar arithmetic

2024-12-06 Thread Akram Ahmad

Ping

On 27/11/2024 20:27, Akram Ahmad wrote:

Hi all,

This patch series adds support for 2 new cases of unsigned scalar saturating
arithmetic (one addition, one subtraction). This results in more valid
patterns being recognised, which results in a call to .SAT_ADD or .SAT_SUB
where relevant.
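
For readers unfamiliar with the operations, here is a generic sketch of unsigned saturating add and subtract in C. These are common textbook spellings with made-up function names. They are not claimed to be the two specific cases added by this series, nor forms that match.pd is known to recognise.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned saturating add: clamp to the maximum on overflow.  */
static uint8_t
sat_u_add_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return sum < x ? UINT8_MAX : sum;
}

/* Unsigned saturating subtract: clamp to zero on underflow.  */
static uint8_t
sat_u_sub_u8 (uint8_t x, uint8_t y)
{
  return x > y ? (uint8_t) (x - y) : 0;
}
```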

v3 of this series now introduces support for dg-require-effective-target for
both usadd and ussub optabs as well as individual modes that these optabs may
be implemented for. aarch64 support for these optabs is in review, so there
are currently no targets listed in these effective-target options.

Regression tests for aarch64 all pass with no failures.

v3 changes:
- add support for new effective-target keywords.
- tests for the two new patterns now use the dg-require-effective-target so
   that they are skipped on relevant targets.

v2 changes:
- add new tests for both patterns (these will fail on targets which don't
   implement the standard insn names for IFN_SAT_ADD and IFN_SAT_SUB; another
   patch series adds support for this in aarch64).
- minor adjustment to the constraints on the match statement for
   usadd_left_part_1.

If this is OK for master, please commit these on my behalf, as I do not have
the ability to do so.

Many thanks,

Akram

---

Akram Ahmad (3):
   testsuite: Support dg-require-effective-target for us{add, sub}
   Match: support new case of unsigned scalar SAT_SUB
   Match: make SAT_ADD case 7 commutative

  gcc/match.pd  | 12 +++-
  .../gcc.dg/tree-ssa/sat-u-add-match-1-u16.c   | 22 
  .../gcc.dg/tree-ssa/sat-u-add-match-1-u32.c   | 22 
  .../gcc.dg/tree-ssa/sat-u-add-match-1-u64.c   | 22 
  .../gcc.dg/tree-ssa/sat-u-add-match-1-u8.c| 22 
  .../gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c   | 15 +
  .../gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c   | 15 +
  .../gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c   | 15 +
  .../gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c| 15 +
  gcc/testsuite/lib/target-supports.exp | 56 +++
  10 files changed, 214 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u8.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c



Re: [PATCH 1/3] tree-optimization/117467 - Do not calculate an entry range for invariant names.

2024-12-06 Thread Andrew MacLeod
I apologize, I could have sworn I checked in this patch set... but I see 
I did not.


Re-bootstrapped and regression tested...  and committed now!  You can
now invoke the range_query in your patches to check if something is
non-zero.


Andrew

On 11/28/24 12:44, Jakub Jelinek wrote:

On Mon, Nov 25, 2024 at 07:55:46PM -0500, Andrew MacLeod wrote:

 From 97bea858ff782dc5c80490bb48cbd3241ad3413c Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 25 Nov 2024 09:50:33 -0500
Subject: [PATCH 1/3] Do not calculate an entry range for invariant names.

If an SSA_NAME is invariant, do not calculate an on_entry value.

PR tree-optimization/117467
* gimple-range-cache.cc (ranger_cache::entry_range): Do not
invoke range_from_dom for invariant ssa-names.

LGTM.

Jakub





Re: [PATCH] expr: Don't clear whole unions [PR116416]

2024-12-06 Thread Marek Polacek
On Mon, Oct 14, 2024 at 03:57:45PM -0400, Jason Merrill wrote:
> OK.

The patch was approved, but even after the r15-5746 + r15-5747 changes,
pr78687.C still FAILs:
.

Perhaps we should XFAIL the test for now then.  No other changes
besides that.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This PR reports a missed optimization.  When we have:

  Str str{"Test"};
  callback(str);

as in the test, we're able to evaluate the Str::Str() call at compile
time.  But when we have:

  callback(Str{"Test"});

we are not.  With this patch (in fact, it's Patrick's patch with a little
tweak), we turn

  callback (TARGET_EXPR >>
(const char *) "Test" )

into

  callback (TARGET_EXPR )

I explored the idea of calling maybe_constant_value for the whole
TARGET_EXPR in cp_fold.  That has three problems:
- we can't always elide a TARGET_EXPR, so we'd have to make sure the
  result is also a TARGET_EXPR;
- the resulting TARGET_EXPR must have the same flags, otherwise Bad
  Things happen;
- getting a new slot is also problematic.  I've seen a test where we
  had "TARGET_EXPR, D.2680", and folding the whole TARGET_EXPR
  would get us "TARGET_EXPR", but since we don't see the outer
  D.2680, we can't replace it with D.2681, and things break.

With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C.

FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted redundant store: .*.a = {}"
is easy.  Previously, we would call C::C, so .gimple has:

  D.2590 = {};
  C::C (&D.2590);
  D.2597 = D.2590;
  return D.2597;

Then .einline inlines the C::C call:

  D.2590 = {};
  D.2590.a = {}; // #1
  D.2590.b = 0;  // #2
  D.2597 = D.2590;
  D.2590 ={v} {CLOBBER(eos)};
  return D.2597;

then #2 is removed in .fre1, and #1 is removed in .dse1.  So the test
passes.  But with the patch, .gimple won't have that C::C call, so the
IL is of course going to look different.  The .optimized dump looks the
same though so there's no problem.

pr78687.C is XFAILed because the test passes with r15-5746 alone, but not
once r15-5747 is applied as well.

PR c++/116416

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold_r) : Try to fold
TARGET_EXPR_INITIAL and replace it with the folded result if
it's TREE_CONSTANT.

gcc/testsuite/ChangeLog:

* g++.dg/analyzer/pr97116.C: Adjust dg-message.
* g++.dg/tree-ssa/pr78687.C: Add XFAIL.
* g++.dg/tree-ssa/pr90883.C: Adjust dg-final.
* g++.dg/cpp0x/constexpr-prvalue1.C: New test.
* g++.dg/cpp1y/constexpr-prvalue1.C: New test.

Co-authored-by: Patrick Palka 
---
 gcc/cp/cp-gimplify.cc | 10 +--
 gcc/testsuite/g++.dg/analyzer/pr97116.C   |  2 +-
 .../g++.dg/cpp0x/constexpr-prvalue1.C | 24 +++
 .../g++.dg/cpp1y/constexpr-prvalue1.C | 30 +++
 gcc/testsuite/g++.dg/tree-ssa/pr78687.C   |  3 +-
 gcc/testsuite/g++.dg/tree-ssa/pr90883.C   |  4 +--
 6 files changed, 67 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-prvalue1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-prvalue1.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index b011badf00f..623e2ee6e96 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1470,13 +1470,19 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
   if (data->flags & ff_genericize)
cp_genericize_target_expr (stmt_p);
 
-  /* Folding might replace e.g. a COND_EXPR with a TARGET_EXPR; in
-that case, strip it in favor of this one.  */
   if (tree &init = TARGET_EXPR_INITIAL (stmt))
{
  cp_walk_tree (&init, cp_fold_r, data, NULL);
  cp_walk_tree (&TARGET_EXPR_CLEANUP (stmt), cp_fold_r, data, NULL);
  *walk_subtrees = 0;
+ if (!flag_no_inline)
+   {
+ tree folded = maybe_constant_init (init, TARGET_EXPR_SLOT (stmt));
+ if (folded != init && TREE_CONSTANT (folded))
+   init = folded;
+   }
+ /* Folding might replace e.g. a COND_EXPR with a TARGET_EXPR; in
+that case, strip it in favor of this one.  */
  if (TREE_CODE (init) == TARGET_EXPR)
{
  tree sub = TARGET_EXPR_INITIAL (init);
diff --git a/gcc/testsuite/g++.dg/analyzer/pr97116.C b/gcc/testsuite/g++.dg/analyzer/pr97116.C
index d8e08a73172..1c404c2ceb2 100644
--- a/gcc/testsuite/g++.dg/analyzer/pr97116.C
+++ b/gcc/testsuite/g++.dg/analyzer/pr97116.C
@@ -16,7 +16,7 @@ struct foo
 void test_1 (void)
 {
   foo *p = new(NULL) foo (42); // { dg-warning "non-null expected" "warning" }
-  // { dg-message "argument 'this' \\(\[^\n\]*\\) NULL where non-null expected" "final event" { target *-*-* } .-1 }
+  // { dg-message "argument 'this'( \\(\[^\n\]*\\))? NULL where non-null expected" "final event" { target *-*-* } .-1 }
 }
 
 int test_2 (void)
diff --git a/gcc/

Re: [PATCH] AIX Build failure with default -std=gnu23.

2024-12-06 Thread David Edelsohn
On Fri, Dec 6, 2024 at 2:17 PM Ian Lance Taylor  wrote:

> David Edelsohn  writes:
>
> > On Fri, Dec 6, 2024 at 12:25 PM Rainer Orth wrote:
> >
> >> Hi David,
> >>
> >> > No objection from me, but Ian is the maintainer of libiberty, so I'll
> >> > defer to him, especially about style and overall software engineering.
> >> >
> >> > The C23 change presumably will break on Alpha OSF/1 as well.  Does GCC
> >> > still support OSF/1?  It might be preferred to delete the block
> >> > entirely instead of #ifndef _AIX.
> >>
> >> GCC 4.7 was the last release to support Tru64 UNIX (ex-OSF/1).  However,
> >> libiberty is also used outside of the toolchain, so that may affect the
> >> decision.
> >>
> >> However, IMO the Tru64 UNIX support can go for good now.
> >>
> >
> > Hi, Rainer
> >
> > Thanks for taking a look and commenting.
> >
> > It seems we both agree that it would be better to remove the entire block
> > defining _NO_PROTO because both of the systems are no longer supported.
> >
> > I'll give Ian the opportunity to comment.
>
> Looks good to me.  Thanks.
>
> Ian
>

Sangamesh,

Can you respin and test a revised patch that removes the conditional
_NO_PROTO definition instead of adding #ifndef _AIX?  I think that is what
Rainer and I would prefer because neither of the OSes is supported and we
don't need a fragile work-around.

Thanks, David


[pushed][PR117248][LRA]: Rewriting reg notes update and fix calculation of conflict hard regs of pseudo.

2024-12-06 Thread Vladimir Makarov

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117248

I tested the patch extensively on x86-64, aarch64, ppc64le as the patch 
contains big changes in live analysis and reg notes update.  I hope it 
will not result in new PRs.


commit 75e7d1600f47859df40b2ac0feff5a71e0dbb040
Author: Vladimir N. Makarov 
Date:   Fri Dec 6 16:16:28 2024 -0500

[PR117248][LRA]: Rewriting reg notes update and fix calculation of conflict hard regs of pseudo.

  LRA updates the conflict hard regs of a pseudo when some hard reg dies.
Complicated PA div/mod insns reference explicit clobbered hard regs and also
use hard regs as operands.  This prevents some hard regs from being marked as
dying although they still conflict with pseudos living through the insn.
Also, on such insns LRA wrongly updates reg notes (REG_DEAD, REG_UNUSED), which
are used later in the rematerialization subpass.  The patch fixes these problems.

gcc/ChangeLog:

PR rtl-optimization/117248
* lra-lives.cc (start_living, start_dying): Remove.
(insn_regnos, out_insn_regnos, insn_regnos_live_after): New.
(sparseset_contains_pseudos_p): Remove.
(make_hard_regno_live, make_hard_regno_dead): Return true if
something in liveness is changed.
(mark_pseudo_live,  mark_pseudo_dead): Ditto.
(mark_regno_live, mark_regno_dead): Ditto.
(clear_sparseset_regnos, regnos_in_sparseset_p): Use set instead
of dead_set.
(process_bb_lives): Rewrite dealing with reg notes.  Update
conflict hard regs even when clobber hard reg is not marked as
dead.
(lra_create_live_ranges_1): Add initialization/finalization of
insn_regnos, out_insn_regnos, insn_regnos_live_after.

diff --git a/gcc/lra-lives.cc b/gcc/lra-lives.cc
index 49134ade713..f1bb5701bc4 100644
--- a/gcc/lra-lives.cc
+++ b/gcc/lra-lives.cc
@@ -83,10 +83,10 @@ static sparseset pseudos_live_through_setjumps;
 /* Set of hard regs (except eliminable ones) currently live.  */
 static HARD_REG_SET hard_regs_live;
 
-/* Set of pseudos and hard registers start living/dying in the current
-   insn.  These sets are used to update REG_DEAD and REG_UNUSED notes
-   in the insn.  */
-static sparseset start_living, start_dying;
+/* Set of pseudos and hard registers in the current insn, only out/inout ones,
+   and the current insn pseudos and hard registers living right after the
+   insn.  */
+static sparseset insn_regnos, out_insn_regnos, insn_regnos_live_after;
 
 /* Set of pseudos and hard regs dead and unused in the current
insn.  */
@@ -227,17 +227,6 @@ enum point_type {
   USE_POINT
 };
 
-/* Return TRUE if set A contains a pseudo register, otherwise, return FALSE.  */
-static bool
-sparseset_contains_pseudos_p (sparseset a)
-{
-  int regno;
-  EXECUTE_IF_SET_IN_SPARSESET (a, regno)
-if (!HARD_REGISTER_NUM_P (regno))
-  return true;
-  return false;
-}
-
 /* Mark pseudo REGNO as living or dying at program point POINT, depending on
whether TYPE is a definition or a use.  If this is the first reference to
REGNO that we've encountered, then create a new live range for it.  */
@@ -276,29 +265,29 @@ update_pseudo_point (int regno, int point, enum point_type type)
 /* The corresponding bitmaps of BB currently being processed.  */
 static bitmap bb_killed_pseudos, bb_gen_pseudos;
 
-/* Record hard register REGNO as now being live.  It updates
-   living hard regs and START_LIVING.  */
-static void
+/* Record hard register REGNO as now being live.  Return true if REGNO liveness
+   changes.  */
+static bool
 make_hard_regno_live (int regno)
 {
   lra_assert (HARD_REGISTER_NUM_P (regno));
   if (TEST_HARD_REG_BIT (hard_regs_live, regno)
   || TEST_HARD_REG_BIT (eliminable_regset, regno))
-return;
+return false;
   SET_HARD_REG_BIT (hard_regs_live, regno);
-  sparseset_set_bit (start_living, regno);
   if (fixed_regs[regno] || TEST_HARD_REG_BIT (hard_regs_spilled_into, regno))
 bitmap_set_bit (bb_gen_pseudos, regno);
+  return true;
 }
 
-/* Process the definition of hard register REGNO.  This updates
-   hard_regs_live, START_DYING and conflict hard regs for living
-   pseudos.  */
-static void
+/* Process the definition of hard register REGNO.  This updates hard_regs_live
+   and conflict hard regs for living pseudos.  Return true if REGNO liveness
+   changes.  */
+static bool
 make_hard_regno_dead (int regno)
 {
   if (TEST_HARD_REG_BIT (eliminable_regset, regno))
-return;
+return false;
 
   lra_assert (HARD_REGISTER_NUM_P (regno));
   unsigned int i;
@@ -306,79 +295,89 @@ make_hard_regno_dead (int regno)
 SET_HARD_REG_BIT (lra_reg_info[i].conflict_hard_regs, regno);
 
   if (! TEST_HARD_REG_BIT (hard_regs_live, regno))
-return;
+return false;
   CLEAR_HARD_REG_BIT (hard_regs_live, regno);
-  sparseset_set_bit (start_dying, regno);
   if (fixed_regs[regno] || TEST_HARD_REG_BIT (hard_

Re: [PATCHv2] Invalid gimple __BB# accepted due to usage of atoi -> replace atoi with stroul in c_parser_gimple_parse_bb_spec [PR114541]

2024-12-06 Thread Heiko Eißfeldt

2nd try,

1. replaces atoi() with strtoul() with ERANGE checking (as before)
2. fixes the handling of misparsed 'bb_spec's in c_parser_gimple_if_stmt to 
return early.
3. adds a new test case.
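
As a stand-alone illustration of item 1, here is a sketch of the same checks outside of GCC. parse_bb_spec and bb_index_or_minus1 are made-up names, and the standard isdigit and strncmp are used in place of GCC's ISDIGIT and startswith. atoi has undefined behaviour when the value is out of range, whereas strtoul reports the overflow through errno and the end pointer shows whether the whole string was consumed.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-alone helper: parse "__BB<digits>" into *index.
   Returns 1 on success, 0 on any malformed or out-of-range input.  */
static int
parse_bb_spec (const char *name, int *index)
{
  if (strncmp (name, "__BB", 4) != 0)
    return 0;

  const char *digits = name + 4;
  /* Also rejects the leading sign and blanks that strtoul accepts.  */
  if (!isdigit ((unsigned char) *digits))
    return 0;

  char *end;
  errno = 0;
  unsigned long number = strtoul (digits, &end, 10);
  if (errno == ERANGE || *end != '\0' || number > INT_MAX)
    return 0;

  *index = (int) number;
  return 1;
}

/* Convenience wrapper: the parsed index, or -1 if NAME is rejected.  */
static int
bb_index_or_minus1 (const char *name)
{
  int index;
  return parse_bb_spec (name, &index) ? index : -1;
}
```

The explicit comparison against INT_MAX matters on hosts where unsigned long is wider than int: 4294967299 fits in a 64-bit unsigned long, so ERANGE alone would not catch it.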

I hope I am right in assuming that in c_parser_gimple_if_stmt
(cfun->curr_properties & PROP_cfg) should imply valid bb_specs after goto.

PR c/114541
* gimple-parser.cc (c_parser_gimple_parse_bb_spec):
Use strtoul with ERANGE check instead of atoi to avoid UB

* gimple-parser.cc (c_parser_gimple_if_stmt):
require valid __BB# basic block indices after goto
in both branches otherwise return with c_parser_error

* gcc.dg/pr114541Andrew.c: New test based on
Andrew's template in the PR.

Signed-off-by: Heiko Eißfeldt 
On 12/5/24 8:45 AM, Richard Biener wrote:


On Thu, Dec 5, 2024 at 1:55 AM Heiko Eißfeldt wrote:

As commented in PR114541 here is a first patch that
1. replaces atoi() with strtoul() with ERANGE checking and
2. fixes the handling of misparsed gimple compounds to return early.
3. adds two new test cases.

There is more work to do for Andrew's testcase to succeed, so PR114541
is not done yet.

===

Replace atoi() with strtoul() with ERANGE checking.

The function c_parser_gimple_parse_bb_spec uses atoi,
which can silently return valid numbers even for
some too large numbers in the string.

Furthermore in function c_parser_parse_gimple_body
handle the case of gimple compound statement errors
more generically. In the case of cdil != cdil_gimple
now consider them as errors and return early.
This avoids further processing with erroneous data.

c_parser_gimple_compound_statement returns whether the
compound statement ended with a return statement, not
whether there was an error, so this change looks wrong.

The hunk in c_parser_gimple_parse_bb_spec is OK.

Richard.


2024-12-05 Heiko Eißfeldt

PR c/114541
* gimple-parser.cc (c_parser_gimple_parse_bb_spec):
Use strtoul with ERANGE check instead of atoi

* gimple-parser.cc (c_parser_parse_gimple_body):
separate check for errors in c_parser_gimple_compound_statement
and special handling of cdil == cdil_gimple to allow
a return in case of errors for cdil != cdil_gimple

* gcc.dg/pr114541-else-BB#-and-garbagechar.c: New test.
* gcc.dg/pr114541-then-BB#-and-garbagechar.c: New test.


Signed-off-by: Heiko Eißfeldt






Re: [PATCH] Invalid gimple __BB# accepted due to usage of atoi -> replace atoi with stroul in c_parser_gimple_parse_bb_spec [PR114541]

2024-12-06 Thread Heiko Eißfeldt

and here is the forgotten patch (it is late...)
diff --git a/gcc/c/gimple-parser.cc b/gcc/c/gimple-parser.cc
index 78e85d93487..b018bb6afb6 100644
--- a/gcc/c/gimple-parser.cc
+++ b/gcc/c/gimple-parser.cc
@@ -133,11 +133,21 @@ c_parser_gimple_parse_bb_spec (tree val, int *index)
 {
   if (!startswith (IDENTIFIER_POINTER (val), "__BB"))
 return false;
-  for (const char *p = IDENTIFIER_POINTER (val) + 4; *p; ++p)
-if (!ISDIGIT (*p))
-  return false;
-  *index = atoi (IDENTIFIER_POINTER (val) + 4);
-  return *index > 0;
+
+  const char *bb = IDENTIFIER_POINTER (val) + 4;
+  if (! ISDIGIT (*bb))
+return false;
+
+  char *pend;
+  errno = 0;
+  const unsigned long number = strtoul (bb, &pend, 10);
+  if (errno == ERANGE
+  || *pend != '\0'
+  || number > INT_MAX)
+return false;
+
+  *index = number;
+  return true;
 }
 
 /* See if VAL is an identifier matching __BB and return 
@@ -2384,11 +2394,20 @@ c_parser_gimple_if_stmt (gimple_parser &parser, gimple_seq *seq)
   c_parser_consume_token (parser);
   int dest_index;
   profile_probability prob;
-  if ((cfun->curr_properties & PROP_cfg)
- && c_parser_gimple_parse_bb_spec_edge_probability (label, parser,
-&dest_index, 
&prob))
-   parser.push_edge (parser.current_bb->index, dest_index,
- EDGE_TRUE_VALUE, prob);
+  if (cfun->curr_properties & PROP_cfg)
+   {
+ if (c_parser_gimple_parse_bb_spec_edge_probability (label, parser,
+ &dest_index, 
&prob))
+   {
+ parser.push_edge (parser.current_bb->index, dest_index,
+   EDGE_TRUE_VALUE, prob);
+   }
+ else
+   {
+ c_parser_error (parser, "expected valid __BB#");
+ return;
+   }
+   }
   else
t_label = lookup_label_for_goto (loc, label);
   if (! c_parser_require (parser, CPP_SEMICOLON, "expected %<;%>"))
@@ -2421,11 +2440,20 @@ c_parser_gimple_if_stmt (gimple_parser &parser, gimple_seq *seq)
   c_parser_consume_token (parser);
   int dest_index;
   profile_probability prob;
-  if ((cfun->curr_properties & PROP_cfg)
- && c_parser_gimple_parse_bb_spec_edge_probability (label, parser,
-&dest_index, 
&prob))
-   parser.push_edge (parser.current_bb->index, dest_index,
- EDGE_FALSE_VALUE, prob);
+  if (cfun->curr_properties & PROP_cfg)
+   {
+ if (c_parser_gimple_parse_bb_spec_edge_probability (label, parser,
+ &dest_index, 
&prob))
+   {
+ parser.push_edge (parser.current_bb->index, dest_index,
+  EDGE_FALSE_VALUE, prob);
+   }
+ else
+   {
+ c_parser_error (parser, "expected valid __BB#");
+ return;
+   }
+   }
   else
f_label = lookup_label_for_goto (loc, label);
   if (! c_parser_require (parser, CPP_SEMICOLON, "expected %<;%>"))
diff --git a/gcc/testsuite/gcc.dg/pr114541Andrew.c b/gcc/testsuite/gcc.dg/pr114541Andrew.c
new file mode 100644
index 000..bc92473a79e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114541Andrew.c
@@ -0,0 +1,28 @@
+/* PR middle-end/114541 */
+/* { dg-do compile } */
+/* { dg-options "-O -fgimple -c" } */
+
+void __GIMPLE (ssa,startwith ("dse2")) foo ()
+{
+  int a;
+
+__BB(2):
+  if (a_5(D) > 4)
+goto __BB4294967299;  /* { dg-error "expected valid __BB# before ';' token" } */
+  else
+goto __BB4;
+
+__BB(3):
+  a_2 = 10;
+  goto __BB5;
+
+__BB(4):
+  a_3 = 20;
+  goto __BB5;
+
+__BB(5):
+  a_1 = __PHI (__BB3: a_2, __BB4: a_3);
+  a_4 = a_1 + 4;
+
+return;
+}


Re: [patch,avr] Disable CRC lookup tables

2024-12-06 Thread Oleg Endo


On Fri, 2024-12-06 at 16:51 +0100, Georg-Johann Lay wrote:
> 
> The CRC tables ARE put into .rodata, not into .data.
> 
> The correct question is: Why is avr putting .rodata into RAM?
> 
> Suppose the following C code:
> 
> char read_c (const char *p)
> {
>  return p[1];
> }
> 
> Where p may point to .rodata, .data, .bss etc.
> Now suppose .rodata is located in flash and .data and .bss
> are located in RAM.

Thanks for the detailed explanation!

I feel you.  I've been coding on mcs51 MCUs for a couple of years now (on
SDCC, meh)


> This would imply that there is some means to tell apart
> different address spaces by looking at p.  This is *not* the
> case.  In particular, flash address 0x4 looks exactly the same
> as RAM address 0x4.  Both are 16-bit address 0x0004, and there
> is no way to tell them apart.  
> 

I'm a little surprised that there's no way to get pointer/symbol meta-
information from within the backend code to identify data being accessed
that will end up in __code / __flash / .rodata / .text.

On SDCC/mcs51 things are simple enough so that we can at least try to lookup
a given symbol from the backend code and see if it's got a known address
space assigned to it (via (forward) declaration).  This can be used to
optimize accesses to e.g. SFRs even during later stages of compilation.

Best regards,
Oleg Endo


Re: [committed] RISC-V: Add const to function_shape::get_name [NFC]

2024-12-06 Thread Kito Cheng
Thanks for letting me know. I have reverted the commit for now and will
investigate next week :)

On Sat, Dec 7, 2024 at 3:30 AM Mark Wielaard  wrote:
>
> Hi Kito,
>
> On Thu, Dec 05, 2024 at 03:12:03PM +0800, Kito Cheng wrote:
> > function_shape::get_name is the function for building the intrinsic
> > function name; the result should not be changed by others once it is built.
> >
> > So add const to the return type to make sure no one changes it by
> > accident.
>
> This seems to have broken bootstrap on risc-v:
> https://builder.sourceware.org/buildbot/#/builders/310/builds/681
>
> In file included from ../../gcc/gcc/../libcpp/include/symtab.h:21,
>  from ../../gcc/gcc/tree-core.h:23,
>  from ../../gcc/gcc/tree.h:23,
>  from ../../gcc/gcc/config/riscv/riscv-vector-builtins.cc:27:
> ../../gcc/gcc/config/riscv/riscv-vector-builtins.cc: In member function ‘void riscv_vector::function_builder::add_unique_function(const riscv_vector::function_instance&, const riscv_vector::function_shape*, tree, vec&, riscv_vector::required_ext)’:
> ../../gcc/gcc/../include/obstack.h:421:22: error: cast from type ‘const char*’ to type ‘void*’ casts away qualifiers [-Werror=cast-qual]
>   421 |void *__obj = (void *) (OBJ);  \
>   |  ^~
> ../../gcc/gcc/config/riscv/riscv-vector-builtins.cc:4011:3: note: in expansion of macro ‘obstack_free’
>  4011 |   obstack_free (&m_string_obstack, name);
>   |   ^~~~
> ../../gcc/gcc/config/riscv/riscv-vector-builtins.cc: In member function ‘void riscv_vector::function_builder::add_overloaded_function(const riscv_vector::function_instance&, const riscv_vector::function_shape*, riscv_vector::required_ext)’:
> ../../gcc/gcc/../include/obstack.h:421:22: error: cast from type ‘const char*’ to type ‘void*’ casts away qualifiers [-Werror=cast-qual]
>   421 |void *__obj = (void *) (OBJ);  \
>   |  ^~
> ../../gcc/gcc/config/riscv/riscv-vector-builtins.cc:4032:7: note: in expansion of macro ‘obstack_free’
>  4032 |   obstack_free (&m_string_obstack, name);
>   |   ^~~~
> cc1plus: all warnings being treated as errors
> make[3]: *** [../../gcc/gcc/config/riscv/t-riscv:32: riscv-vector-builtins.o] Error 1
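
For reference, the -Wcast-qual diagnostic quoted above can be reproduced in miniature.  The following is only a sketch in plain C, not the riscv-vector-builtins.cc code: make_name, release and name_length are hypothetical names, and release merely stands in for a macro such as obstack_free that ends up casting its argument to void *.  It shows one possible way to hand out a read-only view of a string while still freeing it through the original non-const pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a deallocation macro that takes void *.  */
void release (void *p)
{
  free (p);
}

/* Hypothetical helper: build "base" followed by "suffix" on the heap.  */
char *make_name (const char *base, const char *suffix)
{
  size_t n = strlen (base) + strlen (suffix) + 1;
  char *s = malloc (n);
  if (s == NULL)
    return NULL;
  strcpy (s, base);
  strcat (s, suffix);
  return s;
}

/* Return the length of the built name, or -1 on allocation failure.  */
int name_length (const char *base, const char *suffix)
{
  char *owned = make_name (base, suffix); /* mutable pointer, kept for freeing */
  const char *view = owned;               /* read-only view for other users */
  int len = owned != NULL ? (int) strlen (view) : -1;

  /* Writing "release ((void *) view);" here would cast away const and
     is the kind of cast that -Wcast-qual reports.  */
  release (owned);
  return len;
}
```

Whether the real fix should keep a non-const pointer like this or take a different route is for the investigation mentioned above; the sketch only illustrates why the const return type and the freeing macro clash.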


Re: [patch,avr] Disable CRC lookup tables

2024-12-06 Thread Georg-Johann Lay

On 06.12.24 at 15:50, Oleg Endo wrote:

On Fri, 2024-12-06 at 06:32 -0700, Jeff Law wrote:


On 12/6/24 5:23 AM, Sam James wrote:

Georg-Johann Lay  writes:


This patch disables CRC lookup tables which consume quite some RAM.


Given that -foptimize-crc is new, it may be useful to CC the pass
authors in case they have input.

I think this is trivially OK for the AVR.  The bigger question is should
we do something more general for -Os.

CRC generation through table lookups is going to take more data space.
You need a 256 byte table for each unique CRC (sizes & polynomial), and
the code to compute the index into the table can be (from a code size
standpoint) relatively expensive as well, particularly on the
micro-controllers if the crc is to be computed in a mode wider than a
word on the target.

So I would actually even support a more general "don't optimize CRCs by
default for -Os".
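
To make the size trade-off above concrete, here is a minimal sketch of the two usual ways to compute a CRC, assuming an 8-bit CRC with polynomial 0x07, MSB first, zero initial value.  The function names are made up for illustration, and this is not the code that -foptimize-crc generates.  The table is filled at run time here only to keep the example short; a compiler-generated table would be a constant object.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC8_POLY 0x07u  /* assumed polynomial, for illustration only */

/* Advance the CRC register by eight bit steps.  */
uint8_t crc8_step (uint8_t crc)
{
  for (int bit = 0; bit < 8; bit++)
    crc = (crc & 0x80u) ? (uint8_t) ((crc << 1) ^ CRC8_POLY)
                        : (uint8_t) (crc << 1);
  return crc;
}

/* Bitwise variant: no table, eight shift/xor steps per input byte.  */
uint8_t crc8_bitwise (const uint8_t *data, size_t len)
{
  uint8_t crc = 0;
  for (size_t i = 0; i < len; i++)
    crc = crc8_step ((uint8_t) (crc ^ data[i]));
  return crc;
}

/* Table variant: 256 entries per unique CRC, one lookup per input byte.  */
static uint8_t crc8_table[256];

void crc8_init_table (void)
{
  for (unsigned v = 0; v < 256; v++)
    crc8_table[v] = crc8_step ((uint8_t) v);
}

uint8_t crc8_lookup (const uint8_t *data, size_t len)
{
  uint8_t crc = 0;
  for (size_t i = 0; i < len; i++)
    crc = crc8_table[crc ^ data[i]];
  return crc;
}
```

Both variants return the same value once crc8_init_table has been called; the table version trades 256 bytes of data (more for wider CRCs) for fewer operations per byte.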


I've been putting CRC tables for many years into .text / .rodata on various
MCU projects.  Never considered putting them into .data, since flash is
usually a lot larger than RAM.  What's the reasoning behind putting the
tables in .data?

Best regards,
Oleg Endo


The CRC tables ARE put into .rodata, not into .data.

The correct question is: Why is avr putting .rodata into RAM?

Suppose the following C code:

char read_c (const char *p)
{
return p[1];
}

Where p may point to .rodata, .data, .bss etc.
Now suppose .rodata is located in flash and .data and .bss
are located in RAM.  Then you'd need the following code to
access p[1]:

if (ADDR_SPACE (p) == GENERIC)
   Use LD, LDD with address R, R++, --R or R+const
   where R is one of the address registers X, Y or Z.
   Also may use direct addressing.
else if (ADDR_SPACE (p) == __flash)
   Use LPM with Z or Z++ addressing mode.  No other
   addressing mode or address reg or instruction is
   allowed.

This would imply that there is some means to tell apart
different address spaces by looking at p.  This is *not* the
case.  In particular, flash address 0x4 looks exactly the same
as RAM address 0x4.  Both are 16-bit address 0x0004, and there
is no way to tell them apart.  It would have been possible to
reserve one bit of the address for the address space and let
the linker set that bit depending on the address space where
a symbol is placed.  This means you'd have to tell apart the
addresses at run-time.
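
As an illustration of that hypothetical tag-bit scheme, here is a host-side C sketch.  It is a simulation only: the two arrays stand in for flash and RAM, AS_FLASH_BIT and all function names are invented for this example, and nothing here reflects what the avr tools actually do.

```c
#include <stdint.h>

/* Simulated memories; on a real device these are separate address spaces
   that both start at address 0.  */
static const uint8_t flash_mem[8] = { 'F', 'L', 'A', 'S', 'H', 'R', 'O', 'M' };
static uint8_t ram_mem[8] = { 'r', 'a', 'm', 'd', 'a', 't', 'a', '!' };

/* Assumed convention: the top bit of a 16-bit address selects flash.  */
#define AS_FLASH_BIT 0x8000u

typedef uint16_t tagged_addr;

tagged_addr make_flash_addr (uint16_t offset)
{
  return (tagged_addr) (offset | AS_FLASH_BIT);
}

tagged_addr make_ram_addr (uint16_t offset)
{
  return offset;
}

/* Every access has to test the tag at run time and then take the
   flash path (LPM-like) or the RAM path (LD-like).  */
uint8_t read_byte (tagged_addr a)
{
  uint16_t offset = (uint16_t) (a & ~AS_FLASH_BIT);
  if (a & AS_FLASH_BIT)
    return flash_mem[offset];
  return ram_mem[offset];
}

/* Counterpart of read_c from above: return the byte at p[1].  */
uint8_t read_c (tagged_addr p)
{
  return read_byte ((tagged_addr) (p + 1));
}
```

With the tag, offset 4 in flash and offset 4 in RAM become distinguishable, but each read pays for the extra test and branch, and one address bit is lost.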

This is not the approach taken by the avr tools.
(This was the design decision back then, and it was a good
decision, even in retrospect IMO.)

Instead, they put .rodata in RAM, and when the user wants a
table in flash, she has to put it in section .progmem.data
by hand and use inline assembly to access it.

https://avrdudes.github.io/avr-libc/avr-libc-user-manual/group__avr__pgmspace.html

As an alternative, a named address space can be used when
available, but all such objects have to be put in that
address space by hand, and all objects have to be accessed
through pointers qualified as __flash.

There are more 16-bit address spaces like __flash1, __flash2,
__flash3, __flash4, __flash5 for larger devices.  And there
is a 24-bit address space that can host references to any
address space, including generic.  But using them comes
with quite some overhead because the decision which AS to use
has to be taken at run-time.  This also means the addressing
mode is limited to the ones supported by /all/ address spaces,
since you don't know at compile time where the address points to.

Johann.


Re: [PATCH] c++: handle misspelled concepts and missing #include

2024-12-06 Thread Jason Merrill

On 11/15/24 7:56 PM, David Malcolm wrote:

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
OK for trunk?

gcc/cp/ChangeLog:
* name-lookup.cc (suggest_alternative_in_explicit_scope):
Gracefully handle non-namespaces, such as scoped enums.
* parser.cc (cp_parser_name_lookup_error): Provide
a name_hint for the case where we're in an explicit scope.
* std-name-hint.gperf: Add .
* std-name-hint.h: Regenerate.

gcc/testsuite/ChangeLog:
* g++.dg/concepts/missing-header.C: New test.
* g++.dg/concepts/misspelled-concept.C: New test.

Signed-off-by: David Malcolm 


OK.

Jason



Re: [PATCH v4 2/7] OpenMP: middle-end support for dispatch + adjust_args

2024-12-06 Thread Paul-Antoine Arras

Hi Tobias,

Thanks for your thorough review.

On 09/10/2024 14:55, Tobias Burnus wrote:

Paul-Antoine Arras wrote:

This patch adds middle-end support for the `dispatch` construct and the
`adjust_args` clause. The heavy lifting is done in `gimplify_omp_dispatch` and
`gimplify_call_expr` respectively. For `adjust_args`, this mostly consists in
emitting a call to `gomp_get_mapped_ptr` for the adequate device.


omp_get_… not gomp_get_…


Fixed.


For dispatch, the following steps are performed:

* Handle the device clause, if any: set the default-device ICV at the top of the
dispatch region and restore its previous value at the end.

* Handle novariants and nocontext clauses, if any. Evaluate compile-time
constants and select a variant, if possible. Otherwise, emit code to handle all
possible cases at run time.

* If depend clauses are present, add a taskwait construct before the dispatch
region and move them there.


The latter is not done here – but already in the front ends, i.e. 
OMP_TASK are handled in part 3 (C), 4 (C++) and 6 (Fortran) of this series.


Forgot to move that during a previous iteration. Fixed now.


...


--- a/gcc/gimple.cc
+++ b/gcc/gimple.cc


...


+/* Build a GIMPLE_OMP_DISPATCH statement.
+
+   BODY is the target function call to be dispatched.
+   CLAUSES are any of the OMP dispatch construct's clauses: ...  */


Looks as if you planned to add something here. How about:
s/: ..././ ?


Right, fixed.



@@ -4067,23 +4069,125 @@ gimplify_call_expr (tree *expr_p, gimple_seq 
*pre_p, bool want_value)



+  if (flag_openmp && EXPR_P (CALL_EXPR_FN (*expr_p))
+  && DECL_P (TREE_OPERAND (CALL_EXPR_FN (*expr_p), 0))
+  && (adjust_args_list = lookup_attribute (
+    "omp declare variant variant adjust_args",
+    DECL_ATTRIBUTES (
+  TREE_OPERAND (CALL_EXPR_FN (*expr_p), 0
+   != NULL_TREE
+  && gimplify_omp_ctxp != NULL
+  && gimplify_omp_ctxp->code == OMP_DISPATCH
+  && !gimplify_omp_ctxp->in_call_args)
+    {


!= should be under 'a' of 'adjust' (remove one space)


clang-format is consistently doing it wrong... I have to be careful and 
fix it manually.


And I wonder whether it is a bit more readable and a tiny bit faster if
you move the gimplify_omp_ctx checks directly after flag_openmp
and only if successful ('&&') check for the attributes.


Agreed!


+  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_IS_DEVICE_PTR)
+    {
+  tree decl1 = DECL_NAME (OMP_CLAUSE_DECL (c));
+  tree decl2 = tree_strip_nop_conversions (*arg_p);
+  if (TREE_CODE (decl2) == ADDR_EXPR)
+    decl2 = TREE_OPERAND (decl2, 0);
+  gcc_assert (TREE_CODE (decl2) == VAR_DECL
+  || TREE_CODE (decl2) == PARM_DECL);


The first one can be 'VAR_P (decl2)'. I keep wondering whether there


Changed.


can be cases where that's not true (e.g. VAR_DECL) or some indirect ref.


I don't remember running into such case. I think this is here just for 
extra safety.



For Fortran, I could imagine that array descriptors make problems, e.g.

subroutine f(x)
   integer, pointer :: x(:)

where 'x->data' is the device pointer and not 'x'.

(TODO: something to check + possibly to revisit when handling the
Fortran part; for now (including C/C++), I think we can leave it as is.)


Or something with reference types (→ C++, Fortran), albeit that's more
for need_device_addr / has_device_addr, which is not yet implemented.



+  bool need_device_ptr = false;
+  for (tree arg
+   = TREE_PURPOSE (TREE_VALUE (adjust_args_list));
+   arg != NULL; arg = TREE_CHAIN (arg))
+    {


...


+    }
+
+  if (need_device_ptr && !is_device_ptr)


Actually, the is_device_ptr loop is only needed when need_device_ptr 
(or, later, need_device_addr) is true; I wonder whether it should be 
swapped and is_device_ptr only be checked conditionally?


Good point! Changed as suggested.


+  *arg_p = (TREE_CODE (*arg_p) == NOP_EXPR)
+ ? TREE_OPERAND (*arg_p, 0)
+ : *arg_p;


Use tree_strip_nop_conversions or STRIP_NOPS ? However, it is not clear 
why it is needed here ...


Correct, it is not needed here. Removed.


+  gimplify_arg (arg_p, pre_p, loc);
+  gimplify_arg (&device_num, pre_p, loc);
+  call = gimple_build_call (fn, 2, *arg_p, device_num);
+  tree mapped_arg
+    = create_tmp_var (gimple_call_return_type (call));
+  gimple_call_set_lhs (call, mapped_arg);
+  gimplify_seq_add_stmt (pre_p, call);
+
+  *arg_p = mapped_arg;


This line causes the following to attempt to fail:


+  // Mark mapped argument as device pointer to ensure
+  // idempotency in gimplification
+  gcc_assert (gimp

Re: [PATCH] AArch64: Cleanup alignment macros

2024-12-06 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Change the AARCH64_EXPAND_ALIGNMENT macro into proper function calls to make
> future changes easier.  Use the existing alignment settings, however avoid
> overaligning small arrays or structs to 64 bits when there is no benefit.
> This gives a small reduction in data and stack size.

So just to be sure I understand: we still want to align (say) an array
of 4 chars to 32 bits so that the LDR & STR are aligned, and an array of
3 chars to 32 bits so that the LDRH & STRH for the leading two bytes are
aligned?  Is that right?  We don't seem to take advantage of the padding
and do an LDR & STR for the 3-byte case, either for globals or on the stack.

If so, what's the advantage of aligning (say) a 6-byte array to 64 bits
rather than 32 bits, given that we don't use a 64-bit LDR & STR?
Could we save more with size < 64 instead of size <= 32?

Thanks,
Richard
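
To make the question above concrete, here is a small sketch of the two thresholds being compared.  This is an assumption about the shape of the check, not the code from the patch (which is cut off in this archive): both helpers are hypothetical, take the aggregate size and the incoming alignment in bits, and never reduce the incoming alignment.

```c
/* Hypothetical policy A: aggregates of up to 32 bits get 32-bit
   alignment, larger ones get 64-bit alignment.  */
unsigned align_size_le32 (unsigned size_bits, unsigned align)
{
  unsigned want = size_bits <= 32 ? 32 : 64;
  return align < want ? want : align;
}

/* Hypothetical policy B: anything smaller than 64 bits gets 32-bit
   alignment, only 64 bits and up get 64-bit alignment.  */
unsigned align_size_lt64 (unsigned size_bits, unsigned align)
{
  unsigned want = size_bits < 64 ? 32 : 64;
  return align < want ? want : align;
}
```

Under these assumptions a 6-byte array (48 bits) ends up 64-bit aligned with policy A but only 32-bit aligned with policy B, while 3-byte, 4-byte and 8-byte objects are treated the same by both.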

> Passes regress & bootstrap, OK for commit?
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.h (AARCH64_EXPAND_ALIGNMENT): Remove.
> (DATA_ALIGNMENT): Use aarch64_data_alignment.
> (LOCAL_ALIGNMENT): Use aarch64_stack_alignment.
> * config/aarch64/aarch64.cc (aarch64_data_alignment): New function.
> (aarch64_stack_alignment): Likewise.
> * config/aarch64/aarch64-protos.h (aarch64_data_alignment): New prototype.
> (aarch64_stack_alignment): Likewise.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 6da81556110c978a9de6f6fad5775c9d1b10..4133a47693b24abca071a7f77fcdbb91d3dc261a 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -1207,4 +1207,7 @@ extern void aarch64_adjust_reg_alloc_order ();
>  bool aarch64_optimize_mode_switching (aarch64_mode_entity);
>  void aarch64_restore_za (rtx);
>
> +extern unsigned aarch64_data_alignment (const_tree exp, unsigned align);
> +extern unsigned aarch64_stack_alignment (const_tree exp, unsigned align);
> +
>  #endif /* GCC_AARCH64_PROTOS_H */
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index f07b2c49f0d9abd3309afb98499ab7eebcff05bd..64f55f6f94e37bffa6b1e7403274ec5f5d906095 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -121,24 +121,11 @@
> of LSE instructions.  */
>  #define TARGET_OUTLINE_ATOMICS (aarch64_flag_outline_atomics)
>
> -/* Align definitions of arrays, unions and structures so that
> -   initializations and copies can be made more efficient.  This is not
> -   ABI-changing, so it only affects places where we can see the
> -   definition.  Increasing the alignment tends to introduce padding,
> -   so don't do this when optimizing for size/conserving stack space.  */
> -#define AARCH64_EXPAND_ALIGNMENT(COND, EXP, ALIGN) \
> -  (((COND) && ((ALIGN) < BITS_PER_WORD)  \
> -&& (TREE_CODE (EXP) == ARRAY_TYPE  \
> -   || TREE_CODE (EXP) == UNION_TYPE\
> -   || TREE_CODE (EXP) == RECORD_TYPE)) ? BITS_PER_WORD : (ALIGN))
> -
> -/* Align global data.  */
> -#define DATA_ALIGNMENT(EXP, ALIGN) \
> -  AARCH64_EXPAND_ALIGNMENT (!optimize_size, EXP, ALIGN)
> -
> -/* Similarly, make sure that objects on the stack are sensibly aligned.  */
> -#define LOCAL_ALIGNMENT(EXP, ALIGN)\
> -  AARCH64_EXPAND_ALIGNMENT (!flag_conserve_stack, EXP, ALIGN)
> +/* Align global data as an optimization.  */
> +#define DATA_ALIGNMENT(EXP, ALIGN) aarch64_data_alignment (EXP, ALIGN)
> +
> +/* Align stack data as an optimization.  */
> +#define LOCAL_ALIGNMENT(EXP, ALIGN) aarch64_stack_alignment (EXP, ALIGN)
>
>  #define STRUCTURE_SIZE_BOUNDARY8
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index c78845fc27e6d6a8a1631b487b19fb3143a231ac..5369129d4a405afe5a760081149da1347e7b8842 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -2651,6 +2651,60 @@ aarch64_constant_alignment (const_tree exp, HOST_WIDE_INT align)
>return align;
>  }
>
> +/* Align definitions of arrays, unions and structures so that
> +   initializations and copies can be made more efficient.  This is not
> +   ABI-changing, so it only affects places where we can see the
> +   definition.  Increasing the alignment tends to introduce padding,
> +   so don't do this when optimizing for size/conserving stack space.  */
> +
> +unsigned
> +aarch64_data_alignment (const_tree type, unsigned align)
> +{
> +  if (optimize_size)
> +return align;
> +
> +  if (AGGREGATE_TYPE_P (type))
> +{
> +  unsigned HOST_WIDE_INT size = 0;
> +
> +  if (TYPE_SIZE (type) && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
> + && tree_fits_uhwi_p (TYPE_SIZE (type)))
> +   size = tree_to_uhwi (TYPE_SIZE (type));
> +
> +  /* A

Patch ping (Re: [PATCH] analyzer: Handle nonnull_if_nonzero attribute [PR117023])

2024-12-06 Thread Jakub Jelinek
Hi!

I'd like to ping the
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668699.html
patch.

The patches it depended on are already committed and there is a patch
which depends on this (the builtins shift from nonnull to nonnull_if_nonzero
where needed) which has been approved but can't be committed.

Thanks

> 2024-11-14  Jakub Jelinek  
> 
>   PR c/117023
> gcc/analyzer/
>   * sm-malloc.cc (malloc_state_machine::on_stmt): Handle
>   also nonnull_if_nonzero attributes.
> gcc/testsuite/
>   * c-c++-common/analyzer/call-summaries-malloc.c
>   (test_use_without_check): Pass 4 rather than sz to memset.
>   * c-c++-common/analyzer/strncpy-1.c (test_null_dst,
>   test_null_src): Pass 42 rather than count to strncpy.

Jakub



[PATCH] arm,testsuite: Add -mtune=cortex-m55 to dlstp-int8x16.c

2024-12-06 Thread Christophe Lyon
Like dlstp-compile-asm-1.c, this test would fail if GCC is configured
with non-default options, such as -mtune=cortex-a9.

Force -mtune=cortex-m55 to avoid this unexpected issue.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/dlstp-int8x16.c: Add -mtune=cortex-m55
---
 gcc/testsuite/gcc.target/arm/mve/dlstp-int8x16.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/dlstp-int8x16.c b/gcc/testsuite/gcc.target/arm/mve/dlstp-int8x16.c
index d5f22b50262..8ec0a57a783 100644
--- a/gcc/testsuite/gcc.target/arm/mve/dlstp-int8x16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/dlstp-int8x16.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { arm*-*-* } } } */
 /* { dg-require-effective-target arm_v8_1m_mve_ok } */
-/* { dg-options "-O2 -save-temps" } */
+/* { dg-options "-O2 -save-temps -mtune=cortex-m55" } */
 /* { dg-add-options arm_v8_1m_mve } */
 
 #include 
-- 
2.34.1



Re: [PATCH] AIX Build failure with default -std=gnu23.

2024-12-06 Thread David Edelsohn
No objection from me, but Ian is the maintainer of libiberty, so I'll defer
to him, especially about style and overall software engineering.

The C23 change presumably will break on Alpha OSF/1 as well.  Does GCC
still support OSF/1?  It might be preferred to delete the block entirely
instead of #ifndef _AIX.

Thanks, David
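
For readers who have not followed the C23 change behind the failure quoted below: an empty parameter list used to mean "parameters unspecified" and now means "no parameters".  The sketch below is a stand-alone illustration, not the AIX header or libiberty code; add and OLD_STYLE_DECL_ALLOWED are made-up names, and the version check treating anything newer than C17 as C23-like is an assumption of this example.

```c
#if defined(__STDC_VERSION__) && __STDC_VERSION__ > 201710L
/* C23: "int add ();" would mean "int add (void)" and would conflict
   with the prototype below, so the old-style declaration is left out.  */
#define OLD_STYLE_DECL_ALLOWED 0
#else
/* C17 and earlier: empty parentheses leave the parameters unspecified,
   so this declaration is compatible with the prototype below.  */
#define OLD_STYLE_DECL_ALLOWED 1
int add ();
#endif

/* The prototyped declaration, as a modern header would provide it.  */
int add (int a, int b);

int add (int a, int b)
{
  return a + b;
}

/* Report which of the two branches above was compiled.  */
int old_style_decl_allowed (void)
{
  return OLD_STYLE_DECL_ALLOWED;
}
```

Defining _NO_PROTO makes some system headers emit the old-style form next to prototyped declarations, which is harmless under C17 rules and a conflicting-types error under C23 rules.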

On Fri, Dec 6, 2024 at 7:20 AM Sam James  wrote:

> swamy sangamesh  writes:
>
> > Dear Community,
> >
> > Please let me know if the attached patch is fine.
>
> For such patches, I recommend CCing the maintainers of relevant
> components. In this case, that's David Edelsohn, being the AIX
> maintainer (done it for you here).
>
> I can't approve it but I imagine the patch is fine given GCC dropped
> support for such old AIX a long time ago.
>
> >
> > Thanks,
> > Sangamesh
> >
> > On Tue, Dec 3, 2024 at 11:19 PM swamy sangamesh <
> swamy.sangam...@gmail.com> wrote:
> >
> >  Hi Eric,
> >
> >  Thanks for the review.
> >
> >  I too think removing the define is a better approach, and it seems these
> >  won't be needed.
> >  From the comment it looks like these were added long ago, and the
> >  conflicting declarations were there until the C23 standard uncovered them.
> >
> >  If removing the define is fine then I can send a final patch.
> >
> >  Thanks,
> >  Sangamesh
> >
> >  On Tue, Dec 3, 2024 at 9:11 AM Eric Gallager 
> wrote:
> >
> >  On Mon, Dec 2, 2024 at 1:01 PM swamy sangamesh
> >   wrote:
> >  >
> >  > Dear Community,
> >  >
> >  > Please let me know your comment.
> >  > Or is it more appropriate to have changes with header guard like this
> ?
> >  >
> >
> >  I personally think it's better to just remove the define, but if
> >  you're going to leave it in and guard it with a macro instead, I'd use
> >  something a bit more specific than just "_AIX".
> >
> >  > --- a/libiberty/getopt.c
> >  > +++ b/libiberty/getopt.c
> >  > @@ -25,9 +25,11 @@
> >  >  ^L
> >  >  /* This tells Alpha OSF/1 not to define a getopt prototype in
> .
> >  > Ditto for AIX 3.2 and .  */
> >  > +#ifndef _AIX
> >  >  #ifndef _NO_PROTO
> >  >  # define _NO_PROTO
> >  >  #endif
> >  > +#endif
> >  >
> >  >  #ifdef HAVE_CONFIG_H
> >  >  # include 
> >  >
> >  >
> >  > Thanks,
> >  > Sangamesh
> >  >
> >  >
> >  > On Thu, Nov 28, 2024 at 11:09 AM Sangamesh Mallayya <
> swamy.sangam...@gmail.com> wrote:
> >  >>
> >  >>  libiberty/getopt.c file is defining _NO_PROTO which causes conflicting
> >  >>  declarations for the functions in AIX header files like stdio.h & stdlib.h.
> >  >>  These declarations are being considered as errors in C23 which wasn't
> >  >>  the case with C17.
> >  >>
> >  >> Here is the error we get.
> >  >>
> >  >> /gcc_build/./prev-gcc/xgcc -B/gcc_build/./prev-gcc/
> -B/home/sangam/install/GCC/powerpc-ibm-aix7.3.3.0/bin/ -
> >  B/home/sangam
> >  >> /install/GCC/powerpc-ibm-aix7.3.3.0/bin/
> -B/home/sangam/install/GCC/powerpc-ibm-aix7.3.3.0/lib/ -isystem
> >  /home/sangam/ins
> >  >> tall/GCC/powerpc-ibm-aix7.3.3.0/include -isystem
> /home/sangam/install/GCC/powerpc-ibm-aix7.3.3.0/sys-include
> >  -fno-check
> >  >> ing -c -DHAVE_CONFIG_H -g -O2 -fno-checking  -I.
> -I/opt/freeware/src/packages/BUILD/gcc/libiberty/../include  -
> >  W -Wall -W
> >  >> write-strings -Wc++-compat -Wstrict-prototypes -Wshadow=local
> -pedantic  -D_GNU_SOURCE
> >  /opt/freeware/src/packages/BUILD/
> >  >> gcc/libiberty/getopt.c -o getopt.o
> >  >>
> >  >>
> >  >> In file included from
> /opt/freeware/src/packages/BUILD/gcc/libiberty/getopt.c:45:
> >  >> /gcc_build/prev-gcc/include-fixed/stdio.h:593:12: error: conflicting
> types for 'fgetpos64'; have 'int(FILE *,
> >  fpos64_t *)
> >  >> ' {aka 'int(FILE *, long long int *)'}
> >  >>   593 | extern int fgetpos64(FILE *, fpos64_t *);
> >  >>   |^
> >  >> /gcc_build/prev-gcc/include-fixed/stdio.h:298:17: note: previous
> declaration of 'fgetpos64' with type 'int
> >  (void)'
> >  >>   298 | extern int  fgetpos();
> >  >>   | ^~~
> >  >> /gcc_build/prev-gcc/include-fixed/stdio.h:594:14: error: conflicting
> types for 'fopen64'; have 'FILE *(const
> >  char *, cons
> >  >> t char *)'
> >  >>   594 | extern FILE *fopen64(const char *, const char *);
> >  >>   |  ^~~
> >  >>
> >  >> /gcc_build/prev-gcc/include-fixed/stdio.h:259:17: note: previous
> declaration of 'fopen64' with type 'FILE *
> >  (void)'
> >  >>   259 | extern FILE *   fopen();
> >  >>   | ^
> >  >> /gcc_build/prev-gcc/include-fixed/stdio.h:595:14: error: conflicting
> types for 'freopen64'; have 'FILE *(const
> >  char *, co
> >  >> nst char *, FILE *)'
> >  >>   595 | extern FILE *freopen64(const char *, const char *, FILE *);
> >  >>   |  ^
> >  >> /gcc_build/prev-gcc/include-fixed/stdio.h:260:17: note: previous
> declaration of 'freopen64' with type 'FILE *
> >  (void)'
> >  >>   260 | extern FILE *   freopen();
> >  >>   | ^~~
> >  >> /gcc_build/prev-gcc/include-

Re: [PATCH] diagnostics: UX: add doc URLs for attributes (v2)

2024-12-06 Thread David Malcolm
On Thu, 2024-11-21 at 17:36 -0500, David Malcolm wrote:
> This is v2 of the patch; v1 was here:
>   https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655541.html
> 
> Changed in v2:
> * added a new TARGET_DOCUMENTATION_NAME hook for figuring out which
>   documentation URL to use when there are multiple per-target docs,
>   such as for __attribute__((interrupt)); implemented this for all
>   targets that have target-specific attributes
> * moved attribute_urlifier and its support code to a new
>   gcc-attribute-urlifier.cc since it needs to use targetm for the
>   above; gcc-urlifier.o is used by the driver.
> * fixed extend.texi so that some attributes that failed to appear in
>   attr-urls.def now do so (affected nvptx "kernel" and "shared" attrs)
> * regenerated attr-urls.def for the above fix, and bringing in
>   attributes added since v1 of the patch
> 
> In r14-5118-gc5db4d8ba5f3de I added a mechanism to automatically add
> documentation URLs to quoted strings in diagnostics.
> In r14-6920-g9e49746da303b8 I added a mechanism to generate URLs for
> mentions of command-line options in quoted strings in diagnostics.
> 
> This patch does a similar thing for attributes.  It adds a new Python 3
> script to scrape the generated HTML looking for documentation of
> attributes, and uses this to (re)generate a new gcc/attr-urls.def
> file.
> 
> Running "make regenerate-attr-urls" after rebuilding the HTML docs will
> regenerate gcc/attr-urls.def in the source directory.
> 
> The patch uses this to optionally add doc URLs for attributes in any
> diagnostic emitted during the lifetime of an auto_urlify_attributes
> instance, and adds such instances everywhere that a diagnostic refers
> to an attribute within quotes (based on grepping the source tree
> for references to attributes in strings and in code).
> 
> For example, given:
> 
> $ ./xgcc -B. -S ../../src/gcc/testsuite/gcc.dg/attr-access-2.c
> ../../src/gcc/testsuite/gcc.dg/attr-access-2.c:14:16: warning:
> attribute ‘access(read_write, 2, 3)’ positional argument 2 conflicts
> with previous designation by argument 1 [-Wattributes]
> 
> with this patch the quoted text `access(read_write, 2, 3)'
> automatically gains the URL for our docs for "access":
> https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-access-function-attribute
> in a sufficiently modern terminal.
> 
> Like r14-6920-g9e49746da303b8 this avoids the Makefile target
> depending on the generated HTML, since a missing URL is a minor
> problem, whereas requiring all users to build HTML docs seems more
> involved.  Doing so also avoids Python 3 as a build requirement for
> everyone, but instead just for developers adding attributes.
> Like the options, we could add a CI test for this.
> 
> The patch gathers both general and target-specific attributes.
> For example, the function attribute "interrupt" has 19 URLs within our
> docs: one common, and 18 target-specific ones.
> The patch adds a new target hook used when selecting the most
> appropriate one.
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> OK for trunk?

I did a successful "all-configs" build of this, so with my "diagnostic
messages" maintainer hat on I self-approved it, and pushed this to
trunk as r15-5988-g5a022062d22e0b.

Dave



[committed][OG14] openmp: Fix error reporting in parsing of C++ OpenMP to/from clause

2024-12-06 Thread Andrew Stubbs
From: Kwok Cheung Yeung 

The final 'else' when checking the motion modifiers is nested one level
too deep.

This patch should be folded into "OpenMP: Enable 'declare mapper' mappers for
'target update' directives" when merging to mainline.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_omp_clause_from_to): Move an "else" clause to
a higher nesting level.
---
 gcc/cp/parser.cc | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 4157d912039..f52446c5e46 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -42058,16 +42058,16 @@ cp_parser_omp_clause_from_to (cp_parser *parser, enum omp_clause_code kind,
  mapper_modifier = true;
  pos += 3;
}
- else
-   {
- cp_parser_error (parser, "% or % clause with "
-  "modifier other than % or %");
- cp_parser_skip_to_closing_parenthesis (parser,
-/*recovering=*/true,
-/*or_comma=*/false,
-/*consume_paren=*/true);
- return list;
-   }
+   }
+  else
+   {
+ cp_parser_error (parser, "% or % clause with "
+  "modifier other than % or %");
+ cp_parser_skip_to_closing_parenthesis (parser,
+/*recovering=*/true,
+/*or_comma=*/false,
+/*consume_paren=*/true);
+ return list;
}
 }
 
-- 
2.46.0



Re: [PATCH] Fix incorrect line numbers in large files bug#108900

2024-12-06 Thread Jeremy Bettis
On Fri, Dec 6, 2024 at 5:27 AM Sam James  wrote:

> Please ideally use git-send-email and see
> https://gcc.gnu.org/contribute.html#patches wrt ChangeLog format and so
> on.
>
>
Perhaps you should document in that contribute page how to install
git-send-email. It is not a standard git command.

In any case, you have the patch here, and also in the linked bug. I'm sure
you can apply it if you want the bug fixed.

-- 
Jeremy Bettis | ChromeOS FAFT lead




[committed] arm: testsuite: fix some legacy C tests

2024-12-06 Thread Richard Earnshaw
These tests all lack ISO-C style function definitions.  Some
deliberately so.  Rather than try to adjust the code and risk changing
the nature of the test, add -std=c17 to the test options.

gcc/testsuite/ChangeLog:

* gcc.target/arm/20031108-1.c: Add -std=c17.
* gcc.target/arm/fp16-unprototyped-1.c: Likewise.
* gcc.target/arm/fp16-unprototyped-2.c: Likewise.
* gcc.target/arm/neon-thumb2-move.c: Likewise.
* gcc.target/arm/pr67756.c: Likewise.
* gcc.target/arm/pr81863.c: Likewise.
---
 gcc/testsuite/gcc.target/arm/20031108-1.c  | 2 +-
 gcc/testsuite/gcc.target/arm/fp16-unprototyped-1.c | 2 +-
 gcc/testsuite/gcc.target/arm/fp16-unprototyped-2.c | 2 +-
 gcc/testsuite/gcc.target/arm/neon-thumb2-move.c| 2 +-
 gcc/testsuite/gcc.target/arm/pr67756.c | 2 +-
 gcc/testsuite/gcc.target/arm/pr81863.c | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/20031108-1.c b/gcc/testsuite/gcc.target/arm/20031108-1.c
index 7923e115139..b99db7aa194 100644
--- a/gcc/testsuite/gcc.target/arm/20031108-1.c
+++ b/gcc/testsuite/gcc.target/arm/20031108-1.c
@@ -1,7 +1,7 @@
 /* PR optimization/10467  */
 /* { dg-do compile } */
 /* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
-/* { dg-options "-O2 -mthumb" } */
+/* { dg-options "-O2 -mthumb -std=c17" } */
 
 typedef enum {Ident_1} Enumeration;
 
diff --git a/gcc/testsuite/gcc.target/arm/fp16-unprototyped-1.c b/gcc/testsuite/gcc.target/arm/fp16-unprototyped-1.c
index 70c29564888..c76f5377ca3 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-unprototyped-1.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-unprototyped-1.c
@@ -2,7 +2,7 @@
function in another compilation unit.  */
 
 /* { dg-do run } */
-/* { dg-options "-mfp16-format=ieee" } */
+/* { dg-options "-mfp16-format=ieee -std=c17" } */
 /* { dg-additional-sources "fp16-unprototyped-2.c" } */
 
 #include 
diff --git a/gcc/testsuite/gcc.target/arm/fp16-unprototyped-2.c b/gcc/testsuite/gcc.target/arm/fp16-unprototyped-2.c
index 0c0f9cda6ba..2aee1dc4a15 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-unprototyped-2.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-unprototyped-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mfp16-format=ieee" } */
+/* { dg-options "-mfp16-format=ieee -std=c17" } */
 
 extern int f ();
 
diff --git a/gcc/testsuite/gcc.target/arm/neon-thumb2-move.c b/gcc/testsuite/gcc.target/arm/neon-thumb2-move.c
index d8c6748d4ee..b155be08820 100644
--- a/gcc/testsuite/gcc.target/arm/neon-thumb2-move.c
+++ b/gcc/testsuite/gcc.target/arm/neon-thumb2-move.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_neon_ok } */
 /* { dg-require-effective-target arm_thumb2_ok } */
-/* { dg-options "-O2 -mthumb" } */
+/* { dg-options "-O2 -mthumb -std=c17" } */
 /* { dg-add-options arm_neon } */
 /* { dg-prune-output "switch .* conflicts with" } */
 
diff --git a/gcc/testsuite/gcc.target/arm/pr67756.c b/gcc/testsuite/gcc.target/arm/pr67756.c
index d2e1a8270d6..240192dd56c 100644
--- a/gcc/testsuite/gcc.target/arm/pr67756.c
+++ b/gcc/testsuite/gcc.target/arm/pr67756.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_hard_vfp_ok } */
-/* { dg-options "-O2 -mapcs -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16" } */
+/* { dg-options "-O2 -mapcs -march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -std=c17" } */
 
 int inode_permission (), try_break_deleg ();
 int mutex_lock (), mutex_unlock ();
diff --git a/gcc/testsuite/gcc.target/arm/pr81863.c b/gcc/testsuite/gcc.target/arm/pr81863.c
index a96f3b58411..25f8966e73c 100644
--- a/gcc/testsuite/gcc.target/arm/pr81863.c
+++ b/gcc/testsuite/gcc.target/arm/pr81863.c
@@ -3,7 +3,7 @@
 /* { dg-require-effective-target arm_arch_v7a_arm_ok } */
 /* { dg-skip-if "-mslow-flash-data and -mword-relocations incompatible" { *-*-* } { "-mslow-flash-data" } } */
 /* { dg-skip-if "-mpure-code and -mword-relocations incompatible" { *-*-* } { "-mpure-code" } } */
-/* { dg-options "-O2 -mword-relocations" } */
+/* { dg-options "-O2 -mword-relocations -std=c17" } */
 /* { dg-add-options arm_arch_v7a_arm } */
 /* { dg-final { scan-assembler-not "\[\\t \]+movw" } } */
 
-- 
2.34.1



Re: [PATCH] arm,testsuite: Add -mtune=cortex-m55 to dlstp-int8x16.c

2024-12-06 Thread Richard Earnshaw (lists)
On 06/12/2024 16:09, Christophe Lyon wrote:
> Like dlstp-compile-asm-1.c, this test would fail if GCC is configured
> with non-default options, such as -mtune=cortex-a9.
> 
> Force -mtune=cortex-m55 to avoid this unexpected issue.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/dlstp-int8x16.c: Add -mtune=cortex-m55
> ---
>  gcc/testsuite/gcc.target/arm/mve/dlstp-int8x16.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/dlstp-int8x16.c b/gcc/testsuite/gcc.target/arm/mve/dlstp-int8x16.c
> index d5f22b50262..8ec0a57a783 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/dlstp-int8x16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/dlstp-int8x16.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile { target { arm*-*-* } } } */
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
> -/* { dg-options "-O2 -save-temps" } */
> +/* { dg-options "-O2 -save-temps -mtune=cortex-m55" } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  
>  #include 

OK.

R.


[committed][PR tree-optimization/117895] Fix sparc libgo build failure with CRC opts enabled

2024-12-06 Thread Jeff Law


So as noted in the BZ, sparc builds of the golang libraries were failing 
due to the CRC code.


Ultimately this was another mode problem in the table expansion. 
Essentially when the mode of the resultant crc was different than the 
mode of the input data we could create mixed mode operations which is a 
no-no.  Not entirely sure how we were getting away with it before, but 
it was clearly wrong.


The mode of the crc will always be at least as large as the mode of the 
data for the cases we support.  So the code has been adjusted to convert 
the data's mode to the crc's mode and do all the ops in the crc mode.
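
As an illustration of the paragraph above, here is a minimal sketch in plain C++ of the same table-based update. It is not the RTL-generating code in expr.cc; the function names, the choice of a 32-bit CRC with 16-bit data, and the polynomial used in testing are all assumptions made for the example. The point it tries to show is that the data is widened to the CRC's width once, and every shift, mask and XOR afterwards happens in that single width.

```cpp
#include <array>
#include <cstdint>

// Build the 256-entry table for an MSB-first (non-reflected) 32-bit CRC.
// (Hypothetical helper; only loosely analogous to generate_crc_table.)
inline std::array<std::uint32_t, 256> make_crc_table(std::uint32_t poly) {
    std::array<std::uint32_t, 256> table{};
    for (std::uint32_t i = 0; i < 256; ++i) {
        std::uint32_t r = i << 24;
        for (int bit = 0; bit < 8; ++bit)
            r = (r & 0x80000000u) ? (r << 1) ^ poly : (r << 1);
        table[i] = r;
    }
    return table;
}

// Table-based update: 32-bit CRC, 16-bit input data.
inline std::uint32_t crc_update_table(std::uint32_t crc, std::uint16_t input,
                                      const std::array<std::uint32_t, 256>& table) {
    const unsigned crc_bits = 32;
    const unsigned data_size = sizeof(input);
    // Widen the data to the CRC's width first, so nothing below mixes widths.
    const std::uint32_t data = input;
    for (unsigned i = 0; i < data_size; ++i) {
        // crc >> (crc_bit_size - 8)
        const std::uint32_t top = crc >> (crc_bits - 8);
        // (data >> (8 * (data_size - i - 1))) & 0xFF
        const std::uint32_t byte = (data >> (8 * (data_size - i - 1))) & 0xFFu;
        const std::uint32_t index = (top ^ byte) & 0xFFu;
        crc = (crc << 8) ^ table[index];
    }
    return crc;
}

// Bit-at-a-time reference used only to cross-check the table version.
inline std::uint32_t crc_update_bitwise(std::uint32_t crc, std::uint16_t input,
                                        std::uint32_t poly) {
    crc ^= static_cast<std::uint32_t>(input) << 16;
    for (int bit = 0; bit < 16; ++bit)
        crc = (crc & 0x80000000u) ? (crc << 1) ^ poly : (crc << 1);
    return crc;
}
```

Cross-checking the table version against the bit-at-a-time version for a few inputs is a cheap way to convince oneself that the widening step does not change the result.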


That fixes the libgo build problem on sparc and I've verified that there 
aren't any regressions on x86_64 as well as all the embedded targets in 
my tester.


Pushing to the trunk.

Jeff

commit 669afc8c47363f1b4643d487e1daa06364926434
Author: Jeff Law 
Date:   Fri Dec 6 13:40:25 2024 -0700

[PR tree-optimization/117895] Fix sparc libgo build failure with CRC opts enabled

So as noted in the BZ, sparc builds of the golang libraries were failing due to
the CRC code.

Ultimately this was another mode problem in the table expansion.  Essentially
when the mode of the resultant crc was different than the mode of the input
data we could create mixed mode operations which is a no-no.  Not entirely sure
how we were getting away with it before, but it was clearly wrong.

The mode of the crc will always be at least as large as the mode of the data
for the cases we support.  So the code has been adjusted to convert the data's
mode to the crc's mode and do all the ops in the crc mode.

That fixes the libgo build problem on sparc and I've verified that there aren't
any regressions on x86_64 as well as all the embedded targets in my tester.

PR tree-optimization/117895
gcc/
* expr.cc (calculate_table_based_CRC): Drop CRC_MODE argument.
Convert DATA to CRC's mode, then do calculations in CRC's mode.
(expand_crc_table_based): Corresponding changes.
(expand_reversed_crc_table_based): Corresponding changes.

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 5578e3d9e99..980ac415cfc 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -14336,29 +14336,32 @@ generate_crc_table (unsigned HOST_WIDE_INT polynom, unsigned short crc_bits)
 void
 calculate_table_based_CRC (rtx *crc, const rtx &input_data,
   const rtx &polynomial,
-  machine_mode crc_mode, machine_mode data_mode)
+  machine_mode data_mode)
 {
-  unsigned short crc_bit_size = GET_MODE_BITSIZE (crc_mode).to_constant ();
-  unsigned short data_size = GET_MODE_SIZE (data_mode).to_constant ();
   machine_mode mode = GET_MODE (*crc);
+  unsigned short crc_bit_size = GET_MODE_BITSIZE (mode).to_constant ();
+  unsigned short data_size = GET_MODE_SIZE (data_mode).to_constant ();
   rtx tab = generate_crc_table (UINTVAL (polynomial), crc_bit_size);

   for (unsigned short i = 0; i < data_size; i++)
 {
   /* crc >> (crc_bit_size - 8).  */
-  *crc = force_reg (crc_mode, *crc);
+  *crc = force_reg (mode, *crc);
   rtx op1 = expand_shift (RSHIFT_EXPR, mode, *crc, crc_bit_size - 8,
  NULL_RTX, 1);

   /* data >> (8 * (GET_MODE_SIZE (data_mode).to_constant () - i - 1)).  */
   unsigned range_8 = 8 * (data_size - i - 1);
-  rtx data = force_reg (data_mode, input_data);
+  /* CRC's mode is always at least as wide as INPUT_DATA.  Convert
+INPUT_DATA into CRC's mode.  */
+  rtx data = gen_reg_rtx (mode);
+  convert_move (data, input_data, 1);
   data = expand_shift (RSHIFT_EXPR, mode, data, range_8, NULL_RTX, 1);

-  /* data >> (8 * (GET_MODE_SIZE (data_mode)
+  /* data >> (8 * (GET_MODE_SIZE (mode)
  .to_constant () - i - 1)) & 0xFF.  */
   rtx data_final = expand_and (mode, data,
-  gen_int_mode (255, data_mode), NULL_RTX);
+  gen_int_mode (255, mode), NULL_RTX);

   /* (crc >> (crc_bit_size - 8)) ^ data_8bit.  */
   rtx in = expand_binop (mode, xor_optab, op1, data_final,
@@ -14367,7 +14370,7 @@ calculate_table_based_CRC (rtx *crc, const rtx &input_data,
   /* ((crc >> (crc_bit_size - 8)) ^ data_8bit) & 0xFF.  */
   rtx index = expand_and (mode, in, gen_int_mode (255, mode),
  NULL_RTX);
-  int log_crc_size = exact_log2 (GET_MODE_SIZE (crc_mode).to_constant ());
+  int log_crc_size = exact_log2 (GET_MODE_SIZE (mode).to_constant ());
   index = expand_shift (LSHIFT_EXPR, mode, index,
log_

Re: [PATCH] libstdc++: editorconfig: Adjust wildcard patterns

2024-12-06 Thread Jonathan Wakely
On Fri, 6 Dec 2024 at 18:01,  wrote:
>
> From: Matthew Malcomson 
>
> According to the editorconfig file format description, a match against
> one of multiple different strings is described with those different
> strings separated by commas and within curly braces.  E.g.
> [{x,y}.txt]
>
> https://editorconfig.org/, under "Wildcard Patterns".
>
> The current libstdc++-v3/.editorconfig file has a few places where we
> match against similar globs by using strings separated by commas but
> without the curly braces.  E.g.
> [*.h,*.cc]

Huh, I wonder why I thought that was valid.

> This doesn't take effect in neovim or emacs (as far as I can tell); I
> haven't looked into other editors.
> I would expect that following the standard syntax described in the
> documentation would satisfy more editors.  Hence this patch suggests
> following that standard by using something like:
> [*.{h,cc}]
>
> libstdc++-v3/ChangeLog:
>
> * .editorconfig: Adjust globbing style to standard syntax.

OK, thanks

>
> Signed-off-by: Matthew Malcomson 
> ---
>  libstdc++-v3/.editorconfig | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libstdc++-v3/.editorconfig b/libstdc++-v3/.editorconfig
> index 88107cedda2..c95e4e26f7a 100644
> --- a/libstdc++-v3/.editorconfig
> +++ b/libstdc++-v3/.editorconfig
> @@ -5,14 +5,14 @@ root = true
>  end_of_line = lf
>  insert_final_newline = true
>
> -[*.h,*.cc]
> +[*.{h,cc}]
>  charset = utf-8
>  indent_style = tab
>  indent_size = 2
>  tab_width = 8
>  trim_trailing_whitespace = true
>
> -[Makefile*,ChangeLog*]
> +[{Makefile,ChangeLog}*]
>  indent_style = tab
>  indent_size = 8
>  trim_trailing_whitespace = true
> --
> 2.43.0
>



Re: [PATCH] AIX Build failure with default -std=gnu23.

2024-12-06 Thread Ian Lance Taylor
David Edelsohn  writes:

> On Fri, Dec 6, 2024 at 12:25 PM Rainer Orth wrote:
>
>> Hi David,
>>
>> > No objection from me, but Ian is the maintainer of libiberty, so I'll defer
>> > to him, especially about style and overall software engineering.
>> >
>> > The C23 change presumably will break on Alpha OSF/1 as well.  Does GCC
>> > still support OSF/1?  It might be preferred to delete the block entirely
>> > instead of #ifndef _AIX.
>>
>> GCC 4.7 was the last release to support Tru64 UNIX (ex-OSF/1).  However,
>> libiberty is also used outside of the toolchain, so that may affect the
>> decision.
>>
>> However, IMO the Tru64 UNIX support can go for good now.
>>
>
> Hi, Rainer
>
> Thanks for taking a look and commenting.
>
> It seems we both agree that it would be better to remove the entire block
> defining _NO_PROTO because both of the systems are no longer supported.
>
> I'll give Ian the opportunity to comment.

Looks good to me.  Thanks.

Ian


Re: [committed] RISC-V: Add const to function_shape::get_name [NFC]

2024-12-06 Thread Mark Wielaard
Hi Kito,

On Thu, Dec 05, 2024 at 03:12:03PM +0800, Kito Cheng wrote:
> function_shape::get_name is the function for building the intrinsic function
> name; the result should not be changed by others once it is built.
> 
> So add const to the return type to make sure no one changes that by
> accident.

This seems to have broken bootstrap on risc-v:
https://builder.sourceware.org/buildbot/#/builders/310/builds/681

In file included from ../../gcc/gcc/../libcpp/include/symtab.h:21,
 from ../../gcc/gcc/tree-core.h:23,
 from ../../gcc/gcc/tree.h:23,
 from ../../gcc/gcc/config/riscv/riscv-vector-builtins.cc:27:
../../gcc/gcc/config/riscv/riscv-vector-builtins.cc: In member function ‘void riscv_vector::function_builder::add_unique_function(const riscv_vector::function_instance&, const riscv_vector::function_shape*, tree, vec&, riscv_vector::required_ext)’:
../../gcc/gcc/../include/obstack.h:421:22: error: cast from type ‘const char*’ to type ‘void*’ casts away qualifiers [-Werror=cast-qual]
  421 |void *__obj = (void *) (OBJ);  \
  |  ^~
../../gcc/gcc/config/riscv/riscv-vector-builtins.cc:4011:3: note: in expansion of macro ‘obstack_free’
 4011 |   obstack_free (&m_string_obstack, name);
  |   ^~~~
../../gcc/gcc/config/riscv/riscv-vector-builtins.cc: In member function ‘void riscv_vector::function_builder::add_overloaded_function(const riscv_vector::function_instance&, const riscv_vector::function_shape*, riscv_vector::required_ext)’:
../../gcc/gcc/../include/obstack.h:421:22: error: cast from type ‘const char*’ to type ‘void*’ casts away qualifiers [-Werror=cast-qual]
  421 |void *__obj = (void *) (OBJ);  \
  |  ^~
../../gcc/gcc/config/riscv/riscv-vector-builtins.cc:4032:7: note: in expansion of macro ‘obstack_free’
 4032 |   obstack_free (&m_string_obstack, name);
  |   ^~~~
cc1plus: all warnings being treated as errors
make[3]: *** [../../gcc/gcc/config/riscv/t-riscv:32: riscv-vector-builtins.o] Error 1
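
The diagnostic above can be reproduced in miniature. The following is a sketch only: release, make_name and name_length_then_release are hypothetical stand-ins (release plays the role of an API such as obstack_free that takes a non-const pointer), and this is not a claim about how the RISC-V code was or should be fixed. As far as I know, in C++ -Wcast-qual diagnoses C-style casts that drop qualifiers, while an explicit const_cast is accepted.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for an API that takes a non-const pointer.
inline void release(void *p) { std::free(p); }

// Hypothetical producer that hands out its result as 'const char *',
// mirroring a getter whose return type was made const.
inline const char *make_name(const char *base) {
    char *copy = static_cast<char *>(std::malloc(std::strlen(base) + 1));
    if (copy != nullptr)
        std::strcpy(copy, base);
    return copy;
}

inline std::size_t name_length_then_release(const char *base) {
    const char *name = make_name(base);
    if (name == nullptr)
        return 0;
    const std::size_t len = std::strlen(name);
    // release ((void *) name);          // C-style cast: flagged by -Wcast-qual
    release(const_cast<char *>(name));   // qualifier removal spelled out explicitly
    return len;
}
```

Whether a const_cast at the call site or a non-const local copy of the pointer is preferable depends on who owns the string; the sketch only shows why the C-style cast inside the macro trips the warning once the pointer is const-qualified.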


Re: [PATCH] AArch64: Cleanup alignment macros

2024-12-06 Thread Wilco Dijkstra
Hi Richard,

>> A common case is a constant string which is compared against some
>> argument. Most string functions work on 8 or 16-byte quantities. If we
>> ensure the whole array fits in one aligned load, we save time in the
>> string function.
>>
>> Runtime data collected for strlen calls shows 97+% has 8-byte alignment
>> or higher - this kind of overalignment helps achieving that.
>
> Ah, ok.  But aren't we then losing that advantage for 4-byte arrays?
> Or are you assuming a 4-byte path too?  Or is strlen just very unlikely
> for such small data?

The advantage comes from being aligned enough. Eg. a strlen implementation
may start like this:

    bic     src, srcin, 15
    ld1     {vdata.16b}, [src]              // 16-byte aligned load
    cmeq    vhas_nul.16b, vdata.16b, 0      // check for NUL byte

It always does a 16-byte aligned load and tests for the end of the string. So we
want to ensure that small strings fully fit inside the first 16-byte load (if not,
it takes almost twice the number of instructions even if the string is only 4
bytes). 4-byte alignment is enough to ensure this.

Another approach is to always load the first 16 bytes from the start of the string
(if not close to the end of a page). That is often an unaligned load, and then the
difference between 4- and 8-byte alignment is negligible.
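
The alignment argument can be checked with a little arithmetic. The sketch below is illustrative only; fits_in_first_block and always_fits are hypothetical helpers, and the 16-byte block size is taken from the aligned-load example above. It asks whether an object of a given size, placed at any address with a given alignment, is guaranteed to sit inside the single 16-byte aligned block that the first load covers.

```cpp
#include <cstddef>
#include <cstdint>

// True if [addr, addr + size) lies entirely within the block-aligned
// chunk that contains addr (block defaults to 16 bytes).
constexpr bool fits_in_first_block(std::uintptr_t addr, std::size_t size,
                                   std::size_t block = 16) {
    return size != 0 && (addr / block) == ((addr + size - 1) / block);
}

// True if the object fits in one block for every start address that
// satisfies the given alignment.
constexpr bool always_fits(std::size_t align, std::size_t size,
                           std::size_t block = 16) {
    for (std::uintptr_t addr = 0; addr < block; addr += align)
        if (!fits_in_first_block(addr, size, block))
            return false;
    return true;
}
```

Under these assumptions a 4-byte array with 4-byte alignment always fits, a 6-byte array with 4-byte alignment can straddle two blocks (for example at offset 12), and 8-byte alignment fixes the 6-byte case.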

Cheers,
Wilco



Re: [PATCH 2/3] c++: consolidate location printing in error.cc [PR116253]

2024-12-06 Thread Jason Merrill

On 11/12/24 9:02 AM, David Malcolm wrote:

Consolidate the location-printing logic in cp/error.cc, as preliminary
work towards supporting nested diagnostics (PR other/116253).


OK.


gcc/cp/ChangeLog:
PR other/116253
* error.cc (print_location): Move to earlier in the file.
(print_instantiation_partial_context_line): Replace
location-printing logic with a call to print_location.
(print_instantiation_partial_context): Likewise, splitting up
pp_verbatim calls.
(maybe_print_constexpr_context): Likewise.

Signed-off-by: David Malcolm 
---
  gcc/cp/error.cc | 71 +
  1 file changed, 24 insertions(+), 47 deletions(-)

diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
index 7ef79b90868f..23cfee4405ed 100644
--- a/gcc/cp/error.cc
+++ b/gcc/cp/error.cc
@@ -3818,6 +3818,20 @@ print_instantiation_full_context (diagnostic_text_output_format &text_output)
print_instantiation_partial_context (text_output, p, location);
  }
  
+static void

+print_location (diagnostic_text_output_format &text_output,
+   location_t loc)
+{
+  expanded_location xloc = expand_location (loc);
+  pretty_printer *const pp = text_output.get_printer ();
+  if (text_output.show_column_p ())
+pp_verbatim (pp, _("%r%s:%d:%d:%R   "),
+"locus", xloc.file, xloc.line, xloc.column);
+  else
+pp_verbatim (pp, _("%r%s:%d:%R   "),
+"locus", xloc.file, xloc.line);
+}
+
  /* Helper function of print_instantiation_partial_context() that
 prints a single line of instantiation context.  */
  
@@ -3829,17 +3843,10 @@ print_instantiation_partial_context_line (diagnostic_text_output_format &text_ou

if (loc == UNKNOWN_LOCATION)
  return;
  
-  expanded_location xloc = expand_location (loc);

+  print_location (text_output, loc);
  
pretty_printer *const pp = text_output.get_printer ();
  
-  if (text_output.show_column_p ())

-pp_verbatim (pp, _("%r%s:%d:%d:%R   "),
-"locus", xloc.file, xloc.line, xloc.column);
-  else
-pp_verbatim (pp, _("%r%s:%d:%R   "),
-"locus", xloc.file, xloc.line);
-
if (t != NULL)
  {
if (t->list_p ())
@@ -3912,22 +3919,11 @@ print_instantiation_partial_context (diagnostic_text_output_format &text_output,
}
if (t != NULL && skip > 0)
{
- expanded_location xloc;
- xloc = expand_location (loc);
- pretty_printer *const pp = text_output.get_printer ();
- if (text_output.show_column_p ())
-   pp_verbatim (pp,
-_("%r%s:%d:%d:%R   [ skipping %d instantiation "
-  "contexts, use -ftemplate-backtrace-limit=0 to "
-  "disable ]\n"),
-"locus", xloc.file, xloc.line, xloc.column, skip);
- else
-   pp_verbatim (pp,
-_("%r%s:%d:%R   [ skipping %d instantiation "
-  "contexts, use -ftemplate-backtrace-limit=0 to "
-  "disable ]\n"),
-"locus", xloc.file, xloc.line, skip);
-
+ print_location (text_output, loc);
+ pp_verbatim (text_output.get_printer (),
+  _("[ skipping %d instantiation contexts,"
+" use -ftemplate-backtrace-limit=0 to disable ]\n"),
+  skip);
  do {
loc = t->locus;
t = t->next;
@@ -3973,36 +3969,17 @@ maybe_print_constexpr_context (diagnostic_text_output_format &text_output)
  
FOR_EACH_VEC_ELT (call_stack, ix, t)

  {
-  expanded_location xloc = expand_location (EXPR_LOCATION (t));
const char *s = expr_as_string (t, 0);
pretty_printer *const pp = text_output.get_printer ();
-  if (text_output.show_column_p ())
-   pp_verbatim (pp,
-_("%r%s:%d:%d:%R   in % expansion of %qs"),
-"locus", xloc.file, xloc.line, xloc.column, s);
-  else
-   pp_verbatim (pp,
-_("%r%s:%d:%R   in % expansion of %qs"),
-"locus", xloc.file, xloc.line, s);
+  print_location (text_output, EXPR_LOCATION (t));
+  pp_verbatim (pp,
+  _("in % expansion of %qs"),
+  s);
pp_newline (pp);
  }
  }
  
  
-static void

-print_location (diagnostic_text_output_format &text_output,
-   location_t loc)
-{
-  expanded_location xloc = expand_location (loc);
-  pretty_printer *const pp = text_output.get_printer ();
-  if (text_output.show_column_p ())
-pp_verbatim (pp, _("%r%s:%d:%d:%R   "),
- "locus", xloc.file, xloc.line, xloc.column);
-  else
-pp_verbatim (pp, _("%r%s:%d:%R   "),
- "locus", xloc.file, xloc.line);
-}
-
  static void
  print_constrained_decl_info (diagnostic_text_output_format &text_output,
 tree decl)




Re: [PATCH] AArch64: Cleanup alignment macros

2024-12-06 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Hi Richard,
>
>> So just to be sure I understand: we still want to align (say) an array
>> of 4 chars to 32 bits so that the LDR & STR are aligned, and an array of
>> 3 chars to 32 bits so that the LDRH & STRH for the leading two bytes are
>> aligned?  Is that right?  We don't seem to take advantage of the padding
>> and do an LDR & STR for the 3-byte case, either for globals or on the stack.
>
> Taking advantage of padding is possible within the compilation unit for
> data that is defined locally (and not interposable), and always with LTO.
>
>> If so, what's the advantage of aligning (say) a 6-byte array to 64 bits
>> rather than 32 bits, given that we don't use a 64-bit LDR & STR?
>> Could we save more with size < 64 instead of size <= 32?
>
> A common case is a constant string which is compared against some
> argument. Most string functions work on 8 or 16-byte quantities. If we
> ensure the whole array fits in one aligned load, we save time in the
> string function.
>
> Runtime data collected for strlen calls shows 97+% has 8-byte alignment
> or higher - this kind of overalignment helps achieving that.

Ah, ok.  But aren't we then losing that advantage for 4-byte arrays?
Or are you assuming a 4-byte path too?  Or is strlen just very unlikely
for such small data?

> There are likely some further tweaks we could do in the future: 1/2-byte
> objects are unlikely to benefit even from 4-byte alignment.

Yeah, was wondering about that too (but realised it was outside the
intended scope of the patch).

Thanks,
Richard

> And large objects may benefit from higher alignment (allowing 16-byte
> aligned LDP for loading values or faster memcpy of whole structs).
>
> Cheers,
> Wilco


Re: [PATCH 1/3] dwarf: Delete dead code.

2024-12-06 Thread Michal Jireš




On 11/27/24 2:59 PM, Richard Biener wrote:


Did you test with -fdebug-types-section?  This code should be still needed
to generate the linkonce debug-type sections.  Note it doesn't work (very well)
when combined with LTO.


I used the tests in the testsuite, and I have now also tested it with a
nontrivial project. The added sections (.debug_types/.debug_info)
seem to come from output_comdat_type_unit.

Do you have something more specific that should use this code?


My reasoning why this branch cannot be taken was:
1) oldsym/die_symbol is non-null only for comp_unit_die()
2) comp_unit_die() does not have a parent
3) comdat_type_p = true is set only for DIEs with parents
So a non-null die_symbol and comdat_type_p = true cannot both hold for the same DIE.

Michal


Re: [PATCH v2] c++: P2865R5, Remove Deprecated Array Comparisons from C++26 [PR117788]

2024-12-06 Thread Jason Merrill

On 12/6/24 12:29 PM, Marek Polacek wrote:

On Thu, Dec 05, 2024 at 01:15:49PM -0500, Jason Merrill wrote:

On 12/4/24 12:27 PM, Marek Polacek wrote:

On Tue, Dec 03, 2024 at 04:27:22PM -0500, Jason Merrill wrote:

On 12/3/24 2:46 PM, Marek Polacek wrote:

On Thu, Nov 28, 2024 at 12:04:56PM -0500, Jason Merrill wrote:

On 11/27/24 9:06 PM, Marek Polacek wrote:

Not a bugfix, but this should only affect C++26.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8--
This patch implements P2865R5 by promoting the warning to error in C++26
only.  -Wno-array-compare shouldn't disable the error, so adjust the call
sites as well.


I think it's fine for -Wno-array-compare to suppress the error (and
-Wno-error=array-compare to reduce it to a warning), so how about
DK_PERMERROR rather than DK_ERROR?


Sounds good.

We also need SFINAE for this when !tf_warning_or_error.


I've added Warray-compare-1.C, which has:

 template
 void f (int(*)[arr1 == arr2 ? I : I]);

but when we call cp_build_binary_op from the parser, complain is
tf_warning_or_error, so we warn (as does clang++).  I suspect
that goes against [temp.deduct.general]/8.


No, that's fine; in C++26 that template is IFNDR because no well-formed
instantiation exists, it's OK for us to give a diagnostic and then continue
just like in a non-template.


Ah yes.


I'm not sure there is a SFINAE situation where this would come up, but I'd
still like to adjust this:


@@ -6125,11 +6124,10 @@ cp_build_binary_op (const op_location_t &location,
"comparison with string literal results "
"in unspecified behavior");
}
-  else if (warn_array_compare
-  && TREE_CODE (TREE_TYPE (orig_op0)) == ARRAY_TYPE
+  else if (TREE_CODE (TREE_TYPE (orig_op0)) == ARRAY_TYPE
   && TREE_CODE (TREE_TYPE (orig_op1)) == ARRAY_TYPE
   && code != SPACESHIP_EXPR
-  && (complain & tf_warning))
+  && (complain & tf_warning_or_error))
do_warn_array_compare (location, code,
   tree_strip_any_location_wrapper (orig_op0),
   tree_strip_any_location_wrapper (orig_op1));


If we happen to get here when not complaining, we'll silently accept it.
Either we should handle that case by returning error_mark_node in C++26 and
above, or we should assert that it can't happen.


We actually can get there.  But returning error_mark_node in C++26
causes problems: we hit:

  /* If we ran into a problem, make sure we complained.  */
  gcc_assert (seen_error ());

because a permerror doesn't count as an error.  Either we'd have to go
back to DK_ERROR, or leave the patch as-is.


Hmm, I guess cp_seen_error should also consider werrorcount.


That still wouldn't work with -Wno-array-compare.  Nor would adding
permerrorcount.

I suppose I could still add permerrorcount and do permerrorcount++;,
and have cp_seen_error check permerrorcount.  Does that seem acceptable?


If we didn't actually give an error, we shouldn't return 
error_mark_node.  That's what the assert is checking, and it's important 
to preserve that property (outside of SFINAE).  An error_mark_node 
without an error means silently generating garbage.


Jason



[PATCH] libstdc++: Add workaround for read(2) EINVAL on macOS and FreeBSD [PR102259]

2024-12-06 Thread Jonathan Wakely
On macOS and FreeBSD the read(2) system call can return EINVAL for large
sizes, so limit the maximum that we try to read. The calling code in
filebuf::xsgetn will loop until it gets the size it wants, so we don't
need to loop here.
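
To make the division of labour concrete, here is a minimal standalone sketch. It is not the libstdc++ code: clamp_read_size, read_some and read_fully are hypothetical names, and read_fully merely stands in for a caller that loops until it has what it asked for. Only POSIX read(2) and the INT_MAX - 1 limit come from the description above.

```cpp
#include <cerrno>
#include <climits>
#include <cstddef>
#include <unistd.h>

// Limit a request the same way the patch does: anything >= INT_MAX
// is reduced to INT_MAX - 1.
inline std::size_t clamp_read_size(std::size_t n) {
    const std::size_t limit = static_cast<std::size_t>(INT_MAX);
    return n >= limit ? limit - 1 : n;
}

// One (possibly short) read with the clamp applied, retried on EINTR.
inline ssize_t read_some(int fd, char *buf, std::size_t n) {
    n = clamp_read_size(n);
    ssize_t ret;
    do
        ret = ::read(fd, buf, n);
    while (ret == -1 && errno == EINTR);
    return ret;
}

// Hypothetical caller-side loop: because it keeps reading until it has n
// bytes (or hits EOF or an error), the clamp above never loses data.
inline ssize_t read_fully(int fd, char *buf, std::size_t n) {
    std::size_t total = 0;
    while (total < n) {
        const ssize_t ret = read_some(fd, buf + total, n - total);
        if (ret == -1)
            return -1;
        if (ret == 0)
            break;  // end of file
        total += static_cast<std::size_t>(ret);
    }
    return static_cast<ssize_t>(total);
}
```

Keeping the clamp in the lowest-level wrapper means that code relying on short reads being legal needs no changes.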

libstdc++-v3/ChangeLog:

PR libstdc++/102259
* config/io/basic_file_stdio.cc (basic_file::xsgetn): Limit n to
INT_MAX-1 when _GLIBCXX_READ_RETURNS_EINVAL_OVER_INT_MAX is
defined.
* config/os/bsd/darwin/os_defines.h 
(_GLIBCXX_READ_RETURNS_EINVAL_OVER_INT_MAX):
Define.
* config/os/bsd/freebsd/os_defines.h 
(_GLIBCXX_READ_RETURNS_EINVAL_OVER_INT_MAX):
Define.
---

Any suggestions for a better name for the new macro?

I haven't tested this, but I'll ask the bug submitter to do so. If they
don't do so, I'll ask Iain or try a FreeBSD VM next week some time.

 libstdc++-v3/config/io/basic_file_stdio.cc  | 6 ++
 libstdc++-v3/config/os/bsd/darwin/os_defines.h  | 3 +++
 libstdc++-v3/config/os/bsd/freebsd/os_defines.h | 3 +++
 3 files changed, 12 insertions(+)

diff --git a/libstdc++-v3/config/io/basic_file_stdio.cc b/libstdc++-v3/config/io/basic_file_stdio.cc
index 9b529490f08..508e2d2a469 100644
--- a/libstdc++-v3/config/io/basic_file_stdio.cc
+++ b/libstdc++-v3/config/io/basic_file_stdio.cc
@@ -338,6 +338,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 if (__ret == 0 && ferror(this->file()))
   __ret = -1;
 #else
+
+#ifdef _GLIBCXX_READ_RETURNS_EINVAL_OVER_INT_MAX
+if (__builtin_expect(__n >= __INT_MAX__, 0))
+  __n = __INT_MAX__ - 1;
+#endif
+
 do
   __ret = read(this->fd(), __s, __n);
 while (__ret == -1L && errno == EINTR);
diff --git a/libstdc++-v3/config/os/bsd/darwin/os_defines.h b/libstdc++-v3/config/os/bsd/darwin/os_defines.h
index 6bc7930bdba..826c863e481 100644
--- a/libstdc++-v3/config/os/bsd/darwin/os_defines.h
+++ b/libstdc++-v3/config/os/bsd/darwin/os_defines.h
@@ -54,4 +54,7 @@
 // No support for referencing weak symbols without a definition.
 #define _GLIBCXX_USE_WEAK_REF 0
 
+// read(2) can return EINVAL for n > INT_MAX.
+#define _GLIBCXX_READ_RETURNS_EINVAL_OVER_INT_MAX 1
+
 #endif
diff --git a/libstdc++-v3/config/os/bsd/freebsd/os_defines.h b/libstdc++-v3/config/os/bsd/freebsd/os_defines.h
index 125dfdc1888..4889bb4ec00 100644
--- a/libstdc++-v3/config/os/bsd/freebsd/os_defines.h
+++ b/libstdc++-v3/config/os/bsd/freebsd/os_defines.h
@@ -50,4 +50,7 @@
 #define _GLIBCXX_USE_C99_FLOAT_TRANSCENDENTALS_DYNAMIC 0
 #endif
 
+// read(2) can return EINVAL for n > INT_MAX.
+#define _GLIBCXX_READ_RETURNS_EINVAL_OVER_INT_MAX 1
+
 #endif
-- 
2.47.1



[PATCH] s390: Fix UNSPEC_CC_TO_INT canonicalization

2024-12-06 Thread Juergen Christ
Canonicalization of comparisons for UNSPEC_CC_TO_INT missed one case
causing unnecessarily complex code.  This especially seems to hit the
Linux kernel.

gcc/ChangeLog:

* config/s390/s390.cc (s390_canonicalize_comparison): Add
  missing UNSPEC_CC_TO_INT case.

gcc/testsuite/ChangeLog:

* gcc.target/s390/ccusage.c: New test.

Signed-off-by: Juergen Christ 

Bootstrapped and regression tested on s390.  Okay for trunk?
Okay to backport to GCC 14?

---
 gcc/config/s390/s390.cc |  2 +-
 gcc/testsuite/gcc.target/s390/ccusage.c | 37 +
 2 files changed, 38 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/ccusage.c

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 25d43ae3e138..c36c33ff8280 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -1859,7 +1859,7 @@ s390_canonicalize_comparison (int *code, rtx *op0, rtx *op1,
   && CONST_INT_P (XEXP (*op0, 1))
   && CONST_INT_P (*op1)
   && INTVAL (XEXP (*op0, 1)) == -3
-  && *code == EQ)
+  && (*code == EQ || *code == NE))
 {
   if (INTVAL (*op1) == 0)
{
diff --git a/gcc/testsuite/gcc.target/s390/ccusage.c b/gcc/testsuite/gcc.target/s390/ccusage.c
new file mode 100644
index ..e25f712e25ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/ccusage.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=zEC12 -mzarch" } */
+
+static __attribute__((always_inline)) inline
+int __atomic_dec_and_test(int *ptr)
+{
+int cc;
+asm volatile(
+"   alsi    %[ptr],-1\n"
+: "=@cc" (cc), [ptr] "+QS" (*ptr) : : "memory");
+return (cc == 0) || (cc == 2);
+}
+ 
+int a;
+void dummy(void);
+long fu(void)
+{
+if (__atomic_dec_and_test(&a))
+return 5;
+return 8;
+}
+ 
+void bar(void)
+{
+if (__atomic_dec_and_test(&a))
+dummy();
+}
+
+int foo(int x)
+{
+int cc;
+asm volatile ("ahi %[x],42\n"
+: [x] "+d" (x), "=@cc" (cc));
+return !(cc == 0 || cc == 2) ? 42 : 13;
+}
+
+/* { dg-final { scan-assembler-not {ipm} } } */
-- 
2.43.5



[Fortran, Patch, PR107635, Part 1] Rework handling of allocatable components in derived type coarrays.

2024-12-06 Thread Andre Vehreschild
Hi all,

I had to dive deeply into the issue of handling allocatable components in
derived types to find a future-proof solution. I hope to have found a
universal and flexible one now:

For each allocatable (or pointer) component in a derived type a coarray token
is required. While this is quite easy for the current compilation unit, it is
very difficult to do for arbitrary compilation units or when one gets libraries
that are not aware of coarray's needs. The approach this patch now implements
is to delegate the evaluation of a reference into a coarray into a separate
access routine. This routine is present on every image, because it is generated
by the compiler at the time it knows that a coarray access is done. With
external compilation units or libraries this solves the access issue, because
each image knows how to access its own objects, but does not need a coarray
token for allocatable (or pointer) components anymore. The access on the remote
image's object is done by the remote image itself (for the MPI implementation in
a separate thread). Therefore it knows about the bounds of arrays, allocation
and association state of components and can handle those.

Furthermore, this approach is faster, because it has O(1) complexity regarding
the communication. The old approach was O(N) where N is the number of
allocatable/pointer components + array descriptors on the path of the access.
The new approach sends a set of parameters to the remote image and gets the
desired data in return.

At the moment the patch handles only getting of data from a remote image. It is
split into two patchsets. The first one does some preparatory clean up, like
stopping to add caf_get calls into the expression tree and removing them
afterwards again, where they are in the way.

The second patch then does the access routine creation. Unfortunately, this
is the longer patch. I have also updated the documentation of the caf API. I
hope I have not overlooked anything.
This is the first part of a series to rework all coarray access routines to use
the new approach and then remove the deprecated calls. This makes things
clearer and easier to maintain, although the tree-dump now presents some more
generated routines, which might look odd.

Bootstrapped and regtested ok on x86_64-pc-linux-gnu / Fedora 39 and 41. Ok for
mainline?

I will continue working on the coarray stuff and fix upcoming bugs in the
future.

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
From 152e827c791fbdc8e457352e85ceaa7dd9e59a5d Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Thu, 31 Oct 2024 15:35:47 +0100
Subject: [PATCH 1/2] Fortran: Remove adding and removing of caf_get.
 [PR107635]

Preparatory work for PR107635.

During resolve prevent adding caf_get calls for expressions on the
left-hand-side of an assignment and removing them later on again.

Furthermore, the caf_token in a component has become a pointer to
the component and not the backend_decl of the caf-component.
In some cases the caf_token was added as the last component in a derived
type and not as the next one following the component that it needed
to be associated with.

gcc/fortran/ChangeLog:

	* gfortran.h (gfc_comp_caf_token): Convenient macro for
	accessing caf_token's tree.
	* resolve.cc (gfc_resolve_ref): Backup caf_lhs when resolving
	expr in array_ref.
	(remove_caf_get_intrinsic): Removed.
	(resolve_variable): Set flag caf_lhs when resolving lhs of
	assignment to prevent insertion of caf_get.
	(resolve_lock_unlock_event): Same, but the lhs is the parameter.
	(resolve_ordinary_assign): Move conversion to caf_send to
	resolve_codes.
	(resolve_codes): Address caf_get and caf_send here.
	(resolve_fl_derived0): Set component's caf_token when token is
	necessary.
	* trans-array.cc (gfc_conv_array_parameter): Get a coarray for
	expressions that have a corank.
	(structure_alloc_comps): Use macro to get caf_token's tree.
	(gfc_alloc_allocatable_for_assignment): Same.
	* trans-expr.cc (gfc_get_ultimate_alloc_ptr_comps_caf_token):
	Same.
	(gfc_trans_structure_assign): Same.
	* trans-intrinsic.cc (conv_expr_ref_to_caf_ref): Same.
	(has_ref_after_cafref): New function to figure out whether another
	reference follows a coarray reference.
	(conv_caf_send): Get rhs from correct place, when caf_get is
	not removed.
	* trans-types.cc (gfc_get_derived_type): Get caf_token from
	component and no longer guessing.
---
 gcc/fortran/gfortran.h |   3 +-
 gcc/fortran/resolve.cc | 165 +
 gcc/fortran/trans-array.cc |  30 +++---
 gcc/fortran/trans-expr.cc  |  15 ++-
 gcc/fortran/trans-intrinsic.cc |  32 ++-
 gcc/fortran/trans-types.cc |  44 -
 6 files changed, 158 insertions(+), 131 deletions(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index d08439019a3..d66c13b2661 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1214,11 +1214,12 @@ typedef struct gfc_component

[PATCH] arm: remove obsolete vcond expanders

2024-12-06 Thread Richard Earnshaw
The vcond{,u} expander patterns have been declared as obsolete.  Remove
them from the Arm backend.

gcc/ChangeLog:

PR target/114189
* config/arm/arm-protos.h (arm_expand_vcond): Delete prototype.
* config/arm/arm.cc (arm_expand_vcond): Delete function.
* config/arm/vec-common.md (vcond): Delete pattern.
(vcond): Likewise.
(vcond): Likewise.
(vcondu): Likewise.
---
 gcc/config/arm/arm-protos.h  |  1 -
 gcc/config/arm/arm.cc| 44 --
 gcc/config/arm/vec-common.md | 71 
 3 files changed, 116 deletions(-)

No regressions in the testsuite.  I'll push next week if there are no
objections.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 7311ad4d8e4..155507f4745 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -406,7 +406,6 @@ extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, 
rtx, bool);
 #endif /* RTX_CODE */
 
 extern bool arm_gen_setmem (rtx *);
-extern void arm_expand_vcond (rtx *, machine_mode);
 extern void arm_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
 
 extern bool arm_autoinc_modes_ok_p (machine_mode, enum arm_auto_incmodes);
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 1fbc4c22f22..bc6f9345d1e 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -31803,50 +31803,6 @@ arm_expand_vector_compare (rtx target, rtx_code code, 
rtx op0, rtx op1,
 }
 }
 
-/* Expand a vcond or vcondu pattern with operands OPERANDS.
-   CMP_RESULT_MODE is the mode of the comparison result.  */
-
-void
-arm_expand_vcond (rtx *operands, machine_mode cmp_result_mode)
-{
-  /* When expanding for MVE, we do not want to emit a (useless) vpsel in
- arm_expand_vector_compare, and another one here.  */
-  rtx mask;
-
-  if (TARGET_HAVE_MVE)
-mask = gen_reg_rtx (arm_mode_to_pred_mode (cmp_result_mode).require ());
-  else
-mask = gen_reg_rtx (cmp_result_mode);
-
-  bool inverted = arm_expand_vector_compare (mask, GET_CODE (operands[3]),
-operands[4], operands[5], true);
-  if (inverted)
-std::swap (operands[1], operands[2]);
-  if (TARGET_NEON)
-  emit_insn (gen_neon_vbsl (GET_MODE (operands[0]), operands[0],
-   mask, operands[1], operands[2]));
-  else
-{
-  machine_mode cmp_mode = GET_MODE (operands[0]);
-
-  switch (GET_MODE_CLASS (cmp_mode))
-   {
-   case MODE_VECTOR_INT:
- emit_insn (gen_mve_q (VPSELQ_S, VPSELQ_S, cmp_mode, operands[0],
-   operands[1], operands[2], mask));
- break;
-   case MODE_VECTOR_FLOAT:
- if (TARGET_HAVE_MVE_FLOAT)
-   emit_insn (gen_mve_q_f (VPSELQ_F, cmp_mode, operands[0],
-   operands[1], operands[2], mask));
- else
-   gcc_unreachable ();
- break;
-   default:
- gcc_unreachable ();
-   }
-}
-}
 
 #define MAX_VECT_LEN 16
 
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index ff1c27a0d71..0b426cdaff7 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -416,77 +416,6 @@ (define_expand "vlshr3"
 }
 })
 
-;; Conditional instructions.  These are comparisons with conditional moves for
-;; vectors.  They perform the assignment:
-;;
-;; Vop0 = (Vop4  Vop5) ? Vop1 : Vop2;
-;;
-;; where op3 is <, <=, ==, !=, >= or >.  Operations are performed
-;; element-wise.
-
-(define_expand "vcond"
-  [(set (match_operand:VDQWH 0 "s_register_operand")
-   (if_then_else:VDQWH
- (match_operator 3 "comparison_operator"
-   [(match_operand:VDQWH 4 "s_register_operand")
-(match_operand:VDQWH 5 "reg_or_zero_operand")])
- (match_operand:VDQWH 1 "s_register_operand")
- (match_operand:VDQWH 2 "s_register_operand")))]
-  "ARM_HAVE__ARITH
-   && !TARGET_REALLY_IWMMXT
-   && (! || flag_unsafe_math_optimizations)"
-{
-  arm_expand_vcond (operands, mode);
-  DONE;
-})
-
-(define_expand "vcond"
-  [(set (match_operand: 0 "s_register_operand")
-   (if_then_else:
- (match_operator 3 "comparison_operator"
-   [(match_operand:V32 4 "s_register_operand")
-(match_operand:V32 5 "reg_or_zero_operand")])
- (match_operand: 1 "s_register_operand")
- (match_operand: 2 "s_register_operand")))]
-  "ARM_HAVE__ARITH
-   && !TARGET_REALLY_IWMMXT
-   && (! || flag_unsafe_math_optimizations)"
-{
-  arm_expand_vcond (operands, mode);
-  DONE;
-})
-
-(define_expand "vcond"
-  [(set (match_operand: 0 "s_register_operand")
-   (if_then_else:
- (match_operator 3 "comparison_operator"
-   [(match_operand:V16 4 "s_register_operand")
-(match_operand:V16 5 "reg_or_zero_operand")])
- (match_operand: 1 "s_register_operand")
- (match_operand: 2 "s_register_operand")))]
-  "ARM_HAVE__ARITH
-   && !TA

[committed] i386: Fix gcc.target/i386/pr101716.c (and some related cleanups)

2024-12-06 Thread Uros Bizjak
Fix pr101716.c testcase scan-assembler failure.  The combine pass will not
combine instructions that use registers in TARGET_CLASS_LIKELY_SPILLED
class, such as %eax return register in AREG class.

Change the testcase to use pseudos only and explicitly scan for
zero_extendsidi pattern name.

While looking there, also clean ix86_decompose_address a bit: eliminate
common code and use UINTVAL and HOST_WIDE_INT_UC macros in the condition
for AND wrapped address.
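
The AND-wrapped address condition can be illustrated outside the compiler.
The following is a minimal sketch, not GCC code.  It assumes the constant
that is cut off in the quoted hunk is the 32-bit all-ones mask (suggested by
const_32bit_mask, but not visible in the diff as archived), and is_zext_mask
and zext_low32_of_shift are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if ANDing (x << shift) with MASK keeps exactly
// the low 32 bits.  The low SHIFT bits of a shifted value are already zero,
// so the mask need not cover them; OR them in before comparing.
// ASSUMPTION: the reference constant is the 32-bit all-ones value.
bool is_zext_mask(std::uint64_t mask, unsigned shift)
{
  assert(shift < 64);
  const std::uint64_t low_bits = (std::uint64_t{1} << shift) - 1;
  return (mask | low_bits) == UINT64_C(0xffffffff);
}

// Reference semantics: zero-extend the low 32 bits of (x << shift).
std::uint64_t zext_low32_of_shift(std::uint64_t x, unsigned shift)
{
  assert(shift < 64);
  return static_cast<std::uint32_t>(x << shift);
}
```

When is_zext_mask holds, (x << shift) & mask and zext_low32_of_shift (x, shift)
agree for every x, which is why both forms can share one code path.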

gcc/ChangeLog:

* config/i386/i386.cc (ix86_decompose_address): Eliminate
common code and use UINTVAL and HOST_WIDE_INT_UC macros
in the condition for AND wrapped address.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr101716.c (dg-options): Add -dp.
(dg-final): Scan for zero_extendsidi.
(sample1): Change the code to use pseudos only.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b426d29fcb5..0cdc2838bbc 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -10806,36 +10806,26 @@ ix86_decompose_address (rtx addr, struct ix86_address 
*out)
  if (CONST_INT_P (addr))
return false;
}
-  else if (GET_CODE (addr) == AND
-  && const_32bit_mask (XEXP (addr, 1), DImode))
-   {
- addr = lowpart_subreg (SImode, XEXP (addr, 0), DImode);
- if (addr == NULL_RTX)
-   return false;
-
- if (CONST_INT_P (addr))
-   return false;
-   }
   else if (GET_CODE (addr) == AND)
{
- /* For ASHIFT inside AND, combine will not generate
-canonical zero-extend. Merge mask for AND and shift_count
-to check if it is canonical zero-extend.  */
- tmp = XEXP (addr, 0);
  rtx mask = XEXP (addr, 1);
- if (tmp && GET_CODE(tmp) == ASHIFT)
+ rtx shift_val;
+
+ if (const_32bit_mask (mask, DImode)
+ /* For ASHIFT inside AND, combine will not generate
+canonical zero-extend. Merge mask for AND and shift_count
+to check if it is canonical zero-extend.  */
+ || (CONST_INT_P (mask)
+ && GET_CODE (XEXP (addr, 0)) == ASHIFT
+ && CONST_INT_P (shift_val = XEXP (XEXP (addr, 0), 1))
+ && ((UINTVAL (mask)
+  | ((HOST_WIDE_INT_1U << INTVAL (shift_val)) - 1))
+ == HOST_WIDE_INT_UC (0x
{
- rtx shift_val = XEXP (tmp, 1);
- if (CONST_INT_P (mask) && CONST_INT_P (shift_val)
- && (((unsigned HOST_WIDE_INT) INTVAL(mask)
- | ((HOST_WIDE_INT_1U << INTVAL(shift_val)) - 1))
- == 0x))
-   {
- addr = lowpart_subreg (SImode, XEXP (addr, 0),
-DImode);
-   }
+ addr = lowpart_subreg (SImode, XEXP (addr, 0), DImode);
+ if (addr == NULL_RTX)
+   return false;
}
-
}
 }
 
diff --git a/gcc/testsuite/gcc.target/i386/pr101716.c 
b/gcc/testsuite/gcc.target/i386/pr101716.c
index 5e3ea64a320..25d3c41357e 100644
--- a/gcc/testsuite/gcc.target/i386/pr101716.c
+++ b/gcc/testsuite/gcc.target/i386/pr101716.c
@@ -1,11 +1,10 @@
 /* PR target/101716 */
 /* { dg-do compile { target { ! ia32 } } } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -dp" } */
+/* { dg-final { scan-assembler-not "zero_extendsidi" } } */
 
-/* { dg-final { scan-assembler "leal\[\\t \]\[^\\n\]*eax" } } */
-/* { dg-final { scan-assembler-not "movl\[\\t \]\[^\\n\]*eax" } } */
-
-unsigned long long sample1(unsigned long long m) {
-unsigned int t = -1;
-return (m << 1) & t;
+void sample1 (unsigned long long x, unsigned long long *r)
+{
+  unsigned int t = -1;
+  *r = (x << 1) & t;
 }


Re: [PATCH v3] arm: [MVE intrinsics] Fix support for predicate constants [PR target/114801]

2024-12-06 Thread Christophe Lyon
On Fri, 6 Dec 2024 at 12:41, Richard Earnshaw (lists)
 wrote:
>
> On 04/12/2024 20:56, Christophe Lyon wrote:
> > On Wed, 4 Dec 2024 at 12:39, Richard Earnshaw (lists)
> >  wrote:
> >>
> >> On 25/11/2024 20:08, Christophe Lyon wrote:
> >>> In this PR, we have to handle a case where MVE predicates are supplied
> >>> as a const_int, where individual predicates have illegal boolean
> >>> values (such as 0xc for a 4-bit boolean predicate).  To avoid the ICE,
> >>> fix the constant (any non-zero value is converted to all 1s) and emit
> >>> a warning.
> >>>
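
The canonicalization described in the paragraph above can be sketched in
isolation.  This is a loop-based illustration rather than the mask-based code
in the patch; canonicalize_pred and its parameters are hypothetical names, and
the 16-bit predicate width is taken from the HImode constant the patch builds.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: BITS_PER_LANE is 2 for a V8BI-style predicate and
// 4 for a V4BI-style one.  Any lane with at least one bit set becomes all
// ones; all-zero lanes stay zero.
std::uint16_t canonicalize_pred(std::uint16_t pred, unsigned bits_per_lane)
{
  assert(bits_per_lane == 2 || bits_per_lane == 4);
  const unsigned lane_mask = (1u << bits_per_lane) - 1;
  unsigned result = 0;
  for (unsigned pos = 0; pos < 16; pos += bits_per_lane)
    if ((pred >> pos) & lane_mask)
      result |= lane_mask << pos;
  return static_cast<std::uint16_t>(result);
}
```

A caller can compare the result with the input to decide whether a diagnostic
is needed: the two differ exactly when some lane was partially set.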
> >>> On MVE, V8BI and V4BI multi-bit masks are interpreted byte-by-byte at
> >>> instruction level, but end-users should describe lanes rather than
> >>> bytes (so all bytes of a true-predicated lane should be '1'), see
> >>> https://developer.arm.com/documentation/101028/0012/14--M-profile-Vector-Extension--MVE--intrinsics.
> >>>
> >>> Since gen_lowpart can ICE on a subreg, we force predicates in a subreg
> >>> into a reg, after removing subreg of the same size as the target
> >>> (HImode) which would be made redundant by gen_lowpart and confuse the
> >>> DLSTP optimization.
> >>>
> >>> 2024-11-20  Christophe Lyon  
> >>>   Jakub Jelinek  
> >>>
> >>>   PR target/114801
> >>>   gcc/
> >>>   * config/arm/arm-mve-builtins.cc
> >>>   (function_expander::add_input_operand): Handle CONST_INT
> >>>   predicates.
> >>>
> >>>   gcc/testsuite/
> >>>   * gcc.target/arm/mve/pr108443.c: Update predicate constant.
> >>>   * gcc.target/arm/mve/pr114801.c: New test.
> >>> ---
> >>>  gcc/config/arm/arm-mve-builtins.cc  | 37 ++-
> >>>  gcc/testsuite/gcc.target/arm/mve/pr108443.c |  4 +--
> >>>  gcc/testsuite/gcc.target/arm/mve/pr114801.c | 39 +
> >>>  3 files changed, 77 insertions(+), 3 deletions(-)
> >>>  create mode 100644 gcc/testsuite/gcc.target/arm/mve/pr114801.c
> >>>
> >>> diff --git a/gcc/config/arm/arm-mve-builtins.cc 
> >>> b/gcc/config/arm/arm-mve-builtins.cc
> >>> index 255aed25600..5ff32ce06b7 100644
> >>> --- a/gcc/config/arm/arm-mve-builtins.cc
> >>> +++ b/gcc/config/arm/arm-mve-builtins.cc
> >>> @@ -2352,7 +2352,42 @@ function_expander::add_input_operand (insn_code 
> >>> icode, rtx x)
> >>>mode = GET_MODE (x);
> >>>  }
> >>>else if (VALID_MVE_PRED_MODE (mode))
> >>> -x = gen_lowpart (mode, x);
> >>> +{
> >>> +  if (CONST_INT_P (x) && (mode == V8BImode || mode == V4BImode))
> >>> + {
> >>> +   /* In V8BI or V4BI each element has 2 or 4 bits, if those bits 
> >>> aren't
> >>> +  all the same, gen_lowpart might ICE.  Canonicalize all the 2 
> >>> or 4
> >>> +  bits to all ones if any of them is non-zero.  V8BI and V4BI
> >>> +  multi-bit masks are interpreted byte-by-byte at instruction 
> >>> level,
> >>> +  but such constants should describe lanes, rather than bytes.  
> >>> See
> >>> +  
> >>> https://developer.arm.com/documentation/101028/0012/14--M-profile-Vector-Extension--MVE--intrinsics.
> >>>   */
> >>
> >> Apart from being an overly long line, deep links like this are generally 
> >> not very stable.  I suggest we just say something like "See the section on 
> >> MVE intrinsics in the Arm ACLE specification".
> >
> > Right, I was wondering what was the best practice, I think I've seen
> > such links recently, not sure where.
> > I'll update the comment, and the commit message.
> >
> >>
> >>> +   unsigned HOST_WIDE_INT xi = UINTVAL (x);
> >>> +   xi |= ((xi & 0x) << 1) | ((xi & 0x) >> 1);
> >>> +   if (mode == V4BImode)
> >>> + xi |= ((xi & 0x) << 2) | ((xi & 0x) >> 2);
> >>> +   if (xi != UINTVAL (x))
> >>> + inform (location, "constant predicate argument %d (%wx) does"
> >>> + " not map to %d lane numbers, converted to %wx",
> >>> + opno, UINTVAL (x) & 0x, mode == V8BImode ? 8 : 4,
> >>> + xi & 0x);
> >>
> >> I think this should be a warning (so that werror can work with it).  
> >> Otherwise such messages can't be faulted.
> > OK, I will change this.
> >
> >>
> >>> +
> >>> +   x = gen_int_mode (xi, HImode);
> >>> + }
> >>> +  else if (SUBREG_P (x))
> >>> + {
> >>> +   /* Already of the right size, drop the subreg which will be made
> >>> +  redundant by gen_lowpart below.  */
> >>> +   if (GET_MODE_SIZE (GET_MODE (x)) == GET_MODE_SIZE (HImode)
> >>> +   || SUBREG_BYTE (x) == 0)
> >>> + x = SUBREG_REG (x);
> >>> +
> >>> +   /* gen_lowpart on a SUBREG can ICE.  */
> >>> +   if (gen_lowpart_common (mode, x) == 0)
> >>> + x = force_reg (GET_MODE (x), x);
> >>> + }
> >>> +
> >>> +  x = gen_lowpart (mode, x);
> >>
> >> I wonder if this is overly complex.  Wouldn't it be better to just write 
> >> here:
> >>
> >>   else if (!REG_P (x))
> >> x = force_reg (GET_MODE (x), x);
> >>
> >> and then let the optimizers clean thin

Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?

2024-12-06 Thread Qing Zhao


> On Dec 6, 2024, at 10:56, Martin Uecker  wrote:
> 
> Am Freitag, dem 06.12.2024 um 14:16 + schrieb Qing Zhao:
>> 
>>> On Dec 5, 2024, at 17:31, Martin Uecker  wrote:
>>> 
>>> Am Donnerstag, dem 05.12.2024 um 21:09 + schrieb Qing Zhao:
 
> On Dec 3, 2024, at 10:29, Qing Zhao  wrote:
>>> 
>>> 
>>> 
>> 
 
>> 
>> It would be clearer if you used the syntax ".n" which resembles
>> the syntax for designated initializers that is already used
>> in initializers to refer to struct members.
>> 
>> constexpr int n;
>> struct foo {
>> char (*p)[n] __attribute__ ((counted_by (.n)));
>> int n;
>> };
>> 
> Yes, I agree.
>> 
>>> 
>>> 
> 
> There is one important additional requirement:
> 
> x->n, x->p can ONLY be changed by changing the whole structure at 
> the same time. 
> Otherwise, x->n might not be consistent with x->p.
 
 By itself, this would still not fix the issue I pointed out.
 
 struct foo x;
 x = .. ; // set the whole structure
 char *p = x->p;
 x = ... ; // set the whole structure
 
 What is the bound for 'p' ?  
>>> 
>>> Since p was set to the pointer field of the old structure, then the 
>>> bound of it should be the old bound.
 With current rules it would be the old bound.
>>> 
>>> I thought that this should be the correct behavior, isn’t it?
>> 
>> Yes, sorry, what I meant was "with the current rules it would be
>> the *new* bound”.
> 
> struct foo x;
> x=… ;  // set the whole structure 1
> char *p = x->p;
> x=… ;  // set the whole structure 2
> 
> In the above, when “set the whole structure 1”, x1, x1->n and x1->p 
> are set at the same time;
> After *p = x->p;the pointer “p” is pointing to “x1->p”, it’s 
> bound is “x1->n”;
 
 I agree.
> 
> Then when “set the whole structure 2”, x2 is different than x1,  
> x2->n and x2->p are set at the same time, the pointer
> ‘p’ still points to “x1->p”, therefore it’s bound should be “x1->n”. 
> 
> So, as long as the whole structure is set at the same time, should be 
> fine. 
> 
> Do I miss anything here?
 
 I was talking about the pointer "p" which was obtained before setting
 the
 struct the second time in
 
 char *p = x->p;
 
 This pointer is still set to x1->p but the bound refers to x.n which 
 is 
 now set to x2->n.
>>> 
>>> You mean:
>>> 
>>> struct foo x;
>>> x=… ;  // set the whole structure 1
>>> char *p = x->p;
>>> x=… ;  // set the whole structure 2
>>> p[index] = 10;   // at this point, p’s bound is x2->n, not x1->n? 
>>> 
>>> Yes, you are right here. 
>>> 
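
The problem agreed on above can be reproduced with plain pointers and no
attribute at all.  A minimal sketch follows; struct foo, bound_via_struct and
stale_pointer_bound are illustrative names only and say nothing about how
counted_by is or will be implemented.

```cpp
#include <cassert>
#include <cstddef>

struct foo {
  std::size_t n;  // element count that is meant to bound p
  char *p;
};

// Hypothetical bound lookup that re-reads the count from the structure at
// the point of use, as a check keyed on "x.n" would.
std::size_t bound_via_struct(const foo &x) { return x.n; }

// Returns the bound such a lookup reports for a pointer taken from the
// first value of x, after x has been overwritten as a whole.
std::size_t stale_pointer_bound(char *buf1, std::size_t n1,
                                char *buf2, std::size_t n2)
{
  foo x = {n1, buf1};          // set the whole structure 1
  char *p = x.p;               // p points into buf1, whose real bound is n1
  x = foo{n2, buf2};           // set the whole structure 2
  assert(p == buf1);           // p still refers to the old buffer
  return bound_via_struct(x);  // but the lookup now reports n2
}
```

With a 4-byte first buffer and a 16-byte second buffer the function returns
16, so an access p[10] would pass a check against that bound while writing
past the end of the first buffer.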
>>> So, is there similar problem with the corresponding language extension? 
>>> 
>> 
>> The language extension does not exist yet, so there is no problem.
> Yeah, I should mention this as “corresponding future language extension” 
> -:)
>> 
>> But I hope we will get it and then specify it so that this works
>> correctly without this footgun.
>> 
>> Of course, if GCC gets the "counted_by" attribute wrong, there will
>> be arguments later in WG14 why the feature is then different to it.
> 
> I think that we need to resolve this issue first in the design of 
> “counted_by” for pointer fields. 
> I guess that we might need to come up with some additional limitations 
> for using the “counted_by”
> attribute for pointer fields at the source code level in order to avoid 
> such potential error.  But not
> sure what exactly the additional limitation should be at this moment.
> 
> Need some study here.
 
 Actually, I found out that this is really not a problem with the current 
 design, for the following new testing case I added for my current 
 implementation of the counted_by for pointer field:
 
 [ gcc.dg]$ cat pointer-counted-by-7.c
 /* Test the attribute counted_by for pointer field and its usage in
 * __builtin_dynamic_object_size.  */ 
 /* { dg-do run } */
 /* { dg-options "-O2" } */
 
 #include "builtin-object-size-common.h"
 
 struct annotated {
 int b;
 int *c __attribute__ ((counted_by (b)));
 };
 
 struct annotated *__attribute__((__noinline__)) setup (int attr_count)
 {
 struct annotated *p_array_annotated
   = (struct annotated *) malloc (sizeof (struct annotated));
 p_array_annotated->c = (int *) malloc (sizeof (in

[PATCH] libstdc++: editorconfig: Adjust wildcard patterns

2024-12-06 Thread mmalcomson
From: Matthew Malcomson 

According to the editorconfig file format description, a match against
one of multiple different strings is described with those different
strings separated by commas and within curly braces.  E.g.
[{x,y}.txt]

https://editorconfig.org/, under "Wildcard Patterns".

The current libstdc++-v3/.editorconfig file has a few places where we
match against similar globs by using strings separated by commas but
without the curly braces.  E.g.
[*.h,*.cc]

This doesn't take effect in neovim or emacs (as far as I can tell); I
haven't looked into other editors.
I would expect that following the standard syntax described in the
documentation would satisfy more editors.  Hence this patch suggests
following that standard by using something like:
[*.{h,cc}]

libstdc++-v3/ChangeLog:

* .editorconfig: Adjust globbing style to standard syntax.

Signed-off-by: Matthew Malcomson 
---
 libstdc++-v3/.editorconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/.editorconfig b/libstdc++-v3/.editorconfig
index 88107cedda2..c95e4e26f7a 100644
--- a/libstdc++-v3/.editorconfig
+++ b/libstdc++-v3/.editorconfig
@@ -5,14 +5,14 @@ root = true
 end_of_line = lf
 insert_final_newline = true
 
-[*.h,*.cc]
+[*.{h,cc}]
 charset = utf-8
 indent_style = tab
 indent_size = 2
 tab_width = 8
 trim_trailing_whitespace = true
 
-[Makefile*,ChangeLog*]
+[{Makefile,ChangeLog}*]
 indent_style = tab
 indent_size = 8
 trim_trailing_whitespace = true
-- 
2.43.0



Re: 3rd Ping: [Middle-end][PATCH v4 0/3][RFC]Provide more contexts for -Warray-bounds and -Wstringop-* warning messages

2024-12-06 Thread Sam James
Qing Zhao  writes:

> This is the 3rd ping of the Middle-end review for this patch.
>

Jeff, would you be able to take a look? (In part because I know
you've had a lot of comments and feedback on the middle-end warnings
before). The diagnostics bits are OK'd already.

I've been running this on distro builds for a few months now and had
great results with it so far (including finding some real bugs in
packages that I'd previously dismissed as probable-FPs).

I can also chuck it in to our general testing builds if it'd help any.

> Thanks a lot!
>
> Qing
>
>> On Nov 26, 2024, at 10:30, Qing Zhao  wrote:
>> 
>> Another ping on the Middle-end review of this patch. 
>> 
>> This patch has been waiting for the middle-end review for a long time. 
>> 
>> Please review it and provide any feedback, I believe that this should be a 
>> nice improvement to GCC diagnostic in general. 
>> 
>> Thanks.
>> 
>> Qing
>> 
>>> On Nov 15, 2024, at 10:34, Qing Zhao  wrote:
>>> 
>>> Gentle ping on the middle-end review for this patch. 
>>> 
>>> There are two parts of this patch:
>>> 
>>> 1. Diagnostic part (Part 2), which has been reviewed by David;
>>> 2. Middle end part (Part 1 and 3), mainly on the copy_history information 
>>> collection during transformation. 
>>> 
>>> Thanks,
>>> 
>>> Qing
>>> 
>>> 
 On Nov 5, 2024, at 11:31, Qing Zhao  wrote:
 
 Hi,
 
 This is the 4th version of the patch for fixing PR109071.
 
 Compared to the 3rd version:
 https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666870.html
 https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666872.html
 https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666871.html
 
 The major improvements to this patch are:
 
 1. Divide the patch into 3 parts:
  Part 1: Add new data structure move_history, record move_history during
  transformation;
  Part 2: In warning analysis, Use the new move_history to form a rich
  location with a sequence of events, to report more context info
  of the warnings.
  Part 3: Add debugging mechanism for move_history.
 
 2. Major change to the above Part 2, completely rewritten based on David's
 new class lazy_diagnostic_path. 
 
 3. Fix all issues identified by Sam;
 A. fix PR117375 (Bug in tree-ssa-sink.cc);
 B. documentation clarification;
 C. Add all the duplicated PRs in the commit comments;
 
 4. Bootstrap GCC with the new -fdiagnostics-details on by default
 (Init (1)).  This exposed some ICEs similar to PR117375 in
 tree-ssa-sink.cc, now fixed.
 
 
 Bootstrapped and regression tested on both x86 and aarch64.
 
 Please let me know of any comments and suggestions.
 
 Thanks.
 
 Qing
 Qing Zhao (3):
 Provide more contexts for -Warray-bounds, -Wstringop-* warning
  messages due to code movements from compiler transformation (Part 1)
  [PR109071,PR85788,PR88771,PR106762,PR108770,PR115274,PR117179]
 Provide more contexts for -Warray-bounds, -Wstringop-* warning
  messages due to code movements from compiler transformation (Part 2)
  [PR109071,PR85788,PR88771,PR106762,PR108770,PR115274,PR117179]
 Provide more contexts for -Warray-bounds, -Wstringop-* warning
  messages due to code movements from compiler transformation (Part 3)
  [PR109071,PR85788,PR88771,PR106762,PR108770,PR115274,PR117179]
 
 
>>> 
>> 


Re: [PATCH] c: special-case some "bool" errors with C23 [PR117629]

2024-12-06 Thread Sam James
David Malcolm  writes:

> This patch attempts to provide better error messages for
> code compiled with C23 that hasn't been updated for
> "bool", "true", and "false" becoming keywords (based on
> a brief review of the Gentoo bug tracker links given at
> https://gcc.gnu.org/pipermail/gcc/2024-November/245185.html).
>
> [...]

Thanks a lot David -- I'm going to give it a spin on some codebases over
the weekend.

I have seen some other instances with constexpr, static_assert, and
unreachable, but that looks like it might be easy to add on top of this
and maybe I could have a go at doing that after.


Re: [RFC][PATCH] aarch64: Fold lsl+lsr+orr to rev for half-width shifts

2024-12-06 Thread Richard Sandiford
Sorry for the slow reply.

Dhruv Chawla  writes:
> This patch modifies the intrinsic expanders to expand svlsl and svlsr to
> unpredicated forms when the predicate is a ptrue. It also folds the
> following pattern:
>
>lsl , , 
>lsr , , 
>orr , , 
>
> to:
>
>revb/h/w , 
>
> when the shift amount is equal to half the bitwidth of the 
> register.
>
> This relies on the RTL combiners combining the "ior (ashift, ashiftrt)"
> pattern to a "rotate" when the shift amount is half the element width.
> In the case of the shift amount being 8, a "bswap" is generated.
>
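
The identity behind the fold can be checked on scalars.  A minimal sketch
using plain integers in place of SVE vector elements; the function names are
made up for illustration and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// (x << s) | (x >> s) with s equal to half the element width rotates the
// element by half its width, i.e. it swaps the two halves.
std::uint16_t swap_halves16(std::uint16_t x)
{
  return static_cast<std::uint16_t>((x << 8) | (x >> 8));
}

std::uint32_t swap_halves32(std::uint32_t x)
{
  return (x << 16) | (x >> 16);
}

std::uint64_t swap_halves64(std::uint64_t x)
{
  return (x << 32) | (x >> 32);
}
```

With 16-bit elements the half-width shift is 8, so swapping the halves is a
byte swap, which matches the remark above about a "bswap" being generated.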
> While this works well, the problem is that the matchers for instructions
> like SRA and ADR expect the shifts to be in an unspec form. So, to keep
> matching the patterns when the unpredicated instructions are generated,
> they have to be duplicated to also accept the unpredicated form. Looking
> for feedback on whether this is a good way to proceed with this problem
> or how to do this in a better way.

Yeah, there are pros and cons both ways.  IIRC, there are two main
reasons why the current code keeps the predicate for shifts by constants
before register allocation:

(1) it means the SVE combine patterns see a constant form for all shifts.

(2) it's normally better to have a single pattern that matches both
constant and non-constant forms, to give the RA more freedom.

But (2) isn't really mutually exclusive with lowering before RA.
And in practice, there probably aren't many (any?) combine patterns
that handle both constant and non-constant shift amounts.

So yeah, it might make sense to switch approach and lower shifts by
constants immediately.  But if we do, I think that should be a
pre-patch, without any intrinsic or rotate/bswap changes.  And all patterns
except @aarch64_pred_ should handle only the lowered form,
rather than having patterns for both forms.

E.g. I think we should change @aarch64_adr_shift and
*aarch64_adr_shift to use the lowered form, rather than keep
them as-is and add other patterns.  (And then we can probably merge
those two patterns, rather than have the current expand/insn pair.)

I think changing this is too invasive for GCC 15, but I'll try to review
any patches to do that so that they're ready for GCC 16 stage 1.

Thanks,
Richard

>
> The patch was bootstrapped and regtested on aarch64-linux-gnu.
>
> -- 
> Regards,
> Dhruv
>
> From 026c972dba99b59c24771cfca632f3cd4e1df323 Mon Sep 17 00:00:00 2001
> From: Dhruv Chawla 
> Date: Sat, 16 Nov 2024 19:40:03 +0530
> Subject: [PATCH] aarch64: Fold lsl+lsr+orr to rev for half-width
>  shifts
>
> This patch modifies the intrinsic expanders to expand svlsl and svlsr to
> unpredicated forms when the predicate is a ptrue. It also folds the
> following pattern:
>
>   lsl , , 
>   lsr , , 
>   orr , , 
>
> to:
>
>   revb/h/w , 
>
> when the shift amount is equal to half the bitwidth of the 
> register. Patterns in the machine descriptor files are also updated to
> accept the unpredicated forms of the instructions.
>
> Bootstrapped and regtested on aarch64-linux-gnu.
>
> Signed-off-by: Dhruv Chawla 
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-sve-builtins-base.cc
>   (svlsl_impl::expand): Define.
>   (svlsr_impl): New class.
>   (svlsr_impl::fold): Define.
>   (svlsr_impl::expand): Likewise.
>   * config/aarch64/aarch64-sve.md
>   (*v_rev): New pattern.
>   (*v_revvnx8hi): Likewise.
>   (@aarch64_adr_shift_unpred): Likewise.
>   (*aarch64_adr_shift_unpred): Likewise.
>   (*aarch64_adr_shift_sxtw_unpred): Likewise.
>   (*aarch64_adr_shift_uxtw_unpred): Likewise.
>   (3): Update to emit unpredicated forms.
>   (*post_ra_v_ashl3): Rename to ...
>   (*v_ashl3): ... this.
>   (*post_ra_v_3): Rename to ...
>   (*v_3): ... this.
>   * config/aarch64/aarch64-sve2.md
>   (@aarch64_sve_add__unpred): New pattern.
>   (*aarch64_sve2_sra_unpred): Likewise.
>   (*bitmask_shift_plus_unpred): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/sve/shift_rev_1.c: New test.
>   * gcc.target/aarch64/sve/shift_rev_2.c: Likewise.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc  |  29 +++-
>  gcc/config/aarch64/aarch64-sve.md | 138 --
>  gcc/config/aarch64/aarch64-sve2.md|  36 +
>  .../gcc.target/aarch64/sve/shift_rev_1.c  |  83 +++
>  .../gcc.target/aarch64/sve/shift_rev_2.c  |  63 
>  5 files changed, 337 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_1.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_2.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 87e9909b55a..d91182b6454 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -1947,6 +1947,33 @@ public:
>{
>  return f.fo

Re: [patch,avr] Disable CRC lookup tables

2024-12-06 Thread Oleg Endo
On Fri, 2024-12-06 at 06:32 -0700, Jeff Law wrote:
> 
> On 12/6/24 5:23 AM, Sam James wrote:
> > Georg-Johann Lay  writes:
> > 
> > > This patch disables CRC lookup tables which consume quite some RAM.
> > 
> > Given that -foptimize-crc is new, it may be useful to CC the pass
> > authors in case they have input.
> I think this is trivially OK for the AVR.  The bigger question is should 
> we do something more general for -Os.
> 
> CRC generation through table lookups is going to take more data space. 
> You need a 256 byte table for each unique CRC (sizes & polynomial), and 
> the code to compute the index into the table can be (from a code size 
> standpoint) relatively expensive as well, particularly on the 
> micro-controllers if the crc is to be computed in a mode wider than a 
> word on the target.
> 
> So I would actually even support a more general "don't optimize CRCs by 
> default for -Os".
> 

I've been putting CRC tables for many years into .text / .rodata on various
MCU projects.  Never considered putting them into .data, since flash is
usually a lot larger than RAM.  What's the reasoning behind putting the
tables in .data?
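
For readers unfamiliar with the trade-off being discussed, here is a minimal
sketch of the two approaches for a CRC-8 with polynomial 0x07 and initial
value 0.  It is not the code GCC's CRC optimization emits; crc8_bitwise,
Crc8Table and crc8_table are illustrative names.  The table is filled at run
time only to keep the sketch short, so here it occupies writable memory; a
project that wants it in flash would typically write it out as a const
initializer instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bit-at-a-time CRC-8: no table, eight shift/xor steps per input byte.
std::uint8_t crc8_bitwise(const std::uint8_t *data, std::size_t len)
{
  assert(data != nullptr || len == 0);
  std::uint8_t crc = 0;
  for (std::size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit)
      crc = static_cast<std::uint8_t>((crc & 0x80) ? (crc << 1) ^ 0x07
                                                   : (crc << 1));
  }
  return crc;
}

// 256-entry lookup table: entry[v] is the CRC of the single byte v.
struct Crc8Table {
  std::uint8_t entry[256];
  Crc8Table()
  {
    for (unsigned v = 0; v < 256; ++v) {
      const std::uint8_t byte = static_cast<std::uint8_t>(v);
      entry[v] = crc8_bitwise(&byte, 1);
    }
  }
};

// Table-driven CRC-8: one lookup per input byte, at the cost of the table.
std::uint8_t crc8_table(const std::uint8_t *data, std::size_t len)
{
  assert(data != nullptr || len == 0);
  static const Crc8Table table;
  std::uint8_t crc = 0;
  for (std::size_t i = 0; i < len; ++i)
    crc = table.entry[crc ^ data[i]];
  return crc;
}
```

Both functions compute the same value; the difference is purely code size and
speed versus the memory taken by the table, and where that table is placed.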

Best regards,
Oleg Endo


Re: GCN: Fix 'real_from_integer' usage

2024-12-06 Thread Andrew Stubbs

On 12/6/24 13:56, Thomas Schwinge wrote:

Hi Andrew!

On 2024-12-05T15:14:45+0100, I wrote:

On 2020-01-31T11:20:14+, Andrew Stubbs  wrote:

This is one of those things I don't know why we didn't notice sooner.


..., and here's another thing I don't know why we didn't notice sooner.
;-P (Category: "don't we all love C++?!")


[...]
I also needed a convenient way to create 0.0 vector constants without
uglifying the machine description code, so extending gcn_vec_constant
seemed like a useful place to do it.



--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -992,9 +992,19 @@ gcn_vec_constant (machine_mode mode, int a)
  return CONST2_RTX (mode);*/
  
int units = GET_MODE_NUNITS (mode);

-  rtx tem = gen_int_mode (a, GET_MODE_INNER (mode));
-  rtvec v = rtvec_alloc (units);
+  machine_mode innermode = GET_MODE_INNER (mode);
+
+  rtx tem;
+  if (FLOAT_MODE_P (innermode))
+{
+  REAL_VALUE_TYPE rv;
+  real_from_integer (&rv, NULL, a, SIGNED);
+  tem = const_double_from_real_value (rv, innermode);
+}
+  else
+tem = gen_int_mode (a, innermode);
  
+  rtvec v = rtvec_alloc (units);

for (int i = 0; i < units; ++i)
  RTVEC_ELT (v, i) = tem;


That's apparently not the proper way to use 'real_from_integer'.  Its
second argument is a 'format_helper', which is a class defined in
'gcc/real.h', which has a templated constructor that is meant to receive
a mode, so instead of 'NULL', this should pass in 'VOIDmode' (correct?).
Anyway: until recently, this appeared to work (fine?) -- but broke with
Andrew Pinski's recent commit b3f1b9e2aa079f8ec73e3cb48143a16645c49566
"build: Remove INCLUDE_MEMORY [PR117737]":

 [...]
 In file included from ../../source-gcc/gcc/coretypes.h:507:0,
  from ../../source-gcc/gcc/config/gcn/gcn.cc:24:
 ../../source-gcc/gcc/real.h: In instantiation of 
‘format_helper::format_helper(const T&) [with T = std::nullptr_t]’:
 ../../source-gcc/gcc/config/gcn/gcn.cc:1178:46:   required from here
 ../../source-gcc/gcc/real.h:233:17: error: no match for ‘operator==’ 
(operand types are ‘std::nullptr_t’ and ‘machine_mode’)
: m_format (m == VOIDmode ? 0 : REAL_MODE_FORMAT (m))
  ^
 [...]

Andrew P.'s commit doesn't touch 'gcc/config/gcn/gcn.cc'; the only part
relevant here -- per my understanding -- should be:

 --- gcc/system.h
 +++ gcc/system.h
 @@ -224,0 +225 @@ extern int fprintf_unlocked (FILE *, const char *, ...);
 +# include 
 @@ -761,7 +761,0 @@ private:
 -/* Some of the headers included by  can use "abort" within a
 -   namespace, e.g. "_VSTD::abort();", which fails after we use the
 -   preprocessor to redefine "abort" as "fancy_abort" below.  */
 -
 -#ifdef INCLUDE_MEMORY
 -# include 
 -#endif

In other words, (unconditional) '#include ' appears to preclude
ability to convert 'NULL' into a mode?  (Or, I'm off-track, of course...)
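
The compile error itself can be reproduced in miniature.  The sketch below
uses a stand-in enum rather than GCC's machine_mode, and a detection trait
rather than the real format_helper constructor; all names ending in _sketch,
and comparable_with_mode, are hypothetical.  It only shows why an argument
deduced as std::nullptr_t cannot satisfy a comparison of the form
"m == VOIDmode" while an integer-typed null constant can.  It does not
explain why 'NULL' was deduced differently after the header change.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Stand-in for GCC's machine_mode; only the idea of VOIDmode is borrowed.
enum machine_mode_sketch { VOIDmode_sketch = 0, SFmode_sketch = 1 };

// Detects whether "m == VOIDmode_sketch" compiles for an argument of type
// T, which is what a constructor template taking "const T &m" and
// performing that comparison would require.  Needs C++17 for std::void_t.
template <typename T, typename = void>
struct comparable_with_mode : std::false_type {};

template <typename T>
struct comparable_with_mode<
    T, std::void_t<decltype(std::declval<const T &>() == VOIDmode_sketch)>>
    : std::true_type {};

static_assert(comparable_with_mode<machine_mode_sketch>::value,
              "a mode compares with a mode");
static_assert(comparable_with_mode<int>::value,
              "an integer-typed null constant compares with a mode");
static_assert(comparable_with_mode<long>::value,
              "a wider integer-typed null constant does too");
static_assert(!comparable_with_mode<std::nullptr_t>::value,
              "std::nullptr_t has no operator== against an enum");
```

Passing the mode explicitly avoids depending on what type 'NULL' happens to
have in a given translation unit.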

Either way: OK to push the attached "GCN: Fix 'real_from_integer' usage"
after testing completes?


No issues found in testing.


I don't understand how that change broke anything, but if NULL was 
supposed to mean VOIDmode (which was probably defined to zero??) and it 
fixes the observed problem, then I think the patch is fine.


Andrew




Grüße
  Thomas






Re: [patch,lra] PR116778 we need a full live range info after rematerialization

2024-12-06 Thread Vladimir Makarov
The proposed patch can serve as a fix and you can commit it.  My only 
request is not to close the PR for now.


The LRA rematerialization sub-pass rematerializes insns containing only 
pseudos assigned to hard regs and should not change the live ranges of 
spilled pseudos.  So the sentence "Rematerialization sometimes can be like 
spilling pseudos into registers." should be wrong for the LRA design.


Unfortunately, spilling of p184 happens after several sub-passes 
including rematerialization.  That is because p184 was assigned to 
non-eliminable reg 29.


I think the full and correct solution would be a modification of 
lra_need_for_spills_p function to deal with pseudos assigned to 
non-eliminable reg.


I'll work on such a solution next week.  But again, you can commit 
your patch right now as a partial solution.


Thank you for working on the PR and making progress on it.


On 12/5/24 12:26, Denis Chertykov wrote:

The fix for PR116778:

Brief:
The bug appears in LRA after rematerialization pass while creating 
live ranges.

File lra.cc:
*
  /* Now we know what pseudos should be spilled.  Try to
 rematerialize them first.  */
  if (lra_remat ())
{
  /* We need full live info -- see the comment above.  */
  lra_create_live_ranges (lra_reg_spill_p, true);
*
The call `lra_create_live_ranges (lra_reg_spill_p, true)' is wrong.
It has to be `lra_create_live_ranges (true, true)'.

The explanation:
**
int main (void)
{
  if (a.u33 * a.u33 != 0)
--^
    goto abrt;
  if (a.u33 * a.u40 * a.u33 != 0)
**
The bug appears here.

Part of the expression `a.u33 * a.u33'
Before LRA:
*
(insn 13 11 15 2 (set (reg:QI 184 [ _1+3 ])
    (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2]  
)
    (const_int 3 [0x3]))) [1 a+3 S1 A8])) "bf.c":11:8 
86 {movqi_insn_split}

 (nil))
(insn 15 13 16 2 (set (reg:QI 64 [ a+4 ])
    (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2]  
)
    (const_int 4 [0x4]))) [1 a+4 S1 A8])) "bf.c":11:8 
86 {movqi_insn_split}

 (nil))
(insn 16 15 20 2 (set (reg:QI 185 [ _1+4 ])
    (zero_extract:QI (reg:QI 64 [ a+4 ])
    (const_int 1 [0x1])
    (const_int 0 [0]))) "bf.c":11:8 985 {*extzvqi_split}
 (nil))
*

After LRA:
*
(insn 587 11 13 2 (set (reg:QI 24 r24 [368])
    (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2]  
)
    (const_int 3 [0x3]))) [1 a+3 S1 A8])) "bf.c":11:8 
86 {movqi_insn_split}

 (nil))
(insn 13 587 15 2 (set (mem/c:QI (plus:HI (reg/f:HI 28 r28)
    (const_int 1 [0x1])) [4 %sfp+1 S1 A8])
    (reg:QI 24 r24 [368])) "bf.c":11:8 86 {movqi_insn_split}
 (nil))
(insn 15 13 16 2 (set (reg:QI 6 r6 [orig:64 a+4 ] [64])
    (mem/c:QI (const:HI (plus:HI (symbol_ref:HI ("a") [flags 0x2]  
)
    (const_int 4 [0x4]))) [1 a+4 S1 A8])) "bf.c":11:8 
86 {movqi_insn_split}

 (nil))
(insn 16 15 572 2 (set (reg:QI 24 r24 [orig:185 _1+4 ] [185])
    (zero_extract:QI (reg:QI 6 r6 [orig:64 a+4 ] [64])
    (const_int 1 [0x1])
    (const_int 0 [0]))) "bf.c":11:8 985 {*extzvqi_split}
 (nil))
(insn 572 16 20 2 (set (mem/c:QI (plus:HI (reg/f:HI 28 r28)
    (const_int 1 [0x1])) [4 %sfp+1 S1 A8])
    (reg:QI 24 r24 [orig:185 _1+4 ] [185])) "bf.c":11:8 86 
{movqi_insn_split}

 (nil))
*
Insn 13 and insn 572 both use sfp+1 as a spill slot, but in the IRA pass
these were two different pseudos, r184 and r185.
Insn 13 uses sfp+1 as a spill slot for r184.
Insn 572 uses the same slot for r185. That is wrong.
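
The effect of incomplete live-range information on slot sharing can be illustrated with a toy model. This is only a sketch under the assumption that two pseudos may share a stack slot when no overlap between their recorded live ranges is known; the 'toy_*' names and the range for r184 are hypothetical and do not reflect LRA's actual data structures.

```cpp
#include <cassert>
#include <map>

// Hypothetical closed interval [start, finish] of program points.
struct toy_live_range { int start; int finish; };

// Pseudo register number -> recorded live range.  A pseudo that is
// missing from the table has no recorded range at all.
using toy_range_table = std::map<int, toy_live_range>;

static bool
toy_ranges_overlap (const toy_live_range &a, const toy_live_range &b)
{
  return a.start <= b.finish && b.start <= a.finish;
}

// Two pseudos are allowed to share a slot unless a conflict is
// visible in the table.  With a partial table, a real conflict can
// go unnoticed.
static bool
toy_can_share_slot (const toy_range_table &table, int regno_a, int regno_b)
{
  auto a = table.find (regno_a);
  auto b = table.find (regno_b);
  if (a == table.end () || b == table.end ())
    return true;   // nothing recorded, so no conflict is detected
  return !toy_ranges_overlap (a->second, b->second);
}
```

With a full table containing overlapping ranges for both pseudos, sharing is rejected. With a partial table that lacks an entry for one of them, the same query allows sharing, which resembles the r184/r185 situation described above.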

Here we have a rematerialization.

Fragment from bf.c.317r.reload:
** 


 Rematerialization #1: 

df_worklist_dataflow_doublequeue: n_basic_blocks 14 n_edges 18 count 
14 (    1)
df_worklist_dataflow_doublequeue: n_basic_blocks 14 n_edges 18 count 
14 (    1)


Cands:
0 (nop=0, remat_regno=185, reload_regno=359):
(insn 16 15 572 2 (set (reg:QI 359 [orig:185 _1+4 ] [185])
    (zero_extract:QI (reg:QI 64 [ a+4 ])
    (const_int 1 [0x1])
    (const_int 0 [0]))) "bf.c":11:8 985 
{*extzvqi_split}

 (nil))

** 


[...]
** 


Ranges after the compression:
 r185: [0..1]
   Frame pointer can not be eliminated anymore
   Spilling non-eliminable hard r

[PATCH] middle-end/117932 - speed up DF solver

2024-12-06 Thread Richard Biener
The following addresses slow bitmap operations for maintaining the
iteration order of df_worklist_dataflow_doublequeue for a large number
of basic blocks.  One change is switching the worklist and pending
bitmaps to tree view, another change is avoiding the fully populated
initial bitmap for the first iteration and instead special-casing that
plus avoiding all forward worklist bitmap sets in that iteration.
Usually second or later iterations are sparse, so optimizing the first
iteration is worthwhile.  In fact, either change in isolation already
achieves the speedup below; the combination accounts for a minor
additional speedup.
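
As a rough illustration of the double-queue scheme with a special-cased first pass, here is a self-contained sketch on a toy forward problem. It is not the df-core.cc code: the 'toy_*' names are hypothetical, blocks are numbered directly by their position in the iteration order, and std::set stands in for the tree-view bitmaps.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical toy CFG.  A larger block index means "later" in the
// iteration order.
struct toy_cfg
{
  std::vector<std::vector<unsigned>> preds;
  std::vector<std::vector<unsigned>> succs;
  std::vector<unsigned> gen;   // facts generated per block, as a bit mask
};

// Recompute OUT for block B; return true if it changed.
static bool
toy_transfer (const toy_cfg &cfg, std::vector<unsigned> &out, unsigned b)
{
  unsigned in = 0;
  for (unsigned p : cfg.preds[b])
    in |= out[p];
  unsigned new_out = cfg.gen[b] | in;
  bool changed = new_out != out[b];
  out[b] = new_out;
  return changed;
}

// Process block B.  Later successors go to WORKLIST (current
// iteration), others to PENDING (next iteration).  A null WORKLIST
// means all later blocks are being visited anyway.
static void
toy_propagate (const toy_cfg &cfg, std::vector<unsigned> &out, unsigned b,
               std::set<unsigned> *worklist, std::set<unsigned> &pending)
{
  if (!toy_transfer (cfg, out, b))
    return;
  for (unsigned s : cfg.succs[b])
    {
      if (b < s)
        {
          if (worklist)
            worklist->insert (s);
        }
      else
        pending.insert (s);
    }
}

static std::vector<unsigned>
toy_solve_forward (const toy_cfg &cfg)
{
  const unsigned n = static_cast<unsigned> (cfg.gen.size ());
  std::vector<unsigned> out (n, 0);
  std::set<unsigned> worklist, pending;

  // First pass: visit every block in order, without a worklist.
  for (unsigned b = 0; b < n; ++b)
    toy_propagate (cfg, out, b, nullptr, pending);

  // Later passes are driven by what earlier passes left in PENDING.
  while (!pending.empty ())
    {
      std::swap (worklist, pending);
      while (!worklist.empty ())
        {
          unsigned b = *worklist.begin ();
          worklist.erase (worklist.begin ());
          toy_propagate (cfg, out, b, &worklist, pending);
        }
    }
  return out;
}
```

In this sketch the first pass never builds a fully populated worklist. Only blocks reached through back edges end up in the pending set, so later iterations stay sparse.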

For PR117932 when isolating from ext-dce and fold-mem-offset issues
this results in a 10% compile-time reduction.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.  I plan
to push this in the next few days unless there are comments suggesting
otherwise.

Richard.

PR middle-end/117932
* df-core.cc (df_worklist_propagate_forward): When WORKLIST
is NULL, do not set bits there.
(df_worklist_propagate_backward): Likewise.
(df_worklist_dataflow_doublequeue): Separate first pass
over all blocks with NULL worklist.
(df_worklist_dataflow): Do not initialize pending and adjust.
---
 gcc/df-core.cc | 70 ++
 1 file changed, 48 insertions(+), 22 deletions(-)

diff --git a/gcc/df-core.cc b/gcc/df-core.cc
index 0f27bd2524b..99fe466d053 100644
--- a/gcc/df-core.cc
+++ b/gcc/df-core.cc
@@ -872,7 +872,8 @@ make_pass_df_finish (gcc::context *ctxt)
Given a BB_INDEX, do the dataflow propagation
and set bits on for successors in PENDING for earlier
and WORKLIST for later in bbindex_to_postorder
-   if the out set of the dataflow has changed.
+   if the out set of the dataflow has changed.  When WORKLIST
+   is NULL we are processing all later blocks.
 
AGE specify time when BB was visited last time.
AGE of 0 means we are visiting for first time and need to
@@ -925,7 +926,10 @@ df_worklist_propagate_forward (struct dataflow *dataflow,
{
  if (bbindex_to_postorder[bb_index]
  < bbindex_to_postorder[ob_index])
-   bitmap_set_bit (worklist, bbindex_to_postorder[ob_index]);
+   {
+ if (worklist)
+   bitmap_set_bit (worklist, bbindex_to_postorder[ob_index]);
+   }
  else
bitmap_set_bit (pending, bbindex_to_postorder[ob_index]);
}
@@ -979,7 +983,10 @@ df_worklist_propagate_backward (struct dataflow *dataflow,
{
  if (bbindex_to_postorder[bb_index]
  < bbindex_to_postorder[ob_index])
-   bitmap_set_bit (worklist, bbindex_to_postorder[ob_index]);
+   {
+ if (worklist)
+   bitmap_set_bit (worklist, bbindex_to_postorder[ob_index]);
+   }
  else
bitmap_set_bit (pending, bbindex_to_postorder[ob_index]);
}
@@ -1010,26 +1017,55 @@ df_worklist_propagate_backward (struct dataflow 
*dataflow,
 
 static void
 df_worklist_dataflow_doublequeue (struct dataflow *dataflow,
- bitmap pending,
   sbitmap considered,
   int *blocks_in_postorder,
  unsigned *bbindex_to_postorder,
- int n_blocks)
+ unsigned n_blocks)
 {
   enum df_flow_dir dir = dataflow->problem->dir;
   int dcount = 0;
-  bitmap worklist = BITMAP_ALLOC (&df_bitmap_obstack);
   int age = 0;
   bool changed;
   vec<int> last_visit_age = vNULL;
   vec<int> last_change_age = vNULL;
   int prev_age;
 
+  bitmap worklist = BITMAP_ALLOC (&df_bitmap_obstack);
+  bitmap_tree_view (worklist);
+
   last_visit_age.safe_grow_cleared (n_blocks, true);
   last_change_age.safe_grow_cleared (n_blocks, true);
 
-  /* Double-queueing. Worklist is for the current iteration,
- and pending is for the next. */
+  /* We start with processing all blocks, populating pending for the
+ next iteration.  */
+  bitmap pending = BITMAP_ALLOC (&df_bitmap_obstack);
+  bitmap_tree_view (pending);
+  for (unsigned index = 0; index < n_blocks; ++index)
+{
+  unsigned bb_index = blocks_in_postorder[index];
+  dcount++;
+  prev_age = last_visit_age[index];
+  if (dir == DF_FORWARD)
+   changed = df_worklist_propagate_forward (dataflow, bb_index,
+bbindex_to_postorder,
+NULL, pending,
+considered,
+last_change_age,
+prev_age);
+  else
+   changed = df_worklist_propagate_backward (dataflow, bb_index,
+  

Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?

2024-12-06 Thread Martin Uecker
On Friday, 06.12.2024 at 14:16 +, Qing Zhao wrote:
> 
> > On Dec 5, 2024, at 17:31, Martin Uecker  wrote:
> > 
> > On Thursday, 05.12.2024 at 21:09 +, Qing Zhao wrote:
> > > 
> > > > On Dec 3, 2024, at 10:29, Qing Zhao  wrote:
> > 
> > 
> > 
> > > > > 
> > > > > > > 
> > > > > > > > > 
> > > > > > > > > > It would be clearer if you use the syntax ".n" which resembles
> > > > > > > > > the syntax for designated initializers that is already used
> > > > > > > > > in initializers to refer to struct members.
> > > > > > > > > 
> > > > > > > > > constexpr int n;
> > > > > > > > > > struct foo {
> > > > > > > > > > char (*p)[n] __attribute__ ((counted_by (.n)));
> > > > > > > > > > int n;
> > > > > > > > > > };
> > > > > > > > > 
> > > > > > > > Yes, I agree.
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > There is one important additional requirement:
> > > > > > > > > > > > 
> > > > > > > > > > > > x->n, x->p can ONLY be changed by changing the whole 
> > > > > > > > > > > > structure at the same time. 
> > > > > > > > > > > > Otherwise, x->n might not be consistent with x->p.
> > > > > > > > > > > 
> > > > > > > > > > > By itself, this would still not fix the issue I pointed 
> > > > > > > > > > > out.
> > > > > > > > > > > 
> > > > > > > > > > > struct foo x;
> > > > > > > > > > > x = .. ; // set the whole structure
> > > > > > > > > > > char *p = x->p;
> > > > > > > > > > > x = ... ; // set the whole structure
> > > > > > > > > > > 
> > > > > > > > > > > What is the bound for 'p' ?  
> > > > > > > > > > 
> > > > > > > > > > Since p was set to the pointer field of the old structure, 
> > > > > > > > > > then the bound of it should be the old bound.
> > > > > > > > > > > With current rules it would be the old bound.
> > > > > > > > > > 
> > > > > > > > > > I thought that this should be the correct behavior, isn’t 
> > > > > > > > > > it?
> > > > > > > > > 
> > > > > > > > > Yes, sorry, what I meant was "with the current rules it would 
> > > > > > > > > be
> > > > > > > > > the *new* bound”.
> > > > > > > > 
> > > > > > > > struct foo x;
> > > > > > > > x=… ;  // set the whole structure 1
> > > > > > > > char *p = x->p;
> > > > > > > > x=… ;  // set the whole structure 2
> > > > > > > > 
> > > > > > > > In the above, when “set the whole structure 1”, x1, x1->n and 
> > > > > > > > x1->p are set at the same time;
> > > > > > > > After *p = x->p; the pointer “p” is pointing to “x1->p”, 
> > > > > > > > its bound is “x1->n”;
> > > > > > > 
> > > > > > > I agree.
> > > > > > > > 
> > > > > > > > Then when “set the whole structure 2”, x2 is different than x1, 
> > > > > > > >  x2->n and x2->p are set at the same time, the pointer
> > > > > > > > ‘p’ still points to “x1->p”, therefore its bound should be 
> > > > > > > > “x1->n”. 
> > > > > > > > 
> > > > > > > > So, as long as the whole structure is set at the same time, 
> > > > > > > > should be fine. 
> > > > > > > > 
> > > > > > > > Do I miss anything here?
> > > > > > > 
> > > > > > > I was talking about the pointer "p" which was obtained before 
> > > > > > > setting the
> > > > > > > struct the second time in
> > > > > > > 
> > > > > > > char *p = x->p;
> > > > > > > 
> > > > > > > This pointer is still set to x1->p but the bound refers to x.n 
> > > > > > > which is 
> > > > > > > now set to x2->n.
> > > > > > 
> > > > > > You mean:
> > > > > > 
> > > > > > struct foo x;
> > > > > > x=… ;  // set the whole structure 1
> > > > > > char *p = x->p;
> > > > > > x=… ;  // set the whole structure 2
> > > > > > p[index] = 10;   // at this point, p’s bound is x2->n, not x1->n? 
> > > > > > 
> > > > > > Yes, you are right here. 
> > > > > > 
> > > > > > So, is there a similar problem with the corresponding language 
> > > > > > extension? 
> > > > > > 
> > > > > 
> > > > > The language extension does not exist yet, so there is no problem.
> > > > Yeah, I should mention this as “corresponding future language 
> > > > extension” -:)
> > > > > 
> > > > > But I hope we will get it and then specify it so that this works
> > > > > correctly without this footgun.
> > > > > 
> > > > > Of course, if GCC gets the "counted_by" attribute wrong, there will
> > > > > be arguments later in WG14 why the feature is then different to it.
> > > > 
> > > > I think that we need to resolve this issue first in the design of 
> > > > “counted_by” for pointer fields. 
> > > > I guess that we might need to come up with some additional limitations 
> > > > for using the “counted_by”
> > > > attribute for pointer fields at the source code level in order to avoid 
> > > > such potential error.  But not
> > > > sure what exactly the additional limitation should be at this moment.
> > > > 
> > > > Need some study here.
> > > 
> > > Actually, I found out that this is really not a problem with the current 
> > > design, for the following new testing case I added for my current 
> > > implementation of the c
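
The stale-pointer concern from the quoted exchange can be modelled without the attribute itself. The following sketch contrasts a bound that is re-read from the struct at the time of use with a bound captured when the pointer is loaded. The 'toy_*' names are hypothetical; this says nothing about how GCC implements or will implement 'counted_by' for pointer fields.

```cpp
#include <cassert>

// Hypothetical struct with a pointer and its element count.
struct toy_counted_buf
{
  char *p;
  int n;
};

// Bound looked up through the struct when the access happens.  If the
// struct has been reassigned in between, this is the *new* count.
static bool
toy_in_bounds_reread (const toy_counted_buf &x, int index)
{
  return index >= 0 && index < x.n;
}

// Pointer that carries the count it had when it was loaded.
struct toy_bounded_ptr
{
  char *p;
  int n;
};

static toy_bounded_ptr
toy_load_with_bound (const toy_counted_buf &x)
{
  return toy_bounded_ptr{x.p, x.n};
}

static bool
toy_in_bounds_captured (const toy_bounded_ptr &bp, int index)
{
  return index >= 0 && index < bp.n;
}
```

If a pointer is loaded from a struct describing a 4-byte buffer and the struct is then reassigned to describe a 16-byte buffer, the re-read check accepts index 10 for the old pointer, while the captured check rejects it.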

Re: [PATCH v2] c++: P2865R5, Remove Deprecated Array Comparisons from C++26 [PR117788]

2024-12-06 Thread Marek Polacek
On Thu, Dec 05, 2024 at 01:15:49PM -0500, Jason Merrill wrote:
> On 12/4/24 12:27 PM, Marek Polacek wrote:
> > On Tue, Dec 03, 2024 at 04:27:22PM -0500, Jason Merrill wrote:
> > > On 12/3/24 2:46 PM, Marek Polacek wrote:
> > > > On Thu, Nov 28, 2024 at 12:04:56PM -0500, Jason Merrill wrote:
> > > > > On 11/27/24 9:06 PM, Marek Polacek wrote:
> > > > > > Not a bugfix, but this should only affect C++26.
> > > > > > 
> > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > > > 
> > > > > > -- >8--
> > > > > > This patch implements P2865R5 by promoting the warning to error in 
> > > > > > C++26
> > > > > > only.  -Wno-array-compare shouldn't disable the error, so adjust 
> > > > > > the call
> > > > > > sites as well.
> > > > > 
> > > > > I think it's fine for -Wno-array-compare to suppress the error (and
> > > > > -Wno-error=array-compare to reduce it to a warning), so how about
> > > > > DK_PERMERROR rather than DK_ERROR?
> > > > 
> > > > Sounds good.
> > > > > We also need SFINAE for this when !tf_warning_or_error.
> > > > 
> > > > I've added Warray-compare-1.C, which has:
> > > > 
> > > > template
> > > > void f (int(*)[arr1 == arr2 ? I : I]);
> > > > 
> > > > but when we call cp_build_binary_op from the parser, complain is
> > > > tf_warning_or_error, so we warn (as does clang++).  I suspect
> > > > that goes against [temp.deduct.general]/8.
> > > 
> > > No, that's fine; in C++26 that template is IFNDR because no well-formed
> > > instantiation exists, it's OK for us to give a diagnostic and then 
> > > continue
> > > just like in a non-template.
> > 
> > Ah yes.
> > 
> > > I'm not sure there is a SFINAE situation where this would come up, but I'd
> > > still like to adjust this:
> > > 
> > > > @@ -6125,11 +6124,10 @@ cp_build_binary_op (const op_location_t 
> > > > &location,
> > > > "comparison with string literal results "
> > > > "in unspecified behavior");
> > > > }
> > > > -  else if (warn_array_compare
> > > > -  && TREE_CODE (TREE_TYPE (orig_op0)) == ARRAY_TYPE
> > > > +  else if (TREE_CODE (TREE_TYPE (orig_op0)) == ARRAY_TYPE
> > > >&& TREE_CODE (TREE_TYPE (orig_op1)) == ARRAY_TYPE
> > > >&& code != SPACESHIP_EXPR
> > > > -  && (complain & tf_warning))
> > > > +  && (complain & tf_warning_or_error))
> > > > do_warn_array_compare (location, code,
> > > >tree_strip_any_location_wrapper 
> > > > (orig_op0),
> > > >tree_strip_any_location_wrapper 
> > > > (orig_op1));
> > > 
> > > If we happen to get here when not complaining, we'll silently accept it.
> > > Either we should handle that case by returning error_mark_node in C++26 
> > > and
> > > above, or we should assert that it can't happen.
> > 
> > We actually can get there.  But returning error_mark_node in C++26
> > causes problems: we hit:
> > 
> >  /* If we ran into a problem, make sure we complained.  */
> >  gcc_assert (seen_error ());
> > 
> > because a permerror doesn't count as an error.  Either we'd have to go
> > back to DK_ERROR, or leave the patch as-is.
> 
> Hmm, I guess cp_seen_error should also consider werrorcount.

That still wouldn't work with -Wno-array-compare.  Nor would adding
permerrorcount.

I suppose I could still add permerrorcount and do permerrorcount++;,
and have cp_seen_error check permerrorcount.  Does that seem acceptable?
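
For readers unfamiliar with what P2865R5 removes: a comparison such as 'arr1 == arr2' between two arrays compares the addresses of their first elements, not their contents. The sketch below shows two ways to state the intent explicitly that do not rely on the removed form. The helper names are hypothetical and only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

static int arr1[3] = {1, 2, 3};
static int arr2[3] = {1, 2, 3};

// Identity comparison: the arrays decay to pointers at the call site,
// so this compares addresses, which is what 'arr1 == arr2' used to do.
static bool
toy_same_array (const int *a, const int *b)
{
  return a == b;
}

// Content comparison: element-wise equality of two arrays of the same
// length.
template<typename T, std::size_t N>
static bool
toy_same_contents (const T (&a)[N], const T (&b)[N])
{
  return std::equal (std::begin (a), std::end (a), std::begin (b));
}
```

Here 'toy_same_array (arr1, arr2)' is false because the arrays are distinct objects, while 'toy_same_contents (arr1, arr2)' is true because the elements match.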

Marek


