[PATCH v2] x86: Enable *mov_(and|or_store) only for -Oz

2025-05-25 Thread H.J. Lu
On Sun, May 25, 2025 at 8:12 AM H.J. Lu  wrote:
>
> On Sun, May 25, 2025 at 7:47 AM H.J. Lu  wrote:
> >
> > commit ef26c151c14a87177d46fd3d725e7f82e040e89f
> > Author: Roger Sayle 
> > Date:   Thu Dec 23 12:33:07 2021 +
> >
> > x86: PR target/103773: Fix wrong-code with -Oz from pop to memory.
> >
> > transformed "mov $0,mem" to the shorter "and $0,mem" for -Oz.  But
> >
> > (define_insn "*mov_and"
> >   [(set (match_operand:SWI248 0 "memory_operand" "=m")
> > (match_operand:SWI248 1 "const0_operand"))
> >(clobber (reg:CC FLAGS_REG))]
> >   "reload_completed"
> >   "and{}\t{%1, %0|%0, %1}"
> >   [(set_attr "type" "alu1")
> >(set_attr "mode" "")
> >(set_attr "length_immediate" "1")])
> >
> > isn't guarded for -Oz.  As a result, "and $0,mem" is generated without
> > -Oz.  Enable *mov_and only for -Oz.
> >
> > gcc/
> >
> > PR target/120427
> > * config/i386/i386.md (*mov_and): Enable only for -Oz.
> >
> > gcc/testsuite/
> >
> > PR target/120427
> > * gcc.target/i386/pr120427.c: New test.
> >
> > OK for master?
> >
>
> "mov $-1,mem" has the same issue.  Here is the updated patch to also
> enable "or $-1,mem" only for -Oz.
>
> OK for master?

It doesn't work since  "*mov_or" was extended from load.  Here is
the v2 patch:

1. Add "*mov_or_store" for "or $-1,mem".
2. Rename "*mov_or" to "*mov_or_load", replacing
nonimmediate_operand with register_operand.
3. Enable "*mov_and" and "*mov_or_store" only for -Oz.

Tested on x86-64.
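
For illustration, the stores affected look like the following in C. This is a minimal sketch, not the pr120427 tests from the patch, and the function names are hypothetical. With a 32-bit operand, "mov $0,mem" carries a 4-byte immediate, whereas "and $0,mem" and "or $-1,mem" use a sign-extended 1-byte immediate. They are shorter, but they also read the destination and clobber the flags, which is presumably why they are only wanted for -Oz.

```c
#include <assert.h>

/* Hypothetical reproducer; not taken from gcc.target/i386/pr120427-*.c.  */

/* Expected to stay "mov $0,mem" unless compiling with -Oz,
   where "and $0,mem" is the shorter encoding.  */
void
store_zero (int *p)
{
  *p = 0;
}

/* Expected to stay "mov $-1,mem" unless compiling with -Oz,
   where "or $-1,mem" is the shorter encoding.  */
void
store_minus_one (long *p)
{
  *p = -1;
}

/* Whichever instruction is chosen, memory must hold the same value.  */
int
check_stores (void)
{
  int a = 42;
  long b = 42;
  store_zero (&a);
  store_minus_one (&b);
  assert (a == 0);
  assert (b == -1);
  return a == 0 && b == -1;
}
```

Comparing the assembly from -O2 and -Oz for these two functions should show which form the compiler picked.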

-- 
H.J.
From be013c2d0bde068804fda3db6b05c89d7a26d54e Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sun, 25 May 2025 07:40:29 +0800
Subject: [PATCH v2] x86: Enable *mov_(and|or_store) only for -Oz

commit ef26c151c14a87177d46fd3d725e7f82e040e89f
Author: Roger Sayle 
Date:   Thu Dec 23 12:33:07 2021 +

x86: PR target/103773: Fix wrong-code with -Oz from pop to memory.

added "*mov_and" and extended "*mov_or" to transform
"mov $0,mem" to the shorter "and $0,mem" and "mov $-1,mem" to the shorter
"or $-1,mem" for -Oz.  But the new pattern:

(define_insn "*mov_and"
  [(set (match_operand:SWI248 0 "memory_operand" "=m")
(match_operand:SWI248 1 "const0_operand"))
   (clobber (reg:CC FLAGS_REG))]
  "reload_completed"
  "and{}\t{%1, %0|%0, %1}"
  [(set_attr "type" "alu1")
   (set_attr "mode" "")
   (set_attr "length_immediate" "1")])

and the extended pattern:

(define_insn "*mov_or"
  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm")
(match_operand:SWI248 1 "constm1_operand"))
   (clobber (reg:CC FLAGS_REG))]
  "reload_completed"
  "or{}\t{%1, %0|%0, %1}"
  [(set_attr "type" "alu1")
   (set_attr "mode" "")
   (set_attr "length_immediate" "1")])

aren't guarded for -Oz.  As a result, "and $0,mem" and "or $-1,mem" are
generated without -Oz.  This patch:

1. Add "*mov_or_store" for "or $-1,mem".
2. Rename "*mov_or" to "*mov_or_load", replacing
nonimmediate_operand with register_operand.
3. Enable "*mov_and" and "*mov_or_store" only for -Oz.

gcc/

	PR target/120427
	* config/i386/i386.md (*mov_and): Enable only for -Oz.
	(*mov_or_store): New.
	(*mov_or): Renamed to ...
	(*mov_or_load): This.  Replace nonimmediate_operand with
	register_operand.

gcc/testsuite/

	PR target/120427
	* gcc.target/i386/pr120427-1.c: New test.
	* gcc.target/i386/pr120427-2.c: Likewise.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386.md| 18 +++---
 gcc/testsuite/gcc.target/i386/pr120427-1.c | 28 ++
 gcc/testsuite/gcc.target/i386/pr120427-2.c | 28 ++
 3 files changed, 71 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120427-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120427-2.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b7a18d583da..e55dd27cfcf 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2442,14 +2442,26 @@ (define_insn "*mov_and"
   [(set (match_operand:SWI248 0 "memory_operand" "=m")
 	(match_operand:SWI248 1 "const0_operand"))
(clobber (reg:CC FLAGS_REG))]
-  "reload_completed"
+  "reload_completed
+   && optimize_insn_for_size_p () && optimize_size > 1"
   "and{}\t{%1, %0|%0, %1}"
   [(set_attr "type" "alu1")
(set_attr "mode" "")
(set_attr "length_immediate" "1")])
 
-(define_insn "*mov_or"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm")
+(define_insn "*mov_or_store"
+  [(set (match_operand:SWI248 0 "memory_operand" "=m")
+	(match_operand:SWI248 1 "constm1_operand"))
+   (clobber (reg:CC FLAGS_REG))]
+  "reload_completed
+   && optimize_insn_for_size_p () && optimize_size > 1"
+  "or{}\t{%1, %0|%0, %1}"
+  [(set_attr "type" "alu1")
+   (set_attr "mode" "")
+   (set_attr "length_immediate" "1")])
+
+(define_insn "*mov_or_load"
+  [(set (match_operand:SWI248 0 "register_operand" "=r")
 	(match_operand:SWI248 1 "constm1_operand"))
(clobber (reg:CC FLAGS_REG))]
   "reload_completed"
diff --git a/gcc/testsuite/gcc.target/i386/pr120427-1.c b/gcc/testsuite/gcc.target/i386/pr1

Re: [PATCH] Enable mcf thread model for aarch64-*-mingw*.

2025-05-25 Thread LIU Hao

On 2025-5-16 16:50, LIU Hao wrote:
This is a leftover of d6d7afcdbc04adb0ec42a44b2d7e05600945af42. After this change, configuration files of 
all three thread models are in 'libgcc/config/mingw/'.


The patch has been bootstrapped on {x86_64,i686}-w64-mingw32. The ARM64 port is still a work in progress and
I will keep an eye on it for GCC 16.




Ping.


--
Best regards,
LIU Hao



Re: [PATCH] i386: Quote user-defined symbols in assembly in Intel syntax

2025-05-25 Thread Jonathan Yong

On 5/20/25 3:06 AM, LIU Hao wrote:

On 2025-5-13 17:18, LIU Hao wrote:

Hello,

Attached is a patch for PR 53929, but is also required by PR 80881.


Ping.

I also just noticed that Clang quotes mangled MSVC++ symbols in this
way, at least since Clang 3.5, so it's accepted by both GAS and LLVM:

(https://gcc.godbolt.org/z/9xjKb4YP6)

     ```
     "??0foo@@QEAA@XZ":
     mov rax, rcx
     mov ecx, dword ptr [rip + "?init@foo@@2HA"]
     mov dword ptr [rax], ecx
     ret
     ```





Pushed to master branch, thanks.



Re: [PATCH] Enable mcf thread model for aarch64-*-mingw*.

2025-05-25 Thread Jonathan Yong

On 5/16/25 8:50 AM, LIU Hao wrote:
This is a leftover of d6d7afcdbc04adb0ec42a44b2d7e05600945af42. After 
this change, configuration files of all three thread models are in 
'libgcc/config/mingw/'.


The patch has been bootstrapped on {x86_64,i686}-w64-mingw32. The ARM64 port
is still a work in progress and I will keep an eye on it for GCC 16.


Pushed to master branch, thanks.



[PATCH v3] x86: Enable *mov_(and|or) only for -Oz

2025-05-25 Thread H.J. Lu
On Sun, May 25, 2025 at 7:02 PM H.J. Lu  wrote:
>
> On Sun, May 25, 2025 at 8:12 AM H.J. Lu  wrote:
> >
> > On Sun, May 25, 2025 at 7:47 AM H.J. Lu  wrote:
> > >
> > > commit ef26c151c14a87177d46fd3d725e7f82e040e89f
> > > Author: Roger Sayle 
> > > Date:   Thu Dec 23 12:33:07 2021 +
> > >
> > > x86: PR target/103773: Fix wrong-code with -Oz from pop to memory.
> > >
> > > > transformed "mov $0,mem" to the shorter "and $0,mem" for -Oz.  But
> > >
> > > (define_insn "*mov_and"
> > >   [(set (match_operand:SWI248 0 "memory_operand" "=m")
> > > (match_operand:SWI248 1 "const0_operand"))
> > >(clobber (reg:CC FLAGS_REG))]
> > >   "reload_completed"
> > >   "and{}\t{%1, %0|%0, %1}"
> > >   [(set_attr "type" "alu1")
> > >(set_attr "mode" "")
> > >(set_attr "length_immediate" "1")])
> > >
> > > isn't guarded for -Oz.  As a result, "and $0,mem" is generated without
> > > -Oz.  Enable *mov_and only for -Oz.
> > >
> > > gcc/
> > >
> > > PR target/120427
> > > * config/i386/i386.md (*mov_and): Enable only for -Oz.
> > >
> > > gcc/testsuite/
> > >
> > > PR target/120427
> > > * gcc.target/i386/pr120427.c: New test.
> > >
> > > OK for master?
> > >
> >
> > "mov $-1,mem" has the same issue.  Here is the updated patch to also
> > enable "or $-1,mem" only for -Oz.
> >
> > OK for master?
>
> It doesn't work since  "*mov_or" was extended from load.  Here is
> the v2 patch:
>
> 1. Add "*mov_or_store" for "or $-1,mem".
> 2. Rename "*mov_or" to "*mov_or_load", replacing
> nonimmediate_operand with register_operand.
> 3. Enable "*mov_and" and "*mov_or_store" only for -Oz.
>
> Tested on x86-64.

Here is the v3 patch.   Change "*mov_or" to define_insn_and_split
and split it to "mov $-1,mem" if not -Oz.  Don't transform "mov $-1,reg" to
"push $-1; pop reg" for -Oz since it should be transformed to "or $-1,reg".

Tested on x86-64.   OK for master?

Thanks.

-- 
H.J.
From 01f897854a20d4b3144587eed0e3933223bbefba Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sun, 25 May 2025 07:40:29 +0800
Subject: [PATCH v3] x86: Enable *mov_(and|or) only for -Oz

commit ef26c151c14a87177d46fd3d725e7f82e040e89f
Author: Roger Sayle 
Date:   Thu Dec 23 12:33:07 2021 +

x86: PR target/103773: Fix wrong-code with -Oz from pop to memory.

added "*mov_and" and extended "*mov_or" to transform
"mov $0,mem" to the shorter "and $0,mem" and "mov $-1,mem" to the shorter
"or $-1,mem" for -Oz.  But the new pattern:

(define_insn "*mov_and"
  [(set (match_operand:SWI248 0 "memory_operand" "=m")
(match_operand:SWI248 1 "const0_operand"))
   (clobber (reg:CC FLAGS_REG))]
  "reload_completed"
  "and{}\t{%1, %0|%0, %1}"
  [(set_attr "type" "alu1")
   (set_attr "mode" "")
   (set_attr "length_immediate" "1")])

and the extended pattern:

(define_insn "*mov_or"
  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm")
(match_operand:SWI248 1 "constm1_operand"))
   (clobber (reg:CC FLAGS_REG))]
  "reload_completed"
  "or{}\t{%1, %0|%0, %1}"
  [(set_attr "type" "alu1")
   (set_attr "mode" "")
   (set_attr "length_immediate" "1")])

aren't guarded for -Oz.  As a result, "and $0,mem" and "or $-1,mem" are
generated without -Oz.  Enable "*mov_and" only for -Oz.  Change
"*mov_or" to define_insn_and_split and split it to "mov $-1,mem"
if not -Oz.  Don't transform "mov $-1,reg" to "push $-1; pop reg" for
-Oz since it should be transformed to "or $-1,reg".

gcc/

	PR target/120427
	* config/i386/i386.md (*mov_and): Enable only for -Oz.
	(*mov_or): Changed to define_insn_and_split.  Split it
	to "mov $-1,mem" if not -Oz.
	(peephole2): Don't transform "mov $-1,reg" to "push $-1; pop reg"
	for -Oz since it will be transformed to "or $-1,reg".

gcc/testsuite/

	PR target/120427
	* gcc.target/i386/cold-attribute-4.c: Compile with -Oz.
	* gcc.target/i386/pr120427-1.c: New test.
	* gcc.target/i386/pr120427-2.c: Likewise.
	* gcc.target/i386/pr120427-3.c: Likewise.
	* gcc.target/i386/pr120427-4.c: Likewise.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386.md   | 11 -
 .../gcc.target/i386/cold-attribute-4.c|  2 +-
 gcc/testsuite/gcc.target/i386/pr120427-1.c| 28 
 gcc/testsuite/gcc.target/i386/pr120427-2.c| 28 
 gcc/testsuite/gcc.target/i386/pr120427-3.c| 45 +++
 gcc/testsuite/gcc.target/i386/pr120427-4.c|  6 +++
 6 files changed, 117 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120427-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120427-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120427-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr120427-4.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b7a18d583da..266d404362d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2442,18 +2442,24 @@ (define_insn "*mov_and"
   [(set (match_operand:SWI248 0 "memory_operand" "=m")
 	(match_operand:SWI248 1 "const0_operand"))
(clobber (reg:CC FLAGS_REG))]
-  "reload_

Re: [PATCH] aarch64: Use LDR for first-element loads for Advanced SIMD

2025-05-25 Thread Dhruv Chawla

On 08/05/25 18:43, Richard Sandiford wrote:

Dhruv Chawla  writes:

This patch modifies Advanced SIMD assembly generation to emit an LDR
instruction when a vector is created using a load to the first element with the
other elements being zero.

This is similar to what *aarch64_combinez already does.

Example:

uint8x16_t foo(uint8_t *x) {
uint8x16_t r = vdupq_n_u8(0);
r = vsetq_lane_u8(*x, r, 0);
return r;
}

Currently, this generates:

foo:
   movi v0.4s, 0
   ld1 {v0.b}[0], [x0]
   ret

After applying the patch, this generates:

foo:
   ldr b0, [x0]
   ret

Bootstrapped and regtested on aarch64-linux-gnu. Tested on
aarch64_be-unknown-linux-gnu as well.

Signed-off-by: Dhruv Chawla 

gcc/ChangeLog:

   * config/aarch64/aarch64-simd.md
   (*aarch64_simd_vec_set_low): New pattern.

gcc/testsuite/ChangeLog:

   * gcc.target/aarch64/simd/ldr_first_le.c: New test.
   * gcc.target/aarch64/simd/ldr_first_be.c: Likewise.
---
   gcc/config/aarch64/aarch64-simd.md|  12 ++
   .../gcc.target/aarch64/simd/ldr_first_be.c| 140 ++
   .../gcc.target/aarch64/simd/ldr_first_le.c| 139 +
   3 files changed, 291 insertions(+)
   create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/ldr_first_be.c
   create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/ldr_first_le.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index e2afe87e513..7be1c685fcf 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1164,6 +1164,18 @@
 [(set_attr "type" "neon_logic")]
   )

+(define_insn "*aarch64_simd_vec_set_low"
+  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
+ (vec_merge:VALL_F16
+ (vec_duplicate:VALL_F16
+ (match_operand: 1 "aarch64_simd_nonimmediate_operand" "m"))


The constraint should be "Utv" rather than "m", since the operand doesn't
accept all addresses that are valid for .  E.g. a normal SImode
memory would allow [reg, #imm], whereas this address doesn't.


+ (match_operand:VALL_F16 3 "aarch64_simd_imm_zero" "i")
+ (match_operand:SI 2 "immediate_operand" "i")))]


I think we should drop the two "i"s here, since the pattern doesn't
accept all immediates.  The predicate on the final operand should be
const_int_operand rather than immediate_operand.

Otherwise it looks good.  But I think we should think about how we
plan to integrate the related optimisation for register inputs.  E.g.:

int32x4_t foo(int32_t x) {
 return vsetq_lane_s32(x, vdupq_n_s32(0), 0);
}

generates:

foo:
 movi v0.4s, 0
 ins v0.s[0], w0
 ret

rather than a single UMOV.  Same idea when the input is in an FPR rather
than a GPR, but using FMOV rather than UMOV.

Conventionally, the register and memory forms should be listed as
alternatives in a single pattern, but that's somewhat complex because of
the different instruction availability for 64-bit+32-bit, 16-bit, and
8-bit register operations.

My worry is that if we handle the register case as an entirely separate
patch, it would have to rewrite this one.


I have been experimenting with this, and yeah, it gets quite messy when
trying to handle both memory and register cases together. Would it be okay
to enable the register case only for 64-/32-bit sizes? It would complicate
the code only a little and could still be done with a single pattern. I've
attached a patch that does the same.
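
As an aside for readers of the thread, the value that the pattern under discussion matches can be modelled in portable C. This is only a sketch under stated assumptions: the `v4si` type and the `vec_merge_model`, `vec_duplicate_model` and `set_lane0_rest_zero` names are hypothetical, lanes are numbered in RTL order, and the model follows GCC's documented vec_merge rule that bit i of the selector takes element i from the first operand and otherwise from the second.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-lane model of a 128-bit vector of 32-bit elements.  */
typedef struct { int32_t lane[4]; } v4si;

/* Model of RTL vec_merge: lane i comes from A if bit i of MASK is set,
   otherwise from B.  */
static v4si
vec_merge_model (v4si a, v4si b, unsigned mask)
{
  v4si r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = ((mask >> i) & 1) ? a.lane[i] : b.lane[i];
  return r;
}

/* Model of RTL vec_duplicate: broadcast X to every lane.  */
static v4si
vec_duplicate_model (int32_t x)
{
  v4si r = { { x, x, x, x } };
  return r;
}

/* (vec_merge (vec_duplicate x) zero 1): X in lane 0, zero elsewhere.
   A scalar load or move into the low element produces this value
   when the rest of the register is cleared.  */
v4si
set_lane0_rest_zero (int32_t x)
{
  v4si zero = { { 0, 0, 0, 0 } };
  return vec_merge_model (vec_duplicate_model (x), zero, 1u);
}
```

The model says nothing about which instruction is selected; it only shows why one instruction that writes the low element and clears the rest can replace the MOVI plus lane-insert pair.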

-- >8 --

[PATCH] aarch64: Use LDR/FMOV for first-element loads/writes for Advanced SIMD

This patch modifies Advanced SIMD assembly generation to emit either an
LDR or an FMOV instruction when a load/write to the first element of a
vector is done when the other elements are zero.

The register move case is only enabled for 32-bit or 64-bit element sizes, as
FMOV has no 8-bit mode and 16-bit mode requires FEAT_FP16.

This is similar to what *aarch64_combinez already does.

Example:

uint8x16_t foo(uint8_t *x) {
  uint8x16_t r = vdupq_n_u8(0);
  r = vsetq_lane_u8(*x, r, 0);
  return r;
}

Currently, this generates:

foo:
movi v0.4s, 0
ld1 {v0.b}[0], [x0]
ret

After applying the patch, this generates:

foo:
ldr b0, [x0]
ret

Bootstrapped and regtested on aarch64-linux-gnu. Tested on
an aarch64_be-unknown-linux-gnu cross-build as well.

Signed-off-by: Dhruv Chawla 

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md
(*aarch64_simd_vec_set_low): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr109072_1.c (s32x4_2): Remove XFAIL.
* gcc.target/aarch64/simd/ldr_first_le.c: New test.
* gcc.target/aarch64/simd/ldr_first_be.c: Likewise.
* gcc.target/aarch64/simd/ins_first_le.c: Likewise.
* gcc.target/aarch64/simd/ins_first_be.c: Likewise.
---
 gcc/config/aarch64/aarch64-simd.md   

Re: [AUTOFDO][AARCH64] Add support for profilebootstrap

2025-05-25 Thread Andrew Pinski
On Tue, May 20, 2025 at 3:09 AM Kugan Vivekanandarajah
 wrote:
>
> Thanks Richard for the review.
>
> > On 20 May 2025, at 2:47 am, Richard Sandiford  
> > wrote:
> >
> > Kugan Vivekanandarajah  writes:
> >> diff --git a/Makefile.in b/Makefile.in
> >> index b1ed67d3d4f..b5e3e520791 100644
> >> --- a/Makefile.in
> >> +++ b/Makefile.in
> >> @@ -4271,7 +4271,7 @@ all-stageautoprofile-bfd: 
> >> configure-stageautoprofile-bfd
> >>  $(HOST_EXPORTS) \
> >>  $(POSTSTAGE1_HOST_EXPORTS)  \
> >>  cd $(HOST_SUBDIR)/bfd && \
> >> - $$s/gcc/config/i386/$(AUTO_PROFILE) \
> >> + $$s/gcc/config/@cpu_type@/$(AUTO_PROFILE) \
> >>  $(MAKE) $(BASE_FLAGS_TO_PASS) \
> >>  CFLAGS="$(STAGEautoprofile_CFLAGS)" \
> >>  GENERATOR_CFLAGS="$(STAGEautoprofile_GENERATOR_CFLAGS)" \
> >
> > The usual style seems to be to assign @foo@ to a makefile variable
> > called foo or FOO, rather than to use @foo@ directly in rules.  Otherwise
> > the makefile stuff looks good.
> >
> > I don't feel qualified to review the script, but some general shell stuff:
> >
> >> diff --git a/gcc/config/aarch64/gcc-auto-profile 
> >> b/gcc/config/aarch64/gcc-auto-profile
> >> new file mode 100755
> >> index 000..0ceec035e69
> >> --- /dev/null
> >> +++ b/gcc/config/aarch64/gcc-auto-profile
> >> @@ -0,0 +1,51 @@
> >> +#!/bin/sh
> >> +# Profile workload for gcc profile feedback (autofdo) using Linux perf.
> >> +# Copyright The GNU Toolchain Authors.
> >> +#
> >> +# This file is part of GCC.
> >> +#
> >> +# GCC is free software; you can redistribute it and/or modify it under
> >> +# the terms of the GNU General Public License as published by the Free
> >> +# Software Foundation; either version 3, or (at your option) any later
> >> +# version.
> >> +
> >> +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> >> +# WARRANTY; without even the implied warranty of MERCHANTABILITY or
> >> +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> >> +# for more details.
> >> +
> >> +# You should have received a copy of the GNU General Public License
> >> +# along with GCC; see the file COPYING3.  If not see
> >> +# .  */
> >> +
> >> +# Run perf record with branch stack sampling and check for
> >> +# specific error message to see if it is supported.
> >> +use_brbe=true
> >> +output=$(perf record -j any,u ls 2>&1)
> >
> > How about using /bin/true rather than ls for the test program?
> >
> >> +if [[ "$output" = *"Error::P: PMU Hardware or event type doesn't support 
> >> branch stack sampling."* ]]; then
> >
> > [[ isn't POSIX, or at least dash doesn't accept it.  Since this script
> > is effectively linux-specific, we can probably assume that /bin/bash
> > exists and use that in the #! line.
> >
> > If we use bash, then the test could use =~ rather than an exact match.
> > This could be useful if perf prints other diagnostics besides the
> > one being tested for, or if future versions of perf alter the wording
> > slightly.
> >
> >> +  use_brbe=false
> >> +fi
> >> +
> >> +FLAGS=u
> >> +if [ "$1" = "--kernel" ] ; then
> >> +  FLAGS=k
> >> +  shift
> >> +fi
> >> +if [ "$1" = "--all" ] ; then
> >
> > How about making this an elif, so that we don't accept --kernel --all?
> >
> >> +  FLAGS=u,k
> >> +  shift
> >> +fi
> >> +
> >> +if [ "$use_brbe" = true ] ; then
> >> +  if grep -q hypervisor /proc/cpuinfo ; then
> >> +echo >&2 "Warning: branch profiling may not be functional in VMs"
> >> +  fi
> >> +  set -x
> >> +  perf record -j any,$FLAGS "$@"
> >> +  set +x
> >> +else
> >> +  set -x
> >> +  echo >&2 "Warning: branch profiling may not be functional without BRBE"
> >> +  perf record "$@"
> >> +  set +x
> >
> > Putting the set -x after the echo seems better, as for the "then" branch.
>
> Here is the revised version that handles the above comments.


>   * Makefile.def: AUTO_PROFILE based on cpu_type.
>   * Makefile.in: Likewise.

Makefile.in is a generated file (from Makefile.def and Makefile.tpl).
It looks like you edited the file directly instead of regenerating it.
Can you please regenerate the file and/or provide the corresponding
corrected changes to Makefile.def/Makefile.tpl that were used to
regenerate Makefile.in?

This is what https://gcc.gnu.org/pipermail/gcc-testresults/2025-May/848013.html
is about too.

Thanks,
Andrew Pinski


>
> Thanks,
> Kugan
>
>
>
> >
> > Thanks,
> > Richard
> >
> >> +fi
>


Re: [AUTOFDO][AARCH64] Add support for profilebootstrap

2025-05-25 Thread Kugan Vivekanandarajah


> On 26 May 2025, at 2:25 pm, Andrew Pinski  wrote:
>
> On Tue, May 20, 2025 at 3:09 AM Kugan Vivekanandarajah
>  wrote:
>>
>> Thanks Richard for the review.
>>
>>> On 20 May 2025, at 2:47 am, Richard Sandiford  
>>> wrote:
>>>
>>> Kugan Vivekanandarajah  writes:
>>>> diff --git a/Makefile.in b/Makefile.in
>>>> index b1ed67d3d4f..b5e3e520791 100644
>>>> --- a/Makefile.in
>>>> +++ b/Makefile.in
>>>> @@ -4271,7 +4271,7 @@ all-stageautoprofile-bfd: 
>>>> configure-stageautoprofile-bfd
>>>>  $(HOST_EXPORTS) \
>>>>  $(POSTSTAGE1_HOST_EXPORTS)  \
>>>>  cd $(HOST_SUBDIR)/bfd && \
>>>> - $$s/gcc/config/i386/$(AUTO_PROFILE) \
>>>> + $$s/gcc/config/@cpu_type@/$(AUTO_PROFILE) \
>>>>  $(MAKE) $(BASE_FLAGS_TO_PASS) \
>>>>  CFLAGS="$(STAGEautoprofile_CFLAGS)" \
>>>>  GENERATOR_CFLAGS="$(STAGEautoprofile_GENERATOR_CFLAGS)" \
>>>
>>> The usual style seems to be to assign @foo@ to a makefile variable
>>> called foo or FOO, rather than to use @foo@ directly in rules.  Otherwise
>>> the makefile stuff looks good.
>>>
>>> I don't feel qualified to review the script, but some general shell stuff:
>>>
>>>> diff --git a/gcc/config/aarch64/gcc-auto-profile 
>>>> b/gcc/config/aarch64/gcc-auto-profile
>>>> new file mode 100755
>>>> index 000..0ceec035e69
>>>> --- /dev/null
>>>> +++ b/gcc/config/aarch64/gcc-auto-profile
>>>> @@ -0,0 +1,51 @@
>>>> +#!/bin/sh
>>>> +# Profile workload for gcc profile feedback (autofdo) using Linux perf.
>>>> +# Copyright The GNU Toolchain Authors.
>>>> +#
>>>> +# This file is part of GCC.
>>>> +#
>>>> +# GCC is free software; you can redistribute it and/or modify it under
>>>> +# the terms of the GNU General Public License as published by the Free
>>>> +# Software Foundation; either version 3, or (at your option) any later
>>>> +# version.
>>>> +
>>>> +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>>> +# WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>> +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>>> +# for more details.
>>>> +
>>>> +# You should have received a copy of the GNU General Public License
>>>> +# along with GCC; see the file COPYING3.  If not see
>>>> +# .  */
>>>> +
>>>> +# Run perf record with branch stack sampling and check for
>>>> +# specific error message to see if it is supported.
>>>> +use_brbe=true
>>>> +output=$(perf record -j any,u ls 2>&1)
>>>
>>> How about using /bin/true rather than ls for the test program?
>>>
>>>> +if [[ "$output" = *"Error::P: PMU Hardware or event type doesn't support 
>>>> branch stack sampling."* ]]; then
>>>
>>> [[ isn't POSIX, or at least dash doesn't accept it.  Since this script
>>> is effectively linux-specific, we can probably assume that /bin/bash
>>> exists and use that in the #! line.
>>>
>>> If we use bash, then the test could use =~ rather than an exact match.
>>> This could be useful if perf prints other diagnostics besides the
>>> one being tested for, or if future versions of perf alter the wording
>>> slightly.
>>>
>>>> +  use_brbe=false
>>>> +fi
>>>> +
>>>> +FLAGS=u
>>>> +if [ "$1" = "--kernel" ] ; then
>>>> +  FLAGS=k
>>>> +  shift
>>>> +fi
>>>> +if [ "$1" = "--all" ] ; then
>>>
>>> How about making this an elif, so that we don't accept --kernel --all?
>>>
>>>> +  FLAGS=u,k
>>>> +  shift
>>>> +fi
>>>> +
>>>> +if [ "$use_brbe" = true ] ; then
>>>> +  if grep -q hypervisor /proc/cpuinfo ; then
>>>> +echo >&2 "Warning: branch profiling may not be functional in VMs"
>>>> +  fi
>>>> +  set -x
>>>> +  perf record -j any,$FLAGS "$@"
>>>> +  set +x
>>>> +else
>>>> +  set -x
>>>> +  echo >&2 "Warning: branch profiling may not be functional without BRBE"
>>>> +  perf record "$@"
>>>> +  set +x
>>>
>>> Putting the set -x after the echo seems better, as for the "then" branch.
>>
>> Here is the revised version that handles the above comments.
>
>
>>  * Makefile.def: AUTO_PROFILE based on cpu_type.
>>  * Makefile.in: Likewise.
>
> Makefile.in is a generated file (from Makefile.def and Makefile.tpl).
> It looks like you edited the file directly instead of regenerating it.
> Can you please regenerate the file and/or provide the corresponding
> corrected changes to Makefile.def/Makefile.tpl that were used to
> regenerate Makefile.in?
>
> This is what 
> https://gcc.gnu.org/pipermail/gcc-testresults/2025-May/848013.html
> is about too.


Apologies for the breakage.

The attached patch fixes this.  Is this OK?


Thanks,
Kugan



>
> Thanks,
> Andrew Pinski
>
>
>>
>> Thanks,
>> Kugan
>>
>>
>>
>>>
>>> Thanks,
>>> Richard
>>>
>>>> +fi




0001-AUTOFDO-Fix-autogen-remake-issue.patch
Description: 0001-AUTOFDO-Fix-autogen-remake-issue.patch


Re: [PATCH] fortran: add constant input support for trig functions with half-revolutions

2025-05-25 Thread Steve Kargl
On Sun, May 25, 2025 at 04:56:48AM +, Yuao Ma wrote:
> 
> Thanks for your review! I've updated the patch.
> 
> > this range_check() is unneeded.
> 
> Done.
> 
> > As a side note, the error message is slightly misleading
> > (although it will not be issued).  Technically, x = -1 or 1
> > are allowed values, and neither is **between** -1 and 1.
> 
> You're right, the original message was a bit imprecise. I've updated this and
> five other similar error messages in the patch for better accuracy.
> 

Thanks for addressing the issues.

I looked at the patch in a bit more detail, and
I am not thrilled with large-scale whitespace
changes mingled with functional changes.  It makes
the patch harder to read and review.  

-- 
Steve


[PATCH v1 0/3] RISC-V: Combine vec_duplicate + vxor.vv to vxor.vx on GR2VR cost

2025-05-25 Thread pan2 . li
From: Pan Li 

This patch series combines vec_duplicate + vxor.vv into vxor.vx, depending
on the GR2VR cost.  Late-combine performs the combine if the GR2VR cost is
zero and rejects it if the cost is non-zero, such as 1 or 15 in the tests.
There are two cases for the combine:

Case 0:
 |   ...
 |   vmv.v.x
 | L1:
 |   vxor.vv
 |   J L1
 |   ...

Case 1:
 |   ...
 | L1:
 |   vmv.v.x
 |   vxor.vv
 |   J L1
 |   ...

Both are combined into the following if the GR2VR cost is zero:
 |   ...
 | L1:
 |   vxor.vx
 |   J L1
 |   ...
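
A source loop of roughly the following shape is the kind of code that gives rise to these sequences. This is a hedged sketch: `xor_scalar` is a hypothetical name, and the actual vx_vf tests are built from macros in the testsuite headers rather than written out like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loop in the spirit of the vx_vf tests.  */
void
xor_scalar (int32_t *out, const int32_t *in, int32_t x, size_t n)
{
  /* When auto-vectorized for RVV, X is broadcast (vmv.v.x) and XORed
     with each vector of IN (vxor.vv).  The combine folds that pair
     into a single vxor.vx that takes X from a scalar register.  */
  for (size_t i = 0; i < n; i++)
    out[i] = in[i] ^ x;
}
```

The result is the same either way; the combine only changes how many instructions the vectorized loop needs.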

The following test suites pass with this patch series:
* The full rv64gcv regression test.

Pan Li (3):
  RISC-V: Combine vec_duplicate + vxor.vv to vxor.vx on GR2VR cost
  RISC-V: Add test for vec_duplicate + vxor.vv combine case 0 with GR2VR cost 0, 2 and 15
  RISC-V: Add test for vec_duplicate + vxor.vv combine case 1 with GR2VR cost 0, 1 and 2

 gcc/config/riscv/riscv-v.cc   |   2 +
 gcc/config/riscv/riscv.cc |   1 +
 gcc/config/riscv/vector-iterators.md  |   2 +-
 .../riscv/rvv/autovec/vx_vf/vx-1-i16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-i32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-i64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-i8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-i16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-i32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-i64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-i8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-i16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-i32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-i64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-i8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-i16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-i32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-i64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-i8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-i16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-i32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-i64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-i8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-i16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-i32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-i64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-i8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u16.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u32.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u64.c|   2 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u8.c |   2 +
 .../riscv/rvv/autovec/vx_vf/vx_binary_data.h  | 392 ++
 .../rvv/autovec/vx_vf/vx_vxor-run-1-i16.c |  15 +
 .../rvv/autovec/vx_vf/vx_vxor-run-1-i32.c |  15 +
 .../rvv/autovec/vx_vf/vx_vxor-run-1-i64.c |  15 +
 .../rvv/autovec/vx_vf/vx_vxor-run-1-i8.c  |  15 +
 .../rvv/autovec/vx_vf/vx_vxor-run-1-u16.c |  15 +
 .../rvv/autovec/vx_vf/vx_vxor-run-1-u32.c |  15 +
 .../rvv/autovec/vx_vf/vx_vxor-run-1-u64.c |  15 +
 .../rvv/autovec/vx_vf/vx_vxor-run-1-u8.c  |  15 +
 60 files changed, 612 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-i16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-i32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-i64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-i8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-u16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-u32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-u64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vxor-run-1-u8.c

-- 
2.43.0



[PATCH v1 3/3] RISC-V: Add test for vec_duplicate + vxor.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-05-25 Thread pan2 . li
From: Pan Li 

Add asm dump check tests for the vec_duplicate + vxor.vv combine to vxor.vx,
with GR2VR costs of 0, 1 and 2.

The following test suites pass with this patch:
* The full rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check
for vxor.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c  | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c  | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c  | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c  | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c  | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c | 2 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c  | 2 ++
 24 files changed, 48 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c
index ffad2a27f92..58dc66dcec9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c
@@ -10,9 +10,11 @@ DEF_VX_BINARY_CASE_1_WRAP(T, -, sub, VX_BINARY_BODY_X16)
 DEF_VX_BINARY_REVERSE_CASE_1_WRAP(T, -, rsub, VX_BINARY_REVERSE_BODY_X16)
 DEF_VX_BINARY_CASE_1_WRAP(T, &, and, VX_BINARY_BODY_X16)
 DEF_VX_BINARY_CASE_1_WRAP(T, |, or, VX_BINARY_BODY_X16)
+DEF_VX_BINARY_CASE_1_WRAP(T, ^, xor, VX_BINARY_BODY_X16)
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
 /* { dg-final { scan-assembler {vrsub.vx} } } */
 /* { dg-final { scan-assembler {vand.vx} } } */
 /* { dg-final { scan-assembler {vor.vx} } } */
+/* { dg-final { scan-assembler {vxor.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c
index 275a11e9158..b13ec16983c 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c
@@ -10,9 +10,11 @@ DEF_VX_BINARY_CASE_1_WRAP(T, -, sub, VX_BINARY_BODY_X4)
 DEF_VX_BINARY_REVERSE_CASE_1_WRAP(T, -, rsub, VX_BINARY_REVERSE_BODY_X4)
 DEF_VX_BINARY_CASE_1_WRAP(T, &, and, VX_BINARY_BODY_X4)
 DEF_VX_BINARY_CASE_1_WRAP(T, |, or, VX_BINARY_BODY_X4)
+DEF_V

[PATCH v2] RISC-V: Add minimal support of double trap extension 1.0

2025-05-25 Thread Jerry Zhang Jian
Add support for the double trap extensions [1], enabling GCC
to recognize the following extensions at compile time.

New extensions:
- ssdbltrp
- smdbltrp

[1] https://github.com/riscv/riscv-double-trap/releases/download/v1.0/riscv-double-trap.pdf
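
As an illustration of what the new tests check, the expected ".attribute arch"
string is the base ISA followed by each extension name with its version
written as <major>p<minor>, including the implied zicsr.  The sketch below
only mimics that string layout; struct ext_version and format_arch are
hypothetical names, and GCC's real canonicalization (ordering, implied
extensions) is not reproduced here.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical illustration only, not GCC's implementation.  */
struct ext_version
{
  const char *name;
  int major;
  int minor;
};

/* Write BASE followed by "_<name><major>p<minor>" for each of the N
   extensions in EXTS, in the order given.  Like snprintf, return the
   length the full string would have, even if BUF is too small.  */
int
format_arch (char *buf, size_t size, const char *base,
             const struct ext_version *exts, size_t n)
{
  int len = snprintf (buf, size, "%s", base);
  for (size_t i = 0; i < n; i++)
    {
      /* Never write past the end of BUF once it is full.  */
      size_t used = (size_t) len < size ? (size_t) len : size;
      len += snprintf (buf + used, size - used, "_%s%dp%d",
                       exts[i].name, exts[i].major, exts[i].minor);
    }
  return len;
}
```

Passing "rv64i2p1" with zicsr 2.0 and smdbltrp 1.0 reproduces the string
scanned for in arch-57.c.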

gcc/ChangeLog:
* config/riscv/riscv-ext.def: New extensions.
* config/riscv/riscv-ext.opt: Auto re-generated.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-57.c: New test.
* gcc.target/riscv/arch-58.c: New test.

Signed-off-by: Jerry Zhang Jian 
---
 gcc/config/riscv/riscv-ext.def   | 26 
 gcc/config/riscv/riscv-ext.opt   |  4 
 gcc/doc/riscv-ext.texi   |  8 
 gcc/testsuite/gcc.target/riscv/arch-57.c |  6 ++
 gcc/testsuite/gcc.target/riscv/arch-58.c |  6 ++
 5 files changed, 50 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-57.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-58.c

diff --git a/gcc/config/riscv/riscv-ext.def b/gcc/config/riscv/riscv-ext.def
index 97b576617ad..dbda8ded397 100644
--- a/gcc/config/riscv/riscv-ext.def
+++ b/gcc/config/riscv/riscv-ext.def
@@ -1727,6 +1727,19 @@ DEFINE_RISCV_EXT(
   /* BITMASK_BIT_POSITION*/ BITMASK_NOT_YET_ALLOCATED,
   /* EXTRA_EXTENSION_FLAGS */ 0)
 
+DEFINE_RISCV_EXT(
+  /* NAME */ smdbltrp,
+  /* UPPERCAE_NAME */ SMDBLTRP,
+  /* FULL_NAME */ "Double Trap Extensions",
+  /* DESC */ "",
+  /* URL */ ,
+  /* DEP_EXTS */ ({"zicsr"}),
+  /* SUPPORTED_VERSIONS */ ({{1, 0}}),
+  /* FLAG_GROUP */ sm,
+  /* BITMASK_GROUP_ID */ BITMASK_NOT_YET_ALLOCATED,
+  /* BITMASK_BIT_POSITION*/ BITMASK_NOT_YET_ALLOCATED,
+  /* EXTRA_EXTENSION_FLAGS */ 0)
+
 DEFINE_RISCV_EXT(
   /* NAME */ ssaia,
   /* UPPERCAE_NAME */ SSAIA,
@@ -1818,6 +1831,19 @@ DEFINE_RISCV_EXT(
   /* BITMASK_BIT_POSITION*/ BITMASK_NOT_YET_ALLOCATED,
   /* EXTRA_EXTENSION_FLAGS */ 0)
 
+DEFINE_RISCV_EXT(
+  /* NAME */ ssdbltrp,
+  /* UPPERCAE_NAME */ SSDBLTRP,
+  /* FULL_NAME */ "Double Trap Extensions",
+  /* DESC */ "",
+  /* URL */ ,
+  /* DEP_EXTS */ ({"zicsr"}),
+  /* SUPPORTED_VERSIONS */ ({{1, 0}}),
+  /* FLAG_GROUP */ ss,
+  /* BITMASK_GROUP_ID */ BITMASK_NOT_YET_ALLOCATED,
+  /* BITMASK_BIT_POSITION*/ BITMASK_NOT_YET_ALLOCATED,
+  /* EXTRA_EXTENSION_FLAGS */ 0)
+
 DEFINE_RISCV_EXT(
   /* NAME */ supm,
   /* UPPERCAE_NAME */ SUPM,
diff --git a/gcc/config/riscv/riscv-ext.opt b/gcc/config/riscv/riscv-ext.opt
index 9199aa31b42..5e9c5f56ad6 100644
--- a/gcc/config/riscv/riscv-ext.opt
+++ b/gcc/config/riscv/riscv-ext.opt
@@ -343,6 +343,8 @@ Mask(SMNPM) Var(riscv_sm_subext)
 
 Mask(SMSTATEEN) Var(riscv_sm_subext)
 
+Mask(SMDBLTRP) Var(riscv_sm_subext)
+
 Mask(SSAIA) Var(riscv_ss_subext)
 
 Mask(SSCOFPMF) Var(riscv_ss_subext)
@@ -357,6 +359,8 @@ Mask(SSTC) Var(riscv_ss_subext)
 
 Mask(SSSTRICT) Var(riscv_ss_subext)
 
+Mask(SSDBLTRP) Var(riscv_ss_subext)
+
 Mask(SUPM) Var(riscv_su_subext)
 
 Mask(SVINVAL) Var(riscv_sv_subext)
diff --git a/gcc/doc/riscv-ext.texi b/gcc/doc/riscv-ext.texi
index bd3d29c75ab..7a22d841d1b 100644
--- a/gcc/doc/riscv-ext.texi
+++ b/gcc/doc/riscv-ext.texi
@@ -510,6 +510,10 @@
 @tab 1.0
 @tab State enable extension
 
+@item smdbltrp
+@tab 1.0
+@tab Double Trap Extensions
+
 @item ssaia
 @tab 1.0
 @tab Advanced interrupt architecture extension for supervisor-mode
@@ -538,6 +542,10 @@
 @tab 1.0
 @tab ssstrict extension
 
+@item ssdbltrp
+@tab 1.0
+@tab Double Trap Extensions
+
 @item supm
 @tab 1.0
 @tab supm extension
diff --git a/gcc/testsuite/gcc.target/riscv/arch-57.c b/gcc/testsuite/gcc.target/riscv/arch-57.c
new file mode 100644
index 00000000000..42cf30a3171
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-57.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64i_smdbltrp -mabi=lp64d" } */
+
+void foo(){}
+
+/* { dg-final { scan-assembler ".attribute arch, \"rv64i2p1_zicsr2p0_smdbltrp1p0\"" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/arch-58.c b/gcc/testsuite/gcc.target/riscv/arch-58.c
new file mode 100644
index 00000000000..88b20dfb6c8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-58.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64i_ssdbltrp -mabi=lp64d" } */
+
+void foo(){}
+
+/* { dg-final { scan-assembler ".attribute arch, \"rv64i2p1_zicsr2p0_ssdbltrp1p0\"" } } */
-- 
2.49.0



[pushed] c++: dump_template_bindings tweak

2025-05-25 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

In r12-1100 we stopped printing template bindings like T = T.  The check for
this relied on TREE_CHAIN of a TEMPLATE_TYPE_PARM holding the declaration of
that type-parameter.  This should be written as TYPE_STUB_DECL.  In
addition, TYPE_STUB_DECL is only set on the TYPE_MAIN_VARIANT, so we need to
check that as well, which is also desirable because volatile T is visibly
distinct from T.

gcc/cp/ChangeLog:

* error.cc (dump_template_bindings): Correct skipping of
redundant bindings.
---
 gcc/cp/error.cc | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
index 75bf7dcef62..305064d476c 100644
--- a/gcc/cp/error.cc
+++ b/gcc/cp/error.cc
@@ -541,12 +541,13 @@ dump_template_bindings (cxx_pretty_printer *pp, tree parms, tree args,
  /* If the template argument repeats the template parameter (T = T),
 skip the parameter.*/
  if (arg && TREE_CODE (arg) == TEMPLATE_TYPE_PARM
-   && TREE_CODE (parm_i) == TREE_LIST
-   && TREE_CODE (TREE_VALUE (parm_i)) == TYPE_DECL
-   && TREE_CODE (TREE_TYPE (TREE_VALUE (parm_i)))
-== TEMPLATE_TYPE_PARM
-   && DECL_NAME (TREE_VALUE (parm_i))
-== DECL_NAME (TREE_CHAIN (arg)))
+ && arg == TYPE_MAIN_VARIANT (arg)
+ && TREE_CODE (parm_i) == TREE_LIST
+ && TREE_CODE (TREE_VALUE (parm_i)) == TYPE_DECL
+ && (TREE_CODE (TREE_TYPE (TREE_VALUE (parm_i)))
+ == TEMPLATE_TYPE_PARM)
+ && (DECL_NAME (TREE_VALUE (parm_i))
+ == DECL_NAME (TYPE_STUB_DECL (arg))))
continue;
 
  semicolon_or_introducer ();

base-commit: e3d3d6d7d2c8ab73ff597f4c82514c3217256567
-- 
2.49.0